English Pages 1059 [1002] Year 2021
Lecture Notes in Networks and Systems 236
Xin-She Yang Simon Sherratt Nilanjan Dey Amit Joshi Editors
Proceedings of Sixth International Congress on Information and Communication Technology ICICT 2021, London, Volume 2
Lecture Notes in Networks and Systems Volume 236
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA, School of Electrical and Computer Engineering—FEEC, University of Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering, Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, USA; Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of Alberta, Alberta, Canada; Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering, KIOS Research Center for Intelligent Systems and Networks, University of Cyprus, Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest developments in Networks and Systems—quickly, informally and with high quality. Original research reported in proceedings and post-proceedings represents the core of LNNS. Volumes published in LNNS embrace all aspects and subfields of, as well as new challenges in, Networks and Systems. The series contains proceedings and edited volumes in systems and networks, spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor Networks, Control Systems, Energy Systems, Automotive Systems, Biological Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems, Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems, Robotics, Social Systems, Economic Systems and other. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution and exposure which enable both a wide and rapid dissemination of research output. The series covers the theory, applications, and perspectives on the state of the art and future developments relevant to systems and networks, decision making, control, complex processes and related areas, as embedded in the fields of interdisciplinary and applied sciences, engineering, computer science, physics, economics, social, and life sciences, as well as the paradigms and methodologies behind them. Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/15179
Editors Xin-She Yang Middlesex University London, UK
Simon Sherratt University of Reading Reading, UK
Nilanjan Dey JIS University Kolkata, India
Amit Joshi Global Knowledge Research Foundation Ahmedabad, India
ISSN 2367-3370
ISSN 2367-3389 (electronic)
Lecture Notes in Networks and Systems
ISBN 978-981-16-2379-0
ISBN 978-981-16-2380-6 (eBook)
https://doi.org/10.1007/978-981-16-2380-6
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
The Sixth International Congress on Information and Communication Technology was held digitally on Zoom on February 25–26, 2021, and was organized by the Global Knowledge Research Foundation. The associated partners were Springer, Springer Nature, and InterYIT IFIP. The conference provided a useful and wide platform both for displaying the latest research and for exchanging research results and ideas. Participants came from almost every part of the world (around 85 countries), with backgrounds in either academia or industry, allowing a truly multinational, multicultural exchange of experiences and ideas. A total of 1150 papers were received for this conference from across 83 countries, of which around 350 were accepted and presented on the digital platform. Owing to the overwhelming response, many papers had to be declined on the basis of a quality ranking. In total, 51 technical sessions were organized in parallel over the 2 days, and talks were given on both days. The conference involved in-depth discussion of issues that are intended to be solved at a global level. New technologies were proposed, experiences were shared, and future solutions for the design of ICT infrastructure were also discussed. The papers will be published in 4 volumes of proceedings, of which this is one. The conference featured several distinguished authors, scholars, and speakers from all over the world. Amit Joshi, Organizing Secretary, ICICT 2021; Sean Holmes, Vice Dean International, College of Business, Arts and Social Sciences, Brunel University London, UK; Mike Hinchey, Immediate Past Chair, IEEE UK and Ireland Section, Director of Lero, and Professor of Software Engineering, University of Limerick, Ireland; Aninda Bose, Sr. Publishing Editor, Springer Nature, Germany; Xin-She Yang, Professor, Middlesex University; Prof. Jyoti Choudri, Professor, University of Hertfordshire; and many others took part in the Inaugural Session and the conference.
The conference was organized and conceptualized with collective efforts of a large number of individuals. We would like to thank our committee members and the reviewers for their excellent work in reviewing the papers. Grateful acknowledgements are extended to the team of Global Knowledge Research Foundation for their valuable efforts and support. We are also thankful to the sponsors, press, print, and electronic media for their excellent coverage of this conference.
Xin-She Yang, London, UK
Simon Sherratt, Reading, UK
Nilanjan Dey, Kolkata, India
Amit Joshi, Ahmedabad, India
Contents
Highly Efficient Stochastic Approaches for Computation of Multiple Integrals for European Options
Venelin Todorov, Ivan Dimov, Stoyan Apostolov, and Stoyan Poryazov
Spectrum Sensing Data Falsification Attack Reputation and Q-Out-of-M Rule Security Scheme
Velempini Mthulisi, Ngomane Issah, and Mapunya Sekgoari Semaka
Lean Manufacturing Tools for Industrial Process: A Literature Review
Gustavo Caiza, Alexandra Salazar-Moya, Carlos A. Garcia, and Marcelo V. Garcia
Lambda Computatrix (LC)—Towards a Computational Enhanced Understanding of Production and Management
Bernhard Heiden, Bianca Tonino-Heiden, Volodymyr Alieksieiev, Erich Hartlieb, and Denise Foro-Szasz
Behavioral Analysis of Wireless Channel Under Small-Scale Fading
Mridula Korde, Jagdish Kene, and Minal Ghute
Towards a Framework to Address Enterprise Resource Planning (ERP) Challenges
Stephen Kwame Senaya, John Andrew van der Poll, and Marthie Schoeman
Potentials of Digital Business Models in the Construction Industry—Empirical Results from German Experts
Ralf-Christian Härting, Christopher Reichstein, and Tobias Schüle
An Alternative Auction System to Generalized Second-Price for Real-Time Bidding Optimized Using Genetic Algorithms
Luis Miralles-Pechuán, Fernando Jiménez, and José Manuel García
Low-Cost Fuzzy Control for Poultry Heating Systems
Gustavo Caiza, Cristhian Monta, Paulina Ayala, Javier Caceres, Carlos A. Garcia, and Marcelo V. Garcia
Towards Empowering Business Process Redesign with Sentiment Analysis
Selver Softic and Egon Lüftenegger
An Integration of UTAUT and Task-Technology Fit Frameworks for Assessing the Acceptance of Clinical Decision Support Systems in the Context of a Developing Country
Soliman Aljarboa and Shah J. Miah
Research Trends in the Implementation of eModeration Systems: A Systematic Literature Review
Vanitha Rajamany, J. A. van Biljon, and C. J. van Staden
From E-Government to Digital Transformation: Leadership
Miguel Cuya and Sussy Bayona-Oré
Application of Machine Learning Methods on IoT Parking Sensors’ Data
Dražen Vuk and Darko Andročec
A Fast Algorithm for Image Deconvolution Based on a Rank Constrained Inverse Matrix Approximation Problem
Pablo Soto-Quiros, Juan Jose Fallas-Monge, and Jeffry Chavarría-Molina
On-Body Microstrip Patch Antenna for Breast Cancer Detection
Sourav Sinha, Sajidur Rahman, Mahajabin Haque Mili, and Fahim Mahmud
Machine Learning with Meteorological Variables for the Prediction of the Electric Field in East Lima, Peru
Juan J. Soria, Orlando Poma, David A. Sumire, Joel Hugo Fernandez Rojas, and Maycol O. Echevarria
Enhanced Honeyword Generation Method Using Advanced DNA Algorithm
Nwe Ni Khin and Khin Su Myat Moe
A Review: How Does ICT Affect the Health and Well-Being of Teenagers in Developing Countries
Willone Lim, Bee Theng Lau, Caslon Chua, and Fakir M. Amirul Islam
Multi-image Crowd Counting Using Multi-column Convolutional Neural Network
Oğuzhan Kurnaz and Cemal Hanilçi
Which Features Are Helpful? The Antecedents of User Satisfaction and Net Benefits of a Learning Management System (LMS)
Bernie S. Fabito, Mico C. Magtira, Jessica Nicole Dela Cruz, Ghielyssa D. Intrina, and Shannen Nicole C. Esguerra
Performance Analysis of a Neuro-Fuzzy Algorithm in Human-Centered and Non-invasive BCI
Timothy Scott C. Chu, Alvin Chua, and Emanuele Lindo Secco
A Workflow-Based Support for the Automatic Creation and Selection of Energy-Efficient Task-Schedules on DVFS Processors
Ronny Kramer and Gudula Rünger
Artificial Intelligence Edge Applications in 5G Networks
Carlota Villasante Marcos
A Concept for the Use of Chatbots to Provide the Public with Vital Information in Crisis Situations
Daniel Staegemann, Matthias Volk, Christian Daase, Matthias Pohl, and Klaus Turowski
Fuzzy Reinforcement Learning Multi-agent System for Comfort and Energy Management in Buildings
Panagiotis Kofinas, Anastasios Dounis, and Panagiotis Korkidis
Discrete Markov Model Application for Decision-Making in Stock Investments
Oksana Tyvodar and Pylyp Prystavka
Howling Noise Cancellation in Time–Frequency Domain by Deep Neural Networks
Huaguo Gan, Gaoyong Luo, Yaqing Luo, and Wenbin Luo
Daily Trajectory Prediction Using Temporal Frequent Pattern Tree
Mingyi Cai, Runze Yan, and Afsaneh Doryab
Quick and Dirty Prototyping and Testing for UX Design of Future Robo-Taxi
Dokshin Lim and Minhee Lee
Iterative Generation of Chow Parameters Using Nearest Neighbor Relations in Threshold Network
Naohiro Ishii, Kazuya Odagiri, and Tokuro Matsuo
Effective Feature Selection Using Ensemble Techniques and Genetic Algorithm
Jayshree Ghorpade-Aher and Balwant Sonkamble
A Generalization of Secure Comparison Protocol with Encrypted Output and Its Efficiency Estimation
Takumi Kobayashi and Keisuke Hakuta
Conceptualizing Factors that Influence Learners’ Intention to Adopt ICT for Learning in Rural Schools in Developing Countries
Siphe Mhlana, Baldreck Chipangura, and Hossana Twinomurinzi
The Innovation Strategy for Citrus Crop Prediction Using Rough Set Theory
Alessandro Scuderi, Giuseppe Timpanaro, Giovanni La Via, Biagio Pecorino, and Luisa Sturiale
Predicting Traffic Path Recommendation Using Spatiotemporal Graph Convolutional Neural Network
Hitendra Shankarrao Khairnar and Balwant Sonkamble
Machine Learning and Context-Based Approaches to Get Quality Improved Food Data
Alexander Muenzberg, Janina Sauer, Andreas Hein, and Norbert Roesch
Components of a Digital Transformation Strategy: A South African Perspective
Kudzai Mapingire, Hanlie Smuts, and Alta Van der Merwe
Evaluation of Face Detection and Recognition Methods in Smart Mirror Implementation
Muhammad Bagus Satrio, Aji Gautama Putrada, and Maman Abdurohman
Comparative Analysis of Grid and Tree Topologies in Agriculture WSN with RPL Routing
Febrian Aji Pangestu, Maman Abdurohman, and Aji Gautama Putrada
Designing a Monitoring and Prediction System of Water Quality Pollution Using Artificial Neural Networks for Freshwater Fish Cultivation in Reservoirs
R Raden Muhamad Irvan, Maman Abdurohman, and Aji Gautama Putrada
Sentence-Level Automatic Speech Segmentation for Amharic
Rahel Mekonen Tamiru and Solomon Teferra Abate
Urban Change Detection from VHR Images via Deep-Features Exploitation
Annarita D’Addabbo, Guido Pasquariello, and Angelo Amodio
Region Awareness for Identifying and Extracting Text in the Natural Scene
Vinh Loc Cu, Xuan Viet Truong, Tien Dao Luu, and Hoang Viet Nguyen
Analysis of Effectiveness of Selected Classifiers for Recognizing Psychological Patterns
Marta Emirsajłow and Łukasz Jeleń
A Virtual Reality System for the Simulation of Neurodiversity
Héctor López-Carral, Maria Blancas-Muñoz, Anna Mura, Pedro Omedas, Àdria España-Cumellas, Enrique Martínez-Bueno, Neil Milliken, Paul Moore, Leena Haque, Sean Gilroy, and Paul F. M. J. Verschure
An Algorithm Classifying Brain Signals in the Control Problem
Urszula Jagodzińska-Szymańska and Edward Sędek
Fast Geometric Reconstruction Using Genetic Algorithms from Single or Multiple Images
Afafe Annich, Imane Laghman, Abdellatif E. L. Abderrahmani, and Khalid Satori
Metagenomic Analysis: A Pathway Toward Efficiency Using High-Performance Computing
Gustavo Henrique Cervi, Cecília Dias Flores, and Claudia Elizabeth Thompson
A Machine Learning Approach to CCPI-Based Inflation Prediction
R. Maldeni and M. A. Mascrenghe
On Profiling Space Reduction Efficiency in Vector Space Modeling-Based Natural Language Processing
Alaidine Ben Ayed, Ismaïl Biskri, and Jean-Guy Meunier
Proposal of a Methodology for the Implementation of a Smart Campus
Sonia-Azucena Pupiales-Chuquin, Gladys-Alicia Tenesaca-Luna, and María-Belén Mora-Arciniegas
Emotion Cause Detection with a Hierarchical Network
Jing Wan and Han Ren
Skills and Human Resource Management for Industry 4.0 of Small and Medium Enterprises
Sukmongkol Lertpiromsuk, Pittawat Ueasangkomsate, and Yuraporn Sudharatna
Fully Passive Unassisted Localization System Without Time Synchronization
Przemysław Świercz
Appropriation Intention of a Farm Management Information System Through Usability Evaluation with PLS-SEM Analysis
Helga Bermeo-Andrade and Dora González-Bañales
Collaborative Control of Mobile Manipulator Robots Through the Hardware-in-the-Loop Technique
Luis F. Santo, Richard M. Tandalla, and H. Andaluz
Application ArcGIS on Modified-WQI Method to Evaluate Water Quality of the Euphrates River, Iraq, Using Physicochemical Parameters
Ali Chabuk, Hussein A. M. Al-Zubaidi, Aysar Jameel Abdalkadhum, Nadhir Al-Ansari, Salwan Ali Abed, Ali Al-Maliki, Jan Laue, and Salam Ewaid
Information Retrieval and Analysis of Digital Conflictogenic Zones by Social Media Data
Maria Pilgun and Alexander A. Kharlamov
Introducing a Test Framework for Quality of Service Mechanisms in the Context of Software-Defined Networking
Josiah Eleazar T. Regencia and William Emmanuel S. Yu
Building a Conceptual Model for the Acceptance of Drones in Saudi Arabia
Roobaea Alroobaea
A Channel Allocation Algorithm for Cognitive Radio Users Based on Channel State Predictors
Nakisa Shams, Hadi Amirpour, Christian Timmerer, and Mohammad Ghanbari
A Framework for Studying Coordinated Behaviour Applied to the 2019 Philippine Midterm Elections
William Emmanuel S. Yu
COVID-19 X-ray Image Diagnosis Using Deep Convolutional Neural Networks
Alisa Kunapinun and Matthew N. Dailey
Jumping Particle Swarm Optimization
Atiq Ur Rehman, Ashhadul Islam, Nabiha Azizi, and Samir Brahim Belhaouari
Reinforcement Learning for the Problem of Detecting Intrusion in a Computer System
Quang-Vinh Dang and Thanh-Hai Vo
Grey Wolf Optimizer Algorithm for Suspension Insulator Designing
Dyhia Doufene, Slimane Bouazabia, Sid A. Bessedik, and Khaled Ouzzir
Green IT Practices in the Business Sector
Andrea Mory, Diego Cordero, Silvana Astudillo, and Ana Lucia Serrano
Study of Official Government Website and Twitter Content Quality in Four Local Governments of Indonesia
Nita Tri Oktaviani, Achmad Nurmandi, and Salahudin
Design and Implementation of an Industrial Multinetwork TCP/IP of a Distributed Control System with Virtual Processes Based on IOT
Wilson Sánchez Ocaña, Elizabeth Alvarado Rubio, Edwin Torres López, and Alexander Toapanta Casa
Cross-Textual Analysis of COVID-19 Tweets: On Themes and Trends Over Time
Joseph Marvin Imperial, Angelica De La Cruz, Emmanuel Malaay, and Rachel Edita Roxas
A Novel Approach for Smart Contracts Using Blockchain
Manar Abdelhamid and Khaled Nagaty
Redundant Bus Systems Using Dual-Mode Radio
Felix Huening, Franz-Josef Wache, and David Magiera
Practice of Tech Debt Assessment and Management with TETRA™
Boris Kontsevoi, Denis Syraeshko, and Sergei Terekhov
Low-Cost Health Monitoring System: A Smart Technological Device for Elderly People
Tamanna Shaown, M. Shohrab Hossain, and Tasnim Morium Mukur
An Improved Genetic Algorithm With Initial Population Strategy and Guided Mutation
Ali Aburas
Intuitive Searching: An Approach to Search the Decision Policy of a Blackjack Agent
Zhenyu Pan, Jie Xue, and Tingjian Ge
Generation and Extraction of Color Palettes with Adversarial Variational Auto-Encoders
Ahmad Moussa and Hiroshi Watanabe
Crime Mapping Approach for Crime Pattern Identification: A Prototype for the Province of Cavite
Aries M. Gelera and Edgardo S. Dajao
Hardware in the Loop of an Omnidirectional Vehicle Using Augmented Reality
Jonathan A. Romero, Edgar R. Salazar, Edgar I. De la Cruz, Geovanny P. Moreno, and Jéssica D. Mollocana
Re-hub-ILITY: A Personalized Home System and Virtual Coach to Support and Empower Elderly People with Chronic Conditions
Claudio Pighini, Ambra Cesareo, Andrea Migliavacca, Matilde Accardo, and Maria Renata Guarneri
An Empirical Evaluation of Machine Learning Methods for the Insurance Industry
Michael Dammann, Nicolai Gnoss, Pamela Kunert, Eike-Christian Ramcke, Tobias Schreier, Ulrike Steffens, and Olaf Zukunft
Construction and Practice of Task-Driven Learning Model Based on TBL and CBL in Post MOOC Era
Cuiping Li and Hanbin Wu
Research Progress on Influencing Factors of Sense of Control in the Elderly and Its Effects on Successful Aging
Haiying Qian and Hanbin Wu
Indicators of Choosing Internet User’s Responsible Behavior
Olga Shipunova, Irina Berezovskaya, Swetlana Kedich, and Nina Popova
E-Governance and Privacy in Pandemic Times
Kotov Alexander and Naumov Victor
A Next-Generation Telemedicine and Health Advice System
Shah Siddiqui, Adrian Hopgood, Alice Good, Alexander Gegov, Elias Hossain, Wahidur Rahman, Rezowan Ferdous, Murshedul Arifeen, and Zakir Khan
“Can Mumbai Indians Chase the Target?”: Predict the Win Probability in IPL T20-20
D. G. T. L. Karunathilaka, S. K. Rajakaruna, R. Navarathna, K. Anantharajah, and M. Selvarathnam
Method for Extracting Cases Relevant to Social Issues from Web Articles to Facilitate Public Debates
Akira Kamiya and Shun Shiramatsu
Comparative Analysis of Cloud Computing Security Frameworks for Financial Sector
Sudhish Mohanan, Nandagopal Sridhar, and Sajal Bhatia
Author Index
Editors and Contributors
About the Editors
Xin-She Yang obtained his D.Phil. in Applied Mathematics from the University of Oxford and subsequently worked at the Cambridge University and the National Physical Laboratory (UK) as Senior Research Scientist. He is currently Reader in Modelling and Optimization at Middlesex University London and Adjunct Professor at Reykjavik University (Iceland). He is also Elected Bye-Fellow at the Cambridge University and IEEE CIS Chair for the Task Force on Business Intelligence and Knowledge Management. He was included in the “2016 Thomson Reuters Highly Cited Researchers” list.
Simon Sherratt was born near Liverpool, England, in 1969. He is currently Professor of Biosensors at the Department of Biomedical Engineering, University of Reading, UK. His main research area is signal processing and personal communications in consumer devices, focusing on wearable devices and health care. Professor Sherratt received the 1st place IEEE Chester Sall Memorial Award in 2006, the 2nd place in 2016 and the 3rd place in 2017.
Nilanjan Dey is an Associate Professor in the Department of Computer Science and Engineering, JIS University, Kolkata, India. He has authored/edited more than 75 books with Springer, Elsevier, Wiley and CRC Press and published more than 300 peer-reviewed research papers. Dr. Dey is Editor-in-Chief of the International Journal of Ambient Computing and Intelligence; Series Co-Editor of Springer Tracts in Nature-Inspired Computing (STNIC); and Series Co-Editor of Advances in Ubiquitous Sensing Applications for Healthcare, Elsevier.
Amit Joshi is Director of the Global Knowledge Research Foundation and the International Chair of InterYIT at the International Federation of Information Processing (IFIP, Austria). He has edited more than 40 books for Springer, ACM and other reputed publishers. He has also organized more than 50 national and international conferences and workshops in association with the ACM, Springer and IEEE in, e.g. India, Thailand and the UK.
Contributors
Hussein A. M. Al-Zubaidi, University of Babylon, Babylon, Iraq
Solomon Teferra Abate, Addis Ababa University, Addis Ababa, Ethiopia
Aysar Jameel Abdalkadhum, Al-Qasim Green University-Babylon/Iraq, Al Qasim, Iraq
Manar Abdelhamid, The British University in Egypt, Cairo, Egypt
Abdellatif E. L. Abderrahmani, Department of Computer Sciences FSDM, LISAC, University Sidi Mohammed Ben Abdellah, Atlas FEZ, Morocco
Maman Abdurohman, School of Computing, Telkom University, Bandung, Indonesia
Ali Aburas, SUNY Morrisville, Morrisville, NY, USA
Matilde Accardo, Info Solution s.p.a, Vimodrone, Italy
Nadhir Al-Ansari, Lulea University of Technology, Lulea, Sweden
Kotov Alexander, Dentons Europe LLP, Saint Petersburg, Russia
Salwan Ali Abed, University of Al-Qadisiyah, Diwaniya, Iraq
Volodymyr Alieksieiev, National Technical University ‘Kharkiv Polytechnic Institute’, Kharkiv, Ukraine
Soliman Aljarboa, Department of Management Information System, College of Business and Economics, Qassim University, Buridah, Saudi Arabia; Business School, Victoria University, Footscray, VIC, Australia
Ali Al-Maliki, Ministry of Science and Technology, Baghdad, Iraq
Roobaea Alroobaea, Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
Hadi Amirpour, Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Angelo Amodio, Planetek Italia S.R.L, Bari, Italy
K. Anantharajah, Faculty of Engineering, University of Jaffna, Kilinochchi, Sri Lanka
H. Andaluz, Universidad de Las Fuerzas Armadas ESPE, Sangolquí, Ecuador
Darko Andročec, Faculty of Organization and Informatics, University of Zagreb, Varaždin, Croatia
Afafe Annich, Department of Computer Sciences FSDM, LISAC, University Sidi Mohammed Ben Abdellah, Atlas FEZ, Morocco; Higher Institute of Information and Communication, Rabat, Morocco
Stoyan Apostolov, Faculty of Mathematics and Informatics, Sofia University, Sofia, Bulgaria
Murshedul Arifeen, Time Research & Innovation (Tri), Southampton, UK; Khilgaon, Dhaka, Bangladesh
Silvana Astudillo, Universidad de Cuenca, Cuenca, Ecuador
Paulina Ayala, Universidad Tecnica de Ambato UTA, Ambato, Ecuador
Nabiha Azizi, Electronic Document Management Laboratory (LabGED), Badji Mokhtar-Annaba University, Annaba, Algeria
Sussy Bayona-Oré, Universidad Nacional Mayor de San Marcos, Lima, Peru; Universidad Autónoma del Perú, Lima, Peru
Samir Brahim Belhaouari, ICT Division, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
Alaidine Ben Ayed, Université du Québec à Montréal, Montréal, QC, Canada
Irina Berezovskaya, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia; Emperor Alexander I St. Petersburg State Transport University, St. Petersburg, Russia
Helga Bermeo-Andrade, Universidad de Ibagué, Ibagué, Colombia
Sid A. Bessedik, Université Amar Telidji de Laghouat, Laghouat, Algeria
Sajal Bhatia, Sacred Heart University, Fairfield, Connecticut, USA
Ismaïl Biskri, Université du Québec à Trois-Rivières, Trois-Rivières, QC, Canada
Maria Blancas-Muñoz, Synthetic Perceptive Emotive Cognitive Systems (SPECS) Lab, Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
Slimane Bouazabia, University of Science and Technology Houari Boumediene Bab Ezzouar, Laghouat, Algeria
Javier Caceres, Universidad Tecnica de Ambato UTA, Ambato, Ecuador
Mingyi Cai, Carnegie Mellon University, Pittsburgh, PA, USA
Gustavo Caiza, Universidad Politecnica Salesiana, UPS, Quito, Ecuador
Alexander Toapanta Casa Department of Electricity and Electronics, Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador Gustavo Henrique Cervi Federal University of Health Sciences (UFCSPA), Porto Alegre, Brazil Ambra Cesareo LifeCharger s.r.l, Milan, Italy Ali Chabuk University of Babylon, Babylon, Iraq Jeffry Chavarría-Molina Escuela de Matemática, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica Baldreck Chipangura University of South Africa, Johannesburg, South Africa Timothy Scott C. Chu Robotics Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK; Mechanical Engineering Department, De La Salle University, Manila, Philippines Alvin Chua Mechanical Engineering Department, De La Salle University, Manila, Philippines Caslon Chua Faculty of Science, Engineering and Technology, Swinburne University of Technology, Hawthorn, VIC, Australia Diego Cordero Universidad Católica de Cuenca, Cuenca, Ecuador Jessica Nicole Dela Cruz National University, Manila, Sampaloc, Philippines Vinh Loc Cu Can Tho University, Can Tho, Vietnam Miguel Cuya Universidad Nacional Mayor de San Marcos, Lima, Peru Annarita D’Addabbo IREA-CNR, Bari, Italy Christian Daase Otto-von-Guericke University Magdeburg, Magdeburg, Germany Matthew N. Dailey Asian Institute of Technology, Pathumthani, Thailand; Information and Communication Technologies, Pathumthani, Thailand Edgardo S. Dajao Graduate School of Engineering, Pamantasan ng Lungsod ng Maynila, Manila, Philippines Michael Dammann Hamburg University of Applied Sciences, Department of Informatics, Hamburg, Germany Quang-Vinh Dang Industrial University of Ho Chi Minh city, Ho Chi Minh city, Vietnam Angelica De La Cruz National University, Manila, Philippines Edgar I. De la Cruz Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador
Ivan Dimov Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Department of Parallel Algorithms, Sofia, Bulgaria Afsaneh Doryab University of Virginia, Charlottesville, VA, USA Dyhia Doufene University of Science and Technology Houari Boumediene Bab Ezzouar, Laghouat, Algeria Anastasios Dounis Department of Biomedical Engineering, University of West Attica, Athens, Greece Maycol O. Echevarria Universidad Peruana Unión, Lima, Peru Marta Emirsajłow Department of Computer Engineering, Wrocław University of Science and Technology, Wrocław, Poland Shannen Nicole C. Esguerra National University, Manila, Sampaloc, Philippines Àdria España-Cumellas Synthetic Perceptive Emotive Cognitive Systems (SPECS) Lab, Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain Salam Ewaid Southern Technical University, Basra, Iraq Bernie S. Fabito National University, Manila, Sampaloc, Philippines Rezowan Ferdous Time Research & Innovation (Tri), Southampton, UK; Khilgaon, Dhaka, Bangladesh Cecília Dias Flores Federal University of Health Sciences (UFCSPA), Porto Alegre, Brazil Denise Foro-Szasz Carinthia University of Applied Sciences, Villach, Austria Huaguo Gan School of Physics and Materials Science, Guangzhou University, Guangzhou, China Carlos A. Garcia Universidad Técnica de Ambato,UTA, Ambato, Ecuador Marcelo V. Garcia Universidad Técnica de Ambato,UTA, Ambato, Ecuador; University of Basque Country, UPV/EHU, Bilbao, Spain Josá Manuel García Department of Information and Communication Engineering, University of Murcia, Murcia, Spain Tingjian Ge University of Massachusetts, Lowell, MA, USA Alexander Gegov School of Computing, The University of Portsmouth (UoP), Portsmouth, UK Aries M. Gelera Department of Computer Studies, Cavite State University, Rosario, Cavite, Philippines
Mohammad Ghanbari Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria; University of Essex, Colchester, UK Jayshree Ghorpade-Aher P.I.C.T, MIT World Peace University, Pune, India Minal Ghute Yeshwantrao Chavan College of Engineering, Nagpur, India Sean Gilroy BBC, Manchester, UK Nicolai Gnoss Hamburg University of Applied Sciences, Department of Informatics, Hamburg, Germany Dora González-Bañales Instituto Tecnológico de Durango/Tecnológico Nacional de México, Durango, Mexico Alice Good School of Computing, The University of Portsmouth (UoP), Portsmouth, UK Maria Renata Guarneri LifeCharger s.r.l, Milan, Italy Keisuke Hakuta Shimane University, Matsue, Shimane, Japan Cemal Hanilçi Electrical and Electronic Engineering, Bursa Technical University, Bursa, Turkey Leena Haque BBC, Manchester, UK Ralf-Christian Härting Aalen University of Applied Sciences, Business Administration, Aalen, Germany Erich Hartlieb Carinthia University of Applied Sciences, Villach, Austria Bernhard Heiden Carinthia University of Applied Sciences, Villach, Austria; University of Graz, Graz, Austria Andreas Hein Carl von Ossietzky University Oldenburg, Oldenburg, Germany Adrian Hopgood School of Computing, The University of Portsmouth (UoP), Portsmouth, UK Elias Hossain Time Research & Innovation (Tri), Southampton, UK; Khilgaon, Dhaka, Bangladesh Felix Huening University of Applied Science Aachen, Aachen, Germany Joseph Marvin Imperial National University, Manila, Philippines Ghielyssa D. Intrina National University, Manila, Sampaloc, Philippines R Raden Muhamad Irvan School of Computing, Telkom University, Bandung, Indonesia Naohiro Ishii Advanced Institute of Industrial Technology, Tokyo, Japan
Ashhadul Islam ICT Division, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar Fakir M. Amirul Islam Faculty of Health, Arts and Design, Swinburne University of Technology, Hawthorn, VIC, Australia Ngomane Issah Department of Computer Science, University of Limpopo, Menkweng, South Africa Urszula Jagodzińska-Szymańska PIT-RADWAR S.A., Warsaw, Poland Łukasz Jeleń Department of Computer Engineering, Wrocław University of Science and Technology, Wrocław, Poland Fernando Jiménez Department of Information and Communication Engineering, University of Murcia, Murcia, Spain Juan Jose Fallas-Monge Escuela de Matemática, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica Akira Kamiya Nagoya Institute of Technology, Aichi, Japan D. G. T. L. Karunathilaka Faculty of Engineering, University of Jaffna, Kilinochchi, Sri Lanka
Swetlana Kedich Emperor Alexander I St. Petersburg State Transport University, St. Petersburg, Russia Jagdish Kene Shri Ramdeobaba College of Engineering and Management, Nagpur, India Hitendra Shankarrao Khairnar Research Scholar PICT, Cummins College of Engineering, Pune, India Zakir Khan Time Research & Innovation (Tri), Southampton, UK; Khilgaon, Dhaka, Bangladesh Alexander A. Kharlamov Institute of Higher Nervous Activity and Neurophysiology, RAS, Moscow, RF, Russia; Moscow State Linguistic University, Moscow, RF, Russia; Higher School of Economics, Moscow, RF, Russia Nwe Ni Khin Yangon Technological University, Computer Engineering and Information Technology, Yangon, Republic of the Union of Myanmar Takumi Kobayashi Shimane University, Matsue, Shimane, Japan Panagiotis Kofinas Department of Biomedical Engineering, University of West Attica, Athens, Greece Boris Kontsevoi Intetics Inc., Naples, FL, USA Mridula Korde Shri Ramdeobaba College of Engineering and Management, Nagpur, India
Panagiotis Korkidis Department of Biomedical Engineering, University of West Attica, Athens, Greece Ronny Kramer Department of Computer Science, Chemnitz University of Technology, Chemnitz, Germany Alisa Kunapinun Asian Institute of Technology, Pathumthani, Thailand; Industrial Systems Engineering, Pathumthani, Thailand Pamela Kunert Hamburg University of Applied Sciences, Department of Informatics, Hamburg, Germany Oğuzhan Kurnaz Mechatronics Engineering, Bursa Technical University, Bursa, Turkey Giovanni La Via Department of Agriculture, Food and Environment (Di3A), University of Catania, Catania, Italy Imane Laghman Department of Computer Sciences FSDM, LISAC, University Sidi Mohammed Ben Abdellah, Atlas FEZ, Morocco Bee Theng Lau Faculty of Engineering, Computing and Science, Swinburne University of Technology, Kuching, Sarawak, Malaysia Jan Laue Lulea University of Technology, Lulea, Sweden Minhee Lee Samsung Electronics, Suwon, Gyeonggi, South Korea Sukmongkol Lertpiromsuk Regular MBA Program, Kasetsart Business School, Kasetsart University, Bangkok, Thailand Cuiping Li Jiangxi University of Traditional Chinese Medicine, Jiangxi, China Dokshin Lim Department of Mechanical and System Design Engineering, Hongik University, Seoul, South Korea Willone Lim Faculty of Engineering, Computing and Science, Swinburne University of Technology, Kuching, Sarawak, Malaysia Edwin Torres López Department of Electricity and Electronics, Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador Héctor López-Carral Synthetic Perceptive Emotive Cognitive Systems (SPECS) Lab, Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain Egon Lüftenegger CAMPUS 02 University of Applied Sciences, IT & Business Informatics, Graz, Austria Gaoyong Luo School of Physics and Materials Science, Guangzhou University, Guangzhou, China
Wenbin Luo School of Physics and Materials Science, Guangzhou University, Guangzhou, China Yaqing Luo Department of Mathematics, London School of Economics and Political Science, London, UK Tien Dao Luu Can Tho University, Can Tho, Vietnam David Magiera University of Applied Science Aachen, Aachen, Germany Mico C. Magtira National University, Manila, Sampaloc, Philippines Fahim Mahmud American International University-Bangladesh (AIUB), Dhaka, Bangladesh Emmanuel Malaay National University, Manila, Philippines R. Maldeni Robert Gordon University, Aberdeen, Scotland Kudzai Mapingire University of Pretoria, Pretoria, South Africa Enrique Martínez-Bueno Synthetic Perceptive Emotive Cognitive Systems (SPECS) Lab, Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain M. A. Mascrenghe Informatics Institute of Technology, Colombo, Sri Lanka Tokuro Matsuo Advanced Institute of Industrial Technology, Tokyo, Japan Jean-Guy Meunier Université du Québec à Montréal, Montréal, QC, Canada Siphe Mhlana University of South Africa, Johannesburg, South Africa Shah J. Miah Newcastle Business School, University of Newcastle, Newcastle, NSW, Australia Andrea Migliavacca LifeCharger s.r.l, Milan, Italy Mahajabin Haque Mili American International University-Bangladesh (AIUB), Dhaka, Bangladesh Neil Milliken Atos, London, UK Luis Miralles-Pechuán School of Computing, Technological University Dublin, Dublin, Ireland Khin Su Myat Moe Yangon Technological University, Computer Engineering and Information Technology, Yangon, Republic of the Union of Myanmar Sudhish Mohanan Sacred Heart University, Fairfield, Connecticut, USA Jéssica D. Mollocana Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador Cristhian Monta Universidad Tecnica de Ambato UTA, Ambato, Ecuador
Paul Moore Atos, London, UK María-Belén Mora-Arciniegas Departamento de Ciencias de La Computación y Electrónica, Universidad Técnica Particular de Loja, San Cayetano Alto y Marcelino Champagnat S/N, Loja, Ecuador Geovanny P. Moreno Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador Andrea Mory Universidad de Las Islas Baleares, Palma de Mallorca, Spain Ahmad Moussa Graduate School of Fundamental Science and Engineering, Waseda University, Tokyo, Japan Velempini Mthulisi Department of Computer Science, University of Limpopo, Menkweng, South Africa Alexander Muenzberg University of Applied Science Kaiserslautern, Zweibrücken, Germany; Carl von Ossietzky University Oldenburg, Oldenburg, Germany
Tasnim Morium Mukur United International University, Dhaka, Bangladesh Anna Mura Synthetic Perceptive Emotive Cognitive Systems (SPECS) Lab, Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain Khaled Nagaty The British University in Egypt, Cairo, Egypt R. Navarathna OCTAVE, John Keells Group Centre of Excellence for Data and Advanced Analytic, Colombo, Sri Lanka Hoang Viet Nguyen Can Tho University, Can Tho, Vietnam Achmad Nurmandi Master of Government Science, Universitas Muhammadiyah Yogyakarta, Yogyakarta, Indonesia Wilson Sánchez Ocaña Department of Electricity and Electronics, Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador Kazuya Odagiri Sugiyama Jyogakuen University, Nagoya, Japan Nita Tri Oktaviani Master of Government Science, Universitas Muhammadiyah Yogyakarta, Yogyakarta, Indonesia Pedro Omedas Synthetic Perceptive Emotive Cognitive Systems (SPECS) Lab, Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain Khaled Ouzzir University of Science and Technology Houari Boumediene Bab Ezzouar, Laghouat, Algeria Zhenyu Pan University of Massachusetts, Lowell, MA, USA
Febrian Aji Pangestu School of Computing, Telkom University, Bandung, Indonesia Guido Pasquariello IREA-CNR, Bari, Italy Biagio Pecorino Department of Agriculture, Food and Environment (Di3A), University of Catania, Catania, Italy Claudio Pighini LifeCharger s.r.l, Milan, Italy; Politecnico di Milano, Milan, Italy Maria Pilgun Institute of Linguistics, RAS, Moscow, RF, Russia; Moscow State Linguistic University, Moscow, RF, Russia; Higher School of Economics, Moscow, RF, Russia Matthias Pohl Otto-von-Guericke University Magdeburg, Magdeburg, Germany Orlando Poma Universidad Peruana Unión, Lima, Peru Nina Popova Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia Stoyan Poryazov Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Department of Information Modeling, Sofia, Bulgaria Pylyp Prystavka Department of Applied Mathematics, National Aviation University, Kyiv, Ukraine Sonia-Azucena Pupiales-Chuquin Departamento de Ciencias de La Computación y Electrónica, Universidad Técnica Particular de Loja, San Cayetano Alto y Marcelino Champagnat S/N, Loja, Ecuador Aji Gautama Putrada Advanced and Creative Networks Research Center, Telkom University, Bandung, Indonesia Haiying Qian Jiangxi University of Traditional Chinese Medicine, Nanchang, Jiangxi, China Sajidur Rahman Universität Bremen, Bremen, Germany Wahidur Rahman Time Research & Innovation (Tri), Southampton, UK; Khilgaon, Dhaka, Bangladesh S. K. Rajakaruna Faculty of Engineering, University of Jaffna, Kilinochchi, Sri Lanka Vanitha Rajamany School of Computing, UNISA, Pretoria, South Africa Eike-Christian Ramcke Hamburg University of Applied Sciences, Department of Informatics, Hamburg, Germany Josiah Eleazar T. Regencia Ateneo de Manila University, Quezon City, Philippines
Christopher Reichstein Cooperative State University BW, Heidenheim/Brenz, Germany Han Ren Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou, China Norbert Roesch University of Applied Science Kaiserslautern, Zweibrücken, Germany Joel Hugo Fernandez Rojas Universidad Peruana Unión, Lima, Peru Jonathan A. Romero Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador Rachel Edita Roxas National University, Manila, Philippines Elizabeth Alvarado Rubio Department of Electricity and Electronics, Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador Gudula Rünger Department of Computer Science, Chemnitz University of Technology, Chemnitz, Germany Salahudin Government Science, Universitas Muhammadiyah Malang, Malang City, Indonesia Edgar R. Salazar Universidad de las Fuerzas Armadas ESPE, Sangolquí, Ecuador Alexandra Salazar-Moya Universidad Técnica de Ambato,UTA, Ambato, Ecuador Luis F. Santo Universidad de Las Fuerzas Armadas ESPE, Sangolquí, Ecuador Khalid Satori Department of Computer Sciences FSDM, LISAC, University Sidi Mohammed Ben Abdellah, Atlas FEZ, Morocco Muhammad Bagus Satrio School of Computing, Telkom University, Bandung, Indonesia Janina Sauer University of Applied Science Kaiserslautern, Zweibrücken, Germany; Carl von Ossietzky University Oldenburg, Oldenburg, Germany Marthie Schoeman School of Computing, University of South Africa, Johannesburg, South Africa Tobias Schreier Hamburg University of Applied Sciences, Department of Informatics, Hamburg, Germany Tobias Schüle Aalen University of Applied Sciences, Business Administration, Aalen, Germany Alessandro Scuderi Department of Agriculture, Food and Environment (Di3A), University of Catania, Catania, Italy
Emanuele Lindo Secco Robotics Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK Edward Sędek PIT-RADWAR S.A., Warsaw, Poland; University of Science and Technology (UTP), Bydgoszcz, Poland M. Selvarathnam Faculty of Engineering, University of Jaffna, Kilinochchi, Sri Lanka Mapunya Sekgoari Semaka Department of Computer Science, University of Limpopo, Menkweng, South Africa Stephen Kwame Senaya School of Computing, University of South Africa, Johannesburg, South Africa Ana Lucia Serrano Universidad de Cuenca, Cuenca, Ecuador Nakisa Shams Department of Electrical Engineering, École de technologie supérieure, Montreal, QC, Canada Tamanna Shaown Brac University, Dhaka, Bangladesh Olga Shipunova Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia Shun Shiramatsu Nagoya Institute of Technology, Aichi, Japan M. Shohrab Hossain Bangladesh University of Engineering and Technology, Dhaka, Bangladesh Shah Siddiqui School of Computing, The University of Portsmouth (UoP), Portsmouth, UK Sourav Sinha Technische Universität München, Munich, Germany Hanlie Smuts University of Pretoria, Pretoria, South Africa Selver Softic CAMPUS 02 University of Applied Sciences, IT & Business Informatics, Graz, Austria Balwant Sonkamble Pune Institute of Computer Technology, Pune, India Juan J. Soria Universidad Peruana Unión, Lima, Peru Pablo Soto-Quiros Escuela de Matemática, Instituto Tecnológico de Costa Rica, Cartago, Costa Rica Nandagopal Sridhar Sacred Heart University, Fairfield, Connecticut, USA Daniel Staegemann Otto-von-Guericke University Magdeburg, Magdeburg, Germany
Ulrike Steffens Hamburg University of Applied Sciences, Department of Informatics, Hamburg, Germany
Luisa Sturiale Department of Civil Engineering and Architecture (DICAR), University of Catania, Catania, Italy Yuraporn Sudharatna Department of Management, Kasetsart Business School, Kasetsart University, Bangkok, Thailand David A. Sumire Universidad Peruana Unión, Lima, Peru Przemysław Świercz Faculty of Electronics, Department of Computer Engineering, Wrocław University of Science and Technology, Wrocław, Poland Denis Syraeshko Intetics Bel Ltd., Minsk, Belarus Rahel Mekonen Tamiru Bahir Dar University, Bahir Dar, Ethiopia Richard M. Tandalla Universidad de Las Fuerzas Armadas ESPE, Sangolquí, Ecuador Gladys-Alicia Tenesaca-Luna Departamento de Ciencias de La Computación y Electrónica, Universidad Técnica Particular de Loja, San Cayetano Alto y Marcelino Champagnat S/N, Loja, Ecuador Sergei Terekhov Intetics Bel Ltd., Minsk, Belarus Claudia Elizabeth Thompson Federal University of Health Sciences (UFCSPA), Porto Alegre, Brazil Christian Timmerer Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria; Bitmovin, Klagenfurt, Austria
Giuseppe Timpanaro Department of Agriculture, Food and Environment (Di3A), University of Catania, Catania, Italy Venelin Todorov Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Department of Information Modeling, Sofia, Bulgaria; Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Department of Parallel Algorithms, Sofia, Bulgaria Bianca Tonino-Heiden University of Graz, Graz, Austria Xuan Viet Truong Can Tho University, Can Tho, Vietnam Klaus Turowski Otto-von-Guericke University Magdeburg, Magdeburg, Germany Hossana Twinomurinzi University of Johannesburg, Johannesburg, South Africa Oksana Tyvodar Department of Applied Mathematics, National Aviation University, Kyiv, Ukraine Pittawat Ueasangkomsate Department of Management, Kasetsart Business School, Kasetsart University, Bangkok, Thailand
Atiq Ur Rehman ICT Division, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar J. A. van Biljon School of Computing, UNISA, Pretoria, South Africa Alta Van der Merwe University of Pretoria, Pretoria, South Africa John Andrew van der Poll Graduate School of Business Leadership (SBL), University of South Africa, Midrand, South Africa C. J. van Staden School of Computing, UNISA, Pretoria, South Africa Paul F. M. J. Verschure Synthetic Perceptive Emotive Cognitive Systems (SPECS) Lab, Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain Naumov Victor Institute of State and Law, Russian Academy of Sciences, Saint Petersburg, Russia Carlota Villasante Marcos Ericsson España SA, Madrid, Spain Thanh-Hai Vo Industrial University of Ho Chi Minh city, Ho Chi Minh city, Vietnam Matthias Volk Otto-von-Guericke University Magdeburg, Magdeburg, Germany Dražen Vuk Mobilisis d.o.o, Varaždin, Jalkovec, Croatia Franz-Josef Wache University of Applied Science Aachen, Aachen, Germany Jing Wan Guangdong University of Foreign Studies, Guangzhou, China Hiroshi Watanabe Graduate School of Fundamental Science and Engineering, Waseda University, Tokyo, Japan Hanbin Wu Jiangxi University of Traditional Chinese Medicine, Nanchang, Jiangxi, China Jie Xue University of California, Santa Barbara, CA, USA Runze Yan University of Virginia, Charlottesville, VA, USA William Emmanuel S. Yu Ateneo de Manila University, Quezon City, Philippines Olaf Zukunft Hamburg University of Applied Sciences, Department of Informatics, Hamburg, Germany
Highly Efficient Stochastic Approaches for Computation of Multiple Integrals for European Options Venelin Todorov, Ivan Dimov, Stoyan Apostolov, and Stoyan Poryazov
Abstract In this work we investigate advanced stochastic methods for solving specific multidimensional problems related to the computation of European-style options in computational finance. Recently, stochastic methods have become a very important tool for high performance computing of very high dimensional problems in computational finance. The advantages and disadvantages of several highly efficient stochastic methods connected to the evaluation of European options are analyzed. For the first time, multidimensional integrals of up to 100 dimensions related to European options are computed with highly efficient lattice rules.
Keywords Monte Carlo and quasi-Monte Carlo methods · Multidimensional integrals · Option pricing · High performance computing
V. Todorov · S. Poryazov: Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Department of Information Modeling, Acad. Georgi Bonchev Str., Block 8, 1113 Sofia, Bulgaria
V. Todorov (B) · I. Dimov: Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Department of Parallel Algorithms, Acad. G. Bonchev Str., Block 25 A, 1113 Sofia, Bulgaria
S. Apostolov: Faculty of Mathematics and Informatics, Sofia University, 5 James Bourchier Blvd., 1164 Sofia, Bulgaria
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_1

1 Introduction
Monte Carlo (MC) and quasi-Monte Carlo (QMC) approaches have recently become established as very attractive and necessary computational tools in finance [11]. The field of computational finance is becoming more complicated as the number of applications increases [2]. Option pricing is a key problem in financial markets [5, 6, 12], and it is especially difficult when the dimension of the problem grows [1]. MC and QMC methods are appropriate for solving multidimensional problems [7] and are used not only for option pricing [9], but also for other problems in computational finance [4, 13]. The basic definitions that we use are taken from [11].
2 Problem Settings and Motivation
We consider a European call option [11] whose payoff depends on $k > 1$ assets with prices $S_i$, $i = 1, \ldots, k$. Following [11], we assume that at expiry time $T$, with risk-free interest rate $r$, the payoff is given by $h(S'_1, \ldots, S'_k)$, where $S'_i$ is the value at expiry of the $i$-th asset. Then the option value $V$ is

$$V = e^{-r(T-t)} (2\pi(T-t))^{-k/2} (\det \Sigma)^{-1/2} (\sigma_1 \cdots \sigma_k)^{-1} \int_0^\infty \cdots \int_0^\infty \frac{h(S'_1, \ldots, S'_k)}{S'_1 \cdots S'_k} \exp\left(-0.5\, \alpha^\top \Sigma^{-1} \alpha\right) dS'_1 \cdots dS'_k,$$

$$\alpha_i = \left(\sigma_i (T-t)^{1/2}\right)^{-1} \left( \ln(S'_i / S_i) - (r - \sigma_i^2/2)(T-t) \right).$$

Here $\sigma_i$ denotes the volatility of the $i$-th asset and $\Sigma$ the correlation matrix of the assets. According to [11], the most important case in recent models is when the payoff function is the exponential function. We now give a brief explanation that demonstrates the strength of the MC and QMC approach [7]. According to [7], a time of order $10^{93}$ s is necessary to compute the integral with the deterministic approach, while one year has only $31536 \times 10^{3}$ s. In contrast, according to [7], the MC approach needs a time of $10 \times 10^{7} \times 2 \times 10^{-7} \approx 20$ s to evaluate the multidimensional integral with the same accuracy. In summary, for the 100-dimensional integral the MC approach is $5 \times 10^{91}$ times faster than the deterministic one. This motivates our study of new highly efficient stochastic approaches for the problem under consideration.
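The timing comparison in this section is plain arithmetic and can be checked in a few lines. The sketch below only reuses the figures quoted from [7]; the variable names are illustrative and do not come from the source.

```python
from fractions import Fraction

# Figures quoted in the text from [7]; the names are illustrative only.
deterministic_seconds = 10**93                      # deterministic approach
seconds_per_year = 31536 * 10**3                    # one year in seconds
mc_seconds = Fraction(10 * 10**7 * 2, 10**7)        # 10 x 10^7 x 2 x 10^-7 s

# Exact rational arithmetic avoids any floating-point rounding here.
speedup = deterministic_seconds / mc_seconds
deterministic_years = deterministic_seconds / seconds_per_year

print(f"MC time: {float(mc_seconds):.0f} s")
print(f"speed-up of MC over the deterministic approach: {float(speedup):.1e}")
print(f"deterministic time in years: {deterministic_years:.2e}")
```

Exact fractions are used so that the ratio can be compared with the quoted factor without rounding concerns.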
3 Highly Efficient Stochastic Approaches Based on Lattice Rules
We use the following rank-1 lattice sequence [14]:

$$x_k = \left\{ \frac{k}{N} z \right\}, \quad k = 1, \ldots, N, \qquad (1)$$

where $N \ge 2$ is an integer, $z = (z_1, z_2, \ldots, z_s)$ is the generating vector, and $\{z\}$ denotes the fractional part of $z$. For the definitions of $E_s^{\alpha}(c)$ and $P_{\alpha}(z, N)$ see [14]; for more details, see also [1]. In 1959 Bahvalov showed [1] that there exists an optimal choice of the generating vector $z$ such that

$$\left| \frac{1}{N} \sum_{k=0}^{N-1} f\left( \left\{ \frac{k}{N} z \right\} \right) - \int_{[0,1)^s} f(u)\, du \right| \le c\, d(s, \alpha) \frac{(\log N)^{\beta(s,\alpha)}}{N^{\alpha}} \qquad (2)$$

for every function $f \in E_s^{\alpha}(c)$ with $\alpha > 1$. A generating vector $z$ that satisfies (2) is called an optimal generating vector [14]. The main bottleneck lies in the construction of the optimal vectors, especially for very high dimensions [11]. The first generating vector in our study is based on the generalized Fibonacci numbers (for more details see [14]):

$$z = \left(1, F_n^{(s)}(2), \ldots, F_n^{(s)}(s)\right). \qquad (3)$$

If we change the generating vector to be optimal in the way described in [10], we obtain an improved lattice sequence. This is a 200-dimensional base-2 generating vector of prime numbers for up to $2^{20} = 1048576$ points, constructed recently by Dirk Nuyens [10]. This special choice of optimal generating vector is definitely more efficient than the Fibonacci generating vector, which is optimal only in the two-dimensional case [14]. For this improved lattice rule the following estimate holds [10]:

$$D_N^{*} = O\left( \frac{\log^{s} N}{N} \right).$$
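To make the rank-1 lattice rule (1) concrete, the following minimal sketch evaluates it for a given generating vector. It is illustrated only with the classical two-dimensional Fibonacci lattice, $N = F_n$ and $z = (1, F_{n-1})$, because the generalized Fibonacci vector (3) for $s > 2$ and the optimized vector of [10] are not specified in this section. The function names and the two-dimensional test integrand are assumptions made for illustration, not the authors' implementation.

```python
import math

def fibonacci_pair(n):
    """Return (F_{n-1}, F_n) with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 2):
        a, b = b, a + b
    return a, b

def rank1_lattice_estimate(f, n_points, z):
    """Average f over the points x_k = frac(k * z / N), k = 0, ..., N - 1.

    Indexing k = 0, ..., N - 1 gives the same point set as k = 1, ..., N,
    since k = N maps to the origin.
    """
    total = 0.0
    for k in range(n_points):
        # (k * z_j mod N) / N is the fractional part of k * z_j / N.
        x = [((k * zj) % n_points) / n_points for zj in z]
        total += f(x)
    return total / n_points

def exp_product(x):
    """Integrand exp(x_1 * ... * x_s), the same kind as in the experiments."""
    return math.exp(math.prod(x))

# Two-dimensional Fibonacci lattice (illustrative choice): N = F_21, z = (1, F_20).
z2, n_points = fibonacci_pair(21)
estimate = rank1_lattice_estimate(exp_product, n_points, (1, z2))

# Reference value from the series sum_{m >= 0} 1 / ((m + 1)^2 m!).
reference = sum(1.0 / ((m + 1) ** 2 * math.factorial(m)) for m in range(30))
relative_error = abs(estimate - reference) / reference
print(f"N = {n_points}, estimate = {estimate:.8f}, RE = {relative_error:.2e}")
```

The modular reduction is done in integer arithmetic before dividing by $N$, which keeps the fractional parts exact up to one floating-point division.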
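4 Numerical Examples and Results
The numerical study includes high performance computing of the following multidimensional integrals:

$$\int_{[0,1]^3} \exp(x_1 x_2 x_3) \approx 1.14649907, \qquad (4)$$

$$\int_{[0,1]^5} \exp\left( \sum_{i=1}^{5} 0.5\, a_i x_i^2 \left( 2 + \sin \sum_{j=1,\, j \ne i}^{5} x_j \right) \right) \approx 2.923651, \qquad (5)$$

where $a_i = (1, 0.5, 0.2, 0.2, 0.2)$,

$$\int_{[0,1]^8} \exp\left( \sum_{i=1}^{8} 0.1\, x_i \right) \approx 1.496805, \qquad (6)$$

$$\int_{[0,1]^{20}} \exp\left( \prod_{i=1}^{20} x_i \right) \approx 1.00000949634. \qquad (7)$$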
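The integrand in (6) factors into one-dimensional exponentials, so its reference value has a closed form, $\left((e^{c} - 1)/c\right)^{d}$ with $c = 0.1$ and $d = 8$. The short check below is a sketch under that observation; the function name is hypothetical.

```python
import math

def separable_exp_integral(c, dim):
    """Integral over [0,1]^dim of exp(c * (x_1 + ... + x_dim)).

    The integrand is a product of one-dimensional factors exp(c * x_i),
    and each factor integrates to (e^c - 1) / c.
    """
    return (math.expm1(c) / c) ** dim

value_6 = separable_exp_integral(0.1, 8)
print(f"closed-form value of (6): {value_6:.6f}")
```

The result can be compared directly with the reference value quoted in (6).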
We have also carried out, for the first time, high performance computations with our methods for a 100-dimensional integral:

$$I_{100} = \int_{[0,1]^{100}} \exp\left( \prod_{i=1}^{100} x_i \right). \qquad (8)$$

Expanding the exponential function in a Taylor series and integrating $(x_1 \cdots x_{100})^n$ term by term, we obtain

$$\int_{[0,1]^{100}} \exp\left( \prod_{i=1}^{100} x_i \right) = \sum_{n=0}^{\infty} \frac{1}{(n+1)^{100}\, n!} = {}_{100}F_{100}(1, \ldots, 1; 2, \ldots, 2; 1),$$

where ${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; x)$ is the generalized hypergeometric function

$${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; x) = \sum_{n=0}^{\infty} \frac{(a_1)_n \cdots (a_p)_n}{(b_1)_n \cdots (b_q)_n} \frac{x^n}{n!},$$

and $(c)_n = c(c+1) \cdots (c+n-1)$ is the Pochhammer symbol. We also include in the experiments the 50-dimensional integral of the same kind:

$$I_{50} = \int_{[0,1]^{50}} \exp\left( \prod_{i=1}^{50} x_i \right). \qquad (9)$$
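The series representation above can be checked numerically. The sketch below evaluates partial sums of $\sum_n 1/((n+1)^d n!)$ and of the hypergeometric series with all $a_i = 1$, $b_i = 2$, $x = 1$, using exact fractions, and compares them for $d = 3$, where the reference value is the one in (4). The function names and the truncation at 30 terms are illustrative choices, not part of the source.

```python
from fractions import Fraction
from math import factorial

def pochhammer(c, n):
    """Pochhammer symbol (c)_n = c (c + 1) ... (c + n - 1), with (c)_0 = 1."""
    result = Fraction(1)
    for j in range(n):
        result *= c + j
    return result

def hypergeometric_partial_sum(a, b, x, terms):
    """Partial sum of pFq(a; b; x) over n = 0, ..., terms - 1."""
    total = Fraction(0)
    for n in range(terms):
        numerator = Fraction(1)
        for ai in a:
            numerator *= pochhammer(ai, n)
        denominator = Fraction(1)
        for bi in b:
            denominator *= pochhammer(bi, n)
        total += numerator / denominator * Fraction(x) ** n / factorial(n)
    return total

def exp_product_integral(dim, terms=30):
    """Partial sum of sum_n 1 / ((n + 1)^dim n!), the integral of exp(x_1 ... x_dim)."""
    return sum(Fraction(1, (n + 1) ** dim * factorial(n)) for n in range(terms))

d = 3
series = exp_product_integral(d)
hyper = hypergeometric_partial_sum([1] * d, [2] * d, 1, terms=30)
print(f"series value for d = {d}: {float(series):.8f}")
print(f"partial sums agree exactly: {series == hyper}")
```

The same functions accept `dim=50` or `dim=100`; in those cases the sum is dominated by its first term, and the remaining terms are far below double precision.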
The results are given in the tables, which report the relative error (RE) of the MC or QMC method used, the CPU time (T) in seconds, and the number of samples (#). The high performance computations include the optimized lattice rule (OP), the Fibonacci-based lattice rule (FI), the Adaptive approach (AD) [8], and the Sobol quasi-random sequence (SO) [3].

For the 3-dimensional integral, when the numbers of samples are generalized Fibonacci numbers of the corresponding dimensionality, the best relative error is produced by the optimized lattice algorithm OP (see Table 1), but for a preliminarily given time in seconds OP and the Fibonacci lattice rule FI give results of the same order (see Table 2). For the 5-dimensional integral the best approach is again OP: for N = 400096 it gives a relative error of 8.16e-7 (see Table 3), while for 20 s FI again gives results of the same order as OP (see Table 4). For the 8-dimensional integral the Adaptive approach, the Sobol QMC algorithm, and the Fibonacci approach produce relative errors of the same order (see Table 5), but for a preliminarily given time in seconds the Fibonacci approach is better than both the Sobol QMC and the Adaptive approach (see Table 6). For the 20-dimensional integral the Sobol QMC approach is better than both the Fibonacci and the Adaptive approach (see Table 7), and the Adaptive approach requires a very large amount of time, nearly one hour for N = 524288 samples, because of the division into subareas in the algorithm. That is why we omit this algorithm for the 50- and 100-dimensional integrals. For 20 s the best result for the 20-dimensional integral is again produced by the optimized lattice rule: 1.23e-8 in Table 8. For the 50-dimensional integral the Fibonacci approach is worse than the Sobol approach by at least one order (see Table 9), but for a preliminarily given time in seconds the Sobol QMC and the Fibonacci approach give relative errors of the same order (see Table 10).

It is worth mentioning that the Sobol approach requires more time because of the generation of the sequence, while the Fibonacci lattice rule and the optimized approach are faster and more computationally efficient. For the 100-dimensional integral the best result is produced by the optimized lattice approach: it gives 4.78e-6 for $N = 2^{20}$ samples (see Table 11), and for 100 s it produces a relative error of 8.16e-7, which is very high accuracy, 3 to 4 orders of magnitude better than the other stochastic approaches (see Table 12). We thus demonstrate the advantages of the new lattice method and its capability to achieve very high accuracy in less than a minute on a laptop with a quad-core CPU.
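The quantities reported in the tables (RE, T, and #) can be measured as in the following sketch. It uses plain pseudo-random Monte Carlo on integral (6) purely as a baseline; plain MC is not one of the four methods (OP, FI, AD, SO) compared in the paper. The relative error is taken in the usual sense, |estimate − exact| / |exact|, which is assumed here since the text does not spell it out, and the seed and sample count are arbitrary.

```python
import math
import random
import time

def integrand_6(x):
    """Integrand of (6): exp(0.1 * (x_1 + ... + x_8))."""
    return math.exp(0.1 * sum(x))

def plain_monte_carlo(f, dim, n_samples, rng):
    """Average f over n_samples uniform pseudo-random points in [0,1]^dim."""
    total = 0.0
    for _ in range(n_samples):
        total += f([rng.random() for _ in range(dim)])
    return total / n_samples

# Closed-form reference value of (6), since the integrand is separable.
exact = (math.expm1(0.1) / 0.1) ** 8

rng = random.Random(2021)      # fixed seed, chosen arbitrarily for repeatability
n_samples = 20000              # the "#" column
start = time.perf_counter()
estimate = plain_monte_carlo(integrand_6, 8, n_samples, rng)
elapsed = time.perf_counter() - start          # the "T" column, in seconds
relative_error = abs(estimate - exact) / abs(exact)   # the "RE" column

print(f"# = {n_samples}, RE = {relative_error:.2e}, T = {elapsed:.2f} s")
```

Timings obtained this way depend on the machine and the interpreter, so they are not comparable with the CPU times in the tables.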
V. Todorov et al.
Table 1 Algorithmic comparison of RE for (4)
#        OP        T (s)   AD        T (s)   FI        T (s)   SO        T (s)
19513    1.93e-5   0.01    3.21e-4   2.21    4.69e-4   0.02    4.98e-5   0.56
35890    3.18e-6   0.04    6.55e-5   6.41    5.46e-6   0.06    1.56e-5   1.45
66012    2.65e-6   0.07    5.12e-5   9.86    5.34e-6   0.11    8.11e-6   2.31
121415   9.16e-7   0.12    5.11e-5   15.4    5.34e-6   0.12    3.08e-6   3.80
223317   8.01e-7   0.20    9.34e-5   24.2    1.73e-6   0.22    2.05e-6   6.13

Table 2 Algorithmic comparison of RE for (4)
T (s)   OP        AD        FI        SO
0.1     9.16e-7   8.67e-4   1.32e-6   3.21e-4
1       6.37e-7   2.96e-5   3.22e-7   8.21e-5
2       4.22e-7   5.45e-4   2.06e-7   2.96e-5
5       1.84e-7   1.14e-4   1.47e-7   5.00e-6
10      6.09e-8   6.56e-5   3.89e-7   2.71e-6
20      1.57e-8   2.04e-5   1.53e-8   1.65e-6

Table 3 Algorithmic comparison of RE for (5)
#        OP        T (s)   AD        T (s)   FI        T (s)   SO        T (s)
13624    6.72e-5   0.02    1.89e-3   2.33    9.59e-4   0.03    1.76e-4   0.56
52656    1.53e-5   0.06    2.31e-3   6.18    6.96e-4   0.06    5.05e-5   1.45
103519   8.48e-6   0.09    2.01e-3   9.94    8.72e-5   0.13    2.70e-5   2.52
203513   6.25e-6   0.15    3.42e-4   16.2    8.04e-5   0.25    7.57e-6   6.07
400096   8.16e-7   0.40    9.12e-4   45.6    7.26e-5   0.50    2.52e-6   10.63

Table 4 Algorithmic comparison of RE for (5)
T (s)   OP        AD        FI        SO
0.1     3.07e-6   1.34e-2   7.26e-5   8.22e-4
1       1.32e-6   2.44e-3   2.28e-5   2.91e-4
5       1.13e-6   4.93e-4   5.94e-6   1.71e-5
10      5.47e-7   1.88e-3   3.85e-7   1.79e-5
20      3.52e-7   2.71e-4   7.49e-7   4.71e-6
Highly Efficient Stochastic Approaches for Computation …

Table 5 Algorithmic comparison of RE for (6)
#        OP        T (s)   AD        T (s)   FI        T (s)   SO        T (s)
16128    1.79e-6   0.04    1.10e-5   12.6    8.08e-4   0.03    8.87e-5   0.13
32192    1.56e-6   0.05    3.32e-5   33.3    1.03e-4   0.07    5.42e-5   0.58
64256    8.01e-7   0.08    4.65e-5   54.2    5.03e-5   0.11    2.34e-5   2.49
128257   6.22e-7   0.13    8.25e-6   88.3    8.13e-6   0.14    4.45e-6   6.36
510994   3.21e-7   0.34    7.07e-6   233.6   5.95e-6   0.57    3.32e-6   19.45
Table 6 Algorithmic comparison of RE for (6)
T (s)   OP        AD        FI        SO
1       2.18e-7   6.34e-4   5.34e-6   2.02e-5
2       1.32e-7   1.58e-4   2.57e-6   2.73e-5
5       9.03e-8   1.44e-4   1.52e-7   8.88e-6
10      5.00e-8   6.61e-5   3.45e-6   5.23e-6
20      2.55e-8   2.77e-5   1.82e-7   2.11e-6

Table 7 Algorithmic comparison of RE for (7)
#        OP        T (s)   AD        T (s)   FI        T (s)   SO        T (s)
2048     2.84e-6   0.02    1.14e-2   8.6     8.22e-5   0.03    8.44e-4   0.13
16384    1.04e-6   0.12    4.96e-4   60.3    3.12e-5   0.13    6.82e-5   1.68
65536    9.21e-7   0.91    9.75e-4   474.2   1.36e-5   1.17    8.34e-6   8.69
131072   6.15e-7   2.13    1.25e-5   888.3   8.85e-6   2.34    3.77e-6   14.36
524288   5.33e-8   8.13    1.96e-6   2356    2.15e-6   8.34    1.91e-7   57

Table 8 Algorithmic comparison of RE for (7)
T (s)   OP        AD         FI        SO
1       9.14e-7   1.58e-3    1.48e-5   3.25e-5
2       1.08e-7   1.028e-3   9.17e-6   3.97e-5
5       5.87e-8   8.58e-4    5.19e-6   1.45e-5
10      3.56e-8   4.31e-4    1.73e-6   2.71e-6
20      1.23e-8   1.27e-4    1.38e-7   1.76e-6

Table 9 Algorithmic comparison of RE for (9)
#      OP        T (s)   FI        T (s)   SO        T (s)
2^10   7.88e-6   0.05    6.23e-4   0.08    8.88e-5   3.5
2^12   1.88e-6   0.17    1.55e-4   0.35    5.21e-5   16
2^16   8.44e-8   2.14    9.72e-5   5.21    9.11e-4   73
2^20   4.28e-8   17.65   6.08e-5   32.76   4.88e-6   276
Table 10 Algorithmic comparison of RE for (9)
T (s)   OP        FI        SO
1       9.14e-7   1.48e-4   1.58e-3
2       7.51e-7   9.17e-5   1.028e-3
10      9.34e-8   8.73e-5   3.01e-4
100     1.34e-9   1.03e-5   5.23e-5

Table 11 Algorithmic comparison of RE for (8)
#      OP        T (s)   FI        T (s)   SO        T (s)
2^10   6.83e-3   0.05    4.13e-1   0.06    6.31e-2   18
2^12   3.77e-4   0.17    1.15e-1   0.18    1.23e-2   34
2^16   3.36e-5   9.1     6.12e-2   9.2     2.31e-3   170
2^20   4.78e-6   57.6    3.18e-2   58.7    2.34e-4   861

Table 12 Algorithmic comparison of RE for the 100-dimensional integral (8)
T (s)   OP        FI        SO
1       2.67e-3   7.18e-2   9.31e-2
2       1.89e-4   6.02e-2   8.66e-2
10      3.22e-5   4.12e-2   6.94e-2
100     8.16e-7   1.13e-2   3.88e-3
5 Conclusion
A comprehensive experimental study of the optimized lattice rule, Fibonacci lattice sets, the Sobol sequence, and the Adaptive approach has been carried out for the first time on test functions related to option pricing. The optimized lattice rule described here is not only one of the best available algorithms for high-dimensional integrals but also one of the few feasible methods because, as shown in this work, the deterministic algorithms need a huge amount of time to evaluate the multidimensional integral. The numerical tests show that the improved lattice rule is efficient for multidimensional integration, especially for computing integrals of very high dimension, up to 100. The novelty is that the newly proposed optimized method gives very high accuracy in less than a minute on a laptop, even for the 100-dimensional integral. This is important because it may be crucial for a more reliable interpretation of the results for European-style options, which is foundational in computational finance.
Acknowledgements
Venelin Todorov is supported by the Bulgarian National Science Fund under Project DN 12/5-2017 “Efficient Stochastic Methods and Algorithms for Large-Scale Problems” and by the National Scientific Program “Information and Communication Technologies for a Single Digital Market in Science, Education and Security (ICT in SES)”, contract No DO1-205/23.11.2018, financed by the Ministry of Education and Science in Bulgaria. Stoyan Apostolov is supported by the Bulgarian National Science Fund under Young Scientists Project KP-06-M32/2 - 17.12.2019 “Advanced Stochastic and Deterministic Approaches for Large-Scale Problems of Computational Mathematics”.
References
1. Bakhvalov N (2015) On the approximate calculation of multiple integrals. J Complex 31(4):502–516
2. Boyle PP, Lai Y, Tan K (2001) Using lattice rules to value low-dimensional derivative contracts
3. Bratley P, Fox B (1988) Algorithm 659: implementing Sobol’s quasirandom sequence generator. ACM Trans Math Softw 14(1):88–100
4. Broadie M, Glasserman P (1997) Pricing American-style securities using simulation. J Econ Dyn Control 21(8–9):1323–1352
5. Centeno V, Georgiev IR, Mihova V, Pavlov V (2019) Price forecasting and risk portfolio optimization. In: AIP conference proceedings, vol 2164, no 1, p 060006
6. Chance DM, Brook R (2009) An introduction to derivatives and risk management, 8th edn. South-Western College Pub
7. Dimov I (2008) Monte Carlo methods for applied scientists. World Scientific, London, Singapore, 291 p
8. Dimov I, Georgieva R (2010) Monte Carlo algorithms for evaluating Sobol’ sensitivity indices. Math Comput Simul 81(3):506–514
9. Duffie D (2001) Dynamic asset pricing theory, 3rd edn. Princeton University Press
10. Kuo FY, Nuyens D (2016) Application of quasi-Monte Carlo methods to elliptic PDEs with random diffusion coefficients – a survey of analysis and implementation. Found Comput Math 16(6):1631–1696
11. Lai Y, Spanier J (1998) Applications of Monte Carlo/Quasi-Monte Carlo methods in finance: option pricing. In: Proceedings of the Claremont Graduate University conference
12. Raeva E, Georgiev I (2018) Fourier approximation for modeling limit of insurance liability. In: AIP conference proceedings, vol 2025, no 1, p 030006
13. Tan KS, Boyle PP (2000) Applications of randomized low discrepancy sequences to the valuation of complex securities. J Econ Dyn Control 24:1747–1782
14. Wang Y, Hickernell FJ (2000) An historical overview of lattice point sets. In: Monte Carlo and Quasi-Monte Carlo methods. Proceedings of a conference held at Hong Kong Baptist University, China
Spectrum Sensing Data Falsification Attack Reputation and Q-Out-of-M Rule Security Scheme
Velempini Mthulisi, Ngomane Issah, and Mapunya Sekgoari Semaka
Abstract
Cognitive Radio Networks equipped with dynamic spectrum access are envisioned to address spectrum scarcity by allowing secondary users (SUs) to utilise vacant spectrum bands opportunistically. The SUs utilise cooperative spectrum sensing (CSS) to make accurate spectrum access decisions and to avoid interference. Unfortunately, malicious users can cooperate with SUs and share false observations, leading to inaccurate spectrum access decisions. This Spectrum Sensing Data Falsification (SSDF) attack is caused by malicious users. In this study, we investigated the SSDF attack and a dynamic defence mechanism, called the reputation and q-out-of-m rule scheme, designed to address the effects of the SSDF attack. The scheme was implemented in cognitive radio ad hoc networks. The fusion node was not considered. The scheme was evaluated in MATLAB simulations, with the success, missed detection, and false alarm probabilities as evaluation metrics.
Keywords
Cognitive radio ad hoc networks · Cooperative spectrum sensing · Fusion centre · Primary users · Secondary users · Reputation-based system · Q-out-of-m rule · SSDF attack
V. Mthulisi (B) · N. Issah · M. S. Semaka
Department of Computer Science, University of Limpopo, Menkweng, South Africa
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_2

1 Introduction
The ever-increasing number of wireless devices leads to the overcrowding of unlicensed spectrum [1–3] and the underutilization of licensed spectrum. Cognitive Radio Networks (CRN) address spectrum scarcity by enabling Secondary Users (SUs) to access the vacant licensed spectrum opportunistically [4–10]. This is achieved through Cooperative Spectrum Sensing (CSS), where SUs collaborate in sensing [11–14]. Unfortunately, malicious SUs share false spectrum observations [3, 15, 16], leading to incorrect spectrum access decisions. This may cause interference to Primary Users (PUs) or denial of service (DoS) to SUs [17]. The attack is known as the Spectrum Sensing Data Falsification (SSDF) or Byzantine attack [18, 19]. In this paper, an investigation of the SSDF attack in Cognitive Radio Ad Hoc Networks (CRAHN) was conducted. Further, we proposed a scheme which integrates the reputation and q-out-of-m rule schemes [20]. The reputation-based system evaluates each SU’s past reports to determine its trustworthiness, while the q-out-of-m rule randomly polls 60% of the m nodes and bases the final decision on q of their sensing reports [21, 22]. MATLAB was used to implement and simulate the proposed scheme. The scheme can detect and isolate malicious nodes.
2 Related Work
The SSDF attack can cause DoS to SUs or interference to PUs. The authors in [23] proposed a scheme to counter the SSDF attack in CRAHN. The scheme implements a q-out-of-m rule with a modified z-test. Chen et al. [24] implemented a scheme, called the density-based scheme (DBS), that mitigates the SSDF attack in a distributed CRN environment. This scheme incorporated CSS, where SUs share their sensing reports. However, the hit and run attack was not considered. Pongaliur et al. [25] proposed a distributed scheme to counter the SSDF attack, known as multi-fusion-based distributed spectrum sensing (MFDSS). The scheme implemented the modified z-test to combat extreme outliers and the reputation-based system for the final decision-making. The authors in [26] proposed a reputation-based system that clustered the SUs based on their history and initial reputation. However, the study did not consider the unintentionally misbehaving SUs and the hit and run attacks. The work in [27] used a suspicion level to address the SSDF attack. SUs that were deemed suspicious were isolated. The reports from SUs with a trustworthy history were included in decision-making. The advantage of this study was that it restored the reports of unintentionally misbehaving SUs. Yu et al. [28] investigated and isolated the SSDF attack by implementing a statistical consensus-based algorithm in CRAHN. The scheme also isolated reports from unintentionally misbehaving SUs.
3 Network Model
We considered two types of users, SUs and PUs. The SUs cooperate in both sensing and sharing sensing data. However, cooperative sensing can be compromised by malicious nodes, which report incorrect spectrum observations during CSS. We study three kinds of malicious nodes: the always yes, the always no, and the hit and run. The always yes node reports that a vacant spectrum is occupied. The always no node reports that an occupied spectrum is vacant. The hit and run malicious node alters its reports to avoid detection [29, 30]. We also studied two types of SUs: the legitimate SUs and the unintentionally misbehaving SUs. The legitimate SUs make use of the cognitive capabilities, whereas the unintentionally misbehaving SUs report incorrect sensing data due to the hidden terminal problem, signal fading, and multipath fading [31]. The SUs use energy detection to sense the vacant channels. They also determine the off periods of the spectrum by detecting the signal strength of the PUs, as shown in Eqs. (1)–(8).
\[ H_0 : x_i(t) = a_i(t), \quad i = 1, 2, \ldots, N, \tag{1} \]
\[ H_1 : x_i(t) = c_i(t)\, s_i(t) + a_i(t), \quad i = 1, 2, \ldots, N, \tag{2} \]
where H_0 denotes that the PU signal is absent and H_1 denotes that the PU signal is present. N is the number of SUs, x_i(t) is the ith sample of the received signal, s_i(t) is the PU signal, c_i(t) is the channel gain, and a_i(t) denotes the additive white Gaussian noise (AWGN). The energy E of a given signal x_i(t) is:
\[ E = \int_{-\infty}^{\infty} |x_i(t)|^2 \, dt \tag{3} \]
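This can be expressed in the frequency domain using Parseval’s theorem as
\[ \int_{-\infty}^{\infty} |x_i(t)|^2 \, dt = \int_{-\infty}^{\infty} |X_i(f)|^2 \, df \tag{4} \]
where X_i(f) is the Fourier transform of x_i(t), with j the imaginary unit:
\[ X_i(f) = \int_{-\infty}^{\infty} x_i(t)\, e^{-j 2 \pi f t} \, dt \tag{5} \]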
The received energy observation E_o can be modelled as a normal random variable with mean u and variance \sigma^2 under the hypotheses H_0 and H_1:
\[ H_0 : E_o \sim \mathcal{N}(u_i, \sigma_i^2), \qquad H_1 : E_o \sim \mathcal{N}(u_{i1}, \sigma_{i1}^2) \tag{6} \]
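Comparing E_o with a given threshold value (TV) \gamma_1, the local binary decision \phi was obtained as
\[
\begin{aligned}
H_0 &: E_o < \gamma_1, \quad \text{where } E_o \sim \mathcal{N}(u_i, \sigma_i^2) \\
H_1 &: E_o > \gamma_1, \quad \text{where } E_o \sim \mathcal{N}(u_{i1}, \sigma_{i1}^2)
\end{aligned}
\tag{7}
\]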
The local binary decision \phi was based on the following criterion:
\[
\begin{cases}
\phi > E_o & \text{accept } H_1 \text{, conclude that PU is present} \\
\phi < E_o & \text{accept } H_0 \text{, conclude that PU is absent} \\
E_o < \phi < E_o & \text{accept } H_1 \text{, make another observation}
\end{cases}
\tag{8}
\]
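The energy-detection step of Eqs. (3) and (7) can be sketched in discrete form as follows. This is a minimal illustration, not the authors' MATLAB code: the function names, the sample values, and the threshold are hypothetical, and the integral of Eq. (3) is approximated by a sum over samples.

```python
def signal_energy(samples):
    """Discrete analogue of Eq. (3): sum of squared sample magnitudes."""
    return sum(abs(x) ** 2 for x in samples)


def local_decision(samples, threshold):
    """Return 1 (H1, PU present) if the energy exceeds the threshold, else 0 (H0)."""
    return 1 if signal_energy(samples) > threshold else 0


# Hypothetical noise-only samples, as under H0 in Eq. (1).
noise = [0.1, -0.2, 0.05, -0.1]
# Hypothetical PU signal with channel gain 0.9 plus the same noise, as in Eq. (2).
pu_signal = [1.0, -1.0, 1.0, -1.0]
received = [0.9 * s + a for s, a in zip(pu_signal, noise)]

GAMMA_1 = 1.0  # illustrative threshold value

if __name__ == "__main__":
    print("noise only:", local_decision(noise, GAMMA_1))
    print("PU present:", local_decision(received, GAMMA_1))
```

A single threshold is used here, following Eq. (7); the third branch of Eq. (8) is not modelled.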
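Let s_i indicate a successful PU detection by the ith SU, so that with N SUs the number of successes is F = \sum_{i=1}^{N} s_i, and let \delta = 1 − failure rate denote the success probability. Given F and \delta, the probability of correctly detecting PUs can be measured by a binomial probability distribution. Writing \Phi for the number of successes, n for the number of SUs, s for the success probability, and f = 1 − s for the failure probability:
\[ P(\Phi) = \binom{n}{\Phi} s^{\Phi} f^{\,n-\Phi} = \frac{n!}{\Phi!\,(n-\Phi)!}\, s^{\Phi} f^{\,n-\Phi}, \quad \Phi = 0, 1, \ldots, n, \quad 0 \le s \le 1, \tag{9} \]
with mean
\[ E(\Phi) = \sum_{\Phi=1}^{n} \Phi \binom{n}{\Phi} s^{\Phi} f^{\,n-\Phi} = \sum_{\Phi=1}^{n} \Phi\, \frac{n!}{(n-\Phi)!\,\Phi!}\, s^{\Phi} f^{\,n-\Phi} = ns, \tag{10} \]
and variance
\[ \sigma^2 = ns\,[(n-1)s + 1 - ns] = ns(1-s) = nsf. \tag{11} \]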
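Equations (9)–(11) can be checked numerically. The sketch below evaluates the binomial probabilities directly and compares the resulting mean and variance with ns and nsf; the function names and the values n = 10 and s = 0.9 are illustrative assumptions only.

```python
import math


def binomial_pmf(phi, n, s):
    """Eq. (9): probability of exactly phi successes among n SUs."""
    f = 1.0 - s  # failure probability
    return math.comb(n, phi) * s ** phi * f ** (n - phi)


def binomial_moments(n, s):
    """Mean and variance obtained by summing over the distribution."""
    pmf = [binomial_pmf(phi, n, s) for phi in range(n + 1)]
    mean = sum(phi * p for phi, p in enumerate(pmf))
    variance = sum((phi - mean) ** 2 * p for phi, p in enumerate(pmf))
    return mean, variance


if __name__ == "__main__":
    n, s = 10, 0.9  # illustrative values
    mean, variance = binomial_moments(n, s)
    print(mean, n * s)                  # compare with Eq. (10)
    print(variance, n * s * (1.0 - s))  # compare with Eq. (11)
```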
After the SUs have computed their binary decisions, they share observations as depicted in Fig. 1. After sensing the spectrum, the SUs report their spectrum observations to their neighbouring nodes. Unfortunately, the malicious nodes MU 1 and MU 2 interfere with the normal operations of the network, either for greedy reasons, to monopolise the band, or to mislead the SUs and cause DoS or interference. The goal of the SSDF attack can be modelled as follows:
\[
\begin{aligned}
&\forall i \in \{\text{neighbour}\}: \ \text{report} = 1 \ \text{when} \ \text{actual\_observation} = 0 \\
&\exists i \in \{\text{neighbour}\}: \ \text{report} = 0 \ \text{when} \ \text{actual\_observation} = 1
\end{aligned}
\tag{12}
\]
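The node behaviours described in this section can be sketched as simple reporting functions. The always yes and always no behaviours follow the text directly. The hit and run behaviour is only described as altering its reports to avoid detection, so the rule used here (falsify every third round, controlled by the hypothetical parameter `attack_every`) is an assumption made purely for illustration.

```python
def honest(actual_observation):
    """Legitimate SU: reports what it sensed."""
    return actual_observation


def always_yes(actual_observation):
    """Always yes attacker: reports the band as occupied (1) regardless."""
    return 1


def always_no(actual_observation):
    """Always no attacker: reports the band as vacant (0) regardless."""
    return 0


def hit_and_run(actual_observation, round_index, attack_every=3):
    """Assumed hit and run model: falsify only on every attack_every-th round."""
    if round_index % attack_every == 0:
        return 1 - actual_observation  # flip the binary observation
    return actual_observation


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1, 0]
    print([always_yes(t) for t in truth])
    print([always_no(t) for t in truth])
    print([hit_and_run(t, r) for r, t in enumerate(truth)])
```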
Reputation-based system
The reputation-based system considered the trustworthiness of the SUs by evaluating the history of their reports. SUs with reputation values above the TV of 0.6 were considered malicious and their reports were isolated; otherwise, their reports were included in the final decision. Once the SUs with reputation values below 0.6 have been selected, the q-out-of-m rule is applied. In the q-out-of-m rule, 60% of the nodes with good reputation are selected and the final decision is informed by q, which is either 0 or 1. The reputation-based system and the q-out-of-m rule are shown in Algorithms 1 and 2, respectively.
Fig. 1 Cooperative spectrum sensing [22]
Algorithm 1 [30]
where m is the assessor node, which performs the data fusion. The variable i denotes the neighbour SU, d_i(t) is the status of the spectrum band, S_i(t) is the value of the neighbour’s report, g_m(t) is the final decision at device m, and r_mi is the current reputation of device i at device m.
Algorithm 2 [31]
The q-out-of-m rule randomly polls 60% of the nodes with good reputation to be considered in decision-making. If the majority report 1, the final decision is 1 (the band is occupied); otherwise it is 0 (the band is not occupied).
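The two stages described above can be sketched as follows. This is an illustrative reading rather than the authors' Algorithms 1 and 2, whose listings are not reproduced in the text. In particular, the reputation score is assumed to be the fraction of a node's past reports that disagreed with the final decisions, which is consistent with values above 0.6 marking a node as malicious; the function and variable names are hypothetical.

```python
import math
import random

REPUTATION_TV = 0.6  # threshold value quoted in the text
POLL_FRACTION = 0.6  # share of well-reputed nodes that are polled


def reputation(reports, final_decisions):
    """Assumed score: fraction of past reports that disagreed with the final decisions."""
    if not reports:
        return 0.0
    mismatches = sum(1 for s, g in zip(reports, final_decisions) if s != g)
    return mismatches / len(reports)


def q_out_of_m(current_reports, reputations, rng):
    """Poll 60% of the trusted nodes at random and return the majority report."""
    trusted = [node for node in current_reports
               if reputations[node] <= REPUTATION_TV]
    if not trusted:
        return None  # no node left to poll
    m = max(1, math.ceil(POLL_FRACTION * len(trusted)))
    polled = rng.sample(trusted, m)
    ones = sum(current_reports[node] for node in polled)
    return 1 if ones > m / 2 else 0


if __name__ == "__main__":
    decisions = [1, 0, 1, 1]  # hypothetical past final decisions
    history = {"su1": [1, 0, 1, 1], "su2": [1, 0, 1, 0], "mu1": [0, 1, 0, 0]}
    scores = {node: reputation(h, decisions) for node, h in history.items()}
    reports = {"su1": 1, "su2": 1, "mu1": 0}
    print(scores, q_out_of_m(reports, scores, random.Random(1)))
```

Passing the random generator explicitly keeps the polling reproducible, which is convenient when comparing runs.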
4 Results and Analysis
MATLAB R2015a was used to implement and simulate the scheme. The false alarm probability and the missed detection probability were set to 0.1, with the SNR set to 10 dB. Energy detection was used as the sensing technique with a detection TV of 0. Network sizes of N = 10, 50, 100, and 150 to 250 nodes were considered, with the percentage of malicious nodes set to 10, 20, 40, 50, and 60% for each network size. The SUs sensed the spectrum band, computed their binary observations, and shared them. We evaluated our scheme and compared its performance with the MFDSS [25] and DBS [24] schemes. Figures 2, 3, and 4 depict the reputation-based system results. They show the node IDs in a network with 50 nodes and their reputation values. Figure 3 shows the nodes that were isolated because their reputation values were above the TV of 0.6. The inputs of these nodes were not considered in the final decision-making. Figure 4 shows the nodes that had reputation values below the TV. These nodes were selected and their reports were included in the q-out-of-m rule phase. In Fig. 5, the scheme’s success probability was evaluated. We varied the number of nodes from N = 10 to N = 250, with 10% of the nodes being MUs. We investigated the hit and run (HnR), always yes (Ay), and always no (An) attacks. We also evaluated the impact of the unintentionally misbehaving nodes (Um). For N = 10, we had 1 Ay attack. As we increased the number of nodes to 50, we noticed that the proposed scheme was not affected. In Fig. 6, we had Ay = 1 and An = 1 SSDF attacks for N = 10. We observed that there was no effect on the success probability of the schemes because the HnR attack and MUs were not considered. The proposed scheme performed better because of the q-out-of-m rule scheme implemented, which isolated the Um nodes and the HnR attack before the final decisions were made.
Fig. 2 Nodes above threshold value
Fig. 3 Reputation-based system
Fig. 4 Nodes below threshold value
Fig. 5 Success probability with 10% MUs
Figure 7 presents the success probability. For N = 10, we had 4 MUs: 1 Um, 1 Ay, 1 An, and 1 HnR attack. The performance of the proposed scheme and the MFDSS scheme was slightly affected, and both schemes managed to detect 75% of the MUs. Figure 8 exhibited different trends compared to Fig. 7 because of the increase in the number of MUs in Fig. 8. For N = 100, we had 40 MUs. We observed a huge drop in performance; however, the proposed scheme achieved the highest detection accuracy, which is more than 60%, compared to the MFDSS and DBS schemes. When the number of hit and run attacks increased, the DBS scheme’s detection accuracy reduced drastically. This is because, in its current form, the DBS scheme is not designed to combat the hit and run attack and unintentionally misbehaving SUs. The MFDSS scheme is designed to combat the hit and run attack but is not optimised to combat a large number of attacks. The MFDSS scheme was implemented using the modified z-test, which performs better when the Byzantine failure rate cannot be estimated. The results show that the Um nodes have an effect on the performance of the MFDSS scheme. We present the success probability of the schemes in detecting the SSDF attack with MUs = 50% in Fig. 8. For N = 10 with MUs = 5, we evaluated the performance of the network under 1 Um, 1 Ay, 1 An, and 2 HnR attacks. The proposed scheme managed to detect all the attacks in the network because it is optimised to detect all the types of SSDF attacks. The DBS scheme managed to detect 60% of the MUs in the network, while the MFDSS scheme managed to detect only 80% of the MUs in the network. In Fig. 9, the results show that with an increase in the number of SUs and MUs, and where many Um and HnR attacks were considered, the schemes’ success probability was reduced. We assigned 40 MUs in the network in Fig. 9. We noted that the HnR attack and Um nodes were the attacks with the highest negative impact on the network. The HnR attack can exhibit characteristics of legitimate SUs, which reduces the detection probability of the schemes.
Fig. 6 Success probability with 20% MUs
Fig. 7 Success probability with 40% MUs
Fig. 8 Success probability with 50% MUs
Fig. 9 Success probability with 60% MUs
In Fig. 10, we examined the schemes’ missed detection probabilities in detecting the SSDF attack in the network under different scenarios for each network size. For N = 10, we had only one attack implemented, the Ay attack. We observed that all the schemes were able to detect the Ay attack because the attack probability of the Ay attack exhibits the attributes of an outlier, which can be easily detected by any fusion scheme. All the schemes had low missed detection probabilities in detecting the Ay SSDF attack.
Increasing the number of nodes to 50 with 10% MUs and using the same parameters as for the success probability, we set 1 Ay attack, 1 An attack, 2 misbehaving SUs, and 1 HnR attack. Our scheme had the lowest missed detection probability. For N = 100, 10% of the nodes were malicious: 4 were Um nodes, 2 were Ay attacks, 2 were An attacks, and 2 were HnR attacks. Due to the Um nodes, we observed an increase in the missed detection probability of the proposed scheme. The number of nodes was then increased to N = 150, with a random variation of the attack strategies. The Ay attack was set to 4, the An attack to 4, the HnR attack to 4, and the Um nodes to 3. The proposed scheme had positive missed detection probability results. The proposed scheme detected and isolated all the malicious nodes with the assistance of the q-out-of-m rule scheme, which detects all the MUs and Um nodes in the first fusion phase. The DBS scheme is susceptible to Byzantine attacks and Um nodes.
Fig. 10 Missed detection probability with 10% MUs
In Fig. 11, the performance of the schemes was investigated under different SSDF attack scenarios. For N = 10, we set 1 Ay attack and 1 An attack. We observed that the schemes can detect and isolate the Ay and An attacks given their estimated attack probabilities.
Fig. 11 Missed detection probability with 20% MUs
In Fig. 12, we analyse the missed detection probability of the schemes for N = 10, 50, 100, and 150 to 250, with MUs = 40%. For N = 10, we had MUs = 4. We set 1 Um, 1 Ay, 1 An, and 1 HnR attack. For N = 50, we set MUs = 20, where Ay = 5, An = 5, HnR = 5, and Um = 5. The results show that the proposed scheme outperformed the MFDSS and DBS schemes in missed detection probability.
Fig. 12 Missed detection probability with 40% MUs
In Fig. 13, the number of MUs was the same as the number of SUs; this caused the results to exhibit a different pattern. However, the results show that our scheme performed better than the other schemes and had the lowest missed detection. The DBS scheme had a missed detection percentage of 40% and the MFDSS scheme had a missed detection percentage of 20%. This was caused by the HnR attack and Um nodes. The MFDSS and DBS schemes had limitations in detecting the HnR attack and Um nodes due to their design properties discussed in the literature. In a network with N = 50, we had 25 MUs with Um = 5, Ay = 6, An = 8, and HnR = 6. The missed detection probability for N = 100 increased when 50% of the nodes were MUs. We set Um = 12, Ay = 15, An = 15, and HnR = 8. For N = 150, we set MUs = 75 and randomly set Um = 18, Ay = 25, An = 15, and HnR = 17. With an increase in the number of SUs and MUs, where we had a high number of Um and HnR attacks, the schemes’ missed detection probability increased.
Fig. 13 Missed detection probability with 50% MUs
In Fig. 14, the number of MUs was more than the number of SUs, which caused the missed detection probability to increase. The missed detection probability increased when we increased the HnR attack and the Um nodes. The Um nodes can contain malicious results while having good reputations.
Fig. 14 Missed detection probability with 60% MUs
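The scenario sizes and percentages quoted in this section follow from simple arithmetic, sketched below. The formulas are assumptions, since the text does not state them: the number of MUs is taken as the stated percentage of N, and the missed detection fraction is taken as the complement of the detected fraction, which agrees with the 60%/40% and 80%/20% pairs reported for the DBS and MFDSS schemes with 5 MUs. Function names are hypothetical.

```python
def malicious_count(n_nodes, malicious_percent):
    """Number of MUs when a given percentage of the N nodes is malicious."""
    return n_nodes * malicious_percent // 100


def detection_metrics(detected, total_malicious):
    """Return (detected fraction, missed fraction) for one scenario."""
    success = detected / total_malicious
    missed = (total_malicious - detected) / total_malicious
    return success, missed


if __name__ == "__main__":
    print(malicious_count(10, 50))   # N = 10 with 50% MUs
    print(detection_metrics(3, 5))   # 3 of 5 MUs detected
    print(detection_metrics(4, 5))   # 4 of 5 MUs detected
```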
5 Conclusion
In this study, we proposed a scheme which integrates the reputation and q-out-of-m rule schemes to address the effects of the SSDF attack. We studied the always yes, always no, and hit and run attacks. We also investigated the legitimate SUs and the unintentionally misbehaving SUs in order to discriminate them from MUs. The proposed scheme was compared to the MFDSS scheme and the DBS scheme. The results show that the proposed scheme performed better in all the metrics, as it had the highest success probability and the lowest missed detection probability.
Acknowledgements
This work is based on research supported in part by the National Research Foundation of South Africa, Unique Grant No. 94077.
References
1. Zhang X, Zhang X, Han L, Ruiqing X (2018) Utilization-oriented spectrum allocation in an underlay cognitive radio network. IEEE Access 6:12905–12912
2. Pang D, Deng Z, Hu G, Chen Y, Xu M (2018) Cost sharing based truthful spectrum auction with collusion-proof. China Commun 2(15):74–87
3. Mapunya SS, Velempini M (2019) The design and implementation of a robust scheme to combat the effect of malicious nodes in cognitive radio ad hoc networks. South Afr Comput J 2(31):178–194
4. Boddapati KH, Bhatnagar RM, Prakriya S (2018) Performance of incremental relaying protocols for cooperative multi-hop CRNs. IEEE Trans Veh Technol 67(7):6006–6022
5. Wei Z-H, Hu B-J (2018) A fair multi-channel assignment algorithm with practical implementation in distributed cognitive radio networks. IEEE Access 6:14255–14267
6. Osama E, Maha E, Osamu M, Hiroshi F (2018) Game theoretic approaches for cooperative spectrum sensing in energy-harvesting cognitive radio networks. IEEE Access 6:11086–11100
7. Chakraborty A, Banerjee JS, Chattopadhyay A (2017) Non-uniform quantized data fusion rule alleviating control channel overhead for cooperative spectrum sensing in cognitive radio networks. In: 2017 IEEE 7th international advance computing conference (IACC), Hyderabad, India
8. Mapunya S, Velempini M (2018) Investigating spectrum sensing security threats in cognitive radio networks. In: Ad Hoc networks. Springer, Niagara Falls, pp 60–68
9. Kishore R, Ramesha CK, Tanuja S (2016) Superior selective reporting mechanism for cooperative spectrum sensing in cognitive radio networks. In: 2016 international conference on wireless communications, signal processing and networking (WiSPNET), Chennai, India
10. Ouyang J, Lin M, Zou Y, Zhu W-P, Massicotte D (2017) Secrecy energy efficiency maximization in cognitive radio networks. Open Access J 5:2641
11. Guo H, Jian W, Luo W (2017) Linear soft combination for cooperative spectrum sensing in cognitive radio networks. IEEE Commun Lett 21(7):1089–7798
12. Morozov MY, Perfilov OY, Malyavina NV, Teryokhin RV, Chernova I (2020) Combined approach to SSDF-attacks mitigation in cognitive radio networks. In: 2020 systems of signals generating and processing in the field of on board communications, Moscow, Russia
13. Sun Z, Xu Z, Hammad MZ, Ning X, Wang Q, Guo L (2019) Defending against massive SSDF attacks from a novel perspective of honest secondary users. IEEE Commun Lett 23(10):1696–1699
14. Liu X, Li F, Na Z (2015) Optimal resource allocation in simultaneous cooperative spectrum sensing and energy harvesting for multichannel cognitive radio. J Latex Class Files 5:3801–3812
15. Maric S, Reisenfeld S, Goratti L (2016) A simple and highly effective SSDF attacks mitigation method. In: 2016 10th international conference on signal processing and communication systems (ICSPCS), Gold Coast, QLD, Australia
16. Pei Q, Li H, Liu X (2017) Neighbour detection-based spectrum sensing algorithm in distributed cognitive radio networks. Chin J Electron 26(2):399–406
17. Mapunya S, Velempini M (2019) The design of byzantine attack mitigation scheme in cognitive radio ad-hoc networks. In: 2018 international conference on intelligent and innovative computing applications (ICONIC), Holiday Inn Mauritius, Mon Trésor, Plaine Magnien, Mauritius
18. Wang H, Yao Y-D, Peng S (2018) Prioritized secondary user access control in cognitive radio networks. IEEE Access 6:11007–11016
19. Nie G, Ding G, Zhang L, Wu Q (2017) Byzantine defense in collaborative spectrum sensing via Bayesian learning. IEEE Access 5:20089–20098
20. Ngomane I, Velempini M, Dlamini SV (2016) The design of a defence mechanism to mitigate the spectrum sensing data falsification attack in cognitive radio ad hoc networks. In: 2016 international conference on advances in computing and communication engineering (ICACCE), Durban, South Africa
21. Avinash S, Feng L, Jie W (2008) A novel CDS-based reputation monitoring system for wireless sensor networks. In: 2008 the 28th international conference on distributed computing systems workshops, Beijing, China
22. Abdelhakim M, Zhang L, Ren J, Li T (2011) Cooperative sensing in cognitive networks under malicious attack. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), Prague, Czech Republic
23. Ngomane I, Velempini M, Dlamini SV (2018) The detection of the spectrum sensing data falsification attack in cognitive radio Ad Hoc networks. In: 2018 conference on information communications technology and society (ICTAS), Durban, South Africa
24. Chen C, Song M, Xin C (2013) A density based scheme to countermeasure spectrum sensing data falsification attacks in cognitive radio networks. In: 2013 IEEE global communications conference (GLOBECOM), Atlanta, GA, USA
25. Pongaliur K, Xiao L (2014) Multi-fusion based distributed spectrum sensing against data falsification attacks and Byzantine failures in CR MANET. In: 2014 IEEE 22nd international symposium on modelling, analysis & simulation of computer and telecommunication systems, Paris, France
26. Hyder CS, Grebur B, Xiao L, Ellison M (2014) ARC: adaptive reputation-based clustering against spectrum sensing data falsification attacks. IEEE Trans Mob Comput 13(8):1707–1719
27. Wang W, Li H, Sun Y, Han Z (2009) Attack-proof collaborative spectrum sensing in cognitive radio networks. In: 2009 43rd annual conference on information sciences and systems, Baltimore, MD, USA
28. Yu FR, Tang H, Huang M, Li Z, Mason P (2009) Defense against spectrum sensing data falsification attacks in mobile ad hoc networks with cognitive radios. In: MILCOM 2009–2009 IEEE military communications conference, Boston, MA, USA
29. Ngomane I, Velempini M, Dlamini SD (2018) Trust-based system to defend against the spectrum sensing data falsification attack in cognitive radio ad hoc network. In: International conference on advances in big data, computing and data communication system, Durban, South Africa
30. Ngomane I, Velempini M, Dlamini SV (2017) Detection and mitigation of the spectrum sensing data falsification attack in cognitive radio ad hoc networks. In: Southern Africa telecommunication networks and applications conference (SATNAC) 2017, Barcelona, Spain
31. Fragkiadakis AG, Tragos EZ, Askoxylakis IG (2013) A survey on security threats and detection techniques in cognitive radio networks. IEEE Commun Surveys Tutorials 15(1):428–445
Lean Manufacturing Tools for Industrial Process: A Literature Review
Gustavo Caiza, Alexandra Salazar-Moya, Carlos A. Garcia, and Marcelo V. Garcia
Abstract Any company or industry that wants to stay competitive should know, analyze, and plan its processes with a view to conceptual innovation and technical applications that give it sustained control over the use of its resources. To improve productivity and thereby increase the profit from production processes, the Japanese philosophy of Lean Manufacturing is proposed as a strategy for reducing anything that does not add value to processes, i.e., productive waste. This article presents a literature review showing the effectiveness of Lean Manufacturing in different industrial settings and on different topics, always framed within the continuous improvement of the industrial production environment.
Keywords Lean manufacturing · Productive waste · Continuous improvement · Productivity
G. Caiza (B) Universidad Politecnica Salesiana, UPS, 170146 Quito, Ecuador e-mail: [email protected] A. Salazar-Moya · C. A. Garcia · M. V. Garcia Universidad Técnica de Ambato,UTA, 180103 Ambato, Ecuador e-mail: [email protected] C. A. Garcia e-mail: [email protected] M. V. Garcia e-mail: [email protected]; [email protected] M. V. Garcia University of Basque Country, UPV/EHU, 48013 Bilbao, Spain
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_3
1 Introduction
In production management, the economic resources that a manufacturing company must invest to obtain a final product are crucial to the company's operational success [6]. Developing effective resource schemes for competitive production is therefore a strategy that allows an industry to develop consistently in line with its present and future plans [3]. Today, in an increasingly competitive market where work is conceived as a survival strategy, manufacturing companies, like other companies, are determined to understand their processes and improve them systematically by adopting management tools and philosophies [14]; as will be shown, Lean Manufacturing, or Lean Production, is one of them [18]. The Toyota Production System is the precursor of what later became known as Lean Manufacturing. It has influenced manufacturing techniques around the world and is accepted by countless manufacturers in different disciplines because, through a series of management techniques, it aims to produce only what is necessary at the right time by eliminating the production waste that causes production costs to rise [21]. Because it supports business productivity and higher revenue for companies, a wide range of research has been carried out over the years on topics related to Lean Manufacturing: its theoretical basis, the tools that make it up, and the technological support on which it currently relies to go one step further in the innovation of production processes [10].
The above points to the need for more applied knowledge of Lean Manufacturing. The present study therefore aims to establish a bibliographic framework that lets readers understand, at the theoretical level, what Lean Manufacturing is, the principles that govern it and, above all, the applicability of the techniques that make up this methodology in different types of industries. The paper is organized as follows: Sect. 2 describes the procedure followed to select the most relevant sources of information to support the research; Sect. 3 assesses the information found and notes the issues that would have been important to find but could not be referenced in the related work; finally, Sect. 4 presents the conclusions reached and a proposal for further research based on the gaps and unmet expectations identified in this study.
2 Research Methodology
To select the sources of information for this analysis, the authors drew on prior conceptual knowledge of Lean Manufacturing. Based on this knowledge, the search was carried out in specialized digital books held in different repositories. The selection of books was not limited by publication date, because the theory was born in the early years of the twentieth century and has not changed in essence over the years, although it has been affected by the modernization of companies, technological advances, and the evolution of knowledge management. The search was conducted both in English, which is considered the universal language and in which a greater range of technical information was found, and in Spanish, in which information closer to the Latin American reality can be obtained. To select the books, well-known authors (known for their impact on or development of the subject) were prioritized, and the content was then screened against the theme of this study.
To contrast the theoretical framework of Lean Manufacturing, information was sought in scientific articles, which, in addition to containing the conceptual basis already identified, are a good reference for experiments, applications, and research on implementing this philosophy in companies of different types and scopes and, naturally, with different results. The scientific articles were first searched with an age limit of 5 years, and most of the articles used fall within this limit; the search was then extended to earlier years to broaden the content. The search was again done first in English and then in Spanish. The search scheme is illustrated in Fig. 1a, b.
As a database for the search for scientific articles, IEEE Xplore was selected in the first instance because it was found to hold the most information related to the research topic, with 50% of the total referenced articles. The remaining 50% comes from other databases such as Google Scholar, Springer Link, Scientific.Net, Dialnet, MDPI, and Semantic Scholar (see Fig. 1c), where articles published in scientific journals and conference proceedings from around the world were consulted, using the keywords Lean Manufacturing, Productive Waste, Continuous Improvement, and Productivity.
Fig. 1 Research methodology: (a) book search, (b) scientific item search, (c) databases consulted
3 Literature Review
3.1 Lean Manufacturing
Lean Manufacturing is widely recognized as a philosophy. A first definition describes it as a philosophy of making a hundred small improvements every day instead of a single “boom” once a year, with an improvement effort focused on detail [20]. This philosophy promotes the achievement of objectives and the acquisition of a particular culture of savings based on quality control and the application of strategies, tactics, and skills [19]. It therefore concerns not only material resources but also knowledge, communication, and information flows, according to the analysis by Gleeson, Goodman et al. [7], who observed the performance of staff and their involvement in projects and developments concerning Lean Manufacturing and highlighted the role of cognitive load in the outcome of work. Lean Manufacturing is also a model for organizing and managing manufacturing systems and the resources involved in them (human resources, materials, machinery, and the methods by which processes are executed) in order to improve product quality and efficiency through the constant identification and reduction of productive waste [12]. Reducing waste shortens the lead time, which should not be greater than 10 times the time during which value is added to the product [15]. In the study by Sayid Mia and Nur-E-Alam [16], Lean Manufacturing is applied to production processes, and a reduction in delivery time is confirmed by comparing the initial state of the plant with its final state after the project.
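The rule of thumb cited from [15], that lead time should not exceed 10 times the value-adding time, can be checked with simple arithmetic. The following is a minimal Python sketch under stated assumptions: the function names, the process steps, and the durations in minutes are hypothetical illustration data and are not taken from any of the cited studies.

```python
# Illustrative sketch: compare total lead time with value-adding time.
# All step names and durations below are hypothetical example data.

def lead_time_ratio(steps):
    """Return (lead_time, value_added_time, ratio) for a list of steps.

    Each step is a tuple (name, duration_minutes, adds_value).
    """
    lead_time = sum(duration for _, duration, _ in steps)
    value_added = sum(duration for _, duration, adds_value in steps if adds_value)
    if value_added == 0:
        raise ValueError("no value-adding time recorded")
    return lead_time, value_added, lead_time / value_added


def within_rule_of_thumb(steps, max_ratio=10.0):
    """True if lead time is at most max_ratio times the value-adding time."""
    _, _, ratio = lead_time_ratio(steps)
    return ratio <= max_ratio


example_steps = [
    ("cutting", 12, True),
    ("waiting for sewing", 240, False),
    ("sewing", 30, True),
    ("transport to finishing", 45, False),
    ("finishing", 18, True),
]

lead, value, ratio = lead_time_ratio(example_steps)
print(f"lead time = {lead} min, value-adding time = {value} min, ratio = {ratio:.2f}")
print("within 10x rule:", within_rule_of_thumb(example_steps))
```

In practice the durations would come from a measured value stream map; the threshold of 10 is exposed as a parameter so that a different target can be tested.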
Lean Manufacturing is not just a concept; it is also important to know the objective pursued with its implementation, which is to generate a culture of continuous improvement based on communication and teamwork, adapting the method to each particular case in order to find new ways of doing things in a more agile, flexible, and economical manner [8] in all kinds of industries. This has been demonstrated by Liu and Yang and by Phetrungrueng in their investigations of the effective application of Lean Manufacturing tools in footwear production [11, 13], by Arevalo-Barrera, Parreno-Marcos et al. in brick manufacturing [2], by Juárez López, Pérez Rojas et al. in the manufacture of railway transport [9], and by Durakovic, Demir et al. in a study of the applicability of Lean Manufacturing in many types of industries [5].
Lean Manufacturing can be applied throughout the supply chain, not just in production. This is shown by Theagarajan and Manohar [17], who studied the improvement of the supply chain in a leather footwear company using Lean Management practices and concluded that the performance of the system improves throughout the supply chain. They also stress adapting the application to the culture in which the industry operates. One might think that, because the philosophy was born in Eastern culture, a company must immerse itself in Japanese culture to make Lean Manufacturing work; what really matters, however, is changing the views or the attitude with which processes are managed in a company, as Chiarini, Baccarani et al. argue in their study comparing the Toyota Production System, the philosophy of Lean Manufacturing, and the Zen philosophy derived from Japanese Zen Buddhism [4].
3.2 The Lean Manufacturing Implementation Route
There is no universal method for implementing Lean Manufacturing; the implementation depends on the nature of the industry and its production processes. In the first instance, however, the techniques and tools that change the ways of working must be put in place. It is preferable to start with a pilot area or process and then extend the lessons learned and successes to more processes. The commitment of senior management to Continuous Improvement is also essential, because workers need to open their minds and absorb the philosophy, and because investments are needed in both training and structural modifications [8] (Table 1).
It is suggested to start by identifying the wastes (using an analysis matrix) and the relationships between them, then to assess the impact of each waste on productivity, select the most important ones, and trace them to their root cause (fishbone analysis) in order to determine the actions needed to reduce the waste or, in the best case, eliminate it. In the study by Amrina and Andryan [1], two specific types of matrices were used to identify and prioritize wastes, supported by statistical quality analysis tools.
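The idea of scoring the relationships between wastes and selecting the most important ones can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 0 to 4 scoring scale, the matrix values, and the ranking by outgoing influence are hypothetical and do not reproduce the specific matrices used in [1].

```python
# Illustrative sketch: rank wastes by how strongly they drive other wastes.
# The scores are hypothetical; they are not data from any cited study.

WASTES = ["overproduction", "inventory", "waiting", "transport", "defects"]

# influence[i][j]: assumed strength (0-4) with which waste i aggravates waste j.
influence = [
    [0, 4, 2, 3, 2],  # overproduction
    [1, 0, 2, 3, 1],  # inventory
    [1, 2, 0, 0, 1],  # waiting
    [0, 1, 3, 0, 2],  # transport
    [2, 3, 4, 3, 0],  # defects
]


def rank_wastes(names, matrix):
    """Return (name, share) pairs sorted by descending outgoing influence."""
    row_sums = [sum(row) for row in matrix]
    total = sum(row_sums)
    shares = [(name, score / total) for name, score in zip(names, row_sums)]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


for name, share in rank_wastes(WASTES, influence):
    print(f"{name:15s} {share:6.1%}")
```

The highest-ranked wastes would then be the candidates for a root cause (fishbone) analysis; other weighting schemes are equally possible.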
Table 1 Actions against waste in production

Type of waste: Overproduction
Feature: Excessive storage space; containers or boxes too large; low stock rotation; high storage costs; excessive means of handling (forklifts, etc.)
Possible causes: Processes with low capacity; unidentified or out-of-control bottlenecks; excessively long machine change or setup times; inaccurate production forecasts; overproduction; rework for product quality defects; hidden problems and inefficiencies
Lean actions against waste: Balanced production; product distribution in a specific section (cell manufacturing); JIT supplier deliveries; intermediate task monitoring; changing the mindset in organization and production management

Type of waste: Inventory
Feature: Excessive storage space; containers or boxes too large; low stock rotation; high storage costs; excessive means of handling (forklifts, etc.)
Possible causes: Processes with low capacity; unidentified or out-of-control bottlenecks; excessively long machine change or setup times; inaccurate production forecasts; overproduction; rework for product quality defects; hidden problems and inefficiencies
Lean actions against waste: Balanced production; product distribution in a specific section (cell manufacturing); JIT supplier deliveries; intermediate task monitoring; changing the mindset in organization and production management

Type of waste: Waiting
Feature: The operator waits for the machine to finish; excess of material queues within the process; unplanned stops; time to perform other indirect tasks; time to execute reprocessing; the machine waits for the operator to finish a pending task; an operator waits for another operator
Possible causes: Non-standardized working methods; poor layout due to accumulation or dispersion of processes; capacity imbalances; lack of appropriate machinery; operations delayed by missing materials or parts; production in large batches; low coordination between operators; high machine preparation or tool change times
Lean actions against waste: Balanced production (line balancing); product-specific layout (U-shaped cell manufacturing); autonomation with a human touch (Jidoka); rapid change of techniques and tools (SMED); multipurpose training of operators; supplier delivery system; improved maintenance according to assembly sequences

Type of waste: Transport/unnecessary movements
Feature: Containers are too large, or heavy and difficult to handle; excessive movement operations and material handling; maintenance equipment circulates empty through the plant
Possible causes: Obsolete layout; large batch sizes; poor and inflexible processes; non-uniform production programs; high preparation times; excessive intermediate warehouses; low efficiency of operators and machines; frequent reprocessing
Lean actions against waste: Flexible manufacturing cell-based equipment layout; gradual switch to flow production according to a set cycle time; multipurpose or multifunctional workers; reordering and readjustment of the layout to facilitate employee movements

Type of waste: Defective, rejected products and reworks
Feature: Waste of time, material resources, and money; inconsistent planning; questionable quality; complex process flow; additional human resources needed for inspection and reprocessing; extra space and techniques for reprocessing; unreliable machinery; low motivation of operators
Possible causes: Unnecessary movements; incapable suppliers or processes; operator errors; inadequate operator training or experience; inappropriate techniques or tools; poor or poorly designed production process
Lean actions against waste: Autonomation with a human touch (Jidoka); standardization of operations; implementation of warning elements or alarm signals (andon); anti-error mechanisms or systems (Poka-Yoke); increased reliability of machines; implementation of preventive maintenance; quality assurance at the workstation; production in a continuous flow to eliminate manipulation of workpieces; visual control: Kanban, 5S, and andon

3.3 Discussion and Analysis of Results
Of the literature available in e-book format, 22 books were reviewed in total and 10 are referenced, namely those considered to contain the best information for the purpose of the present work. These books show how the approach of the Toyota Production System evolved over time into the philosophy of Lean Manufacturing and describe the techniques and tools of this philosophy, in addition to presenting different ways of applying Lean Manufacturing to carry forward the conceptualization and application of Continuous Improvement. However, specialized books still lack an account of the differences in how the methodology is applied in different industries and, in particular, of the differences in implementation between a manufacturing company and a service company. It would be important to find a compendium of lived experiences and lessons learned, highlighting the successes and errors that have arisen, so that such a book could serve as an example and application guide.
Concerning the scientific articles, 58 research documents were selected in the first instance (once duplicates had been removed). The first review was based on the title and abstract; it was followed by a reading of the complete document, in which the tests carried out, the population analyzed, and the explicitness of the conclusions were assessed. Finally, 34 articles are referenced, whose contents allow theoretical information to be contrasted with practical application in different management environments. Authors do not agree on the number of Lean Manufacturing tools, their classification, or possible schemes of use. More than 30 tools have been identified for implementing this methodology, but the research reviewed covers at most 10 of them, with special emphasis on Value Stream Mapping (VSM) as a tool for analyzing the current state and projecting a future state.
Likewise, the Toyota Production System, by representing these tools in a "house" scheme, proposes that they should be applied structurally, starting from the foundations. However, no material has been identified that discusses implementing the methodology under this "construction" scheme, that is, passing through all levels: foundations, heart, pillars, and finally the roof, which is the achievement of a balanced, lean company.
4 Conclusions
This work has shown conceptually the value of Lean Manufacturing in business and industrial management: as stated, it focuses on eliminating productive waste, which leads to a reduction in product costs and ultimately to a greater prospect of profit in the industry. Lean Manufacturing consists of different tools that can be applied in a timely and systematic way in different work environments. Depending on their current reality, their business approach, and their strategic planning, industries must select which of these tools will be most useful for achieving the objectives set when choosing this philosophy as a strategy to improve productivity. This work has also made it possible to define themes for future research: a first study is proposed on techniques for measuring the increase in productivity and the decrease in the costs of finished products, and a second on the use of technological applications in the different tools of Lean Manufacturing.
References
1. Amrina E, Andryan R (2019) Assessing wastes in rubber production using lean manufacturing: a case study. In: 2019 IEEE 6th international conference on industrial engineering and applications (ICIEA), no I, pp 328–332. IEEE
2. Arevalo-Barrera B, Parreno-Marcos FE, Quiroz-Flores JC, Alvarez-Merino JC (2019) Waste reduction using lean manufacturing tools: a case in the manufacturing of bricks. In: 2019 IEEE international conference on industrial engineering and engineering management (IEEM), pp 1285–1289. IEEE. 10.1109/IEEM44572.2019.8978508
3. Boginsky AI, Chursin AA (2019) Optimizing product cost. Russ Eng Res 39(11):940–943
4. Chiarini A, Baccarani C, Mascherpa V (2018) Lean production, Toyota production system and Kaizen philosophy. TQM J 30(4):425–438
5. Durakovic B, Demir R, Abat K, Emek C (2018) Lean manufacturing: trends and implementation issues. Period Eng Nat Sci (PEN) 6(1):130
6. Fawcett SE, Smith SR, Bixby Cooper M (1997) Strategic intent, measurement capability, and operational success: making the connection. Int J Phys Distrib Logist Manage 27(7):410–421
7. Gleeson F, Goodman L, Hargaden V, Coughlan P (2017) Improving worker productivity in advanced manufacturing environments. In: 2017 International conference on engineering, technology and innovation (ICE/ITMC), vol 2018, pp 297–304. IEEE
8. Hernández Matías JC, Vizán Idoipe A (2013) Lean manufacturing: Conceptos, Técnicas e Implantación. Fundación EOI, Madrid
9. Juárez López Y, Pérez Rojas A, Rojas Ramírez J (2012) Diagnóstico de Procesos Previos a la Aplicación de la Manufactura Esbelta. Nexo Revista Científica 25(1):09–17
10. Khalaf Albzeirat M (2018) Literature review: lean manufacturing assessment during the time period (2008–2017). Int J Eng Manage 2(2):29
11. Liu Q, Yang H (2017) Lean implementation through value stream mapping: a case study of a footwear manufacturer. In: 2017 29th Chinese control and decision conference (CCDC), pp 3390–3395. IEEE
12. Madariaga Neto F (2020) Lean manufacturing: Exposición Adaptada a la Fabricación Repetitiva de Familias de Productos Mediante Procesos Discretos. Bubok Publishing
13. Phetrungrueng P (2018) Waste reduction in the shoe sewing factory. In: 2018 5th international conference on industrial engineering and applications (ICIEA), pp 340–344. IEEE
14. Prashar A (2016) A conceptual hybrid framework for industrial process improvement: integrating Taguchi methods, Shainin System and Six Sigma. Product Plan Control 27(16):1389–1404
15. Santos J, Wysk R, Torres JM (2006) Improving production with lean thinking. Wiley, Hoboken
16. Sayid Mia A, Nur-E-Alam (2017) Footwear industry in Bangladesh: reduction of lead time by using lean tools. J Environ Sci Comput Sci Eng Technol 6(3). http://jecet.org/download_frontend.php?id=192&table=SectionC:EngineeringScience
17. Theagarajan SS, Manohar HL (2015) Lean management practices to improve supply chain performance of leather footwear industry. In: 2015 international conference on industrial engineering and operations management (IEOM), pp 1–5. IEEE
18. Ur Rehman A, Usmani YS, Umer U, Alkahtani M (2020) Lean approach to enhance manufacturing productivity: a case study of Saudi Arabian factory. Arab J Sci Eng 45(3):2263–2280
19. Wilson L (2010) How to implement lean manufacturing. McGraw-Hill Companies, New York
20. Womack JP, Jones DT (2003) Lean thinking, banish waste and create wealth in your Corporation. Simon & Schuster Inc., New York
21. Wong YC, Wong KY, Ali A (2009) Key practice areas of lean manufacturing. In: 2009 international association of computer science and information technology, Spring conference, pp 267–271. IEEE
Lambda Computatrix (LC)—Towards a Computational Enhanced Understanding of Production and Management Bernhard Heiden, Bianca Tonino-Heiden, Volodymyr Alieksieiev, Erich Hartlieb, and Denise Foro-Szasz
Abstract This paper describes why and how the discipline of Artificial Intelligence (AI) will affect decisions and how these differ from decisions in the past, focusing on the application to industrial production. From this analysis, based on a global economic systemic model that tends towards a universal one, an outlook on globally emerging company structures is given with regard to their decision situation and how to decide rightly in the future. For this purpose, universal logical tools that implement the lambda calculus for quantification logic may be useful. We define these as Lambda Computatrix (LC). Examples are Theorem Provers (TP) like Isabelle or Lambda-Prolog. They help to decide precisely in order to reach company, societal, and environmental goals. The key feature of the LC is its universal approach to computation, which graph-theoretically connects potentially every node with every other node by a communication edge. LC hence enables increasingly intelligent communication, both informationally and materially, in logistics and production processes.
Keywords Lambda computatrix · Artificial intelligence · Industrial production and management · Theorem prover
B. Heiden · E. Hartlieb · D. Foro-Szasz Carinthia University of Applied Sciences, 9524 Villach, Austria B. Heiden (B) · B. Tonino-Heiden University of Graz, 8010 Graz, Austria e-mail: [email protected] URL: http://www.cuas.at V. Alieksieiev National Technical University ‘Kharkiv Polytechnic Institute’, 61002 Kharkiv, Ukraine
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_4
1 Introduction
1.1 How Artificial Intelligence (AI) Can Support and Regain Human Society Values
“First they sleep, then they creep, afterwards they leap.” According to this gardener's saying, in the first year after planting nature sleeps; in the second year it creeps and works hard to get started and sustain itself; and from the third year on it leaps and grows into a big plant or tree. When people want to establish an institution, this is even more the case. So, how can AI help in starting and maintaining ongoing management and processes? (1) First, AI devices can systematically read all information available on earth today about the past and the present. (2) Second, AI may find relations between connected relations as a human with intuition would, for instance in a morphic resonance [1, 2] and the state of the “ether” in the magnetic field (Einstein in [3, 4]); compare the Internet of the united universe web (uuw) as a projection of possible future “universe wide” technologies and cultures. (3) Third, AI gives an output of what “she” has found and thinks can be a fruitful next step; this is the predictive part. An example of the above categorial analysis could be the following: one entity (e.g. country, region, organization, etc.) wants to build a new university with the aim of doing scientific research on the most needed subjects (cf. [5]). Another entity thinks about strengthening its inter- and transdisciplinary research. A third entity is experienced in making lectures available online for the whole world (cf. [6]). AI would bring all this information together according to (1) and (2) and give various outputs (3) on how to handle, manage, and improve all these institutions. As a short summary, Table 1 shows a universally applicable information-processing task in accordance with the above. It can also be regarded as a basic cybernetic cycle in a close-to-human context.
With regard to Theory U of Otto Scharmer et al. [7, 8], (1) corresponds to the “seeing” process, (2) to the meditation point, the point of maximum relatedness with the world, which he calls “presencing”, and (3) to the emerging paradigm, which he calls “prototyping”.
Table 1 Categorial cybernetic steps in information processing from humans as well as machines
(1) Read: Data
(2) Correlate: Correlation
(3) Project: Extrapolation
1.2 Production and Management
Production and management decisions must nowadays be made increasingly fast because of faster product and innovation cycles, and their consequences are far-reaching in terms of both societal and economic impact. For this reason these decisions have to be “proved” more diligently, always assuming that the company behind them governs its decisions according to economic market requirements. This involves many other categories of decisions and hence makes them multicriterial or multivariate. For example, in automated mass production, decisions have to be made fast and at the same time rightly with regard to the best currently available knowledge of the production task.
1.3 Theorem Proving (TP)
TP, and with it higher-order logic, goes back to Frege's “Begriffsschrift” of 1879 (cf. [9]). Later, in 1940, Alonzo Church formalized the now valid form as the Simple Type Theory (STT), which is the basis for a wide range of modern TPs [9, 10]. Today's TPs are implementations of formal logic on the computer [11]. According to [11, p. 11] this logic can be divided into three categories: (1) propositional logic (fully automated), (2) first-order logic (partly automated), (3) Higher Order Logic (HOL) (interactively automated). The applications of TP are manifold. In technical applications they can be used for the reliability analysis of system-critical systems that are too difficult to calculate [12]. Because TPs are open with regard to solutions, they are preferably used dialogically, which implies human interaction. Recently, however, another way of using them has opened up: machine learning combined with TPs, which allows them to be automated much further [13]. This can be seen as an AI self-application strategy.
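Why propositional logic counts as fully automatable can be illustrated with a brute-force truth-table check. The Python sketch below is not a theorem prover in the sense of Isabelle; it only shows, under stated assumptions, that validity of a propositional formula can be decided mechanically by enumerating all assignments. The function names and the small production-flavoured example (machine ready, material available, job start) are hypothetical.

```python
# Illustrative sketch: decide propositional validity by exhaustive truth tables.
from itertools import product


def is_tautology(formula, variables):
    """Return True if formula(**assignment) holds for every truth assignment."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not formula(**assignment):
            return False
    return True


def implies(a, b):
    """Material implication: a -> b."""
    return (not a) or b


# Hypothetical rule: if the machine is ready and material is available,
# the job may start. Given both premises, the job may start (valid).
def modus_ponens(ready, material, start):
    premises = implies(ready and material, start) and ready and material
    return implies(premises, start)


# Invalid inference: from (ready -> start) and start, conclude ready.
def affirming_consequent(ready, start):
    return implies(implies(ready, start) and start, ready)


print(is_tautology(modus_ponens, ["ready", "material", "start"]))
print(is_tautology(affirming_consequent, ["ready", "start"]))
```

The enumeration grows as 2^n in the number of variables, and no comparable finite enumeration exists for first-order or higher-order logic, which is consistent with the partly automated and interactive categories named above.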
1.4 Content and Organization of This Paper
In this work we first state the goals in Sect. 1.5. In Sect. 2 we describe our understanding of the Lambda Computatrix (LC). Section 3 gives an overview of characteristic tasks in management and of some methods typically used, out of a wide variety. In Sect. 4 we sketch what TP is with regard to applications, and in Sect. 5 we look at TP, production, and management. We conclude how TP can be applied to the given characteristic tasks, summarize important findings, and finally give an outlook on future directions of TP applications in production and management.
1.5 Goal
The goal of this work is to answer the following three questions: (1) What is TP especially in the context of production and management? (2) How can the management actively use TP in production processes? (3) Why is TP increasingly important for future production and management?
2 Lambda Computatrix (LC)
The development of programming languages began as early as the 19th century, but significant progress was not made until after the rise of computation in the 1940s and 1950s [14]. About a thousand programming languages have been developed since the first computer. Konrad Zuse developed the very first programming language between 1942 and 1945 and called it “Plankalkül”. Over the last 80 years, many different forms and variations of programming languages have emerged since “Plankalkül” [15]. One of these languages is Prolog, which is related to this work. Developed in the early 1970s by Alain Colmerauer, Prolog combines the terms “Programming” and “Logic” and thus stands for a programming language based on formal logic [16]. Prolog is a declarative programming language; the counterpart of the declarative languages are the procedural (or imperative) programming languages. The difference lies in the approach. In imperative programming, a desired result is achieved by telling the computer exactly which sequence of steps it has to follow. In declarative programming, one defines what is to be achieved and lets the computer find the “optimal way” to achieve it [17]. This is a “search” process, which can in general be regarded as a key feature of AI and as an optimisation task. The optimisation is done by the logic of reasoning in natural language. Logic can itself be regarded as an algorithm closely related to humans and their language. Hence, it must “logically” inherit a natural process governed by its evolutionary conditions. λ-Prolog, furthermore, is a logic programming language that extends Prolog by higher-order functions, λ-terms, higher-order unification, polymorphic types, and mechanisms for creating modules and secure abstract data types.
By introducing strong type binding, which is missing in Prolog, it is possible, for example, to distinguish between numbers and sets of numbers or entities and functions. λ-Prolog is mainly used for “meta-programming”, i.e. programmes or programming languages that can be self-manipulated or adapted. Meta-programming is also used for TPs. In summary, λ-Prolog is a language based on the Lambda (λ)—calculus, which can be used to clearly prove what a computable function is. The λ-calculus was used in the 1930s by Alonzo Church and Stephen Cole Kleene to give a negative answer to the problem of decision-making and to find the foundations of a logical system. The λ-calculus has significantly influenced the development of functional (typified) programming languages.
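As a small illustration of how the λ-calculus expresses computation with nothing but function abstraction and application, the following sketch encodes natural numbers as Church numerals. Python is used here only for illustration (it is not λ-Prolog), and the names `zero`, `succ`, `add`, `mul` and `to_int` are chosen for this example.

```python
# Church numerals: the number n is represented by the function
# that applies f exactly n times to an argument x.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))


def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)


one = succ(zero)
two = succ(one)
three = add(one)(two)   # 1 + 2
six = mul(two)(three)   # 2 * 3

print(to_int(three), to_int(six))
```

Arithmetic here is carried out purely by composing functions; `to_int` is only a convenience for inspecting the result.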
3 Simulation in Production and Management Methods Simulation programmes are highly beneficial when measurements or experiments in reality are limited. For example, population development could be too slow, autocatalytic behaviour could be too fast, or factory planning could be too expensive to study directly. With suitable modelling, a simulation can execute even extensive systems and present correlations (compare (2) in Table 1) between individual system variables. This provides a basis for logging, analysing and interpreting possible processes [18]. In management it is important to make decisions that take a complex environment into account. A key decision criterion is to look at cycles, because cycles describe systemically how order emerges in self-organizing as well as in cybernetic systems [19]. As an example, in [20] dynamic backcycling is modelled by Technological Innovation Systems (TIS) cycles, Industry Life-Cycles (ILC) and Technology Life-Cycles (TLC). Markard argues that a TIS cycle is coupled to the TLC or ILC with regard to the sigmoid growth phases (formative, growth, mature, decline) and has a corresponding structure. Management therefore has to consider which phase of development the TIS is in. This can also be understood as a higher-order process, as explained more explicitly in Sect. 5. So, with regard to innovative systems (TIS), it is important to model the TIS adequately, which could be done with TP as one possibility, in order to predict its behaviour (cf. also (3) in Table 1) and to make management decisions. Current management methods at the dawn of Industry 4.0 can be found in [21]. A typical management method is to build innovation cooperations or to build on a technology and competence analysis [22]. Other methods are strategic technology management [23] or normative corporate governance [24]. A method by which management can build up new businesses is trend-anticipating business model innovation [21].
4 Theorem Proving (TP) Here we summarise possible TP applications for two exemplary application types related to management and production: the AI-legal case and the logistics case.
4.1 AI-Legal According to Benzmüller, intelligence can be ordered in five steps [25]: “(1) solve hard problems (2), successfully act in known, unknown and dynamic environment (requires perception, planning, agency, etc.), (3) reason abstractly & rationally, trying to avoid inconsistency and contradiction, (4) reflect upon itself and to adjust its reasoning & behaviour with respect to upper goals and norms, and (5) interact socially with others entities and to align own values and norms with those of a larger society for a greater good.”
Steps (1, 2) are, in his opinion, the current state of affairs in the world. With (3), TPs come into play. By means of these, e.g. number-theoretic proofs can be given that otherwise could not be accomplished. In this case, mathematics and number theory are the application. But far more interdisciplinary applications are possible, for example in law, production and management. When we look at the internationalization of law, e.g. Lurger 1999 with regard to the unification of European contract law, she states that the universal principle is ’contractual solidarity’ (’Vertragliche Solidarität’) [26, p. 155], [27, p. 17,19,76], which accords well with number (5) of Benzmüller. We come to the same conclusion from an evolutionary point of view, as in Eccles [28], who states that human language leads to increasingly cooperative or social behaviour, since in the course of human evolution the aggression centre has grown more slowly than the parts of the brain associated with social behaviour and language. Natural language can be understood in this context as the precursor of TP; conversely, TP is the logical extension of natural language for humans and a suitable application of automation. This means that automation support by TP is one possible way of extending the mind (cf. also Benzmüller above (3)–(5)). By means of this, not only does an otherwise unreachable form of social interaction become reality, but so do its consistency and logical adequacy. Factual social relations then become true on the basis of mind-extending, ever more complex reasoning. For this, readability and controllability by both humans and machines is the key driving force. This will lead not only to increasingly reasonable laws, but also to larger possible communities, united in complex and differentiated form.
4.2 Logistics Theorem provers in logistics are investigated, e.g. in [29] for Logistics Service Supply Chains (LSSCs) and their base elements such as serial and parallel process connections. These processes are modelled in HOL, showing that LSSCs and elementary processes like serial/parallel connections can be translated into HOL and hence modelled in this way for logistics, production processes and processes in general. Concerning Logistic Management (LM) and TP: in the modern world, the logistics industry has a key role. With the increasing complexity of production, retail and market systems, as well as the tendency of modern markets to switch to online sales (especially in times of pandemic), the complexity of logistic systems and supply chains is growing. With it, the decision-making process (DMP) is becoming more complicated and has to be executed faster and with minimal outlay. Formalizing management decisions in logistics using TP may not only shorten the DMP, but also reduce human mistakes and make it possible to predict the future development of logistic systems. In this section we try to answer the following questions: (1) what are the main managerial tasks and challenges in logistics; (2) how can these tasks and challenges be solved or optimized and what is the connection to using TP; (3) what is necessary for the implementation of TP methods in the logistic DMP.
(1) Successful LM is based on reliable communication, integration and coordination throughout the whole supply chain as well as with customers. In the supply chain, a decision in one element directly affects all other elements, so these properties are extremely relevant for managing the system as a single entity [30]. According to [30, p. 6], the key issues of LM include logistics integration, material handling and order picking, transportation and vehicle routing, inventory and warehouse management, etc. In addition, according to [31], three strategic-level LM decisions can be distinguished: “(1) Make-to-Order or Make-to-Stock; (2) Push or Pull inventory deployment logic; (3) inventory centralization or decentralization”, and we add “(4) Make or Buy” as another strategic LM decision from the organizational point of view.
(2) TP can be used, e.g. to check the reliability of the supply chain. We first look at two methods currently used in logistics management and then at the proposed TP method: (a) In [29], reliability block diagrams, focusing on parallel and series-parallel configurations, are used for the formal analysis of the logistics service supply chain. (b) In addition, formalization can be used to solve transportation and vehicle routing problems. Using and proving the relevant algorithms (see [32], Fulkerson Theorem [33, 34]) helps to make, e.g. a time-optimal decision quickly and with minimal risk. To use TP, these methods have to be formalized in the form of (c) set theory, formal logic, logical and informational equations [19] or even natural language, in order to describe the problem of interest in LM. So, we have to translate (a-b) into a logical form that can be used by TP.
(3) To use TP methods in LM, the translation processes have to be automated in both directions: from the language description of LM into, e.g. methods (a-b) and on to TP (c), and back. The bidirectional process then generates the higher order according to orgiton theory and leads to an open process through human-machine interaction (cf. [35–37]). An application showing how this can be accomplished is given in the recent work of Benzmüller [38].
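To make the series and parallel configurations of method (a) concrete, the following sketch evaluates the reliability of a small supply chain numerically. It is not the HOL formalization of [29]; it uses the standard reliability block diagram formulas for independent elements (a series block is the product of the reliabilities, a parallel block is one minus the product of the failure probabilities), and the chain layout and all numbers are hypothetical.

```python
from math import prod


def series(reliabilities):
    """All elements must work: R = product of R_i (independence assumed)."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one element must work: R = 1 - product of (1 - R_i)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical chain: supplier -> two redundant carriers -> warehouse.
supplier, carrier_a, carrier_b, warehouse = 0.99, 0.90, 0.85, 0.98
chain = series([supplier, parallel([carrier_a, carrier_b]), warehouse])

print(f"chain reliability: {chain:.6f}")
```

A theorem prover would establish such relations symbolically for all admissible inputs, whereas this sketch only evaluates them for one set of numbers.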
5 TP in Production and Management, Summary, Conclusions and Outlook One reason why it is important for future production and management processes to handle multicriteria decisions intelligently is that interconnected systems are very sensitive to corrective interventions: they give rise to intrinsic exponential or growth functions, which are the drivers of economic and efficiency benefits. To decide correctly in such an environment, all participants have to be very careful, as each member can trigger positive or negative exponential effects. This means that managers have to be as intelligent as possible, which will only be possible with appropriate tools, in this case TP, for logically correct and hence, for the company, reasonable multicriteria decisions. In general, this is an important application field of eXplainable AI (XAI) [39].
An important factor for production and management decisions in a company is that environments are becoming increasingly complex. In these complex environments, decisions of higher order or complexity have to be made. Order can be understood here as a dynamic process that is subject to the self-organizational processes of a complex unity or organisation (company, institution, etc.), far from the thermodynamic equilibrium state. Order in this sense can also be explained by back-coupling processes, in which case the number of feedback loops of some kind serves as the measure of the degree of (potential) order. A completely defined “order” additionally needs an ethical decision about what is “good” (compare also Benzmüller in Sect. 4 (4), (5)). The higher order is constituted here by “intelligent” solutions of the staff, which means that each staff member has to be informed about processes and be trained to interpret them correctly. This is usually done by means of education. In the context of a complex environment, a self-guided answer is essential. For this reason, tools for coping with complex decisions, like TPs, simulation tools, etc., have to be used increasingly in successful management and production. In conclusion, TP or LC can and will be used increasingly. Despite more than 100 years of development, we seem to be just at the beginning. Humans will have to use intelligent computer tools to improve their rational communication and their sociality, which potentially also unites humanity and actively fosters peace (cf. [27, p. 80]). For future generations it will be important to use such tools as if they were natively given, in order to be a rational and hence justice-oriented society. Finally, TP allows for reasoning with quantified logic in the same way as, and in conjunction with, reasonable humans.
For some cases this sort of logic has been shown to be so powerful that it would not even be possible, in the time elapsed since the beginning of the universe, to perform the same task with a simpler logic [9]. The unifying feature of LC, being capable of simpler logical forms and open to other forms as well, allows for the universalisation and unification that is necessary for an increasingly connected and interactive world of human and machine, the human-machine-product triangle [35], which is to be managed adequately at the edge of the current time.
References
1. Sheldrake R (2020) Wikipedia. Accessed 09 Feb 2020. https://en.wikipedia.org/wiki/RupertSheldrake
2. Sheldrake R (2009) A new science of life/morphic resonance. https://www.sheldrake.org/books-by-rupert-sheldrake/a-new-science-of-life-morphic-resonance
3. Mehra J (2001) The golden age of theoretical physics. In: Albert Einstein’s ’first’ paper, World Scientific, pp 1–18. https://doi.org/10.1142/97898128105880001
4. Mehra J, Einstein A (1971) Albert Einsteins erste wissenschaftliche Arbeit ’Mein lieber Onkel’ über die Untersuchung des Ätherzustandes im magnetischen Felde. Physik J 27(9):386–391. https://doi.org/10.1002/phbl.19710270901
5. Weber A Faßmann: new university in Upper Austria is a strong signal for science and research (in German). https://www.ots.at/presseaussendung/OTS_20200828_OTS0145/fassmann-neue-universitaet-in-oberoesterreich-ist-starkes-signal-fuer-wissenschaft-undforschung?asbox=box1&asboxpos=1
6. Wikipedia. Massachusetts Institute of Technology. Accessed 09 Feb 2020. https://en.wikipedia.org/wiki/Massachusetts_Institute_of_Technology
7. Scharmer C, Theorie U (2011) Von der Zukunft her führen: Prescencing als soziale Technik. Carl-Auer Verlag
8. Scharmer O, Käufer K (2013) Leading from the emerging future - from ego-system to ecosystem economies - applying theory U to transforming business, society, and self. Berrett-Koehler Publishers Inc
9. Benzmüller C, Miller D (2014) Automation of higher-order logic. In: Computational logic. Handbook of the history of logic, vol 9. North Holland, pp 215–253. https://doi.org/10.1016/B978-0-444-51624-4.50004-6
10. Church A (1940) A formulation of the simple theory of types. J Symbol Logic 5(2):56–68. https://doi.org/10.2307/2266170
11. Ballarin C (2008) Introduction to Isabelle. Accessed 29 Aug 2020. http://www21.in.tum.de/?ballarin/belgrade08-tut/session01/session01.pdf
12. Hasan O, Tahar S, Abbasi N (2010) Formal reliability analysis using theorem proving. IEEE Trans Comput 59(5):579–592. https://doi.org/10.1109/tc.2009.165
13. Kaliszyk C, Chollet F, Szegedy C (2017) HolStep: a machine learning dataset for higher-order logic theorem proving. In: CoRR. arXiv: 1703.00426
14. von Foerster H et al (2004) Cybernetics - Kybernetik, The Macy-Conferences 1946–1953, Volume II - Band II, Essays and Documents - Essays und Dokumente. Ed. by C. Pias. Vol. II, Band II. diaphanes, Zürich, Berlin
15. Zuse H (1999) Geschichte der Programmiersprachen. Forschungsberichte des Fachbereichs Informatik. Berlin: Technische Universität, 69 S
16. Lockermann AW (2010) PROLOG - Tutorium zur Vorlesung “Datenbanken & Wissensrepräsentation”
17. Jan-Dirk (2018) Imperative vs. deklarative Programmierung in IT-Talents, IT-Talents Blog. ITTalents, IT-Talents Blog. https://www.it-talents.de/blog/it-talents/imperative-vs-deklarativeprogrammierung
18. Dangelmaier W, Laroque C, Klaas A (eds) (2013) Simulation in Produktion und Logistik, vol Band 316. Heinz Nixdorf Institut, zgl. Tagungsband 15. ASIM-Fachtagung Simulation in Produktion und Logistik, Paderborn, 09.-11. Oktober 2013 zgl. ASIM-Mitteilung Nr. 147. Verlagsschriftenreihe des Heinz Nixdorf Instituts. ISBN: 978-3-942647-35-9
19. Heiden B, Tonino-Heiden B (2020) Key to artificial intelligence (AI). In: Advances in intelligent systems and computing. Springer, pp 647–656. https://doi.org/10.1007/978-3-030-55190-249
20. Markard J (2020) The life cycle of technological innovation systems. In: Technological forecasting and social change 153. https://doi.org/10.1016/j.techfore.2018.07.045
21. Granig P, Ratheiser V, Gaggl E (2018) Trendantizipierende Geschäftsmodellinnovation. In: Mit Innovationsmanagement zu Industrie 4.0. Springer, pp 97–112. https://doi.org/10.1007/978-3-658-11667-58
22. Hartlieb E et al (2018) Aufbau von Innovationskooperationen im Kontext von Industrie 4.0 und IoT. In: Mit Innovationsmanagement zu Industrie 4.0. Springer, pp 1–13. https://doi.org/10.1007/978-3-658-11667-51
23. Gassmann O, Wecht CH, Winterhalter S (2018) Strategisches Technologiemanagement für die Industrie 4.0. In: Mit Innovationsmanagement zu Industrie 4.0. Springer, pp 15–27. https://doi.org/10.1007/978-3-658-11667-5_2
24. Ivancic R, Huber RA (2018) Normative Unternehmensführung 4.0. In: Mit Innovationsmanagement zu Industrie 4.0. Springer, pp 139–154. https://doi.org/10.1007/978-3-658-116675_11
25. Benzmüller C, Lomfeld B (2020) Lecture: ethical and legal challenges of AI and data science. In: Benzmüller C, Lomfeld B (eds) Practical seminar block course: 14.-25. September, Freie
Universität Berlin. https://mycampus.imp.fu-berlin.de/portal/site/4e732ee4-5f15-4065-8bb1471dafd573e4
26. Lurger B (1999) Grundfragen der Vereinheitlichung des Vertragsrechts in der Europäischen Union. In: Martiny D, Witzleb N (eds) Auf dem Wege zu einem Europäischen Zivilgesetzbuch. Schriftenreihe der Juristischen Fakultät der Europa-Universität Viadrina Frankfurt (Oder). Springer, Berlin, pp 141–167. https://doi.org/10.1007/978-3-642-60141-5_9
27. Tonino-Heiden B (2020) Vertragsinterpretation nach dem Übereinkommen der Vereinten Nationen über Verträge über den Internationalen Warenkauf. Unpublished Work
28. Eccles JC (1989) Evolution of the brain creation of the self. Routledge, London. XV, 282 S. ISBN: 0-415-02600-8
29. Ahmed W, Hasan O, Tahar S (2016) Towards formal reliability analysis of logistics service supply chains using theorem proving. In: Konev B, Schulz S, Simon L (eds), IWIL - 2015, 11th international workshop on the implementation of logics, vol 40. EPiC series in computing. EasyChair, pp 1–14. https://doi.org/10.29007/6l77
30. Louren HR (2005) Logistics management - an opportunity for metaheuristics. In: Rego C, Alidaee B (eds) Metaheuristic optimization via memory and evolution: Tabu search and scatter search. Springer, pp 329–356. https://doi.org/10.1007/0-387-23667-8_15
31. Wanke P (2004) Strategic logistics decision making. Int J Phys Distribut Logist Manage 34(6):466–478
32. Dasgupta S, Papadimitriou C, Vazirani U (2008) Algorithms. The McGraw-Hill Companies, Boston. ISBN: 9870073523408
33. Dantzig G, Fulkerson D (1955) On the max flow min cut theorem of networks. In: RAND corporation (P-826), pp 1–13. http://www.dtic.mil/dtic/tr/fulltext/u2/605014.pdf
34. Heiden B et al (2020) Framing artificial intelligence (AI) additive manufacturing (AM) (unpublished)
35. Heiden B, Alieksieiev V, Tonino-Heiden B (2020) Communication in Human - Machine - Product Triangle - Universal properties of the automation chain (unpublished)
36. Heiden B et al (2019) Orgiton theory (unpublished)
37. Heiden B, Tonino-Heiden B (2020) Philosophical studies - special Orgiton theory/Philosophische Untersuchungen - Spezielle Orgitontheorie (English and German Edition) (unpublished)
38. Benzmüller C, Parent X, van der Torre L (2020) Designing normative theories for ethical and legal reasoning: LogiKEy framework, methodology, and tool support. In: Artificial intelligence 287. https://doi.org/10.1016/j.artint.2020.103348
39. Gunning D (2017) Explainable artificial intelligence (XAI). In: Defense advanced research projects agency (DARPA), nd Web 2
Behavioral Analysis of Wireless Channel Under Small-Scale Fading Mridula Korde, Jagdish Kene, and Minal Ghute
Abstract The demand for wireless communication has grown in recent years due to the increased use of mobile services. The wireless communication channel constitutes the basic physical link between the transmitter and the receiver. It is challenging to model a wireless channel for radio wave propagation over difficult geographical terrain such as hilly areas, sea surfaces, mountains, etc. Channel modeling is one of the most fundamental aspects of the study, optimization and design of transmitters and receivers. Operating-environment factors such as fading, multipath propagation and the type of geographical area limit the performance of a wireless communication system. In this paper, the wireless channel is modeled by randomly time-variant linear systems. The behavioral analysis of the channel model is performed for both the Rayleigh and Rician fading channels in terms of error probability, and it is shown that in small-scale fading, the Rayleigh fading can be preferred over Rician fading due to high SNR. Keywords Small-scale fading · Wireless channel · Rician fading · Rayleigh fading
M. Korde (B) · J. Kene, Shri Ramdeobaba College of Engineering and Management, Nagpur 440013, India
M. Ghute, Yeshwantrao Chavan College of Engineering, Nagpur 441110, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_5
1 Introduction Due to the statistical nature of the mobile radio channel, the performance of wireless communication systems can be affected severely. Unlike in satellite links, a direct line-of-sight path is rarely available. In practical situations, the line-of-sight path is severely obstructed by high buildings, hilly regions, woods, etc. Wired channels are stationary and their behavior is predictable, but wireless channels are random in behavior and become difficult to predict as the mobile terminal moves in space. Cellular systems are more prevalent in urban than in rural areas, and it is in urban areas that the obstructions giving rise to multipath fading are commonly found. Therefore, modeling of the medium, i.e., the channel, becomes absolutely important, especially in areas where the line of sight is obstructed by sea, hills, etc. To obtain the power profile and characteristics of the received signal, the simplest model can be given as follows. Let X be the transmitted signal, Y the received signal, and H the impulse response of the wireless channel. The received signal is then related to the transmitted signal by a convolution with the channel impulse response, which in the frequency domain becomes a product:
$$Y(f) = H(f)\,X(f) + n(f) \tag{1}$$
Here H(f) is the frequency response of the channel (the Fourier transform of its impulse response) and n(f) is the noise in the frequency domain. The major purpose of channel characterization is the design and planning of communication systems. The performance of wireless communication channels is described mainly by three phenomena: path loss, shadowing, and multipath fading. An accurate estimate of the propagation parameters is essential in the design and planning of a wireless communication system. Two types of fading are considered in propagation models: large-scale and small-scale fading.
• Large-scale fading: When the distance between transmitter and receiver is very large (more than thousands of meters), the average signal power is attenuated significantly. Path loss is present when the receiver moves over large areas at moderate speed. The attenuation in signal strength and the path loss can be caused by geographical obstructions between the transmitter and receiver. This steady decrease in power is referred to as large-scale fading [1].
• Small-scale fading: When the receiver moves by a fraction of a wavelength, variations occur in the instantaneous received signal power. These variations can extend up to 30–40 dB. These rapid changes in the amplitude and phase of the received radio signal over a very short duration (a few seconds) or a very short distance (on the order of a wavelength) are referred to as small-scale fading [1].
The channel modeling in this paper is based on physical phenomena such as multipath propagation, type of terrain, and Doppler shift due to the motion of the mobile. The performance of Rayleigh and Rician fading is evaluated for a tapped delay line model of the wireless channel.
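The relation in Eq. (1) between a time-domain convolution and a frequency-domain product can be checked on a short sampled signal. The sketch below is only an illustration under stated assumptions: it uses a discrete Fourier transform and circular convolution as the sampled counterpart of Eq. (1), and the sample values, the two-tap channel and the zero noise vector are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a short sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def circular_convolve(x, h):
    """Circular convolution, the time-domain counterpart of a DFT product."""
    n = len(x)
    return [sum(x[k] * h[(m - k) % n] for k in range(n)) for m in range(n)]


x = [1.0, -1.0, 1.0, 1.0]      # hypothetical transmitted samples
h = [0.8, 0.5, 0.0, 0.0]       # hypothetical two-tap channel impulse response
noise = [0.0, 0.0, 0.0, 0.0]   # noise set to zero to isolate the channel effect

X, H, N = dft(x), dft(h), dft(noise)
Y = [Hm * Xm + Nm for Hm, Xm, Nm in zip(H, X, N)]   # Eq. (1), bin by bin
Y_time = dft(circular_convolve(x, h))               # same result via time domain

print(max(abs(a - b) for a, b in zip(Y, Y_time)))
```

The printed maximum difference is at the level of floating-point rounding, which reflects the convolution theorem for this discrete case.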
The paper is organized as follows. Section 2 describes the factors affecting small-scale fading. Section 3 elaborates the wireless channel model, considering the fading parameters of the Rayleigh and Rician distributions. Section 4 presents simulation results and discussion. Section 5 concludes the paper.
2 Small-Scale Fading Causing Factors Many physical factors influence small-scale fading in radio propagation. Some of them are as follows:
Multipath propagation: Reflecting objects and scatterers in the channel create a constantly changing environment that dissipates the signal energy in amplitude, phase and time. This gives rise to multiple versions of the transmitted signal, which arrive at the receiver displaced from one another in time and space. The random fluctuations in phase and amplitude of the various multipath components give rise to small-scale fading. Multipath propagation also lengthens the time over which the signal arrives at the receiver, which can cause intersymbol interference.
Speed of mobile: Because of the Doppler shift on every multipath component, a random frequency modulation occurs in the radio signal when the mobile station moves relative to the base station. The speeds of a pedestrian mobile station and a fast-moving mobile station are very different, and so are the resulting shifts. The Doppler shift is positive if the mobile receiver moves toward the base station and negative if it moves away from the base station.
Surrounding of mobile: Moving objects in the vicinity of the radio channel induce a time-varying Doppler shift on the multipath components. When the surrounding objects move at a greater rate than the mobile station, this effect dominates the small-scale fading and has to be accounted for in the behavior of the wireless channel. Otherwise, the motion of surrounding objects can be ignored.
Transmission bandwidth of signal: The bandwidth of the signal significantly affects the strength of the received signal over the channel. If the bandwidth of the transmitted radio signal is greater than the bandwidth of the multipath channel, the received signal will be distorted, but its strength will not fade much over a local area. If the transmitted signal bandwidth is narrow compared with that of the multipath channel, the received signal strength fades very fast, although the signal is not distorted in time [2].
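The sign and size of the Doppler shift described under "Speed of mobile" can be illustrated numerically. The sketch below uses the standard relation f_d = (v/λ) cos θ, which is not written out in this paper; θ is taken as the angle between the direction of motion and the direction toward the base station, and the 900 MHz carrier and the two speeds are hypothetical values.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def doppler_shift(speed_mps, carrier_hz, angle_rad):
    """Doppler shift f_d = (v / wavelength) * cos(theta), in Hz.

    theta = 0 means moving straight toward the base station,
    theta = pi means moving straight away from it.
    """
    wavelength = C / carrier_hz
    return speed_mps / wavelength * math.cos(angle_rad)


carrier = 900e6                                           # hypothetical carrier
pedestrian = doppler_shift(1.5, carrier, 0.0)             # walking toward the base station
vehicle_towards = doppler_shift(30.0, carrier, 0.0)       # driving toward it
vehicle_away = doppler_shift(30.0, carrier, math.pi)      # driving away from it

print(f"{pedestrian:.2f} Hz, {vehicle_towards:.2f} Hz, {vehicle_away:.2f} Hz")
```

Under these assumptions the pedestrian sees a shift of a few hertz and the vehicle a shift of roughly 90 Hz, positive when approaching and negative when receding.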
3 Wireless Channel Model The basic model of a wireless channel is a linear filter with a time-varying impulse response. The time variations are due to receiver motion in space. The impulse response of the mobile radio channel determines the small-scale changes in the received radio signal. Let v be the constant velocity with which the receiver moves along the ground. If the distance d between transmitter and receiver is fixed, the channel can be assumed to behave like a linear time-invariant system. The multipath components differ for different spatial positions of the receiver. Therefore, the impulse response of the channel is a function of the position of the receiver. Let x(t) represent the transmitted signal; then the received signal Y(d, t) at position d can be expressed as
$$Y(d, t) = x(t) \otimes h(d, t) = \int_{-\infty}^{\infty} x(\tau)\, h(d, t - \tau)\, d\tau \tag{2}$$
For a causal system, h(d, t) = 0 for t < 0. If v is the velocity, the position of the receiver can be expressed as d = vt. As v is constant, Y(vt, t) is a function of t only, and the above equation can be written as
$$Y(t) = x(t) \otimes h(vt, t) = \int_{-\infty}^{\infty} x(\tau)\, h(vt, t - \tau)\, d\tau \tag{3}$$
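A sampled version of Eq. (3) can be sketched as follows. This is a minimal illustration, not the authors' simulation: the integral is replaced by a sum over samples, the receiver position at sample n stands for d = vt, and the function names and the two-tap channel whose echo weakens as the receiver moves are hypothetical.

```python
def received(x, impulse_response_at):
    """Discrete form of Eq. (3): y[n] = sum_k x[k] * h_n[n - k].

    impulse_response_at(n) returns the impulse response h_n seen at the
    receiver position reached at sample n. Only k <= n contributes,
    which corresponds to a causal channel.
    """
    y = []
    for n in range(len(x)):
        h_n = impulse_response_at(n)
        y.append(sum(x[k] * h_n[n - k]
                     for k in range(n + 1) if n - k < len(h_n)))
    return y


def two_tap(n):
    # Hypothetical channel: fixed direct tap, echo tap that decays with position.
    return [1.0, 0.5 / (1 + n)]


x = [1.0, 0.0, 0.0, 1.0]   # hypothetical transmitted samples
y = received(x, two_tap)
print(y)
```

If `impulse_response_at` returns the same taps for every n, the function reduces to an ordinary (time-invariant) convolution, which matches the fixed-distance case described above.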
In a slow-fading channel, the transmitted baseband signal changes at a faster rate than the channel impulse response. The channel can then be considered static over one or several reciprocal bandwidth intervals. In the frequency domain, this means that the Doppler spread of the channel is much smaller than the bandwidth of the baseband signal. Noise is an important factor in the degradation of channel performance [1]. Large-scale fading depends largely on the distance between the mobile unit and the transmitter and on the obstructions along the signal path. The effect of the multiple reflective paths in small-scale fading is well described by a Rayleigh or a Rician probability density function (PDF). Hence small-scale fading is also called Rayleigh or Rician fading [3]. When a dominant direct path such as a line-of-sight (LOS) path is absent between transmitter and receiver, the fading can be considered Rayleigh fading. The Rayleigh distribution describes the statistical time-varying nature of the received envelope of a flat-fading signal. The Rayleigh distribution has a probability density function given by
$$p(r) = \frac{r}{\sigma^2} \exp\left(-\frac{r^2}{2\sigma^2}\right), \quad 0 \le r \le \infty \tag{4}$$
where σ is the rms value of the received voltage signal before envelope detection, σ² is the time-average power of the received signal before envelope detection, and r is the received signal voltage level. The probability that the envelope of the received signal does not exceed a specified value R is given by the corresponding cumulative distribution function
$$P(R) = \Pr(r \le R) = \int_{0}^{R} p(r)\, dr = 1 - \exp\left(-\frac{R^2}{2\sigma^2}\right) \tag{5}$$
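Eqs. (4) and (5) can be checked against simulated data. The sketch below is an illustration under stated assumptions: the Rayleigh envelope is generated as the magnitude of two independent zero-mean Gaussian components with standard deviation σ (the usual construction, not spelled out in the text), and the seed, sample count and threshold R are arbitrary choices.

```python
import math
import random


def rayleigh_pdf(r, sigma):
    """Eq. (4): p(r) = r / sigma^2 * exp(-r^2 / (2 sigma^2)) for r >= 0."""
    if r < 0:
        return 0.0
    return r / sigma**2 * math.exp(-r**2 / (2 * sigma**2))


def rayleigh_cdf(R, sigma):
    """Eq. (5): P(R) = 1 - exp(-R^2 / (2 sigma^2))."""
    return 1.0 - math.exp(-R**2 / (2 * sigma**2))


def rayleigh_sample(sigma, rng):
    # Envelope of two independent Gaussian quadrature components.
    return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))


rng = random.Random(0)       # fixed seed for reproducibility
sigma = 1.0
samples = [rayleigh_sample(sigma, rng) for _ in range(20000)]

R = 1.0                      # arbitrary threshold
empirical = sum(s <= R for s in samples) / len(samples)
print(f"empirical {empirical:.3f} vs Eq. (5) {rayleigh_cdf(R, sigma):.3f}")
```

The fraction of simulated envelope values below R should agree with Eq. (5) up to sampling error.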
In the presence of a line-of-sight propagation path, Rician fading can be considered. The Rician distribution is given by
$$p(r) = \frac{r}{\sigma^2} \exp\left(-\frac{r^2 + A^2}{2\sigma^2}\right) I_0\left(\frac{A r}{\sigma^2}\right), \quad A \ge 0,\ r \ge 0 \tag{6}$$
where A is the peak amplitude of the dominant signal and I_0 is the modified Bessel function of the first kind and zero order. The parameter K is used to describe the Rician distribution. K is defined as the ratio between the deterministic signal power and the variance of the multipath:
$$K\,(\mathrm{dB}) = 10 \log_{10} \frac{A^2}{2\sigma^2}\ \mathrm{dB} \tag{7}$$
In Rayleigh fading there are many reflective paths and no dominant line-of-sight (LOS) propagation path. In Rician fading, there is also a dominant LOS path. As K → ∞, the dominant path carries all of the power, the fading disappears and the channel approaches the Gaussian (AWGN) channel, which is the best case. With K = 0 there is no dominant path, the Rician distribution reduces to the Rayleigh distribution, and the channel is the worst-case fading channel [4].
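The K factor of Eq. (7) and its effect on the envelope can be illustrated with a short simulation. The sketch below makes several assumptions: the Rician envelope is generated as a constant LOS amplitude A added to one of two independent Gaussian quadrature components with standard deviation σ, the mean-square value is compared with the standard identity E[r²] = A² + 2σ² = 2σ²(K + 1), and the values of A, σ, the seed and the sample count are arbitrary.

```python
import math
import random


def k_factor_db(A, sigma):
    """Eq. (7): K in dB. Requires A > 0 (A = 0 is the Rayleigh limit)."""
    return 10.0 * math.log10(A**2 / (2.0 * sigma**2))


def rician_sample(A, sigma, rng):
    # LOS component of amplitude A plus scattered Gaussian components.
    in_phase = A + rng.gauss(0.0, sigma)
    quadrature = rng.gauss(0.0, sigma)
    return math.hypot(in_phase, quadrature)


rng = random.Random(1)       # fixed seed for reproducibility
A, sigma = 2.0, 1.0          # hypothetical LOS amplitude and scatter level
K = A**2 / (2 * sigma**2)    # linear K factor

samples = [rician_sample(A, sigma, rng) for _ in range(20000)]
mean_square = sum(r * r for r in samples) / len(samples)

print(f"K = {k_factor_db(A, sigma):.2f} dB")
print(f"mean square {mean_square:.3f} vs 2*sigma^2*(K+1) = {2 * sigma**2 * (K + 1):.3f}")
```

Setting A close to zero in `rician_sample` reproduces Rayleigh-like samples, while a large A relative to σ gives an envelope that barely fluctuates around A.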
4 Results and Discussions
Small-scale fading is the primary factor affecting the performance of a wireless channel. For the wireless channel model described above, Rician fading is considered as the small-scale fading factor. Figure 1 shows the cumulative distribution function (CDF) of the Rician fading channel on a logarithmic probability scale, obtained using Eq. 7. The quantity 2σ²(K + 1) is the mean squared value of the Rician distribution, and σ² is the variance of the component Gaussian noise processes in (1) [5, 6]. Figure 2 shows the approximated and analytical probability density functions (PDFs) of the Rayleigh distribution, obtained using Eq. 4 for the wireless channel model under consideration.
M. Korde et al.
Behavioral Analysis of Wireless Channel Under Small-Scale Fading
Fig. 1 Cumulative distribution function of the Rician distribution (log CDF versus amplitude/RMS in dB)
Fig. 2 The approximated and analytical Rayleigh PDF for σ² = 0.5 (f(r) versus r)
The mean value r_mean of the Rayleigh distribution is given by
$$r_{\mathrm{mean}} = E[r] = \int_0^{\infty} r\,p(r)\,dr = \sigma\sqrt{\frac{\pi}{2}} = 1.2533\sigma \tag{8}$$
$\sigma_r^2$ is the variance of the Rayleigh distribution. It represents the ac power in the signal envelope:
$$\sigma_r^2 = E[r^2] - E^2[r] = \int_0^{\infty} r^2 p(r)\,dr - \frac{\sigma^2 \pi}{2} = \sigma^2\left(2 - \frac{\pi}{2}\right) = 0.4292\sigma^2 \tag{9}$$
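The constants in Eqs. (8) and (9) can be checked numerically. The sketch below integrates the Rayleigh PDF with a simple trapezoidal rule and also evaluates the median from the Rayleigh CDF, 1 − exp(−r²/2σ²) = 1/2. The helper names, the integration limit of 12σ, and the step count are choices made here for illustration, not part of the paper.

```python
import math


def rayleigh_pdf(r, sigma):
    """Rayleigh envelope PDF with parameter sigma."""
    s2 = sigma * sigma
    return (r / s2) * math.exp(-(r * r) / (2.0 * s2)) if r >= 0 else 0.0


def integrate(f, a, b, steps=20000):
    """Plain trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def rayleigh_statistics(sigma=1.0):
    """Return mean, variance, median and the mean/median gap in dB."""
    upper = 12.0 * sigma  # the tail beyond 12*sigma is negligible
    mean = integrate(lambda r: r * rayleigh_pdf(r, sigma), 0.0, upper)
    second_moment = integrate(lambda r: r * r * rayleigh_pdf(r, sigma), 0.0, upper)
    variance = second_moment - mean ** 2
    median = sigma * math.sqrt(2.0 * math.log(2.0))  # solves CDF(r) = 1/2
    gap_db = 20.0 * math.log10(mean / median)
    return mean, variance, median, gap_db


if __name__ == "__main__":
    mean, variance, median, gap_db = rayleigh_statistics(1.0)
    print("mean / sigma      :", mean)
    print("variance / sigma^2:", variance)
    print("median / sigma    :", median)
    print("mean vs median, dB:", gap_db)
```

For σ = 1 the printed values should land close to the constants in Eqs. (8) and (9), and the mean and median should differ by roughly half a decibel.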
The median value of r is 1.177σ. The mean and median differ by only 0.55 dB in a Rayleigh fading signal. Assuming that there is no intersymbol interference (flat fading), only small-scale fading needs to be considered for simulation. The fading level can also be assumed to remain approximately constant over one signalling interval. In the AWGN channel model, the received amplitudes are distributed differently from Rayleigh or Rician random variables. This significantly affects the amplitude as well as the power spectrum of the received signal. The fading behaviour can be modelled by a Rician or a Rayleigh distribution. In general, for a tapped delay line (TDL) model, two types of fading, Rayleigh and Rician, were considered for different numbers of taps. From the BER versus SNR plots it was seen that if the channel is modelled with Rayleigh fading, the SNR performance gives a better result. Various parameters were taken into consideration, e.g., the number of taps, the Doppler spectrum of each tap, the Rician factor K, and the power distribution of each tap. The signal-to-noise ratio increases significantly in the case of the Rayleigh distribution (Figs. 3 and 4).
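To give a feel for BER versus SNR curves of this kind, the sketch below evaluates two standard closed-form expressions: coherent BPSK over an AWGN channel (the K = ∞ limit of Rician fading) and coherent BPSK over flat Rayleigh fading (the K = 0 limit). The modulation is an assumption made here, since the paper does not state which scheme its MATLAB simulation used, and the function names are hypothetical. The SNR is taken as the average SNR per bit.

```python
import math


def db_to_linear(snr_db):
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def ber_bpsk_awgn(snr_db):
    """BPSK over AWGN: Pb = 0.5 * erfc(sqrt(gamma))."""
    return 0.5 * math.erfc(math.sqrt(db_to_linear(snr_db)))


def ber_bpsk_rayleigh(snr_db):
    """BPSK over flat Rayleigh fading: Pb = 0.5 * (1 - sqrt(gamma / (1 + gamma)))."""
    gamma = db_to_linear(snr_db)
    return 0.5 * (1.0 - math.sqrt(gamma / (1.0 + gamma)))


def snr_for_target(ber_fn, target_ber, low_db=0.0, high_db=60.0):
    """Bisection for the SNR (dB) at which a decreasing BER curve reaches target_ber."""
    for _ in range(60):
        mid = 0.5 * (low_db + high_db)
        if ber_fn(mid) > target_ber:
            low_db = mid
        else:
            high_db = mid
    return high_db


if __name__ == "__main__":
    for snr_db in (0, 10, 20, 30, 40):
        print(snr_db, "dB  AWGN:", ber_bpsk_awgn(snr_db),
              " Rayleigh:", ber_bpsk_rayleigh(snr_db))
    print("SNR for BER 1e-5, AWGN    :", snr_for_target(ber_bpsk_awgn, 1e-5))
    print("SNR for BER 1e-5, Rayleigh:", snr_for_target(ber_bpsk_rayleigh, 1e-5))
```

Under this BPSK assumption, the two limiting cases reach a BER of 10⁻⁵ at very different SNR values, which is in line with the different SNR axis ranges of Figs. 3 and 4.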
Fig. 3 BER analysis of Rician fading channel (BER versus SNR in dB; curve: Rician theory)
Fig. 4 BER analysis of Rayleigh fading channel (BER versus SNR in dB; curve: Rayleigh theory)
5 Conclusion
Small-scale fading impacts the time delay and the dynamic fading range of signal levels within a small local area at a receiver antenna. Multipath fading and the motion of the mobile degrade the performance of a wireless system. There are essentially three phenomena responsible for multipath propagation, viz., reflection, diffraction, and scattering. These phenomena are caused by interfering objects in the environment such as buildings, walls, sharp edges or corners, and small objects like lamp posts. In such cases, a deterministic description does not give sufficient information about the radio channel, and statistical methods are more reliable for characterising the behaviour of the channel. Rayleigh fading gives a reliable approximation in a large number of practical scenarios, but it becomes invalid in many others. Fewer deep fades are observed in the Rician model, which also contains a comparatively stronger line-of-sight (LOS) component. In this paper, the Rayleigh and Rician channels have been compared on the basis of bit error rate (BER). MATLAB simulations are used for the comparative performance analysis of the Rayleigh and Rician fading channel models in terms of BER. On comparing the two channel models, the Rayleigh model is observed to be the more accurate model for developing a multipath fading channel model.
References
1. Garg VK (2007) Radio propagation and propagation path-loss models. Wireless Commun Netw
2. Kostov N (2003) Mobile radio channeling in matlab. Radio Eng, vol 2
3. Rappaport. Wireless communication, 2nd edn, pp 105–212
4. Goldsmith A Wireless communications. Stanford University
5. Yoo DS Channel characterization and system design in wireless communication. Communication and Signal Processing Laboratory, University of Michigan
6. Sklar B (1993) Rayleigh fading channels in mobile digital communication systems Part 1: characterization. IEEE Communication Magazine
Towards a Framework to Address Enterprise Resource Planning (ERP) Challenges
Stephen Kwame Senaya, John Andrew van der Poll, and Marthie Schoeman

Abstract This paper considers some prominent Information System (IS) models and their emphases concerning correct IS development. An Enterprise Resource Planning (ERP) system is noted to be a complex IS, and on the strength of the emphases observed in IS models and ERP challenges elicited in earlier work, a comprehensive synthesis of the said emphases and challenges is presented. Following this synthesis, a framework to address the said ERP challenges is developed. Key to the framework are aspects of User Experience (UX) and a Formal Methods (FMs) component aimed at addressing some Software Development Life Cycle (SDLC) issues like complexity, hidden information, and traceability often present in a modern ERP. The framework is considered a useful aid for the analysis, design, and development of complex ERPs in the world of ISs.
Keywords Business processes · Enterprise Resource Planning (ERP) · ERP framework · Formal Methods (FMs) · Information System (IS) models · SDLC · UX
S. K. Senaya (B) · M. Schoeman: School of Computing, University of South Africa, Johannesburg, South Africa
J. A. van der Poll: Graduate School of Business Leadership (SBL), University of South Africa, Midrand, South Africa
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_6

1 Introduction
Enterprise resource planning (ERP) systems integrate data from all functional departments of an organisation into a single unit [1] and facilitate data and information sharing across the various departments and business units [2]. Many organisations utilise ERP systems to optimise business processes for creating a strategic and competitive advantage [3, 4]. ERP failures in business have become prevalent over the years, and the causes of these system failures may often be attributed to errors in the software development life cycle, especially at the requirements and specification stages. These may in turn be attributed to the lack of a framework or model for ERP systems development [5, 6]. Numerous scholars have suggested models to address such failure, yet succeeded only partially [7]. An ERP may be viewed as a large Information System (IS); consequently, in this paper we investigate well-known IS models and elicit common emphases and challenges of ERPs. These emphases and challenges are subsequently synthesised to develop and propose a framework for correct ERPs. The use of formal methods (FMs) is suggested as an integral and critical part of correct ERP development in the proposed framework.
The layout of the paper is as follows: following the introduction, we present our research questions in Sect. 2, our research methodology in Sect. 3, and a review of selected ERP models in Sect. 4. Common emphases and challenges in ERP frameworks are synthesised in Sect. 5, together with a brief discussion on ERP challenges from previous work by the researchers. Owing to the potential of formalisms in software development, an ERP framework with FMs as central to critical parts is presented in Sect. 6. Reasons for incorporating FMs in the proposed ERP framework are discussed in Sect. 7. Conclusions and future work are presented in Sect. 8.
2 Research Questions (RQs)
The RQs addressed in this paper are as follows:
1. What frameworks for evaluating an ERP as an IS are available?
2. What are the challenges of ERP frameworks in the IS milieu?
3. How may the incorporation of FMs into an IS or ERP framework assist with the development of correct ERPs?
3 Methodology
We conducted a comprehensive literature review on prominent frameworks in the IS/ERP space to identify emphases and further synthesise challenges, identified by the researchers from previous work. The literature on IS/ERP frameworks collected for this study was selected based on the relevance and impact in the ERP arena. A total of eight (8) well-known frameworks from the 1980s to 2019 were considered and examined to determine those that are most often used. The emphases elicited are presented next.
Emphases
The common emphases presented by the eight (8) IS frameworks are
• Data/Information Quality
• Systems Quality
• Service Quality
• Systems Reliability
• Usability
• Organisation Benefit
Challenges
We likewise investigated nine (9) common challenges around these frameworks, and found these to be [11]
• Complexity
• Human Resources
• Project Management
• Business Processes
• SDLC Alignment
• Traceability
• Hidden Information
• Partial ERP Integration
• Little or No Reliability
Subsequently, four (4) IS/ERP frameworks with the highest rate of compliance with the common aspects were selected for further analysis. Our review focused on aspects that are mostly ignored by these IS/ERP frameworks, since ERPs are usually very large systems covering many diverse aspects (Fig. 3). The nine challenges identified for the ERP frameworks above prevent organisations from reaping the full benefit of ERPs, hence the need for a framework for correct ERPs to mitigate these challenges. Following an inductive research approach, we developed a framework to address the ERP challenges. The use of formality in the framework was identified as a means to address critical areas in ERP development.
4 Review of IS/ERP Frameworks
According to Shannon and Weaver's [8] model of communication, the correctness and success of a system depend on technical, semantic, and effectiveness (or influence) factors. The technical factors deal with the accuracy of the message that has been transmitted, the semantic factors consider the precision of the message in conveying the desired semantics, and the influence factors reflect the effect of the message transmitted. Mason [9] adapted the model in [8] to include a behavioural dimension in developing information theory, emphasising how changes in user behaviour could impact the success of an IS. DeLone and McLean [10] further adapted the models in [8, 9] to produce the DeLone and McLean (D&M) IS Success Model with six distinctive aspects/dimensions to measure IS success, as depicted in Fig. 1.
Fig. 1 D&M IS success model [10]
Aspects of quality with respect to System Quality and Information Quality (indicated in Sect. 3) are critical parts of their framework, as indicated in Fig. 1. Also, Use and User Satisfaction are dependent variables of System Quality and Information Quality, from which Individual Impact and Organisational Impact are derived. Over time the DeLone and McLean model for IS success became an important tool for measuring and justifying dependent variables in IS research. DeLone and McLean [10] further claimed the following:
• An IS scholar has a broad list of individual dependent variables to choose from, and the number of dependent variables should be reduced for effectively comparing results in IS research.
  – This finding recognises the complexity of later ERP systems.
• There is too little research effort towards the measurement of IS impact on organisational performance in IS research.
  – This finding agrees with later work [11] that indicated Project Management, Business Processes, and Human Resources as critical components of an ERP.
• The multidimensional nature of IS factors should be treated as such.
  – Again, cognizance is given to the complexity and multi-dimensionality of later ERP systems.
Subsequently, DeLone and McLean [12] enhanced their 1992 model in 2003 by making changes to Quality and Service to expand the scope of dimensions in the new model. The Quality dimension was expanded to embody three major aspects instead of two as in the 1992 model. They also expanded the Use component to Intention to Use and Use, replaced Individual Impact and Organisational Impact with Net Benefits, and incorporated feedback loops to Intention to Use and User Satisfaction. Their updated model is given in Fig. 2.
Fig. 2 Updated D&M IS success model [13]
Fig. 3 ERP system components linking functions [14]
A modern ERP as an IS comprises Human Resources (HR); Inventory; Sales and Marketing; Procurement; Finance and Accounting; Customer Relationship Management (CRM); Engineering/Production; and Supply Chain Management (SCM) [14, 15]. These are integrated through a common database that connects all units and allows for information sharing and decision-making in the organisation, as per Fig. 3. Both back- and front-office processes are linked directly to the central database, allowing for synchronisation of data from all functional units. The Suppliers and Customers are (indirectly) linked to the same database through the Manufacturing Application and the Sales and Distribution/Service Applications modules, respectively, offering end-to-end capabilities to the system. Owing to this integration, an error in any component of the system could affect the central database.
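The effect of sharing one database can be sketched in a few lines of Python. This is a toy illustration only: the class `CentralDatabase` and the functions `inventory_receive` and `sales_can_fulfil` are hypothetical stand-ins for ERP modules and are not taken from any ERP product or from the frameworks reviewed here.

```python
class CentralDatabase:
    """Hypothetical shared store standing in for the common ERP database."""

    def __init__(self):
        self._records = {}

    def write(self, module, key, value):
        # Every module writes into the same record space.
        self._records[key] = {"value": value, "written_by": module}

    def read(self, key, default=None):
        record = self._records.get(key)
        return default if record is None else record["value"]

    def written_by(self, key):
        return self._records[key]["written_by"]


def inventory_receive(db, item, quantity):
    """Inventory module (hypothetical): adjust the stock level of an item."""
    key = f"stock:{item}"
    db.write("Inventory", key, db.read(key, 0) + quantity)


def sales_can_fulfil(db, item, quantity):
    """Sales module (hypothetical): decide on an order using Inventory's data."""
    return db.read(f"stock:{item}", 0) >= quantity


db = CentralDatabase()
inventory_receive(db, "widget", 10)
ok_before = sales_can_fulfil(db, "widget", 4)   # Sales sees Inventory's update

inventory_receive(db, "widget", -25)            # faulty update with the wrong sign
ok_after = sales_can_fulfil(db, "widget", 4)    # the fault now affects Sales too
source_of_record = db.written_by("stock:widget")

if __name__ == "__main__":
    print("Order accepted before fault:", ok_before)
    print("Order accepted after fault :", ok_after)
    print("Record last written by     :", source_of_record)
```

The Sales function contains no defect of its own, yet its decision changes once the Inventory function writes a faulty value, which is the kind of propagation through the central database described above.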
Fig. 4 E-Learning success model (Source Holsapple and Lee-Post [18])
Returning to the conceptual development of an IS, DeLone and McLean [16] used their 2003 model to assess the success of e-commerce systems with particular focus on buyers and sellers as key stakeholders (users). Such an e-commerce system would fit as a Financial ERP Application in Fig. 3. The DeLone and McLean models have been consistently enhanced to fit the requirements of several ISs or ERPs, e.g. [17].
Holsapple and Lee-Post [18] also modified the DeLone and McLean 2003 model to develop an E-Learning Success Model, aimed at measuring the success of e-learning courses from an IS perspective. Their model incorporated three software system development phases, namely System design, System delivery, and System outcome (Fig. 4). They concluded that the E-Learning Success Model could be employed to measure ISs or ERPs specific to the online learning environment.
Yu and Qian [19] researched the success of an electronic health records (EHR) system (which could be an ERP Service application in Fig. 3) through a theoretical model and survey. The relational variables examined in their research are training to self-efficacy; self-efficacy to use; system quality; information quality; service quality; and use to user satisfaction, as well as use and user satisfaction to net benefits of the EHR system. They concluded that the EHR system's success model and measurement scale are valuable for measuring the use and administration of health ISs. Their EHR systems success model is depicted in Fig. 5.
Fig. 5 EHR systems success model [19]
Tilahun and Fritz [20] applied the D&M 2003 IS model to measure the success of an Electronic Medical Record (EMR) system in low-resource areas (an EMR would typically be a service application in Fig. 3). Following a quantitative research choice (Saunders et al.'s Research Onion [21]), they determined that Service quality is the strongest determinant for Use and User satisfaction, while User satisfaction emerged as the most important factor in ensuring the perceived net benefit of an EMR. They recommended that, to improve Service quality, there should be continuous basic computer literacy training for users (i.e., health professionals). Their construct, given in Fig. 6, was based on the updated D&M IS Success Model. Tilahun and Fritz [20] validated numerous interrelationships, viz., the relationships among various quality dimensions and computer literacy, and UX (user experience), leading to the perceived benefit for a user or organisation (Fig. 6). Naturally, net benefit is vital to a company wishing to achieve a competitive advantage through its IS success.
Fig. 6 EMR constructs by Tilahun and Fritz [20]
Mustafa et al. [3] also investigated organisational critical success factors (CSFs) and their effect on ERP implementation from a user's point of view, culminating in the framework in Fig. 7. They concluded that most ERP researchers focus on top managers when evaluating ERP CSFs. The views of the main implementers or users of the system are usually excluded by IS researchers. Focusing on top managers and excluding implementers or users could adversely affect the resultant system. Mustafa et al.'s [3] CSFs address more detailed aspects, e.g., Project management, Business process reengineering, etc. This agrees with the findings of Senaya et al. [11].
Fig. 7 Critical success factors (CSFs) model [3]
The above literature review provides an answer to our RQ1. Next, we turn our attention to specific ERP emphases in the frameworks and associated challenges.
5 Evaluating Emphases and Challenges in the ERP Frameworks
Four (4) prominent IS/ERP frameworks evaluated by the researchers from previous work are
• E-Learning Success Model [18]
• Electronic Health Record (EHR) Systems Success Model [19]
• Electronic Medical Record (EMR) Constructs [20]
• Critical success factors (CSFs) model [3]
The above four (4) frameworks were selected on the strength of the emphases they share with the lists of emphases for evaluating ERP success in the literature reviewed by the researchers. The frameworks' features were likewise considered to determine the extent to which their possible weaknesses threaten the success of such ERPs. Senaya et al. [11] likewise identified some challenges as causes of ERP failure, discussed below.
5.1 Linking ERPs with IS Success Models
Business Processes (BPs): One of the major reasons for ERP systems failure stems from a system's inability to correctly align with the business processes of the organisation. Management of organisations' business processes depends on how these organisations achieve efficiency in what they do through integrating ERP systems to optimise activities, while aligning business processes with the organisations' strategies and goals [22]. Yaseen [23], Friedrich et al. [24], and Zerbino et al. [25] all attribute ERP failure to incorrect business processes. The BP misalignment coincides with the "Business process reengineering" CSF in [3] (Fig. 7).
Project Management (PM): Inadequate (software) project management practices have also been cited as causes of ERP failure [11, 26, 27], hence the implicit recognition of adhering to best PM practices in some of the IS models above. While challenges surrounding ERPs have increased progressively during the past decade in different sectors worldwide, there are no appropriate frameworks that deal with project management issues [28]. Consequently, any framework for addressing ERP failure should pay attention to PM, specifically software project management (SPM) aspects.
Complexity: System complexity remains a challenge in large ISs and, therefore, in ERPs also. The above IS models all aim to address complexity, often via quality considerations. Selecting an appropriate ERP for an organisation remains a complex undertaking [27].
Human Resources (HR): All components of an ERP system may be deemed equally important, (traditional) HR included. In our work, however, we view the HR challenge as the lack of skill sets among the employees in the company, specifically the software developers. In this regard, part of ERP failure could be attributed to a shortage of individuals with the necessary software development skills, specifically in the use of Formal Methods (FMs) to facilitate correct software development. The application of FMs allows for the identification and subsequent discharging of proof obligations arising from formal specifications. There remains, however, a lack of skilled personnel to take up this challenge [29], equally so in the development of correct commercial ICTs, which usually embed large ERPs [30].
SDLC Non-alignment: Incorrect or challenged software development practices are arguably the most prominent reason for ERP failure. Vast amounts of literature and practitioner case studies have been devoted to this aspect. Suffice it to note that the complexity (see the Complexity item above) of modern ISs and their underlying ERPs is at the heart of SDLC misalignments [31].
Hidden Information: ERP failure may also arise as a result of hidden information, compromising the reliability of the system [14, 32]. Hidden information is related to traceability, discussed below.
Traceability: Challenges around traceability strongly correlate with aspects around SDLC processes and hidden information. Incorrect linking of IS components may lead to challenges in the ERP modules [33].
Partial ERP Module Integration: Incorrect integration of legacy systems with a new ERP, or running mixed standalone systems with a partial ERP, may lead to incompatibilities. Many of the CSFs in Fig. 7, e.g., technological infrastructure, link with challenges around partial ERP integration.
Reliability: Reliability is classified as a non-functional requirement, and owing to the above challenges an ERP may exhibit little or no reliability. Reliability may, therefore, be viewed as an overarching requirement, as well as a consequence of any or all of the above challenges.
5.2 Summary of ERP Challenges
Table 1 summarises the ERP challenges encountered in this paper regarding the emphasis placed on the foregoing IS models and previous research work by the researchers. The extent to which each aspect is recognised is tallied. Such information is utilised in the construction of a high-level framework to address ERP failure.
Table 1 Summary of ERP emphases and challenges recognised

ERP frameworks              [18]  [19]  [20]  [3]   Survey [11]  Total
Emphases
  Usability                 X     X     X     X     –            4
  Information quality       X     X     X     –     –            3
  User satisfaction         X     X     X     –     –            3
  System reliability        X     X     X     X     –            4
  Sub total                 4     4     4     2     0            14
Challenges
  Complexity                –     X     X     X     X            4
  Human resources           X     X     X     X     X            5
  Project management        X     X     X     –     X            4
  Business processes        X     X     X     –     X            4
  SDLC alignment            X     X     X     X     X            5
  Traceability              X     X     X     X     X            5
  Hidden information        X     X     X     X     X            5
  Partial ERP integration   X     X     X     X     X            5
  Little or no reliability  X     X     X     X     X            5
  Sub total                 8     9     9     7     9            42
Grand totals                12    13    13    9     9            56

Source: Synthesised by researchers
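The tallies in Table 1 can be reproduced mechanically. The short Python sketch below encodes each row as a string of marks in the column order [18], [19], [20], [3], Survey [11] and recomputes the row totals, sub totals, and grand totals. The encoding and the function names are choices made here for illustration only.

```python
COLUMNS = ["[18]", "[19]", "[20]", "[3]", "Survey [11]"]

# "X" = aspect recognised, "-" = not recognised, in the column order above.
EMPHASES = {
    "Usability": "XXXX-",
    "Information quality": "XXX--",
    "User satisfaction": "XXX--",
    "System reliability": "XXXX-",
}

CHALLENGES = {
    "Complexity": "-XXXX",
    "Human resources": "XXXXX",
    "Project management": "XXX-X",
    "Business processes": "XXX-X",
    "SDLC alignment": "XXXXX",
    "Traceability": "XXXXX",
    "Hidden information": "XXXXX",
    "Partial ERP integration": "XXXXX",
    "Little or no reliability": "XXXXX",
}


def row_totals(rows):
    """Number of sources recognising each aspect (the Total column)."""
    return {aspect: marks.count("X") for aspect, marks in rows.items()}


def column_totals(rows):
    """Number of aspects recognised by each source (a Sub total row)."""
    return [sum(marks[i] == "X" for marks in rows.values())
            for i in range(len(COLUMNS))]


emphasis_subtotals = column_totals(EMPHASES)
challenge_subtotals = column_totals(CHALLENGES)
grand_totals = [e + c for e, c in zip(emphasis_subtotals, challenge_subtotals)]

if __name__ == "__main__":
    print("Emphases sub totals  :", dict(zip(COLUMNS, emphasis_subtotals)))
    print("Challenges sub totals:", dict(zip(COLUMNS, challenge_subtotals)))
    print("Grand totals         :", dict(zip(COLUMNS, grand_totals)))
    print("Overall total        :", sum(grand_totals))
```

Keeping the marks as data makes it easy to re-tally if a further framework or aspect is added to the comparison.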
5.3 Discussion and Analysis
From Table 1 we notice that Usability, Information Quality, User Satisfaction, and System Reliability are all important regarding the emphasis placed on them in the IS frameworks considered. Also, Usability and User Satisfaction may be combined into UX (User Experience), in line with HCI classifications. While the classification in Senaya et al. [11] did not consider the Table 1 emphases per se, it did score high (a value of 9), together with two other frameworks, with respect to the ERP challenges identified. The CSF framework of Mustafa et al. [3] has an overall lower challenge score of 7; it omits Business Processes and Project Management, but both of these are covered by the other IS frameworks, as well as by the survey. The above discussions provide an answer to our RQ2.
6 The Proposed Framework
From Table 1, resulting from the analyses of the IS models and the Senaya et al. [11] survey, we synthesise the high-level framework in Fig. 8, aimed at addressing the ERP challenges and emphases identified before. In line with the IS models in Sect. 4, our framework embodies four grouped dimensions as indicated by the dotted-line blocks:
ERP Challenges: Eight challenge areas as identified in Table 1 have been embedded in the framework, with the 9th challenge (ensuring reliability) being an overarching, non-functional requirement (see next discussion).
Success Factors: These are aspects aimed at addressing the challenges. They are training, the use of formality in SDLC processes to address complexity (formal specifications), eliminating hidden information (complete ERP integration), and enhanced traceability as elicited in Table 1. Training was identified as an important success factor relating to HR (e.g., training developers in the use of formality to acquire a desired set of software development skills). A success factor for reducing partial ERPs is integration to (amongst other things) improve the UX for ERP users.
Outcomes: It is anticipated that the results of applying the framework would lead to higher data/information and system quality with improved UX, and system reliability leading to organisational benefit.
Competitive Advantage: It is hoped that the application of the conceptual framework in Fig. 8 would create a strategic advantage for organisations through an improved ERP environment.
Fig. 8 Framework to facilitate ERP development (Source Synthesised by researchers)
Following the development of the Fig. 8 framework and accompanying discussion, we arrive at an answer to RQ3.
7 Reasons for Incorporating Formal Methods (FMs) in the Proposed ERP Framework
Formal methods (FMs) employ mathematical notations and logical reasoning to define properties of an IS/ERP system correctly and to avoid undue constraining of these properties [34]. Owing to the various challenges of ERPs elicited before, the use of FMs may be central in constructing such systems and resolving their issues, for instance, in describing the properties of the system at both lower and higher software levels and in integrating the various ERP modules. Hence, the researchers suggest the incorporation of FMs as a central part of the proposed framework (Fig. 8). With the FMs style of developing systems, the specifier begins by creating a formal specification of the system, defining the desired properties before developing the resultant system [35]. A formal specification also offers a dependable approach for investigating system processes, satisfying requirements and testing results, as well as writing an instructional guide for software systems [34]. Also, a formal concept is the notion of what is understood at a human level to give a clear interpretation of the system at the specification phase [36]. It is worth noting the value of FMs in producing correct systems such as ERPs [37]. However, the adoption of FM techniques in practice amongst software developers is not encouraging [38]. Though FMs usage requires a rigorous effort in mastering the underlying discrete mathematics and logical principles, the researchers postulate it to be no harder than mastering any modern programming language.
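The specify-first idea can be hinted at with a small Python sketch, in which the desired properties of an operation (a state invariant, a precondition, and a postcondition) are written down as predicates before the operation itself. This is only an executable illustration under assumptions made here: the inventory example and all names are hypothetical, and runtime assertions are a much weaker substitute for a formal notation such as Z and the discharging of proof obligations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inventory:
    """Hypothetical ERP inventory state: item name -> quantity on hand."""
    stock: dict


def invariant(inv):
    """State invariant: every quantity is a non-negative integer."""
    return all(isinstance(q, int) and q >= 0 for q in inv.stock.values())


def pre_issue(inv, item, qty):
    """Precondition of issuing stock: the item exists and enough is on hand."""
    return item in inv.stock and 0 < qty <= inv.stock[item]


def post_issue(before, after, item, qty):
    """Postcondition: only the issued item changes, and by exactly qty."""
    return (after.stock.keys() == before.stock.keys()
            and after.stock[item] == before.stock[item] - qty
            and all(after.stock[k] == v
                    for k, v in before.stock.items() if k != item))


def issue(inv, item, qty):
    """Operation developed after, and checked against, the properties above."""
    if not pre_issue(inv, item, qty):
        raise ValueError("precondition violated")
    new_stock = dict(inv.stock)
    new_stock[item] -= qty
    after = Inventory(new_stock)
    assert invariant(after) and post_issue(inv, after, item, qty)
    return after


if __name__ == "__main__":
    start = Inventory({"bolt": 5, "nut": 2})
    print(issue(start, "bolt", 3).stock)
```

In a formal development the three predicates would be stated in the specification language and the implementation would be shown to satisfy them by proof rather than by runtime checks.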
8 Conclusion
This paper reviewed several IS frameworks developed over the past couple of decades. An ERP architecture was presented and acknowledged to be a large and complex component of an IS. The emphases of the said IS frameworks, together with ERP challenges identified from previous work by the authors, were synthesised into Table 1. The most prominent emphases and challenges were identified and, on the strength of these, an ERP framework to adhere to the emphases and address the challenges was synthesised. The framework has four dimensions, in line with the multi-dimensionality of IS frameworks, for example, the Tilahun and Fritz [20] framework discussed earlier. Of particular importance in our framework is the suggested use of Formal Methods to address aspects of complexity, hidden information, and traceability in conjunction with an SDLC.
Future work in this area may be pursued along a number of avenues: The Fig. 8 framework is conceptual (similar to the IS frameworks discussed above) and needs to be refined by unpacking the individual entities and components. Once completed, a formal methods approach to reason about the properties of the links and entities indicated can be launched, thereby enhancing the framework. An industry survey among practitioners should also be undertaken, either through qualitative interviews or quantitatively to establish the relationships indicated, similar to the Fig. 6 framework in Tilahun and Fritz [20]. All these are aimed at deriving a formal ERP model that could be adopted by ERP software engineers.
References 1. Rajan CA, Baral R (2015) Adoption of ERP system: an empirical study of factors influencing the usage of ERP and its impact on end use. IIMB Manag Rev 27(2):105–117. https://doi.org/ 10.1016/j.iimb.2015.04.008 2. Gelogo YE, Kim H (2014) Mobile integrated enterprise resource planning system architecture. Int J Control Autom 7(3):379–388 3. Mustafa A, Serra YE, Ekmekçi KA (2015) The effect of ERP implementation csfs on business performance: an empirical study on users ’ perception. Procedia—Soc Behav Sci 210:35–42. https://doi.org/10.1016/j.sbspro.2015.11.326 4. Ju P, Wei H, Tsai C (2016) Model of post-implementation user participation within ERP advice network. Asia Pacific Manag Rev 21(2):92–101. https://doi.org/10.1016/j.apmrv.2015.11.001 5. Khalifa G, Irani Z, Baldwin LP, Jones S (2004) Evaluating information technology with you in mind. Electron J Inf Syst Eval 4(1). http://www.ejise.com/volume-4/volume4-issue1/issue1art5.htm#_ftn1 edn. Academic Conferences Limited, Reading, England 6. Clancy T (2014) The standish group report chaos—Project smart 7. Dwivedi YK et al (2015) Research on information systems failures and successes: status update and future directions. Loughbrgh. Univ. Institutiobal Repos 8. Shannon CE, Weaver W (1949) The mathematical theory of communication. Urbana. https:// doi.org/10.1152/ajpgi.00415.2011 9. Mason RO (1978) Measuring information output: a communication system approach. Inf Manag 6 10. DeLone WH, McLean ER (1992) Information systems success: the quest for the dependent variable. Inf Syst Res 3(1):4 11. Senaya SK, van der Poll JA, Schoeman MA (2019) Categorisation of enterprise resource planning (ERP) failures: an opportunity for formal methods in computing. In: Conference on science, engineering and waste management (SETWM-19), pp 181–187, Birchwood, November 18–19, 2019, Johannesburg (South Africa), November 2019, no 8, p 6. https:// doi.org/10.17758/EARES8.EAP1119287 12. 
DeLone WH, McLean ER (2003) The DeLone and McLean model of information systems success: a ten-year update. J Manag Inf Syst 19(4):9–30. https://doi.org/10.1080/07421222. 2003.11045748 13. DeLone WH, McLean ER (2002) Information systems success measurement. Found Trends® Inf Syst 2(1):1–116. https://doi.org/10.1561/2900000005 14. Mazzawi R (2014) Enterprise resource planning implementation failure: a case study from Jordan. J Bus Adm Manag Sci Res 3(5):79–86 15. Pravin G (2017) Basic modules of ERP system. In: ESDS 2013. https://www.esds.co.in/blog/ basic-modules-of-erp-system/#idc-container. Accessed 16 Oct 2017 16. DeLone WH, McLean ER (2004) Measuring e-Commerce success: applying the DeLone & McLean information systems success model. Int J Electron Commer 9(1):31–47
Towards a Framework to Address Enterprise Resource …
71
Potentials of Digital Business Models in the Construction Industry—Empirical Results from German Experts Ralf-Christian Härting, Christopher Reichstein, and Tobias Schüle
Abstract Digitization and new business models are gaining relevance in the construction industry. Therefore, an empirical study based on a theoretical foundation was carried out in Germany, Austria and Switzerland. The aim of this study is to examine to what extent digitization has already changed the construction industry, what will change in the future, and what the possible benefits of new digital business models are. The structural equation modeling (SEM) approach considers four key constructs (KPI, individualization, efficiency, communication) that could have an impact on the potentials of digital business models (PDBM) and their processes in the construction industry. Of the four corresponding hypotheses, two (efficiency and communication) show a significant impact. Keywords Potentials · Digitization · Construction industry · Empirical results · Quantitative study · German experts
R.-C. Härting (B) · T. Schüle Aalen University of Applied Sciences, Business Administration, Aalen, Germany e-mail: [email protected] C. Reichstein Cooperative State University BW, Heidenheim/Brenz, Germany © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_7
1 Introduction Digitization opens up many new opportunities for companies in the construction industry. At the same time, increasing competition forces enterprises to keep up with digitization in order to maintain their position. Digitization is thus an important part of the transformation of the construction industry as well [1]. In contrast to other sectors, the German construction industry is still at the beginning of digitization. Between 2000 and 2011, productivity in the German construction industry rose by only 4.1%, while overall German productivity grew by 11% over the same period [2]. This represents below-average productivity growth. Like Industry 4.0, the construction industry needs an intelligent construction site that enables all buildings and machines to be networked. “Digitization is transforming the construction sector throughout its entire life cycle, from design to operation and maintenance” [3]. The paper is structured as follows. Section 2 defines the terms “digitization” and “construction” as they are understood in this study. Section 3 presents the concept and design of our empirical study. Section 4 describes the research methods, including how the data were gathered. Section 5 presents the study results, and Section 6 concludes the paper.
2 Digitization in the Construction Industry The term “digitization” can be interpreted in two ways. In the first sense, it is the transformation of analog information into digital form using appropriate electronic equipment. Information becomes data, which can be stored and transferred through digital networks [4]. In the second sense, digitization stands for activities that are generated by the implementation of innovative applications based on digital approaches. Digitization can therefore be classified according to various intensity stages, from the presentation of information (website) to digital business models such as augmented reality applications [1]. A widespread understanding of digitization leads to the usage of new and efficient concepts such as Artificial Intelligence, Cloud and Mobile Computing or Big Data [5]. The term “digitization” [8] also denotes a phase of upheaval that is just beginning, in which machine actions replace intelligent human actions for the first time. As a result of these approaches, “intelligent products” will be networked into a new digital infrastructure, generally known as the Internet of Things [6]. The capability to communicate and gather data is the main foundation for many new concepts. Firms can learn in much more detail than before how their services are used and can then improve their products and services. On that basis, new processes, functions, and business models can be developed by taking advantage of the advanced analytical capabilities created by digitization technologies such as Big Data [7, 8]. Digitization is becoming increasingly important in all industries. This has an impact on established value chains as well as on the redesign of business models. The following quotation specifies the existing potential of the construction industry: “Digitization transforms the construction sector throughout asset’s lifecycle from design to operation and maintenance” [3].
3 Background and Research Design A conceptual model was developed considering the status quo of the literature [9]; it provides the basis for measuring the digitization potential in the construction sector (Fig. 1). For the design of the study, we identified four constructs (KPI, individualization, efficiency, communication) that influence the potentials of digitization with respect to new digital business models in the construction sector. In the following, the four constructs and their effects on the potentials of digital business models (PDBM) are described in more detail, which also makes the derivation and formulation of the hypotheses clear. Through digitization, many processes within firms can be optimized with respect to effectiveness and cost-efficiency, which potentially increases key performance indicators (KPI), i.e., monetary business performance measures such as sales and profit. For example, new Internet technologies enable faster and better distribution of information because geographical and time restrictions no longer apply [10].
H1: KPIs positively influence the PDBM in the construction industry.
Companies are making great efforts to use the Internet in the right way to coordinate their value activities with customers, suppliers, and other business partners, with the goal of improving business performance [11]. Many technologies classified under Industry 4.0 have long been in use; now they are being networked nationwide. To link cross-company supply chains, further industry standards are necessary. Germany has the best chances to play an outstanding role in the realization of the Smart Factory [12].
Fig. 1 Results of the structural equation model
H2: Individualization positively influences the PDBM in the construction industry.
One of the key success factors of companies today is important information that is collected, processed, and analyzed [13]. This opens a path to more individual products for the customer. Digitization allows firms to get closer to their customers by adapting their offerings to the customers’ needs. This is of major importance given the speed of the digital transformation, because otherwise competitors will be faster. Changed business processes and new business models resulting from digitization also make it possible to improve services, which may have a significant impact on a company’s success. In environments of increasing complexity, companies should generally focus on customers [14]. As a result of Industry 4.0 and advancing digitization, many agile approaches have emerged that companies can use to acquire customers individually and to reduce transaction costs.
H3: Efficiency positively influences the PDBM in the construction industry.
To increase the competitiveness of a company, the technological achievements of Industry 4.0 can be used to make business processes more agile [15]. The more innovative and agile a company is, the simpler it is to develop new processes as well as business models [16]. In particular, improved business processes can reduce future error rates, since new digitized processes are not only faster but also more precise than analog processes [17]. This ultimately leads to increased efficiency. Furthermore, Big Data can be used to generate data that may provide competitive advantages, which are further enhanced by new analytical methods [16]. Besides volume, velocity, and variety, complex data are becoming especially significant for firms because they can yield valuable information [15].
H4: Communication positively influences the PDBM in the construction industry.
One of the main benefits of new digital business processes is the improvement of general communication. Digital technologies provide new ways of collaboration and offer the chance of shorter decision-making processes in companies [18]. Online meetings can also be used for presentations, simulations, or simple communication. Independent of time and place, Internet technologies today make it possible to hold very different types of meetings [19]. New ways for businesses, individuals, networked devices, and governments to work, communicate, and collaborate result in easy exchange and interaction as well as a multitude of accessible data [13]. Digitalized business models and improved digital processes also increase collaboration. Considering that data such as customer information are a key factor in today’s business environment for offering individual products and services, it is especially rewarding for organizations to communicate and exchange information with each other [20]. If organizations share all data, structures along the entire value chain can be optimized using Big Data, resulting in new digital business processes and, at best, more digital business models [18, 21].
4 Methodology The conceptual model provides four hypotheses, which are tested by means of quantitative research using the open-source software LimeSurvey [22, 23]. The web-based online study started in March 2018 and ended in April 2020. After data cleaning, the sample comprises n = 102 responses from construction experts in Germany, Austria, and Switzerland. 27.5% of the experts work in small companies with 0.10) impact. Thus, KPIs indeed show a positive effect, but with a p-value of 0.975 it is not significant, so an influence on the potentials of digital business models cannot be confirmed in this case. The use of digital technologies in this field, which is described in the literature, could strengthen new business models and their processes in an effective way. For example, Artificial Intelligence (AI) offers many opportunities for increasing productivity or even lowering labor and transaction costs. As a result, digital technologies can have a huge impact on new business models driven by digitization to increase effectiveness [13]. H2 (Individualization has a positive influence on the potentials of digital business models) measures the potential of individualization for digital business models. The effect is positive, but it is not significant (p = 0.607 > 0.10), so this hypothesis has to be rejected. H3 (Efficiency has a positive influence on the potentials of digital business models) addresses whether efficiency can be raised through digital business models in the construction industry. This construct has a positive impact on the endogenous variable, and its p-value of 0.079 meets the weak significance level (p ≤ 0.10), which is sufficient to confirm an influence. Therefore, this hypothesis can be confirmed in terms of statistical requirements. There are further aspects of efficiency.
In particular, the combination of digital business models and agility is an innovative way to gain more efficiency in business processes [14]. H4 (Communication has a positive influence on the potentials of digital business models) deals with the question of whether communication can be improved through digital business models. The p-value of hypothesis four is 0.000, which meets the strictest significance level (p ≤ 0.01). Therefore, hypothesis four can be confirmed. The construct communication has a strong positive effect on digital business models in the construction industry. The results described above are summarized in Table 1. Single-item sets require a different treatment in quantitative research: the quality criteria according to Homburg are only used in modeling with multi-items [29]. The potential of digital business models and their processes in the construction industry is abbreviated as PDBM.
Table 1 SEM coefficients
Hypotheses   SEM path                   Path coefficient   Significance (p-value)
H1           KPI → PDBM                 0.032              0.975
H2           Individualization → PDBM   0.515              0.607
H3           Efficiency → PDBM          1.758              0.079
H4           Communication → PDBM       3.727              0.000
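The significance decisions for H1 to H4 follow directly from the p-values in Table 1. As a plausibility check, the sketch below recomputes two-sided p-values from the third column of the table. It rests on two assumptions that the source does not state: that the values in the third column behave like t-statistics, and that a standard normal approximation is adequate for the sample size. The function names are hypothetical.

```python
import math


def two_sided_p_from_t(t_value: float) -> float:
    """Two-sided p-value for a test statistic under a standard normal approximation."""
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)), with Phi the standard normal CDF.
    return math.erfc(abs(t_value) / math.sqrt(2.0))


# Values copied from Table 1: (hypothesis, third-column value, reported p-value).
# ASSUMPTION: the third column is treated as a t-statistic for this check only.
TABLE_1 = [
    ("H1", 0.032, 0.975),
    ("H2", 0.515, 0.607),
    ("H3", 1.758, 0.079),
    ("H4", 3.727, 0.000),
]


def check_table(rows=TABLE_1, tolerance=0.002):
    """Return (hypothesis, recomputed p, reported p, agrees) for each row."""
    results = []
    for name, t_value, reported_p in rows:
        p = two_sided_p_from_t(t_value)
        results.append((name, p, reported_p, abs(p - reported_p) <= tolerance))
    return results


if __name__ == "__main__":
    for name, p, reported_p, agrees in check_table():
        print(f"{name}: recomputed p = {p:.4f}, reported p = {reported_p:.3f}, agrees = {agrees}")
```

Under these assumptions the recomputed values agree with the reported p-values to within rounding, which supports reading the fourth column as p-values when applying the thresholds p ≤ 0.10 and p ≤ 0.01.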
Fig. 2 Results of descriptive questions
Descriptive Analysis of the Use of RFID and BIM The questionnaire contained two additional questions that were not part of the SEM. Only 89 responses were received for these two questions, in contrast to 102 responses for the other questions in the study. One question focuses on the use of Building Information Modeling (BIM), the other on the use of RFID transponders in the construction industry. BIM makes it possible to work sustainably, as failures can already be detected on the computer in the planning and simulation phase [18]. RFID transponders offer great potential to simplify the maintenance process for buildings and machines [30]. The results in Fig. 2 show that the use of BIM is more popular than the use of RFID transponders. The majority of experts agree or strongly agree with the statements that BIM and RFID change the processes in the construction industry and make it more sustainable.
6 Conclusion The study investigates four general factors influencing digital business models and their processes in the construction industry. For this purpose, the authors used a structural equation modeling approach. All four identified constructs are described with five detailed indicators in the hypothesis model. One of the four determinants, the construct communication, has a positive and highly significant influence, which indicates great potential for digital business models and their processes. The determinant efficiency has a positive and slightly significant influence. The other two determinants, individualization and KPI, have no significant impact on the potential of digital business models.
Given the highly significant construct communication, the slightly significant factor efficiency, and the two non-significant hypotheses, the study shows interesting results. With regard to communication, new digital business models offer great opportunities. Both internal and external communication can be improved. As a consequence, cooperation with external stakeholders can be improved, which leads to an enhanced business model. With regard to efficiency, new digital business models reduce the error frequency in companies. Furthermore, efficiency can be increased through the use of new digital technologies. Nonetheless, these hypotheses are worth exploring further, as this survey only included construction experts from German-speaking countries. This paper is limited by some framework conditions: the location, sample size, and time frame bring some limitations with them. The research results can serve as a basis for further work to expand the potentials of digital business models using digitization technologies (e.g., Big Data, RFID, BIM) in the construction industry. Many topics are not yet fully explored, such as differences between countries or deeper insights into existing business models. Furthermore, an empirical qualitative research approach could lead to more detailed findings. The fast-growing developments in digital business models offer opportunities to improve business in the construction industry. In the future, the importance of digitization in the construction industry will continue to grow and become an important part of a company’s success. Acknowledgments This work was supported by Thinh Nguyen and Felix Häfner. We would like to take this opportunity to thank them very much for their great support.
References
1. Bauer L, Boksberger P, Herget J (2008) The virtual dimension in tourism: criteria catalogue for the assessment of Etourism applications. In: O’Connor P, Höpken W, Gretzel U (eds) Information and Communication technologies in Tourism: Proceedings of the International Conference in Innsbruck 2008, pp 522–532. Springer, Wien
2. Baumanns T, Freber P, Schober K, Kirchner F (2017) https://www.rolandberger.com/publications/publication_pdf/roland_berger_hvb_studie_bauwirtschaft_20160415_1_.pdf
3. Hautala K, Järvenpää M, Pulkkinen P (2017) https://trid.trb.org/view.aspx?id=1472605
4. Business Dictionary (2017) http://www.businessdictionary.com/definition/digitization.html
5. Härting R, Reichstein C, Haarhoff R, Härtle N, Stiefl J (2019) Driver to gain from digitization in the tourism industry. Proc ICICT 3:293–306
6. Schmidt R, Möhring M, Härting R, Reichstein C, Neumaier P, Jozinović J (2015) Industry 4.0 - potentials for creating smart products: empirical research results. In: Abramowicz W, Kokkinaki A (eds) 18th international conference on business information systems. Lecture notes in business information processing, vol 208, pp 16–27. Springer, Cham
7. Breuer P, Forina L, Moulton J (2017) http://cmsoforum.mckinsey.com/article/beyond-thehype-capturing-value-from-big-data-and-advanced-analytics
8. Schmidt R, Möhring M, Maier S, Pietsch J, Härting R (2014) Big data as strategic enabler - insights from Central European enterprises. In: Abramowicz W, Kokkinaki A (eds) 17th international conference on business information systems. Lecture notes in business information processing. Springer, New York, pp 50–60
9. Cooper DR, Schindler PS, Sun J (2006) Business research methods. McGraw-Hill, New York
10. Gretzel U, Yuan Y-L, Fesenmaier DR (2000) Preparing for the new economy: advertising strategies and change in destination marketing organizations. J Travel Res 39:146–156
11. Barua A, Prabhudev P, Whinston A, Yin F (2004) An empirical investigation of net-enabled business value. Manage Inf Syst Q 28(4):585–620
12. Bauer W, Herkommer O, Schlund S (2015) Die Digitalisierung der Wertschöpfung kommt in deutschen Unternehmen an. https://www.hanser-elibrary.com/doi/10.3139/104.111273
13. Härting R, Reichstein C, Schad M (2018) Potentials of digital business models - empirical investigation of data driven impacts in industry. Procedia Comput Sci 126:1495–1506
14. Kaim R, Härting R, Reichstein C (2019) Benefits of agile project management in an environment of increasing complexity - a transaction cost analysis. In: Proceedings KES IDT. Springer, New York, pp 195–204
15. Fraunhofer INT (2014) Das Fraunhofer-Institut für Naturwissenschaftlich-Technische Trendanalysen berichtet über neue Technologien: Big Data, March 2014, p 87
16. Abbasi A, Sarker S, Roger HL, Chiang S (2016) Big data research in information systems: toward an inclusive research agenda. J Assoc Inf Syst 17(2):2
17. Clauß T, Laudien SM (2017) Digitale Geschäftsmodelle: Systematisierung und Gestaltungsoptionen. WiSt – Wirtschaftswissenschaftliches Studium 10:6
18. Gholami R, Watson RT, Hasan H, Molla A, Andersen NB (2016) Information systems solutions for environmental sustainability: how can we do more? J Assoc Inf Syst 17(7):526
19. Fast-Berglund Å, Harlin U, Åkerman M (2015) Digitization of meetings - from white-boards to smart-boards. Procedia CIRP 41:1125–1130
20. Obermeier R (2017) Industrie 4.0 als unternehmerische Gestaltungsaufgabe: Strategische und operative Handlungsfelder für Industriebetriebe - Betriebswirtschaftliche, technische und rechtliche Herausforderungen. Springer Gabler, Wiesbaden, p 3
21. Chi M, Li YZ (2016) Digital business strategy and firm performance: the mediation effects of E-collaboration capability. J (JAIS) 86
22. Rea LM, Parker RA (2014) Designing and conducting survey research: a comprehensive guide. Wiley, San Francisco
23. Projectteam TL (2017) https://www.limesurvey.org/de/
24. Wong KKK (2013) Partial least squares structural equation modeling (PLS-SEM) techniques using SmartPLS. Mark Bull 24(1):1–32
25. Fornell C, Larcker DF (1981) Evaluating structural equation models with unobservable variables and measurement error. J Market Res 18:39–50
26. Markus KA (2012) Principles and practice of structural equation modeling by Rex B. Kline. Struct Equ Model Multidisciplinary J 19(3):509–512
27. Chin WW (1998) The partial least squares approach to structural equation modeling. Modern Methods Bus Res 295(2):295–336
28. Ringle C, Wende S, Will A (2017) www.smartpls.de
29. Ringle C, Sarstedt M, Straub D (2012) A critical look at the use of PLS-SEM in MIS Quarterly. MIS Q 36(1):iii–xiv
30. Dellarocas C (2003) The digitization of word of mouth: promise and challenges of online feedback mechanisms. Manage Sci 49(10):1407–1424
An Alternative Auction System to Generalized Second-Price for Real-Time Bidding Optimized Using Genetic Algorithms Luis Miralles-Pechuán, Fernando Jiménez, and José Manuel García
Abstract Real-Time Bidding (RTB) is a new Internet advertising system that has become very popular in recent years. This system works like a global auction where advertisers bid to display their impressions in the publishers’ ad slots. The most popular system to select which advertiser wins each auction is the Generalized Second-Price auction, in which the advertiser that offers the most wins the auction and is charged the price of the second-largest bid. In this paper, we propose an alternative auction system with a new approach that considers not only the economic aspect but also other factors relevant to the functioning of the advertising system. The factors that we consider include the benefit that can be given to each advertiser, the probability of conversion from the advertisement, the probability that the visit is fraudulent, how balanced the networks participating in RTB are, and whether advertisers are paying above the market price. In addition, we propose a methodology based on genetic algorithms to optimize the selection of each advertiser. We also conducted experiments to compare the performance of the proposed model with the well-known Generalized Second-Price method. We think that this new approach, which considers more relevant aspects besides the price, offers greater benefits for RTB networks in the medium and long term. Keywords Advertising exchange system · Online advertising networks · Genetic algorithms · Real-time bidding · Advertising revenue system calculation · Generalized second-price
L. Miralles-Pechuán (B) School of Computing, Technological University Dublin, Dublin, Ireland e-mail: [email protected] F. Jiménez · J. M. García Department of Information and Communication Engineering, University of Murcia, Murcia, Spain e-mail: [email protected] J. M. García e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_8
1 Introduction Real-Time Bidding (RTB) consists of a large global market where publishers auction ad slots each time a user accesses their web pages [1]. In this market, advertisers bid to display their adverts on the publishers’ websites. If an advertiser wins the auction, its advert is displayed instantaneously. In real-time auctions, the whole process of auction, acquisition, and ad display takes place in the time it takes a user to load a website (less than 100 ms) [2, 3]. RTB advertisers participate in the auction, and the Generalized Second-Price (GSP) system is usually used to select the winning advertiser. In the well-known GSP system, the highest bid wins the auction and the bidder pays a price equal to the second-highest bid [4, 5]. Even though GSP is not a verifiable auction system, it continues to be one of the most widely implemented auction mechanisms. The advert selection process can be seen as a combined optimization problem treated as a stochastic control problem. Policies for online advert allocation have been developed based on placement quality and advertisers’ cost-per-click (CPC) bids [6]. In this respect, the studies of Balseiro [6] should be emphasized, since he analyzes in depth the balance that must exist between economic performance, achieved by selecting the most profitable advert, and the quality of service offered to advertisers. Machine Learning (ML) and deep learning models are frequently applied for optimization in many situations related to online campaigns [7]. For example, some publishers want to charge a fee regardless of whether users click on the advert, while some advertisers only want to pay if a click is generated. This problem can be solved using an intermediate role called “arbitrageurs”, whose success depends on how accurate the click-through rate (CTR) estimations are [8]. The CTR is the number of clicks an advert gets divided by the total number of impressions.
Other studies encourage advertising networks (ANs) to apply machine learning techniques to online auctions [9]. These models predict precisely whether a user will accept a given advert, so that the probability of purchase increases considerably. In similar studies, adverts are ranked by the probability of being clicked, in such a way that the top-ranked adverts are likelier to be displayed [4, 10]. It is also possible to improve the performance of RTB systems by maximizing the publishers’ profits. In this sense, Yuan et al. [11] focus on fixing the reserve price or floor price, which is the price below which the publisher is not willing to sell. Increasing this price means that, in some cases, the winners have to pay the reserve price instead of the second-highest bid. In other cases, it will make some advertisers automatically raise their bids to get impressions. It is important not to raise the price too high, since this could sharply increase the number of impressions that remain unsold. In this research, we propose to optimize the function that selects an advertisement based not only on the economic aspect but also on a set of objectives, such as the satisfaction of the publishers and the reduction of fraud, that are needed for the advertising ecosystem to work properly.
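The second-price rule and the effect of a reserve price can be illustrated with a minimal single-slot sketch. It is not taken from the cited works: the function and parameter names are hypothetical, only one ad slot is auctioned, and ties are broken by input order, which the text does not specify.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(
    bids: Dict[str, float], reserve_price: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Run a single-slot second-price auction with a reserve (floor) price.

    Returns (winner, price), or None when the impression remains unsold.
    """
    # Only bids at or above the reserve price are eligible.
    eligible = {name: bid for name, bid in bids.items() if bid >= reserve_price}
    if not eligible:
        return None  # reserve too high for every bidder: impression unsold
    # Stable sort: among equal bids, the advertiser listed first wins (an assumption).
    ranked = sorted(eligible.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    # The winner pays the second-highest eligible bid, but never less than the reserve.
    runner_up_bid = ranked[1][1] if len(ranked) > 1 else reserve_price
    return winner, max(runner_up_bid, reserve_price)


if __name__ == "__main__":
    bids = {"adv_a": 2.0, "adv_b": 1.5, "adv_c": 0.4}
    print(second_price_auction(bids))                     # winner pays the second-highest bid
    print(second_price_auction(bids, reserve_price=1.8))  # winner pays the reserve price
    print(second_price_auction(bids, reserve_price=2.5))  # no eligible bid
```

The three calls correspond to the three situations described above: the ordinary second-price outcome, a winner who pays the reserve price because it exceeds the second-highest bid, and an impression that remains unsold because the reserve price was set too high.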
In this paper, we propose an alternative payment model to GSP that takes into account not only short-term economic performance but also many other variables, in order to guarantee that all involved parties (advertisers, publishers, and especially ANs) make reasonable profits. The presented work is in line with that of Balseiro [6], which considers not only maximizing revenues but also ad quality. The contribution of this paper is an RTB platform that, when ranking an advert, evaluates all the requirements that are indispensable for an adequate performance of the advertising ecosystem. We consider our work to be of great interest because it is the first article in RTB aimed at improving system performance by improving the ad selection system. The auction model presented takes into account many factors that are key to the proper development of the RTB advertising system. The idea presented in this article can be adapted by RTB networks, adding or removing some of the variables, to make this model more beneficial for advertisers, improve their advertising experience, and thereby attract new advertisers that increase the volume of business on these platforms. The rest of the paper is organized as follows. Section 2 explains the proposed method in general terms and illustrates the structure and each of the modules that compose the RTB platform, especially the ad selector module (ASM). Section 3 defines the main objectives for the proper functioning of the RTB platform, the rules to prevent online fraud, and the penalties that ensure the common objectives are met. It also presents a methodology to optimize the weights of the ad selection function through a genetic algorithm (GA). Section 4 describes our experiments and briefly analyzes the results obtained. Section 5 presents the conclusions of the paper and some possible lines of research for future work.
2 Our Novel Advertising Exchange System The proposed advertising exchange (AdX) system implements an Advert Selection Function (ASF) that evaluates the objectives necessary for the system to function properly. The objectives of our system are the advertisers’ impression percentage, spam advertisers, campaigns profitability, advertising network balance, publishers’ click-fraud and income maximization. These objectives are described in detail in Sect. 2.2. We consider it more important to develop a system aimed at the satisfaction of all the roles involved in online advertising than a system focused only on selecting the most cost-effective advert. To implement our AdX system, one variable is used to represent each objective and one weight is used to model each objective’s importance in the advert selection formula, as expressed in Eq. 3. The weights are optimized through a genetic algorithm (GA) according to the system’s performance. The GA uses the system’s performance, expressed in economic terms, as the fitness value. The fitness value is calculated by subtracting the total penalizations (Pen_1, ..., Pen_5) from the total income derived from the system. Our methodology is able to find the best values for the weights, given any configuration. The best weights are those that maximize the income while minimizing the sum of all the penalties. Penalties are economic sanctions that are applied when a goal is not met. The less an objective is met, the higher the associated penalty. The values of these weights can be calculated offline, and the system configuration can then be updated periodically. Starting from the definition of the objectives, the penalties, and the rules to prevent online fraud, our methodology finds the optimal weights using a GA. As shown in Fig. 1, in our proposed system all ANs exchange adverts among themselves through the Advertising Exchange System (AES). The most important AES processes are selecting the best advert from among all the candidates, keeping the fraud detection system updated, and managing collections and payments from advertisers and publishers [12, 13].
Fig. 1 Advertising exchange system structure. The AdX consists basically of the AES and all the ANs that take part in the exchange of adverts
2.1 Advertising Exchange System
In order to develop the AdX, we propose the AES shown in Fig. 2. The designed AdX uses the CPC payment model. It is composed of four interconnected and interdependent modules: the CTR estimation module, the Fraud Detection module, the ASM and the database. Each module is designed for a different purpose and all of them are needed to make the advertising exchange possible.
An Alternative Auction System to Generalized …
Fig. 2 Advertising exchange system structure. The AES is the cornerstone of our system, since it performs all the necessary functions for an appropriate advert exchange
The most important module is the Advert Selector. The other three modules (CTR estimation, Fraud detection and Database module) provide the necessary information, so that the Advert Selector can choose the advert with the highest performance.
2.1.1 Module 1: CTR Estimation
The CTR of an advert is calculated as the ratio between the number of clicks and the number of impressions. In the case of a single visit, however, the CTR can be interpreted as the probability that a user clicks on the advert displayed on a website. This probability is expressed as a real number lying within the range [0, 1]. Accurately estimating the CTR of an advert is one of the biggest challenges of online advertising [14]. Bearing in mind that we implement the CPC payment method in this system, the ANs need to give priority to the most profitable adverts in order to maximize their income. Machine Learning has been applied with great success in classification problems such as image and sound recognition [15], COVID-19 planning [16, 17] or CTR estimation [18]. In the case of CTR estimation, the dataset for machine learning methods contains data fields with users’ features and websites’ features such as advert size, advert position or category of the page. The output of the model is “1” when the user generates a click and “0” when the user does not generate a click. We should clarify that, rather than predicting the class label, what is predicted is the probability, in the [0, 1] range, that the output belongs to the class “1”.
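The two views of the CTR described above can be sketched as follows. This is only an illustration: the logistic form, the feature vector and the coefficient values are assumptions made for the example, not the model used by the authors.

```python
import math
from typing import Sequence


def empirical_ctr(clicks: int, impressions: int) -> float:
    """CTR as the ratio between clicks and impressions (0.0 if never shown)."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


def click_probability(features: Sequence[float],
                      coefficients: Sequence[float],
                      bias: float) -> float:
    """Hypothetical logistic model: probability that the output class is "1".

    The result always lies in [0, 1], as required for a per-visit CTR.
    """
    z = bias + sum(w * x for w, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


# Example: 5 clicks over 200 impressions, and one visit scored by the model.
historical = empirical_ctr(5, 200)
per_visit = click_probability([1.0, 2.0], [0.3, -0.8], 0.1)
```

The first function applies to aggregated history, while the second returns a probability for a single visit, which is the quantity the ASM would consume.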
2.1.2 Module 2: Fraud Detection
The Fraud Detection module measures the probability of an advert being spam and the probability of a click on an advert displayed on a publisher’s website being fraudulent. In both cases, the probability of fraud can be expressed as a real number $r \in \mathbb{R}$, $r \in [0, 1]$. As we have mentioned previously, calculating the probability of fraud is a highly complex process. It is very difficult to determine whether a person is committing fraud from a single click or from a single advert impression. To assess whether an advertiser or a publisher is cheating, it is necessary to evaluate a large enough set of clicks or advert impressions. Moreover, the models that determine the probability of fraud have to take into account the publishers’ historical clicks and the advertisers’ historical impressions. In the case of spam adverts, some information regarding advertisers should also be considered, such as the duration of the advertisers’ campaigns, the type of products they advertise, the adverts’ CTR or the users’ behaviour when the advert is displayed [19]. In the case of click-fraud, some of the publisher’s features should be examined [20]. Furthermore, data about the users who visited the page need to be collected. Some important factors involved in detecting click-fraud are the IP distribution, the peak visiting hours, the publisher’s CTR, the countries and cities with most visits to the page, the type of users who access it and the users’ behaviour before and after generating a click [21]. The probability $P$ of an advert or a publisher’s click $Ad_i$ being fraudulent can be expressed as $P(Ad_i \mid \text{fraud}) = \alpha$, and $P(Ad_i \mid \text{not fraud}) = 1 - P(Ad_i \mid \text{fraud})$.
2.1.3 Module 3: Database for Algorithm Execution
The database stores all the information about advertisers, publishers and ANs that is required to carry out the processes involved in online advertising and to allow the ASM to work optimally. The most important data stored in the database consist of information related to the advertisers’ payments and the publishers’ charges. In addition, information about any fraud committed and information used by the ASF, such as the advert CTR and the advert CPC fixed by each advertiser, is also stored in the database. Whenever a user visits a page and an advert is displayed, the database is updated. The probability that the click is fraudulent and the probability that the advert is spam are also updated.
2.1.4 Module 4: Advertiser Selection Module
Whenever a user accesses a publisher’s website, a selection from among all adverts takes place. All adverts belonging to a different category from that of the publisher’s website that is being accessed are discarded. Those adverts which are not discarded are called candidates. Then, only one advert from among all the candidates is selected, namely the one with the maximum Ad Rank value. To select the best advert, we apply the $F(Advert)$ function, which assigns a real value in the range [0, 1] to all candidate adverts. The Ad Rank is explained in detail in Sect. 2.2.4. The $F(Advert)$ function includes weights which are assigned in proportion to the importance of each objective. The Ad Rank is calculated considering all the AdX objectives. As can be seen in Fig. 2, this module takes into account both the CTR and the likelihood of advertisers and publishers being fraudulent. It also consults and updates the database where information about advertisers’ campaigns, AN balance, publishers’ account status and ANs’ performance is stored.
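The selection step described above (discard adverts of a different category, then keep the candidate with the maximum Ad Rank) can be sketched as follows. The `Advert` record and the `ad_rank` callback are hypothetical names introduced for this example; the ranking function itself is left abstract here.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Advert(NamedTuple):
    """Hypothetical minimal advert record used only for this sketch."""
    advert_id: str
    category: str


def select_advert(adverts: Sequence[Advert],
                  site_category: str,
                  ad_rank: Callable[[Advert], float]) -> Optional[Advert]:
    """Return the candidate with the highest Ad Rank, or None if no candidate."""
    # Adverts whose category differs from the publisher's site are discarded.
    candidates = [ad for ad in adverts if ad.category == site_category]
    if not candidates:
        return None
    # Exactly one advert is selected: the one with the maximum Ad Rank.
    return max(candidates, key=ad_rank)


# Usage with made-up rank values in [0, 1].
ads = [Advert("a", "sports"), Advert("b", "news"), Advert("c", "sports")]
ranks = {"a": 0.4, "b": 0.9, "c": 0.7}
chosen = select_advert(ads, "sports", lambda ad: ranks[ad.advert_id])
```

Advert "b" has the highest rank but is never considered, because it does not belong to the category of the visited page.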
2.2 Development of the Advertisement Exchange System
To develop the AdX, we followed these steps. First, we defined the objectives necessary to ensure the proper functioning of the advertising ecosystem. To ensure that the objectives are met, we defined one economic penalty for each objective, in such a way that the more the objectives remain unmet, the greater the sum of the penalties will be, as explained in the following points. In addition, we created a set of rules to protect the AdX from fraudulent activities. We established a metric expressed in economic terms in order to measure the AdX performance. Finally, we developed Algorithm 2 for the ASF and we defined a methodology to find the optimal configuration of weights using a GA.
2.2.1 Definition of the Objectives for the AdX
Several objectives should be met in order to have a successful AdX [22], and the optimization of some objectives may be to the detriment of others. For example, the AdX should generate profits for the publishers that are as high as possible. But, at the same time, the AdX should not charge advertisers a price so high that their campaigns become unprofitable. The objectives of the algorithm, comprising all adverts $ad_i$ belonging to advertisers $adv_i \in Adv$ and all publishers $pub_i \in Pub$ of the $AN_i$, where $AN_i \in AdX$, are:
• (O1) Advertisers’ impression percentage: All advertisers need to display a reasonable number of adverts so that all of them are satisfied. If the algorithm focuses just on maximizing the income of the AdX, then some advertisers may be left with no impressions. Thus, we should guarantee an equitable distribution of advert impressions in which advertisers paying a higher price have the advantage that their adverts are displayed more frequently.
• (O2) Spam advertisers: Many advertisers display adverts on the Internet with malicious intent. They are known as spam advertisers and they are very detrimental to the online advertising ecosystem [23], so we should calculate the probability that an advert is spam. We aim to reduce as much as possible the number of times such adverts are displayed. If the system were deployed, we should also have a team in charge of verifying whether an advertiser is trying to mislead users whenever the system alerts that an advertiser may be cheating.
• (O3) Campaigns profitability: Some inexperienced advertisers may pay a price for their campaigns that is above the prevailing market price. It is not advisable to take advantage of this kind of advertiser by charging them a higher price. Our AdX should make campaigns profitable for all kinds of advertisers. Hence, we need to ensure that in our AdX the advert prices are similar to those in the market, that is, $Price_{ad} \approx Price_{mkt}$.
• (O4) Advertising network balance: Through collaboration, every AN should make it possible for the other ANs to display their adverts on its sites. If we want all ANs to participate in the AdX, then the number of adverts received by each AN should be similar to the number of adverts delivered, that is, $Adv_{rec} - Adv_{del} \approx 0$.
• (O5) Publishers’ click-fraud: Fraud committed by publishers is known as click-fraud and it can become very harmful to advertising campaigns. These fraudulent clicks are not generated by a genuine user interested in the product (see footnote 1). Due to click-fraud, advertisers end up paying for clicks that do not bring any benefit. This increases the likelihood that advertisers shift to another AN offering more profitable campaigns. Thus, we should avoid displaying adverts on the websites of publishers who commit click-fraud.
• (O6) Income maximization: This is the most important goal, but we place it in the last position because it is the only objective without an associated penalty. The Advert Selector algorithm should look for the most profitable adverts in order to distribute the highest possible amount of revenue among all publishers.
The income value represents the money collected from the advertisers. Publishers should obtain reasonable economic returns so that they are discouraged from moving to other platforms and encouraged to recruit new advertisers.
2.2.2 Economic Penalties for the AdX
To ensure that the objectives are met, we define an economic penalty $Pen_i$ and a coefficient $X_i$ associated with each penalty, for each of the first five objectives $Obj_i$, where $i = 1, \ldots, 5$. Each penalty is applied whenever its corresponding AdX objective is not met. The rationale behind these penalties is that those participants (ANs, advertisers and publishers) who are not satisfied with the AdX usually leave the platform, which translates into economic losses. The $X_i$ coefficients allow us to increase or diminish the economic penalty that is applied when a goal is not met.
1 They are performed with the intent of increasing the publishers’ revenue or of harming the online platform. Many publishers may click on their own adverts or tell their friends to do so. There are also clicks made by click-bots which aim to harm the advertising ecosystem [24, 25].
The five penalties we have defined are:
• (P1) Impression advert percentage: We must apply a penalty for each advertiser that fails to display a sufficient number of adverts. P1 can be expressed as “For each advertiser whose average ratio of advert impressions lies below 25%, we will subtract $X_1$ times the average proceeds of the advertisers in these ANs from the total Income”.
• (P2) Spam advertisers: We can define P2 as: “For each click from a spam advertiser, we will deduct $X_2$ times the money generated by these clicks from the total Income”.
• (P3) Campaign profitability: We want to avoid any abuse against inexperienced advertisers who may be made to pay a price above the market price. P3 can be expressed as “For each advertiser who pays a price 25% above the market price for his/her campaign, we will deduct $X_3$ times the money generated by that advertiser from the total Income”.
• (P4) Advertising network balance: When an AN is not satisfied, it may stop working with the platform. Therefore, P4 is expressed as “For each AN that receives 25% fewer adverts than the number of adverts it delivers, we will deduct $X_4$ times the income of that AN from the total Income”.
• (P5) Publishers’ click-fraud: As mentioned previously, click-fraud makes advertisers’ campaigns fail. To avoid this, we created the following penalty P5: “For each fraudulent click from a publisher, we will deduct $X_5$ times the value of this click from the total Income”.
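A minimal sketch of how penalties P2 to P5 could be computed is given below. The tuple layouts, the function names and the reading of “25% above” and “25% fewer” as strict inequalities are assumptions made for illustration; P1 is left out because its wording admits more than one reading.

```python
from typing import Sequence, Tuple


def spam_penalty(spam_click_revenue: float, x2: float) -> float:
    """P2: X2 times the money generated by clicks on spam adverts."""
    return x2 * spam_click_revenue


def campaign_profitability_penalty(advertisers: Sequence[Tuple[float, float]],
                                   market_price: float, x3: float,
                                   margin: float = 0.25) -> float:
    """P3: advertisers are (price_paid, revenue_generated) pairs (assumed layout)."""
    overpaying = (revenue for price, revenue in advertisers
                  if price > market_price * (1.0 + margin))
    return x3 * sum(overpaying)


def network_balance_penalty(networks: Sequence[Tuple[int, int, float]],
                            x4: float, margin: float = 0.25) -> float:
    """P4: networks are (received, delivered, income) triples (assumed layout)."""
    unbalanced = (income for received, delivered, income in networks
                  if received < delivered * (1.0 - margin))
    return x4 * sum(unbalanced)


def click_fraud_penalty(fraudulent_click_values: Sequence[float], x5: float) -> float:
    """P5: X5 times the value of every fraudulent click."""
    return x5 * sum(fraudulent_click_values)


# Example with every coefficient set to 0.5.
p2 = spam_penalty(40.0, 0.5)
p3 = campaign_profitability_penalty([(1.3, 100.0), (1.0, 50.0)], 1.0, 0.5)
p4 = network_balance_penalty([(70, 100, 200.0), (90, 100, 300.0)], 0.5)
p5 = click_fraud_penalty([0.5, 0.25], 0.5)
```

In the example, only the advertiser paying 1.3 against a market price of 1.0 and only the AN that received 70 adverts while delivering 100 trigger a penalty.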
2.2.3 Online Fraud AdX Actions
We should highlight that in our present study, fraud is not just considered as an economic issue, but also as an ethical issue. Therefore, we must define a set of policies and rules oriented towards protecting the interests of the honest participants.
AdX Policies: Any publisher who wants to participate in the business must accept several AdX policies aimed at reducing fraud to the greatest extent possible, so that the advertising habitat may be protected. These policies seek to expel publishers before they receive any payment if the system’s expert group determines that fraud was intentionally committed. Additionally, we could consider imposing fines on advertisers who use the platform to deliver spam adverts and on all those publishers who use black-hat techniques in order to increase their income.
AdX Rules: In addition to the AdX policies, we defined a set of rules focusing on preventing fraud. These rules set clear-cut criteria for expelling from the AdX those publishers, advertisers or ANs who commit fraud. The difference between the rules and the penalties is that infringement of the rules leads to expulsion from the AdX platform, while penalties reduce the measured performance when objectives have not been met. In order to make the algorithm more efficient, we only check the rules that lead to expulsion every N visits, where N = 1,000. The rules that we define are:
• (R1) Fraudulent advertisers: To dissuade advertisers from trying to display spam adverts, we defined the following rule: “If an advertiser commits fraud on more than 20% of his/her adverts and the number of adverts is greater than 200, then he/she will be expelled”.
• (R2) Fraudulent publishers: We expel those publishers whose malicious clicks amount to a certain percentage above a predetermined threshold μ. Hence, we defined the following rule: “If a publisher commits fraud on more than 20% of his/her clicks and the number of clicks is higher than a specific threshold, in our case 30, then the publisher will be expelled”.
• (R3) Fraudulent ANs: To discourage ANs from allowing their publishers and advertisers to commit fraud so as to earn more money, we defined the following rule: “If 20% or more of the members of an AN are fraudulent advertisers or fraudulent publishers, and the number of visits is greater than V, where V = 2,000, then the AN will be expelled from the platform”.
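The three expulsion rules can be read as simple predicates. The sketch below is one possible reading, with hypothetical function and parameter names; the percentages are compared with integer arithmetic so that “more than 20%” and “20% or more” are evaluated exactly.

```python
def should_expel_advertiser(spam_adverts: int, total_adverts: int,
                            min_adverts: int = 200) -> bool:
    """R1: fraud on more than 20% of the adverts, with more than 200 adverts."""
    return total_adverts > min_adverts and 5 * spam_adverts > total_adverts


def should_expel_publisher(fraudulent_clicks: int, total_clicks: int,
                           min_clicks: int = 30) -> bool:
    """R2: fraud on more than 20% of the clicks, with more than 30 clicks."""
    return total_clicks > min_clicks and 5 * fraudulent_clicks > total_clicks


def should_expel_network(fraudulent_members: int, total_members: int,
                         visits: int, min_visits: int = 2000) -> bool:
    """R3: 20% or more fraudulent members, with more than 2,000 visits."""
    return (visits > min_visits
            and total_members > 0
            and 5 * fraudulent_members >= total_members)


# The rules are only evaluated once every N = 1,000 visits.
CHECK_EVERY_N_VISITS = 1000


def rules_are_due(visit_count: int) -> bool:
    """True when the running visit counter hits a multiple of N."""
    return visit_count > 0 and visit_count % CHECK_EVERY_N_VISITS == 0
```

The minimum sample sizes (200 adverts, 30 clicks, 2,000 visits) prevent a participant from being expelled on the basis of a handful of observations.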
2.2.4 Advert Selector Module
In order to optimize the performance of the algorithm tasked with selecting an advert, we should define a function to evaluate all the objectives defined above according to the pre-established economic metric. Since the system has six objectives, the ASF also has six variables. Each variable is normalized and can be expressed as a real number within the range [0, 1]. The weights assigned to each variable are represented by $\theta_i$, in such a way that they satisfy Eq. 1. These weights do not have to be updated for each visit, because this would lead to a very high computational cost. The values of these weights can be recalculated offline every few days. In addition, to ensure that the values of the weights are reliable, they must be calculated over a sufficiently large number of visits, since a small number of visits might not represent the overall advert network behaviour well. The optimal value of the weights for a network may vary depending on multiple factors such as the number of advertisers, the number of publishers, the number of ANs, the average click-fraud and the spam adverts within the AdX.

$$\sum_{i=1}^{6} \theta_i = 1 \tag{1}$$
To determine the best advert to be displayed on each user visit, we assign to each advert the Ad Rank value. The Ad Rank is recalculated for each candidate advert each time a user visits a publisher’s website, applying the $F(Advert)$ function as expressed in Eq. 3.

$$\text{Ad Rank} \leftarrow F(Advert) \tag{2}$$

$$F(Advert) = (\theta_1 \times \text{AN Satisfaction}) + (\theta_2 \times \text{Advertiser Satisfaction}) + (\theta_3 \times \text{Spam Adverts}) + (\theta_4 \times \text{Campaign Cost}) + (\theta_5 \times \text{Fraud Publisher}) + (\theta_6 \times \text{Ad Value}) \tag{3}$$

We now describe each of the variables representing the objectives of the AdX system:
1. AN Satisfaction: It expresses the satisfaction of the members of the network, represented by the ratio between adverts received and adverts delivered. We should give priority to the advertisers from the unbalanced networks. The closer the value of this variable is to “1”, the more dissatisfied the members of the network are. Hence, we should try to help those networks that are most dissatisfied. The values of the variable are normalized to the range [0, 1] using Eq. 4 to give priority to unbalanced ANs.

$$\text{AN Satisfaction} = 1 - \frac{\text{Received Visits}}{\text{Received Visits} + \text{Delivered Visits}} \tag{4}$$

2. Advertiser Satisfaction: As expressed in Eq. 5, this variable measures the satisfaction of an advertiser according to the number of impressions each advertiser obtains. The closer to “1” the value of the variable is, the more discontent the advertiser will be. Therefore, we must give priority to those advertisers by displaying their adverts.

$$\text{Advertiser Satisfaction} = \frac{\text{Potential Visits}}{\text{Potential Visits} + \text{Received Visits}} \times \text{Ad Value} \tag{5}$$

3. Spam Adverts: This variable represents the probability that an advert is of spam type. The likelier an advert is to be spam, the closer to zero the value of this variable will be. Therefore, spam ads are less likely to be shown.
4. Campaign Cost: The price of a campaign must be similar to the general market price. If an advertiser pays a price above the market price, the value of this variable will get closer to zero, as expressed in Eq. 6.

$$\text{Campaign Cost} = \frac{\text{Advertiser Price}}{\text{Advertiser Price} + \text{Real Price}} \tag{6}$$

5. Fraud Publisher: It represents the probability that a click is fraudulent. The likelier the publisher is to be fraudulent, the closer to zero its value will be.
6. Ad Value: It represents the price the advertiser is willing to pay and it is calculated by Eq. 7. The closer to “1”, the greater the price the advertiser is willing to pay. To normalize the value of this variable, we divide the price the advertiser is willing to pay by the maximum value of the category.

$$\text{Ad Value} = CTR \times \frac{\text{CPC Advertiser}}{\max(\text{Category CPC Advertiser})} \tag{7}$$
2.2.5 Measuring the Advertising Exchange System Performance
In order to measure the AdX performance, we have established a metric expressed in economic terms. As expressed in Eq. 8, the AdX performance is given by the difference between all the AdX incomes and the sum of all the penalties. The algorithm tries to maximize the AdX incomes, but at the same time, it tries to achieve all the objectives in order to minimize the AdX penalty value, so that the AdX performance value will be as high as possible.

$$\text{AdX Performance} = \text{AdX Incomes} - \text{AdX Penalties} \tag{8}$$

The AdX Incomes represents the money collected from all advertisers for displaying their adverts, which is equal to the sum of the value of all clicks, as expressed in Eq. 9.

$$\text{AdX Incomes} = \sum_{j=1}^{N} \text{Click Price}(j) \tag{9}$$

AdX Penalties is the sum total of all penalties, as expressed in Eq. 10, and it represents the financial penalty derived from not fulfilling the AdX objectives.

$$\text{AdX Penalties} = \sum_{i=1}^{5} \text{Penalty}(i) \tag{10}$$
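Eqs. 8 to 10 amount to a short computation, sketched here with hypothetical names and made-up numbers.

```python
from typing import Sequence


def adx_performance(click_prices: Sequence[float],
                    penalties: Sequence[float]) -> float:
    """Eq. 8: incomes (Eq. 9) minus the sum of the five penalties (Eq. 10)."""
    if len(penalties) != 5:
        raise ValueError("exactly five penalties (Pen_1..Pen_5) are expected")
    incomes = sum(click_prices)          # Eq. 9: sum of the value of all clicks
    total_penalties = sum(penalties)     # Eq. 10: sum of the five penalties
    return incomes - total_penalties


# Three paid clicks and two non-zero penalties (illustrative values only).
performance = adx_performance([0.5, 0.25, 1.0], [0.25, 0.0, 0.0, 0.125, 0.0])
```

With these values, the incomes are 1.75 and the penalties 0.375, so the performance is 1.375.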
2.2.6 Mathematical System Description
Let us define a set of ANs as $ANs = \langle AN_1, AN_2, \ldots, AN_n \rangle$, with $n$ the number of ANs, where each $AN_n$ has a list of advertisers $Ad_j$ such that $\exists Ad_j \in AN_n$, a set of publishers such that $\exists Pb_k \in AN_n$ and a set of visits such that $\exists v_l \in AN_n$. Each $Ad_j$ is defined by a set of adverts $Ad_j = \langle a_1, \ldots, a_m \rangle$, where $Ad_j \subseteq A$ and $(a_i \in Ad_j \wedge a_i \notin Ad_m)$, and $A$ is the set comprising all the adverts. Finally, $V$ is the set of visits $\forall v_i \in V; \forall ANs$. The selected advert $a_i$ is the advert belonging to the advert set $A = \langle a_1, \ldots, a_m \rangle$, and also $a_i \in Ad_j$, which leads to the maximum income $I$, that is, select $A = a_i \mid a_i \in A \wedge A \subseteq A \therefore a_i \in A$. We must maximize the total income $I_k^{a_i}$ and minimize the sum of all penalties $P_k^{a_i}$ for all adverts $a_i$ from $AN_k$, that is,

$$\max \sum_{k=1}^{N} \left( I_k^{a_i} - P_k^{a_i} \right)$$

where $N$ is the number of ANs, $AN_k$ with $k = \langle 1, \ldots, N \rangle$. For an advert $a_i \in Ad_j$ and an $AN_k$, this system is subject to:
• $Fraud(a_i) > 0$: There is fraud on the part of the advertiser.
• $Fraud(p_i) > 0$: There is fraud on the part of the publisher, where $p_i \in P$ and $P$ is the set of publishers.
• $CTR_k^{a_i} = CTR_k^{a_i} \times \varphi_j$, where $\varphi_j$ represents the number of categories of $a_i$, with $\varphi_j \leq p$, where $p$ is the number of categories $C_j$ and $j = \langle 1, \ldots, p \rangle \wedge \varphi \in \mathbb{R}$.
• $CTR_k^{a_i} = a_i(x_1, x_2, \ldots, x_w)$, where $X = (x \mid x_w \text{ is a feature of advertiser } a_i)$.
• $I_k^{a_i} = Click \times CTR_k^{a_i} \times Price_{Click}^{a_i} \times tc^{a_i} - (ep^{a_i} \times M^{a_i})$, where $tc$ is the total number of clicks on the advert, $ep$ is the income received by the publisher per click, $M$ is the number of samples for the adverts and $P_z$ is the corresponding penalty, and

$$I_k^{a_i} = \sum_{i=1}^{6} \theta_i$$
3 Calculating the Optimal Value of the Weights Using a Genetic Algorithm
Each variable of the ASF represents one criterion and is multiplied by a weight, such that the sum of all the weights equals “1”, as expressed in Eq. 1. To obtain the optimal value for all weights, we applied optimization techniques based on GAs. Each time a visit occurs on a publisher’s site within the AdX, the ASM selects only one advert among the candidates. Algorithm 2 is in charge of taking into account all the objectives and updating the variables used by the ASF. The optimal weight configuration is the combination that generates the highest AdX performance according to the established metric. Algorithm 2 returns the AdX performance for a given weight configuration. We can think of Fig. 4 as a small module that returns the performance of the system (the fitness function of the GA) according to the weights that are introduced as inputs. In order to find the best weight configuration, we apply a GA with the following components.
3.1 Representation and Initial Population
As genotype, we use a binary fixed-length representation. As can be seen in Fig. 3, we used a length of 48 bits to represent each weight. Therefore, each weight can be represented with a value between 0 and $2^{48} - 1$, which is a very high precision. Each individual I of the population is formed by the six weights and it is represented by a string of (6 × 48 bits = 288 bits) binary digits. The initial population is generated at random with a uniform distribution. The size of the population is 100 in order to obtain diversity and an appropriate convergence time [26]. The number of generations is 100 (the number of iterations in the stop criterion). Therefore, the number of evaluations of the goal function is 10,000 (100 individuals × 100 generations = 10,000 evaluations). In some experiments, this number of evaluations has been appropriate for the stabilization of the algorithm [26].
Fig. 3 Weight codification using individuals of a GA
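The encoding described in this subsection can be illustrated with a short sketch that generates a random 288-bit genotype and decodes it into six 48-bit integers. Representing the genotype as a Python string of "0" and "1" characters is an assumption made for readability; it is not how the GAF library stores chromosomes.

```python
import random
from typing import List

BITS_PER_WEIGHT = 48
NUM_WEIGHTS = 6
GENOTYPE_LENGTH = BITS_PER_WEIGHT * NUM_WEIGHTS  # 288 bits


def random_genotype(rng: random.Random) -> str:
    """Uniformly random 288-bit string, as used for the initial population."""
    return "".join(rng.choice("01") for _ in range(GENOTYPE_LENGTH))


def decode_genotype(bits: str) -> List[int]:
    """Split the genotype into six integers, each in [0, 2**48 - 1]."""
    if len(bits) != GENOTYPE_LENGTH or set(bits) - {"0", "1"}:
        raise ValueError("a genotype is a string of exactly 288 binary digits")
    return [int(bits[i:i + BITS_PER_WEIGHT], 2)
            for i in range(0, GENOTYPE_LENGTH, BITS_PER_WEIGHT)]


# A population of 100 individuals evaluated over 100 generations
# requires 100 * 100 = 10,000 fitness evaluations.
rng = random.Random(2021)
population = [random_genotype(rng) for _ in range(100)]
phenotypes = [decode_genotype(individual) for individual in population]
evaluations = len(population) * 100
```

The decoded integers do not yet add up to one; that constraint is handled separately before the individual is evaluated.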
3.2 Handling Constraints
The genotype used to represent solutions does not by itself satisfy the constraint that all weights add up to “1”. However, an individual genotype IG, which is a string of random binary digits, can be converted into six integer numbers, called the individual phenotype IP, where each integer is in the range $[0, 2^{48} - 1]$. Once the individual IG has been decoded into the individual IP, it can be easily transformed into a new array, called the repaired individual IR, that satisfies the constraint (all numbers are in the range [0, 1] and add up to one) by applying Algorithm 1. The repaired individual IR should be calculated as a prior step to the evaluation of the individual. In this way, the constraint is always satisfied without the need to design specialized operators for solution initialization, crossover or mutation.

Algorithm 1 Repair algorithm.
Require: Individual IP
Ensure: Repaired individual IR
  sum ← 0
  for i = 1 to 6 do
    sum ← sum + IP[i]
  end for
  for i = 1 to 6 do
    IR[i] ← IP[i] / sum
  end for
  return IR
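Algorithm 1 translates almost directly into code. The sketch below adds one thing that the pseudocode does not cover: when all six integers are zero the division is undefined, and falling back to equal weights in that case is an assumption of this example.

```python
from typing import List, Sequence


def repair(phenotype: Sequence[int]) -> List[float]:
    """Normalize the phenotype IP so that the repaired weights IR add up to 1."""
    total = sum(phenotype)
    if total == 0:
        # Not covered by Algorithm 1: assumed fallback to equal weights.
        return [1.0 / len(phenotype)] * len(phenotype)
    return [value / total for value in phenotype]


# Example: six integers decoded from a genotype.
weights = repair([1, 1, 2, 0, 0, 0])
```

Since the repair step runs before every fitness evaluation, standard crossover and mutation operators can be used on the raw bit string without any special handling of the constraint.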
3.3 Fitness Function
To calculate the fitness of each individual of the population I, the following steps are performed:
• Obtaining the repaired individual IR (array of 6 real numbers in [0, 1] that satisfies the constraint) of the individual IP.
• Calculating the fitness value using Eq. 11.

$$Fitness(IR) = \sum_{j=1}^{N} \text{Click Price}_{IR}(j) - \sum_{i=1}^{5} \text{Penalty}_{IR}(i) \tag{11}$$
3.4 Genetic Algorithm Parameter Configuration
We use “Double-point” as the crossover operator, that is, we select two points between which the genes of the individuals are interchanged. The parameter “Elitism percentage” is set to 5%. The parent selection method used is the “roulette wheel” (proportional selection and stochastic sampling with replacement). The replacement method used is “Generational replacement”, in which a completely new population is generated with new children from existing parents via crossover and mutation. We applied parameters similar to those of the simple GA design proposed in Goldberg et al. [26]. The main reason is that this design entails a high selective pressure (in comparison with other selection and replacement techniques such as binary tournament or steady-state replacement), which yields a reasonable convergence time for our available computing capacity [27]. Since we used a simple binary representation and the constraint management does not require specialized operators, we consider the crossover and uniform mutation operators proposed in Goldberg et al. [26] to be appropriate. To find the best combination of values, the mutation probability and the crossover probability are tested in the first configuration, which uses 10 ANs, with values from 0.1 to 1 in increments of 0.1. Therefore, we try 100 different combinations, as expressed in Table 2. To determine the best combination, we chose the configuration with the best average after executing the algorithm 10 times. Once the best combination is selected, we run the algorithm 30 times and then we calculate the average of the fitness function. Each execution takes approximately 14 minutes and 25 s (Fig. 4).
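The operators named above can be sketched in a few lines. This is an illustrative reading using Python's standard library, not the GAF implementation used in the paper; genotypes are assumed to be strings of binary digits, and roulette-wheel selection is assumed to receive non-negative fitness values with a positive total.

```python
import random
from typing import List, Sequence, Tuple


def roulette_wheel_select(population: Sequence[str],
                          fitnesses: Sequence[float],
                          rng: random.Random) -> str:
    """Proportional selection with replacement (fitness must be non-negative)."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def two_point_crossover(a: str, b: str, rng: random.Random) -> Tuple[str, str]:
    """Swap the genes lying between two randomly chosen cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def mutate(bits: str, probability: float, rng: random.Random) -> str:
    """Flip each bit independently with the given mutation probability."""
    return "".join(("1" if c == "0" else "0") if rng.random() < probability else c
                   for c in bits)


# Grid of (crossover probability, mutation probability) pairs: 0.1 to 1.0
# in steps of 0.1 for each parameter, that is, 10 x 10 = 100 combinations.
PROBABILITY_GRID: List[Tuple[float, float]] = [
    (c / 10, m / 10) for c in range(1, 11) for m in range(1, 11)
]

rng = random.Random(7)
parent_a, parent_b = "0" * 288, "1" * 288
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
```

Each pair in `PROBABILITY_GRID` corresponds to one candidate configuration of the kind compared in the parameter search.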
Algorithm 2 Advertising exchange system algorithm.
Require: weight values such that $\sum_{i=1}^{6} \theta_i = 1$; Data: advertisers, publishers, ANs and users
Ensure: Fitness
for all vis_i ∈ Vis do    ▷ For all visits
  for all adv_i ∈ Adv do    ▷ For all advertisers
    if Category(Visit) = Category(Advert) then    ▷ Advert value calculation function
      AdValue ← F((θ1 × AN Satis) + (θ2 × Adv. Satis) + (θ3 × Spam Adverts) + (θ4 × Camp Cost) + (θ5 × Fraud Publisher) + (θ6 × Ad Value))
    end if
    if AdValue > Max then    ▷ Selects the best advertiser among all possible ones
      Max ← AdValue
      SelectedAd ← Ad_j
    end if
  end for
  if Num(Visits) mod 1000 = 0 then    ▷ For each 1000 visits update parameters
    pen_i ∈ Pen, adv_i ∈ Adv, an_i ∈ ANs ← UpdateParameters()    ▷ Updates all roles’ parameters
    Apply Rule 1(pub_i ∈ Pub, Adv_j)    ▷ Checks whether there are cheating publishers and ejects them
    Apply Rule 2(adv_i ∈ Adv, Adv_j)    ▷ Checks whether there are cheating advertisers and ejects them
    Apply Rule 3(AN_i ∈ ANs, Adv_j)    ▷ Checks whether there are cheating ANs and ejects them
  end if
end for
Calculate the value of the variables: Incomes, Pen_1, Pen_2, Pen_3, Pen_4, Pen_5
Fitness ← Incomes − (Pen_1 + Pen_2 + Pen_3 + Pen_4 + Pen_5)
return Fitness
Fig. 4 Advert exchange weight optimization algorithm using genetic algorithms
3.5 Justification for the Chosen Values of the Coefficients, Penalties and Rules
Click-fraud, spam adverts and unsatisfied advertisers are factors that hurt the advertising ecosystem. However, determining the exact value of their negative impact on the AdX is a very complex task. Even if these values were calculated, we still could not ensure that they would remain optimal for a long time, because the scenario could change quickly. Therefore, finding the optimal configuration for all thresholds is out of the scope of our work and, for this reason, these values have been configured manually. However, we can briefly explain how we have configured the following variables: 1) the coefficients ($X_1, \ldots, X_5$) associated with each penalty, 2) the thresholds above which penalties are applied and 3) the conditions of each rule to expel a role from the platform. The thresholds of the penalties $Pen_1$, $Pen_3$ and $Pen_4$, representing the satisfaction degree, are configured to approximately 25% (0.25). Penalties $Pen_2$ and $Pen_5$ refer to spam adverts and click-fraud, respectively. In penalties $Pen_2$ and $Pen_5$, 1/2 times the revenue obtained from the spam adverts and the fraudulent clicks is subtracted from the total income. With regard to the thresholds of the rules, we decided to expel from the AdX system all those ANs, publishers or advertisers committing more than 20% of fraud. In order to decide whether a party involved in the system has committed fraud, it is necessary to analyze a large enough set of clicks or adverts. To do this, we define the following conditions. For publishers, the number of fraudulent clicks must be greater than 30. For advertisers, the number of adverts must be greater than 200. For ANs, the number of visits must be greater than 2,000. If, instead of analyzing 150,000 visits, we analyzed 10 million, the threshold values would have to be higher.
4 Experiments and Results
To show that our system is valuable, in experiment I we compared the performance of the GA system with that of the widely used GSP method. After applying the GSP method, we applied the penalties defined in our system. Finally, the aim of experiment II is to demonstrate that our GA is capable of adjusting the values of its weights to a new system configuration.
4.1 Preparation of the Experiments
Our system takes into account many parameters to select an advert, such as spam adverts, CTR, fraudulent publishers, the bid price and so on. There are some data sets covering one of the considered aspects, but they are far from what we need. For this reason, to perform the experiments, both the visits and the configuration of each of the advertisers of all ANs have been simulated. In this work, we launched an experiment to help us understand the importance of each variable when the value of the penalty remains constant. To find the optimum values of the weights, we applied a GA. The GA is implemented in the environment Visual Studio C# version 12.0.31101.00 Update 4, on a computer with the following features: Intel® Core i5-2400 [email protected] GHz with 16 GB RAM, with the operating system Windows 7 Pro, Service Pack 1, 64 bit. We have used the Genetic Algorithm Framework (GAF) package for C# (see footnote 2) to implement the GA. The GAF package was designed to be the simplest way to implement a GA in C#. It includes good documentation and a variety of functions for crossover, mutation and selection operators. In addition, it allows the developer to customize the operator functions. To evaluate our proposed GA in depth, we ran experiments I and II. We developed an AdX environment with the following configuration. The probability of an advert being spam is randomly set within the range from 13% to 16%. The probability of a publisher being fraudulent is randomly set within the range from 17% to 20%. The price the advertiser is willing to pay and the advert’s real value are randomly set with values between 0.2 and 1.2 dollars. In the same way, the CTR value of an advert is randomly set in the range [0, 1]. In experiment I, we used the following numbers of ANs: 10, 20, 30, 40 and 50. Therefore, five different configurations are tested, where each AN has 10 advertisers, 100 publishers and 150,000 user visits. Finally, each publisher’s page may belong to one of 20 different categories and an advert can only be displayed on pages of the same category.
In the first experiment, we compared the system performance when ANs collaborate with each other and when they operate independently, applying the well-known GSP auction method [29, 30] in both cases. We ran five configurations for the collaborative system and five for the independent system. The GSP selects the advert with the highest price, and the advertiser is charged the price of the second-highest advert. Our system is focused on a collaborative AdX, so it would make no sense to apply the penalties when ANs operate independently. Therefore, we do not use the GA here, since there are no weights to be optimized in the ASF. In this experiment, we compared the profits obtained in the independent and in the collaborative AdXs using the GSP method. The average values of 30 executions are shown in Table 1. The total income is $375,886.80 when ANs operate independently and $810,454.93 when they collaborate with each other. The collaborative total is therefore 215.61% of the independent total, that is, an increase of 115.61%. When ANs work independently, the AdX displays only those adverts that belong to the AN which the user is visiting. However, when ANs collaborate with each other, adverts from any AN can be displayed.
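As a minimal sketch of the GSP rule described above (highest bid wins, winner pays the second-highest bid), the following Python function handles a single advert slot. The function names and the sample bids are hypothetical, and charging a lone bidder its own bid is an assumption, since the text does not discuss that case. The second helper reproduces the income comparison from the two totals reported in the text.

```python
def gsp_select(bids):
    """Return (winner, price) for a single-slot GSP auction.

    bids maps an advertiser name to its bid price in dollars.
    """
    if not bids:
        raise ValueError("at least one bid is required")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner, top_bid = ranked[0]
    # Assumption: with a single bidder there is no second price, so the
    # winner pays its own bid (no reserve price is modelled here).
    price = ranked[1][1] if len(ranked) > 1 else top_bid
    return winner, price


def relative_increase(before, after):
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100.0


# Hypothetical bids, for illustration only.
winner, price = gsp_select({"adv_a": 0.90, "adv_b": 1.10, "adv_c": 0.40})

# Totals reported in the text for the independent and collaborative AdXs.
increase = relative_increase(375_886.80, 810_454.93)
```

Here `adv_b` wins and pays 0.90, the bid of `adv_a`, and `increase` evaluates to roughly 115.61.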
2 The GAF is a .NET/Mono assembly, freely available via NuGet, that allows a GA to be implemented in C# using only a few lines of code [28].
Table 1 Results of the GA and the GSP systems
No of ANs       10          20           30           40           50
Independent     25,149.36   50,039.76    75,402.54    100,097.97   125,197.18
Collaborative   55,811.83   110,588.53   164,773.42   216,562.86   262,718.30
Fig. 5 Experiment I: Obtained profits by the Independent and the Collaborative systems using five configurations
If the AdX can choose an advertiser out of several networks instead of only one, the results will be much better. As can be seen in Fig. 5, the obtained profit when ANs collaborate is much higher than when they do not.
4.2 Experiment I
In the second experiment, we configured the GA with the following settings. We set the coefficient associated with each penalty as follows: x1 = x2 = x3 = x4 = x5 = 0.5. Assigning the same value to all coefficients allows us to see more clearly the relative importance of each objective. To choose the crossover and mutation probabilities, we evaluated every combination of the two; the fitness values in Table 2 are the average of ten different executions for each probability combination. As shown in Table 2, the best combination consists of a crossover probability of 0.7 and a mutation probability of 0.2. Once we had found the best combination, we executed the algorithm 30 times and calculated the average. The results are shown in Table 3. The optimal values of the weights in the first configuration, which uses 10 ANs, for the best fitness function are shown in Fig. 7. We have ordered the variables in descending order of importance. As shown in Fig. 6 and in Table 3, the performance of the GSP system is worse than the performance of our GA system.
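To make the roles of the crossover and mutation probabilities concrete, here is a small self-contained GA sketch in Python. It is not the GAF package used by the authors: tournament selection, single-point crossover, per-gene mutation, elitism and the population size are all assumptions chosen for brevity, and `toy_fitness` is a hypothetical stand-in for the simulation-based fitness (income minus penalties) used in the paper.

```python
import random


def tournament(population, fitness, rng):
    """Pick the fitter of two randomly chosen individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def run_ga(fitness, n_genes=5, pop_size=30, generations=50,
           crossover_prob=0.7, mutation_prob=0.2, seed=0):
    """Maximise `fitness` over weight vectors with genes in [0, 1)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: the best individual survives
        while len(next_population) < pop_size:
            parent_a = tournament(population, fitness, rng)
            parent_b = tournament(population, fitness, rng)
            if rng.random() < crossover_prob:
                cut = rng.randrange(1, n_genes)  # single-point crossover
                child = parent_a[:cut] + parent_b[cut:]
            else:
                child = parent_a[:]
            # Each gene is redrawn from [0, 1) with probability mutation_prob.
            child = [rng.random() if rng.random() < mutation_prob else gene
                     for gene in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(weights):
    """Hypothetical fitness: closeness to an arbitrary target vector."""
    target = [0.9, 0.1, 0.5, 1.0, 0.5]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


best_weights, best_score = run_ga(toy_fitness)
```

A grid such as Table 2 would be produced by calling `run_ga` for each (crossover, mutation) pair, averaging the returned fitness over several seeds, and keeping the pair with the highest average.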
Table 2 Fitness value for all possible crossover and mutation probability value combinations with 0.1 increments ranging from 0.1 to 1. These values are the average value of 10 executions
                 Crossover prob.
Mutation prob.   0.1       0.2       0.3       0.4       0.5       0.6        0.7        0.8       0.9       1
0.1              9,920.4   9,971.5   9,711.7   9,997.3   9,783.6   9,763.8    10,018.1   9,893.0   9,750.1   9,898.5
0.2              5,016.2   9,538.9   9,761.9   9,753.2   9,737.0   10,012.7   10,032.3   9,532.8   9,775.0   9,785.7
0.3              9,509.8   9,757.2   9,810.6   9,630.8   9,804.0   9,808.1    9,493.8    9,803.0   9,693.1   9,606.2
0.4              9,761.3   9,819.3   9,756.6   9,920.3   9,687.9   9,547.6    9,844.0    9,443.6   9,549.6   9,755.2
0.5              9,828.0   9,561.0   9,625.4   9,454.0   9,633.1   9,710.0    9,743.5    9,873.1   9,365.4   9,629.7
0.6              9,717.2   9,813.5   9,310.7   9,730.9   9,430.4   9,929.8    9,761.7    9,525.6   9,436.9   9,671.4
0.7              9,507.1   9,604.4   9,569.9   9,691.2   9,565.6   9,490.1    9,532.3    9,878.3   9,297.7   9,255.0
0.8              9,932.8   9,776.1   9,212.0   9,417.7   9,513.3   9,724.2    9,738.0    9,312.8   9,410.1   9,825.9
0.9              9,681.5   9,383.4   9,490.5   9,732.4   9,708.5   9,691.3    9,755.8    9,454.7   9,534.1   9,532.3
1                9,609.7   9,479.9   9,788.1   9,716.4   9,630.7   9,609.4    9,977.5    9,383.0   9,893.3   9,947.2
Table 3 Average of the GA and the GSP systems for the five configurations
No of ANs   10           20           30           40            50
GA          10,146.59    19,188.95    30,861.78    41,587.55     50,167.97
GSP         –26,727.29   –41,331.81   –63,645.60   –100,379.85   –124,853.94

Fig. 6 Experiment I: comparison between GSP system and our GA system
Fig. 7 Experiment I: best weight configuration using 10 ANs. Each weight has a value lying within the range [0, 1] and indicating the importance of each of the objectives. We have also represented the average of all the variables with a dashed red line
This is because the GSP system does not take into account any objective defined for the AdX other than economic performance, and therefore the penalizations are very high. This suggests that our system is of interest to those networks that want all involved parties to be satisfied and want an ecosystem with little fraud. As can be observed, weights θ4 and θ1 are the most important. We have to keep in mind that the metric used in the fitness function is defined in economic terms. The weight θ4 is associated with Campaign Cost, which indicates whether an advertiser's campaign was priced above the market price. If those advertisers who are willing to pay more money for an advert were to leave the platform, the income would fall dramatically. On the other hand, θ1 regulates the Network Satisfaction variable, which describes the network's satisfaction with respect to the number of visits received and delivered. If a network leaves the AdX, all publishers and all advertisers who belong to this network will be lost, and so the costs would be very large. θ6 is the weight associated with the variable Ad Value, which represents the advertisement value. It is logical that it should have a high value, because when more profitable adverts are selected, the ANs' income increases. The weights θ3, θ5 and θ2 reflect the values associated with fraud. θ3 is associated with the Spam Adverts variable, which indicates the probability that an advertisement is spam. θ5 is associated with Fraud Publisher, which indicates whether a publisher is fraudulent. Displaying spam adverts and receiving fraudulent clicks both have a negative impact on the AdX, which is why the values of these two weights are similar. If we were to increase the value of these weights, we would have to increase the coefficient θ2 associated with this penalty to 2 or 3 times the amount of money obtained through fraud, instead of just 0.5 times.
Finally, weight θ2 is associated with Advertiser Satisfaction, which indicates the satisfaction of an advertiser with respect to the number of adverts displayed. This weight usually has a value close to zero, which leads us to think that it is of almost no importance, since it is already automatically defined with weight θ4. This means that, if the ANs are balanced, it is likely that the number of adverts posted by the publishers will also lie within the objective set.
Fig. 8 Experiment II: Best weight configuration changing the coefficients of the penalties
Table 4 Values of the genetic algorithm in experiment II
Max value   Avg value   Min value   Std. dev.
5,968.59    5,756.76    5,304.92    153.32
4.3 Experiment II
Recalling the results of Experiment I, θ2 was the least important weight in the optimization of the weights of each objective. In the following experiment, we increase the penalty coefficient associated with this objective to 3 in order to verify that the GA is able to adapt to such changes. The experiment keeps the same configuration as Experiment I except for the penalties' coefficients, so that we can see how the weight values are readjusted. The coefficients are changed as follows: x1 = x3 = x4 = x5 = 0.5, while x2 = 3, which is the coefficient associated with θ2. The rest of the parameters remain the same as in the configuration of Experiment I. The average value of the 30 executions is 5,756.76. The results of the experiment can be seen in Table 4. Figure 8 shows the best weight configuration, that is, the one with the highest fitness value. As shown, the most important weight is θ2, which represents the advertiser's satisfaction. The weights θ1, θ3, θ4 and θ5 keep the same relative order that they had in Fig. 7. This is expected, since only one coefficient was changed. The conclusion is simple: if we change the coefficients of the penalties, the values of the weights also change, so that the advert selection formula is optimized again.
5 Conclusions and Future Work
Our work addresses a problem which, although not much studied in the literature, is no less important. To our knowledge, no other publication focuses on creating a system for small networks to exchange adverts among themselves in order to improve their performance. We must bear in mind that the majority of ANs do not reveal their algorithms and methods, since that would mean giving away part of their competitive advantage, which may have taken them many years of research. In this article, we have seen how to select an advert in an AdX system. Selecting an advert is not a trivial task but a complex one that must take into account multiple objectives, often with conflicting interests, each associated with a weight to be optimized. One of the main achievements of this work is to provide a starting point from which an AdX system can be built, taking into account the main threats and problems of online advertising. In addition, a methodology was developed to find the appropriate weights for a function that considers all the objectives needed to create a proper AdX ecosystem. Our goal was not to develop a methodology to improve CTR prediction or fraud detection, but one that helps to obtain the best advert selection function, assuming that the CTR and fraud detection modules were correctly developed. Obviously, the more reliable and precise the modules that provide data are, the greater the system's performance will be. We have seen that the optimum weights for the advert selection module vary depending on the goals, the penalties, the number of advertisers and campaigns, and the settings of everything that composes the AdX. Therefore, there is no optimal configuration that can be extended to all systems. Studying the optimal value for each optimization would be an interesting line for future work.
These values could be found by constructing complex simulated systems and testing them in a real scenario. As a future line of research, we might also attempt to include both the CPM payment model and the CPA payment model to the AdX. Furthermore, we may be able to develop new modules that enable ANs to cooperate among themselves with the aim of improving fraud detection. For this purpose, they could interchange information such as the CTR of the page, the CTR of the adverts or the behavioural patterns of the users. This could be done by collecting samples of behaviour for later analysis using models of machine learning. The more the samples, the greater their quality, so that more accurate models can be built. Further research could also involve developing a scalable system, i.e., instead of building a system of 10 networks with 10 advertisers and 100 publishers in each network, we could develop a system with 1,000 networks comprising 10,000 advertisers and 100,000 publishers. However, this would require better hardware and more computers working in parallel. Furthermore, to carry out this system, we could consider replicating the advert exchange system by using a distributed rather than a centralized architecture. These modules should be synchronized with the ongoing exchange of information within the networks, so that the variables are updated and they can optimize their response
time for each user. In order to do this, a communication protocol between the different Advert Exchange Systems will be required. This protocol will transfer the necessary information within the system in order to optimize the economic profits of the system, avoid fraud, and finally maintain the level of satisfaction of all the parties involved in the system.
References
1. Ren K, Qin J, Zheng L, Yang Z, Zhang W, Yu Y (2019) Deep landscape forecasting for real-time bidding advertising. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 363–372
2. Yuan S, Wang J, Zhao X (2013) Real-time bidding for online advertising: measurement and analysis. In: Proceedings of the seventh international workshop on data mining for online advertising, p 3. ACM
3. Adikari S, Dutta K (2015) Real time bidding in online digital advertisement. In: New horizons in design science: broadening the research agenda, pp 19–38. Springer
4. Qing C, Feng-Shan B, Bin G, Tie-Yan L (2015) Global optimization for advertisement selection in sponsored search. J Comput Sci Technol 30(2):295–310
5. Wei Y, Baichun X, Wu L (2020) Learning and pricing models for repeated generalized second-price auction in search advertising. Eur J Operat Res 282(2):696–711
6. Balseiro SR, Feldman J, Mirrokni V, Muthukrishnan S (2014) Yield optimization of display advertising with ad exchange. Manage Sci 60(12):2886–2907
7. Le Q, Miralles-Pechuán L, Kulkarni S, Su J, Boydell O (2020) An overview of deep learning in industry. Data Anal AI, pp 65–98
8. Cavallo R, Mcafee RP, Vassilvitskii S (2015) Display advertising auctions with arbitrage. ACM Trans Econ Comput 3(3):15
9. Perlich C, Dalessandro B, Raeder T, Stitelman O, Provost F (2014) Machine learning for targeted display advertising: transfer learning in action. Mach Learn 95(1):103–127
10. Benjamin E, Michael O, Michael S (2005) Internet advertising and the generalized second price auction: selling billions of dollars worth of keywords. Technical report, National Bureau of Economic Research
11. Yuan S, Wang J, Chen B, Mason P, Seljan S (2014) An empirical study of reserve price optimisation in real-time bidding. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining
12. Korula N, Mirrokni V, Nazerzadeh H (2015) Optimizing display advertising markets: challenges and directions. Available at SSRN 2623163
13. Minastireanu E-A, Mesnita G (2019) Light gbm machine learning algorithm to online click fraud detection. J Inf Assur Cybersecur
14. Ktena SI, Tejani A, Theis L, Myana PK, Dilipkumar D, Huszár F, Yoo S, Shi W (2019) Addressing delayed feedback for continuous training with neural networks in CTR prediction. In: Proceedings of the 13th ACM conference on recommender systems, pp 187–195
15. Zhong S-H, Liu Y, Liu Y (2011) Bilinear deep learning for image classification. In: Proceedings of the 19th ACM international conference on multimedia, pp 343–352. ACM
16. Miralles-Pechuán L, Jiménez F, Ponce H, Martínez-Villaseñor L (2020) A methodology based on deep q-learning/genetic algorithms for optimizing covid-19 pandemic government actions. In: Proceedings of the 29th ACM international conference on information & knowledge management, CIKM ’20, pp 1135–1144, New York, NY, USA. Association for Computing Machinery
17. Miralles-Pechuán L, Ponce H, Martínez-Villaseñor L (2020) Optimization of the containment levels for the reopening of Mexico city due to Covid-19. IEEE Latin Amer Trans 100(1e)
18. Miralles-Pechuán L, Rosso D, Jiménez F, García JM (2017) A methodology based on deep learning for advert value calculation in cpm, cpc and cpa networks. Soft Comput 21(3):651–665
19. Zarras A, Kapravelos A, Stringhini G, Holz T, Kruegel C, Vigna G (2014) The dark alleys of madison avenue: understanding malicious advertisements. In: Proceedings of the 2014 conference on internet measurement conference, pp 373–380. ACM
20. Nir K, Jeffrey V (2019) Online advertising fraud. Computer 52(1):58–61
21. Zhang L, Guan Y (2008) Detecting click fraud in pay-per-click streams of online advertising networks. In: The 28th international conference on distributed computing systems, ICDCS’08, pp 77–84. IEEE
22. Cui Y, Zhang R, Li W, Mao J (2011) Bid landscape forecasting in online ad exchange marketplace. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 265–273. ACM
23. Ellis-Chadwick F, Doherty NF (2012) Web advertising: the role of e-mail marketing. J Bus Res 65(6):843–848
24. Blizard T, Livic N (2012) Click-fraud monetizing malware: a survey and case study. In: 2012 7th international conference on malicious and unwanted software (MALWARE), pp 67–72. IEEE
25. Daswani N, Mysen C, Rao V, Weis S, Gharachorloo K, Ghosemajumder S (2008) Online advertising fraud. Crimeware: understanding new attacks and defenses
26. Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison Wesley, p 102
27. Bäck T (1994) Selective pressure in evolutionary algorithms: a characterization of selection mechanisms. In: Proceedings of the first IEEE conference on evolutionary computation, pp 57–62. IEEE Press
28. Newcombe J (2015) Genetic algorithm framework. Online; Accessed 05 Dec 2015
29. Chen FY, Chen J, Xiao Y (2007) Optimal control of selling channels for an online retailer with cost-per-click payments and seasonal products. Product Operat Manage 16(3):292–305
30. Benjamin E, Michael O, Michael S (2007) Internet advertising and the generalized second-price auction: selling billions of dollars worth of keywords. Amer Econ Rev 97(1):242–259
Low-Cost Fuzzy Control for Poultry Heating Systems
Gustavo Caiza, Cristhian Monta, Paulina Ayala, Javier Caceres, Carlos A. Garcia, and Marcelo V. Garcia
Abstract Technology evolves by leaps and bounds. This article describes the implementation of a low-cost prototype for fuzzy monitoring and control of the heating of poultry farms, using underfloor heating and solar energy. The monitoring system is based on freely distributed LAMP servers. Fuzzy control is implemented with restricted membership functions to keep heating in an optimal state. The sunlight available at Ecuador's geographical location makes solar energy an important renewable resource, which was used to heat a closed environment and thus create an ideal environment for poultry breeding.
Keywords Low-cost automation · Fuzzy control · Solar energy · LAMP server
G. Caiza (B) Universidad Politecnica Salesiana UPS, 170146 Quito, Ecuador e-mail: [email protected] C. Monta · P. Ayala · J. Caceres · C. A. Garcia · M. V. Garcia Universidad Tecnica de Ambato UTA, 180103 Ambato, Ecuador e-mail: [email protected] P. Ayala e-mail: [email protected] J. Caceres e-mail: [email protected] C. A. Garcia e-mail: [email protected] M. V. Garcia e-mail: [email protected]; [email protected] M. V. Garcia University of Basque Country UPV/EHU, 48013 Bilbao, Spain
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_9
1 Introduction
Poultry farming, like agriculture, is a fundamental field for the sustenance and socioeconomic development of the country. It is made up of a large production chain that supplies poultry feed (corn, soybeans and balanced products in general), and it also involves the transport sector, which moves the product to different places. The labour used in this field is directly and indirectly involved as well, since the quality of life of the inhabitants surrounding the production sector is enhanced. Breeding chickens in poultry farms requires a properly monitored and controlled environment that allows the fowl to develop adequately and increases production at the lowest possible cost. The main factors to control are temperature and humidity, which affect the assimilation of food and the energy consumption of the fowl. In recent years, several studies have been presented on the automation and monitoring of humidity, temperature and other physical variables in poultry farming. Most of these investigations have limited themselves to implementing an automation system that controls temperature and humidity, with little emphasis on the control model [4]; as a result, a time of 400 s is required for the controlled values to settle at the set point. Other studies apply the sliding mode controller, which allows a non-linear dynamic mathematical model to be developed considering the final balance and the mass of the system [6], where the temperature overshoot with respect to the set point is greater than that of humidity. Fuzzy logic has been used to develop optimization models for poultry systems because it allows the correct combination of energy resources to be selected more effectively, taking into account conflicting criteria such as costs, availability of supply and type of energy (Ahamed et al. [1]).
Models based on fuzzy logic have achieved more realistic solutions from renewable energy sources. Fuzzy logic also makes it feasible to conceptualize the uncertainty of the system into a neat, quantifiable parameter. Therefore, models based on fuzzy logic can be adopted so that planning and heating control in poultry farming reach practical solutions [5]. This project proposes the use of Arduino, an open source electronics platform, and Raspbian, a Linux distribution, together with sensors and actuators for process control and the sun as a renewable energy source to drive heat through the poultry system. The data are then stored in a database mounted on LAMP servers using the Raspberry Pi 3 microcomputer. As the last stage, once the data have been obtained, they are displayed via wireless connection to the users registered in the system. The article is structured as follows: Sect. 2 presents the case study used for the research method; Sect. 3 develops the idea with the implementation of fuzzy control with restrictions in low-cost systems; the results obtained are presented in Sect. 4; finally, conclusions and possible future work are presented in Sect. 5.
2 Case Study
In a poultry farm, it is very important to control the environmental parameters, because they directly influence the growth and development of the fowl. For this reason, depending on the fowl's age, a temperature within a range of 25–30 °C and 50% humidity are recommended, which guarantees an ideal environment for the fowl and thus a final product of very good quality that complies with the established norms and standards [5]. This study presents a system for controlling and monitoring various physical parameters of a poultry farm using fuzzy logic with restrictions, together with the Arduino open source development platform and the Raspbian operating system, sensors and actuators for process control, and the sun as a renewable energy source to drive heat through the system. Monitoring is provided by a LAMP server. Integrating all the devices presented in the three sections of Fig. 1 achieved the following. First, real temperature and humidity data are obtained with the DS18B20 and DHT11 sensors. These data are then sent to the microcontroller, which performs two processes: the control process, which uses fuzzy logic through the Mamdani model with restrictions to adapt the temperature and humidity signals to the values that are ideal for the growth and development of the fowl in the sheds, and the storage of the data in the database mounted on LAMP servers using the Raspberry Pi 3. As the last stage, the stored data are displayed via wireless connection to the users registered in the system.
Fig. 1 Design of the prototype of the control and monitoring system
3 Solution Proposal
For the development of this study, it is first necessary to determine the appropriate values for each of the variables to be controlled: temperature, humidity and ventilation. The temperature at the time of vatting the bird should be 37.6 °C. During the first two weeks of life, the fowl cannot control its body temperature; this phenomenon is called “poikilothermia”. After this time, fowls have developed the ability to regulate their temperature, a process called “homeothermy”, and as they continue to grow they need lower temperature values. Humidity is the saturation of water with respect to the air at a given temperature, expressed as a percentage. The humidity range directly affects the fowl's ability to cool down through panting and directly influences ammonia production inside hatchery environments, which in turn affects growth (see Table 1). Ventilation is another controlled variable. It relies on systems that operate by depression, so called because the pressure inside the poultry farm is kept lower than outside, creating a partial vacuum inside the poultry farm. To reach this point, both the air intake and the extraction at the windows must be controlled precisely, so that air intake and air outlet are balanced. As with the previous variables, the fowls need a certain net airflow (m³/h), as shown in Table 2.
Table 1 Temperature and humidity values by age of the fowl
Fowl's age         Temperature (°C)   Relative humidity (%)
1st–2nd day        30–32              35–40
3rd–7th days       29–30              35–45
2nd week           27–29              40–45
3rd week           25–27              45–50
4th week           23–25              50–55
5th week onwards   21–23              55–60

Table 2 Ventilation by depression in poultry sheds
Age (days)   Weight (kg)   Airflow (m³/h/fowl)
7            0.189         0.22
42           2.2           9.8
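The age-dependent set points of Table 1 lend themselves to a simple lookup. The sketch below is illustrative only: the function and constant names are hypothetical, and mapping "2nd week" to days 8–14, "3rd week" to days 15–21 and so on is an assumption about how the table's age bands translate into days.

```python
# (last day of the stage, temperature range in °C, relative humidity range in %)
SETPOINTS = [
    (2, (30, 32), (35, 40)),   # 1st-2nd day
    (7, (29, 30), (35, 45)),   # 3rd-7th days
    (14, (27, 29), (40, 45)),  # 2nd week (assumed days 8-14)
    (21, (25, 27), (45, 50)),  # 3rd week (assumed days 15-21)
    (28, (23, 25), (50, 55)),  # 4th week (assumed days 22-28)
]
FROM_WEEK_5 = ((21, 23), (55, 60))  # 5th week onwards


def setpoint_for_age(age_days):
    """Return (temperature range, humidity range) for a fowl's age in days."""
    if age_days < 1:
        raise ValueError("age must be at least 1 day")
    for last_day, temperature, humidity in SETPOINTS:
        if age_days <= last_day:
            return temperature, humidity
    return FROM_WEEK_5


temperature_range, humidity_range = setpoint_for_age(10)
```

For a 10-day-old fowl this returns the second-week row of Table 1, that is, 27–29 °C and 40–45% relative humidity.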
3.1 Fuzzy Inference Based on Linear Programming
Fuzzy linear programming proposes that the fuzzy model is equivalent to a classical linear programming maximization. It can be presented as $c^{T}x \gtrsim z$, $Ax \lesssim b$ and $x \geq 0$, where $\gtrsim$ is the fuzzy version of $\geq$, which means “essentially greater than or equal to”, and $\lesssim$ means “essentially less than or equal to”. The fuzzy constraints of the model are shown below in Eqs. (1)–(4)

\[ \lambda p_i + t_i \leq p_i; \quad i = 1, \ldots, m + 1 \tag{1} \]

\[ B_i x - t_i \leq d_i \tag{2} \]

\[ t_i \leq p_i \tag{3} \]

\[ x, t \geq 0 \tag{4} \]
where $\lambda$ is a newly introduced variable that can take values between 0 and 1; $p_i$ is the tolerance interval of row $i$; $t_i$ is a variable that measures the degree of violation of constraint $i$; $B$ is obtained by combining $-c$ (from the objective function) and $A$ (from the constraints), and $B_i$ is row $i$ of $B$; $d$ is obtained by combining $-z$ and $b$, and $d_i$ is element $i$ of $d$ [3]. To implement the control with restrictions, the Mamdani-type inference system was chosen because it supports working with two or more outputs at the same time. This type of fuzzy inference was applied to the control of temperature and humidity using MATLAB and its Fuzzy Logic Toolbox, which is used to develop fuzzy systems through a graphical user interface (GUI) [2]. For the analysis of the fuzzy inference system, the Fuzzy Logic Toolbox (FLT) provides five graphical tools for modelling, editing and observation. The Fuzzy Inference System editor allows the inference method to be selected. For this study, the Mamdani-type inference method was used, with temperature and humidity as input variables and motor time and fan bits as output variables. The restrictions used are explained in the following sections. For the membership functions, the minimum and maximum values of both the input and the output variables were analysed, as shown in Table 3, and then five membership functions were defined for each variable. A trapezoidal function was used at each of the extremes, since some parameters acquire non-finite values, and the mean value was centred at 0. To improve precision, triangular functions are used elsewhere, since they are linear functions and simplify the calculations performed by the controller.
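The two membership shapes mentioned above, and the centring of a variable on its mean, can be sketched in a few lines of Python. This is an illustration rather than the MATLAB Fuzzy Logic Toolbox implementation used in the study; the function names are hypothetical, and the breakpoints in the usage lines are made-up values chosen only to exercise the functions. The open-ended extremes are modelled by passing an infinite foot to the trapezoid.

```python
import math


def triangular(x, a, b, c):
    """Triangle with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoid with feet a, d and plateau [b, c]; a or d may be infinite."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def centre(value, minimum, maximum):
    """Shift a reading so that the middle of [minimum, maximum] maps to 0."""
    return value - (minimum + maximum) / 2.0


# Temperature limits from Table 3: the mean (26.595 °C) maps to 0.
error = centre(32.69, 20.5, 32.69)

# Hypothetical breakpoints, for illustration only.
peak = triangular(0.0, -2.0, 0.0, 2.0)
shoulder = trapezoidal(-3.0, -math.inf, -math.inf, -4.0, -2.0)
```

With the Table 3 temperature limits, `error` is about 6.095, which matches the "maximum reference value" column of that table.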
Table 3 Maximum and minimum values of variables
Variable      Maximum value   Minimum value   Mean value   Maximum reference value   Minimum reference value   Mean reference value
Temperature   32.69 °C        20.5 °C         26.595 °C    6.095 °C                  –6.095 °C                 0 °C
Humidity      60%             35%             47.5%        12.5%                     –12.5%                    0%
Motor time    25 s            0 s             12.5 s       12.5 s                    –12.5 s                   0 s
Fan bits      1023 bits       500 bits        761.5 bits   261.5 bits                –261.5 bits               0 bits

Based on reviewed information, a fuzzy linear programming (PLEM) proposal has been developed, by applying uncertainty in the temperature and humidity demand restrictions. The following fuzzy parameters are used to represent the uncertainties of demands for temperature and humidity in the PLEM model. Fuzzy parameters:
– $\lambda_1$: Degree of temperature satisfaction.
– $\lambda_2$: Degree of humidity satisfaction.
– $\tilde{T}$: Maximum tolerance for temperature demand.
– $\tilde{H}$: Maximum tolerance for humidity demand.
Other parameters, based on the energy consumption of the actuators of the poultry house, are:
– $fe_{pd}$: Energy flow [Wh/day] between points p and d; p = 1, …, P; d ∈ $Q_p$.
– $fp_{pd}$: Power flow [W] between points p and d; p = 1, …, P; d ∈ $Q_p$.
– $ES_s$: Energy generated [Wh/day] by a panel of type s; s = 1, …, S.
– $ED_p$: Energy demand [Wh/day] of point p; p = 1, …, P.
– $PD_p$: Power demand [W] of point p; p = 1, …, P.
– $PI_i$: Maximum power [W] of an inverter of type i; i = 1, …, I.

Fig. 2 Membership functions for the temperature variable
Uncertainty in parameters such as energy consumption and the temperature and humidity set points is managed by representing these values through membership functions, as shown in Fig. 2. Figure 2 shows the membership functions of each of the parameters that are made fuzzy: temperature, humidity, energy demand and power consumption of the actuators. The right side of the figure shows each section of the function mathematically. The fuzzy parameters are integrated into the demand restrictions for temperature, humidity, energy and power. The new fuzzy restrictions for energy (5) and power (6) are shown below.

\[ \sum_{q=1 \mid p \in Q_q}^{P} fe_{qp} + \sum_{s=1}^{S} ES_s \cdot xs_{ps} \geq \tilde{E}(\lambda_1 - 1) + ED_p + \sum_{d \in Q_p} fe_{pd}; \quad p = 1, \ldots, P \tag{5} \]

\[ \sum_{q=1 \mid p \in Q_q}^{P} fp_{qp} + \sum_{i=1}^{I} PI_i \cdot xi_{pi} \geq \tilde{P}(\lambda_2 - 1) + PD_p + \sum_{d \in Q_p} fp_{pd}; \quad p = 1, \ldots, P \tag{6} \]

The data used for the temperature variable were the highest value (32 °C) and the lowest value (20 °C), as indicated in Table 3, which are the boundary temperatures to control. Taking the EFP function as an example, the lower value is p = –4, the upper value d = 0 and the modal value m = 2. For the membership functions of the Motor Time variable, we used the time needed to take the temperature from the lowest value (20 °C) to the highest value (32 °C), which was 25 s from the start of the motor that circulates the water flow. As in the previous case, the mean value was centred at 0. Taking the TFP function as an example, the lower value is p = –8.34, the upper value d = 0 and the modal value m = 4.17. For the humidity input variable, the difference between the highest value (60%) and the lowest value (35%) gives the range of humidity percentage to be controlled. Taking the HNP function as an example, the lower value is p = –8.34, the upper value d = 0 and the modal value m = 4.17. The data that control the fan variable for the membership functions are given in bits. For this case, the microcontroller's operating resolution is taken into account, taking its lowest value equal to 0 and the highest one equal to 255 bits. Taking the BNP function as an example, the lower value is p = –174.33, the upper value d = 0 and the modal value m = 87.16.
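The magnitudes quoted for the example membership functions coincide with one sixth (modal value) and one third (lower value) of each controlled range. The text does not state this rule, so dividing the range by six is an assumption inferred from the numbers, and the helper name below is hypothetical. The temperature limits are the rounded 20 °C and 32 °C used in the text, and the fan range is taken from Table 3 (500 to 1023 bits), since that range reproduces the quoted values.

```python
def breakpoint_step(minimum, maximum, divisions=6):
    """Spacing between membership breakpoints (assumed: range / 6)."""
    return (maximum - minimum) / divisions


# Controlled ranges: rounded temperature limits from the text, the rest
# from Table 3.
ranges = {
    "temperature": (20.0, 32.0),   # °C
    "motor_time": (0.0, 25.0),     # s
    "humidity": (35.0, 60.0),      # %
    "fan_bits": (500.0, 1023.0),   # bits
}

steps = {name: breakpoint_step(low, high)
         for name, (low, high) in ranges.items()}
double_steps = {name: 2.0 * step for name, step in steps.items()}
```

Under this assumption `steps` gives 2 for temperature, about 4.17 for motor time and humidity, and about 87.17 for the fan, while `double_steps` gives 4, about 8.33 and about 174.33, which agree with the magnitudes of m and p quoted above up to rounding.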
3.2 LAMP Server
The developed system monitors the controlled variables in real time, for which a LAMP server was developed. The server uses HTTP (Hypertext Transfer Protocol), an application-layer protocol of the TCP/IP model, to deliver web pages from the web server to the user's browser (see Fig. 3). A MySQL server, an open-source relational database management system, is also used. It is fundamental to the project because it stores the data of the controlled variables and organizes them in tables, which can be related to one another as the user requires. The tables are created with SQL (Structured Query Language), which gives direct instructions to the server.
G. Caiza et al.
Fig. 3 Website developed to monitor process
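The storage of readings described in Sect. 3.2 can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the MySQL server, and the table name `readings`, its columns, the helper functions, and the sample values are all hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite is used here instead of the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE readings (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        taken_at TEXT NOT NULL,          -- ISO 8601 timestamp
        temperature_c REAL NOT NULL,     -- measured temperature
        humidity_pct REAL NOT NULL,      -- measured relative humidity
        setpoint_temp_c REAL NOT NULL    -- temperature setpoint at that time
    )
    """
)


def store_reading(taken_at, temperature_c, humidity_pct, setpoint_temp_c):
    """Insert one reading; the connection context manager commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO readings "
            "(taken_at, temperature_c, humidity_pct, setpoint_temp_c) "
            "VALUES (?, ?, ?, ?)",
            (taken_at, temperature_c, humidity_pct, setpoint_temp_c),
        )


def max_abs_temperature_deviation():
    """Largest absolute difference between measurement and setpoint."""
    row = conn.execute(
        "SELECT MAX(ABS(temperature_c - setpoint_temp_c)) FROM readings"
    ).fetchone()
    return row[0]


# Hypothetical sample readings (not data from the paper).
store_reading("2021-01-01T00:00:00", 24.05, 50.0, 24.0)
store_reading("2021-01-01T01:00:00", 23.96, 51.0, 24.0)
print(max_abs_temperature_deviation())
```

Keeping the setpoint next to each measurement makes it straightforward to compute deviations with a single query.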
4 Results
To test the system, different temperature and humidity values recommended for the fowl's age were used as setpoints. The charts show the response of the system, each one for a different temperature and humidity value within the established ranges. Data were taken every hour, 24 h a day, for 21 days with different setpoints and were stored in the database. For the values recommended from the third week of age onwards, the temperature, which ranges from 21 °C to 27 °C, deviates by ± 0.06 °C from the reference value, and the humidity, which ranges from 45 to 60%, deviates by ± 0.5% to ± 1.5% at different times of the day (see Figs. 4 and 5).
The results show that the system works correctly because the temperature and humidity of the poultry house are maintained during the 24 h. Although the reference values vary, the deviations are relatively low: they fluctuate between 0.06 °C and 0.09 °C for temperature and between 0.5 and 1.5% for humidity.
Fig. 4 SP versus temperature
Fig. 5 SP versus humidity
5 Conclusions
This work presents the implementation of a fuzzy control system with restrictions using the Mamdani inference method, which allowed two input variables and two output variables to be controlled. For this, it was necessary to know the behaviour of the plant and thus establish the membership functions and fuzzy control rules, where the
outcome of the set of fuzzy rules was a single value obtained by defuzzification using the centroid method. The use of fuzzy control makes the system efficient, versatile, and simple, as the results show. The system also allows real-time monitoring of the processes carried out in the plant from anywhere in the world, using free hardware and software with a LAMP server (Linux, Apache, MySQL, PHP) that allows safe browsing and requires little maintenance. This considerably reduces costs and makes the system a low-cost automation solution. The underfloor heating system avoids the emission of greenhouse gases, which improves the quality of life of operators and poultry.
Towards Empowering Business Process Redesign with Sentiment Analysis
Selver Softic and Egon Lüftenegger

Abstract In this paper, we propose a novel approach to empowering Business Process Redesign (BPR) by applying sentiment analysis to comments collected during the redesign phase of business processes. For this purpose, we trained and tested our Sentiment Analysis Module (SAM) to prioritize and classify stakeholder comments as part of a software tool for BPMN-based modeling and annotation. The preliminary results with evaluation test cases seem promising regarding effective ranking and classification of improvement proposals on BPMN designs. However, the findings also leave room for improvement in the training data and in extending the tool with social BPMN functionality.

Keywords Business process redesign · Business process management · Sentiment analysis · Decision support
S. Softic (B) · E. Lüftenegger
CAMPUS 02 University of Applied Sciences, IT & Business Informatics, Graz, Austria

1 Introduction
The quantity and complexity of business processes that needed to be integrated led to the creation of Business Process Management (BPM). BPM represents a structured, consistent, and coherent approach to understanding, modeling, enacting, analyzing, documenting, and changing business processes in order to contribute to business performance [1, 10, 13]. BPM provides concepts, methods, techniques, and tools that cover all aspects of managing a process (plan, organize, monitor, control) as well as its actual execution [1]. Traditional BPM methodologies often follow a top-down decomposition approach, resulting in a long-running improvement process that requires intensive
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_10
negotiations for achieving change within the BPM lifecycle. This traditional process improvement approach can become problematic for companies because of unpredictable market conditions. Changing customer needs and preferences require fast changes in a business process model. Hence, there is a need for an agile approach to reacting to the changing business landscape. One possible form of empowerment is to use advanced technologies such as artificial intelligence and machine learning, and methods such as sentiment analysis, to analyze the opinions and insights of the different stakeholders in the BPM process quickly and efficiently. In this paper, we consider such a case by integrating a sentiment analysis module into a conventional design process scenario and using it as an empowering assistant for prioritizing redesign suggestions and comments on the process.
2 Related Work

2.1 Business Process Redesign
Business Process Redesign (BPR), or Business Process Re-engineering, aims at improving vital aspects of business processes in order to achieve a specific goal, e.g., reducing costs. The importance of BPR was initially outlined by the work of Davenport and Short [4] in the early 1990s. However, this wave of enthusiasm flattened out by the end of the decade. The main reasons reported in the literature were misuse of the concept (false labeling of projects as BPR projects), immaturity of the necessary tools, and an approach that was too intensive regarding the phase of application. According to [5], the revival of the BPR concept happened in relation to BPM, when several studies appeared showing that organizations that were more process oriented performed better than those that did not follow this paradigm. Subsequent studies confirmed these findings, which gave new credibility to process thinking. In this context, BPR is seen as a set of tools that can be used within BPM.
2.2 Business Process Modeling
The overall goal of Business Process Modeling is to establish a common perspective on and understanding of a business process within an enterprise among the relevant stakeholders. A graphical representation such as a flowchart serves as the basis for showing the process steps and workflows. This approach is widely used to recognize and prevent potential weaknesses and to implement improvements in companies' processes, as well as to offer a good basis for a comprehensive understanding of a process in general.
2.3 BPMN
BPMN 2.0 (Object Management Group, 2011) is a standard for business process specification developed by a variety of Business Process Modeling (BPM) tool vendors. This standard is one of the most important forms of representing business process models, offering clear and simple semantics to describe the business processes of a business [2, 12]. The language was developed with the intention of modeling typical business modeling activities [9, 10]. This is another important reason for choosing this notation, because our software-based methodology is oriented towards business alignment. The goal of the approach presented here is to provide entrepreneurs with a simple BPMN 2.0 tool without the complexity and cost of the enterprise software currently offered on the market.
2.4 Data Mining, Sentiment Analysis, and Opinion Mining in Business Processes
Data mining is used in the field of BPM for process mining. Process mining focuses on processes at run-time, more precisely on re-creating a business process from system logs. Opinion mining is a sub-discipline of data mining and computational linguistics for extracting, classifying, understanding, and assessing opinions. Sentiment analysis is often used in opinion mining to extract opinions expressed in text. However, current research focuses on e-business, e-commerce, and social media and social networks such as Twitter and Flickr rather than on BPM and BPR [3].
3 BPM Lifecycle and SentiProMo Tool
The BPM lifecycle described in [5] represents the different phases of the process, beginning with analysis and ending with process monitoring and controlling and process discovery. Our usage scenario in this lifecycle is placed between the process analysis and process redesign phases. During the design phase of the BPM lifecycle, social software adequately integrates the needs of all stakeholders [11]. We use our SentiProMo Tool [7] for this purpose to empower the (re)-design through the integration of stakeholders' needs expressed as opinions.
Fig. 1 Commenting workflow and the role of sentiment analysis
3.1 Using Sentiment Analysis in BPR
The SentiProMo Tool (see footnote 1) was developed in our department in order to enable role-based social intervention within business process (re)-design. The roles supported in this tool are based on prior research on a business process knowledge management framework [6]: Activity Performer (AP), Process Owner (PO), Process Designer (PD), Superior Decision Maker (SDM), and Customer (C). According to [11], BPM tools that follow the social BPM paradigm provide a mechanism to handle priorities within a business process. This also applies to the SentiProMo Tool (Fig. 1).
The architecture relies on three layers: the user interface layer, the modules layer, and the database layer. The user interface layer offers views on the results from the underlying layers. The database layer stores the comments on processes and handles the BPMN model repository. For our observation, we focus on the modules layer. Besides the process modeler and the business process repository module, the tool has a task commenting module, which allows task-wise comments to be added to a process from the perspective of different roles. To empower the commenting process, the Sentiment Analysis Module (SAM) runs in the background; it classifies the comments and assigns them a positive or negative sentiment using a real-valued score. The implementation of SAM is described in detail in the next section.
1 https://sites.google.com/view/sentipromo.
Table 1 Training data sets for SAM module
Source     # prelabeled instances
Amazon     1000
IMDB       1000
Yelp       1000
Twitter    648579
Total      651579
Table 2 Top 5 models explored for SAM module
Rank  Trainer                               Accuracy  AUC     AUPRC   F1-score  Duration
1     AveragedPerceptronBinary              0.8244    0.8963  0.8549  0.7678    46.3
2     SdcaLogisticRegressionBinary          0.8186    0.8907  0.8485  0.7585    39.8
3     LightGbmBinary                        0.8082    0.8810  0.8352  0.7373    273.1
4     SymbolicSgdLogisticRegressionBinary   0.8045    0.8754  0.8276  0.7321    38.9
5     LinearSvmBinary                       0.7930    0.8600  0.8106  0.6997    37.6
4 Sentiment Analysis Module (SAM)
The Sentiment Analysis Module (SAM) was implemented using ML.NET to classify comments in the English language. The SAM module uses supervised learning as the basis for comment classification. Table 1 gives an overview of the data sets that were used for training. The training data originate from the Sentiment Labelled Sentences data set from the UCI Machine Learning Repository (see footnote 2) and from the Sentiment140 data set from Stanford (see footnote 3). Training was performed with different numbers of iterations on different algorithms. The averaged perceptron binary classification model turned out to be the best choice in this case. As Table 2 shows, this model has the best AUC (Area Under the Curve) and the best values for the other relevant measures [8].
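To make the idea behind an averaged perceptron classifier more concrete, the following Python sketch implements the general technique from scratch. It is not the ML.NET trainer used for SAM: the bag-of-words features, the function names, and the four toy comments are assumptions made purely for illustration.

```python
from collections import defaultdict


def featurize(text):
    """Bag-of-words counts; a deliberately simple, assumed feature scheme."""
    counts = defaultdict(float)
    for token in text.lower().split():
        counts[token] += 1.0
    return counts


def train_averaged_perceptron(examples, epochs=5):
    """Train on (text, label) pairs with label in {+1, -1}.

    Returns the weights and bias averaged over every training step,
    which is what distinguishes the averaged from the plain perceptron.
    """
    weights = defaultdict(float)
    totals = defaultdict(float)    # running sum of the weights over all steps
    bias, bias_total, steps = 0.0, 0.0, 0
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            score = bias + sum(weights[t] * v for t, v in feats.items())
            if label * score <= 0:             # misclassified: update
                for t, v in feats.items():
                    weights[t] += label * v
                bias += label
            for t, w in weights.items():       # accumulate for averaging
                totals[t] += w
            bias_total += bias
            steps += 1
    averaged = {t: s / steps for t, s in totals.items()}
    return averaged, bias_total / steps


def predict(model, text):
    """Return +1 (positive sentiment) or -1 (negative sentiment)."""
    averaged, bias = model
    score = bias + sum(averaged.get(t, 0.0) * v for t, v in featurize(text).items())
    return 1 if score > 0 else -1


# Hypothetical toy comments (not taken from the training data sets).
toy_examples = [
    ("great helpful step", 1),
    ("slow confusing step", -1),
    ("great clear task", 1),
    ("confusing slow task", -1),
]
model = train_averaged_perceptron(toy_examples)
print(predict(model, "great task"), predict(model, "slow and confusing"))
```

Averaging the weights over all steps tends to make the final classifier less sensitive to the order of the training examples than the last weight vector alone.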
4.1 Evaluation
We evaluated the SAM module with an additional external data set (Twitter US Airline Sentiment, see footnote 4) containing review tweets on US airline companies in order to assess the results obtained through training. After data cleaning, the evaluation data
2 https://archive.ics.uci.edu/ml/datasets/Sentiment+Labelled+Sentence.
3 http://help.sentiment140.com/for-students/.
4 https://inclass.kaggle.com/crowdflower/twitter-airline-sentiment.
Table 3 Confusion matrix for test data set
                     True positive   True negative
Predicted positive   TP = 1782       FP = 1445
Predicted negative   FN = 581        TN = 7741

Table 4 Calculated measures on test data set
Measure       Value    Derivation
Sensitivity   0.7541   TPR = TP / (TP + FN)
Specificity   0.8427   SPC = TN / (FP + TN)
Accuracy      0.8246   ACC = (TP + TN) / (P + N)
F1 Score      0.6376   F1 = 2TP / (2TP + FP + FN)
set contained 11549 tweets. The results of classifying the data set with SAM are shown as a confusion matrix in Table 3 and as common information retrieval measures [8] in Table 4.
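The measures in Table 4 follow directly from the counts in Table 3. The short Python sketch below recomputes them as a check; the function name `classification_measures` is an assumption for illustration, while the four counts are the ones reported in Table 3.

```python
def classification_measures(tp, fp, fn, tn):
    """Compute the four measures of Table 4 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),            # TPR = TP / (TP + FN)
        "specificity": tn / (fp + tn),            # SPC = TN / (FP + TN)
        "accuracy": (tp + tn) / total,            # ACC = (TP + TN) / (P + N)
        "f1": 2 * tp / (2 * tp + fp + fn),        # F1 = 2TP / (2TP + FP + FN)
    }


# Counts reported in Table 3.
measures = classification_measures(tp=1782, fp=1445, fn=581, tn=7741)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

The four counts add up to the 11549 tweets of the evaluation set, and the rounded results agree with the values listed in Table 4.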
5 Preliminary Results
Each time the task commenting module is used to comment on a single task from a stakeholder's perspective, as shown in Fig. 2, the SAM module calculates the sentiment score for the given comment on the fly. Figure 3 presents the view that shows the processed sentiment analysis of the stakeholders' comments over all commented tasks within the SentiProMo tool. Each processed comment is presented as a row. Each row has the following columns, from left to right: the task identifier, the task name, the stakeholder category (one of the roles mentioned before), the comment made by the specific stakeholder, the calculated sentiment score as a positive or negative number, and a timestamp that registers when the corresponding stakeholder inserted the comment. Figure 4 shows the overall sentiment of the whole business process as computed by SentiProMo, reported as a negative score and a positive score. The software calculates these numbers by adding the negative and positive sentiment scores of all tasks.
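The aggregation described above, adding the negative and positive scores of all tasks, can be sketched in a few lines of Python. The tuple layout, the function name `aggregate_sentiment`, and the sample scores are assumptions for illustration and do not reproduce the SentiProMo implementation.

```python
from collections import defaultdict

# Hypothetical comments as (task identifier, stakeholder role, sentiment score).
comments = [
    ("T1", "PO", 0.5),
    ("T1", "C", -0.5),
    ("T2", "AP", -0.75),
    ("T2", "PD", 0.25),
]


def aggregate_sentiment(comments):
    """Sum positive and negative scores per task and for the whole process."""
    per_task = defaultdict(lambda: {"positive": 0.0, "negative": 0.0})
    for task_id, _role, score in comments:
        key = "positive" if score >= 0 else "negative"
        per_task[task_id][key] += score
    overall = {
        "positive": sum(t["positive"] for t in per_task.values()),
        "negative": sum(t["negative"] for t in per_task.values()),
    }
    return dict(per_task), overall


per_task, overall = aggregate_sentiment(comments)
print(per_task)
print(overall)
```

Keeping the positive and negative totals separate, rather than netting them, preserves the information that a task attracted both praise and criticism.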
Fig. 2 Adding and classifying task-wise comments in SentiProMo Tool
Fig. 3 Sentiment analysis module (SAM) applied to comment analysis
Fig. 4 Overall sentiment score in a business process
6 Conclusion and Outlook
Our contribution shows how we use opinion mining, particularly sentiment analysis, as empowerment in the context of BPR and social BPM. Sentiment analysis is well suited to the field of BPR and social BPM because it allows us to analyze users' opinions and initiate immediate changes in the process redesign. Currently, our Sentiment Analysis Module (SAM) in the SentiProMo tool is limited to an accuracy of around 80%. In the preliminary evaluation, we also obtained encouraging results for sensitivity, specificity, and F1-score. In future, we will provide more training data to improve the performance of the classification module. We will also further extend our software tool with a social web feature for capturing stakeholders' feedback on a more massive scale. A possible improvement for ranking would be a configurable weighting of scores based on the creator's profile.
References
1. van der Aalst WMP (2003) Business process management demystified: a tutorial on models, systems and standards for workflow management. In: Desel J, Reisig W, Rozenberg G (eds) Lectures on concurrency and petri nets. Lecture notes in computer science, vol 3098, pp 1–65. Springer
2. Allweyer T (2009) BPMN 2.0: introduction to the standard for business process modeling. Books on Demand
3. Chen H, Zimbra D (2010) AI and opinion mining. IEEE Intell Syst 25(3):74–76
4. Davenport TH, Short JE (1990) The new industrial engineering: information technology and business process redesign. Sloan Manage Rev 31(4):11–27. http://sloanreview.mit.edu/smr/issue/1990/summer/1/
5. Dumas M, Rosa ML, Mendling J, Reijers HA (2018) Fundamentals of business process management. Springer, Berlin Heidelberg
6. Hrastnik J, Cardoso J, Kappe F (2007) The business process knowledge framework, pp 517–520
7. Lüftenegger E, Softic S (2020) SentiProMo: a sentiment analysis-enabled social business process modeling tool. Business process management workshops. BPM 2020. Lecture notes in business information processing. Springer, Cham
8. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press
9. Muehlen MZ, Recker J (2008) How much language is enough? Theoretical and practical use of the business process modeling notation. In: Bellahsène Z, Léonard M (eds) Advanced information systems engineering, pp 465–479. Springer, Berlin Heidelberg
10. Recker J, Indulska M, Rosemann M, Green P (2006) How good is BPMN really? Insights from theory and practice. In: Proceedings of the 14th European conference on information systems, ECIS 2006
11. Schmidt R, Nurcan S (2009) BPM and social software. In: Ardagna D, Mecella M, Yang J (eds) Business process management workshops. Springer, Berlin Heidelberg, pp 649–658
12. Zor S, Schumm D, Leymann F (2011) A proposal of BPMN extensions for the manufacturing domain. In: Proceedings of the 44th CIRP international conference on manufacturing systems
13. Zott C, Amit R, Massa L (2011) The business model: recent developments and future research. J Manage 37(4):1019–1042
An Integration of UTAUT and Task-Technology Fit Frameworks for Assessing the Acceptance of Clinical Decision Support Systems in the Context of a Developing Country
Soliman Aljarboa and Shah J. Miah

Abstract This paper creates a basis of theoretical contribution for a new Ph.D. thesis in the area of Clinical Decision Support Systems (CDSS) acceptance. Over the past three years, we conducted qualitative research in three distinct phases to develop an extended Task-Technology Fit (TTF) framework. These phases initiate the requirement generation of the framework, discover the factors of the framework through perspectives, and evaluate the new proposed framework. The new condition relates to a developing country, in which sectors such as healthcare receive the most attention. We conduct a new inspection of decision support technology and its usefulness in this sector, integrating it with other frameworks to assess its value and use and how it can be better accepted by healthcare professionals.

Keywords CDSS · Healthcare · Developing countries · Technology acceptance · UTAUT and TTF
S. Aljarboa (B)
Department of Management Information System, College of Business and Economics, Qassim University, Buridah, Saudi Arabia
Business School, Victoria University, Footscray, VIC, Australia
S. J. Miah
Newcastle Business School, University of Newcastle, Newcastle, NSW, Australia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_11

1 Introduction
CDSS is one type of Health Information System (HIS) that is used in diagnosis, dispensing appropriate medications, making recommendations, and providing relevant information, all of which contribute to medical practitioners' decision-making [1]. CDSS help medical practitioners to make their decisions and produce good advice based on up-to-date scientific evidence [2]. CDSS is a system that needs more research
for new knowledge generation to reconcile and increase the interaction between the physician and CDSS, in order to help the physician use the system successfully. It is necessary to investigate the determinants of CDSS acceptance for medical applications. According to Sambasivan et al. [3], developing countries face more difficulties than developed countries in implementing HIS. They also argued that the quality of healthcare performance can only be improved if doctors accept HIS. Several frameworks have been studied to determine the factors that affect the acceptance of HIS. However, there is still a lack of research on the factors that affect physicians' acceptance of CDSS in developing countries [4]. Understanding and identifying the factors that influence the acceptance of information technology helps designers and suppliers to better understand the requirements of the end users. This leads to information systems that are more appropriate to the characteristics and conditions of the user and the work environment.
1.1 Unified Theory of Acceptance and Use of Technology (UTAUT)
The UTAUT model was established and examined by Venkatesh et al. [5], who investigated and analyzed eight different models and theories in order to identify the factors that influence user acceptance of technology. These models are the theory of reasoned action, the technology acceptance model, the motivational model, the theory of planned behaviour, a model combining the technology acceptance model and the theory of planned behaviour, the model of PC utilization, the innovation diffusion theory, and the social cognitive theory. This work provided a model capable of explaining user acceptance of technology and of supporting a greater understanding of it. In addition, the factors that most influence user acceptance were identified, which led many studies in several fields to use the UTAUT model. The UTAUT model includes four moderating variables: gender, age, experience, and voluntariness of use. In addition, the UTAUT model comprises four major determinants: performance expectancy, effort expectancy, social influence, and facilitating conditions [5].
1.2 Task-Technology Fit (TTF)
The TTF model indicates that information technology has a positive and effective role in user performance when the features and characteristics of the technology
are appropriate and fit with the business mission [6]. The TTF model has been adopted and applied to a range of technologies, including HIT [7, 8]. TTF examines two factors, the technology characteristics and the task characteristics, in order to understand how well the technology meets the requirements of the task and thereby improves the performance of the user [6]. TTF includes two constructs, task characteristics and technology characteristics, which influence utilization and task performance [6]. TTF holds that if the technology used provides features that fit the demands, then satisfactory performance will be achieved [9]. The task characteristics and technology characteristics affect TTF, which in turn has an impact on system usage and on user performance. The paper is organised as follows. The next section describes the background of the proposed research. The section after that presents the research methodology, followed by the data analysis. Section 4 describes the modified proposed framework, followed by the discussion and conclusion of the paper.
2 Background
The study of CDSS acceptance contributes significantly to revealing many of the barriers and advantages in adopting the system and provides a significant opportunity for the successful implementation of CDSS. Investigating and discovering the factors that influence the acceptance of CDSS by the end user is crucial to its successful implementation [10]. Several previous studies have indicated the need for high-quality studies to determine the factors that influence the acceptance of CDSS by physicians. Arts et al. [11] studied the acceptance of and obstacles to using CDSS in general practice. Their results indicated a need for more research on this issue to gain a much better understanding of the CDSS features required by GPs and to direct suppliers and designers towards more effective systems based on the demands and requirements of the end user. Understanding the aspects that contribute to technology acceptance by physicians in the healthcare industry is significant for ensuring the smooth application of new technologies [12]. The motivation to accept an IS is connected directly to the belief that the system is capable of supporting users in completing their daily activities [1]. Acceptance of CDSS is crucial for providing better healthcare services, since non-acceptance of the technology by the user may negatively affect the healthcare and well-being of patients [1].
2.1 Theoretical Conceptual Framework
On the basis of the study and review of different acceptance models, this research proposes to integrate TTF with UTAUT, as Fig. 1 shows. This seems to be an appropriate conceptual framework for providing an effective model, which
Fig. 1 Conceptual Model of Integration of UTAUT and TTF
is able to identify the determinants that affect CDSS as well as to distinguish the determinants that influence new technology in the domain of HIS (Fig. 1). Several studies have combined TTF with TAM [13, 14]. Usoro et al. [15] asserted that combining TAM and TTF provides significant explanatory power. In addition, several studies have combined TTF with UTAUT to investigate technology acceptance [16, 17]. The integration of the UTAUT and TTF frameworks will contribute considerably to identifying important factors for understanding and investigating user acceptance of technology. UTAUT and TTF each have advantages that help to reveal the factors that affect technology acceptance, so their combination yields the most comprehensive benefits. To understand user acceptance of healthcare technology, we must comprehend not only the factors that affect acceptance, but also how these factors fit. Even though various studies have addressed the matter of 'fit', this is insufficient, since its significance within the organization must be explored in detail, combining the technology with the user, to understand the issues concerned with the implementation of healthcare technology. There is a strong need to obtain and understand empirical support for the factor of fit when determining the acceptance of healthcare technologies by users [18]. When evaluating user acceptance, researchers must examine the factors that affect it along with the factor of 'fit' between the technology and the users [18]. Khairat et al. [1] indicated that combining models and frameworks would develop and enhance user acceptance and thereby promote the successful adoption of CDSS. They stated that if the user did not accept the system, there would be a
lack of use of the technological system, which, moreover, may threaten the healthcare and well-being of patients.
3 Methodology
This research employed a qualitative approach, collecting data through semi-structured interviews. Fifty-four interviews were conducted with GPs in three stages to obtain their perspectives on and attitudes to the factors that influence the acceptance of CDSS. Carrying out three different stages in the qualitative approach increases the validity and certainty of the data collected. The first stage initiated the generation of the model's factors through convergent interviews, in which the researchers interviewed 12 health professionals. The second stage discovered and identified the factors of the model by interviewing 42 GPs. These interviews helped the researchers to recognize the factors that influence the acceptance of CDSS by collecting the perspectives, beliefs, and attitudes of GPs towards CDSS. The third stage involved a review of the new proposed framework; the researchers sought to strengthen the validation of the final framework by discussing it with three GPs and recording the extent of their agreement and their views about it. Several studies have collected data in a first and a second stage of this kind in order to provide more accurate and detailed results for the phenomenon or issues studied [19, 20]. In this research, a third stage was added to investigate the proposed new framework further.
3.1 Stage One: Initiated Requirement Generation of the Framework
In Stage One, 12 exploratory interviews with GPs were conducted, using a convergent interviewing technique to gather insights into the reasons and factors behind the usage of CDSS. In this stage, the UTAUT and TTF factors were reviewed to clarify and explore their appropriateness for the integrated framework. Convergent interviewing is a qualitative approach. It aims to collect, describe, and understand an individual's preferences, attitudes, and beliefs, or to identify his or her significant interests [21]. The initial interviews in this approach help to make the questions more structured for the subsequent interviews, which enhances the credibility of the research [22]. The convergent interview technique helped to recognize and identify the themes more easily and accurately [23]. This stage contributed to discovering new factors by using convergent interviewing and devising questions based on previous interviews. The convergent technique was very relevant
S. Aljarboa and S. J. Miah
and valuable as it enabled the researcher to swiftly find the necessary issues and to establish the questions for the next stage [24].
3.2 Stage Two: Discovering the Factors of the Framework Through Perspectives
This stage provided significant data through 42 interviews with GPs. The questions in this stage related to the issues and factors raised in the interviews of Stage One. We used the case study approach to gather more data from the 42 participants in order to explore and identify the factors that influence the acceptance of CDSS by GPs. This approach has been widely applied in several different fields of research, due to its distinctiveness and its contribution to obtaining valuable results [25, 26]. A case study collects data that contribute greatly to focusing the research and identifying the issues [27]. Moreover, Gioia et al. [28] pointed out that such an approach provides opportunities for gaining insights into emerging concepts. This approach contributed to the exploration of new factors and the development of a proposed framework that explains the factors that influence the acceptance of CDSS by GPs. In-depth interviews made it possible to investigate these factors and to obtain a broader understanding of the perspectives and attitudes of the GPs towards the adoption of CDSS. The results of the in-depth interviews showed that all factors of both UTAUT and TTF influence the acceptance of CDSS by GPs, except the social influence factor. The newly discovered factors were Accessibility, Patient satisfaction, Communicability (With physicians) and Perceived Risk.
3.3 Stage Three: Validation of a New Proposed Framework
The third stage involved reviewing and evaluating the final framework with three physicians in order to obtain their views and a final impression of the influencing factors that had been identified. Together with the second stage, this stage increased the validity of the results and helped to gain a more comprehensive understanding of the final framework from the end users of CDSS. The participants in this stage were among the 12 GPs who were interviewed in Stage One. This stage was added to obtain more views on the influencing factors from these physicians, because new factors had been identified about which they had not been asked. Herm et al. [29] indicated that reviewing the framework through interviews improves the validity of the framework. Maunder et al. [30] developed a framework for assessing the e-health readiness of dietitians. They conducted their study in three stages: a literature review, identification of topics related to the study and interviews with 10 healthcare experts to verify and confirm the validity.
An Integration of UTAUT and Task-Technology Fit Frameworks…
4 Data Analysis
Thematic analysis was employed to analyze the data collected from the participants in order to understand and discover more about their experiences, perspectives and attitudes regarding the factors that influence the acceptance of CDSS. The thematic analysis technique is widely applied in HIS studies [31]. Following this approach contributed to the formation of theories and models through a set of steps that help to generate factors [32]. NVivo software was used to analyze the data by applying the six phases of thematic analysis established by Braun and Clarke [33]. This study followed those same phases to analyze the qualitative data: (1) Familiarising with the data, (2) Generating initial codes, (3) Searching for themes, (4) Reviewing themes, (5) Defining and naming themes and (6) Producing the report.
In Phase One (Familiarising with the data), each recording was reviewed more than once while analyzing the interview's transcript, in order to highlight the important issues and perspectives of the GPs. In Phase Two (Generating initial codes), documents were coded under an appropriate name in NVivo. Each code was linked to nodes to facilitate the process of building main themes and sub-themes. In Phase Three (Searching for themes), after the initial arrangement and coding of the data, the codes were classified into possible themes, and related sub-themes were created for the main themes. In Phase Four (Reviewing themes), the themes and their codes (established in the previous step) were checked and confirmed by comparing them with the interview transcripts. In Phase Five (Defining and naming themes), the main themes were finalized, identified and approved with regard to their relevance to the codes. This prompted a comprehensive analysis of every theme and determined an illuminating or descriptive name for each theme. In Phase Six (Producing the report), a detailed explanation of each theme was written to facilitate understanding of each factor that influences the acceptance of CDSS.
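The step from initial codes (Phase Two) to candidate themes (Phase Three) can be pictured as a simple grouping operation. The following is a minimal sketch only, not the NVivo workflow used in the study: the interview identifiers, code labels and the code-to-theme mapping are hypothetical, and only the two theme names are taken from the framework reported in this paper.

```python
from collections import defaultdict

# Hypothetical coded excerpts as (interview id, code) pairs, such as might
# exist after Phase Two. None of these values come from the study's data.
coded_excerpts = [
    ("GP01", "saves consultation time"),
    ("GP02", "drug interaction alerts"),
    ("GP02", "needs training"),
    ("GP03", "saves consultation time"),
    ("GP03", "slow technical support"),
]

# Hypothetical code-to-theme mapping of the kind a researcher decides on in
# Phase Three. The theme names follow the paper's framework.
code_to_theme = {
    "saves consultation time": "Performance Expectancy",
    "drug interaction alerts": "Performance Expectancy",
    "needs training": "Facilitating Conditions",
    "slow technical support": "Facilitating Conditions",
}


def group_by_theme(excerpts, mapping):
    """Return {theme: {code: sorted interview ids}}.

    Codes without a mapped theme are collected under 'Unassigned' so that
    they can be revisited when themes are reviewed (Phase Four).
    """
    themes = defaultdict(lambda: defaultdict(set))
    for interview_id, code in excerpts:
        theme = mapping.get(code, "Unassigned")
        themes[theme][code].add(interview_id)
    return {
        theme: {code: sorted(ids) for code, ids in codes.items()}
        for theme, codes in themes.items()
    }


themes = group_by_theme(coded_excerpts, code_to_theme)
for theme, codes in themes.items():
    print(theme, codes)
```

Keeping the interview identifiers attached to each code makes it easy to go back to the transcripts when a theme is checked against the source material.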
5 Modified Proposed Framework
The study results showed that the following factors influence the acceptance of CDSS by GPs: Performance Expectancy (Time, Alert, Accurate, Reduce Errors, Treatment Plan), Effort Expectancy and Facilitating Conditions (Training, Technical Support, Updating), Task-technology fit, Technology characteristics (Internet, Modern Computers), Task characteristics, Accessibility, Patient satisfaction, Communicability (With physicians) and Perceived Risk (Time risk, Functional performance risk of the system). These are shown in Fig. 2. The results helped to gain insight into the factors that influence the acceptance of and intention to use CDSS. Furthermore, they yielded more ideas about, and a better understanding of, how to enhance the acceptance of CDSS and other advanced HIS.
Fig. 2 Final modified framework
6 Discussion and Conclusion
The CDSS is one of the advanced decision support mechanisms that help physicians to make more correct decisions using evidence-based facts or content. Healthcare in developing countries needs improved practices in this respect, in order to better understand health protocols, to reduce medical errors and to provide better health care services. The framework developed in this research provides a new approach that helps to understand the factors that influence the use of CDSS. This will greatly benefit researchers, developers and providers of medical systems in designing more successful system implementations. In addition, the new framework provides a better understanding of the features and tools in CDSS that help health professionals to provide quality and effective medical services and care. Several HIS projects and systems have failed due to a lack of consideration of the human side and of the end user while designing health systems [34]. Analyzing and determining the requirements of the end user of CDSS before its implementation and final accreditation will save time, effort and money, and will also contribute to the adoption of a successful HIS [35]. Furthermore, Kabukye et al. [36] found that while health systems can improve health care, their adoption is still low because the systems do not meet the requirements of the user.
A limitation of this study is that it relied on participants in Saudi Arabia as a developing country. The focus was mainly on two areas: Riyadh, which is the capital and the largest city in Saudi Arabia in terms of population, and Qassim, as it is one of the closest areas to Riyadh [37]. According to UN-Habitat [37], the population of Riyadh is 8,276,700 people, while the population of Qassim is 1,464,800 people. These areas were chosen due to travel and location restrictions in addition to time and cost factors.
It was challenging to obtain enough participants due to the nature of their work and because they were busy with patient care. The data collection process was delayed by the long waiting periods needed for GPs to agree to an interview and to find a suitable time for it. This research relied only on a qualitative approach to explore the factors that affect GPs’ acceptance of CDSS. A quantitative approach was not suitable because the aim of this research is to build and develop theory rather than to test it in a real healthcare decision domain [38]. Therefore, applying a qualitative approach through semi-structured interviews is appropriate for this research. This research provides an opportunity for future research to study and verify the study’s framework in studies of the factors that influence the acceptability of any new HIS design (for instance, using design science research [39–41]). This research was conducted in Saudi Arabia using the interview technique, so similar studies could be conducted using other research tools in other countries to determine whether there are different or new factors. In addition, the focus of this study was on GPs, so other healthcare professionals would also be of interest. Further research could consider specialists, other health professionals or consultants in different medical departments.
References
1. Khairat S, Marc D, Crosby W, Al Sanousi A (2018) Reasons for physicians not adopting clinical decision support systems: critical analysis. JMIR Med Inform 6(2):e24-es
2. Liberati EG, Ruggiero F, Galuppo L, Gorli M, González-Lorenzo M, Maraldi M et al (2017) What hinders the uptake of computerized decision support systems in hospitals? A qualitative study and framework for implementation. Implement Sci 12(1):113
3. Sambasivan M, Esmaeilzadeh P, Kumar N, Nezakati H (2012) Intention to adopt clinical decision support systems in a developing country: effect of physician’s perceived professional autonomy, involvement and belief: a cross-sectional study. BMC Med Inform Decis Mak 1(12):142
4. Bawack RE, Kala Kamdjoug JR (2018) Adequacy of UTAUT in clinician adoption of health information systems in developing countries: the case of Cameroon. Int J Med Informatics 109:15–22
5. Venkatesh V, Morris MG, Davis GB, Davis FD (2003) User acceptance of information technology: toward a unified view. MIS Q:425–78
6. Goodhue DL, Thompson RL (1995) Task-technology fit and individual performance. MIS Q 19(2):213–236
7. Ali SB, Romero J, Morrison K, Hafeez B, Ancker JS (2018) Focus section health IT usability: applying a task-technology fit model to adapt an electronic patient portal for patient work. Appl Clin Inform 9(1):174–184
8. Gatara M, Cohen JF (2014) Mobile-health tool use and community health worker performance in the Kenyan context: a task-technology fit perspective. In: Proceedings of the Southern African Institute for Computer Scientist and Information Technologists annual conference 2014 on SAICSIT 2014 Empowered by Technology. Association for Computing Machinery, Centurion, South Africa, pp 229–40
9. Irick ML (2008) Task-technology fit and information systems effectiveness. J Knowl Manag Pract 9(3):1–5
10. Lourdusamy R, Mattam XJ (2020) Clinical decision support systems and predictive analytics. In: Jain V, Chatterjee JM (eds) Machine learning with health care perspective: machine learning and healthcare. Springer International Publishing, Cham, pp 317–355
11. Arts DL, Medlock SK, van Weert HCPM, Wyatt JC, Abu-Hanna A (2018) Acceptance and barriers pertaining to a general practice decision support system for multiple clinical conditions: a mixed methods evaluation. PLoS ONE 13(3):1–16
12. Lin C, Roan J, Lin IC (2012) Barriers to physicians’ adoption of healthcare information technology: an empirical study on multiple hospitals. J Med Syst 36(3):1965–1977
13. Narman P, Holm H, Hook D, Honeth N, Johnson P (2012) Using enterprise architecture and technology adoption models to predict application usage. J Syst Softw 85:1953–1967
14. Wu B, Chen X (2017) Continuance intention to use MOOCs: integrating the technology acceptance model (TAM) and task technology fit (TTF) model. Comput Hum Behav 67:221–232
15. Usoro A, Shoyelu S, Kuofie M (2010) Task-technology fit and technology acceptance models applicability to e-tourism. J Econ Dev Manag IT Financ Mark 2(1):1
16. Afshan S, Sharif A (2016) Acceptance of mobile banking framework in Pakistan. Telemat Inf 33:370–387
17. Park J, Gunn F, Lee Y, Shim S (2015) Consumer acceptance of a revolutionary technology-driven product: the role of adoption in the industrial design development. J Retail Consum Serv 26:115–124
18. Mohamadali NAK, Garibaldi JM (2012) Understanding and addressing the ‘Fit’ between user technology and organization in evaluating user acceptance of healthcare technology. In: International conference on health informatics 1:119–124
19. Joseph M, Chad P (2017) A manager/researcher can learn about professional practices in their workplace by using case research. J Work Learn 29(1):49–64
20. Mai CCC, Perry C, Loh E (2014) Integrating organizational change management and customer relationship management in a casino. UNLV Gaming Research & Rev J 18(2):1
21. Dick R (1990) Convergent interviewing, version 3. Interchange, Brisbane
22. Remenyi D, Williams B, Money A, Swartz E (1998) Doing research in business and management: an introduction to process and method. Sage, London
23. Golafshani N (2003) Understanding reliability and validity in qualitative research. Qual Rep 8(4):597–606
24. Rao S, Perry C (2003) Convergent interviewing to build a theory in under-researched areas: principles and an example investigation of Internet usage in inter-firm relationships. Qual Mark Res Int J 6(4):236–247
25. Cheek C, Hays R, Smith J, Allen P (2018) Improving case study research in medical education: a systematised review. Med Educ 480–487
26. Fàbregues S, Fetters MD (2019) Fundamentals of case study research in family medicine and community health. Fam Med Community Health 7(2):e000074-e
27. Johnson B, Christensen LB (2012) Educational research: quantitative, qualitative, and mixed approaches, 4th edn. SAGE Publications, Thousand Oaks, CA
28. Gioia DA, Corley KG, Hamilton AL (2013) Seeking qualitative rigor in inductive research: notes on the Gioia methodology. Organ Res Methods 16(1):15–31
29. Herm LV, Janiesch C, Helm A, Imgrund F, Fuchs K, Hofmann A et al (2020) A consolidated framework for implementing robotic process automation projects. Springer International Publishing, Cham, pp 471–88
30. Maunder K, Walton K, Williams P, Ferguson M, Beck E (2018) A framework for eHealth readiness of dietitians. Int J Med Informatics 115:43–52
31. Christie HL, Schichel MCP, Tange HJ, Veenstra MY, Verhey FRJ, de Vugt ME (2020) Perspectives from municipality officials on the adoption, dissemination, and implementation of electronic health interventions to support caregivers of people with dementia: inductive thematic analysis. JMIR Aging 3(1):e17255
32. Connolly M (2003) Qualitative analysis: a teaching tool for social work research. Qual Soc Work 2(1):103–112
33. Braun V, Clarke V (2006) Using thematic analysis in psychology. Qual Res Psychol 3(2):77–101
34. Teixeira L, Ferreira C, Santos BS (2012) User-centered requirements engineering in health information systems: a study in the hemophilia field. Comput Methods Programs Biomed 106(3):160–174
35. Kilsdonk E, Peute LW, Riezebos RJ, Kremer LC, Jaspers MW (2016) Uncovering healthcare practitioners’ information processing using the think-aloud method: from paper-based guideline to clinical decision support system. Int J Med Informatics 86:10–19
36. Kabukye JK, Koch S, Cornet R, Orem J, Hagglund M (2018) User requirements for an electronic medical records system for oncology in developing countries: a case study of Uganda. AMIA Annual Symposium proceedings AMIA Symposium 2017:1004–1013
37. United Nations Human Settlements Programme, Saudi Cities Report (2019) https://unhabitat.org/sites/default/files/2020/05/saudi_city_report.english.pdf. Accessed 21 Nov 2020
38. Miah S, J A (2014) Demand-driven cloud-based business intelligence for healthcare decision making. Handbook of research on demand-driven web services: theory, technologies and applications, p 324
39. Miah SJ, Gammack JG, McKay J (2019) A metadesign theory for tailorable decision support. J Assoc Inf Syst 20(5):570–603
40. Islam MR, Miah SJ, Kamal ARM, Burmeister O (2019) A design construct for developing approaches to measure mental health conditions. Australas J Inf Syst 23:1–22
41. Miah SJ, Shen J, Lamp JW, Kerr D, Gammack J (2019) Emerging insights of health informatics research: a literature analysis for outlining new themes. Australas J Inf Syst 23:1–18
Research Trends in the Implementation of eModeration Systems: A Systematic Literature Review
Vanitha Rajamany, J. A. van Biljon, and C. J. van Staden
Abstract The 2020 COVID-19 health pandemic has accelerated the trend towards digitizing education. Increased digitization necessitates a robust and regulatory framework for monitoring standards in a knowledge society, which requires adaptivity to the continuous changes in the quality assurance processes (moderation). This provides the rationale for an investigation into the literature trends in eModeration processes. This study draws on a systematic literature review as methodology to examine the extant literature on trends in eModeration research including the purpose of the research, methodologies and limitations regarding existing eModeration systems. The findings reveal that there is little, if any, empirical evidence of systems dedicated to online moderation of assessments specifically within the secondary school sector and that eModeration is mainly an emergent phenomenon with numerous adoption challenges, especially in resource constrained contexts.
Keywords eModeration · eAssessment · Quality assurance · eSubmission · eMarking
V. Rajamany (B) · J. A. van Biljon · C. J. van Staden
School of Computing, UNISA, Pretoria, South Africa
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_12

1 Introduction
Education is tasked with preparing students for economies that are experiencing turbulent changes [1]. The Fourth Industrial Revolution (4IR) has demanded an inevitable transformation in education, making Education 4.0 the buzzword within the educational fraternity [2]. Education 4.0, enabling new possibilities by aligning humans and technology, is a response to the needs of 4IR. A prediction of 4IR is that traditional methods and platforms for assessments may become irrelevant or insufficient [2]. Additionally, the global COVID-19 pandemic has accelerated 4IR predictions towards innovation and growth in digital solutions. The pandemic has refocused attention on eLearning and has necessitated a radical change in assessment processes. Tertiary institutions are increasingly adopting ICTs for online submission (e-submission) and electronic marking (e-marking). Increasing questions about the performance of eLearning systems have driven Higher Education Institutions (HEIs) to try different approaches to address the quality problems posed by the use of eLearning [3]. Moderation is a quality assurance process through which teachers share their knowledge and expectations concerning standards to improve the consistency of the judgement of assessments [4]. The moderator comments on the marking of a colleague and provides feedback on the consistency of the marking [5]. eAssessment needs innovative solutions to optimize the new moderation processes necessitated by the transformation from traditional paper-based moderation methods to electronic moderation [6]. A number of international studies claim generalizability in driving efforts at reforming moderation processes and increasing quality standards in education [7–9]. Prevailing research is generally supportive of a standards-based model, to develop moderation as a practical process in an attempt to raise standards [8, 10–12]. In contrast to online assessment and automated marking, which have been studied in depth and successfully applied in HEIs, the electronic moderation of school-based assessments is a relatively new phenomenon [13]. Based on the dynamic growth of online assessments, a usable, credible eModeration system is, therefore, critical. The research question can thus be stated as: What are the research trends regarding the implementation of eModeration systems?
Assessment has traditionally been a process of written submissions [14]. Developments in access to, and advances in, ICT services have facilitated the area of digital assessment (eAssessment) [15] which is described as the use of technology to support and manage the assessment process life cycle [16]. eSubmission and eMarking technologies are gradually becoming the norm in UK Higher Education resulting in an increased interest in the electronic management of assessments [5]. This paper is structured as follows: Section 1 provides an introduction presenting the background, context and rationale for this paper. Section 2 indicates the literature review process. Section 3 outlines the findings and summarizes existing technological solutions for conducting moderation processes online. Section 4 concludes this paper.
2 Systematic Literature Review
A Systematic Literature Review (SLR) is a rigorous, standardized methodology for the systematic, meticulous review of research results in a specific field [17]. The SLR is based on a detailed, well-articulated question. Furthermore, it isolates relevant studies, evaluates their quality and condenses the evidence by the use of an explicit
methodology [18]. The search terms were: eModeration, digital moderation, digital moderation of assessments and digital platform for external moderation. Only English peer-reviewed journal articles and articles published in conference proceedings from 2012 to 2020 were included. Given the dynamic nature of technology, there is a time lapse between the implementation of a system and when the system is, in fact, reported in academic literature. Restricting the search to a certain period of time is thus a limitation of this study, as a system could exist that has not yet been reported on. Literature focusing on studies in domains other than education was excluded. Within this group of papers, only papers that described implemented eModeration systems were included, since this study focused on practical, evidence-based findings regarding the implementation of moderation systems. These exclusion criteria limited the number of papers retrieved. A further limitation arises from the search strategy focusing only on information system specific databases such as Scopus and Inspec. Specialized education databases such as ERIC were not specifically consulted. The search strategy followed is depicted in Fig. 1.
Fig. 1 Search strategy
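The inclusion and exclusion criteria described above can be expressed as a single screening predicate. The sketch below is illustrative only: the `Record` fields, the `is_included` function and the sample records are hypothetical and are not taken from the authors' search tooling or results, while the four search terms and the criteria themselves are those stated in the text.

```python
from dataclasses import dataclass

# Search terms as listed in the text.
SEARCH_TERMS = [
    "eModeration",
    "digital moderation",
    "digital moderation of assessments",
    "digital platform for external moderation",
]


@dataclass
class Record:
    """Hypothetical bibliographic record with the fields needed for screening."""
    title: str
    year: int
    language: str
    venue_type: str  # "journal" or "conference"
    peer_reviewed: bool
    domain: str
    describes_implemented_system: bool


def is_included(record: Record) -> bool:
    """Apply the stated criteria: English, peer reviewed, journal or
    conference paper, published 2012 to 2020, education domain, and
    describing an implemented eModeration system."""
    return (
        record.language == "English"
        and record.peer_reviewed
        and record.venue_type in {"journal", "conference"}
        and 2012 <= record.year <= 2020
        and record.domain == "education"
        and record.describes_implemented_system
    )


# Invented sample records, used only to exercise the predicate.
records = [
    Record("Hypothetical eModeration system at an HEI", 2017, "English",
           "journal", True, "education", True),
    Record("Hypothetical moderation study published too early", 2010, "English",
           "journal", True, "education", True),
    Record("Hypothetical content moderation on social media", 2018, "English",
           "conference", True, "social media", True),
    Record("Hypothetical requirements survey without a system", 2016, "English",
           "conference", True, "education", False),
]

included = [r.title for r in records if is_included(r)]
print(included)
```

Writing the criteria as one predicate makes each exclusion reason explicit and easy to audit, which is the property an SLR protocol is meant to guarantee.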
3 Results and Findings
In this section, five systems/studies investigating eModeration are first described individually. Secondly, the key focus of four of these five systems is summarized (cf. Table 1). The Digital Moderation Project [19] focused on teacher requirements prior to the creation of an actual eModeration system. Hence, the Digital Moderation Project was not included in the discussion. Based on the literature reviewed, preliminary findings are presented in Table 1.

Table 1 Key focus of existing moderation systems

| System | Purpose | Context | Findings |
|---|---|---|---|
| Proof of concept trial (SPARK) [10] | Improving peer review processes of assessments in HEIs using technology to address quality assurance | HEI | An online tool should be context-sensitive, streamlined, efficient, cost-effective, sustainable and fit for purpose |
| Digital moderation project [19] | To determine teacher requirements for submitting assessments via an online digital platform | Secondary schools | Inconclusive, no existing eModeration system could be found |
| User experience evaluation framework [21] | A framework for evaluating the user experience of an eModeration system | HEI | An eModeration system should enable moderators to upload marked scripts, download scripts, track the moderation process, provide security and notifications when moderation is complete |
| Adaptive Comparative Judgement System (ACJS) [20] | ICT system for social online moderation using comparative judgement of digital portfolios. Pairs of digital portfolios are dynamically generated for each assessor to judge. Area provided for assessors to record individual notes about each portfolio | HEI | It is feasible to use ICTs to support comparative judgements. An important finding is that the reliability of the final scores was not high |
| Computer assisted evaluation system [11] | Machine learning techniques for solving problems of variances in evaluation | HEI | Machine learning can accurately predict scores of a second evaluator based on scores allocated by the first evaluator |

Newhouse and Tarricone [20] describe a system for pairwise comparison used in social online moderation to assist teachers with understanding of standards. A custom-made tool is used to store digital samples of assessments. The focus is on supporting social online moderation by generating groups of portfolios for each assessor to judge (cf. Table 1). The system calculates individual assessor scores to establish their reliability. System use is preceded and followed by standardization discussions using an online platform. Moderation takes the form of online scoring so that consensus is reached in awarding a grade rather than using the system to moderate assessments.

The New Zealand Qualifications Authority [19] conducted a survey to determine teacher requirements for an online platform for the submission of assessments. However, there is no further indication of the development of such a system (cf. Table 1).

Van Staden [21] describes an eModerate system used and tested at two private Higher Education Institutions in SA. Assessors upload marked assessments and a moderator downloads these assessments for moderation. Stakeholders receive notification when moderation is completed. This study focused on a framework for evaluating the user experience of the eModerate system (cf. Table 1).

Kamat and Dessai [11] present a system implementing machine learning to establish the quality of the assessment and to validate consistency in evaluation. The system predicts a mark for each examiner to control variations in appraisals. Artificial Neural Network (ANN) modelling is then used on evaluations carried out by different examiners to predict the marks that would be obtained as though one examiner had performed all evaluations in the course (cf. Table 1).

Durcheva et al.
[14] describe the TeSLA system integrated into the Moodle platform and implemented in specialized courses. The emphasis in the TeSLA system is on task design, specifically on ensuring academic integrity and eliminating opportunities for cheating by using photos, videos or audio recordings of registered students. The literature reviewed indicates that there are a limited number of studies applicable to the eModeration context. The findings indicate a focus on proof of concept systems and teacher requirements for using a digital platform to conduct moderation. Based on these findings, an online tool should be context-sensitive, streamlined, efficient, cost-effective, sustainable and fit for purpose. Only one of the five studies considered, the User Experience Evaluation Framework [21], provides comprehensive functionality which enables a moderator to access assessed scripts, annotate these scripts and upload them together with a report for the initial assessor to retrieve. The proof of concept (SPARK) system [10] only outlines the requirements for an eModeration system, and its authors, Booth and Rennie [10], report only on the first phase of a seven-phase project. Van Staden [21] mentions a web-based eModerate System specifically designed for use at a HEI, but the actual moderation process is not necessarily an inherent function afforded by the eModerate System. Moderators are able to complete the
moderation either using tools provided by a word processor or the sticky note functionality provided by Adobe products. Noteworthy amongst the findings is that the institution hosting the eModerate System should have adequate Internet connectivity and infrastructure, which is also a necessary prerequisite for 4IR. Additionally, technology limitations can hamper the digital moderation process [21]. The other systems, namely the ACJS and the Computer Assisted Evaluation System (cf. Table 1), focus on comparing the judgements provided by two evaluators, either by generating a pair of portfolios or by using machine learning to predict the accuracy of the judgements. However, the reliability of the final scores is dependent on teacher experience.
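To make the idea of comparative judgement more concrete, the following minimal sketch generates portfolio pairs and ranks portfolios by the number of comparisons they win. It is a deliberate simplification under stated assumptions and not the ACJS described in [20]: the function names and sample judgements are hypothetical, every possible pair is generated rather than chosen adaptively, and a plain win count replaces the statistical model a real comparative judgement system would use.

```python
from collections import Counter
from itertools import combinations


def all_pairs(portfolios):
    """Return every unordered pair of portfolios for an assessor to judge."""
    return list(combinations(portfolios, 2))


def rank_by_wins(portfolios, judgements):
    """Rank portfolios by how many pairwise judgements they won.

    `judgements` is a list of (winner, loser) tuples. Ties are broken
    alphabetically so that the ordering is deterministic.
    """
    wins = Counter({p: 0 for p in portfolios})
    for winner, _loser in judgements:
        wins[winner] += 1
    return sorted(portfolios, key=lambda p: (-wins[p], p))


# Hypothetical portfolios and judgements, invented for illustration.
portfolios = ["A", "B", "C"]
pairs = all_pairs(portfolios)
judgements = [("A", "B"), ("C", "A"), ("C", "B")]

ranking = rank_by_wins(portfolios, judgements)
print(pairs)
print(ranking)
```

With several assessors, the same tally could be computed per assessor and compared across assessors, which is one simple way to look at the consistency of judgements that moderation is concerned with.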
4 Conclusion
This paper outlines a literature review investigating current trends in the use of technology in implementing moderation processes. The findings highlighted the importance of improving peer review processes using technology and machine learning techniques to determine variances in assessments. Notably, only two of the five studies focused on the implementation of technology in completing moderation processes. The five studies examined make use of qualitative and quantitative analyses of technological solutions, where the focus seems to be on quality assurance and the context is predominantly that of HEIs. The lack of literature on the implementation of eModeration systems is the most pertinent finding of this paper, pointing to a knowledge gap on eModeration systems. It is, therefore, necessary for more research to be conducted on digital solutions for conducting moderation processes, especially in other educational contexts such as the secondary school environment. Another important new direction is the improvement of peer review processes by using machine learning techniques to determine variances in assessments.
References
1. Motala S, Menon K (2020) In search of the “new normal”: reflections on teaching and learning during Covid-19 in a South African university. Southern African Rev Educ 26(1):80–99
2. Hussin AA (2018) Education 4.0 made simple: ideas for teaching. Int J Educ Lit Stud 6(3):92. Available at: https://journals.aiac.org.au/index.php/IJELS/article/view/4616
3. Farhan MK, Talib HA, Mohammed MS (2019) Key factors for defining the conceptual framework for quality assurance in e-learning. J Inf Technol Manag 11(3):16–28. https://doi.org/10.22059/jitm.2019.74292
4. Handa M (2018) Challenges of moderation practices in private training establishments in New Zealand. Masters Dissertation, Unitec Institute of Technology
5. Vergés Bausili A (2018) From piloting e-submission to electronic management of assessment (EMA): mapping grading journeys. Br J Edu Technol 49(3):463–478. https://doi.org/10.1111/bjet.12547
6. Volante L (2020) What will happen to school grades during the coronavirus pandemic? The Conversation Africa, April. https://theconversation.com/what-will-happen-to-school-grades-during-the-coronavirus-pandemic-135632?utm_medium=email&utm_campaign=Latest from the conversation for april 8 2020&utm_content=Latest from the conversation for april 8 2020+CID_1cd271e3ef246a59
7. Colbert P, Wyatt-Smith C, Klenowski V (2012) A systems-level approach to building sustainable assessment cultures: moderation, quality task design and dependability of judgement. Policy Futur Educ 10(4):386–401. https://doi.org/10.2304/pfie.2012.10.4.386
8. Connolly S, Klenowski V, Wyatt-Smith CM (2012) Moderation and consistency of teacher judgement: teacher’s views. Br Edu Res J 38(4):593–614. https://doi.org/10.1080/01411926.2011.569006
9. Wyatt-Smith C et al (2017) Standards of practice to standards of evidence: developing assessment capable teachers. Assessment in Education: Principles, Policy and Practice. Routledge, 24(2):250–270. https://doi.org/10.1080/0969594X.2016.1228603
10. Booth S, Rennie M (2015) A technology solution for the HE sector on benchmarking for quality improvement purposes. In: Proceedings of the 2015 AAIR annual forum. Australasian Association for Institutional Research Inc, pp 22–32. https://doi.org/10.1145/3132847.3132886
11. Kamat VV, Dessai KG (2018) e-moderation of answer-scripts evaluation for controlling intra/inter examiner heterogeneity. In: IEEE 9th international conference on technology for education (T4E). IEEE, pp 130–133. https://doi.org/10.1109/T4E.2018.00035
12. Krause K et al (2013) Assuring final year subject and program achievement standards through inter-university peer review and moderation. http://www.uws.edu.au/latstandards
13. Van Staden C, Kroeze J, Van Biljon J (2019) Digital transformation for a sustainable society in the 21st century, IFIP International Federation for Information Processing 2019. Ed by IO Pappas et al. Cham: Springer International Publishing (Lecture Notes in Computer Science). https://doi.org/10.1007/978-3-030-29374-1
14. Durcheva M, Pandiev I, Halova E, Kojuharova N, Rozeva A (2019) Innovations in teaching and assessment of engineering courses, supported by authentication and authorship analysis system. In: AIP conference proceedings, pp 1–9. https://doi.org/10.1063/1.5133514
15. Chia SP (2016) An investigation into student and teacher perceptions of, and attitudes towards, the use of information communication technologies to support digital forms of summative performance assessment in the applied information technology and engineering studies c. Doctor of Philosophy, School of Education, Edith Cowan University. https://doi.org/10.1057/978-1-349-95943-3_324
16. Moccozet L, Benkacem O, Tardy C, Berisha E, Trindade RT, Burgi PY (2018) A versatile and flexible framework for e-assessment in Higher-Education. In: 2018 17th international conference on information technology based higher education and training, ITHET 2018, pp 1–6. https://doi.org/10.1109/ITHET.2018.8424764
17. Kitchenham B, Pearl Brereton O, Budgen D, Turner M, Bailey J, Linkman S (2009) Systematic literature reviews in software engineering - a systematic literature review. Inf Softw Technol, Elsevier BV, 51(1):7–15. https://doi.org/10.1016/j.infsof.2008.09.009
18. Boell K, Cecez-Kecmanovic D (2015) On being ‘systematic’ in literature reviews in IS. J Inf Technol 30(2):161–173
19. New Zealand Qualifications Authority (2016) Digital moderation discussion paper. Wellington
20. Newhouse CP, Tarricone P (2016) Online moderation of external assessment using pairwise judgements. In: Australian council for computers in education 2016 conference refereed proceedings. Brisbane, 132–129
21. Van Staden C (2017) User experience evaluation of electronic moderation systems: a case study at a private higher education institution in South Africa. Doctoral dissertation, School of Computing, University of South Africa
From E-Government to Digital Transformation: Leadership
Miguel Cuya and Sussy Bayona-Oré
Abstract In these times of pandemic, the world has witnessed the power of connectivity, and many organizations have seen the need to rethink their models and even reinvent themselves in response to this new form of global connectivity. This transformation has also changed the way public organizations deliver services to citizens. Successfully driving the digital transformation process requires a leader with skills and knowledge. Leadership is considered a critical factor that impacts e-government. This paper presents the competences of those responsible for driving digital transformation (DT) in organizations. The results show that leadership is one of the most desired competences, followed by technological knowledge, business vision, and customer orientation. These aspects must be considered a fundamental basis in any organization that takes on the challenge of this paradigm shift.
Keywords Leadership · Digital transformation · Individual factors · E-Government
M. Cuya · S. Bayona-Oré
Universidad Nacional Mayor de San Marcos, Av. Germán Amézaga s/n, Lima, Lima, Peru
e-mail: [email protected]
S. Bayona-Oré (B)
Universidad Autónoma del Perú, Av. Panamericana Sur Km. 16.3, Villa el Salvador, Lima, Peru
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_13

1 Introduction
The incorporation of Information and Communication Technologies (ICT), the use of the Internet, and the massification of mobile technology have changed the way business is done, daily activities are carried out, and data are processed [1]. Customer behavior and expectations are changing, and transforming the business model on the basis of technology, innovation, and research becomes necessary [2]. In this context, the practical “totality” acquires a more globalized meaning in all areas of business activity [3], and top managers should understand digitalization and lead the change [4]. This situation forces the leaders of organizations to adopt new digital
technologies in their services that allow them to meet the demands of consumers [5]. In this way, organizations, regardless of sector, will face drastic changes in their corporate culture in order to adapt their structures, processes, and models to the new paradigms of the digital revolution, henceforth called Digital Transformation (DT) [6, 7]. The success of DT depends on several factors: the stage of maturity, the types of initiative, the value of the initiatives, and the structure of the teams focused on the customer experience [5]. A common mistake is to understand DT as the acquisition and use of cutting-edge technology without considering the other factors; in such cases, despite the investment, the expected transformational impact is usually not obtained. Experts attribute this type of failure to the lack of digital culture among leaders and decision-makers [8, 9]. Adopting DT is therefore a priority for top management, which must seek new business models based on current customer needs, make the most of the organization’s talent pools, and find a strategic response to the trends of the economy and digital technology [7, 10]. However, the success of these initiatives depends not only on the decision of the leader, but also on the leader’s commitment, capabilities, characteristics, and style of leadership throughout the transformation process [5]. These characteristics will be called individual factors and will be studied under a constructivist and systemic approach. The study of technology from different perspectives is closely connected with the problems of the foundation of knowledge [10, 11]. This work develops an epistemological analysis from a holistic and systemic approach in order to identify the predominant factors that allow DT processes to be carried out successfully.
The term systemic relates to the whole in a system, which in turn is defined as a set of elements functionally related to each other, so that each element of the system is a function of some other element and no element stands alone [12]. Understanding the individual factors of a leader in DT within an epistemic framework also leads us to review the causal transformations of the organization [13, 14]. Leading DT involves making organizational changes [15], and visible leadership support is required to drive change [16]. Another important point in the research is the methodology for finding new truths, which emanate from relativistic analysis around a set of beliefs or paradigms [17]. A review of the specialized literature is carried out, focusing on the predominant factors of the leader as a fundamental actor in the changes of culture and models required in a digital transformation. To this end, texts of obligatory reference in the field are reviewed, together with specialized academic articles published between 2017 and 2020. This work focuses on the individual characteristics of the leader and their relationship to organizations undergoing DT, as reported in the literature. After this introduction, the rest of the paper is structured as follows: Sect. 2 presents the background of the research work; Sect. 3 presents the methodology used; Sect. 4 presents and discusses the results. Finally, conclusions and future work are included in Sect. 5.
2 Background
This section covers the basic concepts on which this research work is based, such as digital transformation (DT), digital maturity, and leadership.
2.1 Digital Transformation (DT)
Disruptive technological advances force organizations to find new ways of doing business so that they can adapt and evolve, thus avoiding the risk of going out of business [1]. The industrial ecosystem is being transformed into a digital one in which globalized communication and artificial intelligence facilitate decentralization and automation in decision making [18]. Many professionals see this change toward digitalization as a global mega-trend, called Digital Transformation, which has the capacity to fundamentally change industries and their operations [5]. DT is the management process that guides the culture, strategy, methodologies, and capabilities of an organization on the basis of digital technologies [4]. Adopting the transformation is essential in order to stay in a market that is increasingly global, with more specific demands and constant change on the part of consumers [6]. The implementation of digital strategies for transformation is not immune to the problems of the organization, whether structural or related to maturity [1]. However, DT can also be a great opportunity to obtain better economic benefits [19]. Governments must be able to attract new talent with skills, knowledge, experience, and interpersonal abilities in order to address the challenge of digital transformation.
2.2 Digital Maturity
Organizations that are not digital natives pursue digital maturity as an indispensable requirement to survive and succeed in today’s society [8]. Maturity is understood as a state of being complete or ready, resulting from progress in the development of a system; in the specific case of DT, the term digital maturity describes the state of readiness to adopt the transformation successfully [4]. Digital maturity is achieved through commitment, investment, and leadership. Companies that reach digital maturity set realistic priorities and commit to the hard work that it requires. Digital maturity is acquired not only when the productive processes are digitized, but also when digital thinking or culture guides all actions [8]. Davenport and Redman [9] establish a model of digital maturity based on talent in four key areas: technology, data, process, and organizational change (see Fig. 1).
Fig. 1 Systemic construct of individual factors of the DT leader
2.3 Leadership
In the business world, leadership has been, and continues to be, widely studied in the administrative sciences. It is understood as the ability of a person to influence others, energizing, supporting, or encouraging them to achieve the goal of a common project and to bring about improvements in the organization [20]. Leadership is also defined as the ability to influence a group to achieve a vision or set of goals, where the source of this influence may be formal, such as a managerial position in an organization [21]. In the business environment, leadership has been studied from two points of view: first, the individual as the holder of a hierarchical position in the organization, and second, a process of social influence occurring in a social system [22, 23]. Leadership is a process oriented toward social transaction and interrelationship between leaders and collaborators in the fulfillment of common objectives [24]. Several studies agree that the leadership of management teams plays a fundamental role in a digital integration process, since these teams are the main agents of change [23].
3 Methodology
The present research is a review of the specialized literature on models, strategies, and success and failure cases experienced by organizations during the adoption of DT, and it focuses on the predominant individual factors of the leader through inductive and systemic reasoning. For this purpose, indexed academic articles published in the period 2017–2020 were reviewed. To extract the relevant information from each article, an Excel form was designed with the following fields: general data of the article, DT leader competences, leadership styles, competence definitions, and the influence of competences on DT. A total of
22 peer-reviewed articles from journals or conferences were identified and analyzed in order to identify the predominant individual factors of the leader.
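The extraction form described above can be pictured as one record per article. The following is a minimal sketch, assuming a Python representation of the Excel form; the class name, the field names, the `peer_reviewed` flag, and the placeholder records are all hypothetical and are not taken from the review itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractionRecord:
    """One row of the extraction form (field names are illustrative)."""
    general_data: str                  # citation details of the article
    year: int                          # publication year
    peer_reviewed: bool                # journal or conference article
    competences: List[str] = field(default_factory=list)
    leadership_styles: List[str] = field(default_factory=list)
    competence_definitions: Dict[str, str] = field(default_factory=dict)
    influence_on_dt: str = ""


def is_eligible(record: ExtractionRecord,
                first_year: int = 2017,
                last_year: int = 2020) -> bool:
    """Apply the stated publication window and the peer-review requirement."""
    return record.peer_reviewed and first_year <= record.year <= last_year


# Placeholder records only; these are not articles from the review.
records = [
    ExtractionRecord("Placeholder article A", 2018, True,
                     ["leadership", "communication"]),
    ExtractionRecord("Placeholder article B", 2015, True, ["leadership"]),
    ExtractionRecord("Placeholder article C", 2020, False, ["business vision"]),
]
eligible = [r.general_data for r in records if is_eligible(r)]
```

Keeping the competences as a list per article makes it straightforward to count afterwards how many articles mention each competence.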
4 Results
The literature review identified a set of individual competences that the leaders of today’s organizations should have developed when beginning the DT process. Table 1 lists the individual competences identified. The competences most frequently mentioned as required for digital transformation leaders are leadership, communication, business vision, technological knowledge, and agile capabilities. Most of the adoption case studies come from regions of developed countries and from technological areas, which suggests that, even though technology is a global trend, the adoption of DT in less developed countries faces some kind of paradigmatic brake. This could be a subject for future research.

Table 1 Competences of the DT leaders
Competences               Studies                                              Total
Leadership                [4, 5, 9, 19, 26, 27, 28, 30, 31, 33, 38, 39, 41]    13
Communication             [1, 5, 9, 19, 25, 26, 28, 36, 37]                    9
Business vision           [5, 6, 9, 31, 32, 34, 37, 38]                        8
Technological knowledge   [4, 6, 27, 28, 29, 31, 33, 38]                       8
Agile capabilities        [5, 29, 33, 38, 40, 41]                              6
Transparency              [1, 5, 25, 33, 36]                                   5
Customer orientation      [6, 26, 28]                                          3
Teamwork                  [5, 33, 38]                                          2
Coaching                  [5, 28]                                              2

Leadership is the competence most often mentioned in the studies. DT is achieved through organizational change, and this in turn through empowered leadership that directs change management. The process of DT requires consideration of the value of intellectual and human capital, and it is the leader who is responsible for managing change through those resources. Along the same lines, Brock [29] and Alunni [26] consider leadership one of the success factors to be taken into account in the management capacities of any organization that adopts a DT process. With this competence, leaders focus on changing the organizational culture as part of the DT effort [32]. Communication is an important factor for the individual leader in overcoming the cultural and operational impediments that the organization presents in adopting DT. Alunni [26] considers communication one of the three main factors for transformation. Communication is the competence that allows the strategy to be known inside the organization and to be adjusted through continuous communication with
the client [34]. The business vision must be designed and communicated to the client so that he or she grasps the value proposition. Vision as a leader’s competence is related to the strategic capacity to respond to the environment not only by thinking about what to do but also by proposing a new one [31]. “Leaders play a key, nodal role, guiding and providing vision on the new role implications, the focus of the new contribution and the impact it has on the client” [26]. For changes to be truly successful, it must be understood that DT is not only a matter of technology change but a question of strategic vision that enables adequate organizational change and process redesign [6]. That is why leaders must be able to focus their employees on a clear objective as part of the strategy [34]. It is important that the leader’s vision treats the change as something structural and not only as a change of technology [5]. Systemic vision was one of the 11 competences identified in work on developing management competences for the improvement of integrated care [35].
Brock [29] highlights the importance of the technological capabilities of leaders in organizations, which help them face resistance to change and contribute to organizational agility under a digital model. Breuer [28] considers the digital competences of the leader a necessary evolution of leadership to face the changes in the industry. One of the cases in Alunni’s work [26] highlighted the importance of enhancing technological capabilities through training, thus reinforcing the importance of technological knowledge in the DT initiative. Meriño [31], for her part, indicates that one of the main obstacles in digital transformation is the lack of digital skills on the part of leaders.
To have creativity, vision, and strategy, today’s leaders have to be up to date with technological changes, trends, and tools, all the more so in the business world.
Agile capabilities are related to the flexibility to accommodate small, medium, and large changes to processes [29].
Transparency helps create trust and break down many fears, which smooths the transformation and alignment [26]. In DT, as in any change, organizations face greater complexity and uncertainty, so leaders should take this competence into account in the design of their communications [33]. DT should prioritize employees’ and customers’ experiences with digitalization, which requires leaders to keep their digital skills up to date in order to take advantage of the opportunities that arise [25].
Customer orientation is related to knowing how to interact with new customer needs and how customers access information in a digital context, both of which should be considered in the digital strategy of the organization [34].
Teamwork strengthens and reinforces synergy in the organization and encourages individual potential to be exploited for the benefit of the group, with agile teams creating networks of contacts in both formal and informal settings [33]. Teams must be agile and selected according to their skills [28, 33]. The leader must empower the team in the role of mentor, coach, and tutor [5, 28]. Leaders should influence even informal teams, creating networks of knowledge sharing through an inclusive environment [33]. The change process requires the leader’s ability to join the team as a coach who encourages each member to bring out their best for the benefit of the transformation [5].
In the e-government paradigm shift, digital government promotes a new model of e-government [42]. A governmental reform precedes the DT of public services.
Leadership is a critical success factor for implementing local e-government [43]. The results of this review show that leadership is likewise a competence required for digital transformation.
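As a cross-check, the Total column of Table 1 can be recomputed from the Studies column. The sketch below does this with the reference numbers as listed in the table; the names `TABLE_1` and `recount` are hypothetical helpers introduced only for this illustration.

```python
# Table 1 as (cited studies, reported total), keyed by competence.
TABLE_1 = {
    "Leadership": ([4, 5, 9, 26, 27, 28, 30, 31, 19, 33, 38, 39, 41], 13),
    "Communication": ([1, 5, 9, 19, 25, 26, 28, 36, 37], 9),
    "Business vision": ([5, 6, 9, 31, 32, 34, 37, 38], 8),
    "Technological knowledge": ([4, 6, 27, 28, 29, 31, 33, 38], 8),
    "Agile capabilities": ([29, 5, 33, 38, 40, 41], 6),
    "Transparency": ([1, 5, 25, 33, 36], 5),
    "Customer orientation": ([6, 26, 28], 3),
    "Teamwork": ([5, 33, 38], 2),
    "Coaching": ([5, 28], 2),
}


def recount(table):
    """Return {competence: (listed, reported)} for rows where the two differ."""
    mismatches = {}
    for competence, (studies, reported) in table.items():
        listed = len(set(studies))  # number of distinct studies cited in the row
        if listed != reported:
            mismatches[competence] = (listed, reported)
    return mismatches


mismatches = recount(TABLE_1)

# Competences ordered by the number of distinct studies that mention them.
ranking = sorted(TABLE_1, key=lambda c: len(set(TABLE_1[c][0])), reverse=True)
```

The recount agrees with the reported totals for every row except Teamwork, where three studies are listed against a reported total of 2; the text does not indicate which of the two is intended.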
5 Conclusions
Digital transformation in public and private organizations is a process that requires them to adjust their products and services to the requirements of a digital trend in which customer behavior and expectations are changing. Many factors are involved and must be considered in the transformation process. This paper presents the results of a literature review focused on the importance of the leader’s role in making the change process successful, considering that DT is not only an improvement in the company’s technological resources or in a process, but a rethinking and reengineering of the business model with a strategic and digital vision. Communication and technological knowledge are also competences frequently mentioned in the studies. As future work, we will conduct a review using the Systematic Literature Review (SLR) method with the purpose of establishing the competences of those responsible for digital transformation and e-government in public institutions.
References
1. Mehta K (2019) Ahead of the curve: how to implement a successful digital strategy to stay relevant in today’s tech-driven world. Qual Progress 52(10):30–39
2. Verhoef P, Broekhuizen T, Bart Y, Bhattacharya A, Qi Dong J, Fabian N, Haenlein M (2021) Digital transformation: a multidisciplinary reflection and research agenda. J Bus Res 122:889–901
3. Cuenca-Fontbona J, Matilla K, Compte-Pujol M (2020) Transformación digital de los departamentos de relaciones públicas y comunicación de una muestra de empresas españolas. Revista de Comunicación 19(1):75–92
4. Wrede M, Velamuri V, Dauth T (2020) Top managers in the digital age: exploring the role and practices of top managers in firms’ digital transformations. Manage Decis Econ 41:1549–1567
5. Kazim F (2019) Digital transformation and leadership style: a multiple case study. ISM J Int Bus 3(1):24–33
6. Muñoz D, Sebástian A, Nuñez M (2019) La Cultura Corporativa: Claves De La Palanca Para La Verdadera Transformación Digital. Revista Prisma Soc 25:439–463
7. Wang H, Feng J, Zhang H, Li X (2020) The effect of digital transformation strategy on performance: the moderating role of cognitive conflict. Int J Conflict Manage 31(3):441–462
8. Álvarez M, Capelo M, Álvarez J (2019) La madurez digital de la prensa española. Estudio de caso. Revista Latina de Comunicación Soc 74:499–520
9. Davenport T, Redman T (2020) Digital transformation comes down to talent in 4 key areas. Harvard Bus Rev Digit Art 2–5
10. McGrath R, McManus R (2020) Discovery-driven digital transformation. Harvard Bus Rev 98(3):124–133
11. Estany A (2007) El Impacto De Las Ciencias Cognitivas en La Filosofía De La Ciencia. Eidos 6:26–61
12. Terán C, Serrano J, Romero E, Terán H (2017) Epistemología De Las Ciencias Computacionales en Ingeniería. Revista Didasc@lia: Didáctica y Educación 8(3):213–224
13. Yaguana H, Rodríguez N, Eduardo C (2019) La Sociedad Digital, Una Construcción De Nuevos Paradigmas. Revista Prisma Soc 26:1–3
14. Izaguirre-Remón C, Ortíz-Bosch C, Alba-Martínez D (2019) Filosofía de la ciencia y enfoque ciencia-tecnología-sociedad: un análisis marxista de la epistemología social. Innovación Tecnológica 25:1–14
15. Halpern N, Mwesiumo D, Suau-Sanchez P, Budd T, Bråthen S (2021) Ready for digital transformation? The effect of organisational readiness, innovation, airport size and ownership on digital change at airports. J Air Transp Manage 90
16. Iannacci F, Pietrix A, De Blok C, Resca A (2019) Reappraising maturity models in e-Government research: the trajectory-turning point theory. J Strateg Inf Syst 28(3):310–329
17. Bunge M (2019) Promesas y peligros de los avances tecnológicos. Revista Trilogía 11(21):7–10
18. Fernández J (2020) Los fundamentos epistemológicos de la transformación digital y sus efectos sobre la Agenda 2030 y los derechos humanos. Revista Icade. Revista de las Facultades de Derecho y Ciencias Económicas y Empresariales
19. Mugge P, Abbu H, Michaelis T, Kwiatkowski A, Gudergan G (2020) Patterns of digitization: a practical guide to digital transformation. Res Technol Manage 63(2):27–35
20. Baque L, Triviño, Viteri D (2020) Las habilidades gerenciales como aliado del líder para ejecutar la estrategia organizacional. Dilemas Contemporáneos: Educación, Política y Valores 7:1–16
21. Jiménez A, Villanueva M (2018) Los estilos de liderazgo y su influencia en la organización: Estudio de casos en el Campo de Gibraltar. Gestión Joven 18:183–195
22. Zarate A, Matviuk S (2012) Inteligencia emocional y prácticas de liderazgo en las organizaciones colombianas. Cuadernos de Administración 28(47):89–102
23. Villar M, Araya-Castillo L (2019) Consistencia entre el enfoque de liderazgo y los estilos de liderar: clave para la transformación y el cambio. Pensamiento Gestión 46:187–221
24. Cortés-Valiente J (2017) Liderazgo emocional: cómo utilizar la inteligencia emocional en la gestión de los colaboradores. Memorias (0124-4361) 15(28):1–23
25. Albukhitan S (2020) Developing digital transformation strategy for manufacturing. Proc Comput Sci 170:664–671
26. Alunni L, Llambías N (2018) Explorando La Transformación Digital Desde Adentro. Palermo Bus Rev 17:11–30
27. Area-Moreira M, Santana P, Sanabria A (2020) La transformación digital de los centros escolares. Obstáculos y resistencias. Digit Educ Rev 37:15–31
28. Breuer S, Szillat P (2019) Leadership and digitalization: contemporary approaches towards leading in the modern day workplace. Dialogue 1311–9206(1):24–36
29. Brock J, von Wangenheim F (2019) Demystifying AI: what digital transformation leaders can teach you about realistic artificial intelligence. California Manage Rev 61(4):110–134
30. Ifenthaler D, Egloffstein M (2020) Development and implementation of a maturity model of digital transformation. TechTrends Link Res Pract Improve Learn 64(2):302
31. Meriño R (2020) Competencias Digitales Para La Transformación De Las Empresas, Las Claves, Gestión Del Talento, Valores Y Cultura Organización Que Promueva La Educación Continua. Revista Daena (Int J Good Consci) 15(1):350–354
32. Müller S, Obwegeser N, Glud JV, Johildarson G (2019) Digital innovation and organizational culture: the case of a Danish media company. Scandinavian J Inf Syst 31(2):3–34
33. Petrucci T, Rivera M (2018) Leading growth through the digital leader. J Leadership Studies 12(3):53–56
34. Sebastian I, Ross J, Beath C, Mocker M, Moloney KG, Fonstad NO (2017) How big old companies navigate digital transformation. MIS Q Executive 16(3):197–213
35. Merino M, Bayona M, López-Pardo E, Morera Castell R, Martí T (2019) Desarrollo de Competencias Directivas para la mejora de la Atención Integrada. Int J Integrated Care (IJIC) 19(S1):1–2
36. Abbu H, Mugge P, Gudergan G, Kwiatkowski A (2020) Digital leadership - character and competency differentiates digitally mature organizations. In: 2020 IEEE international conference on engineering, technology and innovation (ICE/ITMC), pp 1–9
37. Guzmán V, Muschard B, Gerolamo M, Kohl H, Rozenfeld H (2020) Characteristics and skills of leadership in the context of industry 4.0. Proc Manuf 43:543–550
38. Imran F, Shahzad K, Butt A, Kantola J (2020) Leadership competencies for digital transformation: evidence from multiple cases. In: Kantola J, Nazir S, Salminen V (eds) Advances in human factors, business management and leadership. AHFE 2020. Advances in intelligent systems and computing, vol 1209. Springer, Cham
39. Jonathan G (2020) Digital transformation in the public sector: identifying critical success factors. In: Themistocleous M, Papadaki M (eds) Information systems. EMCIS 2019. Lecture Notes in Business Information Processing, vol 381. Springer, Cham
40. Kerroum K, Khiat A, Bahnasse A, Aoula E, Khiat Y (2020) The proposal of an agile model for the digital transformation of the University Hassan II of Casablanca 4.0. Proc Comput Sci 175:403–410
41. Burchardt C, Maisch B (2019) Digitalization needs a cultural change–examples of applying agility and Open Innovation to drive the digital transformation. Proc CIRP 84:112–117
42. Shin S, Ho J, Pak V (2020) Digital transformation through e-Government innovation in Uzbekistan. In: 2020 22nd international conference on advanced communication technology (ICACT), Phoenix Park, PyeongChang, Korea (South), pp 632–639
43. Morales V, Bayona-Oré S (2019) Factores Críticos de Éxito en el Desarrollo de E-Gobierno: Revisión Sistemática de la Literatura. Revista Ibérica de Sistemas e Tecnologias de Informação 23:233–247
Application of Machine Learning Methods on IoT Parking Sensors’ Data
Dražen Vuk and Darko Andročec
Abstract The Internet of things (IoT) brings many innovations and will impact our everyday life. Smart parking is one of the important IoT usage scenarios in urban environments. In this work, we show how to use machine learning methods on IoT parking sensors’ data to detect free parking spaces. We used about 100,000 instances of data from NBPS parking sensors provided by the company Mobilisis. These are actual data from magnetometer-equipped parking sensors deployed all over the world. The data were preprocessed, normalized, and clustered, because temperature has a large effect on the magnetometer values. Next, the XGBoost algorithm and different artificial neural network architectures were used to predict whether a parking space is free or not. The machine learning methods achieve better accuracy than the classic algorithm currently used in production (the Mobilisis smart parking solution), which is based on the history data of a particular parking sensor.
Keywords Machine learning · Parking sensor · Smart city · Artificial neural networks · Internet of things
D. Vuk (B)
Mobilisis d.o.o, Varazdinska ulica - odvojak II 7, 42000 Varaždin, Jalkovec, Croatia
e-mail: [email protected]
D. Andročec
Faculty of Organization and Informatics, University of Zagreb, Pavlinska 2, 42000 Varaždin, Croatia
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_14

1 Introduction
The Internet of things (IoT) extends the use of the Internet by connecting different physical objects, or things (sensors, actuators, and microcontrollers). IoT services often use cloud computing to store sensors’ data and access it from remote locations. Smart city concepts use both of these technologies to improve the lives of citizens. One of the key issues for citizens is finding an available parking spot in the city. For that reason, many companies have decided to manufacture parking sensors and appropriate smart
parking cloud solutions. One such company is Mobilisis d.o.o., with more than 4,000 parking sensors installed worldwide. In this work, we applied machine learning methods (XGBoost and artificial neural networks) to detect whether a car is above a specific parking sensor, which means that the corresponding parking space is not available. Currently, detection is performed by a classic software algorithm on the sensor itself, and an additional detection check is performed on the server based on the history of all parking events for that particular sensor; from this, the algorithm concludes whether the space is free or occupied. With this approach, the detection of NBPS sensors is 97% accurate; in this work, we show that more accurate prediction is possible using machine learning methods. The remaining sections of this paper are organized as follows: Sect. 2 reviews the current state of the art. Section 3 explains the data used and its preprocessing. Section 4 gives details on the machine learning methods used and the results of our models. Finally, Sect. 5 concludes the paper and gives guidelines for future research.
2 Related Work
Nowadays, smart city services are often driven by Internet of things technologies. Mijac et al. [1] conducted a systematic literature review to investigate proposed smart city services driven by IoT. The relevant papers were categorized by smart city service into the following categories: traffic and transport; environment monitoring; accessibility and healthcare; waste management; public lighting; energy management; city infrastructure; and other. Neirotti et al. [2] provided a comprehensive definition of the smart city term. They also explored various smart city initiatives.
There are also existing works on applying machine learning methods in smart city scenarios. Mahdavinejad et al. [3] assess the various machine learning methods that deal with the challenges presented by IoT data, considering smart cities as the main use case. The key contribution of this study is a taxonomy of machine learning algorithms explaining how different techniques are applied to smart city IoT data. Mohammadi and Al-Fuqaha [4] proposed a framework for semi-supervised deep reinforcement learning for smart city big data. They also articulated several challenges and trending research directions for incorporating machine learning to realize new smart city services. Chin et al. [5] tested the performance of four machine learning classification algorithms in correlating the effects of weather data on short journeys made by cyclists in London. Ullah et al. [6] performed a comprehensive review to explore the role of artificial intelligence, machine learning, and deep reinforcement learning in the evolution of smart cities. One of their topics was the use of machine learning methods in intelligent transportation systems.
Smart parking is the main theme of some existing works. Khanna and Anand [7] proposed an IoT-based cloud-integrated smart parking system. Lin et al.
[8] performed a survey on smart parking solutions. Al-Turjman and Malekloo [9] classified the smart parking systems while considering soft and hard design factors. Bock
Application of Machine Learning Methods …
et al. [10] described machine learning approaches to automatically generate maps of parking spots and to predict parking availability. Saharan et al. [11] proposed an occupancy-driven machine learning based on-street parking pricing scheme. The proposed scheme uses machine learning based approaches to predict occupancy of parking lots in Seattle city, which in turn is used to deduce occupancy-driven prices for arriving vehicles. Amato et al. [12] proposed a CNN architecture for visual parking occupancy detection. Provoost et al. [13] examined the impact of the Web of things and artificial intelligence in smart cities, considering a real-world problem of predicting parking availability. Traffic cameras are used as sensors, together with weather forecasting web services. Yamin Siddiqui et al. [14] presented smart occupancy detection for road traffic parking using a deep extreme learning machine.
3 Data of Parking Sensors
The NarrowBand parking sensor (NBPS) was used in our work. It is an autonomous, wireless, compact sensor for monitoring the occupancy of parking spaces, which allows cities to manage the challenges of parking more easily. The sensor measures the Earth's magnetic field to detect the presence of vehicles in the parking lot. These sensors are installed under a layer of asphalt in each parking space. In the event of any strong magnetic change, the sensor sends all changes to the server via the NB-IoT network [20]. The packet sent to the server is 48 bytes in size and consists of about a dozen different data fields. The data that are important for this work are the values of both magnetometers (x, y, z and x2, y2, z2) and the temperature for each magnetometer. The IoT NBPS sensor is manufactured by Mobilisis d.o.o., and the raw data from parking sensors were obtained from the same company. These are actual data from parking sensors with a magnetometer, and the sensors are spread all over the world. The data obtained from the parking sensors are the values of two magnetometers and temperature, seven values in total: (x, y, z), (x2, y2, z2), and temperature. In addition, the detected presence/absence of a car above the sensor, the parking time, the location code, and the sensor MAC address are obtained for each record. The biggest problem with raw data is that each sensor reports completely different magnetometer values, both when there is no car above the sensor and when there is one. This problem is mostly related to the calibration of the magnetometer and to the so-called “drifting” of magnetic values when the temperature, sensor age, and environment change. The calibration is stable only at the temperatures at which it was performed; when the temperature changes, the magnetometer reading can change by more than 150 milligauss, which is equal to the change observed when a car is above the sensor. It is also important to mention that this temperature-dependent change in the magnetic reading is different for each sensor. Further sources of disturbance are manholes, metal fences, neighboring cars that do not park according to regulations but park near or across the line, metal dust falling from cars, power lines, transformers,
D. Vuk and D. Andročec
and everything else that can affect the magnetic change. Ideally, raw data should be prepared so that each sensor has similar values as all other sensors when there is no car above it. This is unfortunately impossible due to the orientation of the sensor as well as various other influences.
4 Application of Machine Learning and Results
We used the Keras [15] framework (neural networks/deep learning) and XGBoost [16] (a gradient boosting decision tree algorithm) to detect cars in parking spaces. For both machine learning methods, we explain in detail what the input data are, how we prepared them, and which combinations give the best results. The complete Python code used in this work is publicly available at https://github.com/dvuk/primjena_metoda_strojnog_ucenja_na_podatke_interneta_stvari.
4.1 Normalization of Parking Sensor Data
This section describes how the values from the NBPS parking sensor are normalized. The normalized values are those of the two magnetometers, {x1, y1, z1} and {x2, y2, z2}, the temperature, the magnitude calculated by both sensors, and the difference between the vectors of magnetometers M1 and M2. We used the MinMax normalization algorithm. MinMax normalization linearly transforms x into y = (x − min)/(max − min), where min and max are the minimum and maximum values in X. When x = min, then y = 0, and when x = max, then y = 1. This means that the minimum value in X is mapped to 0 and the maximum value in X is mapped to 1. Thus, the entire range of X values from min to max is mapped to the range from 0 to 1.
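The MinMax transformation described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `min_max_normalize`, the column-wise treatment of features, the handling of constant columns, and the sample readings are assumptions made for this example and are not taken from the authors' repository.

```python
import numpy as np

def min_max_normalize(X):
    """Map every column of X linearly onto [0, 1] via y = (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # Assumption: a constant column is mapped to 0 instead of dividing by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

# Hypothetical readings: one magnetometer axis and the temperature.
readings = np.array([[-506.0, 20.0],
                     [-606.0, 25.0],
                     [-556.0, 30.0]])
normalized = min_max_normalize(readings)
```

In a train/test setting, the minimum and maximum would normally be computed on the training set only and then reused for the test set.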
4.2 XGBoost
A total of 100,000 data instances were prepared for car detection using the XGBoost algorithm [16]; the first 90,000 instances were reserved for training, and the remaining 10,000 were used to test the accuracy of the model. The input features were the values from the first magnetometer (X, Y, Z), the values from the second magnetometer (x, y, z), temperature, the magnitude from the first magnetometer, the difference of the vectors of the first and the second magnetometer, and the sum of the vectors of the first magnetometer. The importance of the features for the predictive modeling problem was evaluated using the “sklearn.feature_selection” module in Python. Basically, this step tests the model while removing features according to their importance. If a feature has no effect
Fig. 1 Importance of different features
on the prediction or if it has a negative effect on the prediction, then such a feature should be removed. Figure 1 shows a graph with the importance of each input feature. 90,000 instances of data were used to train the XGBoost model. The GridSearchCV class was used to search for the parameters that give the best results. Tuning the parameters increased the accuracy from 90.59 to 94.19%, an improvement of 3.6 percentage points. The search returned the best parameters (−0.217778, max_depth = 8, n_estimators = 150), whereas the default values used in XGBClassifier were max_depth = 4 and n_estimators = 100. Thus, by tuning the XGBClassifier parameters and training on 90,000 instances, a score of 94.19% was achieved. It is important to note that the data with which the model was built are different from the data with which the prediction was tested. Normalizing the input did not improve the accuracy of car detection. When we trained a model with the data of only one sensor, the accuracy increased to 97.4%, but it is practically impossible to train a model for each individual sensor. Therefore, the aim of this paper is to find a solution that gives a prediction accuracy of at least 97% for all sensors, not just one.
4.3 Neural Networks Models For training neural network models in Keras, we have used raw sensor data (90,000 instances for train set, and 10,000 instances for test set). The GridSearchCV [17] tool was used to tune the parameters. GridSearchCV works on the principle of searching for all combinations for a particular set of data and models. We have tuned the
following parameters: optimizer algorithm, initializer, epochs, batch size, and activation function. Using the GridSearchCV class for the activation parameter, the best result was obtained with softplus. The GridSearchCV class was also used for the batch_size, epochs, init, and optimizer parameters. For the most accurate detection, the following parameters were obtained: “batch_size”: 20, “epochs”: 100, “init”: “uniform”, “optimizer”: “rmsprop”. It took the GridSearchCV algorithm 19 h to compare all combinations and show which parameters give the highest accuracy. The selection and number of layers for Keras neural network models cannot be calculated analytically. Basically, one has to proceed by trial and error: different combinations need to be tried, depending on the domain problem. Our neural network model implementation is publicly available at https://github.com/dvuk/primjena_metoda_strojnog_ucenja_na_podatke_interneta_stvari. We used the following layers: sequential, LSTM [18], dense, dropout, and simple RNN [19]. Using the GridSearchCV tool to select the best parameters, setting the layers for deep learning, and training the model with 90,000 instances achieved an accuracy of only 68.91%. When we trained a model with the data of only one sensor, deep learning gave much better results, with accuracy increasing to as much as 97.83%, but it is infeasible to train a separate model for each sensor.
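The principle of an exhaustive grid search, i.e., trying every combination of the candidate parameter values and keeping the best one, can be illustrated with the standard library alone. The sketch below is not the authors' code and does not call scikit-learn: `grid_search` and `toy_score` are hypothetical names, and `toy_score` is an artificial stand-in for training and scoring a model, so its values carry no experimental meaning.

```python
from itertools import product

def grid_search(evaluate, param_grid):
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(max_depth, n_estimators):
    # Artificial score with a single peak; a real search would train and
    # validate a model here instead.
    return -((max_depth - 8) ** 2) - ((n_estimators - 150) / 50) ** 2

grid = {"max_depth": [4, 6, 8], "n_estimators": [100, 150, 200]}
best_params, best_score = grid_search(toy_score, grid)
```

The number of evaluations is the product of the list lengths (nine here), which is why a search over several parameters of a neural network can take many hours. scikit-learn's GridSearchCV additionally cross-validates each combination.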
4.4 Using Clustering to Improve Results
To improve the existing results, which are still unsatisfactory for both the Keras and XGBoost models, raw data should be prepared in such a way that the magnetometer values when there is no car in a parking space are similar regardless of magnetometer calibration, temperature drift, or sensor orientation. The idea is to use an algorithm similar to data clustering, applied to each sensor individually. This means that every magnetic change should be stored in memory or in a database. For each new magnetic value from the sensor, we look up the most common value in the database at that specific temperature, because the temperature has a large effect on the value of the magnetometer. The most common value, obtained for each axis separately, serves as the reference for the state in which no car is above the sensor. Preparing data in this way ensures that, for all magnetic sensors, X, Y, and Z have almost the same values when there is no car above the sensor. For example, if the most common value of the magnetometer is (−506, −120, −360) and the new value obtained from the parking sensor is (−606, −64, −60), then subtracting the new value from the reference gives (−506 + 606, −120 + 64, −360 + 60) => (100, −56, −300). This example shows that X, Y, and Z still have large values and are far from the reference, i.e., the most common value. From this, it can easily be concluded that a car is parked above this sensor. If, instead, the values (−515, −135, −344) are obtained from the parking sensor and subtracted from the reference, (−506 + 515, −120 + 135, −360 + 344) => (9, 15, −16), it is clear that the values are
small. By subtracting the new value from the most common reference value, values close to zero are obtained, which means that there is no car above the sensor. This method significantly increases accuracy and enables our XGBoost and neural network models to make more accurate predictions. Using this preprocessing method, the Keras model achieved an accuracy of 98.08%, and the XGBoost model achieved an accuracy of 98.57%, which is better than the 97% achieved by the current non-machine-learning methods used in production in Mobilisis’s smart parking solution. Of course, there are also some downsides to this algorithm. Namely, for each sensor, we must have a parking history stored in the database or some other container in order to calculate the most common, or reference, value. For example, if a new sensor is installed, it will not have any history, and at least three parking events will be necessary before the algorithm can learn which values are the most common.
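The reference-subtraction idea of this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in production or in the authors' repository: the class name `ReferenceTracker`, the in-memory storage, and the use of the exact temperature value as a lookup key (a real system would probably bin temperatures) are all hypothetical.

```python
from collections import Counter, defaultdict

class ReferenceTracker:
    """Keeps, per sensor and temperature, the most common reading on each axis."""

    def __init__(self):
        # (sensor_id, temperature) -> one value counter per axis (X, Y, Z)
        self._history = defaultdict(lambda: [Counter(), Counter(), Counter()])

    def update(self, sensor_id, temperature, reading):
        for counter, value in zip(self._history[(sensor_id, temperature)], reading):
            counter[value] += 1

    def reference(self, sensor_id, temperature):
        counters = self._history.get((sensor_id, temperature))
        if counters is None:
            return None  # no history yet, e.g. a newly installed sensor
        return tuple(counter.most_common(1)[0][0] for counter in counters)

    def offset(self, sensor_id, temperature, reading):
        """Reference minus reading; values near zero suggest an empty space."""
        ref = self.reference(sensor_id, temperature)
        if ref is None:
            return None
        return tuple(r - v for r, v in zip(ref, reading))

tracker = ReferenceTracker()
for _ in range(3):
    tracker.update("sensor-1", 20, (-506, -120, -360))
tracker.update("sensor-1", 20, (-606, -64, -60))

occupied = tracker.offset("sensor-1", 20, (-606, -64, -60))
free = tracker.offset("sensor-1", 20, (-515, -135, -344))
```

With the numbers from the text, `occupied` is far from zero and `free` is close to zero; the resulting offsets, rather than the raw readings, would then be passed to the classifier.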
5 Conclusion
Smart parking initiatives are among the most popular smart city use cases. Intelligent parking systems help drivers find an available parking space by using a mobile application or digital displays near roads. In this work, we show how to use machine learning methods such as XGBoost and artificial neural networks in Keras on Internet of things parking sensor data to detect free parking spaces. We used actual data from NBPS parking sensors with a magnetometer, deployed all over the world and provided by Mobilisis d.o.o. Using the preprocessing method, the Keras model achieved an accuracy of 98.08%, and the XGBoost model achieved an accuracy of 98.57%, which are better results than the 97% achieved by the non-machine-learning methods currently used in production in Mobilisis’s smart parking solution. Our models are publicly available at https://github.com/dvuk/primjena_metoda_strojnog_ucenja_na_podatke_interneta_stvari. In our future work, we plan to apply our model to other parking sensors’ data. We will also try other state-of-the-art machine learning algorithms (e.g., LightGBM, CatBoost) and various other architectures of artificial neural networks. In our experiments, the accuracy of machine learning methods was better than that of the classical algorithm, but data availability and preparation are crucial, so we will further examine parking data preprocessing and post-processing in our future work.
References
1. Mijac M, Androcec D, Picek R (2017) Smart city services driven by IoT: a systematic review. J Econ Soc Dev 4:40–50
2. Neirotti P, De Marco A, Cagliano AC, Mangano G, Scorrano F (2014) Current trends in smart city initiatives: some stylised facts. Cities 38:25–36. https://doi.org/10.1016/j.cities.2013.12.010
3. Mahdavinejad MS, Rezvan M, Barekatain M, Adibi P, Barnaghi P, Sheth AP (2018) Machine learning for internet of things data analysis: a survey. Digit Commun Netw 4:161–175. https://doi.org/10.1016/j.dcan.2017.10.002
4. Mohammadi M, Al-Fuqaha A (2018) Enabling cognitive smart cities using big data and machine learning: approaches and challenges. IEEE Commun Mag 56:94–101. https://doi.org/10.1109/MCOM.2018.1700298
5. Chin J, Callaghan V, Lam I (2017) Understanding and personalising smart city services using machine learning, the Internet-of-Things and Big Data. In: 2017 IEEE 26th international symposium on industrial electronics (ISIE), pp 2050–2055. https://doi.org/10.1109/ISIE.2017.8001570
6. Ullah Z, Al-Turjman F, Mostarda L, Gagliardi R (2020) Applications of artificial intelligence and machine learning in smart cities. Comput Commun 154:313–323. https://doi.org/10.1016/j.comcom.2020.02.069
7. Khanna A, Anand R (2016) IoT based smart parking system. In: 2016 international conference on internet of things and applications (IOTA), pp 266–270. https://doi.org/10.1109/IOTA.2016.7562735
8. Lin T, Rivano H, Mouël FL (2017) A survey of smart parking solutions. IEEE Trans Intell Transp Syst 18:3229–3253. https://doi.org/10.1109/TITS.2017.2685143
9. Al-Turjman F, Malekloo A (2019) Smart parking in IoT-enabled cities: a survey. Sustain Cities Soc 49: https://doi.org/10.1016/j.scs.2019.101608
10. Bock F, Di Martino S, Sester M (2017) Data-driven approaches for smart parking. In: Altun Y, Das K, Mielikäinen T, Malerba D, Stefanowski J, Read J, Žitnik M, Ceci M, Džeroski S (eds) Machine learning and knowledge discovery in databases, pp 358–362. Springer, Cham. https://doi.org/10.1007/978-3-319-71273-4_31
11. Saharan S, Kumar N, Bawa S (2020) An efficient smart parking pricing system for smart city environment: a machine-learning based approach. Future Gener Comput Syst 106:622–640. https://doi.org/10.1016/j.future.2020.01.031
12. Amato G, Carrara F, Falchi F, Gennaro C, Meghini C, Vairo C (2017) Deep learning for decentralized parking lot occupancy detection. Expert Syst Appl 72:327–334. https://doi.org/10.1016/j.eswa.2016.10.055
13. Provoost JC, Kamilaris A, Wismans LJJ, van der Drift SJ, van Keulen M (2020) Predicting parking occupancy via machine learning in the web of things. Internet Things 12: https://doi.org/10.1016/j.iot.2020.100301
14. Yamin Siddiqui S, Adnan Khan M, Abbas S, Khan F (2020) Smart occupancy detection for road traffic parking using deep extreme learning machine. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2020.01.016
15. Géron A (2019) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: concepts, tools, and techniques to build intelligent systems. O’Reilly Media, Inc
16. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794. ACM, San Francisco California USA. https://doi.org/10.1145/2939672.2939785
17. Ranjan GSK, Verma AK, Radhika S (2019) K-nearest neighbors and grid search CV based real time fault monitoring system for industries. In: 2019 IEEE 5th international conference for convergence in technology (I2CT), pp 1–5. https://doi.org/10.1109/I2CT45611.2019.9033691
18. Gers FA, Schmidhuber J, Cummins F (1999) Learning to forget: continual prediction with LSTM, pp 850–855. https://doi.org/10.1049/cp:19991218
19. Mikolov T, Kombrink S, Burget L, Černocký J, Khudanpur S (2011) Extensions of recurrent neural network language model. In: 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 5528–5531. https://doi.org/10.1109/ICASSP.2011.5947611
A Fast Algorithm for Image Deconvolution Based on a Rank Constrained Inverse Matrix Approximation Problem Pablo Soto-Quiros, Juan Jose Fallas-Monge, and Jeffry Chavarría-Molina
Abstract In this paper, we present a fast method for image deconvolution, which is based on the rank constrained inverse matrix approximation (RCIMA) problem. The RCIMA problem is a generalization of the low-rank approximation problem proposed by Eckart-Young. This new algorithm, called the fast-RCIMA method, uses a tensor product and Tikhonov’s regularization to approximate the pseudoinverse, and bilateral random projections to estimate the rank constrained approximation. The fast-RCIMA method reduces the execution time needed to estimate the optimal solution and preserves the accuracy of classical methods. We use training data as a substitute for knowledge of a forward model. Numerical simulations measuring execution time and speedup confirmed the efficiency of the proposed method.
Keywords Rank constrained · Pseudoinverse · Fast algorithm · Speedup · Image deconvolution
1 Introduction
Image deconvolution is an image processing technique designed to remove blur or enhance contrast and resolution [1, 2]. If $X \in \mathbb{R}^{m \times n}$ represents a digital image, then the mathematical model in image deconvolution can be written as
\[ Ax + \eta = c, \tag{1} \]
P. Soto-Quiros (B) · J. Jose Fallas-Monge · J. Chavarría-Molina
Escuela de Matemática, Instituto Tecnológico de Costa Rica, Cartago 30101, Costa Rica
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_15
where $x \in \mathbb{R}^{mn}$ is a vectorized version of $X$ (i.e., $X$ is converted into a column vector by stacking its columns), $A \in \mathbb{R}^{mn \times mn}$ is a matrix that models the forward process, $\eta \in \mathbb{R}^{mn}$ is additive noise, and $c \in \mathbb{R}^{mn}$ is a vectorized version of the noisy image $C \in \mathbb{R}^{m \times n}$. Given $c$ and $A$, the goal of the inverse problem is to reconstruct $x$. Entries in $A$ depend on the point spread functions (PSFs) of the imaging system, and under some assumptions, matrix $A$ may be highly structured so that efficient algorithms can be used in the inverse problem to reconstruct $x$ [1]. However, matrix $A$ is unknown in most cases, and only the observed noisy image $c$ is available. To solve this problem, training data are used as a substitute for knowledge of the forward model by introducing a rank constrained inverse matrix problem that avoids including $A$ in the problem formulation [2].
Let $\mathbb{R}_r^{p \times q}$ be the set of all real $p \times q$ matrices of rank at most $r \le \min\{p, q\}$. If $\{x^{(k)}\}_{k=1}^{S}$ and $\{c^{(k)}\}_{k=1}^{S}$ are training data of vectorized images $x$ and $c$, respectively, then the goal of the rank constrained inverse matrix problem is to find $F \in \mathbb{R}_r^{mn \times mn}$ that gives a small reconstruction error for the given training set, i.e., find $F$ such that
\[ F = \arg\min_{F \in \mathbb{R}_r^{mn \times mn}} \frac{1}{S} \sum_{k=1}^{S} \| F c^{(k)} - x^{(k)} \|_2^2, \tag{2} \]
where $\|\cdot\|_2$ is the Euclidean norm. Using the relationship between the Euclidean norm and the Frobenius norm $\|\cdot\|_{fr}$, problem (2) can be reformulated as follows:
\[ F = \arg\min_{F \in \mathbb{R}_r^{mn \times mn}} \frac{1}{S} \| FC - X \|_{fr}^2, \tag{3} \]
where $X$ and $C$ are created using training data, i.e., $X = [x^{(1)}\ x^{(2)}\ \ldots\ x^{(S)}] \in \mathbb{R}^{mn \times S}$ and $C = [c^{(1)}\ c^{(2)}\ \ldots\ c^{(S)}] \in \mathbb{R}^{mn \times S}$. Once matrix $F$ is computed, a matrix-vector multiplication is required to solve the problem, i.e., $x = Fc$. Problem (3) is called the rank constrained inverse matrix approximation (RCIMA) problem, which is a generalization of the well-known low-rank approximation problem proposed by Eckart-Young [3]. In [2], Chung and Chung present a solution of the RCIMA problem, which uses the SVD to estimate the pseudoinverse and the low-rank approximation. The SVD method is very accurate, but its computational complexity makes it impractical for large matrices [4, 5]. In this paper, we propose a new and efficient method to compute a solution of the RCIMA problem (3), called the fast-RCIMA method. This new method uses fast algorithms to approximate the pseudoinverse and the low-rank matrix. More specifically, the fast pseudoinverse technique used in the fast-RCIMA method utilizes a special type of tensor product of two vectors [7] and Tikhonov’s regularization [8, 9] (see Sect. 3.1 below for further details). Besides, a bilateral
random projection and its power-scheme modification [10, 11] are used in this paper to estimate a low-rank approximation. Moreover, we propose a generalization of the method developed in [11] to estimate a low-rank approximation, which includes an analysis of the particular case $\operatorname{rank}(Y_2) < r$, where $Y_2 \in \mathbb{R}^{m \times r}$ is a right random projection (see Sect. 3.2 below for further details). The proposed implementation to estimate a solution of the RCIMA problem reduces the execution time to compute $F$ in (3) and, moreover, preserves the accuracy of the classical method developed in [2]. In this work, we choose the methods in [7, 8, 10, 11] for their easy computational implementation and their high speed and efficiency in calculating the pseudoinverse and low-rank approximation. However, there are other fast algorithms to estimate the pseudoinverse and low-rank approximation (see, e.g., [6, 12–15]). Throughout this paper, we use the following notation: $M^\dagger$ denotes the pseudoinverse of $M$, $I_m \in \mathbb{R}^{m \times m}$ is the identity matrix, and $M(1:i, :)$ and $M(:, 1:j)$ are formed with the first $i$ rows and the first $j$ columns, respectively, of $M$. The paper is organized as follows. Section 2 presents a solution of the RCIMA problem based on the SVD method given in [2]. Fast algorithms to estimate the pseudoinverse and low-rank approximation, respectively, are explained in Sect. 3. In Sect. 4, the fast-RCIMA method is proposed to approximate a solution to problem (3). A numerical example based on image deconvolution is presented in Sect. 5. Finally, Sect. 6 contains some concluding remarks.
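The vectorization by column stacking and the equivalence between the sum of squared Euclidean norms in (2) and the squared Frobenius norm in (3) can be checked numerically. The following NumPy sketch uses small random arrays in place of real images; the sizes, the noise level, the helper name `vec`, and the arbitrary test matrix `F` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, S = 3, 4, 5

def vec(image):
    """Stack the columns of a 2-D array into one vector (column-major order)."""
    return np.asarray(image).flatten(order="F")

# Hypothetical training pairs: clean images and noisy observations.
images = rng.standard_normal((S, m, n))
noisy = images + 0.1 * rng.standard_normal((S, m, n))

# Training matrices X and C of size mn x S, one vectorized image per column.
X = np.column_stack([vec(img) for img in images])
C = np.column_stack([vec(img) for img in noisy])

# Any candidate F gives the same objective value in both formulations.
F = rng.standard_normal((m * n, m * n))
sum_form = sum(np.linalg.norm(F @ C[:, k] - X[:, k]) ** 2 for k in range(S)) / S
fro_form = np.linalg.norm(F @ C - X, "fro") ** 2 / S
```

Working with the matrix form is what allows the problem to be attacked with matrix factorizations instead of a loop over training pairs.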
2 RCIMA Method
A solution of problem (3) is proposed by Chung and Chung [2]. Let $C = U \Sigma V^T$ be the singular value decomposition (SVD) of $C$. If $k = \operatorname{rank}(C)$ and $P = X V_k V_k^T$, where $V_k = V(:, 1:k)$, then a solution of the RCIMA problem is given by
\[ F = P_r C^\dagger, \tag{4} \]
where $P_r$ is the optimal low-rank approximation of $P$, such that $\operatorname{rank}(P_r) \le r$ (see Sect. 3.2 below for more details). Matrix $P$ is known as the kernel of the RCIMA problem. The associated implementation of (4) is presented below in Algorithm 1. This implementation is called the RCIMA method.
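Algorithm 1: RCIMA method
Input: Training data $\{x^{(k)}\}_{k=1}^{S} \subseteq \mathbb{R}^{mn}$, $\{c^{(k)}\}_{k=1}^{S} \subseteq \mathbb{R}^{mn}$ and $r \le mn$
Output: $F \in \mathbb{R}_r^{mn \times mn}$
  $X = [x^{(1)}\ x^{(2)}\ \ldots\ x^{(S)}]$
  $C = [c^{(1)}\ c^{(2)}\ \ldots\ c^{(S)}]$
  $[\sim, \sim, V] = \operatorname{svd}(C)$
  $k = \operatorname{rank}(C)$
  $V_k = V(:, 1:k)$
  $P = X V_k V_k^T$
  $[\widetilde{U}, \widetilde{S}, \widetilde{V}] = \operatorname{svd}(P)$
  $F = \widetilde{U}(:, 1:r)\, \widetilde{S}(1:r, 1:r)\, [\widetilde{V}(:, 1:r)]^T C^\dagger$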
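As a rough illustration, the steps of Algorithm 1 can be written in NumPy as follows. The function name `rcima_svd` and the small random training matrices are assumptions made for this example; the sketch follows the listed steps literally and is not tuned for large problems.

```python
import numpy as np

def rcima_svd(X, C, r):
    """SVD-based RCIMA sketch: returns F = P_r C^+ with rank(F) <= r."""
    # Right singular vectors of C; the rows of Vt are the columns of V.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    k = np.linalg.matrix_rank(C)
    Vk = Vt[:k].T
    # Kernel matrix P = X V_k V_k^T.
    P = X @ Vk @ Vk.T
    # Truncated SVD of P gives its best rank-r approximation P_r.
    U, s, Wt = np.linalg.svd(P, full_matrices=False)
    P_r = (U[:, :r] * s[:r]) @ Wt[:r]
    return P_r @ np.linalg.pinv(C)

# Hypothetical training data: mn = 6 pixels, S = 10 training images.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 10))
C = X + 0.1 * rng.standard_normal((6, 10))

F_low = rcima_svd(X, C, r=2)   # rank constrained solution
F_full = rcima_svd(X, C, r=6)  # no effective rank constraint
```

When the rank constraint is inactive, as for `F_full`, the result coincides with the unconstrained least squares solution $X C^\dagger$, which is a convenient sanity check.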
3 Fast Algorithm to Compute Pseudoinverse and Low-Rank Approximation
As mentioned in Sect. 2, the RCIMA method uses the SVD to compute the pseudoinverse and a low-rank approximation. However, the SVD is usually very expensive for high-dimensional matrices. Therefore, in this section, we show two fast algorithms to compute the pseudoinverse and the low-rank approximation.
3.1 Pseudoinverse and Tensor Product Matrix
The pseudoinverse of $Z \in \mathbb{R}^{m \times n}$, denoted by $Z^\dagger \in \mathbb{R}^{n \times m}$, is the unique matrix satisfying the conditions $Z Z^\dagger Z = Z$, $Z^\dagger Z Z^\dagger = Z^\dagger$, $(Z^\dagger Z)^T = Z^\dagger Z$ and $(Z Z^\dagger)^T = Z Z^\dagger$. If $Z = U \Sigma V^T$ is the SVD of $Z$ and $k = \operatorname{rank}(Z)$, then the standard procedure to calculate $Z^\dagger$ is as follows:
\[ Z^\dagger = V(:, 1:k)\, [\Sigma(1:k, 1:k)]^{-1}\, [U(:, 1:k)]^T. \tag{5} \]
Katsikis and Pappas [7] provide a fast and reliable algorithm to compute the pseudoinverse of full-rank matrices, which does not use the SVD. This algorithm is based on a special type of tensor product of two vectors that is usually used in infinite dimensional Hilbert spaces. The method in [7] to compute the pseudoinverse of a full-rank matrix $Z$ is defined by
\[ Z^\dagger = \begin{cases} (Z^T Z)^{-1} Z^T & \text{if } m \ge n, \\ Z^T (Z Z^T)^{-1} & \text{if } m < n. \end{cases} \tag{6} \]
In 2008, Lu et al. [8] extended the method in [7] to compute the pseudoinverse of rank deficient matrices, using an approximation based on Tikhonov’s regularization to estimate $Z^\dagger$. If $Z$ is a rank deficient matrix, then the method proposed in [8] to approximate $Z^\dagger$ is defined by
\[ Z^\dagger \approx Z_P = \begin{cases} (Z^T Z + \alpha I_n)^{-1} Z^T & \text{if } m \ge n, \\ Z^T (Z Z^T + \alpha I_m)^{-1} & \text{if } m < n, \end{cases} \tag{7} \]
where $\alpha$ is a positive number close to zero. Formula (7) is useful to approximate $Z^\dagger$ because $Z_P \to Z^\dagger$ as $\alpha \to 0$ in (7) (see Theorem 4.3 in [9]). The method to estimate the pseudoinverse given by (6)–(7) is called the tensor product matrix (TPM) method.
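Formulas (6) and (7) translate almost directly into NumPy. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function name `tpm_pinv`, the choice to solve a linear system instead of forming an explicit inverse, and the test matrices are all choices made for this example. Setting `alpha=0` gives (6); a small positive `alpha` gives (7).

```python
import numpy as np

def tpm_pinv(Z, alpha=0.0):
    """Pseudoinverse estimate from (6) (alpha = 0) or (7) (alpha > 0)."""
    m, n = Z.shape
    if m >= n:
        gram = Z.T @ Z + alpha * np.eye(n)
        return np.linalg.solve(gram, Z.T)   # (Z^T Z + alpha I_n)^{-1} Z^T
    gram = Z @ Z.T + alpha * np.eye(m)
    # gram is symmetric, so solve(gram, Z).T equals Z^T (Z Z^T + alpha I_m)^{-1}.
    return np.linalg.solve(gram, Z).T

rng = np.random.default_rng(0)
Z_tall = rng.standard_normal((6, 3))                        # full rank, m >= n
Z_wide = rng.standard_normal((3, 6))                        # full rank, m < n
Z_def = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, -1.0])    # rank 1

P_tall = tpm_pinv(Z_tall)
P_wide = tpm_pinv(Z_wide)
P_def = tpm_pinv(Z_def, alpha=1e-6)
```

For the rank deficient matrix, the regularized result is only an approximation of `np.linalg.pinv(Z_def)`; how small `alpha` can be chosen in practice depends on the scaling of the matrix and on floating-point precision.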
3.2 Low-Rank Approximation and Bilateral Random Projection
Given $L \in \mathbb{R}^{m \times n}$ and $r \le \min\{m, n\}$, the low-rank approximation problem is a minimization problem where the goal is to find $L_r \in \mathbb{R}_r^{m \times n}$ such that
\[ \| L - L_r \|_{fr}^2 = \min_{B \in \mathbb{R}_r^{m \times n}} \| L - B \|_{fr}^2. \tag{8} \]
If $L = U \Sigma V^T$ is the SVD of $L$, then $L_r$ is given by
\[ L_r = U(:, 1:r)\, \Sigma(1:r, 1:r)\, [V(:, 1:r)]^T. \tag{9} \]
An alternative method to estimate $L_r$ was developed by Fazel et al. [10]. These authors show that a fast method to estimate $L_r$ is
\[ L_r = Y_1 (A_2^T Y_1)^\dagger Y_2^T, \tag{10} \]
where
\[ Y_1 = L A_1, \qquad Y_2 = L^T A_2 \tag{11} \]
are $r$-bilateral random projections (BRP) of $L$. Here, $A_1 \in \mathbb{R}^{n \times r}$ and $A_2 \in \mathbb{R}^{m \times r}$ are arbitrary full-rank matrices. Matrices $Y_1$ and $Y_2$ are called the left and right random projections of $L$, respectively. In contrast to the randomized SVD in [14], which extracts the column space from unilateral random projections, the BRP method estimates both column and row spaces from bilateral random projections [11].
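A small numerical sketch of (10)–(11) is given below. It is not taken from the paper's code: the function name `brp_low_rank`, the use of Gaussian matrices for $A_1$ and $A_2$, and the test matrix are assumptions. The test matrix has exact rank 2, so the BRP estimate with $r = 2$ should reproduce it up to rounding error.

```python
import numpy as np

def brp_low_rank(L, r, seed=0):
    """Rank-r estimate of L from bilateral random projections, formula (10)."""
    rng = np.random.default_rng(seed)
    m, n = L.shape
    A1 = rng.standard_normal((n, r))   # assumed Gaussian, full rank almost surely
    A2 = rng.standard_normal((m, r))
    Y1 = L @ A1                        # left random projection
    Y2 = L.T @ A2                      # right random projection
    return Y1 @ np.linalg.pinv(A2.T @ Y1) @ Y2.T

rng = np.random.default_rng(1)
L = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 6))   # exact rank 2
L_r = brp_low_rank(L, r=2)
```

Only products with the thin matrices $A_1$ and $A_2$ and the pseudoinverse of an $r \times r$ matrix are needed, which is where the savings over a full SVD come from.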
If $L$ is a dense matrix, then the number of flops needed to compute $L_r$ with (10) is less than that of the SVD-based approximation [11]. However, if the singular values of $L$ decay gradually, then the accuracy of (10) may be lost. To solve this problem, Zhou and Tao [11] propose a power-scheme to improve the approximation precision when $A_2^T Y_1$ is nonsingular. In this paper, we extend the method considered in [11] to compute the low-rank matrix when $A_2^T Y_1$ is singular. The revised method considers a power-scheme given by the left and right random projections of $X = (L L^T)^c L$ and $L$, respectively, i.e.,
\[ Y_1 = X A_1 = (L L^T)^c L A_1, \qquad Y_2 = L^T A_2, \tag{12} \]
where $c$ is a nonnegative integer and $A_1 \in \mathbb{R}^{n \times r}$, $A_2 \in \mathbb{R}^{m \times r}$ are arbitrary full-rank matrices. This power-scheme is useful because $(L L^T)^c L$ has the same singular vectors as $L$, while its singular values decay more rapidly [14]. Besides, note that the left random projection $Y_1$ in (12) can be represented as $Y_1 = L \widetilde{A}_1$, where $\widetilde{A}_1 = L^T (L L^T)^{c-1} L A_1$, i.e., formula (12) is a particular case of (11). Theorem 1 below shows a new alternative formula to estimate a low-rank approximation of $L$, using a particular choice of $A_2$.
Theorem 1 Consider the random projections given by (12), where $A_1 \in \mathbb{R}^{n \times r}$ is an arbitrary full-rank matrix and $A_2 = L (L^T L)^{c-1} A_1$. Then, the estimate of the low-rank approximation of $L$ in (10) can be simplified to
\[ L_r = L Y_2 Y_2^\dagger. \tag{13} \]
Proof Note that $(L L^T)^c L = L (L^T L)^c$, and therefore the left projection in (12) can be expressed as $Y_1 = L Y_2$. Further, from Proposition 3.2 in [9], $(Y_2^T Y_2)^\dagger Y_2^T = Y_2^\dagger$. Then
\[ \begin{aligned} L_r &= Y_1 (A_2^T Y_1)^\dagger Y_2^T \\ &= L Y_2 \left( [L (L^T L)^{c-1} A_1]^T L Y_2 \right)^\dagger Y_2^T \\ &= L Y_2 \left( [(L^T L)^c A_1]^T Y_2 \right)^\dagger Y_2^T \\ &= L Y_2 (Y_2^T Y_2)^\dagger Y_2^T \\ &= L Y_2 Y_2^\dagger. \end{aligned} \]
Remark 1 $Y_2^\dagger$ in (13) can be computed by the TPM method explained in Sect. 3.1. Note that we only consider the case in which the number of rows of $Y_2$ is greater than or equal to the number of columns, because $n \ge r$.
The method to estimate the low-rank approximation of $L$ given by (13) and the TPM method is called the modified bilateral random projections (MBRP) method. The pseudocode of the MBRP method is presented in Algorithm 2. Note that the bilateral random projections $Y_1 = L A_1$ and $Y_2 = L^T A_2 = (L^T L)^c A_1$ can be computed efficiently using the loop in steps 1–4 of Algorithm 2.
Algorithm 2: MBRP Method for Computing Low-Rank Approximation
Input: $L \in \mathbb{R}^{m \times n}$, $r \le \min\{m, n\}$, $A_1 \in \mathbb{R}^{n \times r}$ and $c \in \{1, 2, 3, \ldots\}$
Output: $L_r \in \mathbb{R}_r^{m \times n}$
1 $Y_2 = A_1$
2 for $i = 1 : c$ do
3   $Y_1 = L Y_2$
4   $Y_2 = L^T Y_1$
5 if $Y_2$ is full rank then
6   $L_r = L Y_2 (Y_2^T Y_2)^{-1} Y_2^T$
7 else if $Y_2$ is rank deficient then
8   $\alpha \in\, ]0, 1[$
9   $L_r = L Y_2 (Y_2^T Y_2 + \alpha I_r)^{-1} Y_2^T$
Remark 2 The power-scheme is useful to improve the accuracy of the estimated low-rank approximation. However, if the MBRP method is executed in floating-point arithmetic and $c$ is a large number, then all information associated with singular values smaller than $\mu^{1/(2c+1)} \| L \|_{fr}$ is lost, where $\mu$ is the machine precision [14]. Therefore, it is recommended to take a small value for $c$. Thus, numerical simulations in this paper are computed using $c = 3$.
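Algorithm 2 can be sketched in NumPy as shown below. This is a hedged illustration, not the authors' code: the function name `mbrp_low_rank`, the Gaussian choice for $A_1$, the numerical rank test, and the default value of `alpha` are assumptions. The test matrix is built with prescribed singular values 3 and 2, so that the rank-2 estimate should recover it up to rounding error.

```python
import numpy as np

def mbrp_low_rank(L, r, c=3, alpha=1e-10, seed=0):
    """Rank-r estimate L Y2 Y2^+ following Algorithm 2 (power-scheme with c steps)."""
    rng = np.random.default_rng(seed)
    Y2 = rng.standard_normal((L.shape[1], r))   # plays the role of A1 (n x r)
    for _ in range(c):
        Y1 = L @ Y2
        Y2 = L.T @ Y1                           # after c steps: Y2 = (L^T L)^c A1
    gram = Y2.T @ Y2
    if np.linalg.matrix_rank(Y2) < r:
        # Rank deficient case: Tikhonov-regularized TPM formula.
        gram = gram + alpha * np.eye(r)
    return L @ Y2 @ np.linalg.solve(gram, Y2.T)

# Hypothetical test matrix with exact rank 2 and singular values 3 and 2.
rng = np.random.default_rng(2)
Q1, _ = np.linalg.qr(rng.standard_normal((8, 2)))
Q2, _ = np.linalg.qr(rng.standard_normal((6, 2)))
L = Q1 @ np.diag([3.0, 2.0]) @ Q2.T
L_r = mbrp_low_rank(L, r=2)
```

Because the entries of `Y2` grow with every power step, a fixed `alpha` is only meaningful relative to the scale of `Y2.T @ Y2`; rescaling `Y2` between iterations would be a natural refinement that is not part of the algorithm as stated.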
4 Fast-RCIMA Method
In this section, we propose a new implementation to compute a solution of the RCIMA problem on the basis of the TPM and MBRP methods explained in Sects. 3.1 and 3.2, respectively. A block diagram of this implementation, called the fast-RCIMA method, is presented in Fig. 1 and explained below:
Step 0: Obtain training data $\{x^{(k)}\}_{k=1}^{S}$ and $\{c^{(k)}\}_{k=1}^{S}$.
Step 1: Create matrices $X = [x^{(1)}\ x^{(2)}\ \ldots\ x^{(S)}]$ and $C = [c^{(1)}\ c^{(2)}\ \ldots\ c^{(S)}]$.
Step 2: Compute the pseudoinverse approximation $C_P$ of $C$, using the TPM method.
Step 3: Compute the kernel matrix $P = X V_k V_k^T$, where $C = U \Sigma V^T$ is the SVD of $C$, $V_k = V(:, 1:k)$ and $k = \operatorname{rank}(C)$. To avoid computing the SVD of $C$, we use the fact that $V_k V_k^T = C^\dagger C$ (see, e.g., equation (2139) in [16]). Therefore, the kernel matrix $P$ can be computed as $P = X C_P C$.
Fig. 1 Block diagram of the fast-RCIMA method
Step 4: Estimate the low-rank approximation $P_r$ of $P$, using the MBRP method.
Step 5: Approximate the RCIMA solution $F = P_r C_P$.
The associated pseudocode of the fast-RCIMA method is shown in Algorithm 3, where $c = 3$ is used in the power-scheme.
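Steps 1–5 can be combined into a compact NumPy sketch. As before, this is an illustration under stated assumptions and not the implementation evaluated in the paper: the function names, the Gaussian choice for $A_1$, the numerical rank tests, the default `alpha`, and the synthetic training data (clean data of exact rank 2 plus small noise) are all hypothetical.

```python
import numpy as np

def tpm_pinv(Z, alpha=1e-10):
    """TPM pseudoinverse, formulas (6)-(7); alpha is used only if Z is rank deficient."""
    m, n = Z.shape
    reg = 0.0 if np.linalg.matrix_rank(Z) == min(m, n) else alpha
    if m >= n:
        return np.linalg.solve(Z.T @ Z + reg * np.eye(n), Z.T)
    return np.linalg.solve(Z @ Z.T + reg * np.eye(m), Z).T

def mbrp(P, A1, c=3, alpha=1e-10):
    """MBRP low-rank estimate P Y2 Y2^+ with Y2 = (P^T P)^c A1."""
    Y2 = A1
    for _ in range(c):
        Y2 = P.T @ (P @ Y2)
    r = A1.shape[1]
    reg = 0.0 if np.linalg.matrix_rank(Y2) == r else alpha
    return P @ Y2 @ np.linalg.solve(Y2.T @ Y2 + reg * np.eye(r), Y2.T)

def fast_rcima(X, C, r, seed=0):
    A1 = np.random.default_rng(seed).standard_normal((C.shape[1], r))
    C_P = tpm_pinv(C)      # Step 2: pseudoinverse of C
    P = X @ C_P @ C        # Step 3: kernel matrix without an SVD
    P_r = mbrp(P, A1)      # Step 4: low-rank approximation of P
    return P_r @ C_P       # Step 5: F = P_r C_P

# Hypothetical training data: mn = 20, S = 50, clean data of exact rank 2.
rng = np.random.default_rng(3)
Q1, _ = np.linalg.qr(rng.standard_normal((20, 2)))
Q2, _ = np.linalg.qr(rng.standard_normal((50, 2)))
X = Q1 @ np.diag([3.0, 2.0]) @ Q2.T
C = X + 0.1 * rng.standard_normal((20, 50))
F = fast_rcima(X, C, r=2)
```

In this synthetic setting the kernel matrix already has rank 2, so the result should agree with $X C^\dagger$ up to rounding error; with real images the rank constraint is active and the MBRP estimate is only an approximation of the optimal $P_r$.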
5 Numerical Simulation
In this section, we show a numerical simulation of the proposed fast-RCIMA method for image deconvolution. The numerical examples were run on a desktop computer with a 2.30 GHz processor (Intel(R) Core(TM) i7-4712MQ CPU) and 8.00 GB of RAM.
5.1 Speedup and Percent Difference

Consider two methods that solve the same problem, Method 1 and Method 2, with execution times $T_1$ and $T_2$, respectively. The computational performance of the fast-RCIMA method is evaluated using the following two metrics:

• The speedup $S$ (or acceleration) is the ratio between the execution times of both methods, i.e., $S = T_2/T_1$. If Method 1 is an improvement over Method 2, then the speedup will be greater than 1. However, if Method 1 hurts performance, the speedup will be less than 1 (see [17] for more details).
• The percent difference $P$ between Method 1 and Method 2, where $T_1 < T_2$, is given by $P = 100(T_2 - T_1)/T_2$. Then, we say that Method 1 is $P$% faster than Method 2 (see [17] for more details).
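The two metrics above can be sketched in a few lines. The function names are illustrative; the timings in the example are the execution times reported in Table 1 for cr = 0.25, with the fast-RCIMA method as Method 1 and the RCIMA method as Method 2.

```python
def speedup(t1, t2):
    """S = T2 / T1: how many times faster Method 1 is than Method 2."""
    return t2 / t1

def percent_difference(t1, t2):
    """P = 100 (T2 - T1) / T2, defined for T1 < T2."""
    if not t1 < t2:
        raise ValueError("percent difference is defined for T1 < T2")
    return 100.0 * (t2 - t1) / t2

# Execution times in seconds from Table 1, row cr = 0.25.
t_fast, t_rcima = 30.47, 484.73
print(round(speedup(t_fast, t_rcima), 2))             # 15.91
print(round(percent_difference(t_fast, t_rcima), 2))  # 93.71
```

The two quantities carry the same information, since $P = 100(1 - 1/S)$.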
Algorithm 3: Fast-RCIMA Method
Input: Training data $\{x^{(k)}\}_{k=1}^{S} \subseteq \mathbb{R}^{mn}$, $\{c^{(k)}\}_{k=1}^{S} \subseteq \mathbb{R}^{mn}$, $r \le mn$ and $A_1 \in \mathbb{R}^{S\times r}$
Output: $F \in \mathbb{R}_r^{mn\times mn}$
  /* Matrix Representation of Training Data */
  $X = [x^{(1)}\ x^{(2)}\ \ldots\ x^{(S)}]$
  $C = [c^{(1)}\ c^{(2)}\ \ldots\ c^{(S)}]$
  /* Pseudoinverse of C with TPM Method */
  if $C$ is full rank then
    if $mn \ge S$, then $C^P = (C^T C)^{-1} C^T$
    if $mn < S$, then $C^P = C^T (C C^T)^{-1}$
  else if $C$ is rank deficient then
    $\alpha \in\, ]0, 1[$
    if $mn \ge S$, then $C^P = (C^T C + \alpha I_S)^{-1} C^T$
    if $mn < S$, then $C^P = C^T (C C^T + \alpha I_{mn})^{-1}$
  /* Kernel Matrix P */
  $P = X C^P C$
  /* Low-Rank Matrix Approximation of P with MBRP Method */
  $Y_2 = A_1$
  for $i = 1 : 3$ do
    $Y_1 = P Y_2$
    $Y_2 = P^T Y_1$
  if $Y_2$ is full rank then
    $P_r = P Y_2 (Y_2^T Y_2)^{-1} Y_2^T$
  else if $Y_2$ is rank deficient then
    $\alpha \in\, ]0, 1[$
    $P_r = P Y_2 (Y_2^T Y_2 + \alpha I_r)^{-1} Y_2^T$
  /* Rank Constrained Inverse Matrix Approximation */
  $F = P_r C^P$
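The steps of Algorithm 3 can be sketched in plain Python for very small matrices. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, matrices are nested lists, the caller chooses the branch through `alpha` (0 for the full-rank branch, a value in ]0, 1[ for the regularized branch), and the same `alpha` is reused for both steps for brevity.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]  # a zero pivot here means M is singular
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def add_ridge(G, alpha):
    """Return G + alpha * I."""
    return [[g + (alpha if i == j else 0.0) for j, g in enumerate(row)]
            for i, row in enumerate(G)]

def tpm_pinv(C, alpha=0.0):
    """Pseudoinverse step of Algorithm 3 for an (mn x S) matrix C."""
    Ct = transpose(C)
    if len(C) >= len(C[0]):  # mn >= S
        return matmul(inverse(add_ridge(matmul(Ct, C), alpha)), Ct)
    return matmul(Ct, inverse(add_ridge(matmul(C, Ct), alpha)))  # mn < S

def mbrp(P, A1, c=3, alpha=0.0):
    """Low-rank step: power scheme followed by projection onto span(Y2)."""
    Pt = transpose(P)
    Y2 = A1
    for _ in range(c):
        Y1 = matmul(P, Y2)
        Y2 = matmul(Pt, Y1)
    G = add_ridge(matmul(transpose(Y2), Y2), alpha)
    return matmul(matmul(matmul(P, Y2), inverse(G)), transpose(Y2))

def fast_rcima(X, C, A1, alpha=0.0):
    CP = tpm_pinv(C, alpha)
    P = matmul(matmul(X, CP), C)                 # kernel matrix P = X C^P C
    return matmul(mbrp(P, A1, alpha=alpha), CP)  # F = P_r C^P

# Toy data (not from the paper): mn = 3, S = 2, noise-free C = 2 X, r = 2.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
C = [[2.0 * v for v in row] for row in X]
rng = random.Random(0)
A1 = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]  # S x r
F = fast_rcima(X, C, A1)
print(matmul(F, C))  # close to X in this noise-free toy case
```

In this toy case the observations are noise free and $r$ equals the rank of $P$, so $FC$ reproduces $X$ up to rounding. For matrices of the size used in Sect. 5, a numerical linear algebra library would be needed instead of nested lists.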
5.2 Numerical Example for Image Deconvolution

This example illustrates the application of the RCIMA method to image deconvolution using training data [2, 18], which was briefly mentioned in Sect. 1. Specifically, we apply the RCIMA and fast-RCIMA methods to the problem of filtering a noisy image $C \in \mathbb{R}^{90\times 90}$ on the basis of a set of training images $\mathcal{X} = \{X^{(1)}, \ldots, X^{(S)}\}$, where $X^{(j)} \in \mathbb{R}^{90\times 90}$, for $j = 1, \ldots, S$. Our training set $\mathcal{X}$ consists of $S = 2000$ different satellite images taken from the NASA Solar System Exploration Database [19] (see Fig. 2a for 6 sample images). Let $\mathrm{vec} : \mathbb{R}^{m\times n} \to \mathbb{R}^{mn}$ be the vectorization transform. We write $x^{(j)} = \mathrm{vec}(X^{(j)}) \in \mathbb{R}^{8100}$, for $j = 1, \ldots, 2000$. Instead of the images in $\mathcal{X}$, we observe their noisy versions $\mathcal{C} = \{C^{(1)}, \ldots, C^{(S)}\}$, where $C^{(j)} \in \mathbb{R}^{90\times 90}$. If $c^{(j)} = \mathrm{vec}(C^{(j)}) \in \mathbb{R}^{8100}$, then $c^{(j)} = A^{(j)} x^{(j)} + n^{(j)}$, where $A^{(j)} \in \mathbb{R}^{8100\times 8100}$ models the forward process and $n^{(j)} \in \mathbb{R}^{8100}$ is generated using the standard normal distribution. Here, $A^{(j)} = \sigma_j I_{8100}$ and $\sigma_j$ is a constant generated using the standard normal distribution (see Fig. 2b for 6 sample images). We assumed that the noisy image $C$ does not
Fig. 2 a Some randomly selected satellite images. b Noisy versions of a

Table 1 Execution time, MSE, speedup, and percent difference between RCIMA and fast-RCIMA methods

cr      RCIMA                  fast-RCIMA             Speedup   Percent difference (%)
        Time (s)    MSE        Time (s)    MSE
0.25    484.73      102.36     30.47       102.36     15.91     93.71
0.5     454.70      38.72      32.75       38.72      13.88     92.80
0.75    426.31      20.23      34.91       20.23      12.21     91.81
1       420.45      2.01       38.01       2.01       11.06     90.95
necessarily belong to $\mathcal{C}$, but it is “similar” to one of them, i.e., there is $C^{(j)} \in \mathcal{C}$ such that

$$C^{(j)} \in \arg\min_{C^{(i)} \in \mathcal{C}} \|C^{(i)} - C\|_{fr} \le \delta,$$
for a given $\delta \ge 0$. Finally, to compute the optimal matrix $F \in \mathbb{R}^{8100\times 8100}$ in the RCIMA problem in (3), we defined matrices $X = [x^{(1)}\ x^{(2)}\ \ldots\ x^{(S)}] \in \mathbb{R}^{8100\times 2000}$ and $C = [c^{(1)}\ c^{(2)}\ \ldots\ c^{(S)}] \in \mathbb{R}^{8100\times 2000}$. The compression ratio in the RCIMA problem is given by $c_r = r/(mn)$. Table 1 presents the execution time, MSE, speedup and percent difference between the RCIMA and fast-RCIMA methods, for compression ratio $c_r \in \{0.25, 0.5, 0.75, 1\}$, i.e., $r \in \{2025, 4050, 6075, 8100\}$. Figure 3 shows the estimates of $C$ using both algorithms. The results in Table 1 suggest that the fast-RCIMA method can obtain a nearly optimal approximation of the RCIMA problem in less time. Moreover, the last column of Table 1 shows that the fast-RCIMA method is between 90.95% and 93.71% faster than the RCIMA method. Besides, Fig. 3 shows that the fast-RCIMA method computes the optimal $F$ in (3) with the same accuracy as the RCIMA method.
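The data model and the compression ratios of this example can be sketched as follows. This is an illustration only: the function names are hypothetical, and the "image" is a random placeholder vector rather than one of the NASA images.

```python
import random

M, N = 90, 90
MN = M * N  # 8100 pixels per vectorized image

def rank_for(cr, mn=MN):
    """Rank r implied by the compression ratio cr = r / (mn)."""
    return round(cr * mn)

def noisy_observation(x, rng):
    """c = sigma * x + n, with sigma and every entry of n drawn from N(0, 1)."""
    sigma = rng.gauss(0.0, 1.0)
    return [sigma * xi + rng.gauss(0.0, 1.0) for xi in x]

print([rank_for(cr) for cr in (0.25, 0.5, 0.75, 1)])  # [2025, 4050, 6075, 8100]

rng = random.Random(1)
x = [rng.random() for _ in range(MN)]  # placeholder "image", not NASA data
c = noisy_observation(x, rng)
print(len(c))  # 8100
```

Stacking S such vectors as columns gives the 8100 × 2000 matrices X and C used in the example.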
Fig. 3 Illustration of the estimation of the noisy image $C$ by the RCIMA and fast-RCIMA methods. a Source image $X$. b Noisy observed image $C$. c–d Estimation using the RCIMA and fast-RCIMA methods, respectively, for cr = 0.25. e–f Estimation using the RCIMA and fast-RCIMA methods, respectively, for cr = 0.5. g–h Estimation using the RCIMA and fast-RCIMA methods, respectively, for cr = 0.75. i–j Estimation using the RCIMA and fast-RCIMA methods, respectively, for cr = 1
6 Conclusions

In this paper, we proposed a new and faster method for image deconvolution, based on the RCIMA problem in (3), using training data as a substitute for knowledge of a forward model. This new method, called the fast-RCIMA method, is based on tensor product and Tikhonov’s regularization to approximate the pseudoinverse, and on bilateral random projections to estimate the low-rank approximation. Moreover, in Theorem 1 we present an alternative approach to compute the low-rank matrix approximation. Based on the numerical simulations in Sect. 5, the fast-RCIMA method significantly reduces the execution time needed to compute the optimal solution, with speedups between 11.06 and 15.91, while preserving the accuracy of the classical method for solving the RCIMA problem. The numerical simulation of filtering a noisy image in Sect. 5 demonstrates the advantages of the proposed method.

Acknowledgements This work was financially supported by the Vicerrectoría de Investigación y Extensión from Instituto Tecnológico de Costa Rica, Cartago, Costa Rica (Research #1440037).
References
1. Campisi P, Egiazarian K (2013) Blind image deconvolution: theory and applications. CRC Press
2. Chung J, Chung M (2013) Computing optimal low-rank matrix approximations for image processing. In: Proceedings IEEE 2013 Asilomar conference on signals, systems and computers, pp 670–674. https://doi.org/10.1109/ACSSC.2013.6810366
3. Eckart C, Young G (1936) The approximation of one matrix by another of lower rank. Psychometrika 1(3):211–218. https://doi.org/10.1007/BF02288367
4. Wu T, Sarmadi S, Venkatasubramanian V, Pothen A, Kalyanaraman A (2015) Fast SVD computations for synchrophasor algorithms. IEEE Trans Power Syst 31(2):1651–1652. https://doi.org/10.1109/TPWRS.2015.2412679
5. Allen-Zhu Z, Li Y (2016) LazySVD: even faster SVD decomposition yet without agonizing pain. In: Advances in neural information processing systems, pp 974–982
6. Courrieu P (2005) Fast computation of Moore-Penrose inverse matrices. Neural Inf Process Lett Rev 8(2):25–29
7. Katsikis V, Pappas D (2008) Fast computing of the Moore-Penrose inverse matrix. Electron J Linear Algebra 17:637–650. https://doi.org/10.13001/1081-3810.1287
8. Lu S, Wang X, Zhang G, Zhou Z (2015) Effective algorithms of the Moore-Penrose inverse matrices for extreme learning machine. Intell Data Anal 19(4):743–760. https://doi.org/10.3233/IDA-150743
9. Barata J, Hussein M (2012) The Moore-Penrose pseudoinverse: a tutorial review of the theory. Brazilian J Phys 42:146–165. https://doi.org/10.1007/s13538-011-0052-z
10. Fazel M, Candes E, Recht B, Parrilo P (2008) Compressed sensing and robust recovery of low rank matrices. In: 42nd Asilomar conference on signals, systems and computers, pp 1043–1047. https://doi.org/10.1109/ACSSC.2008.5074571
11. Zhou T, Tao D (2012) Bilateral random projections. In: IEEE international symposium on information theory proceedings, pp 1286–1290. https://doi.org/10.1109/ISIT.2012.6283064
12. Telfer B, Casasent D (1994) Fast method for updating robust pseudoinverse and Ho-Kashyap associative processors. IEEE Trans Syst Man Cybern 24(9):1387–1390. https://doi.org/10.1109/21.310515
13. Benson M, Frederickson P (1986) Fast parallel algorithms for the Moore-Penrose pseudoinverse. In: Second conference on hypercube multiprocessors. https://www.osti.gov/biblio/7181991-fast-parallel-algorithms-moore-penrose-pseudo-inverse
14. Halko N, Martinsson P, Tropp J (2011) Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev 53(2):217–288. https://doi.org/10.1137/090771806
15. Deshpande A, Vempala S (2006) Adaptive sampling and fast low-rank matrix approximation. In: Approximation, randomization, and combinatorial optimization. Algorithms and techniques. Springer, pp 292–303. https://doi.org/10.1007/11830924_28
16. Dattorro J (2005) Convex optimization & Euclidean distance geometry. Meboo Publishing. https://ccrma.stanford.edu/~dattorro/0976401304.pdf
17. Cragon H (2000) Computer architecture and implementation. Cambridge University Press
18. Soto-Quiros P, Torokhti A (2019) Improvement in accuracy for dimensionality reduction and reconstruction of noisy signals. Part II: the case of signal samples. Signal Process 154:272–279. https://doi.org/10.1016/j.sigpro.2018.09.020
19. NASA (2020) NASA solar system exploration database. Online, Accessed 10 Sept 2020. https://solarsystem.nasa.gov/raw-images/raw-image-viewer
On-Body Microstrip Patch Antenna for Breast Cancer Detection

Sourav Sinha, Sajidur Rahman, Mahajabin Haque Mili, and Fahim Mahmud
Abstract Breast cancer is the most common invasive cancer in women and, after lung cancer, the second leading cause of cancer death in women. This paper presents an on-body rectangular microstrip patch antenna, designed in CST Microwave Studio to detect tumors within a narrow bandwidth, which operates in the industrial, scientific and medical (ISM) band of 2.4–2.4835 GHz when placed on the surface of the human breast. FR4 is selected as the substrate because it is highly flexible, and copper is selected for both ground and patch. To guarantee the safety of the patient, a human breast phantom consisting of two layers, skin and glandular tissue, is constructed. The tumor is positioned at different locations in the breast phantom model to assess the efficiency of the device. Without a tumor, the S11 value is −49.405 dB and the voltage standing wave ratio is 1.0067957. The specific absorption rate is 1.18 W/kg, the total efficiency is −6.846 dB and the radiation efficiency is −6.846 dB. To make the device biocompatible, all these parameters are compared for different locations of the cancerous tumor and for the case without a tumor.

Keywords Human breast model · Glandular tissue · On-body antenna · SAR
1 Introduction

Breast cancer is a type of disease that occurs when the cells in the breast grow excessively. Most breast cancers begin in the lobules or ducts. Based on their capacity for extension and growth, tumors are classified as malignant or benign. According to a survey by the National Cancer Institute, death rates due to breast cancer among
S. Sinha (B) Technische Universität München, Arcisstr. 21, 80333 Munich, Germany S. Rahman Universität Bremen, Bibliothekstr. 1, 28359 Bremen, Germany M. H. Mili · F. Mahmud American International University-Bangladesh (AIUB), Dhaka 1229, Bangladesh © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_16
women aged between 20 and 49 were more than double those of any other cancer causing death among women or men from 2012 to 2016. X-ray mammography, ultrasound, and magnetic resonance imaging (MRI) are traditionally used to detect breast cancer. These methods have limitations, as 4–34% of all breast cancers go unidentified due to the differentiation of malignant tissue or poor detection of cells. However, microwave imaging (MI) has shown promising results [1]. The basic technique of an MI system is to transmit a signal and receive the scattered signal for diagnosis. Variations in the electric and magnetic fields play a major role in recognizing the location of the tumor and its growth [2]. Microstrip radiators were first introduced in 1953, and their properties have since been researched extensively [3]. Such an antenna consists of a ground, a substrate, and a patch resonator. Being light, low-cost, easy to fabricate, and low-profile, this antenna is employed for industrial and medical purposes [4]. Its properties improve further if the substrate thickness, feed line dimensions, and patch are optimized [5]. Over the past two decades, research has continued to mitigate the narrow bandwidth of microstrip antennas (low fractional bandwidth, FBW = 7%), to enhance communication, and to obtain further advantages [5]. The radiating element can be triangular, semicircular, circular, or square [6]. In this research, an on-body microstrip patch antenna is proposed which is placed on the surface of the breast skin and operates in the ISM band at 2.4–2.4835 GHz. The antenna is designed to be flexible and low-cost and to be usable by everyone, in order to detect a tumor, if present, at an early stage. Because of its several advantages, the International Telecommunication Union designated the 2.4–2.4835 GHz frequency range for industrial, scientific, and medical radio communication [7].
The fundamental principle is that human organs containing different bio-tissues have different characteristics in terms of parameters such as conductivity and dielectric constant [8]. The on-body antenna is composed of ground, patch, and substrate. The dimensions (length, width, and thickness) are chosen so that the antenna can be used by all kinds of people. A human breast phantom with two layers, skin and glandular tissue, is designed in CST Microwave Studio. The skin encloses the glandular tissue inside the breast. All necessary dielectric material properties are taken from the CST Microwave Studio material library. A spherical tumor is designed using the same software. For precise detection, the location of the tumor was changed, and the results with and without a tumor were compared. Additionally, return loss, operating frequency, bandwidth, directivity, radiation pattern, gain, voltage standing wave ratio, specific absorption rate, and electric and magnetic fields are determined. The structure and design method (Sect. 2), the antenna characteristics without a cancerous tumor (Sect. 3), and the antenna characteristics with a cancerous tumor (Sect. 4) are presented in turn.
2 Structure and Design Method

2.1 Design of Antenna

The fundamental element in microwave imaging is an antenna. The proposed antenna is rectangular, with a length of 15 mm and a width of 15 mm to sustain biocompatibility. The radiating rectangular patch of the antenna is fed by a rectangular feed line. The antenna consists of substrate, patch, and ground, and power is supplied by the feed line. Copper with a thickness of 0.1 mm is used for the patch and the ground. FR4 was selected as the substrate, with a thickness of 0.8 mm. The overall thickness of the propounded antenna is 1 mm, so its dimensions are (15 × 15 × 1) mm³. Since CST Microwave Studio can calculate numerous monitors and can extract antenna data at high resolution, the experiment was performed in CST Microwave Studio [9]. The propounded antenna is designed to work on the surface of the human breast; therefore, it is positioned on the lateral sides of the human breast phantom prototype to check its biocompatibility and performance. A spherical tumor of 5 mm radius is constructed in CST Microwave Studio and positioned within the phantom. The results were compared for different locations of the tumor. Figure 1a shows the geometry of the propounded antenna; the values labeled in Fig. 1a are in millimeters. Figure 1b shows the antenna along with the waveguide port and breast phantom. Power is supplied through a 2 mm wide feed line by the waveguide port, which is marked in red in Fig. 1b. All parameters of the propounded on-body antenna are tabulated in Table 1.
Fig. 1 Propounded microstrip patch antenna a Dimensions b with waveguide port and breast phantom
Table 1 The antenna parameters

Antenna part   Length (mm)   Width (mm)   Thickness (mm)
Feed           5.5           2            0.1
Substrate      15            15           0.8
Ground         5.5           15           0.1
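2.2 Equations Employed in Design of Propounded Antenna

Different parameters are calculated using the following equations for the propounded on-body microstrip patch antenna [10].

Width:

$$W = \frac{c}{2 f_r \sqrt{\dfrac{\varepsilon_r + 1}{2}}} \qquad (1)$$

Here, $f_r$ is the operating frequency, $c = 3 \times 10^{8}$ m/s is the speed of light, and $\varepsilon_r = 3.5$ is the relative permittivity of the dielectric substrate.

Effective dielectric constant:

$$\varepsilon_{eff} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2}\left[\frac{1}{\sqrt{1 + 12\,h/W}}\right] \qquad (2)$$

where $W$ is the patch width and $h$ is the thickness of the substrate (0.8 mm).

Effective length:

$$L_{eff} = \frac{c}{2 f_r \sqrt{\varepsilon_{eff}}} \qquad (3)$$

Length extension:

$$\Delta L = 0.412\,h\,\frac{(\varepsilon_{eff} + 0.3)\left(\frac{W}{h} + 0.264\right)}{(\varepsilon_{eff} - 0.258)\left(\frac{W}{h} + 0.8\right)} \qquad (4)$$

Actual length of the patch:

$$L = L_{eff} - 2\Delta L \qquad (5)$$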
The calculated values obtained from solving the equation are not enough to reach the objective of the design. Therefore, the value of length and width are lessened by keeping the ratio unchanged until ISM band frequency is obtained.
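The design equations (1)–(5) can be evaluated with a short script. This is a sketch under stated assumptions: the function name is hypothetical, and the operating frequency of 2.45 GHz is an assumed value inside the ISM band, since the text does not state the value of $f_r$ used in the calculation. The permittivity (3.5) and substrate thickness (0.8 mm) are the values given in the text.

```python
import math

C0 = 3e8  # speed of light in m/s, as used in the text

def patch_dimensions(f_r, eps_r, h):
    """Evaluate Eqs. (1)-(5); all lengths are in meters."""
    W = C0 / (2 * f_r * math.sqrt((eps_r + 1) / 2))                          # Eq. (1)
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 * h / W)  # Eq. (2)
    L_eff = C0 / (2 * f_r * math.sqrt(eps_eff))                              # Eq. (3)
    dL = (0.412 * h * (eps_eff + 0.3) * (W / h + 0.264)
          / ((eps_eff - 0.258) * (W / h + 0.8)))                             # Eq. (4)
    L = L_eff - 2 * dL                                                       # Eq. (5)
    return {"W": W, "eps_eff": eps_eff, "L_eff": L_eff, "dL": dL, "L": L}

# Assumed operating frequency: 2.45 GHz (not stated in the text).
dims = patch_dimensions(f_r=2.45e9, eps_r=3.5, h=0.8e-3)
for name, value in dims.items():
    print(name, value)
```

With these inputs the equations give a patch of roughly 40.8 mm by 32.6 mm, considerably larger than the final 15 mm × 15 mm antenna, which is consistent with the statement that the length and width were reduced afterwards.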
Fig. 2 Human breast phantom (Cross-sectional view)
2.3 Human Breast Phantom Model and Tissue Properties

The human breast phantom consists of skin and glandular tissue. All the dielectric properties, such as conductivity, relative permittivity, density, thickness, and loss tangent, are fully controlled. A cross-sectional view of the human breast phantom is shown in Fig. 2. The hemispherical breast phantom, which resembles the layout of the human breast, is constructed in CST Microwave Studio, and the antenna is placed on the phantom surface. The skin has a thickness of 1 mm and an outer radius of 23 mm. Inside the skin, breast fatty tissue (fibroglandular) with a radius of 22 mm is placed. The defective cell, with a radius of 2 mm, is located inside the fatty tissue (Fig. 3).
3 Antenna Characteristics Without Cancerous Tumor

3.1 Reflection Coefficient or S11 Parameter

The reflection coefficient determines the amount of power reflected or radiated from an antenna [11]. After the antenna is placed on the surface of the breast phantom, the radiation pattern is noted. The x-axis in Fig. 4 represents the resonance or operating frequency in GHz, whereas the y-axis represents the return loss in dB. The resonant frequency of the propounded antenna is 2.48 GHz, which falls within the ISM band. The return loss of −49.405 dB proves the performance of the antenna by
Fig. 3 Antenna on the human breast phantom surface
Fig. 4 S11 parameter or return loss of the antenna (without cancerous tumor)
Fig. 5 Far-field radiation pattern (3D view) of the propounded antenna (healthy tissue)
showing the maximum radiation. The bandwidth of the propounded design is noticed at 158.4 MHz (2.4038–2.5622 GHz), thus ensuring safety for all users.
3.2 Far-Field Radiation Pattern

The radiation pattern of the propounded antenna is shown in Fig. 5. Since the antenna is made to detect the cancerous tumor, the directivity is kept unidirectional. Despite being unidirectional, the radiation spreads throughout the organ, as shown. In practice, the user can also rotate the antenna to any side. The directivity is 2.835 dBi, and the radiation efficiency and total efficiency are both −6.846 dB. Figure 6 shows the polar view of the far-field radiation pattern of the propounded antenna, with a main lobe magnitude of 2.61 dBi.
3.3 VSWR (Voltage Standing Wave Ratio)

VSWR is a measure of the power radiated from the antenna [12]. The VSWR of the propounded antenna is 1.007 at an operating frequency of 2.48 GHz. This indicates that the antenna’s impedance is very well matched to the transmission
Fig. 6 Far-field radiation pattern (polar view) of the propounded antenna
line. For efficient performance, the value of VSWR should be between 1 and 2. The x-axis in Fig. 7 represents frequency in the range of GHz and the y-axis represents VSWR.
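The reported VSWR can be cross-checked against the reported return loss with the standard relation $\mathrm{VSWR} = (1 + |\Gamma|)/(1 - |\Gamma|)$, where $|\Gamma| = 10^{S_{11}/20}$ for $S_{11}$ in dB. The sketch below uses this textbook relation; it is not taken from the paper, and the function name is illustrative.

```python
def vswr_from_s11_db(s11_db):
    """VSWR from the reflection coefficient S11 given in dB (S11 < 0 dB)."""
    gamma = 10 ** (s11_db / 20.0)  # magnitude of the reflection coefficient
    return (1 + gamma) / (1 - gamma)

# S11 without tumor, as reported in Sect. 3.1.
print(round(vswr_from_s11_db(-49.405), 4))  # 1.0068
```

The result agrees with the VSWR of 1.0067957 stated in the abstract. For comparison, the upper end of the acceptable range, VSWR = 2, corresponds to a return loss of about −9.5 dB.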
3.4 SAR (Specific Absorption Rate)

SAR is the rate at which radiated energy is absorbed by the surrounding tissue. It is used to check the safety level for all users [13]. According to the IEEE and ICNIRP guidelines, the specific absorption rate averaged over 10 g of tissue should be less than 2 W/kg [14, 15]. The SAR of the propounded antenna is 1.18 W/kg at the resonant frequency for 10 g of tissue and 1 mW of input power (Fig. 8).
4 Antenna Characteristics with Cancerous Tumor

To examine the performance of the antenna, a spherical tumor with a 2 mm radius is constructed in CST Studio Suite, and the antenna is placed on the surface of the breast phantom. The tumor has an electric conductivity of 0.7 S/m and a relative permittivity of 55. By placing it on the varied position of
Fig. 7 VSWR of the antenna (healthy tissue––without cancerous tumor)
the phantom and replacing the position of the tumor, the effects are examined and compared. Figure 9a shows the condition where the tumor is absent. Figure 9b shows the condition where the tumor is in the center (position 1). Figure 9c represents the condition for position 2 (x = 45, y = 0, z = 0) and Fig. 9d represents the condition for position 3 (x = 90, y = 0, z = 0).
4.1 Reflection Coefficient or S11 Parameter

Figure 10 compares the S11 parameter for different tumor positions. Table 2 lists the resonance frequency and return loss for all positions. Table 2 shows that the operating frequency is nearly identical in the absence and presence of a tumor (2.48–2.485 GHz). However, there is a major change in S11: it rises rapidly in the presence of a tumor, from −49.405 dB without a tumor to between −36.062 and −24.79 dB with a tumor, and is inversely proportional to the distance between the antenna and the breast surface.
Fig. 8 SAR (Specific Absorption Rate) for 10 g tissue for 1mW of input power
Fig. 9 Cancerous tumor (malignant) inside human breast in a Without tumor, b Position 1, c Position 2, d Position 3
4.2 Other Characteristics

Table 3 compares the values of maximum main lobe magnitude, maximum E-field intensity, maximum H-field intensity, and surface current density
Fig. 10 Reflection Coefficient or S11 parameter with and without cancerous tumor of the antenna
Table 2 Resonance frequency and return loss analysis

Parameter                   Without tumor   With cancerous tumor
                                            Position-1   Position-2   Position-3
Resonance frequency (GHz)   2.48            2.485        2.485        2.48
Return loss or S11 (dB)     −49.405         −36.062      −33.271      −24.79
Table 3 Comparison analysis of different characteristics

Parameter                  Without tumor   With cancerous tumor
                                           Position 1   Position 2   Position 3
Main lobe magnitude        2.61            2.67         2.7          2.75
Max. E-field intensity     73659           80138        79755        77259
Max. H-field intensity     560             576          570          562
Surface current density    317             332          329          321
between cancerous tumor and without cancerous tumor. The main lobe magnitude increases slightly. The maximum E-field and H-field intensities increase in the presence of a tumor and are proportional to the distance of the tumor [16]. But the surface current density slightly decreases.
5 Conclusion

In this research, a rectangular microstrip patch antenna with improved parameters has been presented. The design goals are met in terms of efficiency, size, and return loss, and the antenna operates at a resonant frequency of 2.48 GHz. The propounded design improves on the rest of the work in the field in terms of frequency, SAR, radiation pattern, H-field, and E-field. The variation of the frequency curves shows a strong response that distinguishes cancerous from non-cancerous tissue. The VSWR of 1.007 shows that the impedance of the antenna is well matched to the transmission line. Besides, there are significant changes in both the electric and magnetic fields when the results in the presence and absence of the tumor are compared. Considering the well-being of the user, the SAR is computed as 1.18 W/kg at the operating frequency of 2.48 GHz for 10 g of tissue. Hence, it can be said that the devised system is efficient for the early diagnosis of breast cancer.
References
1. Alsharif F, Kurnaz C (2018) Wearable microstrip patch ultra wide band antenna for breast cancer detection. In: 41st international conference on telecommunications and signal processing (TSP), pp 1–5. Athens, Greece
2. Çalışkan R, Gültekin S, Uzer D, Dündar Ö (2015) A microstrip patch antenna design for breast cancer detection. Proc Soc Behav Sci 195:2905–2911
3. Cicchetti R, Faraone A, Caratelli D, Simeoni M (2015) Wideband, multiband, tunable, and smart antenna systems for mobile and UWB wireless applications. Int J Antennas Propagat
4. Gupta KC, Benalla A (1988) Microstrip antenna design. Technology & Engineering, Artech House
5. Saeidi T, Ismail I, Wen WP, Alhawari ARH, Mohammadi A (2019) Ultra-wideband antennas for wireless communication applications. Int J Antennas Propagat
6. Werfelli H, Tayari K, Chaoui M, Lahiani M, Ghariani H (2016) Design of rectangular microstrip patch antenna. In: 2nd international conference on advanced technologies for signal and image processing (ATSIP), pp 798–803, Monastir, Tunisia
7. Hasan RR, Rahman MA, Sinha S, Uddin MN, Niloy TR (2019) In body antenna for monitoring pacemaker. In: International conference on automation, computational and technology management (ICACTM), pp 99–102, London
8. Zhang H, Arslan T, Flynn B (2013) A single antenna based microwave system for breast cancer detection: experimental results. In: Loughborough antennas & propagation conference (LAPC), pp 477–481, Loughborough
9. Hirtenfelder F (2007) Effective antenna simulations using CST MICROWAVE STUDIO®. In: 2nd international ITG conference on antennas, pp 239–239, Munich, Germany
10. Sinha S, Niloy TR, Hasan RR, Rahman MA, Rahman S (2020) A wearable microstrip patch antenna for detecting brain tumor. In: International conference on computation, automation and knowledge management (ICCAKM), pp 85–89, Dubai, UAE
11. Hasan RR, Rahman MA, Sinha S, Uddin MN, Niloy TR (2019) In body antenna for monitoring pacemaker. In: International conference on automation, computational and technology management (ICACTM), pp 99–102. London
12. Sinha S, Hasan RR, Rahman MA, Ali MT, Uddin MN (2019) Antenna design for biotelemetry system. In: International conference on robotics, electrical and signal processing techniques (ICREST), pp 466–469, Dhaka, Bangladesh
13. Islam NA, Arifin F (2016) Performance analysis of a miniaturized implantable PIFA antenna for WBAN at ISM band. In: 3rd international conference on electrical engineering and information communication technology (ICEEICT), pp 1–5, Dhaka, Bangladesh
14. Safety E, Committee SC, Radiation N, Board IS (2008) IEEE recommended practice for measurements and computations of radio frequency electromagnetic fields with respect to human exposure to such fields 100 kHz–300 GHz. Measurement 2002
15. ICNIRP (2019) Guidelines for limiting exposure to time varying electric, magnetic, and electromagnetic fields. Health Phys 74:494–522
16. Sinha S, Niloy TR, Hasan RR, Rahman MA (2020) In body antenna for monitoring and controlling pacemaker. Adv Sci Technol Eng Syst J 5(2):74–79
Machine Learning with Meteorological Variables for the Prediction of the Electric Field in East Lima, Peru

Juan J. Soria, Orlando Poma, David A. Sumire, Joel Hugo Fernandez Rojas, and Maycol O. Echevarria
Abstract Environmental pollution and its effects on global warming and climate change are a key concern for all life on our planet. For this reason, meteorological variables such as maximum temperature, solar radiation, and ultraviolet level were analyzed in this study, with a sample of 19,564 readings. The data were collected using the Vantage Pro2 weather station, which was synchronized with the times and dates of the electric field measurements made by an EFM-100 sensor. The Machine Learning analysis was carried out with the Regression Learner App, in which linear regression, regression tree, support vector machine, Gaussian process regression, and ensemble-of-trees models were trained. The best model for predicting the maximum temperature associated with the electric field was Gaussian Process Regression, with an RMSE of 1.3436. Likewise, for solar radiation the best model was the Regression Tree (Medium), with an RMSE of 1.3820, and for the UV level the best model was Gaussian Process Regression (rational quadratic), with an RMSE of 1.3410. Gaussian Process Regression allowed for the estimation and prediction of the meteorological variables, and it was found that in the winter season low temperatures are associated with a negative electric field with high variability, while high temperatures are associated with positive electric fields with low variability.

Keywords Machine learning · Electric field · Weather variables · Forecast · Regression learner app · Accuracy · Algorithms
J. J. Soria (B) · O. Poma · D. A. Sumire · J. H. F. Rojas · M. O. Echevarria Universidad Peruana Unión, Lima, Peru e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_17
1 Introduction

1.1 Context for the Research Study

The atmospheric electric field is a current research area in the East Lima area of Peru, associated with meteorological variables such as maximum temperature, solar radiation, and UV level. In this study, the predictive analytical method of Machine Learning was used to evaluate meteorological variables that influence the electric field in the area of East Lima, Peru. Recent studies in this area contribute to an understanding of the association between atmospheric electricity and the weather [6, 11]. In 1753, Canton [1] “discovered that the air was generally positively charged with respect to the ground for good weather, with variations associated with different meteorological phenomena, such as precipitation and fog”. This shows the electrical behavior of the geospheric earth and is one of the first works to report the association of meteorological variables with the electric field. This close relationship between the electrical state of the atmosphere and changes in climate provided the motivation to continue the study of atmospheric electricity. Harrison suggests that the influence of the global circuit is affected by local factors such as contamination by aerosols, radioactivity, or meteorological disturbances. Harrison and Nicoll [5] and Takashima et al. [18] obtained temporal variations in the aerosol compositions measured in the city of Fukuoka. With the development of technology, it became possible to apply modern sensors to monitor the electrical behavior of the atmosphere associated with climatic variation [1]. The data generated from the electric field measured with an EFM sensor were used [15]. According to [19], there is a global atmospheric electric circuit on earth.
The current density flows downward from the ionosphere to the ground and the ocean surface, and it shows day-to-day variations associated with solar activity and with internal changes in the global atmospheric electric circuit. The ionosphere has a potential of approximately 250 kV, maintained by an upward current of approximately 1000 A from the totality of storm clouds and other highly electrified clouds, where the greatest variation in average current is a function of geographic location. According to the observations and models presented by [7], the relative geographic position is very important when considering the installation of electric field measurement (EFM) sensors. The confirmation of the presence of electric charge in the atmosphere raised other questions regarding its association with weather and storms. Parsons and Mazeas confirmed in 1753 that the electric field identified in storms is associated with climatic conditions, and local electrification processes occur in the atmosphere even in the absence of appreciable convective clouds, according to Tacza et al. [17], which helps to identify the factors that determine an ecosystem [13]. The relationship between electric field and meteorological variables suggests that the Sun’s magnetic field has an influence. Variations in the solar magnetic field
Fig. 1 Global atmospheric electric field
and its interplanetary extension, on time scales from days to decades, have been shown to significantly change meteorological variables such as tropospheric pressure, temperature, and lightning rates [15] (Fig. 1). During the past decade, a renewed interest in the problems of the Global Electric Field (GEF) has emerged due to climate applications [20].
2 Literature Review

2.1 Machine Learning

Machine Learning is part of artificial intelligence [14] and seeks to give machines the ability to learn from data. With quality data, appropriate technologies, and correct analysis, it is possible to build behavioral models that analyze data of high volume and complexity. The Machine Learning process consists of interconnected sub-processes: data collection, data preparation, model training, model evaluation, and performance improvement. The same holds for Bayesian networks, which are also part of artificial intelligence [16]. Currently, in the context of technological convergence and digital transformation, information technologies intervene daily in the lives of people and organizations in a wide range of contexts, in many cases becoming indispensable tools and catalysts for greater productivity and decision-making in different fields [14]. Machine learning is the process by which measurements are transformed into parameters for future operations [12]. The electric field data were collected with the EFM-100 sensor and the meteorological variables with the Vantage Pro2 station [2], both installed on the Lima campus of the Universidad Peruana Unión. The study data were synchronized by date and time at intervals of 5 s, so that the two series coincide and allow for greater precision. The model training was carried out in MATLAB using the Regression Learner App, obtaining
adequate performance. The model was evaluated with statistical indicators (RMSE, R-squared, MSE, and MAE), which allowed the precision of the candidate models to be compared.
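The 5 s synchronization of the electric field and meteorological series described above can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB workflow; the function names `bin_5s` and `synchronize` and the sample readings are hypothetical and serve only to show the alignment step.

```python
from datetime import datetime


def bin_5s(ts):
    """Floor a timestamp to the start of its 5-second interval."""
    return ts.replace(second=ts.second - ts.second % 5, microsecond=0)


def synchronize(field_rows, weather_rows):
    """Pair electric field and weather readings that share a 5 s interval.

    Each input is a list of (timestamp, value) tuples. Readings without a
    partner in the other series are dropped.
    """
    weather_by_bin = {bin_5s(ts): value for ts, value in weather_rows}
    paired = []
    for ts, field_value in field_rows:
        key = bin_5s(ts)
        if key in weather_by_bin:
            paired.append((key, field_value, weather_by_bin[key]))
    return paired


# Made-up readings, for illustration only.
field = [
    (datetime(2019, 1, 1, 0, 0, 1), 0.12),
    (datetime(2019, 1, 1, 0, 0, 7), 0.15),
    (datetime(2019, 1, 1, 0, 0, 12), 0.20),
]
weather = [
    (datetime(2019, 1, 1, 0, 0, 3), 24.1),
    (datetime(2019, 1, 1, 0, 0, 5), 24.2),
]
rows = synchronize(field, weather)
```

Flooring both series to the same interval boundary is one simple choice; nearest-neighbour matching would be an alternative if the two instruments log at slightly different offsets.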
2.2 Regression Learner App

The Regression Learner App models [9] are prediction structures that are ranked according to the minimum value of the RMSE. The Linear Regression, Regression Tree, Support Vector Machine, Gaussian Process Regression, and Ensembles of Trees models cover a wide range of machine learning methods, and comparing their prediction accuracy allows the selection of an optimal model. Gaussian process regression first calculates a weighted average of the noisy observations y, defined by

$$f(x_*) = k(x_*)^{T} (K + \sigma_n^{2} I)^{-1} y, \tag{1}$$

which is a linear combination of the y values, so Gaussian process regression is a linear smoother [4]. In this study, the smoothing in matrix form helped to predict at the training points and then in terms of the equivalent kernel, since the predicted values at the training points were calculated by

$$f(x_*) = K (K + \sigma_n^{2} I)^{-1} y. \tag{2}$$
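Equations (1) and (2) can be illustrated with a short pure-Python sketch. It is not the MATLAB Regression Learner implementation: the rational quadratic kernel hyperparameters (`length`, `alpha`, `sigma_f`), the noise variance, and all function names are assumptions chosen for illustration, and the inputs are scalars to keep the linear algebra small.

```python
def rq_kernel(a, b, length=1.0, alpha=1.0, sigma_f=1.0):
    """Rational quadratic kernel for scalar inputs (hyperparameters assumed)."""
    return sigma_f ** 2 * (1.0 + (a - b) ** 2 / (2.0 * alpha * length ** 2)) ** (-alpha)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(x_train, y_train, noise_var):
    """Return K and the weight vector (K + sigma_n^2 I)^{-1} y."""
    n = len(x_train)
    K = [[rq_kernel(x_train[i], x_train[j]) for j in range(n)] for i in range(n)]
    K_noisy = [[K[i][j] + (noise_var if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    return K, solve(K_noisy, y_train)


def predict(x_star, x_train, weights):
    """Eq. (1): k(x*)^T (K + sigma_n^2 I)^{-1} y for one test input."""
    return sum(rq_kernel(x_star, xi) * w for xi, w in zip(x_train, weights))


def smooth_training(K, weights):
    """Eq. (2): K (K + sigma_n^2 I)^{-1} y, the fitted values at the training points."""
    return [sum(kij * w for kij, w in zip(row, weights)) for row in K]


# Illustrative data only.
x = [0.0, 1.0, 2.0]
y = [1.0, 2.0, 0.5]
K, w = fit(x, y, noise_var=0.1)
fitted = smooth_training(K, w)
```

With a noise variance close to zero the fitted values approach the observations, and with a very large noise variance they shrink toward zero, which is the smoothing behaviour that Eq. (2) expresses.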
3 Materials and Methods

3.1 Methodology

This study used MATLAB’s Regression Learner App to predict the electric field with Machine Learning. The Fine Tree, Medium Tree, Ensemble, and Gaussian Process Regression models were compared to obtain the best predictive model [10], as shown in Fig. 2. The Davis Vantage Pro2 console recorded the measurements of the meteorological variables under study: maximum temperature, solar radiation, and ultraviolet level [3]. The console works in five modes: configuration, current time, maximum graphic mode, minimum graphic mode, and final graphic. The WeatherLink software connects the Vantage Pro2 station to the computer located at the Universidad Peruana Unión in the East Lima area, as shown in Fig. 3. The electric field sensor installed in the eastern part of the Universidad Peruana Unión campus in Lima recorded electric field data during 2019. As in an earlier
Fig. 2 Machine learning methodology for electric field prediction
Fig. 3 Location of the Vantage Pro2 meteorological station on the Universidad Peruana Unión campus, Lima, Peru, in the East Lima Zone
study, the electric field measurements were made with the EFM-100 atmospheric Electric Field Monitor sensor [2]. New calibrations and measurements with its most powerful electric field sensor for new electro-optical investigations were developed by [21].
4 Results

4.1 Results of Machine Learning Models in Predicting the Electric Field

Data on temperature and electric field, collected 24 h a day throughout 2019, were matched by exact date and time, which allowed a direct correlation between the study variables. The data were also normalized, and then MATLAB’s Regression Learner App was used, which obtained the best
Table 1 Predictive model analysis matrix

Model                                              RMSE     R-squared   MSE      MAE
Fine tree                                          1.3869   0.42        1.9234   0.97695
Medium tree                                        1.3820   0.43        1.9099   0.98492
Ensemble                                           1.3604   0.44        1.8507   0.97428
Gaussian process regression (Exponential)          1.3462   0.46        1.8123   0.95908
Gaussian process regression (Rational quadratic)   1.3435   0.46        1.8049   0.95638
predictive Machine Learning model, with an RMSE of 1.3435 for the prediction of the maximum temperature with the electric field, as shown in Table 1. Table 1 lists the prediction indicators of the Machine Learning models applied in this study, in descending order of RMSE. The Fine Tree model had an RMSE of 1.3869, the Medium Tree model 1.3820, the Ensemble model 1.3604, the Gaussian Process Regression (Exponential) model 1.3462, and the Gaussian Process Regression (Rational Quadratic) model 1.3435. The model with the lowest RMSE, Gaussian Process Regression (Rational Quadratic), was selected. It shows a good performance, with a coefficient of determination of 46% when the trained model is compared with a model whose response is constant and equal to the mean of the training response, as shown in Fig. 4. Furthermore, it has a mean absolute error (MAE) of 0.95638.
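The four indicators reported in Table 1 can be computed from observed and predicted values as sketched below. This is a generic Python illustration of the standard definitions, not the code generated by MATLAB; the function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, R-squared, MSE and MAE for paired observations."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    mse = ss_res / n
    mae = sum(abs(r) for r in residuals) / n
    mean_y = sum(y_true) / n
    # Total sum of squares: error of a constant model that always predicts the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(mse),
        "R-squared": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "MAE": mae,
    }


# Made-up values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
```

Because MSE is the square of RMSE, the two columns of Table 1 can be cross-checked against each other: for the rational quadratic model, 1.3435 squared is approximately 1.8050, which agrees with the reported MSE of 1.8049 up to rounding.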
Fig. 4 Optimal temperature prediction model with electric field
Fig. 5 Scatter plot of the solar radiation and the electric field
4.2 Description of Solar Radiation with the Electric Field

Figure 5 shows the scatter plot of solar radiation against the electric field for 19,564 measurements. For these data, MATLAB’s Regression Learner App selected a regression tree, specifically a Medium Tree, as the optimal model, with an RMSE of 1.382, a coefficient of determination (R²) of 0.43, an MSE of 1.9099, and an MAE of 0.98492. This means that 43% of the variability of the solar radiation was predicted with respect to the electric field.
4.3 Description of the UV Level with the Electric Field

Figure 6 shows the scatter plot of the UV level against the electric field for 19,564 measurements. For these data, MATLAB’s Regression Learner App selected Gaussian Process Regression (GPR) with a rational quadratic kernel as the optimal model, with an RMSE of 1.3410, a coefficient of determination (R²) of 0.46, an MSE of 1.806, and an MAE of 0.95606. This means that 46% of the variability of the UV level was predicted with respect to the electric field.
Fig. 6 Scatter plot of the UV level with the electric field
5 Conclusions

In this study, Machine Learning was applied to predict the meteorological variables in relation to the electric field using the Regression Learner App models. Two variables obtained the highest prediction accuracy, both with Gaussian Process Regression models: the maximum temperature (R² = 0.46) and the UV level (R² = 0.46), with RMSE = 1.3435 and RMSE = 1.3410, respectively. The other main variable, solar radiation (R² = 0.43), gave an RMSE = 1.3820, and its optimal model was a Regression Tree. Training the predictive Regression Learner App models, with a run time of 6 h, generated MATLAB 2018 code that predicts the electric field from the measurements and can be used in future research with the meteorological variables under study.
References

1. Bennett AJ, Harrison RG (2007) Historical background 62(10)
2. Boltek N (2014) EFM-100 atmospheric electric field monitor guide. www.boltek.com
3. Davis (2019) Vantage Pro2 Manuel de la console. https://www.davisnet.com/legal
4. Eberhard J, Geissbuhler V (2000) Konservative und operative therapie bei harninkontinenz, deszensus und urogenital-beschwerden. Journal fur Urologie und Urogynakologie 7(5):32–46. MIT Press
5. Harrison RG, Nicoll KA (2018) Fair weather criteria for atmospheric electricity measurements. J Atmos Solar Terr Phys 179:239–250. https://doi.org/10.1016/j.jastp.2018.07.008
6. Harrison RG, Marlton GJ (2020) Fair weather electric field meter for atmospheric science platforms. J Electrostat 107. https://doi.org/10.1016/j.elstat.2020.103489
7. Hays PB, Roble RG (1979) A quasi-static model of global atmospheric electricity, 1. The lower atmosphere. J Geophys Res 84(A7):3291. https://doi.org/10.1029/ja084ia07p03291
8. Lam MM, Freeman MP, Chisham G (2018) IMF-driven change to the Antarctic tropospheric temperature due to the global atmospheric electric circuit. J Atmos Solar Terr Phys 180:148–152. https://doi.org/10.1016/j.jastp.2017.08.027
9. Mathworks C (2019a) Mastering machine learning a step-by-step guide with MATLAB
10. Mathworks C (2019b) Mastering machine learning a step-by-step guide with MATLAB. https://www.mathworks.com/content/dam/mathworks/ebook/gated/machine-learning-workflow-ebook.pdf
11. Nicoll KA, Harrison RG, Barta V, Bor J, Brugge R, Chillingarian A, Chum J, Georgoulias AK, Guha A, Kourtidis K, Kubicki M, Mareev E, Matthews J, Mkrtchyan H, Odzimek A, Raulin JP, Robert D, Silva HG, Tacza J, … Yaniv R (2019) A global atmospheric electricity monitoring network for climate and geophysical research. J Atmospheric Solar-Terrest Phys 184:18–29. https://doi.org/10.1016/j.jastp.2019.01.003
12. Paluszek M, Thomas S (2017) MATLAB machine learning. In: MATLAB machine learning. Apress. https://doi.org/10.1007/978-1-4842-2250-8
13. Rositano F, Bert FE, Piñeiro G, Ferraro DO (2018) Identifying the factors that determine ecosystem services provision in Pampean agroecosystems (Argentina) using a data-mining approach. Environ Dev 25:3–11. https://doi.org/10.1016/j.envdev.2017.11.003
14. Saboya N, Loaiza OL, Soria JJ, Bustamante J (2019) Fuzzy logic model for the selection of applicants to university study programs according to enrollment profile. Adv Intel Syst Comput 850:121–133. https://doi.org/10.1007/978-3-030-02351-5_16
15. Soria JJ, Sumire DA, Poma OSCE (2020) Neural network model with time series for the prediction of the electric field in the East Lima Zone, Peru, vol 2, pp 395–410. https://doi.org/10.1007/978-3-030-51971-1_33
16. Sperotto A, Molina JL, Torresan S, Critto A, Pulido-Velazquez M, Marcomini A (2019) A Bayesian networks approach for the assessment of climate change impacts on nutrients loading. Environ Sci Policy 100:21–36. https://doi.org/10.1016/j.envsci.2019.06.004
17. Tacza J, Raulin JP, Macotela E, Norabuena E, Fernandez G, Correia E, Rycroft MJ, Harrison RG (2014) A new South American network to study the atmospheric electric field and its variations related to geophysical phenomena. J Atmos Solar Terr Phys 120:70–79. https://doi.org/10.1016/j.jastp.2014.09.001
18. Takashima H, Hara K, Nishita-Hara C, Fujiyoshi Y, Shiraishi K, Hayashi M, Yoshino A, Takami A, Yamazaki A (2019) Short-term variation in atmospheric constituents associated with local front passage observed by a 3-D coherent Doppler lidar and in-situ aerosol/gas measurements. Atmospheric Environ X:3. https://doi.org/10.1016/j.aeaoa.2019.100043
19. Tinsley BA, Burns GB, Zhou L (2007) The role of the global electric circuit in solar and internal forcing of clouds and climate. Adv Space Res 40(7):1126–1139. https://doi.org/10.1016/j.asr.2007.01.071
20. Williams E, Mareev E (2014) Recent progress on the global electrical circuit. Atmospheric Res 135–136:208–227. https://doi.org/10.1016/j.atmosres.2013.05.015
21. Zeng R, Zhang Y, Chen W, Zhang B (2008) Measurement of electric field distribution along composite insulators by integrated optical electric field sensor. IEEE Trans Dielectr Electr Insul 15(1):302–310. https://doi.org/10.1109/T-DEI.2008.4446764
Enhanced Honeyword Generation Method Using Advanced DNA Algorithm Nwe Ni Khin and Khin Su Myat Moe
Abstract Today, the security of password files is paramount, and compromised password files cause significant problems for users and companies in various fields. The most dangerous attacks on password files are brute force attacks, DoS attacks, and dictionary attacks. Researchers therefore try to protect password files using various algorithms, such as honeyword generation, password hashing, MD5, and many others. Among them, the honeyword generation algorithm is one of the best defenses against the brute force attack. It prevents hackers from exploiting the password file by mixing the real and fake passwords stored in the database. The existing honeyword generation algorithm uses hashing and salting to create real and fake passwords for stronger security. We propose an improved honeyword generation process using an advanced DNA algorithm, which reduces processing time. The current DNA algorithm has a weakness in security, so our proposed system applies an advanced DNA algorithm that randomly uses five data lookup tables. The improved generation process is therefore both more secure and faster. We present case studies with computation and testing results of the DNA algorithm, describe the process of generating honeywords, and compare the proposed method with the existing honeyword generation method.

Keywords Honeyword generation · DNA sequence · Brute force attack
N. N. Khin (B) · K. S. M. Moe Yangon Technological University, Computer Engineering and Information Technology, Yangon, Republic of the Union of Myanmar e-mail: [email protected] K. S. M. Moe e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_18
1 Introduction

Most organizations around the world want to exchange secret messages through secure channels of communication, so they protect their data with various encryption and decryption algorithms. Most security processes apply password-based encryption (PBE) because users can easily choose and remember their passwords. However, current PBE algorithms are weak because attackers can easily obtain the keys using various attacks, in particular brute force attacks. Distributing random passwords improves security, and setting a secure password is fundamental to keeping information secure. We therefore introduce an improved honey encryption algorithm that uses an advanced DNA algorithm. Improved honeyword generation is a way to obstruct attackers who steal the password files and password lists stored in the system. The algorithm generates honeywords to deceive the attackers, and the users’ passwords and the generated honeywords are stored together in the password file in the database. Hackers can attack the password file using a brute force attack, but an attacker who obtains the file cannot tell which entry is the real password because the honeywords and the real password are stored together. DNA cryptography works on the concept of DNA computing. In the security area, DNA computing has emerged as a potential technology that could create new hope for unbreakable algorithms. DNA cryptography is used for secure data storage, authentication, and digital certificates. Figure 1 illustrates the DNA algorithm. In our proposed system, we combine improved honeyword generation with the enhanced DNA algorithm to overcome the weaknesses of the existing system. The rest of this paper is organized as follows. Section 2 reviews the literature on honey encryption and DNA encryption and discusses the previous work. Section 3 describes the process of DNA encryption and honeyword generation and the enhanced DNA code sequences. Section 4 focuses on the flowchart of the proposed system. Section 5 discusses the testing results, Sect. 6 presents a comparative study of honeyword generation methods, and Sect. 7 concludes the paper.
Fig. 1 DNA algorithm [1]: the password Hello123 is translated and encrypted into the DNA sequence TTTTAACTACTATGTCAACCCCG, which is translated and decrypted back into the password Hello123
2 Literature Review

Bonny et al. [2] proposed a scheme in which the plaintext is first converted to ASCII values, the ASCII values to binary values, and the binary values to DNA code. Secondly, a random key is generated in the range of 1–256, which corresponds to the permutation of the four characters A, T, G, and C, to produce the ciphertext. Thirdly, decryption takes place. This system achieved data confidentiality and data integrity during data transmission. Moe and Win [3] addressed typo errors, storage overhead, and time complexity by using a hashing algorithm in honeyword generation. Noorunnisa and Afree [4] combined Honey Encryption with OTP (Vernam cipher) to encrypt the original message; the security level increased, and the time consumption was similar to that of Blowfish. The system proposed by Pushpa [5] is more powerful in resisting attacks, but the ASCII key length is long. Mavanai et al. [6] proposed a system that performs transposition and folding operations to increase security and prevent brute force attacks, at the cost of increased complexity. Kumar et al. [7] introduced a system with several intermediate steps that hide the data, in order to conceal information from attackers and hijackers and to disseminate it securely. The proposed method mainly deals with two processes, the Honey Encryption process and the DNA encryption algorithm. Honeywords are generated using the Honey Encryption process, and encryption keys are produced from the users’ passwords using the DNA encryption algorithm. In the key or password distribution step, the resulting DNA code is randomly mapped to the seed space using the DTE process. In addition, we propose enhanced DNA encoding using five data lookup tables in the password encoding process. The proposed system protects against brute force attacks and saves processing time.
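The text-to-DNA conversion summarized above for [2] (plaintext to ASCII, ASCII to binary, binary to DNA code) can be sketched in Python as follows. The two-bit-to-base mapping in `BITS_TO_BASE` is an assumption made for illustration, since the mapping used in [2] is not given here, and the function names are hypothetical. The random key permutation step of [2] is not modeled.

```python
# Assumed mapping from bit pairs to nucleotides (illustrative only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def text_to_dna(text):
    """Convert ASCII text to DNA code: character -> 8-bit binary -> 4 bases."""
    bits = "".join(format(ord(ch), "08b") for ch in text)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_text(dna):
    """Invert text_to_dna: bases -> bit pairs -> 8-bit groups -> characters."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


encoded = text_to_dna("Hello123")
decoded = dna_to_text(encoded)
```

Each character becomes four bases under this mapping, so the encoding by itself only changes the alphabet; any secrecy in such a scheme has to come from the key-dependent step.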
3 Background Theory

The purpose of this approach is to reduce time complexity and to increase the security level. In addition, we apply the DNA encryption algorithm to secure the password file during the sweetwords encoding process.
3.1 Enhanced Honeyword Generation

Passwords are notoriously weak, and users often choose poor and repeated passwords. Therefore, the honeyword generation process generates honeywords to deceive attackers. The purpose of enhanced honeyword generation methods is to bait attackers so that they can be detected. In the enhanced honeyword generation process, the real password and the generated honeywords are stored in the main server and the honeychecker, as shown in Tables 1 and 2. The honeychecker is a backup server that distinguishes the real password from the honeywords (together called sweetwords) and stores confidential information such as authentic passwords and DNA sequences. When a user logs in, the main server checks the login password with the honeychecker. When a honeyword is entered, the honeychecker can immediately notify the system administration. The honeychecker has two main tasks: the first is to distinguish between fake and actual passwords at login, and the second is to send a warning message to the administrator when a honeyword is entered [8]. The original passwords are saved as indexes. The main server stores the user name and the indexes of the honeywords. The honeychecker stores only the index of the real password, after converting it to a DNA sequence [9].

Table 1 Main server [9]

User name   Index of honeywords
Sophia      (28,18,31,49)
Emily       (15,18,31,49)
Victor      (15,28,31,49)
Grace       (15,18,28,49)
Hazel       (15,28,18,31)

Table 2 Honeychecker [9]

Index of real passwords   DNA sequence
15                        AGTGGTCAG
28                        GATGGATCA
18                        CGAACTTGT
31                        TAAGAGGTC
49                        GCTCGACAT
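The division of work between the main server and the honeychecker can be sketched with the values of Tables 1 and 2. This is a simplified Python illustration, not the authors' implementation: it assumes that each user's real index is the one index missing from that user's honeyword list, and the function name `check_login` and its return labels are hypothetical.

```python
# Table 1: honeyword indexes kept by the main server for each user.
MAIN_SERVER = {
    "Sophia": (28, 18, 31, 49),
    "Emily": (15, 18, 31, 49),
    "Victor": (15, 28, 31, 49),
    "Grace": (15, 18, 28, 49),
    "Hazel": (15, 28, 18, 31),
}

# Table 2: DNA sequence stored by the honeychecker for each password index.
DNA_BY_INDEX = {
    15: "AGTGGTCAG",
    28: "GATGGATCA",
    18: "CGAACTTGT",
    31: "TAAGAGGTC",
    49: "GCTCGACAT",
}

# Assumption: the real index of a user is the index absent from the honeyword list.
HONEYCHECKER = {
    user: (set(DNA_BY_INDEX) - set(honey)).pop()
    for user, honey in MAIN_SERVER.items()
}


def check_login(user, submitted_index):
    """Classify a login attempt as 'success', 'alarm' (honeyword) or 'fail'."""
    if user not in MAIN_SERVER:
        return "fail"
    if submitted_index == HONEYCHECKER[user]:
        return "success"
    if submitted_index in MAIN_SERVER[user]:
        return "alarm"  # a honeyword was entered: warn the administrator
    return "fail"
```

Keeping the real index only on the honeychecker means that a copy of the main server's table does not reveal which sweetword is genuine.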
3.2 Enhanced DNA Algorithm

DNA encryption is the process of converting text containing numbers and special characters into DNA sequences [10]. The steps of the enhanced DNA algorithm are as follows:

Step 1: Create five DNA lookup tables for the password characters (alphabets, numbers, special characters, and symbols), so that 64 * 5 = 320 codes are randomly encoded.

Step 2: Convert the password into a sequence of 3-base DNA codes using a randomly chosen DNA lookup table. For example, for the password “my secret!”, the DNA code will be CCG CTT ACT CCT CCA TCT AGG CCA CTA CAC (see Tables 3, 4, 5, 6 and 7).
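Steps 1 and 2 can be sketched in Python as follows. The sketch rests on several assumptions that are not confirmed by the text: the five tables are generated here by shuffling the 64 codons rather than copied from Tables 3 to 7, one table is chosen at random per password, lowercase letters are mapped to uppercase, and the index of the chosen table is returned so that the code can be decoded. All names are hypothetical.

```python
import itertools
import random
import string

# The 64 three-base codons over A, C, G, T.
CODONS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]

# 26 letters + 10 digits + 28 symbols (including space) = 64 characters.
ALPHABET = string.ascii_uppercase + string.digits + "!@#$*?/><~ |\\_=+-,.:;%&^()[]"


def make_tables(n_tables=5, seed=0):
    """Step 1: build lookup tables, each a random character-to-codon mapping."""
    rng = random.Random(seed)
    tables = []
    for _ in range(n_tables):
        codons = CODONS[:]
        rng.shuffle(codons)
        tables.append(dict(zip(ALPHABET, codons)))
    return tables


def encode(password, tables, rng=None):
    """Step 2: encode the password with one randomly chosen lookup table."""
    rng = rng or random.Random()
    table_index = rng.randrange(len(tables))
    table = tables[table_index]
    # Assumption: letters are upper-cased before lookup.
    dna = "".join(table[ch] for ch in password.upper())
    return dna, table_index


def decode(dna, table_index, tables):
    """Invert encode, given the index of the table that was used."""
    inverse = {codon: ch for ch, codon in tables[table_index].items()}
    return "".join(inverse[dna[i:i + 3]] for i in range(0, len(dna), 3))


tables = make_tables()
dna, used = encode("my secret!", tables, random.Random(1))
```

Because the table is drawn at random, encoding the same password several times can give different DNA sequences, which is the behaviour reported for “hello123” in the testing results.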
4 Flowchart of the Proposed System

The flowchart of the proposed system includes the following two steps, as shown in Fig. 2. The database file stores the sugarword and the honeywords. A new user must register, and a password is generated for the new user. After the registration process, the user can enter the system with a user name and password. If the user is not registered, the password does not exist in the database and the server returns a failed login message. If the password is in the database, the server passes the password to the honeychecker to distinguish between the real password and a honeyword. If the password is correct, the honeychecker allows the user to access the system. Otherwise, a honeyword has been entered, and an alert message is sent to the administrator. The DNA algorithm is used for key generation. The first step is to

Table 3 Data lookup Table 1

A = AAA   B = ACA   C = AGA   D = ATA   E = GAA   F = GCA   G = GGA       H = GTA
I = AAC   J = ACC   K = AGC   L = ATC   M = GAC   N = GCC   O = GGC       P = GTC
Q = AAG   R = ACG   S = AGG   T = ATG   U = GAG   V = GCG   W = GGG       X = GTG
Y = AAT   Z = ACT   1 = AGT   2 = ATT   3 = GAT   4 = GCT   5 = GGT       6 = GTT
7 = CAA   8 = CCA   9 = CGA   0 = CTA   ! = TAA   @ = TCA   # = TGA       $ = TTA
* = CAC   ? = CCC   / = CGC   > = CTC   < = TAC   ~ = TCC   Space = TGC   | = TTC
\ = CAG   _ = CCG   = = CGG   + = CTG   - = TAG   , = TCG   . = TGG       : = TTG
; = CAT   % = CCT   & = CGT   ^ = CTT   ( = TAT   ) = TCT   [ = TGT       ] = TTT
Table 4 Data lookup Table 2

; = AAA   % = ACA   & = AGA   ^ = ATA   ( = GAA   ) = GCA   [ = GGA       ] = GTA
\ = AAC   _ = ACC   = = AGC   + = ATC   - = GAC   , = GCC   . = GGC       : = GTC
* = AAG   ? = ACG   / = AGG   > = ATG   < = GAG   ~ = GCG   Space = GGG   | = GTG
7 = AAT   8 = ACT   9 = AGT   0 = ATT   ! = GAT   @ = GCT   # = GGT       $ = GTT
Y = CAA   Z = CCA   1 = CGA   2 = CTA   3 = TAA   4 = TCA   5 = TGA       6 = TTA
Q = CAC   R = CCC   S = CGC   T = CTC   U = TAC   V = TCC   W = TGC       X = TTC
I = CAG   J = CCG   K = CGG   L = CTG   M = TAG   N = TCG   O = TGG       P = TTG
A = CAT   B = CCT   C = CGT   D = CTT   E = TAT   F = TCT   G = TGT       H = TTT
Table 5 Data lookup Table 3 A = AAA I = ACA
Q = AGA Y = ATA 7 = GAA * = GCA
\\ = GGA
; = GTA
B = AAC J = ACC
R = AGC
Z = ATC 8 = GAC
?= GCC
_ = GGC
% = GTC
C = AAG K = ACG S = AGG
1 = ATG 9 = GAG
/= GCG
= = GGG &| = GTG
D = AAT
L = ACT
2 = ATT
0 = GAT
>= GCT
+ = GGT ˆ = GTT
E = CAA
M = CCA U = CGA 3 = CTA
! = TAA
= AGT
0 = ATT
2 = GAT
T = GCT
L = GGT
( = CAA
- = CCA
= AGA
_ = ATA
. = GAA
I = AAC
B = ACC
@ = AGC < = ATC
= = GAC : = GCC ˆ = GGC
[ = GTC
P = AAG
J = ACG
C = AGG
# = ATG
~ = GAG
+= GCG
(=GTG
V = AAT
Q = ACT
K = AGT
D = ATT
$ = GAT
Space = - = GGT GCT
% = GTT
1 = CAA
W = CCA R = CGA
L = CTA
E = TAA
*= TCA
|=TGA
, = TTA
5 = CAC
2 = CCC
X = CGC
S = CTC
M = TAC F = TCC
? = TGC
\\ = TTC
8 = CAG
6 = CCG
3 = CGG
Y = CTG T = TAG
N= TCG
G = TGG / = TTG
0 = CAT
9 = CCT
7 = CGT
4 = CTT
U= TCT
O = TGT H = TTT
Z = TAT
; = GGG
5 Testing Results

The experiments were implemented in Python 3 and run on an AMD Ryzen 5 3500U processor with 8.00 GB of memory. Table 8 shows the output of the DNA program when the same password, “hello123”, is encoded three times: the encrypted DNA sequence is different on each run.
Fig. 2 Flowchart of proposed system (nodes: Start; New Member?; Register; Registration Success?; Login; User Password; Choosing Data Lookup Table to produce DNA code Using Random Process; DNA Code; Sweetwords are stored in Database (Honeywords + Sugarword); Server checks the key; Honeychecker classifies honeywords and sugarword; Login Success; Unsuccess Login; Raise an alarm; End)
Table 8 Different outputs from “hello123”

No   Password   DNA outputs                Execution time (ms)
1    hello123   TTTTAACTACTATGTCAACCCCGG   0.4947
2    hello123   CATCAAACTACTCCGATGATTCTA   0.3609
3    hello123   TTTTTAGGTGGTTGGGAGGATTAA   0.4886
6 Comparative Study

We compare our proposed system with the existing systems in terms of time complexity for different password lengths, flatness, typo safety, and storage overhead.
Table 9 Time comparison of existing and our proposed algorithm

Password length   Existing method   Current method
7                 3.050004          1.59
8                 3.100004          2.29
9                 3.140004          2.34
10                3.18005           2.37
Fig. 3 Results chart of time complexity
6.1 Time Complexity

In this section, we compare our proposed model with other honeyword generation methods with respect to the complexity of the DNA encoding and password security. When passwords of different lengths are processed, the proposed method takes less time than the existing method. Table 9 shows the results of experiments with passwords of different lengths, such as COSE789, Floral*3, GRATIS75%, and silicon32G. The existing method [3] takes longer than the proposed method, which is labeled “Current method” in Table 9 (see Fig. 3).
6.2 Flatness

Flatness is used to test the security level of a honeyword generation algorithm and to determine which honeyword generation method best secures an application. Flatness measures the probability w that an attacker who holds a user’s s sweetwords picks the real password. This probability is compared with 1/s, the chance of a uniformly random guess. When w = 1/s, the method is perfectly flat, because the attacker can do no better than guess at random. When w > 1/s, the method is only approximately flat, and the attacker has a better than random chance of guessing the correct password [11].
Table 10 Comparison results of previous and our proposed method [3]

No   Method                            Flatness   Typo safety   Storage overhead
1    Chaffing by tweaking              1/s        Low           (s − 1) * n
2    Password model                    1/s        High          (s − 1) * n
3    Take a tail                       1/s        High          (s − 1) * n
4    Paired distance protocol          1/s        High          (1 + RS) * n
5    Improved honeywords generation    1/s        High          (1 * n)
6    Our proposed method               1/s        High          (1 * n)

Where s = Number of sweetwords in the password file, n = Number of users, RS = Random String
6.3 Typo Safety

Typo safety is a major problem of existing honeyword generation algorithms. When the generated honeywords closely resemble the real password, a user’s typing mistake can match a honeyword and raise a false alarm [11].
6.4 Storage Overhead

Previous honeyword generation processes produce at least 20 honeywords per user to deceive the attacker. As a result, attackers are confused and cannot easily tell real passwords from fake ones. However, each user then has at least 21 stored passwords, and database storage becomes a problem. In our proposed method, the term sweetwords refers to the real password (sugarword) together with the honeywords; because each user has only one real password, the number of honeywords is (s − 1). Among the existing systems, the paired distance protocol (PDP) is the best with respect to storage. Because our system uses member passwords as honeyindexes, it reduces storage overhead further than the PDP algorithm, as shown in Table 10.
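The storage formulas of Table 10 can be compared numerically with a few lines of Python. The function names are hypothetical, and the values s = 21 (one real password plus 20 honeywords, as stated above) and n = 1000 users are chosen only for illustration. The paired distance protocol is left out because the numerical meaning of RS is not specified in the table.

```python
def storage_chaffing(s, n):
    """Storage overhead (s - 1) * n of the chaffing-style methods in Table 10."""
    return (s - 1) * n


def storage_proposed(n):
    """Storage overhead 1 * n of the improved and proposed methods in Table 10."""
    return 1 * n


s, n = 21, 1000  # illustrative values, not measurements
extra_chaffing = storage_chaffing(s, n)
extra_proposed = storage_proposed(n)
ratio = extra_chaffing / extra_proposed
```

Under these formulas the overhead of the proposed method no longer grows with the number of sweetwords per user, so the ratio between the two equals s − 1 for any number of users.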
7 Conclusion

Our proposed honeyword generation process using the new DNA algorithm requires less processing time than the existing system. It builds on the existing enhanced honeyword generation algorithm, which reduces storage overhead, so our proposed system also addresses the storage overhead problem. Moreover, because it uses an advanced DNA algorithm in the honeyword generation process, our proposed system achieves better security. In the future, we will apply our proposed system to the transfer of large amounts of data in many organizations.
References

1. Marlene, Bachand G (2020) Researchers storing information securely in DNA. 24 August 2020 from http://phys.org/news/2016-17-dna.html
2. Bonny BR, Vijay JF, Mahalakshmi T (2016) Secure data transfer through DNA cryptography using symmetric algorithm. Int J Comput Appl (0975–8887) 133(2)
3. Moe KSM, Win T (2017) Improved hashing and honey-based stronger password prevention against brute force attack. In: 2017 international symposium on electronics and smart devices. IEEE. 978-1-5386-2778-5/17/$31.00
4. Noorunnisa NS, Afree KR (2019) Honey encryption based password manager. JETIR 6(5). www.jetir.org (ISSN-2349-5162)
5. Pushpa BR (2017) A new technique for data encryption using DNA sequence. In: International conference on intelligent computing and control (I2C2)
6. Mavanai S, Pal A, Pandey R, Nadar D (2019) Message transmission using DNA crypto-system. Int J Comput Sci Mobile Comput 8(4):108–114
7. Kumar BR, Sri S, Katamaraju GMSA, Rani P, Harinadh N, Saibabu C (2020) File encryption and decryption using DNA technology. In: Second international conference on innovative mechanisms for industry applications (ICIMIA 2020). IEEE. Xplore Part Number: CFP20K58ART; ISBN: 978-1-7281-4167-1
8. Juels A, Rivest RL (2013) Honeywords making password cracking detectable. In: MIT CSAIL
9. Moe KSM, Win T (2018) Protecting private data using improved honey encryption and honeywords generation algorithm. Adv Sci Technol Eng Syst J 3(5):311–320
10. Omer A (2015) DNA cryptography algorithms and applications. Hitec University
11. Gross D (2013) 50 million compromised in Evernote hack. CNN
A Review: How Does ICT Affect the Health and Well-Being of Teenagers in Developing Countries
Willone Lim, Bee Theng Lau, Caslon Chua, and Fakir M. Amirul Islam
Abstract In agendas for teenagers’ use of information and communication technologies (ICTs) in developing countries, health and well-being are typically ignored or assumed to have only minor impacts on their lives. The purpose of this study is to describe the positive and negative effects of ICT adoption on teenagers’ health and well-being in developing countries. Several databases were searched to identify articles investigating the positive impacts, the negative impacts, and the evaluation of mobile health (mHealth) technologies in developing countries. The analyses concluded that teenagers in developing countries are leveraging mHealth applications to access health services and information and to interact with physicians remotely. However, the long-term effects of ICT use can be seen in depressive symptoms, musculoskeletal pain, and even anxiety. Many researchers have yet to explore the potential of ICTs for different aspects of teenagers’ health and well-being, whereas the negative impacts are more pervasive in the studies conducted in recent years. The review provides insight into the benefits of ICT for teenagers in developing countries, but it is crucial to be aware of the negative implications for their future health and well-being.
W. Lim (B) · B. T. Lau
Faculty of Engineering, Computing and Science, Swinburne University of Technology, Jalan Simpang Tiga, 93350 Kuching, Sarawak, Malaysia
e-mail: [email protected]
B. T. Lau
e-mail: [email protected]
C. Chua
Faculty of Science, Engineering and Technology, Swinburne University of Technology, Hawthorn, VIC 3122, Australia
e-mail: [email protected]
F. M. A. Islam
Faculty of Health, Arts and Design, Swinburne University of Technology, Hawthorn, VIC 3122, Australia
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_19
Keywords ICT · Impacts · Health · Well-being · Teenagers · Developing countries · Digital devices
1 Introduction
The rapid advancement of information and communication technologies (ICTs) has brought sweeping changes to the health and well-being of the new generation. People between the ages of 13 and 19, referred to as teenagers and commonly described as digital natives, are the first generation born in this evolving information age [1]. In other words, this generation was born into a world that was already technologically advanced, in which these technologies are fundamental to the way they communicate, learn, and develop. The internet, computers, mobile devices, and televisions all play a formative role in their daily activities [2]. Despite the encouraging signs of the ICT revolution, inequalities, or digital divides, remain large between developed and developing countries. According to the United Nations, only 33.33% of the population in developing countries has internet access, compared to 82% in developed countries [3]. Although evidence of both positive and negative impacts of ICT is available, limited research has been conducted on the impacts of ICT specifically in developing countries. Therefore, this review was conducted to understand the determinants, correlations, and consequences of ICT use for health and well-being among teenagers in developing countries, which are critical for informing preventive interventions or measures that may benefit them. The review is meant to provide insights into the impacts of ICT on health and well-being among teenagers and to identify gaps for further research in developing countries. The paper is divided into introduction, methodology, results, and conclusion and future works.
2 Methodology
2.1 Search Strategy
The literature was gathered from leading academic databases (ScienceDirect, Scopus, and Google Scholar) by searching for articles related to the impacts of ICT on teenagers’ health and well-being in developing countries. The initial search returned a large number of articles, with 21,797 from ScienceDirect and 22,263 from Scopus and Google Scholar; hence, the search was narrowed with context-related terms such as “health and well-being”, “developing countries”, and “mHealth”. Only peer-reviewed journal research articles were collected. Additional potentially relevant articles were identified, with three (3) from the reference lists of all included articles and two (2) from websites. After the inclusion and exclusion criteria were applied, a total of 31 articles were selected for further analysis.
2.2 Inclusion and Exclusion Criteria
Studies were included if they met the following criteria: (1) focused on teenagers between ages 13 and 19; (2) utilized technologies (mHealth) to improve health; (3) included positive or negative health and well-being outcomes; (4) were conducted in developing countries; (5) involved participants residing in urban and rural areas; (6) were published between 2015 and 2020; and (7) were written in the English language. In-depth analyses were conducted to eliminate irrelevant articles. The following were excluded: (1) articles that do not cover the teenage age group, (2) articles published before 2015 or after 2020, and (3) research from developed countries. Once the articles were collected, the exclusion criteria were applied as part of the title and abstract review.
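The screening rules above can be pictured as a simple record filter. The following is a minimal sketch and not the authors' tooling: the `Article` record, its field names, and the sample entries are hypothetical, only a subset of the criteria is encoded, and treating "focused on teenagers between ages 13 and 19" as an overlap test on the studied age range is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical record for one screened article (field names are illustrative)."""
    title: str
    year: int
    min_age: int
    max_age: int
    developing_country: bool
    language: str
    reports_health_outcome: bool


def meets_criteria(a: Article) -> bool:
    """Apply a subset of the inclusion criteria as boolean checks."""
    # Assumption: a study qualifies if its age range overlaps 13-19.
    covers_teenagers = a.min_age <= 19 and a.max_age >= 13
    in_window = 2015 <= a.year <= 2020
    return (
        covers_teenagers
        and in_window
        and a.developing_country
        and a.language == "English"
        and a.reports_health_outcome
    )


# Made-up sample records, used only to exercise the filter.
candidates = [
    Article("Study A", 2017, 13, 18, True, "English", True),
    Article("Study B", 2013, 13, 18, True, "English", True),   # published too early
    Article("Study C", 2019, 25, 40, True, "English", True),   # adults only
    Article("Study D", 2018, 14, 19, False, "English", True),  # developed country
]

included = [a.title for a in candidates if meets_criteria(a)]
print(included)
```

Only "Study A" passes every check in this toy example; each of the other records fails exactly one criterion, which makes it easy to see which rule removed it.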
3 Results and Discussions
The literature shows that significant research has been done in the field of ICT, health, and well-being. The major findings fall under four main subjects: adoption of ICT, gender gaps, introduction to mobile health, and the detrimental impacts of ICT.
3.1 Adoption of ICT
The effects of ICT on teenagers in developing countries are more visible than ever. However, many questions have been raised about whether its effects are beneficial or damaging to teenagers’ health and well-being [4–6]. Several studies have analyzed the adoption of ICT, with differing results: on the one hand, improved healthcare services through mobile applications [7] and the promotion of health education [8]; on the other hand, excessive use causing vision impairment [9], musculoskeletal pain [10, 11], and depression [12–16].
3.2 Gender Gaps
Many policymakers and practitioners promote ICT in developing countries in the hope of broadening access among teenagers and improving their lifestyle [17]; even so, gender inequality remains a point of concern [18]. There is a considerable gender digital divide, in which girls are marginalized or excluded from accessing health resources [19–21]. In addition, teenagers from geographically disadvantaged areas often gain limited benefits from using ICT to improve health and well-being. Despite these barriers, many teenagers from developing countries are willing to use, and are actively seeking, digital health services [5, 22]. Given the omnipresent role of ICT, teenagers are by nature more digitally vulnerable than the previous generation [23, 24]. Although it is evident that the introduction of ICT to teenagers in developing countries can boost their overall health and well-being, dependence on technology may have adverse health outcomes [25] and dangerous long-term effects on well-being.
3.3 Mobile Health
Mobile health (mHealth) has grown steadily, especially in developing countries; the reported prevalence of mHealth adoption in the global landscape is Africa (67%), Eurasia (26%), and Latin America (13%) [7]. Mobile health provides teenagers with opportunities to access healthcare services and information virtually [8]. It has been widely used as part of health interventions in low-income countries to promote public health campaigns [7, 26], and many teenagers are willing or intend to adopt mHealth services [27, 28].
3.4 Detrimental Impacts of ICT
Health. A recent study examined students’ habits of using electronic devices, including average hours of use, viewing distance, and posture. The results showed that 33.3% of students use digital devices for 2–4 (43.6%) hours a day. The study further concluded that 27% of students experienced eyestrain when using devices while lying down and that prolonged usage may lead to new challenges of digital eyestrain [9]. Another article investigated the relationship between musculoskeletal symptoms and computer usage with the aim of comparing different variables: sociodemographic characteristics, musculoskeletal pain, and physical activity level. Its findings indicated that 65.1% of students experienced musculoskeletal pain in anatomical regions such as the thoracolumbar spine (46.9%), upper limbs (20%), cervical spine (18.5%), and scapular region (15.8%) [11]. A further study was conducted to determine the association between screen time and body weight, using a cross-sectional survey to investigate weight status, screen-time viewing, and students’ demographics. The results revealed a significant relation between screen-time usage and weight status: 14.4% of students were categorized as overweight and 11.9% as obese, and 36.8% among them exceeded the recommended daily screen-time limit of 2 h per day. In comparing urban and rural areas, students from urban areas were more likely to be obese than those from rural areas [29] (see Table 1).

Table 1 Studies on ICT and health
| Publication | Scope | Age group | Region | Positive impacts | Negative impacts |
|---|---|---|---|---|---|
| Cai et al., 2017 | Screen-time viewing and overweight | Students aged 10–18 years old | Asia (China) | – | 14.4% were overweight and 11.9% obese |
| Hoque, 2016 | mHealth adoption in a developing country | Students | Asia (Dhaka, Bangladesh) | Intention to adopt mHealth | – |
| Ichhpujani et al., 2019 | Visual implications of digital device usage | Students aged 11–17 years old | Asia (India) | – | Eye strain after long hours of device usage |
| Queiroz et al., 2017 | Musculoskeletal pain related to electronic devices | Students aged 10–19 years old | South America (São Paulo, Brazil) | – | Females are more likely to have musculoskeletal pain |
| Silva et al., 2016 | Computer use and musculoskeletal pain | Students aged 14–19 years old | South America (Pernambuco, Brazil) | – | Cervical and lower back pain |
| Wang et al., 2018 | High screen-time usage | Students aged 13–18 years old | Asia (China) | – | High screen-time associated with poor health |

Well-being. A cross-sectional study reported on the association between mobile phone use and depressive symptoms. The authors described that depressive symptoms increased when mobile phones were used extensively: the 19.1% of students who used mobile phones for more than 2 h on weekdays and the 18.3% who used mobile devices for more than 5 h on weekends were associated with increased depressive disorder [14]. Another cross-sectional study aimed to determine the relationship between the duration of gadget use and mental-emotional health by investigating psychological attributes, emotional symptoms, behavioral problems, and prosocial behaviors. The study found that 55.2% of students had an abnormal mental-emotional state when using gadgets for more than 10 h a week [30]. A further study examined the relationship between suicide-related behaviors and the use of mobile phones. Results from the survey showed that extensive use of mobile phones indirectly causes depression, with the risk of suicide-related and self-harming behaviors [31] (see Table 2).
Table 2 Studies on ICT and well-being
| Publication | Scope | Age group | Region | Positive impacts | Negative impacts |
|---|---|---|---|---|---|
| Chen et al., 2020 | Mobile phone use and suicidal intention | Students aged 13–18 years old | Asia (China) | – | Device usage causes suicidal actions |
| Demirci et al., 2015 | Impact of smartphones on health | Students | Asia (Isparta, Turkey) | – | Use of smartphone causes depression |
| Liu et al., 2019 | Prolonged mobile phone use | Students aged 15 years old | Asia (China) | – | Risk of getting depression |
| Van Der Merwe, 2019 | ICT use pattern and well-being of open distance learning students | Students | Africa (South Africa) | Low risk of overuse symptoms | – |
| Wahyuni et al., 2019 | Gadgets and mental health | Students aged 8–16 years old | Asia (Indonesia) | – | Digital gadgets affect mental health |
| Yan et al., 2017 | Screen-time and well-being | Students aged 13–18 years old | Asia (Wuhan, China) | – | High screen-time and increase in BMI |
| Zhao et al., 2017 | Internet addiction and depression | Students | Asia (China) | – | Internet addiction causes depression |

4 Conclusion and Future Works
Interaction with information and communication technology is increasing among teenagers, particularly in developing countries, and it has diverse effects on their health and well-being. From a health perspective, mobile health applications have the potential to deliver fundamental health support remotely, which is beneficial for teenagers from rural areas. However, the main findings of this literature review point to the negative implications of ICT for teenagers in developing countries. The extended use of digital devices has contributed to the risk of health complications such as musculoskeletal pain around the cervical and lumbar regions [11], weight gain among teenagers from urban areas [29], and eye strain causing discomfort, dryness, and blurred vision [9]. This, in turn, harms teenagers’ well-being and may lead to depression, mental-emotional health problems, or even suicidal behavior. Based on the studies reviewed, limited research has been presented on the positive impacts of ICT on health and well-being; for instance, the adoption of mHealth and its potential benefits for teenagers are significantly underrepresented. Hence, researchers have limited knowledge of the effectiveness of mHealth interventions, particularly in developing countries. In addition, findings on how ICT contributes to teenagers’ well-being are noticeably scarce. A more extensive study should be carried out so that researchers can understand the benefits of mHealth for teenagers in developing countries. Researchers could investigate the adoption of mHealth in developing countries globally to understand how different populations are leveraging existing mHealth services to improve health care. Future research is also needed to study the impacts of ICTs on well-being, particularly in the context of underdeveloped countries, to provide information on how to improve the overall well-being or quality of life among teenagers. The findings of this review are an important contribution to the growing body of evidence on the impacts of ICT on health and well-being among teenagers in developing countries.
References
1. Akçayir M, Akçayir G, Pektaş HM, Ocak MA (2016) Augmented reality in science laboratories: the effects of augmented reality on university students’ laboratory skills and attitudes toward science laboratories. Comput Human Behav 57:334–342. https://doi.org/10.1016/j.chb.2015.12.054
2. Areepattamannil S, Khine MS (2017) Early adolescents’ use of information and communication technologies (ICTs) for social communication in 20 countries: examining the roles of ICT-related behavioral and motivational characteristics. Comput Human Behav 73:263–272. https://doi.org/10.1016/j.chb.2017.03.058
3. United Nations: Information and communication technologies (ICTs) | Poverty Eradication. https://www.un.org/development/desa/socialperspectiveondevelopment/issues/information-and-communication-technologies-icts.html
4. Amra B, Shahsavari A, Shayan-Moghadam R, Mirheli O, Moradi-Khaniabadi B, Bazukar M, Yadollahi-Farsani A, Kelishadi R (2017) The association of sleep and late-night cell phone use among adolescents. J Pediatr (Rio J) 93:560–567. https://doi.org/10.1016/j.jped.2016.12.004
5. Singh AP, Misra G (2015) Pattern of leisure-lifestyles among Indian school adolescents: contextual influences and implications for emerging health concerns. Cogent Psychol 2. https://doi.org/10.1080/23311908.2015.1050779
6. Wang H, Zhong J, Hu R, Fiona B, Yu M, Du H (2018) Prevalence of high screen time and associated factors among students: a cross-sectional study in Zhejiang, China. BMJ Open 8:9–12. https://doi.org/10.1136/bmjopen-2018-021493
7. Ippoliti NB, L’Engle K (2017) Meet us on the phone: mobile phone programs for adolescent sexual and reproductive health in low-to-middle income countries. Reprod Health 14:1–8. https://doi.org/10.1186/s12978-016-0276-z
8. Laidlaw R, Dixon D, Morse T, Beattie TK, Kumwenda S, Mpemberera G (2017) Using participatory methods to design an mHealth intervention for a low income country, a case study in Chikwawa, Malawi. BMC Med Inform Decis Mak 17:1–12. https://doi.org/10.1186/s12911-017-0485-6
9. Ichhpujani P, Singh RB, Foulsham W, Thakur S, Lamba AS (2019) Visual implications of digital device usage in school children: a cross-sectional study. BMC Ophthalmol 19:1–8. https://doi.org/10.1186/s12886-019-1082-5
10. Queiroz LB, Lourenço B, Silva LEV, Lourenço DMR, Silva CA (2018) Musculoskeletal pain and musculoskeletal syndromes in adolescents are related to electronic devices. J Pediatr (Rio J) 94:673–679. https://doi.org/10.1016/j.jped.2017.09.006
11. Silva GRR, Pitangui ACR, Xavier MKA, Correia-Júnior MAV, De Araújo RC (2016) Prevalence of musculoskeletal pain in adolescents and association with computer and videogame use. J Pediatr (Rio J) 92:188–196. https://doi.org/10.1016/j.jped.2015.06.006
12. Demirci K, Akgönül M, Akpinar A (2015) Relationship of smartphone use severity with sleep quality, depression, and anxiety in university students. J Behav Addict 4:85–92. https://doi.org/10.1556/2006.4.2015.010
13. Huang Q, Li Y, Huang S, Qi J, Shao T, Chen X, Liao Z, Lin S, Zhang X, Cai Y, Chen H (2020) Smartphone use and sleep quality in Chinese college students: a preliminary study. Front Psychiatry 11:1–7. https://doi.org/10.3389/fpsyt.2020.00352
14. Liu J, Liu C, Wu T, Liu BP, Jia CX, Liu X (2019) Prolonged mobile phone use is associated with depressive symptoms in Chinese adolescents. J Affect Disord 259:128–134. https://doi.org/10.1016/j.jad.2019.08.017
15. Van Der Merwe D (2019) Exploring the relationship between ICT use, mental health symptoms and well-being of the historically disadvantaged open distance learning student: a case study. Turkish Online J Distance Educ 20:35–52. https://doi.org/10.17718/tojde.522373
16. Zhao F, Zhang ZH, Bi L, Wu XS, Wang WJ, Li YF, Sun YH (2017) The association between life events and internet addiction among Chinese vocational school students: the mediating role of depression. Comput Human Behav 70:30–38. https://doi.org/10.1016/j.chb.2016.12.057
17. Banaji S, Livingstone S, Nandi A, Stoilova M (2018) Instrumentalising the digital: adolescents’ engagement with ICTs in low- and middle-income countries. Dev Pract 28:432–443. https://doi.org/10.1080/09614524.2018.1438366
18. The impact of ICT on children and teenagers (2020). https://en.unesco.org/news/impact-ict-children-and-teenagers
19. Bhandari A (2019) Gender inequality in mobile technology access: the role of economic and social development. Inf Commun Soc 22:678–694. https://doi.org/10.1080/1369118X.2018.1563206
20. Danjuma KJ, Onimode BM, Onche OJ (2015) Gender issues & information communication technology for development (ICT4D): prospects and challenges for women in Nigeria. http://arxiv.org/abs/1504.04644
21. Rashid AT (2016) Digital inclusion and social inequality: gender differences in ICT access and use in five developing countries. Gend Technol Dev 20:306–332. https://doi.org/10.1177/0971852416660651
22. Maloney CA, Abel WD, McLeod HJ (2020) Jamaican adolescents’ receptiveness to digital mental health services: a cross-sectional survey from rural and urban communities. Internet Interv 21:100325. https://doi.org/10.1016/j.invent.2020.100325
23. Mamun MA, Hossain MS, Siddique AB, Sikder MT, Kuss DJ, Griffiths MD (2019) Problematic internet use in Bangladeshi students: the role of socio-demographic factors, depression, anxiety, and stress. Asian J Psychiatr 44:48–54. https://doi.org/10.1016/j.ajp.2019.07.005
24. Moreau A, Laconi S, Delfour M, Chabrol H (2015) Psychopathological profiles of adolescent and young adult problematic Facebook users. Comput Human Behav 44:64–69. https://doi.org/10.1016/j.chb.2014.11.045
25. Yan H, Zhang R, Oniffrey TM, Chen G, Wang Y, Wu Y, Zhang X, Wang Q, Ma L, Li R, Moore JB (2017) Associations among screen time and unhealthy behaviors, academic performance, and well-being in Chinese adolescents. Int J Environ Res Public Health 14:1–15. https://doi.org/10.3390/ijerph14060596
26. Mohan B, Sharma S, Sharma S, Kaushal D, Singh B, Takkar S, Aslam N, Goyal A, Wander GS (2017) Assessment of knowledge about healthy heart habits in urban and rural population of Punjab after SMS campaign: a cross-sectional study. Indian Heart J 69:480–484. https://doi.org/10.1016/j.ihj.2017.05.007
27. Alam MZ, Hu W, Kaium MA, Hoque MR, Alam MMD (2020) Understanding the determinants of mHealth apps adoption in Bangladesh: a SEM-Neural network approach. Technol Soc 61:101255. https://doi.org/10.1016/j.techsoc.2020.101255
28. Hoque MR (2016) An empirical study of mHealth adoption in a developing country: the moderating effect of gender concern. BMC Med Inform Decis Mak 16:1–10. https://doi.org/10.1186/s12911-016-0289-0
29. Cai Y, Zhu X, Wu X (2017) Overweight, obesity, and screen-time viewing among Chinese school-aged children: national prevalence estimates from the 2016 physical activity and fitness in China: the youth study. J Sport Heal Sci 6:404–409. https://doi.org/10.1016/j.jshs.2017.09.002
30. Wahyuni AS, Siahaan FB, Arfa M, Alona I, Nerdy N (2019) The relationship between the duration of playing gadget and mental emotional state of elementary school students. Open Access Maced J Med Sci 7:148–151. https://doi.org/10.3889/oamjms.2019.037
31. Chen R, Liu J, Cao X, Duan S, Wen S, Zhang S, Xu J, Lin L, Xue Z, Lu J (2020) The relationship between mobile phone use and suicide-related behaviors among adolescents: the mediating role of depression and interpersonal problems. J Affect Disord 269:101–107. https://doi.org/10.1016/j.jad.2020.01.128
Multi-image Crowd Counting Using Multi-column Convolutional Neural Network
Oğuzhan Kurnaz and Cemal Hanilçi
Abstract Crowd density estimation is an important task for security applications. It is a regression problem consisting of feature extraction and estimation steps. In this study, we propose to use a modified version of the previously introduced multi-column convolutional neural network (MCNN) approach for estimating crowd density. First, whereas in the original MCNN approach the same input image is applied to each column of the network, we propose to apply a different version of the same input image to each column in order to extract a different mapping from each. Second, the original MCNN first generates an estimated density map and then performs crowd counting; we therefore adapt it for crowd counting and compare its performance with that of the proposed method. The regression task is performed by support vector regression (SVR) using feature vectors obtained from the proposed multi-image multi-column CNN (MCCNN). A total of 2000 images selected from the UCSD pedestrian dataset are used in the experiments. The regions of interest (ROI) are selected and the pixel values in the remaining regions are set to zero. In order to prevent distortion caused by camera position, perspective normalization is applied as a pre-processing step, which dramatically improves the performance.
Keywords Crowd density estimation · Convolutional neural network · Crowd counting
O. Kurnaz (B)
Mechatronics Engineering, Bursa Technical University, Bursa, Turkey
e-mail: [email protected]
C. Hanilçi
Electrical and Electronic Engineering, Bursa Technical University, Bursa, Turkey
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_20

1 Introduction
As a natural consequence of rapidly growing urbanization, safety is becoming a basic human need. Therefore, the most visited places in cities are usually equipped with surveillance cameras by the authorities, and often a human observer watches various objects (especially suspected persons) and their activities for long periods of time. Such monitoring can fail or miss the target object in crowded scenes. Apart from surveillance, many accidents caused by large crowds of people (e.g., at concerts, festivals, and demonstrations) have been observed at various locations and in various countries. Thus, utilizing computer vision techniques for such analysis has received great attention in recent years. Crowd density estimation, that is, automatically estimating the level of a crowd, is a significant but challenging task for various purposes.
Existing research on crowd density estimation can be divided into two categories [1]: (i) holistic and (ii) local approaches. In the holistic approach, the whole image is processed at once without any segmentation, and the relation between the crowd size and the feature space is calculated using global features of the image [2]. In the local approach, local features extracted from the image are used for tasks such as detection, tracking, and pedestrian density estimation [1]. Chan et al. [3] introduced a privacy-preserving method to estimate the size of non-uniform crowds composed of pedestrians moving in various directions. An optimized convolutional neural network (CNN) was proposed for crowd density estimation in [4]. With that method, estimation speed was increased by eliminating some network connections according to the presence of similar feature maps, and accuracy was improved by employing two cascaded CNNs [4]. Zhang et al. [5] proposed a method to train a CNN with two switchable learning tasks, crowd density estimation and crowd counting, and reported that the method gives a better local optimum for both tasks. Sindagi and Patel [6] introduced an end-to-end cascaded network of CNNs for crowd density estimation that jointly learns crowd count classification and density map estimation.
A combination of shallow and deep fully convolutional networks was used in [7] to predict the density map for a given crowd image. This combination was shown to capture both high-level information (e.g., face or body detectors) and low-level features (e.g., blob detectors). A deep residual network was proposed and found to be appropriate for simultaneous crowd density estimation, violent behavior detection, and crowd density level classification [8]. Hu et al. [9] proposed a deep learning-based approach for estimating the size of a high-level or mid-level crowd in a single image. In that method, a CNN architecture was used to extract crowd features, and then crowd density and crowd count were employed to learn the crowd features to estimate the specific crowd density. Li et al. [10] proposed a data-driven deep learning architecture for congested scene recognition. This method was shown to understand highly congested scenes and to perform accurate count estimation from the corresponding density maps. In [11], a CNN was used to map a given crowd image to its corresponding density map. Zhang et al. [12] proposed the multi-column CNN (MCNN) to count the crowd in a single image with arbitrary crowd density and arbitrary camera position. This network was used to map the input image to the corresponding crowd density map.
In this paper, we propose to use two different modified versions of the previously proposed MCNN approach [12] for crowd density estimation. In the original MCNN, three different CNNs with receptive fields of different sizes were trained using the same input image, and then the embeddings of the last convolutional layers were combined to estimate the corresponding density map. The main contributions of our work and its differences from [12] can be summarized as follows:
• Rather than training each CNN with the same input image, we propose to use a different pre-processed version of the input image with each CNN. Our motivation is that, although the parameters (e.g., the number and size of the convolutional filters) are different, each CNN is likely to learn similar embeddings when the same input image is used. Using different pre-processed versions of the same image as input, however, results in different representations (and therefore different levels of information). With these different embeddings, the performance of crowd density estimation can intuitively be improved.
• Rather than using multi-column CNNs in an end-to-end fashion as proposed in [12], we propose to use them for feature extraction.
• We use support vector regression (SVR) on the features extracted from the MCCNN for crowd density estimation.
• A slight modification of the previously proposed MCNN [12] is introduced, and we compare its performance with that of the proposed technique.
2 Methods
As mentioned before, crowd density estimation is a standard regression problem consisting of training and test stages. Before features are extracted from the input images, a pre-processing step is applied in which necessary arrangements (e.g., noise removal, background subtraction, and selection of the region of interest) are performed. In this study, selection of the region of interest and perspective normalization are used as pre-processing techniques. The pre-processed images are then used to extract features. Three different feature representations are obtained from the proposed multi-image multi-column CNN (MCCNN), which consists of three CNNs in parallel, each trained using a different version of the input image. Finally, the feature mappings obtained from the CNNs are combined to train the support vector regression model and estimate the crowd density. In this section, each step of the proposed crowd density estimation system is briefly explained.
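The fusion step in this pipeline can be pictured as plain vector concatenation. The sketch below is only an illustration under stated assumptions: the function name `fuse_features`, the vector lengths, and the numeric values are hypothetical placeholders for the per-column CNN outputs, and the regression stage that would consume the fused vector is not shown.

```python
from typing import List, Sequence


def fuse_features(
    x1: Sequence[float], x2: Sequence[float], x3: Sequence[float]
) -> List[float]:
    """Concatenate the three per-column feature vectors into one vector."""
    return list(x1) + list(x2) + list(x3)


# Placeholder vectors standing in for the outputs of the three CNN columns.
column1_features = [0.2, 0.7, 0.1, 0.5]
column2_features = [0.9, 0.3]
column3_features = [0.4, 0.6, 0.8]

x = fuse_features(column1_features, column2_features, column3_features)
print(len(x))  # one fused vector per image
```

Concatenation keeps every column's features intact and leaves it to the downstream regressor to weigh them, which is one simple way to combine embeddings that are expected to carry complementary information.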
2.1 Pre-processing: Region of Interest (ROI) Selection
Images may contain regions that carry no useful information for crowd density estimation, in particular regions in which no one appears. Besides increasing the computational complexity, these regions mostly degrade the performance. Thus, using only the regions in which people appear (referred to as the region of interest, ROI) considerably improves the crowd density estimation performance [13]. Therefore, we first apply ROI selection to the images as a pre-processing step. To do so, the relevant regions in the images are first determined and then the pixel values in the remaining regions are set to zero. The ROI is obtained by applying an image filter designed to set to zero the pixel values within the regions where people are not located:

$$f(x, y) = 0, \quad \text{if } (x, y) \notin \mathrm{ROI} \tag{1}$$
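As a small illustration of Eq. (1), the sketch below zeroes every pixel that lies outside a boolean ROI mask. It is a minimal example, not the authors' implementation: the function name `apply_roi`, the nested-list image representation, and the toy pixel values are all assumptions made for readability.

```python
from typing import List

Image = List[List[int]]


def apply_roi(image: Image, roi_mask: List[List[bool]]) -> Image:
    """Return a copy of `image` with every pixel outside the ROI set to zero."""
    return [
        [pixel if inside else 0 for pixel, inside in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, roi_mask)
    ]


# Toy 3x3 grayscale image and a mask marking where people may appear.
image = [
    [12, 40, 35],
    [18, 90, 77],
    [25, 60, 51],
]
roi_mask = [
    [False, True, True],
    [False, True, True],
    [False, False, True],
]

masked = apply_roi(image, roi_mask)
print(masked)  # [[0, 40, 35], [0, 90, 77], [0, 0, 51]]
```

Pixels inside the ROI keep their original values, so only the regions assumed to contain no people are suppressed.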
2.2 Pre-processing: Perspective Normalization
Previous studies on crowd density estimation showed that camera position has an important effect on the performance [3]. If the point of view is inclined, the size of the same object varies at different locations due to the effects of perspective. Since people closer to the camera appear larger than those farther away, perspective normalization (PN) should be applied to the images as a pre-processing step to reduce the adverse effects of perspective. Therefore, we apply the PN method described in [3] to the ROI-selected images.
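To make the idea of compensating for perspective concrete, the sketch below builds a per-row weight map that grows linearly with distance from the camera. This is a simplified illustration and not necessarily the exact procedure of [3] or of this paper: the linear interpolation by image row, the function name `perspective_weights`, and the reference values (weight 1.0 at the nearest row, 3.0 at the farthest) are assumptions chosen for the example.

```python
from typing import List


def perspective_weights(
    n_rows: int, near_row: int, far_row: int, far_weight: float
) -> List[float]:
    """Per-row weights: 1.0 at `near_row`, `far_weight` at `far_row`, linear in between."""
    if near_row == far_row:
        raise ValueError("near_row and far_row must differ")
    slope = (far_weight - 1.0) / (far_row - near_row)
    return [1.0 + slope * (row - near_row) for row in range(n_rows)]


# Row 4 is closest to the camera (bottom of the image), row 0 is farthest.
weights = perspective_weights(n_rows=5, near_row=4, far_row=0, far_weight=3.0)

# Hypothetical number of foreground pixels found in each image row.
foreground_pixels_per_row = [2, 4, 6, 8, 10]

raw_area = sum(foreground_pixels_per_row)
weighted_area = sum(w * n for w, n in zip(weights, foreground_pixels_per_row))
print(weights)
print(raw_area, weighted_area)
```

With these toy numbers the distant rows contribute more after weighting, so a person far from the camera is no longer under-represented relative to one close to it. How the weights are applied to the CNN inputs is not specified in the text and is left out here.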
2.3 Feature Extraction: Multi-column Convolutional Neural Network (MCCNN) In this study, we propose to extract features for crowd counting using a multi-image multi-column CNN (MCCNN). The overall structure of the proposed MCCNN and the number of parameters in each convolutional and fully connected block are shown in Fig. 1. Concretely, MCCNN is a deep neural network consisting of three CNNs in parallel. In contrast to the original MCNN approach, the input of each CNN column is a different pre-processed version of the same input image. Using a different pre-processed version of the same image at each column results in a different embedding conveying a different level of information. Because CNNs are known to be powerful in learning representative features, extracting features using MCCNN is intuitively expected to yield better performance than using hand-crafted features. Therefore, we extract different feature mappings from each column of MCCNN and then combine these three feature vectors to form a single feature vector. Suppose that $x_1$, $x_2$, and $x_3$ correspond to the bottleneck features obtained from the output of the last fully connected layer of each column of MCCNN, respectively. The final feature vector $x$ is obtained by combining these three vectors as $x = [x_1\; x_2\; x_3]^T$. The first column of the MCCNN consists of five convolutional layers followed by three fully connected layers. The output layer of the network is a linear layer
Multi-image Crowd Counting Using Multi-column …
Fig. 1 Proposed MCCNN structure for crowd counting. The numbers below each block of convolutional layers represent the number of convolutional filters in each layer and the size of the convolutional filters, respectively. The numbers below each fully connected (FC) layer correspond to the number of units in that FC layer
and performs the regression task. The input of this first column is the pre-processed image without any further processing. Generally, each convolutional layer is followed by a MaxPooling layer for dimensionality reduction. However, this may result in information loss. Therefore, we instead used a convolutional layer with a 2 × 2 kernel and a 2 × 2 stride. While reducing the dimensionality, this helps to retain semantic information. The foreground image (difference image) obtained by applying background subtraction together with ROI selection and PN is used as the input to the second column of the MCCNN. This column is composed of six convolutional layers followed by two fully connected layers. The third column of the MCCNN is again a CNN, whose input images are obtained by applying the Sobel edge detector followed by ROI selection and PN. The third column includes five convolutional layers and two fully connected layers. With the proposed approach we aim at obtaining a different useful embedding from each column, so that the embeddings convey complementary information for crowd density estimation. Thus, combining these embeddings into a single feature vector is intuitively expected to boost the crowd counting performance. In order to provide a deeper analysis of the proposed MCCNN approach, we consider five sub-cases in the experiments:
– MCCNN-I: Crowd counting is performed in an end-to-end fashion using only the first column of the MCCNN structure.
– MCCNN-II: Similar to MCCNN-I, but the second column of the MCCNN architecture is used for crowd counting.
– MCCNN-III: Similar to MCCNN-I and MCCNN-II, but the third column of the MCCNN is used.
– MCCNN-IV: The full MCCNN structure is used for crowd counting in an end-to-end manner. To be more specific, MCCNN is used for both the feature extraction and regression tasks.
Fig. 2 Modified multi-column convolutional neural network (MCNN) structure. The numbers below each convolutional layers block represent the number of convolutional filters in each layer and the size of convolutional filters, respectively
– MCCNN-V: MCCNN is used for feature extraction and the 80-dimensional feature vector obtained by combining the embeddings extracted from each column is used with support vector regression (SVR) to perform crowd counting.
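The fusion step behind MCCNN-V, combining the three column embeddings into one 80-dimensional vector, can be sketched as follows. This is a minimal illustration only: the per-column sizes in `COLUMN_DIMS` are hypothetical (the text states only the 80-dimensional total), and random vectors stand in for real bottleneck features.

```python
import numpy as np

# Hypothetical per-column embedding sizes; they only need to sum to 80.
COLUMN_DIMS = (32, 32, 16)

def fuse_embeddings(x1, x2, x3):
    """Concatenate the per-column embeddings along the feature axis."""
    return np.concatenate([x1, x2, x3], axis=-1)

rng = np.random.default_rng(0)
batch = 4  # number of images in this toy batch
x1, x2, x3 = (rng.standard_normal((batch, d)) for d in COLUMN_DIMS)
x = fuse_embeddings(x1, x2, x3)
print(x.shape)  # (4, 80)
```

Each row of `x` would then serve as one training sample for the regressor, which in the MCCNN-V case is a support vector regression model.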
2.4 Crowd Counting Using Modified MCNN Since the multi-column convolutional neural network (MCNN) was originally proposed for crowd density map estimation in [12] and was reported to be highly successful, we implemented a slightly modified version of MCNN for comparison. The structure of the modified MCNN for crowd counting is depicted in Fig. 2. In the original MCNN approach, each column consists of four convolutional layers with different numbers of filters and different kernel sizes. The outputs of the last convolutional layer of each column were then merged to form a feature map. Finally, a convolutional layer with a 1 × 1 kernel was used to convert the feature map to a density map. In the modified version of the MCNN in this work, the last convolutional layer of each column is followed by a flattening layer that converts the feature maps learned by each CNN into a single feature vector. Then the last convolutional layer of the original MCNN approach [12] is replaced by a linear output layer to perform the regression task. Similar to the proposed MCCNN, we analyzed the performance of the MCNN method in five different ways:
– MCNN-I: Given an input image, the first column of the MCNN is used in an end-to-end fashion for crowd counting.
– MCNN-II: Similar to MCNN-I, but the second column of the MCNN is used.
– MCNN-III: Similar to MCNN-I and MCNN-II, but the third column of the MCNN is used.
– MCNN-IV: Rather than using MCNN for feature extraction, end-to-end crowd counting is performed using the modified MCNN.
– MCNN-V: Features extracted from each column of the MCNN are combined into a single feature vector and then SVR is used for regression.
With these five ways of analysis, we aim at gaining a better understanding of the MCNN approach and determining whether it is better to use it for feature extraction or in an end-to-end manner.
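The modification described for MCNN, flattening each column's feature maps and feeding the merged vector to a linear output layer, can be sketched as below. The feature-map shapes, the placeholder weights, and the function name `count_head` are assumptions made for illustration; the actual layer sizes are those of Fig. 2.

```python
import numpy as np

def count_head(feature_maps, weights, bias):
    """Flatten each column's feature maps, concatenate, apply a linear layer."""
    flat = np.concatenate([fm.reshape(-1) for fm in feature_maps])
    return float(flat @ weights + bias)

# Hypothetical (channels, height, width) outputs of the three columns.
maps = [np.ones((2, 3, 3)), np.ones((4, 3, 3)), np.ones((1, 3, 3))]
n_features = sum(fm.size for fm in maps)  # 18 + 36 + 9 = 63
weights = np.full(n_features, 0.5)        # untrained placeholder weights
estimate = count_head(maps, weights, bias=1.0)
print(n_features, estimate)
```

In the MCNN-V variant the flattened vector would be handed to an SVR model instead of the linear layer; only the final stage changes.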
3 Experimental Setup 3.1 Dataset, Network Training, and Performance Criteria The experiments are carried out on the UCSD [14] dataset. 2000 images selected from the consecutive frames of the UCSD dataset are used in the experiments (see footnote 1). The dataset is divided into three disjoint subsets (namely training, development, and test subsets) consisting of 1280, 320, and 400 images, respectively. The training set is used to train the models, while the development set is used for parameter tuning during training. Finally, the test set is used to measure the performance of the system. The Adam optimizer [15] is used for optimizing the parameters of the networks. Mean squared error is used as the loss function for training the models. The learning rate is fixed to 0.0001. Mean squared error (MSE) and mean absolute error (MAE) are generally used as the performance criteria for crowd density estimation [5, 12, 16]. Therefore, in order to make a reasonable comparison with the previous studies, we used the MSE and MAE as the evaluation metrics in the experiments, which are defined as

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \hat{X}_i \right)^2 \tag{2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| X_i - \hat{X}_i \right|, \tag{3}$$

where $X_i$ is the ground truth (number of people in the image), $\hat{X}_i$ is the predicted value, and $n$ is the total number of test images used in the experiments.
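Eqs. (2) and (3) translate directly into code. The sketch below uses plain Python and made-up counts purely to show the arithmetic; the lists `truth` and `pred` are illustrative and are not data from the experiments.

```python
def mse(truth, pred):
    """Mean squared error between ground-truth and predicted counts (Eq. 2)."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(truth, pred)) / len(truth)

def mae(truth, pred):
    """Mean absolute error between ground-truth and predicted counts (Eq. 3)."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(x - y) for x, y in zip(truth, pred)) / len(truth)

# Illustrative counts only: the errors are 2, 2, 0, and 4 people.
truth = [10, 20, 30, 40]
pred = [12, 18, 30, 44]
print(mse(truth, pred))  # 6.0
print(mae(truth, pred))  # 2.0
```

Because MSE squares each error, the single 4-person miss contributes two thirds of the MSE here but only half of the MAE.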
1 2000 images from the UCSD dataset are used in the experiments because the ground truth of these 2000 images is provided by [14] at http://www.svcl.ucsd.edu/projects/peoplecnt/.
4 Results MSE and MAE values obtained using the proposed MCCNN approach and the modified MCNN method are summarized in Table 1. From the results in the table, we first note that each column of the MCCNN yields reasonable performance individually (MCCNN-I, MCCNN-II, and MCCNN-III in the table). Interestingly, the first and the third columns (MCCNN-I and MCCNN-III) show similar performance in terms of both the MSE and MAE criteria. A possible explanation is that, although the third column receives an edge-detected version of the pre-processed image (ROI selected and PN applied) that is fed to the first column, the first column may itself learn the edge information through the receptive fields of its convolutional layers. Next, we observe that when MCCNN is employed for crowd counting in an end-to-end manner (MCCNN-IV), the MSE and MAE values are slightly lower than those of each individual column. Furthermore, using SVR with features extracted from MCCNN considerably improves the performance. Compared to the end-to-end system, using SVR reduces the MSE from 0.21 to 0.10 (approximately 52% reduction) and the MAE from 0.33 to 0.20 (approximately 39% reduction). This observation suggests that the CNN columns yield highly complementary information for crowd counting. Next, we analyze the performance of the modified MCNN. From the MCNN results given in Table 1, each column of the modified MCNN structure (MCNN-I, MCNN-II, and MCNN-III) yields reasonable performance. However, combining all columns and employing the end-to-end system for crowd counting (MCNN-IV) yields a considerably larger improvement. For example, while MCNN-I gives MSE and MAE values of 0.63 and 0.61, respectively, MCNN-IV yields 0.41 and 0.49. This corresponds to approximately 35% and 20% improvement in terms of MSE and MAE, respectively.
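The relative reductions quoted for Table 1 can be recomputed directly from the table entries. The snippet below is a small worked check; the helper names `relative_reduction` and `compare` are hypothetical, and the numbers are the (MSE, MAE) pairs listed in Table 1.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before

# (MSE, MAE) pairs copied from Table 1.
TABLE1 = {
    "MCCNN-IV": (0.21, 0.33),
    "MCCNN-V": (0.10, 0.20),
    "MCNN-I": (0.63, 0.61),
    "MCNN-IV": (0.41, 0.49),
}

def compare(baseline, improved):
    """Return the (MSE, MAE) relative reductions between two Table 1 rows."""
    (mse_b, mae_b), (mse_i, mae_i) = TABLE1[baseline], TABLE1[improved]
    return relative_reduction(mse_b, mse_i), relative_reduction(mae_b, mae_i)

for baseline, improved in [("MCCNN-IV", "MCCNN-V"), ("MCNN-I", "MCNN-IV")]:
    mse_gain, mae_gain = compare(baseline, improved)
    print(f"{baseline} -> {improved}: MSE -{mse_gain:.1f}%, MAE -{mae_gain:.1f}%")
```

With these entries, the step from MCCNN-IV to MCCNN-V gives roughly 52% (MSE) and 39% (MAE), and the step from MCNN-I to MCNN-IV gives roughly 35% and 20%.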
Comparing the multi-image multi-column CNN (MCCNN) and modified single-image multi-column CNN (MCNN) results given in Table 1, MCCNN outperforms MCNN in all cases. This is possibly because we apply a different pre-processed version of the same input image to each CNN column in MCCNN, so that each column produces a different feature mapping for crowd counting. In MCNN, however, the same raw input image is applied to each column, and therefore the columns are likely to learn similar mappings. Thus MCNN is inferior to MCCNN.

Table 1 Crowd counting results using MCCNN and modified MCNN. The best numbers in each row are given in boldface and globally best numbers are shown in boldface and underlined

| MCCNN methods | MSE | MAE | MCNN methods | MSE | MAE |
|---|---|---|---|---|---|
| MCCNN-I | 0.24 | 0.37 | MCNN-I | 0.63 | 0.61 |
| MCCNN-II | 0.29 | 0.42 | MCNN-II | 0.63 | 0.62 |
| MCCNN-III | 0.23 | 0.37 | MCNN-III | 0.81 | 0.72 |
| MCCNN-IV | 0.21 | 0.33 | MCNN-IV | 0.41 | 0.49 |
| MCCNN-V | 0.10 | 0.20 | MCNN-V | 0.19 | 0.34 |

Finally, we compare the crowd counting results obtained in this study with previously reported results. Although there exist many studies in the literature addressing the crowd counting problem, we compare our results only with those that conducted their experiments on the UCSD database in order to make an equitable comparison. Table 2 shows the previously reported MSE and MAE values and the values obtained in our work. As shown in the table, both MCCNN and modified MCNN considerably outperform other methods.

Table 2 Comparison of the proposed method with previous studies on UCSD dataset

| Methods | MSE | MAE |
|---|---|---|
| Zhang et al. [5] | 3.31 | 1.60 |
| Sam et al. [11] | 2.10 | 1.62 |
| Zhang et al. [12] | 1.35 | 1.07 |
| Zou et al. [16] | 1.29 | 1.01 |
| Zhu et al. [17] | 1.44 | 1.08 |
| Proposed method | 0.10 | 0.20 |
5 Conclusion In this study, a multi-image multi-column convolutional neural network (MCCNN) approach is proposed for crowd counting. The proposed MCCNN consists of three CNNs in parallel, and a different pre-processed version of the same input image is applied to each column. With this approach we aimed at obtaining different embeddings from each column, conveying different levels of information. A modified version of the previously proposed single-image multi-column CNN (MCNN) was also presented in this study. Experiments conducted on the UCSD dataset revealed that the proposed MCCNN considerably outperformed earlier studies and that the modified MCNN achieved reasonably good performance on crowd counting. The results also showed that the proposed approach considerably improved crowd counting performance in comparison to a single CNN system.
References
1. Ryan D, Denman S, Sridharan S, Fookes C (2015) An evaluation of crowd counting methods, features and regression models. Comput Vis Image Underst 130:1–17
2. Velastin S, Yin JH, Davies A (1995) Crowd monitoring using image processing. Electron Commun Eng J 7(1):37–47
3. Chan AB, Liang ZSJ, Vasconcelos N (2008) Privacy preserving crowd monitoring: counting people without people models or tracking. In: Proceedings of the CVPR
4. Fu M, Xu P, Li X, Liu Q, Ye M, Zhu C (2015) Fast crowd density estimation with convolutional neural networks. Eng Appl Artif Intell 43:81–88
5. Zhang C, Li H, Wang X, Yang X (2015) Cross-scene crowd counting via deep convolutional neural networks. In: Proceedings of the CVPR
6. Sindagi VA, Patel VM (2017) CNN-Based cascaded multi-task learning of high-level prior and density estimation for crowd counting. In: Proceedings of the AVSS
7. Boominathan L, Kruthiventi SS, Venkatesh Babu R (2016) CrowdNet: a deep convolutional network for dense crowd counting. In: Proceedings of the MM’16, pp 640–644
8. Marsden M, McGuinness K, Little S, O’Connor NE (2017) ResnetCrowd: a residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification. In: Proceedings of the AVSS
9. Hu Y, Chang H, Nian F, Wang Y, Li T (2016) Dense crowd counting from still images with convolutional neural networks. J Vis Commun Image Represent 38
10. Li Y, Zhang X, Chen D (2018) CSRNet: dilated convolutional neural networks for understanding the highly congested scenes. In: Proceedings of the CVPR, pp 1091–1100
11. Sam DB, Surya S, Babu RV (2017) Switching convolutional neural network for crowd counting. In: Proceedings of the CVPR, pp 4031–4039
12. Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016) Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the CVPR, pp 589–597
13. Li M, Zhang Z, Huang K, Tan T (2008) Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection. In: Proceedings of the ICPR
14. Chan AB, Vasconcelos N (2008) Modeling, clustering, and segmenting video with mixtures of dynamic textures. IEEE Trans Pattern Anal Mach Intell 30(5):909–926
15. Kingma DP, Ba JL (2015) Adam: a method for stochastic optimization. In: Proceedings of the ICLR, pp 1–15
16. Zou Z, Cheng Y, Qu X, Ji S, Guo X, Zhou P (2019) Attend to count: Crowd counting with adaptive capacity multi-scale CNNs. Neurocomputing 367:75–83
17. Zhu L, Li C, Wang B, Yuan K, Yang Z (2020) DCGSA: a global self-attention network with dilated convolution for crowd density map generating. Neurocomputing 378:455–466
Which Features Are Helpful? The Antecedents of User Satisfaction and Net Benefits of a Learning Management System (LMS)
Bernie S. Fabito, Mico C. Magtira, Jessica Nicole Dela Cruz, Ghielyssa D. Intrina, and Shannen Nicole C. Esguerra
Abstract The demand for Learning Management Systems (LMS) in Higher Educational Institutions (HEI) has increased dramatically due to their flexibility in the delivery of education. While much literature has tried to explore the factors determining the LMS adoption of faculty members through the lenses of various Information System (IS) Success Theories, little has been done to understand the relationship between the LMS features and the IS success variables. Hence, the study explored which features and success variables show a possible relationship, which may aid in the decision-making process of HEIs that are just starting to adopt an LMS. Results show that the communication features of LMS had the highest relationship with the IS success variables.
Keywords Learning Management System (LMS) · Information System (IS) success variables · LMS features · User satisfaction · Net benefits
B. S. Fabito (B) · M. C. Magtira · J. N. D. Cruz · G. D. Intrina · S. N. C. Esguerra National University, Manila, Sampaloc, Philippines e-mail: [email protected] M. C. Magtira e-mail: [email protected] J. N. D. Cruz e-mail: [email protected] G. D. Intrina e-mail: [email protected] S. N. C. Esguerra e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_21
B. S. Fabito et al.
1 Introduction The rapid advancement in mobile and ubiquitous computing has paved the way for the increasing demand for Learning Management Systems (LMS). It is expected that by the end of 2020, the revenue for the eLearning Market will reach up to USD 31 Billion in comparison to the USD 107 Million in revenue obtained in 2015 [1]. A Learning Management System, when used in an educational setting, serves as a medium where learning can happen ubiquitously using mobile devices [2]. Through an LMS, the teachers can design and deliver instructional materials and provide meaningful assessments that can motivate students to learn [3, 4]. In the literature, a plethora of studies has been made to assert the factors contributing to students’ and faculty members’ adoption of Learning Management Systems. Through Information System acceptance theories like the Information System Success Model (ISSM) [5], researchers can identify the variables that lead to LMS adoption among students and faculty members. Some of these variables include Course Quality [6], Information Quality and System quality [7], Social Influence, Information Quality, and University Management [8], among others. The ISSM is a well-known theory that looks at the success of Information Systems (IS). Understanding the variables that drive IS success can help managers make better decisions to improve the system further. Over the years, this theory has been modified [9], extended, and applied to various industries, including Hotels [10], Hospitals [11], and Libraries [12], among others. The ISSM determinants include System Quality, Information Quality, Service Quality, System Use, User Satisfaction, and Net Benefits [5]. These variables play a role in determining what factors lead to the actual use and satisfaction of using an IS and how it benefits the organization. 
While studies have shown the external variables that affect the Information Success constructs (specifically System Quality, Information Quality, and Service Quality), little has been studied on how the features of an LMS affect the IS constructs. The features of an LMS define the functions that make it stand out from other existing LMSs. In a 2014 study, these features were classified into four components, namely, tools for distribution, tools for communication, tools for interaction, and tools for course administration [13]. These components are present among LMSs and can be used to analyze them in research. Hence, the features of LMS were grouped based on the components presented by [13]. Table 1 shows the different features of an LMS and their descriptions, while Table 2 presents the categories and how the features fall under them. This study adds novelty to the existing body of knowledge in LMS adoption by understanding the IS success variables’ antecedents through the LMS features. Specifically, the study intends to discover the relationship between the faculty members’ perception of how helpful the features of the LMS are and the three (3) Information Success variables, namely, System Use, User Satisfaction, and Net Benefits. The conceptual framework in Fig. 1 shows the possible relationship between the variables.
Which Features Are Helpful? The Antecedents of User …
Table 1 LMS features

| Features | Description |
|---|---|
| Chat support [14] | Allows interaction between the learners and faculty members |
| Course creation [15] | Allows the administration of courses/subjects in the LMS |
| Assignment submission [14] | Allows the creation of the assignment module where students can view and submit assignments |
| Assessment/quiz [14] | Allows the creation of quizzes and other assessments for the students |
| Message boards [15] | Allows the creation of a discussion board where students can communicate and provide feedback |
| Collaboration [16] | Allows the creation of private group boards or channels for interaction and feedback |
| Mobile integration [16] | Allows the seamless integration of the LMS to mobile devices |
| Gamification [14] | Allows the utilization of gamification (e.g., badges, coins, rewards) in an LMS |
| Video conferencing [15] | Allows the conduct of face-to-face meeting between students and faculty members synchronously |
Table 2 LMS components [13]

| Components | Description | Features |
|---|---|---|
| Tools for distribution | Allows professors to upload learning materials and other related resources for the students’ consumption | Assignment submission, assessment/quiz module, gamification |
| Tools for communication | Allows student–faculty, faculty–faculty, and student–student communication | Chat support, video conferencing |
| Tools for interaction | Facilitates feedback and reaction between students and faculty members | Message boards, collaboration |
| Tools for course administration | Allows course management of the LMS | Course creation, mobile integration |
2 Methodology The study made use of a quantitative approach employing an online survey. A total of fifty-six (56) faculty members from a private Higher Educational Institution (HEI) answered the survey through Microsoft Forms Pro from June to July 2019. The survey was answered using a Likert scale of one (1) to four (4) with a verbal interpretation of Strongly Agree, Agree, Disagree, and Strongly Disagree for the IS Success variables. For the different LMS features, the verbal interpretation of Very Helpful, Somewhat helpful, Rarely Helpful, and not at all was used. A Spearman correlation was used to determine the relationship using Stata 11.0.
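For readers who want to see what the Spearman statistic computes, the sketch below implements it in plain Python as the Pearson correlation of tie-averaged ranks, which suits ordinal Likert responses with many ties. It is a generic illustration, not the authors' Stata procedure, and the two response lists are invented for the example rather than taken from the survey.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Invented 4-point Likert responses from six hypothetical respondents.
helpfulness = [1, 1, 2, 2, 3, 4]
satisfaction = [1, 2, 2, 3, 3, 4]
print(round(spearman_rho(helpfulness, satisfaction), 3))
```

Because only ranks enter the calculation, the coefficient measures monotonic association and makes no assumption that the Likert categories are equally spaced.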
Fig. 1 Conceptual framework. The LMS feature components (Distribution: Assignment Submission, Assessment/Quiz Module, Gamification; Communication: Chat Support, Video Conferencing; Interaction: Message Board, Collaboration; Course administration: Course Creation, Mobile Integration) are related to System Use, User Satisfaction, and Net Benefits (actual use and satisfaction [17])
3 Results and Discussion The survey shows that 48 (about 86%) of the respondents are using MS Teams as their LMS. The high number is attributable primarily to the fact that MS Teams is the official LMS of the University where the study was conducted. Microsoft Teams is the Microsoft Office 365 tool for collaboration and communication. Although it is used mainly by business organizations, it can also be used for education. Classroom collaboration, end-to-end assignment management, and OneNote Class Notebooks are just some of the integrated features in MS Teams for Education [18]. Table 3 presents the mean result of the LMS features and the IS Success variables. Analyzing the table, we can deduce that the features are perceived as somewhat helpful. Assignment Submission, Chat Support, and Message Boards had the most favorable mean ratings (the lowest values, since 1 denotes Very Helpful), suggesting that they are the most commonly used features of an LMS. The result can be further observed in the visualization found in Fig. 2. For the IS Success variables, the respondents mostly agree with using the system, with how it provides satisfaction in their work, and with how it benefits the delivery of instruction to students. Tables 4 and 5 show the Spearman correlation of the IS Success variables and the LMS Features. It can be inferred that there is a weak to moderate monotonic relationship between the variables, and that all of these relationships are statistically significant (p < 0.05). Of the four (4) components presented, only communication showed a consistently moderate correlation with all three (3) IS success variables. Although the relationship ranges from weak to moderate, the results point to the LMS features as possible antecedents of the IS Success variables.
Table 3 LMS features and IS success variables mean result

| LMS features and IS success variables | Mean | SD | Verbal interpretation |
|---|---|---|---|
| Chat support | 1.55 | 0.63 | Somewhat helpful |
| Assignment submission | 1.50 | 0.63 | Somewhat helpful |
| Course creation | 1.73 | 0.79 | Somewhat helpful |
| Assessment/quiz | 1.57 | 0.63 | Somewhat helpful |
| Message boards | 1.55 | 0.60 | Somewhat helpful |
| Mobile integration | 1.82 | 0.83 | Somewhat helpful |
| Collaboration | 1.94 | 0.79 | Somewhat helpful |
| Gamification | 2.35 | 0.99 | Somewhat helpful |
| Video conferencing | 2.28 | 0.90 | Somewhat helpful |
| System use | 1.50 | 0.46 | Agree |
| User satisfaction | 1.81 | 0.50 | Agree |
| Net benefits | 1.73 | 0.49 | Agree |
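Table 4 Spearman correlation result

| LMS features and IS success variables | Distribution rs | Distribution p | Communication rs | Communication p | Interaction rs | Interaction p | Course administration rs | Course administration p |
|---|---|---|---|---|---|---|---|---|
| System use | 0.38 | 0.00 | 0.50 | 0.00 | 0.38 | 0.00 | 0.30 | 0.022 |
| User satisfaction | 0.37 | 0.00 | 0.53 | 0.00 | 0.44 | 0.00 | 0.33 | 0.010 |
| Net benefits | 0.34 | 0.01 | 0.48 | 0.00 | 0.43 | 0.00 | 0.32 | 0.016 |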
Table 5 Spearman correlation result interpretation

| LMS features and IS success variables | Distribution | Communication | Interaction | Course administration |
|---|---|---|---|---|
| System use | Weak | Moderate | Weak | Weak |
| User satisfaction | Weak | Moderate | Moderate | Weak |
| Net benefits | Weak | Moderate | Moderate | Weak |

0.00–0.19 very weak; 0.20–0.39 weak; 0.40–0.59 moderate; 0.60–0.79 strong; 0.80–1.0 very strong
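The verbal labels in Table 5 follow from applying the strength bands to the coefficients in Table 4. The sketch below shows that mapping; the function name is hypothetical, and the 0.20–0.39 "weak" band is assumed from the usual form of this scale, since it fills the gap between the listed bands and matches the labels given to coefficients such as 0.30 and 0.38.

```python
def correlation_strength(rs):
    """Map a Spearman coefficient to the verbal label used in Table 5."""
    magnitude = abs(rs)
    if magnitude > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:  # assumed band, see the note above
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    if magnitude < 0.80:
        return "strong"
    return "very strong"

# The "System use" row of Table 4, in component order.
system_use = {
    "Distribution": 0.38,
    "Communication": 0.50,
    "Interaction": 0.38,
    "Course administration": 0.30,
}
for component, rs in system_use.items():
    print(component, correlation_strength(rs))
```

Run on this row, the mapping labels only Communication as moderate and the other three components as weak.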
Drilling down into the Spearman correlation of each LMS feature with the IS Success variables (Tables 6, 7, and 8) shows a similar outcome: a weak to moderate monotonic relationship. Chat support clearly had the highest correlation with all three (3) IS success variables. This result is also similar to research showing that faculty members prefer to use an LMS that incorporates communication with the academic community [19]. The result of the present study would help provide input for organizations wanting to adopt an LMS. As mentioned in previous studies [20, 21], understanding the stakeholders’ needs in an LMS will guarantee its full utilization.

Table 6 Spearman correlation result between the LMS features and system use

| LMS features | rs | p |
|---|---|---|
| Chat support | 0.509 | 0.000 |
| Course creation | 0.373 | 0.004 |
| Assignment submission | 0.396 | 0.002 |
| Assessment/quiz | 0.400 | 0.002 |
| Message boards | 0.393 | 0.002 |
| Collaboration | 0.307 | 0.201 |
| Mobile integration | 0.235 | *0.081 |
| Gamification | 0.233 | *0.083 |
| Video conferencing | 0.382 | 0.003 |

Table 7 Spearman correlation result between the LMS features and user satisfaction

| LMS features | rs | p |
|---|---|---|
| Chat support | 0.492 | 0.000 |
| Course creation | 0.364 | 0.005 |
| Assignment submission | 0.287 | 0.031 |
| Assessment/quiz | 0.4220 | 0.001 |
| Message boards | 0.304 | 0.022 |
| Collaboration | 0.4207 | 0.001 |
| Mobile integration | 0.291 | *0.259 |
| Gamification | 0.259 | *0.053 |
| Video conferencing | 0.3828 | 0.003 |

Table 8 Spearman correlation result between the LMS features and net benefits

| LMS features | rs | p |
|---|---|---|
| Chat support | 0.483 | 0.002 |
| Course creation | 0.319 | 0.005 |
| Assignment submission | 0.287 | 0.031 |
| Assessment/quiz | 0.398 | 0.003 |
| Message boards | 0.293 | 0.028 |
| Collaboration | 0.424 | 0.001 |
| Mobile integration | 0.263 | 0.050 |
| Gamification | 0.214 | *0.112 |
| Video conferencing | 0.378 | 0.004 |

*p ≥ 0.05

To help validate the correlation, a Focus Group Discussion (FGD) with students from the same University was conducted. The FGD was used to determine whether the students agree with the result. The discussion showed that the communication feature helps students, as it allows them to communicate with the concerned faculty members and classmates. Instead of using social media or e-mail, a student can communicate directly with the professor about an academic concern through the LMS. This is one feature that students unanimously regard as an advantage over traditional face-to-face communication. Subsequently, a follow-up survey was conducted on a group of students after analyzing the result of the FGD. The survey asked them to rate the features that they find the most helpful in using an LMS. The result was consistent with both the correlation study and the FGD.
4 Limitations and Recommendations While the study has shown a possible relationship between the LMS features and the IS Success variables, caution is advised when interpreting the study’s result because of the low number of respondents. Moreover, since the study used a Spearman correlation, causality between the LMS features and the IS success variables cannot be inferred. The Spearman correlation was used because of the ordinal nature of the survey data. Future research in the same domain may include expanding the set of LMS features. The study may have omitted features that exist in other LMSs and that may serve as predictors of the IS Success variables. Such features include software integration with the LMS, customized learning analytics reports, etc. Exploring how the LMS features affect System Quality and Information Quality as external variables may also be considered, to identify which features are desirable and can provide quality output.
References
1. [Infographic] Top learning management system statistics For 2020. https://elearningindustry.com/top-learning-management-system-lms-statistics-for-2020-infographic. Accessed 01 Sep 2020
2. Turnbull D, Chugh R, Luck J (2020) Learning management systems: a review of the research methodology literature in Australia and China. Int J Res Method Educ 1–15. https://doi.org/10.1080/1743727X.2020.1737002
3. Kattoua T, Al-Lozi M, Alrowwad A (2020) A review of literature on e-learning systems in higher education. https://www.researchgate.net/publication/309242990_A_Review_of_Literature_on_E-Learning_Systems_in_Higher_Education. Accessed 01 Sep 2020
4. Botha A, Smuts H, de Villiers C (2018) Applying diffusion of innovation theory to learning management system feature implementation in higher education: lessons learned. In: Lecture notes in computer science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer, pp 56–65. https://doi.org/10.1007/978-3-030-03580-8_7
5. DeLone WH, McLean ER (2003) The DeLone and McLean model of information systems success: a ten-year update. J Manag Inf Syst 9–30. M.E. Sharpe Inc. https://doi.org/10.1080/07421222.2003.11045748
6. Mtebe JS, Raisamo R (2014) A model for assessing learning management system success in higher education in Sub-Saharan countries. Electron J Inf Syst Dev Ctries 61:1–17. https://doi.org/10.1002/j.1681-4835.2014.tb00436.x
7. Shahzad A, Hassan R, Aremu AY, Hussain A, Lodhi RN (2020) Effects of COVID-19 in E-learning on higher education institution students: the group comparison between male and female. Qual Quant 1–22. https://doi.org/10.1007/s11135-020-01028-z
8. Fabito BS, Rodriguez RL, Trillanes AO, Lira JIG, Miguel P, Ana QS (2020) Investigating the factors influencing the use of a learning management system (LMS): an extended information system success model (ISSM). In: The 4th international conference on e-society, e-education and e-technology (ICSET’20). Association for Computing Machinery, Taipei. https://doi.org/10.1145/3421682.3421687
9. Tajuddin M (2015) Modification of DeLon and McLean model in the success of information system for good university governance
10. Ojo AI (2017) Validation of the DeLone and McLean information systems success model. Healthc Inform Res 23:60–66. https://doi.org/10.4258/hir.2017.23.1.60
11. Ebnehoseini Z, Tabesh H, Deldar K, Mostafavi SM, Tara M (2019) Determining the hospital information system (His) success rate: development of a new instrument and case study. Open Access Maced J Med Sci 7:1407–1414. https://doi.org/10.3889/oamjms.2019.294
12. Alzahrani AI, Mahmud I, Ramayah T, Alfarraj O, Alalwan N (2019) Modelling digital library success using the DeLone and McLean information system success model. J Librariansh Inf Sci 51:291–306. https://doi.org/10.1177/0961000617726123
13. Jurado RG, Petterson T, Gomez AR, Scheja M (2013) Classification of the features in learning management systems. In: XVII Scientific Convention on Engineering and Architecture, Havana City, Cuba, Nov 24–28. XVII Scientific Convention on Engineering and Architecture, vol 53, pp 1689–1699. https://doi.org/10.1017/CBO9781107415324.004
14. LMS features to improve usability-eLearning industry. https://elearningindustry.com/featuresto-improve-usability-of-lms. Accessed 26 Oct 2020
15. Important LMS features for your e-learning program. https://technologyadvice.com/blog/human-resources/8-important-lms-features/. Accessed 26 Oct 2020
16. What is an LMS? (2020 Update) | LMS features | LMS use case. https://www.docebo.com/blog/what-is-learning-management-system/. Accessed 26 Oct 2020
17. Joo YJ, Kim N, Kim NH (2016) Factors predicting online university students’ use of a mobile learning management system (m-LMS). Educ Technol Res Dev 64:611–630. https://doi.org/10.1007/s11423-016-9436-7
18. Set up teams for education-M365 education | Microsoft Docs. https://docs.microsoft.com/enus/microsoft-365/education/deploy/set-up-teams-for-education. Accessed 30 Nov 2020
19. Alturki UT, Aldraiweesh A, Kinshuck (2020) View of evaluating the usability and accessibility of LMS “Blackboard” at King Saud University. https://clutejournals.com/index.php/CIER/article/view/9548/9617. Accessed 29 Oct 2020
20. Iqbal S (2011) Learning management systems (LMS): inside matters. Inf Manag Bus Rev 3:206–216. https://doi.org/10.22610/imbr.v3i4.935
21. Fabito B, Trillanes A, Sarmiento J (2021) Barriers and challenges of computing students in an online learning environment: insights from one private university in the Philippines. Int J Comput Sci Res 5:441–458. https://doi.org/10.25147/ijcsr.2017.001.1.51
Performance Analysis of a Neuro-Fuzzy Algorithm in Human-Centered and Non-invasive BCI
Timothy Scott C. Chu, Alvin Chua, and Emanuele Lindo Secco
Abstract Developments in Brain-Computer Interface (BCI) machines have given researchers the opportunity to interface with robotics and artificial intelligence, and each BCI-Robotics system has employed different Machine Learning algorithms. This study presents a performance analysis of a Neuro-Fuzzy algorithm, specifically the Adaptive Neuro-Fuzzy Inference System (ANFIS), for classifying EEG signals retrieved by the Emotiv INSIGHT, with the SVM algorithm as a reference. EEG data were generated through face gestures, specifically Facial and Eye Gestures. The generated data were fed to both algorithms for simulation experiments. Results showed that the ANFIS tends to be more reliable and marginally better than the SVM algorithm. Compared to the SVM, however, the ANFIS required significantly more computational resources, demanding higher hardware specifications and longer training time.
Keywords BCI · BMI · Human-centered interface · ANFIS · SVM
T. S. C. Chu (B) · E. L. Secco
Robotics Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
T. S. C. Chu · A. Chua
Mechanical Engineering Department, De La Salle University, Manila, Philippines
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_22

1 Introduction
The human brain interacts with the limbs by sending electric signals along the nerves and synapses that connect the brain to every part of the body. These electrical activities can be captured by Brain-Computer Interface (BCI) machines through a test called an electroencephalogram (EEG) [1]. This test enables medical professionals to observe and detect anomalies in a patient's
brain. Developments in BCI technology have given researchers not only access to affordable BCI machines but also opportunities to integrate robotics into BCI systems. Studies such as [1, 2] explored the use of BCI systems in robotic arm applications. The former study [1] used raw EEG signals, which were processed visually by identifying the spikes generated in an instance. The spikes served as the features corresponding to different gestures of each hand. The study employed Linear Discriminant Analysis (LDA) as its classifier and distinguished separate sets of actions (right, right neutral, left, and left neutral) with an accuracy close to 100%. The researchers in [2] took a different approach by employing the Steady-State Visual Evoked Potentials (SSVEP) technique, in which visual prompts were tied to specific action commands. This was achieved with an LED panel with four colors (blue, red, white, and green), each set to flicker at a different frequency. The study used a combination of Power Spectral Density (PSD) and a Classification Tree Method (specifically ITR) to process the data and generate predictions with an average effective rate of 73.75%. The studies in [3, 4] both used the algorithms provided in the Emotiv Software Development Kit (SDK); the researchers in [3], however, explored a different approach by using left and right wink face gestures to control a robotic arm.
Physically handicapped individuals are hindered in performing important tasks at home or at work. A BCI-Robotics system can address this by helping such individuals regain their lifestyle. In the field of BCI control implementation, however, the distinguishing factor between studies is the performance of the algorithm used in the application.
Common algorithms are the Convolutional Neural Network and the Artificial Neural Network; these algorithms offer high accuracy but take up a significant amount of computational time [5]. Another common algorithm in this field is the Support Vector Machine (SVM), which can classify signals efficiently provided that the user supplies a proper kernel to create effective hyperplanes; this is often a major challenge for inexperienced users [6]. In addition, the SVM's performance drops as the size of the dataset grows, consequently taking up more computational time. The challenge is to use or develop an algorithm that offers a good balance between accuracy and computational time. The authors of [7] noted that a degree of overlap in the EEG signal features of similar gestures affected the performance of the machine learning algorithm. With this consideration, an algorithm capable of handling this overlap would be well suited to this application. The Adaptive Neuro-Fuzzy Inference System (ANFIS) is one form of Fuzzy Neural Network, a hybrid of Fuzzy Logic and Neural Networks that uses a first-order Sugeno model [5] and is capable of addressing this concern. This research explores the viability of the ANFIS algorithm in BCI applications alongside the Support Vector Machine (SVM), where the latter serves as a baseline to put in perspective how well the ANFIS performs on parameters such as accuracy.
2 Theories Involved
2.1 Butterworth Band-Pass Filters
Band-pass filtering is a standard EEG pre-processing technique based on the concepts of pass-band and stop-band. A pass-band lets through information that lies within the cut-off frequencies, while a stop-band rejects information beyond them. Butterworth band-pass filters extend this idea with a normalized gain: instead of hard cut-offs, they introduce a bell-shaped gain into the system. The Butterworth filter is described by two transfer functions, given in Eqs. 1 and 2, which refer to the high-pass filter and the low-pass filter, respectively.

$$|H(\omega)| = \frac{A_o}{\sqrt{1 + \left(\frac{\omega_o}{\omega}\right)^{2n}}} \tag{1}$$

$$|H(\omega)| = \frac{A_o}{\sqrt{1 + \left(\frac{\omega}{\omega_o}\right)^{2n}}} \tag{2}$$

where $|H(\omega)|$ is the normalized gain, $A_o$ is the maximum gain in the pass-band, $\omega_o$ is the cut-off frequency of the respective filter (the high-pass filter in Eq. 1 and the low-pass filter in Eq. 2), $\omega$ is the frequency of the input signal, and $n$ is the order of the filter. Figure 1 offers a graphical representation of the Butterworth band-pass filters.

Fig. 1 a High-pass and b low-pass Butterworth filters of different orders
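The gain expressions in Eqs. 1 and 2 can be evaluated directly. The following is a minimal Python sketch, not the filter implementation used in the study: it assumes the standard Butterworth magnitude response, models the band-pass as a cascade of one high-pass and one low-pass stage (a simplification), and borrows the 13 Hz and 43 Hz cut-offs and the fifth order from Sect. 3.3. The function names are illustrative.

```python
import math


def butterworth_gain(freq_hz, cutoff_hz, order, kind, max_gain=1.0):
    """Magnitude |H(w)| of one Butterworth stage.

    kind="high" follows Eq. 1, kind="low" follows Eq. 2.
    """
    if freq_hz <= 0 or cutoff_hz <= 0:
        raise ValueError("frequencies must be positive")
    if kind == "high":
        ratio = cutoff_hz / freq_hz  # (w_o / w) in Eq. 1
    elif kind == "low":
        ratio = freq_hz / cutoff_hz  # (w / w_o) in Eq. 2
    else:
        raise ValueError("kind must be 'high' or 'low'")
    return max_gain / math.sqrt(1.0 + ratio ** (2 * order))


def bandpass_gain(freq_hz, low_cut_hz, high_cut_hz, order):
    """Assumed cascade: high-pass at low_cut_hz times low-pass at high_cut_hz."""
    return (butterworth_gain(freq_hz, low_cut_hz, order, "high")
            * butterworth_gain(freq_hz, high_cut_hz, order, "low"))


if __name__ == "__main__":
    # Gain of a fifth-order 13-43 Hz band-pass at a few probe frequencies.
    for f in (5, 13, 25, 43, 60):
        print(f"{f:>3} Hz -> gain {bandpass_gain(f, 13.0, 43.0, 5):.4f}")
```

At the cut-off frequency each stage has a gain of $A_o/\sqrt{2}$, and a higher order $n$ makes the transition between pass-band and stop-band steeper.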
2.2 Adaptive Neuro-Fuzzy Inference System (ANFIS)
The Adaptive Neuro-Fuzzy Inference System (ANFIS), developed by J. S. Jang [5], combines the concepts of a Fuzzy Inference System and an Adaptive Network, making it a hybrid model. The Fuzzy Inference System is a core concept of Fuzzy Logic and generally has four main components: (i) Knowledge Base, (ii) Fuzzification, (iii) Defuzzification, and (iv) Decision-Making Unit, as shown in Fig. 2. The Fuzzification process converts crisp input values to fuzzified values under a specified membership function. The fuzzified input is then fed to the Decision-Making Unit, which applies fuzzy operators (e.g., Min and Max) to the input and compares the result with the criteria established in the Rule Base to determine the appropriate output for the given set of inputs. Defuzzification converts the fuzzified output values back to crisp values. An advantage of this architecture is that it can accept vague inputs, also referred to as uncertain inputs, and still return sufficiently accurate outcomes. An Adaptive Network allows the algorithm to adapt to its mistakes and learn, increasing the accuracy of its predictions. It can be considered a supervised learning algorithm in which design parameters are attached to the nodes as weights and adjusted through training with a dataset. Theoretically, the algorithm is subjected to a two-pass learning rule to obtain the necessary parameter values. Table 1 summarizes the learning process of the algorithm. The ANFIS algorithm is composed of five layers, and the first layer is responsible for the fuzzification. Normally, a Gaussian membership function is used; however, other options such as triangular and trapezoidal membership functions are also available. Layers 2 and 3 determine the firing strength of each rule and its normalized value using Eqs. 3 and 4, respectively.
Fig. 2 Fuzzy inference system [5]
Table 1 Hybrid learning process for ANFIS algorithm
                      | Forward pass            | Backward pass
Premise parameters    | Fixed                   | Gradient descent
Consequent parameters | Least squares estimates | Fixed
Signals               | Node outputs            | Error rates
$$w_i = \mu_{A_i}(x) \times \mu_{B_i}(y), \quad i = 1, 2 \tag{3}$$

$$\bar{w}_i = \frac{w_i}{w_1 + w_2} \tag{4}$$

In the fourth layer, the consequent part of each rule is evaluated with Eq. 5.

$$O_i^4 = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i) \tag{5}$$

where $O_i^4$ is the output of the $i$th node in layer 4. Finally, in the fifth and final layer, all information from the previous layer is collated and 'defuzzified' to produce a single output using Eq. 6.

$$O_1^5 = \text{overall output} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{6}$$
Figure 3 shows a sample ANFIS diagram with two inputs and a single output. A simple ANFIS algorithm generally follows the two rules shown in Eqs. 7 and 8, which can be expanded to fit a particular use case.

$$\text{Rule 1: If } (x_1 \text{ is in } A_1) \text{ and } (x_2 \text{ is in } B_1), \text{ then } \hat{y} = p_1 x_1 + q_1 x_2 + r_1 \tag{7}$$

$$\text{Rule 2: If } (x_1 \text{ is in } A_2) \text{ and } (x_2 \text{ is in } B_2), \text{ then } \hat{y} = p_2 x_1 + q_2 x_2 + r_2 \tag{8}$$

where $x_1$ and $x_2$ are the input values (written $x$ and $y$ in Eqs. 3 to 5), and $A_i$ and $B_i$ are the fuzzy sets of the data. The crisp output $\hat{y}$ corresponds to the input values and is computed together with the design parameters $p_i$, $q_i$, and $r_i$.

Fig. 3 ANFIS diagram, 2 inputs, 1 output [5]
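The five layers described in this section can be traced in a few lines of code. The sketch below is a hedged illustration of a forward pass through a two-input, two-rule first-order Sugeno ANFIS following Eqs. 3 to 6; it is not the implementation used in the study. The Gaussian membership function, the helper names, and every numeric parameter value are assumptions chosen only to make the example runnable, and no training (Table 1) is performed.

```python
import math


def gaussian_mf(value, center, sigma):
    """Layer 1: Gaussian membership degree of a crisp input."""
    return math.exp(-0.5 * ((value - center) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a two-input first-order Sugeno ANFIS.

    premise:    per rule, ((center_A, sigma_A), (center_B, sigma_B))
    consequent: per rule, (p, q, r)
    """
    # Layer 2: firing strength w_i = mu_Ai(x) * mu_Bi(y)   (Eq. 3)
    w = [gaussian_mf(x, *a) * gaussian_mf(y, *b) for a, b in premise]
    total = sum(w)  # may underflow to 0 for inputs far from every centre
    # Layer 3: normalised firing strengths                 (Eq. 4)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted rule outputs w_bar_i * f_i         (Eq. 5)
    layer4 = [wb * (p * x + q * y + r)
              for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: sum of all rule outputs                     (Eq. 6)
    return sum(layer4)


# Hypothetical parameters, for illustration only.
PREMISE = [((0.0, 1.0), (0.0, 1.0)),   # rule 1: A1, B1
           ((2.0, 1.0), (2.0, 1.0))]   # rule 2: A2, B2
CONSEQUENT = [(1.0, 1.0, 0.0),         # rule 1: p1, q1, r1
              (2.0, -1.0, 1.0)]        # rule 2: p2, q2, r2

if __name__ == "__main__":
    print(anfis_forward(1.0, 1.0, PREMISE, CONSEQUENT))
```

With these hypothetical parameters, the input (1, 1) lies midway between the two rule centres, so both rules fire equally and the output is the plain average of the two rule outputs.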
3 Materials and Methods
Figure 4 shows the methodology flowchart implemented in this research. The process begins with the Generation of EEG Raw Signals. The generated signals are then Retrieved and Transferred by the Emotiv INSIGHT neuroheadset to the computing hardware running the OpenViBE software. In OpenViBE, the data were processed and recorded into CSV files, which were sent to the Machine Learning algorithms for training and evaluation.
3.1 Generation of EEG Raw Signals The generation of EEG data was achieved with two methods; the first was by making and holding a face gesture for 15 s, serving as the first EEG Dataset. The face gestures used were Neutral, Smile, Shocked, and Clench as shown in Fig. 5a.
Fig. 4 Methodology flowchart
Fig. 5 a Face gestures, b Eye gestures
The second method of obtaining EEG data used eye gestures: Neutral, Eyes Widen, Left Wink, Right Wink, and Closed, as shown in Fig. 5b. Here, individual instances of each gesture were recorded instead of a held gesture. These recordings composed the second EEG Dataset. Executing the face gestures mentioned above generated brainwave activity in the Frontal Lobe, which was detected and captured by a BCI machine.
3.2 Retrieving and Transferring of Generated EEG Data The Emotiv INSIGHT is a non-invasive Brain-Computer Interface (BCI) machine equipped with 5 electrode sensors that follow the international 10–20 system of electrode placements. The neuroheadset obtains the generated EEG signals and translates them into values measured in millivolts. Sensors AF3 and AF4 were observed to be most effective, due to the function of the Frontal Lobe. The Frontal Lobe is considered to be the section of the brain that manages motor skills, actions that require muscle movements to move different parts of the body. Cognitive functions such as learning, thinking, and memory are also managed by this lobe. This includes mental commands or actions where individuals ‘think’ or imagine performing a particular physical movement [8]. The INSIGHT has a sampling frequency of 128 Hz, consequently obtaining 128 samples of brain activity for each electrode per second.
3.3 Pre-processing of Data and Feeding to Algorithm
The obtained data were sent to the OpenViBE software, where they were processed and recorded. For the first EEG dataset, no filtering was applied, in order to determine the capacity of the algorithms to manage raw EEG data. For the second EEG dataset, a fifth-order Butterworth band-pass filter was applied with lower and upper cut-off frequencies of 13 Hz and 43 Hz, respectively. This configuration blocked brainwaves under 13 Hz, which were observed to be noisy. The filtered data were further processed heuristically by locating the spikes detected on sensors AF3 and AF4 in each instance. These values served as the key features in the sample that effectively define the gesture executed. The generated datasets were then fed to the algorithms for training and evaluation. Table 2 shows the details of the first and second EEG Datasets used in this experiment. Both EEG datasets were obtained with 1, 2, and 4 sample counts per block, which affects the number of features in the dataset.
Table 2 Datasets generated and used
Dataset                                          | No. of features | No. of classifications | No. of row inputs | Total array size
First EEG dataset (Face gesture), 1 sample count  | 5               | 4                      | 800               | 800 × 5
First EEG dataset (Face gesture), 2 sample count  | 10              | 4                      | 800               | 800 × 10
First EEG dataset (Face gesture), 4 sample count  | 20              | 4                      | 800               | 800 × 20
Second EEG dataset (Eye gestures), 1 sample count | 5               | 5                      | 250               | 250 × 5
Second EEG dataset (Eye gestures), 2 sample count | 10              | 5                      | 250               | 250 × 10
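The relation between the sample count per block and the number of features in Table 2 (5 electrodes times 1, 2, or 4 samples) can be illustrated with a short sketch. It assumes that each feature row is formed by concatenating the electrode values of consecutive samples; the source does not describe how OpenViBE assembles the blocks, so the function name, the grouping rule, and the synthetic values below are hypothetical.

```python
def build_feature_rows(samples, samples_per_block):
    """Group consecutive multi-channel samples into flat feature rows.

    samples: list of per-sample channel readings, e.g. 5 values per sample.
    Returns one row per complete block; a trailing partial block is dropped.
    """
    if samples_per_block < 1:
        raise ValueError("samples_per_block must be at least 1")
    usable = len(samples) - len(samples) % samples_per_block
    rows = []
    for start in range(0, usable, samples_per_block):
        block = samples[start:start + samples_per_block]
        # Flatten the block: channels of sample 1, then sample 2, ...
        rows.append([value for sample in block for value in sample])
    return rows


# Synthetic stand-in for recorded EEG: 8 samples from 5 electrodes.
N_CHANNELS = 5
samples = [[float(t * N_CHANNELS + c) for c in range(N_CHANNELS)]
           for t in range(8)]

if __name__ == "__main__":
    for k in (1, 2, 4):
        rows = build_feature_rows(samples, k)
        print(f"{k} sample(s) per block: {len(rows)} rows x {len(rows[0])} features")
```

Under this assumption, keeping the number of rows fixed (800 or 250 in Table 2) while raising the sample count means recording proportionally more raw samples.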
4 Results and Discussion Both algorithms were tasked to classify and generate predictions with the obtained datasets. The tests were conducted five times and an average was obtained from all the runs. Results of the ANFIS algorithm were then analyzed together with the SVM.
4.1 Simulation Results for First EEG Dataset
As is immediately visible in Fig. 6, the ANFIS was not able to produce any predictions on the datasets with 2 and 4 sample counts, as it reached a 'memory error.' This implied that the ANFIS algorithm ran out of RAM while training and creating predictions with these datasets. The SVM produced predictions with 85.83% and 82.43% accuracy on the datasets with 2 and 4 sample counts, respectively. On the dataset with 1 sample count, the SVM generated predictions with 57.91% accuracy, while the ANFIS predicted with accuracies of 57.58% and 63.00% for 5 and 10 epochs, respectively. This experiment suggested three potential conclusions:
(i) Based on the results of the dataset with 1 sample count, the performance of the ANFIS algorithm is indeed comparable to the performance of the SVM.
(ii) Increasing the sample count in the dataset offers a degree of improvement in the performance of the algorithms.
(iii) The ANFIS algorithm is not efficient in managing large datasets.

Fig. 6 First EEG dataset: average accuracy of the 3 algorithms (4 face gesture classification)

Table 3 First EEG dataset overall algorithm training time
No. of sample counts | ANFIS (5 epoch) | ANFIS (10 epoch) | SVM
1 Sample count       | 17,832 s        | 37,025 s         | 0.0110 s
2 Sample count       | –               | –                | 0.0088 s
4 Sample count       | –               | –                | –
It was observed that two sample counts per block were sufficient to produce significantly more accurate results, as shown by the SVM's performance. Table 3 shows that training the SVM took only a fraction of a second. The ANFIS, on the other hand, took significantly longer: training with 5 epochs took 17,832 s (around 4.95 h), and 10 epochs roughly doubled that amount. This suggests that the ANFIS took up significantly more computational resources than the SVM for this use case, which caused the algorithm to crash on the larger datasets.
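One possible explanation for the memory errors is the way the rule base of an ANFIS grows with the number of inputs. The sketch below is an assumption, not something the study reports: it supposes grid partitioning, where every combination of membership functions forms one rule, and a hypothetical two Gaussian membership functions per input. The source does not state which partitioning or how many membership functions were used.

```python
def grid_partition_size(n_inputs, mfs_per_input):
    """Rule and parameter counts of a grid-partitioned first-order Sugeno ANFIS.

    Assumes Gaussian membership functions (2 premise parameters each) and
    linear consequents with one coefficient per input plus a constant.
    """
    rules = mfs_per_input ** n_inputs
    premise_params = 2 * mfs_per_input * n_inputs
    consequent_params = rules * (n_inputs + 1)
    return rules, premise_params, consequent_params


if __name__ == "__main__":
    # Feature counts taken from Table 2; mfs_per_input=2 is hypothetical.
    for n_features in (5, 10, 20):
        rules, premise, consequent = grid_partition_size(n_features, 2)
        print(f"{n_features:>2} features: {rules:>9,} rules, "
              f"{premise} premise and {consequent:,} consequent parameters")
```

Under these assumptions, the rule count doubles with every added feature, so moving from 5 to 20 features multiplies the number of rules by 2^15. This would be consistent with the observed behaviour, but it remains a sketch under stated assumptions.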
4.2 Simulation Results for Second EEG Dataset
Similar tests were conducted on the second EEG Dataset, with the results shown in Fig. 7. The SVM generated predictions on the 1 sample count dataset with an accuracy of 52.00%. The ANFIS, on the other hand, produced accuracies of 74.22% and 80.88% for 5 and 10 epochs, respectively. For the dataset with 2 sample counts, all the algorithms performed satisfactorily, with 80.00% accuracy for the SVM and 89.33% and 90.13% for the ANFIS with 5 and 10 epochs, respectively. On this set of EEG data, the ANFIS performed significantly better than on the first set; this may be due to the smaller size of the dataset. In this experiment, the ANFIS was able to produce results for the dataset with 2 sample counts, which was not possible with the first EEG dataset. Between the two datasets of different sample counts, the same phenomenon was observed: the SVM benefitted the most from the increased number of features, with an increase in accuracy of 28 percentage points, while the ANFIS showed at most a 15.11 percentage point increase (with 5 epochs). In this set of experiments, the ANFIS performed better than the SVM on both the 1 sample count and 2 sample count datasets. These results support the ideas presented in Sect. 4.1, specifically: (i) the performance of the ANFIS algorithm is comparable to the performance of the SVM, and (ii) increasing the sample count in the dataset offers a degree of improvement in the performance of the algorithms.
Table 4 shows the time in seconds it took to train the algorithms. The ANFIS took significantly longer to train on these datasets. With 1 sample count, 5 epochs took 264.99 s, or 4.42 min. Training time grew steeply for the dataset with 2 sample counts, which doubles the number of features: for 5 epochs, the ANFIS took 48,137.76 s, or 13.37 h, roughly 180 times longer, and about twice that for 10 epochs. The results recorded in Tables 3 and 4 support idea (iii): the ANFIS algorithm is not efficient in managing large datasets.

Fig. 7 Second EEG dataset: average accuracy of the 3 algorithms (all eye gesture classification)

Table 4 Second EEG dataset: overall algorithm training time
No. of samples  | ANFIS (5 epoch) | ANFIS (10 epoch) | SVM
1 Sample count  | 264.99 s        | 590.88 s         | 0.0054 s
2 Sample counts | 48,137 s        | 101,895 s        | 0.0037 s
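The accuracy gains and time conversions quoted in this section follow from simple arithmetic on the reported values. The short sketch below recomputes them from the figures given in the text; the dictionary and function names are illustrative only.

```python
# Accuracies (%) reported for the second EEG dataset: (1 sample, 2 samples).
ACCURACY = {
    "SVM": (52.00, 80.00),
    "ANFIS (5 epochs)": (74.22, 89.33),
    "ANFIS (10 epochs)": (80.88, 90.13),
}

# ANFIS training times (s) for 5 epochs, as quoted in the text.
TRAIN_1_SAMPLE_S = 264.99
TRAIN_2_SAMPLES_S = 48137.76


def accuracy_gain(before, after):
    """Gain in percentage points, rounded to two decimals."""
    return round(after - before, 2)


def growth_factor(small, large):
    """How many times longer the larger run took."""
    return large / small


if __name__ == "__main__":
    for name, (one, two) in ACCURACY.items():
        print(f"{name}: +{accuracy_gain(one, two):.2f} percentage points")
    print(f"{TRAIN_1_SAMPLE_S / 60:.2f} min for 1 sample count")
    print(f"{TRAIN_2_SAMPLES_S / 3600:.2f} h for 2 sample counts")
    print(f"growth factor: {growth_factor(TRAIN_1_SAMPLE_S, TRAIN_2_SAMPLES_S):.1f}x")
```

With only two measurements per configuration, the growth factor shows that the increase is steep but cannot by itself establish the functional form of the growth.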
5 Conclusion and Recommendation
This research explored the applicability of the Adaptive Neuro-Fuzzy Inference System (ANFIS) in BCI system implementations by analyzing the performance of the ANFIS algorithm, with the SVM as a reference, in managing EEG datasets. The research used facial gestures to generate EEG signals, which were captured by the Emotiv INSIGHT. Raw EEG signals extracted from Face Gestures made up the first EEG Dataset, while pre-processed EEG signals extracted from Eye Gestures composed the second EEG Dataset. Both EEG datasets were fed into both algorithms to determine their performance. Overall, the ANFIS showed comparable or even better accuracy than the SVM; however, it was not able to produce any results for the large datasets. With regard to training, the ANFIS took significantly more time and computational resources than the SVM.
The experiments confirmed that the ANFIS has accuracy comparable to that of the SVM algorithm. Both algorithms improved to some degree with more features in the dataset. Given its training time and the errors it encountered, the ANFIS took up significantly more computational resources, which makes it inefficient in managing large datasets. We conclude that the ANFIS is a viable algorithm for a BCI system implementation that requires a relatively small dataset, as its accuracy is comparable to that of the SVM. However, feeding the ANFIS with large datasets, especially datasets with many features, requires large amounts of computational resources, making the algorithm inefficient. The researchers recommend a direct interface between the neuroheadset and the algorithms, which may yield better results in future research on this topic. The research findings are open to other methodologies in which the intuitiveness of the interfaces can support the end-user while interacting with the Machine Interface [9–12].
Acknowledgments This work was presented in dissertation form in fulfillment of the requirements for the M.Sc. in Robotics Engineering for Timothy Chu under the supervision of E. L. Secco from the Robotics Laboratory, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, UK, and Dr. Alvin Chua from the Mechanical Engineering Department, De La Salle University, PH.
References
1. Prince D, Edmonds M, Sutter A, Cusumano M, Lu W, Asari V (2015) Brain machine interface using Emotiv EPOC to control robai cyton robotic arm. In: 2015 national aerospace and electronics conference (NAECON). IEEE, pp 263–266
2. Holewa K, Nawrocka A (2014) Emotiv EPOC neuroheadset in brain-computer interface. In: Proceedings of the 2014 15th international carpathian control conference (ICCC). IEEE, pp 149–152
3. Aguiar S, Yanez W, Benítez D (2016) Low complexity approach for controlling a robotic arm using the Emotiv EPOC Headset. In: 2016 IEEE international autumn meeting on power, electronics and computing (ROPEC). IEEE, pp 1–6
4. Mamani MA, Yanyachi PR (2017) Design of computer brain interface for flight control of unmanned air vehicle using cerebral signals through headset electroencephalograph. In: 2017 IEEE international conference on aerospace and signals (INCAS). IEEE, pp 1–4
5. Jang JS (1993) ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans Syst Man Cybern 23(3):665–685
6. Tavakoli M, Benussi C, Lopes PA, Osorio LB, de Almeida AT (2018) Robust hand gesture recognition with a double channel surface EMG wearable armband and SVM classifier. Biomed Signal Process Control 46:121–130
7. Li S, Feng H (2019) EEG signal classification method based on feature priority analysis and CNN. In: 2019 international conference on communications, information system and computer engineering (CISCE). IEEE, pp 403–406
8. How Your Brain Works. https://science.howstuffworks.com/life/inside-the-mind/human-brain/brain8.htm. Accessed 11 Sep 2020
9. Elstob D, Secco EL (2016) A low cost EEG based BCI Prosthetic using motor imagery. Int J Inf Technol Converg Serv 6(1):23–36. http://arxiv.org/abs/1603.02869
10. Secco EL, Moutschen C, Tadesse A, Barrett-Baxendale M, Reid D, Nagar A (2017) Development of a sustainable and ergonomic interface for the EMG control of prosthetic hands. In: Lecture notes of the institute for computer sciences, social informatics and telecommunications engineering, vol 192. Springer, pp 321–327. ISBN 978-3-319-58877-3
11. Secco EL, Caddet P, Nagar AK. Development of an algorithm for the EMG control of prosthetic hand. In: Soft computing for problem solving. Advances in intelligent systems and computing, vol 1139, Chapter 15. https://doi.org/10.1007/978-981-15-3287-0_15
12. Maereg AT, Lou Y, Secco EL, King R (2020) Hand gesture recognition based on near-infrared sensing wristband. In: Proceedings of the 15th international joint conference on computer vision, imaging and computer graphics theory and applications (VISIGRAPP 2020), pp 110–117. ISBN 978-989-758-402-2. https://doi.org/10.5220/0008909401100117
A Workflow-Based Support for the Automatic Creation and Selection of Energy-Efficient Task-Schedules on DVFS Processors
Ronny Kramer and Gudula Rünger
Abstract The performance of a task-based program depends strongly on the schedule and the mapping of the tasks to execution units. When the energy consumption is considered as well, the selection of an advantageous schedule is time-consuming since many options have to be taken into account. The exploitation of frequency scaling increases the number of possible schedules even further. This article proposes a software framework which supports the selection of energy-efficient schedules for task-based programs by applying data analysis methods to performance data. The framework contains several components for the acquisition and management of performance data as well as components for analyzing the data and selecting a schedule based on the data. The software architecture of the framework is presented and the implementation is described. A workflow which selects a schedule for a given set of tasks on DVFS processors is presented. As an example, a set of tasks from the SPEC CPU 2017 benchmark is considered and the performance data for the resulting schedules are presented.
Keywords DVFS · Power management · Energy efficiency · Frequency scaling · Multicriteria decision problem · Scheduling · SPEC benchmarks · Task-based programs · Data mining · Data analysis
R. Kramer (B) · G. Rünger
Department of Computer Science, Chemnitz University of Technology, 09107 Chemnitz, Germany
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_23

1 Introduction
Task scheduling methods are employed when a task-based application is to be executed on parallel hardware. The optimization goals of the scheduling can be quite different and influence the scheduling decision. In parallel computing, a low parallel
execution time is an important optimization goal, which is also called makespan in the context of scheduling. More recently, energy consumption has become an increasingly important goal that has to be taken into account for parallel application codes [1, 2]. However, scheduling methods have to be modified to meet this goal. This article focuses on scheduling for energy efficiency and proposes a software-supported scheduling method that selects such a schedule based on an analysis of performance data. Processors with dynamic voltage and frequency scaling (DVFS) make it possible to adapt the operational frequency so that both the energy consumption and the execution time are affected. In general, the computational speed increases with increasing frequency, but the energy consumption also increases, and vice versa [3–5]. Thus, using a fixed frequency or limiting the frequency scaling range for the execution of an application might be advantageous if the application program is waiting, for example, for input and output operations, since the execution time does not increase but the energy consumption decreases. In this article, we consider the scheduling problem in which a set of independent tasks is to be scheduled onto a set of processor cores that provide frequency scaling. Thus, determining such a Task-Schedule means fixing for each task not only the start time and the core but also the operational frequency with which it is executed. Since the tasks in the set have no dependencies, there is a multitude of possible options for the mapping and the execution order of the tasks, each of which can be executed in a different frequency mode. Each of the possible Task-Schedules may yield different execution time and energy consumption values.
In order to support the selection of an energy-efficient schedule, we propose an approach based on data analysis of these performance data, which comprise execution time and energy consumption. This approach requires multiple software components which perform the management and analysis of performance data. These include well-known components, such as the systematic measurement of data, but also a specific storage and management of data. Most important is a specific workflow which uses the components such that a suitable schedule is selected as a result. This workflow can be designed by the application programmer and contains specific intermediate decisions for the workflow selection. The contributions of this article include the design and implementation of the schedule creation and evaluation framework (scefx), which is based on reusable components and functions, as well as the design of a specific workflow that implements the schedule selection decision with the help of this software framework. Moreover, this workflow is applied to a set of independent tasks which stem from the SPEC CPU 2017 benchmark. These tasks cover benchmarks which are based solely on integer operations as well as benchmarks based on floating-point operations and include the applications bwaves, cactuBSSN, deepsjeng, exchange2, fotonik3d, gcc, imagick, lbm, leela, mcf, nab, omnetpp, perlbench, pop2, roms, wrf, x264, xalancbmk, and xz [6]. Extensive experimental results are given and demonstrate the effectiveness of our data analytic approach for an energy-efficient schedule selection. The article is structured as follows: Sect. 2 describes the scheduling problem for task-based programs with the optimization goals for DVFS processors and proposes a workflow for selecting a schedule. Section 3 presents the software framework supporting the workflow and describes the implementation and usage of the framework. Section 4 shows the experimental performance results of the selected schedules for the SPEC CPU benchmark. Section 5 discusses related work and Sect. 6 concludes.

Table 1 Notation of the scheduling problem and the scheduling method
Notation | Meaning
$p$      | Number of execution units
$P$      | Set of $p$ execution units, $P = \{P_1, \ldots, P_p\}$
$n$      | Number of independent tasks
$T$      | Set of $n$ independent tasks, i.e., $T = \{T_1, \ldots, T_n\}$
$b$      | Bucket, which is a subset of tasks from $T$
$B$      | Set of $p$ buckets, i.e., $B = \{B_1, \ldots, B_p\}$
$i$      | Control variable when referencing in two sets
$E_b$    | Energy consumption for a bucket $b$
$E_B$    | Energy consumption for a set of buckets $B$, i.e., $E_B = \sum_{b \in B} E_b$
$MS_b$   | Makespan of bucket $b$
$MS_B$   | Makespan of a set of buckets $B$, i.e., $MS_B = \max_{b \in B} MS_b$
2 Energy-Efficient Task Scheduling The task scheduling problem considered is defined in Sect. 2.1. Section 2.2 introduces a bucket-oriented scheduling and the scheduling selection process based on data analysis of performance data. Section 2.3 describes how frequency scaling is included in the determination of an energy-efficient schedule. The notation used in this section is defined in Table 1.
2.1 Scheduling Problem for Independent Tasks The scheduling problem for a set of n independent tasks T = {T1 , . . . , Tn } to a set of p execution units P = {P1 , . . . , Pp } of a parallel system is to find an assignment of tasks to execution units such that a certain optimization criterion is fulfilled. The execution units can be processors or cores depending on the architecture considered. The optimization criterion can be a small parallel execution time, which is also called makespan in the context of scheduling. The makespan of an assignment of set T to set P is the maximum execution time that is required by one of the execution units. Since the task is independent of each other, any order of execution is possible. Also, each task can be assigned to any core. Thus, there is a multitude of different assignment
possibilities of tasks to cores which have to be considered and from which the best one has to be selected. Scheduling algorithms or methodologies determine such an assignment. Since scheduling is an NP-complete problem, scheduling methods are usually heuristics. Search-based algorithms employ a search algorithm to select a schedule. In this article, we propose a method which determines a Task-Schedule by analyzing the performance data of the tasks and the different schedules. The data analysis works on specific data formats that are built up for the concept of buckets defined in the next subsection.
2.2 Bucket-Based Scheduling and Execution
The scheduling method proposed in this article uses a set of p buckets $B = \{B_1, \ldots, B_p\}$ where each bucket is a subset of the task set T and p is the number of processors or cores available. When a schedule is executed, the buckets are assigned to the corresponding processors or processor cores. The resulting performance data for execution time and energy consumption are measured and represent the data basis for the analysis. The mapping of buckets to cores for execution is done by a specific software component which is called the executor. The executor software decides how the buckets are assigned to hardware. In this article, the executor performs a one-to-one mapping of buckets to processor cores so that each core is exclusively used for one bucket, i.e., the tasks in bucket $B_i \in B$ are the tasks which are to be executed on execution unit $P_i \in P$, $i = 1, \ldots, p$. Thus, a set of buckets is associated with a specific schedule. However, as the mapping is controlled by the executor, the usage of other executors could allow the assignment of a bucket to several cores, or distribute buckets over multiple machines. The data structure of buckets contains information for each task, which comprises an identifier for the executor to request information on how to execute the task as well as cost information. The cost information includes the execution time as well as the energy consumption values for different processor frequencies. With this information, the overall makespan $MS_b$ of a bucket b as well as the overall energy consumption $E_b$ can be calculated. The makespan $MS_B$ of a schedule associated with a set of buckets B is equal to the maximum of all bucket makespans, i.e., $MS_B = \max_{b \in B} MS_b$. The energy consumption of a set of buckets B is the sum of all energy consumption values for buckets $b \in B$ plus an additional idle energy consumption for the idle times $MS_B - MS_b$ of the buckets $b \in B$.
Since the idle power consumption is assumed to be much smaller than the power consumption of a processor working at its highest frequency, the idle power consumption is neglected during the scheduling decision. However, it still affects measured results and is included in an adaptation step. For the assignment of tasks to buckets, different algorithms are used. A makespan-based algorithm referred to as T-Schedule chooses the bucket to assign a task to according to the lowest makespan over all buckets, whereas an energy-based algorithm referred to as E-Schedule chooses the bucket based on the lowest energy consumption over all buckets. The E-Schedules help to provide a more even distribution of energy consumption over all processor cores but may lead to larger idle times compared to T-Schedules, which in turn may show a larger deviation in energy consumption between the cores. Examples visualizing this behavior can be seen in Sect. 4. A general rule as to which distribution algorithm is the most energy efficient cannot be given, as the result differs from task set to task set.
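The two assignment rules and the cost definitions of this section can be illustrated with a minimal Python sketch. It is not the scefx implementation: the class and function names are hypothetical, the tasks are assumed to be processed in the given order, each task carries costs for one fixed frequency only, and the task costs in the example are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    time: float    # execution time in seconds at one fixed frequency
    energy: float  # energy consumption in Joule at the same frequency


@dataclass
class Bucket:
    tasks: list = field(default_factory=list)

    @property
    def makespan(self):
        # MS_b: the tasks of one bucket run one after another on one core
        return sum(t.time for t in self.tasks)

    @property
    def energy(self):
        # E_b: sum of the task energies, idle power neglected
        return sum(t.energy for t in self.tasks)


def create_schedule(tasks, p, mode="T"):
    """Assign each task to the bucket with the lowest makespan ("T")
    or the lowest energy consumption ("E") at the time of assignment."""
    if mode not in ("T", "E"):
        raise ValueError("mode must be 'T' or 'E'")
    key = (lambda b: b.makespan) if mode == "T" else (lambda b: b.energy)
    buckets = [Bucket() for _ in range(p)]
    for task in tasks:  # assumption: tasks are taken in the given order
        min(buckets, key=key).tasks.append(task)
    return buckets


def schedule_costs(buckets):
    """Return (MS_B, E_B) for a set of buckets."""
    return max(b.makespan for b in buckets), sum(b.energy for b in buckets)


# Hypothetical task costs, not taken from the measurements in Sect. 4
tasks = [Task("a", 10.0, 100.0), Task("b", 8.0, 120.0),
         Task("c", 6.0, 30.0), Task("d", 4.0, 90.0)]
t_costs = schedule_costs(create_schedule(tasks, p=2, mode="T"))
e_costs = schedule_costs(create_schedule(tasks, p=2, mode="E"))
print(t_costs, e_costs)
```

In this small example both schedules have the same total energy, since idle power is neglected and only one frequency is used, but the E-Schedule has the larger makespan and therefore more idle time on one core.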
2.3 Frequency Scaling for Energy Efficiency
The overall energy consumption $E_B$ of a schedule B can be reduced by reducing the frequency of a processor down to the point of its highest efficiency [4]. A further reduction is possible, as not all processor cores may need to run at their full frequency. Because of idle times caused by different makespans $MS_b$ of buckets, it is possible to reduce the frequency of processor cores without increasing the overall makespan $MS_B$. Since real systems also consume idle power, the effect of reducing idle time can be even larger than the value expected in our case, which ignores the idle power consumption. This effect can be important for schedules which already use the processor frequency with the highest efficiency, where a lower frequency would increase the estimated energy consumption despite the reduction of idle time. To find a Pareto optimum where the makespan as well as the energy consumption are small at the same time, it is necessary to construct all possible schedules, calculate the execution costs, and then select the best schedule. The theoretical foundation for such a Schedule Selection Process (SSP) has been explored in [7]. This theoretical foundation is now extended to also cover the acquisition, evaluation, and management of measured performance data and is implemented as a software framework, called the schedule creation and evaluation framework (scefx), which is presented in the following.
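The idea of lowering core frequencies without increasing the overall makespan can be sketched as follows. This is only one possible reading of the frequency adaptation, under stated assumptions: the function name and the input layout (one dictionary per bucket that maps a frequency in GHz to the bucket makespan in seconds) are hypothetical, the values are invented, and the rule "take the lowest feasible frequency" ignores that frequencies below the most efficient one may cost more energy.

```python
def adapt_frequencies(bucket_times, base_freq):
    """Pick, for every bucket, the lowest frequency whose makespan does not
    exceed the overall makespan MS_B obtained at the base frequency."""
    ms_B = max(times[base_freq] for times in bucket_times)
    adapted = []
    for times in bucket_times:
        # base_freq itself is always feasible, so the list is never empty
        feasible = [f for f, ms in times.items()
                    if f <= base_freq and ms <= ms_B]
        adapted.append(min(feasible))
    return adapted


# Hypothetical bucket makespans in seconds for three frequencies in GHz
bucket_times = [
    {3.4: 100.0, 3.2: 106.0, 3.0: 113.0},  # defines MS_B, stays at 3.4 GHz
    {3.4: 90.0, 3.2: 96.0, 3.0: 102.0},    # 3.2 GHz still fits into MS_B
    {3.4: 80.0, 3.2: 85.0, 3.0: 91.0},     # 3.0 GHz still fits into MS_B
]
adapted = adapt_frequencies(bucket_times, base_freq=3.4)
print(adapted)
```

The bucket that determines the overall makespan keeps the base frequency, while buckets with slack are slowed down so that their idle time shrinks.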
3 Schedule Creation and Evaluation Framework
This section presents the schedule creation and evaluation framework (scefx), which supports the automation of data acquisition, data evaluation, data management as well as schedule creation and selection. The data is composed of performance data measurements of independent tasks executed on different machines using different processor frequencies. The goal is to support a workflow which determines Task-Schedules with different optimization targets such as the lowest makespan, the lowest energy consumption, as well as multicriteria optimizations for Pareto-optimal combinations where both criteria are minimal [8].
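The multicriteria target mentioned above can be illustrated by a small Pareto filter over candidate schedules. The sketch assumes that each candidate is described by a name, a makespan, and an energy consumption; the function name and the candidate values are hypothetical and do not stem from the measurements of this article.

```python
def pareto_front(candidates):
    """Return the names of all candidates that are not dominated.

    candidates: list of (name, makespan, energy) tuples. A candidate is
    dominated if another one is at least as good in both criteria and
    strictly better in at least one of them."""
    front = []
    for name, ms, energy in candidates:
        dominated = any(
            ms2 <= ms and e2 <= energy and (ms2 < ms or e2 < energy)
            for _, ms2, e2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (makespan [s], energy [J]) values for four schedules
candidates = [
    ("A", 100.0, 500.0),
    ("B", 120.0, 400.0),
    ("C", 130.0, 450.0),  # dominated by B
    ("D", 100.0, 520.0),  # dominated by A
]
front = pareto_front(candidates)
print(front)
```

The quadratic comparison is sufficient for a handful of schedules; for large candidate sets a sort-based filter would be the more economical choice.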
Fig. 1 Workflow for the collection of baseline data
3.1 Workflow to Determine a Schedule
The workflow covers the steps data acquisition, data evaluation, data management, the determination of Task-Schedules and the execution of those schedules. More precisely, the workflow covers
– The Schedule Selection Process (SSP) as presented in Sect. 2
– The collection of baseline data (Fig. 1)
– The schedule execution and acquisition of performance data (Fig. 2)
– A cost determination (Fig. 3)
The workflow provides several interfaces or entry points which can be used either by a user or by the framework's internals. In Figs. 1, 2 and 3, those entry points are colored in light gray.
Collection of baseline data
Baseline data are performance data resulting from measurements of specific tasks which are executed in isolation on one core while the other cores are idle. The execution of a single task can be interpreted as a schedule associated with a set of buckets which contains one bucket with that task and further empty buckets. Those schedules are executed using different processor frequencies to collect data about the performance behavior of the task on a specific machine, which is stored in the form of an application profile. The frequencies are selected by their importance for an interpolation. Frequencies that are harder to interpolate, such as the minimum, maximum, and mean frequency, have a higher priority than directly neighboring frequencies. However, the user may decide to collect data for all frequencies to improve the quality of the application profile and therefore of later estimations. Based on this baseline data, the execution costs $MS_b$ and $E_b$ of a task set b and therefore of a schedule B can be estimated. The sub-workflow for the collection of such data is shown in Fig. 1. When a new machine is introduced, metadata about the machine is collected, followed by the collection of baseline data. When new applications are introduced, idle machines are used to collect baseline data in the background.
Schedule execution and acquisition of performance data
To execute a schedule as well as record and analyze the performance data, the workflow visualized in Fig. 2 is provided. This workflow describes the process of monitoring the CPU and the memory usage of each task and combines it with energy measurement data for storage. To prepare the execution of tasks, the execution environment is set up according to the schedule description, and hardware parameters such as processor frequencies are configured. While tasks are executed, the monitoring system records the machine load as well as the energy consumption and links them to the active tasks. This recording is timestep based; in addition, the start or termination of a task causes an event which triggers the monitoring system to add an additional record.
Fig. 2 Workflow for the schedule execution and acquisition of performance data
Fig. 3 Workflow for the cost determination
Cost determination
The cost determination workflow in Fig. 3 uses cost information in the form of (machine, application) combinations to determine the execution costs $MS_B$ and $E_B$ of a schedule B. The cost determination workflow is also intended to derive cost information for missing data points, for example missing frequencies, via interpolation.
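How a missing frequency could be filled in from an application profile is sketched below. The sketch uses plain linear interpolation between the two neighboring measured frequencies, which is a simplification chosen here for illustration and not necessarily the method used by scefx; the function name, the profile layout, and the cost values are hypothetical.

```python
from bisect import bisect_left


def interpolate_cost(profile, freq):
    """Estimate the cost at freq from a profile {frequency [GHz]: cost}.

    Measured frequencies are returned directly; a missing frequency is
    interpolated linearly between its two measured neighbors."""
    if freq in profile:
        return profile[freq]
    freqs = sorted(profile)
    if not freqs[0] < freq < freqs[-1]:
        raise ValueError("frequency outside the measured range")
    i = bisect_left(freqs, freq)  # index of the first frequency above freq
    f_lo, f_hi = freqs[i - 1], freqs[i]
    weight = (freq - f_lo) / (f_hi - f_lo)
    return profile[f_lo] + weight * (profile[f_hi] - profile[f_lo])


# Hypothetical execution times in seconds for one (machine, application) pair
profile = {1.0: 400.0, 2.0: 200.0, 3.0: 150.0}
print(interpolate_cost(profile, 1.5), interpolate_cost(profile, 2.5))
```

Extrapolation is rejected on purpose, which matches the preference for measuring the minimum and maximum frequency first during the collection of baseline data.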
3.2 Framework Implementation
To support and automate the usage of the workflow from Sect. 3.1, we have implemented the framework scefx. The framework scefx provides reusable functions required for task scheduling in general, but especially for the scheduling method of the case study SSP. The framework is written in Python and is built as a distributed system of specialized components. The software architecture enables an easy replacement of components or the creation of wrappers for other software systems as long as the interface is compatible with the interface of the replaced component. The interfaces follow the Representational State Transfer (REST) paradigm with JavaScript Object Notation (JSON) as data exchange format, as this reduces the complexity for rapid prototyping compared to other paradigms [9]. For the case study of the SSP, there are five components, of which three implement the data acquisition, data evaluation, and data management. The fourth component implements the task-schedule creation and evaluation, which in our use case implements the SSP. The fifth component is a user interface composed of a set of command line utilities which access the data management as well as the task-schedule creation/evaluation via a web-based REST interface. This fifth component is the normal entry point for a user to work with the system. Via this component, the user selects a target machine and provides a list of applications for execution and optimization targets to create a Task-Schedule. The Task-Schedule can be exported in various file formats to be reviewed or executed through the data acquisition component. In the following paragraphs, more details on the three data handling components are given.
Data Management
To provide a central hub that connects all other components and information, the data management provides an API to a database which stores all measured and generated data as well as metadata about machines and applications. The machine metadata, such as information about the processor and memory layout or accelerators, is collected via the likwid library [10] by manually executing a helper script when adding a new machine. For applications, the metadata consists of general information such as the name and version number but also of information about the steps to reproduce and execute a specific application instance. Globally unique identifiers are used to reference machines, applications, and measurement data, which allows components to collect data and communicate asynchronously.
Data Acquisition
The data acquisition component executes and monitors task schedules. The component also applies the machine settings of a schedule such
as the processor frequencies. The component stores all associations between executed tasks and measurement data as described in Sect. 3.1. The implementation of the component relies solely on Python libraries as well as read access to the Linux kernel's interface filesystem and is therefore highly portable. However, most Linux distributions require root access to set processor frequencies, and therefore the operating system needs to enable rootless access to the corresponding interfaces or to provide a sudo wrapper for root access.
Data Evaluation
The data evaluation component uses methods such as curve fitting, clustering, or interpolation to determine the costs of an execution. For each task of a Task-Schedule, those costs are calculated based on measurements of previous executions. The task-schedule creation/evaluation component communicates with the data evaluation component and requests the estimated execution costs for a (machine, frequency, application) combination. The data evaluation requests all information from the data management which fits the (machine, application) combination. The data evaluation then uses curve fitting to interpolate possibly missing data and to calculate the estimation for the requested combination.
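A curve fit of the kind used by the data evaluation can be sketched with a simple least-squares regression. The model t(f) = a + b / f, i.e., an execution time with a frequency-independent part and a part inversely proportional to the frequency, is an assumption made here for illustration only; the article does not state which model scefx fits, and the function names and sample values are hypothetical.

```python
def fit_inverse_model(samples):
    """Least-squares fit of t(f) = a + b / f to (frequency, time) samples.

    The model is linear in x = 1 / f, so ordinary linear regression on
    (x, t) yields the coefficients a and b."""
    xs = [1.0 / f for f, _ in samples]
    ys = [t for _, t in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def estimate_time(model, freq):
    """Evaluate the fitted model at a requested frequency in GHz."""
    a, b = model
    return a + b / freq


# Hypothetical measurements (frequency [GHz], execution time [s])
samples = [(1.0, 310.0), (2.0, 160.0), (3.0, 110.0)]
model = fit_inverse_model(samples)
print(model, estimate_time(model, 2.5))
```

With at least two distinct measured frequencies the fit is defined; more measurements reduce the influence of individual outliers.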
4 Experimental Evaluation
In this section, the workflow described in Sect. 3 is applied to a specific set of tasks from the SPEC CPU 2017 benchmark suite as defined in Sect. 4.1. The collected baseline data is used to construct Task-Schedules and to estimate their makespan as well as energy consumption in Sect. 4.2. For the two schedules with the lowest makespan and the lowest energy consumption, the processor frequencies are adapted to minimize the idle time and reduce the energy consumption, which leads to two additional schedules. All four schedules are executed to measure their real energy consumption. The percentage of the energy consumption reduction for the schedules in both cases, the estimation as well as the real execution, is calculated and compared.
4.1 Definition of Set of Tasks and Baseline Data Collection
For the definition of a set of tasks and the collection of baseline data, the intspeed and fpspeed benchmark sets of SPEC CPU 2017 have been selected to obtain a mixed workload of integer as well as floating-point benchmarks that covers real-world use cases as closely as possible. Among the integer benchmarks are a C compiler (gcc), vehicle scheduling in public mass transportation (mcf), mail server work such as spam checking (perlbench), and a Monte Carlo simulation-based artificial intelligence for the game Go (leela). Among the floating-point benchmarks are a simulation of blast waves (bwaves), a solver for the Einstein equations in vacuum (cactuBSSN), image operations performed on large images using ImageMagick (imagick), and weather research and forecasting (wrf). All measurement data has been collected on an Intel machine with Skylake architecture. The processor supports 15 different operational frequencies F = {0.8, 1.0, 1.2, 1.4, 1.5, 1.7, 1.9, 2.1, 2.3, 2.5, 2.7, 2.8, 3.0, 3.2, 3.4}, given in GHz. For the collection of baseline data, all tasks have been executed exclusively using all available operational frequencies, where all processor cores have been activated and had the same fixed frequency. The collected baseline data can be seen in Fig. 4.
[Figure 4: execution time [seconds] (left) and energy consumption [Joule] (right) plotted over frequency [GHz] for the benchmarks bwaves, lbm, perlbench, imagick, fotonik3d, xalancbmk, wrf, mcf, omnetpp, nab, leela, pop2, exchange2, roms, x264, xz, gcc, cactuBSSN, deepsjeng]
Fig. 4 Baseline data consisting of execution time (left) and energy consumption (right) for selected SPEC CPU benchmarks on an i7 Skylake
4.2 T- and E-Schedule Creation
Based on the baseline data, time-oriented (T) and energy-oriented (E) schedules have been created which select the bucket a new task is assigned to by either the lowest bucket makespan or the lowest bucket energy consumption. The schedule creation has been performed for all available frequencies with six buckets. This number of buckets was chosen, as it provided the best visualization of the frequency adaptation effect. Although the evaluation is based on all supported frequencies of the processor, the following presentation is focused on the frequency with the lowest energy consumption, 2.3 GHz, and the frequency with the lowest execution time, 3.4 GHz. As an example for both frequencies, Fig. 5 visualizes the T- and E-Schedule for 3.4 GHz. The T-Schedules provide an even execution time over all buckets while the energy consumption shows fluctuations. The E-Schedules provide a more evenly distributed energy consumption over all buckets, which however results in a large fluctuation in the execution time.
[Figure 5: bar charts over buckets 0 to 5 showing execution time [Sec] and energy consumption [Joule] for the T-buckets (left) and the E-buckets (right) at 3.4 GHz]
Fig. 5 Makespan and energy consumption per core for 3.4 GHz via the time-oriented approach (left) and the energy-oriented approach (right)
Frequency Adaptation
Due to the fluctuation of the makespan within each schedule, the processor frequency for some buckets can be reduced (adapted) to reduce the difference between all makespans. In this set of T- and E-Schedules, the frequency-adapted E-Schedule provides the lowest total energy consumption for 3.4 GHz, which is also the case for 2.3 GHz. However, this does not apply to all frequencies; for example, for 3.2 GHz the T-Schedule provides the lowest total energy after the frequency adaptation. As 2.3 GHz is the frequency with the highest efficiency for the processor, the frequency adaptation leads to a slightly higher estimated total energy consumption despite the fact that the idle time was reduced. This is due to the initial assumption that the idle power consumption is zero, as it was much lower than the power consumption when the processor is running at its highest frequency under load. After the analysis of all schedules with and without frequency adaptation, two schedules were identified as Pareto optimal and have therefore been selected:
1. the E-Schedule with a base frequency of 3.4 GHz as the schedule with the lowest makespan and the lowest energy consumption of all schedules with the same makespan.
2. the E-Schedule with a base frequency of 2.3 GHz as the schedule with the lowest overall energy consumption.
The frequency adaptation has assigned the frequencies $F_{3.4\,\mathrm{GHz}} = [3.4, 3.0, 3.2, 3.4, 3.2, 2.8]$ and $F_{2.3\,\mathrm{GHz}} = [2.3, 2.1, 2.1, 2.3, 2.1, 2.1]$, both given in GHz. Figure 6 visualizes the makespan resulting for these frequencies. The overall makespan of the schedule with 3.4 GHz as maximal frequency is lower than that of the schedule with 2.3 GHz, as expected. Furthermore, the fluctuation within those schedules differs as well: for 2.3 GHz, not only the makespan is larger but also the fluctuation.
This can be explained by the frequencies supported by the processor, where the common step size is 200 MHz. As a result, fluctuations are more likely to appear for 2.3 GHz than for 3.4 GHz, as the relative frequency change in percent is lower for 3.4 GHz. The resulting higher granularity raises the chance to find frequencies with a makespan close to but not larger than the overall makespan.
[Figure 6: bar charts over buckets 0 to 5 showing execution time [Sec] for the E-buckets at 3.4 GHz (left) and 2.3 GHz (right)]
Fig. 6 Makespan for the energy-oriented approach with frequency adaptation per core with 3.4 GHz (left) and 2.3 GHz (right)
Measured Energy Consumption
The energy consumption resulting from the execution of the schedules with and without frequency adaptation is shown in Table 2. Each schedule is identified by its base frequency followed by “orig” for no frequency adaptation and “adap” for schedules with frequency adaptation. For comparison, the estimated and the real energy consumption are presented; the last column provides the deviation between the estimated and the real energy consumption in percent. The table shows that the reduction of the idle time for the schedules executed with 2.3 GHz does indeed lead to a lower energy consumption, even if it is only a small improvement of 2.03%, which is a total saving of 6085 J or 1.69 Wh. The reduction of idle time for 3.4 GHz reduces the energy consumption by 10.75%, which is a total saving of 22796 J or 6.33 Wh.
Table 2 Comparison of estimated energy consumption and measured energy consumption for the execution of E-Schedules without frequency adaptation (orig) and with frequency adaptation (adap) for 2.3 and 3.4 GHz
Schedule        Estimation [J]   Real [J]   Deviation
3.4 GHz orig    453553           401520     11.47%
3.4 GHz adap    424356           378724     10.75%
Reduction       6.44%            10.75%
2.3 GHz orig    353528           293658     16.94%
2.3 GHz adap    354445           287573     18.87%
Reduction       −0.26%           2.03%
Effect of the Idle Power Consumption
As the data collection did not compensate for the idle power consumption of unused processor cores, a high deviation between the estimated and the real energy consumption was expected. For the schedules without frequency adaptation, the deviation is between 11.47 and 16.94%. For the schedules
with frequency adaptation, the deviation is between 10.75 and 18.87%. The percentage reduction shows that the measured reduction of the energy consumption is, as expected, larger than the estimated reduction. The observations show the importance of the idle power consumption for the estimation of the energy consumption. Thus, the initial assumption of the idle power consumption to be zero, while causing no issues with the makespan calculation, did lead to energy consumption estimations which are too high. However, for our use case, the makespan is the main criterion to recognize and utilize possibilities to reduce the energy consumption, while the estimation of the energy consumption is mainly used during the distribution of tasks. For the distribution of tasks, the effective criterion for bucket selection is the relative difference between the tasks. Since the baseline measurements of all tasks have been performed under the same assumption, the error caused by this assumption is present as a constant factor which differs for each frequency. Therefore, as long as the comparisons happen within the same frequency, the error has no effect on the distribution of tasks, as it does not affect the relative difference [11].
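The deviation column of Table 2 and the conversion of the absolute savings from Joule to watt-hours can be retraced with a few lines of Python. The sketch assumes that the deviation is defined relative to the estimated value, which is consistent with the tabulated percentages; the function and variable names are hypothetical.

```python
def deviation_percent(estimated, measured):
    """Deviation of the measurement from the estimate, relative to the estimate."""
    return (estimated - measured) / estimated * 100.0


def joule_to_wh(joule):
    """Convert Joule to watt-hours (1 Wh = 3600 J)."""
    return joule / 3600.0


# (estimated [J], measured [J]) as listed in Table 2
table2 = {
    "3.4 GHz orig": (453553, 401520),
    "3.4 GHz adap": (424356, 378724),
    "2.3 GHz orig": (353528, 293658),
    "2.3 GHz adap": (354445, 287573),
}

deviations = {name: round(deviation_percent(est, real), 2)
              for name, (est, real) in table2.items()}

# Absolute measured savings achieved by the frequency adaptation
saving_34 = table2["3.4 GHz orig"][1] - table2["3.4 GHz adap"][1]
saving_23 = table2["2.3 GHz orig"][1] - table2["2.3 GHz adap"][1]

print(deviations)
print(saving_34, round(joule_to_wh(saving_34), 2))
print(saving_23, round(joule_to_wh(saving_23), 2))
```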
5 Related Work
Saving energy is an important issue for many applications in computer systems. It was found in [4] that manually setting a frequency for dynamic voltage and frequency scaling (DVFS) can lower the energy consumption, but only down to a minimum; lowering the frequency any further increases the energy consumption. An approach to achieve intra-task DVFS based on profile information is proposed in [12]. The energy-aware distribution of workload in the form of virtual machines is described in [13]. As described in [14], in cases of unbalanced MPI applications, the energy consumption can be reduced by reducing wait times on synchronizations through a work estimation and active frequency control. The matter of reducing the energy consumption using data science is not limited to computer systems. A new approach to forecast the load on the electrical grid and schedule the power-on time of large appliances is proposed in [15].
6 Conclusion
In this article, we have proposed a software framework called schedule creation and evaluation framework (scefx), which is designed to support experiments to collect and evaluate a large variety of task performance data and metadata. An advantage is that existing performance data containing energy consumption and execution times of tasks can be used to adapt the processor frequencies to minimize idle times. We have performed experiments with scefx using a set of tasks from the SPEC CPU 2017 benchmark suite. Schedules with a theoretical minimal energy consumption
for the highest and the most efficient processor frequency have been constructed. Those schedules have been executed with and without frequency adaptation to measure the real energy saving and compare it to the theoretical one. For the schedules based on the shortest overall execution time, the energy saving was 10.75%. For the schedules based on the lowest energy consumption, the energy saving was 2.03%. The framework scefx and the specific workflow automate many parts of the collection and evaluation of schedule performance data, which allows the execution of more performance experiments in less time. Due to the design of scefx based on reusable components and functions, more advanced scheduling techniques can be added and evaluated with minimal additional programming effort.
Acknowledgements This work has been supported by the German Ministry of Science and Education (BMBF) project SeASiTe Self-adaption of time-step-based simulation techniques on heterogeneous HPC systems, Grant No. 01IH16012B.
References
1. Aupy G et al (2015) Energy-aware algorithms for task graph scheduling, replica placement and checkpoint strategies. In: Handbook on data centers. Springer New York, New York, NY, Chap. 2, pp 37–80. ISBN: 978-1-4939-2092-1. https://doi.org/10.1007/978-1-4939-2092-1_2
2. Rocha I et al (2019) HEATS: heterogeneity- and energy-aware task-based scheduling. CoRR arXiv:1906.11321
3. Fedorova A et al (2009) Maximizing power efficiency with asymmetric multicore systems. Queue 7(10):30–45. ISSN: 1542-7730. https://doi.org/10.1145/1647300.1658422
4. Le Sueur E, Heiser G (2010) Dynamic voltage and frequency scaling: the laws of diminishing returns. In: Proceedings of the 2010 international conference on power aware computing and systems. HotPower’10. USENIX Association, Vancouver, BC, Canada, pp 1–8
5. Saxe E (2010) Power-efficient software. Commun ACM 53(2):44–48. ISSN: 0001-0782. https://doi.org/10.1145/1646353.1646370
6. Bucek J, Lange K-D, Kistowski JV (2018) SPEC CPU 2017: next-generation compute benchmark. In: Companion of the 2018 ACM/SPEC international conference on performance engineering. ICPE ’18. Association for Computing Machinery, Berlin, Germany, pp 41–42. ISBN: 9781450356299. https://doi.org/10.1145/3185768.3185771
7. Rauber T, Rünger G (2019) A scheduling selection process for energy-efficient task execution on DVFS processors. Concurrency and computation: practice and experience 31(19):e5043. https://doi.org/10.1002/cpe.5043. https://onlinelibrary.wiley.com/
8. Ehrgott M (2005) Multicriteria optimization, 2nd edn. Springer, Berlin, Heidelberg. Online-Ressource (XIII, 323 p. 88 illus, digital). ISBN: 9783540276593
9. Tihomirovs J, Grabis J (2016) Comparison of SOAP and REST based web services using software evaluation metrics. Inf Technol Manag Sci 19(1):92–97. https://doi.org/10.1515/itms-2016-0017
10. Treibig J, Hager G, Wellein G (2010) LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments. In: 2010 39th international conference on parallel processing workshops, pp 207–216
11. Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques. The Morgan Kaufmann series in data management systems. Elsevier Science. ISBN: 9780080890364
12. Qin Y et al (2019) Energy-efficient intra-task DVFS scheduling using linear programming formulation. IEEE Access 7:30536–30547
13. Shojafar M et al (2016) An energy-aware scheduling algorithm in DVFS-enabled networked data centers. In: Proceedings of the 6th international conference on cloud computing and services science-volume 2: TEEC, (CLOSER 2016). INSTICC. SciTePress, pp 387–397. ISBN: 978-989-758-182-3. https://doi.org/10.5220/0005928903870397
14. Kappiah N, Freeh VW, Lowenthal DK (2005) Just in time dynamic voltage scaling: exploiting inter-node slack to save energy in MPI programs. In: SC ’05: Proceedings of the 2005 ACM/IEEE conference on supercomputing, p 33
15. Park S et al (2020) A two-stage industrial load forecasting scheme for day-ahead combined cooling, heating and power scheduling. Energies 13(2):443. ISSN: 1996-1073. https://doi.org/10.3390/en13020443
Artificial Intelligence Edge Applications in 5G Networks
Carlota Villasante Marcos
Abstract In recent years, the Fifth Generation of mobile communications has been thoroughly researched to improve the previous 4G capabilities. As opposed to earlier architectures, 5G Networks provide low latency access to services with high reliability. Additionally, they allow exploring new opportunities for applications that need to offload computing load in the network with a real-time response. This paper analyzes the feasibility of a real-time Computer Vision use case model in small devices using a fully deployed 5G Network. The results show an improvement in Latency and Throughput over previous generations and a high percentage of Availability and Reliability in the analyzed use case.
Keywords 5G · URLLC · Computer vision · Artificial intelligence · E2E latency · OWD · E2E service response time · Availability · Reliability
1 Introduction
Mobile communications have experienced a continuous transformation process, from mobile phone calls and SMS with 2G to video calls and data consumption anywhere with 4G [1]. Nevertheless, the increase in video streaming and high-data-consuming use cases has had an impact on the network. In addition, data consumption is rising rapidly, new connections appear each day, and new technologies such as the Internet of Things (IoT) need better energy efficiency [2]. The Radiocommunication Sector of the International Telecommunication Union (ITU-R) set up a project named IMT-2020 [3], which has established stricter requirements to provide high-speed, high-reliability, low-latency, and energy-efficient mobile services that could not be achieved with previous generations. Within that project, ITU-R defined 5G usage scenarios depending on the requirements addressed on the network: enhanced Mobile BroadBand (eMBB), Ultra-Reliable and Low Latency Communications (URLLC), and massive machine-type communications (mMTC).
C. Villasante Marcos (B) Ericsson España SA, Madrid, Spain e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_24
Every 5G use case or application can be allocated to one or several usage scenarios, depending on the requirements established on the key capabilities of IMT-2020 [4]. The requirements set, and the progress made on the standardization of 5G, offer network operators the possibility to provide new services to a large number of areas such as healthcare, industry, education, and tourism [5]. Computer Vision (CV) can be found nowadays in several real examples and Use Cases (UCs) of those areas, such as self-driving car testing, health diagnostics, security and surveillance cameras, or even in stores [6]. Not only are Artificial Intelligence (AI) techniques present in new UCs, but they also help optimize network performance, enhance customer experience, boost network management by controlling network slicing, and solve complex data problems [7]. Most of the solutions with AI rely on heavy machinery, as some of the tasks involved require a high computational load. Increasingly, examples of CV can be seen in lighter devices like smartphones, Virtual Reality (VR) glasses, and other personal devices where the workload is done on the device itself, but this still requires quite complex dedicated hardware and software. 5G Networks provide infrastructure and resources, such as edge computing [8], low latency, and high reliability, in order to move the computation load to another location. In this manner, the devices could be lighter, simpler, and cheaper, as the intelligence relies on the network and it can become a distributed solution [9]. Offloading the devices’ intelligence into the network will also make it easier to request more computing capacity, fix any encountered problem, and update the software, since the software is located on a centralized server. The appearance of so many new UCs has encouraged the launch of several projects and research programs, such as the 5G Public–Private Partnership (5GPPP), to explore the possibilities of 5G.
Focusing on the European scenario, the 5G-EVE (“5G European Validation platform for Extensive trials”) project [10] has the main purpose of facilitating the validation of network Key Performance Indicators (KPIs) and services for 5GPPP projects. 5G-EVE evaluates real use cases in a 5G network environment [11] by determining which 5GPPP scenario they fit in and which requirements they need, and showcases them with a final demo. 5G-EVE provides an end-to-end facility by interconnecting European sites in Greece, Italy, France, and Spain. In Spain, 5TONIC [12] was designated as the laboratory site for 5G-EVE, and it is where this study takes place. In this paper, we present an analysis of real-time CV with image recognition on small connected devices in 5G networks. For this purpose, a fully deployed 5G Non-Standalone (NSA) network located at the 5G-EVE Spanish site, based on Ericsson’s 5G portfolio, provides coverage to a device with a built-in camera that continuously streams video to the CV application located in the network. The results demonstrate the benefits of the new-generation network over the previous mobile generation regarding latency, reliability, and throughput.
Artificial Intelligence Edge Applications in 5G Networks
2 Materials and Methods 2.1 Goals The main goal of this work is to evaluate the feasibility of Computer Vision UCs with image recognition on the new generation of mobile networks. As a baseline, a camera streams video through the network to a server where CV image recognition is applied, and a control command is sent back to the device. On the one hand, edge computing, a technique that allows computation to be offloaded to the network near the connected devices, is analyzed to check whether it offers any advantage. On the other hand, we compare 4G and 5G New Radio (NR), since the latter ensures low latency and high reliability on the network, enabling real-time response applications.
2.2 Environment The environment for the study is described in Fig. 1. It is composed of a Camera, a 5G NR NSA Customer Premise Equipment (CPE), a 5G NR Radio Access Network (RAN) in option 3, or NSA NR [13], a Packet Core (PC) with 5G NSA support, and a Server where the CV Application (APP), with a trained Neural Network (NN), is running. As the study was done at the 5TONIC laboratory, the infrastructure used was provided by the project [14]. As shown in Fig. 2, it consists of a Raspberry Pi with a camera module connected to a 5G CPE as the User Equipment (UE), a RAN (with 4G, 5G NSA, and Narrowband-IoT (NB-IoT) access), a Packet Core (with 4G and 5G NSA support), and a Transport layer, which makes it possible to simulate several environments and conditions. Furthermore, the RAN was configured with a bandwidth of 50 MHz in B43 (4G)/n78 (5G) and a Time-Division Duplexing (TDD) pattern of 7:3 to improve the achievable uplink throughput.

Fig. 1 High-level environment representation

Fig. 2 Use case environment representation of 5Tonic equipment
2.3 Measurements The use case evaluated in this paper is a good example of a URLLC service, and the most important KPI is Latency, or End-to-End (E2E) Latency, as considered in 5G-EVE [15]. Availability and Reliability, related to the percentage of correctly received packets, must also be evaluated. Computer vision use cases usually go hand in hand with real-time response applications where immediate reactions and no packet loss are needed. For example, if a pedestrian steps in the way of a vehicle, the vehicle needs to react as fast as possible to avoid a crash by slowing down or changing direction. Considering that in these types of UCs the UE needs to send a large amount of data through the network, we should also analyze User Data Rate and Peak Data Rate. E2E Latency as a 5G-EVE KPI is measured as the duration between the transmission of a small data packet from a source node and the reception of the corresponding answer at the same point. In this paper, we use several methods to measure Latency: Ping, to evaluate the latency of an emulated control packet, and Round Trip Time (RTT), shown in (1), and Smoothed RTT (SRTT), shown in (2), as defined in [16], to measure the E2E Latency of a small TCP packet and of the transmission of a video frame. The latter measurement is considered the most important latency KPI in this study, as it measures the real E2E Latency of the UC on the network.

$$\mathrm{RTT} = \mathrm{Time}_{\mathrm{TCP\,ACK}} - \mathrm{Time}_{\mathrm{TCP\,Packet}} \tag{1}$$

$$\mathrm{SRTT} = (1 - \alpha) \cdot \mathrm{SRTT} + \alpha \cdot R' \tag{2}$$
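As an illustration of (1) and (2), the following minimal sketch computes RTT samples from packet and ACK timestamps and then smooths them. It assumes α = 1/8 and initialises SRTT with the first sample, as RFC 6298 [16] recommends; the function names and the sample timestamps are hypothetical and are not taken from the measurements of this study.

```python
ALPHA = 1 / 8  # smoothing factor recommended in RFC 6298; an assumption here


def rtt(time_ack: float, time_packet: float) -> float:
    """Eq. (1): time of the TCP ACK minus time of the TCP packet."""
    return time_ack - time_packet


def smoothed_rtt(samples, alpha=ALPHA):
    """Eq. (2): SRTT = (1 - alpha) * SRTT + alpha * R' for each new sample R'."""
    srtt = None
    history = []
    for r in samples:
        # The first sample initialises SRTT; later samples are blended in.
        srtt = r if srtt is None else (1 - alpha) * srtt + alpha * r
        history.append(srtt)
    return history


# Hypothetical (packet_time, ack_time) pairs in milliseconds.
pairs = [(0.0, 20.0), (100.0, 128.0), (200.0, 216.0)]
samples = [rtt(ack, pkt) for pkt, ack in pairs]
srtt_history = smoothed_rtt(samples)
print(srtt_history)
```

A small α makes SRTT react slowly to a single outlier, which is why a smoothed value is a steadier latency indicator than individual RTT samples.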
In addition, One-Way Delay (OWD) is also presented. OWD is understood as the difference between the time of transmission and the time of reception. It can be measured in both uplink and downlink, but in this use case the relevant path is the uplink, since the video is sent from the UE to the application layer. The KPIs described so far correspond to network propagation; however, to calculate the complete E2E Service Response Time and analyze the feasibility of these kinds of UCs, the processing time of the video frame in the application must be added, as shown in Fig. 3.
Fig. 3 Representation of E2E latency, OWD, and E2E service response time
2.4 Methodology The User Equipment used in this study is composed of a Raspberry Pi 3 Model B+ [17] with a camera module [18] connected to a 5G NR CPE. To obtain live video streaming, we used the Linux project UV4L [19] with the raspicam driver [20], where several parameters of the video, such as image encoding, resolution, and streaming server options, can be set easily. In this study, the parameters used were: MJPEG and H264 encoding, 20 frames per second (fps), and 640 × 480 (VGA), 720 × 576 (D1), and 1280 × 720 (HD) image resolutions [21]. The CV APP Server consists of a dockerized application with a Caffe [22] NN, which receives the images sent by the UE, processes them to perform image recognition, and then immediately sends a command to the UE. The NN was trained to recognize people and objects that could be found in the lab, as shown in Fig. 4. Furthermore, to evaluate the use case and retrieve the measurements and relevant KPIs, the traffic on the network was captured with TCPdump [23] at three specific positions in the environment: the UE, the Core, and the APP Server. To extract the E2E latency KPI, the measurement on the UE is the only one needed, but to evaluate OWD it is necessary to extract the timestamps of the packet at different locations and subtract them, as shown in (3) for a TCP uplink packet.

$$\mathrm{OWD(TCP)}_{\mathrm{Uplink}} = \mathrm{Time(TCP)}_{\mathrm{APP\,Server}} - \mathrm{Time(TCP)}_{\mathrm{UE}} \tag{3}$$
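The subtraction in (3) can be sketched as follows, assuming that the same packet can be identified in both captures. Here packets are matched by a hypothetical TCP sequence number, both clocks are assumed to be synchronised, and the timestamps are made up for illustration rather than taken from the captures of this study.

```python
def uplink_owd(ue_capture, server_capture):
    """Eq. (3): per-packet OWD = Time at APP Server - Time at UE.

    Each capture maps a packet key (here, a hypothetical TCP sequence
    number) to the capture timestamp in seconds.
    """
    owd = {}
    for key, t_ue in ue_capture.items():
        t_server = server_capture.get(key)
        if t_server is not None:  # skip packets not seen at the server
            owd[key] = t_server - t_ue
    return owd


# Hypothetical timestamps (seconds) for three TCP segments.
ue = {1000: 10.000, 2448: 10.002, 3896: 10.004}
server = {1000: 10.012, 2448: 10.015}

delays_ms = {seq: round(d * 1000, 3) for seq, d in uplink_owd(ue, server).items()}
print(delays_ms)
```

Packets that appear in only one capture are skipped, so that a missing segment does not produce a spurious delay value.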
When trying to measure both the E2E latency and the OWD of a video frame, another dimension appears, since the measurement covers not just one packet but several.

Fig. 4 Recognition of a chair shown in the image processed by the CV APP Server at 5Tonic
Fig. 5 Video frame packet sequence to measure Service E2E Latency and OWD. Representation of synchronization NTP Server-Client mechanism, measurement extraction points on UE, Packet Core, and APP Server
The video frame can be defined as the group of packets between two packets whose payload contains the string “–Boundary”. Although the stream is sent over TCP, not every packet has its own Acknowledgement (ACK) packet; instead, an ACK acknowledges a group of packets, which does not necessarily correspond to a video frame. Hence, in this case, the response packet considered to measure E2E Latency is the next ACK packet to appear after a video frame (see Fig. 5). To calculate the E2E Service Response Time ($\mathrm{E2E}_{\mathrm{SRT}}$), the OWD of a video frame in the uplink direction ($\mathrm{OWD}_{\mathrm{VU}}$), the processing time ($\tau$), and the transmission time of the response ($\mathrm{OWD}_{\mathrm{RD}}$) are needed, as represented in (4).

$$\mathrm{E2E}_{\mathrm{SRT}} = \mathrm{OWD}_{\mathrm{VU}} + \tau + \mathrm{OWD}_{\mathrm{RD}} \tag{4}$$
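The frame definition above can be sketched in a few lines. This is one plausible reading rather than the tooling used in the study: a frame is taken to start at a data packet whose payload contains the boundary marker, and its E2E latency runs from that first packet to the first ACK seen after the frame's last data packet. The `Packet` type, the marker substring, and the trace are hypothetical.

```python
from dataclasses import dataclass

BOUNDARY = b"Boundary"  # marker substring; the exact delimiter is an assumption


@dataclass
class Packet:
    time: float           # capture timestamp at the UE, in seconds
    payload: bytes = b""  # TCP payload (empty for a pure ACK)
    is_ack: bool = False  # True for an ACK coming back from the server


def frame_latencies(packets):
    """Return one E2E latency per complete video frame in the trace."""
    data = [p for p in packets if not p.is_ack]
    acks = [p for p in packets if p.is_ack]
    starts = [i for i, p in enumerate(data) if BOUNDARY in p.payload]
    latencies = []
    # Pair consecutive boundary packets; a trailing partial frame is ignored.
    for begin, end in zip(starts, starts[1:]):
        first, last = data[begin], data[end - 1]
        ack = next((a for a in acks if a.time > last.time), None)
        if ack is not None:
            latencies.append(ack.time - first.time)
    return latencies


# Hypothetical trace: two complete frames and the start of a third.
packets = [
    Packet(0.000, b"--Boundary header"),
    Packet(0.001, b"jpeg-bytes-1"),
    Packet(0.004, is_ack=True),   # ACK for part of the first frame
    Packet(0.005, b"jpeg-bytes-2"),
    Packet(0.030, is_ack=True),   # first ACK after the first frame
    Packet(0.050, b"--Boundary header"),
    Packet(0.051, b"jpeg-bytes-3"),
    Packet(0.080, is_ack=True),   # first ACK after the second frame
    Packet(0.100, b"--Boundary header"),
]
latencies = frame_latencies(packets)
print(latencies)
```

The ACK at 0.004 s is skipped for the first frame because it arrives before the frame's last data packet, which mirrors the point that an ACK does not necessarily cover a whole video frame.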
To have the minimum possible error and the highest precision, lab synchronization is one of the key components of the study, and we achieved it by using a Network Time Protocol (NTP) Server-Client mechanism, where all devices and nodes are set by the same clock. In addition, we evaluated several scenarios to have a better understanding of the feasibility of this use case on a 5G network when compared to 4G. We used Netem to add additional latency and simulate the location of the Core and APP Server with respect to the RAN. Edge computing, or local range scenario, is a new scenario that is only present in 5G. Below we considered edge computing as a range of 0 km, regional range scenario of 200 km, and the national range scenario of 400 km. In order to do this analysis and comparison between the results in different scenarios, we added a delay in milliseconds (ms) on the packet core interface, 2 ms for a regional scenario and 4 ms for the national scenario.
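The mapping between distance and added delay can be reproduced with a short calculation. The sketch below assumes signal propagation in optical fibre at roughly 200,000 km/s and counts both directions of travel; under these assumptions, 200 km and 400 km yield the 2 ms and 4 ms used here. The text does not state how the values were derived, so this is only one consistent reading. The helper that builds a `tc ... netem delay` argument list follows the standard Linux traffic-control syntax; the interface name `eth0` is hypothetical, and the command is only assembled, not executed.

```python
FIBRE_SPEED_KM_PER_MS = 200.0  # about 2/3 of the speed of light; an assumption


def added_delay_ms(distance_km: float) -> float:
    """Two-way propagation delay over fibre for a given one-way distance."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_MS


def netem_command(interface: str, delay_ms: float) -> list[str]:
    """Build (not run) a tc/netem command that adds a fixed delay."""
    return ["tc", "qdisc", "add", "dev", interface, "root",
            "netem", "delay", f"{delay_ms:g}ms"]


scenarios = {"edge": 0, "regional": 200, "national": 400}  # km, from the text
for name, km in scenarios.items():
    print(name, added_delay_ms(km), "ms")

print(" ".join(netem_command("eth0", added_delay_ms(200))))
```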
The experiments of this work aim to assess the feasibility of a Computer Vision use case in a 5G network. Depending on the application where CV is present, the delay might be crucial. In an automotive scenario, that time is directly related to the velocity (v) of a vehicle and the space (s) traveled when braking, as shown in (5).

$$v = \frac{s}{t} = \frac{s}{\mathrm{E2E}_{\mathrm{SRT}}} \tag{5}$$
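Equation (5) can be evaluated directly once units are fixed. The sketch below assumes the distance is given in metres and the response time in milliseconds, and converts the result to km/h; the 90 ms input is an illustrative value, not a measurement from this study.

```python
def max_velocity_kmh(distance_m: float, e2e_srt_ms: float) -> float:
    """Eq. (5): v = s / E2E_SRT, converted from m/s to km/h."""
    metres_per_second = distance_m / (e2e_srt_ms / 1000.0)
    return metres_per_second * 3.6


def response_time_ms(distance_m: float, velocity_kmh: float) -> float:
    """Inverse of Eq. (5): the E2E_SRT implied by a given velocity."""
    return distance_m / (velocity_kmh / 3.6) * 1000.0


# Illustrative input: a 1 m reaction distance and a 90 ms response time.
v = max_velocity_kmh(1.0, 90.0)
print(round(v, 2))                       # velocity in km/h
print(round(response_time_ms(1.0, v)))   # recovers the 90 ms input
```

Because v is inversely proportional to the response time, every millisecond saved in the network or in processing raises the velocity at which the vehicle can still react within the same distance.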
3 Results The main measurement of this analysis is E2E Latency. First, to set a baseline, we measured Latency with the Ping utility for an ICMP packet, which is considered in this study as a control packet. Second, we had to consider that the video stream in this use case is sent over the Transmission Control Protocol (TCP); since it is a connection-oriented protocol, it guarantees that all sent packets reach the destination in order and makes it possible to take trustworthy measurements. We took the first measurements for a video stream with an MJPEG encoder and 640 × 480 resolution. The Latency for both a control packet and TCP packets is shown in Fig. 6. Third, as previously explained, it is important to consider the Service E2E latency, that is, the latency of transmitting a whole video frame, shown in Fig. 7. Depending on the type of traffic (control packet, TCP packet, or whole video frame), the E2E latency values differ, but all 5G scenarios have lower values than both the 4G regional and national scenarios. The 5G edge computing scenario is the fastest: it reduces the latency by 23% for the presented UC, 66% for control packets, and 44% for TCP packets. As mentioned in Sect. 2.3, OWD can be measured in both directions: uplink and downlink. However, in this scenario, the important measurement is on the uplink, as the traffic load and the corresponding delay are greater when transmitting video uplink than when sending the confirmation of receipt. To make the comparison, we set three different image resolutions (VGA, D1, and HD) with two different encoders (MJPEG and H264) to observe a possible variation in the OWD (see Fig. 8).

Fig. 6 Left: control packet latency boxplot measured with Ping on each scenario. Right: TCP packet latency boxplot measured with SRTT on each scenario

Fig. 7 Service video frame latency boxplot measured with SRTT on each scenario

Fig. 8 Left: comparison of RTT, OWD uplink, and Ack transmission on a 5G network. Right: Video frame OWD boxplot per encoder and size

As mentioned before, all nodes and devices were synchronized with a common clock to achieve minimum error and higher precision. To calculate the propagated error in this environment, we measured the NTP offset in every node every 10 s. If node a is the UE, node b is the Core, and node c is the dockerized CV APP Server, we must apply (6) and (7), as defined in [24], to calculate the propagated error. The total error can be approximated as a sum of normally distributed errors.

$$Q = a + b + c \tag{6}$$

$$\sigma_Q^2 = (\sigma_a)^2 + (\sigma_b)^2 + (\sigma_c)^2 \tag{7}$$
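As a worked check of (7), the short sketch below combines the per-node standard deviations of the NTP offsets reported in this section. The variable names are illustrative; the linear sum is included only for comparison and is not part of (7).

```python
import math

# Standard deviations of the NTP offsets reported in this section (ms).
sigma_ms = {"UE": 0.387, "Core": 0.317, "Container": 0.117}

# Eq. (7): variances add, so the propagated deviation is the root sum of squares.
sigma_q = math.sqrt(sum(s ** 2 for s in sigma_ms.values()))

# For comparison only: the plain (linear) sum of the three deviations.
linear_sum = sum(sigma_ms.values())

print(f"root sum of squares: {sigma_q:.3f} ms")
print(f"linear sum:          {linear_sum:.3f} ms")
```

Evaluated on those values, the root sum of squares in (7) comes to about 0.514 ms, whereas the plain sum of the three deviations is 0.821 ms, which matches the figure quoted in this section.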
For each node, we calculated the standard deviation of the NTP offsets: 0.387 ms on the UE, 0.317 ms on the Core, and 0.117 ms on the Container, which results in a standard propagated error of ±0.821 ms. The CV application consists of different computing actions: transformation of the received images into the NN input, known as a Blob; the NN detection; interpretation of the detections; and command creation. During the tests, the processing time was observed to be similar with both MJPEG and H264 encoding and with different image resolutions, a total of 20.3 ms. In addition, if we calculate the E2E Service Response Time, as defined in (4), for an MJPEG video stream with VGA resolution and assuming an $\mathrm{OWD}_{\mathrm{RD}}$ of 5 ms, the result is 87 ms. The time distribution of each action in the CV APP and the time distribution of the E2E Service Response Time are shown in Fig. 9.

Fig. 9 Left: time distribution of each action performed on the CV application. Right: time distribution of E2E Service Response Time

Furthermore, Availability is defined as the percentage of packets that were successfully delivered through the network. In this study, no packet loss was found, which corresponds to 100% Availability in 5G. Reliability is defined as the percentage of successfully delivered packets within the time constraint required by the service. The video frame OWD latency Cumulative Distribution Functions (CDFs) for the VGA resolution with MJPEG encoding are shown in Fig. 10.

Fig. 10 Video frame OWD latency CDFs for the VGA resolution on each scenario and reliability set to 95%

Based on (5), the maximum velocity of a vehicle was calculated for both networks and all scenarios, considering that the vehicle must detect an object and issue a command within 1 m. Assuming $\mathrm{E2E}_{\mathrm{SRT}}$ as before, but with the OWD corresponding to 95% reliability, the results for the calculated velocity are shown in Table 1.

Table 1 Calculated velocity for each scenario

| Scenario | 5G edge | 5G regional | 5G national | 4G regional | 4G national |
|---|---|---|---|---|---|
| Velocity (km/h) | 40.31 | 39.43 | 37.7 | 35.19 | 34.51 |

Finally, to determine the maximum achievable bandwidth, we used iperf3 [25] on both the 5G and 4G networks, achieving 54.6 Mbit/s and 32.2 Mbit/s, respectively. The demanded throughput for each image resolution is shown in Fig. 11. For both D1 and HD resolutions, the demanded throughput is above the maximum achievable bandwidth of a 4G network, which implies that a downgrade is required to maintain a good Quality of Service.

Fig. 11 Demanded throughput per image resolution and maximum achievable throughput thresholds
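The Availability and Reliability definitions used in this section, and the reading of a delay value off an empirical CDF at a given reliability level, can be sketched as follows. The delay samples and the 60 ms deadline are hypothetical and serve only to exercise the functions; they are not the measurements behind Fig. 10 or Table 1.

```python
import math


def availability(sent: int, delivered: int) -> float:
    """Percentage of packets successfully delivered through the network."""
    return 100.0 * delivered / sent


def reliability(delays_ms, deadline_ms: float, sent: int) -> float:
    """Percentage of sent packets delivered within the service deadline."""
    on_time = sum(1 for d in delays_ms if d <= deadline_ms)
    return 100.0 * on_time / sent


def delay_at_reliability(delays_ms, target_pct: float) -> float:
    """Smallest observed delay not exceeded by at least target_pct of samples."""
    ordered = sorted(delays_ms)
    # Number of samples that must be at or below the returned delay.
    needed = max(1, math.ceil(len(ordered) * target_pct / 100))
    return ordered[needed - 1]


# Hypothetical video-frame OWD samples in milliseconds: 50, 51, ..., 69.
delays = [50 + i for i in range(20)]

print(availability(sent=20, delivered=len(delays)))
print(reliability(delays, deadline_ms=60, sent=20))
print(delay_at_reliability(delays, 95))
```

Using the delay at the 95% level instead of the mean gives a more conservative response time, and hence a lower admissible velocity in (5).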
4 Conclusions This paper presents the improvements of 5G NSA for a Computer Vision model UC with image recognition. As the results show, all explored 5G scenarios have lower E2E latency values for each type of traffic, with edge computing being the fastest. In this UC, the predominant direction of traffic is the uplink, and a slight difference is observed in the OWD results when the resolution of the video is changed. Even accounting for the processing time in the calculation of the E2E Service Response Time, 5G edge scenarios allow a higher vehicle velocity while still detecting an object within 1 m and reacting with a reliability higher than 95% in the network. In addition, the maximum throughput achievable in a 4G network is below the observed demanded throughput for higher resolutions. This implies that a 5G network is needed to obtain a video stream with high resolution and at least 20 fps. Acknowledgements This Master’s thesis was supported by Ericsson and Universidad Carlos III de Madrid. I would like to thank Marc Mollà Roselló for his assistance as the thesis supervisor, and the colleagues from the 5G-EVE project who provided insight and expertise that assisted the research. I also thank the friends and family who provided comments that greatly improved the manuscript.
References
1. What is 5G? https://www.ericsson.com/en/5g/what-is-5g?. Accessed 22 Oct 2020
2. ETSI 5G, https://www.etsi.org/technologies/5g. Accessed 22 Oct 2020
3. ITU towards “IMT for 2020 and beyond”, https://www.itu.int/en/ITU-R/study-groups/rsg5/rwp5d/imt-2020/Pages/default.aspx. Accessed 22 Oct 2020
4. Elayoubi SE et al (2016) 5G service requirements and operational use cases: analysis and METIS II vision. In: 2016 European Conference on networks and communications (EuCNC), Athens, pp 158–162
5. Discover the benefits of 5G, https://www.ericsson.com/en/5g/use-cases. Accessed 22 Oct 2020
6. Faggella D Computer vision applications–shopping, driving and more, https://emerj.com/aisector-overviews/computer-vision-applications-shopping-driving-and-more. Accessed 22 Oct 2020
7. What role will Artificial Intelligence have in the mobile networks of the future? https://www.ericsson.com/en/networks/offerings/network-services/ai-report. Accessed 22 Oct 2020
8. Ericsson Edge computing-a must for 5G success, https://www.ericsson.com/en/digital-services/edge-computing. Accessed 22 Oct 2020
9. Image recognition applications in the era of 5G, https://www.ericsson.com/en/blog/2019/6/distributed-ai-image-recognition-applications. Accessed 22 Oct 2020
10. 5G EVE, https://www.5g-eve.eu/. Accessed 22 Oct 2020
11. Canale S, Tognaccini M, de Pedro LM, Ruiz Alonso JJ, Trichia K, Meridou D et al (2018) D1.1 requirements definition & analysis from participant vertical industries
12. 5TONIC, https://www.5tonic.org/. Accessed 22 Oct 2020
13. 3GPP release 15 overview-IEEE spectrum, https://spectrum.ieee.org/telecom/wireless/3gpprelease-15-overview. Accessed 22 Oct 2020
14. Legouable R, Trichias K, Kritikou Y, Meridou D, Kosmatos E, Skalidi A et al (2019) D2.6 participating vertical industries planning. Zenodo
15. Ruiz Alonso JJ, Benito Frontelo I, Iordache M, Roman R, Kosmatos E, Trichias K et al (2019) D1.4 KPI collection framework. Zenodo
16. Paxson V, Allman M, Chu J, Sargent M (2011) Computing TCP’s retransmission timer. RFC 6298. https://doi.org/10.17487/RFC6298, https://www.rfc-editor.org/info/rfc6298. Accessed 22 Oct 2020
17. Raspberry Pi 3 Model B+, https://www.raspberrypi.org/products/raspberry-pi-3-model-bplus/. Accessed 22 Oct 2020
18. Camera Module V2, https://www.raspberrypi.org/products/camera-module-v2/. Accessed 22 Oct 2020
19. User space Video4Linux, http://www.linux-projects.org/uv4l/. Accessed 22 Oct 2020
20. Uv4l-raspicam, http://www.linux-projects.org/documentation/uv4l-raspicam/. Accessed 22 Oct 2020
21. Resolution, https://elinetechnology.com/definition/resolution/. Accessed 22 Oct 2020
22. Caffe, https://caffe.berkeleyvision.org/. Accessed 22 Oct 2020
23. TCPDUMP & LIBPCAP, https://www.tcpdump.org/. Accessed 22 Oct 2020
24. Harvard, A Summary of Error Propagation, http://ipl.physics.harvard.edu/wp-uploads/2013/03/PS3_Error_Propagation_sp13.pdf. Accessed 22 Oct 2020
25. Iperf3, http://software.es.net/iperf/. Accessed 22 Oct 2020
A Concept for the Use of Chatbots to Provide the Public with Vital Information in Crisis Situations
Daniel Staegemann, Matthias Volk, Christian Daase, Matthias Pohl, and Klaus Turowski
Abstract In times of crisis, when many people experience insecurity, fear, and uncertainty, the government plays a critical role in civil protection and the distribution of vital information. However, in many cases, misleading or contradictory information is spread by various sources, exacerbating the desperation of the local population and leading to wrong decisions, which might worsen the situation. To counteract this problem, a single system that is highly trustworthy and supports citizens in gathering tailored, high-quality information to overcome the crisis appears to be sensible. In this paper, a conceptual approach is presented that attempts to provide this single point of interaction by combining chatbots, robotic process automation, and data analytics for the tailored provisioning of important information, sourced from pre-selected institutions. This assures the highest possible degree of information quality and therefore acceptance in the potential user base, while also allowing for a fast dissemination of new findings and statistics as well as the ability to monitor the users’ behaviour to gain additional insights.
Keywords Chatbot · Robotic process automation · Civil protection · Emergency communication · Big data
D. Staegemann (B) · M. Volk · C. Daase · M. Pohl · K. Turowski Otto-von-Guericke University Magdeburg, Universitaetsplatz 2, 39106 Magdeburg, Germany e-mail: [email protected] M. Volk e-mail: [email protected] C. Daase e-mail: [email protected] M. Pohl e-mail: [email protected] K. Turowski e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_25
1 Introduction While the responsibilities of modern governments are multifarious, one of their primary tasks is to protect their citizens from serious harm. Although this applies at any time, it especially holds true in times of crisis [1]. Examples of such crises include terrorist attacks, earthquakes, tsunamis, and pandemics. Besides the primary mission of dealing with the situation by means of the corresponding operatives, another highly important challenge is to quickly establish a way to effectively communicate important information from a variety of concerned domains to the public [2]. Especially in consideration of the multitude of rivalling sources of information, such as news outlets, social media, blogs, and others that might follow their own agenda instead of the best interest of the recipients, the government can be compared to an enterprise that competes to distribute its goods (trustworthy information) to as many customers (citizens) as possible [3]. For this purpose, it is necessary to make the government’s proposition as attractive as possible. However, most competitors, as well as the government itself, do not directly charge for the provided information. Therefore, price is ruled out as a distinctive feature of an information provisioning service, and other properties, such as the perceived quality of information and the ease of use of the offered solutions, remain as the deciding factors. Furthermore, outdated official information, or information aimed at another audience, might distract from the latest relevant insights. Apart from that, obtaining no information at all, or only part of it, is a real possibility that might result in negative consequences for the person concerned or those around them.
This is exacerbated by the fact that it is not necessarily apparent to the individual at which point all of the relevant information has been acquired or whether something essential is still missing. For this reason, considering their duty to ensure the safety of their citizens as well as public order, even in case of a crisis, many governments are tasked with providing the public with a solution that is suitable to overcome the aforementioned challenges. Consequently, the aim should be to establish a single point of contact as a comprehensive and reliable source of vital information that is easy to access and therefore reaches a wide audience. This leads to the following research question. RQ: How can a communication system that is quick to activate, scalable, easy to use, and always up to date be designed to constitute a single point of interaction for a highly heterogeneous audience that requires reliable information from a variety of domains to overcome crisis situations? While correct understanding of the information provided is undeniably crucial to fully exploit a system that answers the research question, an opportunity for users to ask further questions can be essential. As noted by Pourebrahim et al. [4], governments in disaster situations are usually prominent users of the microblogging service Twitter, but they use it primarily as a one-way communication platform rather than communicating directly with their audience. Assuming that one or more people who work for the government are responsible for the Twitter account, it is no surprise that they cannot answer every single question posed by each user. Therefore, we present a solution that uses a reasonable selection of robotic technologies to provide two-way communication and reduce the problem of ambiguity. The key concepts here are the structuring and analysis of the huge amounts of information by using emerging big data analytics approaches, the collection and updating of data on different systems by incorporating software robots from the field of robotic process automation (RPA), and finally the automated bidirectional distribution of information with the help of traditional chatbots.
2 Background Disasters such as pandemics or attacks on national security pose the problem that they occur unexpectedly and that each emergency protocol usually has to be slightly adapted to the specific scenario [5]. The population’s thirst for information rises, and the amount of available data can be vast and distributed [6]. In order to save lives, a single information system that provides people with everything they need to know is beneficial, especially the rules of conduct in their respective regions. Time is a precious resource in a crisis, and human manpower might be limited as well [2, 5, 6]. Thus, the development of a whole new software solution each time a sudden global or national threat occurs is impractical, especially if the distributed information sources, such as government agencies or research institutions [5], remain largely the same. As an easier alternative, a highly adaptable automation technology should be used in the back end to link the individual sub-processes of data acquisition and processing in order to free the human workforce for more complex administrative tasks that cannot be performed by a robot [5, 6]. The front end, in turn, requires automation technology that enables citizens to access the required data quickly and intuitively. One way to achieve this is a combination of software robots and big data analytics (BDA) in the back end to acquire and process important data from various sources, and traditional chatbots in the front end to receive the requests and communicate the corresponding information.
2.1 Chatbots Although the capabilities of language processing have come a long way since Joseph Weizenbaum presented the first chatbot, called ELIZA, in 1966 [7], the basic principle is still the same. Chatbots mimic a human conversation by analyzing a given input for specific keywords [8]. The chatbot then follows a predefined conversation path to return an appropriate response. By integrating a smart and intuitive interface, it is not only possible to automate complex business processes that involve external users such as customers [9]; a chatbot can also serve many users simultaneously, which is an advantage over the one-to-one service provided by a human employee. Especially in scenarios such as disaster situations, where the fast distribution of information could be crucial for saving lives [2], public interest in currently important details can be very high at an early stage of the crisis but may decline over time as the situation becomes less confusing. The use of non-human interlocutors such as chatbots enables an extremely flexible question-and-answer solution compared to purely human service personnel, as they do not need to be hired for the initial rush or dismissed afterwards. Although it could be argued that users of social networks in particular could help each other with questions, these platforms also offer the opportunity for anti-social behaviour such as the deliberate dissemination of false information [10], which is why a centralized, reliable contact point can be considered advantageous even if the answering of questions is automated and impersonal. With the possibility of extending a chatbot with artificial intelligence and machine learning algorithms [8], this kind of human-like communication is a promising opportunity to improve the distribution of information in chaotic and volatile time periods.
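The keyword principle described above can be illustrated with a minimal sketch. It is not the system proposed in this paper: the keyword table, the responses, and the prefix-matching rule are hypothetical simplifications, and a real deployment would source its answers from vetted institutions.

```python
import re

# Hypothetical keyword -> response table, for illustration only.
RESPONSES = {
    "shelter": "The nearest shelter list is published by your local authority.",
    "water": "Please follow the official guidance on drinking water.",
    "symptom": "Contact the medical hotline if symptoms persist.",
}
FALLBACK = "Sorry, I did not understand. Could you rephrase your question?"


def reply(message: str) -> str:
    """Return the response for the first known keyword found in the message."""
    words = re.findall(r"[a-z]+", message.lower())
    for word in words:
        for keyword, answer in RESPONSES.items():
            # Prefix match so that simple plurals such as "symptoms" still hit.
            if word.startswith(keyword):
                return answer
    return FALLBACK


print(reply("Where can I find SHELTER tonight?"))
print(reply("What symptoms should I watch for?"))
print(reply("Hello"))
```

Because each call is stateless, many users can be served at once, which reflects the scalability argument made in this section.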
2.2 Robotic Process Automation While RPA is primarily intended to increase the efficiency and accuracy of back-office workflows in business organizations [11], it is not limited to this task. The core principle, in contrast to tailor-made individual software solutions, is that software robots work exactly as a human would work [9, 12]. Tasks involving a variety of different information systems can be handled by the robots without having to redevelop a single piece of software [13], which greatly facilitates the revision of the entire process cycle. Furthermore, software robots do not require in-depth programming knowledge [8], since they are largely built by recording or entering the sequence of steps required for a process. Besides the key feature of relieving the strain on human workers [13], whose time can be even more precious in serious situations, software robots bring several other advantages of traditional manufacturing robots to the office area, including round-the-clock availability, high efficiency, and low error susceptibility [8, 9]. In this regard, RPA is ideally suited for data-intensive and repetitive tasks [11], which would be easy for a human employee to learn but can lead to mistakes when executed frequently. Recent research indicates that the integration of additional technologies such as artificial intelligence and big data analysis may lead to even more complex workflows performed by software robots in the future [13]. From processing unstructured inputs such as natural language to advanced learning skills for resolving unexpected problems, robots could come closer to real human-like behaviour [9]. Especially in unforeseen scenarios with numerous different systems from which necessary data must be collected and updated at high frequency, RPA offers a fast and easy-to-integrate solution [13].
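The idea of a robot that is built from a recorded sequence of steps can be sketched abstractly. The following is only an analogy in plain Python and does not use any RPA product: each step is a function acting on a shared context, and all step names and data are hypothetical.

```python
from typing import Callable

Step = Callable[[dict], dict]  # each step reads and returns a shared context


def fetch_case_numbers(ctx: dict) -> dict:
    # Stand-in for reading values from an external system (hypothetical data).
    return {**ctx, "cases": {"north": 12, "south": 30}}


def total_cases(ctx: dict) -> dict:
    # Stand-in for a simple processing step on the collected data.
    return {**ctx, "total": sum(ctx["cases"].values())}


def write_summary(ctx: dict) -> dict:
    # Stand-in for entering the result into another system.
    return {**ctx, "summary": f"Total reported cases: {ctx['total']}"}


def run_robot(steps: list[Step]) -> dict:
    """Execute the recorded steps in order, passing the context along."""
    ctx: dict = {}
    for step in steps:
        ctx = step(ctx)
    return ctx


result = run_robot([fetch_case_numbers, total_cases, write_summary])
print(result["summary"])
```

Changing the process then amounts to reordering or replacing entries in the step list, which loosely mirrors how a recorded workflow is revised without redeveloping the underlying systems.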
2.3 Big Data With an improving capability to produce and capture data, not only has the ability to acquire meaningful information increased, but so have the accompanying challenges. As a result, the term big data was coined, reflecting those increased demands. According to the National Institute of Standards and Technology (NIST), big data “consists of extensive datasets primarily in the characteristics of volume, velocity, variety, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis” [14]. While the engineering of the corresponding big data analytics solutions is a highly sophisticated task [15], the benefits are still widely acknowledged [16, 17]. The list of application areas that already profit from the purposeful and timely analysis of data comprises, but is not limited to, sports [18], business [19], education [20], and healthcare [21]. One further domain that already benefits from big data, but also has plenty of room for further advancements [22–27], is civil protection and disaster management, where timely and correct decisions can be crucial.
3 The Proposed Approach While the clear communication of information from a government to its citizens is always important, this applies even more in times of crisis, when the correct decisions could make the difference between life and death [28]. Since most people have no primary sources of information and are not experts on the relevant topics, they rely on the insights of others as a basis for their own behaviour. However, it can often be time-consuming to obtain the corresponding information, with the evaluation of its timeliness, applicability to the respective situation, and veracity posing additional challenges, especially in times of fake news and the information deluge on social media [29]. As a result, a lot of important information does not reach all of the relevant recipients, and the benefits of the obtained insights are not fully realized. Therefore, it appears sensible to tackle this issue by positioning validated sources as a single point of information for their respective domain, without raising the barrier by introducing a high degree of complexity for the user, which would otherwise contribute to digital inequality [30]. Since the government is seen as the most trusted source of information, it is the evident administrator for this kind of solution [31]. Furthermore, since devices like PCs, tablets, and smartphones are usually widespread and allow for interactive applications, they appear to be an optimal vehicle for such a solution. As the range of potentially important information is wide and might even vary depending on the user’s location, it is usually impossible to present it on just a few static pages. Yet, complex navigation to find the desired information reduces usability and increases the risk of mistakes that lead to incorrect results. Therefore, a more intuitive way of obtaining the information is necessary.
Fig. 1 The schematic overview of the proposed concept
This is facilitated through the use of chatbots, allowing users to reach their information goals by using natural language instead of following complicated navigation paths or being forced to remember distinct commands. As Fig. 1 indicates, this approach also allows for the correct interpretation of slightly erroneous statements and sentence structures, increasing the accessibility for those who might otherwise be excluded, since language barriers can be a serious issue in crisis situations [32]. However, to always convey up-to-date insights, those bots must be linked to reliable sources that are constantly updated. For this purpose, RPA is an ideal means, since its combination of high flexibility and simple setup allows rapid adaptation to any arising needs [12]. The chatbot tasks the software robot with the retrieval of a piece of information that corresponds to the user’s intent and subsequently incorporates it into its next answer. In that manner, new insights are automatically taken into account, and only new types of questions have to be implemented once in order to be permanently available afterwards. To improve performance, an interim data storage could hold frequently requested information for a predefined amount of time, avoiding the need to obtain it afresh every time. However, it is absolutely vital for the system’s success that the selection of the organizations that provide the incorporated information is conducted thoroughly. On the one hand, this assures the quality of the input; on the other hand, it strengthens the confidence of the recipients in the system. Additionally, it is necessary to clearly define each source’s responsibility according to its expertise and other potentially significant factors (e.g. location) and to reflect this in the system’s data flow and modelling. This again increases the quality and also helps to avoid contradictions or ambiguities. Since the expertise for operating
databases might not be available in every organization, and the organizations also lack a complete overview of the solution’s architecture, the administration is centralized and all the information is stored in a comprehensive, scalable data hub. The data suppliers are only given the possibility to modify or add entries regarding predefined aspects and to request the addition of new aspects that they deem lacking. Besides giving the public easy on-demand access to relevant and correct information, the system offers another benefit by allowing for big data analytics. Depending on local regulations, the users’ requests could be aggregated and analyzed to find aspects that attract interest and that might not have been sufficiently clarified through other channels like news or press conferences, so that this shortcoming can subsequently be addressed. This, in turn, would benefit not only the users, through the presentation of highly demanded information, but also the government itself. The detection of potential trends, the identification of hotspots of the ongoing crisis and sentiment analysis could be facilitated. Moreover, by evaluating the requests that could not be adequately served, it is also possible to detect which additional information needs to be incorporated into the system to meet the public’s information needs. Furthermore, in an approach similar to Google Flu Trends [33] or the mining of Twitter and web news to make predictions regarding COVID-19 [34], the analysis of those requests could possibly be used to obtain additional insights when it comes to disease outbreaks and pandemics, thus facilitating better strategic planning. A schematic overview of the envisioned concept is depicted in Fig. 1, with dotted arrows indicating requests and the others symbolizing responses or the entering of new data.
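The interim data storage mentioned in this section, which holds frequently requested information for a predefined amount of time, could be sketched roughly as follows. This is only an illustration under assumptions: the class name `InterimStore`, the `fetch` callable standing in for the software robot, and the injectable `clock` are all hypothetical and not part of the proposed concept.

```python
import time
from typing import Callable, Dict, Tuple


class InterimStore:
    """Keeps answers for a fixed time so the software robot is not called on every request."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical stand-in for the RPA retrieval step
        self._ttl = ttl_seconds    # the "predefined amount of time"
        self._clock = clock        # injectable so that expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, intent: str) -> str:
        """Return a stored answer if it is still fresh, otherwise retrieve it again."""
        now = self._clock()
        cached = self._entries.get(intent)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        answer = self._fetch(intent)
        self._entries[intent] = (now, answer)
        return answer
```

A short holding time keeps answers close to the sources, while a longer one reduces the load on the software robot; which trade-off is appropriate would depend on how quickly the underlying information changes.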
4 Concluding Remarks
In the contribution at hand, a single point of interaction that combines the capabilities of chatbots, RPA and data analytics is proposed, providing an accessible platform for people to inform themselves about ongoing crises. In times of insecurity, fear, and uncertainty, the intended approach shall increase the protection of the civil population through a clear and accurate provision of tailored information. At the current stage, the presented approach constitutes a conceptual model that has not been technically specified or implemented for a concrete country. However, possible pitfalls and considerations have been identified that should be taken into account in the future. These include aspects related not only to the general planning and implementation, but also to the subsequent use and acceptance by the users. Finally, one has to keep in mind that the general awareness and use of such a solution might vary strongly between countries. Unique cultural tendencies, the individual attitude of the population and existing legislation may influence the actual use. While in many countries residents believe in the trustworthiness of their government and, thus, in the goodwill of such a solution, in others the situation can be different. In some cases, people may develop the feeling that, due to the nature of a single point of interaction, the general independence of information could be impaired or the information even manipulated. Especially in those countries with naturally higher
privacy concerns, the massive storing, aggregation, and analysis of public data may be met with scepticism. For that reason, sophisticated technical concepts are required that prevent malicious and unauthorized access by third parties in order to increase the overall trustworthiness and reliability of such a solution. Apart from the single point of interaction, this also includes the data hub. Since the latter comprises all of the relevant data made available to the system, not only effective anonymization techniques, but also comprehensive resilience and fault tolerance strategies should be used, because when faced with an actual crisis and potentially heavy use, it is vital to avoid failures or outages of the system. Furthermore, during the development and integration of the system, researchers from adjacent domains should be consulted. This also applies to large-scale evaluations of the system as well as to observations of the interaction between people and the system. Due to the comprehensive data storage and the data analytics that can be performed, a suitable balance needs to be found between the results that could be presented and those that should be presented. A system that might cause additional fear and panic in case of a disaster would not have the desired effect and might even be counterproductive.
References
1. Schneider SK (2018) Governmental response to disasters: key attributes, expectations, and implications. In: Handbook of disaster research. Springer International Publishing, Cham, pp 551–568
2. Kosugi M, Uchida O (2019) Chatbot application for sharing disaster-information. In: Proceedings of the 2019 international conference on information and communication technologies for disaster management. IEEE, pp 1–2
3. Thomas JC (2013) Citizen, customer, partner: rethinking the place of the public in public management. Public Admin Rev 73:786–796
4. Pourebrahim N, Sultana S, Edwards J, Gochanour A, Mohanty S (2019) Understanding communication dynamics on Twitter during natural disasters: a case study of Hurricane Sandy. Int J Disaster Risk Reduct 37:101176
5. Genc Z, Heidari F, Oey MA, van Splunter S, Brazier FMT (2013) Agent-based information infrastructure for disaster management. In: Intelligent systems for crisis management. Springer, pp 349–355
6. Tsai M-H, Chen J, Kang S-C (2019) Ask Diana: a keyword-based chatbot system for water-related disaster management. Water 11:234–252
7. Weizenbaum J (1966) ELIZA - a computer program for the study of natural language communication between man and machine. Commun ACM 9:36–45
8. Rutschi C, Dibbern J (2019) Mastering software robot development projects: understanding the association between system attributes & design practices. In: Proceedings of the 52nd Hawaii international conference on system sciences
9. Scheer A-W (2019) The development lines of process automation. In: The art of structuring. Springer International Publishing, Cham, pp 213–220
10. Mirbabaie M, Bunker D, Stieglitz S, Marx J, Ehnis C (2020) Social media in times of crisis: learning from Hurricane Harvey for the coronavirus disease 2019 pandemic response. J Inf Technol, 026839622092925
11. Siderska J (2020) Robotic process automation - a driver of digital transformation? Eng Manag Prod Serv 12:21–31
12. Asatiani A, Penttinen E (2016) Turning robotic process automation into commercial success - case OpusCapita. J Inf Technol Teach Cases 6:67–74
13. van der Aalst WMP, Bichler M, Heinzl A (2018) Robotic process automation. Bus Inf Syst Eng 60:269–272
14. NIST (2019) NIST big data interoperability framework: Volume 1, Definitions, Version 3. National Institute of Standards and Technology, Gaithersburg, MD
15. Volk M, Staegemann D, Pohl M, Turowski K (2019) Challenging big data engineering: positioning of current and future development. In: Proceedings of the IoTBDS 2019, pp 351–358
16. Müller O, Fay M, Vom Brocke J (2018) The effect of big data and analytics on firm performance: an econometric analysis considering industry characteristics. J Manag Inf Syst 35:488–509
17. Günther WA, Rezazade Mehrizi MH, Huysman M, Feldberg F (2017) Debating big data: a literature review on realizing value from big data. J Strateg Inf Syst 26:191–209
18. Aversa P, Cabantous L, Haefliger S (2018) When decision support systems fail: insights for strategic information systems from Formula 1. J Strateg Inf Syst 27:221–236
19. Staegemann D, Volk M, Daase C, Turowski K (2020) Discussing relations between dynamic business environments and big data analytics. CSIMQ, 58–82
20. Häusler R, Staegemann D, Volk M, Bosse S, Bekel C, Turowski K (2020) Generating content-compliant training data in big data education. In: Proceedings of the 12th international conference on computer supported education. SCITEPRESS-Science and Technology Publications, pp 104–110
21. Safa B, Zoghlami N, Abed M, Tavares JMRS (2019) Big data for healthcare: a survey. IEEE Access 7:7397–7408
22. Akter S, Wamba SF (2019) Big data and disaster management: a systematic review and agenda for future research. Ann Oper Res 283:939–959
23. Athanasis N, Themistocleous M, Kalabokidis K, Papakonstantinou A, Soulakellis N, Palaiologou P (2018) The emergence of social media for natural disasters management: a big data perspective. Int Arch Photogramm Remote Sens Spatial Inf Sci XLII-3/W4, 75–82
24. Domdouzis K, Akhgar B, Andrews S, Gibson H, Hirsch L (2016) A social media and crowdsourcing data mining system for crime prevention during and post-crisis situations. J Syst Info Tech 18:364–382
25. Ragini JR, Anand PR, Bhaskar V (2018) Big data analytics for disaster response and recovery through sentiment analysis. Int J Inf Manage 42:13–24
26. Wu D, Cui Y (2018) Disaster early warning and damage assessment analysis using social media data and geo-location information. Decis Support Syst 111:48–59
27. Yu M, Yang C, Li Y (2018) Big data in natural disaster management: a review. Geosciences 8
28. Wang B, Zhuang J (2017) Crisis information distribution on Twitter: a content analysis of tweets during Hurricane Sandy. Nat Hazards 89:161–181
29. Mirbabaie M, Bunker D, Deubel A, Stieglitz S (2019) Examining convergence behaviour during crisis situations in social media - case study on the Manchester bombing 2017. In: Elbanna A, Dwivedi YK, Bunker D, Wastell D (eds) Smart working, living and organising, vol 533. Springer International Publishing, Cham, pp 60–75
30. Beaunoyer E, Dupéré S, Guitton MJ (2020) COVID-19 and digital inequalities: reciprocal impacts and mitigation strategies. Comput Human Behav, 106424
31. Bunker D (2020) Who do you trust? The digital destruction of shared situational awareness and the COVID-19 infodemic. Int J Inf Manag
32. Fischer D, Posegga O, Fischbach K (2016) Communication barriers in crisis management: a literature review. In: Proceedings of the twenty-fourth European conference on information systems
33. Pervaiz F, Pervaiz M, Abdur Rehman N, Saif U (2012) FluBreaks: early epidemic detection from Google flu trends. J Med Int Res 14:e125
34. Jahanbin K, Rahmanian V (2020) Using Twitter and web news mining to predict COVID-19 outbreak. Asian Pac J Trop Med
Fuzzy Reinforcement Learning Multi-agent System for Comfort and Energy Management in Buildings Panagiotis Kofinas, Anastasios Dounis, and Panagiotis Korkidis
Abstract In this paper, a multi-agent system (MAS) is proposed to maintain the comfort of a building at a high level while simultaneously reducing the overall energy consumption. The multi-agent system consists of three independent agents, each dedicated to one comfort factor: thermal comfort, visual comfort and air quality. The fuzzy Q-learning algorithm is used in all agents in order to deal with the continuous state-action space. Simulation results highlight the superiority of the system over a simple on-off algorithm, as a 3% reduction in energy consumption is observed and the comfort index remains high throughout the entire simulation.
Keywords Multi-agent system · Building · Fuzzy reinforcement learning · Q-learning · Energy management · Comfort management
P. Kofinas (corresponding author) · A. Dounis · P. Korkidis
Department of Biomedical Engineering, University of West Attica, Athens, Greece
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_26
1 Introduction
Energy management is critical in modern societies, from both an environmental and a financial point of view. Since people spend most of their lives inside buildings, maintaining a good indoor environment that assures health and productivity [1], within an energy-efficient framework, is a crucial task. Three major factors affecting the overall comfort in a building are the thermal comfort, the indoor air quality and the visual comfort. The indoor temperature is an indicator of thermal comfort, while the CO2 concentration and the illumination level are indicators of indoor air quality and visual comfort, respectively [2]. These indicators are characterised by the following indexes [3]: Thermal Comfort Index (TCI),
Indoor Air Quality (IAQ) and Visual Comfort Index (VCI), respectively. To keep the indicators at acceptable levels, the heating/cooling system, the ventilation system and the artificial lighting system, which act as actuators in the control framework, have to be controlled. There are many methodologies for thermal comfort and energy management in buildings with HVAC actuators; however, most of these studies focus only on temperature control [4] or temperature/relative humidity control [5–10]. Few papers consider maintaining a high level of visual comfort by using artificial lighting [11, 12]. In addition, few papers propose approaches for ventilation control to improve the indoor air quality in the building [1, 13]. None of these papers consider the building’s indoor comfort as a unified system of the three aforementioned factors. Some approaches based on fuzzy logic controllers [14–16] and multi-agent systems (MAS) [17–19] have been proposed for maintaining a high level of indoor comfort while simultaneously reducing the overall consumption of the building by controlling the indoor temperature as well as the CO2 concentration and the indoor illuminance. These approaches mainly depend on expert knowledge for setting the parameters, and they do not deploy any learning mechanism for adapting their behaviour when the building’s dynamics change. Moreover, in most cases, a central agent acts as a coordinator of all the local agents, so that the whole system can fail if the central agent fails. Reinforcement learning methods have been introduced in the context of energy management to equip the proposed systems with learning mechanisms; however, these approaches focus on a single-agent system [20]. Some of them aim to achieve thermal comfort and energy saving by controlling only the indoor temperature [21, 22], or visual comfort and energy saving by controlling only the artificial lighting system [23].
The single-agent control scheme is also used in cases where more than one control variable is used in order to control both the thermal comfort and the air quality [24]. In this case, the state-action space becomes very large. This paper proposes a fully decentralised reinforcement learning MAS for controlling the temperature, the illuminance and the air quality of an office building. The MAS consists of three agents, each responsible for one key factor: one agent for achieving thermal comfort, one for visual comfort and one for improving air quality. Each agent performs a modified Q-learning approach with fuzzy approximators [25] to deal with the continuous state-action space. The aim of the MAS is to optimise the overall comfort of the building and simultaneously reduce the overall energy consumption. The contribution of this paper is summed up as follows:
– A fully decentralised MAS, in which the agents act independently of each other, is deployed for improving the overall comfort of an office building. This control framework is robust, since any fault or failure of an agent will not affect the overall system performance.
– Q-learning is implemented in each agent so that it can adapt its behaviour to changes in the environment.
– Fuzzy approximation is used in each agent to deal with the continuous state-action space.
– Both the indoor comfort and the energy consumption are embedded in the reward signal in order to achieve indoor comfort and energy saving simultaneously.
The paper is organised as follows: reinforcement learning, fuzzy Q-learning and MAS preliminaries are discussed in Sect. 2. Sections 3 and 4 provide an analysis of the building’s model and the control architecture. Simulation results are illustrated and discussed in Sect. 5. Finally, conclusions and future work are given in Sect. 6.
2 Reinforcement Learning
Reinforcement learning (RL) is a family of algorithms inspired by the way humans and other living beings learn. The goal of reinforcement learning is to extract a policy, i.e. a mapping between states and actions, through the exploration of possible state-action pairs. The signal that indicates whether an agent acts well or not is called the reward signal.
2.1 Fuzzy Q-Learning
Q-learning [26] is a tabular form of reinforcement learning in which the agent gradually builds a Q-function. This function estimates the future discounted rewards for actions in given states. For a given state $x$ and action $a$, the output of the Q-function is denoted by $Q(x, a)$. The values of the Q-function are stored in a table. The following equation gives the update of the Q-table:
$$Q'(x, a) \leftarrow Q(x, a) + g\left(R(x, a, x') + \gamma \max_{a} Q(x', a) - Q(x, a)\right) \tag{1}$$
where $Q'(x, a)$ is the new value of the Q-function output for the given state when the action $a$ is applied. The new value is obtained after receiving the reward signal $R(x, a, x')$. In principle, the agent, governed by an exploration/exploitation algorithm at the current state $x$, selects an action $a$ and transits into the next state $x'$. It receives the reward of the transition, $R(x, a, x')$, and updates the Q-value of the pair $(x, a)$, under the assumption that it follows the optimal policy from the state $x'$ onwards. In the update equation, $g$ denotes the learning rate and $\gamma$ the discount factor. The learning rate lies in the range [0, 1]. Small learning rate values result in small changes to the agent’s prior knowledge, while higher values make the agent give more weight to newer information. If $g$ is set to 0, the agent does not acquire new knowledge, while if $g$ is set to 1, the agent considers exclusively the new information. The discount factor $\gamma$ determines how important the future rewards are. If $\gamma$ is set to 0, the agent considers only the current rewards, and if $\gamma$ is set to 1, the agent will strive for a long-term high reward. In our study, the learning rate is set to
0.1 and the discount factor is set to 0.9. This method has two main drawbacks: it is impractical when the state-action space is very large, and it cannot be applied to a continuous state-action space. Fuzzy logic systems (FLSs) can be used to overcome these drawbacks. FLSs can achieve good approximations of the Q-values and turn Q-learning into fuzzy Q-learning [27], which can be used both for continuous state-action spaces and for reducing the state-action space. The fuzzy Q-learning algorithm can be described as follows:
1. The state $x$ is observed.
2. For each fired fuzzy rule, an output is selected based on the exploration/exploitation algorithm.
3. The global output $a(x)$ and the corresponding value $Q(x, a)$ are computed by
$$a(x) = \frac{\sum_{i=1}^{N} \alpha_i(x)\, a_i}{\sum_{i=1}^{N} \alpha_i(x)} \tag{2}$$
$$Q(x, a) = \frac{\sum_{i=1}^{N} \alpha_i(x)\, q[i, i^{\dagger}]}{\sum_{i=1}^{N} \alpha_i(x)} \tag{3}$$
where $N$ is the number of fired fuzzy rules, $\alpha_i(x)$ is the firing degree of fuzzy rule $i$, $a_i$ is the action selected for fuzzy rule $i$ and $q[i, i^{\dagger}]$ is the corresponding Q-value for the pair of the fuzzy rule $i$ and the action $i^{\dagger}$. The action $i^{\dagger}$ is the action that the exploration/exploitation algorithm selects.
4. The global action $a(x)$ is applied and the next state $x'$ is observed.
5. The reward $R$ is computed.
6. The Q-values are updated according to the formula
$$\Delta q[i, i^{\dagger}] = g\, \Delta Q\, \frac{\alpha_i(x)}{\sum_{i=1}^{N} \alpha_i(x)} \tag{4}$$
where $\Delta Q = R + \gamma V(x') - Q(x, a)$ and
$$V(x') = \frac{\sum_{i=1}^{N} \alpha_i(x')\, q[i, i^{*}]}{\sum_{i=1}^{N} \alpha_i(x')} \tag{5}$$
and $q[i, i^{*}]$ is the corresponding Q-value for the pair of the fuzzy rule $i$ and the action $i^{*}$. The action $i^{*}$ is the action that has the maximum Q-value for the fuzzy rule $i$. A schematic diagram describing the method of this paper is given in Fig. 1.
Fig. 1 Fuzzy Q-Learning agent interacting with environment
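The six steps of the fuzzy Q-learning algorithm can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the class and method names are hypothetical, the firing degrees of all rules are assumed to be supplied by the caller as a list (zero for rules that are not fired), and a plain epsilon-greedy choice per rule stands in for the exploration/exploitation algorithm, which is an assumption.

```python
import random
from typing import List, Optional, Sequence, Tuple


class FuzzyQAgent:
    """Fuzzy Q-learning over N rules that share one set of candidate actions."""

    def __init__(self, n_rules: int, actions: Sequence[float], g: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.01,
                 rng: Optional[random.Random] = None) -> None:
        self.actions = list(actions)
        # q[i][k] is the Q-value of candidate action k in fuzzy rule i
        self.q: List[List[float]] = [[0.0] * len(self.actions) for _ in range(n_rules)]
        self.g, self.gamma, self.epsilon = g, gamma, epsilon
        self.rng = rng if rng is not None else random.Random(0)

    def select(self, firing: Sequence[float]) -> Tuple[List[int], float, float]:
        """Steps 2 and 3: pick one action per rule, then evaluate Eqs. 2 and 3."""
        chosen = []
        for row in self.q:
            if self.rng.random() < self.epsilon:   # explore (assumed epsilon-greedy)
                chosen.append(self.rng.randrange(len(row)))
            else:                                  # exploit the best known action
                chosen.append(row.index(max(row)))
        total = sum(firing)
        action = sum(w * self.actions[k] for w, k in zip(firing, chosen)) / total
        q_value = sum(w * self.q[i][k]
                      for i, (w, k) in enumerate(zip(firing, chosen))) / total
        return chosen, action, q_value

    def state_value(self, firing: Sequence[float]) -> float:
        """Eq. 5: firing-weighted mean of the best Q-value of each rule."""
        return sum(w * max(row) for w, row in zip(firing, self.q)) / sum(firing)

    def update(self, firing: Sequence[float], chosen: Sequence[int], q_value: float,
               reward: float, next_firing: Sequence[float]) -> None:
        """Step 6: distribute the temporal-difference error over the rules (Eq. 4)."""
        delta = reward + self.gamma * self.state_value(next_firing) - q_value
        total = sum(firing)
        for i, (w, k) in enumerate(zip(firing, chosen)):
            self.q[i][k] += self.g * delta * w / total
```

In use, `select` would be called with the firing degrees of the current state, the returned global action applied to the environment, and `update` called once the reward and the firing degrees of the next state are known.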
2.2 Multi-agent System (MAS) and Q-Learning
A system composed of two or more agents is called a MAS. The agents have the ability to interact with the environment and commonly also with each other [28]. A MAS is commonly used for problems that a single agent cannot solve. Specifically, MASs are used in problems where the solution is very difficult for one agent to find or in problems that are physically distributed, and their target is to maximise the total discounted reward [29]. The most common approaches to Q-learning in MAS are:
– Markov Decision Process (MDP) learning [30], where the MAS is treated as a single agent, i.e. it has multiple state variables and a vector action.
– Independent learners, where each agent ignores the presence of the other agents and learns its own policy [31].
– Coordinated RL, where an agent coordinates its actions only with its neighbours and ignores the presence of the other agents [32].
– Distributed value function, where each agent has a local Q-function based on its own actions and updates this function by embedding its neighbours’ Q-functions [33].
3 Building Modelling and Description
The building’s model consists of three subsystems: the heating/cooling balance model, the daylighting and artificial lighting system and the ventilation system. Let us start with the heating/cooling balance model. In the current study, we consider a space of 50 m² with a 3 m² window located on the north side of the building. Details and characteristics of the building can be found in Table 1. The heating system of the building introduces hot air into the space [34]. The heat flow into the space is given by
$$\frac{dQ}{dt} = (T_{heater} - T_{room}) \cdot \dot{M} \cdot c \cdot u_{h/c}, \quad \text{where } u_{h/c} \in [0, 1] \tag{6}$$
and $dQ/dt$ is the heat flow rate into the space, $T_{heater}$ is the temperature of the hot air, $T_{room}$ is the temperature of the room’s air, $\dot{M}$ is the mass flow rate of the air from the heating system, $c$ is the heat capacity of air at constant pressure and $u_{h/c}$ is the control signal. The derivative of the space temperature over time is expressed as
$$\frac{dT_{room}}{dt} = \frac{1}{M_a \cdot c}\left(\frac{dQ_{heater}}{dt} - \frac{dQ_{losses}}{dt}\right) \tag{7}$$
where $dQ_{heater}/dt$ is the heat flow of Eq. 6, $M_a$ is the mass of the air inside the space and
$$\frac{dQ_{losses}}{dt} = \frac{T_{room} - T_{out}}{R_{eq}} \tag{8}$$
where $R_{eq}$ is the equivalent thermal resistance of the space and $T_{out}$ is the ambient temperature. The cooling system introduces cold air into the space [35]. The heat flow into the space is given by
$$\frac{dQ}{dt} = (T_{cooler} - T_{room}) \cdot \dot{M} \cdot c \cdot u_{h/c}, \quad \text{where } u_{h/c} \in [-1, 0] \tag{9}$$
and $T_{cooler}$ is the temperature of the cold air. In this case,
$$\frac{dQ_{losses}}{dt} = \frac{T_{out} - T_{room}}{R_{eq}} \tag{10}$$
$$\frac{dT_{room}}{dt} = \frac{1}{M_a \cdot c}\left(\frac{dQ_{losses}}{dt} - \frac{dQ_{cooler}}{dt}\right) \tag{11}$$
Details about the parameters of the heating/cooling system can be found in Table 1.
Table 1 Thermal parameters of the building
Parameter                                   Value
$\dot{M}$                                   2600 kg/h
$T_{heater}$                                40 °C
$c$                                         1005.4 J/(kg·K)
$M_a$                                       183.75 kg
$R_{eq}$                                    1.0229 × 10^-6 K/W
$T_{cooler}$                                16 °C
House dimensions (length, width, height)    10 m, 5 m, 3 m
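To make the heating/cooling balance of Eqs. 6–11 concrete, the following sketch integrates the room temperature with a simple explicit Euler scheme using the values of Table 1. Several points are assumptions and not taken from the paper: time is measured in hours so that it matches the kg/h unit of the mass flow rate, the equivalent thermal resistance is numerically treated as if it were given in K·h/J (the table lists K/W), the Euler scheme itself is an illustrative choice, and all function names are hypothetical.

```python
M_DOT = 2600.0      # kg/h, air mass flow rate (Table 1)
C_AIR = 1005.4      # J/(kg K), heat capacity of air (Table 1)
M_AIR = 183.75      # kg, mass of the air inside the space (Table 1)
T_HEATER = 40.0     # degrees C, hot air temperature (Table 1)
T_COOLER = 16.0     # degrees C, cold air temperature (Table 1)
R_EQ = 1.0229e-6    # ASSUMPTION: interpreted as K h/J so that all flows are in J/h


def room_temperature_rate(t_room: float, t_out: float, u: float) -> float:
    """Return dT_room/dt in K/h for a control signal u in [-1, 1]."""
    if not -1.0 <= u <= 1.0:
        raise ValueError("control signal must lie in [-1, 1]")
    if u >= 0.0:
        q_heater = (T_HEATER - t_room) * M_DOT * C_AIR * u   # Eq. 6
        q_losses = (t_room - t_out) / R_EQ                    # Eq. 8
        return (q_heater - q_losses) / (M_AIR * C_AIR)        # Eq. 7
    q_cooler = (T_COOLER - t_room) * M_DOT * C_AIR * u        # Eq. 9
    q_losses = (t_out - t_room) / R_EQ                        # Eq. 10
    return (q_losses - q_cooler) / (M_AIR * C_AIR)            # Eq. 11


def simulate(t_room: float, t_out: float, u: float, hours: float,
             dt: float = 0.001) -> float:
    """Explicit Euler integration with a constant control signal and outdoor temperature."""
    for _ in range(round(hours / dt)):
        t_room += dt * room_temperature_rate(t_room, t_out, u)
    return t_room
```

Under these assumptions, full heating with a constant outdoor temperature drives the room towards the temperature at which the heat supplied by the hot air balances the losses through the envelope.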
Let us now discuss the daylighting and artificial lighting system. In order to calculate the diffused horizontal illuminance provided by the sky into the building, Eq. 12 is used. Specifically, this equation calculates the horizontal illuminance at a point p inside the space [36]:
$$E_{p,d} = \frac{r_w L}{2} \left\{ \frac{z}{\sqrt{h_p^2 + z^2}} \left[ \tan^{-1}\frac{x_w + w_w - x_p}{\sqrt{h_p^2 + z^2}} + \tan^{-1}\frac{x_w + x_p}{\sqrt{h_p^2 + z^2}} \right] - \frac{z}{\sqrt{(h_p + h_w)^2 + z^2}} \left[ \tan^{-1}\frac{x_w + w_w - x_p}{\sqrt{(h_p + h_w)^2 + z^2}} + \tan^{-1}\frac{x_p + x_w}{\sqrt{(h_p + h_w)^2 + z^2}} \right] \right\} \tag{12}$$
where $E_{p,d}$ is the horizontal diffused illuminance at the point $p$, $L$ is the luminance of the sky, $z$ is the horizontal distance between the point $p$ and the window, $h_p$ is the height between the lower edge of the window and the point $p$, $h_w$ is the height of the window, $w_w$ is the width of the window, $x_w$ is the distance between the left wall and the left edge of the window, $x_p$ is the distance between the point $p$ and the left wall and $r_w$ is the transparency of the window. In order to compute the diffused horizontal illuminance, a sensor is installed outside the building close to the window. In this way we can compute the diffused horizontal illuminance at a point located close to the south wall of the building, which is among the most shaded points inside the building. We assume that the diffused horizontal illuminance provided by the sky is distributed uniformly with this value, and the control decisions are taken according to it. The actuators are fluorescent lamps, with a luminous efficacy of 100 lm/W [37], and the window, which is an electro-chromic window [33] that can change its transparency when a small voltage is applied. The installed power of the artificial lighting is 400 W, which can provide a maximum illuminance of 800 lux over a surface of 50 m². The relation between the illuminance and the power consumption is assumed to be linear. We also assume that the diffused horizontal illuminance provided by the artificial lighting is distributed uniformly over the whole surface of the building. Details about the lighting system can be found in Table 2.
Table 2 Parameters of the building
Parameter                            Value
Number of windows                    1
Window dimensions (width, height)    3 m, 1 m
$r_w$                                0.78
$z$                                  10 m
$h_p$                                0.5 m
$h_w$                                1 m
$w_w$                                3 m
$x_w$                                3.5 m
$x_p$                                2.5 m
As far as the ventilation system is concerned, the differential equation that governs the generation and decay of CO2, based on mass balance considerations and for a constant volume of air inside the building, is expressed as follows:
$$\frac{dC}{dt} = \frac{q(C_{out} - C) + R N}{V} \tag{13}$$
where $V$ is the volume of the space, $R$ is the generation rate of CO2 by a single occupant, $N$ is the number of occupants inside the space, $C_{out}$ is the outdoor concentration of CO2, $C$ is the indoor concentration of CO2 and $q$ is the ventilation rate. For an office building, according to DIN 1946, the air must be renewed between four and six times per hour. For a space of 150000 L, the ventilation rate of the ventilation system must therefore be at least 600000 L/h. The installed power for such a system is approximately 165 W, assuming a relationship of 0.5 W/(L/s). The generation rate per occupant equals 29.88 L/h [19], and the number of occupants inside the building follows a repeating sequence in the range [0, 10] (Fig. 2 and Table 3).
Table 3 Parameters of the ventilation system
Parameter            Value
$R$                  29.88 L/h
$q$ (maximum value)  600000 L/h
$C_{out}$            400 ppm
$V$                  150000 L
Fig. 2 Number of occupants inside the building
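The CO2 mass balance of Eq. 13 can be explored numerically with the values of Table 3. The sketch below is an illustration under assumptions: the concentration is tracked in ppm, so the occupant generation rate (a volume flow of CO2 in L/h) is multiplied by 10^6 to keep the units consistent, which is an interpretation of Eq. 13 rather than something the paper states; the ventilation rate is expressed as a fraction of its maximum; and the function names and the explicit Euler scheme are hypothetical choices.

```python
V_SPACE = 150_000.0   # L, volume of the space (Table 3)
Q_MAX = 600_000.0     # L/h, maximum ventilation rate (Table 3)
R_OCC = 29.88         # L/h, CO2 generation rate per occupant (Table 3)
C_OUT_PPM = 400.0     # ppm, outdoor CO2 concentration (Table 3)


def co2_rate_ppm(c_ppm: float, n_occupants: float, vent_fraction: float) -> float:
    """Return dC/dt in ppm/h following Eq. 13."""
    q = vent_fraction * Q_MAX
    # ASSUMPTION: the factor 1e6 converts the generated CO2 volume fraction to ppm
    return (q * (C_OUT_PPM - c_ppm) + R_OCC * n_occupants * 1e6) / V_SPACE


def simulate_co2(c_ppm: float, n_occupants: float, vent_fraction: float,
                 hours: float, dt: float = 0.001) -> float:
    """Explicit Euler integration with constant occupancy and ventilation."""
    for _ in range(round(hours / dt)):
        c_ppm += dt * co2_rate_ppm(c_ppm, n_occupants, vent_fraction)
    return c_ppm
```

With these numbers, ten occupants and full ventilation settle at roughly 898 ppm, while the same occupancy without ventilation raises the concentration by about 1992 ppm per hour.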
4 Multi-agent System (MAS)
The MAS consists of a group of three agents $A = \{AG_1, AG_2, AG_3\}$, where $AG_1$ is the temperature agent, $AG_2$ is the illuminance agent and $AG_3$ is the CO2 agent. Figure 3 illustrates the MAS. Black arrows represent set points defined by the users and outdoor measurements from sensors. Blue arrows represent the control signals produced by the agents. Red arrows represent measurement signals from the actuators (power consumption) and green arrows represent indoor measurements. The signals of the black and the green arrows define the states of the agents. Furthermore, the signals of these two groups, in combination with the signals of the red arrows, constitute the reward signal. All signals are normalised to the range [0, 1] for signals with positive values and to the range [−1, 1] for signals with both positive and negative values. The states of the agents are defined by a total of six fuzzy state variables $X_i$, two for each agent. Five membership functions (MFs) are used for each input that takes only positive values, and seven MFs for each input that takes both positive and negative values (Fig. 4). We denote the membership functions by the linguistic variables PVB, PB, PM, PS, Z, NS, NM and NB, where P stands for Positive, V for Very, B for Big, M for Medium, S for Small, Z for Zero and N for Negative. The temperature agent has two inputs:
– The outdoor temperature, normalised to the range [0, 1];
– The error $e_T$ between the set point of the users and the indoor temperature, normalised to [−1, 1].
Fig. 3 Proposed MAS
Fig. 4 Membership functions of inputs
An input with five MFs and an input with seven MFs are used, resulting in 35 states corresponding to an equal number of fuzzy rules. Five singleton fuzzy sets 1 1 1 1 + −0.01 + 01 + 0.01 + 0.1 } are used for the output vector of the temperA1 = { −0.1 ature’s agent. The global action defines the percentage of the air flow rate by the heating/cooling system according to its maximum capacity. Positive signal actuates the heating system while negative signal actuates the cooling system. The reward R A1 of the agent is given by R A1 (x, α, x ) = −|eT | − 0.1· PH C
(14)
where $P_{HC}$ is the power consumption of the heating/cooling system. The illuminance agent has two inputs:
– The indoor horizontal illuminance, normalised in the range [0, 1];
– The error $e_L$ between the set point of the users and the total indoor illuminance, normalised in the range [−1, 1].
Again, there is one input with five MFs and a second input with seven MFs, resulting in 35 states corresponding to an equal number of fuzzy rules. The illuminance agent's output vector has five singleton fuzzy sets, $A_2 = \left\{ \frac{1}{-0.45} + \frac{1}{-0.15} + \frac{1}{0} + \frac{1}{0.15} + \frac{1}{0.45} \right\}$. For a positive signal, the global action defines the percentage change of the power consumed by the artificial lighting system relative to its nominal operating power, while a negative signal changes the transparency of the electro-chromic window. The reward $R_{A_2}$ of the agent is given by

$$R_{A_2}(x, \alpha, x') = -|e_L| - 0.1 \cdot P_{AL} \qquad (15)$$
where $P_{AL}$ is the power consumption of the artificial lighting system. The CO2 agent has two inputs:
– The number of occupants inside the building, normalised in the range [0, 1];
– The error $e_{co}$ between the set point of the users and the indoor CO2 concentration, normalised in the range [−1, 1].
There are two inputs with five MFs each, which results in 25 states corresponding to an equal number of fuzzy rules. The output vector of the CO2 agent has five singleton fuzzy sets, $A_3 = \left\{ \frac{1}{-0.3} + \frac{1}{-0.06} + \frac{1}{0} + \frac{1}{0.1} + \frac{1}{0.15} \right\}$. The global action defines the percentage change of the power consumed by the ventilation system relative to its nominal operating power. The reward $R_{A_3}$ of the agent is given by

$$R_{A_3}(x, \alpha, x') = -|e_{co}| - 0.1 \cdot P_V \qquad (16)$$
where $P_V$ is the power consumption of the ventilation system. The same exploration/exploitation algorithm is used by all the agents. To increase an agent's exploration capability, when it visits a new state we allow it to explore for 500 rounds per state and then check for, and perform, any actions that have not been performed at all. The agent exploration is set to 1% while the agent exploitation is set to 99%. The consequents of the fuzzy rules are chosen by trial and error. The fuzzy singleton sets of each agent's output vector represent different levels of the continuous control signal.
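The exploration/exploitation scheme described above can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes discrete state identifiers, uniform random exploration during the first 500 visits to a state, and greedy selection on a hypothetical list of Q-values afterwards. The class and attribute names are illustrative.

```python
import random

EXPLORE_ROUNDS = 500   # initial exploration rounds per state (from the text)
EPSILON = 0.01         # 1% exploration, 99% exploitation (from the text)


class ExplorationPolicy:
    """Hypothetical helper that picks an action index for a given state."""

    def __init__(self, n_actions, rng=None):
        self.n_actions = n_actions
        self.rng = rng if rng is not None else random.Random()
        self.visits = {}   # state -> number of visits so far
        self.tried = {}    # state -> set of actions already performed

    def select(self, state, q_values):
        visits = self.visits.get(state, 0)
        tried = self.tried.setdefault(state, set())
        self.visits[state] = visits + 1

        if visits < EXPLORE_ROUNDS:
            # New state: explore at random (assumed uniform).
            action = self.rng.randrange(self.n_actions)
        else:
            untried = [a for a in range(self.n_actions) if a not in tried]
            if untried:
                # Perform any action that has never been performed in this state.
                action = untried[0]
            elif self.rng.random() < EPSILON:
                action = self.rng.randrange(self.n_actions)
            else:
                action = max(range(self.n_actions), key=lambda a: q_values[a])
        tried.add(action)
        return action


policy = ExplorationPolicy(n_actions=5, rng=random.Random(0))
q = [0.0, 0.0, 0.0, 0.0, 1.0]
for _ in range(EXPLORE_ROUNDS):
    policy.select("s0", q)
print(policy.select("s0", q))
```

In fuzzy Q-learning the selection would typically be made per fired rule rather than per crisp state; the per-state bookkeeping here is kept only to make the visit-count logic easy to follow.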
5 Simulation Results
A one-year period, with a simulation step of 0.001 h, is chosen as the time span of our numerical experiments. The data for the outdoor temperature and the diffuse horizontal illuminance of the sky are taken from the EnergyPlus weather database for Athens, Greece, with a 2 h sample time for the temperature and 1 h for the illuminance. The illuminance values are multiplied by a factor of 0.5, which corresponds to the amount of illuminance available outside a north-facing window. For the outdoor CO2 concentration, a constant value of 400 ppm is used. Figures 5 and 6 illustrate the outdoor temperature, the indoor temperature and the control signal for the heating/cooling system for the whole year and for one random day, respectively. The control signal is produced by the corresponding agent and drives the heating/cooling system. The set point of the indoor temperature is not constant and depends on the outdoor temperature. If the outdoor temperature exceeds 29 °C, the set point is set to 27 °C. If the indoor temperature falls under 20 °C, the set point is set to 22 °C. In the beginning, the agent focuses more on exploration than on exploitation, and the indoor temperature takes various values from 3 to 33 °C. After the extensive exploration, the indoor temperature stabilises close to 22 °C in winter and 27 °C in summer. Small deviations from the set points come from the exploration, which has been reduced to only 1%. Similar behaviour can be observed for the indoor illuminance agent and the CO2 concentration agent. Figures 7 and 8 depict the outdoor horizontal diffuse illuminance, the indoor diffuse horizontal illuminance provided by the sky, the total indoor illuminance, the control signal of the agent, the illuminance provided by the artificial lighting system, and the transparency of the electro-chromic window for the whole year and for one random day, respectively.
The set point for the indoor horizontal illuminance is set to 700 lux. After the extensive exploration, the indoor horizontal illuminance remains close to the set point. This is achieved by increasing the indoor horizontal illuminance with the artificial lighting when the illuminance provided by the sky is not sufficient, and by decreasing the indoor illuminance by changing the window transparency when the indoor illuminance provided by the sky exceeds the set point. This is more obvious in Fig. 8, where these quantities are depicted for only 1 day. Figures 9 and 10 illustrate the number of occupants inside the building, the control signal provided by the agent and the indoor CO2 concentration for the whole year and for one random day, respectively. The CO2 concentration remains under 1000 ppm most of the time. During the extensive exploration, the CO2 concentration does not exceed the limit of 1000 ppm but a continuous operation of
Fig. 5 a Outdoor temperature. b Indoor temperature. c Control signal for heating/cooling system. All plots correspond to 1 year
Fig. 6 a Outdoor temperature. b Indoor temperature. c Control signal for heating/cooling system. All plots correspond to a random day
Fig. 7 a Outdoor/Exterior illuminance. b Indoor illuminance provided from the sky. c Total indoor illuminance. d Control signal. e Provided illuminance by the artificial lighting system. f Transparency of the electro-chromic window. One-year time period
Fig. 8 a Outdoor/Exterior illuminance. b Indoor illuminance provided from the sky. c Total indoor illuminance. d Control signal. e Provided illuminance by the artificial lighting system. f Transparency of the electro-chromic window. One random day time period
Fig. 9 a Number of occupants in the building. b Agent’s control signal. c Indoor CO2 concentration. All plots correspond to 1-year period
Fig. 10 a Number of occupants in the building. b Agent’s control signal. c Indoor CO2 concentration. All plots correspond to one random day period
the ventilation system is observed. This operation leads to energy waste. After the extensive exploration, the ventilation system operates when there are people inside the building. The proposed MAS is compared with a single on-off control system regarding the provided overall comfort and the energy consumption. For the overall comfort, three indexes are combined, one for the thermal comfort, one for the visual comfort
Fig. 11 a Membership function of thermal index. b Membership function of illuminance index and c Membership function of air quality index
and one for the air quality. Figure 11 depicts the trapezoidal membership functions that assign values to the corresponding indexes according to the indoor temperature, the indoor horizontal illuminance and the CO2 concentration. The total comfort index equals

$$CI = 0.4 \cdot TCI + 0.4 \cdot VCI + 0.2 \cdot IAQ \qquad (17)$$

Figures 12 and 13 illustrate the total comfort index with respect to time for the proposed MAS and the on-off control system, respectively. The comfort index for the on-off control system remains high, with deviations mainly in the range [0.8, 1], while the comfort index for the MAS remains close to 1 (except in the beginning, where extensive exploration is applied). Sudden reductions of the index value are observed in both cases because of the change of the temperature set point. The total energy consumption of the MAS for the whole year equals 4059 kWh, while that of the on-off control system equals 4185 kWh. This is a reduction of 126 kWh, which corresponds to a 3% further energy saving. Specifically, the energy requirements of the MAS are as follows: the heating/cooling system consumes 1736 kWh, the artificial lighting system consumes 1732 kWh and the ventilation system consumes 590 kWh. For the on-off control system, the energy requirements are as follows: the heating/cooling system consumes 1829 kWh, the artificial lighting system consumes 1735 kWh and the ventilation system consumes 621 kWh.
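A small sketch may help to make Eq. (17) and the reported saving concrete. The trapezoid breakpoints below are hypothetical placeholders, since the real ones are defined graphically in Fig. 11; only the weights 0.4, 0.4 and 0.2 and the yearly energy totals come from the text.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def comfort_index(tci, vci, iaq):
    """Eq. (17): weighted sum of the thermal, visual and air-quality indexes."""
    return 0.4 * tci + 0.4 * vci + 0.2 * iaq


# Hypothetical breakpoints (assumption); Fig. 11 defines the real ones.
tci = trapezoid(23.0, 18.0, 21.0, 27.0, 30.0)         # indoor temperature in deg C
vci = trapezoid(650.0, 300.0, 600.0, 900.0, 1200.0)   # illuminance in lux
iaq = trapezoid(800.0, -1.0, 0.0, 900.0, 1200.0)      # CO2 concentration in ppm
ci = comfort_index(tci, vci, iaq)
print(ci)

# Energy saving reported in the text.
mas_kwh, on_off_kwh = 4059, 4185
saving_kwh = on_off_kwh - mas_kwh
saving_pct = 100.0 * saving_kwh / on_off_kwh
print(saving_kwh, round(saving_pct, 2))
```

The saving works out to 126 kWh out of 4185 kWh, which is where the quoted figure of about 3% comes from.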
Fig. 12 Comfort index with MAS
Fig. 13 Comfort index with on-off control
6 Conclusion A multi-agent system, as a solution to the distributed problem of indoor comfort for a building office while reducing the overall energy consumption of the building, is proposed. The building’s energy is managed by a multi-agent system. The MAS controls actuators such as heating/cooling system, ventilation system, artificial lighting and electro-chromic window. We deploy a modified Independent Learners approach
in order to reduce the state space and enhance the learning mechanism. Local rewards and state information relevant to each agent are used. In addition, fuzzy Q-learning is utilised to handle the continuous state and action space of each agent. The MAS consists of three agents and the total number of fuzzy states is 95. Each agent learns through the same exploration/exploitation algorithm and demonstrates good performance. The comfort index, after the initial extensive exploration phase of the agents, remains very high and most of the time equals 1, highlighting the superiority of the MAS compared to a simple on-off control system. Additionally, a further reduction of 3% in the overall energy consumption is observed. The simulation results of our study support the agents' individual performance as well as the MAS's total performance. The trained algorithm can be applied to any similar building system, avoiding the initial intense exploration of the proposed MAS. A trained MAS can be directly applied to a real building. In future work, the combination of fuzzy Q-learning with an evolutionary algorithm could lead to even better performance by optimising the parameters of the membership functions, the learning rate (g) and the discount factor (γ). These parameters have been chosen by trial and error, and an optimisation algorithm could find values that lead to even better performance of the whole system regarding both occupant comfort and energy consumption. Additionally, fuzzy stochastic equations can be used for predicting occupancy, which would lead to a further reduction of the consumed energy and an improvement of the comfort index.
Acknowledgements This research is co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme Human Resources Development, Education and Lifelong Learning 2014–2020 in the context of the project ‘Intelligent Control Techniques for Comfort and Prediction of Occupancy in Buildings—Impacts on Energy Efficiency’ (MIS 5050544). The publication of the scientific work is funded by the University of West Attica.
References 1. Wang Z, Wang L (2013) Intelligent control of ventilation system for energy-efficient buildings with CO2 predictive model. IEEE Trans Smart Grid 4(2):686–693. https://doi.org/10.1109/ TSG.2012.2229474 2. Wang Z, Yang R, Wang L, Dounis AI (2011) Customer-centered control system for intelligent and green building with heuristic optimization. In: IEEE/PES power systems conference and exposition, Phoenix, AZ, pp 1–7 3. Dounis AI, Caraiscos C (2009) Advanced control systems engineering for energy and comfort management in a building environment-a review. Renew Sustain Energy Rev 13(6–7):1246– 1261 4. Dong J, Winstead C, Nutaro J, Kuruganti T (2018) Occupancy-based HVAC control with short-term occupancy prediction algorithms for energy-efficient buildings. Energies 11:2427
Fuzzy Reinforcement Learning Multi-agent System …
309
5. Chiang M, Fu L (2007) Adaptive control of switched systems with application to HVAC system. In: IEEE international conference on control applications, Singapore, pp 367–372 6. Semsar-Kazerooni E, Yazdanpanah MJ, Lucas C (2008) Nonlinear control and disturbance decoupling of HVAC systems using feedback linearization and backstepping with load estimation. IEEE Trans Control Syst Technol 16(5):918–929 7. Arguello-Serrano B, Velez-Reyes M (1999) Nonlinear control of a heating, ventilating, and air conditioning system with thermal load estimation. IEEE Trans Control Syst Technol 7(1):56–63 8. Dounis AI, Manolakis DE (2001) Design of a fuzzy system for living space thermal-comfort regulation. Appl Energy 69:119–144 9. Morales Escobar L, Aguilar J, Garcés-Jiménez, A, Gutierrez De Mesa JA, Gomez-Pulido JM (2020) Advanced fuzzy-logic-based context-driven control for HVAC management systems in buildings. IEEE Access 8:16111–16126 10. Shah ZA, Sindi HF, Ul-Haq A, Ali MA (2020) Fuzzy logic-based direct load control scheme for air conditioning load to reduce energy consumption. IEEE Access 8:117413–117427 11. Dounis AI, Manolakis DE, Argiriou A (1995) A fuzzy rule based approach to achieve visual comfort conditions. Int J Syst Sci 26(7):1349–1361 12. Malavazos C, Papanikolaou A, Tsatsakis K, Hatzoplaki E (2015) Combined visual comfort and energy efficiency through true personalization of automated lighting control. In: 4th international conference on smart cities and green ICT systems (SMARTGREENS-2015), pp 264–270 13. Wang Z, Wang L (2012) Indoor air quality control for energy-efficient buildings using CO2 predictive model. In: IEEE 10th international conference on industrial informatics, Beijing, pp 133–138 14. Fayaz M, Kim D (2018) Energy consumption optimization and user comfort management in residential buildings using a bat algorithm and fuzzy logic. Energies 11:161 15. 
Wahid F, Fayaz M, Aljarbouh A, Mir M, Aamir M (2020) Energy consumption optimization and user comfort maximization in smart buildings using a hybrid of the firefly and genetic algorithms. Energies 13(17):4363 16. Wahid F, Ismail LH, Ghazali R, Aamir M (2019) An efficient artificial intelligence hybrid approach for energy management in intelligent buildings. KSII Trans Inte Inf Syst 13(12):5904– 5927 17. Wang Z, Yang R, Wang L (2010) Multi-agent control system with intelligent optimization for smart and energy-efficient buildings. In: 36th annual conference on IEEE industrial electronics society, Glendale, AZ, pp 1144–1149 18. Shaikh PH, Nor NBM, Nallagownden P, Elamvazuthi I (2014) Stochastic optimized intelligent controller for smart energy efficient buildings. Sustain Cities Soc 13:41–45 19. Wang Z, Wang L, Dounis AI, Yang R (2012) Multi-agent control system with information fusion based comfort model for smart buildings. Appl Energy 99:247–254 20. Jose R (2019) Vazquez-Canteli, Zoltan Nagy: reinforcement learning for demand response: a review of algorithms and modeling techniques. Appl Energy 235:1072–1089 21. Zhang C, Kuppannagari SR, Kannan R, Prasanna VK (2019) Building HVAC scheduling using reinforcement learning via neural network based model approximation. In: BuildSys ’19: proceedings of the 6th ACM international conference on systems for energy-efficient buildings, cities, and transportation, pp 287–296 22. Gao G, Li J, Wen Y (2019) Energy-efficient thermal comfort control in smart buildings via deep reinforcement learning. arXiv:1901.04693v1 23. Park June Young, Dougherty Thomas, Fritz Hagen (2019) Nagy Zoltan: LightLearn: an adaptive and occupant centered controller for lighting based on reinforcement learning. Build Environ 147:397–414 24. Dalamagkidis K, Kolokotsa D, Kalaitzakis K, Stavrakakis GS (2007) Reinforcement learning for energy conservation and comfort in buildings. Build Environ 42(7):2686–2698 25. 
Kofinas P, Dounis AI, Vouros GA (2018) Fuzzy Q-Learning for multi-agent decentralized energy management in microgrids. Appl Energy 219:53–67 26. Watkins C (1989) Learning from delayed rewards. PhD thesis. University of Cambridge, England
310
P. Kofinas et al.
27. Glorennec Y, Jouffe L (1997) Fuzzy Q-learning. In: 6th international fuzzy systems conference, pp 659–662 28. Sycara K (1998) Multiagent systems. AI Mag 19(2):79–92 29. Shi B, Liu J (2015) Decentralized control and fair load-shedding compensations to prevent cascading failures in a smart grid. Int J Electr Power Energy Syst 67:582–590 30. Guestrin C (2003) Planning under uncertainty in complex structured environments. PhD thesis. Computer Science Department. Stanford University 31. Clausand C, Boutilier C (1998) The dynamics of reinforcement learning in cooperative multiagent systems. In: National conference on artificial intelligence (AAAI), Madison, WI 32. Guestrin C, Lagoudakis M, Parr R (2002) Coordinated reinforcement learning. In: 19th international conference on machine learning (ICML), Sydney, pp 227–234 33. Schneider J, Wong W-K, Moore A, Riedmiller M (1999) Distributed value functions. In: 16th international conference on machine learning (ICML), Bled, pp 371–378 34. MathWorks (2013) Thermal model of a house, MATLAB documentation. https://www. mathworks.com/help/simulink/slref/thermal-model-of-a-house.html 35. Dejvisesa J, Tanthanuchb N (2016) A simplified air-conditioning systems model with energy management. In: International electrical engineering congress, iEECON2016, Chiang Mai, pp 371–378 36. Kim C-H, Kim K-S (2019) Development of sky luminance and daylight illuminance prediction methods for lighting energy saving in office buildings. Energies 12(4):592 37. Avella JM, Souza T, Silveira JL (2015) A comparative analysis between fluorescent and LED illumination for improve energy efficiency at IPBEN building. In: The XI Latin-American congress electricity generation and transmission-CLAGTEE, Brazil, pp 148–151
Discrete Markov Model Application for Decision-Making in Stock Investments
Oksana Tyvodar and Pylyp Prystavka
Abstract Understanding the stock market and being able to forecast price moves play a key role in wealth generation for every investor. This paper applies a Markov chain model to forecast the behavior of single stocks from the S&P 100 index. We describe a discrete Markov model that aims to forecast an upward or downward move based on historical statistics of a stock's visits to particular states, which are constructed using technical analysis. S&P 100 data from January 2008 to December 2015 were used to build the model. The analysis of the model on real-life out-of-sample data from January 2016 to August 2020 shows that the proposed model generates higher profits than the buy-and-hold investment approach.
Keywords Markov chains · Discrete Markov model · Technical analysis · Stock movement forecast · Stock price · Cumulative profit · Stock market · Absolute return · Moving average
1 Introduction
Billions of dollars are traded every day on financial markets around the world. The main priority of the traders behind these operations is to predict the future behavior of the financial instruments they are interested in. Because of the dynamic and noisy structure of the markets, this problem has remained unsolved for many years. Deterministic models cannot be used for tasks of a stochastic nature, so statistical methods based on information from previous data are used instead. The Markov chain is one of the simplest cases of a sequence of random events, but despite its simplicity it can be used to describe complex phenomena. Markov chains have also been used repeatedly by different researchers in financial forecasting tasks. Mettle, Quaye, and Laryea (2014) used Markov chains to analyze changes in stock prices on the Ghana Stock Exchange, which improved the method for portfolio construction [1]. Zhang et al. (2009) applied Markov chains to predict China stock market dynamics [2]. Choji, Eduno, and Kassem (2013) applied the Markov chain model to predict the stock price of two Nigerian banks, described by three states: rise, fall, and no change in price [3]. This article provides a Markov-based model for predicting stock price movements in the S&P 100 portfolio [4]. First, the definition of the discrete Markov model for the price movement process is introduced. Then, the efficiency of the proposed model is compared to the buy-and-hold strategy. This article is the first to introduce a discrete Markov model whose states are described by the values of technical indicators and the previous changepoints of the price chart. It is shown that the proposed model can yield a significant profit in comparison with the buy-and-hold investment approach.

O. Tyvodar (B) · P. Prystavka
Department of Applied Mathematics, National Aviation University, pr. Lubomir Husar, 1, Kyiv 03058, Ukraine
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_27
2 Discrete Markov Model Definition Let us consider the process p, which takes place in the system S, and describes the behavior of the given stock price.
2.1 System State Definition
First, we define the local changepoints (cp) of size X from the up and down streaks of more than X% in a time series. We calculate these streaks as series of uninterrupted up or down movements in the price series, where uninterrupted means that no countermovement of +X% (−X%) or more occurs during a down (up) movement. Changepoints correspond to the action that should be taken to maximize the profit: when cp = −1, one should sell short [5] the company's stock, and when cp = 1, one should buy it. The changepoints of the Bank of America (BAC) stock price from January 2018 to January 2020 are shown in Fig. 1.
Fig. 1 Local changepoint detection for BAC in 2018–2019: upward triangles mark local minima ($cp(p_i) = 1$), downward triangles mark local maxima ($cp(p_i) = -1$)
We define the states that the system can be in using the following information:
1. Relative Strength Index (RSI). RSI is a technical indicator developed by J. Welles Wilder [5]. We calculate RSI on a 14-day timeframe and discretize its values into the following groups:
– Low, if the RSI value is less than 45.
– High, if the RSI value is greater than 65.
– Lower than average, if the RSI value is between 45 and 50.
– Higher than average, if the RSI value is between 50 and 65.
2. Signal of Moving Average Convergence/Divergence (MACD). MACD is a technical indicator developed by Gerald Appel, designed to reveal changes in the strength, direction, momentum, and duration of a trend in a stock's price [6]. We use a 14-day moving average (MA) as the "fast" moving average and a 28-day MA as the "slow" moving average. We use the double exponential moving average [7] to calculate the MAs. The MACD signal is discretized into two groups:
– Fast, if the "fast" moving average is greater than the "slow" one;
– Slow, otherwise.
3. The number of days since the last local changepoint (cp ≠ 0), discretized into three groups:
– Long Trend, if the number of days from the last changepoint to the current calculation date is greater than the 75th percentile of the distribution of the time between changepoints.
– Short Trend, if the number of days from the last changepoint to the current calculation date is less than the 25th percentile of the distribution of the time between changepoints.
– Average Trend, otherwise.
4. Return since the last changepoint (cp ≠ 0), discretized into six groups according to the distribution of the absolute returns between changepoints:
– Lowest, if the absolute return is below the 10th percentile.
– Medium Low, if the absolute return is between the 10th and 30th percentiles.
– Higher Low, if the absolute return is between the 30th and 50th percentiles.
– Average, if the absolute return is between the 50th and 70th percentiles.
– Above Average, if the absolute return is between the 70th and 90th percentiles.
– Highest, if the absolute return is above the 90th percentile.
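The changepoint definition at the start of this subsection leaves some implementation details open. The sketch below is one possible reading, assuming a zigzag-style scan in which an extreme is confirmed once the price has moved X% away from it; the function name and the treatment of the first point are assumptions, not the authors' code.

```python
def changepoints(prices, x_pct):
    """Label each price: +1 at a local minimum (buy), -1 at a local maximum
    (sell short), 0 elsewhere. An extreme is confirmed once the price has
    moved x_pct percent away from it."""
    thr = x_pct / 100.0
    cp = [0] * len(prices)
    if not prices:
        return cp
    direction = 0   # 0: not yet known, +1: in an up move, -1: in a down move
    ext = 0         # index of the running extreme of the current move
    lo = hi = 0     # running minimum and maximum while direction is unknown
    for i in range(1, len(prices)):
        p = prices[i]
        if direction == 0:
            if p > prices[hi]:
                hi = i
            if p < prices[lo]:
                lo = i
            if p >= prices[lo] * (1 + thr):
                cp[lo] = 1
                direction, ext = 1, i
            elif p <= prices[hi] * (1 - thr):
                cp[hi] = -1
                direction, ext = -1, i
        elif direction == 1:
            if p > prices[ext]:
                ext = i                      # up move continues
            elif p <= prices[ext] * (1 - thr):
                cp[ext] = -1                 # countermove: confirm the maximum
                direction, ext = -1, i
        else:
            if p < prices[ext]:
                ext = i                      # down move continues
            elif p >= prices[ext] * (1 + thr):
                cp[ext] = 1                  # countermove: confirm the minimum
                direction, ext = 1, i
    return cp


prices = [100, 97, 88, 92, 99, 95, 108, 104, 96, 101]
print(changepoints(prices, 10))
```

Note that the most recent extreme stays unlabeled until a countermove of X% confirms it, so on live data the last changepoint is only known with a delay.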
Fig. 2 State transition of AAPL stock
The percentile values of the distributions of time and absolute returns between changepoints are calculated on historical data before 2010 for each stock separately, so no future data leak into the model. We then combine the four features explained above to define the state of the system S. Figure 2 shows the state transition of AAPL stock (Apple Inc.) in January 2020. In general, the system S can be in one of 144 states, for example: $S_0$ = RSI-Low_MACD-fast_Long Trend_Lowest Return.
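The count of 144 states follows from combining the four discretized features (4 × 2 × 3 × 6). The sketch below shows one way to encode a state name; the handling of values that fall exactly on a boundary and all function names are assumptions, and the percentile arguments are expected to be precomputed per stock.

```python
from itertools import product

RSI_GROUPS = ["Low", "Lower than Average", "Higher than Average", "High"]
MACD_GROUPS = ["fast", "slow"]
TREND_GROUPS = ["Short Trend", "Average Trend", "Long Trend"]
RETURN_GROUPS = ["Lowest", "Medium Low", "Higher Low", "Average",
                 "Above Average", "Highest"]


def rsi_group(rsi):
    if rsi < 45:
        return "Low"
    if rsi < 50:
        return "Lower than Average"
    if rsi <= 65:
        return "Higher than Average"
    return "High"


def macd_group(fast_ma, slow_ma):
    return "fast" if fast_ma > slow_ma else "slow"


def trend_group(days, p25, p75):
    if days > p75:
        return "Long Trend"
    if days < p25:
        return "Short Trend"
    return "Average Trend"


def return_group(abs_ret, pct):
    """pct = (p10, p30, p50, p70, p90) of absolute returns between changepoints."""
    for bound, name in zip(pct, RETURN_GROUPS):
        if abs_ret < bound:
            return name
    return "Highest"


def state_name(rsi, fast_ma, slow_ma, days, abs_ret, p25, p75, ret_pct):
    return "RSI-{}_MACD-{}_{}_{} Return".format(
        rsi_group(rsi), macd_group(fast_ma, slow_ma),
        trend_group(days, p25, p75), return_group(abs_ret, ret_pct))


ALL_STATES = ["RSI-{}_MACD-{}_{}_{} Return".format(*combo)
              for combo in product(RSI_GROUPS, MACD_GROUPS, TREND_GROUPS, RETURN_GROUPS)]
print(len(ALL_STATES))
print(state_name(40, 10.2, 10.0, 30, 0.01, 5, 20, (0.02, 0.05, 0.08, 0.12, 0.2)))
```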
2.2 Dataset Labeling
The most common way to label the dataset is to assign 1 to a positive move of at least threshold %, −1 to a negative move of at least −threshold %, and 0 if the stock move is smaller than threshold %. This technique is flawed because of the heteroskedastic nature of stock returns: the volatility of the returns is not constant, so a constant threshold value cannot account for it [8]. To label the upward and downward moves of the analyzed stock, the triple barrier method is used [9]. For each analyzed day, we define three barriers: the upper and lower horizontal barriers, and the vertical barrier, which defines the maximum holding period of a trade. To define the upper and lower barriers, we calculate the 10-day historical volatility [10] and set the lower barrier to Close[i] − 1.95 · SD[i] and the upper barrier to Close[i] + 1.95 · SD[i], where Close[i] is the analyzed day's close price and SD[i] is the volatility of the previous 10 days. We assign the label −1 if the lower barrier is hit first in the 20-day-ahead interval, and 1 if the upper barrier is hit first in that interval. If the vertical barrier is hit first, we assign 1 if Close[i + 20] > Close[i], and −1 otherwise.
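The labeling rule above can be sketched in a few lines. Two details are assumptions rather than statements from the text: SD[i] is taken as the sample standard deviation of the previous 10 close prices, and barrier hits are checked against daily close prices only. The function name is hypothetical.

```python
import statistics


def triple_barrier_label(close, i, vol_window=10, horizon=20, width=1.95):
    """Return +1 or -1 for day i using the triple barrier rule.

    Requires i >= vol_window and i + horizon < len(close).
    """
    # Assumption: volatility = sample standard deviation of the previous closes.
    sd = statistics.stdev(close[i - vol_window:i])
    upper = close[i] + width * sd
    lower = close[i] - width * sd
    for price in close[i + 1:i + horizon + 1]:
        if price >= upper:
            return 1        # upper barrier hit first
        if price <= lower:
            return -1       # lower barrier hit first
    # Vertical barrier: compare the close after `horizon` days with today's close.
    return 1 if close[i + horizon] > close[i] else -1


history = [100.0 + (k % 2) for k in range(10)]          # 10 days of past closes
up_path = history + [100.5] + [100.6] * 5 + [103.0] + [100.0] * 14
print(triple_barrier_label(up_path, 10))
```

Because the barrier width scales with recent volatility, the same price move can produce a label in a calm period and none in a volatile one, which is the motivation given in the text for preferring this method over a fixed threshold.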
In raw form, 10 years of stock price data represent only one sequence of many events leading to the last quoted price. To obtain more sequences and, more importantly, a training set for predicting stock behavior, for each analyzed day we generate a sequence of nine consecutive states drawn from the previous changepoints and the state on the analyzed day. For example, the sequence of states drawn on June 01, 2020 for the OXY stock is shown below. Such sequences can be thought of as patterns leading to a final price move.
RSI-Low_MACD-slow_Short Trend_Average Return →
RSI-Low_MACD-slow_Short Trend_Higher Low Return →
RSI-Higher than Average_MACD-slow_Average Trend_Highest Return →
RSI-Low_MACD-slow_Average Trend_Above Average Return →
RSI-Higher than Average_MACD-slow_Short Trend_Average Return →
RSI-Low_MACD-slow_Short Trend_Average Return →
RSI-Higher than Average_MACD-slow_Short Trend_Average Return →
RSI-Lower than Average_MACD-slow_Average Trend_Higher Low Return →
LABEL 1 (UPWARD MOVE)
2.3 Transition Matrices Definition
To generate the transition matrices [11] from the sequences, we split the events into two separate datasets based on the label. As we predict the triple barrier outcomes, one dataset contains the sequences that led to label 1 (a hit of the upper barrier) and the other contains those that led to label −1. The transition matrix $A^+$ for price moves that end with label 1 is determined as follows:

$$a^+_{S_i S_j} = \frac{c^+_{S_i S_j}}{\sum_{S_j} c^+_{S_i S_j}},$$

where $c^+_{S_i S_j}$ is the number of positions in the training set of label 1 at which the state $S_i$ is followed by the state $S_j$. We obtain the matrix $A^-$ for label −1 from empirical data in a similar way. We calculate these counts in a single pass over the sequences and store them in the 144 × 144 matrices $A^+$ and $A^-$. Given the two models, we can ask which one explains an observation better. To compare the models, we calculate the log-odds ratio:

$$R(S_i) = \log \frac{P(S_i \mid A^+)}{P(S_i \mid A^-)} = \sum_{i=0}^{L} \log \frac{a^+_{S_i S_j}}{a^-_{S_i S_j}}.$$
The higher this score is, the higher the probability that the sequence leads to an upward move (label 1). To generate the transition matrices and calculate the threshold $T_{stock}$ for the score $R(S_i)$ of each analyzed stock, the stock price data from 2010-01-01 to 2015-12-31 are used. To determine the value of the threshold $T_{stock}$, Youden's J statistic [12] is employed. The optimal cut-off is the threshold that maximizes the distance to the identity (diagonal) line of the ROC curve. The optimality criterion is max(sensitivities + specificities) [13].
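The estimation of the two transition matrices, the log-odds score and the Youden cut-off described in this subsection can be sketched as follows. The pseudocount is an assumption added here so that unseen transitions do not produce a logarithm of zero (the text does not say how these are handled), nested dictionaries stand in for the 144 × 144 matrices, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def transition_probs(sequences, states, pseudocount=1.0):
    """Row-normalised transition probabilities estimated from state sequences."""
    counts = {s: defaultdict(float) for s in states}
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + pseudocount * len(states)
        probs[a] = {b: (counts[a][b] + pseudocount) / total for b in states}
    return probs


def log_odds(seq, a_plus, a_minus):
    """Sum of log ratios of transition probabilities along the sequence."""
    return sum(math.log(a_plus[a][b] / a_minus[a][b]) for a, b in zip(seq, seq[1:]))


def youden_threshold(scores, labels):
    """Cut-off maximising sensitivity + specificity - 1 (predict +1 if score > cut-off)."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s > t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s <= t and y == -1)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t


states = ["U", "D"]   # toy two-state example instead of the 144 states
a_plus = transition_probs([["U", "U", "U"], ["D", "U", "U"]], states)
a_minus = transition_probs([["D", "D", "D"], ["U", "D", "D"]], states)
print(log_odds(["U", "U", "U"], a_plus, a_minus) > 0)
print(youden_threshold([-2.0, -1.0, 1.0, 2.0], [-1, -1, 1, 1]))
```

A sequence is then classified as leading to an upward move when its score exceeds the cut-off found on the training period.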
3 Model Efficiency Testing
To evaluate the adequacy of the model, we simulate trading for January 2016–August 2020. For each trading day, the sequence of nine states is generated using the features explained in Sect. 2.1; the decision to Buy is made if $R(S_i)$ is greater than the threshold $T_{stock}$, and the decision to Sell is made if $R(S_i) \le T_{stock}$. Since the last Close price is used in the calculation of the indicators, we assume that the trade is opened on the next trading day, using its Open as the entry price. The triple barrier is also used in the process of trade generation. A trade is closed in the following situations:
• The holding period of 20 days is over.
• The take profit is hit. For a long position, the take profit is placed on the top barrier explained in Sect. 2.2; for a short position, on the bottom barrier.
• The stop loss is hit. For a long position, the stop loss is placed on the bottom barrier; for a short position, on the top barrier.
To compare the proposed model with the buy-and-hold strategy, we use the cumulative $ profit:

$$\text{Cumulative \$ Profit} = \sum_{d = 2016\text{-}01\text{-}01}^{2020\text{-}08\text{-}01} signal_d \cdot (exit - enter),$$
where exit is the close price of the trade, enter is the open price of the trade, signal = 1 for a buy decision and signal = −1 for a sell decision. The cumulative $ profit for the stocks in each economic sector is shown in Fig. 3. When analyzing the profitability of the Markov model strategy, we notice that the communication sector outperforms the others. Figure 4 compares the buy-and-hold and Markov model strategies on the communication sector's stocks. The second-best performer is the consumer discretionary sector, whose high returns can be explained, firstly, by the outlook on American consumer spending [14], which appeared to be solid in 2019, and secondly, by the fact that the sector is formed by market giants such as Amazon, Target, Home Depot, and Walt Disney. The materials sector showed the lowest performance, which can be explained by its sensitivity to fluctuations in the global economy and
Fig. 3 Cumulative $ profit for discrete Markov model for S&P 100 stocks by sector
Fig. 4 Comparison of buy-and-hold versus Markov model strategy on communication’s stocks
the US dollar, concerns about the US–China trade relationship, and the COVID-19 pandemic in 2020. To test whether the buy-and-hold strategy generates lower returns, we apply the Kolmogorov–Smirnov test with the main hypothesis

$$H_0: F(x) > G(x)$$

and the alternative

$$H_1: F(x) \le G(x),$$

where F(x) and G(x) are the distributions of the cumulative $ profit of the Markov model and the buy-and-hold model, respectively. The p-value of the conducted test is equal to 1, so the main hypothesis is not rejected and we conclude that the proposed model outperforms the market.
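The trade-closing rules and the cumulative $ profit used in this section can be illustrated with a short sketch. It assumes that a trade exits at the first daily close at or beyond either barrier (for a long position the top barrier is the take profit and the bottom one the stop loss, and the reverse for a short position, so the exit test is the same for both). The function names and the sample trades are made up for illustration.

```python
def close_trade(future_closes, upper, lower, max_days=20):
    """Exit price: first close at or beyond a barrier, else the close after max_days."""
    window = future_closes[:max_days]
    for price in window:
        if price >= upper or price <= lower:
            return price
    return window[-1]


def cumulative_profit(trades):
    """Sum of signal * (exit - enter) over trades given as (signal, enter, exit)."""
    return sum(signal * (exit_price - enter) for signal, enter, exit_price in trades)


# Illustrative trades only: (signal, entry price, exit price).
trades = [(1, 100.0, 104.0), (-1, 50.0, 47.5), (1, 20.0, 19.0)]
print(cumulative_profit(trades))
print(close_trade([101.0, 103.0, 99.0], upper=102.5, lower=97.5))
```

A short trade contributes positively when the exit price is below the entry price, which is why the signal multiplies the price difference.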
4 Conclusion
The paper proposes a model based on Markov chains and its use in decision-making problems when investing in market-traded shares. It was shown that the proposed Markov model outperforms the buy-and-hold approach in the analysis of S&P 100 stock data. The communication and consumer discretionary sectors show the highest performance, owing to the rapid growth of their holdings, such as Amazon, and to growth in consumer spending. Further research may extend the analysis to Russell 1000 data, take into account the credit ratings of the analyzed companies, and develop an automated system with states defined by indicators of the user's choice.
References
1. Mettle F, Quaye E, Laryea R (2014) A methodology for stochastic analysis of share price of Markov chains with finite states. SpringerPlus. https://doi.org/10.1186/2193-1801-3-657
2. Zhang D, Zhang X (2009) Study on forecasting the stock market trend based on stochastic analysis method. Int J Bus Manag
3. Choji DN, Eduno SN, Kassem GT (2013) Markov chain model application on share price movement in stock market. J Comput Eng Intell Syst 4
4. Wikipedia-S&P 100. https://en.wikipedia.org/wiki/S%26P_100
5. Wilder JW (1978) New concepts in technical trading systems. ISBN 0-89459-027-8
6. Appel G (2005) Technical analysis power tools for active investors. Financial Times Prentice Hall, p 166. ISBN 0-13-147902-4
7. Patrick GM (1994) Smoothing data with faster moving averages. Tech Anal Stocks Commod
8. Singh A, Joubert J (2019) Does meta labeling add to signal efficacy? https://hudsonthames.org/does-meta-labeling-add-to-signal-efficacy-triple-barrier-method/
9. de Prado ML (2018) Advances in financial machine learning. Wiley
10. Sinclair E (2008) Volatility trading. Wiley
11. Gagniuc PA (2017) Markov chains: from theory to implementation and experimentation. Wiley, USA, NJ, pp 1–235. ISBN 978-1-119-38755-8
12. Youden WJ (1950) Index for rating diagnostic tests. Cancer 3:32–35
13. Powers DMW (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J Mach Learn Technol 2(1):37–63
14. U.S. Bureau of Economic Analysis, Personal Consumption Expenditures [PCE], retrieved from FRED, Federal Reserve Bank of St. Louis. https://fred.stlouisfed.org/series/PCE
Howling Noise Cancellation in Time–Frequency Domain by Deep Neural Networks Huaguo Gan, Gaoyong Luo, Yaqing Luo, and Wenbin Luo
Abstract With the wide application of sound reinforcement systems, howling has become a major problem affecting system performance. It arises from the acoustic coupling between the speaker system and the microphone when a positive feedback loop exists. To suppress the howling noise, researchers have in recent years proposed many acoustic feedback control methods, such as the frequency shift method, notch filtering, and adaptive feedback cancellation. However, current methods mainly rely on adaptive filters in either the time or the frequency domain, which can suppress howling to some extent but may lead to sound distortion or have limited suppression ability. In this paper, we propose a novel method to suppress howling noise in speech signals by training a deep neural network (DNN) as an adaptive filter in the time–frequency domain. The short-time Fourier transform (STFT) converts the signal from the time domain to the time–frequency domain, and the resulting complex values are extracted as signal features. A supervised end-to-end DNN is then constructed that nonlinearly maps the complex values of the howling speech to the complex values of the clean speech, with the aim of cancelling the howling noise from the feedback signals. Experimental results demonstrate that the proposed method suppresses the howling noise effectively while greatly improving the quality and intelligibility of the processed speech. Keywords Sound reinforcement system · Acoustic feedback · Howling · STFT · Supervised · End-to-end DNN · Time–frequency domain
H. Gan · G. Luo (B) · W. Luo School of Physics and Materials Science, Guangzhou University, Guangzhou 510006, China Y. Luo Department of Mathematics, London School of Economics and Political Science, London WC22AE, UK © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_28
H. Gan et al.
1 Introduction Sound reinforcement systems are widely used in daily life, for example in public-address systems in auditoriums and conference or meeting rooms, where acoustic, audio, or speech signals are picked up by microphones. In a closed environment, the structure of a single-channel closed-loop sound reinforcement system is shown in Fig. 1. The source signal v(t) and the acoustic feedback signal x(t), obtained through the feedback path (F), are collected by the microphone to generate the microphone signal y(t). The loudspeaker signal u(t) is then obtained through the electro-acoustic forward path (G) and played out by the loudspeaker. In such a positive feedback process, according to the Nyquist stability criterion [1], the amplitude of some frequency components of the microphone output signal y(t) grows larger and larger, so that the closed-loop system becomes unstable and an oscillation producing acoustic howling occurs. Ever since sound reinforcement systems came into use, howling caused by acoustic feedback has affected system performance and has attracted the attention of numerous researchers. Acoustic feedback causes coupling between the loudspeaker signal and the microphone signal [2–4], also known as the Larsen effect [1]. In the past few decades, researchers have proposed different approaches to suppress the howling noise. For example, designing special building structures and using particular building materials can reduce the reflection of sound, thus decreasing the coupling between the loudspeaker signal and the microphone signal. However, these methods are often costly, require considerable expertise, and are not suitable for wider application. In addition to such manual methods of suppressing howling, many acoustic feedback control methods based on digital signal processing techniques, such as adaptive filtering, have been proposed to suppress howling automatically.
There are mainly three methods, namely frequency shift (FS) [2, 5], notch-filter-based howling suppression (NHS) [6, 7], and adaptive feedback cancellation (AFC) [8]. However, both the FS and NHS methods can cause signal distortion, either through the frequency shift applied to the input signal or through the wide bandwidth of the notch filter that suppresses the amplitude at the howling frequency and thereby destroys the amplitude condition of howling. The AFC method estimates the feedback signal with an adaptive algorithm and subtracts it from the microphone output signal to reconstruct an approximation of the original signal. In practical applications, because the feedback signal is correlated with the desired signal, decorrelation operations are required, including time-varying processing [9], noise injection [9], nonlinear processing [10], and forward path delay [11], which can also cause signal distortion. Moreover, when the frequency of the howling signal changes rapidly, the AFC method may not respond quickly enough. As discussed, current methods, which mainly use adaptive filters in either the time or the frequency domain, can generally suppress howling to some extent but may lead to sound distortion or have limited suppression ability. To solve the complex howling problem, we propose a novel method that suppresses howling noise by training deep neural networks (DNN) as adaptive filters in the time–frequency domain.
Fig. 1 Single channel closed-loop sound reinforcement system
2 Theoretical Analysis 2.1 System Model The structure of a single-channel closed-loop sound reinforcement system is shown in Fig. 1, where the corresponding impulse response of the system is as follows:
\[ y(t) = x(t) + v(t), \qquad x(t) = F(t) \ast u(t), \qquad u(t) = G(t) \ast y(t) \]
Therefore
\[ v(t) = y(t) \ast (1 - F(t)G(t)) \]
where “∗” represents convolution. Thus, the frequency response of the closed-loop path from the source signal to the loudspeaker signal, including the acoustic feedback, can be expressed as the transfer function:
\[ \frac{U(\omega)}{V(\omega)} = \frac{G(\omega)Y(\omega)}{Y(\omega)\,(1 - F(\omega)G(\omega))} = \frac{G(\omega)}{1 - F(\omega)G(\omega)} \tag{1} \]
where G(ω) is the frequency response of the equipment between the microphone and the loudspeaker, F(ω) is the frequency response of the acoustic path from the loudspeaker to the microphone, and F(ω)G(ω) denotes the loop response of the closed-loop acoustic system. According to the Nyquist stability criterion, when the system is unstable, the following conditions are satisfied:
\[ |G(\omega)F(\omega)| \ge 1 \tag{2} \]
\[ \angle G(\omega)F(\omega) = n2\pi, \quad n \in \mathbb{Z} \tag{3} \]
where the magnitude response |G(ω)F(ω)| and phase response ∠G(ω)F(ω) denote the loop gain and loop phase, respectively. If there is an angular frequency ω in the source signal that satisfies both conditions (2) and (3), the acoustic system will produce an oscillation at this frequency, which human ears perceive as howling. To suppress howling, an adaptive filter can be designed so that these conditions are not satisfied.
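Conditions (2) and (3) can be checked numerically for a given loop. The following is a minimal sketch, assuming a fixed scalar forward gain and a feedback path modeled as a short FIR filter; the function names, the frequency grid, and the phase tolerance are illustrative choices and are not taken from the paper.

```python
import cmath
import math


def loop_response(gain, feedback_fir, omega):
    """Loop response G(w)F(w) for a scalar forward gain and an FIR feedback path."""
    return gain * sum(f * cmath.exp(-1j * omega * k)
                      for k, f in enumerate(feedback_fir))


def howling_candidates(gain, feedback_fir, n_points=512, phase_tol=0.05):
    """Angular frequencies in [0, pi] that meet conditions (2) and (3).

    phase_tol (radians) is a hypothetical tolerance, needed because the
    phase is only evaluated on a discrete frequency grid.
    """
    hits = []
    for i in range(n_points + 1):
        omega = math.pi * i / n_points
        h = loop_response(gain, feedback_fir, omega)
        # cmath.phase returns a value in (-pi, pi], so a phase of n*2*pi maps to 0.
        if abs(h) >= 1.0 and abs(cmath.phase(h)) <= phase_tol:
            hits.append(omega)
    return hits
```

With a one-sample-delay feedback path of gain 0.5, a forward gain of 1 gives a loop gain of 0.5 and no candidates, whereas a forward gain of 3 gives a loop gain of 1.5 and candidates near ω = 0, where the loop phase is zero.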
2.2 Adaptive Filtering
2.2.1 Adaptive FIR Filter
The adaptive filter parameters or weights can be adjusted automatically according to the statistical characteristics of the input signal. The filtering includes two processes: one is the filtering process, in which the parameters of the filter are convolved with the input sequence to obtain the filtered output; the other is the adaptive process, in which the parameters of the filter are adjusted by the adaptive algorithm. The typical time-domain least-mean-square (LMS) adaptive FIR filter is shown in Fig. 2. According to the LMS algorithm, the weight vector w(n) of the adaptive filter is updated by
Fig. 2 Adaptive FIR filter
\[ y(n) = x(n)w^{T}(n) \tag{4} \]
\[ e(n) = d(n) - y(n) = d(n) - x(n)w^{T}(n) \tag{5} \]
\[ w(n+1) = w(n) + 2\mu e(n)x(n) \tag{6} \]
where x(n) is the input sequence and the error signal e(n) is the difference between the desired signal d(n) and the output signal y(n).
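Equations (4)–(6) translate almost line by line into code. The sketch below uses only the Python standard library; the function name `lms_filter`, the zero initial weights, and the tapped-delay-line buffer layout are assumptions made for illustration.

```python
def lms_filter(x, d, num_taps, mu):
    """Time-domain LMS adaptive FIR filter following Eqs. (4)-(6).

    x: input sequence, d: desired sequence, mu: step size.
    Returns the final weights, the filter outputs, and the error signal.
    """
    w = [0.0] * num_taps      # weight vector w(n), initialised to zero (assumption)
    buf = [0.0] * num_taps    # input vector x(n): most recent sample first
    y_out, e_out = [], []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))               # Eq. (4)
        e = dn - y                                               # Eq. (5)
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, buf)]   # Eq. (6)
        y_out.append(y)
        e_out.append(e)
    return w, y_out, e_out
```

As a usage example, if d(n) is generated by passing x(n) through an unknown two-tap FIR system, the returned weights approach the taps of that system, provided the step size is small enough for stability.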
2.2.2 Frequency-Domain Adaptive Filter
The frequency-domain adaptive filter (FDAF) [12] is used here for comparison with the proposed DNN method. As shown in Fig. 3, the input signal X(n) and the desired signal D(n) are data blocks with N data points at time n, which are converted into the frequency-domain signals X(k) and D(k) by the fast Fourier transform (FFT). Then the output Y(k) of FDAF is given by
\[ W^{T}(k) = [\,W_1(k)\;\; W_2(k)\;\; \cdots\;\; W_N(k)\,] \]
\[ X(k) = \begin{bmatrix} X_1(k) & 0 & \cdots & 0 \\ 0 & X_2(k) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_N(k) \end{bmatrix} \]
\[ Y(k) = X(k)W(k) \tag{7} \]
Fig. 3 Structure of frequency-domain adaptive filter
\[ E(k) = D(k) - Y(k) \tag{8} \]
where E(k) denotes the error between D(k) and Y(k), and W(k) is the frequency-domain weight vector. According to the LMS adaptive algorithm, the weight update equation of FDAF is
\[ W(k+1) = W(k) + \mu X^{*}(k)E(k) \tag{9} \]
where the asterisk * denotes conjugate. Next, we calculate the optimal weight of FDAF. According to the minimum mean square error criterion, we have
\[ R_{XX} = \varepsilon[X^{*}(k)X(k)], \qquad R_{XD} = \varepsilon[X^{*}(k)D(k)], \qquad W_0 = R_{XX}^{-1}R_{XD} \tag{10} \]
where ε indicates mathematical expectation, $R_{XX}^{-1}$ is the inverse of the autocorrelation of X(k), and $R_{XD}$ is the cross-correlation between X(k) and D(k). It is noted that the optimal weight $W_0$ of FDAF has the form of the Wiener optimal solution. We then transform the operation formula of the frequency-domain adaptive algorithm into an equivalent time-domain operation formula for comparison. According to the definition of cyclic convolution, the equivalent time-domain form of formula (7) can be expressed as
\[ X(n) = \begin{bmatrix} x(n) & x(n+N-1) & \cdots & x(n+1) \\ x(n+1) & x(n) & \cdots & x(n+2) \\ \vdots & \vdots & \ddots & \vdots \\ x(n+N-1) & x(n+N-2) & \cdots & x(n) \end{bmatrix} \]
\[ Y(n) = X(n)W(n) \tag{11} \]
where the matrix on the left-hand side of (11), whose first column is the input data block X(n), is the cyclic (circulant) matrix of that block. The computational speed of FDAF is faster because it transforms the convolution operation in the time domain into a multiplication operation in the frequency domain, and the FFT is a fast implementation of the Fourier transform. According to formulas (7)–(9) and (11), the equivalent time-domain form of the frequency-domain weight updating formula can be obtained:
\[ E_i(n) = D_i(n) - X_i(n) \]
\[ W(n+1) = W(n) + \mu \sum_{i=1}^{N} E_i(n)X_i(n) \tag{12} \]
where X i (n) is the ith data point of data block X (n). FDAF can process the signal in blocks, so that the same weight can be used in a data block, updated with the gradient of the whole data block, leading to better adjusted weights with higher accuracy.
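One block update of Eqs. (7)–(9) can be sketched as follows. Because X(k) is diagonal, the matrix product in (7) reduces to a bin-by-bin multiplication. The sketch uses a naive DFT from the standard library so that it stays self-contained (an FFT routine would be used in practice), and it implements the circular-convolution form described above without any overlap-save constraint; the function names are illustrative.

```python
import cmath


def dft(block):
    """Naive discrete Fourier transform of one data block (stand-in for an FFT)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fdaf_step(weights, x_block, d_block, mu):
    """One frequency-domain LMS update on a block, following Eqs. (7)-(9)."""
    X = dft(x_block)
    D = dft(d_block)
    Y = [xk * wk for xk, wk in zip(X, weights)]       # Eq. (7), diagonal X(k)
    E = [dk - yk for dk, yk in zip(D, Y)]             # Eq. (8)
    new_weights = [wk + mu * xk.conjugate() * ek      # Eq. (9)
                   for wk, xk, ek in zip(weights, X, E)]
    return new_weights, E
```

For a unit-impulse input block every bin of X(k) equals 1, so with μ = 1 a single update already drives the weights to D(k) and the error of the next update to zero.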
3 Howling Noise Cancellation by DNN 3.1 Feature of Complex Coefficients To suppress howling noise, we propose to use the complex coefficient values of the signal as inputs of the DNN, where signal features are extracted in both the time and frequency domains, with phase information, by the short-time Fourier transform (STFT). The sampling frequency of all speech signals in this paper is 16 kHz. Assume that the time-domain speech signal is s(t), that each frame contains 256 points, and that adjacent frames overlap by 128 points. A 256-point FFT is applied to each frame to obtain a group of complex coefficient values. In the time–frequency domain, the signal at the mth frame is given by
\[ s(m, f) = [\,a_{f_1} + b_{f_1}i, \ldots, a_{f_k} + b_{f_k}i, \ldots, a_{f_{256}} + b_{f_{256}}i\,], \quad k = 1, 2, 3, \ldots, 255, 256 \tag{13} \]
where $(a_{f_k} + b_{f_k}i)$ is the complex value of the kth frequency bin. Due to the symmetry of the FFT, only half of the values of s(m, f) are taken. At the same time, the real part and imaginary part of each frequency bin are taken out to form a new complex coefficient vector:
\[ s_{new}(m, f) = [\,a_{f_1}, \ldots, a_{f_k}, \ldots, a_{f_{129}}, b_{f_1}, \ldots, b_{f_k}, \ldots, b_{f_{129}}\,], \quad k = 1, 2, 3, \ldots, 128, 129 \tag{14} \]
It is noted that the speech of the current frame is related to adjacent frames, so we combine the complex coefficients of the current frame and the five frames before and after the current frame into a vector, which is the input feature vector of the DNN:
\[ S(m, f) = [\,s_{new}(m-5), \ldots, s_{new}(m), \ldots, s_{new}(m+5)\,] \tag{15} \]
The S(m, f) calculated from the howling speech is used as the input feature vector of the DNN. The $s_{new}(m, f)$ of the current frame, calculated from the clean speech, is used as the desired output y(m, f) to train the DNN.
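The feature construction of Eqs. (13)–(15) can be sketched as below. The frame length (256), hop (128), number of retained bins (129), and context of five frames on each side come from the text. The rectangular window, the naive DFT (a stand-in for the FFT), and the choice to skip frames that lack a full context at the signal edges are assumptions of this sketch.

```python
import cmath
import math

FRAME_LEN = 256                 # points per frame
HOP = 128                       # frame shift (128 overlapping points)
NUM_BINS = FRAME_LEN // 2 + 1   # 129 bins kept because of FFT symmetry
CONTEXT = 5                     # frames before and after the current frame


def frame_features(frame):
    """Real parts followed by imaginary parts of bins 1..129, as in Eq. (14)."""
    n = len(frame)
    bins = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(NUM_BINS)]
    return [c.real for c in bins] + [c.imag for c in bins]


def stacked_features(signal):
    """Per-frame vectors (258 values) and context-stacked vectors (2838 values)."""
    starts = range(0, len(signal) - FRAME_LEN + 1, HOP)
    per_frame = [frame_features(signal[s:s + FRAME_LEN]) for s in starts]
    stacked = []
    # Only frames with a full +/-5 context are stacked (edge handling is an assumption).
    for m in range(CONTEXT, len(per_frame) - CONTEXT):
        vec = []
        for j in range(m - CONTEXT, m + CONTEXT + 1):
            vec.extend(per_frame[j])                 # Eq. (15)
        stacked.append(vec)
    return per_frame, stacked
```

In training, `stacked` would be computed from the howling speech and the matching rows of `per_frame` from the clean speech, giving input vectors of length 2838 and targets of length 258.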
3.2 DNN Network The deep neural network (DNN) used in this paper is a multi-layer perceptron (MLP). By establishing an end-to-end network model, the time–frequency features of howling speech extracted by STFT are directly mapped to the time–frequency features of clean speech. The output of the network is the estimation of the time–frequency characteristics of clean speech. The network structure is shown in Fig. 4. As previously discussed, the dimension of the input vector of the DNN is 129 * 2 * 11 = 2838, and that of the output vector is 129 * 2 = 258. The network has 3 hidden layers, each with 500 neurons. In this study, the backpropagation (BP) algorithm is used to train the network with mean squared error as the loss function. By repeating the two processes of forward propagation and backward propagation, the weights of the network are adjusted so that the value of the loss function of the DNN is minimized; training stops when the loss reaches a reasonable range or a preset value. The forward propagation can be represented by
\[ z^{(l)} = W^{(l)}a^{(l-1)} + b^{(l)} \tag{16} \]
\[ a^{(l)} = f(z^{(l)}) \tag{17} \]
The back propagation is also the process of weight updating:
\[ W^{(l)} = W^{(l)} - \mu \frac{\partial E}{\partial W^{(l)}} \tag{18} \]
\[ b^{(l)} = b^{(l)} - \mu \frac{\partial E}{\partial b^{(l)}} \tag{19} \]
Fig. 4 Diagram of DNN structure
where f(·) refers to the activation function, and $z^{(l)}$ and $a^{(l)}$ denote the state value vector and the activation value vector of all neurons in layer l. The bias vector and weight matrix from layer l − 1 to layer l are represented as $b^{(l)}$ and $W^{(l)}$, respectively. E is the loss function. The desired output of the DNN is y(m, f), and the output vector of the DNN $\bar{y}(m, f)$ is equal to F(X(m, f)); its loss function can then be expressed as
\[ E = \frac{1}{k}\,\| y - F(X) \|_2^2 = \frac{1}{k}\sum_{n=1}^{k} \left( y_n - F_n(X) \right)^2 \tag{20} \]
Here k = 258 is the number of DNN output neurons, and F(·) represents the mapping relationship from the input to the output of DNN. We use the exponential linear unit (ELU) as the activation function and He initialization as the weight initialization to prevent the gradient from vanishing, and Adam as optimization technology to accelerate the training speed of DNN. The activation function adds a nonlinear factor to enable the DNN to represent the nonlinear system model. The nonlinear ELU activation function is
\[ \mathrm{ELU}_{\alpha}(z) = \begin{cases} \alpha(\exp(z) - 1) & \text{if } z < 0 \\ z & \text{if } z \ge 0 \end{cases} \tag{21} \]
where the parameter α is generally set to 1. We summarize the network parameters of the DNN in Table 1.

Table 1 Network parameters of DNN
| Parameters | Choice |
| Input layer neuron | 2838 |
| Hidden layer | 3 |
| Neuron in each hidden layer | 500 |
| Output layer neuron | 258 |
| Activation function of hidden layer | ELU |
| Optimization algorithm | Adam |
| Weight initialization | He initialization |

The MLP neurons of the DNN shown in Fig. 4 are fully connected to each other. The output of the DNN is the estimation of the time–frequency feature of clean speech. The DNN adjusts the weights of the network through the BP algorithm to minimize the error between the predicted values and the desired values, mapping the input to the output through nonlinear relationships. Compared with the FDAF method, the advantages of the DNN method are that it has a higher-dimensional structure and employs time–frequency analysis of the input signal. It is interesting to note that the forward propagation process of the DNN is similar to the filtering process of adaptive filtering, and that the back-propagation process of the DNN is similar to the adaptation process of adaptive filtering. From the theoretical analysis, the best weight of FDAF has the form of the Wiener filter solution. If the acoustic environment changes rapidly, it is difficult for the adaptive filter to track the changes quickly. In contrast, the DNN learns from a large number of samples, and the mapping between the input space and the output space is stored in its complex weights, so that the DNN filter can respond rapidly to signal changes. Furthermore, traditional howling suppression methods usually require a model of the howling problem to work out a solution; the multi-channel model in particular can be much more complex and may be difficult to build. With the proposed DNN method, however, there is no need to build a mathematical model.
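The forward pass of Eqs. (16)–(17), the ELU of Eq. (21), He initialization, and the loss of Eq. (20) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the training code used in the paper: the function names are hypothetical, the output layer is assumed to be linear (Table 1 only specifies ELU for the hidden layers), biases are assumed to start at zero, and the backward pass and the Adam optimizer are omitted.

```python
import math
import random


def elu(z, alpha=1.0):
    """Exponential linear unit, Eq. (21)."""
    return z if z >= 0 else alpha * (math.exp(z) - 1.0)


def he_init(sizes, seed=0):
    """He initialization: weights drawn from N(0, 2 / fan_in), zero biases."""
    rng = random.Random(seed)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        std = math.sqrt(2.0 / fan_in)
        W = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        b = [0.0] * fan_out
        layers.append((W, b))
    return layers


def forward(layers, x):
    """Forward propagation, Eqs. (16)-(17); the last layer is kept linear (assumption)."""
    a = x
    for idx, (W, b) in enumerate(layers):
        z = [sum(w * ai for w, ai in zip(row, a)) + bi for row, bi in zip(W, b)]
        is_output = idx == len(layers) - 1
        a = z if is_output else [elu(v) for v in z]
    return a


def mse(y, y_hat):
    """Mean squared error over the k output neurons, Eq. (20)."""
    return sum((u - v) ** 2 for u, v in zip(y, y_hat)) / len(y)
```

The configuration of Table 1 corresponds to `he_init([2838, 500, 500, 500, 258])`; smaller layer sizes are convenient for quick experiments.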
4 Experimental Results and Discussions The time–frequency features of howling speech are used as the input of the DNN. To generate the howling noise, we first simulate the single-channel sound reinforcement system shown in Fig. 1. For simplicity, the electro-acoustic forward path adopts a fixed gain. Howling is then caused by the coupling of the microphone signal and the loudspeaker signal. The acoustic feedback path F is represented by the room impulse response (room acoustic characteristics). The feedback signal x(t) is equal to the convolution of the room impulse response and the loudspeaker signal u(t), and the room impulse response can be replaced by a finite impulse response. To obtain the howling speech, a clean speech signal is used as the original input of the simulation system. Only one sampling point is taken and put into the input buffer in each cycle. After amplification by the electro-acoustic forward path, the loudspeaker signal at this moment is obtained. As described in Sect. 2.1, the loudspeaker signal is convolved with the room impulse response to obtain the feedback signal x(t). The obtained feedback signal is added to the sampling point of the next cycle to obtain the microphone signal.
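The sample-by-sample simulation described above can be sketched as follows, assuming a scalar forward gain and a finite impulse response for the room. The function name and buffer layout are illustrative; practical details such as amplitude clipping are not covered by the text and are left out.

```python
def simulate_feedback(source, gain, rir):
    """Closed-loop simulation of Fig. 1: y = v + x, u = G * y, x = F convolved with u.

    source: clean speech samples v(n); gain: fixed forward-path gain;
    rir: finite impulse response standing in for the room impulse response.
    Returns the microphone signal y(n).
    """
    u_hist = [0.0] * len(rir)   # past loudspeaker samples, most recent first
    feedback = 0.0              # x(n), produced by earlier loudspeaker output
    mic = []
    for v in source:
        y = v + feedback        # microphone picks up source plus feedback
        u = gain * y            # amplification by the forward path
        u_hist = [u] + u_hist[:-1]
        # Convolve the loudspeaker history with the room response; the result
        # is added to the source sample of the next cycle.
        feedback = sum(h * uk for h, uk in zip(rir, u_hist))
        mic.append(y)
    return mic
```

With `rir = [0.5]`, an impulse at the input decays when the gain is 1 (loop gain 0.5) and grows without bound when the gain is 3 (loop gain 1.5), which is the howling regime of condition (2).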
4.1 Dataset We found more than 100 sentences from the Internet for experiments, and all sentences were sampled at 16 kHz. We selected 100 sentences to form the test/training set, 10 sentences to form the validation set. All the sentences in the sets use the room impulse response shown in Fig. 5 to generate howling speech.
Fig. 5 Measurement results of PESQ and STOI in test set
4.2 Evaluation Method Speech quality and intelligibility are the evaluation criteria for many speech processing technologies, particularly for speech recognition. In this study, objective methods are used to measure speech quality and intelligibility. Speech intelligibility is measured objectively by the short-time objective intelligibility (STOI) measure [14], which calculates the correlation between the temporal envelopes of clean speech and processed speech over short time segments. STOI has been shown to be highly correlated with the speech intelligibility perceived by human listeners, and its values range from 0 to 1. Speech quality is measured objectively by the perceptual evaluation of speech quality (PESQ) [15], which calculates the difference between clean speech and processed speech and yields the MOS-LQO (mean opinion score listening quality objective) value of the speech samples. PESQ values range from −0.5 to 4.5.
4.3 Results and Discussions
4.3.1 Measurement of PESQ and STOI
The PESQ and STOI values of the test set processed by FDAF and DNN are calculated respectively, and the values are averaged. The results are shown in Fig. 5, where DNN-complex denotes the proposed DNN method, whose input feature vector consists of the complex coefficient values (containing phase information) of the signal in the time–frequency domain. DNN-magnitude uses the same DNN structure, but its input feature vector is the magnitude spectrogram of the howling signal in the time–frequency domain. In that case, the phase information of the clean signal is not used directly for training, and the phase of the howling signal is used when reconstructing the processed signal. It can be seen from the evaluation results in Fig. 5 that, for both the PESQ and the STOI measurements, the values obtained by the DNN-complex method are the highest, indicating that its howling suppression performance is the best. With the DNN method, regardless of the input feature vector, the values of PESQ and STOI are both higher than those of the FDAF method. This shows that the howling suppression ability of DNN is better than that of conventional adaptive filtering. For the DNN-based methods, we also changed the input feature vector of the network to study the influence of different input features on the suppression ability. It is found that the DNN method based on complex coefficient value input is 0.0324 higher on STOI and 0.3613 higher on PESQ than that based on magnitude spectrogram input. This confirms that adding phase information to train the DNN can greatly improve the processed speech quality.
4.3.2 Signal Reconstruction
In order to observe the spectrograms of the reconstructed speech processed by DNN-complex and FDAF, we select a sentence from the test set. The spectrograms plotted in Fig. 6 are based on the processing of that sentence. It can be seen from the area marked by the red rectangle in Fig. 6b that the speech produces an obvious howling around 550 Hz with a large amplitude. Comparing Fig. 6c with Fig. 6a, the spectrum of the reconstructed speech processed by FDAF has a serious loss in the high-frequency part. In contrast, the spectrum of the reconstructed speech processed by DNN in Fig. 6d is very similar to that in Fig. 6a, indicating that the howling frequency is clearly removed and the signal features are well preserved. This explains why the method based on DNN has a better ability to suppress howling than the method based on FDAF.
Fig. 6 Spectrogram of speech signal: a Spectrogram of clean speech. b Spectrogram of howling speech. c Spectrogram of reconstructed speech processed by FDAF. d Spectrogram of reconstructed speech processed by DNN-complex
5 Conclusions Howling in sound reinforcement systems has become a major problem affecting system performance. It is caused by the acoustic coupling between the speaker system and the microphone when a positive feedback loop exists. To suppress the howling noise, researchers have proposed many methods, such as the frequency shift method, notch filtering, and adaptive feedback cancellation. However, current methods, which mainly use adaptive filters in either the time or the frequency domain, can suppress howling to some extent but may lead to sound distortion or have limited suppression ability. In this paper, a DNN-based method with a higher-dimensional structure and a nonlinear activation function is proposed. It uses complex coefficient values in the time–frequency domain as DNN input features, cancels the howling noise, and greatly improves the quality and intelligibility of the processed speech. The experimental results demonstrate that the proposed DNN-based method, with minimized sound distortion, performs much better than conventional adaptive filtering for howling cancellation.
References
1. Van Waterschoot T, Moonen M (2011) Fifty years of acoustic feedback control: state of the art and future challenges. Proc IEEE 99(2):288–327
2. Siqueira MG (2000) Steady-state analysis of continuous adaptation in acoustic feedback reduction systems for hearing-aids. IEEE Trans Speech Audio Process 8(4):443–453
3. Wang G, Liu Q, Wang W (2020) Adaptive feedback cancellation with prediction error method and howling suppression in train public address system. Signal Process 167:107–279
4. Sankowsky-Rothe T, Schepker H, Doclo S, Blau M (2020) Acoustic feedback path modeling for hearing aids: comparison of physical position based and position independent models. J Acoust Soc Am 147(1):85–100
5. Schroeder RM (2005) Improvement of acoustic-feedback stability by frequency shifting. J Acoust Soc Am 36(9):1718–1724
6. Leotwassana W, Punchalard R, Silaphan W (2003) Adaptive howling canceller using adaptive IIR notch filter: simulation and implementation. In: International conference on neural networks & signal processing, vol 1, pp 848–851
7. Deepak S (2008) Feedback cancellation in a sound system, US
8. Van Waterschoot T, Rombouts G, Moonen M (2004) On the performance of decorrelation by prefiltering for adaptive feedback cancellation in Public Address systems. In: Proceedings of the 4th IEEE benelux signal processing symposium, pp 167–170
9. Schmidt G, Haulick T (2006) Signal processing for in-car communication systems. Signal Process 86(6):1307–1326
10. Waterschoot TV (2004) Instrumental variable methods for acoustic feedback cancellation. Katholieke Universiteit Leuven, Belgium
11. Estermann P, Kaelin A (1994) Feedback cancellation in hearing aids: results from using frequency-domain adaptive filters. In: IEEE international symposium on circuits & systems (ISCAS), vol 2, pp 257–260
12. Wu S, Qiu X (2009) A windowing frequency domain adaptive filter for acoustic echo cancellation. IEICE Trans Fundam Electron Commun Comput Sci 10:2626–2628
13. Williamson DS, Wang Y, Wang DL (2017) Complex ratio masking for monaural speech separation. IEEE/ACM Trans Audio Speech Lang Process 24(3):483–492
14. Taal CH, Hendriks RC, Heusdens R, Jensen J (2011) An algorithm for intelligibility prediction of time-frequency weighted noisy speech. IEEE Trans Audio Speech Lang Process 19(7):2125–2136
15. Rix AW, Beerends JG, Hollier MP, Hekstra AP (2001) Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. In: 2001 IEEE international conference on acoustics, speech and signal processing, vol 2, pp 749–752
Daily Trajectory Prediction Using Temporal Frequent Pattern Tree Mingyi Cai, Runze Yan, and Afsaneh Doryab
Abstract Prediction of future locations from traces of human mobility has significant implications for location-based services. Most existing research in this area focuses on predicting the next location or the destination rather than the entire route. This paper presents a temporal frequent-pattern tree (TFT) method for predicting future locations and routes. We evaluate the method using a real-world dataset containing location data from 50 users in a city. Our results show that for more than 91% of the users, the accumulated average distance between the actual and predicted locations is less than 1000 m (46 m < range < 1325 m). The results also show that the model benefits from similarities between users’ movement patterns. Keywords Temporal series · Frequent pattern mining · Trajectory · Temporal frequent pattern tree
1 Introduction Location-aware systems provide services based on the current or predicted location of the users. Location prediction algorithms often focus on the immediate next location or destination location [1–4] rather than the entire route. However, in some applications such as peer-to-peer and reciprocal services (e.g., ridesharing), which involve coordination between multiple people, predicting future location trajectories of the parties involved in the transaction becomes crucial in order to connect and match them efficiently at their convenience.
M. Cai (B) Carnegie Mellon University, Pittsburgh, PA 15213, USA
R. Yan · A. Doryab University of Virginia, Charlottesville, VA 22904, USA
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_29
M. Cai et al.
This paper addresses the problem of predicting future locations and routes at different timeslots throughout the day. Given the start time and location, we aim to find the likely routes to predict a user’s locations at certain time slots. Our method constructs a temporal frequent-pattern tree (TFT) from users’ location history and uses sequence matching to predict a series of locations in a given time segment. We show our prediction models perform well on users whose movement patterns are relatively fixed and that the performance is reasonable for users with variable movement patterns. Moreover, we show the temporal tree structure efficiently stores data and makes it more convenient to exploit the similarity of movement patterns between users. The main contributions of our paper are as follows:
• We introduce a temporal tree structure for storing historical location patterns and for fast retrieval of likely future patterns.
• We explore the impact of different customization settings on our method, including temporal clustering, outlier removal, global predictor, and the length of the initial trajectory on the route prediction.
2 Related Work The rapid development of location tracking technology has given rise to the study of human mobility patterns. Research studies have focused on predicting such patterns, ranging from the immediate next location to the full movement trajectory. Methods for predicting the next immediate location often use historical data on locations visited by the user. We discuss two main approaches for next location prediction: Markov modeling and neural networks. Chen et al. [5] presented a next location predictor with Markov modeling (NLPMM), which integrates a Personal Markov Model (PMM) and a Global Markov Model (GMM). PMM only uses each user’s data for modeling, while GMM utilizes all users’ trajectories based on the assumption that users often share movement patterns. The results show that NLPMM outperforms PMM, GMM, and other state-of-the-art methods. However, each state in NLPMM represents an abstract location, and its performance can degrade when applied to real locations. Asahara et al. [6] proposed a mixed Markov-chain model (MMM) to predict pedestrian movement. MMM categorizes pedestrians into groups and assumes that pedestrians in the same group behave similarly. MMM performs better than Markov models and hidden Markov models. However, since the above methods are based on Markov models, they are useful for predicting the next immediate location but are not suitable for predicting a long-term trajectory. Another weakness of Markov models is that complexity and computational cost increase with the number of states. This problem becomes more prevalent in trajectory prediction because locations are modeled as states; thus, Markov models can only model a small number of locations.
Since the development of deep learning techniques, more approaches have used neural networks to address the next location prediction problem. Fan et al. [7] encoded locations and trajectories into feature vectors and built multiple Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (Bi-LSTM) networks. A real-world traffic dataset with 197,021,276 vehicle passing records was used for the evaluation, and their proposed models outperformed Markov models. Users’ intents captured from calendar events and text messages have also been used for location prediction. For example, Yao et al. [8] proposed a method named semantics-enriched recurrent model (SERM) to learn the joint embeddings of multiple factors such as user and time. They evaluated their model on two real-world datasets and showed that it could better capture sequential transition regularity and long-term dependencies than existing models. One weakness of their approach is that they discretize the locations, so the precision of the results is limited to the grid cell. Although neural networks have been widely used in this area, they require many training instances and are computationally expensive, while location data is often difficult to collect due to user privacy issues. The continuous trajectory prediction problem aims to predict the user’s future route in the remaining time, given the user’s historical and initial trajectories. Sadri et al. [9] presented a method combining similarity metrics and temporal segmentation to estimate the location trajectory for the rest of the day. They also implemented temporal correlation and outlier removal to improve the method’s performance and reliability. Chen et al. [10] designed a continuous personal route prediction system using the Continuous Route Pattern Mining (CRPM) algorithm and decision tree-based algorithms. The client-server architecture of the system can protect personal privacy.
In [11], the same authors analyzed user movement patterns from historical data and then predicted the destination and continuous future routes given the start routes. However, they did not consider time factors, which are essential for route prediction since most users’ movement patterns are repetitive.
3 Methods

3.1 Temporal Frequent Pattern Tree (TFT)

Han et al. [12] developed the Frequent Pattern Growth (FP-growth) algorithm and an efficient data structure (FP-Tree) for mining frequent patterns. Unlike the Apriori candidate set generation-and-test approach, FP-growth adopts a pattern-fragment growth method to avoid costly candidate generation. Mining efficiency is also achieved by compressing a large database into an FP-Tree, which avoids repetitive scans of the database. They have shown that the FP-growth method is efficient and scalable for mining both long and short frequent patterns and is about an order of magnitude faster than the Apriori algorithm. This algorithm was mainly designed for efficient mining of patterns in customer transactions without considering the
M. Cai et al.
time sequence. However, our modeling of frequent routes requires considering the temporal and sequential aspects of the route, i.e., if location spot $l_1$ is visited at time $t$, then $l_2$ must be visited at time $t + \Delta$, where $\Delta > 0$. Therefore, we extend the FP-Tree method to include temporal series data in a structure that we refer to as the Temporal Frequent-Pattern Tree (TFT). The following describes the method in detail.

Let $P = \{(t_i, s_j)\}$ be a general temporal series, with $0 \le i \le m$ and $0 \le j \le n$, where $m$ is the total number of time slots and $n$ is the total number of distinct items (location spots in our case). Given a temporal series database $DB = \{P_1, ..., P_K\}$ which consists of $K$ temporal series, our goal is to design a structure for efficient mining of temporal sequences and to identify their frequency. A temporal frequent pattern tree has the following structure:

1. It consists of a root labeled as “null”, a set of subtrees, and a frequent-item headertable.
2. Each node in the tree structure consists of four fields: item, timeslot, count, and node-link. The timeslot field can be any time segment, e.g., hours of the day or days of the week. In our paper, the timeslot field represents hours of the day, an integer between 0 and 23, inclusive. The count field registers the count of the item-timeslot pair appearing in the database, and node-link links to the next node in the tree carrying the same item and a larger timeslot, or null if there is none.
3. Each entry in the frequent-item headertable consists of three fields: item, head of node-link in the TFT carrying the same item-name (or null if there is none), and frequency. The frequency field is the frequency of the item appearing in the database, as opposed to the count of the item-timeslot pair appearing in the database.
Given a temporal series database $DB$, for each tuple $(t_i, s_j)$ in $DB$ there are three cases:

Case 1: If the node $(t_i, s_j)$ already exists in the tree, then 1) increment its node count by one, and 2) increment the frequency of the edge from the root to $(t_i, s_j)$ by 1.

Case 2: If the node $(t_i, s_j)$ does not exist in the tree but a node $(t_{i'}, s_j)$ exists in the tree with $t_{i'} < t_i$, then 1) create a new node $(t_i, s_j)$, 2) create a node-link from $(t_{i'}, s_j)$ to $(t_i, s_j)$, and 3) increase the frequency of the edge from the root to $(t_i, s_j)$ by 1.

Case 3: If the item $s_j$ does not exist at all in the tree, then 1) create a new node $(t_i, s_j)$, as well as a new entry $(s_j, f)$ in the headertable, where $f = 1$, and 2) create an edge from the root to the new node $(t_i, s_j)$.

We set the root to node $(t_i, s_j)$ after each iteration.

Table 1 shows an example dataset that consists of three trajectories $P_1$, $P_2$, and $P_3$, each containing location spots that span six hours. We first instantiate our temporal frequent pattern tree with the new data. Then we build the corresponding tree by processing the user’s data from each temporal series. For each series, we start from the root of the tree. For each time slot and location item, we check if there is already a node in the tree with the same location and timeslot. If there is, we increase the frequency of that node by one and connect the current node and that node by an edge. Starting from the root for each series, we recursively process all
Table 1 Example database

| Trajectory | t0 | t1 | t2 | t3 | t4 | t5 |
|---|---|---|---|---|---|---|
| P1 | l1 | l3 | l4 | l1 | l2 | l1 |
| P2 | l1 | l2 | l4 | l2 | l3 | l1 |
| P3 | l1 | l2 | l4 | l4 | l1 | l1 |

Fig. 1 Construction process of TFT tree for the data in Table 1: (a) tree with P1; (b) tree with P1 and P2; (c) tree with P1, P2, and P3
data. The frequencies of $l_1$, $l_2$, $l_3$, and $l_4$ appearing in $DB$ are 8, 4, 2, and 4, respectively. As demonstrated, the frequencies of the sequences $l_1 \to l_2$ and $l_1 \to l_2 \to l_4$ are relatively high compared to others. The process of constructing the TFT for the data in Table 1 is shown in Fig. 1a–c.
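The construction procedure can be sketched in Python as follows. This is a minimal sketch under one reading of the three cases, not the authors' implementation: each distinct (timeslot, item) pair is a single node, the "root" in the description is treated as a pointer that moves to the most recently visited node, and node-links are recovered by sorting the timeslots of an item. The names `TFT`, `insert`, and `node_link` are hypothetical.

```python
from collections import defaultdict

ROOT = (None, "null")  # root node labeled "null"


class TFT:
    """Illustrative Temporal Frequent-Pattern Tree (assumed layout)."""

    def __init__(self):
        self.node_count = defaultdict(int)  # (timeslot, item) -> count
        self.edge_freq = defaultdict(int)   # (parent, child) -> frequency
        self.header = defaultdict(int)      # item -> frequency in the database

    def insert(self, series):
        """Insert one temporal series, a list of (timeslot, item) tuples."""
        current = ROOT
        for timeslot, item in series:
            node = (timeslot, item)
            self.node_count[node] += 1            # create the node or bump its count
            self.edge_freq[(current, node)] += 1  # edge from the current pointer
            self.header[item] += 1                # headertable frequency
            current = node                        # move the pointer to this node

    def node_link(self, item):
        """Timeslots of all nodes carrying `item`, in increasing order."""
        return sorted(t for (t, s) in self.node_count if s == item)


# Data from Table 1: three trajectories over hours t0..t5.
table1 = {
    "P1": ["l1", "l3", "l4", "l1", "l2", "l1"],
    "P2": ["l1", "l2", "l4", "l2", "l3", "l1"],
    "P3": ["l1", "l2", "l4", "l4", "l1", "l1"],
}
tree = TFT()
for spots in table1.values():
    tree.insert(list(enumerate(spots)))

print(dict(tree.header))
```

With the Table 1 data, the headertable frequencies reproduce the values stated in the text (8, 4, 2, and 4 for l1 to l4), and the edge from (t0, l1) to (t1, l2) has frequency 2.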
3.2 Route Prediction Modeling

Our goal is to predict the route of future locations from the constructed TFT. We first characterize a route through the following definitions.

Definition 1 A spot $v \in V$ is a point of location. Note that such a location could either be an outdoor location or an indoor location. An outdoor location is usually represented as a geo-location in the GPS, whereas an indoor location is a contextual location such as an office or kitchen. We assume that both outdoor and indoor locations could be encoded as a numerical latitude-longitude pair.

Definition 2 A significant spot, denoted as $s$, is a spot or a cluster of spots that indicates a location that the user is likely to visit regularly.
Definition 3 A route $E_r$ is an ordered sequence of significant spots, where each $e \in E_r$, $e \in E$, is a path from a significant spot $s_1$ to another significant spot $s_n$. Such a path follows the standard mathematical definition of a path in a graph. In particular, $E_r := [e_1, e_2, e_3, ..., e_m]$, with $e_1 = (s_1, s_2)$, $e_2 = (s_2, s_3)$, $e_3 = (s_3, s_4)$, ..., $e_{m-1} = (s_{m-1}, s_m)$, $e_m = (s_m, s_{m+1})$.

Following the terminology in the previous section, a route is an instance of a temporal series $P = \{(t_i, s_j)\}$ ($0 \le i \le m$ and $0 \le j \le n$) where the item fields are significant spots, $m = 23$ is the index of the last hourly timeslot of a day, and $n$ is the total number of significant spots. As an example, $P = \{(7, s_1), (7, s_2), (8, s_3), (9, s_3)\}$ might be a person’s route in the morning. We also define $P_{pred}$ as the route of the current day, the day that needs prediction. Similarly, if the database is $DB = \{P_1, ..., P_K\}$, then $P_k$ stands for the route of the $k$-th day in the database.

Definition 4 A sub-route $P^{t_i \sim t_j}$ is a contiguous sub-sequence of route $P$ between timeslots $t_i$ and $t_j$, with $0 \le t_i \le t_j \le 23$. For example, if we want to predict the current day’s trajectory between 10:00 and 13:00, we aim to find $P_{pred}^{10 \sim 13}$.

Definition 5 The length of a route $P$ is the length of its sequence. For example, the length of the route given above is 4. In general, the length of the route $P^{t_i \sim t_j}$ is $t_j - t_i + 1$.

Our route prediction method uses historical routes constructed as a TFT to predict the current day’s future location route. The algorithm assumes the first part of the route is known. This can be one starting point and time or a sequence of location spots with corresponding timeslots. The algorithm then finds a subset of candidate routes that may predict the rest of the sequence. The final step chooses the route with maximum frequencies. Algorithm 1 describes how we use TFT to predict routes.
Algorithm 1: Route Prediction
Input: $(T, P_{pred}^{t_i \sim t_j})$
    $T$ := TFT of historical routes
    $P_{pred}^{t_i \sim t_j}$ := the known part of the route to be predicted
Output: $P_{pred}^{t_{j+1} \sim m}$
set $T := \{P_k^{t_{j+1} \sim m} : P_k \in T\}$
set $P^* = \{\}$
for each timeslot $t$ with $t_{j+1} \le t \le m$ do
    add $(t, s)$ to $P^*$, where $s$ is the location of maximum frequency at time $t$
return $P^*$
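The loop of Algorithm 1 can be illustrated with a short Python sketch. It is an assumption-laden simplification rather than the authors' code: historical routes are passed as plain lists of (timeslot, location) tuples instead of a TFT, the known part is used only to find the first timeslot to predict, and ties are broken alphabetically, a rule the paper does not specify. The function name `predict_route` and its parameters are hypothetical.

```python
from collections import Counter, defaultdict


def predict_route(history, known, last_slot=23):
    """Predict (timeslot, location) pairs after the known part of a route.

    history: list of routes, each a list of (timeslot, location) tuples.
    known:   the known part of today's route, same format, non-empty.
    """
    # Count how often each location was visited at each timeslot.
    slot_counts = defaultdict(Counter)
    for route in history:
        for t, s in route:
            slot_counts[t][s] += 1

    start = max(t for t, _ in known) + 1  # first timeslot after the known part
    predicted = []
    for t in range(start, last_slot + 1):
        counts = slot_counts.get(t)
        if not counts:
            continue  # no historical data for this timeslot
        # Maximum frequency; ties broken alphabetically (assumption).
        best = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        predicted.append((t, best))
    return predicted


# Table 1 trajectories as history; hours t0..t5.
history = [
    list(enumerate(["l1", "l3", "l4", "l1", "l2", "l1"])),
    list(enumerate(["l1", "l2", "l4", "l2", "l3", "l1"])),
    list(enumerate(["l1", "l2", "l4", "l4", "l1", "l1"])),
]
prediction = predict_route(history, known=[(0, "l1"), (1, "l2")], last_slot=5)
print(prediction)
```

For timeslots t2 and t5 the prediction follows a clear majority (l4 and l1, each visited three times); at t3 and t4 every location occurs once, so the result depends entirely on the assumed tie-breaking rule.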
By default, the algorithm uses the TFT built from each individual user’s data to predict that user’s future routes. However, in some cases, the lack of sufficient historical data may affect the prediction results. To compensate for this limitation, we take advantage of the similarity between movement patterns that may exist between users in the same spatial
regions or clusters. Therefore, in addition to building individual TFTs for each user, we build a community tree with data from all users.

Definition 6 A Community tree (C-TFT) is a TFT as described in Sect. 3.1, except that the database $DB$ consists of the data from all users from the same region.

We then use this tree to predict individual routes. We call this approach Global Prediction, which we test in our experiments as described in the next section.
4 Experimental Evaluation

In this section, we describe how we use historical GPS location data and the initial routes on the current day to predict trajectories during the rest of the day. We collected the GPS coordinates from 50 users in a city for five consecutive weeks. The data was collected at three-minute intervals and continuously uploaded to our server. It was processed daily to extract and update location and route patterns for each participant. We then applied a clustering method to all the location spots in the dataset and replaced each location coordinate with a corresponding intersection coordinate. We built the routes with intersections as significant spots.

To test our prediction algorithm, we used a portion of the data from each user to build the TFT and used the rest to test the prediction performance. Since different users have different amounts of data, for each user we selected the first 4/5 of the total days as historical trajectories and the last 1/5 as test sets. We then built TFT models with the historical trajectories and used our route prediction algorithm on the test set to predict future location trajectories. For each test day, we ask the model to predict the user’s location at every timeslot after the end timeslot of the known trajectory (as long as there is data for that timeslot). For example, if we are given $P_{pred}^{0 \sim 7}$, then we ask the model to predict locations for that user starting at eight that day. Once the model has made the prediction, we calculate the distance between predicted locations and actual locations for each timeslot and obtain an average. In the end, we calculate the average over all days to obtain the overall distance. Table 2 shows a sample route from a user along with the model prediction results. The first row represents the times during the day, and the last row is the distance between the predicted location and the actual location.
The predicted location and the actual location are the same from timeslots 4 am to 9 am. From the table, the model’s prediction is accurate before 7 pm. The predicted location at 7 pm is l0 , while the actual location is l2 . In this scenario, l0 is likely to be the user’s home since they most frequently visit it from 20 to 6 each day. Similarly, l1 is probably the user’s workplace since it appears mostly during 10–17. The location l2 only appears in the database once, and among all other locations in the user’s database, l0 is the closest to l2 with a distance of 1.19 km. One possibility in this scenario could be that the user stayed at work until 18 on that particular day and then ran errands on the way back home on a different route. This divergence resulted in lower prediction accuracy on that particular day (see Table 3).
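The per-day error measure used in this evaluation (distance between predicted and actual location for each timeslot, then averaged) could be computed along the following lines. The paper does not state which distance function is used; the sketch assumes the great-circle (haversine) distance with a mean Earth radius of 6,371 km, and the names `haversine_m` and `mean_error_m` are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def mean_error_m(predicted, actual):
    """Average distance over the timeslots present in both routes.

    predicted, actual: dicts mapping timeslot -> (lat, lon).
    Returns None when the two routes share no timeslot.
    """
    shared = sorted(set(predicted) & set(actual))
    if not shared:
        return None
    return sum(haversine_m(predicted[t], actual[t]) for t in shared) / len(shared)


# Toy example: correct at 8:00, one degree of longitude off at 9:00.
predicted = {8: (0.0, 0.0), 9: (0.0, 1.0)}
actual = {8: (0.0, 0.0), 9: (0.0, 0.0), 10: (1.0, 1.0)}
day_error = mean_error_m(predicted, actual)
```

Averaging `mean_error_m` over all test days of a user would then give the overall distance described in the text.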
Table 2 Sample route

| Time (hours) | 4–9 | 10–18 | 19 | 20–23 |
|---|---|---|---|---|
| Actual_loc | l0 | l1 | l2 | l0 |
| Predicted_loc | l0 | l1 | l0 | l0 |
| Distance (m) | 0 | 71.4 | 1189.6 | 0 |

Table 3 Global and individual predictors. Distance is in meters and time is in milliseconds

| Predictor | Mean dist. | Max dist. | Std dist. | Mean time |
|---|---|---|---|---|
| Individual | 505.5 | 1325.0 | 340.6 | 1267 |
| Global | 446.0 | 824.0 | 270.8 | 13613 |
We examined different settings, including the number of historical routes, outlier removal, and global predictors, to understand the effect of different factors on the prediction performance. We then explored the role of variation in movement patterns in prediction performance. The results are described in the following sections.
4.1 Impact of the Number of Historical Routes

We explored how the number of historical routes used to construct the TFT affects the prediction outcome by comparing two models. For the first model (model 1), the historical routes included all the routes in the first 4/5 of the days. In the second model (model 2), only part of the routes was chosen as historical routes. Our hypothesis was that model 1 would be more accurate than model 2, but model 2 would be faster. For model 2, we selected routes on the same weekday as the prediction day. For example, if the prediction day were a Wednesday, the model would only look at routes on previous Wednesdays.

As shown in Table 4, the models with more data are indeed more accurate. In terms of performance, the average time for model 1 to make predictions is 1267 ms, whereas it takes 1096 ms for model 2. The speed of the two models is very close. We believe this is because the dataset contains only around five weeks of data for each user, and thus the difference between the number of historical routes in the two models is modest. Despite this small difference in data, the two models differ significantly in terms of accuracy. We believe that there is too little data for model 2, which only looks at the same weekdays. If there are four weeks of historical routes in total, then the TFT for model 2 will be built from around one week of data. This data may not accurately model a user’s frequent routes. When the size of the dataset increases, this tradeoff becomes more worth considering.
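The route selection for model 2 (only historical routes recorded on the same weekday as the prediction day) can be sketched as follows. The dictionary layout keyed by `datetime.date` and the function name `same_weekday_routes` are assumptions made for illustration; the paper does not describe how routes are stored.

```python
from datetime import date


def same_weekday_routes(history, prediction_day):
    """Keep routes recorded before prediction_day on the same weekday.

    history: dict mapping datetime.date -> route.
    """
    return {
        day: route
        for day, route in history.items()
        if day < prediction_day and day.weekday() == prediction_day.weekday()
    }


# Hypothetical history; 6, 13, and 20 January 2021 are Wednesdays.
history = {
    date(2021, 1, 6): ["l1", "l2"],
    date(2021, 1, 7): ["l1", "l3"],
    date(2021, 1, 13): ["l1", "l2"],
}
selected = same_weekday_routes(history, date(2021, 1, 20))
```

A TFT built only from `selected` corresponds to model 2, while a TFT built from all of `history` corresponds to model 1.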
Table 4 Impact of the number of historical routes and outlier removal. Models 1 and 2 are built with outliers removed, whereas models 3 and 4 include outliers. Model 3 is built with all historical routes, and model 4 takes partial data to build the TFT and make predictions

| Model | Dist. ≤ 1000 m (%) | Dist. ≤ 750 m (%) | Dist. ≤ 500 m (%) | Max (m) | Min (m) | Time (ms) |
|---|---|---|---|---|---|---|
| Model 1 | 91.2 | 78.4 | 51.0 | 1365 | 46 | 1267 |
| Model 2 | 76.5 | 64.7 | 37.3 | 2383 | 59 | 1096 |
| Model 3 | 73.0 | 63.5 | 49.2 | 2625 | 72 | 1175 |
| Model 4 | 68.9 | 50.8 | 34.4 | 3403 | 79 | 953 |
4.2 Outlier Removal

Since our focus is mostly on frequent (routine) routes, we try to identify outlier routes (e.g., a business trip) and measure the model’s performance with and without the outlier routes. However, outlier detection requires a distance threshold that might differ from user to user depending on travel patterns. To obtain a personalized threshold for each user, we first calculate the individual distance threshold based on each person’s average distance traveled per day for five weeks. Then, we cluster those averages to find the distance threshold for each user. This threshold is used to identify outlier locations. Our method is summarized in the following steps:

1. Scan $DB$. Sum up $(lat \cdot f, lon \cdot f)$ for each distinct location in $DB$, where $lat$ and $lon$ are the latitude and longitude of the location, respectively, and $f$ is the frequency of the user’s visits to the location.
2. Calculate the centroid coordinates by dividing the above sums by the total frequency.
3. Calculate the average distance $d$ between the centroid and the other locations.
4. For a location $l$, output true if the distance between $l$ and the centroid is greater than $d$, and false otherwise.

Table 4 summarizes the results from models with and without outliers. Models 1 and 2 are built with outliers removed, whereas models 3 and 4 include outliers. Model 3 is built with all historical routes, and model 4 takes partial data to build the TFT and make predictions. By comparing model 1 with model 3 and model 2 with model 4, we can see that regardless of the number of historical routes, the model without outliers is much more accurate but slightly less efficient. The reason is that we measure model accuracy in terms of average distance, so outliers can have a significant impact on accuracy.
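The four outlier-detection steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: distances are computed with the haversine formula (the paper does not name a distance function), the average in step 3 is taken over distinct locations without frequency weighting, and the names `haversine_m` and `outlier_flags` as well as the toy coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))


def outlier_flags(visits):
    """visits: dict mapping (lat, lon) -> visit frequency.

    Returns a dict mapping each location to True if it is flagged as an outlier.
    """
    total = sum(visits.values())
    # Steps 1-2: frequency-weighted centroid.
    c_lat = sum(lat * f for (lat, _), f in visits.items()) / total
    c_lon = sum(lon * f for (_, lon), f in visits.items()) / total
    centroid = (c_lat, c_lon)
    # Step 3: average distance between the centroid and the distinct locations.
    dist = {loc: haversine_m(centroid, loc) for loc in visits}
    d = sum(dist.values()) / len(dist)
    # Step 4: a location is an outlier if it lies farther than d from the centroid.
    return {loc: dist[loc] > d for loc in visits}


# Toy data: two frequently visited nearby spots and one distant, rare spot.
visits = {(0.0, 0.0): 10, (0.0, 0.01): 8, (1.0, 1.0): 1}
flags = outlier_flags(visits)
```

In this toy example the distant spot visited once is flagged, while the two frequent spots near the centroid are kept.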
4.3 Global Predictor

As previously mentioned, we construct a Community TFT (C-TFT) to compensate for the sparsity in users’ data and evaluate its performance in predicting future routes for users in the same region. Such users may have similar movement patterns, so data from other users may cover the user’s missing routes. To use the global route predictor, we traverse the community TFT instead of the individual TFTs. Table 3 compares the individual predictor and the global predictor. The performance of the global predictor is slightly better than that of the individual predictor in terms of residual distance. However, its average prediction time is more than ten times that of the individual model. This is because the community tree can be large, and the amount of data that needs to be processed is substantially larger than for the individual tree.
4.4 The Role of Variable Movement Patterns in Prediction Performance

We further explore the relationship between model performance and the user movement patterns to understand the impact of variation in those patterns on predictions. We separate the users into four groups: Group 1 are users with a predicted distance above 1000 m, Group 2 are users with a predicted distance between 1000 and 750 m, Group 3 are users with a predicted distance between 750 and 500 m, and Group 4 are users with a predicted distance less than 500 m. We hypothesize that a lower average distance implies lower movement variation. To test this, we first fix four time slots 7, 8, 17, and 18 when the users are likely to have the most varied movement patterns. Then, for each time slot, we count the number of distinct locations ever visited by each user at that time. This gives us a different list of numbers for each group. We then take the standard deviation within each list to measure how varied movement patterns are within each group. We repeat this process for all four time slots to obtain the results in Table 5. As expected, users in groups 1 and 2 have a larger variation in the number of significant spots than users in groups 3 and 4, which implies relatively more variable movement patterns.
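The group-level measure described here (count the distinct locations each user visited at a fixed timeslot, then take the standard deviation of those counts within a group) can be sketched as below. The paper does not say whether the population or the sample standard deviation is used; the sketch assumes the population form (`statistics.pstdev`). The record layout and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def distinct_location_counts(visits, timeslot):
    """visits: iterable of (user, timeslot, location) records.

    Returns a dict mapping user -> number of distinct locations at `timeslot`.
    """
    seen = defaultdict(set)
    for user, t, location in visits:
        if t == timeslot:
            seen[user].add(location)
    return {user: len(locations) for user, locations in seen.items()}


def group_variation(visits, group_users, timeslot):
    """Standard deviation of the distinct-location counts within one group."""
    counts = distinct_location_counts(visits, timeslot)
    # Users with no record at this timeslot count as zero locations (assumption).
    return pstdev([counts.get(user, 0) for user in group_users])


# Toy records: u1 visits two distinct spots at 8:00, u2 visits one.
visits = [
    ("u1", 8, "a"), ("u1", 8, "b"), ("u1", 8, "a"),
    ("u2", 8, "a"), ("u2", 17, "c"),
]
variation = group_variation(visits, ["u1", "u2"], timeslot=8)
```

Repeating `group_variation` for each group and each of the four timeslots would yield one table cell per combination, in the layout of Table 5.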
5 Conclusion

We designed a temporal tree structure that efficiently stores location data from users and helps predict future location trajectories. We built different data models and examined the effect of outlier routes and the number of routes used in building the TFT. Our evaluation on real-world data from 50 users collected over five consecutive weeks showed that in the best case, 91.2% of the time, the accumulated average
Table 5 Comparison between different user groups based on the standard deviation of the number of different locations. Group 1 are users with predicted distances above 1000 m, Group 2 are users with predicted distances between 1000 and 750 m, Group 3 are users with predicted distances between 750 and 500 m, and Group 4 are users with predicted distances less than 500 m

| Std. of locations (time slots) | 8 | 9 | 17 | 18 |
|---|---|---|---|---|
| Group 1 | 42.0 | 62.3 | 105.0 | 139.2 |
| Group 2 | 38.2 | 60.0 | 92.4 | 109.4 |
| Group 3 | 30.5 | 49.5 | 66.3 | 43.8 |
| Group 4 | 34.1 | 29.5 | 29.2 | 52.5 |
distance between the predicted location trajectories and actual routes was less than 1000 m (min = 46 m and max = 1365 m). Our future steps include testing with a larger dataset collected over a longer time and evaluating the method in a real-world ridesharing application.
References
1. Do TMT, Dousse O, Miettinen M, Gatica-Perez D (2015) A probabilistic kernel method for human mobility prediction with smartphones. Pervasive Mob Comput 20(C):13–28
2. Jeung H, Liu Q, Shen HT, Zhou X (2008) A hybrid prediction model for moving objects. In: 2008 IEEE 24th international conference on data engineering, pp 70–79
3. Jeung H, Yiu ML, Zhou X, Jensen CS (2010) Path prediction and predictive range querying in road network databases. VLDB J 19(4):585–602
4. Scellato S, Musolesi M, Mascolo C, Latora V, Campbell A (2011) Nextplace: a spatio-temporal prediction framework for pervasive systems, vol 6696, pp 152–169
5. Chen M, Liu Y, Yu X (2014) NLPMM: a next location predictor with Markov modeling. In: Tseng VS, Ho TB, Zhou Z-H, Chen ALP, Kao H-Y (eds) Advances in knowledge discovery and data mining. Springer International Publishing, Cham, pp 186–197
6. Asahara A, Maruyama K, Sato A, Seto K (2011) Pedestrian-movement prediction based on mixed Markov-chain model, pp 25–33
7. Fan X, Guo L, Han N, Wang Y, Shi J, Yuan Y (2018) A deep learning approach for next location prediction, pp 69–74
8. Yao D, Zhang C, Huang J, Bi J (2017) SERM: a recurrent model for next location prediction in semantic trajectories, pp 2411–2414
9. Sadri A, Salim FD, Ren Y, Shao W, Krumm JC, Mascolo C (2018) What will you do for the rest of the day? An approach to continuous trajectory prediction. In: Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies, vol 2, no 4
10. Chen L, Lv M, Ye Q, Chen G, Woodward J (2011) A personal route prediction system based on trajectory data mining. Inf Sci 181(7):1264–1284
11. Ling C, Mingqi L, Gencai C (2010) A system for destination and future route prediction based on trajectory mining. Pervasive Mob Comput 6(6)
12. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM SIGMOD international conference on management of data, SIGMOD ’00. Association for Computing Machinery, New York, NY, USA, pp 1–12
Quick and Dirty Prototyping and Testing for UX Design of Future Robo-Taxi

Dokshin Lim and Minhee Lee
Abstract People increasingly view mobility as a service and want more choices for traveling between points A and B. Designing the user experience of future robo-taxis needs to be seen from a broad perspective and to consider an extended user journey. Our work explores the application of quick and dirty prototyping and usability testing to design the UX of future robo-taxi services from the ground up. We made low-fidelity prototypes early in the design process and quickly tested them with users iteratively to answer initial questions that arose about critical touchpoints in the user journey. The two critical touchpoints are the rider-driverless car match and the in-car environment configuration. Five eHMI (external HMI) options were tested, and our results suggest combining options depending on the distance between the vehicle and the passenger. We also tested two environment templates and found that preference depends on travel time regardless of the purpose of the journey. Finally, we suggest a service scenario composed of 14 scenes.

Keywords Autonomous vehicles · Robo-taxi · eHMI · Quick and dirty prototyping · Usability · Co-creation · UX
D. Lim (B)
Department of Mechanical and System Design Engineering, Hongik University, Seoul, South Korea
e-mail: [email protected]

M. Lee
Samsung Electronics, Suwon, Gyeonggi, South Korea

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_30

1 Background

Consumers, who increasingly view mobility as a service, want more choices for traveling between points A and B, including ride hailing, car sharing, and perhaps even self-driving “robo-taxis” [1]. A robo-taxi, also known as a self-driving taxi or a driverless taxi, is an autonomous vehicle of level 4 or 5 (levels defined by SAE International) operated for ridesharing services like NuTonomy and Waymo. Public opinion on self-driving vehicles, in general, is regarded as skeptical in many published reports
[2]. Some studies suggest how to offer fully autonomous vehicles in a way that improves people’s trust. For example, there are gender differences in the fear of autonomous vehicles depending on the size of the cars [3]. In terms of appearance, ‘Firefly’, the first hardware prototype of Waymo, demonstrated that in the early phase of adoption a robo-taxi is better designed as a small vehicle with a style completely different from conventional cars [4]. Another study [5] pointed out that the presence of specific user interfaces such as an information display, handles, digital blinkers, a camera, and a mobile application affected the feeling of trust. User experience design of autonomous vehicles is being developed under different subjects, and this study places the focus on designing the user interfaces needed in critical situations while using a future robo-taxi.

Lim et al. [6] view the user journey in a simple framework from a case study of the T5 Pod (available at Heathrow Terminal 5 in the UK) and indicate that users feel anxious when they are not able to anticipate and prepare for the next step at every step. Lee also breaks down the user journey of robo-taxi services into five simple steps: 1. Calling a service, 2. Waiting, 3. Boarding, 4. Moving, and 5. Getting off [5]. The common issues arise when users want to make the best use of their time, either to prepare for the next step or to enjoy the journey.

“Quick and dirty” is a term popularized by Brooke [7], and his approach is still widely practiced. Quick and dirty usability tests require less effort while still yielding useful results. These approaches are quite revealing in the early stages of a project as a quick sense check [8]. The main idea of quick and dirty prototyping is to start with cheap and fast prototypes. This can be achieved by creating low-fidelity prototypes in the early stage from low-cost materials that are readily available on the spot.
It is important to make sure that the low-fidelity prototype has just the level of detail required, never too much. It is also important to accept that the prototype is meant to be broken, destroyed, or thrown away once the questions it poses are answered. Thus, “$1 prototyping” [9] or “Rough Prototyping” (Service Design Tools [10]) is appropriate. Quick and dirty methods are subject to sampling bias, however, so the results should be interpreted with care [11].
2 Research Methods

2.1 Our Process

This research is composed of two main iterations of prototyping and usability testing, which resulted in our first user scenario. Then, a co-creation workshop is organized to refine and develop our final service scenario. Figure 1 illustrates our process.
Fig. 1 Our process
2.2 First Iteration

The first iteration deals with the rider-driverless car (robo-taxi) match situation. The rider-driver match is a perplexing experience even today. How could future robo-taxis make the rider-car match experience simple and easy? In each iteration, we define four key components of prototyping and testing (people, objects, location, and interactions) [12], which makes the decisions at each step clear. The nine subjects played passengers who are waiting for a robo-taxi (see Table 1).

Table 1 People; participants of the first iteration

| No. | Age | Gender | Job | Major |
|---|---|---|---|---|
| 1 | 21 | M | A second-year undergraduate | Visual design |
| 2 | 22 | F | A junior undergraduate | Visual design |
| 3 | 21 | M | A first-year undergraduate | Visual design |
| 4 | 23 | M | A second-year undergraduate | Automobile design |
| 5 | 24 | M | A senior undergraduate | Mechanical engineering |
| 6 | 24 | M | A senior undergraduate | Mechanical engineering |
| 7 | 24 | M | A senior undergraduate | Mechanical engineering |
| 8 | 24 | M | A junior undergraduate | Mechanical engineering |
| 9 | 25 | F | A web designer | Graphic design |
Fig. 2 Objects; user interfaces of the robo-taxi and passenger’s mobile phone
The user’s mobile phone and the vehicle can communicate in real time through a network and near-field communication. Based on this, information is displayed on the vehicle’s user interface, assuming that the front glass serves as a display that the passenger can scan before boarding. The front-facing screen of the robo-taxi and the passenger’s mobile phone screens are prototyped as shown in Fig. 2. The two interfaces are consistent. The interaction takes place in the following steps. A passenger calls a robo-taxi via a mobile phone. A robo-taxi approaches from a distance. The passenger compares the interfaces on both devices and identifies the taxi.

Hypothesis and Determinants. The hypothesis is that people will prefer a minimum amount of information on an abstract level. The following determinants were varied across the prototypes (see Table 2).
2.3 Second Iteration

The second iteration studies in-car user experiences. In the era of autonomous vehicles, we expect to be able to perform various tasks during the journey. How could future robo-taxis offer an in-car environment optimized for the tasks that users want to perform? Six subjects assumed that they were traveling to a club to attend a party with friends at night (see Table 3). A combination of dashboards, lighting, and music defines the “environmental templates” of the future robo-taxi. The robo-taxi analyzes the passenger’s context and sets the most suitable environmental template. The interaction is as follows. In order to go to a party, passengers ride a taxi. The back-end system of the taxi analyzes the context of the passengers. Based on the analysis, the taxi system optimizes the template for the journey.
Table 2 Five low-fidelity prototypes of the first iteration (Robo-taxi and Mobile screen images not shown)

| No. | Determinants (type) | Determinants (content) | Information |
|---|---|---|---|
| 1 | Text | User data, unprocessed | User name, destination, time of request, time of departure, car number |
| 2 | Text | User data, processed | User ID, car ID |
| 3 | Text | Random code | Ex: 64WPQL |
| 4 | Graphic | Image | Icon |
| 5 | Graphic | Color | Ex: Yellow |
Hypothesis and Determinants. The hypothesis is that people will prefer the user interfaces of the robo-taxi to be set automatically depending on the passenger’s context of use. The determinant of this experiment is whether or not a custom in-car UI is provided based on contextual awareness. We conducted two tests with two groups. The three subjects in each group played the role of friends who went to the party together. When passengers got into the robo-taxi, the system set the “Party” UI template based on
Table 3 Profile of second test participants

| No. | Group | Age | Gender | Job | Major |
|---|---|---|---|---|---|
| 1 | A | 23 | M | A graduate student | Visual design |
| 2 | A | 23 | M | A graduate student | Visual design |
| 3 | A | 23 | M | S/W engineer | Web S/W |
| 4 | B | 23 | M | S/W engineer | Mobile S/W |
| 5 | B | 23 | M | Project manager | Web service |
| 6 | B | 23 | M | H/W engineer | Mobile H/W |
Fig. 3 Low-fidelity prototype of the second iteration
the passengers’ destination, time, passenger’s personal information, etc. Lighting, music, and dashboard display were set up according to the theme (see Fig. 3).
2.4 Co-creation Workshop

In this study, experts in autonomous vehicles co-create the UX scenario to make up for the limitations of quick and dirty prototyping. When a problem is complicated, designers reach the limits of what they can solve alone and inevitably turn to co-creation with others [13]. Two external experts (see Table 4) and four internal project members worked together in the workshop. Based on the previous process, all participants identified the pain points. Then, they built new scenarios that can solve the problems.
Table 4 Profile of semi-professionals who participated in co-creation

| No. | Position | Company | Career |
|---|---|---|---|
| 1 | UX designer | Electronics company | 15 Years |
| 2 | Project manager | Automotive AI S/W company | 3 Years |
| 3 | Product planning | Electronics company | 14 Years |
| 4 | Graphic designer | E-Book service | 4 Years |
| 5 | Product designer | Electronics company | 11 Years |
| 6 | A graduate student | N/A | N/A |
3 Results

3.1 First Iteration

The subjects preferred to identify the vehicle seamlessly at a glance. The lower the complexity, the higher the preference, and the most preferred type was color (option no. 5 in Table 2). Options no. 1 and 2 were disliked because personal information was displayed. However, right before boarding, many wanted to confirm with specific information. As a result, six out of nine subjects suggested a combined approach. From a distance, they preferred to identify the vehicle with implicit information such as color. When the vehicle got close enough, they preferred explicit data such as their ID to confirm.
3.2 Second Iteration
Three out of six subjects did not prefer the contextual UI's automatic configuration. Their negative opinions revealed that they just wanted to sleep or rest regardless of the destination. They also mentioned that the travel time was not long enough to make environmental configuration worthwhile. The three people who preferred this feature predicted that it would be useful if the templates were diversified and refined.
3.3 First User Scenarios
Based on the results of the two iterations, user scenarios are defined as in Fig. 4. Important visual elements are emphasized in the form of storyboards. The scenario proceeds in the following order: (Scene #1) Main character A prepares to go out. (Scene #2) A calls a robo-taxi using his mobile. He adds his friends as passengers and specifies the final destination. The system identifies each passenger's location and calculates the optimized route to pick up everyone. (Scene #3) A receives a notification that his car is close. (Scene #4) A gets in the car and moves to the location of B. (Scene #5) B is waiting on the boulevard. Simple visual information is displayed on the front glass of the robo-taxi. The same visual information is shown on B's mobile phone, making it easy for B to identify that the car is for him. (Scene #6) The taxi is carrying A and B and heading to C's location. (Scene #7) C, ready early, is moving without waiting. The taxi's back-end system tracks C's location in real time and moves to where C is. (Scene #8) After C gets on board, the taxi moves to the final destination of the group.

Fig. 4 First user scenario after quick and dirty prototyping
3.4 Co-creation Workshop
The above user scenario was discussed to supplement the following design issues.

Rider-driverless car match scenario. Both professionals agreed on a simple check through color. Professional 2 noted that it is practical to derive a way to vary the specificity of the information by distance: “It accurately reflects the needs of users who do not want to get much attention when boarding a taxi, but want to avoid errors just before boarding.” Professional 1 said, “Random code should be simplified to 2–3 digits or emoticons to reduce user stress.” Professional 1, who is a UX designer, noted that a designer should consider user psychology. When autonomous driving technology is introduced, users will feel anxious about the new high-end technology, so emotional aspects need to be considered: “I think it would be better to use emoticons rather than random numbers and letters.” Professional 2 assumed that the passenger would also be moving toward the car. A scenario in which the vehicle asks the passenger to move toward it to optimize the route is therefore possible. Scenarios for exception cases are also needed, such as when the passenger's location is not known. Since AI technology will be introduced in robo-taxis, the vehicle can actively call the passenger.

Customized UI for the vehicle. This feature is useful, but preferences vary depending on the individual and the context, so every professional agreed that the system should suggest templates and let users choose one themselves. Both professionals said that situations, not passenger feelings, should be prioritized in defining the UI. Defining the UI based on objective factors, without inferring emotion, can minimize errors. When the robo-taxi automatically configures the user interface, the voice can also be specialized according to preference or situation. The limits of quick and dirty prototyping were also mentioned. Professional 1 commented, “In addition to typical situations such as parties, reviews are needed based on various situations. If the research and definition of space itself are preceded in the following study, more sophisticated results will be produced.”

Co-creation of the final service scenario. Two semi-experts and the researchers improved the user scenario by reflecting on the issues discussed earlier. The final service scenarios are shown in Table 5. By discussing each scene on the storyboard with the semi-professionals, the service scenario became more sophisticated. Various methods were proposed to improve the low-fidelity prototypes into high-fidelity services applicable to the real market.
4 Conclusion
The test showed the need to combine options depending on distance. People preferred to identify their car all the way from a distance until the car was very close. The most preferred option was color. People expressed privacy concerns about an interface showing unprocessed user data. However, they wanted clear confirmation when the car was very close and were then willing to use explicit user data. Our second test revealed clear personal preferences. Three out of six did not prefer the contextual UI's automatic configuration, so the hypothesis could not be verified. Some people who were negative about this feature said that they might do things unrelated to the destination (for example, sleep or rest). Another important finding is that the preference was also influenced by the length of the travel time: when the trip does not take long, there is no interest in an adaptive UI. By applying a series of quick and dirty prototyping and testing, we could test our hypotheses in two days at an ultra-low cost. The co-creation workshop supplemented the shortcomings of the quick and dirty methods. Our results and final service scenarios may have implications for designing eHMI (external human–machine interface) or the interior of future robo-taxis.
D. Lim and M. Lee
Table 5 Service scenarios created in the co-creation workshop
No. | Scenario description
1 | Joe Frey decides to go to a club with his friends, getting ready! He turns on the robo-taxi app with his mobile and requests a taxi. Joe adds two more people as passengers
2 | Robo-taxis analyzes the locations of Joe, Kelvin, and Steve to optimize the route. The first pickup is Joe's house: Joe gets a close-up notification and goes out to the boulevard in front of the house
3 | When robo-taxi enters a near distance, it displays a specific color (blue), which is the same color displayed on Joe's mobile. It is vivid, so that can be seen from a long distance
4 | As the vehicle enters a closer distance, the color is blurred, and a simple emoticon code occurs. The color of Joe's mobile is also blurred, and the same emoticon is shown
5 | The second passenger, Kelvin, is waiting for the taxi on the road early. As the vehicle nears, a simple emoticon code co-occurs as the mobile and the vehicle, making it easy for Kelvin to recognize which vehicle to take
6 | Nearby communication occurs between the two devices as Kelvin approaches the vehicle door, which automatically unlocks the vehicle. At this time, haptic and notification occur on mobile as well. Kelvin checks his name and gets on the taxi
7 | The robo-taxi picks up Joe Frey and Kelvin and heads to Steve, the last passenger
8 | Steve, the third pick-up target, is in a hurry. So, he keeps walking, looking at the taxi's moving position. While he is on the move, the notification arrives on mobile that the vehicle has entered the perimeter
9 | The robo-taxi has entered a close distance from Steve, but the final route is not accessible due to one-way traffic. Nevertheless, since it takes quite a while to go back, the vehicle system analyzes whether it will go back or ask Steve to come
10 | After analyzing that it is much faster for him to walk, Robo-taxi calls Steve. Robo-taxi asks Steve to come back about 20 m through an AI-based call and sends maps to Steve's mobile
11 | Steve also seamlessly identifies the taxi and gets into the car after auto-unlock
12 | Robo-taxi analyzes the UI that passengers are expected to prefer depending on their destination, arrival time, day of the week, etc.
13 | Based on the analysis, the taxi first recommends a mood template named “Party” to the dashboard, and various templates are presented below it
14 | When Kelvin selects the Party template, the lights on the vehicle change, and the music plays. The dashboard plays live-streaming of the destination club. Joe, Kelvin, and Steve go to the club with excitement!
Acknowledgements This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF2018R1D1A1B07045466).
References
1. Gauging the disruptive power of robo-taxis in autonomous driving | McKinsey (October 4, 2017). https://www.mckinsey.com/industries/automotive-and-assembly/our-insights/gauging-the-disruptive-power-of-robo-taxis-in-autonomous-driving. Accessed 22 Nov 2020
2. Public Opinion Polls Show Skepticism About Autonomous Vehicles by Advocates for Highway and Auto Safety (April 26, 2018). https://saferoads.org/wp-content/uploads/2018/04/AV-Public-Opinion-Polls-4-26-18.pdf. Accessed 22 Nov 2020
3. Lim D, Lee H (2018) A study on technology acceptance of fully autonomous cars for UX design. J Integr Des Res 17(4):19–28
4. Newcomb D (2018) Will cute cars make you less scared of autonomous tech? | PCMag (May 1, 2018). https://www.pcmag.com/opinions/will-cute-cars-make-you-less-scared-of-autonomous-tech. Accessed 22 Nov 2020
5. Lee M, Lee Y (2020) UI proposal for shared autonomous vehicles: focusing on improving user's trust. In: Krömker H (ed) HCI in mobility, transport, and automotive systems. Driving behavior, urban and smart mobility. Springer International Publishing, pp 282–296. https://doi.org/10.1007/978-3-030-50537-0_21
6. Lim D, Lee JH, Han SY, Jung YH (2019) Deriving insights to design UX of fully autonomous vehicles from contextual user interviews on T5 POD. In: Korean society of design science conference proceeding, vol 5, pp 300–305
7. Brooke J (1996) SUS: a “quick and dirty” usability scale. Usability evaluation in industry (June 11, 1996)
8. Moule J (2012) Killer UX design: create user experiences to wow your visitors, 1st edn. SitePoint, Australia
9. Nudelman G (2014) The $1 prototype: lean mobile UX design and rapid innovation for material design, iOS8, and RWD, 1.2 edn. Design Caffeine Press, San Francisco, California
10. Rough Prototyping | Service Design Tools. https://servicedesigntools.org/tools/rough-prototyping. Accessed 22 Nov 2020
11. Hall E (2013) Just enough research. A Book Apart, New York
12. Dam RF, Siang TY (2020) Prototyping: learn eight common methods and best practices. The Interaction Design Foundation. https://www.interaction-design.org/literature/article/prototyping-learn-eight-common-methods-and-best-practices. Accessed 22 Nov 2020
13. Sanders EB-N, Stappers PJ (2008) Co-creation and the new landscapes of design. CoDesign 4(1):5–18. https://doi.org/10.1080/15710880701875068
Iterative Generation of Chow Parameters Using Nearest Neighbor Relations in Threshold Network
Naohiro Ishii, Kazuya Odagiri, and Tokuro Matsuo
Abstract Intelligent functions and learning are important issues in many application fields. Recently, these technologies have been extensively studied and developed using threshold neural networks. The nearest neighbor relations are proposed as the basis for the generation of functions and learning. First, these relations are shown to carry the minimal information for discrimination and to be the basis of the inherited information for threshold functions. Second, for the Chow parameter problems, we developed fundamental schemes of the nearest neighbor relations and analyzed them for the Chow parameters. A sequential generation of the Chow parameters is proposed, which is driven by small changes of the connecting weights in threshold neurons.
Keywords Nearest neighbor relation · Sequential generation of Chow parameters · Boundary vertex · Minimal information of nearest neighbor relation
N. Ishii (B) · T. Matsuo
Advanced Institute of Industrial Technology, Tokyo 140-0011, Japan
K. Odagiri
Sugiyama Jyogakuen University, Nagoya 464-8662, Japan
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_31

1 Introduction
Neural networks are the current state-of-the-art technology for developing AI and machine learning. A network is composed of many neurons, and its intelligent and active functions are based on those of the individual neurons. As an application of threshold functions to learning theory, the Chow parameters problem has been studied extensively using complexity theory [1–3]. The Chow parameter problem is stated as follows: given the Chow parameters [4] of a threshold function, output a representation in weights and threshold that realizes the function. These sophisticated analyses do not necessarily provide constructive procedures or algorithms for obtaining the final solutions, so concrete procedures are needed. For the Chow parameter problems, we developed fundamental schemes of the nearest neighbor relations [5–8] and analyzed them for application to threshold networks. The nearest neighbor relation (NNR) consists of a pair of adjacent vertices, one true and one false, separated by the hyperplane. Through the nearest neighbor relations (NNRs), the sequential generation and learning of the Chow parameters can be analyzed. First, we show the basic characteristics of the NNRs in threshold functions, which include the fundamental properties studied so far. Next, we show that the NNR carries the minimal information for generating a threshold function. Third, conditions on the NNRs are analyzed for the generation of threshold functions and Chow parameters [1, 4]. Finally, the Chow parameters are obtained sequentially, which gives a solvable procedure for the Chow parameter problem through the NNRs. The NNR is introduced in Sect. 2, where the NNRs are also shown to play an important role as the inherited information of the function and as the basis for generating threshold functions. The boundary vertices for the Chow parameters are shown in Sect. 3, and the sequential generation of the parameters is described in Sect. 4.
2 Nearest Neighbor Relations in Threshold Function
The NNR is applicable to the generation of threshold functions. A threshold function is a Boolean function on the n-dimensional cube with $2^n$ vertices, which is divided by a hyperplane. The threshold function $f$ is characterized by the hyperplane $WX - \theta = 0$ with the weight vector $W = (w_1, w_2, \ldots, w_n)$ and the threshold $\theta$, where $X$ is a vertex of the cube $2^n$. The threshold function is assumed here to be a positive and canonical function, in which the Boolean variables satisfy the partial order [9].

Definition 1 The NNR $(X_i, X_j)$ in the threshold function $f$ is defined as the pair of vertices satisfying Eq. (1),

$$\{(X_i, X_j) : f(X_i) \neq f(X_j) \wedge |X_i - X_j| \le \delta\ (= 1)\}, \qquad (1)$$

where $\delta = 1$ means a one-bit difference between $X_i$ and $X_j$ in the Hamming distance (also in the Euclidean distance).

Definition 2 The boundary vertex $X$ is defined as the vertex that satisfies Eq. (2),

$$|WX - \theta| \le |WY - \theta| \quad \text{for } X\ (\neq Y \in 2^n). \qquad (2)$$
Fig. 1 Boundary vertices and NNRs
Theorem 1 The boundary vertex $X$ becomes an element of an NNR in the threshold function.

Theorem 2 The vertices $X_i$ and $X_j$ in the NNR $(X_i, X_j)$ are adjacent vectors that belong to different classes separated by the hyperplane.

This follows directly from Definition 1, since the vertices $X_i$ and $X_j$ are the nearest to the hyperplane, which divides the true and the false data (Fig. 1).
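As an illustration of Definitions 1 and 2, the following minimal Python sketch enumerates the NNRs and the boundary vertices of a small threshold function by brute force. The weights (1, 2, 1) and the threshold 2.5 are assumed values chosen for this example and are not taken from the paper; the helper names are likewise hypothetical. An NNR is taken here to be a pair of adjacent vertices, one true and one false, as stated in Theorem 2.

```python
from itertools import product

def threshold_fn(weights, theta):
    """Return f with f(x) = 1 if W.x - theta >= 0, else 0."""
    def f(x):
        return 1 if sum(w * xi for w, xi in zip(weights, x)) - theta >= 0 else 0
    return f

def nearest_neighbor_relations(f, n):
    """All pairs (true vertex, false vertex) at Hamming distance 1."""
    nnrs = []
    for x in product((0, 1), repeat=n):
        if f(x) != 1:
            continue
        for i in range(n):
            y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip component i
            if f(y) == 0:
                nnrs.append((x, y))
    return nnrs

def boundary_vertices(weights, theta, n):
    """Vertices with the smallest |W.x - theta| over the whole cube."""
    dist = {x: abs(sum(w * xi for w, xi in zip(weights, x)) - theta)
            for x in product((0, 1), repeat=n)}
    d_min = min(dist.values())
    return sorted(x for x, d in dist.items() if abs(d - d_min) < 1e-12)

# Assumed example values (not from the paper); x = (x1, x2, x3).
W, THETA = (1, 2, 1), 2.5
f = threshold_fn(W, THETA)
nnrs = nearest_neighbor_relations(f, 3)
bvs = boundary_vertices(W, THETA, 3)
```

With these assumed values, the sketch finds five NNRs and four boundary vertices, and every boundary vertex appears in at least one NNR, in line with Theorem 1.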
2.1 Discrimination Using Nearest Neighbor Relations
The discrimination between true vertices and false ones in a threshold function is developed using nearest neighbor relations. Figure 2 shows a three-dimensional cube for the discrimination between true vertices (●, black circles) and false ones (○, white circles). We consider the difference between the true vertex $X_7$ (111) and the false one $X_0$ (000), $X_7 - X_0 = \{(111) - (000)\}$. The difference is $\{(111)\}$, which implies that the $x_1$, $x_2$, and $x_3$ components are all 1. In Fig. 2, four nearest neighbor relations are obtained, indicated by the arrows drawn as thick solid lines: {(111), (011)}, {(101), (001)}, {(101), (100)}, and {(110), (100)}. When the two vertices of a nearest neighbor relation differ in a component, as in {(111), (011)}, the difference is written as $\{(111) - (011)\}_{(x_1 - x_1') = 1}$, where $x_1$ and $x_1'$ denote the first components of the two vertices.

Fig. 2 Pathways between true vertex (111) and false one (000). Cube vertices: $X_0$ (000), $X_1$ (001), $X_2$ (010), $X_3$ (011), $X_4$ (100), $X_5$ (101), $X_6$ (110), $X_7$ (111)

The difference between the true vertex and the false one can be followed along the sequence (111) → (110) → (010) → (000), drawn as the dotted line in Fig. 2. The difference information between the true vertex (111) and the false one (000) is derived along this sequence as follows. First, the difference of the $x_1$ component between (111) and (000) is expanded as

$$\{(111) - (000)\}_{(x_1 - x_1') = 1} = \{(111) - (110)\}_{(x_1 - x_1') = 0} + \{(110) - (010)\}_{(x_1 - x_1') = 1} + \{(010) - (000)\}_{(x_1 - x_1') = 0}. \qquad (3)$$

Equation (3) shows the pathway with three bracketed steps, indicated by the dotted arrows from (111) to (000). Similarly, the difference of the $x_3$ component from (111) to (000) is equal to the NNR {(011), (001)}. The discriminative information is thus represented by the NNRs.

Theorem 3 The difference information between true vertices and false ones in a threshold function is inherited from that of the NNRs.
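The expansion in Eq. (3) is a telescoping sum: the difference of one component between the end points of a path equals the sum of the differences of that component over the single steps. The short Python sketch below checks this for the path (111) → (110) → (010) → (000); the function name is hypothetical and chosen only for this illustration.

```python
def component_differences(path, i):
    """Difference of component i between consecutive vertices of a path."""
    return [a[i] - b[i] for a, b in zip(path, path[1:])]

# Path from the true vertex (111) to the false vertex (000), as in Eq. (3).
path = [(1, 1, 1), (1, 1, 0), (0, 1, 0), (0, 0, 0)]

steps = component_differences(path, 0)   # index 0 is the x1 component
total = path[0][0] - path[-1][0]         # x1 difference between the end points
```

Only the middle step, (110) → (010), changes $x_1$, so the three step differences are 0, 1, and 0, and they add up to the end-to-end difference of 1.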
2.2 Boolean Operations for the NNRs
The threshold function is generated using the NNRs. As an example, the three-dimensional vertices of the cube are shown in Fig. 2. The true vertices (011), (110), and (111) are indicated by black circles (●) and belong to the +1 class. The false vertices (000), (010), (100), (001), and (101) are indicated by white circles (○) and belong to the 0 class. The generation of threshold functions through the NNRs is performed by Boolean operations. As an example, five directed arrow vectors are introduced here for the NNRs. In Fig. 2, the true vertex (011) has the NNR {(011), (010)}, and the true vertex (110) has the NNR {(110), (100)}. The directed arrow vector points from the true vertex to the false one of the NNR, as in {(011), (010)}. Two directed arrow vectors, {(011), (001)} and {(011), (010)}, generate one plane by the Boolean operation AND in Fig. 3.

Fig. 3 Boolean operations for the NNRs (directed arrows from the true vertices (011), (110), and (111) to the false vertices (010), (001), (100), and (101), combined by AND)

The directed arrow vector generates the variable that differs between the true vertex and the false one. Thus, {(011), (010)} generates the $x_3$ variable, while {(011), (001)} generates the $x_2$ variable. By the AND operation of $x_2$ and $x_3$, the Boolean product $x_2 \cdot x_3$ is generated. Similarly, {(110), (100)} and {(110), (010)} generate another plane, shown in Fig. 3, from which the Boolean product $x_1 \cdot x_2$ is generated. Since these two planes are perpendicular, the Boolean operation OR is used to connect them. The remaining directed arrow vector in Fig. 3 is {(111), (101)}, which is included in either plane as a vector. Thus, by the OR operation on the two perpendicular planes, the threshold function $x_2 \cdot x_3 + x_1 \cdot x_2$ is obtained, and the function is represented by Eq. (4).

$$f = x_1 \cdot x_2 + x_2 \cdot x_3 \qquad (4)$$
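The construction of Eq. (4) can be checked with a small Python sketch. It is one possible reading of the procedure, not the authors' algorithm: assuming a positive function, a true vertex contributes a product term only when every one of its 1-components has a directed arrow (an NNR) to a false vertex, and the terms are then combined by OR. The function names are hypothetical.

```python
from itertools import product

# True vertices of the example in Sect. 2.2, written as (x1, x2, x3).
TRUE_VERTICES = {(0, 1, 1), (1, 1, 0), (1, 1, 1)}

def f(x):
    return 1 if x in TRUE_VERTICES else 0

def flip(x, i):
    """Vertex obtained from x by inverting component i."""
    return x[:i] + (1 - x[i],) + x[i + 1:]

def product_terms(n):
    """Product terms read from the NNR arrows (assumes a positive function).
    A term is a tuple of 1-based variable indices combined by AND."""
    terms = []
    for x in sorted(TRUE_VERTICES):
        ones = [i for i in range(n) if x[i] == 1]
        arrows = [i for i in ones if f(flip(x, i)) == 0]  # arrows true -> false
        if arrows == ones:  # every 1-component has an arrow
            terms.append(tuple(i + 1 for i in ones))
    return terms

def evaluate(terms, x):
    """OR over the terms of the AND of their variables."""
    return 1 if any(all(x[i - 1] == 1 for i in t) for t in terms) else 0

terms = product_terms(3)
```

For this example the sketch returns the terms $x_2 \cdot x_3$ and $x_1 \cdot x_2$; the vertex (111) contributes no term because only its $x_2$ component has an arrow, which matches the remark that the arrow {(111), (101)} adds no new plane.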
3 Boundary Vertices for Generation of Chow Parameters
3.1 Characterization of Boundary Vertices
The condition for the linear separability of a Boolean function is stated in the following theorem, derived from the linear inequality conditions [9, 10].

Theorem 4 To realize the linear separability of a Boolean function with $n$ weights and a threshold, i.e., $(n + 1)$ variables, there exist $(n + 1)$ independent inequalities among the inequalities for the $2^n$ input vertices.

Since these $(n + 1)$ independent inequalities can be replaced by equalities with the value $+\varepsilon$ or $-\varepsilon$, the $(n + 1)$ vertices corresponding to these equations become boundary vertices of the threshold function. Applying Theorem 4 to the nearest neighbor relations in threshold functions, the following theorems are derived.

Theorem 5 There exist at least $(n + 1)$ nearest neighbor relations in the threshold function with $n$ input variables and a threshold. Further, at least one boundary vertex is included in each NNR.

Theorem 6 Assume that $X$ is a boundary vertex of the threshold function $f$ with $n$ variables and that the following Eq. (5) holds:

$$g(X) = 1 - f(X), \quad g(Y) = f(Y) \ \text{for } Y \neq X \text{ and } X, Y \in 2^n. \qquad (5)$$

Then, the function $g$ becomes a threshold function. Further, when the functions $f$ and $g$ satisfy Eq. (5), $X$ becomes a boundary vertex for both $f$ and $g$.

Theorem 6 is proved as follows. Assume that $X$ is a true vertex of $f$, i.e., $f(X) = 1$. Assume that $X$ has $m\ (\ge 1)$ components equal to 1 and that $a_{j_i}$, $i = 1 \sim m$, are the components of the weight vector $A$ corresponding to these $m$ components. Replace the weights $\{a_{j_i}\}$ of the $m$ components by the weights $\{a'_{j_i}\}$, where $a'_{j_i} = a_{j_i} - \delta$, $i = 1 \sim m$, and $\delta > 0$. The weights of the other components are kept at the same values. The changed weights and the threshold are denoted by $(A', \theta)$. The weight change $\delta$ is computed from Eqs. (6), (7), (8), and (9). For $X$, Eq. (6) holds,

$$A'X - \theta = a_{j_1} + a_{j_2} + \cdots + a_{j_m} - \theta - m\delta = \varepsilon - m\delta < 0 \qquad (6)$$

and

$$\varepsilon = |AX - \theta| = (a_{j_1} + a_{j_2} + \cdots + a_{j_m}) - \theta < |AY - \theta|. \qquad (7)$$

Among the components $j_i$, $i = 1 \sim m$, assume that the true input vertex $Y$ has at most $(m - 1)$ components equal to 1. Then, for $Y$, the following equation holds:

$$A'Y - \theta \ge AY - \theta - (m - 1)\delta > \varepsilon - (m - 1)\delta > 0. \qquad (8)$$

Then, the $\delta$ satisfying both Eqs. (6) and (8) is derived as

$$0 < \frac{\varepsilon}{m} < \delta < \frac{\varepsilon}{m - 1}, \qquad (9)$$

where $m \ge 2$ is assumed. Next, consider the true vertex $Y\ (\neq X)$ that has all the components $j_i$, $i = 1 \sim m$, equal to 1 and further components $j_k$ equal to 1. Since $f$ is assumed to be a positive function, $a_j \ge 2\varepsilon$ holds for the weight $a_j$ of any $j$. From Eq. (9), $0 < \delta < \varepsilon$ holds. Thus, using Eq. (8),

$$A'Y - \theta = \varepsilon - m\delta + \sum_{k} a_{j_k} > \varepsilon - (m - 2)\delta > 0. \qquad (10)$$

Thus, for $m \ge 2$ and all the true vertices $\{Y\}$, Eq. (8) is satisfied. For all the false vertices, Eq. (11) holds:

$$A'Y - \theta \le AY - \theta < -\varepsilon. \qquad (11)$$

In the case of $m = 1$, from Eq. (6),

$$A'X - \theta = a_{j_1} - \theta - \delta = \varepsilon - \delta < 0. \qquad (12)$$

For the true vertex $Y$ that has the 1-component of $X$ and further components $j_k$ equal to 1,

$$A'Y - \theta = \varepsilon - \delta + \sum_{k} a_{j_k} > 3\varepsilon - \delta > 0. \qquad (13)$$

From Eqs. (12) and (13), $\varepsilon < \delta < 3\varepsilon$ is derived.
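The weight change used in the proof can be tried numerically. The following Python sketch is an illustration under assumed values (weights (1, 2, 1), threshold 2.5, boundary vertex (011)), none of which come from the paper: it lowers by $\delta$ the weights of the components in which the true boundary vertex $X$ is 1, with $\delta$ chosen between $\varepsilon/m$ and $\varepsilon/(m-1)$, and lists the vertices whose output changes.

```python
from itertools import product

def output(weights, theta, x):
    """Threshold function value: 1 if A.x - theta >= 0, else 0."""
    return 1 if sum(a * xi for a, xi in zip(weights, x)) - theta >= 0 else 0

def lower_weights(weights, x, delta):
    """Subtract delta from every weight whose component of x is 1 (a' = a - delta)."""
    return tuple(a - delta if xi == 1 else a for a, xi in zip(weights, x))

# Assumed example values (not from the paper).
A, THETA = (1.0, 2.0, 1.0), 2.5
X = (0, 1, 1)                                      # a true boundary vertex of f
m = sum(X)                                         # number of 1-components, here 2
eps = sum(a * xi for a, xi in zip(A, X)) - THETA   # margin of X, here 0.5
delta = 0.3                                        # chosen inside (eps/m, eps/(m-1))

A_new = lower_weights(A, X, delta)
changed = [y for y in product((0, 1), repeat=3)
           if output(A, THETA, y) != output(A_new, THETA, y)]
```

In this example only the vertex (011) changes its output, so the function realized by the new weights differs from $f$ exactly at $X$, as Eq. (5) requires of $g$.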
3.2 Boundary Vertices in the NNRs
The boundary vertices in the NNRs play an important role in the generation and learning of Chow parameters [4]. The boundary vertex in the NNR is characterized using 2-assumability [9] as follows.

Theorem 7 In the threshold function $f$, the necessary and sufficient condition for the true (false) vertex $X$ of an NNR to be a boundary vertex is that the 2-assumable condition holds after $X$ is changed to a false (true) vertex. Here, the first parenthesized word (false) corresponds to the second one (true).

Theorem 8 In the threshold function $f$, any NNR has at least one boundary vertex, i.e., the true vertex or the false vertex becomes a boundary vertex, or both of them become boundary vertices.

This is shown by a simple case without loss of generality. Assume the nearest neighbor relation {(101), (111)} shown by the directed arrow in Fig. 4, in which (101) is a false vertex and (111) is a true vertex. The NNR is enclosed by the dotted ellipse. The vertex (111) is checked using Theorem 6: the true vertex (111) is changed to a false vertex, indicated by the white circle at the arrow in Fig. 4. Then, the 2-assumability (two crossed diagonals) is not satisfied among the 4 vertices in Fig. 4. Thus, the true vertex (111) does not become a boundary vertex.

Theorem 9 Assume a total of $m$ NNRs specified in the $i$-th components ($i = i_1, i_2, \ldots, i_m$) of the threshold function $f$, where there exists one NNR for each component $i_s$ ($s = 1 \sim m$), and the vertex $Z$ is common to the $m$ NNRs. Then, all the vertices of the $m$ NNRs become boundary vertices.

Fig. 4 Boundary vertex in the NNR. Cube vertices: $X_0$ (000), $X_1$ (001), $X_2$ (010), $X_3$ (011), $X_4$ (100), $X_5$ (101), $X_6$ (110), $X_7$ (111)

This is proved as follows. Assume that the common vertex $Z$ is a true one. If $Z$ is not a boundary vertex, there exist true vertices $Y$ that satisfy $Z > Y$ by 2-assumability. As an example, consider the case $Z = X_7$ (111) in Fig. 4. However, $Y = X_3$ (011) and $X_6$ (110) are false vertices forming nearest neighbor relations with $Z$, which contradicts the assumption that the $Y$ are true vertices.
4 Iterative Generation of the Chow Parameters
In Fig. 5, the Chow parameters with 5 variables are generated sequentially based on the NNRs of threshold functions. The Chow parameter consists of the 6-tuple $[m : s_1, s_2, s_3, s_4, s_5]$, in which $m$ is the number of true vertices, while $s_1, s_2, \ldots, s_5$ are the numbers of true vertices whose first, second, ..., fifth component is 1, respectively [4]. Starting from the Chow parameter $[m : s_1, s_2, s_3, s_4, s_5] = [6:11111]$, the new Chow parameter [7:22111] is generated by changing the false boundary vertex (11000) to a true vertex using the NNR {(10000), (11000)}. In Fig. 5, the Chow parameters are generated sequentially by small changes of the weights. The relations between the Chow parameters are given in Theorem 10. In Fig. 5, assume that the Chow parameter at $T$ is denoted by $\mathrm{Chow}_T$ and that at $S$ by $\mathrm{Chow}_S$, where $T$ is generated $m$ steps after $S$. Also, let $f_T$ and $f_S$ be their respective threshold functions.
Fig. 5 Chow parameters with 5 variables are generated sequentially through NNRs. Parenthesis indicates false vertex to generate the next Chow parameter
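The definition of the Chow parameter and the step from [6:11111] to [7:22111] can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical function names; the 3-variable true vertices are those of the example in Sect. 2.2, and the 5-variable step uses only the numbers quoted in the text.

```python
def chow_parameters(true_vertices):
    """Return (m, [s1, ..., sn]): the number of true vertices and, for each
    component, the number of true vertices in which that component is 1."""
    vs = list(true_vertices)
    n = len(vs[0])
    return len(vs), [sum(v[i] for v in vs) for i in range(n)]

def add_vertex(chow, vertex):
    """Chow parameters after one false vertex is changed to a true vertex."""
    m, s = chow
    return m + 1, [si + vi for si, vi in zip(s, vertex)]

# 3-variable example of Sect. 2.2: true vertices (011), (110), (111).
true3 = [(0, 1, 1), (1, 1, 0), (1, 1, 1)]
chow3 = chow_parameters(true3)

# 5-variable step from the text: [6:11111] plus the vertex (11000).
step = add_vertex((6, [1, 1, 1, 1, 1]), (1, 1, 0, 0, 0))
```

Changing one false vertex to true raises $m$ by one and adds the vertex, component by component, to $s_1, \ldots, s_n$, which is why the step in the text yields [7:22111].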
Theorem 10 Between the Chow parameters at $T$ and at $S$, the following equation holds:

$$\mathrm{Chow}_T = \mathrm{Chow}_S + \sum_{i=1}^{m} \{\text{false vertex}_i \text{ in the NNR}\}, \qquad (14)$$

where the first component of $\mathrm{Chow}_T$ becomes

$$(\mathrm{Chow}_T)_1 = (\mathrm{Chow}_S)_1 + m. \qquad (15)$$

As an example, in Fig. 5, consider the case where the Chow parameter $T$ is 9:33311 and $S$ is 6:11111. Then, the false vertices of the NNRs between them are given by $m = 3$ and {(11000), (10100), (01100)}. From the Chow parameter $S$ (6:11111), by Eq. (14), 6 + 3: (11111) + (22200) = 9:33311 is obtained.

Corollary 11 Between the threshold functions $f_T$ and $f_S$, Eq. (16) holds:

$$f_T = f_S + \text{Boolean sum} \sum_{i=1}^{m} \{\text{false vertex}_i \text{ in the NNR of the respective step}\}. \qquad (16)$$

As a solvable procedure for the Chow parameter problem, the sequential generation of the Chow parameters is applicable. When the Chow parameter $T$ with $m$ variables is given, we start from the function with the simplest Chow parameter, $S_0$, with $m$ variables at the 0-th step, as shown in Fig. 6. At the 1st step, the Chow parameter $S_{1k}$ is generated from $S_0$ using the boundary vertex in the NNR selected by minimizing the difference between the Chow parameter $T_j$ and the candidate $S_{jk}$ in Fig. 6. Similarly, at the 2nd step, the Chow parameter $S_{2k}$ is generated from $S_{1k}$ by the same minimization. By iterating the steps in Fig. 6, the given Chow parameter $T$ is obtained. The Boolean function of $T$ is obtained from Eq. (16), which generates the weights and threshold [9].

Fig. 6 Selection of the Chow parameter at the j-th step
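The iteration can be sketched in Python as a greedy search. This is only one possible reading of the procedure and rests on several assumptions that are not stated in the paper: the start set is taken to be the zero vertex plus the five unit vertices (one set with Chow parameter [6:11111]); a candidate is a false vertex all of whose lower neighbours (one 1 changed to 0) are true, so that it forms NNRs with them; the "minimization" is taken to be the L1 distance to the target; and the sketch does not test whether each intermediate function is a threshold function. All function names are hypothetical.

```python
from itertools import product

def chow(true_set, n):
    """[m, s1, ..., sn] for a set of true vertices."""
    return [len(true_set)] + [sum(v[i] for v in true_set) for i in range(n)]

def candidates(true_set, n):
    """False vertices whose lower neighbours (one 1 changed to 0) are all true.
    Each such vertex forms an NNR with every one of its lower neighbours."""
    found = []
    for v in product((0, 1), repeat=n):
        if v in true_set:
            continue
        lower = [v[:i] + (0,) + v[i + 1:] for i in range(n) if v[i] == 1]
        if lower and all(u in true_set for u in lower):
            found.append(v)
    return found

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def generate(start_set, target, n):
    """Greedily change one false vertex to true per step until the target is met."""
    true_set = set(start_set)
    added, history = [], [chow(true_set, n)]
    while history[-1] != target:
        cands = candidates(true_set, n)
        if history[-1][0] >= target[0] or not cands:
            raise ValueError("target not reached by this greedy sketch")
        best = min(cands, key=lambda v: l1(target, chow(true_set | {v}, n)))
        true_set.add(best)
        added.append(best)
        history.append(chow(true_set, n))
    return added, history

# Assumed start set with Chow parameter [6:11111]: zero vertex and unit vertices.
N = 5
start = {(0,) * N} | {tuple(1 if j == i else 0 for j in range(N)) for i in range(N)}
added, history = generate(start, [9, 3, 3, 3, 1, 1], N)
```

Under these assumptions the search reaches 9:33311 from 6:11111 in three steps and adds exactly the vertices (11000), (10100), and (01100) named in the example of Theorem 10, although possibly in a different order than in Fig. 5.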
5 Conclusion
For the Chow parameter problems, we developed fundamental schemes of the nearest neighbor relations and analyzed them for application to threshold networks. The NNR consists of a pair of adjacent vertices, one true and one false. Through the NNRs in the threshold function, the iterative generation and learning of the Chow parameters can be analyzed. In this paper, we first showed the basic characteristics of the NNRs in threshold functions, which unify the fundamental properties studied so far. Second, we showed a solvable procedure for the iterative generation of the Chow parameters.
References
1. O'Donnell R, Servedio R (2011) The Chow parameters problem. SIAM J Comput 40(1):165–199
2. De A, Diakonikolas I, Feldman V, Servedio RA (2014) Nearly optimal solutions for the Chow parameters problem and low-weight approximation of halfspaces. J ACM 61(2):11.1–11.36
3. Diakonikolas I, Kane DM (2018) Degree-d Chow parameters robustly determine degree-d PTFs (and algorithmic applications). Electronic Colloquium on Computational Complexity 25(189):1–29
4. Chow CK (1961) On the characterization of threshold functions. In: Proceedings of the symposium on switching circuit theory and logical design (FOCS), pp 34–38
5. Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
6. Ishii N, Torii I, Iwata K, Odagiri K, Nakashima T (2017) Generation and nonlinear mapping of reducts-nearest neighbor classification. In: Chapter 5 in advances in combining intelligent methods. Springer, pp 93–108
7. Ishii N, Torii I, Mukai N, Nakashima T (2017) Generation of reducts and threshold function using discernibility and indiscernibility matrices. In: Proceedings of ACIS-SERA IEEE computer society, pp 55–61
8. Ishii N, Torii I, Iwata K, Odagiri K, Nakashima T, Matsuo T (2019) Incremental reducts based on nearest neighbor relations and linear classification. In: Proceedings of IIAI-SCAI IEEE computer society, pp 528–533
9. Hu ST (1965) Threshold logic. University of California Press
10. Fan K (1966) On systems of linear inequalities, linear inequalities and related systems. In: Kuhn HW, Tucker AW (eds). Princeton University Press, pp 99–156
Effective Feature Selection Using Ensemble Techniques and Genetic Algorithm
Jayshree Ghorpade-Aher and Balwant Sonkamble
Abstract Individual feature selection algorithms used for processing high-dimensional, multi-source heterogeneous data may lead to weak predictions. A traditional single-method process may not ensure the selection of relevant features. The selected features are susceptible to changes in the input data, and thus the method fails to perform consistently. These challenges can be overcome by a robust feature selection algorithm that generates a subset of the original features and evaluates the candidate set to check its relevance. It also determines the feasibility of the selected subset of features. Selecting a feature subset minimizes the complexity of the model and facilitates its further processing. The limitations of a single feature selection technique can be reduced by combining multiple techniques to generate effective features. There is a need to design efficient approaches and techniques for estimating feature relevance. The ensemble approach helps to introduce diversity at the level of the input data as well as of the computational technique. The proposed method, Ensemble Bootstrap Genetic Algorithm (EnBGA), generates an effective feature subset for multi-source heterogeneous data. Various univariate and multivariate base selectors are combined to ensure the robustness and stability of the algorithm. In the COVID-19 pandemic, it has been observed that patients already diagnosed with diseases such as diabetes had an increased mortality rate. The proposed method performs feature analysis for such data, where the Genetic Algorithm searches the feature subsets and extracts the most relevant features.
Keywords Feature selection · Genetic algorithm · Ensemble · Machine learning · Heterogeneous data · Bootstrap
J. Ghorpade-Aher (B) P.I.C.T, MIT World Peace University, Pune, India B. Sonkamble Pune Institute of Computer Technology, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_32
1 Introduction
Existing data processing methods struggle with heterogeneous data because of limits on computational power, statistical accuracy, and algorithmic stability [1]. To handle these problems, feature selection techniques should maximize the relevance of the features to the target variable. Feature selection consists of selecting the most relevant features and improving the performance of the model with the selected feature subset. Data-oriented architectures pose various challenges [2] for processing the data and extracting the significant features. Combining multiple base selector techniques yields better features than individual traditional feature selection techniques. Ensemble learning can improve feature selection and make the model more robust by introducing variations in the input data. Removing redundant features avoids incorrect decisions based on noisy data [3] and thus reduces overfitting. Removing constant, quasi-constant, or low-variance features also contributes to dimensionality reduction, and the shorter training time speeds up modeling. The relevant feature subset is deduced using various univariate and multivariate feature selection techniques [4]. Optimization is a further important aspect of algorithmic calculations: minimizing or maximizing a function, depending on the problem statement, can provide optimal results that fine-tune the decision-making strategy. Biologically inspired algorithms such as the Genetic Algorithm (GA) and Swarm Intelligence (SI) are widely used for extracting optimal features. Univariate techniques such as Chi-square, Information Gain, and Receiver Operating Characteristics (ROC) scores, and multivariate algorithms such as Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Gradient Boosting, are used to compute the scores.
Univariate techniques analyze a single variable at a time, whereas multivariate techniques study the correlation among multiple variables. The Genetic Algorithm (GA) [5] follows biological evolution with stochastic behavior and searches for the optimal solution of a given fitness function. It applies Darwin’s principle of ‘Survival of the Fittest’, in which successive generations adapt to the environment. The Genetic Algorithm processes a population of individuals and thus generates progressively better approximations of the expected outcome. The proposed method, Ensemble Bootstrap Genetic Algorithm (EnBGA), first selects the significant top ‘k’ features and then extracts the most effective features from them. Diabetes Mellitus (DM) is increasing continuously and has become one of the most prevalent diseases. The mortality rate has increased during the COVID-19 pandemic: patients already diagnosed with a disease such as diabetes had a death rate of more than 7%, as reported by the Centers for Disease Control and Prevention [6]. The experimental analysis of multi-source heterogeneous data is performed using the proposed algorithm for Type-2 Diabetes Mellitus (T2DM). Effective diagnosis provides meaningful insights to domain experts for further research.
The literature review is discussed in Sect. 2. Section 3 depicts the research study of the proposed method and Sect. 4 states the experimental discussions. Finally, the concluding remarks are stated in the last section.
2 Literature Review
Most Machine Learning applications that involve multi-source data with various input features exhibit complex relationships. Each application, with the datum as its smallest unit, has its own set of features, characteristics, or traits [1, 4]. The expected results can be obtained by selecting a feasible feature subset with proper decision-making, which is still an open challenge when constructing compact predictive models [7]. As it is difficult to control the size of the input data, a subset of unique features must be considered to improve the performance of the model and minimize resource utilization. These challenges can be overcome with a robust feature selection algorithm that generates a subset of the original features and evaluates the candidate set for relevance. Feature engineering [8] is an important task because it identifies the relevant traits for an application. Each single learning technique has limitations and cannot perform consistently across varied input data [3]. An ensemble technique collectively considers the output of different base selectors [9] and tries to make the model generalize better to new data. The literature study revealed that each learning technique uses its own strategy for calculating feature selection accuracy. These techniques can be integrated with arithmetic mean or geometric mean functions. Univariate techniques [10] such as Chi-square (Chi2) calculate the significance value (p-value), which gives the probability under the null hypothesis. It is observed that minimum Chi2_Score(1pvalue) will yield best features. Entropy is a measure of uncertainty, impurity, or noise. The Information Gain (IG) increases as entropy is reduced. A higher value of IG indicates that an attribute is more effective in classifying the training data.
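The relationship between entropy and Information Gain described above can be illustrated with a minimal sketch. It assumes categorical features and a class label stored in plain Python lists; the function names `entropy` and `information_gain` and the toy data are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): one feature separates the classes, the other does not.
y = [1, 1, 0, 0]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(information_gain(informative, y), information_gain(uninformative, y))
```

A feature that separates the classes completely removes all label entropy, so its gain equals the entropy of the labels, while a feature that is independent of the label leaves the entropy unchanged and has zero gain.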
Feature selection techniques help to maximize the classification accuracy of a classifier while reducing computational complexity. The ROC (Receiver Operating Characteristics) method examines each variable ‘x’ or feature ‘F’ against the target variable ‘y’. The significant features are identified from the significance value and the weight scores of the features. A boosting algorithm is a multivariate technique that turns weak learners into strong learners by adjusting their weights along the way. The Extreme Gradient Boosting (XGBoost) algorithm [11] implements ensembles of decision trees. The relative importance of an attribute increases the more it is used in constructing key decisions. XGBoost is a regularized form of the Gradient Boosting Machine with L1 (Lasso) and L2 (Ridge) regularization, which helps prevent the model from overfitting. Random Forest (RF) derives the importance of each variable from the tree decisions to provide good predictive performance. In Random Forests, the
impurity decrease from each feature can be averaged across trees to determine the final importance of the variable. The Genetic Algorithm is a technique that evolves better solutions over time by adapting to the environment. The fitness function acts as the optimization benchmark. Model performance with the best features is measured by functions such as accuracy score or root mean square error. Fitness values [7] with larger scores are considered better. The next generation is produced by randomly combining the individuals with the best fitness scores through crossover and mutation. In the selected feature subset, each feature is encoded as either included (TRUE or ‘1’) or not included (FALSE or ‘0’). The Genetic Algorithm [8] is thus reported to perform better than traditional feature selection algorithms. It can handle datasets with many features, although it requires more computation to converge.
3 Proposed Ensemble Bootstrap Genetic Algorithm (EnBGA)
The motivation of this research is to study and analyze feature selection algorithms in order to generate the features that maximize the outcome of the proposed model. Figure 1 depicts the proposed feature selection technique, which is designed to perform data processing [12] to obtain the optimal features. The proposed EnBGA technique for selecting the relevant features is as follows:
3.1 Data Preprocessing
Data preprocessing is a crucial step: it handles missing data values by using an imputer method, removing ‘null’ or ‘nan’ values, or substituting the mean, median, or mode of the attribute. It also deals with multiple observations of the same entry, thus tackling data repetition along with quasi-constant features.
Fig. 1 Proposed feature selection technique
Statistical techniques such as standard deviation, measures of central tendency, and interquartile ranges help to explore, analyze, and understand the data and its features.
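The preprocessing steps named in this subsection (imputation, removal of repeated observations, and detection of quasi-constant features) can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names and the 0.95 threshold are assumptions chosen for the example, not values taken from the paper.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def is_quasi_constant(column, threshold=0.95):
    """True when a single value accounts for at least `threshold` of the rows."""
    most_common = max(column.count(v) for v in set(column))
    return most_common / len(column) >= threshold


def drop_duplicates(rows):
    """Keep the first occurrence of each row (rows are tuples), preserving order."""
    seen, unique = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


print(impute_mean([1.0, None, 3.0]))
print(is_quasi_constant([0] * 99 + [1]))
print(drop_duplicates([(1, 2), (1, 2), (3, 4)]))
```

Median or mode imputation would follow the same pattern with `statistics.median` or `statistics.mode` in place of `mean`.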
3.2 Feature Creation
Data transformations are performed on the heterogeneous data to convert categorical data [4] to numerical or ordinal data. Categorical data is encoded by converting it into dummy variables. Feature creation and selection help to extract the relevant feature subset. Various techniques such as variance threshold scores, one-hot binarizer, and count vectorizer are used to find the feature relevance scores. Data standardization and normalization improve the feature selection process.
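As a rough illustration of the transformations mentioned here, the sketch below converts a categorical column into dummy (one-hot) columns and standardizes a numeric column to zero mean and unit variance. The function names are hypothetical, and the paper does not state which library routines were actually used.

```python
from statistics import mean, pstdev


def one_hot(column):
    """Return the sorted categories and one 0/1 indicator row per input value."""
    categories = sorted(set(column))
    return categories, [[1 if v == c else 0 for c in categories] for v in column]


def standardize(column):
    """Scale a numeric column to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


categories, dummies = one_hot(["b", "a", "b"])
print(categories, dummies)
print(standardize([1.0, 2.0, 3.0]))
```

Sorting the categories keeps the column order of the dummy variables stable between runs, which matters when feature scores are later averaged over bootstrap samples.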
3.3 Feature Selection
The various feature selection techniques explore the significance of each feature along with its correlation with the target variable. Filter methods, wrapper methods, and ensemble approaches can be used for feature selection. A wrapper method extracts relevant features by forming a feature subset. Undesired features slow a model down. A filter method ranks individual features by a relevance criterion with respect to the response variable. Among these methods, the ensemble approach [13] yields the optimal features, which result in better predictive performance. Appropriate features help the Machine Learning algorithm to train faster. Features from different sources are used to predict the target variable. The proposed method uses these characteristics of the ensemble approach to generate the optimal features [14]. The main objective of this approach is to generate a more reliable and robust set of features by integrating the results of various base feature selectors. These base selectors can be functionally similar or heterogeneous. The proposed method uses heterogeneous [15] selectors by considering different selection algorithms to exploit the features of the sampled data. Knowledge of how to fine-tune the parameters of the algorithms is also required to select the ‘best’ threshold values. Further, data transformations are performed with various data aggregation techniques [9] to generate the optimal ensemble feature subset. The proposed method, Ensemble Bootstrap Genetic Algorithm (EnBGA), works in two phases, as discussed below.
Selection of the significant top ‘k’ features: The first module introduces variation at the data level through bootstrap samples and applies ensemble techniques for feature selection using univariate and multivariate algorithms.
The univariate techniques such as Chi-square, Information gain, and ROC are used to identify the
relevant features that have a strong relationship with the output variable. The multivariate techniques such as Extreme Gradient Boosting, Random Forest, and Gradient Boosting are used to obtain the significance of the features. A higher score implies that the feature is more important in the data with respect to the target variable. Let ‘D’ be the input data, given as a set of bootstrap data samples. The reduced feature set ‘F_R’ is a subset of the total features ‘F_N’, and ‘T_En’ represents the combined ensemble techniques. Algorithm 1 processes ‘D’ to generate the top ‘k’ features ‘F_k’.
Step-1: D ← {d_i | i ∈ (1, 2, …, 6)}
Step-2: for each d_i ∈ D
  ◯ d_i ← [input(X1), target(y1)]
  ◯ F_j(s) ← {feature-score for j ∈ F_R where F_R ⊆ F_N}
Step-3: F_ranked ← {descending [F_j(s)]}
Step-4: F_aggregate ← {mean [X_ranked] for t ∈ T_En}
Step-5: F_k ← {F_m | m ∈ (1, 2, …, k)}
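The five steps above can be approximated by the following sketch. It is not the authors' implementation: the two scorers are simple stand-ins for the Chi-square, Information Gain, ROC, XGBoost, Random Forest, and Gradient Boosting selectors, the aggregation by mean rank is an assumption about how the ranked scores are combined, and all names (`top_k_features`, `abs_mean_difference`, `abs_covariance`) are hypothetical.

```python
import random
from statistics import mean


def abs_mean_difference(column, target):
    """Stand-in univariate scorer: gap between the class-conditional means."""
    pos = [x for x, y in zip(column, target) if y == 1]
    neg = [x for x, y in zip(column, target) if y == 0]
    if not pos or not neg:
        return 0.0
    return abs(mean(pos) - mean(neg))


def abs_covariance(column, target):
    """Second stand-in scorer: absolute covariance with the target."""
    mx, my = mean(column), mean(target)
    return abs(mean([(x - mx) * (y - my) for x, y in zip(column, target)]))


def top_k_features(X, y, k, scorers, n_bootstrap=6, seed=0):
    """Return the indices of the k features with the best mean rank."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    rank_sums = [0.0] * n_features
    for _ in range(n_bootstrap):                     # Steps 1-2: bootstrap samples d_i
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        ys = [y[i] for i in idx]
        for scorer in scorers:                       # each technique t in T_En
            scores = [scorer([X[i][j] for i in idx], ys) for j in range(n_features)]
            order = sorted(range(n_features), key=lambda j: -scores[j])  # Step 3
            for rank, j in enumerate(order):
                rank_sums[j] += rank
    runs = n_bootstrap * len(scorers)
    mean_rank = [s / runs for s in rank_sums]        # Step 4: aggregate by mean
    return sorted(range(n_features), key=lambda j: mean_rank[j])[:k]  # Step 5
```

Averaging ranks rather than raw scores avoids mixing score scales across selectors; if the raw scores are normalized first, averaging the scores themselves, as the `*_Mean` rows of the results table suggest, is an equally plausible reading.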
Extracting the optimal best features: Tuning the parameters of the Genetic Algorithm (GA) for feature selection helps to obtain effective results. The GA explores the best features of the datasets and is reported to outperform traditional feature selection techniques. Selecting significant features as the relevant subset is a combinatorial optimization problem [8]. An exhaustive feature selection technique must evaluate 2^N different combinations, where ‘N’ is the number of features, so evaluating data with a large number of features requires a huge amount of computation. Hence, an intelligent technique with a stochastic selection method is needed to select features well in practice. The Genetic Algorithm [10] does not depend on specific knowledge of the problem and is inspired by the theory of natural selection proposed by Charles Darwin. The GA initializes a population and evaluates the fitness function, which signifies the effectiveness of the selection. It involves selection, crossover, mutation, and termination. The selection process is based on the concept of survival of the fittest, as in other evolutionary algorithms. The GA allows optimal features to emerge, improving the selection over time from the best of the prior features. The selected k features ‘F_k’ are fed to Algorithm 2 to extract the effective features. Let P(n) be the population and FF(n) the fitness function.
Step-1: P(n) ← {initialize P_i where i ∈ (1, 2, …, n)}
Step-2: FF(n) ← {fitness function FF_j where j ∈ (1, 2, …, n)}
Step-3: repeat until {termination = True}
  ◯ P_p(t) ← {select parents with best fitness}
  ◯ P_c(t) ← {crossover for generating new individuals}
  ◯ P_m(t) ← {mutation [P_c(t)]}
Step-4: F_GA ← {fittest individuals/features where F_GA ⊆ F_k}
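The loop of selection, crossover, and mutation can be sketched as below. This is a generic bitmask GA written under several assumptions that the paper does not specify: truncation selection of the better half, single-point crossover, per-bit mutation, elitism, and a fixed number of generations as the termination rule. The fitness function is a toy stand-in; in the paper's setting it would be a model score such as accuracy on the selected features.

```python
import random


def genetic_feature_search(n_features, fitness, pop_size=20, generations=30,
                           mutation_rate=0.05, seed=0):
    """Search 0/1 feature masks of length n_features (>= 2) that maximize fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):                  # termination: fixed budget
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]         # selection of the fitter half
        children = [ranked[0][:]]                 # elitism: keep the best mask
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)    # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Toy fitness (hypothetical): reward two "relevant" features, penalize subset size.
RELEVANT = {0, 2}


def toy_fitness(mask):
    hits = sum(1 for j, bit in enumerate(mask) if bit and j in RELEVANT)
    return hits - 0.1 * sum(mask)


best = genetic_feature_search(6, toy_fitness)
print(best, toy_fitness(best))
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases, which is one simple way to realize the idea that the selection improves over time from the best of the prior features.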
4 Experimental Discussion
Applications such as medical diagnosis have features with complex relationships, and the data is captured from multiple sources. This heterogeneous data is explored to extract the optimal features that are relevant to the target variable and improve the performance of the model. The experiments are run in Python using the Spyder tool. The proposed method, Ensemble Bootstrap Genetic Algorithm (EnBGA), is implemented and executed on multi-source data. Statistical analysis of the selected features helps to understand the significance of each feature with respect to the target variable. A public dataset of EHR (Electronic Health Records) is used for the experimentation. The dataset is multi-source, comprising Diagnosis data (disorders as approved by the World Health Organization), Medication data (medicines, e.g., voxy, lisinopril, cozaar, etc.), Transcripts data (vitals, e.g., weight, height, blood pressure, etc.), and Lab data. Table 1 shows the feature score analysis for the Diagnosis dataset. The optimal features selected are (i) F1: Cardiovascular_Disorders, (ii) F2: Genitourinary_Disorders, (iii) F3: Endocrine_Disorders, (iv) F4: Hematologic_Disorders, and (v) F5: Musculoskeletal_Disorders. Similarly, the feature analysis was performed for the other datasets. The Gradient Boost (GB) algorithm is used for a comparative performance analysis of the selected effective Genetic Algorithm (GA) features and the Top-10 features, as shown in Table 2. The GA features yield a slightly higher AUCScore (0.806533 versus 0.805528), while sensitivity and specificity are identical for the two feature sets.

Table 1 Feature score analysis for Diagnosis dataset
Techniques   F1        F2        F3        F4        F5
Chi2_Mean    1.00000   1.00000   1.00000   1.00000   0.97709
IG_Mean      0.06785   0.01001   0.00467   0.00736   0.00152
ROC_Mean     0.10795   0.02677   0.04818   0.04442   0.04818
xGB_Mean     0.65972   0.01756   0.00885   0.01055   0.00760
RF_Mean      0.08399   0.03792   0.03776   0.03436   0.05377
GBC_Mean     0.49867   0.04204   0.02170   0.01644   0.01090
Wt_Mean      0.40303   0.18905   0.18686   0.18552   0.18318

Table 2 The performance results for Diagnosis dataset
Classifier   Type    AUCScore   Sensitivity   Specificity
GB           GA      0.806533   0.808468      0.166667
GB           TOP10   0.805528   0.808468      0.166667
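For readers who want to reproduce the kind of metrics reported in Table 2, the following sketch computes sensitivity and specificity from binary predictions. It assumes the standard definitions (true positive rate and true negative rate); the paper does not list its evaluation code, and the sample labels are made up for illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels and predictions, only to show the call.
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(sens, spec)
```

Under these definitions, a specificity of 0.166667 means that about one in six negative cases is classified correctly.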
5 Conclusion
The proposed algorithm, Ensemble Bootstrap Genetic Algorithm (EnBGA), is a robust technique that selects relevant, effective features. These features improve the predictive power of the model. The ensemble approach introduces diversity at the input data level and provides stability to the model. The Genetic Algorithm explores the effective features and enhances the performance of the model. As a stochastic method, it may require more computation, and therefore more time, to converge. The proposed EnBGA algorithm thus helps to select appropriate features that contribute to proper decision-making and predictions.
References
1. Ding W, Lin C, Pedrycz W (2020) Multiple relevant feature ensemble selection based on multilayer co-evolutionary consensus mapreduce. IEEE Trans Cybern 50(2):425–439
2. Zhao Y, Duangsoithong R (2020) Empirical analysis using feature selection and bootstrap data for small sample size problems. In: IEEE 16th international conference on electrical engineering/electronics, computer, telecommunications and information technology (ECTICON), Pattaya, Chonburi, Thailand, pp 814–817 (January 2020)
3. Ghorpade J, Sonkamble B (2020) Predictive analysis of heterogeneous data–techniques & tools. In: 2020 5th international conference on computer and communication systems (ICCCS), Shanghai, China, pp 40–44 (May 2020)
4. Pes B (2019) Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains. In: Neural computing and applications. Springer, pp 1–12 (February 2019)
5. Wang J, Xu J, Zhao C, Peng Y, Wang H (2019) An ensemble feature selection method for high-dimensional data based on sort aggregation. Syst Sci Control Eng IEEE Access 7(2):32–39
6. Woodward A (2020) What to know about the coronavirus outbreak. World Economic Forum (March 2020)
7. Yamada Y, Lindenbaum O, Negahban S, Kluger Y (2020) Feature selection using stochastic gates. In: 37th international conference on machine learning, Vienna, Austria, PMLR 119, pp 1–12 (July 2020)
8. Khair U, Lestari YD, Perdana A, Hidayat D, Budiman A (2019) Genetic algorithm modification analysis of mutation operators in max one problem. In: IEEE, third international conference on informatics and computing (ICIC), Palembang, Indonesia, pp 1–6 (August 2019)
9. Yu Z et al (2019) Adaptive semi-supervised classifier ensemble for high dimensional data classification. IEEE Trans Cybern 49(2):366–379
10. Nag K, Pal NR (2020) Feature extraction and selection for parsimonious classifiers with multiobjective genetic programming. IEEE Trans Evol Comput 24(3):454–466
11. Palhares P, Brito L (2018) Constrained mixed integer programming solver based on the compact genetic algorithm. IEEE Latin Am Trans 16(5):1493–1498
12. Aruna Kumari GL, Padmaja P, Jaya Suma G (2020) ENN-ensemble based neural network method for diabetes classification. Int J Eng Adv Technol (IJEAT) 9(3). ISSN: 2249–8958
13. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2018) Feature selection: a data perspective. ACM Comput Surv (CSUR) 50(6):1–45
14. Ditzler G, LaBarck J, Ritchie J, Rosen G, Polikar R (2018) Extensions to online feature selection using bagging and boosting. IEEE Trans Neural Netw Learn Syst 29(9):4504–4509
15. Thomas J, Sael L (2015) Overview of integrative analysis methods for heterogeneous data. In: IEEE international conference on big data and smart computing, Jeju, pp 266–270
16. Seijo-Pardo B, Porto-Díaz I, Bolón-Canedo V, Alonso-Betanzos A (2017) Ensemble feature selection: homogeneous and heterogeneous approaches. In: Knowledge-based systems, vol 118. Elsevier, pp 124–139 (February 2017)
A Generalization of Secure Comparison Protocol with Encrypted Output and Its Efficiency Estimation Takumi Kobayashi and Keisuke Hakuta
Abstract A secure comparison protocol outputs an unencrypted or encrypted comparison result. In this paper, we focus on secure comparison protocols with an encrypted output. In recent works, the computation of such protocols proceeds bitwise, so the efficiency problem has not yet been solved. In this study, we propose a new secure comparison protocol with an encrypted output, which is a generalization of the one proposed by Kobayashi and Hakuta (2018). As an interesting feature, the computation of our proposed protocol proceeds w bits at a time, for any positive integer w, to compute the output. We discuss its security under the semi-honest model. Furthermore, we estimate its efficiency.
Keywords Cryptography · Homomorphic encryption · Privacy · Privacy enhancement techniques · Secure comparison · Secure multi-party computation · Semi-honest model
T. Kobayashi (B) · K. Hakuta Shimane University, 1060 Nishikawatsu-cho, Matsue, Shimane, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_33
1 Introduction
A secure comparison protocol can be applied to various applications of secure multi-party computation. In this paper, we call it an SCP (Secure Comparison Protocol). Many researchers have developed SCPs that output an encrypted comparison result by using several building blocks [1, 2]. On the other hand, some SCPs that use a single building block have also been proposed [3, 4]. Security analysis is easier for SCPs that use one building block than for SCPs that use several. Moreover, since the SCPs in recent works proceed bitwise to compute the output, many SCPs still have an efficiency problem. To solve it, Kobayashi and Hakuta focused on an SCP that processes multiple bits at a time [4]. In this paper, we call such an SCP a multi-bits-wise SCP for short. The goal of their study is to develop a w bits-wise SCP for any positive integer w (w ≥ 2). Toward this goal, they proposed two SCPs. In this paper, we call the first protocol [4, Protocol 1] KH Protocol 1 and the second one KH Protocol 2. KH Protocol 1 proceeds bitwise to obtain an encrypted output, while KH Protocol 2 is a 2 bits-wise SCP. They claimed that “the generalization of KH Protocol 2 could make the SCP more efficient” [4]. Therefore, we tackle the generalization. The notation “(a > b)” indicates the truth value of the proposition “a > b” for the input integers a and b; the other cases are defined similarly. For instance, if the proposition “a > b” is true, then a ciphertext of the comparison result (a > b) equals that of “1”. Throughout this paper, we consider the following scenario: “Party A has a private integer a, while Party B has a private integer b. They want to compare their private integers without leaking their integers or the comparison result to each other. Party A’s plaintext is encrypted with Party B’s public key, while Party B holds both its public key and the secret key. As a result, Party A gets an encrypted comparison result (a > b).” In this study, we propose an SCP with an encrypted output, which we call Our protocol for short. Our protocol (Algorithm 1) is a generalization of KH Protocol 2 [4, Protocol 2]. KH Protocol 2 is a 2 bits-wise SCP, whereas Our protocol is a w bits-wise SCP for any positive integer w (w ≥ 1). This is an interesting feature of Our protocol. We also discuss the security of Our protocol under the semi-honest model and estimate its efficiency. Homomorphic encryption, which we call HE, is commonly used in SCPs. Since the subprotocols we use consist only of HE, Our protocol consists only of HE. Our protocol therefore has the advantage that its security analysis is easier than that of protocols using several building blocks. This paper is organized as follows. In Sect. 2, we fix some notation. In Sect. 3, we propose an SCP and prove its correctness (that is, the protocol correctly produces the outputs from the inputs). Moreover, we discuss the security and estimate the efficiency. In Sect. 4, we conclude the paper.
2 Mathematical Preliminaries
Throughout this paper, we use the following notation. We reserve the notation $\mathbb{Z}$ to denote the set of rational integers. We represent the set of non-negative integers by $\mathbb{Z}_{\ge 0}$, that is, $\mathbb{Z}_{\ge 0} := \{n \in \mathbb{Z} \mid n \ge 0\}$. Let us denote by $\mathbb{Z}_n := \mathbb{Z}/n\mathbb{Z}$ the residue class ring modulo $n$ for a positive integer $n \ge 2$. We reserve the notation $p$ and $q$ to denote $(l/2)$-bit primes, and we write $N$ for the product of $p$ and $q$. We assume that $N$ is an $l$-bit RSA modulus. We denote by $\mathbb{Z}^*_{N^2}$ the set $\{a \in \mathbb{Z}/N^2\mathbb{Z} \mid \gcd(a, N^2) = 1\}$. We describe the definition of a public key encryption scheme following [5, Definition 7.1]. We reserve the notation $E$ to denote a public key encryption scheme. Let us denote by $\kappa$ a security parameter of the scheme $E$. Moreover, $(pk, sk)$ denotes an output of $\mathrm{Gen}(1^\kappa)$, that is, a pair of a public key $pk$ and a secret key $sk$. Party B’s public key is denoted by $pk_B$, and the corresponding secret key by $sk_B$. Let us denote by $PK$ and $SK$ the public key space and the secret key space, respectively. We reserve the notation $P$ and $C$ to denote the plaintext space and the ciphertext space. $\mathrm{Enc}_{pk}(m; r)$ means a ciphertext of the plaintext $m \in P$ under a public key $pk$ and a random number $r$, while $\mathrm{Dec}_{sk}(c)$ means the result of decrypting the ciphertext $c \in C$ with a secret key $sk$. In this study, we define the maps Enc and Dec as follows: $\mathrm{Enc} : P \times PK \to C,\ (m, pk) \mapsto c$, and $\mathrm{Dec} : C \times SK \to P,\ (c, sk) \mapsto m$. Our protocol needs a semantically secure and additive HE scheme. Any additive HE scheme [1, 2, 6, 7] can be applied to Our protocol. In this paper, we apply the Paillier encryption scheme [6], which is commonly used in applications of secure multi-party computation. For more details, we refer the reader to [6].
– Encryption: Given a plaintext $m \in \mathbb{Z}_N$, choose a random number $r \in \mathbb{Z}^*_N$. The ciphertext of $m$ is $c = \mathrm{Enc}_{pk}(m; r) = g^m r^N \bmod N^2$.
– Homomorphic property: Given plaintexts $m_1, m_2 \in \mathbb{Z}_N$, choose random numbers $r_1, r_2 \in \mathbb{Z}^*_N$. The ciphertexts are $c_1 = \mathrm{Enc}_{pk}(m_1; r_1) = g^{m_1} r_1^N \bmod N^2$ and $c_2 = \mathrm{Enc}_{pk}(m_2; r_2) = g^{m_2} r_2^N \bmod N^2$. Then we have
$$c_1 \cdot c_2 = \mathrm{Enc}_{pk}(m_1; r_1) \cdot \mathrm{Enc}_{pk}(m_2; r_2) = \mathrm{Enc}_{pk}(m_1 + m_2; r_1 r_2). \tag{1}$$
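The additive property (1) can be checked numerically with a toy Paillier instance. The sketch below is only illustrative: it assumes the common choice $g = N + 1$, uses tiny primes that offer no security, and the function names (`keygen`, `encrypt`, `decrypt`) are hypothetical. It needs Python 3.8 or later for modular inverses via `pow(x, -1, n)`.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key pair from two small primes (insecure, for illustration)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                                          # assumed generator choice
    mu = pow(lam, -1, n)                               # valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pk, m, rng):
    """c = g^m * r^n mod n^2 with r drawn from the units modulo n."""
    n, g = pk
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pk, sk, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) / n."""
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


pk, sk = keygen(17, 19)
n2 = pk[0] ** 2
rng = random.Random(1)
c1, c2 = encrypt(pk, 20, rng), encrypt(pk, 22, rng)
total = decrypt(pk, sk, (c1 * c2) % n2)                    # plaintext 20 + 22
difference = decrypt(pk, sk, (c1 * pow(c2, -1, n2)) % n2)  # plaintext 20 - 22 mod N
print(total, difference)
```

Multiplying by the inverse of a ciphertext subtracts its plaintext modulo $N$, which is how expressions such as an encryption of $1 - t_i$ can be formed from ciphertexts alone.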
3 Proposed Protocol
We propose an SCP that outputs an encrypted comparison result (a > b). An overview of Our protocol is as follows. Our protocol is a generalization of KH Protocol 2. KH Protocol 2 is a 2 bits-wise SCP, while Our protocol is a w bits-wise SCP for any positive integer w. Both protocols compute a ciphertext of the comparison result (a > b). The maps $G$ and $\hat{G}$ used in this paper return a comparison result for their inputs. We define the maps $G : \mathbb{Z}^2_{\ge 0} \to \{0, 1\}$ and $\hat{G} : \mathbb{Z}^2_{\ge 0} \to \{0, 1\}$ as
$$G(a, b) = \begin{cases} 1, & \text{if } a \le b, \\ 0, & \text{otherwise,} \end{cases} \qquad \text{and} \qquad \hat{G}(a, b) = \begin{cases} 0, & \text{if } a \le b, \\ 1, & \text{otherwise.} \end{cases}$$
Next, we fix a positive integer $w$ ($w \ge 1$). We define the following sets $\tilde{\Gamma}$ and $\Gamma$, and an element $\gamma$ of $\Gamma$: $\tilde{\Gamma} := \{0, 1, \ldots, 2^w - 1\} \subset \mathbb{Z}$, $\gamma \in \Gamma := \tilde{\Gamma} \setminus \{0\} = \{1, \ldots, 2^w - 1\} \subset \mathbb{Z}$. The non-negative integer $a$ is represented as $a = (a_{h-1}, \ldots, a_0)_2 = (A_{H-1}, \ldots, A_0)_{2^w} \in \mathbb{Z}_{\ge 0}$, with $A_i := (a_{wi+(w-1)}, a_{wi+(w-2)}, \ldots, a_{wi})_2 \in \tilde{\Gamma} \subset \mathbb{Z}$. We represent the non-negative integer $b$ similarly. We assume that $h$ is divisible by $w$ in Our protocol, padding the integers with zeros if necessary, so that obviously $H = h/w$. Furthermore, we write $A^{(i)}$ for the integer represented by $(A_{i-1}, \ldots, A_0)_{2^w} \in \mathbb{Z}_{\ge 0}$, and $B^{(i)}$ similarly for the integer represented by $(B_{i-1}, \ldots, B_0)_{2^w} \in \mathbb{Z}_{\ge 0}$.

Algorithm 1 Our protocol
Inputs: Party A: $a = (a_{h-1}, \ldots, a_0)_2 = (A_{H-1}, \ldots, A_0)_{2^w}$, $d := 2^w$ and $pk_B$.
Inputs: Party B: $b = (b_{h-1}, \ldots, b_0)_2 = (B_{H-1}, \ldots, B_0)_{2^w}$, $d := 2^w$, $pk_B$ and $sk_B$.
Outputs: Party A: $\mathrm{Enc}_{pk_B}(t_H; r_{t_H}^{(A)})$ such that $t_H = \hat{G}(a, b)$. Party B: Not applicable.
1: Party A encrypts “0”, “1” and $d$ and obtains $\mathrm{Enc}_{pk_B}(0; r_X^{(A)})$, $\mathrm{Enc}_{pk_B}(1; r_Y^{(A)})$ and $\mathrm{Enc}_{pk_B}(d; r_Z^{(A)})$.
2: Party A computes $\mathrm{Enc}_{pk_B}(1; r_Y^{(A)})^{-1}$.
3: Party A substitutes $\mathrm{Enc}_{pk_B}(1; r_Y^{(A)})$ into $\mathrm{Enc}_{pk_B}(t_0; r_{t_0}^{(A)})$.
4: for $i$ from 0 to $H - 1$ by +1 do
5:   Party A chooses $c_i \in \{0, 1\}$ at random.
6:   if $c_i = 0$ then
7:     Party A substitutes $\mathrm{Enc}_{pk_B}(t_i; r_{t_i}^{(A)})$ and $\mathrm{Enc}_{pk_B}(0; r_X^{(A)})$ into $\mathrm{Enc}_{pk_B}(s_i; r_{s_i}^{(A)})$ and $\mathrm{Enc}_{pk_B}(c_i; r_{c_i}^{(A)})$, respectively.
8:   else
9:     Party A does: $\mathrm{Enc}_{pk_B}(s_i; r_{s_i}^{(A)}) \leftarrow \mathrm{Enc}_{pk_B}(1 - t_i; r_Y^{(A)} (r_{t_i}^{(A)})^{-1})$.
10:    Party A substitutes $\mathrm{Enc}_{pk_B}(1; r_Y^{(A)})$ into $\mathrm{Enc}_{pk_B}(c_i; r_{c_i}^{(A)})$.
11:  end if
12:  Party A sends $\mathrm{Enc}_{pk_B}(s_i; r_{s_i}^{(A)})$.
13:  Party B encrypts $B_i$ and obtains $\mathrm{Enc}_{pk_B}(B_i; r_{B_i}^{(B)})$.
14:  if $B_i = 0$ then
15:    Party B encrypts “0” and obtains $\mathrm{Enc}_{pk_B}(0; r_X^{(B)})$ and substitutes $\mathrm{Enc}_{pk_B}(0; r_X^{(B)})$ into $\mathrm{Enc}_{pk_B}(u_i; r_{u_i}^{(B)})$.
16:  else
17:    Party B does: $\mathrm{Enc}_{pk_B}(u_i; r_{u_i}^{(B)}) \leftarrow \mathrm{Enc}_{pk_B}(s_i + B_i; r_{s_i}^{(A)} r_{B_i}^{(B)})$.
18:  end if
19:  Party B sends $\mathrm{Enc}_{pk_B}(B_i; r_{B_i}^{(B)})$ and $\mathrm{Enc}_{pk_B}(u_i; r_{u_i}^{(B)})$ to Party A.
20:  if $c_i = 1$ then
21:    Party A does: $\mathrm{Enc}_{pk_B}(u_i; r_{u_i}^{(B)}) \leftarrow \mathrm{Enc}_{pk_B}(2B_i - u_i + 1; (r_{B_i}^{(B)})^2 (r_{u_i}^{(B)})^{-1} r_Y^{(A)})$.
22:  end if
23:  if $A_i = 0$ then
24:    Party A computes: $\mathrm{Enc}_{pk_B}(d + B_i + c_i - 1; r_Z^{(A)} r_{B_i}^{(B)} r_{c_i}^{(A)} (r_Y^{(A)})^{-1})$.
25:    They perform an SDP: Party A inputs $d$ and $\mathrm{Enc}_{pk_B}(d + B_i + c_i - 1; r_Z^{(A)} r_{B_i}^{(B)} r_{c_i}^{(A)} (r_Y^{(A)})^{-1})$ while Party B inputs $d$ and $sk_B$. Finally Party A outputs $\mathrm{Enc}_{pk_B}(\lfloor (d + B_i + c_i - 1)/d \rfloor; r_D^{(A)})$.
26:    Party A does: $\mathrm{Enc}_{pk_B}(t_{i+1}; r_{t_{i+1}}^{(A)}) \leftarrow \mathrm{Enc}_{pk_B}(t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor; r_{t_i}^{(A)} r_{B_i}^{(B)} (r_{u_i}^{(B)})^{-1} r_D^{(A)})$.
27:  else
28:    Party A encrypts $A_i$ and obtains $\mathrm{Enc}_{pk_B}(A_i; r_{A_i}^{(A)})$.
29:    Party A does: $\mathrm{Enc}_{pk_B}(d + u_i - A_i - 1; r_Z^{(A)} r_{u_i}^{(B)} (r_{A_i}^{(A)})^{-1} (r_Y^{(A)})^{-1})$.
30:    They perform an SDP: Party A inputs $d$ and $\mathrm{Enc}_{pk_B}(d + u_i - A_i - 1; r_Z^{(A)} r_{u_i}^{(B)} (r_{A_i}^{(A)})^{-1} (r_Y^{(A)})^{-1})$ while Party B inputs $d$ and $sk_B$. Finally Party A outputs $\mathrm{Enc}_{pk_B}(\lfloor (d + u_i - A_i - 1)/d \rfloor; r_D^{(A)})$.
31:    Party A substitutes $\mathrm{Enc}_{pk_B}(\lfloor (d + u_i - A_i - 1)/d \rfloor; r_D^{(A)})$ into $\mathrm{Enc}_{pk_B}(t_{i+1}; r_{t_{i+1}}^{(A)})$.
32:  end if
33: end for
34: Party A does: $\mathrm{Enc}_{pk_B}(t_H; r_{t_H}^{(A)}) \leftarrow \mathrm{Enc}_{pk_B}(1 - t_H; r_Y^{(A)} (r_{t_H}^{(A)})^{-1})$.
35: Party A outputs $\mathrm{Enc}_{pk_B}(t_H; r_{t_H}^{(A)})$.

Moreover, we define $t_i$ for $i = 0, \ldots, H$ as follows:
$$t_i := \begin{cases} 1, & \text{if } i = 0, \\ G(A^{(i)}, B^{(i)}), & \text{otherwise.} \end{cases}$$
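The base-$2^w$ representation and the values $t_i$ can be made concrete with a short plaintext sketch. It works on ordinary integers with no encryption, and the helper names (`to_digits`, `prefix_value`, `t_values`) are illustrative rather than taken from the paper.

```python
def G(a, b):
    """G(a, b) = 1 if a <= b, and 0 otherwise."""
    return 1 if a <= b else 0


def G_hat(a, b):
    """G_hat is the bitwise inverse of G: 1 exactly when a > b."""
    return 1 - G(a, b)


def to_digits(x, w, H):
    """Digits (A_0, ..., A_{H-1}) of x in base 2^w, least significant first."""
    d = 1 << w
    return [(x >> (w * i)) & (d - 1) for i in range(H)]


def prefix_value(digits, i, w):
    """Integer A^(i) represented by the i least significant digits."""
    return sum(digit << (w * j) for j, digit in enumerate(digits[:i]))


def t_values(a, b, w, H):
    """List (t_0, ..., t_H) with t_0 = 1 and t_i = G(A^(i), B^(i))."""
    A, B = to_digits(a, w, H), to_digits(b, w, H)
    return [1] + [G(prefix_value(A, i, w), prefix_value(B, i, w))
                  for i in range(1, H + 1)]


# Example with h = 6 bits and w = 2, so H = 3 digits per integer.
a, b = 0b101101, 0b100111
print(to_digits(a, 2, 3), to_digits(b, 2, 3))
print(t_values(a, b, 2, 3), G_hat(a, b))
```

The last entry of the list is $t_H = G(a, b)$, so a single bit inversion turns it into $\hat{G}(a, b)$, the truth value of $a > b$.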
Note that $t_H = G(A^{(H)}, B^{(H)}) = G(a, b)$. We describe Our protocol as Algorithm 1. Algorithm 1 computes the value $t_{i+1}$ for each $i$ from $0$ to $H - 1$. The objective of Algorithm 1 is as follows: Party A outputs a ciphertext of a comparison result $t_H = G(a, b)$ using the parties' private input integers $a$ and $b$. Algorithm 1 achieves this goal by computing a ciphertext of $t_H = G(A^{(H)}, B^{(H)}) = G(a, b)$ throughout the iteration in lines 4–33 and by applying a bit inverse to the ciphertext in lines 34–35. The notation "←" means that the element on the right-hand side is computed and substituted into the one on the left-hand side. We use the notation $Enc_{pk_B}(m; r)$ to describe a ciphertext of a plaintext $m$ using Party B's public key $pk_B$ and a random number $r$.

Now we explain each line of Algorithm 1. In lines 1–3, Party A performs an initialization and substitutes $Enc_{pk_B}(1; r_Y^{(A)})$ into $Enc_{pk_B}(t_0; r_{t_0}^{(A)})$. In line 4, the parties carry out iterations on $i$ from $0$ to $H - 1$. In lines 5–12, Party A masks $t_i$ $(= G(A^{(i)}, B^{(i)}))$, and the masked value is sent to Party B. Due to the mask, Party B is unable to determine whether $s_i = t_i$ or $s_i = 1 - t_i$ even if Party B decrypts the ciphertext $Enc_{pk_B}(s_i; r_{s_i}^{(A)})$. In lines 13–19, Party B computes $Enc_{pk_B}(B_i; r_{B_i}^{(B)})$ and $Enc_{pk_B}(u_i; r_{u_i}^{(B)})$ depending on $B_i$ and sends the ciphertexts to Party A. In lines 20–22, Party A removes the mask. In lines 23–32, Party A computes $Enc_{pk_B}(t_{i+1}; r_{t_{i+1}}^{(A)})$ depending on $A_i$. After line 33, since we have the following equation, Party A gets a ciphertext of $G(a, b)$.

$$t_H = G(a, b). \tag{2}$$

Finally, by using the bit inverse in lines 34–35, Party A gets a ciphertext of $\hat{G}(a, b)$. In lines 3, 7, 9, 10, 15, 17, 21, 24, 26, 28, 29, 31, and 34, Party A or Party B substitutes one ciphertext into another. In particular, in lines 9, 17, 21, 24, 26, 29, and 34, they can obtain each ciphertext by property (1) of the HE scheme.

Furthermore, as a subprotocol, a SDP is needed in Algorithm 1. The SDP is described as follows: "Party A inputs an encrypted integer $X$ and a divisor $D$ while Party B inputs a divisor $D$ and $sk_B$. As a result, Party A outputs the encrypted division result $\lfloor X/D \rfloor$." We use a SDP proposed by Veugen [8, Protocol 1] as the subprotocol. The parties need to perform the SDP in lines 25 and 30 of Algorithm 1. Meanwhile, they do not need to do so in lines 26 and 31 because Party A can store the encrypted division result. Furthermore, a SCP is required in Veugen's SDP. This SCP needs to provide an encrypted comparison result $(a > b)$ from inputs $a$ and $b$ whose size is $\log_2 d$. We apply KH Protocol 1 [4, Protocol 1] as the subprotocol of the SDP. For more details, we refer the reader to KH Protocol 1 [4, Protocol 1] and [8, pp. 169–170].

Next, we describe the correctness of Our protocol (Algorithm 1). First of all, we need the following proposition (Proposition 1) for the correctness.
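Before turning to correctness, the bit inverse of lines 34–35 can be illustrated concretely. The following is a minimal sketch, assuming a Paillier-style scheme [6] in which multiplying ciphertexts adds plaintexts and multiplies random numbers. The toy primes, the generator $N + 1$, and the function names `enc`, `dec`, and `bit_inverse` are assumptions made for illustration only and are not taken from Algorithm 1.

```python
# Toy Paillier parameters (illustrative assumption; far too small for real use).
P, Q = 47, 59
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse of phi(N) mod N (Python 3.8+)


def enc(m, r):
    """Encrypt m with explicit random number r, using generator g = N + 1."""
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def dec(c):
    """Decrypt c using phi(N) as the private exponent."""
    return ((pow(c, PHI, N2) - 1) // N * MU) % N


def bit_inverse(c_one, c_t):
    """Compute a ciphertext of 1 - t as Enc(1) * Enc(t)^(-1) mod N^2."""
    return (c_one * pow(c_t, -1, N2)) % N2


if __name__ == "__main__":
    r_y, r_t = 17, 23  # random numbers coprime to N (hypothetical values)
    for t in (0, 1):
        c = bit_inverse(enc(1, r_y), enc(t, r_t))
        print(t, "->", dec(c))
```

In this sketch the resulting ciphertext equals an encryption of $1 - t$ under the random number $r_Y \cdot r_t^{-1}$, which mirrors the form $Enc_{pk_B}(1 - t_H; r_Y^{(A)} (r_{t_H}^{(A)})^{-1})$ used in line 34.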
T. Kobayashi and K. Hakuta
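Table 1 Value of variables in one iteration (1/2)

| $c_i$ | $A_i$ | $B_i$ | Case | $s_i$ | $u_i$ after line 18 | $u_i$ after line 22 |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | | $t_i$ | $u_i \leftarrow 0$ | $u_i \leftarrow 0$ |
| 0 | 0 | $\gamma$ | | $t_i$ | $u_i \leftarrow t_i + B_i$ | $u_i \leftarrow t_i + B_i$ |
| 0 | $\gamma$ | 0 | | $t_i$ | $u_i \leftarrow 0$ | $u_i \leftarrow 0$ |
| 0 | $\gamma$ | $\gamma$ | $(A_i < B_i)$ | $t_i$ | $u_i \leftarrow t_i + B_i$ | $u_i \leftarrow t_i + B_i$ |
| 0 | $\gamma$ | $\gamma$ | $(A_i = B_i)$ | $t_i$ | $u_i \leftarrow t_i + B_i$ | $u_i \leftarrow t_i + B_i$ |
| 0 | $\gamma$ | $\gamma$ | $(A_i > B_i)$ | $t_i$ | $u_i \leftarrow t_i + B_i$ | $u_i \leftarrow t_i + B_i$ |
| 1 | 0 | 0 | | $1 - t_i$ | $u_i \leftarrow 0$ | $u_i \leftarrow 2B_i - u_i + 1 = 0 - 0 + 1 = 1$ |
| 1 | 0 | $\gamma$ | | $1 - t_i$ | $u_i \leftarrow 1 - t_i + B_i$ | $u_i \leftarrow 2B_i - u_i + 1 = 2B_i - (1 - t_i + B_i) + 1 = t_i + B_i$ |
| 1 | $\gamma$ | 0 | | $1 - t_i$ | $u_i \leftarrow 0$ | $u_i \leftarrow 2B_i - u_i + 1 = 0 - 0 + 1 = 1$ |
| 1 | $\gamma$ | $\gamma$ | $(A_i < B_i)$ | $1 - t_i$ | $u_i \leftarrow 1 - t_i + B_i$ | $u_i \leftarrow 2B_i - u_i + 1 = 2B_i - (1 - t_i + B_i) + 1 = t_i + B_i$ |
| 1 | $\gamma$ | $\gamma$ | $(A_i = B_i)$ | $1 - t_i$ | $u_i \leftarrow 1 - t_i + B_i$ | $u_i \leftarrow 2B_i - u_i + 1 = 2B_i - (1 - t_i + B_i) + 1 = t_i + B_i$ |
| 1 | $\gamma$ | $\gamma$ | $(A_i > B_i)$ | $1 - t_i$ | $u_i \leftarrow 1 - t_i + B_i$ | $u_i \leftarrow 2B_i - u_i + 1 = 2B_i - (1 - t_i + B_i) + 1 = t_i + B_i$ |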
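Proposition 1 For given integers $A_i \in \tilde{\Gamma}$ and $B_i \in \tilde{\Gamma}$ $(1 \le i \le H - 1)$, we have

$$\left\lfloor \frac{d + t_i + (B_i - A_i - 1)}{d} \right\rfloor = \begin{cases} 1, & \text{if } A_i < B_i, \\ t_i, & \text{if } A_i = B_i, \\ 0, & \text{otherwise } (A_i > B_i). \end{cases} \tag{3}$$

Proof The proof is straightforward.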
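Since the proof is left to the reader, Eq. (3) can also be checked exhaustively for small parameters. The sketch below assumes $d = 2^w$ and $t_i \in \{0, 1\}$, and it tests every pair of digits in $\{0, \ldots, d - 1\}$, which covers $\tilde{\Gamma}$ whichever subset of that range it denotes. The function names are hypothetical.

```python
def prop1_lhs(d, t, a, b):
    """Left-hand side of Eq. (3): floor((d + t + (b - a - 1)) / d)."""
    return (d + t + (b - a - 1)) // d


def prop1_rhs(t, a, b):
    """Right-hand side of Eq. (3)."""
    if a < b:
        return 1
    if a == b:
        return t
    return 0


def check_proposition1(w):
    """Return True if Eq. (3) holds for all digits and both values of t."""
    d = 2 ** w
    return all(
        prop1_lhs(d, t, a, b) == prop1_rhs(t, a, b)
        for t in (0, 1)
        for a in range(d)
        for b in range(d)
    )


if __name__ == "__main__":
    for w in range(1, 6):
        print(w, check_proposition1(w))
```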
From Proposition 1, we show in Tables 1 and 2 which values are substituted into each variable in one iteration of lines 4–33. Moreover, we need the following lemma (Lemma 1) for the correctness of Algorithm 1.

Lemma 1 For any integer $i$ $(0 \le i \le H - 1)$, we have

$$t_{i+1} = \begin{cases} 1, & \text{if } A_i < B_i, \\ t_i, & \text{if } A_i = B_i, \\ 0, & \text{otherwise } (A_i > B_i). \end{cases} \tag{4}$$
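Table 2 Value of variables in one iteration (2/2)

| $c_i$ | $A_i$ | $B_i$ | Case | $t_{i+1}$ |
|---|---|---|---|---|
| 0 | 0 | 0 | | $t_{i+1} \leftarrow t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor = t_i + 0 - 0 + 0 = t_i$ |
| 0 | 0 | $\gamma$ | | $t_{i+1} \leftarrow t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor = t_i + B_i - (t_i + B_i) + 1 = 1$ |
| 0 | $\gamma$ | 0 | | $t_{i+1} \leftarrow \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d - (A_i + 1))/d \rfloor = 0$ |
| 0 | $\gamma$ | $\gamma$ | $(A_i < B_i)$ | $t_{i+1} \leftarrow \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = 1$ |
| 0 | $\gamma$ | $\gamma$ | $(A_i = B_i)$ | $t_{i+1} \leftarrow \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = t_i$ |
| 0 | $\gamma$ | $\gamma$ | $(A_i > B_i)$ | $t_{i+1} \leftarrow \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = 0$ |
| 1 | 0 | 0 | | $t_{i+1} \leftarrow t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor = t_i + 0 - 1 + 1 = t_i$ |
| 1 | 0 | $\gamma$ | | $t_{i+1} \leftarrow t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor = t_i + B_i - (t_i + B_i) + 1 = 1$ |
| 1 | $\gamma$ | 0 | | $t_{i+1} \leftarrow \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d - A_i)/d \rfloor = 0$ |
| 1 | $\gamma$ | $\gamma$ | $(A_i < B_i)$ | $t_{i+1} \leftarrow \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = 1$ |
| 1 | $\gamma$ | $\gamma$ | $(A_i = B_i)$ | $t_{i+1} \leftarrow \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = t_i$ |
| 1 | $\gamma$ | $\gamma$ | $(A_i > B_i)$ | $t_{i+1} \leftarrow \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = 0$ |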
Proof We consider three cases: $A_i < B_i$, $A_i = B_i$, and $A_i > B_i$.

Case 1. $A_i < B_i$. From Tables 1 and 2, we consider the four subcases with $c_i$, $A_i$, and $B_i$.

Case 1–1. $c_i = 0$, $A_i = 0$, and $B_i = \gamma$. Since $u_i = t_i + B_i$, we have $t_{i+1} = t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor = \lfloor (d + (B_i - 1))/d \rfloor$. Since $0 \le B_i - 1 < d$ and $d \le d + (B_i - 1) < 2d$, we have $t_{i+1} = \lfloor (d + (B_i - 1))/d \rfloor = 1$.

Case 1–2. $c_i = 0$, $A_i = \gamma$, $B_i = \gamma$, and $A_i < B_i$. Since $u_i = t_i + B_i$, we have $t_{i+1} = \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor$. From Proposition 1, we have $t_{i+1} = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = 1$.

Case 1–3. $c_i = 1$, $A_i = 0$, and $B_i = \gamma$. Since $u_i = t_i + B_i$, we have $t_{i+1} = t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor = \lfloor (d + B_i)/d \rfloor$. Since $d < d + B_i < 2d$, we have $t_{i+1} = \lfloor (d + B_i)/d \rfloor = 1$.

Case 1–4. $c_i = 1$, $A_i = \gamma$, $B_i = \gamma$, and $A_i < B_i$. Since $u_i = t_i + B_i$, we have $t_{i+1} = \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor$. From Proposition 1, we have $t_{i+1} = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = 1$.

Case 2. $A_i = B_i$. From Tables 1 and 2, we consider the four subcases with $c_i$, $A_i$, and $B_i$.

Case 2–1. $c_i = 0$, $A_i = 0$, and $B_i = 0$. Since $u_i = 0$ and $B_i = 0$, we have $t_{i+1} = t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor = t_i + \lfloor (d - 1)/d \rfloor = t_i$.

Case 2–2. $c_i = 0$, $A_i = \gamma$, $B_i = \gamma$, and $A_i = B_i$. Since $u_i = t_i + B_i$, we have $t_{i+1} = \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor$. From Proposition 1, we have $t_{i+1} = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = t_i$.

Case 2–3. $c_i = 1$, $A_i = 0$, and $B_i = 0$. Since $u_i = 1$ and $B_i = 0$, we have $t_{i+1} = t_i + B_i - u_i + \lfloor (d + B_i + c_i - 1)/d \rfloor = t_i - 1 + \lfloor d/d \rfloor = t_i$.

Case 2–4. $c_i = 1$, $A_i = \gamma$, $B_i = \gamma$, and $A_i = B_i$. Since $u_i = t_i + B_i$, we have $t_{i+1} = \lfloor (d + u_i - A_i - 1)/d \rfloor = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor$. From Proposition 1, we have $t_{i+1} = \lfloor (d + t_i + (B_i - A_i - 1))/d \rfloor = t_i$.

Case 3. $A_i > B_i$. This proof is similar to that of Case 1.

This completes the proof of Lemma 1.
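The case analysis of Lemma 1 can also be replayed on plaintext values. The following sketch reconstructs one iteration from the update rules listed in Tables 1 and 2. It works on plain integers rather than ciphertexts, assumes $d = 2^w$ with digits in $\{0, \ldots, d - 1\}$, and uses the hypothetical name `iterate_once`; it is an illustration of the arithmetic, not an implementation of Algorithm 1.

```python
def iterate_once(d, c, t, a, b):
    """Plaintext view of one iteration: returns t_{i+1} from c_i, t_i, A_i, B_i."""
    # Lines 5-12: Party A masks t_i with the coin toss c_i.
    s = t if c == 0 else 1 - t
    # Lines 13-19: Party B sets u_i depending on B_i (value after line 18).
    u = 0 if b == 0 else s + b
    # Lines 20-22: Party A removes the mask when c_i = 1 (value after line 22).
    if c == 1:
        u = 2 * b - u + 1
    # Lines 23-32: Party A computes t_{i+1} depending on A_i.
    if a == 0:
        return t + b - u + (d + b + c - 1) // d
    return (d + u - a - 1) // d


def lemma1_expected(t, a, b):
    """Right-hand side of Eq. (4)."""
    if a < b:
        return 1
    if a == b:
        return t
    return 0


def check_lemma1(w):
    """Return True if Eq. (4) holds for every coin, t_i, and digit pair."""
    d = 2 ** w
    return all(
        iterate_once(d, c, t, a, b) == lemma1_expected(t, a, b)
        for c in (0, 1)
        for t in (0, 1)
        for a in range(d)
        for b in range(d)
    )


if __name__ == "__main__":
    print([check_lemma1(w) for w in range(1, 5)])
```

The check confirms that the result does not depend on the coin toss $c_i$, which is what allows Party A to mask $t_i$ without affecting correctness.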
Again we recall some notation: $h$ is the bit-length of the binary representation of the input integers, and $H$ is the length of the $2^w$-ary representation of the input integers. In other words, $H$ indicates the number of blocks when the $h$-bit binary representation is separated into blocks of $w$ bits. Now we have $H = h/w$.

The correctness of Algorithm 1 is as follows: Party A computes a ciphertext of $\hat{G}(a, b)$ from the inputs $a$ and $b$. In other words, if $a > b$ then Party A outputs a ciphertext of $t_H = 1$, while if $a \le b$ then Party A outputs a ciphertext of $t_H = 0$. We describe it as Theorem 1. From Theorem 1, a ciphertext of $t_H = G(a, b)$ can be computed by Party A after line 33. By computing $Enc_{pk_B}(1; r_Y^{(A)}) \cdot Enc_{pk_B}(t_H; r_{t_H}^{(A)})^{-1}$ in lines 34–35, a ciphertext of $\hat{G}(a, b)$ is output by Party A.

Theorem 1 For any integer $H$ $(H \ge 1)$ that stands for the length of the $2^w$-ary representation of the input integers, after the $H$-th iteration in Algorithm 1, we have Eq. (2).

Proof We prove the claim by induction on the number of blocks $H$. At first, we consider $H = 1$. Now the two integers $a$ and $b$ are represented as follows: $a = (a_{h-1}, \ldots, a_0)_2 = A^{(1)} = (A_0)_{2^w}$ and $b = (b_{h-1}, \ldots, b_0)_2 = B^{(1)} = (B_0)_{2^w}$. From Lemma 1, if $A_0 < B_0$ then we have $t_1 = 1$ after line 32 in Algorithm 1. Hence, if $A^{(H)} < B^{(H)}$ then $t_H = G(A^{(H)}, B^{(H)}) = 1$. From Lemma 1, if $A_0 = B_0$ then we have $t_1 = t_0$ after line 32. Since $t_0 = 1$, we have $t_1 = t_0 = 1$. Hence, if $A^{(H)} = B^{(H)}$ then $t_H = G(A^{(H)}, B^{(H)}) = 1$. From Lemma 1, if $A_0 > B_0$ then we have $t_1 = 0$ after line 32. Hence, if $A^{(H)} > B^{(H)}$ then $t_H = G(A^{(H)}, B^{(H)}) = 0$. Thus, we have Eq. (2) after line 32 for $H = 1$.

Next, we assume that we have Eq. (2) after the $(H - 1)$-th iteration in Algorithm 1 for any positive integer $H$. Now the two integers $a$ and $b$ are represented as follows: $a = (a_{h-1}, \ldots, a_0)_2 = A^{(H-1)} = (A_{(H-1)-1}, \ldots, A_0)_{2^w}$ and $b = (b_{h-1}, \ldots, b_0)_2 = B^{(H-1)} = (B_{(H-1)-1}, \ldots, B_0)_{2^w}$.

Finally, we prove that we have Eq. (2) after the $H$-th iteration in Algorithm 1 for any positive integer $H$. We have $a = (a_{h-1}, \ldots, a_0)_2 = A^{(H)} = (A_{H-1}, A_{(H-1)-1}, \ldots, A_0)_{2^w}$ and $b = (b_{h-1}, \ldots, b_0)_2 = B^{(H)} = (B_{H-1}, B_{(H-1)-1}, \ldots, B_0)_{2^w}$. To give the proof, we consider three cases as follows: $A^{(H)} < B^{(H)}$, $A^{(H)} = B^{(H)}$, and $A^{(H)} > B^{(H)}$.

Case 1. $A^{(H)} < B^{(H)}$. Now we consider the following two subcases to prove the correctness.

Case 1–1. $A^{(H)} < B^{(H)}$ and $A_{H-1} < B_{H-1}$. $A_{H-1}$ and $B_{H-1}$ are the most significant $2^w$-ary digits of $A^{(H)}$ and $B^{(H)}$, respectively. Since $A_{H-1} < B_{H-1}$, from Lemma 1, $t_H = 1$. Thus we have $t_H = G(A^{(H)}, B^{(H)}) = 1$.

Case 1–2. $A^{(H)} < B^{(H)}$ and otherwise. There exists $i_0 \in \{0, \ldots, H - 2\}$ such that $A_{i_0} < B_{i_0}$ and $A_k = B_k$ $(i_0 + 1 \le k \le H - 1)$. From Lemma 1, since $t_{k+1} = t_k$ for $k = H - 1$, we have $t_H = t_{H-1}$. Since $A_k = B_k$ $(i_0 + 1 \le k \le H - 2)$, we have $t_{k+1} = t_k$ for $k = i_0 + 1, \ldots, H - 2$. Since $A_{i_0} < B_{i_0}$, we have $t_{i_0 + 1} = 1$. Thus, we have $t_H = t_{H-1} = t_{H-2} = \cdots = t_{(i_0 + 1) + 1} = t_{i_0 + 1} = 1$. Since $t_H = 1$, we have $t_H = G(A^{(H)}, B^{(H)}) = 1$. Hence, in Case 1, we have Eq. (2) after the $H$-th iteration.

Case 2. $A^{(H)} = B^{(H)}$. Now it is obvious that $A_j = B_j$ for all $j$ $(0 \le j \le H - 1)$. From Lemma 1, we have $t_H = t_{H-1}$. Moreover, we have $t_{k+1} = t_k$ for all $k$ $(0 \le k \le H - 2)$. Namely, we have $t_{H-1} = t_{H-2} = \cdots = t_1 = t_0$. Moreover, we have $t_0 = 1$. Since $t_H = t_{H-1} = t_{H-2} = \cdots = t_1 = t_0 = 1$, we have $t_H = G(A^{(H)}, B^{(H)}) = 1$. Hence, in Case 2, we have Eq. (2) after the $H$-th iteration.

Case 3. $A^{(H)} > B^{(H)}$. This proof is similar to that of Case 1.

Consequently, we have Eq. (2) after the $H$-th iteration in Algorithm 1. This completes the proof of Theorem 1.
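Theorem 1 can be illustrated by running the block-wise recurrence on plaintext integers. The sketch below is an assumption-laden simplification: it ignores encryption entirely, pads the inputs to $\lceil h/w \rceil$ blocks of $w$ bits, processes the blocks from the least significant one, and uses the hypothetical names `step`, `to_digits`, and `compare`. It returns both $t_H = G(a, b)$ and the bit inverse $\hat{G}(a, b) = 1 - t_H$.

```python
import random


def step(d, c, t, a, b):
    """One plaintext iteration following the update rules of Tables 1 and 2."""
    s = t if c == 0 else 1 - t        # masked value s_i
    u = 0 if b == 0 else s + b        # u_i after line 18
    if c == 1:
        u = 2 * b - u + 1             # u_i after line 22
    if a == 0:
        return t + b - u + (d + b + c - 1) // d
    return (d + u - a - 1) // d


def to_digits(x, w, num_blocks):
    """Split x into 2^w-ary digits, least significant block first."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(num_blocks)]


def compare(a, b, h, w, rng=random):
    """Return (t_H, 1 - t_H) for h-bit inputs processed w bits at a time."""
    d = 1 << w
    num_blocks = -(-h // w)           # ceiling of h / w (padding assumption)
    a_digits = to_digits(a, w, num_blocks)
    b_digits = to_digits(b, w, num_blocks)
    t = 1                             # t_0 = 1
    for i in range(num_blocks):
        c = rng.randint(0, 1)         # Party A's coin toss c_i
        t = step(d, c, t, a_digits[i], b_digits[i])
    return t, 1 - t


if __name__ == "__main__":
    print(compare(37, 21, h=6, w=2))
    print(compare(21, 37, h=6, w=2))
    print(compare(21, 21, h=6, w=2))
```

In this plaintext view $t_H$ equals 1 exactly when $a \le b$, so the inverted bit equals 1 exactly when $a > b$, in line with the statement preceding Theorem 1.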
3.1 Efficiency Estimation

We estimate the efficiency of Our protocol. First we describe some notation for the computational cost. We denote by Dec, Enc, Exp, Inv, Mul, and Sqr the cost of one decryption, encryption, exponentiation, inversion, multiplication, and squaring, respectively. We use a 2048-bit RSA modulus $N$ (this satisfies the 112-bit security level [9, pp. 51–56]) for our estimation. Since Our protocol is a generalization of KH Protocol 2, the estimation for Our protocol is similar to that of KH Protocol 2 [4, Sect. 4.3].

For our estimation, we make the following two assumptions. First, in lines 24–26 of Algorithm 1, if $c_i = 1$ then Party A computes $Enc_{pk_B}(d + B_i; r_Z^{(A)} r_{B_i}^{(B)} r_{c_i}^{(A)} (r_Y^{(A)})^{-1})$ and carries out the remaining lines using the result. Otherwise Party A computes $Enc_{pk_B}(d + B_i - 1; r_Z^{(A)} r_{B_i}^{(B)} r_{c_i}^{(A)} (r_Y^{(A)})^{-1})$ and carries them out as well. In addition, Party A does not need to encrypt "0" in line 1. Since Party A computes the above ciphertexts depending on the value of $c_i$, the changes do not affect the correctness of Our protocol. Second, Party A and Party B can each store the ciphertexts they obtain in their respective lines.
Table 3 Estimation for computational cost (Total)

| Total | Enc | Mul | Inv | Exp |
|---|---|---|---|---|
| Cost 1 | $3$ | $3$ | $1$ | $1$ |
| Cost 2 | $1.5w + 1$ | $2.5w + 1$ | $1.5w + 1$ | $0.5w$ |
| Cost 3 | $1.5wH + 4H$ | $2.5wH + 4H$ | $1.5wH + 2H$ | $0.5wH + H$ |
| Cost 4 | $1.5H + 2$ | $5.875H + 1$ | $1.875H + 2$ | $0$ |
| Cost 5 | $1.5wH + 5.5H + 2$ | $2.5wH + 9.875H + 1$ | $1.5wH + 3.875H + 2$ | $0.5wH + H$ |

Table 4 Computational cost among similar protocols (Total)

| Protocol | Enc | Mul | Inv | Exp |
|---|---|---|---|---|
| KH Protocol 1 | $1.5h + 1$ | $2.5h + 1$ | $1.5h + 1$ | $0.5h$ |
| KH Protocol 2 | $8.5H + 2$ | $14.875H + 1$ | $6.875H + 2$ | $2H$ |
| Algorithm 1 | $1.5wH + 5.5H + 2$ | $2.5wH + 9.875H + 1$ | $1.5wH + 3.875H + 2$ | $0.5wH + H$ |
First of all, we estimate the computational cost of the subprotocols in Our protocol. Our protocol uses a secure division protocol [3, Protocol 1] which needs a SCP with an encrypted output. We call it a SDP (Secure Division Protocol). In [3], the author claimed that "Except for one execution of a SCP, Party A requires 2 Enc, 3 Mul and 1 Inv, and Party B requires 1 Enc and 1 Dec". By precomputing, we may assume that 1 Dec ≈ 1 Exp [6]. For one execution of the SDP without a SCP, the average cost is shown as Cost 1 in Table 3. Moreover, we use KH Protocol 1 as a subprotocol for the SDP. The SDP requires the input size $\log_2 d = \log_2 2^w = w$ for the SCP. For one execution of the SCP in the SDP, the average cost is shown as Cost 2 in Table 3. For $H$ executions of the SDP including the subprotocol, the average cost is shown as Cost 3 in Table 3, that is, $H$ times the sum of Cost 1 and Cost 2.

Next, we estimate the cost of Our protocol without subprotocols. From [4, Table 3], the average cost of Our protocol without subprotocols is shown as Cost 4 in Table 3. Finally, the cost of Our protocol including subprotocols, the sum of Cost 3 and Cost 4, is shown as Cost 5 in Table 3.

We compare the total computational cost among Our protocol, KH Protocol 1, and KH Protocol 2 in terms of the number of multiplications. Again we use a 2048-bit RSA modulus. From [4, Sect. 3.2, Sect. 4.3], we list the cost of KH Protocol 1 and KH Protocol 2 in Table 4. From Table 3, we list that of Algorithm 1 in Table 4. Moreover, we convert the cost of 1 Inv, 1 Sqr, 1 Exp, and 1 Enc into the number of Mul, respectively. According to Cohen et al. [10], we have $9\,Mul \le 1\,Inv \le 30\,Mul$ and $0.8\,Mul \le 1\,Sqr \le 1\,Mul$. In this paper, we assume that

$$1\,Inv \approx 9\,Mul \quad \text{and} \quad 1\,Sqr \approx 1\,Mul. \tag{5}$$
Fig. 1 Comparison of computational cost (Total). Horizontal axis: $w$; vertical axis: number of multiplications (thousand); plotted series: Our proposed protocol, KH Protocol 1, KH Protocol 2
We use a modified binary method for exponentiation [11, Algorithm 14.83]. In the algorithm, we set the exponent parameter $t = 2048$. Moreover, we choose the window size $k = 7$ because this parameter is the most efficient. From [11, Table 14.16], since the algorithm requires $(t + 1)$ Sqr and $(t/k \times (2^k - 1)/2^k + 2^{k-1} - 1)$ Mul on average, we assume that

$$1\,Exp \approx (t + 1)\,Sqr + \left( \frac{t}{k} \times \frac{2^k - 1}{2^k} + 2^{k-1} - 1 \right) Mul \approx 2402\,Mul. \tag{6}$$

Furthermore, we estimate the computational cost of 1 Enc of the Paillier encryption scheme. According to [7, Fig. 1], we may assume that

$$1\,Enc \approx 3 \times (l/2) + 1 \approx 3073\,Mul. \tag{7}$$

Note that the security parameter in [7] is equivalent to $(l/2)$ bits in this paper. For our estimation, we need to set the bit-length of the inputs. As one application, Our protocol can be applied to the secure face recognition system proposed by Erkin et al. [12]. Since they set 50-bit integers as the inputs of a SCP [12, Sect. 5], we similarly assume that $h = 50$. Note that inputs of at most 2048 bits can be used. From Eqs. (5), (6), (7), and Table 4, KH Protocol 1 needs approximately 294.4 thousand Mul and KH Protocol 2 needs approximately 781.2 thousand Mul. The total computational cost of Our protocol is 1264.9, 781.2, 491.0, 394.2, 362.0, 345.9, 336.2, 329.7, 325.1, 321.7, 319.0, and 316.8 thousand Mul for $w = 1, 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50$, respectively. Furthermore, we illustrate the cost of these protocols in Fig. 1. From Fig. 1, Our protocol becomes more efficient as $w$ increases. Our protocol is more efficient than KH Protocol 2 for $w \ge 3$ but is not more efficient than KH Protocol 1.
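The figures above can be recomputed from Table 4 and the conversion factors of Eqs. (5)–(7). The following sketch is one way to do so. It assumes $l = 2048$ in Eq. (7), $H = h/w$ taken as a real number (which is what reproduces the reported values), and $H = h/2$ for KH Protocol 2, since that protocol proceeds 2 bits at a time. The function names are hypothetical.

```python
def exp_cost(t=2048, k=7):
    """Eq. (6): cost of one exponentiation in Mul, assuming 1 Sqr = 1 Mul."""
    return round((t + 1) + (t / k) * (2 ** k - 1) / 2 ** k + 2 ** (k - 1) - 1)


def enc_cost(l=2048):
    """Eq. (7): cost of one Paillier encryption in Mul."""
    return 3 * (l // 2) + 1


MUL_PER_INV = 9            # Eq. (5)
MUL_PER_EXP = exp_cost()   # Eq. (6)
MUL_PER_ENC = enc_cost()   # Eq. (7)


def total_mul(enc, mul, inv, exp):
    """Convert a cost row (Enc, Mul, Inv, Exp) into a number of Mul."""
    return MUL_PER_ENC * enc + mul + MUL_PER_INV * inv + MUL_PER_EXP * exp


def kh_protocol1(h):
    """Row 'KH Protocol 1' of Table 4."""
    return total_mul(1.5 * h + 1, 2.5 * h + 1, 1.5 * h + 1, 0.5 * h)


def kh_protocol2(h):
    """Row 'KH Protocol 2' of Table 4 with H = h / 2."""
    H = h / 2
    return total_mul(8.5 * H + 2, 14.875 * H + 1, 6.875 * H + 2, 2 * H)


def algorithm1(h, w):
    """Row 'Algorithm 1' of Table 4 with H = h / w."""
    H = h / w
    return total_mul(
        1.5 * w * H + 5.5 * H + 2,
        2.5 * w * H + 9.875 * H + 1,
        1.5 * w * H + 3.875 * H + 2,
        0.5 * w * H + H,
    )


def in_thousands(x):
    """Express a number of Mul in thousands, rounded to one decimal place."""
    return round(x / 1000, 1)


if __name__ == "__main__":
    h = 50
    print("KH Protocol 1:", in_thousands(kh_protocol1(h)))
    print("KH Protocol 2:", in_thousands(kh_protocol2(h)))
    for w in (1, 2, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50):
        print(w, in_thousands(algorithm1(h, w)))
```

Because the term $1.5wH = 1.5h$ and its counterparts do not depend on $w$, only the terms proportional to $H = h/w$ shrink as $w$ grows, which is why the cost approaches but never drops below that of KH Protocol 1.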
3.2 Security

We discuss the security of Algorithm 1. Our analysis is similar to that of KH Protocol 2 [4, Protocol 2] and Veugen's protocol [3, Protocol 2]. For more details, we refer the reader to [4, Sect. 4.4] and [3, Sect. 2.3]. Moreover, we confirm the security of the individual steps of Our protocol. In lines 5–12, Party A masks $Enc_{pk_B}(t_i; r_{t_i}^{(A)})$ by a coin toss $c_i$ and sends $Enc_{pk_B}(s_i; r_{s_i}^{(A)})$ to Party B. Because of the mask, Party B is unable to determine whether $s_i = t_i$ or $s_i = 1 - t_i$ even if Party B decrypts the ciphertext $Enc_{pk_B}(s_i; r_{s_i}^{(A)})$. In lines 13–19, Party B computes $Enc_{pk_B}(u_i; r_{u_i}^{(B)})$ depending on $B_i$. Furthermore, Party B sends $Enc_{pk_B}(B_i; r_{B_i}^{(B)})$ and $Enc_{pk_B}(u_i; r_{u_i}^{(B)})$ to Party A. Thus, the parties do not leak any private information to each other. As a result, we can see that the semi-honest model is acceptable for Our protocol.
4 Conclusion

We proposed a secure comparison protocol. It outputs an encrypted comparison result and is based only on an additive homomorphic encryption scheme. We generalized a protocol proposed by Kobayashi and Hakuta at WISA 2018. As an interesting feature, our proposed protocol proceeds $w$ bits by $w$ bits for any positive integer $w$ to compute an output. Moreover, we discussed the security under the semi-honest model and estimated the efficiency of our proposed protocol.
References

1. Damgård I, Geisler M, Krøigård M (2008) Homomorphic encryption and secure comparison. Int J Appl Crypt 1(1):22–31
2. Damgård I, Geisler M, Krøigård M (2009) A correction to efficient and secure comparison for on-line auctions. Int J Appl Crypt 1(4):323–324
3. Veugen T (2011) Comparing encrypted data. In: Technical Report, Multimedia Signal Processing Group, Delft University of Technology, The Netherlands, and TNO Information and Communication Technology, Delft, The Netherlands
4. Kobayashi T, Hakuta K (2019) Secure comparison protocol with encrypted output and the computation for proceeding 2 bits-by-2 bits. In: Kang B, Jang J (eds) The 19th world conference on information security applications-WISA 2018. LNCS, vol 11402. Springer, Cham, pp 213–228
5. Goldwasser S, Bellare M (2008) Lecture notes on cryptography 1996–2008. http://cseweb.ucsd.edu/mihir/papers/gb.html
6. Paillier P (1999) Public-key cryptosystems based on composite degree residuosity classes. In: Stern J (ed) Advances in cryptology-eurocrypt 1999. LNCS, vol 1592. Springer, Heidelberg, pp 223–238
7. Damgård I, Jurik M (2001) A generalisation, a simplification and some applications of Paillier's probabilistic public-key system. In: Kim K (ed) Public-key cryptography-PKC 2001. LNCS, vol 1992. Springer, pp 119–136
8. Veugen T (2014) Encrypted integer division and secure comparison. Int J Appl Crypt 3(2):166–180
9. National Institute of Standards and Technology, Recommendation for Key Management Part 1: General (Revision 4), National Institute of Standards and Technology Special Publication 800–57, January 2016
10. Cohen H, Miyaji A, Ono T (1998) Efficient elliptic curve exponentiation using mixed coordinates. In: Ohta K, Pei D (eds) Advances in cryptology-ASIACRYPT 1998, international conference on the theory and applications of cryptology and information security. LNCS, vol 1514. Springer, Heidelberg, pp 51–65
11. Menezes AJ, Oorschot PCV, Vanstone SA (1997) Handbook of applied cryptography. CRC Press, Boca Raton
12. Erkin Z, Franz M, Katzenbeisser S, Guajardo J, Lagendijk RL, Toft T (2009) Privacy-preserving face recognition. In: Goldberg I, Atallah MJ (eds) Privacy enhancing technologies symposium-PETS 2009. LNCS, vol 5672. Springer, Heidelberg, pp 235–253
Conceptualizing Factors that Influence Learners’ Intention to Adopt ICT for Learning in Rural Schools in Developing Countries

Siphe Mhlana, Baldreck Chipangura, and Hossana Twinomurinzi
Abstract Despite the positive contribution of Information and Communication Technologies (ICT) to learning outcomes in education, its adoption, integration, and use remain a challenge for rural school learners in developing countries. To understand why ICT adoption by rural school learners has been slow, a systematic literature analysis was undertaken in this study. An analysis of twenty-nine peer reviewed and published papers selected from five electronic databases was conducted to identify the key factors that influence rural school learners’ adoption patterns. The findings revealed the key factors as ICT infrastructure, technical support, access to resources, and social influence. Deductively, infrastructure was found to have a direct obstruction and negative consequences on the adoption of ICT technologies in learning. The findings are valuable for illustrating factors that affect the adoption of ICT technologies by rural learners in developing countries. The results inform educational policy strategies for providing rural schools with infrastructure and resources that promote the use of ICT in learning. Providing ICT infrastructure is fundamental to rural schools, especially in this era where the COVID-19 pandemic has made it compulsory for learners to adopt online learning. Keywords ICT · Adoption · Learner intention · Secondary schools · Developing countries
S. Mhlana (B) · B. Chipangura
University of South Africa, Johannesburg, South Africa

H. Twinomurinzi
University of Johannesburg, Johannesburg, South Africa

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_34
S. Mhlana et al.
1 Introduction

In the past decade, teaching and learning have evolved from approaches that are classroom- and teacher-centered to approaches that are learner-centered [1, 2]. The migration to learner-centered approaches has been facilitated by the adoption of technologies that include communication devices, computers, teaching and learning software applications, and social media [3–8]. Recently, the adoption of ICT technologies in learning has contributed immensely to online teaching and learning as an intervention for overcoming the disruption brought by the COVID-19 pandemic [9]. Additionally, the other benefits of ICT technology adoption in teaching and learning include the acquisition of twenty-first century technological skills by learners [10], learner motivation, breaking boundaries of space and time in learning [11, 12], and improved quality of education [13, 14].

Despite the benefits of ICT technology adoption in teaching and learning, the literature documents challenges that hinder adoption at rural schools, which include lack of ICT infrastructure, lack of technical support, the cost of ICT devices, and intermittent electricity and network connectivity [15]. Apart from these benefits and challenges, the existing literature is silent on actual ICT technology adoption by rural learners in developing countries. The literature is biased toward ICT technology interventions that focus on bridging the digital divide and on technology adoption at school level, with a focus on school management and teachers [16]. Deductively, it can be inferred that ICT technology adoption at schools excludes the learners even though they are fundamental stakeholders. Excluding learners from the adoption equation at schools contradicts the basic definition of adoption. Adoption is defined as a judgment or choice that individuals make every time they want to take part in an innovation project [17].
Therefore, the aim of this research paper is to investigate factors that influence rural school learners’ intention to adopt ICTs for learning in developing countries. To uncover these factors, the investigation was undertaken as a systematic literature analysis study. The rest of the paper is structured as follows. Section 2 presents the Methodology, Sect. 3 presents the Analysis and discussion of the findings, and Sect. 4 presents the Conclusion, limitations, and future work.
2 Methodology This study adopted the Lage and Filho [18] systematic review protocol and the reporting approach used by Brereton et al. [19] and Amui et al. [20] in their studies. The aim of the review was to analyze literature on the factors that influence rural school learners’ intention to adopt ICT technologies. Figure 1 presents the protocol which was undertaken in three phases. The phases include the identification of articles, developing a classification and coding framework, and the actual coding and classifying of themes.
Conceptualizing Factors that Influence Learners’ Intention …
Fig. 1 A schematic representation of the adopted methodology and results framework for this study [19]
2.1 Identification of the Research Articles

Identification of research articles involved the construction of search terms, identification of data sources, and the selection of articles based on the inclusion and exclusion criteria [19]. The search terms were constructed to identify factors that influence rural school learners’ intention to adopt ICT technologies for learning. The terms used for searching the articles were: ICT adoption factors in learning; E-learning adoption at schools; M-learning adoption at schools; ICT adoption by learners; ICT technology adoption at schools in rural areas; ICT benefits in learning; ICT challenges in learning. During searching, the terms were sometimes combined with operators such as “and” and “or” to form search phrases. The articles were retrieved from five electronic multidisciplinary databases, which are Scopus, ScienceDirect, IEEE, ACM, and CiteSeerX. Google Scholar was also used as a search engine for general searching of articles.

The initial search yielded 291 articles, which were downloaded and reviewed by skimming through the abstracts and the discussion sections. Articles were selected for further analysis if they were peer-reviewed journal or conference papers published between 2005 and May 2020. After the first round of analysis, 218 articles were excluded. The remaining 73 articles were analyzed and a further 44 articles were excluded, leaving 29 articles for the systematic literature analysis.
2.2 Classification and Coding Framework

The classification and coding framework developed by Amui et al. [20] was adapted in this study. Table 1 presents the classification, description of the classification and the assigned codes. Six classifications and twenty codes were developed for this study.

Table 1 Classification and code framework

| Classification | Description | Codes |
|---|---|---|
| Context | Developed countries | 1A |
| | Developing countries | 1B |
| | Not applicable | 1C |
| Focus | Factors affecting adoption of ICT as a main theme | 2A |
| | ICT adoption in secondary schools as main theme | 2B |
| | m-learning used interchangeably | 2C |
| | e-learning used interchangeably | 2D |
| Method | Qualitative | 3A |
| | Quantitative | 3B |
| | Theoretical | 3C |
| | Empirical | 3D |
| | Case studies/interviews | 3E |
| | Survey | 3F |
| | Mixed-method | 3G |
| Field/Area | Education sector | 4A |
| | Other sectors | 4B |
| | Not applicable | 4C |
| Theory/Model | Single | 5A |
| | Combined | 5B |
| | Not applicable | 5C |
| Factors that affect adoption | Adoption factors (perceived usefulness, ease of use, anxiety, and self-efficacy), infrastructure, support, social influence, resources, active learning and personalized learning | |
3 Analysis and Discussion of Findings

The 29 selected articles in this systematic literature review were analyzed using the classification and coding framework (Table 1). The results of the classification and coding are presented in Table 2.

3.1 Context of Research

The context of a research study is a valuable parameter that exposes the background of the research [22, 23]. The analysis found that fifty-nine per cent (59%) of the studies were conducted in developing countries (1B) as compared to thirty-four per cent (34%) in developed countries (1A); see Table 2. However, three per cent (3%) of the studies compared developed and developing countries (1A and 1B), and the other three per cent (3%) compared countries within the developed world (1A). Comparing the articles published in developing regions shows that Asia had more articles (54%) than Africa (46%). As a result, African countries are lagging behind other developing countries in terms of publications [21, 22].

Table 2 Classification and coding framework

| No | Authors | Context | Focus | Method | Sector |
|---|---|---|---|---|---|
| 1 | Manduku et al. (2012) | 1B | 2A | 3F | 4A |
| 2 | Moghaddam et al. (2013) | 1A | 2A | 3G | 4B |
| 3 | Mehra et al. (2011) | 1A | 2D | 3F | 4A |
| 4 | Sun et al. (2008) | 1A | 2D | 3G | 4A |
| 5 | Prasad et al. (2015) | 1B | 2A | 3F | 4A |
| 6 | Baki et al. (2018) | 1A | 2D | 3B | 4A |
| 7 | Nikou et al. (2017) | 1A | 2C | 3B | 4A |
| 8 | Rajesh et al. (2018) | 1A | 2C | 3B | 4B |
| 9 | Miller et al. (2006) | 1B | 2A | 3A | 4A |
| 10 | Bakhsh et al. (2017) | 1B | 2C | 3B | 4A |
| 11 | Hamzah et al. (2018) | 1B | 2C | 3A | 4C |
| 12 | Adel Ali et al. (2018) | 1A | 2C | 3D | 4A |
| 13 | Gakenga et al. (2015) | 1B | 2A, 2B | 3G | 4A |
| 14 | Qteishat et al. (2013) | 1B | 2D | 3B | 4A |
| 15 | Cakir et al. (2014) | 1A | 2D | 3F | 4A |
| 16 | Senaratne et al. (2019) | 1B | 2C | 3F | 4B |
| 17 | Ogundile et al. (2019) | 1B | 2A, 2B | 3B | 4A |
| 18 | Mutua et al. (2016) | 1B | 2D | 3B | 4A |
| 19 | Mtebe et al. (2014) | 1B, 1A | 2C | 3F | 4A |
| 20 | Friedrich et al. (2010) | 1A | 2D | 3B | 4C |
| 21 | Buabeng-Andoh (2015) | 1A | 2A, 2B | 3B | 4B |
| 22 | Farinkia (2018) | 1B | 2A, 2B | 3F | 4A |
| 23 | Momani et al. (2012) | 1B | 2A, 2B | 3B | 4A |
| 24 | Grandon et al. (2005) | 1A, 1A | 2A | 3B | 4A |
| 25 | Nasser et al. (2016) | 1B | 2C | 3B | 4A |
| 26 | Shraim et al. (2010) | 1B | 2D | 3F | 4A |
| 27 | Langat (2015) | 1B | 2A, 2B | 3G | 4A |
| 28 | Barakabitze et al. (2015) | 1B | 2A, 2B | 3G | 4A |
| 29 | Wei-Han Tan et al. (2012) | 1B | 2C | 3F | 4A |
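The context percentages reported in Sect. 3.1 can be recomputed from the Context column of Table 2. The following is a minimal sketch of such a tally; the list below transcribes that column in row order, and the function name `context_shares` is hypothetical.

```python
from collections import Counter

# Context codes of the 29 reviewed articles, transcribed from Table 2 (rows 1-29).
CONTEXT_CODES = [
    "1B", "1A", "1A", "1A", "1B", "1A", "1A", "1A", "1B", "1B",
    "1B", "1A", "1B", "1B", "1A", "1B", "1B", "1B", "1B, 1A", "1A",
    "1A", "1B", "1B", "1A, 1A", "1B", "1B", "1B", "1B", "1B",
]


def context_shares(codes):
    """Return the share of each context code as a whole-number percentage."""
    counts = Counter(codes)
    total = len(codes)
    return {code: round(100 * n / total) for code, n in counts.items()}


if __name__ == "__main__":
    for code, share in sorted(context_shares(CONTEXT_CODES).items()):
        print(f"{code}: {share}%")
```

With 17 of 29 articles coded 1B and 10 coded 1A, the tally gives 59% and 34%, and each of the two comparative studies accounts for about 3%, matching the figures quoted in the text.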
3.2 Focus of the Research

The analysis found that forty-five per cent (45%) of articles discussed factors that influence learners’ intention to adopt ICT at secondary schools as the main theme (2B). Eight per cent (8%) of articles discussed ICT adoption (2A) without mentioning secondary schools (2B). Mobile learning (m-learning) (2C) and ICT for learning were used interchangeably, which was the most preferred way of describing ICTs as a tool for learning. Twenty per cent (20%) of the selected articles discussed factors that influence the adoption of m-learning at schools. Twenty-seven per cent (27%) of articles discussed e-learning at schools (2D). The analysis did not identify any papers that discussed the adoption of, or intention to adopt, ICT technologies by rural school learners. This therefore presents a research gap that needs to be pursued.

3.3 Research Methods

The analysis found that seventy per cent (70%) of the articles employed quantitative research strategies (3B); twenty per cent (20%) utilized mixed-method strategies (3G); and ten per cent (10%) employed qualitative research strategies (3A). The findings suggest that most articles used the quantitative method. This research will employ a mixed-method approach for data gathering because it gives a holistic view of technology adoption by rural school learners.

3.4 Theories Adopted

The literature analysis found that five theories were used to explain the adoption of ICT technologies for learning: the Technology Acceptance Model (TAM), the Theory of Reasoned Action (TRA), the Theory of Planned Behavior (TPB), the Unified Theory of Acceptance and Use of Technology (UTAUT), and Diffusion of Innovation (DoI). Some studies adopted one theory and some combined two or more theories. Among studies that adopted one theory, TAM was adopted by 35% of the studies and UTAUT by 15% of the studies. Almost 5% of the studies combined TAM, TRA, TPB, and UTAUT, and about 10% combined TAM and UTAUT. About 25% of the papers did not use any theory.
3.5 Factors that Affect the Adoption of ICT Technologies by Learners This section discusses the factors that affect the adoption of ICT technologies by learners from rural schools in developing countries. Factors that emanate from adoption theories, and other factors such as infrastructure, support, social influence, resources, active learning, and personalized learning are also discussed. Adoption theories factors. From the point of view of technology adoption theories (TAM, UTAUT, TRA, TPB, and DOI), perceived usefulness and ease are factors that positively influence ICT technology adoption by learners [23–25]. If learners at rural schools believe that ICT technologies are useful and easy to use in learning, they would adopt the technologies. Morris and Venkatesh [26] posit that users will adopt a technology if they believe that learning to use such a technology is easy to understand. Hence, if ICT technologies for learning are difficult to learn, and demand a lot of time and effort to understand, the result is that learners will become anxious [27]. Anxiety is the fear of interacting or using a computer technology and the associated emotional fears of receiving negative outcomes. If learners find an ICT technology stressful to use in learning, they will not trust that the technology will improve their learning [28]. Furthermore, learners will find a technology which is easy to use if they have positive self-efficacy. Self-efficacy is the confidence and motivation that a user would have in learning to use a new technology [29]. Learners with positive self-efficiency will find ICT technologies easy to use and will develop competence in using the technology, which increases their chances of accepting the technology [30]. Infrastructure. ICT infrastructure is a factor that predicts the adoption and acceptance of ICT technologies by learners [31–34]. 
Learners will adopt ICT technologies for learning if they have access to ICT backbone infrastructure provided by the government, ICT infrastructure provided by schools, and end-user devices [31]. With respect to the ICT backbone infrastructure, many developing countries have failed to provide rural areas with fixed wired access such as fiber optic, or with high-speed wireless access [32]. The technologies that are available in rural areas, such as 3G, have been reported to be expensive and beyond the reach of many rural people [15]. Where rural schools are exposed to such challenges, they cannot provide learners with the ICT infrastructure required for learning. Williams et al. [33] stated that schools should provide ICT infrastructure such as WiFi and end-user devices such as computers or mobile devices to encourage learners to adopt ICTs for learning. Where schools cannot provide end-user computing devices, the alternative is to implement a Bring Your Own Device (BYOD) policy [31]. The implication of implementing a BYOD policy at rural schools is that learners from families living under the poverty datum line cannot afford to purchase end-user devices. In South Africa, Gitau et al. [34] found that ICT devices were shared among members of families living under the poverty datum line.
Technical support. Technical support provided by schools is a factor that has an effect on the adoption of ICT technologies by learners in rural schools [35]. Schools are encouraged to provide learners with support in terms of training and access. Training has been identified as an important initiative that entices learners to adopt technologies, as witnessed in a case study carried out in South Africa [16].
Social influence. Social influence was found to influence the adoption of ICT technologies in learning. Social influence is the degree to which learners believe that if important people around them use a certain technology, they should also use that technology [36, 37]. Sumak et al. [38] found that social influence had a significant effect on learner behavior and attitude toward adopting and using learning technologies. In this case, teachers play an important role in encouraging learners to use ICT for learning.
Access to resources. Adoption of ICT technologies can enable learners to access online learning management systems and open access courses. Online learning management systems provide learners with tuition and administrative resources such as content, assessment, discussion portals, communication channels, and peer tutoring resources [39, 40]. The benefit is that learners can access archived lessons, take mock assessments, and interact with other learners on discussion forums during or after classes. Recently, learning management systems have been moved to cloud services to improve accessibility and efficiency [41]. The advantage of cloud-based learning management services is that learners do not need end-user software on their devices. Hence, they can access the learning management systems through any internet-connected device such as a tablet, a phone, or a computer.
Furthermore, learning management systems provide learners with access to massive open online courses (MOOCs), a term coined to describe online courses that are offered to anyone who has an interest in learning, where registration, study material, and assessment are open access [42, 43]. The benefit of MOOCs to rural students is that they can undertake courses that are not offered at their schools [43], which, in turn, enables learners to pursue their passion independently of their schools.
Active learning. Adopting ICT technologies in learning promotes active learning in or out of class. Active learning is when learners actively participate in their learning and knowledge-making process [44]. ICT technologies enable active learning because they provide a medium that stimulates learner-centered activities such as learning through gaming, collaboration, research, and debating [45]. ICT tools that facilitate active learning include discussion forums, blogs, wikis, and text messaging.
Personalized learning. ICT technologies facilitate personalized learning, in which the learning pace and the resources are optimized to meet the needs and capabilities of a learner [46]. The philosophy of personalized learning is that people are different and so are their learning styles; hence, to succeed, they need to be supported with personalized resources that unlock their potential [47, 48]. Fiedler [49] argued that even if learners are provided with personalized learning, they will still need support from teachers and their peers to succeed. Therefore, intelligent ICT
learning systems can be used to provide learners with personalized learning tools that enable communication and interaction. While ICT technologies provide personalized learning, they also provide learners with the flexibility of learning remotely and at any time of the day.
4 Conclusion
The literature analysis uncovered a dearth of literature that focuses on ICT technology adoption by rural learners in developing countries. Even though the literature reports on initiatives undertaken to introduce ICT technologies for learning at rural schools, the adoption of such technologies is discussed from the point of view of management and schoolteachers. In most of the research papers, the aspect of learner adoption of ICT technologies was not investigated. Moreover, the identified factors emanated from both developed and developing countries and not necessarily from literature that investigated the adoption of technology at rural schools. Therefore, there is a need to conduct an empirical study in order to understand how these factors affect rural school learners’ intention to adopt ICT technologies for learning.
References
1. Higgins S (2003) Does ICT improve learning and teaching in schools? J Sci Technol 17:586–594
2. Sutherland R, Armstrong V, Barnes S, Brawn R, Breeze N, Gall M, Matthewman S, Olivero F, Taylor A, Triggs P, Wishart J, John P (2004) Transforming teaching and learning: embedding ICT into everyday classroom practices, pp 413–425
3. Lawrence JE, Tar UA (2018) Factors that influence teachers’ adoption and integration of ICT in teaching/learning process. EMI Educ Media Int 55:79–105. https://doi.org/10.1080/09523987.2018.1439712
4. Watson DM (2001) Pedagogy before technology: re-thinking the relationship between ICT and teaching. Educ Inf Technol 6:251–266. https://doi.org/10.1023/A:1012976702296
5. Yusuf MO (2005) Information and communication technology and education: analysing the Nigerian national policy for information technology. Int Educ J 6:316–321
6. Assan T, Thomas R (2012) Information and communication technology integration into teaching and learning: opportunities and challenges for commerce educators in South Africa. Int J Educ Dev Using Inf Commun Technol 8:4–16
7. Kirkwood A, Price L (2014) Technology-enhanced learning and teaching in higher education: what is ‘enhanced’ and how do we know? A critical literature review. Learn Media Technol 39:6–36. https://doi.org/10.1080/17439884.2013.770404
8. Kirkwood A, Price L (2005) Learners and learning in the twenty-first century: what do we know about students’ attitudes towards and experiences of information and communication technologies that will help us design courses? Stud High Educ 30:257–274. https://doi.org/10.1080/03075070500095689
9. UNESCO (2020) Back to school: preparing and managing the reopening of schools. https://en.unesco.org/news/back-school-preparing-and-managing-reopening-schools
10. Van Braak J (2004) Explaining different types of computer use among primary school teachers. European J Psychol Dep 407–422. https://doi.org/10.1007/bf03173218
11. Al-zaidiyeen NJ, Mei LL (2010) Teachers attitude towards ICT. Int Educ Stud 3:211–218
12. Alzamil Z (2006) Application of computational redundancy in dangling pointers detection. In: 2006 international conference on advanced software engineering ICSEA’06, p 30. https://doi.org/10.1109/icsea.2006.261286
13. Tong KP, Trinidad SG (2005) Conditions and constraints of sustainable innovative pedagogical practices using technology. Int Electron J Leadersh Learn 9
14. Farooq MS, Chaudhry AH, Shafiq M, Berhanu G (2011) Factors affecting students quality of academic performance: a case of secondary school level. J Qual Technol Manag 01–14:1–14
15. Mitchell M, Siebörger I (2019) Building a National Network through peered community area networks: realising ICTs within developing countries. In: 2019 conference on information communications technology and society (ICTAS). IEEE, pp 1–5
16. Botha A, Herselman M (2014) Designing and implementing an Information Communication Technology for Rural Education Development (ICT4RED) initiative in a resource constraint environment: Nciba school district, Eastern Cape, South Africa
17. Voogt J, Knezek G (2008) International handbook of information technology in primary and secondary education
18. Lage M, Godinho M (2010) Variations of the kanban system: literature review and classification. Int J Prod Econ 125:13–21. https://doi.org/10.1016/j.ijpe.2010.01.009
19. Brereton P, Kitchenham BA, Budgen D, Turner M, Khalil M (2007) Lessons from applying the systematic literature review process within the software engineering domain. J Syst Softw 80:571–583. https://doi.org/10.1016/j.jss.2006.07.009
20. Bartocci L, Amui L, Jose C, Jabbour C, Beatriz A, Sousa L De, Kannan D (2017) Sustainability as a dynamic organizational capability: a systematic review and a future agenda toward a sustainable transition. J Clean Prod 142:308–322. https://doi.org/10.1016/j.jclepro.2016.07.103
21. Kinuthia W (2009) Educational development in Kenya and the role of information and communication technology, vol 5, pp 6–20
22. Aduwa-ogiegbaen SE, Okhion E, Iyamu S (2005) Using information and communication technology in secondary schools in Nigeria: problems and prospects. Int. Forum Educ Technol Soc 8(1):104–112
23. Gefen D, Karahanna E, Straub DW (2003) Inexperience and experience with online stores: the importance of TAM and trust. IEEE Trans Eng Manag 50:307–321. https://doi.org/10.1109/TEM.2003.817277
24. Ong CS, Lai JY, Wang YS (2004) Factors affecting engineers’ acceptance of asynchronous e-learning systems in high-tech companies. Inf Manag 41:795–804. https://doi.org/10.1016/j.im.2003.08.012
25. Legris P, Ingham J, Collerette P (2003) Why do people use information technology? A critical review of the technology acceptance model. Inf Manag 40:191–204. https://doi.org/10.1016/S0378-7206(01)00143-4
26. Morris MG, Venkatesh V (2000) Age differences in technology adoption decisions: implications for a changing work force. Pers Psychol 53:375–403. https://doi.org/10.1111/j.1744-6570.2000.tb00206.x
27. Alenezi AR, Karim AMA, Veloo A (2010) An empirical investigation into the role of enjoyment, computer anxiety, computer self-efficacy and internet experience in influencing the students’ intention to use e-learning: a case study from Saudi Arabian Governmental Universities. Turkish Online J Educ Technol 9:22–34
28. Chua SL, Chen DT, Wong AFL (1999) Computer anxiety and its correlates: a meta-analysis. Comput Human Behav 15:609–623. https://doi.org/10.1016/S0747-5632(99)00039-4
29. Kanwal F, Rehman M (2017) Factors affecting e-learning adoption in developing countries-empirical evidence from Pakistan’s higher education sector. IEEE Access 5:10968–10978. https://doi.org/10.1109/ACCESS.2017.2714379
30. Mccoy C (2010) Perceived self-efficacy and technology proficiency in undergraduate college students. Comput Educ 55:1614–1617. https://doi.org/10.1016/j.compedu.2010.07.003
31. Chipangura B (2019) Conceptualizing factors that influence South African students’ intention to choose mobile devices as tools for learning
32. Gillwald A, Moyo M, Stork C (2012) Understanding what is happening in ICT in South Africa a supply- and demand-side analysis of the ICT sector
33. Williams D, Coles L, Wilson K, Richardson A, Tuson J (2000) Teachers and ICT: current use and future needs. Br J Educ Technol 31:307–320. https://doi.org/10.1111/1467-8535.00164
34. Gitau S, Marsden G, Donner J (2010) After access–challenges facing mobile-only internet users in the developing world, pp 2603–2606
35. Butler DL, Sellbom M (2002) Barriers to adopting technology for teaching and learning. Educ Q 25:22–28. https://doi.org/10.1016/j.compedu.2009.03.015
36. Thomas T, Singh L, Gaffar K (2013) The utility of the UTAUT model in explaining mobile learning adoption in higher education in Guyana. J Educ 9:71–85. https://doi.org/10.5539/ass.v10n11p84
37. Mousa Jaradat M-IR, Al Rababaa MS (2013) Assessing key factor that influence on the acceptance of mobile commerce based on modified UTAUT. Int J Bus Manag 8:102–112. https://doi.org/10.5539/ijbm.v8n23p102
38. Sumak B, Polancic G, Hericko M (2010) An empirical study of virtual learning environment adoption using UTAUT. In: 2010 second international conference mobile, hybrid, on-line learning, pp 17–22. https://doi.org/10.1109/elml.2010.11
39. Clarke A (2008) E-learning skills. Palgrave Macmillan
40. Garrison DR (2011) E-learning in the 21st century: a framework for research and practice. Taylor & Francis 8:50–56
41. Mhouti A El (2018) Using cloud computing services in e-learning process: benefits and challenges, pp 893–909. https://doi.org/10.1007/s10639-017-9642-x
42. Cormier D, Siemens G (2010) Through the open door: open courses as research, learning, and engagement
43. Cooper S, Sahami M (2013) Education reflections on stanford’s MOOCs, pp 3–5. https://doi.org/10.1145/2408776.2408787
44. Bonwell C, Eison J (1991) Active learning: creating excitement in the classroom. Ashe-Eric higher education reports. Information analyses-ERIC clearinghouse products (071), pp 3. ISBN 978-1-878380-08-1. ISSN 0884-0040
45. McKeachie WJ, Svinicki M (2006) Teaching tips: strategies, research, and theory for college and university teachers. Wadsworth, Belmont, CA
46. Pogorskiy E (2015) Using personalisation to improve the effectiveness of global educational projects. https://doi.org/10.1177/2042753014558378
47. Mulford B (2010) Leadership and management overview, editor(s): international encyclopedia of education, 3rd edn. Elsevier, pp 695–703. https://doi.org/10.1016/b978-0-08-044894-7.01060-5
48. Duckett I (2010) Personalized learning and vocational education and training, pp 391–396
49. Fiedler SHD, Innovation S (2011) Personal learning environments: concept or technology? vol 2, pp 1–11. https://doi.org/10.4018/jvple.2011100101
The Innovation Strategy for Citrus Crop Prediction Using Rough Set Theory
Alessandro Scuderi, Giuseppe Timpanaro, Giovanni La Via, Biagio Pecorino, and Luisa Sturiale
Abstract The agri-food system of the world is undergoing a radical change, with future scenarios that involve a reconfiguration of the production factors. The future of agriculture, following other sectors, will be an innovative agriculture based on digitization. The outlook indicates that a “digital agricultural revolution” will be the change that makes it possible to produce the quantities of food needed by the whole world. Predictive analysis is a tool that supports the best use of production, reduces waste, and helps satisfy food needs. The process feeds heterogeneous data, often large in size, into models capable of generating clear and immediately usable results, which makes it easier to achieve goals such as reducing material waste and inventory and obtaining a finished product that meets specifications. The proposed theoretical model is a first attempt at making usable the vast amount of data that the agri-food system will be able to provide through digital transformation, and for which an adequate methodological and operational response will be needed.
Keywords Agri-food · Big data · Digital transformation · Agriculture 4.0 · Multi-criteria decision
A. Scuderi (B) · G. Timpanaro · G. La Via · B. Pecorino Department of Agriculture, Food and Environment (Di3A), University of Catania, Catania, Italy e-mail: [email protected]
L. Sturiale Department of Civil Engineering and Architecture (DICAR), University of Catania, Catania, Italy e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_35
1 Introduction
The demand for food products in the near future will be driven by population growth. The Food and Agriculture Organization of the United Nations (FAO) estimates that agricultural production will have to increase by 50% by 2050 [5]. The world population is constantly growing and will continue to grow from 7.1 billion people in 2012 to 9.8 billion in 2050 and up to 11.2 billion
in 2100. Populations in high-income countries will increase by about 10% or 111 million people [5]. In this future scenario, process and product innovations can contribute to the definition of new production and distribution models. Technological innovations take on a strategic role, as they make it possible to optimize production factors, harvest management, and distribution to consumers. Production efficiency is the main driver of technological development, which leads to a reduction in the use of energy and water resources with a positive impact on the environment and climate. Finally, technological progress drives not only vertical integration but also horizontal integration in the food chain, which tends to favor large food suppliers [10]. Agriculture 4.0 will revolutionize the agro-food system, with the ability to manage seeds, fertilizers, water, and crops by providing continuous control and supporting decision-making related to consumer needs [7]. The research aims to provide an initial contribution to understanding how citrus fruit operators perceive the opportunities and limitations of adopting an intelligent agri-food chain. The first results are presented here. They were obtained from a predictive analysis approach to citrus fruit production, based on the application of rough sets combined with big data, in order to define possible future scenarios deriving from the implementation of digital transformation with econometric analysis.
2 Materials and Methods
2.1 Trend of World Citrus Production
The surface dedicated globally to citrus fruits has increased in the last decade from 7.3 million hectares to 7.9 million hectares; this increase can be attributed to new geographical areas being used to cultivate citrus fruits, especially in developing countries [4]. Among the various citrus species, the cultivation of oranges is becoming increasingly important, covering an area of 3.8 million hectares in 2019, followed by small citrus fruits (tangerines, clementines, and mandarins) with 2.5 million hectares, and lemons with 1.1 million hectares, as well as other minor species. The global production of citrus fruit in 2019 was 132.7 million tons, an increase of 19.6% from 2009 [17], showing a greater dynamism in production compared with the product and process innovations occurring in the same period. The geographical distribution of world citrus production shows that the Asian continent is the main global producer (42%), followed by the Americas (36%) and Africa (13%), with Europe producing only 8% of the total [4]. Given that oranges account for almost half of the world’s citrus fruit production, the present study focuses on oranges. According to previous research [16], the European Union has suffered a decrease in production to the point of becoming a net importer of oranges. Based on United States Department of Agriculture (USDA)
data, in 2019, the European Union imported more than one million tons of oranges, 3% more than in the previous season, at a significant cost to the European market (USD 780 million); the largest exporters were South Africa and Egypt, followed by Morocco and Argentina [20]. The quantity of oranges consumed worldwide in 2017 was 73.3 million tons, similar to the previous year [6]. In general, orange consumption shows a modestly increasing trend: in the decade analyzed (2009–2019), it reached 73.6 million tons in 2016 and then decreased. Global orange market revenues amounted to USD 44 billion in 2019, an increase of 3.6% over the previous year. The market value of oranges increased progressively by 2.9% from 2009 to 2019, confirming the current trend, albeit with some annual variations. The most significant growth rate was recorded in 2009, when it increased by 15%. In value terms, global orange consumption reached a peak of USD 48 billion, but in recent years it has been at a slightly lower level. The country with the highest consumption of oranges in recent years is Brazil, followed by China and India [20]. Brazil’s consumption (Table 1) increased over the 3-year period 2015–2017 from 16.9 to 17.4 million tons (although over the last decade it has fallen at a compound annual rate of 0.7%), while per capita consumption rose from 82.2 kg per person to 83.4 kg over the same period (although it has fallen at a compound annual rate of 1.6% over the last decade) [20]. In total, orange consumption was 73.3 million tons in 2017, representing a decrease of 1.2% over the last decade. Average per capita consumption was 9.7 kg per person in 2017, representing a decrease of 0.2% from 2015.
These data confirm the importance of the citrus-fruit-production chain in terms of: value and quantity in the field of agri-food production; the considerable level of international trade; the growing interest in the traceability and characteristics of products by all actors in the chain; and, most importantly, consumers who are looking for healthy citrus fruits, deliberately seeking to establish the quality of the product at its origin.
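Table 1 reports compound annual growth rates (CAGR). As a minimal sketch, assuming the usual definition CAGR = (end / start)^(1 / years) − 1, the calculation can be reproduced in Python. The function name `cagr` is illustrative and not part of the study; the inputs are the Brazilian per capita figures for 2015 and 2017 and the 19.6% ten-year increase in world citrus production reported in this section.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Brazil, per capita orange consumption (kg per person), 2015 -> 2017.
brazil_per_capita = cagr(82.2, 83.4, 2)

# World citrus production grew by 19.6% between 2009 and 2019,
# so only the ratio 1.196 is needed, not the absolute tonnage.
world_citrus = cagr(1.0, 1.196, 10)

print(f"Brazil per capita: {brazil_per_capita:.2%} per year")
print(f"World citrus production: {world_citrus:.2%} per year")
```

Under these assumptions the two rates come out at roughly 0.7% and 1.8% per year, which shows how a short-term increase can coexist with the negative ten-year rates listed in Table 1.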
2.2 Digital Transformation and Predictive Analysis in Agri-Food Chain
The digital revolution that is characterizing all sectors will also affect the management of the agro-food chain [18]. Production will see the optimization of resource management, so as to improve yields, reduce impacts, and increase margins. The agriculture of the future will be highly optimized, intelligent, and capable of planning. The system will be characterized by big data generated and processed in real time, giving a hyper-connected system in which the data take on a guiding function.
Table 1 Per capita consumption of oranges in the main countries of the world, 2015–2017 (millions of tons, kg/year)
Country      Consumption, million tons    Population, million persons    Per capita consumption, kg per person    CAGR of consumption %    CAGR per capita consumption %
             2015     2016     2017       2015      2016      2017       2015     2016     2017                   2007–2017                2007–2017
Brazil       16.9     17.3     17.4       206.0     207.7     209.3      82.2     83.1     83.4                   −0.7                     −1.6
China        9.1      8.3      8.6        1429.4    1435.0    1441.1     5.7      5.8      6.0                    9.6                      9.0
India        7.7      7.6      7.7        1309.0    1324.2    1339.2     5.9      5.7      5.7                    6.1                      4.8
Mexico       4.5      4.6      4.6        125.9     127.5     129.2      35.7     35.9     35.4                   0.8                      −0.7
USA          5.4      5.0      4.3        319.9     322.2     324.5      16.9     15.5     13.2                   −4.4                     −5.1
Egypt        2.7      2.2      2.4        93.8      95.7      97.6       28.7     22.9     24.1                   2.8                      0.7
Indonesia    1.9      2.2      2.3        258.2     261.1     264.0      7.2      8.2      8.8                    −1.4                     −2.6
Spain        1.2      2.3      1.9        46.4      46.3      46.4       25.4     49.2     41.6                   2.8                      2.6
Italy        2.0      1.5      1.6        59.5      59.4      59.4       33.7     24.9     27.3                   −4.3                     −4.3
Turkey       1.5      1.5      1.6        78.3      79.5      80.7       19.5     18.7     19.9                   2.2                      0.7
Iran         1.5      1.5      1.6        79.4      80.3      81.2       19.4     22.9     19.8                   −2.6                     −3.8
Pakistan     1.7      1.6      1.6        189.4     193.2     197.0      8.9      8.5      8.0                    −0.2                     −2.2
Others       18.0     17.7     17.7       3220.1    3266.1    3312.2     9.9      9.8      9.7                    1.2                      −0.2
Total        73.1     73.6     73.3       7414.1    7498.2    7581.6     9.9      9.8      9.7                    1.0                      −0.2
The supply chains will become digital, with traceable systems; agricultural production and breeding will be in direct connection with the supply chain, and the individual processes will follow the instructions from the system to obtain maximum production at minimum cost and with maximum optimization. Digital agriculture, from a theoretical point of view, could lead to greater food security, profitability, and sustainability [13, 19]. In this context, market forecasts, to date a little-explored field, will represent a real novelty for the next decade in relation to the future needs of the world population [5]. In the current scenario of the Sustainable Development Goals, two opposing visions emerge. On the one hand, there are those who see a scenario linked to organic farming, the reduction of chemical inputs, and the return of the balance between agriculture and the environment. On the other hand, there are those who see digital agriculture as the tool to provide economic efficiency through increased yields, reduced costs, and greater integration with the market, with social and cultural benefits, and capable of generating environmental benefits and adaptation to climate change [9]. In the field of digital transformation, predictive analysis is an important element, using big data to predict production phenomena. The historical data that the agri-food system can provide are vast and can be used to build a mathematical model able to detect the most important trends. The predictive model is then applied to the current scenario to predict future events and plan the measures to be taken to achieve optimal results [8]. To exploit the value of big data, companies apply algorithms to large datasets using tools such as Hadoop and Spark. Data sources can be databases, equipment log files, images, video, audio, sensors, or other types of data.
The objective of predictive analysis is to increase production efficiency and to plan the sale or use of the production obtained in order to reduce waste, increase profits, and meet food requirements. Predictive analysis is a very complex process because production depends on many variables, and the sources from which the data can be obtained are very heterogeneous and virtually unlimited in volume; building a model is therefore very difficult unless a set of discriminants is determined a priori in the mathematical model. Since not all data influencing production can be used, a mathematical approach by aggregations describing the phenomenon under examination may represent a first step [3]. The model is used to predict a result in a given future state as a function of input changes. Numerical predictive modeling differs from the mathematical approach in that it uses models that are not easy to express in equation form and often require simulation techniques to create a prediction [3]. Predictive modeling is often performed using curve and surface fitting, time-series regression, or machine learning approaches. Regardless of the approach used, the process of creating a predictive model is the same as for the other mathematical methods using big data.
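As an illustration of the simplest of these techniques, the sketch below fits a straight line by ordinary least squares to the three Brazilian per capita consumption values in Table 1 and extrapolates one year ahead. It is only a toy example under stated assumptions: the helper `fit_line` is hypothetical, a linear trend is assumed, and three observations are far too few for a real forecast.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Brazil, per capita orange consumption (kg per person), from Table 1.
years = [2015, 2016, 2017]
per_capita = [82.2, 83.1, 83.4]

slope, intercept = fit_line(years, per_capita)

# Extrapolate the fitted trend one year beyond the data (illustrative only).
forecast_2018 = slope * 2018 + intercept

print(f"trend: {slope:.2f} kg per person per year")
print(f"extrapolated 2018 value: {forecast_2018:.1f} kg per person")
```

A real predictive model for citrus production would replace the single explanatory variable (the year) with the heterogeneous inputs discussed above, but the workflow of fitting on historical data and then applying the model to a future state is the same.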
2.3 Methodology: The Rough Sets Theory
The theory of “rough sets” [11] is applied in several fields, in particular to multi-attribute classification and, more recently, to multi-criteria decisional problems of classification, choice, and ordering [2]. This theory is based on the premise that each unit of a given universe is associated with information, expressed through certain attributes that describe the unit. As reported in several papers [3, 11], the resulting indiscernibility relation constitutes the mathematical foundation of the “rough set” theory: units with the same description form the bricks (granules) with which knowledge of reality is constructed. A “rough set” is characterized by its lower and upper approximations, and the difference between them constitutes the boundary of the “rough set” [3]. “Rough sets” are suited to knowledge that is uncertain and highly variable. The methodology consists in treating data having the same description as indiscernible (“granules”) [11]. In the specific case, the approach made it possible to highlight the relationships between the available data on citrus fruit production, showing the importance of some information and the irrelevance of other information [3]. In order to apply the theory of “rough sets” to the problems linked to the forecasting of citrus production, the main concepts that characterize it are illustrated here. The information on the units that make up the universe considered is given in the form of an information table, in which the rows refer to the different units and the columns to the different attributes considered. Each cell contains the evaluation of the unit in the corresponding row with respect to the attribute of the corresponding column. The representation of reality depends on the knowledge possessed in relation to it and the ability to classify the information obtained [11].
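In the standard notation of rough set theory, these notions can be written as follows. This is a sketch under the usual Pawlak definitions, not a formulation taken from this paper: the symbols S, f, I_P, and Bn_P are assumptions introduced here for illustration.

```latex
% Information table: universe U, attributes Q, value domains V_q,
% and an information function f assigning a value to each (unit, attribute) pair.
\[
S = (U, Q, V, f), \qquad V = \bigcup_{q \in Q} V_q, \qquad f : U \times Q \to V
\]
% Indiscernibility relation induced by a subset of attributes P.
\[
I_P = \{(x, y) \in U \times U : f(x, q) = f(y, q) \ \text{for all } q \in P\},
\qquad P \subseteq Q
\]
% Lower and upper approximations of a set X, where I_P(x) is the granule of x.
\[
\underline{P}(X) = \{x \in U : I_P(x) \subseteq X\}, \qquad
\overline{P}(X) = \{x \in U : I_P(x) \cap X \neq \emptyset\}
\]
% Boundary region: units that cannot be classified with certainty using P.
\[
Bn_P(X) = \overline{P}(X) \setminus \underline{P}(X)
\]
```

A set X is "rough" with respect to P exactly when its boundary region is non-empty, that is, when some granules contain units both inside and outside X.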
The size of these granules depends on the number of attributes used to describe the objects in the original information table and on the domain of each attribute. The concepts presented so far constitute the key points of the original theory of “rough sets,” to which methodological adaptations were subsequently made in order to apply it to multi-criteria decisional problems of classification, choice, and ordering. These methodological changes, which are only outlined here and for which the reader is referred to the authors who have dealt with them, make it possible to consider the ordinal properties of the criteria and information on preferences and to proceed with the pairwise comparison of “actions” for the future management of citrus fruit production in a country system [14].
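The dependence of the granules on the chosen attributes can be made concrete with a short Python sketch. It is illustrative only: the function names `partition` and `approximations` are hypothetical, the cell values follow one reading of Table 3 and should be treated as assumptions, and the code implements the classical (indiscernibility-based) approximations rather than the multi-criteria extension mentioned above.

```python
from collections import defaultdict

# Condition attributes X1..X5 followed by the decision d.
# Values are one reading of Table 3 (assumption, to be checked against it).
ATTRS = ("X1", "X2", "X3", "X4", "X5")
TABLE = {
    "C1": ("strong", "strong", "medium", "low", "low", "A"),
    "C2": ("medium", "strong", "medium", "low", "low", "B"),
    "C3": ("weak", "medium", "high", "medium", "medium", "D"),
    "C4": ("weak", "medium", "medium", "low", "medium", "C"),
    "C5": ("weak", "medium", "medium", "low", "low", "C"),
    "C6": ("weak", "weak", "high", "medium", "high", "D"),
    "C7": ("weak", "weak", "medium", "low", "low", "C"),
}


def partition(table, attrs):
    """Group units with identical values on `attrs` into granules."""
    idx = [ATTRS.index(a) for a in attrs]
    granules = defaultdict(set)
    for unit, row in table.items():
        granules[tuple(row[i] for i in idx)].add(unit)
    return list(granules.values())


def approximations(table, attrs, target):
    """Return the lower and upper approximations of `target` under `attrs`."""
    lower, upper = set(), set()
    for granule in partition(table, attrs):
        if granule <= target:   # granule lies entirely inside the target
            lower |= granule
        if granule & target:    # granule overlaps the target
            upper |= granule
    return lower, upper


# Units whose decision value is "C".
class_c = {unit for unit, row in TABLE.items() if row[-1] == "C"}

lower_12, upper_12 = approximations(TABLE, ("X1", "X2"), class_c)
lower_123, upper_123 = approximations(TABLE, ("X1", "X2", "X3"), class_c)

print("P = {X1, X2}:    ", sorted(lower_12), sorted(upper_12))
print("P = {X1, X2, X3}:", sorted(lower_123), sorted(upper_123))
```

With rain and temperature alone, no granule lies entirely inside decision class C, so the lower approximation is empty and the class is rough; adding the disease attribute makes the granules fine enough for the lower and upper approximations to coincide.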
3 Results and Discussion
The theory of “rough sets” has been applied in several fields, but less often in the agro-food chain. This research proposes to apply it as a forecasting model using big data from the main sources in the sector. The results represent only an example of the “rough sets” theory applied to a decision problem: characterizing the production that could be obtained from a territory using big data. The information available for the individual attributes is neither certain nor quantitatively expressible; it is instead available in qualitative terms, albeit in a rather vague and imprecise manner. On the basis of the data provided by the statistical system and by the stakeholders, a scheme for the construction of the information table has been elaborated. The units x that make up the universe U, reported in the individual rows, are the main types of farms present within the area of interest: c1-orange; c2-blood orange; c3-lemon; c4-clementine; c5-mandarin; c6-grapefruit; and c7-other citrus (Fig. 1). The information table has been constructed considering the attributes that can have the greatest influence on citrus fruit production. To simplify the analysis, five attributes have been considered, four of which are qualitative and one quantitative. The attributes and their value scales are reported in Table 2.

Fig. 1 Rough set data analysis of citrus production (stages shown: citrus data collection, data processing, table of information, table of decision, decision rules, analysis of rules)

Table 2 Attribute and scale of value
Attribute | Description | Scale of value
X1 | Rain | Weak, Medium, Strong
X2 | Temperature | Weak, Medium, Strong
X3 | Disease | Low, Medium, High
X4 | Growing | Low, Medium, High
X5 | Market trend | Low, Medium, High

Table 3 Information and evaluation table
U | X1 | X2 | X3 | X4 | X5 | d
C1 | Strong | Strong | Medium | Low | Low | A
C2 | Medium | Strong | Medium | Low | Low | B
C3 | Weak | Medium | High | Medium | Medium | D
C4 | Weak | Medium | Medium | Low | Medium | C
C5 | Weak | Medium | Medium | Low | Low | C
C6 | Weak | Weak | High | Medium | High | D
C7 | Weak | Weak | Medium | Low | Low | C
Source: Our elaboration

For the forecast of citrus production, we therefore consider the set U = {c1-orange, c2-blood orange, c3-lemon, c4-clementine, c5-mandarin, c6-grapefruit, c7-other citrus}, whose elements vary according to the attributes Q = {X1, X2, X3, X4, X5, d}, using the following evaluations: VX1 = {weak, medium, strong}, VX2 =
{weak, medium, strong}, VX3 = {low, medium, high}, VX4 = {low, medium, high}, VX5 = {low, medium, high}, Vd = {A, B, C, D}. The production of citrus fruits in a given area will therefore be determined by the factors listed above, according to the different weights of the individual attributes (Table 3). The results show that each subset P of the attributes in Q generates a partition of the universe U that groups objects with the same description in terms of the attributes in P into equivalence classes. For example, for P = {X1, X2} we have U|P = {{C1}, {C2}, {C3, C4, C5}, {C6, C7}}, and {C1}, {C2}, {C3, C4, C5}, {C6, C7} are the P-elementary sets. The example uses only five condition attributes, so the information table is quite easy to read; in practice, however, the number of attributes can be much higher, so reducing them is useful in terms of both time and resources. The results of this first theoretical application of rough set theory to production forecasting show that, with the condition attributes C = {X1, X2, X3, X4, X5} and the decision attribute D = {d}, the information table can be interpreted as a decision table. The set of rules just described can be reduced to a minimal set that uses fewer attributes in each rule. In particular, considering attribute X1 and, where it alone does not decide, X4, the following decision rules are reached:
(1’) if f(C, X1) = strong, then f(C, d) = A;
(2’) if f(C, X1) = medium, then f(C, d) = B;
(3’) if f(C, X1) = weak, then f(C, d) = C or D;
(4’) if f(C, X1) = weak and f(C, X4) = low, then f(C, d) = C;
(5’) if f(C, X1) = weak and f(C, X4) = medium, then f(C, d) = D.
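The grouping into elementary sets and the reduced rules can be checked mechanically. The following is a minimal sketch, assuming the rows of Table 3 as read from the text; the names `TABLE`, `elementary_sets`, and `apply_rules` are illustrative and not part of the original study.

```python
from collections import defaultdict

# Rows of Table 3 as read from the text (object -> attribute values).
# "d" is the decision attribute; all names here are illustrative.
TABLE = {
    "C1": {"X1": "strong", "X2": "strong", "X3": "medium", "X4": "low", "X5": "low", "d": "A"},
    "C2": {"X1": "medium", "X2": "strong", "X3": "medium", "X4": "low", "X5": "low", "d": "B"},
    "C3": {"X1": "weak", "X2": "medium", "X3": "high", "X4": "medium", "X5": "medium", "d": "D"},
    "C4": {"X1": "weak", "X2": "medium", "X3": "medium", "X4": "low", "X5": "medium", "d": "C"},
    "C5": {"X1": "weak", "X2": "medium", "X3": "medium", "X4": "low", "X5": "low", "d": "C"},
    "C6": {"X1": "weak", "X2": "weak", "X3": "high", "X4": "medium", "X5": "high", "d": "D"},
    "C7": {"X1": "weak", "X2": "weak", "X3": "medium", "X4": "low", "X5": "low", "d": "C"},
}


def elementary_sets(table, attrs):
    """Group objects that share the same values on `attrs` (indiscernibility classes)."""
    classes = defaultdict(list)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].append(obj)
    return sorted(sorted(members) for members in classes.values())


def apply_rules(row):
    """Apply rules (1'), (2'), (4') and (5'); return None if no rule fires."""
    if row["X1"] == "strong":
        return "A"
    if row["X1"] == "medium":
        return "B"
    if row["X1"] == "weak" and row["X4"] == "low":
        return "C"
    if row["X1"] == "weak" and row["X4"] == "medium":
        return "D"
    return None


if __name__ == "__main__":
    print(elementary_sets(TABLE, ["X1", "X2"]))
    print(all(apply_rules(row) == row["d"] for row in TABLE.values()))
```

Calling `elementary_sets` with a different attribute subset shows how the granularity of the partition changes as attributes are added or removed.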
4 Conclusion
The articulation of the model, which includes parameters of different kinds, leads us to believe that the theory of “rough sets,” combined with the diffusion of big data, can serve as an evaluation and forecasting tool. Such a tool includes attributes of different natures, without any hierarchy among them, deriving from databases in different fields (environmental, technical, economic), and its application requires multidisciplinary contributions that are all equally essential [1, 12, 21]. The proposed theoretical model is an initial demonstration of how to make usable the vast quantity of data that the agri-food system will be able to provide in the future through digital transformation, and to which an adequate response will be needed in methodological and operational terms [7, 15].
References
1. Chinnici G, Pecorino B, Scuderi A (2013) Environmental and economic performance of organic citrus growing. Qual Access Success 14:106–112
2. Greco S, Matarazzo B, Slowinski R (1997) Rough approximation of a preference relation by fuzzy dominance relations. In: 1st international workshop on preferences and decision. Trento, pp 70–72 (5–7/06/1997)
3. Greco S, Matarazzo B, Slowinski R (2002) Rough sets methodology for sorting problems in presence of multiple attributes and criteria. Eur J Oper Res 138:247–259
4. FAO (2019) Digital technologies in agriculture and rural areas. Briefing paper. In: Nikola M, Trendov S, Meng Z (eds) Food and Agriculture Organization of the United Nations, Rome
5. FAO (2020) Agricultural markets and sustainable development: global value chains, smallholder farmers and digital innovations. Food and Agriculture Organization of the United Nations, Rome
6. FAOSTAT (2020) Statistic yearbook–agriculture production
7. Ge L, Brewster CA (2016) Informational institutions in the agri-food sector: meta-information and meta-governance of environmental sustainability. Curr Opin Environ Sustain 18:73–81
8. Jin X, Yang N, Wang X, Bai Y, Su T, Kong J (2019) Integrated predictor based on decomposition mechanism for PM2.5 long-term prediction. Appl Sci 9:4533
9. Kayumova M (2017) The role of ICT regulations in agribusiness and rural development. World Bank. https://openknowledge.worldbank.org/bitstream/handle/10986/29041/121932-WP-ICT
10. Lee H, Mendelson H, Rammohan S, Srivastava A (2017) Technology in agribusiness: opportunities to drive value. White paper, Stanford Graduate School of Business
11. Matarazzo B, Greco S, Slowinski R (2019) La teoria degli insiemi approssimati. In: Bertino, Gambarelli, Stach (eds) Strategie, Introduzione alla teorie dei giochi e delle decisioni. Editore Giappichelli
12. Nakasone E, Torero M, Minten B (2014) The power of information: the ICT revolution in agricultural development. Ann Rev Resour Econ 6(1):533–550
13. OECD (2019) Measuring the digital transformation. A road map for the future. OECD
14. Scuderi A, Foti VT, Timpanaro G (2019) The supply chain value of POD and PGI food products through the application of blockchain. Qual Access Success 20:580–587
15. Scuderi A, Sturiale L, Timpanaro G (2018) Economic evaluation of innovative investments in agri-food chain. Qual Access Success 19(51):482–488
16. Scuderi A, Sturiale L (2016) Multicriteria evaluation model to face phytosanitary emergencies: the case of citrus fruits farming in Italy. Agric Econ 62:205–214
17. Scuderi A, Zarbà AS (2011) Economic analysis citrus fruit destined to market. Italian J Food Sci 34
18. Sturiale L, Scuderi A (2011) Information and communication technology (ICT) and adjustment of the marketing strategy in the agrifood system in Italy. In: CEUR workshop proceedings, vol 1152, pp 77–87 (5th international conference on information and communication technologies for sustainable agri-production and environment, HAICTA 2011, Skiathos, Greece)
19. Sturiale L, Timpanaro G, La Via G (2017) The online sales models of fresh fruit and vegetables: opportunities and limits for typical Italian products. Qual Access Success 18:444–451
20. USDA (2020) Global citrus market analysis
21. Weis T (2007) The global food economy: the battle for the future of farming. Zed Books
Predicting Traffic Path Recommendation Using Spatiotemporal Graph Convolutional Neural Network
Hitendra Shankarrao Khairnar and Balwant Sonkamble
Abstract Vehicle navigation is mainly used in path recommendations for self-driving and travel. It also plays an increasingly important role in people’s daily trip planning. A review of the existing literature shows that algorithms for vehicular path recommendation have attracted substantial attention. The available path recommendation algorithms furnish only the shortest-distance or shortest-journey-time traffic paths, and they neglect the current traffic parameters present at a specific location and at a specific time of day. A spatiotemporal graph is used to represent the road traffic network at a specific location, and it also makes use of time. To recommend a traffic path from the origin to the destination, the paper offers a novel approach based on a neural network and a spatiotemporal graph convolutional network (STGCN) framework. The proposed framework learns the topological structure of a road traffic network for spatial dependence and the dynamic change of traffic parameter data for temporal dependence. Based upon the learning of previous time instances, an aggregated graph is prepared, which is used for recommendation of the shortest path. The experimental results show that the STGCN framework processes spatiotemporal correlations of traffic data and predicts the traffic parameter values for optimal path recommendation across an origin-destination (OD) pair.
Keywords Traffic path recommendation · Deep learning · Spatiotemporal graph convolutional network · Shortest path
H. S. Khairnar (B) Research Scholar PICT, Cummins College of Engineering, Pune, India e-mail: [email protected] B. Sonkamble Pune Institute of Computer Technology, Pune, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_36
1 Introduction
The focus of present-day transportation systems has shifted from deploying physical system capacity to improving operational efficiency using informed intelligence. Motion sensors and cameras continuously record road traffic details from various road segments along the path. Vehicular traffic data prediction is a process of analyzing traffic parameters. Such forecasting provides a basis for traffic path recommendation in transportation services as well as for improving journey efficiency. However, accurate path prediction has always been challenging because of the complex spatial and temporal nature of traffic data [1]. The primary goal of spatiotemporal analysis is the prediction of traffic parameter values. The authors propose an STGCN framework that uses spatial data collected at different locations along the path to model the road traffic network. The proposed model uses a general graph along with a convolution structure for temporal dependencies. The convolution structure is a blend of graph layers and a chain of learning layers. The framework predicts a traffic path recommendation for an OD pair using an optimal pathfinder technique. The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the preliminaries. Section 4 discusses the model for vehicular path recommendation; the methodology is presented in Sect. 4.3. Section 5 presents the experiment and results. Finally, Sect. 6 concludes the paper.
2 Related Work
The need for traffic parameter predictions has motivated research efforts at forecasting fundamental traffic parameters such as traffic flow, speed, and journey time across an origin and destination (OD) pair. These parameters are generally selected to monitor the status of vehicular traffic and to predict future values. Due to the growing interest of various businesses in advanced transportation systems, there is a need for optimal path recommendation based on multiple traffic parameters over a large dynamic road network in real time. Hence, road traffic path recommendation has received a lot of attention. Polson and Sokolov developed a deep learning model for traffic flow prediction with a sequence of tanh layers and an l1 regularization technique [2]. Tan et al. demonstrated the use of statistical learning algorithms to justify the improvement in traffic flow prediction obtained with an aggregation method [3]. Lv et al. used spatiotemporal correlation based on a deep learning method for traffic flow prediction [4]. For traffic flow prediction, Huang et al. introduced a deep belief network (DBN) for unsupervised feature training [5]. The variability between estimated and actual journey speed under various traffic conditions is examined using the mean of speeds measured at multiple data collection locations and by vehicles’ onboard devices [6]. Jenelius et al. suggest a multivariate probabilistic principal component analysis model for journey time prediction and vehicle routing [7]. To exploit the relationship between speed and flow, Nemet et al. developed an optimization method to find the best speed and to handle the energy utilization of an individual vehicle and its impact on road traffic flow [8]. To study the impact of weather conditions on journey time and traffic flow, Koesdwiady et al. incorporate weather conditions into DBN-based prediction of traffic parameter values [9]. For traffic path recommendation, a predictive congestion minimization A* algorithm includes logic for vehicle selection and diversion in case of congestion across an OD pair [10]. Li et al. tested a GPS-based framework to approximate road traffic parameters across the OD pair; such a framework handles the sparsity of spatial-temporal traffic data [11]. When a traffic path is recommended across an OD pair, the preference of the user refers to known choices. Campigotto et al. introduce the FAVOUR algorithm, a Bayesian learning-based approach that suggests a personalized, situation-aware route across the OD pair [12].
3 Representing Road Traffic Network Features Using Graphs
Def 1: A path $g_{1:k}$ is a succession of $k$ data collection locations through which vehicle movement takes place. At each location, 3-tuple data ($g_k.t$, $g_k.s$, and $g_k.f$) indicating journey time, speed, and flow is recorded. In this study, it is assumed that the 3-tuple data is recorded at 15 min intervals.
Def 2: A road network is a directed graph $G(V, E)$, where $V$ is a set of data collection locations and $E$ represents the link references that connect data collection locations. Each link reference $e \in E$ is a 4-tuple ($e.id$, $e.l$, $e.start$, and $e.end$) consisting of its alphanumeric key, length, start point, and endpoint.
Def 3: A road segment $h_{1:k}$ connects the data collection locations corresponding to path $g_{1:k}$. Each link reference $e_k \in E$ represents a road segment of path $g_{1:k}$.
Def 4: A time-evolving graph $\tilde{G} = \{G^{(1)}, \ldots, G^{(T)}\}$, where $G^{(t)} = (V^{(t)}, E^{(t)}, W^{(t)})$, $t = 1, \ldots, T$, can be represented as a series of time-evolving adjacency matrices $A^{(1)}, \ldots, A^{(T)}$, which aggregate the traffic parameter values from time instance 1 to $t$.
Fig. 1 shows the road traffic network representation using a graph.
Fig. 1 Graph-structured road traffic data[13]
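The definitions above map naturally onto small record types. The sketch below is one possible representation, not the authors' implementation: the class names `LinkRef` and `Observation`, the function `aggregate`, and the choice of a plain mean as the aggregation over time instances are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRef:
    """A link reference e in E (Def 2)."""
    id: str        # alphanumeric key, e.id
    length: float  # link length, e.l
    start: str     # start location, e.start
    end: str       # end location, e.end


@dataclass(frozen=True)
class Observation:
    """The 3-tuple recorded at a location every 15 min (Def 1)."""
    t: float  # journey time
    s: float  # speed
    f: float  # flow


def aggregate(snapshots):
    """Average each link's observations over the given time instances.

    `snapshots` is a list of dicts (one per time instance) mapping a link id
    to an Observation. Using the mean is an assumption; the paper only says
    that values are aggregated from time instance 1 to t (Def 4).
    """
    totals = {}
    for snapshot in snapshots:
        for link_id, obs in snapshot.items():
            acc = totals.setdefault(link_id, [0.0, 0.0, 0.0, 0])
            acc[0] += obs.t
            acc[1] += obs.s
            acc[2] += obs.f
            acc[3] += 1
    return {
        link_id: Observation(a[0] / a[3], a[1] / a[3], a[2] / a[3])
        for link_id, a in totals.items()
    }


if __name__ == "__main__":
    link = LinkRef(id="L1", length=8.9, start="u", end="v")  # hypothetical link
    history = [
        {link.id: Observation(t=300.0, s=100.0, f=280.0)},
        {link.id: Observation(t=290.0, s=110.0, f=290.0)},
    ]
    print(aggregate(history)[link.id])
```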
4 Model for Vehicular Path
4.1 Problem Definition
The aim is to design a framework that predicts the weights of link references for shortest traffic path recommendation, using time instance embedding of collected traffic data and STGCN. The predicted link references are processed with Dijkstra’s algorithm to recommend the shortest path across an OD pair. The traffic path must be predicted from the weights of link references collected at different time instances, and an STGCN-based framework is designed to optimize this prediction.
4.2 Link Prediction Model
In the proposed framework, link references $h_k$ are recommended based on the spatiotemporal graph of the previous $j$ traffic data collection locations along the path, from $h_{k-j}$ to the current location. Then the link reference $h_k$ is predicted. Link prediction based on transition probability is given by [14]

$$p(h_{k+1} \mid h_k) = \sum_{j} p(\tau)\, p(h_{k+1} \mid h_k, \tau) \tag{1}$$

where $\tau$ is a sequence of connected road segments that starts at $h_k$, that is, $\tau = \{e_1 \to e_2 \to \cdots \to \text{destination}\}$, where $e_1 = h_k$, $e_i.end = e_{i+1}.start$, and so on. $p(\tau)$ can be described by the edge transition probability $p(e_i \to e_{i+1})$ as follows:

$$p(\tau) = \prod_{i=1}^{\text{Destination}} p(e_i \to e_{i+1}) \tag{2}$$

where an edge transition probability $p(e_i \to e_{i+1})$ can be estimated from historical traffic data at previous locations as follows:

$$p(e_i \to e_{i+1}) = \frac{N(e_i \to e_{i+1}) + 1}{\sum_{j} N(e_i \to e_{i+1}) + N_j} \tag{3}$$

where $N(e_i \to e_{i+1})$ is the number of times the link reference from $e_i$ to $e_{i+1}$ was predicted and $N_j$ is the number of link references connected from $e_i$. This is known as the add-one smoothing training process [15].
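Equations (2) and (3) can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the denominator is taken in the usual add-one form (the total number of observed transitions out of a link plus the number of links connected from it), and the function names, link names, and toy paths are hypothetical.

```python
from collections import Counter
from math import prod


def transition_counts(paths):
    """Count how often each consecutive pair of links occurs in historical paths."""
    counts = Counter()
    for path in paths:
        counts.update(zip(path, path[1:]))
    return counts


def transition_prob(counts, successors, src, dst):
    """Add-one smoothed estimate of p(src -> dst), in the spirit of Eq. (3)."""
    total = sum(counts[(src, nxt)] for nxt in successors[src])
    return (counts[(src, dst)] + 1) / (total + len(successors[src]))


def path_prob(counts, successors, path):
    """Probability of a sequence of links as a product of transitions, Eq. (2)."""
    return prod(
        transition_prob(counts, successors, a, b) for a, b in zip(path, path[1:])
    )


if __name__ == "__main__":
    # Hypothetical road graph: which links can follow which.
    successors = {"e1": ["e2", "e3"], "e2": ["e4"], "e3": ["e4"]}
    history = [["e1", "e2", "e4"], ["e1", "e2", "e4"], ["e1", "e3", "e4"]]
    counts = transition_counts(history)
    print(transition_prob(counts, successors, "e1", "e2"))
    print(path_prob(counts, successors, ["e1", "e2", "e4"]))
```

Because of the added one in the numerator, a transition that never occurs in the history still receives a small non-zero probability, which keeps the product in Eq. (2) from collapsing to zero.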
Fig. 2 An illustration of the proposed framework
4.3 Methodology
4.3.1 Proposed STGCN Framework
The proposed framework learns traffic parameter values for context sampling to be trained on different graphs. Figure 2 gives an overview of the STGCN framework, where spatial context and temporal context are extracted. With the help of graphs at various time instances, the process of traffic parameter value prediction is as follows:
1. The co-occurrence matrix of a time-evolving graph is estimated.
2. The spatial and temporal attention module and the node embedding are jointly trained.
3. The prediction of a set of link references for the recommendation of the traffic path takes place in two stages:
(a) Aggregated parameter values are predicted for a given graph using STGCN for future time instances, based upon past data.
(b) The predicted parameter values are fed to the shortest path finding algorithm, leading to the computation of an optimal path.
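The paper does not spell out the layers of its STGCN, so the following is only an illustration of the spatial building block such frameworks commonly use: a single graph convolution of the form ReLU(Â H W), with Â the symmetrically normalized adjacency matrix including self-loops. That form, the function names, and the toy inputs are assumptions, not the authors' architecture.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # self-loops guarantee deg > 0
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, value) for value in row] for row in out]


if __name__ == "__main__":
    adj = [[0.0, 1.0], [1.0, 0.0]]  # two connected locations
    speeds = [[1.0], [3.0]]         # one feature per location (toy values)
    weights = [[1.0]]               # identity weight, for illustration only
    print(gcn_layer(adj, speeds, weights))
```

In this toy case each location ends up with the average of its own value and its neighbour's, which is the smoothing effect that lets a graph layer capture spatial dependence.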
Table 1 Sample rows of the data set provided by Highways England [17]
LinkRef | Date | TP(0–95) | AvgJT | AvgSpeed (km/h) | Link length | Flow
AL215 | 2015-02-10 | 67 | 305.47 | 105.12 | 8.920 | 286.50
AL215 | 2015-02-10 | 68 | 289.30 | 111.00 | 8.920 | 291.63

4.3.2 Shortest Path Recommendation
Multiple paths may exist across an OD pair. Depending upon contextual features and the predicted set of link references, the number of paths across the OD pair is computed. Dijkstra’s algorithm is used to recommend the shortest path across the OD pair [16].
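As an illustration of the final step, the sketch below runs Dijkstra's algorithm on a small directed graph whose edge weights stand in for predicted link costs. The node names and weights are hypothetical, and how the paper combines journey time, speed, and flow into a single weight is not reproduced here.

```python
import heapq


def dijkstra(graph, origin, destination):
    """Return (cost, path) of the cheapest route, or (inf, []) if unreachable.

    `graph` maps a node to a list of (neighbour, weight) pairs with
    non-negative weights.
    """
    best = {origin: 0.0}
    previous = {}
    heap = [(0.0, origin)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    if destination not in best:
        return float("inf"), []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]


if __name__ == "__main__":
    # Hypothetical predicted weights between data collection locations.
    graph = {
        "O": [("a", 40.0), ("D", 200.0)],
        "a": [("b", 50.0), ("D", 100.0)],
        "b": [("D", 30.0)],
    }
    print(dijkstra(graph, "O", "D"))
```

Re-running the search after each prediction step lets the recommended path change with the time of day, since only the edge weights differ between runs.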
5 Experiment and Results
5.1 Data Set: Monthly Strategic Road Network (SRN)
The experiment uses an open-source data set of highways in England, which is collected by recognition of vehicle number plates and other data collection mechanisms [17]. The data collection system continuously acquires traffic information over 2511 link references connecting major ‘A’ roads in England. The data set contains traffic information collected at intervals of 15 min. It comprises the link reference, link description, average journey time, average speed, link length, and traffic flow. A snapshot of the data set is shown in Table 1.
5.2 Experimental Settings
For time instance embedding, the experiments use records of 12 historical observations of road traffic parameter values, each covering 15 min. These recorded values were used to prepare the time-evolving graphs, which are fed to the STGCN to forecast traffic parameter values for the next 45, 75, and 90 min. The experimental setup consists of an Intel Core i7-7700HQ CPU with 16 GB RAM and a GeForce GTX 1070/PCIe/SSE2 CUDA card with 8 GB RAM. A Linux platform is used with Python 3.6 and the TensorFlow 1.9.0 library.
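With 15 min observations, the three forecasting horizons correspond to 3, 5, and 6 steps ahead. The sketch below shows one plausible way to cut a series into training samples of 12 historical values and one target; the function name and the windowing convention are assumptions, since the paper does not describe its data pipeline at this level.

```python
STEP_MINUTES = 15
# Forecast horizons from the experiment, converted to numbers of 15 min steps.
HORIZON_STEPS = {minutes: minutes // STEP_MINUTES for minutes in (45, 75, 90)}


def make_windows(series, history=12, horizon_steps=3):
    """Pair each run of `history` values with the value `horizon_steps` ahead."""
    samples = []
    last_start = len(series) - history - horizon_steps
    for start in range(last_start + 1):
        window = series[start:start + history]
        target = series[start + history + horizon_steps - 1]
        samples.append((window, target))
    return samples


if __name__ == "__main__":
    speeds = list(range(20))  # toy series: one value per 15 min interval
    samples = make_windows(speeds, history=12, horizon_steps=HORIZON_STEPS[45])
    print(len(samples), samples[0])
```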
Table 2 Prediction assessment
Forecasting horizon (minutes) | MAE | MAPE (%)
45 | 0.200 | 0.235
75 | 0.182 | 0.195
90 | 6.071 | 6.460

5.3 Evaluation Metrics and Baseline
The evaluation of the proposed path recommendation system works in two stages. First, a training set is fed to the STGCN framework, which generates new predictions. To evaluate the correctness of the framework, a test is conducted on the learned model. To measure and evaluate the prediction output, mean absolute error (MAE) [1] and mean absolute percentage error (MAPE) [1] are used. The predictions, together with the known paths (true values), are then used as input to an evaluation technique to produce evaluation results.

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |e_t| \tag{4}$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right| \tag{5}$$
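Both error measures are short to compute. The sketch below reads $e_t$ as the difference between predicted and observed value and $y_t$ as the observed value, which is the usual interpretation but is not stated explicitly in the text; the sample numbers are invented for illustration.

```python
def mae(y_true, y_pred):
    """Mean absolute error, Eq. (4)."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (5).

    Undefined when an observed value is zero, so such points must be
    filtered out beforehand.
    """
    return 100.0 * sum(abs((p - y) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    observed = [100.0, 200.0]   # toy observed speeds
    predicted = [110.0, 190.0]  # toy predictions
    print(mae(observed, predicted), mape(observed, predicted))
```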
We use the following metrics for testing the proposed system.
5.3.1 Mean Average Precision at K (MAP@k) and Mean Average Recall at K (MAR@k)
The proposed system produces an ordered list of recommended paths for each OD pair in the test set. MAP@k gives insight into how relevant the list of recommended paths is, whereas MAR@k gives insight into how well the recommender can recall all the paths that exist (ground truth or gold standard) and are rated positively in the test set. As a benchmarking method, we used the autoregressive moving average (ARMA) model with Dijkstra’s shortest path method.
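Definitions of these ranking metrics vary between libraries, and the paper does not state which variant it uses. The sketch below follows one common convention: average precision at k is normalized by min(k, number of relevant paths), and the recall side is computed as recall at k averaged over OD pairs. The function names and the toy path identifiers are illustrative.

```python
def average_precision_at_k(recommended, relevant, k):
    """Average of precision@i over the ranks i <= k that hold a relevant path."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(k, len(relevant))


def recall_at_k(recommended, relevant, k):
    """Share of the relevant paths that appear in the top k recommendations."""
    if not relevant:
        return 0.0
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)


def mean_over_queries(metric, runs, k):
    """Average a per-OD-pair metric over all (recommended, relevant) runs."""
    return sum(metric(rec, rel, k) for rec, rel in runs) / len(runs)


if __name__ == "__main__":
    # Each run: ranked recommended paths and the set of ground-truth paths.
    runs = [
        (["p1", "p2", "p3", "p4"], {"p1", "p3"}),
        (["p5", "p6"], {"p6", "p7"}),
    ]
    print(mean_over_queries(average_precision_at_k, runs, k=3))
    print(mean_over_queries(recall_at_k, runs, k=3))
```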
5.4 Results
We report the prediction assessment of the proposed method in Table 2. For the shorter prediction horizons of 45 and 75 min, MAE and MAPE were small. For the prediction horizon of 90 min, MAE and MAPE were 6.071 and 6.460%, respectively.
Table 3 Path recommendation for OD pair A181–A135
Method | MAP@5 (%) | MAR@5 (%)
Proposed technique | 75.0 | 87.5
ARMA | 50.0 | 33.3

Table 4 Path recommendation for multiple OD pairs
OD pair | Links to traverse for shortest path | Shortest path cost (unit^a) | Number of paths
A620–A3121 | A620, A1, A52, A511, A38, A3121 | 893.00 | 210
A5–A52 | A5, A453, A52 | 102.084 | 238
M53J12–A5094 | M53J12, A55, A51, A5, A453, A52, A1, A66, A595, A5094 | 559.597 | 4668
A174–A66 | A174, A19, A1130, A66 | 143.822 | 12
^a We are focusing on 3 traffic parameters simultaneously; the shortest path unit is the cumulative effect of journey time, speed, and flow
For OD pair A181–A135, there are multiple paths, and one of them is recommended as the optimal path. We report MAP@5 and MAR@5 for the prediction with the proposed method and with a benchmarking method, ARMA with Dijkstra’s shortest path algorithm, in Table 3. The proposed method achieves a MAP@5 of 75% and a MAR@5 of 87.5%. Because the benchmarking method of ARMA with Dijkstra’s algorithm does not capture the non-linearity of the data, its performance is weak. Table 4 shows the recommended shortest path for various OD pairs and the corresponding links to traverse.
6 Conclusion
In this paper, a novel framework comprising STGCN and Dijkstra’s algorithm is proposed for the prediction of a set of link references and the recommendation of the shortest vehicular traffic path across an OD pair. Due to the growth of road traffic infrastructure networks and traffic monitoring in the field of transportation systems, there is a need for optimal path recommendation based on multiple traffic parameters. The STGCN framework predictions are based on the transition probability of the link prediction model. The recommended shortest path changes with the time instance of the day and the traffic parameter values. In future, we will explore different features of aggregation methods and their impact on the proposed method.
References
1. Zhao L, Song Y (2020) T-GCN: a temporal graph convolutional network for traffic prediction. IEEE Trans ITS 21(9):3848–3858 (Sept 2020). https://doi.org/10.1109/TITS.2019.2935152
2. Polson NG, Sokolov VO (2017) Deep learning for short-term traffic flow prediction. Trans Res Part C: Emerging Tech 79:1–17 (June 2017). https://doi.org/10.1016/j.trc.2017.02.024
3. Tan M-C, Wong SC, Xu J-M (2009) An aggregation approach to short-term traffic flow prediction. IEEE Trans ITS 10(1):60–69. https://doi.org/10.1109/TITS.2008.2011693
4. Lv Y, Duan Y (2015) Traffic flow prediction with big data: a deep learning approach. IEEE Trans ITS 16(2):865–873 (Apr 2015). https://doi.org/10.1109/TITS.2014.2345663
5. Huang W, Song G (2014) Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Trans ITS 15(5):2191–2201 (Oct 2014). https://doi.org/10.1109/TITS.2014.2311123
6. Kim H, Kim Y (2017) Systematic relation of estimated travel speed and actual travel speed. IEEE Trans ITS 18(10):2780–2789 (Oct 2017). https://doi.org/10.1109/TITS.2017.2713983
7. Jenelius E, Koutsopoulos HN (2018) Urban network travel time prediction based on a probabilistic principal component analysis model of probe data. IEEE Trans ITS 19(2):436–445 (Feb 2018). https://doi.org/10.1109/TITS.2017.2703652
8. Nemet B, Gaspar P (2017) The relationship between the traffic flow and the look-ahead cruise control. IEEE Trans ITS 18(5):1154–1161 (May 2017). https://doi.org/10.1109/TITS.2016.2599583
9. Koesdwiady A, Soua R (2016) Improving traffic flow prediction with weather information in connected cars: a deep learning approach. IEEE Trans Veh Tech 65(12):9508–9517 (Dec 2016). https://doi.org/10.1109/TVT.2016.2585575
10. Backfrieder C, Ostermayer G (2017) Increased traffic flow through node-based bottleneck prediction and V2X communication. IEEE Trans ITS 18(2):349–363 (Feb 2017). https://doi.org/10.1109/TITS.2016.2573292
11. Li W, Nie D (2017) Citywide estimation of traffic dynamics via sparse GPS traces. IEEE Trans ITS Mag 9(3):100–113 (July 2017). https://doi.org/10.1109/MITS.2017.2709804
12. Campigotto P, Rudloff C (2017) Personalized and situation-aware multimodal route recommendations: the FAVOUR algorithm. IEEE Trans ITS 18(1):92–102 (Jan 2017). https://doi.org/10.1109/TITS.2016.2565643
13. Han Y, Wang S (2019) Predicting station-level short-term passenger flow in a citywide metro network using spatiotemporal graph convolutional neural networks. Int J Geo-Information 8(243):1–24. https://doi.org/10.3390/ijgi8060243
14. Taguchi S, Koide S, Yoshimura T (2019) Online map matching with route prediction. IEEE Trans ITS 20(1):338–347 (Jan 2019). https://doi.org/10.1109/TITS.2018.2812147
15. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge, U.K
16. Khairnar HS, Sonkamble BA (2020) Aggregated time series based vehicular traffic path recommendation. In: 2020 5th ICCCS, Shanghai, China, pp 191–195. https://doi.org/10.1109/ICCCS49078.2020.9118575
17. England H, Highways agency network journey time and traffic flow data, Technical report. https://data.gov.uk/dataset/dc18f7d5-2669-490f-b2b5-77f27ec133ad/highwaysagency-networkjourney-time-and-traffic-flow-data
Machine Learning and Context-Based Approaches to Get Quality Improved Food Data
Alexander Muenzberg, Janina Sauer, Andreas Hein, and Norbert Roesch
Abstract The provision of high-quality food data presents challenges for developers of health apps. There are no standardized data sources with information on all food products available in Europe. Commercial data sources are expensive and do not allow long-term storage, whereas open data sources from communities often contain inconsistent, duplicate, and incomplete data. In this work, methods are presented to load data from multiple sources via an extract, transform, and load process into a central food data warehouse and to improve the data quality. Data profiling is used to detect inconsistencies and duplicates. With the help of machine learning methods and ontologies, data is completed and checked for plausibility using similar datasets. Via a specific API, a usage context can be sent to the central food data warehouse together with the search word to be queried. The API sends a response with the food data results, which were checked based on the context, and provides further information as to whether the quality of the result data is sufficient in the respective context. All developed methods are tested using linear sampled test data.
Keywords Food data warehouse · Machine learning · Data mining · Context-based data analysis · Data quality improvement
A. Muenzberg (B) · J. Sauer · N. Roesch University of Applied Science Kaiserslautern, Amerikastr. 1, 66482 Zweibrücken, Germany e-mail: [email protected]; [email protected] J. Sauer e-mail: [email protected]; [email protected] N. Roesch e-mail: [email protected] A. Muenzberg · J. Sauer · A. Hein Carl von Ossietzky University Oldenburg, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_37
1 Introduction
The development and application of nutrition-related health apps is hindered by the limited availability of plausible data on ingredients and nutritional values of food products. There are no standardized data sources that cover all food products available in Europe [1]. This data is particularly important if, for example, a doctor or therapist wants to use electronic nutrition diaries to identify a possible correlation between the consumed foods and occurring symptoms [1, 2], as this requires trustworthy information on ingredients and nutrients. Even though there are several commercial vendors and providers of open source data that output data sets via a custom Application Programming Interface (API) [3] or as database dumps, the use of such data introduces additional challenges for the application developer. Open source food data has often been collected by Internet communities; because the data collections are insufficiently monitored and verified, inconsistencies, duplicates, and errors occur [4]. The data sets of commercial providers are usually checked and corrected more accurately. Nevertheless, such data are often subject to licensing models that make their use in research difficult. Including such data, which is usually expensive, in research projects can make subsequent commercial use of the research results unprofitable. In addition, the providers often do not allow long-term storage of the data for copyright reasons. However, many medical professionals are obliged to store patient data for several years, which is why such intermediate storage is a necessity [5].
Since the research project “Digital Services in Dietary Counseling” (DiDiER) [6], funded by the Federal Ministry of Education and Research in Germany (BMBF), the University of Applied Sciences Kaiserslautern (HSKL) has been developing a central Food Data Warehouse (FDWH), which combines different open source food data sources and data provided voluntarily by suppliers and food manufacturers. For this purpose, the developed data warehouse offers an application programming interface (API) [3, 4, 7] based on researched information, which allows the requesting app to specify the respective context of data usage and thus enables specific verification methods. A previous publication by the authors [4] has already explained how the datasets were extracted from the various data sources using Extract, Transform, and Load (ETL) processing and stored in a central food data warehouse. Using data profiling and machine learning methods, the data were checked for inconsistencies, faulty datasets, duplicates, and missing values and, if necessary, adjusted [4]. For a specific dataset similarity analysis, clustering methods, text mining techniques, and natural language processing methods were tested [7]. With the help of the determined similarities between the datasets, missing values can be completed or the plausibility of data sets can be determined. By creating ontologies and detecting hidden dependencies, further inconsistency checks could be performed.
2 State of the Art
2.1 Food Data Sources
Currently, 173 761 food data records are stored in the FDWH. The FDWH stores both natural unpackaged food data (e.g., tomatoes, bread, etc.) and packaged food of a specific brand. In addition to information on product name and nutritional values, the FDWH data records contain additional information on the list of ingredients, quantity information, packaging information, traces and allergens subject to mandatory labeling, brand name, origin, and Global Trade Item Number (GTIN), formerly known as European Article Number (EAN). The GTIN is a 13-digit code that indicates the origin, manufacturer, and product in coded form as a sequence of numbers. This sequence of digits can be represented as a two-dimensional code (2D barcode). Most of the data of the different data sources are based on the Wiki principle. A community of volunteers maintains the data, corrects them if errors are found, and completes them. Nevertheless, many of the data contain inconsistencies, missing values, or duplicate values, because the verification is very complex and, being done on a voluntary basis, often not sufficiently intensive. In the following, the data sources from which data is contained in the FDWH are listed, together with the number of datasets in each source [4].
• German Federal Food Key (BLS, in German: Bundeslebensmittelschlüssel, from the Max Rubner Institut (MRI) Karlsruhe), 14814 Datasets
• WikiFood.eu (WF), 64099 Datasets
• das-ist-drin.de (DID), 996 Datasets
• OpenFoodFacts.org (OFF), 63776 Datasets
• Danone GmbH (DA), 94 Datasets
• FoodRepo.org (FR), 29982 Datasets.
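One inexpensive plausibility check for the GTIN field mentioned above is the check digit that GS1 defines for 13-digit codes. The paper does not say whether the FDWH performs this check, so the sketch below is only an illustration of the general rule; the function name is hypothetical.

```python
def gtin13_is_valid(code: str) -> bool:
    """Check the GS1 check digit of a 13-digit GTIN given as a string."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # The first 12 digits are weighted 1, 3, 1, 3, ... from the left.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


if __name__ == "__main__":
    print(gtin13_is_valid("4006381333931"))
    print(gtin13_is_valid("4006381333932"))
```

A failed check points to a typing or scanning error in the code itself; a passed check says nothing about whether the remaining fields of the record are correct.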
2.2 ETL Process
Through the ETL process known from data warehousing, data is extracted from its individual sources and must then be transformed into a uniform format and data structure. The datasets from the different sources are available in different data formats. In addition, the data may contain different attributes and different data types for the same information. The ETL process splits the extracted data, converts it into the format and data types of the FDWH, and then stores it there centrally [4]. Both during and after each execution of the ETL process, inconsistent, missing, or duplicate values are detected and cleaned or removed using data profiling techniques.
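The transform step described above can be illustrated with a minimal sketch. It assumes two made-up sources, SRC_A and SRC_B, with hypothetical field names; the real source schemas and the FDWH target schema are not given in the paper, so every name below is an illustration only.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from source-specific field names to a uniform schema.
# SRC_A and SRC_B are invented placeholders, not real FDWH sources.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "SRC_A": {"product_name": "name", "fat_100g": "fat", "code": "gtin"},
    "SRC_B": {"title": "name", "fat": "fat"},
}


def to_float(value: Any) -> Optional[float]:
    """Convert values such as '0,5 g', '3.5' or 3.5 to float; None if not parseable."""
    if value is None:
        return None
    text = str(value).replace(",", ".").strip().rstrip("g").strip()
    try:
        return float(text)
    except ValueError:
        return None


def transform(source: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename source-specific fields and unify the data types of one record."""
    record: Dict[str, Any] = {"source": source, "name": None, "fat": None, "gtin": None}
    for source_field, target_field in FIELD_MAP[source].items():
        if source_field in raw:
            record[target_field] = raw[source_field]
    record["fat"] = to_float(record["fat"])
    for text_field in ("name", "gtin"):
        if record[text_field] is not None:
            record[text_field] = str(record[text_field]).strip()
    return record
```

Keeping the per-source mapping in a table rather than in code means a new source only needs a new dictionary entry; values that cannot be converted become None so that the profiling step can flag them as missing.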
A. Muenzberg et al.
2.3 Data Profiling
Data profiling is a process to obtain information about the consistency, structure, and type of data. Data is analyzed for missing datasets, incorrect datasets, or duplicates [8]. The data profiling process must be performed during and after each ETL process so that the inconsistencies of the latest data can always be identified. It detects erroneous data records, which are corrected to improve quality, and it detects missing data or duplicates that need to be completed or removed [8]. Data profiling includes general tasks such as recognition of patterns and data types, outlier detection, characterization of missing and default values, and data rule analysis (for example, detecting values that match certain regular expressions [9]). Furthermore, the analysis of column properties (an analysis performed by checking all values in a column and determining whether the values are valid or not), the analysis of value dependencies across columns, and the detection of functional dependencies or foreign key dependencies belong to data profiling [10, 11].
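A few of the profiling tasks named above (missing values, duplicates, and a data rule based on a regular expression) can be sketched as follows. This is a minimal sketch under assumptions: the record fields are hypothetical, duplicates are identified naively by name and GTIN, and the check-digit test is the standard GS1 modulo-10 rule for 13-digit GTINs, which the paper itself does not mention.

```python
import re
from collections import Counter
from typing import Any, Dict, List

GTIN13_PATTERN = re.compile(r"^\d{13}$")  # data rule: exactly 13 digits


def gtin13_is_valid(gtin: str) -> bool:
    """Check the 13-digit pattern and the GS1 modulo-10 check digit."""
    if not GTIN13_PATTERN.match(gtin):
        return False
    digits = [int(c) for c in gtin]
    # Weights 1, 3, 1, 3, ... over the first twelve digits, counted from the left.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def profile(records: List[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Return the indices of records with missing values, invalid GTINs, or duplicates."""
    required = ("name", "fat", "carbohydrates", "protein")  # hypothetical field names
    report: Dict[str, List[int]] = {"missing": [], "invalid_gtin": [], "duplicate": []}
    seen: Counter = Counter()
    for index, record in enumerate(records):
        if any(record.get(field) is None for field in required):
            report["missing"].append(index)
        gtin = record.get("gtin")
        if gtin is not None and not gtin13_is_valid(gtin):
            report["invalid_gtin"].append(index)
        key = (record.get("name"), gtin)  # naive duplicate key
        seen[key] += 1
        if seen[key] > 1:
            report["duplicate"].append(index)
    return report
```

The report only lists indices; whether a flagged record is corrected, completed, or removed is a separate decision, as in the process described above.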
3 Machine Learning Approach for Quality Improvement
The FDWH supplies a large number of food datasets. In order to detect erroneous datasets, an approach was chosen that compares similar foods with each other and thus uncovers contradictory information. Accordingly, methods were chosen to identify similar foods as reliably as possible.
3.1 Ontologies for Information Retrieval
To identify the relationships between attributes (and their information) in the FDWH, an ontology model was created (see Fig. 1). As shown in the ontology model, there is a close connection between ingredient list and nutrient list, between ingredient list and allergenic information, and between the ingredient list or nutrient list and the food category. When one of the ingredients of a product changes, the nutritional values on the nutrient list also change, and the allergenic information may change as well. The data sources BLS and WF have assigned their food datasets to categories. These categories were mapped to the FDWH’s own categories. For example, the BLS category “Fruits, Fruit and Fruit Products/Citrus” and the WF category “Citrus” were mapped to the FDWH category “Fruit and Fruits/South and Citrus”. Thus, during the ETL process, categories could be assigned to the data of different data sources.
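The category mapping described above amounts to a lookup keyed by source and source category. A minimal sketch, assuming a plain dictionary is sufficient; only the two entries named in the text come from the paper, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# (source, source category) -> FDWH category; both entries are the examples from the text.
CATEGORY_MAP: Dict[Tuple[str, str], str] = {
    ("BLS", "Fruits, Fruit and Fruit Products/Citrus"): "Fruit and Fruits/South and Citrus",
    ("WF", "Citrus"): "Fruit and Fruits/South and Citrus",
}


def fdwh_category(source: str, source_category: Optional[str]) -> Optional[str]:
    """Return the FDWH category, or None if the source category is missing or unmapped."""
    if source_category is None:
        return None
    return CATEGORY_MAP.get((source, source_category.strip()))
```

Returning None for unmapped categories leaves the attribute empty, so that a later step (for example the cluster comparison in Sect. 3.2) can still propose a category.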
Fig. 1 Ontology model with the relationships between attributes in the FDWH
3.2 Similarity Analysis
To recognize two similar foods, both must be compared in their different attributes. First, the assumption was made that if two foods have similar names and are similar in their composition, they show an overall similarity. Therefore, food names and ingredients or nutrition facts (or both) have to be compared. These attributes have different data types. Consequently, different approaches to similarity analysis must be used.
Similarity Analysis in Food Names and in Ingredient Lists. Both food name and ingredient list are stored as strings in the FDWH. The cosine similarity equation [12] can be used to compare strings with several words separated by spaces or punctuation marks. The cosine similarity is calculated as follows:
$$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}} \tag{1}$$
For the use of the equation in (1), the strings to be compared are divided into lists of words, list A and list B, as mentioned above, based on their spaces and
punctuation marks. Furthermore, E-numbers (names for additives, e.g., E 150a-d for caramel color) are converted into their respective additive names using specific dictionaries. All other numeric characters and special characters (e.g., #) are removed from the words. Stop words are removed from the word lists. These are, for example, filler words like “and” and “with”, which do not contribute any information to the food names or ingredient lists. The next step is to convert the words to their root words (e.g., peaches to peach). This is done with the Snowball stemming algorithm [13]. With the help of another dictionary, synonyms, i.e., words that have the same meaning but a different designation, such as groundnut for peanut, are converted into their basic word, which is recorded in a standardized list. Finally, A and B contain standardized words in their basic form, which represent either a part of the food name or an ingredient, depending on what is to be compared. Then another list V, the union of A and B, is built. x and y of Eq. (1) are vectors of the same length as V. For all $v_i \in V$, $i = 1, 2, \ldots, |V|$, the following applies:
$$x_i = \begin{cases} 0, & v_i \notin A \\ 1, & v_i \in A \end{cases} \tag{2}$$
$$y_i = \begin{cases} 0, & v_i \notin B \\ 1, & v_i \in B \end{cases} \tag{3}$$
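The preprocessing steps and rules (2) and (3) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the stop-word, E-number, and synonym dictionaries are tiny hypothetical stand-ins for the FDWH dictionaries, and a naive plural-stripping function replaces the Snowball stemmer so that the example has no external dependencies. The two food names are the ones used as the worked example in this section.

```python
import math
import re
from typing import List

# Hypothetical, heavily abridged dictionaries; the real ones are not given in the paper.
STOP_WORDS = {"and", "with", "without"}
E_NUMBERS = {"e322": "lecithin"}
SYNONYMS = {"groundnut": "peanut"}


def stem(word: str) -> str:
    """Naive plural stripping; a stand-in for the Snowball stemmer, not equivalent to it."""
    if word.endswith(("ches", "shes", "sses", "xes")):
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def to_word_list(text: str) -> List[str]:
    """Turn a food name or ingredient string into a list of standardized words."""
    text = text.lower()
    # Replace known E-numbers such as "E 322" by their additive names.
    text = re.sub(r"\be\s?(\d{3}[a-z]?)\b",
                  lambda m: E_NUMBERS.get("e" + m.group(1), " "), text)
    # Remove remaining digits and special characters, then split on whitespace.
    words = re.sub(r"[^a-z\s]", " ", text).split()
    words = [stem(w) for w in words if w not in STOP_WORDS]
    return [SYNONYMS.get(w, w) for w in words]


def cosine_similarity(a: List[str], b: List[str]) -> float:
    """Cosine similarity of two word lists via binary vectors over their union V."""
    vocabulary = sorted(set(a) | set(b))          # list V
    x = [1 if v in a else 0 for v in vocabulary]  # rule (2)
    y = [1 if v in b else 0 for v in vocabulary]  # rule (3)
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm = math.sqrt(sum(xi * xi for xi in x)) * math.sqrt(sum(yi * yi for yi in y))
    return dot / norm if norm else 0.0


name_a = to_word_list("Cola Light with 0% Sugar")
name_b = to_word_list("Cola Zero without Sugar")
print(round(cosine_similarity(name_a, name_b), 2))
```

Because the vectors are binary, repeated words in a list do not change the result; only the presence or absence of a word matters.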
Now the vector x at position i contains 0 if the word in V at position i is not contained in the list A and 1 if the word is contained in A. Likewise, the vector y at position i contains 0 if the word in V at position i is not contained in list B and 1 if the word is contained in B. Using the vectors x and y, Eq. (1) can now be used to determine the cosine similarity of both word lists A and B and thus the similarity of the strings to be compared. The value of the cosine similarity is in the range between 0 and 1, where 1 means that the strings are identical and 0 means that there is no similarity. As an example for the similarity analysis of food names, the two strings “Cola Light with 0% Sugar” and “Cola Zero without Sugar” are given. First, the two word lists (after removing special characters, numeric characters, and stop words), A = {Cola, Light, Sugar} and B = {Cola, Zero, Sugar}, are created. The union set results in V = {Cola, Light, Sugar, Zero}. Furthermore, the two vectors x = [1, 1, 1, 0] and y = [1, 0, 1, 1] are created. The cosine similarity is then 2/3 ≈ 0.67, which represents a similarity of 67%.
For the similarity analysis of ingredient lists, the following example is used. The following two strings, which contain ingredients, are to be compared:
1. “sugar, cocoa butter, cocoa mass, whole milk, almonds (7.7%), sugar, butterfat, emulsifier: lecithin (soya), natural aroma”
2. “sugar, cocoa butter, milk, cocoa mass, butterfat, hazelnuts, emulsifier (soya lecithin), aroma (vanillin), cocoa: 30% minimum”
This results in the following modified word lists:
• A = {sugar, cocoa, butter, cocoa, mass, milk, almond, sugar, butterfat, emulsifier, lecithin, soya, aroma}
• B = {sugar, cocoa, butter, milk, cocoa, mass, butterfat, hazelnuts, emulsifier, soya, lecithin, aroma, vanillin, cocoa}
• V = {sugar, cocoa, butter, mass, milk, almond, butterfat, emulsifier, lecithin, soya, aroma, hazelnuts, vanillin}
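Because x and y are binary indicator vectors, Eq. (1) can also be evaluated directly on the word lists taken as sets, so that repeated words such as “sugar” or “cocoa” count once. The following worked version for the lists above is offered as a cross-check, not as part of the original method. Here A has 11 distinct words, B has 12, and they share 10 (all of V except almond, hazelnuts, and vanillin):

```latex
\cos(x, y)
  = \frac{\sum_{i=1}^{|V|} x_i y_i}{\sqrt{\sum_{i=1}^{|V|} x_i^2}\,\sqrt{\sum_{i=1}^{|V|} y_i^2}}
  = \frac{|A \cap B|}{\sqrt{|A|\,|B|}}
  = \frac{10}{\sqrt{11 \cdot 12}}
  \approx 0.87
```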
According to rules (2) and (3), the vectors x = [1,1,1,1,1,1,1,1,1,1,1,0,0] and y = [1,1,1,1,1,0,1,1,1,1,1,1,1] are formed. The calculation of the cosine similarity results in a value of 0.87 and thus a similarity of 87%.
Similarity Analysis in Nutrition Facts. The nutritional data of the FDWH are stored as floating-point numbers. Therefore, a different approach to similarity analysis was chosen here. The product datasets are to be classified into classes (categories) according to their nutritional values. These classes can then be compared with each other. With the data mining method of clustering [14], foods can be divided into clusters based on their nutritional values. In order to determine the similarity of the foods, these clusters can then be compared with each other and used to identify the category of food data. Several categories can be assigned to each cluster. By comparing the categories of a cluster with the ingredients or food names of the respective food dataset, the category of many foods can be specified. The nutrients fat, carbohydrates, and proteins were selected for the clustering method because these values are the ones most often present across all available data sources and because, according to the findings of the methods performed, these nutrients, together with other information such as the food name, identify a food dataset most efficiently. The K-Means [14] method was chosen as the clustering algorithm. This algorithm is a good approach to clustering attributes whose values are near each other. In addition, a literature review found a similar approach in [15] that gave good results. The difficulty in using K-Means clustering lies in defining the number of clusters K into which the datasets should be divided. Despite this difficulty, the decision was made to use K-Means. The choice of K was initially based on the approach in [14], which works with 20 clusters.
The number of 20 clusters also corresponds to the number of main categories in the BLS data source (highest hierarchical level of categorization, e.g., bakery products, meat, etc.). As a test, the food data were also classified into K ± 1, K ± 2, and K ± 3 clusters, but the number of similar categories in each cluster decreased and the number of dissimilar categories increased, so K = 20 was retained. To better represent the resulting clusters, the dimension of the nutritional values for the K-Means method was reduced using principal component analysis (PCA). As described in [16], fat and protein values correlate significantly in the same direction. Carbohydrates have been shown to correlate in the opposite direction to fat. The PCA method reduces the values of fat, protein, and carbohydrates to their orthogonal principal components. Thus, the three original value attributes were transformed into the two attributes PCA Dimension 0 and PCA Dimension 1 [16, 17]. The clustering procedure was then applied to these two PCA attributes. K-Means determines the cluster centers and assigns each data point to exactly one cluster using the Euclidean distance [14]. In Fig. 2, the clusters formed from the BLS data are plotted.
Fig. 2 Plot of the K-Means clusters from the BLS data
Overall Similarity Analysis. In order to determine the overall similarity of two foods to be compared, the rule set shown in Table 1 was developed by evaluating 100 test datasets; it combines the similarity values and clustered classes obtained above. The rule sets are only applied to datasets whose food names have already been classified as similar. For performance reasons, this avoids checking all 173,000+ datasets against each other, so that only food datasets with similar-sounding names are checked. With the rule sets in Table 1, it can now be determined whether a food dataset is plausible, namely if another dataset was found that is similar to it in name and has a similar composition. As can be seen in this table, missing values of one dataset can be taken from another, similar dataset (pin := p’in, or pcl := p’cl, pcarb := p’carb, pfat := p’fat, pprot := p’prot, pen_kcal := p’en_kcal). This is the case, for example, if the food name and ingredient list of another record are very similar; the nutritional values are then adopted. Conversely, if food name and nutrition facts are very similar, the ingredient list is adopted. In the FDWH database, transferred data is specially marked so that it is clear to the user that the data may not match 100%. This is communicated to the requesting application via the API (see Chap. 4).
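The clustering step described under Similarity Analysis in Nutrition Facts can be sketched with a small, dependency-free version of K-Means (Lloyd's algorithm). This is a sketch under assumptions, not the authors' implementation: the points stand for the two PCA dimensions, the toy data and K = 2 are invented for illustration (the paper uses K = 20), and the initial centers are picked by hand because the paper does not describe an initialization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center with the smallest Euclidean distance to the point."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))


def k_means(points: Sequence[Point], initial_centers: Sequence[Point],
            max_iterations: int = 100) -> Tuple[List[int], List[Point]]:
    """Alternate between assignment and center update until the centers are stable."""
    centers = list(initial_centers)
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: every point belongs to exactly one cluster.
        labels = [nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old_center)  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Invented toy data: two well-separated groups in the two PCA dimensions.
toy_points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
toy_labels, toy_centers = k_means(toy_points, [toy_points[0], toy_points[3]])
```

The resulting label of a dataset plays the role of the cluster name pcl in the rule sets. In practice, a library implementation such as scikit-learn's KMeans would normally replace this hand-written loop.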
Machine Learning and Context-Based Approaches … Table 1 Rulesets of the overall similarity analysis
431
Rulesets pcl = Ø p’cl = Ø pcl = p’cl pin = Ø p’in = Ø αin > 0.4 αna > 0.4 ⇒ nu := 2, in:= 2, y := 4 pcl = Ø p’cl = Ø pcl = p’cl pin = Ø p’in = Ø αin > 0.4 αna ≤ 0.4 ⇒ nu := 1, in:= 1, y := 2 pcl = Ø p’cl = Ø pcl = p’cl pin = Ø p’in = Ø αin ≤ 0.4 αna > 0.4 ⇒ nu := 2, in:= 0, y := 2 pcl = Ø p’cl = Ø pcl = p’cl pin = Ø p’in = Ø αin ≤ 0.4 αna ≤ 0.4 ⇒ nu := 1, in:= 0, y := 1 pcl = Ø p’cl = Ø pcl = p’cl pin = Ø p’in = Ø αna > 0.4 ⇒ nu := 2, in:= 0, y := 2, pin := p’in pcl = Ø p’cl = Ø pcl = p’cl pin = Ø p’in = Ø αna ≤ 0.4 ⇒ nu := 1, in:= 0, y := 1 pcl = Ø p’cl = Ø pcl = p’cl p’in = Ø αna > 0.4 ⇒ nu := 2, in:= 0, y := 2 pcl = Ø p’cl = Ø pcl = p’cl p’in = Ø αna ≤ 0.4 ⇒ nu := 1, in:= 0, y := 1 pcl = Ø p’cl = Ø pcl = p’cl pin = Ø p’in = Ø αin > 0.4 αna > 0.4 ⇒ nu := 0, in:= 2, y := 2 pcl = Ø p’cl = Ø pcl = p’cl pin = Ø p’in = Ø αin > 0.4 αna ≤ 0.4 ⇒ nu := 0, in:= 1, y := 1 pcl = Ø p’cl = Ø pin = Ø p’in = Ø αin > 0.4 αna > 0.4 ⇒ nu := 0, in:= 2, y := 2, pcl:= p’cl, pcarb:= p’carb , pfat:= p’fat, pprot:= p’prot , pen_kcal:= p’en_kcal, pcl = Ø p’cl = Ø pin = Ø p’in = Ø αin > 0.4 αna ≤ 0.4 ⇒ nu := 0, in:= 1, y := 1 p’cl = Ø pin = Ø p’in = Ø αin > 0.4 αna > 0.4 ⇒ nu := 0, in:= 2, y := 2 p’cl = Ø pin = Ø p’in = Ø αin > 0.4 αna ≤ 0.4 ⇒ nu := 0, in:= 1, y := 1
432
A. Muenzberg et al.
For a product dataset p, pin and pcl describe the list of ingredients and the cluster name of the product dataset. pcarb, pfat, pprot, and pen_kcal describe the carbohydrate, fat, and protein values of the dataset and its energy value in kilocalories. p’ describes the potentially similar dataset that is compared to p. αna and αin represent the cosine similarity values of the food name and the ingredient list. Ø indicates that the value is not set, i.e., that it contains the NULL value. The character ∧ stands for the logical AND in the respective rule. in and nu are the plausibility values of ingredient list similarity and nutrition fact similarity, and y is the total similarity value. A total similarity value of y = 4 (in = 2 and nu = 2) means that the two datasets are very similar to each other. y = 2 (in = 1 and nu = 1, or one of the two values is 2) means that the datasets are probably similar. A total value of y = 1 (one of the two plausibility values is 1) means that the datasets are possibly similar to each other. In all other combinations of values, which are not listed in Table 1, the total similarity value is y = 0 and the datasets are not similar.
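The rule sets can be condensed into a short function. This is a sketch under one reading of Table 1, stated here as an assumption: nu is positive only if both cluster names are set and equal, in is positive only if both ingredient lists are set and αin exceeds 0.4, the name similarity αna decides between the values 2 and 1, and y is the sum of nu and in. The class and function names are hypothetical, and None stands for Ø.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

THRESHOLD = 0.4  # similarity threshold used in the rule sets


@dataclass
class Product:
    cluster: Optional[int] = None      # p_cl, None stands for Ø
    ingredients: Optional[str] = None  # p_in, None stands for Ø


def overall_similarity(p: Product, q: Product, alpha_na: float,
                       alpha_in: Optional[float] = None) -> Tuple[int, int, int, List[str]]:
    """Return (nu, in, y, completions) for dataset p compared with candidate q."""
    level = 2 if alpha_na > THRESHOLD else 1

    # Nutrition plausibility: both cluster names are set and equal.
    nu = level if p.cluster is not None and p.cluster == q.cluster else 0

    # Ingredient plausibility: both lists are set and sufficiently similar.
    both_lists = p.ingredients is not None and q.ingredients is not None
    ing = level if both_lists and alpha_in is not None and alpha_in > THRESHOLD else 0

    # Values that may be taken over from q (to be marked as transferred).
    completions: List[str] = []
    if nu == 2 and p.ingredients is None and q.ingredients is not None:
        completions.append("ingredients")  # p_in := p'_in
    if ing == 2 and p.cluster is None and q.cluster is not None:
        completions.append("nutrition")    # p_cl := p'_cl and the nutrient values
    return nu, ing, nu + ing, completions
```

For example, `overall_similarity(Product(3, "sugar, milk"), Product(3, "sugar, whole milk"), 0.8, 0.7)` yields nu = 2, in = 2, and y = 4, while a candidate from a different cluster with dissimilar ingredients yields y = 0.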
4 Context-Based Response from the API
The API of the FDWH accepts food data requests in JavaScript Object Notation (JSON) format and returns the response in the same format. The requesting application must identify itself to the service via an id and password hash and then sends a search query or a GTIN to the FDWH. Optionally, a maximum number of result pages can be specified. The user receives a list of the food datasets found for the search query with id, GTIN, food name, brand, and origin. In a further step, a detailed request for a specific dataset can be sent to the API using the product ID. Additionally, it is possible to include a focus and a context value with the detailed request. The focus value indicates which attributes of the FDWH are of greater importance in the respective application context. The context value contains a specific string that specifies the value that is relevant in the respective application context. For example, if an app in the context of nutritional advice wants to know whether a certain food contains nuts, the focus value specifies the attribute ingredient list. The context value then contains the string “nuts”. The FDWH API checks the information and returns whether the food contains nuts, does not contain nuts, or whether this cannot be determined. It also indicates whether this information is taken from a plausible ingredient list or from other datasets. Likewise, nutritional values can be given as a focus to obtain a certain value in a nutritional context. With the help of the context-based query, a food dataset that does not contain a plausible list of ingredients, for example, can be used by an application without any concerns if the context is only about certain nutritional values. This also applies in the opposite case.
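A detailed request with focus and context, and the three-valued answer described above, might look as follows. All field names ("id", "password_hash", "product_id", "focus", "context") and both function names are hypothetical, since the paper does not specify the JSON schema of the FDWH API, and the substring test is a deliberately naive stand-in for the actual check.

```python
import json
from typing import Any, Dict, Optional


def build_detail_request(app_id: str, password_hash: str, product_id: int,
                         focus: Optional[str] = None,
                         context: Optional[str] = None) -> str:
    """Serialize a detailed request; focus and context are optional."""
    request: Dict[str, Any] = {
        "id": app_id,
        "password_hash": password_hash,
        "product_id": product_id,
    }
    if focus is not None:
        request["focus"] = focus
    if context is not None:
        request["context"] = context
    return json.dumps(request)


def check_context(ingredients: Optional[str], context: str,
                  transferred: bool) -> Dict[str, Any]:
    """Answer whether the context string occurs in the ingredient list."""
    if ingredients is None:
        return {"context": context, "result": "unknown", "source": None}
    found = context.lower() in ingredients.lower()  # naive substring match
    return {
        "context": context,
        "result": "contained" if found else "not contained",
        # Tells the app whether the list was taken over from a similar dataset.
        "source": "similar dataset" if transferred else "own ingredient list",
    }


print(build_detail_request("app-1", "abc123", 42, focus="ingredients", context="nuts"))
print(check_context("sugar, hazelnuts, milk", "nuts", transferred=False))
```

Returning "unknown" as a separate value keeps a missing ingredient list distinct from a list that was checked and found not to contain the term, which matters for the allergy use case.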
5 Results
So far, a review of the described procedures for similarity analysis and completion of the data has been performed with 60 datasets. The test datasets were randomly selected by linear sampling across all data of the FDWH. The following results were obtained. Nineteen datasets had the maximum plausibility value y = 4 and were, therefore, considered very plausible. These data did not need to be completed. A further 21 datasets had the value y = 2: 5 were classified as plausible in both attributes, nutrition facts and ingredient lists, 11 were classified as similar only in the attribute nutrition facts, and 5 only in the attribute ingredient lists. For 3 of the datasets, the missing ingredient list was taken from a similar dataset. In two of the datasets, minerals and vitamins were adopted. The three characteristic nutrients carbohydrates, fat, and proteins were not missing in any of the datasets analyzed. The plausibility of 11 datasets could not be determined reliably (y = 1). These must be checked manually. Nine datasets were classified as not plausible. No similar foods were found for these datasets, so a manual check of the data is necessary. The evaluation has shown that similarity analysis can distinguish well between plausible and implausible data. All test data were manually checked and researched to determine the extent to which the information in the datasets was correct. No errors were found in the values classified as plausible. In 5 cases, values in the datasets could be completed automatically. In 6 datasets, the ingredient list was missing. In 3 of these, this information could be taken from a similar dataset. In one-third of the datasets (20 of 60), no plausibility of the data could be determined, because either there was insufficient similarity with other foods or no similar datasets were found. Therefore, no further completion of data was possible. Such datasets must be checked manually for quality.
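The counts reported above can be tallied as a quick consistency check. The snippet only restates numbers from the text; assigning y = 0 to the nine datasets classified as not plausible follows the definition of the total similarity value in Sect. 3.2.

```python
from fractions import Fraction

# Number of test datasets per total similarity value y, as reported in the text.
counts = {4: 19, 2: 21, 1: 11, 0: 9}

total = sum(counts.values())
plausible = counts[4] + counts[2]      # very plausible or probably similar
manual_check = counts[1] + counts[0]   # plausibility not determined or not plausible

share_plausible = Fraction(plausible, total)
share_manual = Fraction(manual_check, total)
print(total, share_plausible, share_manual)
```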
6 Conclusion and Discussion
The results in Chap. 5 demonstrate that the described procedures achieve good results in the detection of similarities between products. With data mining methods like K-Means, PCA, or cosine similarity analysis, the similarity between datasets could be determined, and this result could be used for an overall similarity analysis. Whether entries made by the community of the respective data source are plausible can be shown quite easily by this analysis if similar entries exist. Non-plausible data and data for which no similar products exist are identified so that they can be checked manually. The completion of the datasets can also be achieved by the similarity analyses. However, only the ingredients and nutritional values of similar foods are adopted here. An exact specification is not provided in this case. It can only be shown which ingredients are possibly contained in the food dataset and what the approximate range of nutritional values looks like. Nevertheless, this can be valuable information for the user who receives the data. For example, if a person is expected to consume a particularly small or large amount of a nutrient, the analysis can indicate the approximate value that can be expected. With the help of the context-based FDWH API, datasets can be used by health apps without concern even if one part of the dataset is not plausible, as long as the part that is needed is plausible. Following this work, the methods for improving the quality of food data will be extended by using further ontologies. In further research, the data completion with data analysis methods shall be improved so that the number of automatically completed food products increases further. The productive use of the FDWH API will be tested and validated using real scenarios from the DiDiER study. The presented procedures can also be extended to other areas of application. For example, other products such as care products, cosmetics, or medical products can be analyzed.
References
1. Muenzberg A, Sauer J, Laemmel S, Teichmann S, Hein A, Roesch N (2019) Optimization and merging of food product data and food composition databases for medical use. In: European academy of allergy & clinical immunology (EAACI) Congress, Lisbon
2. Roesch N, Muenzberg A, Sauer J, Arens-Volland A, Laemmel S, Teichmann S, Eichelberg M, Hein A (2019) Digital supported diagnostics in food allergy by analyzing app-based diaries. In: European academy of allergy & clinical immunology (EAACI) Congress, Lisbon
3. Dig D, Johnson R (2006) How do APIs evolve? A story of refactoring. J Softw Maint Evol Res Pract. (John Wiley & Sons, Ltd.)
4. Muenzberg A, Sauer J, Hein A, Roesch N (2018) The use of ETL and data profiling to integrate data and improve quality in food databases. In: 14th international conference on wireless and mobile computing, networking and communications (WiMob 2018), Limassol, pp 231–238
5. Neuleben I (2020) Dokumentationspflicht und Aufbewahrungsfristen. Kassenärztliche Vereinigung Nordrhein. Düsseldorf, Deutschland: KVNO unterwegs, https://www.kvno.de/10praxis/30honorarundrecht/30recht/20dokupflicht/15_05_aufbewahrungsfristen/index.html. Accessed 12 July 2020
6. Elfert P et al (2017) DiDiER-digitized services in dietary counselling for people with increased health risks related to malnutrition and food allergies. In: Computers and communications (ISCC), Heraklion, Greece, pp 100–104
7. Muenzberg A, Sauer J, Hein A, Roesch N (2020) Intelligent combination of food composition databases and food product databases for use in health applications. In: O’Hare G, O’Grady M, O’Donoghue J, Henn P (eds) Wireless mobile communication and healthcare. MobiHealth 2019. Lecture notes of the institute for computer sciences, social informatics and telecommunications engineering, vol 320. Springer, Cham
8. Kusumasari TF, Fitria (2016) Data profiling for data quality improvement with OpenRefine. In: IEEE international conference on information technology systems and innovation (ICITSI)
9. The IEEE and The Open Group, The Open Group Base Specifications Issue 6, 9. Regular Expressions, https://pubs.open-group.org/onlinepubs/009695399/basedefs/xbd_chap9.html#tag_09_03_05. Accessed 29 Oct 2020
10. Olson JE (2003) Data quality: the accuracy dimension. Morgan Kaufmann Publishers
11. Abedjan Z, Golab L, Naumann F (2016) Data profiling. In: IEEE international conference on data engineering (ICDE), pp 1432–1435
12. NIST, Statistical Data Engineering Division Dataplot, COSINE DISTANCE, https://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/cosdist.htm. Accessed 29 Oct 2020
13. Snowball, https://snowballstem.org/. Accessed 29 Oct 2020
14. Cleve J, Laemmel U (2014) Data mining. De Gruyter, Oldenburg
15. Fink L (2020) Hidden treasures in our groceries. https://www.kaggle.com/allunia/hidden-treasures-in-our-groceries. Accessed 29 Oct 2020
16. Ng A, Soo K (2018) Data science–was ist das eigentlich?!. Springer, Berlin
17. Abdi H, Williams LJ (2010) Principal component analysis. In: Wiley interdisciplinary reviews: computational statistics, vol 2. In Press (2010)
Components of a Digital Transformation Strategy: A South African Perspective
Kudzai Mapingire, Hanlie Smuts, and Alta Van der Merwe
Abstract Most organisations have begun to take the phenomenon of digital transformation very seriously and, in response, have adopted a digital transformation strategy (DTS) to guide them on the journey to being digitally transformed. Despite the impetus to adopt a DTS, most organisations lack an understanding of what a DTS entails and of the components of such a strategy. In an effort to bring better understanding of the components of a DTS, this study adopted a qualitative research approach and collected research data using an Internet-mediated questionnaire. Our research findings revealed that most organisations, with a few exceptions, have adopted a DTS within the last 10 years. Furthermore, our research findings reveal that a DTS must incorporate the following components: digitisation of the customer experience, digitisation of products and services, digitisation of employee ways of working, and digitisation of business processes. Our findings also revealed that a DTS leverages digital technologies, enabling the organisation to compete, innovate, grow, and achieve its business strategy. These results have implications for academics: our study adds to the digital transformation body of knowledge and specifically to the knowledge of the components of a DTS. We also propose a definition of a DTS based on our findings of DTS components. For practitioners, managers formulating and refining their DTS can use the DTS components as a benchmark of what to incorporate in their DTS.
Keywords Digital transformation strategy · Digital transformation · Business transformation · Strategy components
K. Mapingire (B) · H. Smuts · A. Van der Merwe University of Pretoria, Private Bag X20, Pretoria, South Africa H. Smuts e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_38
K. Mapingire et al.
1 Introduction
In recent times, almost all businesses in most industries have been significantly affected by the technological environment due to digitisation, the “conversion of physical into virtual content” [1: 121], and digitalisation, the “processing of digitised content” [1: 121]. The nomenclature of digitisation and digitalisation is part of a bigger digital transformation paradigm that organisations, industries, professions, and societies are undergoing due to the pervasiveness of digital technologies in every aspect of life [2–4]. Organisations are facing immense challenges in undertaking digital transformation, especially as they venture into new and unfamiliar territory while lacking a solid strategy with which to achieve the desired state of digital transformation [4–6]. The real problem in digitally transforming enterprises is that the full spectrum of what is possible with digitisation remains latent [7]. The fact that organisations have only begun to embrace and implement digital transformation strategies means that there is a need to conduct research on digital transformation strategies in general and, more specifically, to elucidate the components that such a strategy should encompass [8]. Furthermore, Hess et al. [9] argue that there is no clarity on the DTS options available for practitioners to choose from and on the components that managers need to consider in their DTS. The objective of this research was to understand the components of a DTS, thereby providing guidance to practitioners on what a DTS should contain. This objective was realised by conducting a theoretical review of the literature on digital transformation and the DTS. Furthermore, insights were gleaned from a qualitative Internet-mediated questionnaire conducted with research participants from South African organisations. The study contributes to our understanding of digital transformation, specifically the components of a DTS, and adds to the DTS body of knowledge.
The study also assists practitioners, such as business and digital managers, in deciding which components to incorporate in their DTS. We also propose a working definition of a DTS based on the identified DTS components. The research paper is organised as follows: Sect. 2 discusses the literature on digital transformation and the DTS. Thereafter, Sect. 3 discusses the research method adopted for this research study and Sect. 4 presents the discussion of research findings. Lastly, Sect. 5 concludes the research paper by delineating the key findings of the research study, research contributions, research limitations, and directions for further research.
2 Theoretical Background
In this section, we present the extant literature on digital transformation and DTS. These concepts lay the theoretical foundation that informs our study and guided the design of the questionnaire research instrument.
Components of a Digital Transformation Strategy …
2.1 Digital Transformation
Reis et al. [10] and Vial [11] note that definitions of digital transformation can be categorised into three main themes, namely, technological, organisational, and social. Technological definitions emphasise that digital transformation has to do with the use of digital technologies such as social media, mobile, analytics, cloud, and Internet of Things (IoT) [8, 11–14]. Vial [11] adds, however, that the definitions differ with regard to the types of technologies. Organisational definitions focus on how digital transformation is bringing about changes in the organisation’s business processes, operations, products, servicing, customer experience, and business models [2, 10, 11, 13, 15]. Lastly, social definitions focus on the impact of digital transformation on the social and personal lives of people [10]. Vial [11] further notes that there are also definitional differences in terms of the type of transformation taking place. Abdelaal et al. [13] conducted an extensive literature review on digital transformation and consolidated the views on digital transformation into six perspectives: era, social/economic, industry/ecosystem, network, company/institutional, and individual. The era perspective acknowledges that the fourth industrial revolution is upon us [7, 13, 16]. The social/economic facet recognises the notion of a shared economy in which different actors share in the creation and distribution of value [10, 17, 18]. From an industry/ecosystem perspective, traditional industry boundaries have become blurry, and concepts such as industry 4.0 have become popular, signifying the impact of digital technologies [7, 8, 13, 16]. The network perspective describes how the value network or value chains have been transformed from linear to more complex and dynamic matrix structures [11, 19, 20].
The company perspective highlights the ongoing digital transformation efforts that organisations are undertaking to harness the power of digital technologies and the emerging strategic contribution that digital transformation can bring to an organisation [3, 4, 13, 21]. The individual perspective underscores the concept that people’s identities have been extended to include their online profiles due to the use of digital technologies. As organisations embrace digital transformation in their organisational structures, new roles are emerging [13, 22]. The role of the Chief Digital Officer (CDO) has risen to prominence in most organisations undertaking digital transformation and echoes the importance of digital leadership [11, 22]. The scope of the DTS spans all the organisation’s departments and is the responsibility of everyone in the organisation [2, 21]. Notably, business-related roles are now required to take on and lead digital projects and, at the same time, technology roles are required to become more business savvy [11].
2.2 Digital Transformation Strategy Matt et al. [22: 339] state that a DTS focuses on “… the transformation of products, processes, and organisational aspects due to new technologies”. Specifically, the DTS, “… signposts the way toward digital transformation and guides managers through the transformation process resulting from the integration and use of digital technologies” [9: 125]. Digital transformation strategies often describe anticipated business risks, opportunities and strategies of organisations that are fully or to some extent based on digital technologies [3, 4]. Digital transformation strategies are procedures that govern the organisation’s path to being digitally transformed [9]. The DTS is holistic in nature, spans across and is aligned to the other functional strategies [2, 4, 13, 21]. Unlike the IT strategy, which is the sole responsibility of the IT department, the DTS is a collective responsibility among the organisation’s functional departments [21]. Chanias et al. [21] claim that the DTS is distinctively different from the IT/IS strategy because it is business and customer centric and not centred on technology as the case of the IT/IS strategy. Another important distinction is between the digital strategy and the DTS. A digital strategy according to Bharadwaj et al. [24: 472] is “… organisational strategy formulated and executed by leveraging digital resources to create differential value”. Furthermore, the digital strategy is a functional-level strategy that is at par with the business strategy [4]. Hess et al. [23] contend that the digital business strategy lacks the transformational steps required to reach a state of being digitally transformed which a DTS encompasses. 
Digital transformation strategies are the road maps or blueprints that prescribe the transformations available to organisations, how organisations can implement these transformations and ultimately how organisations integrate the transformations into their operations and business processes [23]. Berman [25] adds that, regardless of the industry an organisation is in, digital transformations have an aspect of one of these four elements: use of digital or disruptive technologies, customer value creation, structural organisational changes and financial aspects. In short, digital transformation strategies fundamentally challenge the traditional approach to setting strategies and keeping them pertinent in light of swift technological changes [2, 4, 21, 23]. Chanias [2], Zimmer [3] and Albukhitan [4] found that the DTS formulation process does not follow the traditional top-down and formal planning approach. Instead, it proceeds bottom-up, initiated by cross-functional or informal teams before management formalises the strategising process.
3 Methodology
To understand the components of the DTS, this study adopted a qualitative research approach and collected research data using a qualitative Internet-based questionnaire hosted on SurveyMonkey. The qualitative questionnaire was chosen because it is an easier way to reach research participants in many organisations and generates large amounts of data faster [26]. The research participants who responded to the qualitative questionnaire are practitioners and business and digital managers in South African organisations from the following industries: financial services, telecommunications, consulting, Information Technology (IT), aviation and insurance. Qualitative research is most appropriate when conducting exploratory research, which involves an investigation of a new topic or subject of inquiry on which the literature is deficient, such as the DTS [27, 28]. This research study utilised the non-probability stratified convenience sampling technique to recruit research participants who were suitable to respond to the research questions [29]. In our case, convenience sampling was ideal because we specifically targeted research participants who work in digital transformation and strategy.

The questionnaire had two sections, Coding and DTS Rationale. The questions in the Coding section addressed these key concepts: the organisational role of the participant, the industry to which the company belongs and how long the organisation has been in existence. The DTS Rationale section covered the following concepts: the definition of a DTS, whether the organisation has a DTS, how long the company has formally had a DTS, the motivation behind the company’s adoption of the DTS and the components that make up the DTS. This section had multiple-choice and open-ended questions.

The questionnaires were qualitatively analysed using thematic analysis, specifically thematic networks [30]. Firstly, the basic themes were identified by assigning initial labels to the data. Secondly, the basic themes were grouped into organising themes that combined common basic themes. Lastly, the organising themes were grouped into high-level global themes. In the next section, we present and discuss the research findings of our study.
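The three-step grouping used in thematic networks can be sketched as a small nested data structure. The following is only an illustrative sketch, not the authors' analysis procedure: the organising and global theme names are taken from the findings reported in this paper, while the basic-theme labels and the helper name `build_network` are hypothetical.

```python
from collections import defaultdict

# Step 1 (hypothetical labels): each response excerpt receives a basic theme.
BASIC_THEMES = {
    "RP-2": "enhance service delivery",
    "RP-11": "solve client needs",
    "RP-20": "solve customer problems",
    "RP-5": "digitise product offerings",
}

# Step 2: basic themes are grouped into organising themes.
BASIC_TO_ORGANISING = {
    "enhance service delivery": "use of digital to improve customer experience",
    "solve client needs": "use of digital to solve customer needs",
    "solve customer problems": "use of digital to solve customer needs",
    "digitise product offerings": "use of digital to digitise/create products and services",
}

# Step 3: organising themes are grouped into global themes.
ORGANISING_TO_GLOBAL = {
    "use of digital to improve customer experience": "digitise customer experience",
    "use of digital to solve customer needs": "digitise customer experience",
    "use of digital to digitise/create products and services": "digitise products and services",
}


def build_network(basic_themes):
    """Return {global theme: {organising theme: sorted basic themes}}."""
    network = defaultdict(lambda: defaultdict(set))
    for basic in basic_themes.values():
        organising = BASIC_TO_ORGANISING[basic]
        global_theme = ORGANISING_TO_GLOBAL[organising]
        network[global_theme][organising].add(basic)
    return {g: {o: sorted(b) for o, b in orgs.items()} for g, orgs in network.items()}


network = build_network(BASIC_THEMES)
```

Keeping the two mappings separate mirrors the way the analysis proceeds bottom-up, from labels to organising themes and only then to global themes.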
4 Analysis and Discussion of Findings
In this section, we present and discuss the findings from the qualitative Internet-mediated questionnaire. We also discuss what the findings mean and highlight where the findings corroborate the literature. The questionnaire was sent to 69 research participants, of whom 51 responded. This represents a response rate of 74%. Eighty-three percent (83%) of research participants reported that their organisations have a DTS, whereas 11% reported that their organisations did not have a DTS. About 6% of research participants were unsure whether their organisations have a DTS. An analysis of the data revealed that most organisations have formally adopted a DTS in the past 10 years. These findings corroborate the assertion that most organisations have recently adopted a DTS [8].
4.1 Components of a DTS
The data collected from research participants on the definition of a DTS yielded eight global themes that attempt to summarise the components of a DTS. The global themes are digitise customer experience, digitise products and services, digitise employee ways of working, digitise business processes, achieve competitiveness through digital technology, grow the organisation through digital transformation, realise business strategy through digital transformation and innovate with digital technology. Each of these global themes is discussed in detail next, and verbatim statements are used to substantiate these themes.

Digitise customer experience. The digitise customer experience global theme has two organising themes associated with it: use of digital to improve customer experience and use of digital to solve customer needs.

Use of digital to improve customer experience. This organising theme pertains to the use of digital technology in offering customers a remarkable customer experience. The use of digital technologies is giving organisations the opportunity to improve customer experience through improved service delivery and better engagement with customers [23], as stated in these statements: RP-2: “Simply the use of information technology to enhance and improve the service delivery to the end customers”; RP-36: “…digitisation of the client experience and digitisation more broadly” and RP-37: “A strategy that leverages digital (all facets) to optimise business outcomes and change/improve how customers engage”.

Use of digital to solve customer needs. This organising theme describes the use of digital technology to solve customer problems or fulfil customer needs in a novel way. The following statements from research participants buttress this view: RP-11: “Digital solution that solves clients’ needs; one that is apt” and RP-20: “The intent to integrate digital technology … to solve customer problems”.
We can, therefore, conclude that the use of digital technologies to deliver a unique customer experience and solve customer problems or meet customer needs is an important component of the DTS. Our findings agree with those of Ketonen-Oksi et al. [17] and Vial [11], who assert that digital technologies are being used to better understand customer behaviour and target customers more effectively.

Digitise products and services. The digitise products and services global theme is derived from the use of digital to digitise/create products and services organising theme.

Use of digital to digitise/create products and services. This organising theme emphasises the role of digital technology in rendering products in a digitised form. The following verbatim statements reinforce this theme: RP-4: “Components of digital that impact… product…”; RP-5: “The plan or journey to digitise your products and service offerings using digital technologies” and RP-32: “Business strategy that is driven by the use of emerging technologies to either improve existing services and products, and/or to create new products and services”.
In summary, the DTS incorporates the digitisation of products and services. More importantly, digital technologies have enabled organisations to create new products and services that would not have been possible without digital technologies [16].

Digitise employee ways of working. The digitise employee ways of working global theme has two organising themes linked to it, which are digitise employee experience and use of digital to transform ways of working. These organising themes underscore the use of digital technologies in the organisation to digitise the way the organisation engages with employees, and the employees’ ways of working in a digital world.

Digitise employee experience. The digitise employee experience organising theme highlights the need to make available the information systems and technology tools that employees use to accomplish everyday tasks remotely and in digitised forms. These verbatim statements support this assertion: RP-40: “… digitisation of your colleague [bank employees] experience”; RP-42: “….a large scale digital ambition and what it means … for the employees” and RP-48: “The changes associated with digital technology application and integration into all aspects of the organisation, including human resources”.

Use of digital to transform ways of working. The use of digital to transform ways of working organising theme stresses the use of digital technologies in the workplace. These research participants expressed the following views: RP-16: “Components of digital that impact … and shifts in ways of work”; RP-42: “….a large scale digital ambition and what it means … for how we do work” and RP-46: “Planned activity to introduce technology into every day work tasks”. To summarise, digitising the employee experience is an essential aspect of the DTS in this digital world, as echoed by [13, 14], and all organisations need to strive for an employee experience that attracts and retains the best talent.
Digitise business processes. The digitise business processes global theme has one organising theme linked to it, use of digital technology to transform business processes. Use of digital technology to transform business processes. This organising theme entails the use of digital technologies in the organisation, to improve, automate and change business processes. The following statements support this: RP-1: “It is the move to a digital business to simplify processes…”; RP-5: “A DTS focuses on moving manual and often paper based processes or artefacts onto digital platforms”; RP-8: “Business strategy that is driven by the use of emerging technologies to improve existing processes…” and RP-10: “… it is about literally getting your processes digitised end to end”. From the above assertions, it is evident that the DTS must incorporate the digitisation of business processes using digital technologies to streamline manual and inefficient business processes. This view supports that of Albukhitan [4] and Vial [11], who state that digital technologies are driving business process efficiencies. Achieve competitiveness through digital. The achieve competitiveness through digital global theme has one organising theme called use of digital capabilities for competitive advantage.
Use of digital capabilities for competitive advantage. Competitive advantage, which enables an organisation to have a competitive edge or compete effectively against other organisations in its industry, is achievable with digital technologies. Research participants corroborate this by saying: RP-18: “… turning digital disruption and innovation into competitive advantage”; RP-27: “… creating competitive advantage by honing in the ability to consume customer feedback and to rapidly create and respond with a solution” and RP-33: “A strategy to leverage off IT systems for increased efficiency, competitiveness and growth”. A digital competitive advantage enables the organisation to remain viable by competing effectively and fending off new digital entrants into the industry. These findings are similar to those of [3, 9, 31, 32], who note that digital transformation can give an organisation a competitive advantage.

Grow the organisation through digital transformation. The grow the organisation through digital transformation global theme has one organising theme associated with it, use of digital technologies to grow the organisation.

Use of digital technologies to grow the organisation. This organising theme explains the way in which organisations can use digital technologies to grow the organisation from a value, revenue and market share perspective. Research participants confirm these views: RP-19: “Leveraging digital assets for current and future growth of the organisation”; RP-44: “Using technology as an enabler for growth and driver of business value” and RP-50: “… innovation in the interest of growth, both from a revenue and market share perspective”. Growing the business is an imperative for most organisations, and the above findings reveal that the DTS should encompass aspects of organisational growth.

Realise business strategy through digital transformation.
The realise business strategy through digital transformation global theme has one organising theme, which is use of technology to achieve the objectives of the organisation.

Use of technology to achieve the objectives of the organisation. This organising theme involves the use of technology in achieving the goals and objectives of the organisation, including business metrics such as increased revenue, reduced costs and customer satisfaction. These participants confirm these sentiments: RP-32: “Defining how technology will be adopted to achieve the objectives of an organisation”; RP-36: “the roadmap towards achieving digital transformation objectives” and RP-43: “…the plan of how you intend to get to be a digitally led organisation, if that’s your goal”. The role of digital technology in enabling the attainment of organisational objectives is an important finding that supports the notion that a DTS should inherently incorporate this aspect and should be part of the business strategy. Abdelaal et al. [13] make the same argument that the DTS is at the same level as the business strategy.

Innovate with digital technology. The innovate with digital technology global theme has one organising theme linked to it, use of digital to enable innovation.

Use of digital to enable innovation. This organising theme underscores the role of digital technologies in enabling, driving or being the source of innovation for organisations. The verbatim statements reinforce these assertions: RP-1: “It is the move to a digital business to simplify processes and introduce innovation while keeping up with user demands”; RP-45: “The deliberate application of digital solutions and new technology to solve conventional problems and to enable new types of innovation…” and RP-50: “… innovation in the interest of growth, both from a revenue and market share perspective”. Innovating with digital technology is an essential component of the DTS, enabling the organisation to solve business problems and to grow. These findings buttress those of Kane et al. [14], who argue that digital transformation bolsters innovation.
4.2 Summary
In summary, our findings revealed that the components of a DTS are digitisation of the customer experience, digitisation of products and services, digitisation of employee ways of working and digitisation of business processes, together with achieving competitiveness, innovating, growing the business and realising the business strategy through digital technologies. Our findings substantiate the notion that the DTS is entirely or largely based on digital technologies [21]. As organisations adopt a DTS or refine their existing DTS, it is imperative to ensure that the DTS incorporates aspects of these components to be successful. Based on our findings on the components of a DTS, we propose the following working definition of a DTS. A DTS outlines how an organisation will use digital technologies in order to digitise the customer experience, products and services, employee ways of working and business processes. It also encompasses how the organisation will use digital technologies to be competitive and innovative, to grow the business and to realise its business strategy.
5 Conclusion
This research paper is part of the research efforts to bring a better understanding of the components of a DTS and to answer the clarion call to conduct more research on digital transformation. We found that most organisations have a DTS that they adopted recently, within the past 10 years. This corroborates the view that most organisations have recently embarked on their DTS journeys [7, 21]. A DTS must incorporate the following components: digitisation of the customer experience, digitisation of products and services, digitisation of employee ways of working and digitisation of business processes. Furthermore, it leverages digital technologies to enable the organisation to be competitive and innovative, to grow the business and to realise its business strategy.

This research has implications for both academics and practitioners. The study contributes to a better understanding of digital transformation strategies by specifically outlining the components of a DTS [8]. Furthermore, the study also proposes a definition of a DTS from a DTS components’ perspective. From a practitioner point of view, the definition of a DTS brings more clarity on the scope of a DTS and guides those involved in formulating and refining the DTS on what components should be embedded in it.

Our research study recognises limitations that should be taken into consideration when interpreting our research findings. First, qualitative research is subjective in nature and therefore has an element of bias. To counter this bias, we undertook a pilot study before the main study to eliminate potential problems. Second, the analysis of qualitative research data is subject to the researcher’s bias and experience, which limits the generalisability of our findings. Given that the literature on the DTS is still developing and deficient, we suggest that future research should focus on understanding the DTS formulation process in organisations. The same study could be conducted using the quantitative research approach in order to compare research findings.
References
1. Enhuber M (2015) Art, space and technology: how the digitisation and digitalisation of art space affect the consumption of art – a critical approach. Digit Creat 26(2):1–17. https://doi.org/10.1080/14626268.2015.1035448
2. Chanias S, Hess T (2016) Understanding digital transformation strategy formation: insights from Europe’s automotive industry. In: Proceedings of the 20th pacific asia conference on information systems (PACIS 2016). Chiayi
3. Zimmer M (2019) Improvising digital transformation: strategy unfolding in acts of organizational improvisation. In: Twenty-fifth Americas conference on information systems. Cancun
4. Albukhitan S (2020) Developing digital transformation strategy for manufacturing. Procedia Comput Sci 170:664–671. https://doi.org/10.1016/j.procs.2020.03.173
5. Henriette E, Feki M, Boughzala I (2016) Digital transformation challenges. In: Tenth mediterranean conference on information systems. Association for Information Systems. Paphos, pp 1–7
6. Vogelsang K, Liere-Netheler K, Packmohr S, Hoppe U (2019) Barriers to digital transformation in manufacturing: development of a research agenda. In: Proceedings of the 52nd Hawaii international conference on system sciences. HICSS, pp 4937–4946. https://doi.org/10.24251/hicss.2019.594
7. von Leipzig T, Gamp T, Manz D, Schöttle K, Ohlhausen P, Oosthuizen G, Palm D, Leipzig K (2017) Initialising customer-orientated digital transformation in enterprises. In: 14th Global conference on sustainable manufacturing. Procedia Manufacturing, Stellenbosch, pp 517–524. https://doi.org/10.1016/j.promfg.2017.02.066
8. Chi M, Lu X, Zhao J, Li Y (2018) The impacts of digital business strategy on firm performance: the mediation analysis of e-collaboration capability. Int J Inf Syst Change Manag 10(2). https://doi.org/10.1504/ijiscm.2018.094603
9. Hess T, Matt C, Benlian A, Wiesböck F (2016) Options for formulating a digital transformation strategy. MIS Q Exec 15(2):123–139
10. Reis J, Amorim M, Melao N, Matos P (2018) Digital transformation: a literature review and guidelines for future research. In: Trends and advances in information systems and technologies, pp 411–421. https://doi.org/10.1007/978-3-319-77703-0_41
11. Vial G (2019) Understanding digital transformation: a review and a research agenda. J Strateg Inf Syst 28(2). https://doi.org/10.1016/j.jsis.2019.01.003
12. Berman S, Marshall A (2014) The next digital transformation: from an individual-centered to an everyone-to-everyone economy. Strategy Leadersh 42(5):9–17. https://doi.org/10.1108/SL-07-2014-0048
13. Abdelaal MHI, Khater M, Zaki M (2018) Digital business transformation and strategy: What do we know so far? https://doi.org/10.13140/rg.2.2.36492.62086
14. Kane G, Palmer D, Phillips AN, Kiron D, Buckley N (2015) Strategy, not technology, drives digital transformation. MIT Sloan Management Review and Deloitte University Press
15. Demirkan H, Spohrer J, Welser J (2016) Digital innovation and strategic transformation. IT Prof 18(6):14–18. https://doi.org/10.1109/MITP.2016.115
16. Neumeier A, Wolf T, Oesterle S (2017) The manifold fruits of digitalization–determining the literal value behind. In: 13th international conference on Wirtschaftsinformatik. St. Gallen, pp 484–498
17. Ketonen-Oksi S, Jussila J, Kärkkäinen H (2016) Social media based value creation and business models. Ind Manag Data Syst 116(8):816–838. https://doi.org/10.1108/imds-05-2015-0199
18. Oestreicher-Singer G, Zalmanson L (2012) Content or Community? a digital business strategy for content providers in the social age. MIS Q Manag Inf Syst 37(2):591–616. https://doi.org/10.2139/ssrn.1536768
19. Dasí A, Elter F, Gooderham P, Pedersen T (2017) New business models in-the-making in extant MNCs: digital transformation in a telco: opportunities and consequences. Adv Int Manag 30:29–53. https://doi.org/10.1108/S1571-502720170000030001
20. Brody P, Pureswaran V (2015) The next digital gold rush: How the internet of things will create liquid, transparent markets. Strategy Leadersh 43:36–41
21. Chanias S, Myers M, Hess T (2019) Digital transformation strategy making in pre-digital organizations: the case of a financial services provider. J Strateg Inf Syst 28(1):17–33. https://doi.org/10.1016/j.jsis.2018.11.003
22. Singh A, Hess T (2017) How chief digital officers promote the digital transformation of their companies. MIS Q Exec 16(1):1–17
23. Matt C, Hess T, Benlian A (2015) Digital transformation strategies. Bus Inf Syst Eng 57(5):339–343. https://doi.org/10.1007/s12599-015-0401-5
24. Bharadwaj A (2013) Digital business strategy: toward a next generation of insights. MIS Q 37(2):471–482
25. Berman S (2012) Digital transformation: opportunities to create new business models. Strategy Leadersh 40(2). https://doi.org/10.1108/10878571211209314
26. Johnson B, Turner LT (2003) Data collection strategies in mixed methods research. In: Handbook of mixed methods in social & behavioral research. SAGE Publications Inc., pp 297
27. Chenail R (2011) Ten steps for conceptualizing and conducting qualitative research studies in a pragmatically curious manner. Qual Rep 16(6):715–1730
28. Myers M (2013) Qualitative research in business and management, 2nd edn. SAGE Publications Inc., California
29. Abrams LS (2010) Sampling ‘hard to reach’ populations in qualitative research: the case of incarcerated youth. Qual Soc Work 9(4):536–550. https://doi.org/10.1177/1473325010367821
30. Attride-Stirling J (2001) Thematic networks: an analytic tool for qualitative research. Qual Res 1(3):385–405. https://doi.org/10.1177/146879410100100307
31. Berghaus S, Back A (2017) Disentangling the fuzzy front end of digital transformation: activities and approaches. In: Thirty eighth international conference on information systems. ICIS 2017 Proceedings, South Korea, pp 1–17
32. Fitzgerald M, Kruschwitz N, Bonnet D, Welch M (2013) Embracing digital technology: a new strategic imperative. MIT Sloan Management Review
Evaluation of Face Detection and Recognition Methods in Smart Mirror Implementation
Muhammad Bagus Satrio, Aji Gautama Putrada, and Maman Abdurohman

Abstract The smart mirror is an emerging technology that can serve various purposes, among them reporting missing children to the authorities. It is a device that functions as a mirror, built from a two-way mirror with an electronic display behind it, and has additional capabilities such as processing and presenting multimedia data. This research includes designing a Raspberry Pi-based smart mirror whose main function is to detect faces and document physical appearances, which are later presented in an Android application. A comparison of face detection and recognition methods is carried out between the Haar Cascade method and the Local Binary Pattern method. The test results show that each method has an advantage in identification speed depending on the number of users: Haar Cascade is superior for two or fewer people, and Local Binary Pattern is superior for three or more people in one iteration process.

Keywords Smart mirror · Raspberry Pi · Haar Cascade · Local Binary Pattern
M. B. Satrio (✉) · M. Abdurohman
School of Computing, Telkom University, Bandung 40257, Indonesia

A. G. Putrada
Advanced and Creative Network Research Center, Telkom University, Bandung 40257, Indonesia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_39

1 Introduction
According to [1, 2], a smart mirror is a device that functions as a mirror, built from a two-way mirror with an electronic display behind it, and has additional capabilities such as processing and presenting multimedia data, including text, images, and video. It can also be used in other fields such as medicine [3], home security [4], and even child abduction cases [5, 6]. Several interfaces can be connected to the system, such as voice via a Voice User Interface (VUI) [7] or vendor services such as Amazon Alexa [8], and the system can also be made ambient by embedding sensors such as ultrasonic sensors [9]. A smart mirror developed with two main components, a microprocessor and a two-way glass [10], has the main feature of documenting a child’s clothing when the child looks in the mirror. To retrieve the documentation, the mirror detects the user’s identity with the help of OpenCV, a library for image recognition [11]. The digital images tested and used in OpenCV for face detection of the documented children are processed with the Haar Cascade and Local Binary Pattern (LBP) methods, with datasets trained using the Histogram of Oriented Gradients (HOG) method, as in [12, 13].

The motivation of this research is to evaluate the performance of Haar Cascade and LBP as object detection and recognition algorithms in a smart mirror implementation. Evaluation is done by examining the Pearson correlation between the defined metrics and presumed related variables. The Pearson correlation table is the novelty and contribution of this research. Moreover, the two methods are compared with a t-test to conclude whether or not they differ significantly in performance.

The paper is organised as follows: Sect. 1 is the introduction, Sect. 2 discusses the system design, Sect. 3 discusses the evaluation, and Sect. 4 contains the conclusion of the research conducted.
2 System Design
The design of the smart mirror device and system in this research is explained in Figs. 1 and 2, from the user taking a face photo in the Android application to the Android application presenting a documentation photo from the smart mirror [14]. The components used in the smart mirror device are a web camera, a Raspberry Pi 3 Model B+, a 24-inch monitor, and 5 V and 12 V power adapters [1]. The smart mirror is connected to an Android phone app via the Internet [15–17]. Facial recognition testing is done by comparing two detection methods, namely Haar Cascade and Local Binary Pattern (LBP) [18, 19]. Tests are carried out with several presumed related variables, as follows.

Object movement. The test is carried out in two distinct conditions: in the first, the object under test is moving; in the second, the object under test is idle.

Number of datasets. The number of datasets used in the test was varied. There are three different test scenarios, namely with 10, 20, and 30 datasets.

Number of objects. The number of objects tested at the same time was varied during testing. The number of objects tested was 1, 2, 3, 4, and 5 people.

Facial recognition testing cannot be done in dark or dim conditions, so all results were obtained in bright conditions.
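The paper does not list its detection code. As background for the comparison, the basic 3×3 Local Binary Pattern operator on which LBP-based methods build can be sketched in plain Python. This is a minimal sketch under stated assumptions: the function names, the clockwise neighbour ordering, and the `>=` threshold are illustrative choices, and library implementations typically use block-based or histogram-based variants rather than this bare form.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of a 3x3 grayscale patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left pixel. The ordering is an
    # assumption; any fixed ordering works if it is used consistently.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit to 1.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return a 256-bin histogram of LBP codes over the interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


example_patch = [[6, 5, 2],
                 [7, 6, 1],
                 [9, 8, 7]]
example_code = lbp_code(example_patch)
```

Because each code depends only on brightness comparisons with the centre pixel, the resulting histogram is insensitive to uniform changes in illumination, which is one reason the operator is cheap to compute on a small board.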
Fig. 1 The smart mirror system flow
Fig. 2 The smart mirror system architecture
The Pearson correlation coefficient r is calculated as

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left(n\sum x^{2} - (\sum x)^{2}\right)\left(n\sum y^{2} - (\sum y)^{2}\right)}} \tag{1} $$

where, in Eq. 1, n is the amount of data, x is the independent variable, and y is the dependent variable. This value indicates whether the number of objects, the condition of the object, and the number of object datasets, as variables X, affect the test results on the test subjects, as variables Y. The results of each test subject are compared using the t-value of the t-test,

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^{2}}{n_1} + \frac{s_2^{2}}{n_2}}} \tag{2} $$

where $n_1$ is the number of observations in sample 1, $n_2$ is the number of observations in sample 2, $\bar{x}_1$ is the average of sample 1, $\bar{x}_2$ is the average of sample 2, $s_1^{2}$ is the variance of sample 1, and $s_2^{2}$ is the variance of sample 2. The t value obtained is then compared with the one-tail critical value, which indicates whether the null hypothesis is rejected or accepted according to whether t-stat is greater than t-critical one tail.

The test subjects of facial recognition testing are the following.

Percentage of faces identified and authenticated. The percentage of tests in which a face is identified and is not in the Unknown condition. The Unknown condition is one where a face is fully identified but cannot be authenticated against any dataset, so the system cannot recognise the owner of the face.

Percentage of faces not identified by the camera. The percentage of cases in which the object is within the camera’s view but the face cannot be identified.

False positive. A face that has been identified but whose owner is authenticated incorrectly.

Recognition runtime. The time in seconds taken by the 30-iteration test, which differs depending on the number of objects present when the test took place.

Dataset training runtime. The time in minutes taken to train a dataset of 50, 100, and 150 faces, which is required before testing is performed. Face data training uses the HOG method [20].
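Equations 1 and 2 can be checked numerically with a short sketch. The function names below are hypothetical, and the use of the unbiased (n − 1) sample variance is an assumption, since the paper only refers to "the variance" of each sample.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, written term by term as in Eq. 1."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1); an assumed convention."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def t_statistic(sample_1, sample_2):
    """Two-sample t value with separate variances, as in Eq. 2."""
    mean_1 = sum(sample_1) / len(sample_1)
    mean_2 = sum(sample_2) / len(sample_2)
    standard_error = math.sqrt(
        sample_variance(sample_1) / len(sample_1)
        + sample_variance(sample_2) / len(sample_2)
    )
    return (mean_1 - mean_2) / standard_error
```

For instance, `pearson_r([1, 2, 3], [2, 4, 6])` evaluates to 1.0 for perfectly linear data, and `t_statistic` returns 0 when both samples are identical. The critical value against which t is compared comes from a t-distribution table or a statistics package and is not computed here.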
3 Evaluation
Smart mirror testing is carried out by running all the tools that have been designed and assembled, as in Fig. 3. Documentation results taken by the smart mirror can be presented through the Android application, as shown in Fig. 3.

Fig. 3 Front view, rear view, and Android app of the smart mirror

Table 1 Pearson correlation result of presumed related variables
Variable | Identified and authenticated faces | Face not identified | False positive | Face recognition runtime
Object movement | −0.307 | 0.236 | 0.101 | 0.016
Number of datasets | 0.009 | −0.220 | 0.089 | 0.021
Number of objects | −0.221 | 0.310 | −0.051 | 0.926

Facial recognition testing is carried out by running the program for 30 iterations per testing session. In one test, the maximum number of faces that can be identified is 30 times the number of objects in front of the camera; with five people, for example, the maximum number of faces that can be identified is 150. In total, 120 tests were carried out by combining two methods, three groups of datasets, five object counts, light and dark conditions, and stationary and moving objects. The test results are as follows.
3.1 Pearson Correlation Test of Presumed Related Variables
The Pearson correlations in Table 1 measure the effect of each testing variable on each testing subject. Only the number of objects shows a strong relation, and only with the face recognition runtime, with a Pearson value of 0.926. This correlation is further illustrated in Fig. 4.

Fig. 4 HOG training time compared to number of datasets, and Haar Cascade and LBP runtime compared to the number of objects

The HOG dataset training results are shown in the left part of Fig. 4; the training time differs markedly with the number of face datasets. Training was carried out three times with different numbers of datasets: 50 datasets took 22.39 min, 100 datasets took 33.34 min, and 150 datasets took 43.79 min. The more datasets are trained, the more time training takes. The Haar Cascade and LBP runtime results are shown in the right part of Fig. 4; the facial recognition runtime depends strongly on the number of objects in the test. From these results, it can be concluded that LBP has a higher intercept but a lower slope than Haar Cascade.
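The relation between dataset size and HOG training time can be checked with a least-squares line and a Pearson coefficient computed from the three reported training times. The sketch below uses only the standard library; the helper name `linear_fit` is an assumption for illustration, and a fit through three points is indicative rather than conclusive.

```python
import math


def linear_fit(xs, ys):
    """Return (slope, intercept, pearson_r) of a least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r


# Training times reported in the text (number of datasets, minutes).
dataset_sizes = [50, 100, 150]
training_minutes = [22.39, 33.34, 43.79]

slope, intercept, r = linear_fit(dataset_sizes, training_minutes)
print(f"slope = {slope:.3f} min per dataset, intercept = {intercept:.2f} min, r = {r:.4f}")
```

With these three points the fitted slope is roughly 0.21 minutes per additional dataset, which is consistent with the statement that training time grows with the number of datasets.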
3.2 T-Test of Result of Haar Cascade and LBP Method Measurements
Data generated from the object condition variables and the number of datasets are combined into one sample unit for each subject: faces identified and authenticated, faces not identified, and false positives.
For faces identified and authenticated (top left graphic of Fig. 5), the two methods are compared using a t-test, which produces a t-stat value smaller than t-critical one tail, as in Table 2. The null hypothesis is therefore not rejected: Haar Cascade and LBP show no significant difference for this subject.
For faces not identified (top right graphic of Fig. 5), the t-test again produces a t-stat value smaller than t-critical one tail, as in Table 2, so the two methods show no significant difference for this subject either.
For false positives (bottom graphic of Fig. 5), the t-stat value is also smaller than t-critical one tail, as in Table 2, so Haar Cascade and LBP show no significant difference in false positives.

Table 2 T-test result of Haar Cascade and LBP comparison
Subject | t-Stat | t-critical one tail | Conclusion
Identified and authenticated faces | −0.40134515 | 1.67942739 | The null hypothesis is accepted
Unidentified faces | −1.12755381 | 1.67202889 | The null hypothesis is accepted
False positives | 1.61110920 | 1.67155276 | The null hypothesis is accepted

Fig. 5 PDF comparison results of Haar Cascade and LBP method in identified and authenticated faces (top left), not identified (top right), and false positive (bottom)
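The decision rule applied to Table 2 can be written out directly. The sketch below uses the t-stat and t-critical values reported in the table; the dictionary layout and function name are illustrative assumptions. It also reports the margin to the critical value, which shows how close each comparison came to rejection.

```python
# (t-stat, t-critical one tail) as reported in Table 2.
TABLE_2 = {
    "Identified and authenticated faces": (-0.40134515, 1.67942739),
    "Unidentified faces": (-1.12755381, 1.67202889),
    "False positives": (1.61110920, 1.67155276),
}


def decide(t_stat, t_critical):
    """One-tailed rule from the text: reject only if t-stat exceeds t-critical."""
    return "rejected" if t_stat > t_critical else "not rejected"


decisions = {}
for subject, (t_stat, t_critical) in TABLE_2.items():
    decisions[subject] = decide(t_stat, t_critical)
    margin = t_critical - t_stat
    print(f"{subject}: null hypothesis {decisions[subject]} (margin {margin:.4f})")
```

The false positive comparison has by far the smallest margin, so it is the subject for which additional test data would be most informative.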
4 Conclusion
A smart mirror equipped with a face recognition and identification system and connected to an Android app has been successfully implemented. Of the four variables presumed to be related to the three key performance measures, the only relation that was confirmed involves the recognition and identification runtime. Additionally, the t-tests show that Haar Cascade and LBP do not differ significantly in terms of identified and recognized faces, unidentified faces, and false positive faces, because the t-stat value of each test was smaller than the corresponding t-critical one tail value.
References
1. Gold D, Sollinger D (2016, October) SmartReflect: a modular smart mirror application platform. In: 2016 IEEE 7th annual information technology, electronics and mobile communication conference (IEMCON). IEEE, pp 1–7. https://doi.org/10.1109/IEMCON.2016.7746277
2. Luce L (2018) Artificial intelligence for fashion: How AI is revolutionizing the fashion industry. Apress
3. Miotto R, Danieletto M, Scelza JR, Kidd BA, Dudley JT (2018) Reflecting health: smart mirrors for personalized medicine. NPJ Digit Med 1(1):1–7. https://doi.org/10.1038/s41746-018-0068-7
4. Nadaf RA, Hatture S, Challigidad PS, Bonal VM (2019, June) Smart mirror using raspberry pi for human monitoring and home security. In: International conference on advanced informatics for computing research. Springer, Singapore, pp 96–106. https://doi.org/10.1007/978-981-15-0111-1_10
5. Siripala RMBN, Nirosha M, Jayaweera PADA, Dananjaya NDAS, Fernando SGS (2017) Raspbian magic mirror-a smart mirror to monitor children by using Raspberry Pi technology. Int J Sci Res Publ 7(12):281–295
6. Lampinen JM, Miller JT, Dehon H (2012) Depicting the missing: prospective and retrospective person memory for age progressed photographs. Appl Cognit Psychol 26(2):173–197. https://doi.org/10.1002/acp.1819
7. Yusri MMI, Kasim S, Hassan R, Abdullah Z, Ruslai H, Jahidin K, Arshad MS (2017, May) Smart mirror for smart life. In: 2017 6th ICT international student project conference (ICTISPC). IEEE, pp 1–5. https://doi.org/10.1109/ICT-ISPC.2017.8075339
8. Purohit N, Mane S, Soni T, Bhogle Y, Chauhan G (2019, May) A computer vision based smart mirror with virtual assistant. In: 2019 international conference on intelligent computing and control systems (ICCS). IEEE, pp 151–156. https://doi.org/10.1109/ICCS45141.2019.9065793
9. Johri A, Jafri S, Wahi RN, Pandey D (2018, December) Smart mirror: a time-saving and affordable assistant. In: 2018 4th international conference on computing communication and automation (ICCCA). IEEE, pp 1–4. https://doi.org/10.1109/CCAA.2018.8777554
10. Sun Y, Geng L, Dan K (2018, January) Design of smart mirror based on Raspberry Pi. In: 2018 international conference on intelligent transportation, big data & smart city (ICITBS). IEEE, pp 77–80. https://doi.org/10.1109/ICITBS.2018.00028
11. Laganière R (2014) OpenCV computer vision application programming cookbook second edition. Packt Publishing Ltd
12. Moskvil J (2017) The intelligent mirror-a personalized smart mirror using face recognition. Master’s thesis, NTNU
13. Oo SLM, Oo AN (2019, November) Child face recognition with deep learning. In: 2019 international conference on advanced information technologies (ICAIT). IEEE, pp 155–160. https://doi.org/10.1109/AITC.2019.8921152
14. Guo-Hong S (2014, October) Application development research based on android platform. In: 2014 7th international conference on intelligent computation technology and automation. IEEE, pp 579–582. https://doi.org/10.1109/ICICTA.2014.145
15. Jin K, Deng X, Huang Z, Chen S (2018, May) Design of the smart mirror based on raspberry pi. In: 2018 2nd IEEE advanced information management, communicates, electronic and automation control conference (IMCEC). IEEE, pp 1919–1923. https://doi.org/10.1109/IMCEC.2018.8469570
16. Montaño JL (2016) Android powered smart mirror device
17. Mittal DK, Verma V, Rastogi R (2017) A comparative study and new model for smart mirror. Int J Sci Res Res Paper Comput Sci Eng 5(6):58–61. https://doi.org/10.26438/ijsrcse/v5i6.5861
18. Cuimei L, Zhiliang Q, Nan J, Jianhua W (2017, October) Human face detection algorithm via Haar cascade classifier combined with three additional classifiers. In: 2017 13th IEEE international conference on electronic measurement & instruments (ICEMI). IEEE, pp 483–487. https://doi.org/10.1109/ICEMI.2017.8265863
19. Zhang B, Gao Y, Zhao S, Liu J (2009) Local derivative pattern versus local binary pattern: face recognition with high-order local pattern descriptor. IEEE Trans Image Process 19(2):533–544. https://doi.org/10.1109/TIP.2009.2035882
20. Shu C, Ding X, Fang C (2011) Histogram of the oriented gradient for face recognition. Tsinghua Sci Technol 16(2):216–224. https://doi.org/10.1016/S1007-0214(11)70032-3
Comparative Analysis of Grid and Tree Topologies in Agriculture WSN with RPL Routing
Febrian Aji Pangestu, Maman Abdurohman, and Aji Gautama Putrada
Abstract Agricultural Internet of Things depends heavily on the performance of its Wireless Sensor Network (WSN). The Routing Protocol for Low-Power and Lossy Networks (RPL) is an IPv6-based routing protocol that was developed to provide more addresses and lower power consumption for sensor nodes in a WSN. This research compares the performance of grid and tree topologies with the RPL routing protocol on the Cooja simulator. The parameters evaluated are power consumption, routing metric, Expected Transmission Count (ETX), throughput, and delay. The results show that RPL performs better with the grid topology than with the tree topology across the parameters tested. For throughput, the grid topology with 20, 30, and 42 nodes achieves 901 bps, 722 bps, and 678 bps, better than the tree topology, which achieves 812 bps, 697 bps, and 531 bps.
Keywords WSN · RPL · Cooja simulator · Grid topology · Tree topology · Power consumption · Routing metric · ETX
1 Introduction
RPL is a proactive tree-based routing protocol that builds a directed acyclic graph (DAG) between leaf nodes and sink nodes [1–4]. RPL is the routing protocol most often used on 6LoWPAN networks. Devices that can use RPL include ESB, Sky, and Zolertia, all of which can be simulated with the Cooja simulator [5].
F. A. Pangestu (B) · M. Abdurohman
School of Computing, Telkom University, Bandung 40257, Indonesia
e-mail: [email protected]; [email protected]
M. Abdurohman
e-mail: [email protected]
A. G. Putrada
Advanced and Creative Networks Research Center, Telkom University, Bandung 40257, Indonesia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_40
Several studies have been conducted on RPL. In [6], grid and random topologies are compared. In [7], random and grid topologies in an area of 300 m × 300 m are compared. In [8], the performance of five topologies, namely chain, linear, circle, random top, and random center, is compared. In [9], four topologies, namely random, manual, linear, and elliptical, are compared. In [10], which compares star, mesh, and tree topologies, the tree topology gives the best results for the parameters tested, namely packet loss, throughput, delay, and energy consumption.
The case study in this work is a prototype design of a farm monitoring system using a WSN with the RPL routing protocol, in which grid and tree topologies are used to measure the performance of RPL. Topology is one of the factors that must be considered when implementing a WSN because it affects performance. The OS used is Contiki OS because it supports low-power and lossy networks (LLN) as well as IPv6 and IPv4 [11]. Contiki OS provides low-power Internet communication that is widely used in the WSN domain [13]. Besides Contiki OS, TinyOS can also implement RPL [12]; the simulation software used with TinyOS is TOSSIM. Cooja is an open-source Java-based simulator for wireless sensor networks. It combines hardware platform definitions with application programming through the API provided by Contiki OS at the software level [6, 14, 15]. According to [16], Cooja is used more widely than TOSSIM in RPL simulation studies.
This study aims to compare the grid and tree topologies using the Cooja simulator, with agricultural land as the case study, in terms of power, routing metric, ETX, throughput, and delay, and to analyze which topology is better suited to a WSN.
The paper is organized as follows. Section 1 is the introduction, Sect. 2 describes the design of the system, Sect. 3 explains the tests and their results, and Sect. 4 concludes the research.
2 System Design
This part explains the design flow. In the first stage of RPL routing, IPv6 initializes all nodes, including the sink (root). The sink then sends a DIO to the nodes within its range to form a DODAG [17, 18]. A node that wants to join sends a DAO to the sink to ask permission to join, and the sink replies with a DAO-ACK to signal that the node can join. After the connection is established, all nodes send data packets whose final destination is the sink. The PCAP file is then processed with Wireshark to obtain the average throughput, and the delay value is obtained by processing the PCAP file with Microsoft Excel. After that, the performance of the RPL routing protocol is analyzed. The system design is shown in Fig. 1.
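The join sequence described above can be illustrated with a deliberately simplified model. The sketch below floods a DIO-like advertisement outward from the sink and lets each node adopt the first advertiser it hears as its preferred parent, with rank counted in hops. This is an assumption made for illustration only: real RPL computes rank through an objective function (for example one based on ETX) and exchanges DAO and DAO-ACK messages, none of which is modelled here. The node names and the link table are hypothetical.

```python
from collections import deque


def build_dodag(neighbours, sink):
    """Return (rank, parent) dictionaries from a DIO-like flood starting at the sink."""
    rank = {sink: 0}
    parent = {sink: None}
    queue = deque([sink])
    while queue:
        sender = queue.popleft()            # sender advertises the DODAG
        for node in neighbours[sender]:
            if node not in rank:            # first advertisement heard: join via sender
                rank[node] = rank[sender] + 1
                parent[node] = sender
                queue.append(node)
    return rank, parent


def route_to_sink(parent, node):
    """Follow preferred parents upward until the sink is reached."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


# Hypothetical radio links (who can hear whom).
links = {
    "sink": ["a", "b"],
    "a": ["sink", "c"],
    "b": ["sink", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
rank, parent = build_dodag(links, "sink")
print(rank)
print(route_to_sink(parent, "d"))
```

Every data packet then travels along the parent chain returned by `route_to_sink`, which is why the shape of the topology directly affects hop count and therefore delay.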
Fig. 1 Design flow
2.1 Simulator Design
Topology formation is a very important function in RPL [19]: before the nodes transmit data to each other, the network must allow time for the topology to form. The number of nodes is chosen based on the area. In the grid topology, the maximum number of nodes for an area of 100 m × 100 m is 56, but 56 nodes are too many for the tree topology, so the maximum number of nodes is set to 42. With 42 nodes the grid topology has a symmetrical shape; 20 and 30 nodes are chosen for the other scenarios because they also give a symmetrical shape. The designs of the grid and tree topologies can be seen in Fig. 2.
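A symmetrical grid layout of the kind described here can be generated programmatically. The sketch below places nodes evenly over a 100 m × 100 m square and lists the neighbours within the 50 m transmission range given in the simulation parameters. The row and column counts (4 × 5, 5 × 6, 6 × 7) are assumptions chosen only because they produce 20, 30, and 42 nodes; the paper does not state the exact layouts or spacing.

```python
import math


def grid_positions(rows, cols, side_m=100.0):
    """Evenly spaced (x, y) positions covering a side_m by side_m square."""
    dx = side_m / (cols - 1)
    dy = side_m / (rows - 1)
    return [(c * dx, r * dy) for r in range(rows) for c in range(cols)]


def neighbours_in_range(positions, index, range_m):
    """Indices of all other nodes within range_m of node `index`."""
    x0, y0 = positions[index]
    return [
        i
        for i, (x, y) in enumerate(positions)
        if i != index and math.hypot(x - x0, y - y0) <= range_m
    ]


# Assumed rows x cols for each scenario (not given in the paper).
layouts = {20: (4, 5), 30: (5, 6), 42: (6, 7)}

for node_count, (rows, cols) in layouts.items():
    positions = grid_positions(rows, cols)
    corner_neighbours = neighbours_in_range(positions, 0, 50.0)
    print(node_count, len(positions), len(corner_neighbours))
```

Counting the neighbours of a corner node gives a rough feel for how many candidate parents a node has in each scenario.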
2.2 Simulator Parameters
The simulation scenario in this study runs the RPL routing protocol on the grid and tree topologies in the Cooja simulator to assess the performance of each topology. The radio medium used is the Unit Disk Graph Medium (UDGM) with distance loss, chosen because it can adjust the success rate of the Tx/Rx ratio, which makes the simulation more realistic [20]. Each simulation runs for about 15 minutes: in the first 2 minutes IPv6 initializes all the nodes, and it then takes about 3–12 minutes to form the DODAG of each topology. The mote simulated in this study is the Sky mote. The simulation parameters are listed in Table 1.

Fig. 2 Scenarios with 20 (top), 30 (middle), and 42 (bottom) nodes

Table 1 Simulation parameters
Parameter | Value
Wireless channel module | UDGM distance loss
Radio duty cycle | ContikiMAC
MAC | IEEE 802.15.4
Routing protocol | RPL
Transport protocol | UDP
Topology | Grid, Tree
Number of nodes | 20, 30, 42
Area | 100 m × 100 m
Time | 15 min
Mote type | Sky mote
Transmission range | 50 m
Tx/Rx ratio | 100%
2.3 Performance Testing Scenario
The performance measures tested are power consumption, routing metric, ETX, throughput, and delay [21, 22]. The routing metric is a value that indicates the quality of a transmission route [23]. The ETX value is the expected number of transmissions needed for a packet to arrive at its destination without error. The smaller the routing metric and ETX, the better the result.
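ETX is commonly computed per link as the reciprocal of the product of the forward and reverse delivery ratios, and per path as the sum of its link values. The sketch below follows that common definition; it is not taken from the Contiki implementation, and the delivery ratios are hypothetical.

```python
def link_etx(forward_delivery_ratio, reverse_delivery_ratio):
    """Expected transmissions on one link: 1 / (d_f * d_r)."""
    for ratio in (forward_delivery_ratio, reverse_delivery_ratio):
        if not 0.0 < ratio <= 1.0:
            raise ValueError("delivery ratios must be in the interval (0, 1]")
    return 1.0 / (forward_delivery_ratio * reverse_delivery_ratio)


def path_etx(links):
    """Path ETX is the sum of the ETX of every link on the path."""
    return sum(link_etx(forward, reverse) for forward, reverse in links)


# Hypothetical two-hop path: one perfect link, one link losing half its packets.
example_path = [(1.0, 1.0), (0.5, 1.0)]
print(path_etx(example_path))
```

A perfect link contributes an ETX of 1, so lower totals indicate routes that need fewer retransmissions.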
3 Results and Analysis
3.1 Power Evaluation
In each test, the grid topology has a lower average power than the tree topology. Moreover, as the number of nodes rises, the power of the tree topology increases with a larger slope. The results are shown in the top left graph of Fig. 3.
Fig. 3 Result of average power, ETX, and routing metric
3.2 Routing Metric and ETX Evaluation
The ETX results for the grid and tree topologies show no significant increase as nodes are added, which suggests that ETX is not directly influenced by the number of nodes in a WSN. However, in each test the grid has lower values than the tree, indicating better performance. The routing metric results for both topologies increase with the number of nodes. The increase appears quadratic, indicating that the routing metric grows faster than linearly as nodes are added to the network. Moreover, in each test the grid topology shows lower values than the tree topology, indicating better performance. The graphs of the routing metric and ETX results are shown in Fig. 3.
3.3 Throughput Evaluation
The grid topology has a higher throughput than the tree topology in each scenario: 901 bps, 722 bps, and 678 bps for the grid, against 812 bps, 697 bps, and 531 bps for the tree. The highest throughput is obtained by the 20-node grid with 901 bps, and the lowest by the 42-node tree with 531 bps. The throughput of the two topologies is shown in Fig. 4.
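The size of the throughput advantage can be expressed as a relative gain of the grid over the tree for each scenario. The short calculation below uses only the throughput values reported above; the variable and function names are illustrative.

```python
NODE_COUNTS = (20, 30, 42)
GRID_BPS = (901, 722, 678)   # reported grid throughput per scenario
TREE_BPS = (812, 697, 531)   # reported tree throughput per scenario


def relative_gain_percent(grid_bps, tree_bps):
    """Percentage by which the grid throughput exceeds the tree throughput."""
    return 100.0 * (grid_bps - tree_bps) / tree_bps


gains = {
    nodes: round(relative_gain_percent(grid, tree), 1)
    for nodes, grid, tree in zip(NODE_COUNTS, GRID_BPS, TREE_BPS)
}
print(gains)
```

The gain is about 11% at 20 nodes, under 4% at 30 nodes, and close to 28% at 42 nodes, so the difference between the topologies is largest in the biggest network.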
Fig. 4 Results of average throughput and delay
3.4 Delay Evaluation
The grid topology has a smaller average delay than the tree topology in each scenario: 0.64 s, 0.838 s, and 0.95 s for the grid, against 0.68 s, 0.84 s, and 1.16 s for the tree. The delay of each topology is shown in Fig. 4.
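Average throughput and delay of the kind reported here reduce to simple arithmetic over per-packet records. The sketch below assumes each record holds a send time, a receive time, and a payload size; this record layout and the sample values are hypothetical and do not reproduce the PCAP processing used in the study.

```python
def average_throughput_bps(packets, duration_s):
    """Total received bits divided by the observation time."""
    total_bits = 8 * sum(size_bytes for _, _, size_bytes in packets)
    return total_bits / duration_s


def average_delay_s(packets):
    """Mean of (receive time - send time) over all packets."""
    return sum(received - sent for sent, received, _ in packets) / len(packets)


# Hypothetical records: (sent_s, received_s, size_bytes).
packets = [
    (0.0, 0.5, 100),
    (1.0, 1.7, 100),
    (2.0, 2.9, 100),
]
throughput = average_throughput_bps(packets, duration_s=3.0)
delay = average_delay_s(packets)
print(f"throughput = {throughput:.0f} bps, delay = {delay:.2f} s")
```

Applied to every packet delivered to the sink during a simulation run, the same two functions would yield one throughput figure and one delay figure per scenario.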
4 Conclusion
An agricultural WSN using RPL routing with grid and tree topologies has been successfully implemented in the Cooja simulator. Based on the tests carried out, the grid topology is superior to the tree topology in all parameters tested: power consumption, routing metric, ETX, throughput, and delay. The power of the grid topology with 20, 30, and 42 nodes is 1,063 mW, 1,415 mW, and 2,384 mW, while that of the tree topology is 1,184 mW, 1,546 mW, and 2,877 mW. In terms of data transfer rate, the grid topology with 20, 30, and 42 nodes has an average throughput of 901 bps, 722 bps, and 678 bps, higher than the tree topology, which has an average throughput of 812 bps, 697 bps, and 531 bps. From this study, it can be concluded that the choice between grid and tree topologies affects the performance of the RPL routing protocol, and that the grid topology is recommended for agricultural WSN.
References
1. Iova O, Picco P, Istomin T, Kiraly C (2016) RPL: the routing standard for the internet of things... or is it? IEEE Commun Mag 54(12):16–22. https://doi.org/10.1109/MCOM.2016.1600397CM
2. Xie H, Zhang G, Su D, Wang P, Zeng F (2014, June) Performance evaluation of RPL routing protocol in 6lowpan. In: 2014 IEEE 5th international conference on software engineering and service science. IEEE, pp 625–628. https://doi.org/10.1109/ICSESS.2014.6933646
3. Gao L, Zheng Z, Huo M (2018, October) Improvement of RPL protocol algorithm for smart grid. In: 2018 IEEE 18th international conference on communication technology (ICCT). IEEE, pp 927–930. https://doi.org/10.1109/ICCT.2018.8600162
4. Lassouaoui L, Rovedakis S, Sailhan F, Wei A (2016, October) Evaluation of energy aware routing metrics for RPL. In: 2016 IEEE 12th international conference on wireless and mobile computing, networking and communications (WiMob). IEEE, pp 1–8. https://doi.org/10.1109/WiMOB.2016.7763212
5. Hendrawan INR (2018) Analisis Kinerja Protokol Routing RPL pada Simulator Cooja. Jurnal Sistem dan Informatika (JSI) 12(2):9–18
6. Saputra AH, Trisnawan PH, Bakhtiar FA (2018) Analisis Kinerja Protokol 6LoWPAN pada Jaringan Sensor Nirkabel dengan Topologi Jaringan Grid dan Topologi Jaringan Random Menggunakan Cooja Simulator. Jurnal Pengembangan Teknologi Informasi dan Ilmu Komputer e-ISSN 2548:964X
7. Ullah R, Faheem Y, Kim BS (2017) Energy and congestion-aware routing metric for smart grid AMI networks in smart city. IEEE Access 5:13799–13810. https://doi.org/10.1109/ACCESS.2017.2728623
8. Tran H, Vo MT, Mai L (2018, October) A comparative performance study of RPL with different topologies and MAC protocols. In: 2018 international conference on advanced technologies for communications (ATC). IEEE, pp 242–247. https://doi.org/10.1109/ATC.2018.8587445
9. Gokilapriya V, Bhuvaneswari PTV (2017, March) Analysis of RPL routing protocol on topology control mechanism. In: 2017 fourth international conference on signal processing, communication and networking (ICSCN). IEEE, pp 1–5. https://doi.org/10.1109/ICSCN.2017.8085693
10. Amalina EN, Setijadi E, Perbandingan Topologi WSN (Wireless Sensor Network) Untuk Sistem Pemantauan Jembatan
11. Hicham A, Sabri A, Jeghal A, Tairi H (2017) A comparative study between operating systems (Os) for the Internet of Things (IoT). Trans Mach Learn Artif Intell 5(4). https://doi.org/10.14738/tmlai.54.3192
12. Ghaleb B, Al-Dubai AY, Ekonomou E, Alsarhan A, Nasser Y, Mackenzie LM, Boukerche A (2018) A survey of limitations and enhancements of the ipv6 routing protocol for low-power and lossy networks: a focus on core operations. IEEE Commun Surv Tutor 21(2):1607–1635. https://doi.org/10.1109/COMST.2018
13. Hassani AE, Sahel A, Badri A (2019, October) Assessment of a proactive routing protocol RPL in Ipv6 based wireless sensor networks. In: 2019 third international conference on intelligent computing in data sciences (ICDS). IEEE, pp 1–7. https://doi.org/10.1109/ICDS47004.2019.8942364
14. Wang ZM, Li W, Dong HL (2018) Analysis of energy consumption and topology of routing protocol for low-power and lossy networks. J Phys Conf Ser 1087(5)
15. Zikria YB, Afzal MK, Ishmanov F, Kim SW, Yu H (2018) A survey on routing protocols supported by the Contiki Internet of things operating system. Futur Gener Comput Syst 82:200–219. https://doi.org/10.1016/j.future.2017.12.045
16. Kim HS, Ko J, Culler DE, Paek J (2017) Challenging the IPv6 routing protocol for low-power and lossy networks (RPL): a survey. IEEE Commun Surv Tutor 19(4):2502–2525
17. Banh M, Mac H, Nguyen N, Phung KH, Thanh NH, Steenhaut K (2015, October) Performance evaluation of multiple RPL routing tree instances for Internet of Things applications. In: 2015 international conference on advanced technologies for communications (ATC). IEEE, pp 206–211. https://doi.org/10.1109/ATC.2015.7388321
18. Tian H, Qian Z, Wang X, Liang X (2017) QoI-aware DODAG construction in RPL-based event detection wireless sensor networks. J Sens. https://doi.org/10.1155/2017/1603713
19. Kalyani S, Vydeki D (2018, December) Measurement and analysis of QoS parameters in RPL network. In: 2018 tenth international conference on advanced computing (ICoAC). IEEE, pp 307–312. https://doi.org/10.1109/ICoAC44903.2018.8939052
20. Santos AL, Cervantes CA, Nogueira M, Kantarci B (2019) Clustering and reliability-driven mitigation of routing attacks in massive IoT systems. J Internet Serv Appl 10(1):18. https://doi.org/10.1186/s13174-019-0117-8
21. Rawat P, Singh KD, Chaouchi H, Bonnin JM (2014) Wireless sensor networks: a survey on recent developments and potential synergies. J Supercomput 68(1):1–48. https://doi.org/10.1007/s11227-013-1021-9
22. Kaur T, Kumar D (2020) A survey on QoS mechanisms in WSN for computational intelligence based routing protocols. Wireless Netw 26(4):2465–2486. https://doi.org/10.1007/s11276-019-01978-9
23. Lokare VT, Thorat SA (2015, January) Cooperative Opportunistic Routing based on ETX metric to get better performance in MANET. In: 2015 international conference on computer communication and informatics (ICCCI). IEEE, pp 1–6. https://doi.org/10.1109/ICCCI.2015.7218067
Designing a Monitoring and Prediction System of Water Quality Pollution Using Artificial Neural Networks for Freshwater Fish Cultivation in Reservoirs
R Raden Muhamad Irvan, Maman Abdurohman, and Aji Gautama Putrada
Abstract If a reservoir is polluted, the water can be dangerous for the fish that live in it. The threat can be mitigated through prediction. In this study, a system based on the Internet of Things (IoT) is proposed to predict water pollution in reservoirs and to monitor changes in water quality values. Water quality data are obtained from several sensors and microcontrollers. The data are sent to the Thingspeak IoT platform and used to train an artificial neural network (ANN) model that predicts freshwater pollution in the reservoir. The data that have been sent are displayed in Thingspeak. The highest training accuracy was obtained with Epoch = 600, Learning Rate = 0.1, Momentum = 0.1, and Training Data Percentage = 85%. Prediction testing with these training variables gives an average accuracy of 97.67%.
Keywords Water pollution · Artificial neural networks · Internet of Things · Freshwater fish · Prediction · pH sensor · Temperature sensor · Turbidity sensor
1 Introduction
The water in a reservoir is used by the residents who live nearby in their daily lives [14]. The water quality must be maintained in accordance with its standards. Reservoir water treatment is one solution: preventing and detecting pollution and predicting the possibility of water pollution [10, 16].
R. R. M. Irvan (B) · M. Abdurohman
School of Computing, Telkom University, Bandung 40257, Indonesia
e-mail: [email protected]
M. Abdurohman
e-mail: [email protected]
A. G. Putrada
Advanced and Creative Networks Research Center, Telkom University, Bandung 40257, Indonesia
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_41
For predicting pollution in water, an artificial neural network (ANN) is applicable [7, 8, 10, 11]. An ANN makes predictions by recognizing patterns in several water quality parameters, such as potential of hydrogen (pH), temperature, and turbidity, over time. An ANN is an information processing paradigm inspired by biological nervous systems, such as the way the brain processes information [2, 3, 9, 12, 15]. The Internet of Things (IoT) is a rapidly growing concept [17, 19]. IoT can help people in their lives and simplify their activities. This paper addresses two problems: how to implement an ANN in a web-based IoT system for predicting water pollution in reservoirs, and what prediction accuracy the ANN achieves on validation data after being trained on the training data.
2 System Design
2.1 Water Quality
Water that is to be managed must meet certain quality parameters, which have to be considered before the water is used [1]. For freshwater fish farming, the required pH is in the range of 6.5–8 [7]. The standard turbidity level for freshwater fish farming is in the range of 25–400 JTU (Jackson Turbidity Unit) [7]; another source states that the desired turbidity value for fish culture was 32
NTU ◦C
2.2 Workflow
The general workflow of the designed system is shown in Fig. 1 and is explained step by step below. The flow starts with the water quality data from each sensor. The sensors in the system measure pH, turbidity, and water temperature and are connected to an Arduino Uno and a NodeMCU. Once the data is obtained, it is sent to the MQTT (Mosquitto) broker using publish and subscribe. After the data has been sent, it is analyzed by the neural network, which is implemented in Python.

Fig. 1 Design flow

Several hardware components support the system: the pH sensor, turbidity sensor, temperature sensor, Arduino Uno, and NodeMCU. The hardware infrastructure designed in this paper is shown in Fig. 2. The turbidity sensor measures the turbidity level. Arduino is a microcontroller board that reads inputs, usually from sensors, processes them in its built-in CPU, and produces the desired output. NodeMCU is similar, except that it packages the ESP8266 on a board that integrates a microcontroller, Wi-Fi access, and a USB-to-serial communication chip.

Fig. 2 System block diagram

Thingspeak is an IoT platform used for monitoring systems. In this study, Thingspeak displays the water quality data and the prediction results, while the indicator light shows that the water is predicted to become polluted. IFTTT is a service that allows actions to be carried out automatically across two or more applications, devices, and services. Figure 3 shows the Thingspeak and IFTTT displays.

Fig. 3 Thingspeak IoT platform (left) and IFTTT push notification (right)
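The publish and subscribe pattern used between the sensor node and the analysis side can be illustrated with a tiny in-memory broker. This is only a sketch of the messaging pattern: it is not MQTT and does not use Mosquitto or any MQTT client library. The class name, topic name, and sensor readings are all hypothetical.

```python
from collections import defaultdict


class TinyBroker:
    """In-memory stand-in for a broker: routes payloads to topic subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Deliver the payload to every callback registered for this topic.
        for callback in self._subscribers[topic]:
            callback(topic, payload)


received = []
broker = TinyBroker()

# The analysis side subscribes to the (hypothetical) water quality topic.
broker.subscribe("reservoir/water-quality", lambda topic, payload: received.append(payload))

# The sensor node publishes one reading (illustrative values only).
broker.publish(
    "reservoir/water-quality",
    {"ph": 7.1, "turbidity": 30.0, "temperature_c": 27.5},
)
print(received)
```

The sensor side never needs to know who consumes the data, which is what lets the monitoring display and the prediction model both receive the same readings.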
2.3 Artificial Neural Network
The training in this system uses backpropagation with an ANN architecture consisting of an input layer of 3 neurons, a hidden layer of 8 neurons, and an output layer of 1 neuron. Backpropagation training passes through three phases: feedforward, backpropagation of the error, and adjustment of the weights and biases. The feedforward phase first computes the output and its error; the error is then propagated backward to correct the weights. During the feedforward phase, an activation function determines the output value of each neuron. The ANN training uses the binary sigmoid activation function on the hidden layer and the linear activation function on the output layer. Figure 4 shows the ANN architecture used in this study. Several parameters play an important role in the training process:
1. Epoch is the maximum number of iterations in the ANN training process.
2. Learning Rate affects the rate of change of each neuron's bias and weight values.
3. Momentum adds a fraction of the previous update to the current update of each bias and weight.
4. Training Data Percentage is the share of the data samples used as training data.
The initial value of each parameter is Epoch = 1000, Learning Rate = 0.2, Momentum = 0.8, and 20% of the sample data used for training, with the remaining 80% used for validation.
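The three training phases can be sketched for the 3-8-1 architecture in plain Python. The code below is a minimal illustration under stated assumptions: a squared-error loss, uniform random initial weights, inputs already scaled to the range 0 to 1, and a single hypothetical training sample. It is not the authors' implementation, which the paper describes as a multilayer perceptron classifier written in Python.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """3-8-1 network: binary sigmoid hidden layer, linear output layer."""

    def __init__(self, n_in=3, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        # Previous updates, reused by the momentum term.
        self.dw1 = [[0.0] * n_in for _ in range(n_hidden)]
        self.db1 = [0.0] * n_hidden
        self.dw2 = [0.0] * n_hidden
        self.db2 = 0.0

    def forward(self, x):
        """Feedforward phase: returns hidden activations and the linear output."""
        hidden = [
            sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        output = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, output

    def train_step(self, x, target, learning_rate=0.1, momentum=0.1):
        hidden, output = self.forward(x)
        error = output - target  # derivative of 0.5 * error^2 w.r.t. the output
        # Backpropagation of error through the sigmoid hidden layer.
        delta_hidden = [error * w * h * (1.0 - h) for w, h in zip(self.w2, hidden)]
        # Adjustment phase: gradient step plus a fraction of the previous update.
        for j, h in enumerate(hidden):
            self.dw2[j] = -learning_rate * error * h + momentum * self.dw2[j]
            self.w2[j] += self.dw2[j]
        self.db2 = -learning_rate * error + momentum * self.db2
        self.b2 += self.db2
        for j, delta in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.dw1[j][i] = -learning_rate * delta * xi + momentum * self.dw1[j][i]
                self.w1[j][i] += self.dw1[j][i]
            self.db1[j] = -learning_rate * delta + momentum * self.db1[j]
            self.b1[j] += self.db1[j]
        return 0.5 * error * error


# Hypothetical scaled sample: (pH, temperature, turbidity) and a target value.
net = TinyMLP()
sample, target = [0.5, 0.2, 0.8], 1.0
losses = [net.train_step(sample, target) for _ in range(200)]
print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.2e}")
```

Repeating `train_step` over a full dataset for a fixed number of passes corresponds to the Epoch parameter, while `learning_rate` and `momentum` play the roles described in the list above.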
Fig. 4 ANN architecture
3 Results and Analysis
The Artificial Neural Network (ANN) was trained using the Python programming language. The water quality parameter data used in training consisted of water temperature, turbidity, and water pH, amounting to approximately 7360 records. The accuracy of the ANN predictions is measured by an approach called reflexive accuracy. The type of training used is the Multilayer Perceptron Classifier.

The parameters were tested one at a time, and the best value from each test was carried into the next one. The first test is performed on the Epoch variable, where 70% of the sample data is used for training, while 30% is used for validation. The highest accuracy is found at Epoch = 600, with an accuracy of 99%, so Epoch = 600 is used for the Learning Rate test. The second test is performed on the Learning Rate variable. The highest accuracy is found at Learning Rate = 0.1, with an accuracy of 99%, so Learning Rate = 0.1 is used for the Momentum test. The third test is performed on the Momentum variable. The highest accuracy is found at Momentum = 0.1, with an accuracy of 99%, so Momentum = 0.1 is used for the Training Data Percentage test. The fourth test is performed on the Training Data Percentage variable. The highest accuracy is found at Training Data Percentage = 85%, with an accuracy of 99%. Table 2 shows the results of all the tests.

The last model is used for real environment tests. The prediction testing gives an accuracy of 97.67%. The results of the predictions can be seen in Table 3. This result is benchmarked against the results of Kusuma et al. [10], whose study required 6000 data samples and reached a prediction accuracy of 57.80%, whereas the predictions made in this study reach 97.67%. Overfitting is usually caused by insufficient data. Besides adding training data, methods such as cross-validation, early stopping, and regularization can be used to prevent overfitting [6, 13, 18].
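Of the overfitting remedies listed above, early stopping is the simplest to illustrate. The following is a minimal sketch, not the training code used in this study: the function name, the `patience` parameter and the validation losses are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the best epoch seen so far is returned.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # training would stop here
    return best_epoch


# Hypothetical validation losses: they fall, then rise as the model overfits.
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.38]
print(early_stopping_epoch(losses, patience=3))  # 3
```

With a patience of three epochs, training stops at epoch 6 and never sees the later value 0.38. This shows the usual trade-off: a small patience saves time but may stop before a late improvement.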
4 Conclusion
A monitoring and prediction system for water quality pollution, using artificial neural networks for freshwater fish cultivation in reservoirs, has been successfully built. The implementation in this research is limited to a laboratory-scale test environment. The highest training accuracy in each parameter test was 99%, and the prediction accuracy obtained was 97.67%.
Table 2 Testing results

No.  Epoch                      Training data accuracy (%)
1    200                        98
2    400                        95
3    600                        99
4    800                        98
5    1000                       97

No.  Learning rate              Training data accuracy (%)
1    0.1                        99
2    0.2                        98
3    0.4                        98
4    0.7                        97
5    1.0                        98

No.  Momentum                   Training data accuracy (%)
1    0.1                        99
2    0.2                        98
3    0.4                        98
4    0.7                        97
5    1.0                        79

No.  Training data percentage   Training data accuracy (%)
1    10                         98
2    30                         97
3    50                         98
4    70                         98
5    85                         99

Table 3 Prediction result

Data         Prediction value  Real value  Accuracy (%)
pH           5.33              5.30        99.44
Temperature  26.90             26.44       99.81
Turbidity    20.79             19.49       93.75
Mean                                       97.67
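The selections reported in the text can be re-derived from the tabulated values. The sketch below copies the numbers from Tables 2 and 3; the dictionary layout and variable names are illustrative assumptions. It picks the best setting of each test and checks that the reported mean is the arithmetic mean of the three per-parameter accuracies.

```python
# Training accuracy (%) per tested setting, copied from Table 2.
table2 = {
    "epoch": {200: 98, 400: 95, 600: 99, 800: 98, 1000: 97},
    "learning_rate": {0.1: 99, 0.2: 98, 0.4: 98, 0.7: 97, 1.0: 98},
    "momentum": {0.1: 99, 0.2: 98, 0.4: 98, 0.7: 97, 1.0: 79},
    "training_data_percentage": {10: 98, 30: 97, 50: 98, 70: 98, 85: 99},
}

# For each test, keep the setting with the highest accuracy.
best = {name: max(results, key=results.get) for name, results in table2.items()}
print(best)

# Per-parameter prediction accuracy (%), copied from Table 3.
table3_accuracy = {"pH": 99.44, "temperature": 99.81, "turbidity": 93.75}
mean_accuracy = sum(table3_accuracy.values()) / len(table3_accuracy)
print(round(mean_accuracy, 2))  # 97.67
```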
References
1. Bhawiyuga A, Yahya W (2019) Sistem monitoring kualitas air kolam budidaya menggunakan jaringan sensor nirkabel berbasis protokol lora. Jurnal Teknologi Informasi dan Ilmu Komputer 6(1):99–106
2. Cynthia EP, Ismanto E (2017) Jaringan syaraf tiruan algoritma backpropagation dalam memprediksi ketersediaan komoditi pangan provinsi riau. Rabit: Jurnal Teknologi dan Sistem Informasi Univrab 2(2):83–98
3. Djebbri N, Rouainia M (2017) Artificial neural networks based air pollution monitoring in industrial sites. In: 2017 international conference on engineering and technology (ICET). IEEE, pp 1–5
4. Febriwahyudi CT, Hadi W (2012) Resirkulasi air tambak bandeng dengan slow sand filter. Jurnal Teknik Pomits 1(1):1–5
5. Frasawi A, Rompas RJ, Watung JC (2013) Potensi budidaya ikan di waduk embung klamalu kabupaten sorong provinsi papua barat: Kajian kualitas fisika kimia air. e-Journal BUDIDAYA PERAIRAN 1(3)
6. Hammoodi AI, Al-Azzo F, Milanova M, Khaleel H (2018) Bayesian regularization based ANN for the design of flexible antenna for uwb wireless applications. In: 2018 IEEE conference on multimedia information processing and retrieval (MIPR). IEEE, pp 174–177
7. Hidayatullah M, Fat J, Andriani T, Sumbawa UT, Prototype sistem telemetri pemantauan kualitas air pada kolam ikan air tawar berbasis mikrokontroler. Jurnal Positron Universitas Tanjung Pura 8(2)
8. Hizham FA, Nurdiansyah Y et al (2018) Implementasi metode backpropagation neural network (bnn) dalam sistem klasifikasi ketepatan waktu kelulusan mahasiswa (studi kasus: Program studi sistem informasi universitas jember). BERKALA SAINSTEK 6(2):97–105
9. Khataee AR, Kasiri MB (2010) Artificial neural networks modeling of contaminated water treatment processes by homogeneous and heterogeneous nanocatalysis. J Mol Catal A Chem 331(1–2):86–100
10. Kusuma PD, Setianingsih C et al (2019) River water pollution pattern prediction using a simple neural network. In: 2019 IEEE international conference on industry 4.0, artificial intelligence, and communications technology (IAICT). IEEE, pp 118–124
11. Li XY, Bu FJ (2012) Prediction of settlement of soft clay foundation in highway using artifical neural networks. In: Advanced materials research, vol 443, pp 15–20. (Trans Tech Publ, 2012)
12. Meinanda MH, Annisa M, Muhandri N, Suryadi K (2009) Prediksi masa studi sarjana dengan artificial neural network. Internetworking Indones J 1(2):31–35
13. Moayedi H, Osouli A, Nguyen H, Rashid ASA (2019) A novel harris hawks’ optimization and k-fold cross-validation predicting slope stability. Eng Comput 1–11
14. Purnomosutji Dyah Prinajati (2019) Kualitas air waduk jatiluhur di purwakarta terhadap pengaruh keramba jaring apung. J Community Based Environ Eng Manag 3(2):78–86
15. Sari Y (2017) Prediksi harga emas menggunakan metode neural network backpropagation algoritma conjugate gradient. Jurnal Eltikom 1:2
16. Suharto B, Dewi L, Mustaqiman AN, Marjo TRAK (2019) The study of water quality status in the ngebrong river with physical and chemical parameters in the Tawangsari Barat region, Pujon district, Malang regency. Indones J Urban Environ Technol 2(2):164–180
17. Talari S, Shafie-Khah M, Siano P, Loia V, Tommasetti A, Catalão JPS (2017) A review of smart cities based on the internet of things concept. Energies 10(4):421
18. Thike PH, Zhao Z, Liu P, Bao F, Jin Y, Shi P (2020) An early stopping-based artificial neural network model for atmospheric corrosion prediction of carbon steel. CMC-Comput Mater Contin 65(3):2091–2109
19. Vijayakumar N, Ramya R (2015) The real time monitoring of water quality in IoT environment. In: 2015 international conference on innovations in information, embedded and communication systems (ICIIECS). IEEE, pp 1–5
Sentence-Level Automatic Speech Segmentation for Amharic Rahel Mekonen Tamiru and Solomon Teferra Abate
Abstract The extraction of information from a large archive requires extracting both the audio file structure and its linguistic content. One of these processes is adding sentence boundaries to the automatic transcription of speech content. In this work, we present an automatic sentence-level speech segmentation system for the Amharic language. We used an Amharic read speech corpus and a spontaneous speech corpus to develop the system. Segmentation is performed by forced alignment, for which monosyllable, tied-state tri-syllable, and monophone acoustic models were developed. A rule-based method and AdaBoost were used to distinguish accurate boundaries from candidates. The experiments show encouraging segmentation results using forced alignment with the monosyllable acoustic model. We achieved the best results using a decision tree classifier, with a segmentation accuracy of 91.93% for read-aloud speech and 85% for spontaneous speech.
Keywords Sentence segmentation · Acoustic model · Decision tree classifier · Support vector machine · AdaBoost · Forced alignment
R. M. Tamiru (B), Bahir Dar University, Bahir Dar, Ethiopia
S. T. Abate, Addis Ababa University, Addis Ababa, Ethiopia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_42

1 Introduction
Current research in the field of speech technology aims to develop efficient speech systems that can be used for communication between people and devices for processing information. Unfortunately, the ability of a computer to understand speech is still weak. Since human speech is generated continuously, the aspect of speech that most challenges machines is its segmentation. Several speech processing systems require segmentation of the speech waveform into principal acoustic units (phonemes, syllables, sentences, and paragraphs). In the field of speech technology, this is a primary phase. The main purpose of segmentation is to use its outcome in other areas of speech research. In areas such as speech recognition, speech synthesis, language generation-based systems, language identification, and speaker identification, speech segmentation is an important preprocessing phase [15]. Therefore, a well-organized, precise, and simple technique is required.

When addressing the segmentation task, it must be considered that speech, especially spontaneous speech, is not as clearly organized as written text. For both humans and machines, it is hard to identify sentence units in continuous speech. Jones found that sentence breaks are important for the legibility of speech transcripts [10]. In addition, missing sentence segmentation makes the meaning of certain utterances vague. Similarly, Kolar found that missing sentence boundaries cause major problems for automatic downstream processes [11]. Sentence boundaries are also significant when evaluating the syntactic complexity of speech, which can be a strong indicator of disability [6]. Manual segmentation of speech is costly, boring, error-prone, and time-consuming [5]. Given these and other facts, it is becoming increasingly important to improve automatic speech segmentation.

Similar studies on speech segmentation have been performed for different languages. However, no previous work has been done on sentence-level speech segmentation for Amharic. While there are similarities between languages in the structure and function of prosody, cross-linguistic comparison of characteristics suggests major differences [16, 13]. We do not know which features, or combinations of features, will result in optimal segmentation of Amharic speech.
In this paper, we present our work on creating an acoustic model and using it to facilitate the segmentation process and to obtain segmented and labeled Amharic speech data. We used two classification methods to detect sentence boundaries. The first is rule-based and the second is a statistical method, AdaBoost. AdaBoost is a popular boosting approach that combines weak base classifiers into a strong classifier; the idea of boosting is that several weak learners together yield an effective classifier. AdaBoost can use several base classifiers, with the decision tree classifier as its default [4]. In our experiments, we used a decision tree classifier and a support vector classifier (SVC) as base estimators.

In the following section, we review some previous works that have developed automatic speech segmentation systems. Section 3 gives a brief overview of the Amharic language considered in this study. Section 4 presents the Amharic corpora used in our studies. The results of the automatic speech segmentation experiments are summarized in Sect. 5. Finally, we give closing remarks and potential future studies to strengthen the work done in this study.
2 Related Works
Automatic speech segmentation has been developed for other languages by different researchers, and most of these studies used only prosodic features taken directly from the speech to define sentence boundaries. One study addressed the segmentation of spontaneous Malay speech [8], using acoustic and prosodic features to detect sentence boundaries. Other works include automatic segmentation of speech into sentence-like units [11], voice segmentation without voice recognition [14], and prosody-based automatic segmentation of speech into sentences and topics. No sentence-level speech segmentation comparable to that done for other languages exists for Ethiopian languages. While languages share similarities in the type and function of prosody, such as pauses and the natural tendency of F0, cross-linguistic comparison of characteristics suggests major differences, such as different timing of essentially comparable phenomena, different relationships between F0 or different mutual effects of F0, duration, and intensity [16]. The research papers above did not use forced alignment for sentence segmentation. The purpose of this research is therefore to select the best unit for acoustic models in order to design a speech segmenter with minimal error that can help in the creation of an efficient speech segmentation system.
3 Amharic Language
Amharic is the official working language of the government of Ethiopia, out of the 89 languages registered in the country. Next to Arabic, Amharic is the second largest Semitic language spoken in the world [3]. With at least 27 million native speakers, it is one of the most commonly spoken Semitic languages in Ethiopia. On the other hand, it has been described as an under-resourced language because of very limited research on acoustic characteristics and spoken language technology and a lack of electronic tools for speech and language processing, such as transcribed speech data, monolingual corpora, and pronunciation dictionaries [1]. Amharic has five dialects, named after the places where they are spoken: Addis Ababa, Gojjam, Wollo, Gondar, and Shewa. The speech of Addis Ababa has emerged as the standard dialect and has wide currency in all Amharic-speaking communities. The basic word order of Amharic is SOV. Amharic is one of the languages that have their own writing system, a semi-syllabic system called Fidel. Each of its about 33 primary characters represents a consonant and takes seven vowel forms, resulting in 231 CV syllables with 196 different pronunciations. Fidel is used across all Amharic dialects [7].
3.1 The Amharic Consonants and Vowels
The Amharic language has 38 phonemes: 7 vowels and 31 consonants. An additional consonant, /v/, is inherited, giving a total of 39 phonemes [2]. In general, consonants are categorized as stops, fricatives, nasals, liquids, and semi-vowels [1]. Vowels are categorized by the position and height of the tongue and by the shape of the lips during speech production. Based on the position of the tongue in the oral cavity, vowels are divided into three classes: front, central, and back. Based on the height of the tongue, they are graded as high, middle, and low. Based on lip shape during speech production, they are divided into two sub-classes, rounded and unrounded [1, 9]. Amharic has a total of 7 vowels: the five most common vowels, a, e, i, o, and u, plus two additional central vowels, E and I.
4 Corpus Preparation
Speech and text corpora are among the most essential resources for developing any speech segmentation system. Collecting structured and annotated corpora is one of the most complicated and costly activities when dealing with under-resourced languages [12]. To obtain a suitable speech corpus, Amharic speech and the related text were gathered from the Amharic Bible, broadcast news, broadcast conversation, and Amharic fiction. The collected corpus contains over 5 h of audio, stored in 40 audio files, along with its text corpus. In the text corpus, spelling and grammar errors are corrected, abbreviations are expanded, and numbers are transcribed textually. We produced two corpora corresponding to two types of speech: a read speech corpus and a spontaneous speech corpus. The first corpus was created from the existing broadcast news corpus and the Amharic Bible, while the second was created by combining broadcast conversation and Amharic fiction (fikr skemekaber). In this work, 4000 Amharic speech sentences and the corresponding text are gathered for training.
5 Speech Segmentation System
The general model of the automatic sentence-level speech segmentation framework comprises data preparation, HMM model building, forced alignment, and sentence boundary detection.
5.1 Data Preparation
The steps for preparing data for the speech segmentation system are data collection, manual segmentation, lexicon preparation, pronunciation dictionary preparation, and feature extraction.

Manual segmentation The next critical step in data processing, once the speech and the corresponding text corpus are available, is manual segmentation. One of the main objectives of this work was the development of a sentence-level speech segmentation system for Amharic. Since no speech corpora with sentence break annotation existed, such corpora had to be prepared as the very first part of our work. The text and speech corpora are divided into training and testing sets, and manual segmentation is performed for both. We therefore manually segmented the collected speech at the sentence level. For training, 2000 spontaneous speech sentences and 2000 read speech sentences are segmented, while for testing, 400 speech sentences from both styles of speech are segmented. For use in the HTK environment, we transliterated the text transcriptions of both the training and testing sets into their corresponding ASCII representation.

Lexicon preparation For our automatic speech segmentation system, lexicons are prepared as pronunciation dictionaries, with entries built from letter sequences. For instance, an Amharic word written with three orthographic characters is transcribed in ASCII as "faTara".

Pronunciation dictionary Taking the lexicons as input, we prepared the pronunciation dictionaries.

Feature extraction The final stage of data preparation is feature extraction, which parameterizes the raw speech waveforms into sequences of feature vectors. We use Mel Frequency Cepstral Coefficients (MFCC) to parameterize the speech signals into feature vectors with 39 coefficients.
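The text does not say how the 39 coefficients are composed. A common configuration, assumed here and not confirmed by the paper, is 13 static coefficients per frame plus their first-order (delta) and second-order (acceleration) regression coefficients. The sketch below appends those dynamic features to already computed static frames, using the standard regression formula over a window of two frames on each side. The function names and the window size are illustrative.

```python
def deltas(frames, window=2):
    """Regression-based delta coefficients for a list of feature frames."""
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(len(frames[0])):
            num = 0.0
            for k in range(1, window + 1):
                ahead = frames[min(t + k, n - 1)][d]  # repeat last frame at the end
                behind = frames[max(t - k, 0)][d]     # repeat first frame at the start
                num += k * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def add_dynamic_features(static):
    """Concatenate static, delta and acceleration coefficients per frame."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Hypothetical input: 6 frames of 13 static coefficients forming a linear ramp.
static = [[float(t)] * 13 for t in range(6)]
features = add_dynamic_features(static)
print(len(features[0]))  # 39
```

For a linear ramp the delta of an interior frame equals the slope, which is a quick sanity check on the regression formula.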
5.2 HMM Model Building
Using Amharic speech and its text transcripts, we developed the acoustic model, a statistical representation of the sounds that make up words. All the acoustic models were constructed in the same manner using the HTK toolkit [17]. The basic units of speech used in our research are the syllable and the phone. We use HMMs to model the acoustic component. Three modeling methods are used: the syllable-based acoustic model, the phone-based acoustic model, and the tied-state acoustic model. We first developed a context-independent (monophone) acoustic model, in which each phone is represented by a 3-state left-to-right HMM without skips. Then we initialized the model and created the monosyllable model with the flat start technique. In the next step, the tri-syllable model was derived by cloning and then re-estimating the respective monosyllable models using the tri-syllable transcription.
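The 3-state left-to-right topology without skips can be made concrete as a transition matrix. The sketch below is an illustration under stated assumptions: the initial self-loop probability of 0.6 is a hypothetical starting value that re-estimation would replace, and the non-emitting entry and exit states of an HTK prototype are folded into a single exit column.

```python
def left_to_right_transitions(num_states=3, self_loop=0.6):
    """Transition matrix for a left-to-right HMM without skip transitions.

    Row i holds P(next state | current state i). The extra last column is
    the probability of leaving the model from the final emitting state.
    """
    size = num_states + 1  # emitting states plus one exit column
    matrix = []
    for i in range(num_states):
        row = [0.0] * size
        row[i] = self_loop            # stay in the same state
        row[i + 1] = 1.0 - self_loop  # advance to the next state (or exit)
        matrix.append(row)
    return matrix


for row in left_to_right_transitions():
    print(row)
```

Every entry other than the self-loop and the step to the immediate successor is zero, which is what "left-to-right without skips" means: a state can neither jump over its successor nor return to an earlier state.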
5.3 Forced Alignment
To align the written words with the spoken words, we used the monosyllable, tied-state tri-syllable, and monophone acoustic models. Different acoustic models can produce slightly different forced alignment results.
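Downstream steps need the alignment as a list of timed units. The sketch below assumes the alignment is stored in HTK label format, one `start end label` entry per line with times in units of 100 ns. The paper does not show its alignment files, so the sample lines and unit names here are hypothetical.

```python
def parse_alignment(lines):
    """Parse 'start end label' lines (times in 100 ns units) into ms tuples."""
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        # 1 ms equals 10,000 units of 100 ns.
        segments.append((start / 10_000, end / 10_000, label))
    return segments


# Hypothetical alignment output for a short utterance.
sample = [
    "0 3500000 sil",
    "3500000 5200000 fa",
    "5200000 7000000 Ta",
    "7000000 8600000 ra",
    "8600000 20600000 sil",
]
for segment in parse_alignment(sample):
    print(segment)
```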
5.4 Speech Segmenter
The speech segmenter produces the automatically segmented results for the test data set. In our initial sentence segmentation experiment, a simple rule-based classification method is used. The segmenter performs the actual segmentation of the continuous speech into sentence-level files and generates a corresponding transcription file. In our second experiment, features are extracted from the forced alignment, and AdaBoost is then used to distinguish between false and true boundaries. We used a decision tree classifier and a support vector classifier (SVC) as base estimators.
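To make the boosting step concrete, the sketch below implements AdaBoost from scratch with decision stumps (depth-one decision trees) as weak learners. It is an illustration of the general algorithm, not the authors' implementation: the paper used decision tree and SVC base estimators, and the one-dimensional toy data here is hypothetical rather than taken from the Amharic corpus.

```python
import math


def fit_stump(X, y, w):
    """Return (error, feature, threshold, polarity) of the best decision stump.

    A stump predicts `polarity` when x[feature] >= threshold, else -polarity.
    """
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                error = sum(
                    wi
                    for wi, xi, yi in zip(w, X, y)
                    if (polarity if xi[feature] >= threshold else -polarity) != yi
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def stump_predict(stump, x):
    _, feature, threshold, polarity = stump
    return polarity if x[feature] >= threshold else -polarity


def adaboost_fit(X, y, rounds):
    """Train `rounds` stumps, re-weighting misclassified samples each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        error = min(max(stump[0], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)   # vote of this weak learner
        ensemble.append((alpha, stump))
        w = [
            wi * math.exp(-alpha * yi * stump_predict(stump, xi))
            for wi, xi, yi in zip(w, X, y)
        ]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def training_accuracy(ensemble, X, y):
    return sum(adaboost_predict(ensemble, x) == t for x, t in zip(X, y)) / len(X)


# Toy data that no single threshold can separate (+1 = true boundary).
X = [[v] for v in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

print(training_accuracy(adaboost_fit(X, y, rounds=1), X, y))  # 0.7
print(training_accuracy(adaboost_fit(X, y, rounds=3), X, y))  # 1.0
```

A single stump classifies only seven of the ten samples correctly, while the weighted vote of three stumps classifies all of them. This is the sense in which boosting turns weak learners into a strong classifier.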
5.5 Experimental Results
In our initial experiment, the speech segmenter takes two main inputs: the forced alignment and an audio file. Pause features are then extracted from the forced alignment. The pause duration is calculated and compared against a threshold to decide whether a sentence boundary is present. In this first experiment, we used the rule-based method for sentence segmentation. If the characteristics of a boundary candidate are assessed as TRUE, the candidate marks a sentence boundary; if they are assessed as FALSE, it does not. To detect sentence boundaries, we tried different threshold values (10,000, 500, 800, 1000 ms). In this experiment, we set the minimum pause for a sentence break to 1000 ms (1 s). In the second experiment, pause features are extracted from the forced alignment, and a statistical method, AdaBoost, is then applied to all candidates for sentence boundaries.

We test our methods with two corpora, Amharic read-aloud speech and spontaneous speech. All experiments were evaluated using human-generated reference transcripts. The experimental output is presented in terms of Sentence Segmentation Accuracy (SSA). Table 1 compares the automatic sentence segmentation experiments with the monosyllable, tied-state syllable, and monophone acoustic models on read-aloud and spontaneous speech using the rule-based approach. Table 2 shows the same comparison for the statistical approach, AdaBoost.

Table 1 Segmentation accuracy of read-aloud and spontaneous speech based on rule-based method

Acoustic model                      Read-aloud speech SSA (%)  Spontaneous speech SSA (%)
Monosyllable acoustic model         69.1                       53
Tied-state syllable acoustic model  61                         47
Monophone acoustic model            56                         45

Table 2 Segmentation accuracy of read-aloud and spontaneous speech based on statistical method, AdaBoost

Acoustic model               Classifier                Read-aloud speech SSA (%)  Spontaneous speech SSA (%)
Monosyllable acoustic model  Decision tree classifier  91.93                      85
                             SVM classifier            84.3                       79
Tied-state acoustic model    Decision tree classifier  88.93                      82.7
                             SVM classifier            80.08                      70
Monophone acoustic model     Decision tree classifier  82.2                       80.6
                             SVM classifier            80.2                       76

As can be seen from Tables 1 and 2, on the monosyllable acoustic model we achieved 69.1% and 53% accuracy for read-aloud and spontaneous speech, respectively, using the rule-based method, and 91.93% and 85% using the decision tree classifier. The accuracy of the monophone acoustic model is low in both experiments. These results indicate that different acoustic models produce slightly different forced alignment outcomes: the better the acoustic model, the more accurate the forced alignment. Therefore, the researcher believes that the best acoustic model for Amharic speech to achieve correct forced alignment is the monosyllable acoustic model. The overall system efficiency is affected by the forced alignment. The experiments indicate that the AdaBoost classifier achieves greater accuracy than the rule-based method. The AdaBoost classifier consistently showed good results, especially with the decision tree as base classifier. The processing time with the decision tree classifier is also shorter than with the support vector classifier, and it gave the best results in the majority of our experiments. The researcher therefore believes that the decision tree classifier is a better classification method than the support vector classifier (SVC) for the segmentation of Amharic speech.
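The rule-based experiment in this section reduces to a test on pause duration. The sketch below applies the 1000 ms minimum pause to aligned segments given as `(start_ms, end_ms, label)` tuples. The pause label names and the choice of the pause midpoint as the cut point are assumptions made for illustration; the paper does not specify them.

```python
PAUSE_LABELS = {"sil", "sp"}  # assumed names of the silence / short-pause units


def sentence_boundaries(segments, min_pause_ms=1000):
    """Return midpoints (ms) of pauses long enough to count as sentence breaks."""
    boundaries = []
    for start_ms, end_ms, label in segments:
        is_pause = label in PAUSE_LABELS
        if is_pause and (end_ms - start_ms) >= min_pause_ms:
            boundaries.append((start_ms + end_ms) / 2)
    return boundaries


# Hypothetical forced-alignment output: only the 1200 ms pause qualifies.
segments = [
    (0, 300, "sil"),
    (300, 900, "fa"),
    (900, 2100, "sil"),
    (2100, 2600, "Ta"),
    (2600, 3000, "sp"),
    (3000, 3500, "ra"),
]
print(sentence_boundaries(segments))  # [1500.0]
```

Lowering `min_pause_ms` to 300 would also accept the two short pauses, which illustrates why the choice of threshold drives the accuracy of the rule-based method.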
6 Conclusion and Further Work
In this work, we presented the automatic sentence-level speech segmentation that we performed for the Amharic language. The automatic speech segmentation systems introduced in this paper are the first for the Amharic language. In general, segmentation is accomplished by defining the boundaries between sentences in a continuous speech signal. These findings are promising and open a broad door for further studies. We are also working in this area, using a neural network-based approach to achieve greater accuracy for these languages. To develop strong speech systems, it is also important to continue studying speech preprocessing. Studies of automatic speech segmentation based on prosodic characteristics (at the perceptual and linguistic levels) and on other distinct units (syllable, phoneme, and word level) are also required as future work.
References
1. Abate ST and Menzel W (2007) Syllable-based speech recognition for Amharic, vol 33. https://doi.org/10.3115/1654576.1654583
2. Amare G (2018) Yamariñña Säwasäw. J Ethiop Stud 28(2):55–60. http://www.jstor.org/stable/41966049
3. CSA (2007) Central Statistics Agency CSA
4. Deng H (2007) A brief introduction to adaboost, pp 1–35
5. Emiru ED and Markos D (2016) Automatic speech segmentation for amharic phonemes using hidden Markov model toolkit (HTK), 4(4):1–7
6. Fraser KC et al (2015) Sentence segmentation of aphasic speech. HLT-NAACL 2015: human language technology conference of the North American chapter of the association of computational linguistics, proceedings of the main conference, pp 862–871. https://doi.org/10.3115/v1/N15-1087
7. Girmay G (2008) Prosodic modeling for Amharic
8. Jamil N et al (2015) Prosody-based sentence boundary detection of spontaneous speech. In: Proceedings - international conference on intelligent systems modelling and simulation ISMS, 2015-Septe(July 2017):311–317. https://doi.org/10.1109/ISMS.2014.59
9. Jokisch O, Birhanu Y and Hoffmann R (2012) Syllable-based prosodic analysis of Amharic read speech, pp 258–262
10. Jones D et al (2003) Measuring the readability of automatic speech-to-text massachusetts institute of technology information systems technology group 2. System 1585–1588
11. Kolář J (2008) Automatic segmentation of speech into sentence-like units
12. Lewis WD and Yang P (2012) Building MT for a severely under-resourced language: White Hmong. In: AMTA 2012: proceedings of the 10th conference of the association for machine translation in the Americas
13. Mekonen R (2019) Prosody based authomatic speech segmentation for Amharic
14. Mulgi M, Mantri V, Gayatri M (2013) Voice segmentation without voice recognition, 2(1), 2–6
15. Ostendorf M et al. (2008) Speech segmentation and its impact on spoken language technology. IEEE Signal Process Magazine 1–20. https://doi.org/10.1117/12.877599
16. Vaissière J (2012) Language-independent prosodic features, 53–65
17. Young S et al (2009) The HTK book
Urban Change Detection from VHR Images via Deep-Features Exploitation Annarita D’Addabbo , Guido Pasquariello, and Angelo Amodio
Abstract Land cover change detection from remote sensing data is a crucial step not only in periodic environmental monitoring but also in the management of emergencies. In particular, the availability of Very High Resolution (VHR) images enables detailed monitoring on an urban, regional or larger scale. Together with the data, new methodologies able to extract useful information from them are needed. In the present work, a transfer learning technique is presented to produce change detection maps from VHR images. It is based on the exploitation of suitable deep-features computed by using some pre-trained convolutional layers of AlexNet. The proposed methodology has been tested on a data set composed of two VHR images, acquired on the same urban area in July 2015 and July 2017, respectively. The experimental results show that it is able to efficiently detect changes due to the construction of new buildings, to variation in roof materials or to vegetation cutting that has made the underlying non-vegetated areas visible. Moreover, it is robust with respect to false positives, because changes due to different occupation of parking areas or due to building shadows are not detected.
Keywords Change detection · Remote sensing · Urban monitoring · VHR images · Deep learning · Transfer learning
A. D’Addabbo (B) · G. Pasquariello, IREA-CNR, Bari, Italy
A. Amodio, Planetek Italia S.R.L, Bari, Italy
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_43

1 Introduction
Monitoring land cover changes from remote sensing data plays a key role both in periodic environmental control and in the management of emergencies due to natural disasters, such as floods, landslides and fires [1]. In order to perform land cover change detection, multi-temporal images acquired on the same geographical area are needed, together with suitable methodologies to extract useful information from the data [2]. In the remote sensing literature, many techniques have been presented to carry out land cover change detection. They can be divided into two main groups: methodologies based on the fusion of the multi-temporal information at the feature level, and methodologies based on fusion at the decision level [2]. Key differences exist between the two approaches. First of all, when the fusion is carried out at the feature level, a map with two classes is generally produced as output, “pixel changed” (ωC) and “pixel not changed” (ωNC), while it is not possible to provide information on the type of change that occurred. This happens because unsupervised classification techniques are generally used in this approach. Although these methods cannot provide information on the transitions from one land cover class to another, they have the great advantage of not requiring any a priori information about what has happened on the ground (they do not need any ground truth). On the other hand, methodologies based on fusion at the decision level can provide an output map in which the types of changes that have occurred are explicitly identified. However, being based on supervised (or partially supervised) classification techniques, they can only be used if the ground truth is available for all the images analyzed (or at least for one) [2, 3]. Unfortunately, the collection of a proper ground truth is a complex, time-consuming and expensive task [1]. Moreover, when these methodologies are used, it is essential that each classified map on the single date is extremely accurate, because large errors in the classification of each image produce a critical over-estimation of land cover changes [4]. For all these reasons, methodologies based on the fusion of information at the feature level and exploiting unsupervised classification techniques are widely used in many application domains [5–11].
These techniques are based on a preliminary step, the generation of new features that highlight useful multi-temporal information in the data [2]. These features are usually referred to as “Change Indices” (CI), since they are employed to highlight changes that occurred in bi-temporal image pairs (X1 and X2, acquired over the same area at different times t1 and t2, respectively). Different CI can be used depending on the characteristics of the input data. When optical data are considered, a difference operator is generally used, such as Univariate Image Differencing [12], Change Vector Analysis [5] and some of its variants [13]. These operators can be applied directly to the different bands of the images or to appropriate combinations of them [14, 15]. If Synthetic Aperture Radar (SAR) images are considered, the additive noise model that holds for optical images is not applicable, and the difference operator is therefore not effective. To overcome this problem, multi-temporal SAR images are generally compared using an Image Ratioing operator [12], which reduces the multiplicative noise of SAR images and is particularly efficient in change detection [16]. Another set of operators widely used in the literature to perform feature-level fusion with SAR images consists of similarity measures, such as the Kullback-Leibler divergence [17], Mutual Information [18] and Mixed Information [19].
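The three basic change indices named above can be written in a few lines. The sketch below uses plain nested lists as images. The function names, the tiny example images and the handling of zero-valued pixels in the ratio are illustrative assumptions, not the implementations used in the cited works.

```python
import math


def image_difference(x1, x2):
    """Univariate image differencing for single-band images (lists of rows)."""
    return [[b - a for a, b in zip(r1, r2)] for r1, r2 in zip(x1, x2)]


def cva_magnitude(x1, x2):
    """Change Vector Analysis magnitude; each pixel is a tuple of band values."""
    return [
        [math.sqrt(sum((b - a) ** 2 for a, b in zip(p1, p2))) for p1, p2 in zip(r1, r2)]
        for r1, r2 in zip(x1, x2)
    ]


def image_ratio(x1, x2):
    """Image ratioing, suited to multiplicative (SAR-like) noise."""
    return [
        [(b / a) if a != 0 else float("inf") for a, b in zip(r1, r2)]
        for r1, r2 in zip(x1, x2)
    ]


# Hypothetical 1 x 2 images acquired at times t1 and t2.
print(image_difference([[1, 2]], [[3, 2]]))                       # [[2, 0]]
print(cva_magnitude([[(0, 0), (1, 1)]], [[(3, 4), (1, 1)]]))      # [[5.0, 0.0]]
print(image_ratio([[4.0, 2.0]], [[8.0, 2.0]]))                    # [[2.0, 1.0]]
```

An unchanged pixel gives 0 for the difference-based indices but 1 for the ratio, so the two families need different decision rules.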
Urban Change Detection from VHR Images …
After calculating the CI, it is essential to properly analyze the extracted information to derive the change map. To perform this task, many unsupervised methods have been proposed in the literature, usually based on the selection of a decision threshold separating changed from unchanged pixels [2]. The decision threshold can be selected either with a manual trial-and-error procedure or with automatic techniques, for example following a Bayesian minimum-error/cost decision rule [5] or using methods based on fuzzy theory [20, 21]. Most of the above-mentioned methods are pixel-based: each pixel is considered independently of its neighbors. They are useful and efficient when low-resolution images are analyzed, because it is reasonable to assume that changes on the ground have dimensions comparable with the pixel spatial resolution [2]. However, when Very High spatial Resolution (VHR) images are analyzed, many challenging factors have to be considered. First of all, VHR images contain many details that do not correspond to actual changes on the ground but can be misclassified as such. For example, shadows cast by structures could be detected as changes because the images have been acquired in different periods of the year and at different hours. Similarly, the presence of cars, or vegetation at a different stage of growth, can lead to an erroneous identification of changes. More importantly, neighboring pixels are not independent, since they can belong to the same ground object [1]. For this reason, it becomes essential to consider contextual information. Recently, many methods have been proposed in the literature to deal with change detection in VHR images [10, 11, 22]. In this framework, suitable features capturing the semantically rich information contained in VHR images have to be considered.
In particular, Deep Neural Networks (DNN) have recently been shown to extract useful features from data without needing any a priori information [23]. Many DNNs have been applied in different remote sensing applications, often outperforming any other classification technique [24]. Here, we are interested in a particular application, transfer learning, in which a DNN pre-trained on one data set is used to extract suitable features from another data set belonging to a different application domain [23, 25]. These features, named deep-features, have been shown to be invariant with respect to the application domain [26] and have recently been used to classify remotely sensed images [27, 28] or to extract useful information from remotely sensed data to obtain change maps [23]. Moreover, deep-features are able to efficiently capture contextual information, so they can be particularly useful in the analysis of VHR images. In the present work, a change detection methodology based on deep-features is presented. The deep-features have been computed independently on each image in the multi-temporal data set by using AlexNet [29], a well-known convolutional neural network. Subsequently, they have been compared in order to produce the change map. In Sect. 2, the proposed methodology is discussed in detail. In Sect. 3, the available data set is described and the experimental results are presented. Finally, in Sect. 4, some conclusions are drawn.
A. D’Addabbo et al.
2 Methods
Let us consider two VHR images, named X1 and X2, acquired over the same geographical area at times t1 and t2, respectively. We want to detect the land cover changes that occurred between the two acquisitions. A block scheme of the proposed methodology is reported in Fig. 1.
2.1 Images Pre-processing
First of all, the two images have to be properly geo-referenced, ortho-rectified and co-registered, so that they can be correctly superimposed. Then, if needed, the images have to be accurately processed in order to reduce differences that do not depend on real changes on the ground. This pre-processing step usually includes radiometric and atmospheric corrections. It is conducted in different ways depending on the kind of data considered (either active SAR or passive optical).
2.2 Deep-Features Extraction
Relevant features have been computed by using some layers of a convolutional neural network. In particular, to obtain the deep-features, the images X1 and X2 are given separately as input to a pre-trained convolutional network, a deep architecture capable of capturing information with an increasing level of abstraction and complexity through its convolutional layers [23]. In the proposed methodology, each input image is analyzed independently by an AlexNet convolutional network [29], described below.
Fig. 1 Scheme of the proposed methodology
Fig. 2 AlexNet scheme, in blue the convolutional layers
AlexNet. AlexNet is a convolutional neural network that has been pre-trained by using the ImageNet data set [30], a data set composed of over 15 million labeled high-resolution images. In the training phase, 1.2 million images belonging to 1000 different classes were presented, including keyboards, mice, pencils, vegetables and many other objects [29]. AlexNet is composed of 25 layers [29], listed below and organized as shown in Fig. 2.
• input layer;
• 5 convolutional layers;
• 7 Rectified Linear Unit (ReLU) layers;
• 2 normalization layers;
• 3 pooling layers;
• 3 fully connected layers;
• 2 drop-out layers;
• 1 soft-max layer;
• output layer.
Each kind of layer has a well-defined function, contributing to high classification accuracy together with high generalization capability and a computational cost as low as possible. Here, we are interested only in the convolutional layers, which are used in the proposed methodology to extract deep-features. For this reason, they are described in detail.
AlexNet convolutional layers. The input of each convolutional layer is an array of size M × N × B, where M and N are, respectively, the number of rows and columns of the input images and B is the number of bands. The output of each convolutional layer is an array of size M1 × N1 × K, i.e. K feature maps of size M1 × N1. Each convolutional layer is composed of K trainable filters of size L × L × B, also called a filter bank, which connect the input feature map with the output feature map. The generic element $\tilde{x}_{m_1,n_1}$ of each output map is computed as
$$\tilde{x}_{m_1,n_1} = \sum_{b=1}^{B}\left(\sum_{i,j=1}^{L} f^{b}_{i,j} \ast x^{b}_{m+\left(i-l+\frac{l}{2}\right),\; n+\left(j-l+\frac{l}{2}\right)}\right) + \text{bias} \qquad (1)$$
Table 1 AlexNet convolutional layers
| Name | Input dimensions (M × N × B) | Number of filters (K) | Filter dimension (L × L × B) | Padding | Stride | Output (M1 × N1) |
| --- | --- | --- | --- | --- | --- | --- |
| Conv1 | 227 × 227 × 3 | 96 | 11 × 11 × 3 | 0 | 4/4 | 55 × 55 |
| Conv2 | 27 × 27 × 96 | 256 | 5 × 5 × 48 | 2 | 1/1 | 27 × 27 |
| Conv3 | 13 × 13 × 256 | 384 | 3 × 3 × 256 | 1 | 1/1 | 13 × 13 |
| Conv4 | 13 × 13 × 384 | 384 | 3 × 3 × 192 | 1 | 1/1 | 13 × 13 |
| Conv5 | 13 × 13 × 384 | 256 | 3 × 3 × 192 | 1 | 1/1 | 13 × 13 |
where $x_{m,n}$ is the generic element of the input array and $f^{b}_{i,j}$ is the generic element of the considered filter. Filter values and bias are the network parameters, computed from data in the training phase. In the present case, they have been determined by training AlexNet on the ImageNet data set, as previously stated. In particular, the first convolutional layer of AlexNet follows the input layer, so its input data are the input images with dimensions 227 × 227 × 3. It is composed of 96 filters with dimensions 11 × 11 × 3. Convolution is performed without padding and with a stride equal to 4 in both directions. Accordingly, the output of this layer is composed of 96 feature maps with dimensions 55 × 55, since the following relation holds:
$$M_1 = \frac{M + 2\cdot\text{padding} - L}{\text{stride}} + 1, \qquad N_1 = \frac{N + 2\cdot\text{padding} - L}{\text{stride}} + 1 \qquad (2)$$
Each convolutional layer works as the first one. In Table 1, details on each convolutional layer are reported.
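As a quick check of Eq. (2) against Table 1, the sketch below recomputes the output size of each convolutional layer from its input size, filter size, padding and stride. The helper name `conv_output_size` and the use of integer (floor) division are assumptions of this sketch.

```python
def conv_output_size(size, kernel, padding, stride):
    """Output side length from Eq. (2): (size + 2*padding - kernel) / stride + 1.

    Floor division is an assumption of this sketch for sizes that do not
    divide evenly; all the Table 1 cases divide exactly.
    """
    return (size + 2 * padding - kernel) // stride + 1

# (name, input side M, filter side L, padding, stride), taken from Table 1
layers = [
    ("Conv1", 227, 11, 0, 4),
    ("Conv2", 27, 5, 2, 1),
    ("Conv3", 13, 3, 1, 1),
    ("Conv4", 13, 3, 1, 1),
    ("Conv5", 13, 3, 1, 1),
]

for name, m, l, pad, stride in layers:
    out = conv_output_size(m, l, pad, stride)
    print(f"{name}: {m} x {m} -> {out} x {out}")
```

For Conv1 this gives (227 + 0 − 11)/4 + 1 = 55, in agreement with the 55 × 55 output reported in the table.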
2.3 Deep-Features Selection
Each convolutional layer computes K deep-features, but only some of them are relevant and useful in the change detection task. First of all, for each deep-feature the difference vector is computed as
$$D^{i}_{C_j} = F^{i}_{X_1,C_j} - F^{i}_{X_2,C_j} \qquad (3)$$
where $C_j$ is the considered convolutional layer, $F^{i}_{X_1,C_j}$ and $F^{i}_{X_2,C_j}$ are the i-th deep-features computed by that layer on X1 and X2, and the index i indicates the considered feature, e.g. if j = 1, then i = 1, …, 96. For each $D^{i}_{C_j}$, the mean $\mu_i$ and the standard deviation $\sigma_i$ are computed. Features are selected by means of a suitable threshold value $\mu_{TH}$, chosen by the user from a priori information or determined by a trial-and-error procedure, so that
$$i^{*} : \quad \mu_{i^{*}} \le -\mu_{TH} \;\lor\; \mu_{i^{*}} \ge \mu_{TH} \qquad (4)$$
The basic idea is that deep-features capturing information relevant to change detection exhibit a bi-modal distribution. In particular, it is assumed that each probability density function $p(D^{i}_{C_j})$ can be modeled as a mixture of two Gaussian distributions: the first one (with zero mean) associated with unchanged pixels and the other one (with a mean value different from zero) associated with changed pixels. Obviously, the capability to properly detect the two distributions depends on several factors, in particular on whether their mean values are sufficiently greater than their standard deviations and on whether the changed and unchanged classes are not severely unbalanced. The selected components of the difference vector are then normalized:
$$\forall i^{*}, \quad \bar{D}^{i^{*}}_{C_j} = \frac{D^{i^{*}}_{C_j} - \mu_{i^{*}}}{\sigma_{i^{*}}} \qquad (5)$$
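Subsequently, the following selection is performed in order to identify the changed pixels in each selected component:
$$\forall x \in \bar{D}^{i^{*}}_{C_j}, \quad B^{i^{*}}_{C_j}(x) = 1 \;\text{ if }\; \begin{cases} \bar{D}^{i^{*}}_{C_j}(x) \ge D_{TH}, & \mu_{i^{*}} \ge \mu_{TH} \\ \bar{D}^{i^{*}}_{C_j}(x) \le -D_{TH}, & \mu_{i^{*}} \le -\mu_{TH} \end{cases} \qquad (6)$$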
A value of 0 is assigned to the pixels that do not satisfy the condition in (6). Binary images are thus formed. Finally, a sum map is created, $S_{C_j} = \sum_{i^{*}} B^{i^{*}}_{C_j}$. This map, suitably thresholded, gives the output change map $M_{C_j}$:
$$\forall x \in S_{C_j}, \quad M_{C_j}(x) = \begin{cases} 1, & S_{C_j}(x) \ge S_{TH} \\ 0, & S_{C_j}(x) < S_{TH} \end{cases} \qquad (7)$$
3 Experimental Results
3.1 Data Set Description
A data set composed of two VHR RGB images has been considered. The images have been acquired over the same geographical area in July 2015 and July 2017, respectively. The pixel resolution is equal to 50 cm. The images have been ortho-rectified and co-registered. They depict an urban area in which some changes have occurred between the two acquisitions. The changes of interest are those due to the construction of new infrastructure; however, there are many other changes due to differences in vegetation
Fig. 3 RGB composition of the input images acquired, respectively, in a July 2015 and in b July 2017
cover or in the occupation of car parking areas. The change ground truth has been collected by photo-interpretation. In Fig. 3, the RGB compositions of the input images are reported.
3.2 Deep-Features Computation and Selection
The two input images, split into patches with dimensions 227 × 227 × 3, have been independently analyzed by using AlexNet. Only the output of the first convolutional layer has been considered, so 96 deep-features have been computed for each input image. The difference vector $D^{i}_{C_1}$ has been computed for each deep-feature by using Eq. (3), and the mean value of each difference image has then been used to select the relevant ones. After a trial-and-error procedure, a threshold value μTH = 15 has been set: 13 deep-features and the corresponding difference images have been selected, as listed in Table 2. It is worth noting that none of the 96 difference images exhibits a bimodal distribution, because the change mode is completely hidden within the no-change distribution. It is important to highlight that the selection performed by using Eq. (4) corresponds to choosing, in the AlexNet C1 filter bank, the filters most similar to a low-pass filter. In this way, only changes with a given spatial extension are pointed out. Moreover, because the distributions are unimodal, state-of-the-art methodologies such as the one described in [5] cannot be used. Two examples of difference images are reported in Fig. 4.
Table 2 Selected deep-features, their mean and standard deviation
| Deep-feature i* | Mean value μi* | Standard deviation σi* |
| --- | --- | --- |
| 4 | 17.37 | 27.83 |
| 10 | 36.44 | 69.88 |
| 11 | −22.48 | 30.38 |
| 16 | −15.33 | 15.13 |
| 23 | 21.49 | 17.81 |
| 32 | 15.74 | 32.02 |
| 39 | −39.55 | 27.80 |
| 41 | −45.18 | 79.11 |
| 43 | 67.12 | 54.77 |
| 54 | −20.97 | 20.85 |
| 56 | −18.56 | 27.96 |
| 77 | 34.48 | 35.41 |
| 93 | −19.47 | 38.25 |
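The values in Table 2 can be checked against the selection rule of Eq. (4) with μTH = 15. The short sketch below is only an illustrative consistency check; the variable names are hypothetical and the tuples simply restate the table.

```python
# (feature index, mean, standard deviation) as listed in Table 2
table2 = [
    (4, 17.37, 27.83), (10, 36.44, 69.88), (11, -22.48, 30.38),
    (16, -15.33, 15.13), (23, 21.49, 17.81), (32, 15.74, 32.02),
    (39, -39.55, 27.80), (41, -45.18, 79.11), (43, 67.12, 54.77),
    (54, -20.97, 20.85), (56, -18.56, 27.96), (77, 34.48, 35.41),
    (93, -19.47, 38.25),
]
MU_TH = 15  # threshold reported in Sect. 3.2

# Eq. (4): a feature is kept when its mean is <= -MU_TH or >= MU_TH
selected = [i for i, mu, _ in table2 if abs(mu) >= MU_TH]
positive = [i for i, mu, _ in table2 if mu >= MU_TH]
negative = [i for i, mu, _ in table2 if mu <= -MU_TH]

print(len(selected), "features pass the threshold")
print("positive mean:", positive)
print("negative mean:", negative)
```

All 13 listed features satisfy the rule; the sign of the mean then determines which branch of Eq. (6) is applied to each of them.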
Fig. 4 Difference images computed from the deep-features 41 (right) and 43 (left)
3.3 Change Map Computation
Each selected difference image has been normalized by using Eq. (5) and then binarized by using Eq. (6), with DTH = 2. Finally, the change map $M_{C_1}$ has been obtained by applying Eq. (7) with STH = 4. It is important to remark that the parameter values (μTH, DTH, STH) have been set by a trial-and-error procedure, using the ground truth collected by photo-interpretation. In Fig. 5, the change map $M_{C_1}$ is reported together with some boxes, which are discussed in the following.
Fig. 5 Change map
3.4 Discussion
The red boxes in Fig. 5 correspond to the construction of new buildings. These boxes are reported in Fig. 6, together with the corresponding areas in the change map and in the input images. The change with the greatest extent, shown in the red box labeled 1 in Figs. 5 and 6, corresponds to the construction of two new buildings between 2015 and 2017 and to the transformation of an adjacent, completely vegetated area into an area covered with bricks or asphalt. Similarly, the changes in boxes 2 and 3 correspond to the construction of new buildings. In box 4, two changes are depicted, due to the creation of a new swimming pool and the enlargement of a pre-existing building.
Fig. 6 Changes due to building construction
Fig. 7 Changes due to difference in cover type
Fig. 8 Other interesting changes due to difference in cover type
The orange boxes in Fig. 5 are also reported in detail in Fig. 7. These changes concern swimming pools that have been covered or emptied, or buildings that have changed their surface covering. A similar situation is depicted in the gray box in Fig. 5 (also in Fig. 8), where a swimming pool has been built and a roof exhibits a different cover. However, in this case there is also another change due to a difference in the vegetation: in the image acquired in 2017 there is a tennis court that is not present in the 2015 image. This change is not captured by the proposed algorithm, nor are many other changes due to tree cutting in areas that nevertheless remain vegetated. On the other hand, if the tree cutting makes buildings or sealed soil visible, these occurrences are detected as changes. Examples of these changes are reported in the violet boxes in Fig. 5 and, in more detail, in Fig. 9. Finally, it is important to point out that, in the considered images, the only changes due to new building construction are the ones in the red boxes, and they are correctly detected. In contrast, there is only one demolished building (i.e. it is in the 2015 image but not in the 2017 one), and it is not detected by the proposed methodology.
Fig. 9 Changes due to vegetation cut
4 Conclusion
A new methodology to perform change detection from VHR remotely sensed images has been proposed. In order to capture high-level semantic and contextual information, it is based on the comparison and selection of suitable deep-features, which are independently computed for each input image by using a pre-trained convolutional network, AlexNet. In particular, the first convolutional layer of AlexNet is used here to compute the deep-features. The proposed methodology has been tested on a data set composed of two VHR optical images acquired over the same urban area in July 2015 and July 2017, respectively. Experimental results show that it is able to efficiently detect changes due to the construction of new infrastructure: all the new buildings in the scene are correctly detected. Moreover, changes due to differences in building or swimming pool covering are detected. In particular, the former could be particularly useful to map measures aimed at increasing building energy efficiency. Changes due to the cutting of vegetation that has made underlying non-vegetated areas visible are also correctly detected: these changes could be particularly useful to detect the reduction of vegetated areas in urban contexts and the resulting increase of soil sealing. Finally, the proposed methodology is robust with respect to false positives, because changes due to different occupation of parking areas or to building shadows (present in the considered images because of their high resolution) are not detected.
Acknowledgements This research has been carried out in the framework of project RPASInAir, funded by the Italian Ministry of Education, University and Research, D.D. 2295 del 12/09/2018, PON R&I 2014-2020 and FSC, and the CosteLab Project funded by ASI.
References
1. Karpatne A, Jiang Z, Vatsavai RR, Shekhar S, Kumar V (2016) Monitoring land-cover changes: a machine learning perspective. IEEE Geosci Remote Sens Mag 4(2):8–21
2. Bovolo F, Bruzzone L (2015) The time variable in data fusion: a change detection perspective. IEEE Geosci Remote Sens Mag 3(3):8–26
3. Cossu R, Chaudhuri S, Bruzzone L (2005) A spatial-contextual partially supervised classifier based on Markov random fields. IEEE Geosci Remote Sens Lett 2:352–356
4. Castellana L, D’Addabbo A, Pasquariello G (2007) A composed supervised/unsupervised approach to improve change detection from remote sensing. Pattern Recogn Lett 28(4):405–413
5. Bruzzone L, Prieto DF (2000) Automatic analysis of the difference image for unsupervised change detection. IEEE Trans Geosci Remote Sens 38(3):1170–1182
6. Häme T, Heiler I, Miguel-Ayanz JS (1998) An unsupervised change detection and recognition system for forestry. Int J Remote Sens 19(6):1079–1099
7. Quegan S, Toan TL, Yu JJ, Ribbes F, Floury N (2000) Multitemporal ERS SAR analysis applied to forest mapping. IEEE Trans Geosci Remote Sens 38(2):741–753
8. Liu JG, Black A, Lee H, Hanaizumi H, Moore JM (2001) Land surface change detection in a desert area in Algeria using multitemporal ERS SAR coherence images. Int J Remote Sens 22(13):2463–2477
9. Bovolo F, Marin C, Bruzzone L (2013) A hierarchical approach to change detection in very high resolution SAR images for surveillance applications. IEEE Trans Geosci Remote Sens 51(4):2042–2054
10. Brunner D, Lemoine G, Bruzzone L (2010) Earthquake damage assessment of buildings using VHR optical and SAR imagery. IEEE Trans Geosci Remote Sens 48(5):2403–2420
11. Poulain V, Inglada J, Spigai M, Tourneret JY, Marthon P (2011) High-resolution optical and SAR image fusion for building database updating. IEEE Trans Geosci Remote Sens 49(8):2900–2910
12. Singh A (1989) Digital change detection techniques using remotely-sensed data. Int J Remote Sens 10(6):989–1003
13. Bovolo F, Marchesi S, Bruzzone L (2012) A framework for automatic and unsupervised detection of multiple changes in multi-temporal images. IEEE Trans Geosci Remote Sens 50(6):2196–2212
14. Byrne GR, Crapper PF, Mayo KK (1980) Monitoring land cover changes by principal components analysis of multi-temporal Landsat data. Remote Sens Environ 10(3):175–184
15. Collins JB, Woodcock CE (1994) Change detection using the Gram-Schmidt transformation applied to mapping forest mortality. Remote Sens Environ 50(3):267–279
16. Cihlar J, Pultz TJ, Gray AL (1992) Change detection with synthetic aperture radar. Int J Remote Sens 13(3):401–414
17. Inglada J, Mercier G (2007) A new statistical similarity measure for change detection in multi temporal SAR images and its extension to multiscale change analysis. IEEE Trans Geosci Remote Sens 45(5):1432–1445
18. Chatelain F, Tourneret JY, Inglada J, Ferrari A (2007) Bivariate gamma distributions for image registration and change detection. IEEE Trans Image Process 16(7):1796–1806
19. Gueguen L, Datcu M (2009) Mixed information measure: application to change detection in earth observation. In: Proceedings of the 5th International Workshop on the analysis of multi-temporal remote sensing images
20. Pal SK, Ghosh A, Shankar BU (2000) Segmentation of remotely sensed images with fuzzy thresholding, and quantitative evaluation. Int J Remote Sens 21(11):2269–2300
21. Gong M, Zhou Z, Ma J (2012) Change detection in synthetic aperture radar images based on image fusion and fuzzy clustering. IEEE Trans Image Process 21(4):2141–2152
22. Klaric MN, Claywell BC, Scott GJ, Hudson NJ, Sjahputera O, Li Y, Barratt ST, Keller JM, Davis CH (2013) GeoCDX: an automated change detection and exploitation system for high-resolution satellite imagery. IEEE Trans Geosci Remote Sens 51(4):2067–2086
23. Saha S, Bovolo F, Bruzzone L (2019) Unsupervised deep change vector analysis for multiple-change detection in VHR images. IEEE Trans Geosci Remote Sens 57(6):3677–3693
24. Zhang L, Zhang L, Du B (2016) Deep learning for remote sensing data: a technical tutorial on the state of the art. IEEE Geosci Remote Sens Mag 4(2):22–40
25. Volpi M, Tuia D (2016) Dense semantic labeling of subdecimeter resolution images with convolutional neural networks. IEEE Trans Geosci Remote Sens 55(2):881–893
26. Razavian AS, Azizpour H, Sullivan J, Carlsson S (2014) CNN features off-the-shelf: an astounding baseline for recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 806–813
27. Penatti OAB, Nogueira K, dos Santos JA (2015) Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 44–51
28. Nogueira K, Penatti OAB, dos Santos JA (2017) Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recogn 61:539–556
29. Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of the advances in neural information processing systems (NIPS), 1097–1115
30. ImageNet. http://www.image-net.org
Region Awareness for Identifying and Extracting Text in the Natural Scene Vinh Loc Cu, Xuan Viet Truong, Tien Dao Luu, and Hoang Viet Nguyen
Abstract Understanding text that appears in natural scenes is essential to a wide range of applications. This problem is still challenging for the document analysis and recognition community because of the complexity of natural scene images. In this paper, we propose a new method to effectively detect text regions by identifying the location of characters. Our work concentrates on designing a network for text detection and a network for text recognition. For text detection, the proposed method directly predicts characters or text lines that appear in full scene images, and the approach is able to handle text with arbitrary orientations and quadrilateral shapes. To do that, our model produces a character position score and a character similarity score. These scores are used to group characters into a single object. For the text recognition phase, the detected text is fed into a second network, which extracts features from the text images and maps them to a sequence of characters. The experiments are performed on public datasets, and the obtained results show that the proposed approach gives competitive performance compared to state-of-the-art approaches.
Keywords Scene text detection · Text recognition · Character region · Position score · Similarity score
V. L. Cu (B) · X. V. Truong · T. D. Luu · H. V. Nguyen
Can Tho University, Can Tho, Vietnam
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_44
V. L. Cu et al.
1 Introduction
Text is a vital means of communication and plays an important role in our lives. Thus, identifying and understanding text in natural scenes has attracted much attention in the computer vision community due to its practical applications, such as autonomous driving, industrial automation, smart glasses, augmented reality, image retrieval, and blind navigation. Extracting textual information from natural scenes in a computer-readable format is one of the challenging topics in the computer vision field. Recently, there has been a lot of research on this topic [1–16], with promising performance. However, several challenges remain due to the complexity of real-world images. These challenges include illumination reflection, special characters, complex fonts, partial occlusion, multi-resolution, in-plane rotation, and multi-oriented text. Unlike optical character recognition (OCR), extracting text from natural images is a difficult task. OCR models like Tesseract [17] give good results on scanned document images, which have a clean background with a common font, a uniform single color, and a plain layout. Meanwhile, text embedded in natural images has a complex background in which some regions are hard to distinguish from the actual text. Natural images often contain text with different orientations, sizes, colors, fonts, languages, and curvature. These images are also subject to many kinds of degradation, such as motion blur, noise, low resolution, defocus, and varying illumination. In general, the problem is divided into two tasks: text detection and text recognition. Text detection is the task of predicting and localizing the textual information in a natural image, and text recognition is the task of converting the text regions into a computer-readable format.
Traditional OCR approaches face many challenges when detecting and recognizing text in images captured in natural scenes, which cause them to fail in most cases. Most of the approaches that predate deep learning are bottom-up: they mostly rely on traditional features extracted from the images (e.g. MSER [18]), and these features are considered an essential component. Meanwhile, deep learning techniques have been applied to text detection by adapting object detection or segmentation methods (e.g. FCN [19], Faster R-CNN [20]). The deep learning-based approaches train the model to localize word-level bounding boxes. However, these approaches have difficulty detecting curved or deformed text. In addition, the existing works on text detection in natural images are mainly proposed for English or Chinese text, and we have found only a few works for Vietnamese text [21]. In order to solve the mentioned problems, we propose a new technique for text detection in natural scenes that can also be applied to Vietnamese text. Our text detector works by localizing the regions of individual characters and linking them into a text instance. The paper is organized as follows. Related work is reviewed in the next section. The proposed method is detailed in Sect. 3. In Sect. 4, the experimental results are presented. Conclusions and future work are presented in Sect. 5.
2 Related Work
The traditional approaches depend on features extracted from the image itself. Epshtein et al. [2] proposed a method that uses the Stroke Width Transform (SWT) to extract connected components from the image. The connected component extraction is conducted on the images and their complements. The work presented in [3] identifies initial candidates by using sliding windows. Gomez and Karatzas [5] take advantage of the Maximally Stable Extremal Regions (MSER) algorithm. MSER is applied to obtain low-level regions, which are considered as an initial set. The obtained regions are then grouped together, starting from the lowest levels, through an agglomerative process driven by their similarity. Zhang et al. [22] proposed a method that exploits the local symmetry property of text and designed different features for detecting text areas. Buta et al. [23] proposed FASTex, a system for detecting text quickly. The system adapts the well-known FAST key-point detector for stroke extraction. These approaches rely on hand-crafted features, and their accuracy is lower than that of techniques based on deep neural networks when dealing with challenges such as geometric distortion and low resolution. Recently, natural text detection has entered a stage in which algorithms based on deep learning have gradually become popular. Huang et al. [24] proposed a method that uses MSER and sliding windows for localizing the regions of interest. MSER is used to reduce the number of scanned windows, and the extracted regions are classified by a convolutional neural network (CNN). Wafa et al. [1] proposed a method that relies upon multi-level connected component (CC) analysis and on learning the features of textual components with convolutional neural networks (CNN).
The work presented in [9] introduces another approach in which fully convolutional networks (FCN) are applied to detect text regions directly, without the word partition and candidate aggregation steps. Non-maximum suppression is then used to detect words or text lines. This method predicts the rotated boxes or quadrangles of text lines or words within the text area. Another FCN-based approach [10] directly produces the coordinates of the bounding boxes surrounding the words. These coordinates are obtained from multiple network layers, which predict the presence of text and the coordinate offsets. The final result is the aggregation of all boxes. The authors also design several inception-style layers to handle the large variation in terms of word rate. The work presented in [11] directly predicts the bounding boxes of words as quadrilaterals with a single deep network in an end-to-end manner. In this work, the authors replace the rectangular box representation of a conventional object detector with an oriented rectangle or a quadrilateral. The segmentation-based method [13] treats text detection as a semantic segmentation problem. This approach aims to classify text positions in the images at the pixel level, and an FCN is used to extract the text blocks based on the generated heat map.
Jianqi et al. [12] introduced Rotation Region Proposal Networks (RRPN) which is based on Faster-RCNN. This method aims to identify text with arbitrary orientation in the natural scene. Zhu and Uchida [8] have put forward a natural text detection system that justifies the detection process from text proposals to text.
3 Proposed Method The main objective of our work is to identify each character in the scene image. To do that, we train a network to predict the region of characters. Besides, we also demonstrate the result of text recognition.
3.1 Text Detection We adopt a fully convolutional network (FCN) for this work. This kind of network was initially proposed for semantic image segmentation and has more recently been applied, with good performance, to document layout segmentation. To use it for detecting text positions, we modified a network designed for document layout segmentation: the convolutional layers and the loss function are adjusted to improve feature extraction and representation. The main role of our network is to infer the presence of text objects directly from the input images. Its architecture is partly based on the VGG-16 network presented in [25], in which the fully connected layers are converted to convolutional layers. The output of the network consists of feature maps describing the position score and the similarity score. The position score (Fig. 1b) gives the probability that a pixel marks a character center. The similarity score (Fig. 1c) gives the probability that a pixel marks the center of adjacent characters (characters within the region at a suitable distance). Rather than generating the ground truth as a binary segmentation map, we use a Gaussian heat map to encode the probability that marks the center of a character. We also use this representation to learn both the position score and the similarity score. Let B(w) and l(w) be the bounding-box region and the length of the word w, and let l_c(w) be the length of the bounding boxes of its characters; the score of the sample word w is calculated by

$$ s(w) = \frac{l(w) - \min\bigl(l(w),\, |l(w) - l_c(w)|\bigr)}{l(w)} \tag{1} $$
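Eq. (1) can be checked with a few lines of Python. This is a minimal sketch, not the authors' code; the function and argument names are hypothetical, and both lengths are assumed to be given in the same unit.

```python
def word_confidence(word_length: float, char_length: float) -> float:
    """Score s(w) of a word sample, following Eq. (1)."""
    if word_length <= 0:
        raise ValueError("word_length must be positive")
    # Deviation between the word length and the length derived from characters.
    deviation = abs(word_length - char_length)
    # The min() clamps the penalty so the score never drops below zero.
    return (word_length - min(word_length, deviation)) / word_length


print(word_confidence(10, 10))  # lengths agree
print(word_confidence(10, 8))   # small disagreement
print(word_confidence(10, 25))  # disagreement larger than the word itself
```

The score is 1 when the two lengths agree and falls linearly to 0 as they diverge.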
Fig. 1 The heat maps of character region
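The Gaussian heat-map ground truth described above can be sketched as follows. This is an illustrative sketch only: it assumes an isotropic Gaussian centred on a character centre, and the function name, grid size and `sigma` are hypothetical choices, not values from the paper.

```python
import math


def gaussian_heatmap(height, width, center, sigma):
    """Return a height x width grid whose values peak at 1.0 on `center` (row, col)."""
    cy, cx = center
    return [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# A 5 x 5 map for a single character centred at row 2, column 2.
heat = gaussian_heatmap(5, 5, center=(2, 2), sigma=1.0)
for row in heat:
    print(" ".join(f"{value:.2f}" for value in row))
```

Unlike a binary mask, the values decay smoothly with distance from the centre, so the map encodes how close each pixel is to a character centre.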
The feature maps of an image are generated by

$$ F(p) = \begin{cases} s(w), & p \in B(w) \\ 1, & \text{otherwise} \end{cases} \tag{2} $$
where p is a pixel in the bounding box B(w). The loss function of the proposed network is then defined as follows:

$$ L = \sum_{p} F(p) \cdot \left( \lVert G_r(p) - G_r^{*}(p) \rVert_2^2 + \lVert G_s(p) - G_s^{*}(p) \rVert_2^2 \right) \tag{3} $$
where G_r(p) and G_s(p) denote the predicted position score and similarity score, and G_r*(p) and G_s*(p) denote the corresponding ground-truth scores. With this loss, we observed during training that the score s(w) gradually increases and the network predicts characters more accurately. However, at the early stage of training the network gives low scores for unfamiliar text in natural images.
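The per-pixel weighting of Eq. (2) and the loss of Eq. (3) can be written out in plain Python. This is a sketch under stated assumptions: score maps are nested lists of scalars, and the box encoding `(top, left, bottom, right)` with half-open ranges is a hypothetical choice made only for illustration.

```python
def confidence_map(height, width, words):
    """Eq. (2): F(p) = s(w) inside each word box B(w), and 1 elsewhere.

    `words` is a list of ((top, left, bottom, right), s_w) pairs.
    """
    conf = [[1.0] * width for _ in range(height)]
    for (top, left, bottom, right), s_w in words:
        for y in range(top, bottom):
            for x in range(left, right):
                conf[y][x] = s_w
    return conf


def weighted_loss(conf, pred_pos, gt_pos, pred_sim, gt_sim):
    """Eq. (3): sum over pixels of F(p) times the two squared errors."""
    total = 0.0
    for y, row in enumerate(conf):
        for x, f_p in enumerate(row):
            pos_err = (pred_pos[y][x] - gt_pos[y][x]) ** 2
            sim_err = (pred_sim[y][x] - gt_sim[y][x]) ** 2
            total += f_p * (pos_err + sim_err)
    return total


# Toy 2 x 2 example: one word with s(w) = 0.5 covering the top row.
conf = confidence_map(2, 2, [((0, 0, 1, 2), 0.5)])
zeros = [[0.0, 0.0], [0.0, 0.0]]
loss = weighted_loss(
    conf,
    pred_pos=[[1.0, 0.0], [0.0, 0.0]], gt_pos=zeros,
    pred_sim=[[0.0, 0.0], [0.0, 1.0]], gt_sim=zeros,
)
print(loss)
```

In the toy example the position error falls inside the low-confidence word and is halved, while the similarity error outside any word box keeps full weight.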
3.2 Text Recognition Once the bounding boxes surrounding the text have been determined, text recognition is performed. Existing works recognize text in various ways. Here we present one effective technique, based on a convolutional recurrent neural network. This approach is divided into two processes.
3.2.1 The Process of Feature Extraction
This task is performed with a CNN and a recurrent neural network (RNN). The CNN architecture is based on the work published in [26]. This network integrates feature extraction, sequence modeling, and transcription into a unified framework in which the convolutional layers extract features from the text image (obtained in Sect. 3.1) and map them to a sequence. The recurrent network then predicts the sequence labeling (the relationship between characters), analyzing the relations within a sequence in both directions.
3.2.2 The Process of Character Generation
This task produces the sequence of characters. The operation is treated as a T-step process. The vector of attention weights at step t is computed by

$$ w_t = \mathrm{attend}(s_{t-1}, w_{t-1}, h) \tag{4} $$
where s_{t-1} is the state variable of the long short-term memory (LSTM) at the previous step. The recurrent process of the LSTM is responsible for updating the states of all the units. The softmax function is used to calculate the probability over the labels. The predicted character is the one with the highest probability. The cost function is measured as the negative log likelihood over the dataset.
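The last three sentences (softmax over labels, choosing the most probable character, negative log likelihood as cost) can be sketched as below. The `attend` function and the LSTM update are not specified in the text and are left out; the function names, the alphabet and the scores are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_step(logits, alphabet):
    """Pick the most probable character for one decoding step."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return alphabet[best], probs


def negative_log_likelihood(step_probs, target_indices):
    """Sum of -log p(target) over the steps of one sequence."""
    return -sum(math.log(p[t]) for p, t in zip(step_probs, target_indices))


char, probs = decode_step([0.1, 2.0, -1.0], alphabet="abc")
print(char, [round(p, 3) for p in probs])
print(negative_log_likelihood([probs], [1]))
```

Summing this per-sequence cost over all training samples gives the negative log likelihood over the dataset.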
4 Experimental Results For training the text detection network, we use the public dataset ICDAR2013 (IC13), which was published for the ICDAR 2013 Robust Reading Competition (scene text detection). This dataset consists of high-resolution images: the training set contains 229 images with 848 words, and the testing set includes 233 images. The text in these images is in English and is annotated at word level with rectangular boxes.
In our experiments, we observed that the network trained on the English-text dataset also performs well when applied to Vietnamese natural text detection. The maximum number of iterations for training the FCN for text detection is set to 20,000,000. To shorten training, we fine-tune a model on the selected dataset. The Adam optimizer is used for training. During training, the dataset is also divided at a ratio of 1:3, which ensures that the character positions are separated. In addition, the training process uses two kinds of quadrilateral annotations: one for cropping word images and the other for estimating word length. To assess the performance of the text recognition network, we performed experiments on a Vietnamese dataset of handwritten and typewritten images, which contains 229 images along with their ground-truth words. The learning rate is set to 0.001 and the batch size to 64. The Adam optimizer is used to update the network weights. The network is trained for 20 epochs, and the best model is selected according to the minimum validation loss. Our network is trained on these images, and the trained model is tested on both synthetic and real-world data. The training process is conducted without any fine-tuning on the training data. To assess the system, we use the standard recall, precision, and F-measure metrics used in most scene text detection research. To demonstrate the performance of our approach, we present the results for the following aspects.
4.1 Text Detection We have performed qualitative and quantitative experiments on a public dataset, ICDAR2013, to compare with existing works. To evaluate the effectiveness of our text detection, we use the Recall (R) and Precision (P) metrics from the information retrieval field. The F-score is computed as follows.

$$ F\text{-score} = 2 \times \frac{P \times R}{P + R} \tag{5} $$

4.1.1 Qualitative Results
Figure 2 shows various detections produced by our approach. The proposed method handles several challenging conditions, such as low resolution, non-uniform illumination, perspective distortion, and varying orientation. The heat maps describing the text regions generated by our network are shown in Fig. 2a. We observed that the trained model gives highly accurate character scores where the positions of text are detected and formed.
Fig. 2 The results of text detection
4.1.2 Quantitative Results
The text detection results of the proposed method on the IC13 dataset are presented in Table 1. The figures show that the proposed method is competitive with existing methods, with an F-score of 86.23% (Table 1). The positions of detected text are precise in most cases. If the distance between words is close to the distance between characters of a word on the same line, the detected positions still correctly cover a text line or a word. However, the proposed method can fail in some cases: logos are misclassified as text, and portions of a text are missed because of degradation such as low resolution or poor lighting.
Table 1 The results of text detection on the IC13 dataset

| Approach | Recall (%) | Precision (%) | F-score (%) |
|---|---|---|---|
| Zhu and Uchida [8] | 84.00 | 83.00 | 84.00 |
| Wafa et al. [1] | 82.28 | 89.94 | 85.94 |
| Zhang et al. [27] | 88.00 | 78.00 | 83.00 |
| EAST [9] | 74.24 | 84.86 | 79.20 |
| Zhang et al. [27] | 43.09 | 70.81 | 53.58 |
| Our approach | 89.38 | 83.30 | 86.23 |
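As a quick consistency check, the F-score column of Table 1 can be recomputed from the recall and precision columns with Eq. (5). The sketch below uses three rows copied from the table; the function name is hypothetical.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall, as in Eq. (5)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall %, precision %) pairs copied from Table 1.
rows = {
    "Wafa et al. [1]": (82.28, 89.94),
    "EAST [9]": (74.24, 84.86),
    "Our approach": (89.38, 83.30),
}
for name, (recall, precision) in rows.items():
    print(f"{name}: {f_score(precision, recall):.2f}")
```

The recomputed values agree with the tabulated F-scores to two decimal places.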
4.1.3 Qualitative Results for Vietnamese Text Detection
Besides detecting English text, we also conducted experiments on Vietnamese scenes. We observed that Vietnamese text does not differ much from English text except for accents. Thus, the trained network also works well for Vietnamese text detection.
4.2 Text Detection and Recognition To estimate the accuracy of text recognition, we use word recognition accuracy or character recognition accuracy, both of which are commonly used evaluation metrics. Given a set of cropped character images, the accuracy of character recognition (CR) is defined as follows.

$$ CR(\%) = \frac{\text{No. of correctly recognized characters}}{\text{Total number of characters}} \times 100 \tag{6} $$
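Eq. (6) amounts to a label-by-label comparison. The following sketch assumes one predicted label and one ground-truth label per cropped character image; the function name and the sample labels are hypothetical.

```python
def character_recognition_accuracy(predicted, ground_truth):
    """CR(%) from Eq. (6), with one label per cropped character image."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("at least one character is required")
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return 100.0 * correct / len(ground_truth)


# Hypothetical example: the accented character is recognized without its accent.
truth = ["T", "i", "ế", "n", "g"]
guess = ["T", "i", "e", "n", "g"]
print(character_recognition_accuracy(guess, truth))
```

Because accented and unaccented letters are distinct labels, a dropped accent counts as an error under this metric.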
With this approach, we can extract information from natural images when the embedded characters are clear and of moderate size. In addition, we also tried to extract text from the images using the Tesseract OCR library, which failed to recognize text in most of the images. In general, even though the accuracy of our text recognition method is not yet close to 100%, its results are much better than those of the traditional OCR method.
5 Conclusion and Future Works We have proposed a method for detecting and recognizing text embedded in natural scenes, in which text detection is performed by detecting individual characters and grouping them into words. The text detection network generates a character region score and a character similarity score. We also use a combination of CNN and RNN networks for text recognition. The text detection experiments show that the proposed method gives competitive results compared with existing approaches on the public datasets and demonstrate the generalization of our approach. As future work, we intend to improve the model for Vietnamese text recognition to achieve better performance in extracting text embedded in natural scenes.
References
1. Khlif W, Nayef N, Burie J, Ogier J, Alimi A (2018) Learning text component features via convolutional neural networks for scene text detection. In: DAS
2. Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: CVPR
3. Lee J, Lee P, Lee S, Yuille A, Koch C (2011) Adaboost for text detection in natural scene. In: ICDAR
4. Text extraction from scene images by character appearance and structure modeling (2013) Comput Vis Image Underst
5. Gomez L, Karatzas D (2016) A fast hierarchical method for multi-script and arbitrary oriented scene text extraction
6. Zhang C, Yao C, Shi B, Bai X (2015) Automatic discrimination of text and non-text natural images. In: ICDAR
7. Zhu S, Zanibbi R (2016) A text detection system for natural scenes with convolutional feature learning and cascaded classification. In: CVPR
8. Zhu A, Uchida S (2017) Scene text relocation with guidance. In: ICDAR
9. Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017) East: an efficient and accurate scene text detector. In: CVPR
10. Liao M, Shi B, Bai X, Wang X, Liu W (2016) Textboxes: a fast text detector with a single deep neural network
11. Liao M, Shi B, Bai X (2018) Textboxes++: a single-shot oriented scene text detector. IEEE Trans Image Process
12. Jianqi M, Shao W, Ye H, Wang L, Wang H, Zheng Y, Xue X (2017) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed
13. Qin H, Zhang H, Wang H, Yan Y, Zhang M, Zhao W (2019) An algorithm for scene text detection using multibox and semantic segmentation. Appl Sci
14. Liu J, Liu X, Sheng J, Liang D, Li X, Liu Q (2019) Pyramid mask text detector. CoRR
15. Gupta A, Vedaldi A, Zisserman A (2020) Synthetic data for text localisation in natural images. In: CVPR
16. Wang P, Yang L, Li H, Deng Y, Shen C, Zhang Y (2019) A simple and robust convolutional-attention network for irregular text recognition. In: CVPR
17. https://github.com/tesseract-ocr/tesseract
18. Matas J, Chum O, Urban M, Pajdla T (2004) Robust wide-baseline stereo from maximally stable extremal regions. Image Vis Comput
19. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: CVPR
20. Ren S, He K, Girshick R, Sun J (2017) Faster r-CNN: towards real-time object detection with region proposal networks. Trans PAMI
21. Nga P, Trang N, Phuc N, Quy T, Binh V (2017) Vietnamese text extraction from book covers. Tap chi Khoa hoc Dai hoc Da Lat
22. Zhang Z, Shen W, Yao C, Bai X (2015) Symmetry-based text line detection in natural scenes. In: CVPR
23. Buta M, Neumann L, Matas J (2015) Fastext: efficient unconstrained scene text detector. In: ICCV
24. Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced MSER trees. In: Comput Vis, ECCV 2014
25. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
26. Shi B, Bai X, Yao C (2017) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans PAMI
27. Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. In: CVPR
28. Deng D, Liu H, Li X, Cai D (2018) Pixellink: Detecting scene text via instance segmentation
Analysis of Effectiveness of Selected Classifiers for Recognizing Psychological Patterns
Marta Emirsajłow and Łukasz Jeleń
Abstract Intensive development of artificial intelligence methods in recent years has attracted great interest among scientists and psychologists and has led to the increasing use of these methods in the humanities and social sciences. Since this is a relatively new area of application of artificial intelligence, most research concentrates on improving the use of these new tools in psychology. The present work compares the precision and effectiveness of several known algorithms for, among other things, recognizing patterns in the classification of human traits and behaviors, so that in the future this process can be automated and reliable methods found to predict these behaviors. Specifically, three of the most interesting algorithms were selected, and their effectiveness in pattern recognition was tested on selected data.
Keywords Methods of artificial intelligence · Classification of psychological patterns · Classification of human traits
M. Emirsajłow (B) · Ł. Jeleń, Department of Computer Engineering, Wrocław University of Science and Technology, 50-370 Wrocław, Poland, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_45
1 Introduction Human behavior is complex, but not accidental. Understanding human behavior is crucial not only in the traditional social sciences, mainly psychology and sociology, but also in a number of engineering areas such as robotics, human-computer interaction, and the processing of signals generated by humans. Research in this area is interdisciplinary in nature [1, 2, 5]. The huge progress in pattern recognition in recent years has allowed computer scientists and psychologists to develop methods for the automatic analysis of human behavior using computers [8, 10]. These methods are based on advanced pattern recognition techniques that automatically interpret complex human behavior. As already mentioned, research in this area is multidisciplinary, and the role of computer science is to develop effective IT tools for sociologists, psychologists and medical doctors. Intelligent algorithms are being sought for processing extensive sets of multimedia information in order to obtain appropriate material for modeling and behavior analysis [8, 9]. In this article, three selected pattern recognition algorithms are analyzed for effectiveness and efficiency in pattern recognition in the field of psychology. The algorithms are tested on the answers to questions from three psychological questionnaires, so that the usefulness of these classification techniques in recognizing psychological patterns can be assessed as accurately as possible. For this purpose, the paper uses a database publicly available on the Internet, which was collected from students through a smartphone application [12]. The collected data included, among others, a number of official questionnaires in the field of medicine and psychology. These questionnaires concerned broadly understood social behaviors caused by various processes occurring in human beings. The main purpose of this paper is to compare the algorithms in terms of the effectiveness of feature prediction and relationships with different types of behavior. The evaluation should also answer the question of which algorithms work best with which data type. This can help future research to use pattern recognition methods more effectively in social science applications and to adapt learning data sets to provide the highest training effectiveness.
2 Data Set and Algorithms While reviewing the literature in computational social sciences, our attention was drawn to the project StudentLife: Using Smartphones to Assess Mental Health and Academic Performance of College Students [11], carried out by scientists from Dartmouth College (USA), which studied many factors affecting student behavior and academic performance. Our paper uses part of the data collected within this project and made publicly available on the project site [12]. The research involved students of computer science with a smartphone programming profile during the 10-week spring trimester (27 March to 5 June 2013) at Dartmouth College in the United States. Forty-eight people joined the project (10 women and 38 men), and 41 remained until the very end. Data were collected through a dedicated smartphone application, which also included a number of official questionnaires from the field of medicine and psychology. In our article, we used data from three well-known questionnaires: the Big Five [3], the Loneliness Scale [7] and the Depression Test [4]. The algorithms used to train the pattern recognition model were selected on the basis of a literature review and a comparison of their suitability for classification. The following three were chosen: Decision Tree, Support Vector Machine (SVM) and k-Nearest Neighbors (KNN) [9]. The algorithms were tested using the Matlab Classification Learner application [6]. The qualitative and performance features of the three algorithms are summarized in Table 1.
Table 1 Qualitative and performance comparison of algorithms

| Algorithm | Prediction speed | Training speed | Memory consumption | Adjustment required |
|---|---|---|---|---|
| Decision tree | Fast | Fast | Small | Average |
| SVM | Fast | Fast | Small | Minimal |
| KNN | Moderate | Minimal | Average | Minimal |
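Of the three classifiers in Table 1, k-nearest neighbours is the simplest to write out in full. The sketch below is a plain-Python stand-in, not the Matlab Classification Learner implementation the authors used. The exponent `p=3` mirrors the "Cubic KNN" label that appears in the figures, on the assumption that "cubic" refers to a Minkowski distance with exponent 3; the data and the value of `k` are hypothetical.

```python
from collections import Counter


def minkowski(a, b, p=3):
    """Minkowski distance between two equal-length feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_x, train_y, query, k=3, p=3):
    """Majority vote among the k training points nearest to `query`."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda item: minkowski(item[0], query, p),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature questionnaire scores with a class label each.
train_x = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["Low", "Low", "Low", "High", "High", "High"]
print(knn_predict(train_x, train_y, query=(2, 2)))
print(knn_predict(train_x, train_y, query=(7, 8)))
```

There is no training step beyond storing the data, while every prediction scans all stored points.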
3 Examination of Algorithms The selected algorithms were tested in terms of quality and quantity, in the context of the classic forecasting model. Three different input files were prepared. They contained StudentLife participants’ pre- and post-trimester responses to questions related to their well-being and behavior. Each file was related to one of the three questionnaires completed by participants.
3.1 Big Five Questionnaire The Big Five questionnaire [3] was analyzed first. The answers to individual questions were scored, the points were summed, and a personality type was assigned; the highest score corresponded to the strongest personality type. Figures 1, 2 and 3 show the number of correctly and incorrectly classified personality types for the three compared algorithms. In all figures, correctly classified values are marked in green and incorrectly classified values in red.
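The labelling step described above (sum the points per trait, then keep the trait with the highest total) can be sketched as follows. The item-to-trait assignments and point values are hypothetical, and any reverse-scored items are assumed to have been converted beforehand.

```python
from collections import defaultdict


def score_traits(answers):
    """Sum the points of all answered items for each trait."""
    totals = defaultdict(int)
    for trait, points in answers:
        totals[trait] += points
    return dict(totals)


def dominant_trait(trait_scores):
    """Return the trait with the highest summed score."""
    return max(trait_scores, key=trait_scores.get)


# Hypothetical answers as (trait, points) pairs.
answers = [
    ("Extroversion", 2), ("Extroversion", 3),
    ("Openness", 5), ("Openness", 4),
    ("Agreeableness", 4), ("Agreeableness", 3),
    ("Neuroticism", 1), ("Conscientiousness", 3),
]
totals = score_traits(answers)
print(totals)
print(dominant_trait(totals))
```

Keeping only the top trait turns a five-dimensional profile into a single class label.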
3.2 Depression Test Questionnaire The second analyzed questionnaire was the Depression Test [4], which distinguishes five levels of depression. Here, too, specific points were assigned to the answers, and after summation each participant was assigned to one of the five levels of depression. The results are presented in Figs. 4, 5 and 6.
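Reference [4] is the PHQ-9, whose published severity bands are 0 to 4, 5 to 9, 10 to 14, 15 to 19 and 20 to 27. The sketch below maps a summed score to one of five levels using those bands. That the authors applied exactly these cut-offs is an assumption; the function name is hypothetical, and the label spellings follow the class names in Figs. 4 to 6.

```python
# Upper bound of each PHQ-9 severity band, paired with its label.
PHQ9_LEVELS = [
    (4, "None-minimal"),
    (9, "Mild"),
    (14, "Moderate"),
    (19, "Moderately_severe"),
    (27, "Severe"),
]


def depression_level(item_scores):
    """Map nine PHQ-9 item scores (each 0 to 3) to a severity level."""
    if len(item_scores) != 9 or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("expected nine item scores, each between 0 and 3")
    total = sum(item_scores)
    for upper, label in PHQ9_LEVELS:
        if total <= upper:
            return label


print(depression_level([0] * 9))
print(depression_level([1, 1, 1, 1, 1, 1, 2, 2, 2]))
print(depression_level([3] * 9))
```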
3.3 Loneliness Scale Questionnaire The last file was related to the Loneliness Scale questionnaire [7], in which three levels of feature intensity were distinguished and assigned to participants. The results of processing the prepared files by the three analyzed algorithms were compared with the questionnaire data and presented in Figs. 7, 8 and 9.
Fig. 1 Model of predictability of personality types for the decision tree algorithm (panel title: Matrix for predicting personality type (Fine Tree); axes: valid classes, predicted classes; classes: Extroversion, Neuroticism, Openness, Conscientiousness, Agreeableness)
Fig. 2 Model of predictability of personality types for the SVM algorithm (panel title: Matrix for predicting personality type (Linear SVM); axes: valid classes, predicted classes)
Fig. 3 Model of predictability of personality types for the KNN algorithm (panel title: Matrix for predicting personality type (Cubic KNN); axes: valid classes, predicted classes)
Fig. 4 Model of predictability of depression level for the decision tree algorithm (panel title: Matrix for predicting depression level (Fine Tree); axes: valid classes, predicted classes; classes: Mild, Severe, Moderately_severe, Moderate, None-minimal)
Fig. 5 Model of predictability of depression level for the SVM algorithm (panel title: Matrix for predicting depression level (Linear SVM); axes: valid classes, predicted classes)
Fig. 6 Model of predictability of depression level for the KNN algorithm (panel title: Matrix for predicting depression level (Cubic KNN); axes: valid classes, predicted classes)
Fig. 7 Model of predictability of the loneliness scale for the decision tree algorithm (panel title: Matrix for predicting loneliness scale (Fine Tree); axes: valid classes, predicted classes; classes: Low, Medium, High)
3.4 Analysis of the Prediction Precision Table 2 compares the algorithms in terms of prediction effectiveness. For the prediction of personality types, the KNN method proved the most effective, scoring 1.1 percentage points higher than the SVM algorithm. This questionnaire had five classes, two more than the Loneliness Scale questionnaire, and these classes were very unevenly populated. This suggests that the KNN method was more effective for data with more classes and poorer balance. For the Depression Test questionnaire, the SVM algorithm was the most effective, achieving a clearly better result than the other algorithms. The KNN algorithm achieved a fairly satisfactory result, whereas better results could have been expected from the Decision Tree.
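The precision values in Table 2 correspond to the share of observations on the diagonal of a confusion matrix. The sketch below recomputes one of them from the counts legible in the confusion matrix of Fig. 8 (Linear SVM, loneliness scale). The placement of the off-diagonal counts and the zeros for empty cells are assumptions; only the diagonal and the total affect the result.

```python
def accuracy_from_confusion(matrix):
    """Percentage of observations on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * correct / total


# Rows: valid class (Low, Medium, High); columns: predicted class.
svm_loneliness = [
    [19, 5, 0],
    [1, 37, 1],
    [0, 2, 18],
]
print(round(accuracy_from_confusion(svm_loneliness), 1))  # 89.2
```

The result matches the SVM entry for the Loneliness Scale questionnaire in Table 2.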
Fig. 8 Model of predictability of the loneliness scale for the SVM algorithm (panel title: Matrix for predicting loneliness scale (Linear SVM); axes: valid classes, predicted classes; classes: Low, Medium, High)
The SVM algorithm also obtained the best result in the prediction of a sense of loneliness, while the Decision Tree algorithm was again the least precise.
3.5 Analysis of Prediction Speed and Training Time Tables 3 and 4 show the prediction speed and the model training time. The fastest algorithm for predicting personality types was the Decision Tree, with approximately 3100 observations per second. The KNN algorithm was about three times slower, at 930 observations per second. The slowest was the SVM algorithm, at 730 observations per second, more than four times slower than the Decision Tree. Although the Decision Tree achieved the desired prediction speed, it lost considerable precision, while the relatively most effective algorithm, the SVM method, processed the fewest observations per second.
Fig. 9 Model of predictability of the loneliness scale for the KNN algorithm (panel title: Matrix for predicting loneliness scale (Cubic KNN); axes: valid classes, predicted classes; classes: Low, Medium, High)

Table 2 Comparison of precision of class prediction by algorithms

| Questionnaire | Decision tree (%) | SVM (%) | KNN (%) |
|---|---|---|---|
| Big five | 61.2 | 71.8 | 72.9 |
| Depression test | 70.2 | 88.1 | 77.4 |
| Loneliness scale | 75.9 | 89.2 | 81.9 |

Table 3 Comparison of the speed of class prediction by algorithms

| Questionnaire | Decision tree (obs/sec) | SVM (obs/sec) | KNN (obs/sec) |
|---|---|---|---|
| Big five | 3100 | 730 | 930 |
| Depression test | 3200 | 1600 | 2200 |
| Loneliness scale | 3800 | 2200 | 2300 |
The prediction speeds varied least for the Loneliness Scale questionnaire, which had the most balanced classes. It can therefore be concluded that the size and distribution of the classes affect the prediction speed of all algorithms.
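The variation in prediction speed can be quantified from Table 3 as the ratio between the fastest and the slowest algorithm for each questionnaire. This is a small arithmetic sketch; using that ratio as the measure of variation is an assumption, and the names are hypothetical.

```python
# Observations per second, copied from Table 3.
speeds = {
    "Big five": {"Decision tree": 3100, "SVM": 730, "KNN": 930},
    "Depression test": {"Decision tree": 3200, "SVM": 1600, "KNN": 2200},
    "Loneliness scale": {"Decision tree": 3800, "SVM": 2200, "KNN": 2300},
}


def spread(values):
    """Ratio of the fastest to the slowest prediction speed."""
    values = list(values)
    return max(values) / min(values)


for questionnaire, by_algorithm in speeds.items():
    print(questionnaire, round(spread(by_algorithm.values()), 2))

least_varied = min(speeds, key=lambda q: spread(speeds[q].values()))
print(least_varied)
```

By this measure the Big Five data show the widest gap between algorithms and the Loneliness Scale data the narrowest.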
Table 4 Comparison of model training time using individual algorithms

| Questionnaire | Decision tree (s) | SVM (s) | KNN (s) |
|---|---|---|---|
| Big five | 1.0103 | 4.0525 | 1.4437 |
| Depression test | 1.0524 | 1.2246 | 0.88228 |
| Loneliness scale | 0.94026 | 0.85494 | 0.8896 |
The differences between the model training times were insignificant: for most data sets, the individual algorithms trained in similar times. The only exception was the personality prediction model trained with the SVM algorithm, which took about four times longer than the Decision Tree and almost three times longer than KNN.
3.6 Summing Up
1. The Decision Tree algorithm achieved much lower precision than the SVM algorithm; its precision was not the highest for any of the questionnaires. In prediction speed, however, it was the clear leader for personality types, depression level and the feeling of loneliness, which suggests that it traded quality for efficiency.
2. In training time, most of the algorithms obtained very similar results. The KNN algorithm was the most precise in predicting personality types and fairly precise in the other predictions, while also achieving fairly high speed. All algorithms had fairly good training times, except for the SVM algorithm on the personality type data, which could indicate difficulty in finding links between responses with the support vector method.
3. To sum up, each of the tested algorithms coped with the problems relatively well. None of them can be said to be superior to the others; rather, each shows its advantages under different conditions.
4 Final Remarks The selected pattern classification algorithms were tested for their suitability for training a prediction model of human traits and behaviors. Their operation depended largely on the quality of the data sets, including their balance. Some data sets were not evenly balanced, which could adversely affect the effectiveness of the algorithms. Depending on the sample balance and the algorithm, their effectiveness ranged from 61.2 to 96.4% on the analyzed data sets. In this context, correct classification by pure chance can be excluded, given the very different options available. An important simplification is also that the Big Five questionnaire determines the level of all five personality traits, whereas in this work the personality of each participant was narrowed to the single trait with the highest level. From the psychological point of view, this is not a reliable assessment of a person’s personality, because everyone has all five traits, only at different levels. Anyone replicating or extending this research would certainly have to resolve this inaccuracy. Nevertheless, the undoubted positive achievement of this research was the creation of a methodology which, with further development, may lead to a model offering greater possibilities than before for predicting a person's psychological features. Also, the development of existing databases and the creation of new ones in this area will increase the possibilities of model training and provide tools to facilitate the work of psychologists in diagnosing disorders and conducting psychological tests. At the current stage, the tested methodology is a satisfactory achievement, although it is recommended to use better balanced data and to eliminate classes containing one or only a few observations, because these are interfering data.
References
1. Chetouani M, Cohn J, Salah AA (eds) (2016) Human behavior understanding. In: LNCS 9997, proceedings of the 7th international workshop, Amsterdam, The Netherlands, Oct 16
2. Emirsajłow M (2019) Pattern recognition methods in human behaviour analysis. MSc thesis, Faculty of Electronics, Wrocław University of Science and Technology, Wrocław
3. John OP, Srivastava S (1999) The big five trait taxonomy: history, measurement, and theoretical perspectives. Handb Pers: Theory Res 2:102–138
4. Kroenke K, Spitzer RL, Williams JB (2001) The PHQ-9. J Gen Intern Med 16(9):606–613
5. Pentland A, Liu A (1999) Modeling and prediction of human behavior. Neural Comput 11(1):229–242
6. Statistics and Machine Learning Toolbox™ User’s Guide, Matlab R2019a. The MathWorks (2019)
7. Russell DW (1996) UCLA loneliness scale (version 3): reliability, validity, and factor structure. J Pers Assess 66(1):20–40
8. Salah AA, Gevers T (eds) (2011) Computer analysis of human behavior. Springer, London
9. Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New York
10. Salah AA, Krose BJA, Cook DJ (eds) Human behavior understanding. In: LNCS 9277, proceedings of the 6th international workshop, Osaka, Japan, Sept 8
11. Wang R, Chen F, Chen Z, Li T, Harari G, Tignor S, Zhou X, Ben-Zeev D, Campbell A (2014) StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In: Proceedings of ACM conference on ubiquitous computing
12. Project StudentLife (2019). http://studentlife.cs.dartmouth.edu/. Accessed 14 June 2019
A Virtual Reality System for the Simulation of Neurodiversity Héctor López-Carral, Maria Blancas-Muñoz, Anna Mura, Pedro Omedas, Àdria España-Cumellas, Enrique Martínez-Bueno, Neil Milliken, Paul Moore, Leena Haque, Sean Gilroy, and Paul F. M. J. Verschure
Abstract Autism is a neurodevelopmental disorder characterized by deficits in social communication and repetitive patterns of behavior. Individuals affected by Autism Spectrum Disorder (ASD) may face overwhelming sensory hypersensitivities that hamper their everyday life. In order to promote awareness about neurodiversity among the neurotypical population, we have developed an interactive virtual reality simulation to experience the oversensory stimulation that an individual with autism spectrum disorder may experience in a natural environment. In this experience, we project the user in a first-person perspective in a classroom where a teacher is presenting a lecture. As the user explores the classroom and attends the lecture, he/she is confronted with sensory distortions which are commonly experienced by persons with ASD. We provide the users with a virtual reality headset with motion tracking, two wireless controllers for interaction, and a wristband for physiological data acquisition to create a closed feedback loop. This wearable device measures blood volume pulse (BVP) and electrodermal activity (EDA), which we use to perform online estimations of the arousal levels of users as they respond to the virtual stimuli. We use this information to modulate the intensity of auditory and visual stimuli simulating a vicious cycle in which increased arousal translates into increased oversensory stimulation. Here, we present the architecture and technical implementation of this system.
H. López-Carral · M. Blancas-Muñoz · A. Mura · P. Omedas · À. España-Cumellas · E. Martínez-Bueno · P. F. M. J. Verschure (B) Synthetic Perceptive Emotive Cognitive Systems (SPECS) Lab, Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain e-mail: [email protected]
H. López-Carral · M. Blancas-Muñoz Universitat Pompeu Fabra (UPF), Barcelona, Spain
N. Milliken · P. Moore Atos, London, UK
L. Haque · S. Gilroy BBC, Manchester, UK
P. F. M. J. Verschure Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_46
Keywords Virtual reality · Neurodiversity · Autism spectrum disorder · Physiology
1 Introduction
1.1 Autism Spectrum Disorder
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that affects social communication and is characterized by repetitive patterns of behavior. Individuals diagnosed with ASD may experience hypersensitivity, enhanced perception, and sensory overload [10, 12]. Some view this hypersensitivity as the result of hyperacute sensation; others, as a lack of prediction that leads to impairments in habituation. Regardless of the cause, these differences in sensory prediction, together with impairments in contextualizing sensory evidence, can hinder the understanding of others’ actions and, consequently, social interactions [7].
1.2 Virtual Reality for Neurodiversity Simulation
With the goal of raising awareness among the neurotypical population about the neurodiverse phenomenology, we developed an interactive virtual reality simulation to experience “neurodiversity”. In particular, we wanted to simulate the oversensory stimulation that people with ASD may experience during an ordinary situation. For the simulation environment, we chose a classroom, given that it is a social context in which many possible stimuli may be present. In order to offer a realistic first-person experience, we chose to use virtual reality (VR) to place users in the perspective of a student affected by ASD (see Fig. 1). Furthermore, we used a wearable device to acquire physiological signals, from which we estimate arousal levels in real time to create a closed feedback loop.
ASD encompasses a wide range of traits. As the use case of our project, we simulated the experience of a teenager, focusing on Level 1 (“Requiring Support”) of the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [4]; that is, although diagnosed with ASD, this person would not suffer severe deficits. We chose this level for two reasons: more advanced levels involve more complex motor symptoms [9], which would make the experience more difficult to simulate, and individuals with more severe symptoms (such as low intelligence or impaired communication) could even seek sensory stimuli.
Fig. 1 Screenshot of the classroom environment. The user is seated at a desk, surrounded by peers and in front of a teacher, who gives a lecture on astronomy
1.2.1 Previous Examples
The experience we are proposing is informed both by scientific literature and existing multimedia projects for ASD awareness. The available examples can be classified considering their format and level of interactivity.
• Videos (regular): In this type of experience, users can watch in a first-person view what a person with ASD would be experiencing. Examples (all but the first one are homemade): Carly’s Café,1 Walking Down the Street,2 Sensory Overload Stimulation,3 and Autism: Sensory Overload Stimulation.4
• 360° videos: In these ones, the viewer can also experience a 360° representation of their surroundings. Examples: Project Cape,5 The Party,6 and Autism TMI Virtual Reality Experience.7
• Interactive: This kind of experience also allows users to interact with the environment. Examples: Auti-Sim (game)8 and Autism Reality Experience.9
As mentioned, a variety of works have been created through different technological means to raise awareness about the sensory overstimulation that someone with ASD could suffer. However, to the best of the authors’ knowledge, none of the existing systems includes biofeedback to more realistically and dynamically recreate that experience. By using multiple physiological signals in real time, we aim at overcoming this limitation and thus deliver a more complete simulation.
1 https://youtu.be/KmDGvquzn2k.
2 https://youtu.be/plPNhooUUuc.
3 https://youtu.be/BPDTEuotHe0.
4 https://youtu.be/IcS2VUoe12M.
5 https://youtu.be/ZLyGuVTH8sA.
6 https://youtu.be/OtwOz1GVkDg.
7 https://youtu.be/DgDR_gYk_a8.
8 http://gamejolt.com/games/auti-sim/12761.
9 https://www.training2care.co.uk/autism-reality-experience.htm.
1.2.2 Affective State Estimation from Physiological Signals
The autonomic nervous system modulates physiological responses, such as heart rate, respiration, and pupil dilation. This is directly reflective of certain internal human states, such as emotions and cognitive load. Thus, it is possible to use a variety of sensors to measure different physiological signals, such as the electrical activity of the heart using an electrocardiogram (ECG) or the skin’s electrodermal activity (EDA), to learn about the users’ states. In particular, these signals are known to correlate with affective states such as arousal [13], including in VR experiences [6]. Electrodermal activity is the fluctuation of the electrical properties of the skin as modulated by sweat gland activity. This is controlled by the sympathetic nervous system in correlation with arousal [8]. Heart rate variability (HRV) is a measure of the variation of time intervals between heartbeats [1], which can be derived from ECG data or photoplethysmography (PPG) data [3] and also correlates with arousal levels [2]. We use EDA and PPG together for increased robustness. In this paper, we present a novel virtual reality experience to simulate the oversensory stimulation that neurodiverse people might face in their daily lives in order to promote awareness of this among the neurotypical population. This experience is enhanced by biofeedback using physiological signals to dynamically adapt the experience. Here, we describe the outcome, focusing on the stimuli used and the implementation in terms of its architecture and the estimation of the user’s internal states, before discussing the resulting work.
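As a rough illustration of what "variation of time intervals between heartbeats" can mean in practice, the sketch below computes RMSSD, one standard time-domain HRV measure, from pulse-peak timestamps. The paper does not state which HRV measure is used, so the choice of RMSSD and the function names `inter_beat_intervals` and `rmssd` are assumptions made only for this example.

```python
import math


def inter_beat_intervals(peak_times_s):
    """Convert pulse-peak timestamps (seconds) into inter-beat intervals (ms)."""
    return [(t1 - t0) * 1000.0 for t0, t1 in zip(peak_times_s, peak_times_s[1:])]


def rmssd(ibi_ms):
    """Root mean square of successive differences between inter-beat intervals.

    RMSSD is used here purely as an example HRV measure (an assumption);
    the source does not name the measure it relies on.
    """
    if len(ibi_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

Peak detection on the raw BVP signal is left out on purpose; in such a sketch, any detector that yields peak timestamps could feed `inter_beat_intervals`.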
2 Neurodiversity Experience
2.1 Stimuli
While immersed in the virtual reality experience, users are exposed to a series of stimuli whose properties (such as intensity and duration) are manipulated to induce a state of oversensory stimulation. The chosen stimuli are informed by a body of research on sensory overload in ASD and self-reports from individuals on the autism spectrum. By type of oversensory stimulation, the stimuli can be divided into visual and auditory (see Table 1). Apart from being triggered, these stimuli can be modulated in intensity within a continuous range of values. Thus, they can be regulated depending on a number of factors, including the arousal levels of the users as inferred from their physiological responses.
Table 1 Examples of stimuli used in the experience, divided between auditory and visual
Type    Stimulus          Examples
Audio   Background noise  Peers talking
Audio   Sudden noise      Car horn
Visual  Color             Shiny colors
Visual  Distortions       Moving patterns
Visual  Light             Excess of light
2.2 Implementation
We have developed the “Neurodiversity Experience” as an interactive virtual reality experience augmented by biofeedback using a wearable device and implemented via a combination of different hardware and software technologies.
2.2.1 Architecture
As a platform for the VR experience, we chose the Oculus Rift S headset (Oculus from Facebook Technologies, U.S.A.), a head-mounted display that provides the audiovisual experience to the users, as well as handling body movements (particularly head) and integrating two wireless controllers for interaction. We engineered the virtual environment and the foundation of the experience using the Unity real-time development platform (Unity Technologies, U.S.A.). Using Unity, we developed the 3D environment in which users are situated during the experience to perceive a series of stimuli. This environment is populated by human-like characters, including other students and the teacher. They are animated realistically, in terms of both body movements and facial expressions. In the case of the teacher, the avatar moves around the space while gesturing, simulating the delivery of a lecture on astronomy. Mouth movements of this character are synchronized with a recording of the speech, performed by a human actress. This 3D application is also the basis for the interaction process, taking care of integrating both explicit interaction, such as body movements and actions with the controllers, and implicit interaction, deriving mental and affective states from physiological signals.
The sensor used to acquire physiological signals is the Empatica E4 wristband (Empatica Inc., U.S.A.), a wearable device equipped with multiple sensors, including a photoplethysmography (PPG) sensor and an electrodermal activity (EDA) sensor. It offers the possibility of real-time data acquisition and streaming using wireless connectivity via Bluetooth to a computer (see Fig. 2).
Fig. 2 Setup of the experience. The user is wearing an Oculus Rift S headset and an Empatica E4 wristband. The computer screen allows observers to see what the user sees
In order to process the physiological signals online and estimate the internal states of the users for interaction purposes, we developed an architecture integrating several software technologies (see Fig. 3). We use the existing E4 streaming server (footnote 10) to forward real-time data using TCP socket connections. We developed a Python script that connects to that server, obtaining all data acquired by the wristband and relaying it using the lab streaming layer (LSL) system (footnote 11), a protocol for streaming data which handles the networking and time-synchronization of signals for both online usage and recording. Then, we use the Modular Integrated Distributed Analysis System (MIDAS) [11] to perform the online analysis of the signals streamed using LSL. To do this, we developed a node for each of the signals of interest (PPG and EDA), integrating the necessary analysis functions to estimate arousal levels. The virtual reality application then performs requests using a REST JSON API at regular intervals to obtain the processed arousal levels, which are used to modulate the intensity of the stimuli presented to the users.
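The first hop of this pipeline (reading text lines from a TCP stream and turning them into samples that can be relayed) might look roughly like the sketch below. It is not the authors' script. The line layout `<stream> <timestamp> <value> [...]` and stream names such as `E4_Gsr` are assumptions that should be checked against the Empatica streaming server documentation; `Sample`, `parse_sample` and `split_lines` are hypothetical names, and the relay to LSL is deliberately left out.

```python
from typing import List, NamedTuple, Optional, Tuple


class Sample(NamedTuple):
    stream: str                 # stream label, e.g. "E4_Gsr" (assumed naming)
    timestamp: float            # timestamp as sent by the server
    values: Tuple[float, ...]   # one or more numeric channel values


def parse_sample(line: str) -> Optional[Sample]:
    """Parse one line of the ASSUMED form '<stream> <timestamp> <v1> [<v2> ...]'.

    Lines that do not match (for example command responses) yield None.
    """
    parts = line.strip().split()
    if len(parts) < 3:
        return None
    try:
        timestamp = float(parts[1])
        values = tuple(float(p) for p in parts[2:])
    except ValueError:
        return None
    return Sample(parts[0], timestamp, values)


def split_lines(buffer: bytes) -> Tuple[List[str], bytes]:
    """Split a TCP receive buffer into complete lines and the unfinished tail."""
    *complete, rest = buffer.split(b"\n")
    lines = [c.decode("ascii", "replace").rstrip("\r") for c in complete]
    return lines, rest
```

In a sketch like this, each parsed `Sample` would be pushed to an LSL outlet for its stream, and the unfinished tail returned by `split_lines` would be prepended to the next chunk received from the socket.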
2.2.2 Online Physiological Signal Analysis
In order to estimate the arousal level of the users, we use a combination of two physiological signals: photoplethysmography (PPG) and electrodermal activity (EDA). From the blood volume pulse (BVP) measured by the PPG sensor, we derive heart rate variability (HRV). EDA and HRV are used in conjunction to estimate arousal levels.
10 https://developer.empatica.com/windows-streaming-server.html.
11 https://github.com/sccn/labstreaminglayer.
Fig. 3 Architecture to process the physiological signals to estimate arousal levels online for interaction with the virtual reality experience. The physiological signals from the Empatica E4 wristband are transmitted to be streamed using LSL, to be analyzed online by MIDAS to infer arousal levels for a closed-loop interaction
To compute the changes in the physiological signals, we use a moving average algorithm based on two overlapping time windows. The shorter time window, corresponding to the last 10 s, is compared to a longer time window of 30 s that includes the shorter window. By dividing the mean value during the short window by that of the long window, a measure of change is computed, centered around a value of 1. Values over 1 indicate an increase in arousal, while lower values denote a decrease. This moving average is computed for each signal type individually, on the analysis node corresponding to each. Then, an additional node combines the results from the physiological processing nodes by computing an average that acts as the estimation of arousal levels [5].
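The two-window ratio and its combination across signals can be sketched as follows. This is a minimal illustration under stated assumptions, not the MIDAS nodes themselves: the class and function names are hypothetical, and the final mapping from arousal to a stimulus intensity in [0, 1] (with `base` and `gain`) is an invented example, since the paper does not specify that mapping.

```python
from collections import deque
from statistics import fmean


class WindowRatio:
    """Ratio of the mean over the last `short_s` seconds to the mean over
    the last `long_s` seconds (the long window contains the short one)."""

    def __init__(self, short_s=10.0, long_s=30.0):
        self.short_s = short_s
        self.long_s = long_s
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))
        # Drop samples that have fallen out of the long window.
        while timestamp - self.samples[0][0] > self.long_s:
            self.samples.popleft()
        long_mean = fmean(v for _, v in self.samples)
        short_mean = fmean(v for t, v in self.samples
                           if timestamp - t <= self.short_s)
        if long_mean == 0:
            return 1.0  # no baseline to compare against: report "no change"
        return short_mean / long_mean


def combined_arousal(ratios):
    """Average the per-signal ratios into a single arousal estimate."""
    return fmean(ratios)


def stimulus_intensity(arousal, base=0.5, gain=1.0):
    """Hypothetical mapping from arousal (centered on 1) to an intensity in [0, 1]."""
    return min(1.0, max(0.0, base + gain * (arousal - 1.0)))
```

With one `WindowRatio` per signal, a constant input yields a ratio of 1, and a recent rise in the signal pushes the ratio above 1, which in this sketch raises the stimulus intensity.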
3 Discussion and Conclusions
We developed an innovative setup for a VR experience that places users in a classroom where they can assume the role of a student during a lesson. Throughout the experience, users are exposed to a series of stimuli presented in ways that simulate how people with ASD may perceive them. To do this, we use a series of visual and auditory stimuli that are triggered depending on the timing and the actions of the users. The intensity of many of these effects is regulated using estimations of the arousal levels of the users, computed in real time from physiological signals acquired by a wristband they are wearing; this biofeedback is intended to further reinforce the experience and increase its effectiveness and realism. To accomplish this, we developed a software architecture that transmits the raw signals obtained by the wristband’s sensors, processes them online, and makes them available for real-time usage by the VR environment to dynamically adapt it to the user.
The main aim of this experience is to raise awareness about the daily life of a student with ASD. To do so, this system will be deployed in several neurodiversity-related events, where users will be able to experience it. Moreover, it will allow us to understand the relation between physiological signals and sensory overload, as well as attention and memory retrieval, in classroom environments. This would be useful not only for gaining scientific knowledge and contributing to the understanding of the neurodiverse phenomenology but also for possibly helping teachers design more inclusive classrooms.
3.1 Further Steps
This paper presents the technical implementation of our study focused on building an interactive VR experience targeting the neurodiverse phenomenology. A further step in the validation of this experience will be to perform a user evaluation. As a longer term possibility, this experience could also support ASD individuals themselves. Previous studies have discussed the need for techniques to improve predictive skills, rather than just treating ASD symptomatology. This could be done by adapting the type, intensity, and timing of the sensory overload stimuli to the degree of overload suffered by the individual.
Acknowledgements This research received funding from H2020-EU, ID: 787061 and ERC (PoC) H2020-EU, ID: 840052.
References
1. Acharya UR, Joseph KP, Kannathal N, Lim CM, Suri JS (2006) Heart Rate Var: A Rev. https://doi.org/10.1007/s11517-006-0119-0
2. Agrafioti F, Hatzinakos D, Anderson AK (2012) ECG pattern analysis for emotion detection. IEEE Trans Affect Comput 3(1):102–115. https://doi.org/10.1109/T-AFFC.2011.28
3. Allen J (2007) Photoplethysmography Appl Clin Physiol Measur. https://doi.org/10.1088/0967-3334/28/3/R01
4. Association AP et al (2013) Diagnostic and statistical manual of mental disorders (DSM-5®). American Psychiatric Pub
5. Betella A, Cetnarski R, Zucca R, Arsiwalla XD, Martinez E, Omedas P, Mura A, Verschure PFMJ (2014) BrainX 3: embodied exploration of neural data. In: Proceedings of the 2014 virtual reality international conference. ACM, p 37
6. Betella A, Pacheco D, Zucca R, Arsiwalla XD, Omedas P, Lanatà A, Mazzei D, Tognetti A, Greco A, Carbonaro N, Wagner J, Lingenfelser F, André E, Rossi DD, Verschure PF (2014) Interpreting psychophysiological states using unobtrusive wearable sensors in virtual reality. In: ACHI 2014, the seventh international conference on advances in computer-human interactions, pp 331–336
7. Chambon V, Farrer C, Pacherie E, Jacquet PO, Leboyer M, Zalla T (2017) Reduced sensitivity to social priors during action prediction in adults with autism spectrum disorders. Cognition 160:17–26. https://doi.org/10.1016/j.cognition.2016.12.005
8. Critchley HD (2002) Review: electrodermal responses: what happens in the brain. The Neurosci 8(2):132–142. https://doi.org/10.1177/107385840200800209
9. Goldman S, Wang C, Salgado MW, Greene PE, Kim M, Rapin I (2009) Motor stereotypies in children with autism and other developmental disorders. Dev Med Child Neurol 51(1):30–38
10. Gomes E, Pedroso FS, Wagner MB (2008) Hipersensibilidade auditiva no transtorno do espectro autístico. Pró-Fono Revista de Atualização Científica 20(4):279–284. https://doi.org/10.1590/s0104-56872008000400013
11. Henelius A, Torniainen J (2018) MIDAS: open-source framework for distributed online analysis of data streams. SoftwareX 7:156–161. https://doi.org/10.1016/j.softx.2018.04.004
12. Mitchell P, Ropar D (2004) Visuo-Spat Abil Autism: A Rev. https://doi.org/10.1002/icd.348
13. Szwoch W (2015) Emotion recognition using physiological signals. In: Proceedings of the mulitimedia, interaction, design and innnovation—MIDI ’15, pp 1–8. https://doi.org/10.1145/2814464.2814479
An Algorithm Classifying Brain Signals in the Control Problem Urszula Jagodzińska-Szymańska and Edward Sędek
Abstract The article presents an algorithm that has been designed and implemented for classifying brain signals for control. The classification algorithm uses graph theory, which accelerates its operation. The classification targets were the intentions of left- and right-hand movements. The algorithm operates on brain signals read by means of the Emotiv Epoc mobile device. The results of the classification can be observed on a continuous basis in the dialogue window on the computer screen and with the help of a simple device based on the Arduino Uno R3 (Bluetooth) platform and a Wi-Fi module. The agreement between the results obtained and the movement intention can be analysed in reports generated in Microsoft Excel.
Keywords Classification algorithm · Device control · Classifying brain signals · Spanning tree · Feature selection · Gower’s algorithm · Inverse problem · ERD/ERS phenomenon
U. Jagodzińska-Szymańska (B) · E. Sędek PIT-RADWAR S.A., Warsaw, Poland e-mail: [email protected]; [email protected]
E. Sędek University of Science and Technology (UTP), Bydgoszcz, Poland e-mail: [email protected]; [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_47
1 Introduction
The classification algorithm was developed as part of a complete system working as a brain-computer interface (BCI): it covers registering signals from the cerebral cortex, processing them, reconstructing their sources, isolating and selecting signal features, and classifying them [10]. The result of the classification was used as an impulse controlling the device. Control was conducted wirelessly, via Bluetooth or Wi-Fi. In order to effectively classify the signals relating to movement intention, an algorithm was developed which determines the sources of these signals on the basis of the raw signals registered by the electrodes of the Emotiv headset. Source signals were sought in selected areas of the brain. To find them, the inverse problem was solved using the field theory method [8, 9]. The location of the original signals connected with movement was determined on the basis of the distribution of potentials on the surface of the brain. The classification algorithm uses graph theory to select the features, which is the novelty in the classification of brain signals.
An important element of the classification algorithm is the dialogue window of the classification program, which makes it possible to observe the classification results continuously in real time. Before measurements start, classification parameters are set for a given user. These parameters can be changed in the course of the measurement/control process. Moreover, the dialogue window makes it possible to set the parameters for cooperation between the classification program and the device. The classification algorithm is based on the ERD/ERS phenomenon. The delay with which this phenomenon occurs and the frequency band in which it appears are characterized by high individual variability. Therefore, to obtain the best results, it is necessary to set classification parameters in the dialogue window for the given measurement/control session and for each specific user. These can be modified during the measurement/control session.
The motivation for developing the algorithm was to show that a high degree of agreement between the intention to make a hand movement and the classification result can be obtained if the measurement parameters are adjusted to individual characteristics identified at the classifier testing stage. These parameters, mainly frequencies and time intervals, can also be changed during operation.
The contribution of the study lies in the development of the entire system working as a BCI, while the novelty is the selection of features with the use of graph theory. The article is divided into four sections. The second presents the classification algorithm. The third presents the classification function implemented in the MATLAB environment, which is the main function of the control algorithm. The fourth presents the summary.
2 The Classification Algorithm
2.1 The Classification Algorithm in a System Working as BCI
The classification algorithm was implemented in the MATLAB environment. The program, written in the MATLAB programming language, has a modular structure, which makes it possible to register and process the signals from the electrodes, then to find the signal sources, to isolate, select and classify their features, and finally to control the device.
Moreover, a library (eegloglib.dll) was written in C++ on the basis of the Emotiv Epoc SDK, which enables the MATLAB program to download raw data from the Emotiv Epoc headset. These data are saved to files on a local disk in subdirectories corresponding to individual measurements. The main modules of the program call detailed functions performing operations in accordance with the operation scheme of the BCI system that was constructed, i.e.
1. Initiation of the control/measurement process.
2. Registration of the raw signal from the electrodes of the Emotiv headset and saving it in a file as well as transferring it for further processing.
3. Processing the signal and saving it to a file with processed data.
4. Classification of the intention of the user’s movement, which is the main function of the control algorithm.
5. Displaying the classification result in the dialogue window and the wireless transmission of the classification result in the form of a control impulse.
6. Creating a report of the measurements, thus allowing a quick analysis of classification correctness (optionally).
The classification process comprises solving the inverse problem (specifying the approximate location of signal sources), calculating t-statistics, and applying Gower’s algorithm from graph theory.
2.2 Gower’s Algorithm in Classification Based on the Graph Theory
Gower’s algorithm was used to find a spanning tree with the largest sum of the moduli of edge weights. The tree constructed in this way makes it possible to use the following classification rule: depending on whether the sum of the edge weights of the tree is positive or negative, the classifier registers the imagined movement as that of either the right or the left hand (the sign of this scalar is the classifier). The classification rule uses the spanning tree of a graph whose vertices are areas on the surface of the cerebral cortex and whose edges have weights given by values of t-statistics. The graph $G(X, E)$ was defined, where $X$ is the set of the graph’s vertices and $E$ is the set of weighted edges, with the edge-weight function $w : E \to [0, \infty)$. Here $X = \{P_1, \ldots, P_n\}$, where the $P_i$ are fields on the cerebral cortex, each including a certain number of voxels [7]. The weight of the graph’s edge $(P_{i_1}, P_{i_2})$ is $w(P_{i_1}, P_{i_2}) = m_{i_1 i_2}$, where $m_{i_1 i_2} = t_{i_1 i_2}$ and $t_{i_1 i_2}$ is a value of the t-statistic:
$$t_{i_1 i_2} = \frac{m_{i_1} - m_{i_2}}{\sqrt{\dfrac{\sigma_{i_1}^2}{N_{i_1}} + \dfrac{\sigma_{i_2}^2}{N_{i_2}}}} \qquad (1)$$
where $m_{i_1}$, $m_{i_2}$ are the averages, $N_{i_1}$, $N_{i_2}$ are the numbers of voxels in the areas $P_{i_1}$, $P_{i_2}$, respectively, and $\sigma_{i_1}^2$, $\sigma_{i_2}^2$ are the variances of the current density for the voxels in the areas $P_{i_1}$, $P_{i_2}$ ($i_1, i_2 = 1, \ldots, n$). The values of $t_{i_1 i_2}$ are positive or negative depending on whether the hand movement relates to the right or the left hand.
In order to obtain fast classification, a Maximum Spanning Tree (MST) was found, i.e. the graph $T(X, A)$. $A$ is a set of edges which is empty at the start of the algorithm’s operation and which, at the end, contains the edges of the maximum spanning tree $T(X, A)$ being sought. Taking [1] into consideration, a modification of Prim’s algorithm was adopted because of its speed in finding the MST. This is an iterative algorithm. $V$ denotes the set of vertices of the tree being constructed by the algorithm. The procedure of determining the spanning tree starts from one of the vertices of the graph $G(X, E)$, which is included in $V$; at the beginning, the edge of greatest weight incident to this vertex is included in $A$. In each subsequent iteration step, we include in $A$ the edge of greatest weight that has one end in $V$ and the other in $X \setminus V$, so that no cycle can arise in the tree being constructed. The other end of this edge is then included in $V$. The procedure is repeated until $V = X$.
After the spanning tree (MST) has been built, the sum of the edge weights in this tree is considered. The sign of this sum (a scalar) is the classifier. The result of this classification was used for control in the device designed for the purposes of the classification. Next, after determining the MST, a cluster analysis (SLCA, Single Linkage Cluster Analysis) was performed in accordance with [2]: edges whose weights were lower than the adopted threshold $d_i$ ($d_i = 0.9$, $d_i = 0.8$ and $d_i = 0.7$ were assumed) were removed from the MST edge list.
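The edge weight between two cortical areas, read as a two-sample t-statistic with unequal variances, can be sketched as below. The original program is written in MATLAB; Python is used here only for illustration. The function name `t_statistic` is hypothetical, the inputs are assumed to be lists of per-voxel current-density values for each area, and the use of the sample variance (rather than the population variance) is an assumption, since the source does not say which is meant.

```python
import math
from statistics import fmean, variance


def t_statistic(values_1, values_2):
    """Two-sample t-statistic between the voxel values of two areas.

    Returns (m1 - m2) / sqrt(var1 / N1 + var2 / N2), where m is the mean,
    var the sample variance (an assumption) and N the number of voxels.
    """
    n1, n2 = len(values_1), len(values_2)
    m1, m2 = fmean(values_1), fmean(values_2)
    var1 = variance(values_1, m1)
    var2 = variance(values_2, m2)
    denom = math.sqrt(var1 / n1 + var2 / n2)
    if denom == 0:
        raise ValueError("both areas have zero variance")
    return (m1 - m2) / denom
```

Swapping the two areas flips the sign of the result, which is the property the sign-based classification rule relies on.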
In the end, only those vertices of the spanning tree T(X, A) were taken into account which, according to the map of the brain, lie in the area of the motor cortex.
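The tree construction and the sign rule of Sect. 2.2 can be sketched as follows, again in Python rather than the MATLAB of the original program. This is a plain Prim-style loop, not the authors' modified version: edges are ranked by the modulus of their weight (following the "largest sum of moduli" statement), while the classifier is the sign of the sum of the signed weights, which is one reading of the text. The names `max_spanning_tree` and `classify` are hypothetical, and which sign corresponds to which hand is left open because the source does not state it.

```python
def max_spanning_tree(vertices, weights, start=None):
    """Grow a spanning tree by repeatedly adding the crossing edge of
    greatest |weight|.

    vertices: iterable of vertex labels.
    weights: dict mapping (u, v) pairs to signed edge weights.
    Returns the chosen edges as (u, v, weight) tuples.
    """
    vertices = list(vertices)
    in_tree = {vertices[0] if start is None else start}
    tree_edges = []
    while len(in_tree) < len(vertices):
        best = None
        for (u, v), w in weights.items():
            # Exactly one end inside the tree, so adding it cannot close a cycle.
            if (u in in_tree) != (v in in_tree):
                if best is None or abs(w) > abs(best[2]):
                    best = (u, v, w)
        if best is None:
            raise ValueError("graph is not connected")
        tree_edges.append(best)
        in_tree.update(best[:2])
    return tree_edges


def classify(tree_edges):
    """Return +1, -1 or 0 according to the sign of the summed edge weights."""
    total = sum(w for _, _, w in tree_edges)
    return (total > 0) - (total < 0)
```

This naive loop scans every edge at each step; for the small number of cortical areas involved that is likely sufficient, and a heap-based variant would be the usual choice for larger graphs.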
2.3 The Classifying Program: The User’s Interface
After the classifying program is started, the dialogue window opens (Fig. 1), which makes it possible to enter the initial parameters of the measurement and start the work of the program. The dialogue window is the graphical user interface of the classifying program; it makes it possible to set the required classification parameters and enables the cooperation of the classifying program with the device. Moreover, the dialogue window allows measurement parameters to be corrected on an ongoing basis while the device is being controlled. The following parameters are to be chosen in the dialogue window:
• The frequency of the signal analysed.
• The time intervals of the signal (the result of the classification is determined as true or false for these intervals).
Fig. 1 Dialogue window with sample initial settings––communication via the Bluetooth protocol
• Work mode: normal work (“Recognition”) or training mode (“Training”). The “Training” mode is useful for finding the best measurement parameters for the user being tested, such as the time interval during which classification results are correct or the frequency which best reflects the intention of the user. The best parameters for a given user are then selected.
• The thought task: imagining the movement of the left or the right hand, or “None” (neither hand is chosen). “None” can be chosen if the user controls the device freely in the normal “Recognition” work mode, does not need to analyse the classification results, and does not have to specify beforehand what activity is to be performed. Choosing the hand makes it easier to analyse the correctness of classification if reports are needed, particularly in the course of training or testing the user, i.e. when the thought activity is set.
• The Wi-Fi or Bluetooth work mode, i.e. wireless transmission of classification results to the device. If neither the Wi-Fi nor the Bluetooth option is chosen (the chosen option is “None”), the classification results are displayed only in the dialogue window of the classifying program [3, 4]. If the function of controlling the device is set, the classification result can be observed on the device as well as in the dialogue window of the classifying program.
The dialogue window has buttons, each of which triggers a corresponding activity:
• The “Start” button makes it possible to start controlling/measurements. It calls the primary functions of the program, which initiate, run and finish the process of controlling/measurements. After the “Start” button is chosen in the dialogue window, code is executed which activates the parameters chosen in the dialogue window and makes it possible to run the next function. This is a file in the program which calls all the basic functions, i.e.
  • The shared library (eegloglib.dll), which is connected to the headset, collects raw data from the 14 electrodes of the Emotiv headset and records them to a selected file, which is then copied by the program to subdirectories corresponding to subsequent measurement sessions.
  • The function of processing the recorded data and saving them in the appropriate directory.
  • The classification function, which operates on the processed data. The classification function is the main function of the control algorithm. A correct classification result sets off the control process, making it possible to implement the user’s intention.
This function includes modules carrying out the classification process: solving the inverse problem (determining the approximate location of signal sources) [5, 6], calculating t-statistics, Gower’s algorithm (the sign of the scalar is the classifier), the algorithm for determining the classification result, and the graphical interpretation of the classification results. This function transmits the classification result and the impulse carrying it, which in fact is a control impulse. It is possible to generate reports on classification accuracy in MS Excel, which makes it easy to analyse whether the classification results agree with the movement intention imagined by the user. This is useful in the user training process or when evaluating the correctness of the classifier. The reports with the best classification results can also be used to set the parameters for a specific user. A sample report of a correctly recognized right-hand movement in MS Excel format, generated automatically by the program, is presented in Table 1. This table presents the results of one measurement and also a summary of four measurements made in the same measurement session for both the right and the left hand (in this case, one measurement session consisted of four measurements taken for one person). The letter L stands for the left hand and the letter R for the right one. All the measurements taken in this session were correct.
Algorithm Classifying Brain Signals in the Control Problem
Table 1 Sample report in MS Excel format P Field
Description
13 24 25 26
PSD2_rawdata
18 19 30 31
1 side_20HzPSD2_rawdata
Summary
8–10(time interval)
Assumption
Summary PSD2_rawdata 8–10
R
R
1
R
1
Sum of edge weights_2: 9.1401
L
1
Sum of edge weights_3: 95.4223
L
1
Sum of classification results
4
Number of measurements
4
Result of spanning tree Classification: RIGHT
– The code that stores the data to be included in the report, i.e. the classification result and selected parameters of the classification (the frequency of the signals used for the classification, the time intervals used for signal analysis, the acronym indicating the choice of the right or the left hand for a given measurement in the measurement session, and the brain areas taking part in the signal classification process).
• The “Stop” button stops the measurement session. Choosing “Stop” in the dialogue window triggers the execution of the code that stops the measurement session. After pressing the “Stop” button, it is possible to generate a report of the control/measurement session. If required, the “Start” button can be chosen again to start one or more subsequent measurement sessions.
• The “Report” button generates a report on a given session, control test or measurement attempt. Generating reports is important when testing users and when choosing the best parameters for particular users. Reports are used for fast evaluation of classification correctness. Choosing the “Report” button in the dialogue window starts the execution of the code from the file that writes the report of the control/measurement session to a Microsoft Excel file. In order to generate a report, the program has to be stopped.
U. Jagodzińska-Szymańska and E. Sędek
• The “Load data from file” button makes it possible to reproduce the classification results from the registered raw data and to generate a report on their basis. Choosing this button starts the execution of the code from the file that repeats the classification process and generates a report from the recorded raw data.

The dialogue window also makes it possible to enter the head radius (the default value is 65 mm, which is the radius of the brain area over which the cerebral cortex is located).
3 The Classification Function as the Main Function of the Control Algorithm
The classification function is implemented on the basis of the processed data. It includes the following modules and algorithms implementing the classification process:
• The module implementing:
– A spherical coordinate system for the head model (the layout of electrodes and voxels on the cerebral cortex).
– An algorithm for determining the approximate intensity of the signals using the algorithm solving the inverse problem by means of the method of minimizing the norm with the least squares method.
– An algorithm for determining the value of t-statistics between the P areas of opposite brain hemispheres.
– An algorithm for determining spherical coordinates of local signal sources.
• A module implementing Gower’s algorithm.
• A module that returns the spherical coordinates of P fields included in the spanning tree.
• A module that makes it possible to visualize the results of Gower’s algorithm.
• An algorithm for determining the classification result on the basis of the weights of significant connections between the P distances from particular areas of the head.
• An algorithm for transmitting the classification result in the form of a control impulse to the device.
Below is a fragment of the code of the classification function, the most important function of the algorithm; it calculates the t-statistic:
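As a hedged illustration of this step, and not the authors' listing, the sketch below computes a two-sample t-statistic between the estimated signal intensities of two P areas from opposite hemispheres. The source does not state which variant of the statistic is used; Welch's unequal-variance form is an assumption, and the function name `welch_t` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(left, right):
    """Welch's t-statistic for two independent samples."""
    n1, n2 = len(left), len(right)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two values")
    # standard error of the difference of means, using sample variances
    se = math.sqrt(variance(left) / n1 + variance(right) / n2)
    if se == 0.0:
        raise ValueError("both samples have zero variance")
    return (mean(left) - mean(right)) / se

# Hypothetical source intensities for a left and a right P area.
left_area = [1.0, 2.0, 3.0, 4.0]
right_area = [2.0, 4.0, 6.0, 8.0]
t_value = welch_t(left_area, right_area)
```

A negative value indicates that the first area has the lower mean intensity; the sign of such a statistic is one plausible reading of how a left/right decision could be derived, but that link is an assumption here.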
4 Summary
Brain signals can be effectively converted into a control impulse [10]. In order to recognize the intention of left- and right-hand movement, the spanning tree of the EEG signal classification graph was used. The algorithms developed for locating the sources of these signals on the surface of the cerebral cortex using graph theory, for determining the current density of these sources, and for classifying the intention of left- and right-hand movements by means of a spanning tree yield promising results for control using a classifier based on the ERD/ERS phenomenon.
The mobile EPOC device produced by Emotiv was used to record brain signals. The device has 14 active electrodes that register the electrical activity of neurons in the brain [7], and it does not have electrodes located on the cerebral cortex. Classifying movement intention with the algorithm developed on the basis of graph theory and of information about the sources of EEG signals (after solving the inverse problem) makes it possible to achieve a high rate of accuracy in the classification of these intentions, even though the mobile EPOC device does not have electrodes in the area of the motor cortex. For this purpose, the algorithm has to be adapted to the user’s personal characteristics, which is done by selecting parameters in the dialogue window of the classifier program. In the dialogue window, one can change the radius of the user’s head (65 mm by default), select the frequency (12, 20 or 22 Hz) and set the time intervals in which Beta and Mu waves related to the user’s intention to move appear. One can also set whether the imagined movement should refer to the right or the left hand, or select the NONE option when reports of the control being conducted are not required and the operator does not set the task of making a specific movement, leaving the choice to the user.
References
1. Dąbrowski M, Laus-Mączyńska K (1978) Information retrieval and classification. Survey of methods. Scientific and Technical Publishers, Warsaw
2. Gower JC, Ross GJS (1969) Minimum spanning trees and single linkage cluster analysis. J R Stat Soc Ser C 18(1):56–64. Blackwell Publishing for the Royal Statistical Society
3. Jagodzińska U (2013) Towards the applications of algorithms for inverse solutions in EEG brain-computer interfaces. Int J Electron Telecommun 59(3):277–283. Sigma-Not, Warsaw, Poland. https://doi.org/10.2478/eletel-2013-0033
4. Jagodzińska U (2013) The implementation of algorithms for inverse solutions in EEG brain-computer interfaces. In: IEEE conference signal processing symposium (SPS), Poland, pp 178–182. https://doi.org/10.1109/SPS.2013.6623607
5. Jagodzińska-Szymańska U, Sędek E (2018) Wireless device control with the use of brain signals. Elektronika 2:46–51. Sigma-Not, Warsaw, Poland. https://doi.org/10.15199/13.2018.2.11
6. Jagodzińska-Szymańska U, Sędek E (2018) Wireless device control by means of brain signals. In: IEEE conference Baltic URSI symposium (URSI), Poznań, Poland, pp 108–110. https://doi.org/10.2319/URSI.2018.8406721
7. Kandel ER, Schwartz JH, Jessell TM (2000) Principles of neural science. McGraw-Hill Companies, Inc.
8. Noirhomme Q (2006) Localization of brain functions: stimuling brain activity and source reconstruction for classification. PhD thesis in applied sciences, Université catholique de Louvain, Louvain
9. Nunez PL, Srinivasan R (2006) Electric fields of the brain. Oxford University Press
10. Wolpaw JR, Birbaumer N, McFarland DJ, Pfurtscheller G, Vaughan TM (2002) Brain-computer interfaces for communication and control. Elsevier
Fast Geometric Reconstruction Using Genetic Algorithms from Single or Multiple Images Afafe Annich, Imane Laghman, Abdellatif E. L. Abderrahmani, and Khalid Satori
Abstract This paper presents a new reconstruction method that allows semi-automatic 3D modelling of human-made environments from single or multiple views. The resulting 3D models can be used in architecture, video games, virtual reality and more. The method is attractive for its simplicity, efficiency and speed. It does not require any initialization, so no prior calibration is needed. In this work, we focus on genetic algorithms and geometric constraints. The user inputs the constraints observed in the image, and the 3D reconstruction starts from this input. It takes advantage of the constraints to reduce the number of estimated parameters and to preserve the geometry of the 3D object. As a result, the 3D model becomes more realistic. First, we describe our 3D model with geometric constraints. Then, we present the appropriate genetic model. We estimate the 3D primitives and focal lengths, and simultaneously determine the camera position using a position estimation algorithm. Finally, we conduct experiments on different views and present results that demonstrate the robustness of the present work.
Keywords 3D reconstruction · Virtual reality · Constraints · Single uncalibrated image reconstruction · Genetic algorithms (GAs)
A. Annich (B) · I. Laghman · A. E. L. Abderrahmani · K. Satori
Department of Computer Sciences FSDM, LISAC, University Sidi Mohammed Ben Abdellah, P.B 1796 Atlas FEZ, Morocco
A. Annich
Higher Institute of Information and Communication, Rabat, Morocco
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_48

1 Introduction
Recovering a three-dimensional model from one or more images is an important field of computer vision and computer graphics. It opens up several application areas such as virtual and augmented reality, architecture and engineering. Unfortunately, complex calibration and assumptions are first needed to describe the camera properties. This affects several aspects: the complexity of the
materials, the slowness of reconstruction, and results that are neither the best nor the most realistic. This also makes real-time reconstruction a challenge. The solution that we propose is based on genetic algorithms and geometric constraints. The constraints preserve the geometry of three-dimensional objects throughout the reconstruction, and thus guarantee a fast result as well as higher reliability and realism. Some approaches are based on basic primitives and automate the reconstruction process; however, they depend strongly on the nature of the textures. Approaches based on Computer-Aided Design descriptions use a library of 3D shapes. They implicitly define constraints, but they suffer from the limitations of the library. Finally, there are mixed approaches that combine the two previous ones. Our main contribution is a simple and practical modelling method. The use of constraints improves the quality of the 3D results obtained, reduces the number of images required and facilitates modelling. Furthermore, the application of GAs enhances 3D reconstruction. We first present the reconstruction problem through an overview of existing methods. We then explain in Sects. 3 and 4 how the constraints and GAs are used. In Sect. 5, we give tests and comparisons that demonstrate the competitiveness and efficiency of our new method. Section 6 concludes the paper.
2 Related Works
Most approaches to 3D reconstruction work from multiple views, and reconstructing a 3D scene from just one view is considerably harder. In this study, we attempt to reduce the number of images, skip the calibration stage and limit the resources used. The approach combines geometry and genetic algorithms. Geometry [1] has been used in 3D reconstruction [2] in several ways. First, there are techniques that treat a 3D object as a structured combination of elementary elements. The user selects their types and establishes the constraints between them. The main concept is to obtain geometry indirectly. However, such a strategy relies heavily on existing shape frameworks such as CATIA and 3D Studio. In [3], the authors' goal is to calibrate the camera using six parallelepiped corners, but the method does not seem robust because it requires a particular shape. As reported in [4], architectural scenes are built from a series of planes (floors or walls), which is not useful for every 3D scene. To offer a metric 3D reconstruction, the sensors are self-calibrated. In practice, initializing the camera to start the reconstruction is difficult. The research described in [5] uses cylinders. One of its main purposes is to provide the main equations that govern the three-view geometry of cylinders. The authors relate the three-view geometry of cylinders and use the correspondences between six cylinders over three views to recover the 3D world reconstruction and the camera parameters. Second, methods that use simple primitives retrieve simple elements from the 3D model. Complex model elements, such as curves and
surfaces, can be traced; however, these methods depend heavily on object textures [6]. For details on plane architectures, refer to [7–10]. Finally, there are hybrid methods that merge the two. The goal is to retain both the flexibility of the basic primitives and the development of CAD elements. Linear and bilinear constraints are provided in the system discussed in [11]. The work in [12] discusses the challenge of dealing with redundant constraints. We follow a philosophy similar to that of the work presented in [13]. In this research, we use GAs [14] for the reconstruction problem. GAs have proven effective owing to their ability to handle search spaces with multiple local optima. Another approach uses vanishing points to extract constraints and predict the focal length. It appears useful, but it suffers from restrictions. We use a hybrid approach to define constraints: the user enters the chosen corners and then adds the constraints.
3 Methodology
We consider a pinhole camera. Two sets of parameters are defined. The extrinsic parameters are the components of the translation vector $T$ and the rotation angles $R$. The intrinsic parameters are $f$, $k_u$, $k_v$, $u_0$ and $v_0$, where $f$ is the focal length, $k_u$ and $k_v$ are the scale factors, and $u_0$ and $v_0$ are the coordinates of the principal point. The intrinsic parameters are estimated using GAs. For the extrinsic parameters, we simultaneously use the position estimation algorithm [15].
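For reference, under the common zero-skew pinhole model (an assumption; the text does not write the matrix out), these intrinsic parameters form the calibration matrix $K$, and a homogeneous 3D point $\tilde{X}$ projects to the homogeneous image point $\tilde{p}$ up to scale:

```latex
K = \begin{pmatrix} f k_u & 0 & u_0 \\ 0 & f k_v & v_0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
\tilde{p} \sim K \, [\, R \mid T \,] \, \tilde{X}
```

In this notation the products $f k_u$ and $f k_v$ would correspond to the focal lengths $f_x$ and $f_y$ expressed in pixels, which is presumably what the later sections refer to.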
3.1 Problem Formulation
We estimate a parameter vector $\vec{x} = \varphi = (\varphi_1, \varphi_{3D})$. Our algorithm estimates the model and the intrinsic parameters. The parameters of the 3D model, except angles, lie in the range [0, 1]. The focal lengths $f_x$ and $f_y$ lie in [800, 2000]. We avoid the use of complex bundle adjustment. In the experimental section, we show through various tests and comparisons that our new approach improves reconstruction success without any prior information.
3.2 Genetic Algorithm’s Use and Impact
Assume that we have a discrete search space $S$ and a function $F : S \to \mathbb{R}$. The general optimization problem is to find $\min_{X \in S} F(X)$. Here $X$ is a parameter vector, and $F$ is the cost function or fitness function. We assume that the optimization problem is a minimization one. The choice of GA parameters is widely discussed in the literature, but we could not find results that establish which choice is best. Detailed studies are given in [14] and [17].
Our aim is for the best solution vector to survive, so crossover is essential. We work with an initial population of 100 individuals. Some studies in the literature report that a small population (20–30 individuals) performs better. Since we do not use mutation, we increase the population to 100 individuals, which also introduces some disruption. In the end, although we use only crossover, we still observe disruption between individuals. Real coding offers greater precision [14], especially for problems that require it. In addition, real coding allows a wide range of parameter values.
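A minimal sketch of how such a real-coded initial population could be drawn, assuming the ranges stated in Sect. 3.1 (model parameters in [0, 1], focal lengths in [800, 2000]). The helper names, the layout of an individual, the number of model parameters and the fixed seed are illustrative assumptions, not the authors' Java implementation; angle parameters, which the text excludes from the [0, 1] range, are left out.

```python
import random

MODEL_RANGE = (0.0, 1.0)       # range for 3D model parameters (Sect. 3.1)
FOCAL_RANGE = (800.0, 2000.0)  # range for the focal lengths fx, fy (Sect. 3.1)

def random_individual(n_views, n_model_params, rng):
    """One real-coded individual: fx and fy per view, then the model parameters."""
    focals = [rng.uniform(*FOCAL_RANGE) for _ in range(2 * n_views)]
    model = [rng.uniform(*MODEL_RANGE) for _ in range(n_model_params)]
    return focals + model

def initial_population(size, n_views, n_model_params, seed=0):
    """Draw `size` individuals uniformly at random within the stated ranges."""
    rng = random.Random(seed)
    return [random_individual(n_views, n_model_params, rng) for _ in range(size)]

# 100 individuals, three views, twelve model parameters (hypothetical sizes).
population = initial_population(size=100, n_views=3, n_model_params=12)
```

Each gene is stored directly as a float, which is what real coding amounts to in practice: no binary encoding or decoding step is needed, and the precision is that of the floating-point type.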
4 Our Technical Approach
Our approach is based on several major concepts. First, we use GAs to estimate the 3D model. Other approaches, such as [19] and [20], also use GAs, but they are not based on constrained, single-view reconstruction, which makes a comparison uninformative. Second, geometry is essential for enhancing the quality of the results and reducing the number of estimated parameters. The flowchart in Fig. 1 summarizes the steps of our 3D reconstruction.
4.1 Input Geometric Constraints
The user inputs constraints interactively. Constraints are described using graphs. We modelled an example of a 3D model with four points linked by constraints (double orthogonality). Each primitive is represented by a node. A free point is described by three parameters X, Y and Z. Otherwise, the parameters are defined by the constraint, and a point's coordinates are deduced from its antecedents. Relying entirely on GAs gives us more flexibility. The approach used in [1] requires some assumptions before a constraint can be used: the constraint must satisfy continuity and differentiability properties to be integrated with the Levenberg algorithm. In fact, the
Fig. 1 Flowchart of the proposed 3D reconstruction approach
3D reconstruction algorithm requires the derivatives of the fitness function to be calculated in order to determine the Jacobian matrix. The present work avoids all this, which can help us develop more complex constraints. The reconstruction is interactive: the user picks 2D primitives from the images and then starts modelling with them. For each new node, the previously defined nodes it depends on must be selected. This requires the user to build a 3D model first, but has the advantage of providing a reliable model afterwards.
4.2 Optimization Step-Based GAs
Genetic algorithms are meta-heuristics that use a population of possible solutions to an optimization problem to reach the best solution (individual). Each solution is described by a set of properties that are coded and mutated. The work presented in [1] used vanishing points to initialize the focal length, together with a prior initialization of the 3D model using GAs, but its main optimization is based on bundle adjustment. Its results compare well with the classic bundle adjustment approach, but the approach is still not perfect. The approach presented in [18] also used vanishing points to initialize the focal length and assumes that the 3D plane has several parallel lines in at least two directions, in order to simplify parameter estimation. A validation step is therefore required from users to check errors. In the present work, we use GAs to estimate the whole 3D model and the camera's intrinsic parameters. As a result, we do not need any information about the 3D scene. Genetic algorithms help us explore a large search space. In practice, we seek to reduce the fitness over all views and points:

$$F_c = \sum_{i,k} \left\| \tilde{p}_k^i - G\left(f^i, P(f^i, f_{3D})^i\right) f_{3D}^k \right\|^2 \quad (1)$$

where $G$ is the projection matrix, $P$ refers to the camera's position [15], and $\tilde{p}_k^i = (x_k^i, y_k^i, 1)^T$ is the position of the $k$th 2D point in image $i$; $f^i$ and $f_{3D}$ are $\varphi_1$ and $\varphi_{3D}$, respectively. As the flowchart of our approach indicates, our aim is to improve the fitness, that is, to decrease Eq. (1). The components of the vector to be estimated include the focal lengths $f_x$ and $f_y$ corresponding to each view. We randomly generate an initial population. The structure used changes according to the available 3D model, which is provided by the user.
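A cost of the kind described by Eq. (1) can be sketched as a sum of squared reprojection errors over all views and observed points. The sketch assumes a zero-skew pinhole camera with a known principal point and a camera pose (rotation matrix and translation) that has already been estimated; the pose estimation step of [15] is not reproduced. All function names, the dictionary layout of a view and the numeric values are hypothetical.

```python
def project(point3d, rotation, translation, fx, fy, u0, v0):
    """Project a 3D point with a zero-skew pinhole camera."""
    # camera coordinates: Xc = R * X + T
    xc, yc, zc = (
        sum(rotation[r][c] * point3d[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return fx * xc / zc + u0, fy * yc / zc + v0

def reprojection_fitness(views, points3d):
    """Sum of squared reprojection errors over all views and observed points."""
    total = 0.0
    for view in views:
        for k, (x_obs, y_obs) in view["observations"].items():
            x, y = project(points3d[k], view["R"], view["T"],
                           view["fx"], view["fy"], view["u0"], view["v0"])
            total += (x - x_obs) ** 2 + (y - y_obs) ** 2
    return total

# One view looking down the z axis; the second observation is 2 px off in x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = {0: (0.0, 0.0, 4.0), 1: (1.0, 0.0, 4.0)}
view = {
    "R": identity, "T": (0.0, 0.0, 0.0),
    "fx": 1000.0, "fy": 1000.0, "u0": 320.0, "v0": 240.0,
    "observations": {0: (320.0, 240.0), 1: (572.0, 240.0)},
}
fitness = reprojection_fitness([view], points)
```

Because the cost is only evaluated, never differentiated, a genetic algorithm can minimize it without the Jacobian that a gradient-based bundle adjustment would need.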
4.3 Selection and Crossover Operators
The traditional representation of individuals is binary. We choose real coding instead. One chromosome (parameter) forms a set of genes, each of which takes a real value between the bounds a and b required for this specific parameter.
The fitness function that we use is Eq. (1). The individual nearest to the optimal solution vector is the one with the lowest value of this fitness function. Better individuals therefore have a better chance of producing children in new generations than inferior ones. Tournament selection, which depends strongly on fitness, is frequently used for selecting individuals in GAs. We define a number of tries $N'$; in each try, two individuals are chosen randomly from the population, compared, and the better one is kept to take part in the next step as a parent. The crossover method merges the genetic information of two individuals and passes it on to the children. If the coding is chosen adequately, two good parents produce a good child. We apply linear crossover. Given two selected parents $Pr_1$ and $Pr_2$, a chromosome $i$ and a random number $\alpha \in [0, 1]$, three children are generated and the best two are kept:

$$C_1^i = \alpha\, Pr_1^i + \alpha\, Pr_2^i; \quad C_2^i = \alpha\, Pr_1^i + (1-\alpha)\, Pr_2^i; \quad C_3^i = (1-\alpha)\, Pr_1^i + \alpha\, Pr_2^i \quad (2)$$
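The two operators can be sketched as follows. The children are built as written in Eq. (2), with a fresh α per chromosome; clipping each gene to its bounds is an added assumption, since the first child can otherwise leave the allowed range. The function names, the stand-in fitness function and the population size are hypothetical and serve only to make the sketch runnable.

```python
import random

def tournament_select(population, fitness, tries, rng):
    """Run `tries` two-way tournaments; each winner (lower fitness) becomes a parent."""
    parents = []
    for _ in range(tries):
        a, b = rng.sample(population, 2)
        parents.append(a if fitness(a) <= fitness(b) else b)
    return parents

def crossover(parent1, parent2, fitness, bounds, rng):
    """Build three children as written in Eq. (2) and keep the best two."""
    children = ([], [], [])
    for g1, g2, (low, high) in zip(parent1, parent2, bounds):
        alpha = rng.random()  # a fresh alpha for every chromosome
        candidates = (
            alpha * g1 + alpha * g2,
            alpha * g1 + (1 - alpha) * g2,
            (1 - alpha) * g1 + alpha * g2,
        )
        for child, value in zip(children, candidates):
            child.append(min(max(value, low), high))  # clipping is an assumption
    return sorted(children, key=fitness)[:2]

def sphere(individual):
    """Stand-in fitness for the demonstration; it is not Eq. (1)."""
    return sum(x * x for x in individual)

rng = random.Random(1)
bounds = [(0.0, 1.0)] * 3
population = [[rng.uniform(low, high) for low, high in bounds] for _ in range(20)]
parent_a, parent_b = tournament_select(population, sphere, tries=2, rng=rng)
kids = crossover(parent_a, parent_b, sphere, bounds, rng)
```

Drawing α independently for each chromosome is what lets crossover alone inject variation into the population, which matches the mutation-like behaviour discussed in the experiments.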
5 Experimental Results
Our 3D reconstruction system is implemented in Java. We want to obtain a fast 3D reconstruction from one or more uncalibrated views. Constraints can be extracted and added to the reconstruction in two ways. The first is automatic extraction, which suffers from weak constraint definitions, so the resulting model is unreliable. The second is based on adding constraints manually. The proposed system must therefore allow the user to select simple geometric elements. The user then incorporates a 3D graph. The system subsequently evaluates the intrinsic and extrinsic parameters of each sensor and recovers the 3D model. The experimental results are detailed below. We consider two important cases: constrained and non-constrained 3D models. Our method is also compared with the work in [1]. The user selects primitives and adds a 3D model; thereafter, the system starts the 3D reconstruction.
5.1 Population’s Optimization
For this test, we consider free points. We used the three uncalibrated images in Fig. 2b. For the optimization step, we randomly generate 100 individuals as the initial population. We define t as the number of GA iterations and stop the optimization process at t = 100, in order to show how the results behave from one generation to the next. In practice, the fitness function is compared with a given threshold to stop. First, we show the population's evolution from one generation to another. As shown in Fig. 3a, the first generation has the highest fitness value. The figure shows the improvement of the best individual in the population. A high fitness
Fig. 2 a Our 3D reconstruction system. b Three images used for 3D reconstruction and to test fitness evolution. c Image used for 3D reconstruction, with selected points. This image is taken from the database [16]
Fig. 3 a Genetic algorithm impact on fitness function. b Genetic algorithm impact on average and dispersion. c Focal length estimation (with/without constraints). d Transformation of fitness function. From the first generation up to the 100th one using Fig. 2b
value means that the individual is far from the solution. The fitness function becomes stable after the 40th generation. From one generation to another, the fitness decreases. Figure 3b shows the behaviour of the average fitness and of the population's dispersion. Surprisingly, the average and the dispersion do not decrease from one generation to another. How can this be explained? The evolution of the best individual in the population, shown in Fig. 3a, confirms that the algorithm is efficient, but the dispersion of the population is still not stable. Figure 3b shows that high dispersions coincide with a high average. The explanation we propose is the impact of our crossover operator. Using different values of α for each chromosome of the same individual gives a
Fig. 4 Fitness function’s value for generations (1, 5, 10 and 100) using GAs for each individual
large search space. It behaves like a mutation operator and causes some unusual individuals to appear from time to time. To show this, Fig. 4 plots the fitness values of all individuals, from the lowest to the highest, for generations 1, 5, 10 and 100. The population clearly continues to concentrate around low values, with some dispersion caused by the few unusual individuals. Figure 3c shows the behaviour of the focal length $f_x$ for the first camera, comparing the cases with and without constraints. Using constraints clearly gives the best results: the value of the focal length stabilizes from the 32nd generation. Without constraints, the estimate is still evolving until the 166th generation. In general, our method gives good results without any prior information. No calibration or self-calibration is needed, and no accurate initialization is required. With constraints, we obtain results for 200 generations in just 32 s. The behaviour of $f_x$ becomes stable after the 30th generation.
5.2 Comparisons
In this part, we use the image in Fig. 2c to compare our results with the earlier work [1]. Our current aim is also to resolve a convergence problem. The main contributions of this paper revolve around two concepts. The first (1) is to avoid the complexities and challenges caused by the use of bundle adjustment, including the problem of required initialization. In [1], vanishing points are needed to estimate the focal length. In addition, GAs are used there to improve the initialization of the 3D model, together with a simultaneous estimation of the camera position (using a position estimation algorithm). All these solutions give interesting results, but in our work we try to resolve these problems
Fig. 5 a Obtained 3D model based on Fig. 2c. b 3D model used in reconstruction
completely. We propose to use GAs to estimate all parameters and as the principal optimization method, so we do not use bundle adjustment. The present paper uses only GAs. We also avoid the complexities of differentiation (the Jacobian matrix), complex matrix decompositions, etc. Without accurate initialization or simplification, we can achieve 3D reconstruction with good results. The second concept (2) is that we can use different kinds of constraints. Bundle adjustment requires differentiability and continuity in order to calculate the Jacobian matrix. Our work removes this limitation. We offer a wide range of constraints for more general 3D objects. In addition, the use of vanishing points to extract constraints or estimate the focal length limits 3D reconstruction, because vanishing points cannot always be found or detected in our environments; this depends on the nature of the existing 3D objects. To make our results clear, we compare them with the strategy presented in [1]. Seven points are selected. We compare in terms of the fitness function per generation over 200 generations. Two cases are considered for the two methods, with and without constraints. The results in Fig. 3d show that our method is more efficient: the fitness value is lower. Applying constraints brings more precision. Another test uses a real image extracted from the database [16]. We apply the constraints described in Fig. 5b to the image in Fig. 2c. The resulting 3D model, shown in Fig. 5a, is very interesting: the 3D object closely respects the given constraints.
6 Conclusion
Our contribution is the use of GAs to obtain a 3D reconstruction from uncalibrated views. We compared the results obtained with classic approaches based on bundle adjustment optimization. To summarize, our method offers several advantages: it reduces the complexity of 3D reconstruction and makes it faster, avoids the need for accurate initialization, handles different kinds of 3D scenes, constraints and 3D objects, and guarantees reconstruction success and the quality of the 3D result. Thanks to these advantages, our method can be used in several applications: architecture, video games, visual simulation and robotics. The application to virtual and augmented reality remains the most important direction that interests us for future work.
References
1. Annich A, El Abderrahmani A, Satori K (2017) Fast and easy 3D reconstruction with the help of geometric constraints and genetic algorithms. 3D Res 8(3). https://doi.org/10.1007/s13319-017-0139-6
2. Andrew AM (2001) Multiple view geometry in computer vision, by Richard Hartley and Andrew Zisserman, Cambridge University Press, Cambridge 2000, xvi+607. ISBN 0–521–62304–9 (hardback, £60.00). Robotica 19(02). https://doi.org/10.1017/s0263574700223217
3. Chen C-S, Yu C-K, Hung Y-P (1999) New calibration-free approach for augmented reality based on parameterized cuboid structure. In: Proceedings of the seventh IEEE international conference on computer vision. https://doi.org/10.1109/iccv.1999.791194
4. Dick AR et al (2001) Combining single view recognition and multiple view stereo for architectural scenes. In: Proceedings eighth IEEE international conference on computer vision. ICCV 2001. https://doi.org/10.1109/iccv.2001.937528
5. Navab N, Appel M (2006) Canonical representation and multi-view geometry of cylinders. Int J Comput Vis 70(2):133–149. https://doi.org/10.1007/s11263-006-7935-4
6. Mahamud S, Hebert M (2000) Iterative projective reconstruction from multiple views. In: Proceedings IEEE conference on computer vision and pattern recognition. CVPR 2000 (Cat. No.PR00662). https://doi.org/10.1109/cvpr.2000.854872
7. Bartoli A (2003) Reconstruction et alignement en vision 3D: points, droites, plans, caméras. PhD thesis, Institut National Polytechnique de Grenoble, September 2003. Werner T, Zisserman A (2002) Model selection for automated reconstruction from multiple views. In: Proceedings of the British machine vision conference 2002. https://doi.org/10.5244/c.16.3
8. Werner T, Zisserman A (2002) New techniques for automated architectural reconstruction from photographs. Lect Notes Comput Sci 541–555. https://doi.org/10.1007/3-540-47967-8_36
9. Habbecke M, Kobbelt L (2012) Linear analysis of nonlinear constraints for interactive geometric modeling. Comput Graph Forum 31(2pt3):641–650. https://doi.org/10.1111/j.1467-8659.2012.03043
10. Vouzounaras G et al (2010) 3D reconstruction of indoor and outdoor building scenes from a single image. In: Proceedings of the 2010 ACM workshop on surreal media and virtual cloning, SMVC. https://doi.org/10.1145/1878083.1878100
11. Wilczkowiak M, Sturm P, Boyer E (2005) Using geometric constraints through parallelepipeds for calibration and 3D modeling. IEEE Trans Pattern Anal Mach Intell 27(2):194–207. https://doi.org/10.1109/tpami.2005.40
12. Zou C et al (2015) Sketch-based 3-D modeling for piecewise planar objects in single images. Comput Graph 46:130–137. https://doi.org/10.1016/j.cag.2014.09.031
13. Annich A, Abdellatif EA, Khalid S (2015) Enhancement of 3D reconstruction process in terms of beautification and efficiency using geometric constraints. Intell Syst Comput Vis (ISCV). https://doi.org/10.1109/isacv.2015.7106180
14. O’Neill M, Riccardo P, William BL, Nicholas FM (2008) A field guide to genetic programming. Genet Prog Evolvable Mach 10(2):229–230. https://doi.org/10.1007/s10710-008-9073-y
15. David P et al (2004) SoftPOSIT: simultaneous pose and correspondence determination. Int J Comput Vis 59(3):259–284. https://doi.org/10.1023/b:visi.0000025800.10423.1f
16. Denis P, Elder JH, Estrada F (2008) Efficient edge-based methods for estimating Manhattan frames in urban imagery. In: Proc. European conference on computer vision (5303), 197–211. http://www.elderlab.yorku.ca/YorkUrbanDB/
17. Grefenstette JJ (1986) Optimization of control parameters for genetic algorithms. IEEE Trans Syst Man Cybern 16(1):122–128
18. Feng C, Deng F, Kamat VR (2012) Rapid geometric modeling for visual simulation using semi-automated reconstruction from single image. Eng Comput 30(1):31–39. https://doi.org/10.1007/s00366-012-0283-9
19. Bevilacqua V et al (2017) Photogrammetric meshes and 3D points cloud reconstruction: a genetic algorithm optimization procedure. In: Rossi F, Piotto S, Concilio S (eds) Advances in artificial life, evolutionary computation, and systems chemistry. WIVACE 2016. Communications in computer and information science, vol 708. Springer, Cham. https://doi.org/10.1007/978-3-319-57711-1_6
20. Abidi MA, Gribok AV, Paik J (2016) Multimodal scene reconstruction using genetic algorithm-based optimization. In: Optimization techniques in computer vision. Advances in computer vision and pattern recognition. Springer, Cham
Metagenomic Analysis: A Pathway Toward Efficiency Using High-Performance Computing
Gustavo Henrique Cervi, Cecília Dias Flores, and Claudia Elizabeth Thompson
Abstract Clinical metagenomics is a technique for searching a sample of biological tissue or fluid for an infectious agent. Over the past few years the technique has been refined, while the volume of data has grown exponentially. DNA sequencers generate gigabytes of data, and these data must be matched against databases that exceed one terabyte. The molecular nature of DNA rules out traditional string search algorithms, since biologically analogous sequences are not always syntactically identical, which makes it difficult to find matching sequences. In clinical diagnosis, the sooner the result is available to the doctor, the higher the patient's chances of recovery. Data processing must therefore be as short as possible while maintaining the necessary detail and coverage. This paper describes some current techniques for computational processing and accelerated search of genomic data and explores possible alternative computational paths to streamline the process of metagenomic diagnosis.
Keywords Metagenomics · Diagnosis · Computing methodologies · High-throughput nucleotide sequencing
G. H. Cervi (B) · C. D. Flores · C. E. Thompson, Federal University of Health Sciences (UFCSPA), Porto Alegre, Brazil
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_49
1 Introduction
Metagenomics is the study of genetic material collected from environmental biological samples such as soil, water, and fluids. The samples are processed in the laboratory (the "wet" part) and sequenced by a machine, the genetic sequencer [1]. Sequencing yields a large amount of data, from a few gigabytes to a terabyte [2]. The "dry" analysis is performed in silico (in a computational environment), using specialized software that analyzes the data, including statistical data [3]. In recent years, metagenomic analysis has been applied as clinical metagenomics [4], where the samples are collected from human tissues and fluids with the main objective of finding infectious and parasitic biological agents. A special chapter of this history is the use of metagenomics as a tool for the diagnosis of infectious diseases [4–6]. While genomic analysis focuses on a single biological subject, metagenomics collects genetic information from all biological subjects present in the sample [7]. Because DNA exists at the molecular scale, the amount of genetic material in a single sample is too small to be sequenced directly. The genetic material obtained from samples must therefore be amplified through a technique called PCR (polymerase chain reaction), which uses thermal cycles that denature the DNA (separate its strands) so that it can be duplicated [4, 7]. In the next phase, a machine identifies the nucleotide sequences. In the early days of genetic sequencers this was a time-consuming job because the process ran serially [7]. Today, NGS (next-generation sequencing) [8] works in a massively parallel way, obtaining gigabytes of DNA data per run (chemical process cycle). Until the mid-2000s, when the first NGS sequencers appeared, it was hard to obtain data. Today, it is hard to analyze the amount of data obtained [7]. The field of metagenomics is extensive, and this paper discusses the computational steps, which are commonly performed using software "pipelines" [3]. In these pipelines, the raw data enters the first program (normally a filter) and passes through a sequence of other programs and scripts, each with a specific function (filtering, organizing, alignment, matching, and statistics) [7].
2 Pipeline
In general terms, a metagenomic pipeline can be divided into four steps: (i) filtering data, (ii) aligning with databases, (iii) filtering results, and (iv) statistical reports. Although the DNA alphabet is very small, containing only four letters, combinations of these letters are the "source code" of all living matter, which ranges from a simple bacterium with thousands of bases to a multi-billion-base animal genome [7]. The computational processing behind this amount of data is a challenging problem [9], especially when the researcher or physician is racing against time, for example while waiting for the diagnosis of a disease. From a computational perspective, the main time-consuming problem lies in the huge volume of data to be filtered, organized, and compared. Once the genetic sequencer finishes the sequencing process, all of that data must be processed. NGS technology produces a large number of "short reads," which are sequences of up to 300 nucleotide bases (adenine, guanine, thymine, or cytosine). They are represented as a string of data like "ACGGATATTCGATTG…". Comparing these sequences with a reference database yields a possible diagnosis by identifying the specific etiological agent (virus, bacterium, and/or fungus). These reference databases are commonly accessible from public sites like GenBank (NCBI, USA), EMBL Bank (EMBL-EBI, Europe), and DDBJ Bank (DDBJ, Japan), and easily exceed a terabyte of data.
Fig. 1 Chromatogram samples. Source: the authors, based on [13, 14]
2.1 Quality Control
The first stage after sequencing is "quality control." The sequencer produces a large amount of data, but not all of it has the same quality, so this step removes "low-quality" data [10, 11]. In summary, the first-generation and some second-generation sequencers collect information through a fluorescent agent bound to the nucleotide [8]: a laser with a very precise wavelength excites the agent, and the light emitted by the molecule is captured to infer the sequence. The resulting signal may be biased or ambiguous. The sequencer calculates the "quality" of the read based on the light intensity and writes the score in the result file, so each sequenced nucleotide has its own quality score [12]. Figure 1 shows two chromatograms: good-quality samples on the left and low-quality samples, with multiple peaks per base, on the right [11].
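The quality-control step can be illustrated with a minimal sketch. It assumes FASTQ records of four lines each with Phred+33 quality encoding, and it uses a mean-quality threshold of 20; the function names, the threshold, and the two example reads are illustrative assumptions, not part of any pipeline cited here.

```python
from statistics import mean


def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into per-base Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def filter_reads(records, min_mean_q=20):
    """Keep reads whose mean Phred score reaches min_mean_q (illustrative threshold)."""
    return [(h, s, q) for h, s, q in records if mean(phred_scores(q)) >= min_mean_q]


# Hypothetical reads: 'I' encodes Phred 40 (high quality), '#' encodes Phred 2 (low quality).
example = [
    "@read1", "ACGGATATTCGATTG", "+", "IIIIIIIIIIIIIII",
    "@read2", "ACGATCGATTCGGAA", "+", "###############",
]
kept = filter_reads(parse_fastq(example))
print([h for h, _, _ in kept])  # ['@read1']
```

Production tools usually also trim low-quality read ends instead of only discarding whole reads; the sketch shows just the per-base score decoding and a simple filter.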
2.2 Host Removal
This step is also important to speed up the analysis and reduce the risk of bias. Since the sample is obtained from a living host (human), host DNA is likely to be present in the data after sequencing. This stage searches the result file for host DNA by comparing the reads with the reference genome available in public databases. Once the host reads are identified, they must be removed. In the case of human samples, a reference human genome is used.
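As a rough illustration of host removal, the sketch below discards reads that share most of their k-mers with a host reference. This exact k-mer lookup is a simplification swapped in for the alignment against a reference genome described above; the function names, the value of k, the 0.5 threshold, and the toy sequences are all assumptions.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def remove_host_reads(reads, host_reference, k=8, max_host_fraction=0.5):
    """Drop reads whose fraction of k-mers found in the host exceeds max_host_fraction."""
    host_index = kmers(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            # Read shorter than k: it cannot be judged, so keep it.
            kept.append(read)
            continue
        shared = len(read_kmers & host_index) / len(read_kmers)
        if shared <= max_host_fraction:
            kept.append(read)
    return kept


# Toy data: the first read is a substring of the "host", the second is unrelated.
host = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
reads = ["TTGCATGTCGCATGA", "GGGTTTAAACCCGGG"]
print(remove_host_reads(reads, host))  # ['GGGTTTAAACCCGGG']
```

A real human reference is billions of bases long, so the index would need a compact data structure rather than a Python set; the sketch only shows the filtering logic.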
2.3 Searching Through Reference Databases
This step is the most computationally expensive task. The sequencer yields a huge number (>100 million) of short reads (up to 300 nucleotide bases in current NGS technology) written to a text file, commonly in the FASTQ format [12] (the FASTA format extended with quality information). Each read is represented by one DNA string like "ACGATCGATTCGGA(…)" and must be compared to reference datasets (terabytes of genomes from all sorts of living organisms, available in organized public databases). A first guess is that exact string-matching algorithms such as Knuth-Morris-Pratt, with O(m + n) complexity, or Boyer-Moore, with O(m) preprocessing and about n/m comparisons in the best case, could be applied to this problem. However, these algorithms cannot produce acceptable results from a biological viewpoint. To compare a new sequence to all sequences deposited in databases, it is necessary to perform sequence alignment. DNA is not a rigid and static sequence; it is subject to evolutionary forces such as mutation and selection. Considering the mutational aspect, DNA substitutions can be classified as (i) transitions, which involve bases of similar shape: interchanges of two-ring purines (A ↔ G) or of one-ring pyrimidines (C ↔ T); and (ii) transversions, which involve substitutions between one-ring and two-ring bases: interchanges of a purine for a pyrimidine and vice versa. Figure 2 indicates the possible transitions and transversions. When comparing two sequences to obtain an alignment, the main objective is to identify positional homology, i.e., sites with a common ancestry in the alignment. It may be necessary to include gaps (indels, corresponding to a deletion in one sequence or, equivalently, an insertion in the other) to better accommodate one sequence in relation to another. In this sense, the sequence "ACGATCGAT" may be biologically equivalent to the sequence "ACGCTCGGAT" (one substitution and one indel), i.e., they may be homologous. Homology is a biological concept indicating that two sequences share a common ancestry. Common algorithms used to align sequences in genomic research are Levenshtein distance [17], Smith-Waterman [18], Needleman-Wunsch [19], and the Burrows-Wheeler transform [20] combined with hashing and its derivatives. Blast, which uses a heuristic method based on the Smith-Waterman algorithm, is the most widely used software for local alignment. It identifies subject sequences in a database that are similar to a query sequence. Figure 3 shows a local alignment obtained by a Blast search, with the indication of mismatches (blue arrow and lack of | symbol), indels (green arrow and gaps), and matches (red arrow and | symbol).
Fig. 2 Transitions versus transversions in a DNA sequence. Source: the authors, based on [16]
Fig. 3 NCBI Blast output example. Source: the authors (sample on NCBI Blast)
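The two ideas in this subsection, substitution classes and tolerance for indels, can be sketched in a few lines. The code classifies a substitution as a transition or a transversion and computes the Levenshtein distance for the example pair from the text; the function names are illustrative, and unit costs for substitution and indel are an assumption.

```python
PURINES = {"A", "G"}      # two-ring bases
PYRIMIDINES = {"C", "T"}  # one-ring bases


def classify_substitution(a, b):
    """Classify a base change as 'match', 'transition', or 'transversion'."""
    if a == b:
        return "match"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def levenshtein(s, t):
    """Edit distance with unit cost for substitution, insertion, and deletion."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion from s
                            curr[j - 1] + 1,      # insertion into s
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


print(classify_substitution("A", "G"))         # transition
print(classify_substitution("A", "C"))         # transversion
print(levenshtein("ACGATCGAT", "ACGCTCGGAT"))  # 2 (one substitution, one indel)
```

An exact matcher would simply report that the two strings differ, whereas the distance of 2 quantifies how close they are, which is the information an aligner needs.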
2.4 Dynamic Programming
Dynamic programming algorithms used in bioinformatics (e.g., Levenshtein distance, Smith-Waterman, and Needleman-Wunsch) have complexity on the order of O(mn) in the worst case, although improvements are possible [21]. This makes alignment a very time-consuming task, but the work can be parallelized. It is not rare for a software solution to combine more than one algorithm. For example, in seed-and-extend aligners it is very common to use the Burrows-Wheeler transform to reduce the index size and hash tables to find the seeds. Dynamic programming applied to sequence alignment can be explained using a two-dimensional matrix in which two sequences are compared in three main steps: (i) matrix initialization, (ii) matrix fill (scoring), and (iii) traceback (alignment). Match, mismatch, and gap-penalty values are defined by a scoring scheme [22]. During the matrix fill, three possibilities are evaluated for each cell and the best one sets its value: (i) a diagonal move (match or mismatch), (ii) a gap in sequence y, and (iii) a gap in sequence x. The traceback step determines the actual alignment(s) that yield the maximum score. In Fig. 4, the maximum alignment score for the two sequences is 11 and the best alignment is shown in red.
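The three steps can be made concrete with a minimal Needleman-Wunsch sketch for global alignment. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration and are not the scheme behind Fig. 4, so the score printed here differs from the 11 reported in the figure.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment of x and y; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)

    # (i) Matrix initialization: first row and column hold cumulative gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Score for aligning x[i-1] with y[j-1].
        return match if x[i - 1] == y[j - 1] else mismatch

    # (ii) Matrix fill: each cell takes the best of diagonal, up, and left moves.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                              score[i - 1][j] + gap,            # gap in y
                              score[i][j - 1] + gap)            # gap in x

    # (iii) Traceback: walk from the bottom-right corner back to the origin.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score[n][m], "".join(reversed(ax)), "".join(reversed(ay))


best, row_x, row_y = needleman_wunsch("ACGATCGAT", "ACGCTCGGAT")
print(best)  # 5: eight matches, one mismatch, one gap
print(row_x)
print(row_y)
```

The fill step visits every cell once, which is where the O(mn) cost comes from. Several alignments can reach the same maximum score; this traceback returns only one of them.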
2.5 Accelerated Alternatives
Over the years, several alternative methods have been studied to accelerate the alignment and matching of biological sequences. Some of them use common hardware, such as a standard x86 home computer with GPU-equipped graphics cards, and others use specialized hardware. In summary, there are four main approaches: ASIC, ASIP, FPGA, and GPU.
Fig. 4 Dynamic programming applied to sequence alignment. Source: the authors. References: [2, 3]
The ASIC approach. The Application-Specific Integrated Circuit (ASIC) may be the lowest level of computational integration, where the circuit is designed to be as specific as possible [6]. In the 1990s, playing a single MP3 file required the full capacity of a common home computer to run the MP3 decoding algorithm. A second example is cryptocurrency: the first coins were mined with a simple home computer. In both cases, successful ASIC implementations later improved performance. Current examples of ASICs used in metagenomics are the Nanopore [24] for sequencing and DNASSWA [25] and [26] for alignment. An ASIC is the most computationally efficient way to run standardized algorithms. Its weak points are the cost per device and the time needed to develop the solution. The ASIC must be engineered by specialists and produced by a capable manufacturer at a high cost (sometimes more than a million US dollars). The cost varies, and an estimate can be requested once the project and its parameters are well defined [27]. This approach, while known for the best results, is not financially viable for academic purposes.
The ASIP approach. The General-Purpose Processor (GPP) is an architecture that can solve a vast range of problems [29], involving calculations, signal analysis, and even neural networks. This architecture is useful for most computations but may not be highly efficient for specific tasks. In 1997, Intel released the Pentium MMX processor containing 57 new instructions [29] focused on multimedia, bringing a new approach to the Intel x86 family. These SIMD (single instruction, multiple data) instructions increased performance, in some cases, by 23 times compared with the equivalent task using the standard instructions, as shown by [30]: a simple task that required 1,881 cycles was reduced to 81 cycles using the SIMD instructions. This was a convergence between two technologies: the GPP and the ASIP (application-specific instruction-set processor), also known as a co-processor. The ASIP approach has the flexibility of a GPP and the performance of an ASIC [31] but has the same financial issues as the ASIC. Projects that implement this solution in genomics use FPGAs to synthesize the ASIP [32].
The FPGA approach. The Field Programmable Gate Array (FPGA) is an integrated circuit that contains an array of programmable (configurable) logic blocks. These logic units (LU) can be configured as logic gates (AND, OR, XOR) and other elements [33]. The FPGA approach is the most viable and feasible for academic researchers because the required hardware is inexpensive and widely available. A simple research board can cost as little as 50 US dollars. Manufacturers like [34] and [35] provide libraries that help the developer explore the device's functionality. A simple FPGA implementation can achieve a 10× speedup over the original GATK pipeline [36]. Metagenomic projects that use this approach include [36–38], yielding results on the order of 81 times faster and 32 times more cost-efficient [38].
The GPU approach. In the late 1970s, early computer systems adopted auxiliary processors to handle the video signal and display elements, which led to what is known today as the GPU. The term GPU stands for graphics processing unit. Over time, these processors were optimized into high-performance, massively parallel processing units able to perform a large number of calculations in parallel [39]. The direct benefit was the advance in performance of computer display graphics (2D and 3D). As the technology advanced, however, other computations such as matrix calculations and vector mathematics took advantage of the GPU [40]. One of the optimized functions is the Smith-Waterman algorithm in CUDASW++ [41], which uses the CUDA technology [42]. As GPUs spread through general-purpose machines (home computers included), access to this technology became straightforward. The drawback of this approach is that real-world biological computations require high-end GPUs, which are not commonly available in general-purpose computers. Additionally, a single board can cost more than an array of standard computers. The use of GPUs in metagenomic analysis can yield performance about 30 times faster than the same algorithm in a software-only implementation [43].
Software optimizations. The most common approach to solving complex sequence alignments is software optimization [44–49], using a variety of algorithms that offer advantages in speed and precision. These optimizations include hash tables [44], heuristics [45], reference comparison [46], and other mixed techniques [45–49]. Because the General-Purpose Processor (GPP) is not optimized for this kind of problem, the computation time of a software-only solution without a massive cluster may be longer than the patient can wait. The most common example of software for sequence alignment is Blast (Basic Local Alignment Search Tool) [47], sponsored by the NCBI [36] and first released in 1990. Blast can be described, in simplified terms, as a search-and-match tool that receives a given string and tries to find matches in reference genomes, resulting in the identification of the biological agent. The most computationally demanding step is "seed-and-extend," where the tool takes short sequences called k-mers, finds their matches, and extends the matching nucleotides. This method may speed up the process 50 times in comparison with the string-distance-based Smith-Waterman algorithm [50]. Over the years, solutions were developed based on this approach. Some of them use hardware acceleration and present a performance gain of 20 times in comparison with PC-based Blast [51]; others use GPU acceleration, improving performance up to 10 times in comparison with standard PC software [52, 53]. HSBLAST [54] claims to give the same results as MEGABLAST (NCBI) with up to 20 times the performance, using the Burrows-Wheeler Transform (BWT), the same algorithm used in software such as BWA [55] and Bowtie2 [56]. Table 1 lists some technologies and their implementations.
Table 1 Technologies and their implementations. Source: the authors
Name | Application | Speed | Feasibility
ASIC | DNASSWA [25], SWASAD [57], Darwin [28] | High | Hard
ASIP | SIMD [58], GMAP [59], SSW [60] | Medium | High
FPGA | INTEL [34], Falcon [35], Survey [61] | High | Medium
GPU | Nvidia [40, 62], CUDASW [41] | High | Medium
Software | Kraken [44], Diamond [47], Kaiju [46], HSBlast [54] | Varies | High
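The seed-and-extend idea described above can be sketched as follows. This is a simplified illustration and not Blast's actual implementation: seeds are exact k-mer matches found through a hash table, and extension is ungapped and stops at the first mismatch, whereas real tools score the extension and tolerate some mismatches. The function names, k = 4, and the toy sequences are assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash table mapping each k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def extend_seed(query, reference, q_pos, r_pos, k):
    """Grow an exact k-mer match left and right while the bases keep matching."""
    q_start, r_start = q_pos, r_pos
    while q_start > 0 and r_start > 0 and query[q_start - 1] == reference[r_start - 1]:
        q_start -= 1
        r_start -= 1
    q_end, r_end = q_pos + k, r_pos + k
    while q_end < len(query) and r_end < len(reference) and query[q_end] == reference[r_end]:
        q_end += 1
        r_end += 1
    return q_start, r_start, q_end - q_start  # (query start, reference start, length)


def seed_and_extend(query, reference, k=4):
    """Return distinct extended hits, longest first."""
    index = build_kmer_index(reference, k)
    hits = set()
    for q_pos in range(len(query) - k + 1):
        for r_pos in index.get(query[q_pos:q_pos + k], []):
            hits.add(extend_seed(query, reference, q_pos, r_pos, k))
    return sorted(hits, key=lambda h: (-h[2], h[0], h[1]))


reference = "TTGACGATCGATTCGGACC"
query = "ACGATCGATTCGGA"
hits = seed_and_extend(query, reference)
print(hits[0])  # (0, 3, 14): the whole query matches the reference from position 3
```

The speed advantage comes from the index lookup: only positions that share a seed are examined, so most of the reference is never compared base by base.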
3 Research
In recent years, clinical metagenomics has jumped from ~70 publications on PubMed in 2010 to ~540 publications in 2019, probably because of advances in computational methods and the development of new sequencing technologies. While mainstream manufacturers are placing genetic sequencers in biotech laboratories, some companies are developing pocket sequencers [63] that cost ~US$ 4,500 and produce up to 30 Gb of data. Soon, this may allow the self-diagnosis of some diseases with reasonable confidence. Some researchers are working on wearable devices like watches, rings, earrings, and glasses. These will produce a massive amount of data to process, making the analysis even more challenging. Traditional algorithms and commonly used hardware, as in the cryptocurrency case, may not be enough for the job. New technologies like fuzzy pattern matching [64], signal analysis [65], and even quantum computing [66] may be game changers that bring metagenomics research to the masses.
4 Conclusion
Over the years, researchers have evolved their techniques to speed up the analysis process and produce results earlier. Computer technology is also evolving, with each new processor generation increasing speed and capacity. However, the genomic databases are also growing exponentially. Consequently, a faster solution is needed, one able to handle large volumes of data comparisons, so that clinical metagenomics can serve as an important weapon against infections that are difficult to diagnose and treat.
Acknowledgements This work is funded by grant number 440084/2020-2 from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, Brazilian Ministry of Science and Technology) and Amazon Web Service (AWS, Cloud Credits for Research).
Conflict of Interest The authors declare that there are no conflicts of interest.
References
1. Chen K, Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLOS Comput Biol 1:e24. Springer, Heidelberg
2. Metagenomics versus Moore's law (2019) Nat Methods 6:623–623
3. Kakirde KS, Parsley LC, Liles MR (2010) Size does matter: application-driven approaches for soil metagenomics. Soil Biol Biochem 42:1911–1923
4. Chiu CY, Miller SA (2019) Clinical metagenomics. Nat Rev Genet
5. Dekker JP (2018) Metagenomics for clinical infectious disease diagnostics steps closer to reality. J Clin Microbiol 56. https://doi.org/10.1128/JCM.00850-18
6. Pallen MJ (2014) Diagnostic metagenomics: potential applications to bacterial, viral and parasitic infections. Parasitology 141:1856–1862
7. Compeau P (2015) Bioinformatics algorithms, vol 1. Active Learning, La Jolla, CA
8. Benefits of SBS technology. https://www.illumina.com/science/technology/next-generation-sequencing/sequencing-technology/sbs-benefits.html. Accessed 26 Oct 2020
9. National Research Council (US) Committee on Metagenomics: Challenges and Functional Applications (2007) The new science of metagenomics: revealing the secrets of our microbial planet. National Academies Press (US), Washington, DC. PMID: 21678629
10. Cook DA, Hatala R, Brydges R et al (2011) Technology-enhanced simulation for health professions education: a systematic review and meta-analysis. JAMA 306:978–988
11. Sequencing quality scores. https://www.illumina.com/science/technology/next-generation-sequencing/plan-experiments/quality-scores.html. Accessed 26 Oct 2020
12. FASTQ. https://support.illumina.com/bulletins/2016/04/fastq-files-explained.html. Accessed 26 Oct 2020
13. Troubleshooting your data. https://www.roswellpark.org/shared-resources/genomics/services-and-fees/sanger-sequencing/troubleshooting-your-data. Accessed 26 Oct 2020
14. Interpretation of sequencing chromatograms. https://brcf.medicine.umich.edu/cores/advanced-genomics/faqs/sanger-sequencing-faqs/interpretation-of-sequencing-chromatograms/. Accessed 26 Oct 2020
15. Porta A (2012) Determining annealing temperatures for polymerase chain reaction
16. Shewaramani S (2015) Effects of aerobic and anaerobic environments on bacterial mutation rates and mutation spectra assessed by whole genome analyses. Thesis, Massey University, Palmerston North, New Zealand
17. Levenshtein VI (1966) Binary codes capable of correcting deletions, insertions and reversals. Sov Phys Dokl 10:707
18. Smith TF, Waterman MS (1981) Identification of common molecular subsequences. J Mol Biol 147:195–197. https://doi.org/10.1016/0022-2836(81)90087-5
19. Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453
20. Burrows M, Wheeler DJ (1994) A block-sorting lossless data compression algorithm. Digital Systems Research Center
21. Berghel H, Roach D (2020) An extension of Ukkonen's enhanced dynamic programming ASM algorithm. http://berghel.net/publications/asm/asm.php. Accessed 26 Oct 2020
22. Carroll H, Clement M, Ridge P, Snell Q (2006) Effects of gap open and gap extension penalties
23. Eddy SR (2004) What is dynamic programming? Nat Biotechnol 22:909–910
24. Oxf. Nanopore Technol. http://nanoporetech.com/how-it-works. Accessed 26 Oct 2020
25. DNASSWA. https://espace.library.uq.edu.au/view/UQ:295057. Accessed 26 Oct 2020
26. Halim AK, Majid ZA, Mansor MA et al (2010) Design and analysis of 8-bit Smith Waterman based DNA sequence alignment accelerator's core on ASIC design flow
27. PeopleVine S via ASICs. https://www.sigenics.com/page/asics-c. Accessed 26 Oct 2020
28. Turakhia Y, Zheng KJ, Bejerano G, Dally WJ (2017) Darwin: a hardware-acceleration framework for genomic sequence alignment
29. Saltzer JH, Kaashoek MF (2009) Principles of computer system design
30. Conte G, Tommesani S, Zanichelli F (2000) The long and winding road to high-performance image processing with MMX/SSE
31. Shahabuddin S, Janhunen J, Juntti M et al (2014) Design of a transport triggered vector processor for turbo decoding. Analog Integr Circuits Signal Process
32. Vacek G (2011) Hybrid-core computing for high-throughput bioinformatics. J Biomol
33. FPGA architecture for the challenge. https://www.eecg.utoronto.ca/~vaughn/challenge/fpga_arch.html. Accessed 26 Oct 2020
34. FPGA genomics. https://www.intel.com/content/www/br/pt/healthcare-it/products/programmable/applications/life-science.html. Accessed 26 Oct 2020
35. Falcon accelerated genomics pipelines. In: Xilinx. https://www.xilinx.com/products/acceleration-solutions/1-zzroc0.html. Accessed 26 Oct 2020
36. Mahram A, Herbordt MC (2012) FMSA: FPGA-accelerated ClustalW-based multiple sequence alignment through pipelined prefiltering
37. Jacob A, Lancaster J et al (2007) FPGA-accelerated seed generation in Mercury BLASTP
38. Wu L et al (2019) FPGA accelerated INDEL realignment in the cloud
39. GPU history: Hitachi ARTC HD63484. https://www.computer.org/publications/tech-news/chasing-pixels/gpu-history-hitachi-artc-hd63484/. Accessed 26 Oct 2020
40. nVidia CUDA Bioinformatics: BarraCUDA. In: BioCentric. https://www.biocentric.nl/biocentric/nvidia-cuda-bioinformatics-barracuda/. Accessed 26 Oct 2020
41. Liu Y, Wirawan A, Schmidt B (2013) CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions. BMC Bioinform
42. NVIDIA. https://www.nvidia.com/en-us/high-performance-computing/. Accessed Oct 2020
43. Kobus R, Hundt C, Müller A, Schmidt B (2017) Accelerating metagenomic read classification on CUDA-enabled GPUs. BMC Bioinform 18:11
44. Wood DE, Lu J, Langmead B (2019) Improved metagenomic analysis with Kraken 2
45. BLAST. https://blast.ncbi.nlm.nih.gov/Blast.cgi. Accessed 26 Oct 2020
46. Menzel P, Ng KL, Krogh A (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun 7:11257. https://doi.org/10.1038/ncomms11257
47. Buchfink B, Xie C, Huson DH (2015) Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60. https://doi.org/10.1038/nmeth.3176
48. Bağcı C, Beier S, Górska A, Huson DH (2019) Introduction to the analysis of environmental sequences: metagenomics with MEGAN. Springer, New York, NY, pp 591–604
49. Steinegger M, Söding J (2017) MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol
50. Oehmen C, Nieplocha J (2006) ScalaBLAST: a scalable implementation of BLAST for high-performance data-intensive bioinformatics analysis. IEEE Trans Parallel Distrib Syst 17(8):740–749. https://doi.org/10.1109/TPDS.2006.112
51. Herbordt MC, Model J, Sukhwani B et al (2007) Single pass streaming BLAST on FPGAs. Parallel Comput 33:741–756. https://doi.org/10.1016/j.parco.2007.09.003
52. Vouzis PD, Sahinidis NV (2011) GPU-BLAST: using graphics processors to accelerate protein sequence alignment. Bioinform
53. Liu W, Schmidt B, Muller-Wittig W (2011) CUDA-BLASTP: accelerating BLASTP on CUDA-enabled graphics hardware. IEEE/ACM Trans Comput Biol Bioinform
54. Chen Y, Ye W, Zhang Y, Xu Y (2015) High speed BLASTN: an accelerated MegaBLAST search tool. Nucleic Acids Res 43:7762–7768. https://doi.org/10.1093/nar/gkv784
55. Fast and accurate short read alignment with Burrows–Wheeler transform. https://academic.oup.com/bioinformatics/article/25/14/1754/225615. Accessed 26 Oct 2020
56. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2
57. Han T, Parameswaran S (2002) SWASAD: an ASIC design for high speed DNA sequence matching. IEEE Computer Society, USA 541
58. Jacob A, Paprzycki M, Ganzha M, Sanyal S (2008) Applying SIMD approach to whole genome comparison on commodity hardware. Parallel processing and applied mathematics. Springer, Berlin, Heidelberg, pp 1220–1229
59. (2016) GMAP and GSNAP for genomic sequence alignment: enhancements to speed accuracy and functionality. https://doi.org/10.1007/978-1-4939-3578-9_15
60. Zhao M, Lee W-P, Garrison EP, Marth GT (2013) SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLOS ONE
61. Salamat S, Rosing T (2020) FPGA acceleration of sequence alignment: a survey. arXiv:2002.02394 [cs, q-bio]
62. NVIDIA Clara. https://developer.nvidia.com/clara-parabricks. Accessed 27 Oct 2020
63. MinION. http://nanoporetech.com/products/minion. Accessed 27 Oct 2020
64. Mishra P, Bhoi N (2020) Genomic signal processing of microarrays for cancer gene expression and identification using cluster-fuzzy adaptive networking. Soft Comput
65. Quaid MAK, Jalal A (2020) Wearable sensors based human behavioral pattern recognition using statistical features and reweighted genetic algorithm. Multimed Tools Appl
66. Chattopadhyay A, Menon V (2020) Fast simulation of Grover's quantum search on classical computer. Quant-Ph
A Machine Learning Approach to CCPI-Based Inflation Prediction
R. Maldeni and M. A. Mascrenghe
Abstract Inflation is one of the critical parameters that indicate a country's economic position. Maintaining it at a stable level is therefore one of the objectives of any country's financial regulator. In Sri Lanka, the Central Bank of Sri Lanka (CBSL) is mandated to formulate policies to achieve desired inflation targets as reflected in price indices, mainly the Colombo Consumer Price Index (CCPI). The effectiveness of these policies depends on the accuracy of the projections obtained from CCPI models. Hence, regulators continuously attempt to develop models that are more accurate, flexible, and stable in their predictions. At present, economic data modeling has taken a new turn around the globe with the introduction of Machine Learning (ML) algorithms. The ML approach, although promising, has not yet been extensively explored in the context of the Sri Lankan economy. This study addresses that gap by constructing six different types of tuned ML models and comparing them to arrive at the best model for CCPI-based inflation prediction in Sri Lanka. It also presents a rationally selected combination of predictor variables, specialized for the Sri Lankan economic environment. The results indicate that support vector regression is the best model in terms of prediction power for this objective. The study also recommends it as a model that is highly flexible and robust to future modifications.
Keywords Inflation · CCPI · Machine learning
R. Maldeni (B), Robert Gordon University, Aberdeen, Scotland
M. A. Mascrenghe, Informatics Institute of Technology, Colombo, Sri Lanka
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_50
1 Introduction Inflation is the phenomenon of a continuous increase in the prices of commodities and services. An increase in prices reduces the purchasing power of the general public because the domestic currency loses value; it thereby dampens the growth of the economy by hindering the development of the country’s financial sector [9]. To ensure inflation is kept within limits and is less volatile, monetary policies are regulated by the central financial regulator. Accordingly, one of the main objectives of the Central Bank of Sri Lanka (CBSL) is to maintain price stability, which directly relates to inflation control. Accurate long-term inflation prediction is critical to achieving this objective, as it enables policymakers to take more informed, proactive regulatory measures. Inflation prediction research in Sri Lanka lacks a Machine Learning (ML) perspective and verification against traditional models, even though ML, specifically the Artificial Neural Network (ANN), has shown promising potential in the global context [5]. Therefore, this research explores ML-based modeling for inflation prediction, which is claimed to compete with traditional models on prediction accuracy.
2 Methodology 2.1 Determinants of Inflation This research identified determinants based on previous literature [7] and through an extensive study of the Sri Lankan economic background. The variables chosen were Broad Money Supply (M2b), Narrow Money Supply (M1), Exchange Rate, Credit to the Private Sector, Average Weighted Deposit Rate (AWDR), Average Weighted Lending Rate (AWLR), Average Weighted Fixed Deposit Rate (AWFDR), Rice Price, and the Month. The outcome variable was the Colombo Consumer Price Index (CCPI). All variables were numeric and continuous and were used to predict the headline inflation.
2.2 Research Methodology Overview The data used for this study are monthly records for the period from January 2014 to January 2020. The data-gathering stage involved collecting data from publicly available national data libraries. The study processed the data and split it in a 4:1 ratio for training and testing, respectively, followed by normalizing and scaling the data when required by specific ML models. The model building involved choosing six ML algorithms and fitting them to the processed data. The study then tuned these models using parameter adjustments and accuracy improvement techniques, while iteratively validating them with accuracy measures. Finally, the model evaluation stage analyzed the accuracies and characteristics of the individual models and objectively selected the best model.
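The 4:1 split and the scaling step described above can be sketched as follows. This is a minimal illustration rather than the study's own code: the paper does not say whether the split was chronological or random, so a chronological split is assumed here, min-max scaling is assumed as the normalization, and the function names and placeholder rows are hypothetical.

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows 4:1 without shuffling (assumed, not stated in the paper)."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_min_max(train_rows):
    """Learn per-column (min, max) bounds from the training rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]


def apply_min_max(rows, bounds):
    """Scale each column with the training bounds; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


# Placeholder data: 73 monthly rows (January 2014 to January 2020, both
# months included) with two made-up predictor columns.
rows = [[float(i), float(2 * i)] for i in range(73)]
train_rows, test_rows = chronological_split(rows)
bounds = fit_min_max(train_rows)
scaled_train = apply_min_max(train_rows, bounds)
scaled_test = apply_min_max(test_rows, bounds)
```

Fitting the bounds on the training rows alone keeps information from the test period out of the preprocessing, which is why scaled test values can fall outside the [0, 1] range.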
2.3 Exploration of Existing Machine Learning Models In recent years, researchers have increasingly explored machine learning models as an alternative to traditional time series models for inflation prediction, seeking more accurate results in different contexts. Ali Choudhary et al. [3] conducted a comparative study of 28 countries comparing ANN and Auto-Regressive (AR) models and concluded that ANN models are more capable than AR models in more of the inflation prediction instances. Hurtado et al. [6] explored several ANN models for inflation with different numbers of hidden layers and hidden neurons. This study showed that ANN models produce more accurate forecasts than the statistical models used by the Bank of Mexico. Similar research was carried out on inflation data from countries such as Germany [3], Pakistan [2], and Ghana [10]. As is evident from the above, most inflation studies focus on a single machine learning model, specifically ANNs, making studies that use hybrid models comparatively rare. Enke and Mehdiyev [4] explored a hybrid model using fuzzy and neuro concepts to predict US Consumer Price Index data. However, fuzzy models are less common in predictive systems due to the lack of expertise in defining good rules [8]. Even though the fuzzy-neuron hybrid model outperformed statistical models and the individual ANN model, it lacked robustness and demanded changes in fuzzy rules that made the model highly dependent on expert judgment, and hence less practical. The existing research shows that ANN is widely used. The Random Forest (RF) model, although minimally explored, is another candidate for similar problems in the literature. Behrens et al. [1] explored this model on German inflation data and supported its efficiency in long-term inflation prediction.
2.4 Model Design and Implementation The study used six different types of supervised ML algorithms to build six models with competing prediction accuracies, as explained below. Random Forest (RF). The study initially fitted a regression-type RF model with 500 decision trees, using three features for each split. The initial RF model was then tuned to improve its power of prediction. The out-of-bag error rate was plotted against the number of features to determine how many features should be considered at each split. Since the plot reached its lowest point of error when six features were chosen for splits, the study took six as the optimum value. Similarly, the study also obtained the optimum number of decision trees. Using these tuning parameters, the study once again fitted the model to the training data for better accuracies. As per the variable importance plot (see Fig. 1), based on the percentage increase of MSE and node purity, the M2b, Credit to the Private Sector, and Exchange Rate variables tend to be much more critical to the model than the other six predictor variables.
Fig. 1 Variable importance
Artificial Neural Network (ANN). The study used the Resilient Backpropagation (Rprop) learning algorithm, as it is an improved version of the regular backpropagation algorithm. Its implementation is, however, more complex than that of the regular one. Rprop is an iterative algorithm that uses the gradient method to calculate weights and biases. These iterations are named epochs, and each epoch contributes to changes in the weights and biases of the nodes. The main difference between the two is that regular backpropagation uses the learning rate and the gradient value to decide on the weight changes of the nodes, whereas Rprop uses only the sign of the gradient. Rather than using the same learning rate to obtain the weight delta for all weights, Rprop adapts each weight delta individually and improves it through the training process. The initial graph of the ANN obtained from the above model is shown in Fig. 2.
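The sign-based update that distinguishes Rprop from regular backpropagation can be sketched on a toy problem. The following is a minimal illustration of one common variant (without weight backtracking), not the network training used in the study; the increase and decrease factors of 1.2 and 0.5, the initial step of 0.1, and the toy quadratic objective are assumptions chosen for the example.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_minimize(grad, weights, steps=200, delta0=0.1,
                   eta_plus=1.2, eta_minus=0.5,
                   delta_min=1e-6, delta_max=50.0):
    """Minimize a function given its gradient, using only gradient signs."""
    w = list(weights)
    deltas = [delta0] * len(w)   # one adaptive step size per weight
    prev = [0.0] * len(w)        # gradient from the previous epoch
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            gi = g[i]
            change = gi * prev[i]
            if change > 0:
                # Same sign as before: grow this weight's step size.
                deltas[i] = min(deltas[i] * eta_plus, delta_max)
            elif change < 0:
                # Sign flipped (a minimum was overshot): shrink the step
                # and skip the update for this epoch.
                deltas[i] = max(deltas[i] * eta_minus, delta_min)
                gi = 0.0
            w[i] -= sign(gi) * deltas[i]
            prev[i] = gi
    return w


def toy_gradient(w):
    """Gradient of the toy objective (w0 - 3)^2 + 10 * (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


solution = rprop_minimize(toy_gradient, [0.0, 0.0])
```

Because only the sign of each partial derivative is used, the tenfold difference in curvature between the two toy dimensions does not affect the step sizes, which is the property the paragraph above attributes to Rprop.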
Fig. 2 Initial ANN architecture
The leftmost layer is the input layer with nine variables, followed by a hidden layer with two neurons and a second hidden layer with one neuron. The numeric values on the black arrows connecting the nodes depict the weights obtained by the Rprop algorithm, and those on the blue lines depict the biases used. In simpler terms, a weight determines how much of a contribution a particular input node passes on to the next node. The bias shifts the input to the activation function of each node in the middle layers. The study tuned this ANN model by changing the number of hidden neurons and layers to arrive at the best-tuned model in terms of accuracy. Extreme Gradient Boost (XGBoost). The two main parameters required to fit an XGBoost model are the objective parameter and the booster parameter. Since the problem at hand is a regression one, the “reg:linear” objective function was used. Out of the gbtree, dart, and gblinear boosters, the model performed best with the tree-based dart booster. This booster was chosen after optimizing the results with 100 iterations of RMSE improvement. Of the many sub-parameters used in XGBoost, Gamma is one critical parameter that requires appropriate handling, as it decides the minimum error reduction
required for a node split. However, setting this value too high at the outset would over-regularize the model and result in an underfit. Hence, Gamma was kept at zero initially. Once the error of the training set started to differ substantially from that of the testing set, the Gamma value was altered accordingly. Similarly, optimum values of the learning rate and maximum depth parameters were determined with special consideration to avoid overfitting. Model tuning for XGBoost took a trial-and-error approach, varying such parameter value combinations using iterative RMSE optimization until the error of the model could not be improved further. Support Vector Regression (SVR). To get an overall idea of the SVR model parameters, the study first fitted an eps-regression model. As the problem at hand is a non-linear one, the study used the Radial Basis Function (RBF) kernel. In SVR, epsilon sets the width of the tube within which errors are not penalized and is therefore a primary measure of the accuracy of the approximation function. Generally, an epsilon closer to 0 gives the optimum balance between overfitting and optimal fitting of the data. Model tuning for SVR refers to finding the epsilon and cost combination that gives the lowest error measure. The study ran the model iteratively with different combinations of epsilon and cost and recorded the error. The flexibility that the SVR model offers to avoid overfitting through the cost parameter is critical for a highly accurate model. This type of model tuning, which fits various combinations of parameters to obtain a low error value, is referred to as sensitivity analysis in ML modeling. To arrive at the best model, the SVR was run with 1100 different parameter combinations: each combination of epsilon 0, 0.1, …, 1 (11 values) and cost 1, 2, 3, …, 100 (100 values) was used, as depicted by the code fragment below: OptSVM […] K-Nearest Neighbors (KNN). […] When K > 1, the predicted response would be the mean response of the nearest neighbors. Therefore, the value of K is the parameter that decides the prediction power of the model.
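The KNN prediction rule stated above, together with the search for the K that gives the lowest MSE, can be sketched as follows. This is a minimal illustration on made-up one-dimensional data, not the study's implementation; Euclidean distance, the function names, and the toy values are assumptions.

```python
import math


def knn_predict(train_X, train_y, query, k):
    """Predict the mean response of the k nearest training points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def mse_for_k(train_X, train_y, test_X, test_y, k):
    """Mean squared error on the test points for a given k."""
    errors = [(knn_predict(train_X, train_y, x, k) - y) ** 2
              for x, y in zip(test_X, test_y)]
    return sum(errors) / len(errors)


def best_k(train_X, train_y, test_X, test_y, candidates):
    """Return the candidate k with the lowest MSE and all recorded scores."""
    scores = {k: mse_for_k(train_X, train_y, test_X, test_y, k)
              for k in candidates}
    return min(scores, key=scores.get), scores


# Toy data (hypothetical): four training points and two test points.
train_X = [[0.0], [1.0], [2.0], [3.0]]
train_y = [0.0, 1.0, 2.0, 3.0]
test_X = [[1.4], [2.6]]
test_y = [1.5, 2.5]

chosen_k, scores = best_k(train_X, train_y, test_X, test_y, range(1, 4))
```

In the study the candidate range was K = 1 to 12; the toy range is shorter only because the example has four training points.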
Finding the optimum K that gives the lowest MSE is the challenge in KNN modeling. The study tried K values ranging from 1 to 12. For each model, the MSE value was obtained as the selection criterion for the optimum K. The results are shown in Fig. 3. The figure shows that the minimal MSE results when K = 2. Hence, the study chose the K = 2 model as the best-tuned model for the data.
Fig. 3 KNN optimal K graph
Linear Regression (LR). Initially, the study obtained p-values and coefficients by fitting the regression model with all predictor variables. The p-values of this initial model were used to detect any insignificant variables at this stage. If the p-value Pr(>|t|) of a coefficient is smaller than 0.05, the variable is significant for the model. The model tuning stage involved removing each of the highly insignificant variables to arrive at the model with the best accuracy measures.
Fig. 4 Tuning of LR model
From the above output (see Fig. 4), it is evident that the AWDR, AWFDR, and Month variables have high Pr(>|t|) values; they were therefore removed from the initial model before the regression process was run again. The study applied this process iteratively to arrive at the final model with all significant variables and the lowest error.
3 Evaluations and Discussion As per the RMSE, MSE, MAE, and MAPE accuracy measures, the best three models were SVR, RF, and KNN, in that order. When selecting the best model for inflation prediction, it is equally important to consider the flexibility and resistance of the model along with its prediction power. The SVR model, when considered based on its algorithm, is highly flexible in comparison to the others, since it lets the modeler handle the cost of regression and the error limit, and does not depend on assumptions about any prior probability distribution. Since it does not draw inferences from a prior probability distribution, the model is also relatively easy to understand. In addition, since SVR allows for regularization and generalization, it is not overly dependent on the model building dataset and therefore avoids potential overfitting. Hence, it works fairly well even with small datasets such as the one used in this study, making it a more resistant and stable model. Therefore, considering all of the above factors with regard to model accuracy, flexibility, understandability, and resistance, the SVR model is chosen as the best solution for the problem at hand.
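The four accuracy measures named above follow their standard definitions and can be sketched as below. The actual and predicted values are made up for illustration and are not results from the study; MAPE is expressed as a percentage, which is an assumption about the convention used.

```python
import math


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical index values and predictions.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
```

Ranking several models on all four measures, as the study does, guards against a model that looks good on one measure only because of how that measure weights large errors.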
4 Conclusion This study explored how Sri Lanka’s CCPI-based inflation prediction can be improved through a comparative ML approach. Accordingly, it implemented six supervised machine learning models based on the RF, ANN, XGBoost, SVR, KNN, and LR algorithms. The most powerful prediction models as per the study’s accuracy measures were SVR, RF, and KNN, in that order. Even though ANN and RF are popular models in the literature for solving similar problems, this study notes that the size of the dataset would have had an impact on the predictions given by each algorithm. SVR is an algorithm that works better than ANN and RF with smaller datasets, achieving a high level of regularization and generalization from fewer observations. Therefore, this particular dataset performed best with the SVR model. In the Sri Lankan context, this can be considered a starting point, and both SVR and RF models can be studied extensively to build more robust hybrid models in the future. In addition to the best accuracies, the SVR model also showed high flexibility and understandability, along with better resistance in comparison to the other models. Therefore, this research concludes that the SVR model is the best individual ML model to predict Sri Lanka’s CCPI-based inflation, subject to the data and predictors that were chosen for the study.
References
1. Behrens C, Pierdzioch C, Risse M (2018) Testing the optimality of inflation forecasts under flexible loss with random forests. Econ Model 72:03
2. Bukhari S, Muhammad Hanif (2007) Inflation forecasting using artificial neural networks. 8
3. Ali Choudhary M, Adnan Haider (2008) Neural network models for inflation forecasting: an appraisal. Appl Econ 44(12)
4. Enke D, Mehdiyev N (2014) A hybrid neuro-fuzzy model to forecast inflation. Procedia Comput Sci 36(12)
5. Šestanović T (2019) Jordan neural network for inflation forecasting. Croat Oper Res Rev 10(7):23–33
6. Hurtado C, Luís J, Cortes Fregoso, Hector J (2013) Forecasting Mexican inflation using neural networks. In: CONIELECOMP 2013, 23rd international conference on electronics, communications and computing, pp 32–35
7. Hasan Kiaee (2018) Determinants of inflation in selected countries. 07
8. Jyh-Shing Roger Jang, Chuen-Tsai Sun, Eiji Mizutani (1997) Neuro-fuzzy and soft computing
9. Ruzima M (2016) Impact of inflation on economic growth: a survey of literature review. 5(4)
10. Hadrat Yusif, Eric Effah Sarkodie (2015) Inflation forecasting in the Ghana–artificial neural network model approach. Int J Econ Manag Sci 4(1)
On Profiling Space Reduction Efficiency in Vector Space Modeling-Based Natural Language Processing Alaidine Ben Ayed, Ismaïl Biskri, and Jean-Guy Meunier
Abstract Space reduction is widely used in natural language processing tasks. In this paper, we use automatic text summarization as a use case to compare the quality of summaries generated automatically by two space reduction-based variants of the same summarization protocol. The obtained results show that non-linear space reduction-based summarization approaches outperform linear space reduction-based ones. The salient outcome of this research is that it explains the obtained results based on a rigorous study of the sparsity of the generated spaces. Keywords Space reduction · Vector space modeling · Natural language processing
1 Introduction 1.1 Space Reduction Space reduction techniques have been used in many research and industrial domains. The authors of [1] explain Big Data complexity and the need for dimensionality reduction [2–4]. Space reduction techniques have also been used in image processing applications such as biomedical research, industry [5], and face tracking [6]. Dimensionality reduction approaches have likewise been used in speech recognition [7] and in natural language processing (NLP) sub-tasks such as automatic text summarization [8, 9] and automatic translation [6]. A. Ben Ayed (B) · J.-G. Meunier Université du Québec à Montréal, 405 Rue Sainte-Catherine Est, Montréal, QC H2L 2C4, Canada I. Biskri Université du Québec à Trois-Rivières, 3351 Boulevard des Forges, Trois-Rivières, QC G8Z 4M3, Canada
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_51
1.2 Vector Space Modeling and Natural Language Processing A vector space model is an algebraic technique for expressing documents as vectors of index terms. Generally, a document is represented by a feature vector whose coordinates correspond to different terms. A term can typically be a single word or a longer phrase. The simplest way to construct feature vectors is to assign a non-zero value to a given coordinate if its associated term occurs in the document [10]. There are several other ways of computing those features, also known as (term) weights [11, 12]. Vector space modeling is widely used in natural language processing (NLP) [10, 13]. Automatic text summarization (ATS) is a typical NLP task [14]. This paper uses it as a use case to profile space reduction efficiency in vector space modeling-based natural language processing. Note that magazine reviews, headlines on TV news, and paper outlines are examples of summaries that we interact with every day [15]. A summary has commonly been defined as a condensed form of one or more texts. It should convey the critical information expressed in the source document [16]. The generated output should be short, accurate, and fluent [17]. Juan-Manuel [18] identifies six main reasons why we need automatic text summarization. First, automatically generated summaries allow content to be consumed faster and more efficiently. Second, they make the selection process easier when foraging for documents. Third, automatic summarization can make the process of text indexing more efficient. Fourth, the produced output is less biased than that prepared by humans. Fifth, generated summaries may contain highly personalized information, which can be a valuable supplement to question-answering frameworks. Lastly, automatic text summarization increases the number of documents that commercial abstract services can process. There is no agreed taxonomy of summary types. 
Indeed, taxonomy variants depend on the angle of perception. Mani [15] introduced a taxonomy based on a set of criteria. The first criterion is the approach adopted to construct the summary. In this context, the generated output can be considered either an extract or an abstract. Extractive summarization implies that the most significant parts of the original text are combined to make the summary. In abstractive summarization, on the other hand, the significant points of the original text are paraphrased and presented logically to produce a more coherent outline. The type of input is another criterion, which categorizes the summarization process into mono-document ATS, when a single document is summarized, and multi-document ATS, when a collection of documents is summarized. Language is another crucial parameter. From this angle, we can distinguish three variants of automatic text summarization: (1) mono-lingual ATS, when the source text and the output are in one language; (2) multi-lingual ATS, when the source text is written in two or more languages and the output is in the corresponding languages; and (3) cross-lingual ATS, when the generated summary is not in the same language as the original text. The authors of [19] have pointed out key challenges associated with automatic text summarization, which remains an active research topic.
1.3 Related Work Mashechkin et al. [20] presented earlier vector space modeling-based text summarization techniques. Generally, the original text is represented in the form of a numerical matrix, in which each sentence is represented by a feature vector in the term space. Next, a new space, namely the conceptual (topic) space, is constructed through latent semantic analysis. The conceptual space encodes most of the relevant information conveyed in the original text. Sentence selection is then carried out based on the relevance of the information each sentence encodes in the topic space. The length of the generated summary depends on the targeted compression ratio. The authors of [21] suggested an unsupervised Non-negative Matrix Factorization (NMF)-based summarization technique. We proposed a hybrid system for automatic text summarization in [8]. First, vector space modeling was used to compute two metrics, coverage and fidelity. Fuzzy logic theory was then used to combine these metrics into a unified Fidelity-Coverage (F-C) score. Next, coherence was achieved by applying rhetorical structure theory on top of the sentences with high F-C scores. We also proposed several variants of a vector space modeling-based metric for evaluating automatically generated text summaries in [9].
1.4 Motivations and Scope of the Study As mentioned above, space reduction techniques are widely used in natural language processing sub-tasks. Moreover, recent natural language processing literature shows that classic dimensionality reduction techniques are still in use [22–24]. In this work, we profile space reduction efficiency in vector space modeling-based NLP, using automatic text summarization as a use case. We investigate how the technique used to generate the conceptual space affects the quality of the generated output. Thus, we compare the pertinence of summaries generated automatically by two variants of the same summarization protocol [8]. The first one uses a linear space reduction technique to construct the conceptual space (the topic space). The second one uses a non-linear technique for the same purpose. The salient outcome of this research is that it explains the obtained results based on a rigorous study of the sparsity of the generated spaces. The rest of this paper is organized as follows: Sect. 2 describes the proposed vector space modeling protocol of summarization and details the mathematics of conceptual space construction using a linear and a non-linear space reduction approach. Section 3 compares the quality of the summaries obtained with the two above-mentioned variants of the proposed text summarization approach. It discusses whether the space reduction technique used to generate the conceptual space affects the quality of the generated output and explains the obtained results. Section 4 puts forth conclusions.
2 The Proposed Summarization Protocol From a computational perspective, the main idea is to project the source document onto a more informative and compressed new space that captures its main concepts. The unit vectors of the latter space are used to compute Retention and Fidelity scores. Then, a unified Retention-Fidelity (R-F) score is obtained following the approach we previously described in [8]. In this paper, we propose an isometric mapping (ISOMAP) variant of the principal component analysis-based technique of [8], compare the quality of the summaries obtained, and explain the results. Note that there are many other non-linear dimensionality reduction techniques, such as t-SNE (t-distributed stochastic neighbor embedding) [25] and LLE (locally linear embedding) [26]. ISOMAP was chosen for this study precisely because it is not the best empirically proven non-linear dimensionality reduction technique. It assumes that we deal with positive semidefinite matrices, which is not always the case; the LLE technique deals better with this issue. The intuition behind choosing ISOMAP is to compare principal component analysis, the state-of-the-art linear space reduction technique, with a standard, but not the best, non-linear space reduction method.
2.1 The Principal Component Analysis-Based Protocol of Text Summarization First, the text to summarize is segmented into units (generally, sentences are taken as the text units). Then, a lexicon is built and filtered to discard all universal expressions and terms. Next, the text is coded as an s × t matrix, where s is the number of sentences and t is the number of significant unique tokens. The conceptual space is then constructed as described below. It is used later on to calculate a Retention-Fidelity (PCA-RF) score for every sentence of the source text.
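The coding of a text as an s × t matrix can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the tokenizer, the tiny stop-word list standing in for the "universal expressions and terms" filter, and the tf-idf variant (relative term frequency times log(s / document frequency)) are all assumptions made for the example.

```python
import math
import re

# Hypothetical stop-word list standing in for the lexicon filter.
STOPWORDS = {"the", "a", "of", "in", "and", "is", "to"}


def tokenize(sentence):
    """Lower-case a sentence and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOPWORDS]


def tfidf_matrix(sentences):
    """Return the lexicon (t terms) and the s x t tf-idf matrix."""
    docs = [tokenize(s) for s in sentences]
    lexicon = sorted({w for d in docs for w in d})
    s = len(docs)
    doc_freq = {w: sum(1 for d in docs if w in d) for w in lexicon}
    matrix = []
    for d in docs:
        row = []
        for w in lexicon:
            tf = d.count(w) / len(d) if d else 0.0
            idf = math.log(s / doc_freq[w])
            row.append(tf * idf)
        matrix.append(row)
    return lexicon, matrix


sentences = [
    "Oil spill in the gulf",
    "The oil price rises",
    "Protest in the capital",
]
lexicon, matrix = tfidf_matrix(sentences)
```

Each row of `matrix` plays the role of one sentence's feature vector in the term space; most entries are zero because a sentence contains only a few of the t terms.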
2.1.1 Building the Conceptual Space
Each sentence $S_i$ is coded by a vector $\zeta_i$ of $t$ components, which are the tf-idf weights of the tokens present in $S_i$. Redundant information is then coded as $\omega$, the mean of the vectors $\zeta_i$ (Eq. 1). It is used later on to construct a normalized (mean-centered) feature vector $\Phi_i$ for every text unit (Eq. 2).
$$\omega = \frac{1}{s}\sum_{i=1}^{s}\zeta_i \tag{1}$$
$$\Phi_i = \zeta_i - \omega \tag{2}$$
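Equations 1 and 2 amount to subtracting the mean sentence vector from every sentence vector. A minimal sketch on made-up vectors follows; the names `zeta`, `omega`, and `phi` are illustrative labels for the feature vectors, their mean, and the mean-centered vectors of Eq. 2, and the numeric values are hypothetical.

```python
def mean_vector(vectors):
    """Eq. 1: component-wise mean of the s feature vectors."""
    s = len(vectors)
    return [sum(column) / s for column in zip(*vectors)]


def center(vectors):
    """Eq. 2: subtract the mean vector from every feature vector."""
    omega = mean_vector(vectors)
    centered = [[v - m for v, m in zip(vec, omega)] for vec in vectors]
    return omega, centered


# Hypothetical feature vectors for s = 3 sentences and t = 3 terms.
zeta = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 0.0],
    [2.0, 3.0, 1.0],
]
omega, phi = center(zeta)
```

After centering, every column of `phi` sums to zero, which is what allows the product of the centered matrix with its transpose to be read as a covariance-type matrix.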
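The new compressed, more informative space is built by first computing the covariance matrix described in Eq. 3. A singular value decomposition is then performed, as described by Eq. 4, to construct the eigenconcepts: the unit vectors of the targeted space.
$$\aleph = \frac{1}{s}\sum_{n=1}^{s}\Phi_n\Phi_n^T = \chi\chi^T \tag{3}$$
$$\chi = \delta\, S\, \gamma^T \tag{4}$$
In Eq. 3, $\chi = [\Phi_1, \ldots, \Phi_s]$, where $\Phi_n$ denotes the mean-centered feature vector of the $n$th sentence (Eq. 2); $\aleph$ and $\chi$ are, respectively, $t \times t$ and $t \times s$ matrices. The dimensions of the matrices $\delta$, $S$, and $\gamma$ in Eq. 4 are, respectively, $t \times t$, $t \times s$, and $s \times s$. Note that $\delta$ and $\gamma$ are orthogonal ($\delta\delta^T = \delta^T\delta = Id_t$ and $\gamma\gamma^T = \gamma^T\gamma = Id_s$). Additionally, since $\chi\chi^T = \delta S S^T \delta^T$ and $\chi^T\chi = \gamma S^T S \gamma^T$: (i) the eigenvectors of $\chi^T\chi$ are the columns of $\gamma$; (ii) the eigenvectors of $\chi\chi^T$ are the columns of $\delta$; (iii) the eigenvalues $\sigma_k$ of $\chi\chi^T$ and $\chi^T\chi$ are the squares of the singular values $s_k$ of $S$. The eigenvalues $\sigma_k$ of $\chi\chi^T$ are null when $k > s$, and their associated eigenvectors are unnecessary since $s < t$. So, the matrices $\delta$ and $S$ can be truncated, and the dimensions of $\delta$, $S$, and $\gamma$ in Eq. 4 become, respectively, $t \times s$, $s \times s$, and $s \times s$. Next, the targeted compressed conceptual space is built using the $K$ eigenvectors $\delta_i$ associated with the $K$ highest eigenvalues, as shown in Eq. 5:
$$\Pi_K = [\delta_1, \delta_2, \ldots, \delta_K] \tag{5}$$
2.1.2 Computation of the PCA-RF Score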
Sentences of the source text are projected onto the constructed conceptual space and encoded as a linear combination of the $K$ eigenconcepts, as described by Eq. 6. The vector $\aleph_i(k) = \delta_k^T \Phi_i$ provides the coordinates of a sentence $S_i$ in the conceptual space, where $\Phi_i$ is the mean-centered feature vector of $S_i$.
$$\Phi_i^{proj} = \sum_{k} \aleph_i(k)\,\delta_k \tag{6}$$
Next, we determine to what extent the units selected for the generated abstract encode the original text’s main concepts. Thus, the Euclidean distance between a given concept $q$, represented by its eigenconcept $\delta_q$, and any sentence projected onto the conceptual space is defined and calculated as described by Eq. 7:
$$d_i(\delta_q) = \lVert \delta_q - \Phi_i^{proj} \rVert \tag{7}$$
Fig. 1 The Retention-Fidelity tensor
Next, we construct the Retention-Fidelity tensor (Fig. 1) by fixing a window size ω (written $w$ in Eq. 8). In the example of Fig. 1, ω is set to 4. The first line of the Retention-Fidelity tensor provides the best four sentences to encode the most crucial concept, as their associated projected vectors onto the conceptual space have the smallest distances (Eq. 7) to the eigenvector associated with the highest eigenvalue in Eqs. 4 and 5. The second line of the Retention-Fidelity tensor gives the best four sentences to encode the second most important concept, and so on. Note that the line order is related to the importance of the encoded concept. Also, the order of a given text unit in a given window depends on its distance to the given concept. For instance, the fifth sentence is the best one to encode the fourth most crucial concept, while the second sentence is the last of the best sentences to encode the same concept in a window of four sentences. The Retention-Fidelity tensor is used later on to compute a unified Retention-Fidelity score for each sentence of the text to summarize, as follows. First, a Retention coefficient is computed for a given sentence projected onto the conceptual space. Sentences with high retention coefficients should encode as many as possible of the most essential concepts present in the original text. Thus, the retention coefficient of a sentence is defined as the number of times it occurs in a window of size ω when considering the $k$ most critical concepts, divided by $k$:
$$R_k^w(S) = \frac{1}{k}\sum_{i=1}^{k}\alpha_i \tag{8}$$
αi = 1 if sentence S encodes the ith concept, and zero otherwise. The PCA-RF score is defined as the averaged sum of the retention coefficients, where the contribution of a given sentence is weighted according to its position in a given window of size ω (written $w$ in Eq. 9). Sentences with high Retention-Fidelity scores should encode the most important concepts expressed in the text to summarize, while taking into account the degree of importance of each concept.
$$PCA\text{-}RF_k^w(S) = \frac{1}{k}\sum_{i=1}^{k}\alpha_i\left[1 + \frac{1-\psi_i}{w}\right] \tag{9}$$
Here, $k$ is the number of principal concepts, αi = 1 if sentence S encodes the ith concept and zero otherwise, and ψi is the rank of S in the ith window of the Retention-Fidelity tensor. Next, a unified Retention-Fidelity (PCA-RF) score is computed for every sentence of the source text following the approach we previously described in [8]. Highly scored sentences are extracted to generate the final output (Fig. 2).
Fig. 2 Summary construction using highly scored PCA-RF sentences
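The retention coefficient and the rank-weighted PCA-RF score can be sketched on a small hand-made tensor. This is an illustration under stated assumptions: the tensor below is hypothetical (only its fourth window follows the example in the text, with sentence 5 ranked first and sentence 2 ranked last), the weight is read as 1 + (1 − rank) / w, and the function names are invented for the example.

```python
def retention(tensor, sentence):
    """Fraction of the k concept windows in which the sentence appears."""
    k = len(tensor)
    return sum(1 for window in tensor if sentence in window) / k


def pca_rf(tensor, sentence):
    """Rank-weighted retention: a hit at rank psi in a window of size w
    contributes 1 + (1 - psi) / w, so rank 1 counts fully."""
    k = len(tensor)
    total = 0.0
    for window in tensor:
        if sentence in window:
            w = len(window)
            psi = window.index(sentence) + 1   # ranks start at 1
            total += 1.0 + (1.0 - psi) / w
    return total / k


# Hypothetical tensor: k = 4 concepts (most important first), window size 4.
# Each window lists sentence numbers from best to last-best.
tensor = [
    [3, 1, 5, 2],
    [1, 4, 3, 6],
    [5, 1, 2, 4],
    [5, 6, 1, 2],
]

scores = {s: pca_rf(tensor, s) for s in range(1, 7)}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With this toy tensor, sentence 1 appears in all four windows, so its retention coefficient is 1.0 while its PCA-RF score is 0.75 because it is rarely ranked first; an extractive summary would then take sentences from the front of `ranking`.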
2.2 The Isometric Mapping-Based Protocol of Text Summarization We begin in the same way as before by constructing the set of feature vectors $\zeta_1, \zeta_2, \ldots, \zeta_s$ describing the $s$ sentences of the text to summarize. The ISOMAP-RF approach consists of constructing a k-nearest neighbor graph on $n$ data points (here $n = s$), each one representing a sentence in the original space. Then, we compute the shortest path between all pairs of points as an estimate of the geodesic distance $D_G$. Finally, we compute the decomposition of $K$ in order to construct $\Pi_K$, previously defined in Eq. 5, where
$$K = -\frac{1}{2} H D_G H \tag{10}$$
$H$ is a centering matrix, $H = Id_n - \frac{1}{n} e e^T$, and $e = [1, 1, \ldots, 1]^T$ is an $n \times 1$ vector. Note that the decomposition of $K$ is not always possible because there is no guarantee that $K$ is a positive semidefinite matrix. We deal with this case by finding the closest positive semidefinite matrix to $K$ and decomposing it instead. Next, we proceed in the same way as with PCA-RF. The ISOMAP-RF score is defined in the same way as the PCA-RF score in Eq. 9.
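The first stages of the ISOMAP construction (neighborhood graph, shortest-path geodesic estimate, and the double centering of Eq. 10) can be sketched as follows. This is a minimal illustration on made-up 2-D points, not the authors' code: the Floyd-Warshall shortest-path routine and the function names are assumptions, the centering is applied to squared geodesic distances as in classical multidimensional scaling (an assumption about how Eq. 10 is meant to be used), and the eigendecomposition and the positive semidefinite correction are left out.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbor graph as a matrix of edge lengths."""
    n = len(points)
    graph = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        graph[i][i] = 0.0
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            d = math.dist(points[i], points[j])
            graph[i][j] = d
            graph[j][i] = d   # keep the graph undirected
    return graph


def geodesic_distances(graph):
    """All-pairs shortest paths (Floyd-Warshall) as geodesic estimates.
    Entries stay infinite if the graph is disconnected."""
    n = len(graph)
    dist = [row[:] for row in graph]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist


def double_center(D):
    """Compute -1/2 * H D H with H = Id - (1/n) e e^T, element by element."""
    n = len(D)
    row_mean = [sum(r) / n for r in D]
    col_mean = [sum(D[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (D[i][j] - row_mean[i] - col_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


# Hypothetical points lying along an L-shaped path.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
D_G = geodesic_distances(knn_graph(points, k=2))
K = double_center([[d * d for d in row] for row in D_G])
```

For the two end points of the L-shaped path, the straight-line distance is about 2.83, whereas the graph distance follows the bend and is longer; the low-dimensional embedding would then come from the leading eigenvectors of `K`.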
3 Experimental Results 3.1 Dataset The dataset used for the experiments is the Timeline17 corpus [27]. It contains news articles and their associated manually created timelines from international news agencies, such as BBC, the Guardian, CNN news, Fox News, and NBC News. Source texts are related to nine broad topics: British Petroleum’s Oil Spill, death of the Pop-Music’s king (Michael Jackson), Financial Crisis, Libya’s war, Iraq’s war, Syrian Crisis, Egyptian Protest, H1N1 (Influenza), and the 2010 Haiti earthquake.
3.2 Results To evaluate the quality of the obtained results, we compute averaged ROUGE scores for obtained summaries. Obtained results are reported in Table 1. Note that ROUGE is the commonly used metric for automatically generated abstract evaluation. Summaries are compared to a bunch of human-produced summaries [28]. Note that there are many variants of the ROUGE metric: • ROUGE-N: a measure of N-grams overlap between the system and human-made abstracts. • ROUGE-L: it gives statistics about the Longest Common Subsequence (LCS). • ROUGE-W: a set of weighted LCS-based statistics that favors consecutive LCSes. • ROUGE-S: a set of Skip-bigram-based co-occurrence statistics. This study used ROUGE-1, ROUGE-2, and ROUGE-S metrics to evaluate the generated outputs’ quality. Obtained results illustrated by Table 1 show that the nonlinear space reduction-based summaries are closer to the human-made ones than those generated using the linear space reduction-based technique. We computed the sparsity percentage of original space feature metrics for every text to summarize to peel obtained results. We did the same thing for constructed spaces using the linear and non-linear studied space reduction techniques. Sparsity
On Profiling Space Reduction Efficiency in Vector Space …
Table 1 Obtained ROUGE-1, ROUGE-2, and ROUGE-S scores when using the PCA-RF- and ISOMAP-RF-based summarization protocols

| | ROUGE-1 | ROUGE-2 | ROUGE-S |
|---|---|---|---|
| PCA-RF | 0.241 | 0.061 | 0.058 |
| ISOMAP-RF | 0.259 | 0.066 | 0.064 |

Fig. 3 a Comparison of the feature matrix’s sparsity for both original and compressed conceptual spaces, using linear and non-linear approaches. b Linear versus non-linear space reduction efficiency based on the sparsity of the constructed space
percentages are illustrated in Fig. 3. Figures 3a and 3b show that the feature matrices of the conceptual (topic) spaces are less sparse than the feature matrix of the original space. Moreover, the non-linear (ISOMAP) technique remarkably outperforms the linear (PCA) one. These results are explained in the following subsection.
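The two quantities used in this subsection, ROUGE-N and the sparsity percentage of a feature matrix, can be sketched in a few lines. This is a minimal illustration, not the evaluation code behind Table 1: it assumes whitespace tokenization, a single reference summary, and the recall form of ROUGE-N (the text does not say whether recall, precision, or F-measure is reported). The function names and the toy inputs are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the multiset of n-grams found in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams.

    Assumption: recall-oriented ROUGE-N against one reference summary.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def sparsity_percentage(matrix):
    """Percentage of entries of a feature matrix that are exactly zero."""
    cells = [value for row in matrix for value in row]
    if not cells:
        return 0.0
    return 100.0 * sum(1 for value in cells if value == 0) / len(cells)


# Toy inputs, made up for illustration only.
system_summary = "the cat sat on the mat"
human_summary = "the cat is on the mat"
rouge_1 = rouge_n_recall(system_summary, human_summary, n=1)
rouge_2 = rouge_n_recall(system_summary, human_summary, n=2)
toy_sparsity = sparsity_percentage([[0, 1, 0], [0, 0, 2]])
```

A lower sparsity percentage means that more entries of the feature matrix carry information, which is the property compared in Fig. 3.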
3.3 Discussion

When constructing the compressed conceptual space using the PCA approach, we use the Euclidean distance, which does not approximate the actual distance between word feature vectors in the targeted space. In contrast, ISOMAP uses spectral theory to perform space reduction and preserves the geodesic distances in the constructed lower-dimensional space. It first creates a neighborhood network. It then approximates the geodesic distance between all pairs of word feature vectors of the original space using a graph distance. Next, it finds the low-dimensional embedding of the feature vectors by performing an eigenvalue decomposition of the geodesic distance matrix. Note that the Euclidean distance works well only if the neighborhood structure of a non-linear manifold can be approximated as linear. If this is not the case, Euclidean distances can be extremely deceiving. In contrast, measuring the distance between two word feature vectors by following the manifold gives a more appropriate approximation of the extent to which the two vectors are similar. Figure 4 illustrates the construction of the compressed conceptual space in two dimensions. It shows how the ISOMAP technique succeeds in approximating the geodesic distance between the pairs of word feature vectors (A, C) and (D, B) in the compressed conceptual (topic) space.

Fig. 4 Construction of the compressed conceptual space in 2-D by a linear (PCA) versus a non-linear (ISOMAP) approach
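The difference between the straight-line distance and the distance measured along the manifold can be illustrated with a small sketch of the first two ISOMAP steps described above: building a neighborhood graph and approximating geodesic distances by shortest paths. The toy data (points on a half circle), the choice of k, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as {node: {neighbour: edge_length}}."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        for j in nearest:
            length = math.dist(p, points[j])
            graph[i][j] = length
            graph[j][i] = length
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from source; these approximate geodesics."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            candidate = d + length
            if candidate < dist[neighbour]:
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


# Toy manifold: 11 points evenly spaced on a half circle of radius 1.
points = [(math.cos(math.pi * t / 10), math.sin(math.pi * t / 10)) for t in range(11)]
graph = knn_graph(points, k=2)
geodesic = geodesic_distances(graph, 0)[10]  # distance following the curve
straight = math.dist(points[0], points[10])  # Euclidean shortcut across the curve
```

On this toy curve the graph distance between the two end points stays close to the arc length, whereas the Euclidean distance cuts across the half circle and underestimates how far apart the points are along the manifold.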
4 Conclusion

In this paper, we compared two space reduction techniques using automatic text summarization as a use case. We analyzed the quality of automatically generated summaries using two space reduction-based variants of the same summarization protocol. The obtained results show that the non-linear space reduction-based summarization approach outperforms the linear one. We explained the obtained results by means of a rigorous sparsity study. Note that recent natural language processing literature shows that different models for topic modeling [23], feature extraction and selection [22, 24], and so on still use linear space reduction techniques. Our findings suggest that the results obtained in those papers could be improved simply by using non-linear space reduction techniques instead of linear ones.
References
1. Rashid J, Adnan S, Syed M et al (2020) An efficient topic modeling approach for text mining and information retrieval through K-means clustering. Mehran Univ Res J Eng Technol 39(1):213–222. ISSN 2413-7219
2. Saggion H, Poibeau T (2013) Automatic text summarization: past, present and future. In: Multi-source, multilingual information extraction and summarization, theory and applications of natural language processing. Springer, Berlin, pp 3–21
3. Salton G (1962) Some experiments in the generation of word and document associations. In: Proceeding AFIPS’62, fall joint computer conference, pp 224–250
4. Saul L, Roweis S (2001) An introduction to locally linear embedding. J Mach Learn Res
5. Tran G-B, Tran T-A, Tran N-K, Alrifai M, Kanhabua N (2013) Leverage learning to rank in an optimization framework for timeline summarization. In: TAIA workshop, SIGIR 13
6. van der Maaten LJP, Hinton G-E (2008) Visualizing high-dimensional data using t-SNE. J Mach Learn Res 9(Nov):2579–2605
7. Ur-Rehman MH, Liew CS, Abbas A et al (2016) Big data reduction methods: a survey. Data Sci Eng 1:265–284. https://doi.org/10.1007/s41019-016-0022-0
8. Kortli Y, Jridi M, Falou A-A, Atri M (2020) Face recognition systems: a survey. Sensors (Basel) 20(2):342. https://doi.org/10.3390/s20020342
9. Lee J-H, Park S, Ahn C-M, Kim D (2009) Automatic generic document summarization based on non-negative matrix factorization. Inf Process Manag 45:20–34
10. Alaidine B-A, Ismaïl B, Jean-Guy M (2020) Vector space modeling based evaluation of automatically generated text summaries. International Journal on Natural Language Computing 9(3):43–52. https://doi.org/10.5121/ijnlc.2020.9303
11. Bela G, Corinna B, Stefan L (2015) Research-paper recommender systems: a literature survey. Int J Digit Lib 305–338
12. Cengizler Ç, Ün MK, Büyükkurt S (2020) A nature-inspired search space reduction technique for spine identification on ultrasound samples of spina bifida cases. Sci Rep 10:9280. https://doi.org/10.1038/s41598-020-66468-x
13. Alaidine B-A, Ismaïl B, Jean-Guy M (2019) Automatic text summarization: a new hybrid model based on vector space modelling, fuzzy logic and rhetorical structure analysis. In: Book computational collective intelligence. Springer, pp 26–34. https://doi.org/10.1007/978-3-03028374-2_3
14. Juan-Manuel T-M (2014) Automatic text summarization. Wiley, London
15. Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev
16. Gani A et al (2015) A survey on indexing techniques for big data: taxonomy and performance evaluation. In: Knowledge and information systems, pp 1–44
17. Gosztolya G, Kocsor A (2004) Aggregation operators and hypothesis space reductions in speech recognition. Lecture notes in computer science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2-40
18. Hashem I et al (2015) The rise of "big data" on cloud computing: review and open research issues. J Inf Syst 47:98–115
19. Hemavathi D, Srimathi H (2020) Effective feature selection technique in an integrated environment using enhanced principal component analysis. J Ambient Intell Human Comput. https://doi.org/10.1007/s12652-019-01647
20. Kambatla K et al (2014) Trends in big data analytics. J Parallel Distrib Comput 74(7):2561–2573
21. Hovy E, Radev DR, McKeown K (2002) Introduction to the special issue on summarization. Comput Linguist 28(4):399–408
22. Lin C-Y (2004) ROUGE: a package for automatic evaluation of summaries. In: Proceedings of the workshop on text summarization branches
23. Mani I (2001) Automatic summarization. John Benjamins Publishing, Amsterdam
A. Ben Ayed et al.
24. Mashechkin IV, Petrovskiy MI, Popov DS et al (2011) Automatic text summarization using latent semantic analysis. Program Comput Soft 37:299–305. https://doi.org/10.1134/S0361768811060041
25. Wong A, Salton G, Yang C-S (1975) A vector space model for automatic indexing. Communications of the ACM 18(11):613–620
26. Zha H (2002) Generic summarization and key-phrase extraction using mutual reinforcement principle and sentence clustering. In: Proceedings of the 25th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’02), pp 113–120
27. Lhazmir S, El Moudden I, Kobbane A (2017) Feature extraction based on principal component analysis for text categorization. In: International conference on performance evaluation and modeling in wired and wireless networks (PEMWN) proceedings, pp 1–6. https://doi.org/10.23919/PEMWN.2017.8308030
28. Rajaraman A, Ullman J-D (2011) Mining of massive datasets, pp 1–17
Proposal of a Methodology for the Implementation of a Smart Campus

Sonia-Azucena Pupiales-Chuquin, Gladys-Alicia Tenesaca-Luna, and María-Belén Mora-Arciniegas
Abstract Nowadays, the transformation of a city into a smart city is of great interest to governments, whose firm purpose is to improve the quality of life of their citizens. Several initiatives have presented methodologies to carry out this transformation, covering all the work areas of a city, such as transport, mobility, the economy, sustainability, health, and education, the latter being of great importance for this work. When we talk about education, we mean offering agile, automated processes that bring benefit and satisfaction to the work of students and teachers. Considering the smart campus in the context of higher education leads us to present a methodological proposal for the creation of a smart university campus that takes advantage of all its management, administration, and education processes. The methodology presented is the result of a review of related work focused on a higher education institution. It is made up of three phases: preliminary planning, a functional scheme, and the development and execution of the proposal, leaving solid foundations for the implementation of a prototype of a smart campus.

Keywords Smart city · Smart campus · Smart campus methodology · Smart education
S.-A. Pupiales-Chuquin (B) · G.-A. Tenesaca-Luna · M.-B. Mora-Arciniegas Departamento de Ciencias de La Computación y Electrónica, Universidad Técnica Particular de Loja, San Cayetano Alto y Marcelino Champagnat S/N, Loja, Ecuador e-mail: [email protected] G.-A. Tenesaca-Luna e-mail: [email protected] M.-B. Mora-Arciniegas e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_52
1 Introduction

Smart cities are today a trend that most governments treat as a planning priority in order to improve the quality of life of their communities. Smart cities offer their inhabitants services that facilitate daily life, beyond providing a feedback channel to the city administration. Governments are interested in automating the different areas of a city so that they can manage transport, mobility, the economy, sustainability, health, and education, among others, using technologies that make these areas smart and improve the quality of life of citizens. One of the main areas in which processes can be made smart is education, through what is known as a smart campus. The smart campus is one of the most important concepts for incorporating technology into education. In recent years, it has attracted considerable attention from professionals, academics, and researchers from multiple disciplines. Even so, the concept has not been developed as a whole and lacks a methodological framework. This study therefore reviews recent achievements in the field of smart cities in general and the methodologies proposed for them, and focuses on a clear proposal for a smart campus methodology that addresses the many tasks found in the daily work of students and teachers by automating and improving communication and the teaching–learning process. In this context, this work presents the most important components of a methodology for creating a smart campus from the perspective and interest of higher education institutions.
2 Theoretical Foundation

This section presents the fundamental concepts that underpin the research and provide the basis for the proposed smart campus solution and its broad scope of work within the implementation areas of a smart city.
2.1 Smart City

In recent years, several definitions of a smart city have been presented. One of them is that of [1], which defines it as a place where people from all walks of life connect through technology in a very significant way in the twenty-first century, and which focuses on a combination of awareness, independence, and self-determination activities [2].
Other authors such as [3] mention that “The objective of a smart and sustainable city is to invest in technology to stimulate economic growth, promote social progress, and improve environmental conditions”; for this, citizen participation in public management is necessary [4]. Ontiveros et al. [3], for their part, mention that a smart city is not a place that benefits only from modern technology, but is also a very broad ecosystem in which citizens, municipal authorities, public and private companies, and industrial groups participate [5]. In addition to these definitions, it is stated that smart cities have contributed greatly to economic growth because, in recent years, they have been involved in the socioeconomic development of their regions, becoming key axes of economic growth, innovation, and social and cultural progress [6]. It is important to mention that there are experiences in creating frameworks that make it possible to identify when a city is smart, through initiatives or projects that include smart government, smart economy, smart population, smart sustainability, and smart mobility [7]. These aspects, known as characteristics, can also be scaled and adapted to the area of education, so that universities transform their general infrastructure into smart campuses, creating small cities that are independent in their activities, multiplicity of functions, connections, and users, among other aspects [8].

Characteristics of a smart city. The definitions presented make it clear that smart cities have a set of particular characteristics to be managed; each one is presented below [9].

Smart government. Government within a smart city must be transparent, allow citizens to access data, and keep them informed and interconnected. The government must work with open data and open government technologies, allowing citizens to exchange and provide fully public data so that social agents can make use of them [10].
Open data and open government. Open data is a large body of free information, originating from different organizations, that is available to citizens. With access to open data, citizens can work with or build on this information, generating new data or offering services to the community [11]. To achieve this objective, it is necessary to have an open government, which “engages in constant conversation with citizens in order to hear what they say and request, who make decisions based on their needs and preferences, which facilitates collaboration of citizens and officials in the development of the services it presents and that communicates everything it decides and does in an open and transparent manner” [12]. Therefore, having open government and open data means creating direct communication between the government and citizens, achieving transparency, participation, and collaboration.

Smart Mobility. Mobility must rely on sustainable, safe, and efficient transport systems; its main objective is to improve public transport, optimizing routes and avoiding
vehicular congestion, in order to give citizens and tourists an effective, intelligent, and sustainable service.

Smart Sustainability. This refers to a city being attractive on the basis of its natural and environmental qualities, that is, with areas dedicated to caring for the environment and therefore reducing pollution. Among the actions that can be taken to protect the environment is carrying out procedures online, avoiding paper consumption and emissions.

Smart Population. Within a smart city, the projects to be carried out depend directly on the citizens, so citizens must participate in each of the plans to be developed; their support determines whether projects succeed in the short or long term.

Smart Economy. The economy must be sustainable and competitive and must help improve the quality of life of citizens. This is achieved through good organization that attracts investment and generates jobs; today, many companies seek to invest in smart cities in order to be more competitive and have greater opportunities.
2.2 Smart City Methodologies

Methodologies for smart cities play a very important role in society. They aim to promote technology, the economy, and automation and to create a new city model, among other goals, with citizens as the main actors, since the changes made in the city are intended to improve the welfare and quality of life of citizens. Several methodological models have been proposed for smart cities; Table 1 presents their descriptions.
2.3 Smart Campus

All universities have one thing in common: they are connected to the Internet and surrounded by simple objects, such as doors, windows, printers, projectors, books, posts, and benches, among others, and by complex objects, such as buildings, classrooms, laboratories, and parking lots. All of these objects can be converted into smart objects through the use of sensors, QR tags (links, text, geographic), RFID, NFC, or BLE. These smart objects make it possible to transform a classic university into a smart university, because they can communicate with students through their smartphones [13]. Therefore, a smart campus is defined as a small world that integrates devices which, through sensors and the network, work in unison for the well-being of students. Every day, students, professors, and visitors at the university have an object
Table 1 Methodologies for an intelligent environment [10–12]

| Methodology | Country | Components/stages | Objective |
|---|---|---|---|
| Methodology of the Research Group, Development and Application in Telecommunications and Informatics and the Research Group in Urban Communication of Smart Cities (GIDATI-GICU DE CI) | Colombia | Pillars of the city; City themes; Phases | Aimed at the main leaders of a city so that they can create their own smart city model adaptable to the problems and needs of the city. In addition, it serves as a guide to plan strategies, projects, and initiatives to solve critical issues |
| Emerging and Sustainable Cities Initiative (ICES) methodology | America and the Caribbean | Core of the methodology; Pre-investment of innovation | Carry out a complete study of the city, identifying its needs, and propose a solution to different critical factors with the aim of achieving sustainability in the city and citizen well-being |
| Innovation Management Plan Methodology | Santander | Principles; Phases; Plans (results) | Define the mission and vision of the future based on the new city model through new technologies. Development of strategic plans that will be the fundamental pillars in the progress of the city, including public and private citizen commitment |
connected to the Internet, be it a smartphone or a tablet, which makes it easier for them to find a location on campus. Other authors affirm that smart campuses are not only responsible for providing solutions in technical areas; they also offer various services that make the student’s stay on campus more comfortable and attractive, including programs that link the campus with its surroundings as a green smart campus and areas where campus members and visitors can socialize [14].

Advantages of a smart campus. Smart campuses offer many advantages: they create a pleasant environment for socialization, provide techniques to improve student learning, save resources, and implement smart algorithms to improve services, management, and infrastructure and to address impact problems, among others, as follows:
Fig. 1 Smart applications [15] (panels: smart identification and payments, building automation, smart lighting, location service, parking spaces)
• Monitoring the flow of people and opening and closing access points.
• Lighting corridors when people are present.
• Preventing accidents through constant monitoring of temperature, smoke, humidity, and noise.
• Reducing electricity consumption.
• Creating a comfortable environment to increase socialization among members of the university community.
• Implementing intelligent algorithms that improve the learning methodology and keep students informed of the different academic events.

Application of a smart campus. Having stable bandwidth on the university campus makes life easier for students, teachers, and visitors, among others, allowing applications to be launched, new services to be offered, and existing services to be improved. The applications that enhance a smart campus are detailed in Fig. 1.
2.4 Similarity Between Smart City and Smart Campus

Smart cities and smart campuses share similar objectives, and several proposals present similarities that are important to mention; Table 2 shows the similarities found. The integration of technologies within cities or universities creates a much more sophisticated profile for the people who live, work, and study there, raising their level of knowledge, culture, technology, and good habits.

Table 2 Similarity between smart city and smart campus [15]

| Similarity | Smart city | Smart campus |
|---|---|---|
| Seek to attract people and investments to their communities | x | x |
| Seek to improve the quality of life of its inhabitants/students through the creation of services and applications | x | x |
| Try to differentiate themselves from the rest by demonstrating innovation and leadership initiatives using technology such as the Internet of Things | x | x |
2.5 Related Works

Cities and universities go hand in hand when it comes to implementing intelligent algorithms, automating processes, saving resources, reducing costs, and, most importantly, improving the quality of life of the citizen or student. Several initiatives developed to date meet these needs and have produced solid projects that can inform the construction of smart campus methodologies. Table 3 presents some works related to the subject.
3 Methodology

3.1 Design of the Methodology

The methodology is designed from a diagnosis of the areas that characterize a smart city and from case studies that reveal the similarities between a city and a university; the result is a scalable and adaptable process that takes the fields defining a smart city as the outline for creating a smart campus. The methodological approach aims to transform a traditional university campus into a smart campus. Figure 2 presents the structure of the methodology, divided into three stages: Preliminary Planning, Functional Scheme, and Development and Execution [10]. The methodology aims to create a new university model through new technologies; that is, it is flexible with respect to the shortcomings and problems that require an immediate solution. In addition, it proposes that university leaders make strategic plans, automate traditional processes, decrease energy consumption, improve the overall quality of learning, improve communication among students, faculty, and management, and address future challenges to ensure the availability of services and information for students. The following paragraphs describe each stage of the methodology.

Preliminary planning. This stage begins with a preliminary study of the university campus in order to obtain the requirements, needs, and processes that need to be automated, improved, or implemented. It is broken down into the following phases.

Preparation phase.
For the implementation of new technologies, it is necessary to carry out an in-depth study of the areas of intervention and the context in which they operate. The first step proposed is therefore to form work teams according to their common skills and to assign a role to each team member. In addition, the time required for each of the activities to be carried out by the respective groups must be estimated. To fulfill the designated tasks, the teams must choose the appropriate tools for gathering information and
Table 3 Smart campus case studies [16–19]

| No. | University name | Country | Project name | Description |
|---|---|---|---|---|
| 1 | Maejo University [16] | Thailand | Smart farms | Development of an intelligent Lingzhi mushroom farm, through the use of Internet of Things technology that allows controlling the environment. Maejo University developed an application called Blynk, which can be viewed on mobile devices. This app, which is in the cloud and is compatible with Android and iOS, displays real-time data such as voltage and current readings from sensors and solar panel charge voltage on a battery, or reports when activities such as a sprinkler occur or when fog pumps are turned on |
| 2 | King Fahd University of Petroleum and Minerals [17] | Saudi Arabia | KFUPM smart cards (King Fahd University of Petroleum and Minerals) | Universities need simple identity cards for their members (employees, students, and teachers, among others) who have access to certain data, equipment, or departments. The cards are microprocessor based and multifunctional, incorporate identity with access privileges, and also store values to be used in various places such as coffee shops and stores, among others |
| 3 | Democrita de Tracia University [18] | Greece | Simulation platform for smart microgrids on University Campus | They present a simulation platform for smart microgrids on university campuses. The proposed platform for smart campuses facilitates the analysis of energy calculations in order to save electricity costs, and optimizes the size of the micronetwork components as well as the campus power management and load control operations |
| 4 | Nusantara Multimedia University [19] | Indonesia | Smart posters | They proposed the implementation of Smart Posters based on Near-Field Communication (NFC) technology. NFC is a new technology that is spreading around the world and is integrated into today’s mobile devices, allowing connections between mobile phones or other devices for the transfer of information in public transport, electronic ticketing, credit card payment, and advertising, among others |
Fig. 2 Design of the methodology for a smart campus
the techniques to be used, including surveys, interviews, observation, and questionnaires.

Exploratory phase to the areas. The areas that define a smart city are the fields of action that are also applied, with adaptation, to a university campus: Government, Economy, People and Life, Sustainability, and Mobility. The exploratory phase evaluates each of the areas in terms of performance, process management, problems, deficiencies, strengths, and weaknesses, among others. The working groups designated to follow up on each area are in charge of projecting the reliability, cost, strategies, and estimated time of the project, and of verifying that access to the database is available for consuming data and presenting results.

Problem categorization phase. After the detailed analysis of each field, the next step is to prioritize the most affected areas that need immediate solutions, ordering them from the highest to the lowest degree of criticality, and finally to select one field in which to develop and implement the technologies and algorithms that will transform it into a smart area.

Solution and strategies phase. The preliminary stage ends with the detailed list of proposed solutions and the most appropriate strategies for the chosen area. The working group presents a list of possible solutions, and the next step is to choose the best option, considering the benefits to the interested party, the feasibility, and the completion times.

Functional scheme. In the second stage of the methodology, we present a functional outline of the entire life cycle of the methodology, subdivided into four blocks:
• Block one: This block comprises all the areas within a university campus that can be transformed into smart ones, including green areas, parking, education, and lighting, among others. Sensors, Wi-Fi, social networks, RFID tags, and the Internet of Things allow us to collect data from them, which we call Big Data.
• Block two: Once the information has been collected, the next step is a management process; that is, the large volume of data is cleaned, classified, analyzed, and interpreted in real time.
• Block three: After the data have been analyzed, they are presented to the members of the university campus, showing clarity and transparency on the part of the authorities. The information can be viewed on mobile devices, computers, or information panels.
• Block four: Finally, this block contains the interested parties that will make use of the information, including students, staff, and teachers, among others.

Development and execution. Finally, the development and execution stage is the implementation of the technologies and programming algorithms that produce the smart campus as a result; for this we present a layered architecture.

Capture system layer. This layer contains all the instruments and tools that allow data to be captured or collected: different types of sensors, including water, light, humidity, temperature, and mobility sensors, and cameras, among others.

Acquisition/interconnection layer. This layer contains the Wi-Fi, routers, and switches that allow things to connect and interact for data transfer.

Knowledge layer. All the information, in different formats such as video, images, and audio, among others, is stored here.
For the analysis, processing, and cleaning of the data, a NoSQL database must be used, since such databases can easily incorporate several servers and at the same time help make storage more efficient. The storage options include Cassandra, MongoDB, and CouchDB.

Interoperability layer. This layer contains the entire data development and exploitation kit: SDK and open data.

Intelligence layer. Finally, the intelligence layer is the result: the areas or fields with intelligent algorithms already implemented, which together we call the smart campus.
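The layered architecture described above can be pictured as a small data pipeline. The following sketch is purely illustrative and is not part of the proposed methodology: the sensor names, value ranges, areas, and the lighting rule are made-up assumptions, an in-memory list stands in for the NoSQL store, and the acquisition/interconnection layer (network transport) is omitted.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    sensor_id: str
    kind: str     # e.g. "temperature" or "presence"
    value: float
    area: str     # campus area the sensor belongs to


def capture():
    """Capture layer: stand-in for real sensors (all values are made up)."""
    return [
        Reading("t1", "temperature", 21.5, "library"),
        Reading("t2", "temperature", -999.0, "library"),  # faulty reading
        Reading("t3", "temperature", 23.5, "library"),
        Reading("p1", "presence", 1.0, "corridor"),
    ]


def clean(readings, low=-40.0, high=60.0):
    """Knowledge layer: drop temperature readings outside a plausible range."""
    return [r for r in readings
            if r.kind != "temperature" or low <= r.value <= high]


def summarize(readings):
    """Interoperability layer: expose per-area averages as a plain dictionary."""
    grouped = {}
    for r in readings:
        grouped.setdefault((r.area, r.kind), []).append(r.value)
    return {key: mean(values) for key, values in grouped.items()}


def decide(summary):
    """Intelligence layer: one illustrative rule for corridor lighting."""
    presence = summary.get(("corridor", "presence"), 0.0)
    return {"corridor_lights_on": presence > 0.0}


summary = summarize(clean(capture()))
actions = decide(summary)
```

Keeping each layer as a separate function mirrors the separation of concerns in the architecture: a sensor type or storage backend can be replaced without touching the decision rules.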
4 Evaluation of the Methodology for a Smart Campus

Table 4 presents the evaluation of the proposed methodology for creating a smart campus, based on the case studies from the related works of universities in different countries.
Table 4 Evaluation of the methodology for a smart campus

| Smart campus methodology | Smart farms [16], Thailand | Smartcards [17], Saudi Arabia | Simulation platform for smart microgrids on university campuses [18], Greece | Smart posters [19], Indonesia |
|---|---|---|---|---|
| Preliminary planning | | | | |
| Preparation phase | x | x | x | x |
| Exploratory phase to the areas | x | x | x | x |
| Problem categorization phase | x | x | x | x |
| Solution and strategies phase | x | x | x | x |
| Functional scheme | | | | |
| Data Capture | x | x | x | x |
| Processing | x | x | x | x |
| Presentation | x | x | x | x |
| Designation | x | x | x | x |
| Development and execution | | | | |
| Capture layer | x | x | x | x |
| Acquisition layer | x | x | x | x |
| Knowledge layer | x | x | x | x |
| Interoperability layer | x | x | x | x |
| Intelligence layer | x | x | x | x |
The tests consist of a comparative analysis of each phase of the methodology, which shows how adaptable it is to the case studies. The result is that all the projects comply with these stages from the outset in the construction of smart projects. We consider that 90% of the methodology is replicable in these investigations, whose objective is to improve the well-being of students, teachers, and university staff, among others.
5 Conclusions
• The present study reviewed the different existing methodological proposals, which served as the basis for the creation of our “Smart Campus Methodology.”
• The methodology created focuses on innovating and encouraging transparency of communication, improving, creating, or automating processes, facilitating the learning and fulfillment of activities within a smart campus. • The methodological proposal created for a smart campus is adaptable for implementation on a university campus, leaving the necessary bases for optimizing processes, from planning to prototype execution.
References
1. Giffinger R, Fertner C, Kramar H, Meijers E, City-ranking of European Medium-Sized Cities, pp 1–12
2. Giffinger R, Pichler-Milanović N (2007) Smart cities: ranking of European medium-sized cities. Centre of Regional Science, Vienna University of Technology
3. Ontiveros E, Vizcaíno D, López Sabater V (2017) Las ciudades del futuro: inteligentes, digitales y sostenible, 1st edn. Madrid
4. Davy RG et al (2018) Construir las ciudades inteligentes y sostenibles del mañana. Geophys J Int 212(1):244–263. https://doi.org/10.1093/gji/ggx415
5. Lea RJ (2017) Smart cities: an overview of the technology trends driving smart cities
6. TECNO-Cercle Tecnològic de Catalunya (2012) Hoja de ruta para la Smart City. Consult. el, vol 27
7. Manville C et al (2014) Mapping smart cities in the EU
8. Pagliaro F et al (2016) A roadmap toward the development of Sapienza Smart Campus. In: 2016 IEEE 16th international conference on environment and electrical engineering (EEEIC), pp 1–6
9. Avellana Doménech E (2015) Ciudad Inteligente (Smart City), Gandía. Propuestas para un plan de actuación en el sector turístico
10. Mellouli S, Luna-Reyes LF, Zhang J (2014) Smart government, citizen participation and open data. Inf Polity 19(1–2):1–4. https://doi.org/10.3233/IP-140334
11. Naser A, Concha G (2012) Datos abiertos: Un nuevo desafío para los gobiernos de la región
12. César C, Lorenzo S (2010) Open government: gobierno abierto. Jaén, España: Algón Editrores MMX, 2010
13. Cata M (2015) Smart university, a new concept in the internet of things. In: 2015 14th RoEduNet international conference-networking in education and research (RoEduNet NER), pp 195–197
14. Quimis Nogales MA, Romero Freire CM, others (2017) Propuesta de un marco de referencia para la óptima transición de un campus hacia un sistema de administración de energía inteligente y proyecto de pre-factibilidad a Espol 2.0, Espol
15. Guan L, others (2011) Study tour-city of Stratford Ontario Canada: creating a smart city. Gov News 31(3):20
16. Chieochan O, Saokaew A, Boonchieng E (2017) Internet of things (IOT) for smart solar energy: a case study of the smart farm at Maejo University. In: 2017 international conference on control, automation and information sciences (ICCAIS), pp 262–267
17. Halawani T, Mohandes M (2003) Smart card for smart campus: KFUPM case study. In: Proceedings of the 2003 10th IEEE international conference on electronics, circuits and systems, 2003. ICECS 2003. vol 3, pp 1252–1255
18. Elenkova MZ, Papadopoulos TA, Psarra AI, Chatzimichail AA (2017) A simulation platform for smart microgrids in university campuses. In: 2017 52nd international Universities power engineering conference (UPEC), pp 1–6
19. Audy, Kristanda MB, Hansun S (2017) Smart poster implementation on mobile bulletin system using NFC tags and salt tokenization case study: Universitas Multimedia Nusantara. In: Proceeding - 2016 2nd International Conference on Science in Information Technology, ICSITech 2016: Information Science for Green Society and Environment, pp 152–157. https://doi.org/10.1109/ICSITech.2016.7852625
Emotion Cause Detection with a Hierarchical Network Jing Wan and Han Ren
Abstract Emotion cause detection plays a key role in many downstream sentiment analysis applications. Research shows that both knowledge from data and experience from linguistics help to improve the detection performance. In this paper, we propose an approach to combine them. We utilize a hierarchical framework to model emotional texts, in which the emotion is independently represented along with each word and clause in a document. We also employ linguistic features to help the model find the emotion cause. Such features by manual work help to describe deep semantic relations between emotions and their causes that are difficult to be cast by representation models. Experimental results show that the combination model helps to detect emotion cause within emotional texts with complex semantic relations. Keywords Emotion cause detection · Sentiment analysis · Hierarchical neural network
J. Wan
Guangdong University of Foreign Studies, Guangzhou, China
H. Ren (corresponding author)
Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou, China
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_53

1 Introduction
Current research on sentiment analysis focuses on fine-grained tasks of retrieval, extraction, and generation, such as stance detection [1], review summarization [2], and emotion cause detection [3]. Compared to traditional tasks such as sentiment polarity recognition, fine-grained sentiment analysis tasks are more concerned with concepts, events, and relations within emotional texts, which helps to better understand emotions. As one of the fine-grained sentiment analysis tasks, emotion cause detection aims to capture the causality behind a consequence, or to find the relation between an emotion and its stimuli. Such a task may play an important role in many application scenarios, such as collecting the reasons why a movie receives positive feedback or identifying the causes of public opinion on a government policy. Although related research tasks such as review mining [4] and user profiling [5] help to explore the cause of an attribute, emotion cause detection gives a better way to understand the logic of emotion generation.
There are two kinds of strategies for emotion cause detection. One is rule-based methods, that is, building linguistic rules for detecting emotion cause. Although such rules help to achieve high precision, it is still difficult for them to recognize all emotion causes in emotional texts because emotional expressions are heterogeneous. The other is computational models, that is, modeling emotional texts in order to analyze emotions and their stimuli. Such methods leverage statistical patterns to represent emotional texts, avoiding the coverage problem of rules. However, the lack of linguistic guidance may lead such models to low detection precision.
In this paper, we propose an approach to detect emotion cause in texts. We combine the two strategies, aiming to balance generalization and precision. More specifically, a hierarchical GRU model is built to learn the representation of words and clauses. Then we build linguistic features for recognizing emotion cause. The task of emotion cause detection can be viewed as a classification problem, i.e., each clause is judged as to whether or not it is the cause of the emotion of the document. Thus, clause representations and manually built linguistic features are concatenated for classification.
The contribution of this paper is twofold. (1) We propose a hierarchical framework to model emotional texts; the difference between our model and many hierarchical neural networks such as [6] is that the emotion is independently represented along with each word and clause in a document. (2) We employ both the result of representation learning and linguistic features for detecting emotion cause. Such manually built features help to describe deep semantic relations between emotions and their causes that are difficult for representation models to capture.
The rest of this paper is organized as follows. Section 2 gives a brief description of related work in emotion cause detection. Section 3 presents the model proposed in this paper. In Sect. 4, experimental results and discussions are given. Finally, the conclusion is drawn in Sect. 5.
2 Related Work
Sentiment analysis, which plays an important role in the research area of social computing, has been well studied for years [7]. Traditional tasks in sentiment analysis mainly involve shallow text analysis problems such as polarity classification and emotion detection [8]. Essentially, most of these tasks aim to classify texts with affective labels, and they typically leverage multiple emotional resources, such as emotion lexicons [9], manual patterns [10], or heuristics [11]. Machine learning models as well as feature engineering [12] are also utilized for better performance.
In order to better understand affective texts, fine-grained analysis problems, such as aspect term extraction and emotion cause detection, have been continuously investigated in recent years. Essentially, an emotion cause represents an event that triggers a corresponding emotion; hence, identifying emotion causes helps to better understand the relations between emotions and events within affective texts.
There are two main approaches for detecting emotion cause: rule-based and statistical ones. Methods of the first class often leverage heuristic rules that are composed of cue words and particular syntactic patterns [13, 14]. Statistical approaches tend to detect emotion cause by learning causal events within affective texts automatically. Traditional models, such as SVM [3], CRF [15], and MaxEnt [16], are utilized to classify clauses with emotion causes. For better performance, neural networks, such as CNN [17] and LSTM [18], are utilized to build deep learning models for emotion cause detection. Although such models achieve much better performance than traditional ones, it is still difficult to recognize the causes of emotions because of the complex semantic relations in emotional texts.
3 The Approach 3.1 The Framework We leverage a two-layer framework for the learning model. The first layer encodes each word in a document, while the second one encodes each clause in a document. The output of the second layer is the encoded clause, which is a part of the input of the classifier. In addition, the document representation is treated as one part of the input of the classifier, aiming at providing context information to such clause. Linguistic features are also encoded as a part of the input of the classifier, helping to better detect the cause of the emotion of such document. The framework is shown in Fig. 1.
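As a rough illustration of the classifier input described above, the following sketch concatenates a clause encoding, a document representation, and a linguistic feature vector, and passes the result through a linear layer and a softmax. All vectors, weights, and the function names are made-up assumptions for illustration; they are not the authors' trained parameters or implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_clause(clause_vec, doc_vec, ling_feats, weights, bias):
    """Concatenate the three inputs, apply a linear layer, then softmax."""
    x = clause_vec + doc_vec + ling_feats  # list concatenation
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)

# Hypothetical toy inputs (values chosen only for illustration).
clause_vec = [0.2, -0.1]       # encoded clause
doc_vec = [0.05, 0.3]          # document representation (context)
ling_feats = [1.0, 2.0, 0.0]   # keyword present, distance 2, no rule matched
weights = [
    [0.0] * 7,                              # class 0: not a cause
    [0.5, 0.5, 0.1, 0.1, 1.0, -0.2, 0.8],   # class 1: cause
]
bias = [0.0, 0.0]

probs = classify_clause(clause_vec, doc_vec, ling_feats, weights, bias)
print(probs)  # [P(not cause), P(cause)]
```

Dropping `doc_vec` or `ling_feats` from the concatenation gives input layouts analogous to the simpler model variants compared in the experiments.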
3.2 Hierarchical Network We utilize the GRU [19] to build the hierarchical encoder. Compared to the LSTM, GRU has no separate memory cells, which makes it converge faster than the former. We focus on clause-level classification in this work. The dataset we utilize in our experiment comes from NTCIR-13 [20]. Each document has multiple clauses, and some of them are labeled with a boolean value, indicating that they are the causes of the emotion of such document. We firstly encode each word in a document, then encode each clause via encoded word representations to represent a document.
Fig. 1 The framework of the model. Components shown in the figure: input words w1, w2, w3; word encoder (Bi-GRU); clause encoder (Bi-GRU); linguistic features; concatenation; softmax; output label.
Word Encoder
Assume that $\omega_i^t$, $t \in [1, T]$, is the embedding vector of the i-th word in a document. We utilize a bidirectional GRU to get the representation of each word from both directions, and then integrate them to get the contextual information:

\[
\overrightarrow{h}_i^t = \overrightarrow{\mathrm{GRU}}(\omega_i^t), \quad t \in [1, T]
\]
\[
\overleftarrow{h}_i^t = \overleftarrow{\mathrm{GRU}}(\omega_i^t), \quad t \in [T, 1] \tag{1}
\]

where $h$ denotes the hidden state, and the arrow denotes the reading direction: the right arrow means the model reads the document from $\omega_i^1$ to $\omega_i^T$, while the left arrow means the reverse reading direction. Finally, the representation for a given clause is generated by concatenating the two hidden states: $h_i^t = [\overrightarrow{h}_i^t, \overleftarrow{h}_i^t]$.

Clause Encoder
The running mechanism of the clause encoder is similar to that of the word encoder; the only difference is that the input of the clause encoder is an encoded clause:

\[
\overrightarrow{h}_i = \overrightarrow{\mathrm{GRU}}(c_i), \quad i \in [1, L]
\]
\[
\overleftarrow{h}_i = \overleftarrow{\mathrm{GRU}}(c_i), \quad i \in [L, 1] \tag{2}
\]

where $c_i$ is a clause vector, and $h_i$ is its hidden state. Here, we also get the representation of the clause by concatenating the two hidden states: $h_i = [\overrightarrow{h}_i, \overleftarrow{h}_i]$.
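To make the two-level encoding concrete, here is a minimal pure-Python sketch of a bidirectional GRU applied first to words and then to clauses. Several points are assumptions rather than details from the paper: the weights are random and untrained, bias terms are omitted, the update rule uses the convention h' = (1 - z) * h + z * candidate, and each clause vector is obtained by mean pooling over its word states, since the pooling step is not specified in the text.

```python
import math
import random

def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """A single GRU cell with random (untrained) weights and no biases."""
    def __init__(self, input_size, hidden_size, rng):
        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]
        self.hidden_size = hidden_size
        self.Wz, self.Uz = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wr, self.Ur = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wh, self.Uh = mat(hidden_size, input_size), mat(hidden_size, hidden_size)

    def step(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state uses the reset-gated previous state.
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b)
                for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run(cell, inputs):
    """Run a cell over a sequence and return every hidden state."""
    h = [0.0] * cell.hidden_size
    states = []
    for x in inputs:
        h = cell.step(x, h)
        states.append(h)
    return states

def bi_gru(fwd, bwd, inputs):
    """Concatenate forward and backward hidden states position by position."""
    forward = run(fwd, inputs)
    backward = run(bwd, inputs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]

def mean_pool(vectors):
    """Assumed pooling step: average the word states of one clause."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

rng = random.Random(0)
emb, hid = 4, 3
word_fwd, word_bwd = GRUCell(emb, hid, rng), GRUCell(emb, hid, rng)
clause_fwd, clause_bwd = GRUCell(2 * hid, hid, rng), GRUCell(2 * hid, hid, rng)

# A toy document: three clauses with 3, 5 and 2 random word embeddings.
document = [[[rng.uniform(-1, 1) for _ in range(emb)] for _ in range(n)]
            for n in (3, 5, 2)]

clause_vecs = [mean_pool(bi_gru(word_fwd, word_bwd, clause)) for clause in document]
clause_states = bi_gru(clause_fwd, clause_bwd, clause_vecs)
print(len(clause_states), len(clause_states[0]))
```

Each entry of `clause_states` plays the role of the concatenated clause representation; in practice a deep learning library with trained parameters would replace this hand-written cell.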
3.3 Features
We also employ the following linguistic features in our approach:
Keyword. A keyword is a cue word for detecting emotion cause. Such keywords are usually function words or connectives in a discourse. For example, in the sentence "Our team is licking its chops, because we beat the champions last night", the word "because" indicates the causal relation between the emotion of the sentence, happiness, and the clause "beat the champions".
Distance. An emotion and its causal event are close to each other in most cases; hence, a sequential distance may help to estimate whether an event is the cause of an emotion. This feature is an integer value, denoting the number of clauses between a clause and the emotional word.
Linguistics. Causal indicators or connectives in discourse are often omitted in Chinese texts; therefore, it is difficult to detect emotion cause via the above features in such cases. In spite of this, emotion causes can be reflected by particular expressions. Based on this observation, Lee et al. [13] defined linguistic rules to detect causal clauses. In this paper, we employ several particular expressions for emotion cause detection, shown in Table 1. Each of them is described as a syntax-based rule. EC denotes a clause with an emotion cause, CV denotes a causative verb, EM denotes an emotional word, CC denotes a causal conjunction, such as because or hence, CT denotes a causal indicator or a connective in discourse, and PP denotes a preposition related to causal relations. Based on these rules, a clause may contain an emotion cause on the condition that the other two constituents in one rule are found in a sentence.

Table 1 Linguistic rules

| ID | Rule |
|---|---|
| 1 | EC + CV + EM |
| 2 | CC + EC + EM |
| 3 | EM + CC/CT + EC |
| 4 | PP + EC + EM |
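The rules in Table 1 and the distance feature can be sketched as follows. This is only one possible reading: it assumes the sentence has already been reduced to a sequence of constituent tags (with the hypothetical tag `CL` for a plain candidate clause), and it treats each rule as an ordered pattern whose other two constituents must appear before or after the candidate clause, not necessarily adjacent to it.

```python
# Each rule is an ordered list of slots; a slot is the set of tags it accepts.
RULES = {
    1: [{"EC"}, {"CV"}, {"EM"}],
    2: [{"CC"}, {"EC"}, {"EM"}],
    3: [{"EM"}, {"CC", "CT"}, {"EC"}],
    4: [{"PP"}, {"EC"}, {"EM"}],
}

def matches(rule, tags, k):
    """True if the rule fits `tags` in order, with its EC slot at index k."""
    ec_pos = next(i for i, slot in enumerate(rule) if "EC" in slot)
    # Slots before EC must be found, in order, to the left of index k.
    pos = 0
    for slot in rule[:ec_pos]:
        while pos < k and tags[pos] not in slot:
            pos += 1
        if pos >= k:
            return False
        pos += 1
    # Slots after EC must be found, in order, to the right of index k.
    pos = k + 1
    for slot in rule[ec_pos + 1:]:
        while pos < len(tags) and tags[pos] not in slot:
            pos += 1
        if pos >= len(tags):
            return False
        pos += 1
    return True

def matched_rules(tags, k):
    """IDs of all rules under which the clause at index k may hold the cause."""
    return [rid for rid, rule in sorted(RULES.items()) if matches(rule, tags, k)]

def clause_distance(clause_index, emotion_clause_index):
    """Distance feature: number of clauses separating a clause from the emotion."""
    return abs(clause_index - emotion_clause_index)

# "Our team is licking its chops, because we beat the champions last night":
# emotional expression, causal conjunction, then the candidate clause.
print(matched_rules(["EM", "CC", "CL"], 2))
```

For the example sentence, the candidate clause satisfies rule 3 (EM + CC/CT + EC); the matched rule IDs could then be encoded as binary features for the classifier.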
4 Experiments
4.1 Settings
The dataset we utilize in our experiment comes from NTCIR-13 [20], which provides 2,200 paragraphs with emotion cause annotation as training data and 2,000 paragraphs as test data. Each clause in a paragraph is labeled with an emotion, the cause of the emotion, and the keyword that probably indicates the emotion, if any. Precision, recall, and F-1 score are adopted as the metrics. Four models are set up in the experiment:
GRU: in this model, only the clause representation is leveraged for cause classification. Linguistic features are not involved.
GRU + d: the document representation is concatenated with the clause representation for cause classification.
GRU + LF: linguistic features are concatenated with the clause representation for cause classification.
GRU + d + LF: all the representations and features mentioned above are concatenated for cause classification.
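For reference, the three metrics can be computed as in the short sketch below. The function names and the example counts are illustrative assumptions; the F-1 score is taken to be the usual harmonic mean of precision and recall, which is consistent with the F-scores the paper reports alongside its precision and recall values.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def precision_recall_f1(tp, fp, fn):
    """Metrics from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)

# Made-up counts, for illustration only.
p, r, f = precision_recall_f1(tp=70, fp=30, fn=30)
print(p, r, f)

# Recomputing an F-score from a reported precision/recall pair (the GRU row).
print(round(f1_from(0.6765, 0.6594), 4))
```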
4.2 Experimental Results
Table 2 shows the overall performance of each model. We can see from the table that: (1) the model with all the features (GRU + d + LF) achieves the best performance, which confirms that the document representation and the linguistic features are helpful for the task; (2) the model with linguistic features (GRU + LF) outperforms the model without them (GRU), improving the F-1 score by 3.37 percentage points, which indicates that linguistic features contribute substantially to emotion cause detection; (3) the model with document representation (GRU + d) is better than the one without it (GRU), meaning that contextual information helps to detect emotion cause; in other words, it is inappropriate to classify clauses without their contexts.

Table 2 Experimental results

| Model | Precision | Recall | F-score |
|---|---|---|---|
| GRU | 0.6765 | 0.6594 | 0.6678 |
| GRU + d | 0.6833 | 0.6725 | 0.6779 |
| GRU + LF | 0.7159 | 0.6876 | 0.7015 |
| GRU + d + LF | 0.7223 | 0.6908 | 0.7062 |

We also conduct an in-depth investigation by analyzing performance for each emotion category. There are six emotion categories in the dataset: fear, surprise, disgust, sadness, anger, and happiness. We utilize the model with all the features and compute precision, recall, and F-1 score for each category. Results are shown in Table 3.

Table 3 Performances for each category

| Category | Precision | Recall | F-score |
|---|---|---|---|
| Fear | 0.716 | 0.6825 | 0.6988 |
| Surprise | 0.7208 | 0.6883 | 0.7042 |
| Disgust | 0.7432 | 0.7109 | 0.7267 |
| Sadness | 0.7205 | 0.6864 | 0.7030 |
| Anger | 0.7301 | 0.6956 | 0.7124 |
| Happiness | 0.7269 | 0.6898 | 0.7079 |

Table 3 shows that (1) the model achieves the best performance on the emotion class disgust, probably because keywords indicating disgust are often explicit, or because disgust expressions are easier for the linguistic features to recognize than expressions of other emotions; (2) the proportion of sadness data is larger than that of other emotions, whereas the detection performance for this category is below the average, probably because expressions of sadness may be vague or implicit and are therefore not easy to detect with the current linguistic features. In order to achieve better performance, a deeper model or more complex features are needed.
5 Conclusion
This paper proposes an approach to detect emotion cause in texts. Overall, the contributions of this paper are as follows. (1) We propose a hierarchical framework to model emotional texts, and the difference between our model and many hierarchical neural networks is that the emotion is independently represented along with each word and clause in a document. (2) We employ both the result of representation learning and linguistic features for detecting emotion cause. Experimental results show that our model helps to detect emotion cause within emotional texts with complex semantic relations.
Acknowledgements This work is supported by the Foundation of the Guangdong 13th Five-year Plan of Philosophy and Social Sciences (GD20XZY01, GD19CYY05), the General Project of National Scientific and Technical Terms Review Committee (YB2019013), the Special Innovation Project of Guangdong Education Department (2017KTSCX064), the Graduate Research Innovation Project of Guangdong University of Foreign Studies (21GWCXXM-068), and the Bidding Project of GDUFS Laboratory of Language Engineering and Computing (LEC2019ZBKT002).
References
1. Sun Q, Wang Z, Zhu Q, Zhou G (2018) Stance detection with hierarchical attention network. In: Proceedings of the 27th international conference on computational linguistics, pp 2399–2409
2. Mitcheltree C, Wharton S, Saluja A (2018) Using aspect extraction approaches to generate review summaries and user profiles. In: Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, New Orleans, Louisiana, pp 68–75
3. Gui L, Wu D, Xu R, Lu Q, Zhou Y (2016) Event-driven emotion cause extraction with corpus construction. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, Texas, USA, pp 1639–1649
4. Diaz GO, Ng V (2018) Modeling and prediction of online product review helpfulness: a survey. In: Proceedings of the 56th annual meeting of the association for computational linguistics, Melbourne, Australia, pp 698–708
5. Wang J, Li S, Jiang M, Wu H, Zhou G (2018) Cross-media user profiling with joint textual and social user embedding. In: Proceedings of the 27th international conference on computational linguistics, NM, USA, pp 1410–1420
6. Yang Z, Yang D, Dyer C, He X, Hovy E (2016) Hierarchical attention networks for document classification. In: Proceedings of the 15th annual conference of the North American chapter of the association for computational linguistics: human language technologies, CA, USA, pp 1480–1489
7. Abirami AM, Gayathri V (2017) A survey on sentiment analysis methods and approach. In: Proceedings of the eighth international conference on advanced computing
8. Das D, Bandyopadhyay S (2014) Emotion analysis on social media: natural language processing approaches and applications. In: Agarwal et al (eds) Online collective action: dynamics of the crowd in social media. Lecture notes in social networks. Springer, pp 19–37
9. Xu R, Gui L, Xu J, Lu Q, Wong KF (2013) Cross lingual opinion holder extraction based on multiple kernel SVMs and transfer learning. Int J World Wide Web 18(2)
10. Xu R, Wong K-F (2008) Coarse-Fine opinion mining - WIA in NTCIR-7 MOAT task. In: Proceedings of NTCIR-7 workshop meeting, Tokyo, Japan, pp 307–313
11. Agrawal S, Siddiqui TJ (2009) Using syntactic and contextual information for sentiment polarity analysis. In: Proceedings of the 2nd international conference on interaction sciences: information technology, culture and human, pp 620–623
12. Rao KS, Koolagudi SG (2013) Robust emotion recognition using spectral and prosodic features. Springer, New York
13. Lee SYM, Chen Y, Huang C-R (2010) A text-driven rule-based system for emotion cause detection. In: Proceedings of the NAACL HLT 2010 workshop on computational approaches to analysis and generation of emotion in text, Los Angeles, CA, pp 45–53
14. Gao K, Xu H, Wang J (2015) A rule-based approach to emotion cause detection for Chinese micro-blogs. Expert Syst Appl 42(9):4517–4528
15. Li Y, Li S, Huang C-R, Gao W (2013) Detecting emotion cause with sequence labeling model. J Chinese Inf Process (Chinese) 27(5):93–99
16. Chen Y, Lee SYM, Li S, Huang C-R (2010) Emotion cause detection with linguistic constructions. In: Proceedings of the 23rd international conference on computational linguistics, Beijing, China, pp 179–187
17. Chen Y, Hou W, Cheng X (2018) Hierarchical convolution neural network for emotion cause detection on microblogs. In: Proceedings of the 27th international conference on artificial neural networks, Rhodes, Greece, pp 115–122
18. Chen Y, Hou W, Cheng X, Li S (2018) Joint learning for emotion classification and emotion cause detection. In: Proceedings of the 2018 conference on empirical methods in natural language processing, Brussels, Belgium, pp 646–651
19. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv:1409.0473
20. Gao Q, Jiannan H, Ruifeng X, Lin G, He Y, Wong K-F et al (2018) Overview of NTCIR-13 ECA task. In: Proceedings of the 13th NTCIR conference on evaluation of information access technologies, Tokyo, Japan, pp 380–383
Skills and Human Resource Management for Industry 4.0 of Small and Medium Enterprises Sukmongkol Lertpiromsuk, Pittawat Ueasangkomsate, and Yuraporn Sudharatna
Abstract This study aims to identify the skills that small and medium enterprises in the food industry consider important, the skill levels of the employees of these enterprises, the skill levels that students acquire from educational institutions, and any gap between the skills salient to small and medium enterprises and those possessed by employees and students, as well as the role of human resource management in addressing this skills gap where it exists. The results show that social skills and personal skills are deemed very important, while technical skills and methodological skills are considered important for Industry 4.0. There is a gap in all these types of skills between industry requirements and the skills of employees of small and medium enterprises, as well as those of 4th year students who are about to graduate. Moreover, human resource management in the recruitment process influences the reduction of the skills gap (β = −0.390) at a significance level of 0.01, while human resource planning impacts a reduction in the skills gap (β = 0.309) at a significance level of 0.05 for SMEs in the Thai food industry. Keywords Skills 4.0 · SMEs · Small and medium enterprises · Human resource management · Food industry · Thailand
S. Lertpiromsuk
Regular MBA Program, Kasetsart Business School, Kasetsart University, Bangkok, Thailand
P. Ueasangkomsate (corresponding author) · Y. Sudharatna
Department of Management, Kasetsart Business School, Kasetsart University, Bangkok, Thailand
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_54

1 Introduction
Nowadays, small and medium enterprises (SMEs) have come to play a prominent role in the economy in terms of production, investment, employment, and entrepreneurship promotion. In 2017, there were 3,046,793 SMEs in Thailand, which delivered 42.4% of GDP and accounted for 82.22% of national employment [1]. This industry is subject to continuous change. Today, Industry 4.0 applications use the Internet to connect production information effectively [2]. The transition to Industry 4.0 will leave moderately and low-skilled workers in a precarious employment situation, as they will increasingly be replaced by robots or other forms of AI. Some types of employment will disappear and hence new forms will be required to replace them [3]. The rapid growth of modern technology is leading to shortages of personnel with specialized skills [4]; that is, there is a gap in talent and skills in the workforce across many industrial sectors [5]. In Thailand, there are problems with labor shortages, low-skilled labor, and lack of expertise. All of these indicate the country's unreadiness in the face of the relentless drive toward Industry 4.0 [6]. Human resource management has an important role to play for small and medium-sized enterprises [7] in ensuring that their workforce can deliver better performance [8]. At the same time, educational institutions must also be involved in making sure that new graduates are ready to participate in the industry, by providing knowledge, skills, and adaptability [6].
The purposes of this study are to assess the current levels of skills in small and medium enterprises in the Thai food industry in the context of Industry 4.0, thereby identifying any skills gaps among the workforce, and to investigate the role of human resource management in addressing the gaps identified. Moreover, we investigate students' perceptions of the levels of skills they have, as well as which skills they consider most important for pursuing their careers. We make recommendations as to how curricula can be adjusted so that both student and industry skills needs are addressed. In sum, the scope of this study is SMEs in the food industry as well as fourth-year students from one university in Bangkok, Thailand.
The study includes five parts. The literature review is presented in Sect. 2. Section 3 presents the methodology, while the research results are described in Sect. 4. The conclusion is given in the final part.
2 Literature Review
2.1 Skills
Skills mean having knowledge, abilities, or attributes, which may be physical, intellectual, or social, that can be acquired by learning, practicing, and/or working with others, and which are necessary for a job to be completed successfully. Skills can be classified into two categories: hard skills, which refer to those that can be practiced, and soft skills, which are those hidden within the person [9]. Industry 4.0 skills, in turn, have been divided into four categories: technical skills, methodological skills, social skills, and personal skills [5].
2.2 Skills Gap
The skills gap refers to a mismatch between the skills that an individual possesses and those they require for gaining successful employment [10]. From the employer's perspective, the skills gap refers to a lack of appropriate skills for effective engagement in the labor market, which leaves the organization unable to gain a competitive advantage [11]. That is, when the existing workforce has insufficient skills, it is unable to meet the needs of the organization [12]. At present, we are facing a skills gap, especially in the technical domain. Employers are unable to find workers with the right qualifications, and at the global level it has been estimated that this is leading to economic losses of more than 1.6 trillion dollars [13].
2.3 Human Resource Management
Human resource management, previously known as personnel management, refers to the art of selecting new and existing people in such a way that organizational goals will be achieved in terms of both quantity and quality [14]. It pertains to the relationship between managers and workers aimed at meeting the organization's objectives [15]. It is the duty of the organization to make effective use of staff, thereby fulfilling its performance goals [16], which can be achieved through human resource management [17].
2.4 Related Studies
It has been found that changes in technology lead to skills gaps, which affect the work processes of employees [5]. When managing human capital to reduce the skills gaps that occur, it is necessary to consider two groups: those who are already working and new workers who have just completed their studies [11]. This leads to research Hypotheses 1 and 2.
Hypothesis 1 (H1): The average levels of the required skills for Industry 4.0 exhibit a significant difference when compared to those that employees currently have.
Hypothesis 2 (H2): The average levels of the required skills for Industry 4.0 exhibit a significant difference when compared to those that students have before entering the labor market.
In order to be ready to deal with changing work processes, businesses should focus on the largest gap first, because it is the weakest link in their adaptability. They should invest in human resource management through offers of training and development [5, 16], which leads to research Hypothesis 3.
Hypothesis 3 (H3): Human resource management can foster the reduction of skills gaps in small and medium enterprises.
3 Methodology
3.1 Data Collection
For this research, a questionnaire was deployed; it was constructed by drawing on the literature and related studies and was subsequently validated by five experts in the field. The questionnaire data were collected from two cohorts as follows.
• Small and medium enterprises in the food industry: 137 questionnaires were collected through purposive sampling, by email, of Thai companies involved in food production, drawn from a database of 1,332 companies registered with the Department of Business Development.
• Fourth-year students of one university in Bangkok: 400 questionnaires were collected using convenience sampling from 6,106 students.
For SMEs in the food industry, the questionnaire had four parts. General information about the participants (such as gender, age, education, and work experience) and their firms (such as the number of employees and fixed assets) constituted the first and second parts, respectively. The third part focused on investigating the skills required by SMEs, while the final part aimed at ascertaining the current levels of human resource management in five identified domains. For the student respondents, the questionnaire had two sections. The first pertained to general information, while the second aimed at gathering data about the levels of Industry 4.0 skills, namely technical skills, methodological skills, social skills, and personal skills, that the students currently had. The respondents were asked to rate statements on a five-point Likert scale, where 1 indicated the least requirement/importance and 5 the most. The average scores were divided into five equal-sized groups: (1) very low level [1.00–1.80]; (2) low level [1.81–2.60]; (3) moderate level [2.61–3.40]; (4) high level [3.41–4.20]; and (5) very high level [4.21–5.00].
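The mapping from an average Likert score to one of the five levels can be written as a small lookup, sketched below. The function name is an assumption, and scores falling between two printed intervals (for example 1.805) are assigned to the higher level, which the text does not specify.

```python
# Upper bound of each interval, paired with its level label.
LEVELS = [
    (1.80, "very low"),
    (2.60, "low"),
    (3.40, "moderate"),
    (4.20, "high"),
    (5.00, "very high"),
]

def level(mean_score):
    """Return the level label for an average score on the 1-5 scale."""
    if not 1.0 <= mean_score <= 5.0:
        raise ValueError("average score must lie between 1 and 5")
    for upper, label in LEVELS:
        if mean_score <= upper:
            return label

print(level(4.1168))  # average reported for technical skills
print(level(4.3927))  # average reported for social skills
```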
3.2 Data Analysis
The study used t-tests at a 0.05 significance level to identify the skills gaps, and multiple regression analysis, also at a 0.05 significance level, to test the influence of human resource management on the skills gaps.
Skills and Human Resource Management for Industry …
4 Research Results
4.1 Skills
According to Table 1, social skills and personal skills were considered very important, while technical skills and methodological skills were important in dealing with Industry 4.0. Specifically, social skills emerge as the most important desired skill, with an average of 4.3927, followed by personal skills (4.3462), methodological skills (4.1898), and technical skills (4.1168), respectively.

Table 1 Importance of skills

| Skills | Mean | S.D. | Meaning |
|---|---|---|---|
| 1. Technical skills | | | |
| Knowledge of advanced technology | 4.1533 | 0.6630 | Important |
| Process understanding | 4.2628 | 0.7790 | Very Important |
| Information and communication technology | 4.2117 | 0.7711 | Very Important |
| Coding | 3.8394 | 0.9012 | Important |
| Understanding IT security | 4.1168 | 0.8917 | Important |
| Total | 4.1168 | 0.6378 | Important |
| 2. Methodology skills | | | |
| Entrepreneurial thinking | 4.1825 | 0.7786 | Important |
| Problem-solving | 4.3431 | 0.7420 | Very Important |
| Decision-making | 4.3577 | 0.7450 | Very Important |
| Analytical | 4.1825 | 0.7972 | Important |
| Researching skills | 3.8832 | 0.8917 | Important |
| Total | 4.1898 | 0.6925 | Important |
| 3. Social skills | | | |
| Negotiation | 4.2774 | 0.6831 | Very Important |
| Language and communication | 4.3869 | 0.6886 | Very Important |
| Ability to work in a team | 4.5182 | 0.6076 | Very Important |
| Interpersonal relations | 4.4380 | 0.6046 | Very Important |
| Ability to transfer knowledge | 4.3431 | 0.6116 | Very Important |
| Total | 4.3927 | 0.5332 | Very Important |
| 4. Personal skills | | | |
| Ability to work under pressure | 4.4745 | 0.5826 | Very Important |
| Leadership | 4.1825 | 0.6776 | Important |
| Flexibility | 4.1679 | 0.6921 | Important |
| Emotional intelligence | 4.4818 | 0.5829 | Very Important |
| Continuous learning and self-improvement | 4.4745 | 0.5156 | Very Important |
| Total | 4.3462 | 0.4510 | Very Important |
4.2 Skills Gap
Employee skills gap. The results reveal significant differences, at a significance level of 0.01, between the average required level and the level that employees currently have for all skills. Technical skills exhibit the biggest gap, with an average difference of 0.69, followed by methodological skills (0.65), personal skills (0.61), and social skills (0.56), respectively, as shown in Table 2.
Student skills gap. The results show significant differences between the average of the required skills and those that students have. Specifically, all skill sets were significantly different at the 0.01 level, which means there is a substantial gap between the skills students have before they graduate and what the focal industry requires. Technical skills exhibit the biggest gap, with a mean difference of 1.27, followed by methodological skills (0.44), personal skills (0.33), and social skills (0.20), respectively. However, the ability to work in a team, interpersonal relations, and emotional intelligence are skills that students possess at a level higher than employers require, which means there are no skills gaps in regard to these, as can be seen in Table 2.
Human resource management. According to descriptive statistics, for the 76 SMEs that reported having human resource management (55.47%), selection was ranked as the most important function, followed by performance appraisal, training and development, human resource planning, and recruitment, in descending order of importance.
Human resource management influencing the skills gaps. Multiple regression was carried out with the average of the skills gaps as the dependent variable and human resource management in human resource planning, recruitment, selection, training and development, and performance appraisal as the independent variables.
Human resource management was found to influence the skills gap significantly in two respects. Specifically, human resource management in the recruitment process can foster a reduction in the skills gap (β = −0.390) at a 0.01 significance level, while human resource planning leads to an increase in the skills gap (β = 0.309) at a significance level of 0.05, as can be seen in Table 3.
Table 2 Employee skills and student skills gaps

| Skills | Required (1) Mean | Employer (2) Mean | p-value (1)–(2) | Student (3) Mean | p-value (1)–(3) |
|---|---|---|---|---|---|
| 1. Technical skills | | | | | |
| Knowledge of advanced technology | 4.15 | 3.43 | 0.000*** | 3.02 | 0.000*** |
| Process understanding | 4.26 | 3.77 | 0.000*** | 3.65 | 0.000*** |
| Information and communication technology | 4.21 | 3.64 | 0.000*** | 3.37 | 0.000*** |
| Coding | 3.84 | 3.01 | 0.000*** | 1.99 | 0.000*** |
| Understanding IT security | 4.12 | 3.43 | 0.000*** | 2.21 | 0.000*** |
| Total | 4.12 | 3.43 | 0.000*** | 2.85 | 0.000*** |
| 2. Methodology skills | | | | | |
| Entrepreneurial thinking | 4.18 | 3.63 | 0.000*** | 3.63 | 0.000*** |
| Problem-solving | 4.34 | 3.70 | 0.000*** | 4.06 | 0.000*** |
| Decision-making | 4.36 | 3.68 | 0.000*** | 4.05 | 0.000*** |
| Analytical | 4.18 | 3.42 | 0.000*** | 3.88 | 0.000*** |
| Researching skills | 3.88 | 3.26 | 0.000*** | 3.11 | 0.000*** |
| Total | 4.19 | 3.54 | 0.000*** | 3.75 | 0.000*** |
| 3. Social skills | | | | | |
| Negotiation | 4.28 | 3.69 | 0.000*** | 3.45 | 0.000*** |
| Language and communication | 4.39 | 3.75 | 0.000*** | 4.03 | 0.000*** |
| Ability to work in a team | 4.52 | 4.02 | 0.000*** | 4.74 | 0.000*** |
| Interpersonal relations | 4.44 | 3.84 | 0.000*** | 4.64 | 0.000*** |
| Ability to transfer knowledge | 4.34 | 3.87 | 0.000*** | 4.09 | 0.000*** |
| Total | 4.39 | 3.83 | 0.000*** | 4.19 | 0.000*** |
| 4. Personal skills | | | | | |
| Ability to work under pressure | 4.47 | 3.83 | 0.000*** | 4.08 | 0.000*** |
| Leadership | 4.18 | 3.60 | 0.000*** | 3.56 | 0.000*** |
| Flexibility | 4.17 | 3.60 | 0.000*** | 3.60 | 0.000*** |
| Emotional intelligence | 4.48 | 3.85 | 0.000*** | 4.57 | 0.105 |
| Continuous learning and self-improvement | 4.47 | 3.86 | 0.000*** | 4.35 | 0.023** |
| Total | 4.36 | 3.75 | 0.000*** | 4.03 | 0.000*** |

Statistical significance levels: *** 0.01 and ** 0.05
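The gap figures quoted in Sect. 4.2 can be recomputed from the group totals of Table 2 as the required mean minus the observed mean. The Python sketch below is only a check on that arithmetic; the dictionary layout and function names are illustrative choices, and the numbers are the rounded totals printed in the table.

```python
# Group totals copied from Table 2: (required, employee, student) means.
TOTALS = {
    "technical": (4.12, 3.43, 2.85),
    "methodological": (4.19, 3.54, 3.75),
    "social": (4.39, 3.83, 4.19),
    "personal": (4.36, 3.75, 4.03),
}


def gaps(column: int) -> dict[str, float]:
    """Required mean minus observed mean (column 1 = employee, 2 = student)."""
    return {skill: round(row[0] - row[column], 2) for skill, row in TOTALS.items()}


def ranked(gap_by_skill: dict[str, float]) -> list[tuple[str, float]]:
    """Sort skill groups from the largest gap to the smallest."""
    return sorted(gap_by_skill.items(), key=lambda item: item[1], reverse=True)


employee_gaps = gaps(1)
student_gaps = gaps(2)
print(ranked(employee_gaps))
print(ranked(student_gaps))
```

Both rankings place technical skills first and social skills last, in line with the ordering reported in the text.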
Table 3 The role of HRM in skills gaps

| Human resource management | B | Standard error | β | p-value |
|---|---|---|---|---|
| Constant | 0.511 | 0.557 | | 0.362 |
| Human resource planning | 0.196 | 0.078 | 0.309 | 0.014** |
| Recruitment | −0.251 | 0.073 | −0.390 | 0.001*** |
| Selection | 0.131 | 0.086 | 0.180 | 0.132 |
| Training and development | −0.113 | 0.090 | −0.169 | 0.217 |
| Performance appraisal | 0.032 | 0.131 | 0.033 | 0.809 |

Statistical significance level: *** 0.01 and ** 0.05; R² = 0.234; Adjusted R² = 0.179
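Two quantities in Table 3 can be cross-checked with standard formulas: the t ratio of a coefficient, B divided by its standard error, and the adjusted R². The sketch below assumes the regression was fitted on the n = 76 SMEs that reported having human resource management, with k = 5 predictors. The sample size is an inference from the text, not something the paper states for the regression.

```python
def t_ratio(b: float, standard_error: float) -> float:
    """Coefficient divided by its standard error."""
    return b / standard_error


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Values from Table 3; n = 76 is an assumption (SMEs reporting HRM).
t_planning = t_ratio(0.196, 0.078)
t_recruitment = t_ratio(-0.251, 0.073)
adj = adjusted_r2(0.234, n=76, k=5)

print(round(t_planning, 2), round(t_recruitment, 2), round(adj, 3))
```

With these inputs the adjusted value rounds to the 0.179 printed under the table, which is consistent with the assumed sample size.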
5 Conclusion
For Industry 4.0, social and personal skills were found to be very important, while technical and methodological skills were considered important, with social skills viewed as the most important. Regarding employee skills, there is a gap in all skills, and enterprises should upskill their workers across the board. For fourth-year students, by contrast, the ability to work in a team, interpersonal relations, and emotional intelligence are skills that the students reported having at higher levels than employers require. Furthermore, human resource management in recruitment has the most influence in reducing the skills gaps, whereas human resource management in planning appears to increase them. A possible explanation is that the more a firm focuses on human resource planning, the greater its requirements regarding the skills possessed by its employees.
Therefore, the curricula of educational institutions need to be adjusted to ensure that there is instruction in those skills where there is a deficit, such as technical and methodological skills. This is essential if graduates are to acquire the skills that the labor market requires for effectively servicing Industry 4.0. The institutions should have teaching and learning programs through which students obtain all types of skills, because enterprises place importance on the acquisition of all of them. In particular, institutions should focus on courses that enable students to develop their technical skills, such as ICT mastery and coding. The institutions also need to pay more attention to cooperative education, as this will help students to understand the working process better before entering the labor market.
References
1. Office of Small and Medium Enterprise Promotion (2018) Report of the Situation of Small and Medium Enterprises in 2018
2. Sangmahachai K (2016) The era that changed with the arrival of the fourth industry. Technol InnoMag Online 43:38–41
3. Roongsangjun T (2018) Coping with the changes in the 4.0 era of Thai workers. J Soc Work 26(2):172–204
4. Grzelczak A, Kosacka M, Werner-Lewandowska K (2017) Employees competences for industry 4.0 in Poland: preliminary research results. Int Conf Product Res 24:139–144
5. Hecklaua F, Galeitzkea M, Flachsa S, Kohlb H (2016) Holistic approach for human resource management in Industry 4.0. CLF - CIRP Confer Learn Factor 6:1–6
6. Chinachoti P (2018) The readiness of human resource management for industrial business sector towards industrial 4.0 in Thailand. Asian Administr Manag Rev 1(2):123–131
7. Hung YT, Cant MC, Wiid JA (2016) The importance of human resources management for small businesses in South Africa. Problems Perspect Manag 14(3):232–238
8. Sembling R (2016) Impact of human resources' knowledge and skills on SMEs' in Medan City, Indonesia. Int J Manag Econ Soc Sci 5(3):95–104
9. McClelland DC (1973) Testing for competence rather than for "intelligence". Am Psychol 28:1–14
10. CEDEFOP (2010) The skill matching challenge: analyzing skill mismatch and policy implications. Publications Office, Luxembourg
11. Lounkaew K (2018) Skills gap and implications for the development of Thailand. Research Report. Faculty of Economics, Thammasat University
12. McGuinness S, Ortiz L (2016) Skill gaps in the workplace: measurement, determinants and impacts. Industr Relations J 47(3):253–278
13. Daily Infographic (2019) Understanding Tech Skills Gap (Online). www.dailyinfographic.com, August 26, 2019
14. Nigro FA (1959) A public personnel administration. Henry and Co, New York
15. Clark R (1992) Human resources management: framework and practice, 2nd edn. McGraw-Hill, Sydney, p 1992
16. Ivancevich JM (1998) Human resource management, 7th edn. McGraw-Hill, Boston, MA
17. Mondy RW, Noe RM, Premeaux SR (1999) Human resource management, 7th edn. Prentice-Hall International, London, p 1999
Fully Passive Unassisted Localization System Without Time Synchronization
Przemysław Świercz

Abstract Wireless networks have become the primary foundation of modern communication, and the proliferation of mobile wireless devices will continue. The market for mobile applications is growing rapidly, and context-aware services often need to determine the device's location precisely. GPS modules are packaged into the vast majority of modern smartphones and other mobile devices that take advantage of rich services and applications. When it comes to localizing wireless devices by service operators or in closed-door surroundings, more advanced techniques must be applied. Some of them require target devices to support the particular localization method explicitly. Other techniques are easily detectable in the area of deployment or require specific prerequisites to be met. In this paper, a new localization technique is presented. The primary goal of the new method is to allow government organizations and other service providers to localize wireless devices within a given area without any assistance from the target. Moreover, the introduced technique may be deployed in both closed-door and open-space areas using low-cost and easy-to-set-up equipment. Localized targets cannot detect that they are being localized and are not identified, so personal privacy is not violated in any case. Previous studies have shown that it is possible to achieve a low-cost infrastructure-based localization system. However, these methods are easily detectable by targeted devices and thus cannot be used as an effective surveillance technique. The presented method should be considered a reliable and effective surveillance system. It does not require any hardware or software changes in the targeted wireless devices.
Keywords Localization · Synchronization-free · TDoA · ToF · Surveillance

P. Świercz (B) Faculty of Electronics, Department of Computer Engineering, Wrocław University of Science and Technology, Wrocław, Poland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_55
1 Introduction
As of today, GSMA Intelligence [1] reports the number of mobile connections at 10 billion, whilst according to the US Census Bureau [7] the world population is around 7.7 billion. Excluding M2M connections, there are approximately 7 billion mobile phone users worldwide, as reported by Statista [6], and the forecast for 2024 reaches 7.4 billion devices. Wireless networks have become the primary foundation of modern communication. The proliferation of mobile wireless devices has affected society in many areas, especially when it comes to the so-called digital presence. The market for mobile applications is growing rapidly, and context-aware services often need to determine the device's location precisely. This is why GPS modules are packaged into the vast majority of modern smartphones and other mobile devices that take advantage of rich services and applications. On the other hand, when it comes to estimating the location of wireless devices by service operators or in closed-door surroundings, more advanced techniques must be applied. Some of them require target devices to support and cooperate with the infrastructure, and others are applicable only by telecom providers, as they require access to the GSM network at the operator level [10].
In general, localization techniques can be divided into two categories. In the first category, the mobile device itself tries to determine its own location. This is called self-localization. Several methods can be used to achieve it. The most common example is the Global Positioning System (GPS). The device receives a satellite signal and computes its location based on the estimated distance to at least three satellites. This requires the device to synchronize time with the senders (satellites) and to know the current location of each satellite. Time synchronization increases the number of satellites required for computation to four. Each satellite's orbital position information is encoded within the broadcast signal itself. This is known as the Time of Arrival (ToA) technique, which is sometimes also called Time of Flight (ToF) localization. Other self-localization methods share the same principle: there must be a set of transmitters with known positions that transmit signals used by the mobile device to compute its estimated location. This computation can be based on a few types of measurement: ToA, as mentioned earlier, Angle of Arrival (AoA), Received Signal Strength Indicator (RSSI), etc. Apart from radio-based techniques, there are also other self-localization methods. One of them uses visual recognition of the surrounding area. An example of this approach is Google's Project Tango. This technique is based on visual analysis of the area and extraction of its visual features in order to build a 3D map and estimate the location and the orientation of the camera (device) [4]. This project is currently under heavy research and development.
The second category of localization methods can be described as infrastructure-based localization systems. It is characterized by the fact that the device is localized by the infrastructure and does not necessarily know its own location. The localization system, however, can always estimate the device's position [5].
There are two sub-types of infrastructure-based localization systems. The first requires the target device to cooperate with the infrastructure, e.g. by replying to synchronizing signals [2]. The second does not depend on any assistance from the localized device, which is treated simply as a source of some kind of signal. This work focuses on the second type of localization system, often referred to as unassisted.
2 Unassisted Localization Systems
A localization technique in which target devices do not need to support an infrastructure-based localization system can be used as a surveillance method. In most cases, the infrastructure required to support this technique is composed of a set of anchors: listening devices with known positions that are used to estimate the target's location. Such methods often require the anchors to synchronize their time precisely in order to be able to estimate the position of the source of the signal. This is called the Time Difference of Arrival (TDoA) technique, which is a subcategory of the multilateration (MLAT) technique. In this approach, each anchor stores the exact time at which the signal was received. This information is forwarded to a central node, which knows the exact positions of the anchors and solves a system of nonlinear equations in order to compute the estimated location of the target device [8]. The requirement of synchronization, however, makes the infrastructure more complex and therefore more expensive to design, build and deploy. Solving this problem is a topic of current research. There are a few modifications of the TDoA method that aim to remove the time synchronization requirement.
3 Related Work
Research shows that it is possible to estimate the position of the source of the signal using the TDoA method without time synchronization. Doing so does, however, introduce other requirements. A recent study by Kim and Chong [3] presents one such method. In their research, it is assumed that the initial position of the target device is known and that the device transmits a ranging signal periodically and can thus be identified. These assumptions are acceptable for distributed sensor networks. On the other hand, they make the proposed method unsuitable as a reliable surveillance technique.
Another study introduces a method called Whistle [9]. As opposed to the previously described study, the Whistle system makes no assumptions about either the initial position of the target device or the possibility of identifying it. As a modification of the TDoA technique, this method is based on time measurement. It does not, however, require time-synchronized anchors. It is worth noting that one factor disqualifies this method from being used as a reliable surveillance technique: the Whistle system is not fully passive. The localization requires anchors to emit synchronizing signals to make the position estimation possible. Such signals can be easily detected by the target device, making it aware of the presence of the Whistle system.
4 Proposed Method
To create a reliable and effective localization surveillance system, this paper presents a new fully passive TDoA-based localization system without time synchronization, called Passync. The proposed technique is designed to work with any kind of source signal, including acoustic and radio signals. It does not need to identify the target device, so the method is applicable to encrypted wireless communication such as GSM, WCDMA, LTE or WiFi. Similar to previous studies, the Passync method uses anchors without the requirement of time synchronization. The difference, however, is that Passync anchors do not emit any signals, which makes them completely undetectable by target devices. Therefore, this localization system can be safely used as a surveillance technique to precisely estimate the position of any kind of wireless signal source. Moreover, no assumption is made that the monitored devices have any known initial position. The Passync method is applicable in both closed-door and open-space areas.
The general idea can be specified as follows. Each anchor receives a series of signals transmitted by a set of mobile wireless devices within the monitored area and measures how much time elapses between each two subsequent signal receptions. This data is shared with the rest of the anchors, e.g. by forwarding it to the central node. Signals are identified using a digital fingerprint. An example fingerprint may be an SHA digest of the received bitstream of a digital signal, but in fact it can be any kind of function that translates signal data to an identifier (fingerprint) with negligible or no collision margin. It should be noted that fingerprinting can be applied to encrypted communication.
Assume a 2D space. The absolute time of reception of two subsequent signals (here: 1 and 2) at anchor A, as presented in Fig. 1, is defined as follows (written for signal 1):

$$T_{A,1} = T_1 + t_{A,1} \tag{1}$$

$$t_{A,1} = c^{-1}\sqrt{(x_A - x_1)^2 + (y_A - y_1)^2} \tag{2}$$

$$T_{A,1} = T_1 + c^{-1}\sqrt{(x_A - x_1)^2 + (y_A - y_1)^2} \tag{3}$$

where:
$x_n, y_n$: 2D coordinates of the position of the source of signal $n$ (signal ids are 1, 2, 3, …)
$c$: signal propagation speed
$T_{n,m}$: absolute time of reception of signal $m$ at anchor $n$
$T_m$: absolute time of emission of signal $m$
$t_{n,m}$: time of flight of signal $m$ to anchor $n$

Fig. 1 2D space ToF visualization
Fig. 2 Timeline for ToF

If we assume that signals 1 and 2 were transmitted at the same moment, the TDoA at anchor A can be defined as follows:

$$T_{A,1,2} = T_{A,1} - T_{A,2} = t_{A,1} - t_{A,2} \tag{4}$$

where:
$T_{A,1,2}$: time difference of reception of signals 1 and 2 at anchor A

There is no requirement for signals to be transmitted at the same moment. The Passync method works for any random delay between signal transmissions. Moreover, a central node that stores the data from the anchors is not strictly required. This information can be distributed within the network so that each anchor can act as a central node independently, making the system fault-tolerant. The time difference of two signals at a given anchor is the sum of a random transmission time difference and an unknown ToF difference of those signals. The ToF difference is directly proportional to the difference between the distances from the anchor to the two signal sources. Having this information makes it possible to build a system of nonlinear equations for n-dimensional space. In general, if the signals were transmitted at different moments (the timeline for this case is visualized in Fig. 2):

$$T_{A,1,2} = T_{A,1} - T_{A,2} = T_1 - T_2 + t_{A,1} - t_{A,2} \tag{5}$$
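As an illustration of Eqs. (1)–(5), the following Python sketch simulates one anchor that logs two fingerprinted signals and then forms their reception-time difference. It is a minimal sketch under stated assumptions, not the prototype's code: the coordinates, emission times, payloads and the 343 m/s propagation speed are hypothetical, SHA-256 is only one possible fingerprint function, and the `clock_offset` parameter is added here to show that an unsynchronized anchor clock cancels out of the difference.

```python
import hashlib
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value of c for an acoustic signal

# Hypothetical scenario: one anchor, two sources, two emission times (s).
ANCHOR_A = (0.0, 0.0)
SOURCE_1, SOURCE_2 = (3.0, 4.0), (6.0, 8.0)
EMIT_1, EMIT_2 = 0.250, 0.900
PAYLOAD_1, PAYLOAD_2 = b"signal-1", b"signal-2"


def fingerprint(payload: bytes) -> str:
    """Identify a received bitstream by its SHA-256 digest."""
    return hashlib.sha256(payload).hexdigest()


def time_of_flight(anchor, source, c=SPEED_OF_SOUND):
    """Eq. (2): Euclidean distance divided by the propagation speed."""
    return math.dist(anchor, source) / c


def local_arrival_time(emit_time, anchor, source, clock_offset=0.0):
    """Eq. (3), read on an anchor clock shifted by an unknown offset."""
    return emit_time + time_of_flight(anchor, source) + clock_offset


def measured_difference(clock_offset):
    """Reception-time difference of signals 1 and 2 as logged by anchor A."""
    log = {
        fingerprint(PAYLOAD_1): local_arrival_time(EMIT_1, ANCHOR_A, SOURCE_1, clock_offset),
        fingerprint(PAYLOAD_2): local_arrival_time(EMIT_2, ANCHOR_A, SOURCE_2, clock_offset),
    }
    return log[fingerprint(PAYLOAD_1)] - log[fingerprint(PAYLOAD_2)]


# Right-hand side of Eq. (5): emission difference plus ToF difference.
expected = (EMIT_1 - EMIT_2) + (
    time_of_flight(ANCHOR_A, SOURCE_1) - time_of_flight(ANCHOR_A, SOURCE_2)
)

print(measured_difference(0.0), measured_difference(123.456), expected)
```

The difference is the same, up to floating-point rounding, for any clock offset, which is why each anchor only needs a stable local clock rather than a synchronized one.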
Two-dimensional space requires five equations to be solved. There are five unknowns in the system: the two position coordinates of the source of the first signal, the transmission time difference, and the two position coordinates of the source of the second signal. Solving this system yields the positions of both signal sources, which significantly reduces the number of TDoA equations required to localize a large number of mobile devices. Three-dimensional space requires a system of seven equations. In general, n-dimensional space requires 2n + 1 equations to be solved and always allows the positions of two signal sources to be calculated at once. To simplify calculations, define the transmission time difference as

$$R_{1,2} = T_1 - T_2 \tag{6}$$

The final system of nonlinear equations for a 2D space and five anchors (A to E) is presented below:

$$\begin{cases}
T_{A,1,2} = R_{1,2} + \dfrac{\sqrt{(x_A - x_1)^2 + (y_A - y_1)^2} - \sqrt{(x_A - x_2)^2 + (y_A - y_2)^2}}{c} \\[2ex]
T_{B,1,2} = R_{1,2} + \dfrac{\sqrt{(x_B - x_1)^2 + (y_B - y_1)^2} - \sqrt{(x_B - x_2)^2 + (y_B - y_2)^2}}{c} \\[2ex]
T_{C,1,2} = R_{1,2} + \dfrac{\sqrt{(x_C - x_1)^2 + (y_C - y_1)^2} - \sqrt{(x_C - x_2)^2 + (y_C - y_2)^2}}{c} \\[2ex]
T_{D,1,2} = R_{1,2} + \dfrac{\sqrt{(x_D - x_1)^2 + (y_D - y_1)^2} - \sqrt{(x_D - x_2)^2 + (y_D - y_2)^2}}{c} \\[2ex]
T_{E,1,2} = R_{1,2} + \dfrac{\sqrt{(x_E - x_1)^2 + (y_E - y_1)^2} - \sqrt{(x_E - x_2)^2 + (y_E - y_2)^2}}{c}
\end{cases} \tag{7}$$

The unknown values here are $x_1, y_1, x_2, y_2$ and $R_{1,2}$.
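System (7) can be written as a residual function that a numerical solver drives towards zero. The Python sketch below shows only this forward model and its residuals, with one equation per anchor. The anchor layout, the "true" source positions and the 343 m/s speed are hypothetical, and the paper does not say which solver the prototype used, so none is implemented here.

```python
import math

C = 343.0  # m/s; assumed propagation speed (acoustic case)


def equations_needed(n_dims: int) -> int:
    """Number of equations (anchors) for two sources in n dimensions."""
    return 2 * n_dims + 1


def predicted_difference(unknowns, anchor, c=C):
    """Right-hand side of one row of system (7)."""
    x1, y1, x2, y2, r12 = unknowns
    xa, ya = anchor
    d1 = math.hypot(xa - x1, ya - y1)
    d2 = math.hypot(xa - x2, ya - y2)
    return r12 + (d1 - d2) / c


def residuals(unknowns, anchors, measured, c=C):
    """Predicted minus measured time difference, one value per anchor."""
    return [predicted_difference(unknowns, a, c) - t for a, t in zip(anchors, measured)]


# Hypothetical 10 m x 10 m room with five anchors (A to E).
anchors = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 2.0)]
truth = (2.0, 3.0, 7.0, 6.0, -0.35)  # x1, y1, x2, y2, R12 (illustrative)

# Noise-free measurements synthesized from the forward model.
measured = [predicted_difference(truth, a) for a in anchors]

at_truth = residuals(truth, anchors, measured)
perturbed = residuals((2.5, 3.0, 7.0, 6.0, -0.35), anchors, measured)
print(sum(r * r for r in at_truth), sum(r * r for r in perturbed))
```

A residual vector of this form could be passed to a nonlinear least-squares routine such as Gauss-Newton or Levenberg-Marquardt; that choice is a suggestion of this sketch rather than something the source specifies.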
5 System Prototype Implementation
In the system prototype, acoustic signals were used. This approach made it possible to focus on the validation of the Passync method itself and to use simpler hardware for the anchor receivers and target device transmitters. Target devices (signal sources) transmit acoustic signals lasting 50 ms with a random delay in a range from 100 ms to 1100 ms with a 10 ms step, which gives 100 different delays within the specified bounds. The monitored area used in the experiments is a closed-door room 10 m wide and 10 m long. The size of the monitored area was limited by the power of the acoustic transmitters used and the sensitivity of the receivers. Transmitters were built using Atmel ATMega 32 microprocessors with a 16-MHz clock and multi-tone PWM buzzers. The anchors, however, used a Xilinx Spartan 3 FPGA with Ethernet PHY layer and TCP/IP stack modules. All anchors were connected to the same network switch and to the central unit (laptop). The time measurement on the anchors had an accuracy of 10 ms.

Fig. 3 Mean error per number of transmitters (x-axis: number of transmitters; y-axes: mean positioning error (cm) and stdv. positioning error (cm))
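The transmitter timing described for the prototype can be sketched as follows. This is a hypothetical reconstruction in Python, not the ATMega firmware: it assumes the upper bound of the delay range is exclusive, which is what makes the count come to exactly 100 values, and that the delay is measured from the end of one signal to the start of the next.

```python
import random

SIGNAL_MS = 50                    # duration of one acoustic signal
DELAYS_MS = range(100, 1100, 10)  # 100, 110, ..., 1090 (upper bound assumed exclusive)


def transmission_schedule(n_signals, seed=None):
    """Start times (ms) of successive signals separated by random delays."""
    rng = random.Random(seed)
    now = 0
    starts = []
    for _ in range(n_signals):
        now += rng.choice(DELAYS_MS)  # wait a random delay
        starts.append(now)            # signal starts here
        now += SIGNAL_MS              # signal lasts 50 ms
    return starts


print(len(DELAYS_MS))
print(transmission_schedule(5, seed=1))
```

Because the delays are random and unknown to the anchors, they play the role of the unknown transmission time difference in the localization equations.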
6 Experiment Results
Five anchors were used in each experiment, as this is the minimal sufficient number required by the Passync method. The number of transmitters ranged from 2 to 5. As shown in Fig. 3, the accuracy of position estimation does not depend on the number of target devices and is similar for 2, 3, 4 and 5 transmitters present in the monitored area. Each transmitter was localized with a comparable margin of error no matter how many additional target devices were added to the experiment. This shows that the proposed method is stable in terms of the number of signal sources within the monitored area. The measured mean error of position estimation was around 12 cm (Fig. 4).

Fig. 4 Mean error for static configuration (x-axis: sequence of experiment; y-axes: mean positioning error (cm) and stdv. positioning error (cm))

The placement of anchors had a significant impact on the results. If any two receivers were placed closer than 40 cm to each other, the mean error grew. Figure 5 shows the relation between the mean error and the smallest distance between two anchors in the system. This problem affects all anchor-based localization systems, however, and is not specific to the Passync method.

Fig. 5 Mean error per distance between anchors (x-axis: smallest distance between anchors (cm); y-axes: mean positioning error (cm) and stdv. positioning error (cm))

The experiments showed that there are three main error sources that affect the position estimation. The first is the accuracy of time measurement on the anchors. For acoustic signals, a 10-µs resolution is enough for an acceptable error margin, but for radio signals more sophisticated time measurement protocols must be applied. The second factor that affects the accuracy is the multipath effect. The closer to the walls the anchors or transmitters were placed, the stronger the impact of signal reflections. The third source of error is the Signal-to-Noise Ratio (SNR). Environmental noise affects the quality of the received signal and, in some cases, may jam it. When the energy of the noise is close to the signal's energy, not every anchor receives the signal, and the overall localization error grows.
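The effect of timing resolution on ranging error follows from distance = speed × time. The short Python sketch below works through the numbers for the acoustic and radio cases. The 343 m/s figure for sound in air at roughly 20 °C is an assumed value, not one given in the paper, and the function names are illustrative.

```python
SPEED_OF_SOUND = 343.0          # m/s, air at roughly 20 degrees C (assumed)
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def range_error(timing_error_s: float, speed: float) -> float:
    """Distance error (m) caused by a given timing error."""
    return speed * timing_error_s


def timing_needed(range_error_m: float, speed: float) -> float:
    """Timing resolution (s) needed to keep the distance error below a bound."""
    return range_error_m / speed


acoustic_mm = range_error(10e-6, SPEED_OF_SOUND) * 1000.0
radio_km = range_error(10e-6, SPEED_OF_LIGHT) / 1000.0
radio_timing_ns = timing_needed(0.12, SPEED_OF_LIGHT) * 1e9

print(f"10 us, sound: {acoustic_mm:.2f} mm")
print(f"10 us, radio: {radio_km:.2f} km")
print(f"12 cm, radio: {radio_timing_ns:.2f} ns")
```

A 10-µs resolution therefore corresponds to a few millimetres for sound but to kilometres for radio, where sub-nanosecond timing would be needed to reach a comparable centimetre-level error.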
7 Conclusions
In this paper, a new fully passive unassisted localization method without time synchronization is presented. Unlike other similar TDoA-based methods, the Passync technique cannot be detected by the target devices within the monitored area, and no assumption is made about the initial position of mobile devices. To validate this method, a series of experiments was run. The experiments were conducted using acoustic signals to simplify the hardware required for the tests. The proposed method, like other anchor-based techniques, is prone to three main error sources: noise, time measurement accuracy and the multipath effect. The results show that, even with very simple hardware, it is possible to achieve good positioning accuracy in a closed-door environment. There is, however, room for future research and experimentation on this technique. With more sophisticated hardware, the Passync method is applicable to wireless radio device localization. This requires more effort to be put into the time measurement protocol, as the speed of radio wave propagation is significantly higher than the speed of sound. The Passync method can be reliably used as an effective surveillance technique, since it is undetectable and works with any signal source, including acoustic and radio. The computational complexity grows linearly with the number of signals to position; however, distributed computing techniques can be applied to achieve near-real-time position calculations.
References
1. GSMA Intelligence. https://www.gsmaintelligence.com
2. Jankowski T, Nikodem M (2017) Synchronization-free TDoA localization method for large scale wireless networks. In: 2017 international conference on indoor positioning and indoor navigation (IPIN). IEEE
3. Kim S, Chong J-W (2015) An efficient TDOA-based localization algorithm without synchronization between base stations. Int J Distrib Sensor Netw 11(9):832351
4. Lynen S, Sattler T, Bosse M, Hesch J, Pollefeys M, Siegwart R (2015) Get out of my lab: Large-scale, real-time visual-inertial localization. In: Robotics: science and systems XI. Robotics: Science and Systems Foundation
5. Mao G, Fidan B (eds) Localization algorithms and strategies for wireless sensor networks. IGI Global
6. Statista, The Statistics Portal. https://www.statista.com
7. United States Census Bureau. http://www.census.gov
8. Xiong H, Chen Z, Yang B, Ni R (2015) TDOA localization algorithm with compensation of clock offset for wireless sensor networks. China Commun 12(10):193–201
9. Xu B, Sun G, Yu R, Yang Z (2013) High-accuracy TDOA-based localization without time synchronization. IEEE Trans Parallel Distrib Syst 24(8):1567–1576
10. Zhang Y, Liu H, Fu W, Zhou A, Mi L (2014) Localization algorithm for GSM mobiles based on RSSI and Pearson's correlation coefficient. In: 2014 IEEE international conference on consumer electronics (ICCE). IEEE
Appropriation Intention of a Farm Management Information System Through Usability Evaluation with PLS-SEM Analysis
Helga Bermeo-Andrade and Dora González-Bañales

Abstract The aim of this article is to present the results of the evaluation of usability for a web-based Farm Management Information System (FMIS) in terms of the following parameters: ease of use, usefulness, ease of learning, and their relation to the appropriation intention. Using Partial Least Squares Structural Equation Modeling (PLS-SEM) analysis, an FMIS called itagüe® was evaluated with a group of 64 fruit producers from the Tolima region of Colombia. The obtained results show that only the perceived usefulness of the FMIS is related to the appropriation intention, whereas ease of use is related to ease of learning and usefulness. Therefore, an important factor in successfully appropriating an FMIS is its perceived usefulness, which is related to its ease of use. Adopting this approach informs FMIS developers of the importance of designing systems that end-users in an agricultural context perceive as both useful and easy to use.
Keywords Farm management information systems · Usability evaluation · PLS-SEM · Agriculture
H. Bermeo-Andrade (B) Universidad de Ibagué, Ibagué, Colombia e-mail: [email protected] D. González-Bañales Instituto Tecnológico de Durango/Tecnológico Nacional de México, Durango, Mexico e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_56
1 Introduction In our current society, which is dominated by information and knowledge, integrating Information and Communication Technologies (ICT) supports the efficient and modern management of agricultural production units and the formation of sustainable food systems [1]. Such integration involves solutions such as Farm Management Information Systems (FMIS), which were created to support the crop, administrative, and financial management of farms [2]. Currently, an agricultural administrator who uses information technology and is an expert in his/her field
of agricultural production finds his/her farming experience insufficient to achieve both sustainability and commercial success [3]. With the support of ICT, agricultural administrators must explore the latest advances in agricultural research and technology while incorporating decision-making skills in their work [4]. The incorporation of ICT in farm management is a particularly relevant and necessary issue for producers in developing countries with deep-rooted agricultural traditions, as these technologies can become a means of achieving productivity and competitiveness [5]. Colombia is a country with significant agricultural traditions. Historically, Colombia has stood out in international markets for its rich supply of agricultural products such as coffee, bananas, flowers, and fruits. Nevertheless, the agricultural sector still faces great challenges given the deficient technological infrastructure of productive units and rural areas, combined with the deficient and ineffective logistic integration of the various components of these supply chains. Acting together, these two factors weaken the position of producers and undermine the productivity of the agricultural sector [6]. To address part of this problem in Colombia’s rural regions, an FMIS called itagüe® (www.itague.co) was developed. It was created to facilitate planning, recording, controlling, and tracing the operational tasks linked to the output of agricultural production units. Following the design and testing of itagüe® with a group of fruit producers, the next step was to investigate how the ease of use, perceived usefulness, and ease of learning of an FMIS relate to appropriation intention. Therefore, this article presents the results of the usability evaluation of the FMIS itagüe®, compiled from a group of 64 avocado producers from the central region of Tolima, Colombia.
The analysis considers the following parameters: ease of use, perceived usefulness, ease of learning, and appropriation intention.
2 Reference Framework 2.1 Farm Management Information Systems (FMIS) In recent years, sophisticated farm management systems have emerged to replace complex monolithic agricultural systems and obsolete software tools. The latest trend is to allow these management systems to operate over the Internet and act as a kind of “system of systems” [1]. Notably, neither the development nor the use of FMIS is particularly recent. In the 1990s, Lewis [7] stated that if farmers wish to thrive in a turbulent economic environment, they must manage their productive resources and commercial services more efficiently. These statements remain valid today: although the names and capacities of the technology may have changed, it continues to serve the same purpose of increasing efficiency, quality, innovation, and productivity in the agricultural sector [1].
Appropriation Intention of a Farm Management Information System …
2.2 FMIS Appropriation Intention Factors The identification of factors associated with appropriation intention has been a topic of scientific interest regarding the incorporation of information technology into agricultural production processes. The study by Morris et al. [8] suggests that the appropriation/adoption intention process is strongly associated with the degree to which producers perceive ICT as useful for supporting their business model while providing benefits such as increasing the production scale, improving cost management, supporting business diversification, and increasing the overall efficiency of farm production processes. The use and final adoption of an FMIS are related to several aspects associated with both producer and farm profiles. Here are some of them:
• The innovation capacity of the main decision-makers is related to more sophisticated FMIS [9].
• The use of technical assistance can positively affect the adoption and intensity of FMIS usage [3].
• Tummers et al. [2] showed that recurring deficiencies in the design of FMIS include being difficult to understand, having a poor user interface, having different interfaces, presenting too much information, and being excessively specialized.
2.3 Usability Evaluation in Information Systems In general, one critical aspect in information system development is the way the system allows easy interaction with users [10], as well as easy usage and learning, which is related to usability and user experience (UX) [11, 12]. In that context, usability and user experience become an important parameter to render informatics solutions usable for an FMIS [13]. In order to measure usability aspects, usability questionnaires are employed to collect self-reported data from users regarding their experience with a specific product or system. These questionnaires help researchers understand the usability of a product by revealing users’ perceptions of outcomes and interactions [14]. One of the several usability questionnaires is the usefulness, satisfaction, and ease of use questionnaire (USE test) created by Lund [15], which measures the subjective usability of a product or service. It is a survey that examines four criteria: usefulness, ease of use, ease of learning, and satisfaction. The original USE test can be applied in various scenarios for usability assessment, modifying the questions and using different scale values, without much risk of damaging validity [16]. Although there is little published research on the reliability or validity of the USE test, Gao et al. [17] addressed this issue and found that for the four criteria in the USE Questionnaire, Cronbach’s alpha was 0.98, indicating the high reliability of the overall USE score.
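The reliability statistic mentioned above can be illustrated with a minimal sketch of Cronbach's alpha, using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The function name and the ratings below are hypothetical and are not data from the study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items matrix of ratings."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                                # number of items
    item_var_sum = scores.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)         # variance of summed score
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

# Hypothetical 5-point Likert ratings: 5 respondents, 3 items (illustration only)
ratings = np.array([[4, 5, 4],
                    [3, 3, 4],
                    [5, 5, 5],
                    [2, 3, 2],
                    [4, 4, 5]])
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

Values close to 1 indicate that the items of a dimension vary together, which is the sense in which a high alpha is read as high internal consistency.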
3 Methodology 3.1 itagüe® FMIS Fountas et al. [4] classified commercial FMIS into four clusters (groups) according to two parameters: support for inventory management and specific support. The first cluster is categorized as “basic service systems,” the second as “sales-oriented systems,” the third as “specific systems,” and the fourth as “complete systems.” In addition to having functions based on web and mobile devices, the systems in the last category allow functions such as generating field reports, managing field operations, and administering inventories, so the itagüe® FMIS is considered in the category of “complete system.”
3.2 User Profile A total of 64 users linked to the avocado (Persea americana) production sector of the central zone of Tolima, Colombia, participated in the evaluation of itagüe® and the implementation of an online questionnaire to evaluate the parameters of ease of use, usefulness, and ease of learning together with their relation to the appropriation intention. Of the total of 64 participants, 60% (38) were both owners and managers of small farms, mainly with no more than five hectares; 9.5% (6) of them were under 30 years old, 76.2% were between 30 and 60 years old, and the rest were 60 or older (14.3%); and 67% (42) were men. In general, their self-assessment of their level of use of technology, which included their experience in the use of a desktop or laptop, tablet, smartphone, email, social networks, and the Internet as tools for making decisions on the farm, was from basic to intermediate.
3.3 Usability Evaluation For the usability evaluation of itagüe®, an adaptation of the USE questionnaire developed by Lund [15] was used. USE stands for usefulness, satisfaction, and ease of use; the instrument also covers ease of learning. The applied questionnaire was structured according to five dimensions: (1) General information, (2) Ease of use, (3) Ease of learning, (4) Usefulness, and (5) Appropriation intention. Dimensions 2, 3, and 4 were measured on a Likert scale (where 1 was the lowest value and 5 was the highest value), and dimension 5 on a scale from 1 to 10. The questionnaire was built in Google Forms® and sent online to each farm producer who had been trained in the itagüe® system.
3.4 PLS Data Analysis The recommendations of Hair et al. [18] were used as the basis for the PLS-SEM analysis. Notably, the appropriate number of users in usability studies is constantly under discussion [19, 20]; however, the studies of Nielsen et al. [21] have shown that even with small samples of 8–12 participants, the results are significant. As this study was analyzed with the PLS approach, to achieve a statistical power of 80% at a significance level of 5%, with a minimum R² = 0.25 and three independent variables, the recommended sample size was at least 37. The sample analyzed in this research comprised 64 farmers. From a practical perspective, PLS-SEM exhibits certain advantages over its counterparts in the field of multivariate analysis because it makes few demands on sample size, makes no assumptions regarding data distribution, and achieves high levels of statistical power in complex models, even when working with small samples [22]. The research model was formulated with the following hypotheses:
• H1: “There is a relation between ease of use and ease of learning.”
• H2: “There is a relation between ease of use and usefulness.”
• H3: “There is a relation between perceived ease of learning and the appropriation intention.”
• H4: “There is a relation between perceived usefulness and appropriation intention.”
The research model’s goodness of fit was assessed in two phases. First, a reflective model assessment verified internal consistency (Cronbach’s alpha and composite reliability), convergent validity (indicator reliability and average variance extracted), and discriminant validity (HTMT; Heterotrait-Monotrait Ratio of Correlations). Second, a structural model assessment considered coefficients of determination (R²), predictive relevance (Q²), the magnitude and significance of path coefficients, and f² effect sizes. Data analysis was performed using the SmartPLS v3.3.2 software [23] at a significance level of 0.05.
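Two of the reflective-model criteria named above, composite reliability and average variance extracted (AVE), can be computed directly from standardized indicator loadings. The sketch below uses the usual textbook formulas; the function names and the loadings are hypothetical, and the study itself obtained these statistics from SmartPLS.

```python
import numpy as np

def composite_reliability(loadings):
    """Composite reliability from the standardized loadings of one construct."""
    lam = np.asarray(loadings, dtype=float)
    squared_sum = lam.sum() ** 2
    error_var = (1.0 - lam ** 2).sum()   # indicator error variances
    return squared_sum / (squared_sum + error_var)

def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))

# Hypothetical loadings for a three-indicator construct (illustration only)
loadings = [0.8, 0.8, 0.8]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(round(cr, 3), round(ave, 3))
```

Commonly cited rules of thumb treat composite reliability above 0.70 and AVE above 0.50 as acceptable, so a construct with these illustrative loadings would pass both checks.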
4 Results and Discussion This section presents the results obtained using the PLS-SEM technique and addresses the theoretical foundations of Hair et al. [18].
4.1 Assessment of the Structural Model The values from the assessment of the resulting structural model are detailed below (Fig. 1). Based on these results, we must assess the capacity of the model to predict the variance of the dependent variables. For this purpose, we reviewed the coefficients of determination (R²) along with the magnitude and significance of the path coefficients, effect sizes (f²), and predictive relevance (Q²), which provide evidence of quality in measurement model estimates [18]. The R² value represents the extent to which the variance of an endogenous construct, in this case the appropriation intention, is explained by its predictor constructs. Because the adjusted R² accounts for the number of predictor constructs, it is useful for comparing models with different numbers of predictor constructs and different sample sizes. For the assessed model, R² = 0.366 and adjusted R² = 0.346. Those values are considered weak [18]. Here, the contribution from exogenous constructs to the R² value is mainly due to the effect size of usefulness (f² = 0.312, a medium-to-large effect on Cohen’s scale, where 0.15 and 0.35 mark medium and large effects). Table 1 displays the results from the significance test for the path coefficients of the structural model, including load values for both the refined and raw models. Based on the above results, the data suggest the following hypotheses are supported:
• “There is a relation between perceived usefulness and appropriation intention.”
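The relation between R², adjusted R², and f² can be checked with a short sketch. It assumes n = 64 respondents and k = 2 predictors of appropriation intention (usefulness and ease of learning, following H3 and H4); the R² of the model without usefulness is a hypothetical value chosen only to reproduce the reported f², not a figure from the study.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictor constructs."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def f_squared(r2_included, r2_excluded):
    """Cohen's f^2 effect size of one predictor on an endogenous construct."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)

# Reported R^2 with the assumed n and k
adj = adjusted_r2(0.366, n=64, k=2)

# Hypothetical R^2 of the model re-estimated without usefulness
r2_without_usefulness = 0.168
f2 = f_squared(0.366, r2_without_usefulness)
print(round(adj, 3), round(f2, 3))
```

Under these assumptions the adjusted value comes out at about 0.345, which agrees with the reported 0.346 up to rounding of R².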
Fig. 1 Measurement model with exogenous constructs and an adjusted R2
Table 1 Significance of test results for the path coefficients provided by the structural model
Paths tested (t values not recoverable): Usefulness → appropriation intention; Ease of use → usefulness; Ease of use → ease of learning; Ease of learning → appropriation intention
Q² values greater than 0 for a specific endogenous latent variable denote the predictive relevance of the structural model for a dependent construct [18]. Herein, values of 0, 0.25, and 0.50 suggest small, medium, and large predictive relevance, respectively [18]. In this specific study, the Q² value for all constructs was 0.290, which means that the model’s predictive relevance can be classified as medium; considering only the usefulness construct, Q² = 0.286.
4.3 Discussion The obtained results, some of which are consistent with Lund’s [15] findings, suggest that: (1) ease of use and usefulness influence one another, such that improvements in ease of use improve the ratings of usefulness and vice versa; (2) while both contribute to satisfaction, usefulness is relatively less important when the systems are internal systems that users are required to use; (3) users are more variable in their usefulness ratings when they have had only limited exposure to a product; (4) satisfaction is strongly related to usage (actual or predicted); (5) for internal systems, the items contributing to ease of use for other products could be separated into two factors: ease of learning and ease of use (which were highly correlated). Furthermore, it was found that a critical aspect of the development of an FMIS is the way the system provides easy interaction with end-users [10], as well as the manner in which the system enables ease of use and learning. More recently, studies such as Morris et al. [8], Carrer et al. [3], and Lin [24] suggested that the appropriation/adoption intention process is strongly associated with the degree of utility that producers perceive in ICT to support their business model while allowing them to, for example, increase their production scale, improve cost management, support business diversification, and make their farm production processes more efficient overall, which was reflected in the itagüe® PLS research model.
5 Conclusions This study concludes that an important factor in the successful transfer and appropriation of an FMIS is its perceived usefulness, which in turn is related to ease of use. The results from the Colombian case provide useful and current evidence for the IT industry, primarily for FMIS developers, of the importance of designing systems that end-users in agricultural contexts perceive as both useful and easy to use. Funding: SGR resources to the Agreement No. 046-2019, executed as part of framework Agreement No. 2077–2018 with the research project “Development of competitive advantages through I + D + i activities in eight agricultural sector chains,” executed by Universidad del Tolima-Gobernación del Tolima (Colombia).
References
1. Munz J, Gindele N, Doluschitz R (2020) Exploring the characteristics and utilisation of farm management information systems (FMIS) in Germany. Comput Electron Agric 170. https://doi.org/10.1016/j.compag.2020.105246
2. Tummers J, Kassahun A, Tekinerdogan B (2019) Obstacles and features of farm management information systems: a systematic literature review. Comput Electron Agric 157:189–204. https://doi.org/10.1016/j.compag.2018.12.044
3. Carrer MJ, de Souza Filho HM, Batalha MO (2017) Factors influencing the adoption of Farm Management Information Systems (FMIS) by Brazilian citrus farmers. Comput Electron Agric 138:11–19. https://doi.org/10.1016/j.compag.2017.04.004
4. Fountas S, Carli G, Sørensen C, Tsiropoulos Z, Cavalaris C, Vatsanidou A, Liakos B, Canavari M, Wiebensohn J, Tisserye B (2015) Farm management information systems: current situation and future perspectives. Comput Electron Agric 115:40–50. https://doi.org/10.1016/j.compag.2015.05.011
5. Pradhan RP, Mallik G, Bagchi TP (2018) Information communication technology (ICT) infrastructure and economic growth: a causality evinced by cross-country panel data. IIMB Manag Rev 30:91–103. https://doi.org/10.1016/j.iimb.2018.01.001
6. OECD (2015) OECD Review of Agricultural Policies: Colombia 2015. https://www.minagricultura.gov.co
7. Lewis T (1998) Evolution of farm management information systems. Comput Electron Agric 19:233–248. https://doi.org/10.1016/S0168-1699(97)00040-9
8. Morris W, Henley A, Dowell D (2017) Farm diversification, entrepreneurship and technology adoption: analysis of upland farmers in Wales. J Rural Stud 53:132–143. https://doi.org/10.1016/j.jrurstud.2017.05.014
9. Sonderegger A, Schmutz S, Sauer J (2016) The influence of age in usability testing. Appl Ergon 52:291–300. https://doi.org/10.1016/j.apergo.2015.06.012
10. Elberkawi E, El-firjani N, Maatuk A, Aljawarneh S (2016) Usability evaluation of web-based systems: a new method and results. Int Confer Eng MIS (ICEMIS) 2016:1–5
11. Norman D (2004) Emotional design: why we love (or hate) everyday things. Basic Civitas Books
12. Lazar J, Feng JH, Hochheiser H (2017) Research methods in human-computer interaction. Morgan Kaufmann
13. Aziz N, Kamaludin A, Sulaiman N (2013) Assessing web site usability measurement. Int J Res Eng Technol 2:386–392
14. Hornbæk K (2006) Current practice in measuring usability: challenges to usability studies and research. Int J Hum Comput Stud 64:79–102
15. Lund A (2001) Measuring usability with the USE questionnaire. Usability User Exp Spec Interes Gr 8
16. Hartson R, Pyla P (2012) The UX Book: process and guidelines for ensuring a quality user experience. Morgan Kaufmann
17. Gao M, Kortum P, Oswald F (2018) Psychometric evaluation of the USE (usefulness, satisfaction, and ease of use) questionnaire for reliability and validity. Proc Hum Factors Ergon Soc 3:1414–1418. https://doi.org/10.1177/1541931218621322
18. Hair JFJ, Hult GTM, Sarstedt M, Castillo Apraiz J, Cepeda-Carrion G, Roldán JL (2019) Manual de partial least squares structural equation modeling (PLS-SEM). SAGE-Omnia Science
19. Tullis T, Albert B (2013) Measuring user experience. Collecting, analyzing, and presenting usability metrics. Morgan Kaufmann Publishers
20. Rojas-Pino LA, Macías-Iglesias JA (2012) Sistema automatizado de integración de arquitectura de la información en el desarrollo de aplicaciones web interactivas. El Prof la Inf 21:160–166. https://doi.org/10.3145/epi.2012.mar.06
21. Nielsen J, Lewis J, Turner C (2006) Determining usability test sample size. Int Encycl Ergon Hum Factors, Second Ed 3 Vol. Set. 3, 3084–3088. https://doi.org/10.1201/9780849375477.ch597
22. Hair JFJ, Risher JJ, Sarstedt M, Ringle CM (2019) When to use and how to report the results of PLS-SEM. https://www.emeraldinsight.com/doi/10.1108/EBR-11-2018-0203
23. Ringle CM, Wende S, Becker J-M (2015) Smart PLS. www.smartpls.com
24. Lin CC (2013) Exploring the relationship between technology acceptance model and usability test. Inf Technol Manag 14:243–255. https://doi.org/10.1007/s10799-013-0162-0
Collaborative Control of Mobile Manipulator Robots Through the Hardware-in-the-Loop Technique Luis F. Santo, Richard M. Tandalla, and H. Andaluz
Abstract This article presents the design and implementation of the “Hardware-in-the-Loop” (HIL) technique to evaluate a collaborative control algorithm for two mobile manipulator robots that move and manipulate objects in an industrial environment. The developed control structure consists of a centralized control algorithm, developed in MATLAB, which is linked to the HIL system; the HIL system contains both the kinematic model and the dynamic model of each robotic system, programmed on a Raspberry Pi. To analyze the performance of the proposed control algorithm, an immersive virtual reality scenario is designed and implemented using the Unity3D graphics engine, which facilitates interaction with the user. Keywords Collaborative control · Hardware in the loop · Mobile manipulator · Virtual environment
L. F. Santo (B) · R. M. Tandalla · H. Andaluz Universidad de Las Fuerzas Armadas ESPE, Sangolquí, Ecuador e-mail: [email protected] R. M. Tandalla e-mail: [email protected] H. Andaluz e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_57
1 Introduction Industry 4.0 has opened up several forms of automation that share a common objective: improving productivity and optimizing work processes. For this reason, many companies already have collaborative robots that are designed to share a work environment with people with a high degree of safety [1, 2]. Currently, the term mobile manipulator robot refers to a robot that consists of a manipulator arm on a moving platform [3, 4]. The combination of these robotic systems has several advantages because it complements the capacity of a fixed-base manipulator arm with
the freedom of movement offered by the wheeled mobile platform [5]. Mobile manipulators make possible complex tasks that require both locomotion and handling capabilities [6, 7]. Mobile manipulator robots have multiple applications in different areas of industry, for example, in construction, in mining, and in assistance to people, among others [8]. In addition, to adapt to an industrial environment, a mobile manipulator robot must meet several requirements, such as being able to work with people without any risk, being autonomous, being easy to configure and install, and complying with the requirements of the industry [9]. Systems made up of two or more mobile manipulator robots that fulfill a common objective are called collaborative robots [10]. They allow multitasking operations [11] and let standard controllers cooperate with each other to perform complex tasks that cannot be performed by a single robot [12]. The control schemes of collaborative robots are mainly based on: (i) a centralized architecture, in which a central computer generates the control actions to achieve the secondary projections [13]; (ii) a decentralized architecture, in which each component of the robotic system has its own processing unit, which develops both kinematic and dynamic control [13]. Implementing control algorithms for mobile manipulator robots in collaborative tasks is highly complex because the robots are not always physically available, owing to the high cost of acquiring each one [14]. Therefore, we suggest implementing didactic modules that use the HIL technique [15] as a low-cost alternative for implementing the proposed algorithms. With these modules it is possible to simulate a control environment for the robot and to analyze the stability of the implemented control algorithm [16], validating trends in control errors while avoiding the cost, risk, and time associated with physical testing [17]. This article aims to develop a system that uses the Hardware-in-the-Loop technique for collaborative control between two mobile manipulator robots, for which a 3D virtual environment is considered where the displacement of the robots is simulated. To do this, control actions are sent from MATLAB through a wireless channel to Raspberry Pi modules, on which the mathematical models of each robot are programmed, in order to evaluate the control algorithms and present them in the 3D virtual interface. This document is organized in seven sections. Section 2 presents the structure of the implemented HIL system. Section 3 shows the development of the virtual environment. Section 4 presents the kinematic and dynamic modeling of the mobile manipulator robot. Section 5 describes the collaborative control algorithm. Section 6 includes the results obtained. Finally, Sect. 7 contains the conclusions of the implemented system.
2 System Structure The HIL technique is a real-time simulation of the required system or process in which real signals from the controller are connected to a test system that uses a computer as a virtual representation of the plant model, with the objective of validating the controller without the cost and time associated with physical testing. The proposed system implements the “Hardware-in-the-Loop” (HIL) technique for the collaborative control of two mobile manipulator robots that move and manipulate a common object. Figure 1 describes the structure of the HIL-based system, which consists of three main stages. In the first stage, the collaborative control algorithm consists of three layers:
(a) The offline planning layer determines the initial conditions of the system and generates the required path of the common object to be manipulated by the robots.
(b) The online planning layer resets the references at any time so that the formation layer reacts appropriately to the environment.
(c) The formation control layer generates the control signals sent to each mobile manipulator robot so that the robots work as a team to follow the trajectory established by the planning layers.
The signals of the collaborative control algorithm are calculated in MATLAB and transmitted via a wireless communication channel to each mobile manipulator robot of the HIL system. The second stage is the robot control, which comprises two mobile manipulator robots, each consisting of a 3DOF robotic arm on a unicycle-type mobile platform. Their kinematic and dynamic models are implemented on a Raspberry Pi to simulate the dynamic behavior of the mobile manipulator robot in real time, without the need for the physical mechanism. In the third stage, a virtual environment for real-time immersion and interaction is designed and implemented in the Unity3D graphics engine. It contains the structure of the mobile manipulator robots, designed in CAD software according to their mechanical characteristics, and emulates the movement of the robots. Communication between MATLAB and the Unity3D virtual environment relies on shared memory accessed through a DLL (dynamic link library).
Fig. 1 System structure
Fig. 2 Structure of the virtual environment
3 Virtual Environment Figure 2 presents the proposed scheme for implementing the virtual environment in the Unity3D software. The virtual system comprises three elements: the simulated environment; the scripts that control each object present in the environment; and external plugins that improve the animation of the scenes. The simulated system incorporates 3D objects that allow the creation of a virtual reality environment, which is developed taking into account the real physical properties of an industrial setting. The environment contains two mobile manipulator robots, each consisting of a 3DOF robotic arm on a mobile platform, designed to execute collaborative tasks that require the ability to manipulate and move objects. Each mobile manipulator robot was modeled in CAD software, considering the mechanical and physical characteristics of the whole robotic system. The scenario in which the animation takes place was designed in 3D in the same way; in this case, it is an industrial environment (a construction site) where the robots manipulate and move mechanical pieces (beams, rods, etc.) belonging to the environment. During execution, the virtual environment is linked to input devices such as the Oculus through the device library (SDK), which allows visual and auditory user interaction during the established task. It is also linked to MATLAB through Shared Memories (SM) based on Dynamic Link Libraries (DLLs), which allow information to be shared between programs. The shared memories allow the control actions, both position and rotation, to be passed to the robotic system.
4 Mobile Manipulator Robots The mobile manipulator robot is defined by the vector of $n$ generalized coordinates $\mathbf{q} = [q_1\ q_2\ q_3\ \ldots\ q_n]^T = [\mathbf{q}_a^T\ \mathbf{q}_p^T]^T$, where $\mathbf{q}_a$ and $\mathbf{q}_p$ represent the generalized coordinates of the manipulator and of the mobile platform, respectively. It follows that $n = n_a + n_p$, where $n_a$ and $n_p$ are the dimensions of the generalized spaces of the manipulator and of the mobile platform, respectively. The vector $\mathbf{q}$ expresses the configuration of the robot; the set of all configurations is denoted $\mathcal{N}$. The location of the point of interest of the robot is given by the $m$-dimensional vector $\mathbf{h} = [h_1\ h_2\ h_3\ \ldots\ h_m]^T$, which represents both the position and the orientation of the point of interest of the robot in the reference system. The set of all locations forms the operating area of the robot, denoted $\mathcal{M}$.
4.1 Kinematic Model The kinematic model gives the location of the point of interest (operating end) $\mathbf{h}(t)$ as a function of both the position of the platform and the configuration of the manipulator [18]:

$$f: \mathcal{N}_a \times \mathcal{M}_p \to \mathcal{M}, \qquad (\mathbf{q}_a, \mathbf{q}_p) \mapsto \mathbf{h} = f(\mathbf{q}_a, \mathbf{q}_p)$$

in which $\mathcal{N}_a$ corresponds to the configuration space of the manipulator and $\mathcal{M}_p$ to the operating area of the mobile platform. Differentiating the position of the point of interest with respect to the location of the platform and the configuration of the manipulator gives the instantaneous kinematic model of the robotic system [18]:

$$\dot{\mathbf{h}}(t) = \frac{\partial f}{\partial \mathbf{q}}(\mathbf{q}_a, \mathbf{q}_p)\,\mathbf{v}(t)$$

in which $\dot{\mathbf{h}} = [\dot{h}_1\ \dot{h}_2\ \dot{h}_3\ \ldots\ \dot{h}_m]^T$ expresses the velocity of the point of interest and $\mathbf{v} = [v_1\ v_2\ v_3\ \ldots\ v_m]^T = [\mathbf{v}_p^T\ \mathbf{v}_a^T]^T$ expresses the mobility control of the robotic system. Its dimension is denoted by $n = n_p + n_a$, in which $n_p$ is the dimension of the mobility control of the platform and $n_a$ that of the manipulator. From the above, the velocity of the operating end can be expressed as

$$\dot{\mathbf{h}}(t) = \mathbf{J}(\mathbf{q})\,\mathbf{v}(t) \tag{1}$$

in which $\mathbf{J}(\mathbf{q})$ is the Jacobian matrix, which defines the linear mapping between $\mathbf{v}(t)$ and $\dot{\mathbf{h}}(t)$, the velocity vectors of the robotic system and of the operating end, respectively.
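The mapping in Eq. (1) can be illustrated with a deliberately simplified case: a planar unicycle-type platform whose point of interest lies a fixed distance ahead of the wheel axle, with no arm. This is an assumption made for illustration and not the Jacobian of the 3DOF mobile manipulator used in the paper; the function names, the offset `a`, and the step size are hypothetical.

```python
import numpy as np

def jacobian(psi, a):
    """Jacobian of a unicycle whose point of interest is a metres ahead of the axle.

    Maps v = [u, omega] (linear and angular velocity) to h_dot = [x_dot, y_dot].
    """
    return np.array([[np.cos(psi), -a * np.sin(psi)],
                     [np.sin(psi),  a * np.cos(psi)]])

def euler_step(h, psi, v, dt, a=0.2):
    """Advance the point of interest h and the heading psi by one Euler step."""
    h_dot = jacobian(psi, a) @ v          # Eq. (1): h_dot = J(q) v
    return h + dt * h_dot, psi + dt * v[1]

# Drive straight ahead at 1 m/s for 0.1 s from the origin, heading along x
h_new, psi_new = euler_step(np.zeros(2), 0.0, np.array([1.0, 0.0]), dt=0.1)
print(h_new, psi_new)
```

A model of this kind is what a HIL node would integrate at each sampling instant after receiving a velocity command, returning the updated location of the point of interest to the controller.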
4.2 Dynamic Model The mathematical model of the dynamics of the robotic system is obtained with the Euler–Lagrange method, from the Lagrangian, defined as the difference between the kinetic energy $K$ and the potential energy $P$ [19]:

$$L = K - P \tag{2}$$

$$f_i(t) = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) - \frac{\partial L}{\partial q_i} \tag{3}$$

Applying the Euler–Lagrange method, the forces generated in the robot are defined as

$$\mathbf{f}(t) = \mathbf{g}(\mathbf{q}) + \mathbf{C}(\mathbf{q}, \dot{\mathbf{q}})\,\mathbf{v}(t) + \mathbf{M}(\mathbf{q})\,\dot{\mathbf{v}}(t) \tag{4}$$

in which $\mathbf{q}$ is the vector of generalized coordinates of the robotic system, $\mathbf{v}$ is the velocity vector of the robot, $\mathbf{M}$ is the inertia matrix of the system, $\mathbf{C}$ is the matrix of centripetal and Coriolis forces, and $\mathbf{g}(\mathbf{q})$ is the gravity vector [19, 20]. The reference velocities that act as control signals of the system, considering the dynamics of the robot, are expressed as

$$\mathbf{v}_{ref}(t) = \bar{\mathbf{g}}(\mathbf{q}) + \bar{\mathbf{C}}(\mathbf{q}, \dot{\mathbf{q}})\,\mathbf{v}(t) + \bar{\mathbf{M}}(\mathbf{q})\,\dot{\mathbf{v}}(t) \tag{5}$$
5 Collaborative Control The collaborative control starts from the formation of two mobile manipulator robots in which the main interest is based on the center point of the distance between the manipulators having a projection of position and orientation. This projection is shown in Fig. 3, in which the two mobile manipulators h1 and h2 execute the collaborative task. Between the two robots h1, h2, there is the projection in which the point n0 is obtained η0 = p0 s0 , where you have to p0 = x0 y0 z 0 , which establishes the center point of the distance that separates the robots in coordinates from the reference
Fig. 3 Collaborative control of mobile manipulators
L. F. Santo et al.
system, and $s_0 = [d_0\ \alpha_0\ \beta_0]$, where $d_0$ is in units of length and represents the distance between the manipulators, while $\alpha_0$ and $\beta_0$ are angles representing the orientation formed with the axes of the reference system. The projection makes it possible to distinguish the desired position and orientation (given by the established task) from the position and orientation $\eta_0$ of the collaborative work at the current instant, so that new velocities can be generated and the errors of the mobile manipulators corrected. The projection is computed with the following expressions [7].
$$p_0 = \frac{1}{2}\begin{bmatrix} (x_2 + x_1) & (y_2 + y_1) & (z_2 + z_1) \end{bmatrix} \tag{6}$$
$$s_0 = \begin{bmatrix} \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2} \\ \tan^{-1}\dfrac{z_2 - z_1}{x_2 - x_1} \\ \tan^{-1}\dfrac{y_2 - y_1}{x_2 - x_1} \end{bmatrix} \tag{7}$$
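Equations (6) and (7) can be illustrated with a minimal Python sketch. The function name `formation_projection` and the tuple layout are illustrative assumptions, not part of the original work. The sketch also assumes that the distance is built from coordinate differences, and it uses `math.atan2` instead of a plain arctangent of the quotient so that the case x2 = x1 does not divide by zero; both agree when x2 > x1.

```python
import math


def formation_projection(h1, h2):
    """Return (p0, s0) for two points of interest.

    h1, h2: (x, y, z) positions of the first and second mobile manipulator.
    p0: midpoint between the two points, as in Eq. (6).
    s0: (d0, alpha0, beta0), distance and orientation angles, as in Eq. (7).
    """
    x1, y1, z1 = h1
    x2, y2, z2 = h2
    dx, dy, dz = x2 - x1, y2 - y1, z2 - z1

    # Eq. (6): midpoint of the segment joining the two points of interest
    p0 = ((x2 + x1) / 2.0, (y2 + y1) / 2.0, (z2 + z1) / 2.0)

    # Eq. (7): separation distance and the two orientation angles
    d0 = math.sqrt(dx * dx + dy * dy + dz * dz)
    alpha0 = math.atan2(dz, dx)  # assumption: atan2 replaces tan^-1(dz/dx)
    beta0 = math.atan2(dy, dx)   # assumption: atan2 replaces tan^-1(dy/dx)
    return p0, (d0, alpha0, beta0)


if __name__ == "__main__":
    # Hypothetical positions, chosen only for illustration
    p0, s0 = formation_projection((0.0, 0.0, 0.0), (2.0, 2.0, 0.0))
    print(p0, s0)
```

The formation error used by the controller is then the difference between a desired pair of vectors and the pair returned by this function.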
where $[x_1\ y_1\ z_1]$ and $[x_2\ y_2\ z_2]$ are the position components, in the reference system, of the point of interest of the first and second mobile manipulator, respectively. Taking the time derivative of the forward and inverse kinematic transformations gives the relation between the temporal variations $\dot{h}(t)$ and $\dot{\eta}(t)$, which is represented by the Jacobian matrix $J$:
$$\dot{\eta}(t) = J(h)\, \dot{h}(t) \tag{8}$$
Conversely, it is defined by
$$\dot{h}(t) = J^{-1}(\eta)\, \dot{\eta}(t) \tag{9}$$
5.1 Formation Controller
The collaborative control stage in Fig. 1 specifies the desired shape and position parameters of the task, $\eta_d = [p_d\ s_d]$, and their desired variations $\dot{\eta}_d = [\dot{p}_d\ \dot{s}_d]$. The formation error is $\tilde{\eta} = \eta_d - \eta$, and its time derivative is $\dot{\tilde{\eta}} = \dot{\eta}_d - \dot{\eta}$. The control objective is to drive $\tilde{\eta}$ to zero, and the stability of the system is checked with Lyapunov's method. The candidate function is defined as $V(\tilde{\eta}) = \frac{1}{2}\tilde{\eta}^T\tilde{\eta} > 0$. Its first time derivative is $\dot{V}(\tilde{\eta}) = \tilde{\eta}^T\dot{\tilde{\eta}}$; replacing $\dot{\tilde{\eta}} = \dot{\eta}_d - \dot{\eta}$ and taking into consideration $\dot{\eta} = J\dot{h}$ gives $\dot{V}(\tilde{\eta}) = \tilde{\eta}^T(\dot{\eta}_d - J\dot{h})$. Consequently, the control law is established as
$$\dot{h}(t) = J^{-1}\left(\dot{\eta}_d + K\tanh(\tilde{\eta})\right) \tag{10}$$
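where $K$ is a diagonal matrix of positive gains. Substituting (10) into the derivative of the Lyapunov function gives
$$\dot{V}(\tilde{\eta}) = -\tilde{\eta}^T K \tanh(\tilde{\eta}) < 0 \tag{11}$$
For each mobile manipulator, the position error of the point of interest is $\tilde{h} = h_d - h$, with the candidate function $V(\tilde{h}) = \frac{1}{2}\tilde{h}^T\tilde{h} > 0$. Taking the first derivative, replacing $\dot{\tilde{h}} = \dot{h}_d - \dot{h}$, and considering that $\dot{h} = \Gamma v$, where $\Gamma$ represents the Jacobian matrix and $v$ the control actions, gives $\dot{V}(\tilde{h}) = \tilde{h}^T(\dot{h}_d - \Gamma v)$. Thus, the control law for the ith mobile manipulator is defined as
$$v = \Gamma^{-1}\left(\dot{h}_d + G\tanh(\tilde{h})\right) \tag{12}$$
in which $G$ is a diagonal matrix of positive gains. Substituting (12) into the derivative of the Lyapunov function gives
$$\dot{V}(\tilde{h}) = -\tilde{h}^T G \tanh(\tilde{h}) < 0 \tag{13}$$
which implies that the equilibrium point is asymptotically stable. Therefore, the position error of the point of interest satisfies $\tilde{h}(t) \to 0$ asymptotically as $t \to \infty$.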
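The stability argument can be checked numerically with a small sketch. Assuming perfect velocity tracking, substituting the control law (10) into (8) leaves the closed-loop error dynamics $\dot{\tilde{\eta}} = -K\tanh(\tilde{\eta})$, and the same structure follows from (12) for $\tilde{h}$ with gain $G$. The Python code below integrates these dynamics with a forward Euler step for a diagonal gain matrix. The function name, gains, step size, and initial error are illustrative assumptions and are not values taken from the experiments.

```python
import math


def simulate_error(initial_error, gains, dt=0.01, steps=2000):
    """Integrate d(err)/dt = -K tanh(err) with forward Euler.

    initial_error: list of error components at t = 0.
    gains: diagonal entries of the positive gain matrix K (one per component).
    Returns the final error vector and the history of its Euclidean norm.
    """
    err = list(initial_error)
    norms = [math.sqrt(sum(e * e for e in err))]
    for _ in range(steps):
        # Each component is driven toward zero; tanh saturates the
        # correction for large errors, which bounds the commanded velocity.
        err = [e - dt * k * math.tanh(e) for e, k in zip(err, gains)]
        norms.append(math.sqrt(sum(e * e for e in err)))
    return err, norms


if __name__ == "__main__":
    # Hypothetical initial error and gains, for illustration only
    final_err, norms = simulate_error([2.0, -1.0], [0.5, 1.0])
    print("initial norm:", norms[0])
    print("final norm:", norms[-1])
```

In this sketch the error norm never increases from one step to the next, which is the discrete counterpart of the negative Lyapunov derivative in (11) and (13).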
6 Experimental Results
This section presents the results of developing and implementing the HIL technique, focused on the teaching process. The proposed system uses two Raspberry Pi development boards, each of which runs the kinematic and dynamic models of one mobile manipulator robot obtained in (1) and (5). Each Raspberry Pi is connected through a wireless communication channel to the Central Control Unit, developed in MATLAB, which executes the collaborative control law. Communication between MATLAB and the Unity 3D development environment is carried out through a DLL-based link, and in Unity 3D the user can interact with and observe the movement of the robotic system as it executes the various collaborative tasks. Figure 4 shows the virtual environment, which simulates an industrial (construction) setting in which two mobile manipulators are used to transport heavy loads. The evaluation was conducted with two mobile manipulators under a collaborative controller, transporting a bar along a defined trajectory; as the simulation progresses in time, the correction of errors by the controller is checked. The manipulators adopt a formation pattern to follow the trajectory and keep the axis of the manipulated object at the midpoint of the distance separating them. Figure 5 presents the results of the experiment, showing the desired task and the task performed, from which it can be seen that the controller responds well. The velocity errors generated by the controller for each mobile manipulator are presented in the curves in Figs. 6 and 7. The errors corrected by the action of the controller are presented in the curves in Figs. 8 and 9.
Fig. 4 Virtual mobile manipulator environment
Fig. 5 Trajectory described by the mobile manipulators
Fig. 6 Linear velocity errors of the mobile manipulators
7 Conclusions
This work presents the design of a Hardware-in-the-loop technique, developed in a virtual environment, for the collaborative control of two robotic systems that carry out tasks of moving and manipulating objects in an industrial environment. The control algorithms are developed in two stages: a kinematic cascade controller, which executes the path of the manipulated object, and a dynamic controller, which compensates for the errors generated by the dynamic effects of the robotic system. The tests carried out in the virtual system made it possible to verify and evaluate the implemented control algorithm and to demonstrate its validity for the fulfillment of
Fig. 7 Errors in the angular velocities of the mobile manipulators
Fig. 8 Distance and orientation errors
the established task: as presented in Figs. 8 and 9, the errors of position, distance, and orientation of the manipulated object tend satisfactorily to zero. This project also makes evident the flexibility of an HIL system, which allows the creation of efficient, well-performing systems at a much lower cost than is usual in today's industry.
Fig. 9 Positioning errors
Acknowledgements The authors would like to thank the Corporación Ecuatoriana para el Desarrollo de la Investigación y Academia (CEDIA) for its contribution to innovation through the CEPRA projects, especially the project CEPRA-XIV-2020-10 “Multi-user immersive technologies aimed at synergistic teaching-learning systems”, and also the Universidad de las Fuerzas Armadas ESPE and the ARSI Research Group for their support in the development of this work.
References
1. Zuehlke D (2019) Industry 4.0: more than a technological revolution. Revista CEA 10(5)
2. Sharkawy A, Papakonstantinou C, Papakostopoulos V, Moulianitis V, Aspragathos N (2020) Task location for high performance human-robot collaboration. J Intell Robot Syst 100:183–202
3. Chen J, Kai S (2018) Cooperative transportation control of multiple mobile manipulators through distributed optimization. Sci China Inf Sci 61
4. Soltanpour M, Zaare S, Haghgoo M, Moattari M (2020) Free-chattering fuzzy sliding mode control of robot manipulators with joints flexibility in presence of matched and mismatched uncertainties in model dynamic and actuators. J Intell Robot Syst 100(51):47–69
5. Acosta J, Gonzáles G, Andaluz V, Garrido J (2019) Multirobot heterogeneous control considering secondary objectives. Sensors 19(20)
6. Ramos D, Almeida L, Moreno U (2019) Integrated robotic and network simulation method. Sensors 19(20)
7. Bonilla E, Rodriguez J, Acosa J, Andaluz V (2020) Teaching and learning virtual strategy for the navigation of multiple-UAV. 15th Iberian conference on information systems and technologies (CISTI)
8. Herrera K, Rocha J, Silva F, Andaluz V (2020) Training systems for control of mobile manipulator robots in augmented reality. 15th Iberian conference on information systems and technologies, pp 1–7
9. Molina M, Ortiz J (2018) Coordinated and cooperative control of heterogeneous mobile manipulators. International conference on social robotics, vol 11357
10. Galicki M (2019) Tracking the kinematically optimal trajectories by mobile manipulators. J Intell Robot Syst 93(51):635–648
11. Varela J, Buele J, Jadan J, Andaluz V (2020) Teaching STEM competencies through an educational mobile robot. International conference on human-computer interaction, vol 12206
12. Leica P, Rivera K, Muela S, Chávez D, Andaluz G, Andaluz VH (2019) Consensus algorithms for bidirectional teleoperation of aerial manipulator robots in an environment with obstacles. IEEE fourth Ecuador technical chapters meeting (ETCM), pp 1–6
13. Salcic Z, Atmojo U, Park H, Chen A, Wang K (2019) Designing dynamic and collaborative automation and robotics software systems. IEEE Trans Industr Inf 15(1):540–549
14. Leica P, Balseca J, Cabascango D, Chávez D, Andaluz G, Andaluz VH (2019) Controller based on null space and sliding mode (NSB-SMC) for bidirectional teleoperation of mobile robots formation in an environment with obstacles. IEEE fourth Ecuador technical chapters meeting (ETCM), pp 1–6
15. Li G, Zhang D, Xin Y, Jiang S, Wang W, Du J (2019) Design of MMC hardware-in-the-loop platform and controller test scheme. CPSS Trans Power Electron Appl 4(2):143–151
16. Chen Y, Braun DJ (2019) Hardware-in-the-loop iterative optimal feedback control without model-based future prediction. IEEE Trans Robot 35(6):1419–1434
17. Nguyen H, Yang G, Nielsen A, Jensen P (2019) Hardware- and software-in-the-loop simulation for parameterizing the model and control of synchronous condensers. IEEE Trans Sustain Energy 10(3):1593–1602
18. Varela J, Andaluz V, Chicaiza F (2018) Modelling and control of a mobile manipulator for trajectory tracking. 2018 international conference on information systems and computer science (INCISCOS), pp 69–74
19. Andaluz V, Varela J, Chicaiza F, Quevedo W, Ruales B (2019) Teleoperation of a mobile manipulator with feedback forces for evasion of obstacles. Iberian J Inf Syst Technol 291–304
20. Varela J, Chicaiza F, Andaluz V (2020) Dynamics of a unicycle-type wheeled mobile manipulator robot. Advances in emerging trends and technologies, pp 24–33
Application ArcGIS on Modified-WQI Method to Evaluate Water Quality of the Euphrates River, Iraq, Using Physicochemical Parameters
Ali Chabuk, Hussein A. M. Al-Zubaidi, Aysar Jameel Abdalkadhum, Nadhir Al-Ansari, Salwan Ali Abed, Ali Al-Maliki, Jan Laue, and Salam Ewaid

Abstract Global interest in water bodies, driven by the water scarcity crisis, encourages researchers to study the water environment in detail from different aspects. Accordingly, this study aims to evaluate the water quality of the Euphrates River using 11 physicochemical parameters measured at 16 locations during 3 years (2009–2011) for both seasons (dry and wet). The water quality index model (WQIM) was calculated after modifying the weighted arithmetic method, and the modified index is denoted MWQI. The chosen parameters were Cl, SO4, HCO3, NO3, Na, K, Ca, Mg, TH, TDS, and EC. Along the river section at locations L.1–L.10, the readings of all selected parameters (except HCO3) increased progressively. The concentrations of all parameters then rose sharply after location L.10, at locations L.11–L.14. The HCO3 concentrations showed the opposite trend at all locations. For the 3-year average values (wet, dry, total), the section of the Euphrates River at locations L.1–L.10 was classified as good water quality (class C-II). The river section at locations L.11–L.16, except L.13, was classified as poor water quality (class C-III), while location L.13 was classified as very poor (class C-IV). Interpolation prediction maps of the average readings (total, dry, and wet) of the Euphrates River were produced in GIS using the IDWM interpolation model.

Keywords Modified-WQI · ArcGIS · IDWM · Physicochemical parameters · Euphrates River · Iraq

A. Chabuk (✉) · H. A. M. Al-Zubaidi: University of Babylon, Babylon 51001, Iraq
A. J. Abdalkadhum: Al-Qasim Green University-Babylon/Iraq, Al Qasim, Iraq
N. Al-Ansari · J. Laue: Lulea University of Technology, 971 87 Lulea, Sweden
S. Ali Abed: University of Al-Qadisiyah, P.O. Box 1895, Diwaniya 58001, Iraq
A. Al-Maliki: Ministry of Science and Technology, Baghdad 10001, Iraq
S. Ewaid: Southern Technical University, Basra 61001, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_58
1 Introduction
Global warming drives climate change, which in turn affects the environment of river basins in arid and semi-arid regions, especially in the East Mediterranean and the Middle East. Climate change has affected the annual streamflow of the Tigris and Euphrates basins through increased temperatures, reduced precipitation, and fluctuating weather conditions [1]. Iraq faces a complicated water crisis in both quantity and quality. The contributing factors include successive wars, conflicts between sectors, the sanctions imposed by the United Nations, governments' neglect of supporting infrastructure projects, limited environmental awareness, and poor monitoring of the industrial and agricultural wastes that are disposed of directly into the river. Most of the water that supplies the main rivers in Iraq (Tigris and Euphrates) originates outside the Iraqi borders. This crisis affects the properties of river water, which may in turn have human, social, financial, and environmental consequences [2, 3]. Water in Iraq exists in various forms, including surface water, rain, and groundwater, which differ from one another in quantity, in physical, chemical, and biological characteristics, and in economic importance [3]. These sources have not been used optimally because of many problems, including obsolete irrigation and drainage systems and the huge amount of water that is lost, especially from the Euphrates River and the tributaries that flow into it, which originate in neighboring countries. This leaves Iraq vulnerable to neighboring countries using water as leverage. In addition, Iraq is located within dry and semi-arid regions with a severe shortage of rainfall, and desertification has come to cover large areas of the country [3].
The water of the Euphrates River inside Iraq is divided into two parts according to its ion concentrations. In the upper part the chemical water type is sulfate/bicarbonate, while in the lower part it is sulfate/chloride [4].
The Euphrates River is 2786 km long, and its average discharge at the Syria–Iraq border before the construction of dams in Syria was 983–1046 m³/s [5, 6]. The river is formed by the Murat and Karasu Rivers, which rise in the highlands of Turkey and meet at Keban city, Turkey, the site of the Keban Dam. In Turkey, 88% of the total flow of the river is fed by its tributaries, in addition to melting snow. The maximum flows of the Euphrates River occur in April and May and represent 42% of the total annual flow [6–8]. After passing 455 km within Turkey, the river flows southward across the Syrian plateau, where it is joined by three tributaries whose water also comes from Turkey [8, 9]. These tributaries add 10% of the total flow to the river before it enters Iraq [6, 10]. After 661 km within Syria, the river enters Iraq at Qaim City. It flows through a limestone desert known as the Jazirah plateau until it reaches its downstream end and joins the Tigris River at Al-Qurnah City, Al-Basrah Governorate, to form the Shatt Al-Arab, which is 190 km long and discharges into the Arabian Gulf [9]. Part of the river water flows into the marshes in the south of Iraq [6, 9, 11]. Because of the decreasing flows in the Euphrates River, the Tigris River supplies it with water through a canal that connects the two rivers across Tharthar Lake [9]. The Euphrates basin covers 440,000 km², of which 28% lies in Turkey, 22% in Syria, and 47% in Iraq [8]. Nine main dams were constructed on the Euphrates River: five in Turkey (Keban Dam, Ataturk Dam, Karakaya Dam, Karkamis Dam, and Birecik), three in Syria (Al Baath Dam, Tabqa Dam, and Tishreen Dam), and one in Iraq (Haditha Dam) [8].
The Euphrates River suffers from deteriorating water quality because of agricultural and domestic sources; in addition, salinity increases severely along the river within Iraq [8]. Controlling water quality is essential to protecting the river from degradation. This is done by analyzing the physical, chemical, and biological properties of the river water so that decision-makers can act to reduce water pollution [6]. Water has become more and more precious as rapid growth, urbanization, and climate change increase the demand for it. Changes in river characteristics have increased the contamination of the river water environment. These changes result from inappropriate human activities and from natural factors (e.g., the nature of the soil through which the river passes, and drought conditions caused by projects and dams built upstream in Turkey and Syria), and they vary from season to season and from year to year [6]. The discharge of the Euphrates River therefore changes seasonally and yearly, because its water comes from precipitation along the river route in Turkey, Syria, and Iraq, and this variable discharge leads to high variation in the parameter concentrations [2, 6]. The recent dry years have been a major problem for the Euphrates River, because the large dams and reservoirs built upstream have reduced the water level. Additional factors are the misguided policies of successive governments, the effects of wars, and the continued use of outdated irrigation systems. These factors contributed
to the increasing salinity along the Euphrates River route inside Iraq, which now exceeds the allowable limit according to WHO (1500 mg L−1) [6, 12]. Reference [6] studied the changes in the quality of the Euphrates River water inside the Iraqi borders for the period from 2009 to 2010. Twenty parameters were measured along the river. The results were compared among three sites and with the same parameters from earlier studies. The study found a large variance in water flow rate between the selected years and within each year, because the discharge of the Euphrates is entirely controlled by the upstream countries (Turkey and Syria). The results showed that pollutant concentrations had increased compared with previous studies, and that the discharge fluctuated and decreased toward the downstream. Reference [4] examined the TDS concentration in the Euphrates River within Al-Muthana and Al-Qadisiyah governorates during 2015 at fourteen stations on the river. That study found TDS concentrations of 527–3110 mg L−1 in Al-Qadisiyah Governorate and 1130–8020 mg L−1 in Al-Muthana Governorate. A water quality index (W.Q.I.) study of the Al-Diwanyiah River, part of the Euphrates River, was carried out from 2015 to 2016 by measuring the concentrations of nine parameters at four selected locations on the river. According to the guideline of the Canadian Water Quality Index (GC-WQI), the water quality index of the river ranged from marginal to poor [13]. Reference [14] studied the Euphrates River in the southeast part of DhiQar Governorate to determine the water quality index using the ANOVA model.
Eight parameters (temperature, dissolved oxygen, total hardness, total dissolved solids, hydrogen ion, turbidity, chloride, and electrical conductivity) were measured at three selected stations (monthly and seasonally). The results showed that the W.Q.I. was poor at sites S.1 and S.3 and good at site S.2. Reference [15] measured seventeen parameters at 11 locations in the upper section of the river (Heet–Al-Ramadi) between 2008 and 2009 and compared the results with the WHO standards. The parameters T (°C), pH, K, SO4, HCO3, Cl, D.O., NO3, EC, and PO4 were below the limits of the WHO standards, whereas Ca, Na, TH, TSS, TDS, BOD, and turbidity were over the upper limits. The aims of this study are to evaluate the concentrations of physicochemical parameters in order to estimate a modified W.Q.I. (MWQI) of the Euphrates River for the years 2009, 2010, and 2011 from upstream to downstream, and to produce prediction maps of the MWQI using the GIS Software 10.5.
2 Methodology The main items in this study can be seen in the schematic diagram (Fig. 1). The first part includes the assessment of physicochemical parameters and the second part is employed to compute the MWQI for 16 locations on the river. Then, the third part
Fig. 1 The schematic diagram for research methodology (flowchart stages: Study Area, Euphrates River; Assessment of Physicochemical Parameters, covering 11 parameters at 16 locations along the Euphrates River; Modified Water Quality Index (MWQI), covering explaining the MWQI equations, calculating the MWQI for 16 locations, and reclassification of the MWQI into new classes; Data, ArcGIS and IDWM Model, covering projecting the sixteen locations on the river in GIS, exporting MWQI values to the GIS, and producing maps of MWQI in GIS using IDWM; Results and Discussion; Conclusions)
comprises creating the maps of the MWQI in GIS along the Euphrates River using the IDWM model.
2.1 Study Area
Iraq is situated at latitudes between 34° 22′ 52″ N and 30° 00′ 19″ N and at longitudes between 41° 08′ 55″ E and 47° 25′ 38″ E (Fig. 2). The total area of Iraq is 438,317 km², of which water covers around 950 km², or 0.22% of the entire area. The population of Iraq is about 41 million inhabitants, and its growth rate in 2019 was 2.5% [16]. The climate in central and southern Iraq is arid, semi-arid subtropical, and continental, while in the north it changes to Mediterranean [16–18]. Mean annual precipitation is about 216 mm in central and southern Iraq and can reach 1200 mm in the north. Winter is generally mild to cool, and cold in northern Iraq: winter temperatures range from 16 °C during the day to about 2 °C at night and fall below 0 °C in the north. Summer in central and southern Iraq is extremely hot, especially in recent years; the temperature in this season can reach more than
Fig. 2 The Euphrates River Map within Iraqi borders
55 °C in the southern part of Iraq; it ranges between 26 and 42 °C in the north and between 35 and 50 °C in the center and south [16, 17, 19]. Iraq is divided topographically into seven sub-regions: the High folded zone, Low folded zone, Al-Jazira zone, Western Desert zone, Thrust zone, Mesopotamia zone, and Southern desert zone [20]. The Euphrates River flows approximately 1000 km inside Iraq until it joins the Tigris River in Basrah Governorate to form the Shatt Al-Arab, which is 190 km long [19, 21]. The hydrological system reflects the climatic factors and phenomena that prevail in the region. This underlines the close relationship between climatology and applied hydrology: the hydrological system that the river basin now forms reflects the effect of the climate and its components over the periods that the study area has gone through [4].
2.2 Assessment of Water Quality
The concentrations of eleven parameters were measured at 16 locations along the Euphrates River in 2009, 2010, and 2011 [22]. These locations are Al-Qaim, Before Haditha Dam, Haditha Dam, Hit, Al-Ramadi, Al-Saqlawia, Al-Fallujah, Al-Yusufiyah, Sadat Al-Hindiah, Al-Kifl, Al-Shinafiyah, Al-Samawah, Al-Nasiriyah, Al-Madina, Al-Izz, and Al-Qurnah (Fig. 3). Table 1 shows the coordinates of the
Fig. 3 The Euphrates River and sampling locations, Iraq
Table 1 Coordinates of the 16 locations along the Euphrates River
Symbol  Location           Latitude         Longitude
L.1     Al-Qaim            34° 22′ 52″ N    41° 08′ 55″ E
L.2     B. Haditha Dam     34° 20′ 10″ N    42° 21′ 23″ E
L.3     Haditha Dam        34° 12′ 24″ N    42° 21′ 18″ E
L.4     Hit                33° 38′ 44″ N    42° 49′ 32″ E
L.5     Al-Ramadi          33° 26′ 25″ N    43° 16′ 04″ E
L.6     Al-Saqlawia        33° 22′ 34″ N    43° 41′ 04″ E
L.7     Al-Fallujah        33° 20′ 36″ N    43° 45′ 39″ E
L.8     Al-Yusufiyah       33° 2′ 40″ N     44° 8′ 09″ E
L.9     Sadat Al-Hindiah   32° 43′ 45″ N    44° 16′ 05″ E
L.10    Al-Kifl            32° 13′ 47″ N    44° 21′ 45″ E
L.11    Al-Shinafiyah      31° 34′ 50″ N    44° 38′ 44″ E
L.12    Al-Samawah         31° 19′ 11″ N    45° 16′ 55″ E
L.13    Al-Nasiriyah       31° 02′ 31″ N    46° 15′ 00″ E
L.14    Al-Madina          30° 57′ 27″ N    47° 15′ 27″ E
L.15    Al-Izz             30° 58′ 54″ N    47° 22′ 58″ E
L.16    Al-Qurnah          30° 00′ 19″ N    47° 25′ 38″ E
Table 2 Physicochemical parameters at 16 locations on the Euphrates River in 2009–2011 [22]
Year  Statistic    Ca    Mg    Na    K      Cl    SO4   HCO3   T.H   T.D.S  NO3    E.C
      Iraqi Stand  50    50    200   10     250   250   200    500   1000   50     2000
2009  S.D          41    59    271   3.98   423   329   12.8   342   1233   8.2    1794
2009  Average      114   99    340   9.43   482   611   149    694   1841   8.56   2783
2009  Max          203   214   853   17.8   1290  1234  171    1421  4188   24.3   6154
2009  Min          73    44    128   5.29   145   306   129    370   823    1.8    1304
2010  S.D          29    44    204   2.7    289   264   12.8   248   866    3.7    1243
2010  Average      110   82    255   7.4    353   545   132    614   1470   5.01   2135
2010  Max          168   176   673   12.2   933   1091  151    1152  3354   11.6   4718
2010  Min          79    43    98    4.2    123   291   112    374   755    1.4    1111
2011  S.D          47    46    209   3.41   320   282   17.98  283   998    0.96   1429
2011  Average      107   81    247   8.91   338   510   139    577   1439   2.16   2116
2011  Max          217   183   714   15.68  1055  1156  168    1183  3683   4.3    5273
2011  Min          59    46    93    5.39   104   269   96     349   675    1.1    1018
selected locations. The 11 values of parameter concentrations recorded at 16 locations on the river for 3 years can be seen in Table 2.
2.3 Computing of Modified Water Quality Index
A modified weighted arithmetic method is applied to compute the W.Q.I. in the current study. Eleven parameters, measured at 16 locations along the Euphrates River, were adopted to find the W.Q.I. According to [23], the WQI is computed at the 16 locations using Eqs. (1), (2), and (3) as follows:
$$SBP_i = \frac{D_i - D_0}{STV_i - D_0} \times 100 \tag{1}$$
where $SBP_i$ is the sub-index of the ith parameter; $D_i$ is the measured concentration of the ith parameter; $D_0$ is the ideal value of the parameter in water, which is zero for every parameter except pH (7) and DO (14.6 ppm); and $STV_i$ is the Iraqi standard value for rivers of the ith parameter [24].
$$IW_i = \frac{1}{STV_i} \tag{2}$$
Table 3 MWQI ranges, statements, and classification (modified after [23])
MWQI-Range  Statement      Classification
90–100      Excellent      C-I
80–90       Good           C-II
60–80       Poor           C-III
40–60       Very poor      C-IV
20–40       Polluted       C-V
0–20        Very polluted  C-VI
where $IW_i$ is the inverse of the standard value $STV_i$ of the ith parameter.
$$W.Q.I. = \frac{\sum SBP_i \times IW_i}{\sum IW_i} \tag{3}$$
In the weighted arithmetic method, Eq. (3) from [23] was modified into Eq. (4), which rescales the W.Q.I. to a new classification (MWQI-Range) with a range of 0–100. Furthermore, the new MWQI values were divided into six classes, each with a corresponding description (Table 3).
$$MWQI = 100 - \frac{1}{5}\,\frac{\sum SBP_i \times IW_i}{\sum IW_i} \tag{4}$$
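The calculation in Eqs. (1) to (4) and the classes of Table 3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation. The standards are the "Iraqi Stand" row of Table 2; the ideal value is taken as zero because none of the 11 parameters is pH or DO; the divisor 5 is read from Eq. (4); and treating each lower class bound as inclusive is an assumption, since the ranges in Table 3 share their end points. The function and variable names are hypothetical. The sample input is the row of 2010 averages from Table 2, used only to exercise the code; it averages all 16 locations, so the result is not a value reported in the study.

```python
# Iraqi river standards, from the "Iraqi Stand" row of Table 2
IRAQI_STANDARDS = {
    "Ca": 50, "Mg": 50, "Na": 200, "K": 10, "Cl": 250, "SO4": 250,
    "HCO3": 200, "TH": 500, "TDS": 1000, "NO3": 50, "EC": 2000,
}

# (lower bound, statement, class) from Table 3; inclusive lower bounds
# are an assumption made for this sketch
CLASSES = [
    (90, "Excellent", "C-I"), (80, "Good", "C-II"), (60, "Poor", "C-III"),
    (40, "Very poor", "C-IV"), (20, "Polluted", "C-V"),
    (0, "Very polluted", "C-VI"),
]


def mwqi(measured, standards=IRAQI_STANDARDS, ideal=0.0, divisor=5.0):
    """Return (WQI, MWQI) for a dict of measured concentrations."""
    weighted_sum = 0.0
    weight_total = 0.0
    for name, value in measured.items():
        stv = standards[name]
        sub_index = (value - ideal) / (stv - ideal) * 100.0  # Eq. (1)
        inverse_weight = 1.0 / stv                           # Eq. (2)
        weighted_sum += sub_index * inverse_weight
        weight_total += inverse_weight
    wqi = weighted_sum / weight_total                        # Eq. (3)
    return wqi, 100.0 - wqi / divisor                        # Eq. (4)


def classify(mwqi_value):
    """Map an MWQI value to the (statement, class) pair of Table 3."""
    for lower, statement, label in CLASSES:
        if mwqi_value >= lower:
            return statement, label
    return CLASSES[-1][1], CLASSES[-1][2]  # values below zero: lowest class


# 2010 averages from Table 2 (illustrative input only)
AVERAGES_2010 = {
    "Ca": 110, "Mg": 82, "Na": 255, "K": 7.4, "Cl": 353, "SO4": 545,
    "HCO3": 132, "TH": 614, "TDS": 1470, "NO3": 5.01, "EC": 2135,
}

if __name__ == "__main__":
    wqi_value, mwqi_value = mwqi(AVERAGES_2010)
    print(round(wqi_value, 2), round(mwqi_value, 2), classify(mwqi_value))
```

Because the weights are the inverses of the standards, parameters with small permissible values, such as K, dominate the index.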
2.4 Prediction Maps
To produce the prediction maps of the MWQI of the Euphrates River, the inverse-distance-weighting model (IDWM) in the GIS software was used to interpolate the MWQI values at the 16 locations along the river. The prediction maps show the average values of the MWQI for the years 2009, 2010, and 2011 over the whole length of the river, and they were produced from the values at the chosen locations. The IDWM estimates unknown values at given locations as a weighted average of neighboring and surrounding points that have known values and locations [25, 26]. In this method, the points nearest to a prediction location have a greater effect on the predicted value than points situated farther away [27, 28].
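The principle behind inverse distance weighting can be shown with a minimal Python sketch. It is a generic illustration of the technique, not the GIS tool used in the study: the function name `idw`, the planar Euclidean distance, and the power of 2 are assumptions, and the sample points and values are hypothetical.

```python
import math


def idw(known_points, target, power=2.0):
    """Estimate a value at `target` by inverse distance weighting.

    known_points: list of ((x, y), value) pairs with known values.
    target: (x, y) location where the value is predicted.
    power: distance exponent (assumed 2 here); larger values give
           nearby points more influence.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known_points:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            # The target coincides with a sampled point: return its value
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Hypothetical MWQI values at two sampled locations
    samples = [((0.0, 0.0), 80.0), ((10.0, 0.0), 60.0)]
    print(idw(samples, (2.0, 0.0)))  # closer to the first sample's value
```

A prediction map is obtained by evaluating such a function at every cell of a grid that covers the river.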
3 Results and Discussion
3.1 Physicochemical Concentrations for Parameters
Calcium (Ca) is the first of the 11 physicochemical parameters selected in this study. Its concentrations ranged over 73–203, 79–168, and 59–217
ppm in 2009, 2010, and 2011, respectively. The average Ca concentrations were 114 ppm (2009), 110 ppm (2010), and 107 ppm (2011). The mean values of Mg were 99, 82, and 81 ppm in 2009, 2010, and 2011, respectively, and the Mg readings ranged over 44–214 ppm, 43–176 ppm, and 46–183 ppm in the same years. The measured values of Ca and Mg at all selected locations on the river were over the permissible value of the Iraqi standards (50 ppm) [24] (Fig. 4a, b). The sodium (Na) concentration ranged over 128–853 ppm in 2009, 98–673 ppm in 2010, and 93–714 ppm in 2011, with mean values of 340 ppm (2009), 255 ppm (2010), and 247 ppm (2011). The Na concentrations at locations L.1 to L.10 were within the Iraqi standard of 200 ppm [24], while the values at locations L.11 to L.16 exceeded it (Fig. 4c). The maximum values of potassium (K) concentration in 2009, 2010, and 2011 were 17.77, 12.23, and 15.68 ppm, respectively, and the minimum values were 5.29, 4.24, and 5.39 ppm, while the average values were 9.44 ppm (2009), 7.44 ppm (2010), and 8.91 ppm (2011). The measured K readings at locations L.1–L.10 were within the allowable Iraqi standard (10 ppm) [24], while the K concentrations at some locations from L.11 to L.16 were over the standard limit (Fig. 4d). The maximum chloride (Cl) concentrations in 2009, 2010, and 2011 were 1290, 933, and 1055 ppm, respectively, at location L.12, with mean values of 482, 353, and 338 ppm in the 3 years. In 2009, 2010, and 2011, the lowest chloride concentrations were 145 ppm (L.1), 123 ppm (L.2), and 104 ppm (L.2).
The measured chloride concentrations in the Euphrates River were within the Iraqi standard (250 ppm), except the values at locations L.11 to L.16 [24] (Fig. 4e). Sulphate (SO4) concentrations varied between 306 and 1234 ppm (at locations L.1 and L.13) in 2009 and between 291 and 1091 ppm (at locations L.3 and L.13) in 2010, with average values of 611 and 545 ppm, respectively. In 2011, SO4 concentrations fluctuated between 296 ppm (at location L.1) and 1156 ppm (at location L.13), with a mean of 510 ppm. The SO4 concentrations at the 16 locations on the Euphrates River were over the Iraqi standard (250 ppm) during these years [24] (Fig. 4f). The TDS concentrations recorded in 2009, 2010, and 2011 varied from 823 to 4188 ppm, 755 to 3354 ppm, and 675 to 3685 ppm, respectively, with mean readings of 1841, 1470, and 1439 ppm. The EC readings ranged over 1304–6292 µmhos/cm (2009), 1111–4718 µmhos/cm (2010), and 1018–5273 µmhos/cm (2011), with average values of 2783, 2135, and 2116 µmhos/cm, respectively. The readings of TDS and EC were over the acceptable Iraqi standards (1000 ppm and 2000 µmhos/cm) [24], except the values at locations L.11–L.16 (Fig. 5a, b). The peak readings of the total hardness (TH) concentration were 1421 ppm (2009), 1152 ppm (2010), and 1183 ppm (2011). The readings of 370, 374, and 349 ppm
Application ArcGIS on Modified-WQI Method to Evaluate Water Quality …
Fig. 4 Concentrations (ppm) along the Euphrates River at locations L.1–L.16 in 2009, 2010, and 2011 for the parameters: (a) Ca; (b) Mg; (c) Na; (d) K; (e) Cl; (f) SO4
A. Chabuk
Fig. 5 Concentrations along the Euphrates River at locations L.1–L.16 in the 3 years (2009–2011) for the parameters: (a) TDS; (b) EC; (c) TH; (d) HCO3; (e) NO3
represented the minimum values of TH. For the years 2009, 2010, and 2011, the mean concentrations were 694, 614, and 577 ppm, respectively. The TH concentrations recorded in this study were over the Iraqi standard of 500 ppm [24] at locations (L.1) to (L.9) in 2009 and 2010, and at locations (L.1–L.10) in 2011 (Fig. 5c). The maximum values of bicarbonate (HCO3) were 171, 151, and 168 ppm in 2009, 2010, and 2011, respectively. The minimum values of HCO3 were 129 ppm (2009), 112 ppm (2010), and 96 ppm (2011). The average values in 2009, 2010, and 2011 were 149, 132, and 139 ppm, respectively. Compared with the Iraqi standards, all readings of HCO3 recorded in this study were below the upper limit (200 ppm) [24] (Fig. 5d). The average values of nitrate (NO3) were 8.56 ppm (2009), 5.01 ppm (2010), and 2.16 ppm (2011). The highest and lowest values were 24.3 and 1.78 ppm in 2009, 11.47 and 1.43 ppm in 2010, and 4.29 and 1.13 ppm in 2011. All readings of NO3 in this study were within the Iraqi standard limit (50 ppm) [24] (Fig. 5e).
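As a rough cross-check of the comparisons above, the river-wide mean concentrations reported for 2009 can be set against the limits quoted from the Iraqi standards [24]. The sketch below is illustrative only: the dictionary and function names are hypothetical, and it compares river-wide means, whereas the text compares readings at individual locations.

```python
# River-wide mean concentrations for 2009 as reported in the text.
# Units are ppm, except EC, which is in µmhos/cm.
MEAN_2009 = {
    "Ca": 114, "Mg": 99, "Na": 340, "K": 9.44, "Cl": 482, "SO4": 510,
    "TDS": 1841, "EC": 2783, "TH": 694, "HCO3": 149, "NO3": 8.56,
}

# Limits quoted in the text from the Iraqi standards [24] (same units).
LIMIT = {
    "Ca": 50, "Mg": 50, "Na": 200, "K": 10, "Cl": 250, "SO4": 250,
    "TDS": 1000, "EC": 2000, "TH": 500, "HCO3": 200, "NO3": 50,
}


def exceedance_ratios(means, limits):
    """Return mean / limit for every parameter (values above 1 exceed the limit)."""
    return {name: means[name] / limits[name] for name in means}


def parameters_over_limit(means, limits):
    """Return the parameters whose mean is strictly above the limit."""
    return [name for name in means if means[name] > limits[name]]


ratios = exceedance_ratios(MEAN_2009, LIMIT)
over = parameters_over_limit(MEAN_2009, LIMIT)

for name, ratio in ratios.items():
    status = "over" if name in over else "within"
    print(f"{name:5s} mean/limit = {ratio:5.2f} ({status})")
```

On these means, K, HCO3, and NO3 stay within their limits, while the remaining eight parameters exceed them; a per-location comparison would need the individual readings plotted in Figs. 4 and 5.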
3.2 Modified Water Quality Index (MWQI)
To estimate the quality of the Euphrates River water at the chosen locations in the 3 years 2009–2011, the weighted arithmetic method was employed after modification.
The average MWQI values of the Euphrates River in the wet season, in the dry season, and over the 3 years were, respectively, as follows: 88.4, 86.9, 87.7 (L.1), 87.9, 85.9, 86.9 (L.2), 87.1, 85.7, 86.4 (L.3), 85.7, 84.2, 84.9 (L.4), 85.1, 83.9, 84.5 (L.5), 84.9, 83.9, 84.4 (L.6), 84.3, 83.7, 84.0 (L.7), 84.2, 83.8, 83.9 (L.8), 84.5, 83.6, 84.0 (L.9), and 83.1, 82.3, 82.7 (L.10). At locations (L.1–L.3), all parameters were within the Iraqi standards for rivers except Ca and SO4, and at locations (L.4–L.10), eight parameters were within the Iraqi standards for rivers, the exceptions being Ca, SO4, and Cl. The average MWQI values for the years 2009–2011 at locations (L.1–L.10) were within the range (80–90), and the water at these locations was classified as good water quality (class C-II). At these locations, the low concentrations of contaminants helped to improve the river water quality index. Furthermore, the increased discharge of the Euphrates River contributed to reducing the concentrations of the physicochemical parameters, since most rivers have a self-purification capacity, which in turn improves the MWQI values. Along the Euphrates River at the locations of Al-Shinafiyah (L.11), Al-Samawah (L.12), Al-Nasiriyah (L.13), Al-Madina (L.14), Al-Izz (L.15), and Al-Qurnah (L.16), the MWQI values for the wet season, the dry season, and the average over the selected 3 years were, respectively, (65.6, 64.6, 65.1), (64.0, 60.4, 62.2), (61.1, 56.5, 58.8), (69.9, 63.4, 66.7), (76.8, 67.7, 72.2), and (78.3, 76.9, 77.6).
Most parameters were higher than the allowable Iraqi standards for rivers at locations (L.11–L.16), except two parameters (HCO3 and NO3) at locations (L.11–L.15) and three parameters (K, HCO3, and NO3) at location (L.16), which were within the Iraqi standards. Table 4 shows the MWQI classes for the 16 locations on the Euphrates River in the two seasons (dry and wet), the seasonal average values of the MWQI, and the averages for the 3 years. The calculated MWQI values at locations (L.11, L.12, L.14, L.15, and L.16) were within the category (60–80). Consequently, the water at these locations was classified as poor water quality (class C-III), because most concentrations of the physicochemical parameters from location (L.11) in Shinafiyah to location (L.16) at Al-Qurnah were over the permissible Iraqi standards. At location (L.13) in Al-Nasiriyah, the MWQI value was within the category (40–60), and this location was classified as very poor water quality (class C-IV) due to the increased concentrations of all parameters there. For the 16 locations along the Euphrates River, the average MWQI in the dry and wet seasons and the average MWQI for the 3 years (2009–2011) can be seen in Fig. 6. Maps of the distribution of the average MWQI values along the Euphrates River, produced with the IDWM for each season (dry and wet) and for the 3-year average (2009, 2010, and 2011), can be seen in Figs. 7, 8 and 9.
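The class boundaries quoted in this section can be expressed as a small lookup. The following is a minimal sketch, not the author's implementation: the function name is hypothetical, only the three classes named in the text are covered, and treating each lower bound as inclusive is an assumption, since the text gives the ranges only as (80–90), (60–80), and (40–60).

```python
def classify_mwqi(value):
    """Map an MWQI value to the class and description used in the text.

    Only the three ranges quoted in this section are handled. The lower
    bound of each range is assumed to be inclusive.
    """
    if 80 <= value <= 90:
        return "C-II", "good"
    if 60 <= value < 80:
        return "C-III", "poor"
    if 40 <= value < 60:
        return "C-IV", "very poor"
    raise ValueError(f"MWQI {value} is outside the ranges quoted in the text")


# 3-year average MWQI values reported for three of the locations.
for location, mwqi in [("L.1", 87.7), ("L.13", 58.8), ("L.15", 72.2)]:
    cls, label = classify_mwqi(mwqi)
    print(f"{location}: MWQI {mwqi} -> class {cls} ({label} water quality)")
```

Values outside 40–90 raise an error rather than guessing a class, because the remaining classes are not defined in this part of the paper.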
Table 4 Classification of MWQI and average values of the Euphrates River, for 3 years (wet, dry, total)
          Class 2009     Class 2010     Class 2011     Average MWQI for two seasons    Average MWQI for 3 years
Location  Wet    Dry     Wet    Dry     Wet    Dry     Wet    Class   Dry    Class     Total   Class
L.1       C-II   C-II    C-II   C-II    C-II   C-II    88.4   C-II    86.9   C-II      87.7    C-II
L.2       C-II   C-II    C-II   C-II    C-II   C-II    87.9   C-II    85.9   C-II      86.9    C-II
L.3       C-II   C-II    C-II   C-II    C-II   C-II    87.1   C-II    85.7   C-II      86.4    C-II
L.4       C-II   C-II    C-II   C-II    C-II   C-II    85.7   C-II    84.2   C-II      84.9    C-II
L.5       C-II   C-II    C-II   C-II    C-II   C-II    85.1   C-II    83.9   C-II      84.5    C-II
L.6       C-II   C-II    C-II   C-II    C-II   C-II    84.9   C-II    83.9   C-II      84.4    C-II
L.7       C-II   C-II    C-II   C-II    C-II   C-II    84.3   C-II    83.7   C-II      84.0    C-II
L.8       C-II   C-II    C-II   C-II    C-II   C-II    84.2   C-II    83.8   C-II      83.9    C-II
L.9       C-II   C-II    C-II   C-II    C-II   C-II    84.5   C-II    83.6   C-II      84.0    C-II
L.10      C-II   C-II    C-II   C-II    C-II   C-II    83.1   C-II    82.3   C-II      82.7    C-II
L.11      C-III  C-III   C-III  C-III   C-III  C-III   65.6   C-III   64.6   C-III     65.1    C-III
L.12      C-III  C-IV    C-III  C-III   C-III  C-IV    64.0   C-III   60.4   C-III     62.2    C-III
L.13      C-IV   C-IV    C-III  C-III   C-III  C-IV    61.1   C-III   56.5   C-IV      58.8    C-IV
L.14      C-III  C-IV    C-III  C-III   C-III  C-III   69.9   C-III   63.4   C-III     66.7    C-III
L.15      C-III  C-III   C-III  C-III   C-III  C-III   76.8   C-III   67.7   C-III     72.2    C-III
L.16      C-III  C-III   C-II   C-III   C-II   C-II    78.3   C-III   76.9   C-III     77.6    C-III
Fig. 6 Average values of the MWQI (wet season, dry season, and both seasons) for 16 locations along the Euphrates River
Fig. 7 Maps of ranges and classification of the MWQI for the wet season (average values), the Euphrates River
4 Conclusions
Given the important role of the Euphrates River, alongside the Tigris River, in supplying water for different uses in the region it passes through, especially in Iraq, the current study aimed to determine the water quality of the Euphrates River. Eleven physicochemical parameters were measured at 16 locations on the river in 3 years (2009–2011) for two seasons. The chosen physicochemical parameters were Cl, SO4, HCO3, NO3, Na, K, Ca, Mg, TH, TDS, and EC.
Fig. 8 Maps of ranges and classification of the MWQI for the dry season (average values), the Euphrates River
Fig. 9 Maps of ranges and classification of average values of the MWQI, for 3 years (2009–2011), the Euphrates River
The readings of all physicochemical parameters (except HCO3) in 2009, 2010, and 2011 along the Euphrates River increased progressively from location (L.1) in Al-Qaim to location (L.10) in Al-Kifl. The increase in all parameter concentrations was more pronounced after location (L.10), especially at locations (L.11–L.14). The concentration of HCO3 decreased gradually from location (L.1) to location (L.10) and then increased after (L.10), particularly at locations (L.11, L.12, L.13, and L.16). To calculate the WQI of the Euphrates River, the weighted arithmetic method was employed after modification, under the new title of the modified weighted arithmetic method (MWAM). The average values of the MWQI for 3 years (total, wet, and dry) were calculated for 16 locations on the Euphrates River. For locations (L.1) to (L.10), the average values of the MWQI (wet, dry, total) were within the range (80–90) and were classified as good water quality (class C-II). At locations (L.11, L.12, L.14, L.15, and L.16), the calculated MWQI values were within the category (60–80), and the water quality was classified as poor (class C-III). The water quality at location (L.13) in Al-Nasiriyah governorate was classified as very poor (class C-IV), within the range (40–60), where all concentrations of the physicochemical parameters measured at this location were above the Iraqi standard limits.
The MWQI of the river from the Shinafiyah location (L.11) to location (L.16) at Al-Qurnah (except L.13 at Al-Nasiriyah) was classified as poor water quality (class C-III) because most measured parameter values at these locations (L.11–L.16) were over the allowable Iraqi standards. The MWQI value at location (L.13) was within the category (40–60), and this location was classified as very poor water quality (class C-IV) due to the increased concentrations of the measured parameters there. The 3-year average MWQI values and the average values in the wet and dry seasons at the 16 locations were used to produce prediction maps along the whole route of the Euphrates River, using the inverse distance weighting interpolation method in GIS. In general, the novelty of this study has two parts. The first is the study of the water quality of the Euphrates River along its total length, using 11 physicochemical parameters measured at 16 locations. The second is the application of a modified method to calculate the water quality index (MWQI) of the Euphrates River in Iraq for 3 years in the dry and wet seasons, together with the creation of distribution (prediction) maps of the MWQI values in both seasons to evaluate the quality of each part of the river for drinking uses. These maps can support future studies of the water quality of the Euphrates River at any location along it.
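The idea behind inverse distance weighting can be illustrated with a short, generic sketch. It is not the ArcGIS tool used in the study: the function name, the power parameter of 2, and the planar coordinates of the sample points are all assumptions made for illustration, while the MWQI values are the 3-year averages reported for L.12, L.13, and L.14.

```python
import math


def idw_estimate(x, y, samples, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting.

    samples is a list of (xi, yi, value) tuples. Each sample is weighted
    by 1 / distance**power, so nearer samples dominate the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            # The query point coincides with a sample: return it exactly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical coordinates (arbitrary units along a straight reach);
# MWQI values are the reported 3-year averages for L.12, L.13, and L.14.
samples = [(0.0, 0.0, 62.2), (10.0, 0.0, 58.8), (20.0, 0.0, 66.7)]

estimate = idw_estimate(5.0, 0.0, samples)
print(f"Estimated MWQI midway between L.12 and L.13: {estimate:.1f}")
```

An IDW estimate always lies between the smallest and largest sampled values, so the resulting maps smooth between stations and cannot predict local extremes that were not measured.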
References
1. Adamo N, Al-Ansari N, Sissakian VK, Knutsson S, Laue J (2018) Climate change: consequences on Iraq’s environment. J Earth Sci Geotechn Eng 8:43–58
2. Trondalen JM (2008) Water and peace for the people: possible solutions to water disputes in the Middle East. (Water and Conflict Resolution) (French Edition), UNESCO, Illustrated edition, 245
3. IOM, Iraq (2020) Water quantity and water quality in central and south Iraq: a preliminary assessment in the context of displacement risk. International Organization for Migration (IOM), IOM, Iraq 24
4. Al-Obeidi AHA (2017) Study and evaluate the causes of the Euphrates river water salinization in middle and Southern Iraq. M.Sc. Thesis, College of Agricultural, University of Al-Muthanna, Iraq
5. Jehad AK (1984) Effect of Tharthar canal salty water on the quality of Euphrates water. M.Sc. Thesis. University of Technology, Bagdad, Iraq
6. Al Bomola A (2011) Temporal and spatial changes in water quality of the Euphrates river-Iraq. TVVR11/5013, p 147
7. Shahin M (2007) Water resources and Hydrometeorology of the Arab Region. Springer Science & Business Media 59
8. UN-ESCWA and BGR (United Nations Economic and Social Commission for Western Asia; Bundesanstalt für Geowissenschaften und Rohstoffe) (2013) Inventory of Shared Water Resources in Western Asia, Beirut, 32
9. Abdullah AD (2016) Modelling approaches to understand salinity variations in a highly dynamic Tidal River: The case of the Shatt Al-Arab River. CRC Press/Balkema, Netherlands 140
10. Balciogullari A (2018) The Euphrates according to medieval Islamic geographers. The Eurasia proceedings of educational and social sciences 10:261–268
11. Grego S, Micangeli A, Esposto S (2004) Water purification in the Middle East crisis: a survey on WTP and CU in Basrah (Iraq) area within a research and development program. Desalination 165:73–79
12. Al-Tikrity HN (2001) Forecasting of pollution levels in accordance with discharge reduction in selected area on Euphrates river. Doctoral dissertation, M.Sc. Thesis, College of Engineering, University of Baghdad, Baghdad, Iraq
13. Abbas AAA, Hassan FM (2018) Water quality assessment of Euphrates river in Qadisiyah province (Diwaniyah river), Iraq. Iraqi J Agric Sci 48(6)
14. Abdullah SA, Abdullah AHJ, Ankush MA (2019) Assessment of water quality in the Euphrates River, Southern Iraq. Iraqi J Agric Sci 50(1):312–319
15. Al-Heety E, Turky A, Al-Othman E (2011) Physico-chemical assessment of Euphrates river between Heet and Ramadi cities, Iraq. J Water Resour Prot 3(11):812–823
16. Central Intelligence Agency (C.I.A.) (2019) The World Factbook, Middle East: Iraq. Main Content, Home Library Publications. https://www.cia.gov/library/publications/the-world-factbook/geos/iz.html#photoGalleryModal. Accessed 21 Aug 2020
17. Al-Ansari N (2013) Management of water resources in Iraq: perspectives and prognoses. Engineering 5:667–684
18. Al-Ansari N, Adamo N, Sissakian V, Knutsson S, Laue J (2018) Water resources of the Euphrates river catchment. J Earth Sci Geotechn Eng 8:1–20
19. The World Bank (2006) Iraq–Country Water Resource Assistance Strategy: Addressing Major Threats to People’s Livelihoods. Water, Environment, Social and Rural Development Department, Middle East and North Africa Region. http://documents.worldbank.org/curated/en/944501468253199270/pdf/362970IQ.pdf. Accessed 25 July 2020
20. Al-Jiburi HK, Al-Basrawi NH (2015) Hydrogeological map of Iraq, scale 1: 1000 000, 2013. Iraqi Bullet Geol Mining 11(1):17–26
21. Frenken K (2009) Irrigation in the Middle East region in figures, AQUASTAT Survey-2008. FAO Water Reports. Published by Food and Agriculture Organization of the United Nations (FAO), Rome, Italy, p 34. ISSN 1020–120
22. National-Center-of-Water-Resources-Management (NCWoRM) Water Quality Study of Main Rivers in Iraq, Ministry of Water Resources, Iraq, annual internal report
23. Tyagi S, Sharma B, Singh P, Dobhal R (2013) Water quality assessment in terms of water quality index. American J Water Resour 1:34–38
24. Japan International Cooperation Agency (JICA) (2011) Profile on Environmental and Social Considerations in Iraq. Law No. 25 System of Rivers and Other Water Resources Protection (Include of 45 Pollutants)
25. Alsaqqar AS, Hashim A, Mahdi A (2015) Water quality index assessment using GIS case study: Tigris River in Baghdad City. Int J Curr Eng Technol 5(4):2515–2520
26. Longley PA, Goodchild MF, Maguire DJ, Rhind DW (2005) Geographic information systems and science, 2nd edn. John Wiley & Sons, England
27. Chang KT (2006) Introduction to geographic information system. McGraw-Hill Higher Education, Boston
28. Panhalkar SS, Jarag AP (2015) Assessment of spatial interpolation techniques for river Bathymetry generation of Panchganga river basin using geoinformatic techniques. Asian J Geoinformatics 15:10–15
Information Retrieval and Analysis of Digital Conflictogenic Zones by Social Media Data
Maria Pilgun and Alexander A. Kharlamov
Abstract The paper is concerned with information retrieval and analysis of digital conflictogenic zones based on social media data reflecting the users’ perception of road construction in Moscow. The material for the study was data from social networks, microblogs, blogs, instant messengers, videos, forums, and reviews dedicated to the construction of the South-East, North-East, and North-West Chords in Moscow. The study involved a transdisciplinary approach, neural network text analysis, content analysis, sentiment analysis, and analysis of word associations. The study made it possible to draw conclusions about the extremely tense situation around the construction of the South-East Chord. The level of aggression and social stress is quite high and approaching a critical point, and allows predicting further escalation of the conflict in the online and offline space. The construction of the North-East Chord causes some tension among the city residents and makes it possible to predict the development of a conflict in the virtual environment. The implementation of the North-West Chord project does not bear any special risks.
Keywords Conflict · Social media · Neural network technologies · Psycholinguistics
M. Pilgun (B) Institute of Linguistics, RAS, Moscow, Bolshoy Kislovsky lane, 1 p. 1, Moscow 125009, RF, Russia e-mail: [email protected] A. A. Kharlamov Institute of Higher Nervous Activity and Neurophysiology, RAS, 5A Butlerova St., Moscow 117485, RF, Russia M. Pilgun · A. A. Kharlamov Moscow State Linguistic University, Moscow 119034, RF, Russia Higher School of Economics, Moscow 101000, RF, Russia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_59
1 Introduction
Social media data is actively used in the area of Information Retrieval to solve various problems, for example, to determine the attendance of major events by users [1], to predict the general mood and attitude to a particular topic [2], to develop recommendation algorithms that can be applied to decentralized social networks for community building [3], etc. The analysis of conflicts in the digital environment is a vital task and is performed using various approaches, in particular, intelligent text analysis, machine learning, and agent-based modeling [4]. Social media data is becoming an important source for the analysis of social processes and for models that automatically extract behavioral characteristics and citizens’ perception of certain events and phenomena, since it allows real-time research on large data sets [5]. Objective of the study: information retrieval and analysis of digital conflictogenic zones based on social media data reflecting the users’ attitude to the construction of the South-East, North-East, and North-West Chords in Moscow. The identification and analysis of conflictogenic zones was complicated, in particular, by the fact that the analyzed objects are of varying interest to the users who generate the content; accordingly, separate data sets of different sizes were collected for the South-East, North-East, and North-West Chords. Despite their similarity, different communication situations naturally arise around the construction sites, for objective and subjective reasons. To optimize the comparative analysis of digital conflictogenic zones and to increase the efficiency of assessing the situation, the social stress index and the social well-being index were calculated. When analyzing social stress, the authors relied on [6]; when analyzing social well-being, they relied on [7]. The two indices were calculated using a single algorithm presented below. The only difference is that for the social stress index the markers were assigned ranks with negative values, and for the social well-being index, with positive ones.
1.1 Data
The material for the study was the data of social media, microblogs, blogs, instant messengers, videos, forums, and reviews dedicated to the road construction in Moscow (South-East Chord, North-East Chord, and North-West Chord). Date of collection: 07.01.2019 00:00–12.31.2019 23:59 (Table 1).
Table 1 Quantitative parameters of the data set
Chord name     South-East     North-East     North-West
Messages       12 456         1 693          602
Actors         7 459          1 149          433
Loyalty        0,1            0,9            0,9
Involvement    144 028        16 006         7 858
Audience       97 070 580     17 688 221     6 822 650
1.2 Method
The study involved a transdisciplinary approach, neural network text analysis, content analysis, sentiment analysis, and analysis of word associations. Data collection and sentiment analysis were performed using the Brand Analytics (br-analytics.ru) algorithms. The neural network technology TextAnalyst (analyst.ru/index.php?lang=eng&dir=content/products/&id=ta), developed by one of the authors of the paper, A. Kharlamov, was used to identify and analyze the topic structure of the database and the semantic network obtained as a result of the analysis of the text as its semantic portrait, and to perform associative search and summarization. The content analysis was performed using the AutoMap service (casos.cs.cmu.edu/projects/automap/). For visual analytics, the Tableau platform (https://www.tableau.com/) was used. Indices of social stress and well-being were also identified, and digital aggression was analyzed.
Algorithm for deriving indices of social stress and well-being
1. Separating users’ comments.
2. Compilation of a marker list based on sentiment analysis, summarization, and semantic network analysis.
2.1. Sentiment analysis and content clustering by sentiment.
2.1.1. Summarization of the negative and positive clusters.
2.1.2. Formation of a semantic network for the negative and positive clusters.
2.1.3. Identification of markers from the negative cluster (for the social stress index).
2.1.4. Identification of markers from the positive cluster (for the index of social well-being).
2.2. Compiling an expert list of markers.
2.3. Combining the two obtained lists into a single dictionary.
3. Setting up the neural technology taking into account the resulting dictionary.
4. Derivation of indices of social stress and well-being of the analyzed content according to the formula $\% = \frac{R}{R_{\max}} \cdot \frac{B}{B_{\max}}$, where $R$ and $B$ are the current, and $R_{\max}$ and $B_{\max}$ the maximum, values of ranks and weights (in %), respectively (see also [8]).
Algorithm for identification of digital aggression
Since the identification of digital aggression was the leading factor in the search for digital conflictogenic zones, the automatic determination of aggression was performed sequentially using two methods, which made it possible to verify that the selected procedures were correct:
• According to the method of Solovyov [9].
• According to the methodology of the authors of this paper, based on the analysis of lexical markers of aggression determined by experts.
The lexical tags $l_i$ making up the lexical mask $L = \{l_i\}$, $i = 1..I$ [8], were automatically ranked within the analyzed text corpus by determining their semantic weight $r_i$ in this corpus [8]. Summing the ranks of the lexical tags, weighted by the degree of their significance $w_i$ in this subject domain as assigned by an expert, gives the integral degree of aggression present in the texts of the corpus: $W = \sum_{i=1}^{I} r_i w_i$.
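The weighted sum described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' TextAnalyst-based implementation: the function name, the example tags, and all rank and weight values are hypothetical and serve only to show how the ranks $r_i$ and expert weights $w_i$ combine into $W$.

```python
def aggression_degree(ranks, expert_weights):
    """Return W, the sum over lexical tags of rank * expert weight.

    ranks maps each lexical tag to its semantic weight (rank) in the corpus;
    expert_weights maps each tag to its expert-assigned significance.
    """
    missing = set(ranks) - set(expert_weights)
    if missing:
        raise KeyError(f"no expert weight for tags: {sorted(missing)}")
    return sum(ranks[tag] * expert_weights[tag] for tag in ranks)


# Hypothetical lexical mask: tags, corpus ranks, and expert weights
# are invented for illustration only.
ranks = {"lawlessness": 80.0, "genocide": 100.0}
expert_weights = {"lawlessness": 0.5, "genocide": 1.0}

W = aggression_degree(ranks, expert_weights)
print(f"Integral degree of aggression W = {W}")
```

Raising an error for a tag without an expert weight keeps the mask and the expert list in sync, which matters because a silently skipped tag would lower $W$.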
2 Results and Discussion
The identification of the topic structure of the database, the semantic network, associative search, summarization, and content analysis made it possible to identify the core of the digital conflictogenic zone for the content associated with each object. The core of the semantic network is made up of nominations with a link weight of 100–98 (Table 2). Also, the main topics were identified that form digital conflictogenic zones reflecting particular problems that concern the actors, and the negative attitude of citizens to the implementation of these road construction projects.
Table 2 Core of the digital conflictogenic zone (link weights of 100–98)
South-East Chord: Death (South-East Chord) near the radioactive waste disposal site (plant-burial site); Radioactive dust; Moscow Chernobyl; Ecological genocide; Genocide of indigenous people and local residents; War; Radioactive contamination; Radiation measurements; Lawless actions of the Moscow authorities; Construction lawlessness; Ghetto; Meweagainstchords
North-East Chord: Protest; Rally; Rally-concert; Eco-protest; Landfill; Eko-watch
North-West Chord: Deforestation; Traffic jams; Noise
South-East Chord
The danger of an ecological catastrophe, which, according to the actors, will lead to the physical death of residents; rejection of the actions of government entities during project implementation; environmental protest; combining the problems of renovation, expanding the boundaries of Moscow and road construction; accusing the builders of deceiving the population, conviction that the implementation of the project will lead to a deterioration of the transport accessibility situation and a sharp decline in the living standards of Muscovites; combining the problems of different regions into a single environmental protest; distrust of the expertise that builders provide.
The content analysis revealed an extremely high level of the users’ negative assessment of the situation. Positive assessments are contained exclusively in official materials intended to emphasize the advantages of the project, benefits for residents, and improvement of the transport situation. Meanwhile, the negative attitude of residents to the project is so great that all official information is perceived with skepticism, for each positive argument of the authorities, activists put forward counterarguments that are supported by data and alternative opinions of experts.
North-East Chord
The danger to the life of citizens; problems with the Lyubertsy sewer collector; threat of destruction of parks and squares; threat of destruction of architectural monuments; growth of social tension; danger of a social explosion; danger of a technogenic and ecological disaster; danger of an ecological disaster; degradation of the living standards of local residents; increased background radiation; an increase in the number of oncological diseases; deterioration of transport links within the area; lower living standards of local residents.
The analysis revealed a weaker degree of the conflict potential, in comparison with the previous case, in the users’ content reflecting the actors’ negative attitude to the situation around the North-East Chord.
North-West Chord
Deterioration of the ecological situation, growth of oncological diseases, deterioration of the quality of life, deterioration of the transport situation (low speed, traffic jams). Key topics of the content represent the official versions of construction and are related to similar projects and represent the North-West Chord as part of a new transport project.
The sentiment analysis of the data showed the predominance of the neutral cluster in all three situations. This situation is explained by the large amount of content generated by official and biased sources. Meanwhile, a significant volume of the cluster with a negative sentiment also confirmed the conclusion about the extremely tense situation around the construction of the South-East Chord and the middle
position of the North-East Chord, and showed an insignificant number of negative messages in the content on the North-West Chord (Figs. 1 and 2). Clustering and sentiment analysis of digital footprints of actors show that posts, likes, reposts, and views with negative sentiment dominate in the content regarding the construction of the South-East Chord. In most cases, users’ digital footprints with regards to the implementation of the North-East Chord project refer to the neutral cluster; however, a significant number of comments, reposts, likes, and views demonstrate a negative attitude of some users. The digital footprints of actors associated with the North-West Chord are insignificant and constitute a predominantly neutral cluster (Fig. 3). Meanwhile, it should be noted that for the determination of digital conflictogenic zones, the most effective is the method used to determine the presence of aggression, which indicates the extreme degree of the users’ rejection.
Fig. 1 Message sentiment
Fig. 2 Content sentiment
Fig. 3 Digital footprint sentiment
Thus, the presence of aggression and strong aggression from the audience of the content regarding the South-East Chord suggests that the degree of aggression is so great that attempts to influence by rational methods cannot be successful; they are perceived as just another attempt to thwart the residents’ rights. The audience for content related to the North-East Chord shows few reactions with the presence of aggression. The attitude of the audience for the North-West Chord is characterized by little aggression (Fig. 4). Aggression in the digital footprints of actors most clearly manifests the negative attitude of the users and makes it possible to identify specific features of the conflictogenic zone. Thus, users’ digital footprints in the content devoted to the construction of the South-East Chord contain a significant amount of aggression and strong aggression; in the content regarding the North-West Chord, actors’ digital footprints do not contain aggression at all; and in the content defining the conflictogenic zone with respect to the North-East Chord, there is an insignificant number of digital footprints with pronounced aggression and strong aggression (Fig. 5). During the study of social media data characterizing the communication situation regarding the construction of the South-East, North-East and North-West Chords,
Fig. 4 Aggression characterizing the audience
Fig. 5 Aggression in users’ digital footprints
Table 3 Social stress and social well-being indices
Object name         Social stress index    Social well-being index
South-East Chord    39.3                   19.7
North-East Chord    15.6                   35.1
North-West Chord    0.4                    19.5
the indices of social stress and social well-being were calculated, which made it possible to analyze the users’ attitude and to clarify the features of conflictogenic zones (Table 3). The analysis showed that expressive means of influence are actively used in the content generated by South-East Chord activists and skeptics that forms a conflictogenic zone; the emotional negative attitude is extremely intense; the online conflict and protests move to the offline space. In such a situation, the criterion of truth is equated to the type of assessment: any negative information is perceived as true, and positive information is perceived as knowingly false. The content that forms the conflictogenic zone of the North-East Chord is characterized by the active involvement of actors who are ready to take action mainly in the virtual environment. The conflictogenic zone regarding the North-West Chord can only be discussed conditionally due to its small size.
3 Conclusion During the study, digital conflictogenic zones were identified and analyzed on the basis of social media data reflecting the users’ attitude to the construction of the South-East, North-East, and North-West Chords in Moscow. The social stress and social well-being indices made it possible to characterize the communication situation, taking into account the level of the conflict potential and
the dynamics of digital aggression, to highlight the critical points of the development of the conflict and to determine the degree of the actors' satisfaction with the progress of the construction projects. The analysis of digital conflictogenic zones points to an extremely tense situation around the construction of the South-East Chord. The level of aggression and social stress is quite high and approaching a critical point, which allows predicting further escalation of the conflict in the online and offline space. The construction of the North-East Chord causes some tension among the city residents and makes it possible to predict the development of a conflict in the virtual environment. The implementation of the North-West Chord project does not bear any special risks.
Introducing a Test Framework for Quality of Service Mechanisms in the Context of Software-Defined Networking
Josiah Eleazar T. Regencia and William Emmanuel S. Yu
Abstract In the traditional networking architecture, supporting Quality of Service (QoS) has been challenging due to its centralized nature. Software-Defined Networking (SDN) provides dynamic, flexible, and scalable control and management for networks. This study introduces a test framework for testing QoS mechanisms and network topologies inside an SDN environment. Class-Based Queueing (CBQ) QoS mechanisms are tested as an anchor to test the introduced framework. Using a previous study as a benchmark, results show that the test framework works as intended and is capable of producing accurate results. Moreover, results in this study show that the distributed Leaf-enforced QoS mechanisms have 11% lower latency compared to the traditional centralized Core-enforced QoS mechanisms. Leaf-enforced QoS also has approximately 0.22% more raw IP throughput than Core-enforced QoS. The HTTP throughput from the Apache Bench Transfer Rate showed Leaf-enforced QoS with a 2.4% advantage over Core-enforced QoS. SDN is relatively new and there are many possible QoS strategies that can be applied and tested. These initiatives can benefit from an extensible testing framework.
Keywords Software-defined networks · Quality of service · Class-based queueing · Web traffic · Streaming traffic
J. E. T. Regencia (B) · W. E. S. Yu, Ateneo de Manila University, Loyola Heights, 1108 Quezon City, Philippines
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_60
1 Introduction
The Software-Defined Networking (SDN) architecture is a relatively new technology that provides dynamic, flexible, and scalable control and management for networks by separating the control plane and the data plane. The control plane handles the decision-making of the network while the data plane forwards packets [10]. As more networking applications rapidly evolve and the number of devices connected to the internet rapidly increases, managing networks for Quality of Service (QoS) continues to become more challenging with the traditional networking architecture, for reasons such as lack of flexibility and high costs. The SDN architecture is capable of addressing the challenges that the traditional networking architecture faces [9, 12]. This study introduces a testing framework for QoS algorithms and network topologies. As an anchor to test the framework, four (4) Class-Based Queueing (CBQ) algorithms were tested. Each mechanism has two test cases. The first is the traditional networking approach, Core-enforced QoS, where QoS is enforced at a single core switch. The second is Leaf-enforced QoS, the SDN approach, where QoS is distributed across the forwarding devices closer to the edge, the client leaf switches. Intuitively, Core-enforced QoS should perform better since only a single forwarding device is enforced with QoS. However, given the benefits of SDN [9, 12], the goal for the simulations is for Leaf-enforced QoS to have results comparable with Core-enforced QoS. Hence, a Student's t-test is performed for every CBQ algorithm comparison of Core-enforced versus Leaf-enforced. If the resulting p-value is greater than α = 0.05, then the results are considered comparable, satisfying the goal of the simulations. SDN is relatively new and there are many possible QoS strategies that can be applied and tested. These initiatives can benefit from an extensible testing framework. The rest of this paper is organized as follows: Sect. 2 gives a brief literature review on SDN, Class-Based Queueing, and Quality of Service in SDN. Section 3 discusses the framework of the research, in particular the architecture of the introduced test framework and how the framework is set up for simulations.
Section 4 discusses the methodology of this research and also the tools used for simulations. Section 5 discusses the results from the simulations. Lastly, this paper is concluded in Sect. 6.
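The comparability criterion described above can be sketched as follows. This is a minimal illustration and not the authors' analysis script: the sample values are made up, the function names are hypothetical, and the p-value uses a normal approximation to Student's t distribution (an assumption that is reasonable only for large samples) so that the sketch needs nothing beyond the standard library.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

ALPHA = 0.05  # significance level used in this study


def two_sample_t(a, b):
    """Return the pooled-variance two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def approx_p_value(t):
    """Two-sided p-value from a normal approximation (assumption: large samples)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


def comparable(leaf, core, alpha=ALPHA):
    """Results count as comparable when the difference is not significant."""
    t, _ = two_sample_t(leaf, core)
    return approx_p_value(t) > alpha


# Hypothetical per-second throughput samples, not data from this study.
leaf = [100.0, 102.0, 98.0, 101.0, 99.0] * 12
core = [100.5, 101.5, 98.5, 100.0, 99.5] * 12
print(comparable(leaf, core))
```

For small samples, an exact t distribution (for example from a statistics library) would be the safer choice for the p-value.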
2 Related Literature
Multiple studies have started exploring Quality of Service (QoS) mechanisms with Software-Defined Networking (SDN). Chato and Yu used the Mininet emulator [5] and a custom Pox OpenFlow Controller to explore a distributed QoS for SDN by studying the effects of Class-Based Queueing (CBQ) QoS algorithms on raw IP throughput [2] and on latency [3]. Both studies showed that QoS algorithms applied to the distributed nature of SDN have better performance in terms of bandwidth and decreased latency. Civanlar et al. [4] introduced a linear programming-based formula to calculate routes specifically for video traffic flows while routing other traffic flows using best-effort traffic on shortest paths. OpenQoS [6] uses packet header fields to classify incoming flows as either multimedia flows or data flows, sending multimedia flows to QoS-rich paths while data flows are routed using best-effort routing. Ishimori et al. introduced QoSFlow, which provides flexible control mechanisms by manipulating
multiple packet schedulers such as Hierarchical Token Bucket (HTB), Random Early Detection (RED), and Stochastic Fairness Queuing (SFQ) schedulers [8].
3 Framework
3.1 Testing Framework Architecture
In this study, a testing framework is designed based on the methodology of the Chato and Yu [2, 3] studies. The aim of the framework is to make it easier to add configurations for new Quality of Service (QoS) algorithms for testing than in the current setup. The key component of the framework is the separation of the SDN Controller from the implementation of the QoS algorithms. Figure 1 shows the architecture of the introduced testing framework. The use of separate configuration files is another key component of this framework. The Topology Configuration defines the number of client hosts and the number of layers of client leaf switches. It is used by the Mininet topology to create the virtual network topology. The Class Profile Configuration File contains the Class Profile test code, the function name for the Class Profile as defined inside the CBQ Implementations Python File, the description of the QoS profile, and its test case (whether QoS is core-enforced or leaf-enforced). It is used by the QoS Implementations Python File to serialize the QoS mechanisms defined, and by the Test Simulator for results filenames. Both the Topology Configuration and the Class Profile Configuration File are manually set by the user. The Load Configuration File, Source Queue Grouping Configuration File, Hosts Configuration File, and Client Switch Mapping Configuration File are designed as custom configuration files that contain details about the network and QoS configuration assignments in order to aid the QoS algorithms designed by network researchers. All these custom configuration files are serialized into a Python Pickle File, which is deserialized by the test simulator and the SDN controller. A researcher may create his or her own custom configuration files.
All QoS algorithms in this study are written inside the QoS Implementations File, where each QoS algorithm is implemented as a function and serialized into a Python pickle. The SDN Controller is started with a class profile code that should be present in the Class Profile Configuration File. Using this class profile code, the Controller deserializes (unpickles) the QoS function pickle with the matching class profile code in order to implement the desired QoS mechanism. This is performed only during the initialization phase of the controller in order to avoid unnecessary additional CPU cost. After each simulation, the Test Simulator separately saves results from the tools used in this framework: Ifstat, Ping, ApacheBench, and VLC streaming in the case of this study.
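The lookup of a QoS mechanism by class profile code might be sketched as below. This is an illustration under stated assumptions, not the framework's code: the profile codes, field names, and the two classifier functions are hypothetical, and the sketch pickles the configuration and resolves the function by name through a registry, which is a simplification of pickling the QoS function itself.

```python
import pickle


# Hypothetical QoS implementations; real ones would install queues and flows.
def basic_cbq(packet):
    """Classify by traffic protocol: HTTP, streaming (RTSP), or other."""
    port = packet.get("dst_port")
    if port == 80:
        return "http"
    if port == 554:
        return "streaming"
    return "other"


def source_cbq(packet):
    """Classify by source IP group (grouping by last-octet parity is made up)."""
    return "group-%d" % (int(packet["src_ip"].rsplit(".", 1)[1]) % 2)


# Registry mapping function names in the configuration to implementations.
QOS_FUNCTIONS = {"basic_cbq": basic_cbq, "source_cbq": source_cbq}

# Class profile configuration: test code -> function name, description, test case.
CLASS_PROFILES = {
    "B-LEAF": {"function": "basic_cbq", "description": "Basic CBQ", "case": "leaf"},
    "S-CORE": {"function": "source_cbq", "description": "Source CBQ", "case": "core"},
}


def serialize_profiles(profiles):
    """Serialize the configuration once, before the controller starts."""
    return pickle.dumps(profiles)


def load_qos(blob, profile_code):
    """Run once at controller start-up: resolve the QoS function for a code."""
    profile = pickle.loads(blob)[profile_code]
    return QOS_FUNCTIONS[profile["function"]], profile["case"]


blob = serialize_profiles(CLASS_PROFILES)
classify, case = load_qos(blob, "B-LEAF")
print(case, classify({"dst_port": 80}))
```

Resolving the function only once, at start-up, mirrors the point made above about keeping deserialization out of the packet-handling path.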
Fig. 1 Test framework architecture
3.2 Conceptual Framework
This study was performed using an Amazon Web Services (AWS) EC2 m5.xlarge Ubuntu 18.04 environment with 40 GB of storage. The network topology was implemented using Mininet version 2.3 [7] along with a custom Ryu OpenFlow 1.3 Controller [1]. The topology setup used in this study is shown in Fig. 2. In this study, the implementation of Quality of Service (QoS) in the client leaf switches represents the distributed nature of Software-Defined Networking (SDN). In addition, all 70 client hosts request HTTP and streaming services using both Apache Bench and VLC on each host. Three (3) of the servers act as HTTP servers running Python3 http.server and the other three (3) act as Video on Demand (VOD) servers running VLC Streaming Media using VLC version 3.8. All links in the network run at the default 10 Gbps of Mininet.
Fig. 2 Virtual network topology used
3.3 Theoretical Framework
3.3.1 Clients Configuration
As mentioned in the Conceptual Framework (Sect. 3.2), all 70 client hosts execute both HTTP and VLC streaming requests. Each client host is assigned either the low or the high configuration, which determines the file or streaming media it requests. Table 1 shows the file size and video resolution for each configuration. The client hosts are divided as equally as possible among the servers, for both the HTTP and the VLC streaming servers. Each client host sends requests to a single HTTP server and a single VLC streaming server.
Table 1 Servers setup
Server | Server IP address | HTTP request file size, low test case | HTTP request file size, high test case
Server 1 | 10.0.1.101 | 100 KB | 10 MB
Server 3 | 10.0.1.103 | 16 MB | 100 MB
Server 5 | 10.0.1.105 | 100 MB | 1 GB
Server | Server IP address | Streaming media video resolution, low test case | Streaming media video resolution, high test case
Server 2 | 10.0.1.102 | 360p | 480p
Server 4 | 10.0.1.104 | 480p | 720p
Server 6 | 10.0.1.106 | 720p | 1080p
Apache Bench (ab) is used to send HTTP requests. It is configured to send 50,000 HTTP requests over 10 concurrent connections. For the media streaming requests, the VLC media player is used to make streaming video requests. This is done through the use of VLC Remote Control configuration and a telnet session.
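The client setup described in this subsection can be sketched as follows. The host names (`h1` to `h70`) and the file name are hypothetical, and round-robin is only one way to divide clients "as equally as possible"; the `-n` and `-c` options are the standard ApacheBench flags for total requests and concurrency.

```python
from collections import Counter


def assign_servers(num_clients, servers):
    """Round-robin assignment so per-server client counts differ by at most one."""
    return {f"h{i + 1}": servers[i % len(servers)] for i in range(num_clients)}


def ab_command(server_ip, filename, requests=50000, concurrency=10):
    """Build an ApacheBench command line (-n total requests, -c concurrency)."""
    return ["ab", "-n", str(requests), "-c", str(concurrency),
            f"http://{server_ip}/{filename}"]


HTTP_SERVERS = ["10.0.1.101", "10.0.1.103", "10.0.1.105"]  # from Table 1

assignment = assign_servers(70, HTTP_SERVERS)
print(Counter(assignment.values()))
# "low.jpg" is a made-up file name for the low test case.
print(" ".join(ab_command(assignment["h1"], "low.jpg")))
```

The same assignment function could be reused for the three VLC streaming servers.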
3.3.2 Server Configuration
The six (6) server hosts are divided into three (3) HTTP servers and three (3) VOD servers. Each HTTP server holds two (2) JPEG files of different sizes for the low and high configuration requests. Python3 http.server is used to host each HTTP server. All VOD servers use VLC as their streaming server over the Real Time Streaming Protocol (RTSP). Each VOD server has two (2) videos with the same content but different video resolutions.
3.3.3 Class Profiles
Class profiles in this study define the type of Class-Based Queueing (CBQ) algorithm implemented. These are defined in Table 2. In this study, Basic CBQ classifies traffic as HTTP traffic, streaming traffic, and all other remaining traffic types. Prioritization and limiting between host groupings ensure that high traffic within a source IP group does not adversely affect others [2]. In this study, "at the Leaf" QoS mechanisms are referred to as Leaf-enforced QoS, and "at the Core" QoS mechanisms are referred to as Core-enforced QoS.
Table 2 Class profiles
Class profile | CBQ scheduling classes | Switch QoS
Basic CBQ at Leaves | Traffic Protocol | Client Leaf Switches
Basic CBQ at Core | Traffic Protocol | Core Switch
Source CBQ at Leaves | Source IP Address Grouping | Client Leaf Switches
Source CBQ at Core | Source IP Address Grouping | Core Switch
Destination CBQ at Leaves | Destination IP Address Grouping | Client Leaf Switches
Destination CBQ at Core | Destination IP Address Grouping | Core Switch
Source-Destination CBQ at Leaves | Source and Destination IP Address Grouping | Client Leaf Switches
Source-Destination CBQ at Core | Source and Destination IP Address Grouping | Core Switch
3.3.4 Quality of Service (QoS) Configurations
Figure 2 shows which switches are the Client Leaf Switches and which switch is the Core Switch. Each switch that is enforced with QoS is allocated a 1 Gigabit bandwidth queue with three (3) queues inside it. Each of the three (3) queues is allocated a minimum and maximum bandwidth of 0.33 Gigabits.
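One way such a queue layout could be expressed is sketched below. It assumes the switches are Open vSwitch instances configured with `ovs-vsctl` and the `linux-htb` QoS type, which the text above does not state; the port name `s1-eth1` and the function name are hypothetical.

```python
def ovs_qos_command(port, parent_bps=1_000_000_000, queue_bps=330_000_000,
                    num_queues=3):
    """Build an ovs-vsctl command: one parent QoS with equally sized child queues."""
    if num_queues * queue_bps > parent_bps:
        raise ValueError("queue rates exceed the parent rate")
    queue_refs = [f"queues:{i}=@q{i}" for i in range(num_queues)]
    cmd = ["ovs-vsctl", "set", "port", port, "qos=@newqos", "--",
           "--id=@newqos", "create", "qos", "type=linux-htb",
           f"other-config:max-rate={parent_bps}", *queue_refs]
    for i in range(num_queues):
        # Minimum and maximum are equal, so each queue gets a fixed 0.33 Gbit share.
        cmd += ["--", f"--id=@q{i}", "create", "queue",
                f"other-config:min-rate={queue_bps}",
                f"other-config:max-rate={queue_bps}"]
    return cmd


print(" ".join(ovs_qos_command("s1-eth1")))
```

Three queues of 0.33 Gigabits sum to 0.99 Gigabits, which is why the check against the 1 Gigabit parent passes.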
4 Methodology
In order to evaluate the test framework, this study simulates web traffic using Apache Bench, streaming traffic using VLC Streaming Media, and ICMP Ping packets using the test framework for each Class Profile listed in Table 2 within a 5 min window. The following tools were used concurrently to perform the simulations and collect data. Ifstat is a tool used to get Bandwidth In and Bandwidth Out for every second during the 5 min window. It was executed at the Core Switch (switch3-eth3) for all Class Profiles. The Ping command is used to record the Round Trip Time (RTT), or latency. RTT was measured in milliseconds (ms) by sending Ping packets to all destination servers from each of the 70 client hosts. Each destination server has its own measurement, but the overall result was calculated by taking the mean of the results of all destination servers. ApacheBench was used to simulate the HTTP traffic to the web servers. The data taken from this tool were the transfer rate and the total data transferred. The Theoretical Framework (Sect. 3.3.1) shows specifically how the Apache Bench simulation was performed. From the VLC Streaming Media client, both the De-multiplexer (Demux) Bytes Read (in KB) and the Demux Bitrate (in Kbps) are taken. The De-multiplexer takes feeds from disparate and separate streams and assembles them into a single coherent media stream for playing.
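The RTT aggregation described above (a mean per destination server, then the mean across servers) can be sketched as follows. The sample Ping output is made up for illustration, and the regular expression assumes the common Linux `ping` line format containing `time=<value> ms`.

```python
import re
from statistics import mean

RTT_PATTERN = re.compile(r"time=([\d.]+) ms")


def parse_rtts(ping_output):
    """Extract the RTT values (in ms) from raw ping output."""
    return [float(value) for value in RTT_PATTERN.findall(ping_output)]


def overall_mean_rtt(outputs_by_server):
    """Mean of the per-server mean RTTs."""
    return mean(mean(parse_rtts(out)) for out in outputs_by_server.values())


# Hypothetical output for two of the six destination servers.
SAMPLE = {
    "10.0.1.101": (
        "64 bytes from 10.0.1.101: icmp_seq=1 ttl=64 time=0.030 ms\n"
        "64 bytes from 10.0.1.101: icmp_seq=2 ttl=64 time=0.034 ms\n"
    ),
    "10.0.1.102": "64 bytes from 10.0.1.102: icmp_seq=1 ttl=64 time=0.040 ms\n",
}

print(overall_mean_rtt(SAMPLE))
```

Averaging per server first gives each destination equal weight even when the number of replies differs between servers.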
5 Results and Discussion
Results in this study are shown statistically in Tables 3, 4, 5 and 6. All these data are taken from the tools mentioned in Sect. 4. All results shown in this study are already processed and simplified to show the mean, standard deviation, minimum value, and maximum value. The experiments and results in this study both serve as an anchor to test the introduced test framework. In general, the results of this study approximately reflect the results of the Chato and Yu [2, 3] studies.
Table 3 IFSTAT results in KB/s
CBQ algorithm | Bandwidth In, mean | Bandwidth In, std. dev. | Bandwidth Out, mean | Bandwidth Out, std. dev.
Basic CBQ at the Leaf | 857,241.0 | 57,046.89 | 2,913.0 | 231.94
Basic CBQ at the Core | 735,776.12 | 54,882.73 | 2,430.67 | 207.28
Src CBQ at the Leaf | 832,973.98 | 58,974.58 | 2,783.07 | 213.55
Src CBQ at the Core | 847,453.89 | 60,688.61 | 2,912.67 | 221.89
Dst CBQ at the Leaf | 867,723.37 | 58,904.75 | 2,962.94 | 225.07
Dst CBQ at the Core | 856,943.42 | 43,341.81 | 2,942.93 | 180.69
Src-Dst CBQ at the Leaf | 849,536.17 | 61,722.35 | 2,855.84 | 266.45
Src-Dst CBQ at the Core | 746,912.39 | 60,402.67 | 2,482.74 | 217.64
Table 4 Apache Bench results
CBQ algorithm | Total transferred (KB), mean | Total transferred (KB), std. dev. | Transfer rate (KBps), mean | Transfer rate (KBps), std. dev.
Basic CBQ at the Leaf | 4,927,743.37 | 3,830,391.11 | 15,913.38 | 12,355.04
Basic CBQ at the Core | 4,186,955.07 | 3,322,636.4 | 13,593.94 | 10,779.95
Src CBQ at the Leaf | 4,802,749.11 | 3,511,440.8 | 15,482.36 | 11,277.1
Src CBQ at the Core | 4,873,786.79 | 3,551,581.79 | 15,743.02 | 11,441.85
Dst CBQ at the Leaf | 4,960,232.07 | 3,637,629.65 | 16,107.2 | 11,800.18
Dst CBQ at the Core | 4,908,274.32 | 3,535,940.17 | 15,888.49 | 11,421.28
Src-Dst CBQ at the Leaf | 4,870,591.95 | 3,631,200.2 | 15,764.16 | 11,733.97
Src-Dst CBQ at the Core | 4,245,625.35 | 3,189,979.13 | 13,806.68 | 10,364.87
Table 5 VLC results
CBQ algorithm | Total bytes read (KB), mean | Total bytes read (KB), std. dev. | Bitrate (Kbps), mean | Bitrate (Kbps), std. dev.
Basic CBQ at the Leaf | 25,569.0 | 13,064.05 | 737.06 | 390.54
Basic CBQ at the Core | 25,574.98 | 13,071.17 | 731.58 | 374.32
Src CBQ at the Leaf | 25,180.24 | 13,498.72 | 739.39 | 401.78
Src CBQ at the Core | 25,323.76 | 13,073.63 | 733.78 | 384.76
Dst CBQ at the Leaf | 25,559.52 | 13,052.95 | 741.84 | 396.46
Dst CBQ at the Core | 25,565.41 | 13,071.55 | 734.46 | 380.75
Src-Dst CBQ at the Leaf | 25,294.83 | 13,390.5 | 735.89 | 391.87
Src-Dst CBQ at the Core | 25,011.5 | 13,640.5 | 727.83 | 380.15
5.1 IFSTAT Results
Ifstat results for Bandwidth In and Bandwidth Out are shown in Table 3. Outliers in the raw data were removed because they represented the time frame in which there were no HTTP and streaming traffic flows from Apache Bench and VLC. Hence, 90 s of data were removed from each QoS mechanism simulation. In general, the throughput results from Table 3 Bandwidth Out show that Leaf-enforced QoS performs approximately 0.23% better than Core-enforced QoS. Specifically, Destination Class-Based Queueing (CBQ) resulted in the highest raw IP throughput, by an average of 6% against all other mechanisms. Although Leaf-enforced QoS mechanisms generally resulted in higher raw IP throughput compared to their Core-enforced counterparts, Source-Destination CBQ at the Core resulted in a higher raw IP throughput than Source-Destination CBQ at the Leaf. A two-sample Student's t-test with the significance level set to α = 0.05 was performed in order to test the statistical significance of the advantage of Source-Destination CBQ at the Core over Source-Destination CBQ at the Leaf. The resulting p-value was 0.199, which is greater than α = 0.05. Hence, the raw IP throughput advantage of Source-Destination CBQ at the Core over Source-Destination CBQ at the Leaf is statistically insignificant. Therefore, for raw IP data, Leaf-enforced QoS mechanisms have more throughput than, or at least as much as, Core-enforced QoS mechanisms. This is despite Leaf-enforced QoS mechanisms having more nodes enforced with QoS.
In addition, a separate study by the researchers, using the introduced test framework with multiple layers in the topology and with more network traffic flows being generated, showed that all Leaf-enforced QoS mechanisms resulted in significantly more raw IP throughput than Core-enforced QoS [11].
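The post-processing described in this subsection, dropping the idle seconds and then summarizing the rest, might look like the sketch below. The threshold-based way of identifying idle samples is an assumption (the text only says that 90 s without traffic were removed), and the sample values are made up.

```python
from statistics import mean, stdev


def summarize(samples, idle_threshold):
    """Drop idle samples (below the threshold) and summarize the active ones."""
    active = [s for s in samples if s >= idle_threshold]
    return {
        "n": len(active),
        "mean": mean(active),
        "std": stdev(active),
        "min": min(active),
        "max": max(active),
    }


# Hypothetical per-second Bandwidth Out readings in KB/s, with idle seconds
# at the start and end of the measurement window.
samples = [0.5, 1.2, 2900.0, 3000.0, 3100.0, 0.8]
print(summarize(samples, idle_threshold=100.0))
```

Leaving the idle seconds in would pull the mean down and inflate the standard deviation, which is the reason given above for removing them.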
5.2 Apache Bench Results
For the web server simulations, ApacheBench (ab) was used to simulate and gather data. The data taken in this test were the Total Transferred Data and the Transfer Rate, which can be found in Table 4. The HTTP data throughput shown in Table 4 Transfer Rate shows that, overall, Leaf-enforced QoS performs approximately 2.4% better than the Core-enforced QoS mechanisms. The Destination Class-Based Queueing (CBQ) at the Leaf mechanism also performed best, by an average of 5.7% against all other mechanisms. Core-enforced Source-Destination CBQ QoS, however, had higher HTTP throughput than Leaf-enforced Source-Destination CBQ QoS by around 0.26%. Using a two-sample Student's t-test with the significance level set to α = 0.05, the resulting p-value was 0.9853 for the performance difference between Core-enforced Source-Destination CBQ QoS and Leaf-enforced Source-Destination CBQ QoS, which is greater than α = 0.05. Hence, the distributed nature of the Software-Defined Networking architecture performs at least as well as, or even better than, the traditional centralized networking architecture. The high standard deviation observed in Table 4 Transfer Rate is expected due to the bursty nature of HTTP traffic.
5.3 VLC Results
VLC Streaming Media results are found in Table 5. The results in this section show that all CBQ algorithms in this study have nearly the same performance in terms of VLC Streaming Media. Moreover, the streaming media throughput shown in Table 5 Bitrate resulted in Core-enforced QoS performing approximately 0.2% better than Leaf-enforced QoS. The only Leaf-enforced QoS that had a higher bitrate than its Core-enforced counterpart was the Leaf-enforced Destination CBQ QoS algorithm, with a 1% higher bitrate than the Core-enforced Destination CBQ QoS algorithm. Despite that, the Student's t-test has shown that the advantages are statistically insignificant; hence, the results are still comparable, satisfying the goals of the experiments. For Basic CBQ, the Core-enforced Basic CBQ resulted in a 0.2% higher bitrate than the Leaf-enforced Basic CBQ, but the difference had a p-value of 0.986, which is greater than α = 0.05. For Source CBQ, the Core-enforced Source CBQ resulted in a 0.1.6% higher bitrate than the Leaf-enforced Source CBQ, but the difference had a p-value of 0.8552, which is greater than α = 0.05. For Source-Destination CBQ, the Core-enforced Source-Destination CBQ resulted in a 0.08% higher bitrate than the Leaf-enforced Source-Destination CBQ, but the difference had a p-value of 0.9931, which is greater than α = 0.05. Furthermore, the 1% advantage of the Leaf-enforced Destination CBQ QoS algorithm over the Core-enforced Destination CBQ QoS algorithm had a p-value of 0.9073, which is greater than α = 0.05. This means that the Core-enforced Destination CBQ is still comparable with the Leaf-enforced Destination CBQ. This shows that the throughput of streaming traffic is not affected regardless of whether the enforcement of QoS is centralized or distributed.
Table 6 Ping results
CBQ algorithm | Round trip time (ms), mean | Round trip time (ms), std. dev.
Basic CBQ at the Leaf | 0.0305 | 0.01381
Basic CBQ at the Core | 0.03986 | 0.02123
Src CBQ at the Leaf | 0.03145 | 0.01564
Src CBQ at the Core | 0.03186 | 0.016
Dst CBQ at the Leaf | 0.03004 | 0.01414
Dst CBQ at the Core | 0.0304 | 0.01428
Src-Dst CBQ at the Leaf | 0.03244 | 0.01899
Src-Dst CBQ at the Core | 0.0391 | 0.02288
5.4 Round Trip Time Results
Table 6 shows the Round Trip Time (RTT) results for the ping simulation. The Ping command was executed from all 70 client hosts to all 6 destination servers. The results in Table 6 show the overall mean of the RTT results of all 6 destination servers. RTT results show that Leaf-enforced QoS generally has lower latency than Core-enforced QoS by approximately 11%, with Destination Class-Based Queueing (CBQ) at the Leaf having the lowest latency, by an average of 17% lower than all other CBQ mechanisms. Specifically, Destination CBQ at the Leaf had 13.6% less latency than its Core-enforced counterpart, Destination CBQ at the Core. Moreover, Basic CBQ at the Leaf has 11% lower latency than Basic CBQ at the Core, Source CBQ at the Leaf has 14.6% lower latency than Source CBQ at the Core, and Source-Destination CBQ at the Leaf has 5.3% lower latency than Source-Destination CBQ at the Core.
6 Conclusion Cost, flexibility, and network management benefits of Software-Defined Networking (SDN) have caused network researchers to explore Quality of Service (QoS) mechanisms in SDN in order to ensure successful and efficient packet delivery across the network. As such, this study introduced a test framework for QoS mechanisms and network topologies specifically for the SDN architecture. The results in this study have shown that Leaf-enforced QoS generally performed better than Core-enforced QoS, which shows the advantage of the distributed nature of SDN over the traditional centralized nature of the networking architecture. This is despite having more points in the network enforced with QoS. There are many possible QoS strategies that can be applied and tested by exploiting the advantages of SDN. The use of the introduced test framework can benefit SDN researchers in QoS with the exploration of possible strategies.
References
1. URL https://github.com/faucetsdn/ryu
2. Chato O, Yu WES (2016) An exploration of various QoS mechanisms in an OpenFlow and SDN environment. In: Accepted for presentation in the international conference on systems and informatics (ICSAI-2016)
3. Chato O, Yu WES (2016) An exploration of various quality of service mechanisms in an OpenFlow and software defined networking environment in terms of latency performance. In: 2016 International conference on information science and security (ICISS), pp 1–7. IEEE
4. Civanlar S, Parlakisik M, Tekalp AM, Gorkemli B, Kaytaz B, Onem E (2010) A QoS-enabled OpenFlow environment for scalable video streaming. In: 2010 IEEE Globecom workshops, pp 351–356. IEEE
5. De Oliveira RLS, Schweitzer CM, Shinoda AA, Prete LR (2014) Using Mininet for emulation and prototyping software-defined networks. In: 2014 IEEE Colombian conference on communications and computing (COLCOM), pp 1–6. IEEE
6. Egilmez HE, Dane ST, Bagci KT, Tekalp AM (2012) OpenQoS: an OpenFlow controller design for multimedia delivery with end-to-end quality of service over software-defined networks. In: Proceedings of the 2012 Asia Pacific signal and information processing association annual summit and conference, pp 1–8. IEEE
7. Huang TY, Jeyakumar V, Lantz B, Feamster N, Winstein K, Sivaraman A (2014) Teaching computer networking with Mininet. In: ACM SIGCOMM
8. Ishimori A, Farias F, Cerqueira E, Abelém A (2013) Control of multiple packet schedulers for improving QoS on OpenFlow/SDN networking. In: 2013 Second European workshop on software defined networks, pp 81–86. IEEE
9. Kim H, Feamster N (2013) Improving network management with software defined networking. IEEE Commun Mag 51(2):114–119
10. McKeown N, Anderson T, Balakrishnan H, Parulkar G, Peterson L, Rexford J, Shenker S, Turner J (2008) OpenFlow: enabling innovation in campus networks. ACM SIGCOMM Comput Commun Rev 38(2):69–74
11. Regencia JET, Yu WES (2021) Latency and throughput advantage of leaf-enforced quality of service in software-defined networking for large traffic flows. Submitted to: SAI computing conference 2021
12. Yeganeh SH, Tootoonchian A, Ganjali Y (2013) On scalability of software-defined networking. IEEE Commun Mag 51(2):136–141
Building a Conceptual Model for the Acceptance of Drones in Saudi Arabia
Roobaea Alroobaea
Abstract During the last few years, drones (also called Unmanned Aerial Vehicles, UAVs) have gained outstanding success in different domains such as agriculture, healthcare, disaster management, construction activities, delivery applications, and surveillance of buildings and open spaces. The main challenge of this work is that, to the best of the author's knowledge, it is the first to systematically construct a conceptual model for the acceptance of drones in Saudi Arabia. In this paper, a conceptual model is suggested for predicting the acceptance of drones. To reach this aim, an extensive literature review has been done covering several important aspects related to drones. This study is also intended to give non-experts a first overview of this modern technology. The main topics covered by our article are: drone structure, main characteristics of drones, main fields of application, regulatory and legal aspects, challenges and solutions for security and privacy issues, and models of technology acceptance. The proposed acceptance model will be tested experimentally in an extended paper as further work.
Keywords Drones · Technology Acceptance Model (TAM2) · Structure · Characteristics · Legal aspects · Applications · Security · Trust · Experience · Privacy
1 Introduction Drones [25] are aircraft capable of flying manually or operating independently over a predetermined flight plan and returning to the departure point after completing the mission [30]. In past years, drones have become the focus of academia and industry, due to their potential application across a wide variety of applications, from civilian to military. For instance, it is known that they are used in geology, mining, R. Alroobaea (B) Department of Computer Science, College of Computers and Information Technology, Taif University, P.O.Box 11099, Taif 21944, Saudi Arabia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_61
forestry, archaeology, hydrogeology, firefighting, traffic monitoring, packaging and agriculture [23]. Drones play significant roles in different application areas thanks to attributes such as mobility, flexibility and adaptivity to different altitudes [30]. In addition, drones can serve as aerial base stations for wireless communication systems. However, compared to conventional base stations, they have limited reliability, coverage and energy resources [12]. The Saudi government intends to use drone applications extensively as part of Saudi Vision 2030. To this end, the General Authority of Civil Aviation launched its electronic services for licensing drones. Our question is to what extent drone technology is accepted from the user's perspective. Having extensively reviewed the current literature on models of technology acceptance, the author can claim that, to the best of his knowledge, this research is unique in systematically constructing a conceptual model for the acceptance of drones in Saudi Arabia. In this paper, the literature is reviewed across a variety of important aspects relevant to drones, and technology acceptance models are discussed. The key topics are: technology acceptance models, the structure of drones, the specific characteristics of drones, the major fields of application, the legal and regulatory aspects, and the threats and remedies related to security and privacy.
The rest of the paper is organized as follows. Section 2 presents general aspects of drones, namely drone structure, main characteristics of drones, and regulatory and legal aspects. Section 3 lists the main current applications of drones in different fields. Section 4 deals with challenges and solutions related to drone security and privacy. Section 5 reviews the models of technology acceptance. Section 6 presents the proposed model. Finally, Sect. 7 concludes the paper and discusses possible extensions of this article.
2 General Aspects
2.1 Structure of a Drone
Simply put, a drone consists of three main components:
• The chassis (or frame) is something like the drone's skeleton. Depending on the model and the number of arms, it may take various forms: there are hexacopters, quadcopters, tricopters, etc. The chassis may be made of carbon fiber, plastic, aluminum or even wood, and differs between drone types.
• The propulsion system consists of motors (more specifically called rotors), propellers, a lithium polymer battery and speed controllers.
• Finally, the flight controller links the pilot and the drone via a connected receiver, by means of an integrated circuit equipped with sensors, a microprocessor and input/output pins.
Building a Conceptual Model for the Acceptance …
2.2 Some Characteristics
In this section, we highlight the key common parameters and limitations used in existing drone modeling methodologies [34]:
• Connectivity: Drones must maintain a communication link with the ground control station in order to receive instructions and transmit the information gathered. Since line-of-sight contact is usually needed, the signal is weakened under tree crowns, indoors, or in the shadow of buildings in urban areas. In addition, transmission lines and telecommunications towers can cause signal interference. Path planning techniques can therefore forbid or penalize visits to some regions [11]. Where mobile devices such as cell phones are unable to connect directly to the macro-cell base station, drones may act as intermediaries operating as moving base stations [29]. Drones can use various wireless access techniques to provide communication services, which may involve assigning unique time slots and/or frequencies to users.
• Human operator: In many countries, drone regulations require a human operator [31]. Typically, a human operator carries out a variety of set-up actions prior to the launch of the drone and following its landing [32], and may have to monitor the drone and analyze the information gathered.
• Restricted payload: Payloads of package delivery drones normally do not exceed three kilograms, and each drone typically carries only one package per mission [2]. Payload limits are closely tied to the configuration and size of the drone and to the capacity of its energy storage unit.
• Motion: Drones can travel in 3D space. Drone autopilots are typically successful in maintaining flight stability and in taking off and landing autonomously. However, some aspects of drone motion need to be considered carefully when planning drone operations. One of them is the minimum turning radius when changing direction during flight [15], which is especially critical for fixed-wing drones.
• Flight range: Most drones carry a small-capacity energy unit. Drone energy consumption depends on numerous parameters, such as flight conditions, weather conditions, drone type, payload, climbing speed, and flying altitude. The restricted capacity of the energy unit is typically modeled as a limit on the number of addresses a drone may reach during one flight, on the maximum flying distance, or on the maximum operating time.
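The three ways of modeling the restricted energy unit listed above (number of addresses, flying distance, operating time) can be illustrated with a small feasibility check. This is only a minimal sketch under stated assumptions: the class `DroneLimits`, the helper names, the constant cruise speed and all numeric values are hypothetical and are not taken from the surveyed literature.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class DroneLimits:
    """Hypothetical per-flight limits; the values used below are illustrative."""
    max_stops: int          # maximum number of addresses per flight
    max_distance_km: float  # maximum flying distance per flight
    max_time_min: float     # maximum operating time per flight
    speed_kmh: float        # assumed constant cruise speed


def route_length_km(depot, stops):
    """Length of the closed tour depot -> stops (in order) -> depot."""
    points = [depot, *stops, depot]
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


def is_feasible(depot, stops, limits):
    """Check the three range constraints: stops, distance and time."""
    length = route_length_km(depot, stops)
    flight_time_min = 60.0 * length / limits.speed_kmh
    return (
        len(stops) <= limits.max_stops
        and length <= limits.max_distance_km
        and flight_time_min <= limits.max_time_min
    )


limits = DroneLimits(max_stops=1, max_distance_km=12.0,
                     max_time_min=25.0, speed_kmh=40.0)
depot = (0.0, 0.0)
print(is_feasible(depot, [(3.0, 4.0)], limits))  # 10 km round trip, 15 min
print(is_feasible(depot, [(6.0, 8.0)], limits))  # 20 km round trip, over the limit
```

A real planner would replace the straight-line distance and constant speed with an energy model that accounts for payload, weather and altitude, as noted in the list above.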
2.3 Regulatory and Legal Aspects
There is a vast amount of research on the technological capabilities of drones; however, there is far less research on the regulatory and legal aspects of drone activity and use. The law on the use of drones includes international and national legislation. The rapid growth of drones has been feasible because, unlike for manned aircraft, there exists no regulation on the design and development of drones [1]. In principle, drones must comply with the requirements that apply to manned aircraft regarding airworthiness and the laws of the air. In practice, however, this is not feasible: civil aviation regulations were established at the national and international levels under the premise that professional aircraft staff oversee flight operations. Furthermore, safety standards are geared towards the health of passengers. This makes these regulations inapplicable to unmanned aircraft, which have their own specific characteristics. National authorities have implemented operating authorisation programs to ensure that commercial companies wishing to use drones, or anyone wishing to explore new ways of using drones, do so responsibly and securely. Related non-aeronautical regulations, including data privacy, piracy and terrorism legislation, must also be complied with internationally.
3 Applications
In this section, we describe some areas of application of drones [34]:
– Delivery Applications [9]: These involve package deliveries to rural regions, first- and last-mile deliveries in cities and suburbs, and express deliveries. Various technological approaches have been introduced to accommodate different use cases, including landing the drone at the required place and using parachutes or tethers to lower the package.
– Agricultural Activities [7]: Agriculture is another interesting use of drones. Drones can assess crop health, spray treatments and fertilizers, track livestock, map soil properties, etc.
– Construction Activities [5]: Drones have also been used to monitor progress at active construction sites, to analyze the landscape at potential construction sites, to examine available resources, and to periodically inspect facilities as part of maintenance. Drones also offer safety advantages by replacing people in risky inspections.
– Filming [14]: Drones can reach locations that no other system can, in ways that even the most advanced camera equipment cannot match. Drones can also be designed to film high-speed chases, following subjects along mountain roads, busy streets, or other environments that would be hard to cover by other means.
– Augmented and Virtual Reality [17]: In a few words, augmented reality lets virtual objects blend with images of real objects when viewed through a camera. Combining drones with augmented reality may look as follows. While the drone is flying, the user sees the landscape on a smartphone screen or through special glasses. Going one step further, the user sees not only real objects as the drone views them, but also additional images, text or markings overlaid on them.
– Disasters and Accidents Examination [35]: In the event of a major disaster that causes severe damage, there is a need to rapidly assess the safety of survivors and the level of damage in the disaster-hit region. Drones are expected to play a major role in such disaster recovery processes. Since drones can be deployed rapidly around disaster areas, they are used to create 3D maps, locate casualties, and evaluate destroyed infrastructure.
– Fighting the COVID-19 pandemic [8]: Many countries across the world have joined forces with innovators and scientists in an attempt to find imaginative ways of using drones to combat the COVID-19 pandemic: crowd surveillance, crowd screening, public announcements, spraying disinfectants, and delivery of medical supplies.
– Border Surveillance [26]: Because drones fly high and capture images with a wide angle, a single drone can watch a large area. With the help of artificial intelligence, intrusions are easy to detect in the video captured by the drone. A single drone could realistically do the job of 10–30 border guards.
4 Privacy and Security
The use of drones could violate the data security and privacy of individuals and present a risk to national institutions and governments [18, 21, 22]. For this reason, sophisticated verification [19] and testing techniques [20] are needed in order to prevent malicious attacks. In this section, we examine privacy and security issues as well as currently available solutions [16].
4.1 Challenges
– Privacy Leakage: There are two major types of privacy leakage, concerning identity privacy and location privacy. Identity privacy means that the drone's true identity is protected, while the competent authority can still trace it and arbitrate efficiently when a dispute arises. In other words, unauthorized drones must be prevented from operating in the airspace, and an authentication process is required for safe communication that guarantees that the identity of drones is not disclosed. On the other hand, drones are expected to transmit their locations in order to prevent congestion on the navigation route. This raises questions about location privacy, as repeated broadcasts of a drone's geographic location could enable physical dynamic tracking attacks. Even pseudonyms cannot prevent such an attack.
– Revealed Data in Cloud Storage: If drone data (e.g. locations during navigation, and sensing data such as surveillance videos and images) are stored in plain text, they can be accessed by employees of the cloud service provider. The naive approach would be to encrypt the data before transferring them to the cloud. However, drones do not have the technical capabilities to encrypt large datasets. Moreover, searching over encrypted data is recognized to be operationally impractical or inefficient.
4.2 Solutions
– Identity/Location Protection: Symmetric-key encryption techniques, such as energy-efficient and lightweight algorithms, should be developed for use on resource-constrained devices. For instance, the authors of [33] employed Advanced Encryption Standard (AES) techniques to encrypt the device's current location. The key management framework for sensor networks proposed in [36] could provide both forward and backward protection while allowing drones to join and leave the current communication group. Moreover, every drone randomly uses the credential given by the trusted authorities to produce the group signature [24]. Besides encrypting the flight path using ElGamal and AES schemes, the authors of [16] suggest attaching a zero-knowledge range proof so that confidential data (e.g. maximum hops or expiration time) are not revealed in the navigation reply. Thanks to the traceability of group signatures, trusted authorities can trace any fraudulent drone that does not obey the rules.
– Data Outsourcing Protection: The authors of [16] have implemented security policies for data sent from drones to cloud servers, thus allowing versatile access to stored data. The architecture is based on identity-based encryption (IBE). However, this encryption technique cannot be used effectively on resource-restricted drones. The authors therefore suggested the use of a compact IBE scheme to simplify the safe sharing of drone data. In particular, the proposed scheme shares two properties with traditional IBE: the ability to produce a public key independently of the corresponding secret key, and the ability to use arbitrary strings to produce a public key.
5 Models of Technology Acceptance
Two research questions need to be answered in this work. The first is whether drones are accepted by users in Saudi Arabia. The second is whether users wish to continue using them, which follows acceptance and depends on user satisfaction; continued use in turn encourages further investment in drone technology [13]. In this regard, many theories and models of technology acceptance have been developed from different theoretical perspectives. In Saudi Arabia, many acceptance models have been developed from the user's perspective for different technologies, such as mobile transactions [4] and mobile government services [3]. However, the author can claim, to the best of his knowledge, that there is no acceptance model for drones in Saudi Arabia. Thus, a systematic review was carried out in this research to find the most powerful technology acceptance model for studying the acceptance of drones in Saudi Arabia. Figure 1 summarizes the review of theories and models carried out by [28]. They grouped the acceptance theories into two classes: methods for development and methods for the scientific field. These methods have the following constructs: Attitude Toward Behavior, Subjective Norm, Beliefs, Evaluation, Normative Beliefs, Affect Towards Use, Social Factors, Facilitating Conditions, Relative Advantage, Perceived Behavioral Control, Actual Behavioral Control, Behavioral Beliefs, Control Beliefs, Perceived Usefulness, Perceived Ease of Use, Image, Ease of Use, Visibility, Compatibility, Results Demonstrability, Voluntariness of Use, Extrinsic Motivation, Job Relevance, Output Quality, Result Demonstrability, Experience, Complexity, Long-term Consequences, Anxiety, Affect, Self-efficacy, Outcome Expectations (Personal), Outcome Expectations (Performance), and Intrinsic Motivation [27, 38, 39]. Furthermore, many studies have identified factors that should be considered when measuring the acceptance of a technology, such as trust, attribution, privacy, competence, integrity, benevolence and operator [6, 10]. The next section explains the proposed model, which is based on the review carried out in this paper.
Fig. 1 The classification of the technology acceptance theories. Cited from [27]
6 Proposed Model
Reference [27] emphasizes that the Technology Acceptance Model (TAM2) developed by [38] is comprehensive because it consists of many constructs that contribute to acceptance behavior (Fig. 3). Reference [37] pointed out that TAM2 is a more flexible model owing to its ability to capture some of the main psychological factors that affect the decision to adopt or not. TAM2 is also a good model for adoption (see Fig. 2). Because of these advantages, the TAM2 model is adopted here. In addition, some of the factors mentioned above are adopted to build the conceptual model for the acceptance of drones in Saudi Arabia. Figure 3 shows this model. The next step is to prepare the instruments, the experiment design, the qualitative and quantitative data collection and the procedure, and to recruit participants, in order to validate and test the proposed model, improve it, and arrive at the final version of the model for the acceptance of drones in Saudi Arabia.
Fig. 2 TAM2 model. Cited from [37]
Fig. 3 The conceptual model
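To make the structure of such a model concrete, the sketch below encodes the relationships of TAM2 as published in [38] as a list of directed edges and then attaches the external factors named in this paper. It is a hedged illustration only: the way the external factors (trust, privacy, operator, attribution) connect to the TAM2 constructs is an assumption of this sketch, since the links of the conceptual model in Fig. 3 cannot be recovered from the text, and the names `TAM2_EDGES`, `ASSUMED_EDGES` and `predecessors` are hypothetical.

```python
# Directed influences in TAM2 as published by Venkatesh and Davis [38].
# The moderators Experience and Voluntariness are omitted for brevity.
TAM2_EDGES = [
    ("Subjective Norm", "Image"),
    ("Subjective Norm", "Perceived Usefulness"),
    ("Subjective Norm", "Intention to Use"),
    ("Image", "Perceived Usefulness"),
    ("Job Relevance", "Perceived Usefulness"),
    ("Output Quality", "Perceived Usefulness"),
    ("Result Demonstrability", "Perceived Usefulness"),
    ("Perceived Ease of Use", "Perceived Usefulness"),
    ("Perceived Ease of Use", "Intention to Use"),
    ("Perceived Usefulness", "Intention to Use"),
    ("Intention to Use", "Usage Behavior"),
]

# ASSUMPTION: the external factors are wired directly to "Intention to Use".
# This wiring is illustrative and is not taken from Fig. 3 of the paper.
EXTERNAL_FACTORS = ["Trust", "Privacy", "Operator", "Attribution"]
ASSUMED_EDGES = [(factor, "Intention to Use") for factor in EXTERNAL_FACTORS]


def predecessors(edges, target):
    """Return the constructs with a direct edge into `target`, sorted by name."""
    return sorted(src for src, dst in edges if dst == target)


model = TAM2_EDGES + ASSUMED_EDGES
print(predecessors(model, "Intention to Use"))
print(predecessors(model, "Perceived Usefulness"))
```

Keeping the model as an explicit edge list makes it straightforward to list the hypotheses to be tested once the survey instrument is prepared, one hypothesis per edge.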
7 Conclusion
In this paper, an extensive review of various models was carried out, which led to an integrated model based on TAM2 and external factors such as trust, privacy, operator, and attribution, proposed as the conceptual model for the acceptance of drones in Saudi Arabia. The next step is to validate this model and its factors and to examine their direct influence on users' intentions to use drone technology. Further work will examine the direct influence of the conceptual model on adoption by private companies.
References
1. Agapiou A (2012) Drones in construction: an international review of the legal and regulatory landscape
2. Agatz N, Bouman P, Schmidt M (2018) Optimization approaches for the traveling salesman problem with drone. Transp Sci 52(4):965–981
3. Alonazi M, Beloff N, White M (2019) Developing a model and validating an instrument for measuring the adoption and utilisation of mobile government services adoption in Saudi Arabia. In: 2019 Federated conference on computer science and information systems (FedCSIS), pp 633–637. IEEE
4. Alqahtani MA, AlRoobaea RS, Mayhew PJ (2014) Building a conceptual framework for mobile transaction in Saudi Arabia: a user’s perspective. In: 2014 Science and information conference, pp 967–973. IEEE
5. Ashour R, Taha T, Mohamed F, Hableel E, Kheil YA, Elsalamouny M, Kadadha M, Rangan K, Dias J, Seneviratne L, Cai G (2016) Site inspection drone: a solution for inspecting and regulating construction sites. In: 2016 IEEE 59th International midwest symposium on circuits and systems (MWSCAS), pp 1–4
6. Boucher P (2016) ‘You wouldn’t have your granny using them’: drawing boundaries between acceptable and unacceptable applications of civil drones. Sci Eng Ethics 22(5):1391–1418
7. Budiharto W, Chowanda A, Gunawan AAS, Irwansyah E, Suroso JS (2019) A review and progress of research on autonomous drone in agriculture, delivering items and geographical information systems (GIS). In: 2019 2nd World symposium on communication engineering (WSCE), pp 205–209. IEEE
8. Chamola V, Hassija V, Gupta V, Guizani M (2020) A comprehensive review of the COVID-19 pandemic and the role of IoT, drones, AI, blockchain, and 5G in managing its impact. IEEE Access 8:90225–90265
9. Choudhury S, Solovey K, Kochenderfer MJ, Pavone M (2020) Efficient large-scale multi-drone delivery using transit networks. In: 2020 IEEE International conference on robotics and automation (ICRA), pp 4543–4550. IEEE
10. Clarke R, Moses LB (2014) The regulation of civilian drones’ impacts on public safety. Comput Law Secur Rev 30(3):263–285
11. Ergezer H, Leblebicioğlu K (2014) 3D path planning for multiple UAVs for maximum information collection. Intell Robot Syst 73(1–4):737–762
12. Ever E, Gemikonakli E, Nguyen HX, Al-Turjman F, Yazici A (2020) Performance evaluation of hybrid disaster recovery framework with D2D communications. Comput Commun 152:81–92
13. Hong S, Thong JY, Tam KY (2006) Understanding continued information technology usage behavior: a comparison of three models in the context of mobile internet. Decision Support Syst 42(3):1819–1834
14. Huang C, Yang Z, Kong Y, Chen P, Yang X, Cheng KTT (2019) Learning to capture a film-look video with a camera drone. In: 2019 International conference on robotics and automation (ICRA), pp 1871–1877. IEEE
15. Hutton C (2019) Augmented reality interfaces for semi-autonomous drones. In: 2019 IEEE Conference on virtual reality and 3D user interfaces (VR), pp 1361–1362
16. Ilgi GS, Ever YK (2020) Critical analysis of security and privacy challenges for the internet of drones: a survey. In: Drones in smart-cities, pp 207–214
17. Kim SJ, Jeong Y, Park S, Ryu K, Oh G (2018) A survey of drone use for entertainment and AVR (augmented and virtual reality). In: Augmented reality and virtual reality, pp 339–352. Springer
18. Krichen M, Alroobaea R (2019) Towards optimizing the placement of security testing components for internet of things architectures. In: 2019 IEEE/ACS 16th International conference on computer systems and applications (AICCSA), pp 1–2. https://doi.org/10.1109/AICCSA47632.2019.9035301
19. Krichen M (2019) Improving formal verification and testing techniques for internet of things and smart cities. Mobile networks and applications, pp 1–12
20. Krichen M, Alroobaea R, Lahami M (2019) Towards a runtime standard-based testing framework for dynamic distributed information systems
21. Krichen M, Cheikhrouhou O, Lahami M, Alroobaea R, Maâlej AJ (2017) Towards a model-based testing framework for the security of internet of things for smart city applications. In: International conference on smart cities, infrastructure, technologies and applications. Springer, Cham, pp 360–365
22. Krichen M, Lahami M, Cheikhrouhou O, Alroobaea R, Maâlej AJ (2020) Security testing of internet of things for smart city applications: a formal approach. In: Smart infrastructure and applications, pp 629–653. Springer, Cham
23. Lin C, He D, Kumar N, Choo KKR, Vinel A, Huang X (2018) Security and privacy for the internet of drones: challenges and solutions. IEEE Commun Mag 56(1):64–69
24. Lin X, Li X (2013) Achieving efficient cooperative message authentication in vehicular ad hoc networks. Trans Vehic Technol 62(7):3339–3348
25. Merkert R, Bushell J (2020) Managing the drone revolution: a systematic literature review into the current use of airborne drones and future strategic directions for their effective control. J Air Transp Manag 89:101929
26. Mojib EBS, Haque AKMB, Raihan MN, Rahman M, Alam FB (2019) A novel approach for border security; surveillance drone with live intrusion monitoring. In: 2019 IEEE International conference on robotics, automation, artificial-intelligence and internet-of-things (RAAICON), pp 65–68
27. Momani AM, Jamous M (2017) The evolution of technology acceptance theories. Int J Contemp Comput Res (IJCCR) 1(1):51–58
28. Momani AM, Jamous MM, Hilles SM (2018) Technology acceptance theories: review and classification. In: Technology adoption and social issues: concepts, methodologies, tools, and applications, pp 1–16. IGI Global
29. Mozaffari M, Saad W, Bennis M, Debbah M (2016) Efficient deployment of multiple unmanned aerial vehicles for optimal wireless coverage. IEEE Commun Lett 20(8):1647–1650
30. Mozaffari M, Saad W, Bennis M, Nam YH, Debbah M (2019) A tutorial on UAVs for wireless networks: applications, challenges, and open problems. IEEE Commun Surv Tutorials 21(3):2334–2360
31. Murphy RR (2014) Disaster robotics. MIT Press
32. Murray CC, Chu AG (2015) The flying sidekick traveling salesman problem: optimization of drone-assisted parcel delivery. Transp Res Part C: Emerg Technol 54:86–109
33. Ni J, Lin X, Zhang K, Shen X (2016) Privacy-preserving real-time navigation system using vehicular crowdsourcing. In: 2016 IEEE 84th Vehicular technology conference (VTC-Fall), pp 1–5. IEEE
34. Otto A, Agatz N, Campbell J, Golden B, Pesch E (2018) Optimization approaches for civil applications of unmanned aerial vehicles (UAVs) or aerial drones: a survey. Networks 72(4):411–458
35. Rashid MT, Zhang DY, Wang D (2020) Socialdrone: an integrated social media and drone sensing system for reliable disaster response. In: IEEE INFOCOM 2020 - IEEE Conference on computer communications, pp 218–227
36. Roman R, Alcaraz C, Lopez J, Sklavos N (2011) Key management systems for sensor networks in the context of the internet of things. Comput Electr Eng 37(2):147–159
37. Silva AG, Canavari M, Sidali KL (2018) A technology acceptance model of common bean growers’ intention to adopt integrated production in the Brazilian central region. Die Bodenkultur: J Land Manag Food Environ 68(3):131–143
38. Venkatesh V, Davis FD (2000) A theoretical extension of the technology acceptance model: four longitudinal field studies. Manag Sci 46(2):186–204
39. Venkatesh V, Morris MG, Davis GB, Davis FD (2003) User acceptance of information technology: toward a unified view. MIS Quart, pp 425–478
A Channel Allocation Algorithm for Cognitive Radio Users Based on Channel State Predictors Nakisa Shams, Hadi Amirpour, Christian Timmerer, and Mohammad Ghanbari
Abstract Cognitive radio networks can efficiently manage the radio spectrum by utilizing the spectrum holes for secondary users in licensed frequency bands. The energy that is used to detect spectrum holes can be reduced considerably by predicting them. However, collisions can occur either between a primary user and secondary users or among the secondary users themselves. This paper introduces a centralized channel allocation algorithm (CCAA) in a scenario with multiple secondary users to control primary and secondary collisions. The proposed allocation algorithm, which uses a channel state predictor (CSP), provides good performance with fairness among the secondary users while they have minimal interference with the primary user. The simulation results show that the probability of a wrong prediction of an idle channel state in a multi-channel system is less than 0.9%. The channel state prediction saves the sensing energy by 73%, and the utilization of the spectrum can be improved by more than 77%. Keywords Cognitive radio · Neural networks · Prediction · Idle channel
N. Shams, Department of Electrical Engineering, École de technologie supérieure, Montreal, QC, Canada
H. Amirpour (B) · C. Timmerer · M. Ghanbari, Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria, e-mail: [email protected]
C. Timmerer, Bitmovin, Klagenfurt, Austria
M. Ghanbari, University of Essex, Colchester, UK
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_62
1 Introduction
The rapidly growing demand for bandwidth, the growth of wireless technology, and the scarcity of spectrum resources create a strong motivation for researchers to find an effective way of sharing the available radio spectrum [1]. The main task
in a cognitive radio (CR) system is to determine the available frequency bands by means of spectrum sensing. Artificial intelligence-based learning algorithms can be used efficiently to solve problems related to dynamic spectrum access. A large number of learning algorithms, including neural networks (NN) [2], convolutional NNs [3], Hidden Markov Models [4], deep NNs [5], etc., have been used as spectrum predictors in CR networks. A recurrent neural network (RNN) has been used in [6] to build a multi-step channel predictor in the frequency domain. The RNN predictor performs well in both noiseless and noisy channels; however, it suffers from high computational complexity [7]. A convolutional NN is employed in [8] for spectrum prediction. This method adapts well to a dynamic environment; however, it provides no information on fairness among cognitive users. The main contribution of this paper is a centralized channel allocation algorithm for secondary users that takes collisions with other secondary users into account and provides fairness among all users. Since secondary users have no information about the distribution of the primary user traffic, algorithms are needed to determine the probability that the primary user is present in each channel. These algorithms should be able to predict and exploit the spectrum holes of the channels simultaneously. Therefore, secondary users act as decision-makers that probe the channels sequentially to achieve the maximum utilization of the spectrum holes. Neural networks perform the channel allocation and decision-making in each secondary user. They help predict the idle probability of the channels and reduce the energy consumed during spectrum sensing.
The remainder of the paper is organized as follows. A dynamic spectrum allocation algorithm using channel state prediction is presented in Sect. 2. In Sect. 3, the CR performance using channel state prediction is presented. The performance of the proposed algorithm is evaluated in Sect. 4. The performance of the CR in terms of spectrum usage improvement and sensing energy reduction is presented in Sect. 5. Finally, Sect. 6 concludes the paper.
2 CCAA Using Channel State Prediction
In this section, we propose our CCAA for a scenario with several secondary users in order to control secondary and primary collisions. Secondary users in a CR system try to find and use idle slots to reduce their interference with the primary users. Therefore, a CSP is needed to identify and choose the best communication channel. Since it is difficult to obtain the communication features of a primary user in a CR network, using the CSP removes the need for prior knowledge about primary users. Sun [8] proposed a CSP based on two types of neural networks: the time delay neural network (TDNN) and the recurrent neural network (RNN). TDNNs are defined as feed-forward networks with a delay line as network input, and RNNs are defined as BP networks with a feedback connection from the output to their inputs. This feedback gives the RNN the ability to identify the system and to model patterns that vary over time. Both RNN and TDNN are suitable for state prediction. In this work, we use the CSP proposed in [7]. Let us assume there are N secondary users in the network, indexed by n ∈ {1, ..., N}, and that each secondary user uses the CSP proposed in [7]. It is assumed that the distribution of traffic on the channels of the primary user is the same for all secondary users. We also assume that, in each time slot, channel m is free with probability P_m and occupied with probability 1 − P_m. In the proposed channel allocation algorithm, each secondary user tries to learn the traffic model of the primary user by using its CSP. Based on the learned traffic distribution of the primary user, each secondary user selects the channel with the highest idle probability in time slot t for data transmission. With several secondary users in the system, secondary interference must also be considered: if several secondary users choose the same channel for data transmission, interference between secondary users occurs. Thus, if there is more than one secondary user in the network, two types of interference can be defined:
• Interference between a primary user and secondary users: This interference occurs when a secondary user selects a channel occupied by the primary user and transmits data on that channel.
• Interference between secondary users: This interference occurs when more than one secondary user selects the same channel for data transmission.
It should be noted that avoiding interference between secondary users and a primary user is far more critical than avoiding interference among secondary users. In step one, when a secondary user selects a channel based on its predictor output, this channel is compared with the traffic of the primary user in the same time slot.
If the selected channel is idle, it can be used for data transmission. On the other hand, suppose a primary user already occupies the selected channel. In that case, the secondary user can wait and check again its chance to access the channel in the next time slot, or secondary users can switch between channels as mentioned above. In step two, in a scenario with several secondary users, the channel selection should consider the probability of interference between the secondary users. If there is no primary user in the considered channels, it should be determined how many secondary users have chosen the same channel. If more than one secondary user selects the same channel, secondary interference occurs in that channel. To avoid interference, the colliding channel is assigned to only one of the secondary users. Suppose interference between secondary users also occurs in the next time slot. In that case, the channel allocation is performed based on the secondary users’ waiting time for accessing the channel and bandwidth usage and continues. In fact, secondary users try to choose the best channel by learning the traffic distribution of the primary user. Consequently, a large number of secondary users choose the same channel and collision occurs. Therefore, we present the channel allocation based on waiting time, which ensures fairness among the secondary users using unused channels to avoid collisions among themselves. The flowchart shown in Fig. 1 illustrates the proposed CCAA for a multi-secondary user scenario.
N. Shams et al.

[Fig. 1: flowchart. Boxes and decisions: Start; Assign initial values i = 1, m = 1, n = 1; Number of time slots < i?; Select a channel by the secondary users based on predictor output; Traffic_PU(T, m, n) == 0?; Number of users < n?; Number of channels < m?; Allocate channel to the secondary user in this time slot based on the waiting time; n = n + 1; m = m + 1; i = i + 1; Stop.]

Fig. 1 The proposed CCAA for multiple secondary users. T, m, n, and traffic_PU denote time slot, number of channels, number of secondary users, and distribution traffic of primary user, respectively
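The two steps described above (check the primary user first, then resolve contention among secondary users by waiting time) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `allocate_channels`, the input layout, and the tie-break by lowest user index are assumptions, the predictor outputs are taken as precomputed channel choices, and the bandwidth-usage criterion mentioned in the text is left out.

```python
from collections import defaultdict


def allocate_channels(traffic_pu, choices):
    """Sketch of waiting-time based channel allocation (hypothetical helper).

    traffic_pu[t][m] == 0 means channel m is idle (no primary user) in slot t.
    choices[t][n] is the channel that secondary user n picked in slot t,
    assumed to come from that user's channel state predictor.
    Returns one dict per time slot mapping channel -> granted user.
    """
    num_users = len(choices[0])
    waiting = [0] * num_users  # slots each user has waited since its last grant
    allocations = []
    for t, slot_choices in enumerate(choices):
        # Group the users by the channel they selected in this slot.
        contenders = defaultdict(list)
        for user, channel in enumerate(slot_choices):
            contenders[channel].append(user)
        granted = {}
        for channel, users in contenders.items():
            # Step one: never transmit on a channel held by the primary user.
            if traffic_pu[t][channel] != 0:
                continue
            # Step two: the longest-waiting contender gets the channel
            # (ties go to the lowest user index, an arbitrary choice here).
            granted[channel] = max(users, key=lambda u: (waiting[u], -u))
        winners = set(granted.values())
        for user in range(num_users):
            waiting[user] = 0 if user in winners else waiting[user] + 1
        allocations.append(granted)
    return allocations


# Made-up toy input: 3 time slots, 2 channels, 3 secondary users.
traffic = [[0, 1], [0, 0], [0, 1]]
picks = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
schedule = allocate_channels(traffic, picks)
```

In this toy run, users 0 and 1 contend for channel 0 in every slot and are served alternately, which is the fairness effect the waiting-time rule is meant to provide.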
3 CR Performance Using Channel State Prediction

The cognitive radio network allows secondary users to temporarily use the spectrum of primary users without interference. This is done by sensing the spectrum of the primary users to find accessible channels. However, spectrum sensing consumes considerable energy. Using predictive approaches to detect spectrum holes can help to reduce this energy consumption: secondary users sense only the channels that are predicted to be free and save sensing energy by skipping the channels predicted to be occupied. In addition, spectrum usage can be improved by achieving a low probability of false prediction for an idle channel. Channel state prediction can also be used to approximate the appropriate bandwidth in the following time slot, which lets secondary users control the data transfer rate. The benefits of channel state prediction are expressed using the percentage of improvement in spectrum usage and the percentage of reduction in sensing energy, denoted by SUI(%) and SER(%), respectively.

(1) The percentage of improvement in spectrum usage: Assume a primary user system with M channels, m ∈ {1, ..., M}, with different traffic distributions of the primary users. Due to hardware limitations, each secondary user can sense only one channel during a time slot. Also, consider that each secondary user receives a delayed sequence of previous channel states. Other cognitive users can accumulate these sensing results through a common control channel. Spectrum usage (SU) is calculated as follows [9–11]:

$$SU = \frac{N_{SI}}{N_{AI}} \tag{1}$$
where $N_{SI}$ represents the number of time slots sensed as idle and $N_{AI}$ represents the overall number of accessible idle slots in M channels during T time slots. Because of channel state prediction, the percentage of improvement in spectrum usage can be expressed as

$$SUI(\%) = \frac{SU_{predict} - SU_{sense}}{SU_{sense}} \tag{2}$$
where $SU_{sense}$ represents the spectrum usage when, in each time slot, secondary users randomly select a channel and then sense its state. $SU_{predict}$ is the spectrum usage when cognitive users sense a channel that is randomly selected among the channels predicted to be idle [9]. By substituting Eq. 1 into Eq. 2, and noting that the number of accessible idle slots $N_{AI}$ is the same in both cases and cancels, SUI(%) can be given by

$$SUI(\%) = \frac{I_{predict} - I_{sense}}{I_{sense}} \tag{3}$$
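As a quick numerical check of Eqs. 1 to 3, the following sketch evaluates the improvement for one entry of Table 1a (Mch = 3, TDNN-1HL). The function names are illustrative, and the factor of 100 is an assumption made here so that the result is a percentage comparable with the tabulated values; the equations as printed omit it.

```python
def spectrum_usage(n_sensed_idle, n_accessible_idle):
    """Eq. 1: fraction of the accessible idle slots that were sensed as idle."""
    return n_sensed_idle / n_accessible_idle


def spectrum_usage_improvement(i_sense, i_predict):
    """Eq. 3 scaled by 100 (assumed) to give a percentage.

    i_sense:   idle slots found when sensing randomly chosen channels
    i_predict: idle slots found when sensing only predicted-idle channels
    """
    return (i_predict - i_sense) / i_sense * 100.0


# Table 1a, Mch = 3, TDNN-1HL: Isense = 8466, Ipredict = 14998
sui = spectrum_usage_improvement(8466, 14998)
print(round(sui, 2))  # 77.16
```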
where $I_{sense}$ and $I_{predict}$ represent the number of sensed idle time slots and the number of sensed time slots that are predicted to be idle, respectively.

(2) The percentage of sensing energy reduction: Assuming that one unit of energy is required to sense a time slot, there are two approaches to calculating the total sensing energy for a single-channel scenario. SES gives the overall sensing energy when all the time slots are sensed, and SEP gives the total sensing energy when only the time slots predicted to be idle are sensed, as follows [9–11]:

$$SES = (\text{total number of time slots during } T \text{ time slots}) \times (\text{unit sensing energy}) \tag{4}$$

$$SEP = SES - B_p \times (\text{unit sensing energy}) \tag{5}$$
where $B_p$ is the total number of time slots predicted to be busy. Thus, by using Eqs. 4 and 5, the percentage of reduction in sensing energy SER can be given by

$$SER = \frac{SES - SEP}{SES} = \frac{B_p}{\text{total number of time slots}} \tag{6}$$
That is, sensing is not performed when a time slot is predicted to be busy, which saves sensing energy.
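The sensing energy reduction can be checked the same way. The sketch below follows Eqs. 4 to 6 term by term; the function name and the default unit energy of 1.0 are assumptions, and the result is scaled by 100 (also an assumption) to match the SER(%) column of Table 2, whose first TDNN_1Layer entry (Bp = 11250 with T = 20,000 time slots) is used as the example.

```python
def sensing_energy_reduction(total_slots, busy_predicted, unit_energy=1.0):
    """Eqs. 4-6: percentage of sensing energy saved by skipping busy slots."""
    ses = total_slots * unit_energy           # Eq. 4: sense every time slot
    sep = ses - busy_predicted * unit_energy  # Eq. 5: skip predicted-busy slots
    return (ses - sep) / ses * 100.0          # Eq. 6, scaled to a percentage


# Table 2, first row, TDNN_1Layer: Bp = 11250 out of T = 20000 time slots
ser = sensing_energy_reduction(20000, 11250)
print(ser)  # 56.25
```

The unit energy cancels in Eq. 6, so the result depends only on the share of time slots predicted to be busy.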
4 Performance Evaluation of the CCAA

The performance of the CCAA is evaluated using two parameters: (i) fairness and (ii) channel switching. Channel allocation introduces fairness among all existing cognitive users, who benefit from the same level of channel accessibility during the given time slots. In addition, employing a CCAA helps secondary users minimize the channel switching needed to find idle channels. The results achieved from the channel allocation algorithm are shown in Fig. 2a. The one-hidden-layer TDNN predictor (TDNN-1HL), the two-hidden-layer TDNN predictor (TDNN-2HL), the one-hidden-layer RNN predictor (RNN-1HL), and the two-hidden-layer RNN predictor (RNN-2HL) are compared in terms of the number of channel switches. In this section, a network with five secondary users (N = 5) is considered under stationary traffic conditions. As shown in Fig. 2a, the number of channel switches is less than 400 out of 20,000 time slots with the TDNN predictors and less than 200 out of 20,000 time slots with the RNN predictors. The channel allocation algorithm tries to reduce both primary and secondary collisions. Figure 2b compares the four considered channel state predictors when the channel allocation algorithm is used to access an idle channel. The proposed algorithm maintains fairness among the secondary users. Secondary users using RNN-2HL can access more channels than other users. The rate of channel access decreases as the number of secondary users on the network increases, because of increased interference.

[Fig. 2: three panels. (a) Number of channel switching for secondary users SU1–SU5 with each predictor (TDNN-1HL, TDNN-2HL, RNN-1HL, RNN-2HL). (b) Fairness between secondary users SU1–SU5 with each predictor. (c) Spectrum usage improvement: improvement in spectrum utilization versus number of channels (3 to 10) for TDNN-1L, TDNN-2L, RNN-1L, and RNN-2L.]

Fig. 2 a Comparing number of channel switchings by five secondary users under static traffic condition in T = 20,000. b Comparing access to the channels by five secondary users under static traffic condition in T = 20,000. c Comparing the CSPs performance in the percentage of spectrum usage improvement
5 CR Performance in Spectrum Usage Improvement and in Sensing Energy Reduction

We consider eight primary channels (M = 8) with different traffic distributions. The selection of the channel is made sequentially in accordance with the different channel models for the primary user system proposed in [7]. With the TDNN and RNN predictors, sensing only the time slots that have been predicted to be idle detects more idle slots than sensing all the time slots. The performance comparison between TDNN-1HL, TDNN-2HL, RNN-1HL, and RNN-2HL in terms of percentage improvement of spectrum usage is shown in Table 1a, b.
Table 1 Comparing the percentage of spectrum usage improvement

(a) By using TDNN predictors

       TDNN-1HL                        TDNN-2HL
Mch    Isense   Ipredict   SUimp(%)    Isense   Ipredict   SUimp(%)
3      8466     14998      77.16       8476     15257      80.21
4      7884     16358      107.48      7915     16559      109.21
5      8372     17894      113.74      8321     18095      117.46
6      7942     18377      131.39      7940     18479      132.73
7      8228     19309      134.67      8223     19350      153.31
8      8405     19751      134.99      8529     19761      131.69
9      8627     19852      130.11      8701     19852      128.15
10     8607     19885      131.03      8628     19885      130.47

(b) By using RNN predictors

       RNN-1HL                         RNN-2HL
Mch    Isense   Ipredict   SUimp(%)    Isense   Ipredict   SUimp(%)
3      8390     15257      82.06       8481     16109      89.94
4      7889     16559      109.89      7996     17166      114.68
5      8344     18378      120.25      8418     18582      120.74
6      7984     18762      134.99      8020     18865      135.22
7      8250     19471      136.01      8236     19513      136.92
8      8461     19808      134.10      8472     19755      133.17
9      8739     19875      127.42      8598     19847      130.83
10     8559     19885      132.32      8639     19847      129.73
Table 2 The percentage of the sensing energy reduction for different traffic intensities

Mean ON/OFF   Traffic     TDNN_1Layer       TDNN_2Layer       RNN_1Layer        RNN_2Layer
time (slot)   intensity   Bp      SER(%)    Bp      SER(%)    Bp      SER(%)    Bp      SER(%)
16            0.5625      11250   56.25     10000   50.00     10000   50.00     10000   50.00
18            0.6667      13334   66.67     13334   66.67     13334   66.67     13334   66.67
10            0.5         12000   60.00     12000   60.00     12000   60.00     12000   60.00
22            0.6818      14544   72.72     14544   72.72     14544   72.72     14544   72.72
20            0.5         12000   60.00     12000   60.00     11001   55.00     11001   55.00
10            0.7         14000   70.00     14000   70.00     14000   70.00     14000   70.00
18            0.5         12890   64.45     12890   64.45     12890   64.45     12890   64.45
22            0.5         10001   50.00     10001   50.00     10001   50.00     11819   59.09
16            0.5         12750   63.75     12750   63.75     12750   63.75     12750   63.75
20            0.6         10000   50.00     10000   50.00     12000   60.00     10000   50.00
According to Table 1a, the percentage of improvement in spectrum usage for the TDNN-1HL and TDNN-2HL predictors is more than 77% and 80%, respectively. As can be seen from Table 1b, with the RNN-1HL and RNN-2HL predictors, the percentage of improvement in spectrum usage is more than 82% and 89%, respectively. Therefore, the best predictor in terms of SUI(%) is the RNN predictor with two hidden layers. Figure 2c shows the performance of the four CSPs as a percentage of the improvement in spectrum usage. It can be seen that the number of sensed time slots with a predicted idle state improves as the number of channels increases. Table 2 shows the sensing energy reduction as a percentage for different traffic intensities. It presents various mean ON + OFF times on the channel when secondary users use different predictors. As shown in Table 2, the highest percentage of sensing energy reduction is about 73% (72.72%), and the lowest is 50%.
6 Conclusion

In this paper, a centralized channel allocation algorithm based on the channel state predictor is presented. The channel allocation performance is evaluated by the number of channel switches on the network and the fairness between secondary users. The simulation results show that the channel switching probability with multiple secondary users is less than 2% (fewer than 400 out of 20,000 time slots). Since each secondary user in the centralized channel allocation uses its own channel state predictor, the performance of each predictor should be evaluated in terms of sensing energy saving and spectrum usage improvement. Channel state prediction saves sensing energy and improves spectrum usage. The proposed channel state predictor improves the spectrum usage by more than 77%.

Acknowledgements This research has been supported in part by the Christian Doppler Laboratory ATHENA (https://athena.itec.aau.at/).
References

1. Bhattacharya A, Ghosh R, Sinha K, Datta D, Sinha BP (2015) Noncontiguous channel allocation for multimedia communication in cognitive radio networks. IEEE Trans Cognit Commun Netw 1(4):420–434
2. Sriharipriya KC, Sanju R (2017) Artificial neural network based multi dimensional spectrum sensing in full duplex cognitive radio networks. In: 2017 international conference on computing methodologies and communication (ICCMC), July 2017, pp 307–312
3. Zhang M, Diao M, Guo L (2017) Convolutional neural networks for automatic cognitive radio waveform recognition. IEEE Access 5:11074–11082
4. Melián-Gutiérrez L, Modi N, Moy C, Bader F, Pérez-Álvarez I, Zazo S (2015) Hybrid UCB-HMM: a machine learning strategy for cognitive radio in HF band. IEEE Trans Cognit Commun Netw 1(3):347–358
5. Lee W (2018) Resource allocation for multi-channel underlay cognitive radio network based on deep neural network. IEEE Commun Lett 22(9):1942–1945
6. Jiang W, Schotten HD (2019) Recurrent neural network-based frequency-domain channel prediction for wideband communications. In: 2019 IEEE 89th vehicular technology conference (VTC2019-Spring), pp 1–6
7. Shamsi N, Mousavinia A, Amirpour H (2013) A channel state prediction for multi-secondary users in a cognitive radio based on neural network. In: 2013 international conference on electronics, computer and computation (ICECCO), Nov 2013, pp 200–203
8. Sun J, Liu X, Ren G, Jia M, Guo Q (2019) A spectrum prediction technique based on convolutional neural networks. In: Jia M, Guo Q, Meng W (eds) Wireless and satellite systems. Springer International Publishing, Cham, pp 69–77
9. Tumuluru VK, Wang P, Niyato D (2010) A neural network based spectrum prediction scheme for cognitive radio. In: 2010 IEEE international conference on communications, May 2010, pp 1–5
10. Huk M, Mizera-Pietraszko J (2015) Contextual neural-network based spectrum prediction for cognitive radio. In: 2015 fourth international conference on future generation communication technology (FGCT), July 2015, pp 1–5
11. Yang J, Zhao H, Chen X, Xu J, Zhang J (2014) Energy-efficient design of spectrum prediction in cognitive radio networks: prediction strategy and communication environment. In: 2014 12th international conference on signal processing (ICSP), Oct 2014, pp 154–158
A Framework for Studying Coordinated Behaviour Applied to the 2019 Philippine Midterm Elections

William Emmanuel S. Yu
Abstract This paper covers a hybrid framework for studying coordinated social media behaviour. The study focuses on social media activity during the 2019 Philippine National and Local Elections. Using social media information obtained from the CrowdTangle platform, the research extracts the post details needed to determine coordinated behaviour. The study tags posts as coordinated if the same media content is shared by five or more posts within a 1-minute period, which should point to a high degree of coordination. The results are then visualized with the Fruchterman–Reingold algorithm, using the NetworkX library, to see the various clusters. This gives us a reasonable amount of data to further explore and act upon. The framework was able to extract a number of Facebook accounts that showed a high degree of coordinated behaviour. These accounts refer to media that are no longer available, which is not characteristic of reputable content.

Keywords Social media · Elections · Content networks · Coordinated inauthentic behaviour · CrowdTangle
1 Introduction

There is an ongoing battle for our hearts and minds, and the battlefield is social media. Many concerns have been raised about the manipulation of populations through social media [1]. Social media is increasingly used as a tool for political propaganda and even cyberwarfare, through the proliferation of misinformation and disinformation. One of the major ways in which this is done is the use of multiple coordinated accounts to simulate multiple parties in these social networks. These coordinated activities can point us towards networks used to propagate misinformation and disinformation to sow discord [2].

W. E. S. Yu (B), Ateneo de Manila University, 1108 Quezon City, Philippines
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_63
This is particularly problematic in a country like the Philippines, with a population of nearly 110 million people. Of these 110 million, 79 million have Internet access and 71.8 million have Facebook access [3]. This means that over 90% of Filipino Internet users are on Facebook, which shows how powerful Facebook is as a platform in the country. A survey by the Social Weather Stations (SWS) shows that 31% of Filipino social media users like or promote material related to political or social issues that others have posted. Additionally, 14% follow elected officials, candidates for office or other political figures. Furthermore, 45% of users use Facebook to get the news at least daily, and another 14% get the news from Facebook a few days a week [4]. In total, 59% of users use Facebook for news. This points to a high degree of political use of social media, particularly Facebook.

Facebook has introduced the concept of coordinated inauthentic behaviour (CIB) [5]. This is where various pages and fictitious accounts are used to mislead people. This mechanism does not look at the content of the post but focuses on the behaviour of the actors behind these posts. For example, if a single person pretends to be multiple people on a social media platform in order to make it seem that a particular view has many followers and supporters, then that is potentially a violation of that platform’s code of conduct and likely CIB.

This paper presents a framework for looking at coordinated behaviour. It also looks at the shared content to determine possible propaganda content. The framework is then applied to the Philippine National and Local Elections of 2019 (NLE 2019). This is our latest electoral exercise, which gives us an opportunity to look at various coordinated behaviours. Elections have been a popular venue for spreading disinformation. A substantial amount of time has already passed, so we can see whether the detected pages have been taken down. This particular study focuses on making observations on the coordinatedness of players but does not attempt to determine whether they are authentic or not. The framework is generalized so that it can be used for other events where we would like to look at this type of coordinated behaviour. There is also an incentive for operators in the field of information dissemination to re-use pre-established networks.
2 Finding Coordinated Behaviour in Social Media

The study of coordinated behaviour in social media is still emerging. A number of studies have aimed to use computational methods to determine coordinated behaviour. In their paper, Giglietto et al. [13] use platforms and various computational methods to study the spread of problematic information in contemporary media systems (i.e. social media). They also describe the issues with studying problematic information detection. For this study, we assume that coordinated users will create content and try to distribute it through their established network, which has a degree of coordination [6, 7]. This is done to maximize the effectiveness of the campaign, and it allows the operators of these social media accounts to take advantage of network effects. It is also this network effect that makes the network valuable to the operator. Social media network algorithms are designed to feature or highlight content that is popular. By using several of these networks, each with its own base of users, operators can boost a post’s popularity within the platform. In this paper, we count on these operators taking advantage of this mechanism. Thus, we look for behaviour that attempts to coordinate posting amongst multiple accounts in their network. Each of these coordinated accounts will have pages or groups with multiple established users as well. This establishes a powerful networked base for information propagation. This work extends existing work on coordination. It attempts to look deeper into the content by first using coordination as a filter and then performing analysis on the content itself. The filter is key to making the volume of content manageable for manual analysis. This is why we call it a hybrid approach.
3 Hybrid Approach to Finding Coordinated Behaviour

One of the key differences of this approach is that it uses both computational and manual methods to detect potentially problematic coordinated behaviour. The computational aspect is used mainly to manage the sheer volume of the data. The manual aspect is used to look for potential outliers in the dataset. We implement this framework in the context of the posts made in a 4-month period covering NLE 2019, which was held in May 2019.
3.1 Computational Extraction and Detection of Networks

The first step is to reduce the large amount of digital information from social media platforms to a smaller subset suitable for further manual analysis. The study makes use of social media data aggregated by CrowdTangle [8]. CrowdTangle (CT) is a service that aggregates information about posts from various social media platforms such as Facebook and Instagram. It provides information about each post, including information on sentiment. CT has a Search API that provides additional post details, including post history and the details of the media contents that were shared. This information is important in our research, as we use media information to determine coordinated behaviour. In this paper, posts that share the same media at nearly the same time are considered coordinated. Our computational approach to detecting coordinated behaviour relies on the fact that many coordinated posts use common, shared content such as photos or videos. So, we use the additional media information provided by the CT Search API to determine which posts share the same media. The Search API query parameters used for this particular dataset are covered in Table 1.

Table 1 Input parameters for dataset

Parameter               Value
Period                  Feb 1 2019 00:00:36 to May 31 2019 23:47:22
Platforms               Facebook pages/public groups
Keywords                Elections, halalan, ppcrv, comelec, namfrel
Country of page admin   Philippines

The CT search yielded the dataset summarized in Table 2.

Table 2 Summary of dataset characteristics

Property                             Values
Number of posts returned             63,475
Number of unique accounts (by ID)    5,387
Number of posts with shared media    60,680
Verified accounts                    472
Country of poster                    Philippines

The study is limited to articles created and posted by non-verified Philippine-based Facebook accounts in the period from Feb 1 to May 31, 2019. To simplify the analysis, we first removed all verified accounts. This generally removes legitimate media sites and organizations whose pages have already been verified by the social media platform provider. It is assumed that the provider has already performed the required vetting for those accounts.

The next step is to determine coordinated behaviour. As mentioned previously, coordinated behaviour is detected by tracking the media content, including photos and videos, that was shared by multiple posts. The idea is that if the same media was shared by different pages, this can point to coordinated behaviour if it was done within a certain period of time. For this particular study, we used 1 minute as that time period. Therefore, if multiple posts shared the same media content within a 1-minute period, they are flagged as coordinated. Additionally, in order to limit the number of results that get flagged, we set another criterion for flagging a post as coordinated. Initially, this was set to media shared by 5 or more pages that posted the same media within that 1-minute period. This is a high degree of coordination, chosen to make the results more manageable. It allows for a good deal of visualization of highly coordinated activities. The number of shared posts required can be adjusted depending on the volume of output of the later visualization.

This study uses Python and Pandas [9] to process the information. Pandas provides a ready-to-use framework for processing large volumes of data. These tools are used to implement the algorithm above and filter the data. The accounts of the posts with common media are considered coordinated. The various accounts are then placed in a table that shows the other accounts they are sharing content with.
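The flagging rule described above (same media, at least five pages, within one minute, verified accounts excluded) might be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the study used Pandas, whereas this sketch uses only the Python standard library so that it is self-contained; the record fields `account`, `media`, `time` and `verified` are hypothetical names and not CrowdTangle API fields; and the sample posts are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_coordinated(posts, window=timedelta(minutes=1), min_accounts=5):
    """Return {media: set of accounts} for media shared by at least
    `min_accounts` distinct unverified accounts within `window`."""
    by_media = defaultdict(list)
    for post in posts:
        if not post["verified"]:  # verified accounts are removed first
            by_media[post["media"]].append(post)

    coordinated = {}
    for media, group in by_media.items():
        group.sort(key=lambda p: p["time"])
        start = 0
        # Sliding window over the time-ordered posts that share this media.
        for end in range(len(group)):
            while group[end]["time"] - group[start]["time"] > window:
                start += 1
            accounts = {p["account"] for p in group[start:end + 1]}
            if len(accounts) >= min_accounts:
                coordinated.setdefault(media, set()).update(accounts)
    return coordinated


# Made-up sample: five pages share a.jpg within 40 seconds; b.jpg has only three.
base = datetime(2019, 5, 1, 8, 0, 0)
posts = [
    {"account": f"page{i}", "media": "a.jpg",
     "time": base + timedelta(seconds=10 * i), "verified": False}
    for i in range(5)
]
posts += [
    {"account": f"page{i}", "media": "b.jpg",
     "time": base + timedelta(seconds=10 * i), "verified": False}
    for i in range(3)
]
posts.append({"account": "verified_news", "media": "a.jpg",
              "time": base, "verified": True})

flagged = find_coordinated(posts)
```

Raising or lowering `min_accounts` corresponds to the adjustment the text mentions for controlling how much output reaches the visualization stage.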
The data is then visualized using a package called NetworkX [10]. In particular, the Fruchterman–Reingold algorithm was used to visualize the data. This is a force-directed graph drawing algorithm that positions the nodes of the graph in a two-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible. This makes it easier to visualize clusters of interaction between the nodes. The algorithm was chosen because it allows a better visualization for the next stage of our framework. Figure 1 shows the resulting plot of the data obtained by the system. As an observation, there are some clusters that refer to established but not verified media groups, such as some RMN (Fig. 4) and ABS/TFC (Fig. 3) accounts. There is also a group aligned with an education institution (Fig. 2). There are then politically aligned groups, shown in Figs. 5 and 6. These clusters are easily identifiable, as the graphing algorithm presents them in a manner in which they are visually clustered.

Fig. 1 Initial plot of coordinated accounts
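To make the structure behind such a plot concrete, the sketch below builds the account co-sharing graph from flagged media and lists its clusters as connected components. It is a standard-library stand-in for inspecting clusters and does not compute the Fruchterman–Reingold layout itself; in NetworkX that layout is available through `spring_layout`. The function names and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(coordinated):
    """coordinated: {media: set of accounts}.
    Returns {(account_a, account_b): number of media items both shared}."""
    weights = defaultdict(int)
    for accounts in coordinated.values():
        for a, b in combinations(sorted(accounts), 2):
            weights[(a, b)] += 1
    return dict(weights)


def find_clusters(edges):
    """Group accounts into connected components of the co-sharing graph."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, clusters = set(), []
    for node in sorted(adjacency):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:  # depth-first walk over one component
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(adjacency[current] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Made-up example: accounts A, B, C share media; X and Y form a separate cluster.
sample = {"m1": {"A", "B", "C"}, "m2": {"B", "C"}, "m3": {"X", "Y"}}
edges = build_edges(sample)
clusters = find_clusters(edges)
```

The edge weights (how many media items two accounts shared) could be passed to a layout algorithm as edge strengths, so that accounts that share more content are drawn closer together.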
3.2 Cleaning up the Data

For the observations made on the clusters, the next step is to study the content shared by these entities in a little more detail. Upon closer analysis of the shared content, the first class of content consists of common media elements shared by unverified media entities like Sunstar and RMN. In Fig. 7, it can be seen that these are valid news articles but the photo tagged as shared is the company logo. This is likely because of the Open Graph settings of their content management platform (CMP). Facebook uses content provided through Open Graph as the basis for what appears in the Facebook feed [11]. So, the shared image is simply common placeholder content and not the specific article content itself. Ideally, this image should be customized per news article. These posts can therefore be removed from further analysis as coordinated behaviour.
Fig. 2 STI cluster highlighted
Fig. 3 ABS-CBN/TFC highlighted
There are also other groups that have a strong degree of content sharing within affiliated groups. Examples are the ABS-CBN group, which has the main ABS-CBN page and the various TFC regional pages, and the STI group of education institutions. Figure 8 shows examples of shared content from these entities, whose pages really are coordinated. The rest of the content is shared from URLs that no longer exist or have been taken down, which is quite suspicious. These URLs refer to sites hosted under the domains astigpinoy.net and nfac.elgoal.net. This is the content left for the current stage of the analysis, but it is gone. It is also suspicious that these articles are no longer published. Any reputable author or journalist would normally ensure that their content continues to be published even after the event, for proper attribution. Mainstream writers do not fear fact-checking.

Fig. 4 RMN cluster highlighted

Fig. 5 Cagayan cluster highlighted
3.3 Looking at the Data Again

After removing the entities flagged during the manual clean-up stage, Fig. 9 provides the updated view of the coordinated entities. The accounts left in this analysis refer to highly coordinated sites that are all political in nature. In fact, the majority of the pages are aligned with the current chief executive of the country. This is summarized in Table 3, which gives us a list of accounts that are highly coordinated. We are not in a position to determine whether the previously shared content is fake or not, because the content has already been taken down. But it is clear that these sites are coordinated and are continually used as networks for propaganda purposes. Under the current guidelines of Facebook, this could be inauthentic behaviour. A good number of these accounts are still active, which means that these pages can still be used as vehicles for propaganda.

Fig. 6 PDU cluster highlighted

Fig. 7 Sunstar shared content versus shared media

Fig. 8 Content shared by various other groups

Fig. 9 Cleaned up plot of coordinated accounts
4 Conclusions

This framework demonstrates a hybrid approach to looking at coordinated and potentially inauthentic behaviour that uses both computational and manual mechanisms. The computational mechanisms allow us to process a large amount of social media data. By using NetworkX with the Fruchterman–Reingold algorithm, it is easier to see what these coordinated networks are and to gain additional insight for this analysis. The manual mechanism is then used to vet the flagged shared content and to surface issues such as misconfigured Open Graph content or validly coordinated groups such as the education institution and the unverified media entities. These are exceptions to the rule that can easily be flagged by a human expert (i.e. one who knows the known media entities and their affiliates). Upon validation of the media for each cluster of content, some shared content turns out to be hosted on sites that no longer exist. Finding this is one of the benefits of doing a slightly delayed analysis. It is highly suspicious and is not a characteristic of reputable content.

Hopefully, this paper has shared a practical hybrid means of detecting coordinated behaviour for Facebook pages and groups. Pages and groups are magnets for coordinated user account behaviour, considering that operators would like to take advantage of network effects. We might not have access to user account data in this dataset, but looking at coordinated behaviour in public pages and groups can be a starting point for further investigation. In this day and age, the most valuable commodity is one’s attention. It is critical that we equip ourselves with the tools to better detect and understand these attempts to hack the attention economy [12].

For future research, the framework can be applied to process data in a more real-time manner and to investigate other themes. Real-time processing can be used for proactive detection of coordinated behaviour for further investigation. Other themes can also be applied to the social media dataset, which is now made available by CrowdTangle for research purposes.

Table 3 Summary of highly coordinated accounts

Account                                                  Status
Duterte Philippine President                             Active
Rodrigo Roa Duterte President of the Philippines         Active
Duterte Nation                                           In-active
TATAYng Bayan                                            Active
SolidDDS                                                 In-active
Philippine Daily News                                    Active
News DailyPH                                             Active
Pro Duterte Blog                                         Active
Rodrigo Duterte Supporters                               In-active
Newz Trend                                               Active
Health Careand News Update                               In-active
Pinoy Updates                                            In-active
Duterte Marcos Real Change                               Active
Rodrigo Duterte the Best President of the Philippines    Active
Balitang Pinas                                           Active
References
1. Bradshaw S, Howard P (2018) Challenging truth and trust: a global inventory of organized social media manipulation. The computational propaganda project, 1
2. Benkler Y (2019) Cautionary notes on disinformation and the origins of distrust. Soc Sci Res Council. https://doi.org/10.35650/MD.2004.d
3. Miniwatts Marketing Group (2020) Asia internet use, population statistics data, and facebook data - June 30 2020. Miniwatts Marketing Group. https://www.internetworldstats.com/stats3.htm
4. Social Weather Stations (2019) First quarter 2019 social weather survey: 1 of 5 adult Pinoys use facebook daily as a source of news. Social Weather Stations. https://www.sws.org.ph/swsmain/artcldisppage/?artcsyscode=ART-20190629182313
5. Gleicher N (2020) Removing coordinated inauthentic behavior. Facebook, Menlo Park, California, United States. https://about.fb.com/news/2020/10/removing-coordinated-inauthentic-behavior-september-report/
6. Alvisi L, Clement A, Epasto A, Lattanzi S, Panconesi A (2013) SoK: the evolution of sybil defense via social networks. 2013 IEEE symposium on security and privacy, Berkeley, CA, 382–396. https://doi.org/10.1109/SP.2013.33
7. Boshmaf Y, Muslukhov I, Beznosov K, Ripeanu M (2011) The socialbot network: when bots socialize for fame and money. In: Proceedings of the 27th annual computer security applications conference, Orlando, Florida, 5–9 December, pp 93–102
8. CrowdTangle Team (2020) CrowdTangle. Facebook, Menlo Park, California, United States. https://apps.crowdtangle.com/admuscienceengfacebook/lists/pages
9. McKinney W (2010) Data structures for statistical computing in python. In: Proceedings of the 9th python in science conference (Vol 445, pp 51–56)
10. Hagberg AA, Schult DA, Swart PJ (2008) Exploring network structure, dynamics, and function using NetworkX. In: Varoquaux G, Vaught T, Millman J (eds) Proceedings of the 7th python in science conference (SciPy2008), (Pasadena, CA USA), pp 1–15, Aug 2008
11. Facebook Team (2020) A guide to sharing for webmasters. Facebook, Menlo Park, California, United States. https://developers.facebook.com/docs/sharing/webmasters/
12. Boyd D (2017) Hacking the attention economy. Data and Society: Points. Available at: https://points.datasociety.net/hacking-the-attention-economy-9fa1daca7a37
13. Giglietto F, Righetti N, Rossi L, Marino G (2020) It takes a village to manipulate the media: coordinated link sharing behavior during 2018 and 2019 Italian elections. Commun Soc Inf, pp 1–25
COVID-19 X-ray Image Diagnosis Using Deep Convolutional Neural Networks Alisa Kunapinun and Matthew N. Dailey
Abstract Recently, diagnosis of COVID-19 has become an urgent worldwide concern. One modality for disease diagnosis that has not yet been well explored is that of X-ray images. To explore the possibility of automated COVID-19 diagnosis from X-ray images, we use deep CNNs based on ResNet-18 and InceptionResNetV2 to classify X-ray images from patients under three conditions: normal, COVID-19, and other pneumonia. Experimental results show that deep CNNs can distinguish normal patients from diseased patients with accuracy 93.41%, and among diseased patients, they can distinguish COVID-19 from other pneumonia cases with accuracy 93.53%. The trained model is able to uncover the detailed appearance features that distinguish COVID-19 infections from other pneumonia.
Keywords COVID-19 · Pneumonia · CNNs · InceptionResNet · ResNet · Gradient test
A. Kunapinun (B) · M. N. Dailey, Asian Institute of Technology, Pathumthani, Thailand
A. Kunapinun, Industrial Systems Engineering, Pathumthani, Thailand
M. N. Dailey, Information and Communication Technologies, Pathumthani, Thailand
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_64
1 Introduction
1.1 Background
In December 2019, SARS-CoV-2, a coronavirus related to SARS-CoV-1, spread in Wuhan, China, causing the disease known as COVID-19. Within three months, the COVID-19 outbreak spread around the world, and the World Health Organization
Fig. 1 X-ray images of lungs of (a) a COVID-19 patient, (b) an Influenza-A patient, and (c) a normal patient. Both infected lungs contain white smoke-like features; however, it is difficult to diagnose the specific disease. Source: Kaggle [6, 7]
(WHO) declared COVID-19 a pandemic. COVID-19 is dangerous because it spreads as an aerosol [1], has a death rate of approximately 3%, and has an infection rate that is higher than that of influenza. As of November 2020, there were 42 million reported infections worldwide [14] (Fig. 1). COVID-19 spreads between patients and lives in the respiratory system, especially the lungs. Infected patients have symptoms similar to those of pneumonia, and as in pneumonia, there are some clues that can be detected in images of the lungs. CT-scan images have been shown to be effective, as, to a lesser extent, have X-ray images [8, 11, 13, 16]. Many researchers have explored the extent to which the visible lung symptoms in COVID-19 patients differ from those of other pneumonia, and this evidence can be readily seen in CT-scan images [11, 13, 15]. Although the symptoms are more difficult to distinguish from those of other pneumonia in X-ray images, X-ray imaging is much more accessible than CT-scans. Thus, in this paper, we focus on classifying patients' lung X-ray images as normal, COVID-19, or other pneumonia. We report on the accuracy of two CNN models, based on ResNet-18 and InceptionResNetV2, for this categorization task. COVID-19 can be detected in many ways. One of the easiest methods to use in the field is to collect phlegm samples and analyze them for the RNA of the virus. Unfortunately, this technique is not reliable in the early stage of infection. Moreover, it takes time to process samples. This has led to consideration of other detection modalities, such as using the characteristics of images of infected lungs to identify positive cases. CT-scan images are particularly effective [15]. Automated diagnosis of diseases in CT-scans using deep neural network learning has proven effective, perhaps because CT-scans are clear 3D cross sections of anatomical structures. Xu et al. develop a 3D CNN model [15] based on CT-scans.
The authors use ResNet-18 for image feature extraction and location-attention to find important regions for symptoms of disease. The system finds candidate positions of disease in each slice of a scan. For each candidate disease region, it selects the slice in which the candidate disease region is largest. Next, the image is input to a network that classifies the type of disease. The classification accuracy was 86.7%.
1.2 X-ray Image Classification One of the reasons disease diagnosis is effective on CT-scans is that CT-scans are 3D images, and the characteristics of COVID-19 are more clearly different from those of other diseases in 3D. This makes it relatively easy to classify the images. X-ray images, however, collapse the depth dimension of the lung into a single channel. Images of an infected lung show white smoke-like features that may indicate disease inside the lung, and physicians can use X-ray images to diagnose symptoms of some diseases, but it is difficult to determine the specific type of disease from this diffuse evidence. From our personal communications with physicians, we learned that physicians are able to find evidence of COVID-19 in X-ray images. However, they suggest that the evidence is insufficiently distinctive to confidently distinguish COVID-19 from other pneumonia. We hypothesize that it may nevertheless be possible to use data-driven analysis of X-ray images to extract features indicative of the symptoms of COVID-19. One of the most effective approaches to image classification is the deep CNN. Deep CNNs have already been used to analyze X-ray images in other applications, such as tooth segmentation, lung cancer diagnosis, and diagnosis of other lung diseases [2, 9]. In this paper, we test the hypothesis that AI methods, in particular deep CNNs, can distinguish COVID-19 lung X-rays from those of normal patients and patients with non-COVID pneumonia.
2 Deep Lung X-COVID The models used in our system, Deep Lung X-COVID, are ResNet and InceptionResNetV2. There are several reasons to select these two models. ResNet is one of the most popular classification models because it is open source and performs very well on the ImageNet dataset. It has high accuracy on many data sets and is fast to train. Thus, many medical image classification researchers select ResNet for classifying diseases in X-ray images and other types of images [2, 3]. Some research [5] has already tested the utility of ResNet on COVID-19 X-ray images. However, the training data set in that work is very small. Furthermore, the researchers only classify images as coming from normal patients or COVID-19 patients. The model is not asked to distinguish COVID-19 from other pneumonia. Since distinguishing COVID-19 from other pneumonia is an important task that may require a more sophisticated model, we hypothesize that a larger, deeper network structure may be better suited to the subtle details of lung X-rays needed to distinguish different types of lung disease. We suggest that InceptionResNet, which is designed to perform deeper analysis and resolve bottlenecks in predecessor models, may be more appropriate than ResNet for the task. We therefore select InceptionResNetV2, a recent improvement over V1.
3 Experimental Design 3.1 Data Preparation We obtained 3336 X-ray images from Kaggle [6, 7]. We obtained images in three classes: normal, COVID-19 and pneumonia. The 3336 images are split into 1585 normal images, 198 COVID-19 images and 1463 pneumonia images. Each category was separated randomly into 80% training, 10% validation, and 10% test.
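The per-category 80/10/10 split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `split_by_category`, the fixed seed, the placeholder file names, and the rounding rule (floor for training and validation, remainder to test) are all hypothetical.

```python
import random

def split_by_category(items_by_class, train=0.8, val=0.1, seed=0):
    """Shuffle each class separately and cut it into train/val/test parts."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        n_train = int(train * len(items))   # floor; an assumed rounding rule
        n_val = int(val * len(items))
        splits["train"] += [(x, label) for x in items[:n_train]]
        splits["val"] += [(x, label) for x in items[n_train:n_train + n_val]]
        # Whatever is left over goes to the test part.
        splits["test"] += [(x, label) for x in items[n_train + n_val:]]
    return splits

# Placeholder file names standing in for the three image classes.
data = {
    "normal": [f"normal_{i}.png" for i in range(1585)],
    "covid19": [f"covid_{i}.png" for i in range(198)],
    "pneumonia": [f"pneumonia_{i}.png" for i in range(1463)],
}
splits = split_by_category(data)
print({name: len(part) for name, part in splits.items()})
```

Splitting each class on its own keeps the class proportions roughly equal across the three parts, which matters here because the COVID-19 class is much smaller than the other two.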
3.2 Experiment I: Classification with ResNet18 Trained from Scratch After image preparation, we trained ResNet18 from scratch, modifying the last fully connected layer to have 512 inputs and three outputs. We used the PyTorch library and trained on a GTX1080Ti for 40 epochs. The model's loss after training is 0.3273, and the accuracy on the test set is 91.02%.
3.3 Experiment II: Classification with ResNet18 and Transfer Learning In this experiment, we began with a pretrained ResNet18, using weights from ImageNet [10] and fine tuning only the last layer of fully-connected units. The model’s loss after training is 0.3050, and the accuracy on the test set is 88.02%.
3.4 Experiment III: Classification with InceptionResNetV2 Trained from Scratch Next, we trained the open source InceptionResNetV2 from scratch. The model’s loss after training is 0.1206, and the accuracy on the test set is 92.51%.
Table 1 Experiment result: validation & test accuracy

Network                                      X-ray type   Total   Correct   Percent (%)
Resnet-18                                    All          334     304       91.02
                                             Normal       155     144       92.90
                                             COVID-19     36      31        86.11
                                             Pneumonia    143     129       90.21
Resnet-18 with transfer learning             All          334     294       88.02
                                             Normal       155     140       90.32
                                             COVID-19     36      33        91.67
                                             Pneumonia    143     121       84.62
InceptionResNet V2                           All          334     309       92.51
                                             Normal       160     151       94.38
                                             COVID-19     25      24        96.00
                                             Pneumonia    149     134       89.93
InceptionResNet V2 with expanded blocks      All          334     312       93.41
                                             Normal       160     151       94.38
                                             COVID-19     25      24        96.00
                                             Pneumonia    149     137       91.95
3.5 Experiment IV: Classification with Expanded InceptionResNetV2 Models Next, we modified the base InceptionResNetV2 by adding more layers in the repeater layers of Block35, Block17, and Block8 (5 layers each). The model's loss after training is 0.1830, and the accuracy on the test set is 93.41%. All experiment results are shown in Table 1.
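The percentages in Table 1 follow directly from the correct and total counts. As a small worked check, the snippet below recomputes the per-class and overall accuracy for the expanded InceptionResNetV2 rows; the helper name `accuracy_percent` is an illustrative assumption.

```python
def accuracy_percent(correct, total):
    """Classification accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)

# (correct, total) pairs for the expanded InceptionResNetV2 rows of Table 1.
expanded = {"Normal": (151, 160), "COVID-19": (24, 25), "Pneumonia": (137, 149)}

per_class = {name: accuracy_percent(c, t) for name, (c, t) in expanded.items()}
overall = accuracy_percent(
    sum(c for c, _ in expanded.values()),   # 312 correct images
    sum(t for _, t in expanded.values()),   # 334 test images
)
print(per_class)
print(overall)  # 93.41
```

The overall figure is the pooled count over all three classes, so it is dominated by the two large classes rather than by the 25 COVID-19 test images.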
4 Results The test results show that deep learning can effectively distinguish normal, COVID-19, and pneumonia categories from lung X-ray images. The results from the expanded InceptionResNetV2 model are especially good, better than those of ResNet18, as shown in Table 2 (Figs. 2 and 3).
Table 2 Summary results (accuracy of classification)

Network                        Normal versus pneumonia (%)   COVID-19 versus pneumonia (%)
ResNet18                       91.02                         89.38
ResNet18 (TF learning)         88.02                         86.03
InceptionResNetV2              92.51                         90.80
InceptionResNetV2 (modify)     93.41                         93.53
Fig. 2 Success cases. Correct classification of three classes (COVID-19, pneumonia, and normal) using the modified InceptionResNetV2 of Experiment IV. These images are classified accurately
Fig. 3 Failure cases. Classification of three classes (COVID-19, pneumonia, and normal) using the modified InceptionResNetV2 of Experiment IV. There are few failure cases for actual COVID-19 compared to other classes because the test accuracy for COVID-19 cases turned out to be very high
4.1 Analysis of Classification Results The most important aspect of a diagnosis method is whether it can distinguish disease cases from normal cases using the available evidence. The model from Experiment IV can distinguish normal lungs and diseased lungs with test accuracy 93.41%. For distinguishing COVID-19 from other pneumonia types, InceptionResnetV2 also performs very well, with test accuracy 93.53%.
4.2 Analysis of Distinguishing Features Because the classification accuracy for both COVID-19 and pneumonia is high, there must be some clear symptoms visible in the lung image that the model uses to tell them apart. Thus, we performed a gradient analysis to identify the critical visible differences between other pneumonia and COVID-19 according to the model. We begin with a pneumonia sample $I_{\mathrm{pneum}}$ and find the partial derivatives $\left.\frac{\partial O_{\mathrm{COVID}}}{\partial I(u,v)}\right|_{I_{\mathrm{pneum}}}$ for all pixels $(u, v)$. This allows us to form a composite picture of how COVID-19 "looks" different from other pneumonia, according to the model. The gradient test is shown in Algorithm 1.
Algorithm 1 Gradient Test
procedure GradientTest
  Inputs:
    I: image from pneumonia set
    M: trained model
  O ← M(I)
  Calculate:
    for (u, v) ∈ U × V do
      I′ ← I.clone()
      I′[u,v] ← I[u,v] + 1
      O′ ← M(I′)
      D[u,v] ← O′[COVID] − O[COVID]
    end for
  Output D: gradient image
  Finish
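The gradient test of Algorithm 1 can be sketched in plain Python as a per-pixel finite difference. This is an illustrative sketch, not the authors' implementation: the image is a nested list, `toy_model` is a hypothetical linear stand-in for the trained network, and the names `gradient_test`, `target` and `delta` are assumptions.

```python
import copy

def gradient_test(image, model, target, delta=1.0):
    """Finite-difference sensitivity of model(image)[target] to each pixel."""
    base = model(image)[target]                 # O <- M(I)
    grad = [[0.0] * len(row) for row in image]
    for u, row in enumerate(image):
        for v, _ in enumerate(row):
            perturbed = copy.deepcopy(image)    # I' <- I.clone()
            perturbed[u][v] += delta            # I'[u,v] <- I[u,v] + 1
            # D[u,v] <- O'[COVID] - O[COVID]
            grad[u][v] = model(perturbed)[target] - base
    return grad

# Toy stand-in for the trained network: a linear "COVID" score.
WEIGHTS = [[0.5, -1.0], [2.0, 0.0]]

def toy_model(img):
    score = sum(w * p for wr, pr in zip(WEIGHTS, img) for w, p in zip(wr, pr))
    return {"COVID": score}

D = gradient_test([[10.0, 20.0], [30.0, 40.0]], toy_model, "COVID")
print(D)
```

For a linear model the resulting map simply reproduces the weights, which makes the toy case easy to check. The loop needs one forward pass per pixel, so for a real network a single backward pass with automatic differentiation is the usual cheaper way to obtain a comparable map.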
The samples in Fig. 4 show that COVID X-ray images are characterized by more occlusion in the lung area between ribs.
Fig. 4 Gradient test for distinguishing COVID-19 from pneumonia. Two pneumonia images were sampled randomly from the correctly-classified test data. $\partial O_{\mathrm{COVID}} / \partial I(u,v)$ was calculated for each pixel $(u, v)$ in the input image. From left to right on each row: input pneumonia image, gradient image, and gradient overlaid on the image. Red indicates positive gradient, and blue indicates negative gradient
5 Conclusion To conclude, we have shown that deep learning can discriminate X-rays of COVID-19 lungs from those of other diseases, especially with the model based on InceptionResNetV2. However, the amount of data currently available for COVID-19 is very limited. Thus, in future work, the number of X-ray images of COVID-19-infected lungs in the dataset should be increased. In the future, improved CNNs may also be designed for greater efficiency. Because X-ray images can be easily obtained in any hospital or clinic, the diagnosis method is much more widely available than CT-scans, which must be taken at large hospitals. Thus, an online service for analysis of X-ray images may enable reliable diagnosis even in developing countries where COVID-19 testing is less available.
5.1 Saliva Test for COVID-19 The easiest COVID-19 test that can be performed at home is the saliva test. The saliva test identifies infection within 1–2 days from onset. The result can be obtained in less than 1 h. However, the test misses 20% of positive cases, and it can miss up to 50% when the sample is obtained from the wrong position in the throat [12].
5.2 Radiation Magnitude Due to X-rays and CT-Scans The level of radiation considered to cause damage to blood cells is 500 mSv. A lung X-ray requires a radiation dose of only 0.1 mSv, while a CT-scan has a radiation dose of up to 10 mSv. Since the radiation dose of a CT-scan is up to 100 times that of an X-ray, X-ray imaging may be safer for diagnosing COVID-19 [4].
References
1. Aerosol and Surface Stability of SARS-CoV-2 as Compared with SARS-CoV-1. New England J Med. https://doi.org/10.1056/NEJMc2004973
2. Ayyachamy S, Alex V et al (2019) Medical image retrieval using Resnet-18. In: Medical imaging 2019: imaging informatics for healthcare and applications
3. Baltruschat I, Nickish H, Grass M et al (2019) Comparison of deep learning approaches for multi-label chest X-ray classification. Scientific Reports 9
4. CDC Radiation Emergencies. https://www.cdc.gov/
5. Classify Covid-19 from X-ray images. https://medium.com/@nonthakon/
6. COVID-19 Open research data challenge. https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
7. COVID chest xray. https://www.kaggle.com/bachrr/covid-chest-xray
8. Jacobi et al, Portable chest X-ray in coronavirus disease-19 (COVID-19): a pictorial review. Clin Imaging
9. Jader G, Fontineli J, Ruiz M et al (2018) Deep instance segmentation of teeth in panoramic X-ray images. 2018 31st SIBGRAPI conference on graphics, patterns and images (SIBGRAPI), Parana, pp 400–407
10. Krizhevsky A et al (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst, 1106–1114
11. Li L, Qin L et al (2020) Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT. Radiol Soc
12. Questions about COVID-19 test accuracy raised across the testing spectrum. https://www.nbcnews.com/ NBC Health, 27 May 2020
13. Wong et al, Frequency and distribution of chest radiographic findings in COVID-19 positive patients. Radiology
14. World Health Organization. https://www.who.int/
15. Xu X, Jiang X et al (2020) Deep learning system to screen coronavirus disease 2019 Pneumonia. Appl Intell 1–7
16. Zhu N, Zhang D et al (2020) A novel coronavirus from patients with Pneumonia in China. N Engl J Med, 24
Jumping Particle Swarm Optimization
Atiq Ur Rehman, Ashhadul Islam, Nabiha Azizi, and Samir Brahim Belhaouari
Abstract Classical Particle Swarm Optimization (PSO) suffers from a slow convergence rate and from getting trapped in a local optimum when the data dimensions are high. It is therefore important to propose an algorithm that can overcome these limitations. This paper proposes a variant of classical PSO that addresses both slow convergence and entrapment in local optima. The proposed algorithm is based on a jumping strategy which triggers the particles to jump whenever they are found stuck in a local optimum solution. The proposed jumping strategy not only enables the algorithm to skip the local optima but also enables it to converge at a faster rate. The effectiveness of the proposed jumping strategy is demonstrated by performing experiments on a benchmark dataset that contains both unimodal and multimodal test functions.
Keywords Global optimization · Large scale optimization · Metaheuristics · Particle swarm optimization · Unimodal and multimodal functions
A. Ur Rehman (B) · A. Islam · S. B. Belhaouari, ICT Division, College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
N. Azizi, Electronic Document Management Laboratory (LabGED), Badji Mokhtar-Annaba University, Annaba, Algeria
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022. X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_65
1 Introduction
Particle Swarm Optimization (PSO) is a metaheuristic global optimization method inspired by the social behavior of bird flocks and fish schools. Since its initial development [1, 2], the algorithm has gained huge popularity among researchers because of its usefulness in different optimization problems. Different modifications have been proposed in the scientific literature to address different limitations of the algorithm in terms of its search strategy, parameter tuning and topology [3, 4]. Due to a huge
number of developments made by different researchers in PSO, the algorithm has become powerful for different optimization problems and is currently being used in engineering applications [5] such as solar energy [6], the gas industry [7] and health informatics [8, 9], among others. Besides the work done on the development of PSO in terms of (i) hyperparameter tuning, (ii) search strategy and (iii) topology, two other major variants of PSO are discrete PSO [10] and parallel PSO [11]. The discrete version of PSO is designed specifically for discrete optimization problems and is used extensively for problems such as feature selection [12, 13]. Parallel PSO, in contrast, is designed specifically to solve high dimensional optimization problems. These parallel algorithms aim to resolve the tendency of PSO to become trapped in local optima as the dimension of the search problem increases. However, they only work best if certain conditions are met [14]: (i) the optimization has complete access to a homogeneous cluster of computers without being interrupted by other users, (ii) throughout the optimization, the analysis function that evaluates any set of design variables takes a constant amount of time, and (iii) the parallel tasks are distributed equally among the available processors. If any of these conditions is not satisfied, the available computational resources cannot be utilized effectively, and since large computer clusters are mostly heterogeneous with multiple users, violation of the first condition is always expected. Furthermore, the computational cost of optimizing a complex analysis function is mostly variable [15]. Finally, given the coarse granularity of most parallel optimization algorithms, it is hard to balance the workload when assigning the parallel tasks to the available processors.
All three of these factors can contribute to load imbalance problems, which can degrade the parallel performance significantly. In view of these drawbacks of parallel algorithms for high dimensional optimization problems, this paper presents a novel jumping strategy for PSO that is capable of dealing with high dimensional optimization problems. The proposed jumping strategy enables the particles to jump out of local optima and encourages the global search at a faster convergence rate. The particles are triggered to jump to a random new position inside the search space if they are found stuck in a solution. Otherwise, the particles keep searching at a normal velocity, depending on the hyperparameters. To make the implementation of Jumping Particle Swarm Optimization (JPSO) convenient and easy, the jumping strategy of the particles is embedded in the velocity equation of the algorithm. Moreover, the proposed JPSO also focuses on the convergence speed of the algorithm, and the proposed jumping strategy enables the particles to reach an acceptable solution in fewer iterations. The proposed JPSO is validated by experiments on 12 benchmark test functions, and the experimental results reveal the superiority of the proposed JPSO over other existing optimization algorithms in terms of the achieved acceptable solution and convergence rate. The convergence rate is evaluated in terms of the Function Evaluations (FEs) required to reach an acceptable solution. The rest of the paper is organized as follows: Sect. 2 (Proposed Jumping Strategy), Sect. 3 (Evaluation Results) and Sect. 4 (Conclusion).
2 Proposed Jumping Strategy
PSO is a nature-inspired metaheuristic for solving global optimization problems. Classical PSO is based on the velocity of particles moving in a search space to find a global optimum. The velocity of the particles is influenced by their previous search experience, and the particles tend to move towards the best locations found so far, with some randomness. This enables the particles to move towards the best solution available in the search space. However, as the data dimensionality increases, the search space grows, the optimization problem becomes challenging for the particles, and they get stuck in a local optimum. Therefore, a jumping strategy is proposed that lets the particles skip local optima and reach a global optimum. The velocity of particles in classical PSO is updated as:

$$v_i(t+1) = w\,v_i(t) + c_1\,\mathrm{rand}\,\bigl(p_i(t) - x_i(t)\bigr) + c_2\,\mathrm{rand}\,\bigl(g(t) - x_i(t)\bigr) \tag{1}$$

where $v_i(t)$ is the velocity of the $i$th particle in a swarm at iteration $t$, $p_i(t)$ is the best personal experience of the $i$th particle so far (until iteration $t$), $g(t)$ is the swarm's best experience gained until iteration $t$, $w$ is the inertia weight, and $c_1$, $c_2$ are the acceleration constants. The particles move around the search space with the velocity defined in (1) and update their positions $x_i(t)$ in each iteration as:

$$x_i(t+1) = x_i(t) + v_i(t) \tag{2}$$
The positions of all the particles are evaluated against an objective function and the best position among all the particles is identified as the global best position. The search stops on reaching the maximum allowed iterations or FEs, and the swarm's best position is identified as the global best solution. In the proposed JPSO, the particles are forced to relocate to a new random position if found stuck in a solution. This is incorporated as a jump in the velocity of the particles as:

$$v_i(t+1) = w\,v_i(t) + c_1\,\mathrm{rand}\,\bigl(p_i(t) - x_i(t)\bigr) + c_2\,\mathrm{rand}\,\bigl(g(t) - x_i(t)\bigr) + J_i(CJ) \tag{3}$$

where $J_i(CJ)$ is the jump associated with particle $i$, under the Conditions to Jump ($CJ$), defined as:

$$J_i(CJ) = \left(\log\left(\frac{\left(\frac{CJ}{0.1\cdot\mathrm{acceptance}}\right)^{\alpha}}{\left(\frac{CJ}{0.1\cdot\mathrm{acceptance}}\right)^{\alpha} + 1} + \varepsilon\right)\right)\mathrm{rand}_N \tag{4}$$
and

$$CJ_i = v_i(t) + p_i(t) - x_i(t) + g(t) - x_i(t) \tag{5}$$
where $\alpha$ and $\varepsilon$ are the constants defining the triggering and magnitude of the jump. Furthermore, the jumping strategy is based on a logarithmic function which only enables the jump if the value of $CJ$ is small. A smaller value of $CJ$, as defined in (5), means that the particle has stopped moving and is stuck in a particular solution. This means that if a particle is found stuck, it will be relocated to a random new position by forcing it to jump. The jumping behavior of a particle defined in (4) is described further in Fig. 1, where the behavior of the jump is explained using a variable $x$ as:

$$J(x) = \log\left(\frac{x^{\alpha}}{x^{\alpha} + 1} + \varepsilon\right) \tag{6}$$

and the behavior of $J(x)$ can be seen as:

$$\log\left(\frac{x^{\alpha}}{x^{\alpha} + 1} + \varepsilon\right) \approx \begin{cases} 0, & |x| \gg 1 \\ J_m, & |x| \ll 1 \end{cases} \tag{7}$$
Furthermore, from the description of jump in Fig. 1, it can be seen that the logarithmic function will only enable the jump if the value of CJ is close to zero, otherwise the magnitude of jump will remain zero. Moreover, the influence of different values of ε and α on the behavior of jump is demonstrated in Fig. 1a, b, respectively. It can be seen in Fig. 1a, that small value of ε results in bigger jumps, whereas,
Fig. 1 Description of the jump strategy; the jump is only triggered if the value of x is small. (a) Influence of ε on the magnitude of the jump: a small value of ε results in bigger jumps, whereas increasing the value of ε decreases the magnitude of the jump. (b) Influence of α on triggering the jump: small values of α trigger fewer jumps, whereas large values trigger the jump more often
increasing the value of ε decreases the magnitude of the jump. Similarly, Fig. 1b shows that α controls how readily the jump is triggered: small values of α trigger fewer jumps, whereas larger values of α trigger the jump more frequently, which keeps the particles from getting closer to the exact solution. If α is kept large, there is less chance for a particle to get stuck in a particular solution, which helps it reach the global solution. However, this compromises the accuracy of the solution, as the particle might not be able to reach the exact optimum due to the frequent jumps. The rest of the optimization process for the proposed JPSO is carried out in the same way as for classical PSO. The complete pseudocode for the proposed JPSO is given as follows:
JPSO Algorithm
Inputs: α, ε, w, c1, c2, swarm size and maximum iterations.
Step 1: Initialize the hyperparameters.
Step 2: Initialize the velocities and positions of the swarm randomly.
Step 3: Evaluate each particle in the swarm for its fitness using an objective function.
Step 4: Record the best positions of the swarm (personal and global).
Step 5: Update the swarm's velocity using (3).
Step 6: Apply velocity clamping.
Step 7: Update the swarm's position using (2).
Step 8: Bring particles that jumped outside the search space back to a random position inside the search space.
Step 9: Repeat steps 3–8 until the algorithm converges or until it reaches the maximum iterations.
Output: Optimum solution as the global best position of the swarm.
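The steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. Four things are assumed here: $CJ$ is computed per dimension from absolute values, $\mathrm{rand}_N$ is a standard normal draw, the velocity clamp is half the search range, and the inertia weight decays geometrically from 0.9 to 0.4. The position update uses the freshly updated velocity, as is conventional, and `acceptance` must be positive. The names `jpso`, `jump` and `sphere` are illustrative.

```python
import math
import random

def jump(x, alpha, eps):
    """Jump function log(x^alpha / (x^alpha + 1) + eps), overflow-safe."""
    ax = abs(x)
    if ax >= 1.0:
        ratio = 1.0 / (1.0 + ax ** (-alpha))
    else:
        ratio = ax ** alpha / (ax ** alpha + 1.0)
    return math.log(ratio + eps)

def jpso(objective, dim, bounds, acceptance, swarm_size=20, max_iter=2000,
         alpha=30, eps=1e-4, c1=2.0, c2=2.0, seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.5 * (hi - lo)          # assumed clamping limit
    scale = 0.1 * acceptance         # denominator used in Eq. (4)

    def random_position():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    # Steps 1-2: initialise positions and velocities randomly.
    xs = [random_position() for _ in range(swarm_size)]
    vs = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
          for _ in range(swarm_size)]
    # Steps 3-4: evaluate and record personal and global bests.
    pbest = [x[:] for x in xs]
    pbest_val = [objective(x) for x in xs]
    g = min(range(swarm_size), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(max_iter):
        # Assumed form of the exponential inertia decay from 0.9 to 0.4.
        w = 0.9 * (0.4 / 0.9) ** (t / max(1, max_iter - 1))
        for i in range(swarm_size):
            for d in range(dim):
                to_p = pbest[i][d] - xs[i][d]
                to_g = gbest[d] - xs[i][d]
                # Eq. (5), taken per dimension with absolute values (assumption).
                cj = abs(vs[i][d]) + abs(to_p) + abs(to_g)
                # Step 5, Eq. (3): classical update plus the jump term.
                v = (w * vs[i][d] + c1 * rng.random() * to_p
                     + c2 * rng.random() * to_g
                     + jump(cj / scale, alpha, eps) * rng.gauss(0.0, 1.0))
                vs[i][d] = max(-v_max, min(v_max, v))   # Step 6: clamping
                xs[i][d] += vs[i][d]                    # Step 7: move
            # Step 8: re-seed particles that left the search space.
            if any(not lo <= xd <= hi for xd in xs[i]):
                xs[i] = random_position()
        # Steps 3-4 of the next pass: evaluate and update the bests.
        for i in range(swarm_size):
            val = objective(xs[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = xs[i][:], val
            if val < gbest_val:
                gbest, gbest_val = xs[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Simple unimodal test objective with its minimum at the origin."""
    return sum(xi * xi for xi in x)

best, value = jpso(sphere, dim=5, bounds=(-10.0, 10.0), acceptance=0.01,
                   max_iter=200)
print(value)
```

Because the jump is simply an extra term in the velocity update, no separate stagnation counter is needed: a particle whose velocity and distances to both bests have collapsed receives a large random kick, while a moving particle receives an almost negligible one.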
3 Evaluation Results Twelve benchmark test functions that have already been used by different researchers [16–18] to evaluate optimization algorithms are employed here to test the proposed JPSO. The details of the test functions are given in Table 1. The first six are unimodal functions, while the remaining six are multimodal. To determine the usefulness of the proposed JPSO, the test functions are evaluated for different dimensions. The data dimensions used in the evaluation, along with the rest of the hyperparameters, are reported in Table 2. The values of α and ε were chosen after testing a specific range for each constant: the values tested for α are 5, 10, 20, 30, 40 and 80, and the values tested for ε are 0.1, 0.01, 0.001, 0.0001 and 0.00001. The maximum number of iterations is intentionally kept low, at 2000, to evaluate the convergence rate of the proposed JPSO. Each test function is evaluated in 30 repeated
Table 1 Details of evaluated functions (panels: unimodal test functions; multimodal test functions)

Table 2 Algorithm implementation details

Hyperparameter        Value(s)
Swarm size            20
Maximum iterations    2000
α                     30
ε                     0.0001
ω                     Exponential decrease from 0.9 to 0.4
c1                    2
c2                    2
Data dimensions       30, 100, 500, 1000, 5000
trials and the results in terms of reliability and mean/best/worst solution are given in Table 3. It can be seen that the proposed JPSO reached the acceptable solution for all the test functions which proves the capability of the proposed algorithm for high dimensional optimization problems. Moreover, the solutions reported in Table 3 are reached within the maximum allowed (2000) iterations only and the solutions are expected to improve further if more iterations are allowed to solve the problem. The convergence rate evaluation in terms of required FEs taken by each function for reaching an acceptable solution are reported in Table 4. The results in Table 4 reveal the fast convergence speed of proposed JPSO for all the twelve test functions. In order to further demonstrate the performance of the proposed JPSO the convergence speed results are compared with some of the existing state-of-the-art optimization algorithms, in Table 5. From the comparison done with existing methods, the significance of the proposed JPSO is clearly seen for convergence rate.
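The per-function summary over repeated trials can be sketched as below. This is a hedged illustration only: the text does not define reliability explicitly, so it is assumed here to be the percentage of trials whose final objective value is at or below the acceptable threshold (minimization). The function name `summarize_trials` and the sample values are hypothetical.

```python
def summarize_trials(final_values, acceptable):
    """Mean/best/worst of final objective values and share of successful trials."""
    hits = sum(1 for v in final_values if v <= acceptable)
    return {
        "mean": sum(final_values) / len(final_values),
        "best": min(final_values),     # minimization: smaller is better
        "worst": max(final_values),
        "reliability_percent": 100.0 * hits / len(final_values),
    }

# Hypothetical final values from four trials with an acceptable level of 0.01.
summary = summarize_trials([0.001, 0.02, 0.005, 0.003], acceptable=0.01)
print(summary)
```

With this definition, a reliability of 100% means that every trial reached the acceptable solution within the iteration budget.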
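Table 3 JPSO Results For Reaching an Acceptable Solution

Dimensions | Mean | Best | Worst

f1 (acceptable 0.01, reliability 100%)
N = 30 | 9.5665e-04 | 1.5839e-04 | 0.0022
N = 100 | 0.0036 | 6.2051e-05 | 0.0074
N = 500 | 0.0158 | 5.6864e-04 | 0.0323
N = 1000 | 0.0411 | 0.0012 | 0.0879
N = 5000 | 0.1704 | 0.0028 | 0.4415

f2 (acceptable 0.01, reliability 100%)
N = 30 | 1.2584e-07 | 2.4855e-08 | 2.1545e-07
N = 100 | 9.2397e-07 | 1.6705e-08 | 6.5623e-06
N = 500 | 4.0980e-06 | 5.0397e-07 | 2.6642e-05
N = 1000 | 1.1698e-05 | 3.4158e-07 | 3.3088e-05
N = 5000 | 6.0776e-05 | 1.9768e-06 | 3.5709e-04

f3 (acceptable 100, reliability 100%)
N = 30 | 0.0057 | 1.4833e-05 | 0.0457
N = 100 | 0.1027 | 2.2332e-10 | 0.9078
N = 500 | 0.5964 | 4.7454e-06 | 3.6957
N = 1000 | 64.66 | 0.0144 | 482.61
N = 5000 | 2.1090e+05 | 8.7381 | 2.8521e+06

f4 (acceptable 100, reliability 100%)
N = 30 | 0.0011 | 9.5072e-07 | 0.0062
N = 100 | 0.0015 | 2.4481e-05 | 0.0089
N = 500 | 0.0077 | 1.0268e-05 | 0.0451
N = 1000 | 0.0433 | 1.2216e-04 | 0.3800
N = 5000 | 0.2727 | 0.0021 | 1.6363

f5 (acceptable 0, reliability 100%)
N = 30 | 0 | 0 | 0
N = 100 | 0 | 0 | 0
N = 500 | 0 | 0 | 0
N = 1000 | 0 | 0 | 0
N = 5000 | 0 | 0 | 0

f6 (acceptable 0.01, reliability 100%)
N = 30 | 1.9428e-04 | 5.3675e-06 | 6.4057e-04
N = 100 | 1.9893e-04 | 1.3957e-05 | 4.9430e-04
N = 500 | 7.5370e-05 | 3.6224e-06 | 2.0445e-04
N = 1000 | 4.1127e-04 | 5.8998e-05 | 0.0015
N = 5000 | 4.8311e-04 | 5.7703e-05 | 0.0012

f7 (acceptable -10000, reliability 100%)
N = 30 | -1.2569e+04 | -1.2569e+04 | -1.2568e+04
N = 100 | -4.1884e+04 | -4.1898e+04 | -4.1479e+04
N = 500 | -2.0949e+05 | -2.0949e+05 | -2.0949e+05
N = 1000 | -4.1854e+05 | -4.1898e+05 | -4.1551e+05
N = 5000 | -2.0736e+06 | -2.0949e+06 | -1.9878e+06

f8 (acceptable 50, reliability 100%)
N = 30 | 5.4482e-05 | 4.0942e-09 | 3.6071e-04
N = 100 | 6.8105e-05 | 3.2460e-08 | 2.9269e-04
N = 500 | 2.4080e-05 | 4.2304e-10 | 2.7163e-04
N = 1000 | 5.5915e-04 | 2.7954e-08 | 0.0060
N = 5000 | 0.0263 | 2.0196e-07 | 0.4198

f9 (acceptable 50, reliability 100%)
N = 30 | 5.9826e-05 | 2.5847e-08 | 4.8492e-04
N = 100 | 5.4047e-05 | 5.6382e-11 | 3.6501e-04
N = 500 | 6.2082e-05 | 4.9861e-08 | 0.0011
N = 1000 | 8.4395e-04 | 2.7925e-07 | 0.0160
N = 5000 | 0.0669 | 5.5321e-06 | 1.2944

f10 (acceptable 0.01, reliability 100%)
N = 30 | 2.0349e-04 | 4.9436e-05 | 2.8403e-04
N = 100 | 1.8542e-04 | 2.7825e-05 | 3.2678e-04
N = 500 | 1.6260e-04 | 1.7835e-05 | 3.2681e-04
N = 1000 | 2.4132e-04 | 2.6411e-05 | 4.3924e-04
N = 5000 | 2.4069e-04 | 1.4149e-06 | 3.7327e-04

f11 (acceptable 0.01, reliability 100%)
N = 30 | 1.5518e-08 | 3.0546e-09 | 7.2094e-08
N = 100 | 2.1461e-08 | 4.6533e-09 | 8.7101e-08
N = 500 | 2.6140e-08 | 1.1570e-11 | 1.2398e-07
N = 1000 | 4.8671e-08 | 2.1367e-10 | 1.5118e-07
N = 5000 | 2.0597e-07 | 2.6190e-08 | 1.0906e-06

f12 (acceptable 0.01, reliability 100%)
N = 30 | 3.6867e-09 | 8.9068e-10 | 2.7672e-08
N = 100 | 2.3003e-09 | 5.8329e-10 | 7.3872e-09
N = 500 | 1.5387e-09 | 6.9266e-11 | 4.5684e-09
N = 1000 | 1.3662e-09 | 9.3458e-12 | 3.5476e-09
N = 5000 | 6.0597e-09 | 3.3668e-10 | 2.5327e-08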
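Table 4 Convergence speed for JPSO

Function | Dimensions | Mean | Best | Worst
f1 | N = 30 | 81.10 | [see note] | [see note]
f1 | N = 100 | [see note] | [see note] | [see note]
f1 | N = 500 | [see note] | [see note] | [see note]
f1 | N = 1000 | 196.26 | 26 | 473
f1 | N = 5000 | 407.23 | 65 | 971
f2 | N = 30 | 159.13 | 48 | 422
f2 | N = 100 | 189.73 | 36 | 452
f2 | N = 500 | 217.96 | 51 | 570
f2 | N = 1000 | 391.10 | 120 | 615
f2 | N = 5000 | 438.56 | 190 | 773
f3 | N = 30 | 45.70 | 15 | 68
f3 | N = 100 | 78.06 | 16 | 213
f3 | N = 500 | 438.03 | 44 | 1709
f3 | N = 1000 | 1366 | 160 | 1977
f3 | N = 5000 | 1780.9 | 942 | 1999
f4 | N = 30 | 21.36 | 13 | 34
f4 | N = 100 | 25.30 | 12 | 32
f4 | N = 500 | 21.26 | 8 | 40
f4 | N = 1000 | 28.63 | 9 | 69
f4 | N = 5000 | 49.63 | 11 | 175
f5 | N = 30 | 30.36 | 16 | 53
f5 | N = 100 | 41.60 | 15 | 58
f5 | N = 500 | 150.26 | 79 | 231
f5 | N = 1000 | 240.90 | 93 | 310
f5 | N = 5000 | 313.73 | 103 | 528
f6 | N = 30 | 60.53 | 20 | 273
f6 | N = 100 | 74.43 | 5 | 333
f6 | N = 500 | 135.93 | 8 | 498
f6 | N = 1000 | 137.40 | 20 | 433
f6 | N = 5000 | 263.43 | 23 | 658
f7 | N = 30 | 117.26 | 79 | 245
f7 | N = 100 | 127.1 | 71 | 256
f7 | N = 500 | 324.45 | 109 | 441
f7 | N = 1000 | 789.80 | 189 | 512
f7 | N = 5000 | 989.98 | 387 | 1309
f8 | N = 30 | 19.96 | 12 | 26
f8 | N = 100 | 38.93 | 21 | 58
f8 | N = 500 | 61 | 23 | 94
f8 | N = 1000 | 160.63 | 73 | 243
f8 | N = 5000 | 245.40 | 94 | 381
f9 | N = 30 | 18.73 | 12 | 38
f9 | N = 100 | 29.40 | 19 | 43
f9 | N = 500 | 42.26 | 33 | 54
f9 | N = 1000 | 115.83 | 85 | 341
f9 | N = 5000 | 331.23 | 107 | 641
f10 | N = 30 | 146.96 | 86 | 634
f10 | N = 100 | 289.9 | 101 | 436
f10 | N = 500 | 472.80 | 181 | 797
f10 | N = 1000 | 908.76 | 409 | 1300
f10 | N = 5000 | 1114.83 | 816 | 1498
f11 | N = 30 | 90.53 | 47 | 219
f11 | N = 100 | 170.56 | 90 | 279
f11 | N = 500 | 297.13 | 106 | 383
f11 | N = 1000 | 383.13 | 149 | 566
f11 | N = 5000 | 519.93 | 220 | 905
f12 | N = 30 | 43.10 | 15 | 70
f12 | N = 100 | 90.23 | 40 | 138
f12 | N = 500 | 186.83 | 94 | 339
f12 | N = 1000 | 417.60 | 225 | 745
f12 | N = 5000 | 879.50 | 443 | 1042

Note: for f1, the extracted layout does not show which row the remaining entries belong to. These are the means 97.76 and 159.76 (N = 100 and N = 500), the bests 14, 16 and 18, and the worsts 127, 326 and 483 (N = 30, N = 100 and N = 500).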
Table 5 Convergence speed comparison with existing methods (Dimensions: N = 30)

Function | GPSO | LPSO | VPSO | FIPS | HPSO-TVAC | DMS-PSO | CLPSO | APSO | JPSO
f1 | 105695 | 118197 | 112408 | 32561 | 30011 | 91496 | 72081 | 7074 | 81.10
f2 | 103077 | 115441 | 109849 | 36322 | 31371 | 91354 | 66525 | 7900 | 159.13
f3 | 137985 | 162196 | 147133 | 73790 | 102499 | 185588 | – | 21166 | 45.70
f4 | 101579 | 102259 | 103643 | 13301 | 33689 | 87518 | 74815 | 5334 | 21.36
f5 | 93147 | 107315 | 100389 | 15056 | 64555 | 76975 | 39296 | 4902 | 30.36
f6 | 165599 | 161784 | 170675 | 47637 | – | 180352 | 99795 | 78117 | 60.53
f7 | 90633 | 89067 | 91811 | 122210 | 44697 | 101829 | 23861 | 5159 | 117.26
f8 | 94379 | 99074 | 98742 | 87760 | 7829 | 127423 | 53416 | 3531 | 19.96
f9 | 104987 | 110115 | 99480 | 80260 | 8293 | 115247 | 47440 | 2905 | 18.73
f10 | 110844 | 125543 | 118926 | 38356 | 52516 | 100000 | 76646 | 40736 | 146.96
f11 | 111733 | 125777 | 117946 | 42604 | 34154 | 97213 | 81422 | 7568 | 90.53
f12 | 99541 | 107452 | 102779 | 19404 | 44491 | 95830 | 59160 | 21538 | 43.10
Reliability (%) | 88.62 | 83.34 | 85.56 | 95.83 | 88.06 | 83.62 | 91.67 | 97.23 | 100
Fig. 2 Behavior of a single particle’s velocity and its associated jump magnitudes for (a) a unimodal test function and (b) a multimodal test function

Furthermore, the behavior of a single particle’s velocity and its associated jump magnitudes, for a unimodal and a multimodal test function, is presented in Fig. 2. The velocity trace for the unimodal case shows that the particle is forced to jump when it stops moving and is stuck. Similarly, more frequent jumps are observed for the multimodal function, as the chance of getting stuck in a local optimum is higher. This jumping behavior forces the particles to search again and encourages the search for the global optimum.
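The stuck-then-jump behaviour described above can be illustrated with a small sketch. It is not the JPSO update rule itself, which is defined earlier in the paper and may differ. The code assumes, hypothetically, that a particle counts as stuck when every velocity component has magnitude below ε, and that a jump replaces the velocity with a random vector scaled by α. The function name is illustrative, and the default values are simply the α and ε listed in Table 2.

```python
import random


def maybe_jump(velocity, epsilon=1e-4, alpha=30.0, rng=random):
    """Return (new_velocity, jumped) for one particle.

    Assumption (not taken from the paper): the particle is 'stuck' when
    the largest velocity component is smaller than epsilon in magnitude,
    and a jump draws each component uniformly from [-alpha, alpha].
    """
    speed = max(abs(v) for v in velocity)
    if speed >= epsilon:
        # The particle is still moving: leave its velocity unchanged.
        return list(velocity), False
    # The particle has stalled: force it to jump so the search restarts.
    return [alpha * rng.uniform(-1.0, 1.0) for _ in velocity], True
```

Under these assumptions, a swarm on a multimodal function would trigger `maybe_jump` more often, which matches the more frequent jumps reported for that case.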
4 Conclusion

A novel jumping strategy for the classical PSO is proposed to deal with the drawbacks and limitations of the classical PSO algorithm. As the dimensionality of the data increases, the performance of classical PSO degrades in terms of both the convergence rate and the localization of the optimum solution. The jumping mechanism proposed in this paper addresses both trapping in local optima and slow convergence: it enables the particles to jump out of local optima and encourages the global search. The effectiveness of the proposed algorithm is demonstrated through experiments on 12 benchmark test functions, and a comparison with the existing literature is provided. The proposed algorithm is more feasible and reliable for applications demanding high convergence speed with a small compromise on solution accuracy. Moreover, the proposed strategy does not require the specific computational resources needed by existing parallel algorithms to deal with high dimensional optimization problems.
References
1. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95 - international conference on neural networks, pp 1942–1948
2. Shi Y, Eberhart RC (1998) A modified particle swarm optimizer. In: IEEE world congress on computational intelligence, pp 69–73
3. Wang D, Tan D, Liu L (2018) Particle swarm optimization algorithm: an overview. Soft Comput 22(2):387–408
4. Lynn N, Ali MZ, Suganthan PN (2018) Population topologies for particle swarm optimization and differential evolution. Swarm Evol Comput 39:24–35
5. Elbes M, Alzubi S, Kanan T, Al-Fuqaha A, Hawashin B (2019) A survey on particle swarm optimization with emphasis on engineering and network applications. Evol Intell 12(2):113–129
6. Elsheikh AH, Abd Elaziz M (2019) Review on applications of particle swarm optimization in solar energy systems. Int J Environ Sci Technol 16(2):1159–1170
7. Rehman AU, Bermak A (2018) Drift-insensitive features for learning artificial olfaction in e-nose system. IEEE Sens J 18(17):7173–7182
8. Habib M, Aljarah I, Faris H, Mirjalili S (2020) Multi-objective particle swarm optimization: theory, literature review, and application in feature selection for medical diagnosis. In: Evolutionary machine learning techniques, pp 175–201
9. Ur Rehman A, Khanum A, Shaukat A (2013) Hybrid feature selection and tumor identification in brain MRI using swarm intelligence. In: IEEE 11th international conference on frontiers of information technology
10. Rezaee Jordehi A, Jasni J (2012) Particle swarm optimisation for discrete optimisation problems: a review. Artif Intell Rev 43(2):243–258
11. Lalwani S, Sharma H, Satapathy SC, Deep K, Bansal JC (2019) A survey on parallel particle swarm optimization algorithms. Arab J Sci Eng 44(4):2899–2923
12. Ur Rehman A, Bermak A (2018) Recursive DBPSO for computationally efficient electronic nose system. IEEE Sens J 18(1):320–327
13. Ur Rehman A, Bermak A (2018) Swarm intelligence and similarity measures for memory efficient electronic nose system. IEEE Sens J 18(6):2471–2482
14. Koh B, George AD, Haftka RT, Fregly BJ (2006) Parallel asynchronous particle swarm optimization. Int J Numer Methods Eng 67(4):578–595
15. Schutte JF, Koh BII, Reinbolt JA, Haftka RT, George AD, Fregly BJ (2005) Evaluation of a particle swarm algorithm for biomechanical optimization. J Biomech Eng 127(3):465–474
16. Zhan ZH, Zhang J, Li Y, Chung HSH (2009) Adaptive particle swarm optimization. IEEE Trans Syst Man Cybern Part B Cybern 39(6):1362–1381
17. Jamian JJ, Abdullah MN, Mokhlis H, Mustafa MW, Bakar AHA (2014) Global particle swarm optimization for high dimension numerical functions analysis. J Appl Math 2014
18. Xu G et al (2019) Particle swarm optimization based on dimensional learning strategy. Swarm Evol Comput 45:33–51
Reinforcement Learning for the Problem of Detecting Intrusion in a Computer System
Quang-Vinh Dang and Thanh-Hai Vo

Abstract In recent years, many research works have focused on intrusion detection systems. Several recent works have utilized the power of supervised machine learning algorithms to achieve near-perfect predictive performance on modern intrusion datasets. However, these algorithms require huge labeled datasets, which are usually not available in practice. In this paper, we analyze the possibility of using reinforcement learning for the problem of intrusion detection. Our experimental results are promising compared with other recent studies.

Keywords Intrusion detection system · Machine learning · Reinforcement learning · Cybersecurity
Q.-V. Dang (B) · T.-H. Vo
Industrial University of Ho Chi Minh city, Ho Chi Minh city, Vietnam
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_66

1 Introduction

Intrusion Detection Systems (IDSs for short) are the “intrusion alarms” whose purpose is to detect any attack from outside on a computer system [3]. Early intrusion detection systems relied heavily on signature detection [12], but recent studies focus more on using the latest machine learning techniques [9, 12] to run the system. Many research works have utilized the power of machine learning methods [17], particularly supervised machine learning methods. The core idea of supervised machine learning for IDSs is as follows: (i) the researchers first collect a sufficiently large amount of network traffic data, usually tens or hundreds of GBs [32]; (ii) the researchers label each traffic flow, either by themselves or with a group of experts; and (iii) they train a machine learning model to detect any similar attack in the future. We note that step (i) can be replaced by having the researchers actively generate attacks against their own host systems, but the realism of this approach is questioned [20], as the attacks generated by the researchers might not be similar to real attacks.

The state-of-the-art supervised machine learning techniques in IDSs achieve near-perfect predictive performance [9, 12, 24]. However, these techniques require a huge labeled dataset, which is very expensive to obtain in terms of time and effort. As stated in [9], the predictive performance of an IDS drops significantly when the model has to deal with a novel attack, i.e., an attack for which the model has not collected enough information to learn the patterns. Furthermore, it might be too late to recover a system, as we need to wait a certain amount of time to gather data and information regarding the new attack.

In this work, we explore another branch of machine learning, reinforcement learning [8, 35], to power IDSs. Unlike the widely used supervised machine learning, reinforcement learning can learn and perform prediction without explicit labels. Hence, the approach is more suitable for IDSs.

The paper is organized as follows. We present the related works in Sect. 2. We describe our model in Sect. 3 and the experimental results in Sect. 4. We conclude our paper and outline some future research directions in Sect. 5.
2 Related Works

We rely on two recent works [8, 12] and a survey on IDSs [17] to give an overview of the techniques used for detecting intrusions in the literature. In the early days of information technology, IDSs relied mostly on signature-based detection methods [27]. In some of the literature they are called knowledge-based detection systems or misuse detection systems. In a signature-based detection system, an alarm is triggered if a new traffic flow matches a known attack pattern. As described in [17], signatures are usually expressed as if-then-else conditions, such as “if source IP = destination IP, then trigger an alarm”. Signature-based detection has served well for years [18] and is implemented in many popular network security tools such as Snort [31]. However, signature-based systems cannot deal with novel or zero-day attacks [5].

In the last few decades, many researchers have utilized machine learning for IDSs. They have mainly used three different approaches: supervised learning, unsupervised learning [9], and semi-supervised learning. In Fig. 1 we summarize several of the most popular techniques used in IDSs [17]. We note that reinforcement learning is somewhat underused by researchers. Another recent review [28] makes a similar observation.

Fig. 1 Machine learning based IDSs [17]

The core idea of supervised machine learning techniques is to gather a huge labeled dataset in which we know exactly which packets are benign and which are malicious. The researchers then build a machine learning model that tries to learn the characteristics/patterns of benign and malicious packets and then classifies new packets in the future. This approach is the most used in practice but requires a clean dataset [12]. Given the nature of the problem, many classification algorithms have been studied. Among them, the most popular are decision trees [2, 19, 22, 33], random forests [30] and SVMs [4, 29]. The authors of [9, 12] used xgboost [6], which belongs to the boosting algorithm family.

One major concern when using machine learning in IDSs is the explainability of the system. The problem was raised long ago [3], but it has become more critical recently with the rapid development of machine learning techniques [12]. The authors of [12] addressed the problem and showed that by understanding a model we can reduce the required computational power while retaining the predictive performance.

Other researchers have used unsupervised learning methods, including dimension reduction and anomaly detection. A popular approach to reducing the number of dimensions of a dataset is Principal Component Analysis (PCA) [1]. Several researchers, for instance the authors of [16, 37], run the PCA algorithm before sending the output data to a classification algorithm; in these works the authors used an SVM as the classifier. However, some drawbacks of this approach have been pointed out [9]: (i) training a PCA model takes a lot of time; (ii) PCA cannot process null values; and (iii) in production, new incoming data still has to be passed through the PCA model to obtain dimension-reduced data, which increases the total running time of the product. In [36] the authors proposed using Deep Belief Networks [15] to learn the features automatically. Automatic feature learning has been followed up by other works [25].

When it is difficult to gather attack samples, and particularly their labels, researchers have suggested using anomaly detection techniques [34]. In [21] the authors considered four anomaly detection techniques widely used at that time: Mahalanobis distance, k-nearest neighbors, LOF, and unsupervised SVM. The authors of [7] compared several anomaly detection techniques, including more recent ones such as Isolation Forest [23]. The authors of [10] combined the Isolation Forest algorithm with active learning schemes so that the next training instance can be actively selected.
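To make the label-free anomaly detection idea above concrete, the following is a minimal sketch of a k-nearest-neighbors anomaly score, one of the techniques named in [21]. It is only an illustration under stated assumptions: each flow is represented by a hypothetical numeric feature vector, the score is the Euclidean distance to the k-th nearest other flow, and the toy points are made up rather than taken from any dataset.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Score each point by its distance to its k-th nearest neighbour.

    A large score means the point lies far from the bulk of the data,
    which is the intuition behind distance-based anomaly detection.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    # Toy two-feature flows: four similar ones and one far away.
    flows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    scores = knn_anomaly_scores(flows, k=2)
    print(scores.index(max(scores)))  # index of the most anomalous flow
```

No labels are needed to compute the scores; a threshold on the score would still have to be chosen, which is outside the scope of this sketch.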
3 Work Description

3.1 Datasets

We continue to use the CICIDS2017 dataset¹ presented in [32]. The dataset was created by capturing real-world network traffic from 3 July 2017 to 7 July 2017. According to the authors, CICIDS2017 is the first network intrusion dataset that satisfies all eleven criteria of a reliable dataset [14]. The criteria include realistic and diverse attacks and complete traffic. Full information regarding the criteria can be found in [12, 14]. The dataset includes more than 51 GB of log data with 2,830,743 network flows, each labeled either BENIGN or as one of the pre-defined attack types. The majority of the dataset consists of benign traffic flows, which account for more than 80% of the dataset. The most common attacks are DoS Hulk, PortScan, and DDoS, which account for more than 95% of the attack flows collected. The dataset has been used in several recent studies [9, 10, 12], which makes it easier to compare the performance of the models.
¹ https://www.unb.ca/cic/datasets/ids-2017.html

3.2 Reinforcement Learning

We visualize the general context of reinforcement learning in Fig. 2. While a supervised learning model finishes the training step before moving to the prediction phase, a reinforcement learning agent interacts back and forth with the environment to receive feedback.

Fig. 2 The interaction between agent and environment in reinforcement learning

The agent performs an action $A_t$ at time $t$, and the environment responds with the reward $R_{t+1} = R(S_t, A_t)$ at the next time step. The environment then moves to the state $S_{t+1} = \delta(S_t, A_t)$. The goal of the agent is to infer a policy $\pi : S \to A$ that maximizes its gain:

$$V^{\pi}(S_t) = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \tag{1}$$
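As a numerical illustration of Eq. (1), the sketch below evaluates the discounted sum for a finite reward sequence (the infinite sum is truncated to the rewards actually observed). The function name and the example rewards are hypothetical and are not taken from the experiments.

```python
def discounted_return(rewards, gamma):
    """Compute sum_k gamma**k * rewards[k] for a finite reward sequence.

    rewards[0] plays the role of R_{t+1} in Eq. (1). The sum is
    accumulated backwards so each step needs one multiply and one add.
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    # Three equal rewards with gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

With γ < 1, the same reward contributes less the later it arrives, so a policy that earns its rewards sooner obtains a higher gain.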
In Eq. (1), the coefficient $\gamma$ is the discount factor, which represents the loss of value if the same reward is received later, similar to an interest rate or inflation in economics. In the cybersecurity context, this can be interpreted as detecting an attack as soon as possible. In this study we implement Deep Q-learning [26]. The core idea of Deep Q-learning is to learn a complicated Q-function with a machine learning model, in particular a neural network. Our setting follows that of stock trading [8]. The possible actions are to allow or to deny a network traffic flow. The reward includes the harm, if any, that the traffic might cause to the computer system afterward. The loss function is

$$L(\theta) = \frac{1}{N} \sum_{i \in N} \left( Q_{\theta}(S_i, A_i) - Q'_{\theta}(S_i, A_i) \right)^2 \tag{2}$$

with

$$Q'_{\theta} = R(S_t, A_t) + \gamma \max_{A'_i} Q_{\theta}(S'_i, A'_i) \tag{3}$$

The weights are updated using gradient descent:

$$\theta \leftarrow \theta - \alpha \frac{\partial L}{\partial \theta} \tag{4}$$

We calculate the gradient of the loss as

$$\nabla_{\theta_i} L(\theta_i) = \mathbb{E}_{S, A \sim P(\cdot),\, S' \sim \mathcal{E}} \left[ \left( R_{t+1} + \gamma \max_{A'} Q(S_{t+1}, A') - Q(S, A, \theta_i) \right) \nabla_{\theta_i} Q(S, A, \theta_i) \right] \tag{5}$$
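The update in Eqs. (2)–(4) can be illustrated with a tabular stand-in for the Q-function. This is a sketch under stated assumptions and not the network used in this study: the Q-function is a lookup table instead of a neural network, the target of Eq. (3) is held fixed while the gradient is taken, a single sample is used (N = 1), and the state names, reward value, discount factor and learning rate are all hypothetical.

```python
from collections import defaultdict

ACTIONS = ("allow", "deny")  # the two possible actions on a traffic flow


def td_target(q, reward, next_state, gamma):
    """Target of Eq. (3): reward plus discounted best next-state value."""
    return reward + gamma * max(q[(next_state, a)] for a in ACTIONS)


def q_update(q, state, action, reward, next_state, gamma=0.9, lr=0.1):
    """One gradient-descent step on the squared error of Eq. (2).

    For a table entry, the gradient of (Q - target)**2 with respect to Q
    is 2 * (Q - target), so Eq. (4) becomes Q <- Q - lr * 2 * (Q - target).
    Returns the loss value before the update.
    """
    target = td_target(q, reward, next_state, gamma)
    error = q[(state, action)] - target
    q[(state, action)] -= lr * 2.0 * error
    return error ** 2


if __name__ == "__main__":
    q = defaultdict(float)  # every Q-value starts at 0
    # Hypothetical transition: denying a flow in state "s0" earned reward 1.
    loss = q_update(q, "s0", "deny", reward=1.0, next_state="s1")
    print(loss, q[("s0", "deny")])
```

Replacing the table with a parametric model turns the same step into an update of the weights θ, which is the Deep Q-learning case described above.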
4 Experimental Results

We compare our method to state-of-the-art research studies [9, 10, 12]. We would like to note that the compared works use supervised learning, so they are not fully comparable. We recall that the recent works achieve very high predictive performance; for example, the authors of [12] reported an AUC score of 0.9999995. Reinforcement learning cannot be expected to reach that performance, because the first few rounds have to be sacrificed to trial and error. We follow the settings of [9, 10, 12] by considering a binary classification context and a multi-class classification context. In the binary classification context we only consider benign and malicious traffic types, while in the multi-class classification context we consider all individual attack types. In the binary classification context, we achieve an AUC score of 0.82, much better than traditional methods such as Naive Bayes or SVM reported in [9]. In the multi-class classification context we achieve an accuracy of 0.85, the second best when compared with the supervised learning methods [9]. This shows that reinforcement learning has potential and considerable room for improvement.
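For readers unfamiliar with the AUC score used in the binary setting, the following is a minimal sketch of how it can be computed from per-flow scores: it is the fraction of (malicious, benign) pairs in which the malicious flow receives the higher score, with ties counted as one half. The labels and scores below are toy values, not the experimental data, and the function name is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison.

    labels: 1 for malicious, 0 for benign. scores: higher means more
    suspicious. Runs in O(P * N) time, which is fine for illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one flow of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect separation of benign and malicious flows.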
5 Conclusions

Given the importance of computer systems in our daily life, intrusion detection systems play a very important role in keeping our society functioning normally. As defense techniques improve over time, so do attack techniques. In this paper, we argued that the supervised machine learning widely used in the literature might not be well suited to real-world intrusions, because most such methods assume a static distribution of incoming flows. We introduced reinforcement learning to address this problem. The predictive performance of the reinforcement learning algorithm does not yet match that of the supervised algorithms, but there is room for improvement. In the future, we would like to integrate graph analysis techniques [11, 13] to understand the traffic flows in their connection context.
References
1. Abdi H, Williams LJ (2010) Principal component analysis. Wiley Interdisciplinary Rev Comput Stat 2(4):433–459
2. Amor NB, Benferhat S, Elouedi Z (2004) Naive bayes vs decision trees in intrusion detection systems. In: SAC, pp 420–424. ACM
3. Axelsson S (2000) Intrusion detection systems: a survey and taxonomy. Technical report
4. Bhamare D, Salman T, Samaka M, Erbad A, Jain R (2018) Feasibility of supervised machine learning for cloud security. CoRR arXiv:1810.09878
5. Bilge L, Dumitraş T (2012) Before we knew it: an empirical study of zero-day attacks in the real world. In: Proceedings of the 2012 ACM conference on Computer and communications security, pp 833–844
6. Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: KDD, pp 785–794. ACM
7. Dang QV (2018) Outlier detection in network flow analysis. arXiv:1808.02024
8. Dang QV (2019) Reinforcement learning in stock trading. In: International conference on computer science, applied mathematics and applications, pp 311–322. Springer
9. Dang QV (2019) Studying machine learning techniques for intrusion detection systems. In: International conference on future data and security engineering, pp 411–426. Springer
10. Dang QV (2020) Active learning for intrusion detection systems. In: IEEE Research, innovation and vision for the future
11. Dang QV (2020) Link-sign prediction in signed directed networks from no link perspective. In: International conference on integrated science, pp 291–300. Springer
12. Dang QV (2020) Understanding the decision of machine learning based intrusion detection systems. In: Dang TK, Küng J, Takizawa M, Chung TM (eds) Future data and security engineering. Springer International Publishing, Cham, pp 379–396
13. Dang Q, Ignat C (2018) Link-sign prediction in dynamic signed directed networks. In: CIC, pp 36–45. IEEE Computer Society
14. Gharib A, Sharafaldin I, Lashkari AH, Ghorbani AA (2016) An evaluation framework for intrusion detection dataset. In: 2016 international conference on information science and security (ICISS), pp 1–6. IEEE
15. Hinton GE (2009) Deep belief networks. Scholarpedia 4(5):5947
16. Kausar N, Samir BB, Sulaiman SB, Ahmad I, Hussain M (2012) An approach towards intrusion detection using pca feature subsets and svm. In: 2012 international conference on computer & information science (ICCIS), vol 2, pp 569–574. IEEE
17. Khraisat A, Gondal I, Vamplew P, Kamruzzaman J (2019) Survey of intrusion detection systems: techniques, datasets and challenges. Cybersecurity 2(1):20
18. Kreibich C, Crowcroft J (2004) Honeycomb: creating intrusion detection signatures using honeypots. ACM SIGCOMM Comput Commun Rev 34(1):51–56
19. Krügel C, Toth T (2003) Using decision trees to improve signature-based intrusion detection. In: RAID. Lecture Notes in Computer Science, vol 2820, pp 173–191. Springer
20. Kumar S, Arora S, et al (2019) A statistical analysis on kdd cup99 dataset for the network intrusion detection system. In: International conference on advanced communication and networking, pp 131–157. Springer
21. Lazarevic A, Ertoz L, Kumar V, Ozgur A, Srivastava J (2003) A comparative study of anomaly detection schemes in network intrusion detection. In: Proceedings of the 2003 SIAM international conference on data mining, pp 25–36. SIAM
22. Li X, Ye N (2001) Decision tree classifiers for computer intrusion detection. J Parallel Distrib Comput Pract 4(2):179–190
23. Liu FT, Ting KM, Zhou ZH (2008) Isolation forest. In: 2008 Eighth IEEE international conference on data mining, pp 413–422. IEEE
24. Marín G, Casas P, Capdehourat G (2020) Deepmal - deep learning models for malware traffic detection and classification. CoRR arXiv:2003.04079
25. Marín G, Casas P, Capdehourat G (2020) Deepmal - deep learning models for malware traffic detection and classification. arXiv preprint arXiv:2003.04079
26. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller MA (2013) Playing atari with deep reinforcement learning. CoRR arXiv:1312.5602
27. Modi C, Patel D, Borisaniya B, Patel H, Patel A, Rajarajan M (2013) A survey of intrusion detection techniques in cloud. J Netw Comput Appl 36(1):42–57
28. Nguyen TT, Reddi VJ (2019) Deep reinforcement learning for cyber security. arXiv:1906.05799
29. Reddy RR, Ramadevi Y, Sunitha KVN (2016) Effective discriminant function for intrusion detection using SVM. In: ICACCI, pp 1148–1153. IEEE
30. Resende PAA, Drummond AC (2018) A survey of random forest based methods for intrusion detection systems. ACM Comput Surv 51(3):48:1–48:36
31. Roesch M et al (1999) Snort: lightweight intrusion detection for networks. Lisa 99:229–238
32. Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. In: ICISSP, pp 108–116
33. Stein G, Chen B, Wu AS, Hua KA (2005) Decision tree classifier for network intrusion detection with ga-based feature selection. In: ACM Southeast regional conference (2), pp 136–141. ACM
34. Suri R, Murty MN, Athithan G (2019) Outlier detection: techniques and applications. Springer
35. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press
36. Wu Y, Lee WW, Xu Z, Ni M (2020) Large-scale and robust intrusion detection model combining improved deep belief network with feature-weighted SVM. IEEE Access 8:98600–98611
37. Xu X, Wang X (2005) An adaptive network intrusion detection method based on pca and support vector machines. In: International conference on advanced data mining and applications, pp 696–703. Springer
Grey Wolf Optimizer Algorithm for Suspension Insulator Designing
Dyhia Doufene, Slimane Bouazabia, Sid A. Bessedik, and Khaled Ouzzir

Abstract This paper aims to find an optimised design for a high voltage suspension insulator by using a novel meta-heuristic named Grey Wolf Optimizer (GWO), which was developed by mathematically modelling the hunting procedure of grey wolves and their social hierarchy in nature. The paper considers two goals. The first is to reduce the electric field stress around the insulator, since the electrical performance of high voltage insulators is governed by the electric field distribution along their leakage path. The second is to find a better insulator design by reducing the leakage path, and thus the overall weight, leading to better mechanical and economic performance. To achieve these goals, the fitness function is defined as the value of the electric field at the pin region, which is the most critical area and records the highest values. Constraints are then imposed in the GWO algorithm to reduce the leakage path of the insulator. The fitness function is calculated with Comsol-multiphysics using the finite element method (FEM). The obtained results demonstrate the efficiency of GWO for solving such design optimisation problems for high voltage insulators.

Keywords Optimization · Grey Wolf Optimizer · High voltage insulators
D. Doufene (B) · S. Bouazabia · K. Ouzzir
University of Science and Technology Houari Boumediene Bab Ezzouar, Laghouat, Algeria
S. A. Bessedik
Université Amar Telidji de Laghouat, Laghouat, Algeria
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_67

1 Introduction

Due to their essential role in the electrical network, high voltage insulators are attracting more and more interest. Suspension insulators have two main roles: providing mechanical support for the high voltage cable and insulating the high voltage part from the supporting pylon. Hence, a better performing electric network requires better performing suspension insulators.

In recent years, meta-heuristic algorithms have been increasingly used to optimise the electrical and geometrical performance of high voltage insulators [1–11]. Several meta-heuristics have been developed for solving engineering problems with unknown search spaces. Among the recent ones, this work selects the GWO [12] technique to optimise the design of a high voltage suspension insulator. It is well established in the literature that high electric field stress on the surface of a suspension insulator leads to flashover risks. To ensure better electrical performance, reducing the highest electric field value by means of meta-heuristic optimisation can therefore be a good approach to flashover prevention. The electric field values on the insulator surface are computed with Comsol-multiphysics, which uses the finite element method [13] to solve the governing differential equation. The highest electric field value [14] along the leakage path of the insulator is then selected and passed to the GWO algorithm as the fitness function to be minimised. Constraints are imposed on the lengths of the insulator ribs, which are the optimisation variables, to reduce the total length of the leakage path.
2 Electric Field Distribution Computation

2.1 Geometry Modelling in Comsol-Multiphysics

A U400B glass insulator is chosen for this optimisation problem. Its parameters are a diameter D = 32 cm, a spacing H = 20.5 cm and a creepage distance L = 55 cm. The pin of the insulator is energised at 10 kV and the cap is grounded. The geometry model built in Comsol-multiphysics is shown in Fig. 1. The lengths of the insulator ribs are chosen as the optimisation variables for the GWO algorithm and are numbered L1 to L4. The materials used in the geometry are air, glass, iron and cement, with respective permittivities of 1, 5.59, 10^6 and 5.9.

Fig. 1 The optimisation variables (L1 to L4)
Fig. 2 Electric field and potential distributions for the industrial model: (a) electric field, (b) electric potential
A Neumann boundary condition is adopted on the outer contour of the domain bounding the studied system. Equations (1) and (2) are solved using the FEM.
\[ \Delta V = 0 \tag{1} \]
\[ \vec{E} = -\vec{\nabla} V \tag{2} \]
where $V$ is the electric potential and $\vec{E}$ is the electric field stress vector.
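Equations (1) and (2) can be illustrated on a toy problem. The sketch below is not the Comsol-Multiphysics model of the paper: it assumes a plain rectangular grid, uses finite differences with Gauss-Seidel relaxation instead of the FEM, and fixes 10 kV on the left edge and 0 kV on the right edge, with a zero normal derivative (Neumann condition) on the other two edges. The grid size, the spacing `h` and all function names are hypothetical.

```python
def solve_laplace(nx=11, ny=6, v_left=10.0, v_right=0.0, sweeps=5000):
    """Gauss-Seidel relaxation of the 2D Laplace equation on a uniform grid.

    Left/right columns hold fixed potentials (Dirichlet).
    Top/bottom rows use mirrored neighbours, i.e. zero normal derivative (Neumann).
    """
    v = [[0.0] * nx for _ in range(ny)]
    for row in v:
        row[0], row[-1] = v_left, v_right
    for _ in range(sweeps):
        for j in range(ny):
            up = j - 1 if j > 0 else 1                # mirror at the top edge
            down = j + 1 if j < ny - 1 else ny - 2    # mirror at the bottom edge
            for i in range(1, nx - 1):
                v[j][i] = 0.25 * (v[j][i - 1] + v[j][i + 1] + v[up][i] + v[down][i])
    return v


def field_x(v, h):
    """x-component of E = -grad V, by central differences on interior columns."""
    return [[-(row[i + 1] - row[i - 1]) / (2.0 * h) for i in range(1, len(row) - 1)]
            for row in v]


h = 0.1                      # assumed grid spacing in cm
potential = solve_laplace()  # values in kV
e_x = field_x(potential, h)  # values in kV/cm
print(potential[3][5], e_x[3][4])
```

For this geometry the potential falls linearly between the two electrodes, so the computed field should be uniform and equal to the voltage divided by the gap. A real insulator profile has no such closed form, which is why the paper relies on a FEM solver.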
2.2 Electric Field and Potential Distributions Results
The electric field and potential distributions are given in Fig. 2a, b, respectively. The potential distribution along the leakage path of the insulator (Fig. 3a) varies from 0 kV (the cap area) to 10 kV (the pin area). The electric field distribution (Fig. 3b) shows two regions with high electric field values: the maximum, 5 kV/cm, is recorded at the pin region, and the field reaches 3.1 kV/cm at the cap area.
Fig. 3 Electric field (a) and electric potential (b) distributions along the leakage path
3 The GWO Optimization Technique
The Grey Wolf Optimizer (GWO) algorithm is an intelligent optimization technique introduced by Mirjalili et al. in 2014 [12]. The method imitates the hunting behaviour of grey wolves. The wolf population is divided into four categories. The first three are the strongest wolves (the leaders), named alpha (α), beta (β) and delta (δ); they represent the three best solutions to the optimisation problem. The fourth category is the omega (ω) wolves, which follow the three leaders toward favourable areas in order to find the global solution [15]. Encircling, hunting and attacking the prey are the three main steps of the hunting process.
Encircling: grey wolves surround the prey while hunting; this behaviour is modelled mathematically by Eqs. (3) and (4).
\[ \vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right| \tag{3} \]
\[ \vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D} \tag{4} \]
where $t$ is the current iteration, $\vec{X}$ is the position vector of a grey wolf, $\vec{X}_p$ is the prey position, and $\vec{A}$ and $\vec{C}$ are coefficient vectors given by Eqs. (5) and (6). The elements of the vector $\vec{a}$ decrease linearly from 2 to 0, and $\vec{r}_1$, $\vec{r}_2$ are random vectors in [0, 1].
\[ \vec{A} = 2\,\vec{a}(t) \cdot \vec{r}_1 - \vec{a}(t) \tag{5} \]
\[ \vec{C} = 2\,\vec{r}_2 \tag{6} \]
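A short worked bound, added as a reading aid, shows what Eqs. (5) and (6) imply for the size of the coefficients. The explicit schedule $a(t) = 2(1 - t/T)$, with $T$ the maximum number of iterations, is an assumption; the text only states that the elements of $\vec{a}$ decrease linearly from 2 to 0.

```latex
% Component-wise, with r_{1,j}, r_{2,j} \in [0, 1]:
A_j = a\,(2 r_{1,j} - 1) \in [-a,\ a], \qquad C_j = 2\, r_{2,j} \in [0,\ 2].
% Assumed linear schedule over T iterations:
a(t) = 2\left(1 - \frac{t}{T}\right)
\;\Longrightarrow\;
|A_j| \le a(t) < 1 \quad \text{for } t > T/2 .
```

Under that assumption, $|A_j| < 1$ throughout the second half of the run, so each step $A_j D_j$ in Eq. (4) is shorter than the distance $D_j$ itself. This is consistent with the attacking step, in which the decrease of $a$ lets the wolves close in on the prey.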
Hunting: the mathematical model of the hunting behaviour is given by Eqs. (7)–(9). It assumes that the α, β and δ wolves have better information about the position of the prey, and that the ω wolves follow them to reach the solution.
\[ \begin{cases} \vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X}(t) \right| \\ \vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X}(t) \right| \\ \vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X}(t) \right| \end{cases} \tag{7} \]
$\vec{C}_1$, $\vec{C}_2$ and $\vec{C}_3$ are found using Eq. (6).
\[ \begin{cases} \vec{X}_1(t) = \vec{X}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha(t) \\ \vec{X}_2(t) = \vec{X}_\beta - \vec{A}_2 \cdot \vec{D}_\beta(t) \\ \vec{X}_3(t) = \vec{X}_\delta - \vec{A}_3 \cdot \vec{D}_\delta(t) \end{cases} \tag{8} \]
At iteration $t$, $\vec{X}_\alpha$, $\vec{X}_\beta$ and $\vec{X}_\delta$ are the three best solutions. $\vec{A}_1$, $\vec{A}_2$ and $\vec{A}_3$ are calculated by Eq. (5), and $\vec{D}_\alpha$, $\vec{D}_\beta$ and $\vec{D}_\delta$ are found using Eq. (7).
\[ \vec{X}(t+1) = \frac{\vec{X}_1(t) + \vec{X}_2(t) + \vec{X}_3(t)}{3} \tag{9} \]
Attacking: the wolves attack once the prey stops moving, which ends the hunting process. The attacking step is represented mathematically by decreasing the value of 'a' linearly from 2 to 0. The pseudo-code of the algorithm is presented in Fig. 4.
Fig. 4 The GWO algorithm pseudo-code [12]
Start
  Initialization of the population of grey wolves Xi (i = 1, 2, 3, ..., n)
  Initialization of a, A, and C
  Calculation of the fitness values of exploration agents
  Xα = the best exploration agent
  Xβ = the second best exploration agent
  Xδ = the third best exploration agent
  t = 0
  While (t < Maximum number of iterations)
    For each search agent
      Updating the current exploration agent position (Eq. (9))
    Stop for
    Updating of a, A, and C
    Calculation of the fitness values of all exploration agents
    Updating the positions of Xα, Xβ, and Xδ
    t = t + 1
  Stop while
  Return Xα
Stop
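The pseudo-code of Fig. 4 can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation. The `sphere` function is a stand-in fitness (the paper minimises the peak electric field returned by the FEM model, which is not reproduced here). The box `bounds`, the clipping to those bounds, the population size, the seed and the explicit schedule for `a` are all assumptions. The leaders are kept across iterations only if no wolf improves on them.

```python
import random


def sphere(x):
    """Stand-in fitness; the paper uses the FEM-computed peak field instead."""
    return sum(v * v for v in x)


def gwo(fitness, bounds, n_wolves=20, max_iter=200, seed=0):
    """Minimise `fitness` inside the box `bounds` with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]

    def three_best(population):
        # Copies of the alpha, beta and delta positions (lowest fitness first).
        return [list(w) for w in sorted(population, key=fitness)[:3]]

    alpha, beta, delta = three_best(wolves)
    for t in range(max_iter):
        a = 2.0 * (1.0 - t / max_iter)          # assumed linear decay from 2 towards 0
        for wolf in wolves:
            for j, (lo, hi) in enumerate(bounds):
                moves = []
                for leader in (alpha, beta, delta):
                    coef_a = 2.0 * a * rng.random() - a        # Eq. (5)
                    coef_c = 2.0 * rng.random()                # Eq. (6)
                    dist = abs(coef_c * leader[j] - wolf[j])   # Eq. (7)
                    moves.append(leader[j] - coef_a * dist)    # Eq. (8)
                # Eq. (9), then clipping to the allowed range (an added safeguard).
                wolf[j] = min(max(sum(moves) / 3.0, lo), hi)
        alpha, beta, delta = three_best(wolves + [alpha, beta, delta])
    return alpha, fitness(alpha)


best_x, best_f = gwo(sphere, bounds=[(-5.0, 5.0)] * 4)
print(best_x, best_f)
```

In the setting of the paper, each fitness call would trigger one FEM solve, so the population size and iteration count directly set the number of field computations.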
4 Optimization Results
Applying the GWO optimisation process to the FEM model developed in Comsol-Multiphysics yields an optimised design of the U400B insulator. The convergence characteristic of the optimization process is presented in Fig. 5, which shows that the algorithm converges to the optimal value of 4.70 kV/cm from iteration 75 onwards. The electric field and voltage distributions are given in Fig. 6a, b, respectively.
Fig. 5 The convergence curve
Fig. 6 Electric field (a) and electric potential (b) distributions for the optimised model
The electric potential and field repartitions along the creepage distance of the insulator are represented in Fig. 7a, b, respectively. The optimisation results are summarised in Table 1.
Fig. 7 Electric potential (a) and electric field (b) distributions for the reference and the optimised model
Table 1 Comparison between the reference and the optimised model
Parameter | Reference model | Optimised model
l1 [cm] | 3.30 | 0.49
l11 [cm] | 3.40 | 0.59
l2 [cm] | 3.85 | 0.14
l22 [cm] | 4.03 | 0.32
l3 [cm] | 3.50 | 0.13
l4 [cm] | 1.16 | 5
Creepage distance [cm] | 55 | 44
Reduction in creepage length [%] | - | 20
Electric field at the pin region [kV/cm] | 5.08 | 4.70
Electric field reduction at the pin region [%] | - | 7.5
Electric field at the cap region [kV/cm] | 3.25 | 2.93
Electric field reduction at the cap region [%] | - | 10
Diametral area of the insulating glass surface [cm2] | 72.45 | 64.96
Reduction of the diametral surface of the insulating skirt [%] | - | 10.33
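The percentage rows of Table 1 can be re-derived from the reference and optimised values. The short check below is only an arithmetic aid; the dictionary keys and the helper name `reduction_pct` are hypothetical.

```python
def reduction_pct(reference, optimised):
    """Relative reduction, in percent, from the reference to the optimised value."""
    return 100.0 * (reference - optimised) / reference


# (reference, optimised) pairs taken from Table 1
table = {
    "creepage distance [cm]": (55.0, 44.0),
    "field at pin [kV/cm]": (5.08, 4.70),
    "field at cap [kV/cm]": (3.25, 2.93),
    "diametral glass area [cm2]": (72.45, 64.96),
}

reductions = {name: reduction_pct(ref, opt) for name, (ref, opt) in table.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.2f} %")
```

The computed reductions agree with the tabulated 20, 7.5, 10 and 10.33 % to within rounding.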
5 Conclusion
With reductions of 7.5% and 10% in the electric field at the pin region and at the cap region, respectively, the obtained insulator shape gives better electric performance than the industrial one. Combined with a 20% decrease in creepage length, the obtained design can also offer better mechanical and economic performance, owing to the reduced overall weight of the insulator. The results achieved in this paper show the efficiency of the GWO technique for designing an optimised shape of a high voltage suspension insulator: the design improves electric performance by decreasing the maximum electric field value recorded on the exterior surface, and it shortens the leakage path, which reduces the weight of the insulator.
References
1. Bhattacharya K, Chakravorti S, Mukherjee PK (2001) Insulator contour optimization by a neural network. IEEE Trans Dielectr Electr Insul 8:157–161
2. Nie D, Zhang H, Chen Z et al (2013) Optimization design of grading ring and electrical field analysis of 800 kV UHVDC Wall bushing. IEEE Trans Dielectr Electr Insul 20(4):1361–1368
3. Chen WS, Yang HT, Huang HY (2010) Contour optimization of suspension insulators using dynamically adjustable genetic algorithms. IEEE Trans Power Deliv 25(3):1220–1228
4. Chen WS, Yang HT, Huang H-Y (2008) Optimal design of support insulators using Hashing integrated genetic algorithm and optimized charge simulation method. IEEE Trans Dielectr Electr Insul 15(2):426–434
5. Banerjee S, Lahiri A, Bhattacharya K (2007) Optimization of support insulators used in HV systems using support vector machine. IEEE Trans Dielectr Electr Insul 14:360–367
6. Doufene D, Bouazabia S, Haddad A (2020) Shape and electric performance improvement of an insulator string using particles swarm algorithm. IET Sc Measur Technol 14(2):198–205. https://doi.org/10.1049/iet-smt.2019.0405
7. Doufene D, Bouazabia S, Haddad A, Optimized performance of cap and pin insulator under wet pollution conditions using a mono-objective genetic algorithms. Australian J Electr Electron Eng 16(3):149–162. https://doi.org/10.1080/1448837X.2019.1627740
8. M’hamdi B, Teguar M, Mekhaldi A (2016) Optimal design of corona ring on HV composite insulator using PSO approach with dynamic population size. IEEE Trans Dielectr Electr Insul 23(2):1048–1057
9. Doufene D, Bouazabia S, Ladjici AA (2017) Shape optimization of a cap and pin insulator in pollution condition using particle swarm and neural network. The 5th international conference on electrical engineering – Boumerdes (ICEE-B)
10. Nie D, Zhang H, Chen Z et al (2013) Optimization design of grading ring and electrical field analysis of 800 kV UHVDC Wall bushing. IEEE Trans Dielectr Electr Insul 20(4):1361–1368
11. Doufene D, Bouazabia S, Haddad A (2017) Polluted insulator optimization using neural network combined with genetic algorithm. 18th International symposium on electromagnetic fields in mechatronics. Electrical and Electronic Engineering (ISEF), Poland
12. Mirjalili S, Mirjalili SM, Lewis A (2014) Grey Wolf optimizer. Adv Eng Softw 69:46–61
13. Cook R et al (1989) Concepts and applications of finite element analysis. Wiley
14. Doufene D, Bouazabia S, Bouhaddiche R (2018) Heating dissipation study of a pollution layer on a cap and pin insulator. 2018 International conference on communications and electrical engineering (ICCEE), El Oued, Algeria, 2018, pp 1–4. https://doi.org/10.1109/CCEE.2018.8634549
15. Nadimi-Shahraki MH, Taghian S, Mirjalili S, An improved grey wolf optimizer for solving engineering problems. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2020.113917
Green IT Practices in the Business Sector Andrea Mory, Diego Cordero, Silvana Astudillo, and Ana Lucia Serrano
Abstract The purpose of this study is to determine the extent to which companies apply green IT practices. The study begins with a theoretical and conceptual review, which serves to identify the variables involved in green computing and to support the nine research hypotheses. The relationships between the identified constructs (the hypotheses) form a structural equation model. Operationalizing the variables yields a questionnaire with 60 indicators to determine the situational status of various businesses with regard to green information technologies. The instrument is applied to 47 informants from various organizations in the city of Cuenca in Ecuador. The model is evaluated with the software product Smart PLS. From the analysis of the model, it is concluded that “organizational strengths, policies, procedures (FO)” have a positive influence on “applications used (AP)”, “energy efficiency (EE)”, “print management and paper use (IP)” and “treatment and disposal of technological waste (RT)”. On the other hand, it is concluded that “applications used (AP)”, “energy efficiency (EE)”, “organizational strengths, policies, procedures (FO)”, “print management and paper use (IP)” and “treatment and disposal of technological waste (RT)” have no influence on “green IT in the organization (GIT)”.
Keywords Green computing · Energy efficiency · Structure equations
A. Mory (B) Universidad de Las Islas Baleares, Cra. de Valldemossa, km 7.5, Palma de Mallorca, Spain e-mail: [email protected] D. Cordero Universidad Católica de Cuenca, Av. de Las Américas y Tarqui, Cuenca, Ecuador e-mail: [email protected] S. Astudillo · A. L. Serrano Universidad de Cuenca, Av. 12 de Abril, Cuenca, Ecuador e-mail: [email protected] A. L. Serrano e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 X.-S. Yang et al. (eds.), Proceedings of Sixth International Congress on Information and Communication Technology, Lecture Notes in Networks and Systems 236, https://doi.org/10.1007/978-981-16-2380-6_68
1 Introduction
This study analyses green IT practices, also known as green computing (GC), within the business sector of the city of Cuenca in Ecuador. The main objective is to determine which of the practices currently implemented are directly linked to standards of low consumption and efficient use of computer resources. It is important to establish the weight of each variable in order to identify those most relevant within the different organizations. From an ecological point of view, the research problem is the absence, in organizations, of practices and techniques that use green information technologies in a way that contributes to protecting the environment. To this end, the literature is reviewed on the implications of green IT, green computing, green practices, reference models, experiences, strategies and application techniques for hardware and software in certain organizations, together with the current environmental protection standards in the region and the country.
2 Literature Review
2.1 Theoretical Framework
IT power control, PC supply management, remote conferencing, remote computing and reuse of IT equipment are all initiatives currently being adopted, bringing cost reduction and environmental protection [1]. Another important factor in environmental impact is paper consumption in companies [2, 3]. Authors such as Rodríguez and Carreño [4] indicate that energy consumption within organizations has been increasing for some years, due to the development of new technology and the services offered. Cabarcas et al. [5] point out that internet applications are now available to evaluate and select computer equipment, such as laptops, that meets low power consumption standards. A direct relationship is being drawn between the IT industry and climate change through CO2 emissions, because worldwide most of them come from data centers. Green IT must therefore become an alternative for eliminating emissions, energy consumption and waste [6]. In this sense, Valdés [7] states that during the first decade of the twenty-first century, organizations on all continents made a series of efforts to increase the socialization of green IT; in addition, institutions are participating in the creation of energy-efficient products [8]. Several initiatives are being implemented in Latin America and the Caribbean to minimize the impact of climate change through IT and through electronic waste management and recycling [9]. England and Bartczak [10] say that the drive toward ecological sustainability in organizations has increased at both the public and private level, and with it the number of green jobs focused on IT; the problem is finding workers qualified in green IT. As a result, there is a growing demand for education in green IT.
In Ecuador, private initiatives have been implemented, especially by mobile phone companies, which seek to recycle mobile devices [9]. For its part, the Ecuadorian state created a regulatory context integrating techno-political proposals and technical aspects, to guarantee that the organizations operating in the national territory meet the energy sustainability requirements [11, 12]. In addition, the National Assembly approved the Organic Law on Energy Efficiency in March 2019 [13]. Pollution is defined as the process of releasing pollutants into the environment that cause adverse changes, which takes the form of chemical nuclei or energy such as noise, heat or light, where the pollutants and components may be substances, foreign energies or natural contaminants [14–16]. The lifecycle of the equipment or IT resources is defined as the time that the device provides a contribution to the institution without causing loss of time or quality. This time is predetermined by the manufacturer, but can be altered by the conditions of use and the deterioration suffered by the interaction of the environment [17]. For Garcia [18], the lifecycle comprised a series of sequential and interrelated stages of a product system, from the acquisition or generation of raw materials to their disposal. It is advisable to follow the 3Rs model (reduce, reuse and recycle) to reduce the volume of technological waste [19]. The processes that comprise green technologies are virtualization, cloud computing, client-server technology, efficient data centers, cluster computing, use of renewable energy and recycling of electronic components [20]. Green information systems (green IS) refer to the ecological and efficient use of computer resources, cellular telephony and other information media with the aim of reducing environmental impact as much as possible and increasing economic viability [21]. 
Pinto [22] defines technological scrap as old electronic devices that have reached the end of their useful life and have been disposed of, such as computers, televisions, telephones and mobile phones, among others. When electronic waste is disposed of without any control, negative impacts on the environment and health will follow [23]. IT equipment recycling and disposal programs can therefore create habits of responsible disposal of computer equipment at the end of its life, or of reuse where possible [24]. IT processes, policies, procedures and standards impact organizational behavior, end-user computing, and usage standards for the appropriate use of end-user computing equipment [25].
3 Materials and Methods
3.1 Methodology
The research is exploratory, since it gives a general view of a certain reality. In addition, the green IT theme has been little explored and recognized in the local environment; the research aims to determine trends and identify potential relationships between the variables and the constructs identified. It is also descriptive
and correlational, since it seeks to determine how green IT practices are carried out in the organizations analyzed. At the same time, the degree of relationship or association between two or more variables is determined (hypothesis determination): the variables are first measured, and the correlation is then estimated by means of correlational hypothesis tests and statistical techniques.
3.2 Hypothesis and Model
Based on the parameters identified in the theory, the following study hypotheses are put forward.
H1: The applications used (AP) influence green IT in the organization (GIT).
H2: Energy efficiency (EE) influences green IT in the organization (GIT).
H3: Organizational strengths, policies, procedures (FO) influence the applications used (AP).
H4: Organizational strengths, policies, procedures (FO) influence energy efficiency (EE).
H5: Organizational strengths, policies, procedures (FO) influence green IT in the organization (GIT).
H6: Organizational strengths, policies, procedures (FO) influence print management and paper use (IP).
H7: Organizational strengths, policies, procedures (FO) influence the treatment and disposal of technological waste (RT).
H8: Print management and paper use (IP) influences green IT in the organization (GIT).
H9: The treatment and disposal of technological waste (RT) influences green IT in the organization (GIT).
The set of hypotheses gives rise to a structural equation model made up of six constructs or variables, as shown in Fig. 1.
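The nine hypotheses define the directed paths of the structural model, so they can be written down as a small graph. The sketch below is only a bookkeeping aid and not part of the study; the names `PATHS` and `predictors_of` are hypothetical, and H2 is assumed to target the GIT construct like the other hypotheses about green IT in the organization.

```python
# Each hypothesis is a directed path (predictor construct, predicted construct).
PATHS = {
    "H1": ("AP", "GIT"),
    "H2": ("EE", "GIT"),   # assumed to target GIT, like H1, H5, H8 and H9
    "H3": ("FO", "AP"),
    "H4": ("FO", "EE"),
    "H5": ("FO", "GIT"),
    "H6": ("FO", "IP"),
    "H7": ("FO", "RT"),
    "H8": ("IP", "GIT"),
    "H9": ("RT", "GIT"),
}


def predictors_of(construct):
    """Constructs with a hypothesised path pointing at `construct`."""
    return sorted(src for src, dst in PATHS.values() if dst == construct)


constructs = sorted({name for path in PATHS.values() for name in path})
exogenous = [c for c in constructs if not predictors_of(c)]

print(constructs)
print(predictors_of("GIT"))
print(exogenous)
```

Listing the paths this way makes the shape of the model explicit: FO is the only construct without predictors, and GIT is predicted by all five other constructs.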
3.3 Instrument for Information Gathering
To design the questionnaire, questions were developed and selected from the literature review, and the instrument was submitted to expert judgment in order to analyze and refine the quality of the texts. The instrument contains 60 questions (indicators), resulting from the operationalization of the variables or constructs of the model. Each is rated on a Likert scale from 1 to 5, where 1 = never, 2 = rarely, 3 = often, 4 = almost always and 5 = always, in reference to the digital competencies model.
Fig. 1 Structural model
3.4 Procedure
Data processing is executed as follows:
Authorization to gather information is given by the directors of the organizations, following the motivation of the research group.
The database of all informants is made available.
The most suitable informants are selected.
The questionnaire is applied via the web to the selected actors; for this purpose, the address of the instrument, built with Google Drive, is sent by e-mail.
The information survey is carried out between February and April 2020.
The data from the web-based survey are stored in Excel. The survey is formatted in SPSS V.20, and the Excel data are migrated to SPSS to obtain descriptive statistics on the respondents.
The structural model is implemented with the Smart PLS 3.1.9 software, into which the data are imported from SPSS.
With the model generated in Smart PLS 3.1.9, the statistics that correspond to structural models are generated.
4 Results
4.1 Validity and Reliability of the Measurement Model
This step validates whether the theoretical concepts are supported by the observed variables; the values are broken down in Table 1.
Table 1 Model reliability analysis
Parameter: Individual reliability
Compliance values: The loadings (λ) of the indicators on their own variable are analyzed; each value must be > 0.707
Values obtained: Some indicators in the model had values lower than 0.6, so they were eliminated, resulting in the final model
Parameter: Reliability of each variable or construct
Compliance values: Cronbach’s alpha and the composite reliability of the construct are evaluated; each should be > 0.7
Values obtained: The values obtained confirm the internal consistency of all the latent variables or constructs
Parameter: Convergent validity
Compliance values: This parameter rates the AVE value, which should be higher than 0.5
Values obtained: The results confirm that the indicators effectively measure the construct
Parameter: Discriminant validity
Compliance values: The square root of the AVE of a construct must be greater than or equal to the correlation between that construct and the other constructs
Values obtained: This criterion is not fully met in all constructs
Cross load check
Compliance values: The correlation value between an indicator and its construct must be greater than the correlation value between the indicator and another construct
Values obtained: This parameter is also met for all indicators except EE13, which implies that each construct is different from the others
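The thresholds in Table 1 refer to standard measurement-model statistics. The sketch below shows one common way of computing them. It is not the Smart PLS output of the study, and the loadings and item scores are made-up illustrative numbers. The AVE is taken as the mean squared standardized loading, composite reliability uses the usual loading-based formula, and Cronbach's alpha is computed from raw item scores.

```python
import math
import statistics


def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1.0 - l * l for l in loadings)
    return total * total / (total * total + error)


def cronbach_alpha(items):
    """`items` holds one list of respondent scores per indicator."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1.0 - item_var / statistics.variance(totals))


def fornell_larcker_ok(ave_a, ave_b, correlation):
    """Discriminant validity: sqrt(AVE) of both constructs >= their correlation."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) >= abs(correlation)


# Hypothetical values for one construct with three indicators
loadings = [0.82, 0.76, 0.71]
scores = [[4, 3, 5, 2, 4, 5], [4, 2, 5, 3, 4, 4], [5, 3, 4, 2, 3, 5]]

construct_ave = ave(loadings)
construct_cr = composite_reliability(loadings)
construct_alpha = cronbach_alpha(scores)
print(construct_ave, construct_cr, construct_alpha)
print(fornell_larcker_ok(construct_ave, 0.60, 0.55))
```

With these made-up numbers, every loading exceeds 0.707, the AVE exceeds 0.5, and both reliability measures exceed 0.7, so the construct would pass all the checks listed in Table 1.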
4.2 Assessment of the Structural Model
This part seeks to demonstrate the relational hypotheses of the model (see Table 2).
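The acceptance criteria summarised in Table 2 (R2 above 0.1, ƒ2 above 0.15, path coefficient above 0.2, Student's t above 1.96 and p below 0.05) can be sketched as a small checklist. The effect size follows the usual definition of ƒ2 as the change in R2 when a predictor is removed, and the t value is the estimate divided by its bootstrap standard error. All numeric inputs below are hypothetical and are not results of the study.

```python
import statistics


def f_squared(r2_included, r2_excluded):
    """Effect size of one predictor: change in R2 relative to unexplained variance."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)


def bootstrap_t(estimate, bootstrap_estimates):
    """Student's t: original estimate over the bootstrap standard error."""
    return estimate / statistics.stdev(bootstrap_estimates)


def check_path(r2, f2, path, t_value, p_value):
    """Apply the thresholds listed in Table 2 to one structural path."""
    return {
        "r2": r2 > 0.1,
        "f2": f2 > 0.15,
        "path": path > 0.2,
        "t": t_value > 1.96,
        "p": p_value < 0.05,
    }


# Hypothetical numbers for a single path
f2 = f_squared(r2_included=0.40, r2_excluded=0.30)
t_value = bootstrap_t(0.45, [0.40, 0.50, 0.45, 0.55, 0.35])
result = check_path(r2=0.40, f2=f2, path=0.45, t_value=t_value, p_value=0.001)
print(f2, t_value, result)
```

A path would be reported as supported only when every entry of the returned dictionary is true, which mirrors how the table combines the separate criteria.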
5 Conclusions
The identification of the variables present in the application of green IT practices within organizations has been achieved. The research determined the situational scenario of four companies in terms of their level of involvement with these practices. With this information, a structural model was built, which made it possible to investigate the behavior of the parameters or constructs and to predict how variations in them affect the model in general. The model aims to evaluate the factors involved in the perception that the staff has in relation to the achievement of adequate practices for green IT; the proposed
Table 2 Correlation analysis between variables
Parameter: Index R2
Compliance values: The predictive capacity of the model is measured; for this, the value of R2 must be > 0.1
Values obtained: From the R2 values obtained for the GIT construct, it is clear that the AP, IP, EE and RT constructs have adequate predictive power
Parameter: Effect ƒ2
Compliance values: It evaluates the impact of a latent variable on a dependent construct. Acceptable values are those above 0.15
Values obtained: In terms of ƒ2, the only relationships that meet this parameter are those established with the FO (organizational strengths) construct, but not with respect to the GIT variable
Parameter: Standardized path coefficients
Compliance values: The value of these standardized coefficients must exceed a minimum of 0.2
Values obtained: With regard to the path coefficients, the accepted values are those that are positive; however, according to the statistics, they must have a value > 0.2, which gives as valid the same correlations obtained with ƒ2
Parameter: Bootstrapping analysis
Compliance values: This process identifies the standard error and the Student’s t coefficients. For a coefficient to be considered significant it must exceed the value of 1.96. Furthermore, for a hypothesis to be accepted as valid, the p-value must be lower than 0.05
Values obtained: The bootstrapping results indicated in Table 3 once again show that the correlations between FO and the other constructs are significant, since they have a Student’s t value greater than 1.96 and a p-value
(1 − (1 − t 2 ))k = t 2k .
Therefore, E
min xi − y ≤2T
(t 2k )n dt
1≤i≤n
ln(1−y m ) ln(1−(1−t 2 )m ) ln(1−y) ≤ ln(1−(1−t 2 ))
(1 − [1 − t 2 ] M )n dt.
+ 2T ln(1−y m ) ln(1−(1−t 2 )m ) ln(1−y) > ln(1−(1−t 2 ))
According to Lemma 12, we further have E
min xi − y ≤2T
(t 2k )n dt + 2T
1≤i≤n
(1−t 2 )≥y
⎛
(1 − [1 − t 2 ] M )n dt (1−t 2 )