Advances in Intelligent Systems and Computing 1158
Sanjiv K. Bhatia · Shailesh Tiwari · Su Ruidan · Munesh Chandra Trivedi · K. K. Mishra Editors
Advances in Computer, Communication and Computational Sciences Proceedings of IC4S 2019
Advances in Intelligent Systems and Computing Volume 1158
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Sanjiv K. Bhatia · Shailesh Tiwari · Su Ruidan · Munesh Chandra Trivedi · K. K. Mishra
Editors

Advances in Computer, Communication and Computational Sciences
Proceedings of IC4S 2019
Editors

Sanjiv K. Bhatia, Department of Mathematics and Computer Science, University of Missouri–St. Louis, Chesterfield, MO, USA
Shailesh Tiwari, Computer Science Engineering Department, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Su Ruidan, Shanghai Advanced Research Institute, Pudong, China
Munesh Chandra Trivedi, National Institute of Technology Agartala, Agartala, Tripura, India
K. K. Mishra, Computer Science Engineering Department, Motilal Nehru National Institute of Technology, Allahabad, Uttar Pradesh, India
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-981-15-4408-8 ISBN 978-981-15-4409-5 (eBook) https://doi.org/10.1007/978-981-15-4409-5 © Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
The IC4S is a major multidisciplinary conference organized with the objective of bringing together researchers, developers and practitioners from academia and industry working in all areas of computer and computational sciences. It is organized specifically to help the computer industry benefit from the advances of next-generation computer and communication technology. Invited researchers present the latest developments and technical solutions.

Technological developments all over the world depend upon the globalization of various research activities. Exchange of information and innovative ideas is necessary to accelerate the development of technology. Keeping this ideology in preference, the International Conference on Computer, Communication and Computational Sciences (IC4S 2019) was organized at Mandarin Hotel Bangkok, Bangkok, Thailand, during 11–12 October 2019. This is the third time the International Conference on Computer, Communication and Computational Sciences has been organized, with the objective of enhancing research activities at a large scale. The Technical Program Committee and Advisory Board of IC4S include eminent academicians, researchers and practitioners from abroad as well as from all over the nation.

In this book, the selected manuscripts have been subdivided into various tracks named Advanced Communications and Security, Intelligent Hardware and Software Design, Intelligent Computing Techniques, Web and Informatics, and Intelligent Image Processing. A sincere effort has been made to make it an immense source of knowledge for all; it includes 91 manuscripts. The selected manuscripts went through a rigorous review process and were revised by the authors after incorporating the suggestions of the reviewers.

IC4S 2019 received around 490 submissions from around 770 authors of 22 different countries such as USA, Iceland, China, Saudi Arabia, South Africa, Taiwan, Malaysia, Indonesia, Europe and many more. Each submission went through a plagiarism check. On the basis of the plagiarism report, each submission was rigorously reviewed by at least two reviewers, with an average of 2.4 reviews per submission; some submissions received more than two reviews. On the basis of these reviews, 91 high-quality papers were selected for publication in this proceedings volume, with an acceptance rate of 18.57%.

We are thankful to the keynote speakers, Prof. Shyi-Ming Chen, IEEE Fellow, IET Fellow, IFSA Fellow, Chair Professor at National Taiwan University of Science and Technology, Taiwan, and Prof. Maode Ma, IET Fellow, Nanyang Technological University, Singapore, for enlightening the participants with their knowledge and insights. We are also thankful to the delegates and the authors for their participation and their interest in IC4S 2019 as a platform to share their ideas and innovations. We are also thankful to Prof. Dr. Janusz Kacprzyk, Series Editor, AISC, Springer, for providing guidance and support. Also, we extend our heartfelt gratitude to the reviewers and Technical Program Committee members for their concern and efforts in the review process. We are indeed thankful to everyone directly or indirectly associated with the conference organizing team for leading it towards success.

Although utmost care has been taken in compilation and editing, a few errors may still occur. We request the participants to bear with such errors and lapses (if any). We wish you all the best.
Bangkok, Thailand
Editors: Sanjiv K. Bhatia · Shailesh Tiwari · Munesh Chandra Trivedi · K. K. Mishra
About This Book
With the advent of technology, intelligent and soft computing techniques came into existence with a wide scope of implementation in the engineering sciences. Nowadays, technology is changing at a rapid pace, and innovative proposals that solve engineering problems intelligently are gaining popularity and advantages over conventional solutions to these problems. It is very important for the research community to track the latest advancements in the field of computer science. Keeping this ideology in preference, this book includes insights that reflect the advances in computer and computational sciences from upcoming researchers and leading academicians across the globe. It contains the high-quality peer-reviewed papers of the 'International Conference on Computer, Communication and Computational Sciences (IC4S 2019)', held during 11–12 October 2019 at Mandarin Hotel Bangkok, Bangkok, Thailand. These papers are arranged in the form of chapters. The content of this book is divided into five broader tracks that cover a variety of topics. These tracks are: Advanced Communications and Security, Intelligent Hardware and Software Design, Intelligent Computing Techniques, Web and Informatics, and Intelligent Image Processing. This book helps prospective readers from the computer and communication industry and academia to follow recent developments in the field of communication and computer sciences and shape them into real-life applications.
Contents
Advanced Communications and Security

A Novel Crypto-Ransomware Family Classification Based on Horizontal Feature Simplification (Mohsen Kakavand, Lingges Arulsamy, Aida Mustapha, and Mohammad Dabbagh)
Characteristic Analysis and Experimental Simulation of Diffuse Link Channel for Indoor Wireless Optical Communication (Peinan He and Mingyou He)
A Comparative Analysis of Malware Anomaly Detection (Priynka Sharma, Kaylash Chaudhary, Michael Wagner, and M. G. M. Khan)
Future Identity Card Using Lattice-Based Cryptography and Steganography (Febrian Kurniawan and Gandeva Bayu Satrya)
Cryptanalysis on Attribute-Based Encryption from Ring-Learning with Error (R-LWE) (Tan Soo Fun and Azman Samsudin)
Enhanced Password-Based Authentication Mechanism in Cloud Computing with Extended Honey Encryption (XHE): A Case Study on Diabetes Dataset (Tan Soo Fun, Fatimah Ahmedy, Zhi Ming Foo, Suraya Alias, and Rayner Alfred)
An Enhanced Wireless Presentation System for Large-Scale Content Distribution (Khong-Neng Choong, Vethanayagam Chrishanton, and Shahnim Khalid Putri)
On Confidentiality, Integrity, Authenticity, and Freshness (CIAF) in WSN (Shafiqul Abidin, Vikas Rao Vadi, and Ankur Rana)
Networking Analysis and Performance Comparison of Kubernetes CNI Plugins (Ritik Kumar and Munesh Chandra Trivedi)
Classifying Time-Bound Hierarchical Key Assignment Schemes (Vikas Rao Vadi, Naveen Kumar, and Shafiqul Abidin)
A Survey on Cloud Workflow Collaborative Adaptive Scheduling (Delong Cui, Zhiping Peng, Qirui Li, Jieguang He, Lizi Zheng, and Yiheng Yuan)
Lattice CP-ABE Scheme Supporting Reduced-OBDD Structure (Eric Affum, Xiasong Zhang, and Xiaofen Wang)
Crypto-SAP Protocol for Sybil Attack Prevention in VANETs (Mohamed Khalil and Marianne A. Azer)
Managerial Computer Communication: Implementation of Applied Linguistics Approaches in Managing Electronic Communication (Marcel Pikhart and Blanka Klímová)
Advance Persistent Threat—A Systematic Review of Literature and Meta-Analysis of Threat Vectors (Safdar Hussain, Maaz Bin Ahmad, and Shariq Siraj Uddin Ghouri)
Construction of a Teaching Support System Based on 5G Communication Technology (Hanhui Lin, Shaoqun Xie, and Yongxia Luo)
Intelligent Hardware and Software Design

Investigating the Noise Barrier Impact on Aerodynamics Noise: Case Study at Jakarta MRT (Sugiono Sugiono, Siti Nurlaela, Andyka Kusuma, Achmad Wicaksono, and Rio P. Lukodono)
3D Cylindrical Obstacle Avoidance Using the Minimum Distance Technique (Krishna Raghuwaiya, Jito Vanualailai, and Jai Raj)
Path Planning of Multiple Mobile Robots in a Dynamic 3D Environment (Jai Raj, Krishna Raghuwaiya, Jito Vanualailai, and Bibhya Sharma)
Autonomous Quadrotor Maneuvers in a 3D Complex Environment (Jito Vanualailai, Jai Raj, and Krishna Raghuwaiya)
Tailoring Scrum Methodology for Game Development (Towsif Zahin Khan, Shairil Hossain Tusher, Mahady Hasan, and M. Rokonuzzaman)
Designing and Developing a Game with Marketing Concepts (Towsif Zahin Khan, Shairil Hossain Tusher, Mahady Hasan, and M. Rokonuzzaman)
Some Variants of Cellular Automata (Ray-Ming Chen)
An Exchange Center Based Digital Cash Payment Solution (Yong Xu and Jingwen Li)
Design and Implementation of Pianos Sharing System Based on PHP (Sheng Liu, Chu Yang, and Xiaoming You)
A Stochastic Framework for Social Media Adoption or Abandonment: Higher Education (Mostafa Hamadi, Jamal El-Den, Cherry Narumon Sriratanaviriyakul, and Sami Azam)
Low-Earth Orbital Internet of Things Satellite System on the Basis of Distributed Satellite Architecture (Mikhail Ilchenko, Teodor Narytnyk, Vladimir Prisyazhny, Segii Kapshtyk, and Sergey Matvienko)
Automation of the Requisition Process in Material Supply Chain of Construction Firms (Adedeji Afolabi, Yewande Abraham, Rapheal Ojelabi, and Oluwafikunmi Awosika)
Developing an Adaptable Web-Based Profile Record Management System for Construction Firms (Adedeji Afolabi, Yewande Abraham, Rapheal Ojelabi, and Etuk Hephzibah)
Profile Control System for Improving Recommendation Services (Jaewon Park, B. Temuujin, Hyokyung Chang, and Euiin Choi)
IoT-Based Smart Application System to Prevent Sexual Harassment in Public Transport (Md. Wahidul Hasan, Akil Hamid Chowdhury, Md Mehedi Hasan, Arup Ratan Datta, A. K. M. Mahbubur Rahman, and M. Ashraful Amin)
A Decision Support System Based on WebGIS for Supporting Community Development (Wichai Puarungroj, Suchada Phromkhot, Narong Boonsirisumpun, and Pathapong Pongpatrakant)
Structural Application of Medical Image Report Based on Bi-CNNs-LSTM-CRF (Aesha Abdullah Moallim and Li Ji Yun)
Integrating QR Code-Based Approach to University e-Class System for Managing Student Attendance (Suwaibah Abu Bakar, Shahril Nazim Mohamed Salleh, Azamuddin Rasidi, Rosmaini Tasmin, Nor Aziati Abd Hamid, Ramatu Muhammad Nda, and Mohd Saufi Che Rusuli)
Intelligent Computing Techniques

Decision-Making System in Tannery by Using Fuzzy Logic (Umaphorn Tan and Kanate Puntusavase)
A Study on Autoplay Model Using DNN in Turn-Based RPG (Myoungyoung Kim, Jaemin Kim, Deukgyu Lee, Jihyeong Son, and Wonhyung Lee)
Simulation Optimization for Solving Multi-objective Stochastic Sustainable Liner Shipping (Saowanit Lekhavat and Habin Lee)
Fast Algorithm for Sequence Edit Distance Computation (Hou-Sheng Chen, Li-Ren Liu, and Jian-Jiun Ding)
Predicting Student Final Score Using Deep Learning (Mohammad Alodat)
Stance Detection Using Transformer Architectures and Temporal Convolutional Networks (Kushal Jain, Fenil Doshi, and Lakshmi Kurup)
Updated Frequency-Based Bat Algorithm (UFBBA) for Feature Selection and Vote Classifier in Predicting Heart Disease (Himanshu Sharma and Rohit Agarwal)
A New Enhanced Recurrent Extreme Learning Machine Based on Feature Fusion with CNN Deep Features for Breast Cancer Detection (Rohit Agarwal and Himanshu Sharma)
Deep Learning-Based Severe Dengue Prognosis Using Human Genome Data with Novel Feature Selection Method (Aasheesh Shukla and Vishal Goyal)
An Improved DCNN-Based Classification and Automatic Age Estimation from Multi-factorial MRI Data (Ashish Sharma and Anjani Rai)
The Application of Machine Learning Methods in Drug Consumption Prediction (Peng Han)
Set Representation of Itemset for Candidate Generation with Binary Search Technique (Carynthia Kharkongor and Bhabesh Nath)
Robust Moving Targets Detection Based on Multiple Features (Jing Jin, Jianwu Dang, Yangpin Wang, Dong Shen, and Fengwen Zhai)
Digital Rock Image Enhancement via a Deep Learning Approach (Yunfeng Bai and Vladimir Berezovsky)
Enhancing PSO for Dealing with Large Data Dimensionality by Cooperative Coevolutionary with Dynamic Species-Structure Strategy (Kittipong Boonlong and Karoon Suksonghong)
A New Encoded Scheme GA for Solving Portfolio Optimization Problems in the Big Data Environment (Karoon Suksonghong and Kittipong Boonlong)
Multistage Search for Performance Enhancement of Ant Colony Optimization in Randomly Generated Road Profile Identification Using a Quarter Vehicle Vibration Responses (Kittikon Chantarattanakamol and Kittipong Boonlong)
Classification and Visualization of Poverty Status: Analyzing the Need for Poverty Assistance Using SVM (Maricel P. Naviamos and Jasmin D. Niguidula)
Comparative Analysis of Prediction Algorithms for Heart Diseases (Ishita Karun)
Sarcasm Detection Approaches Survey (Anirudh Kamath, Rahul Guhekar, Mihir Makwana, and Sudhir N. Dhage)
Web and Informatics

Interactive Animation and Affective Teaching and Learning in Programming Courses (Alvin Prasad and Kaylash Chaudhary)
IoT and Computer Vision-Based Electronic Voting System (Md. Nazmul Islam Shuzan, Mahmudur Rashid, Md. Aowrongajab Uaday, and M. Monir Uddin)
Lexical Repository Development for Bugis, a Minority Language (Sharifah Raihan Syed Jaafar, Nor Hashimah Jalaluddin, Rosmiza Mohd Zainol, and Rahilah Omar)
Toward EU-GDPR Compliant Blockchains with Intentional Forking (Wolf Posdorfer, Julian Kalinowski, and Heiko Bornholdt)
Incorum: A Citizen-Centric Sensor Data Marketplace for Urban Participation (Heiko Bornholdt, Dirk Bade, and Wolf Posdorfer)
Developing an Instrument for Cloud-Based E-Learning Adoption: Higher Education Institutions Perspective (Qasim AlAjmi, Ruzaini Abdullah Arshah, Adzhar Kamaludin, and Mohammed A. Al-Sharafi)
Gamification Application in Different Business Software Systems—State of Art (Zornitsa Yordanova)
Data Exchange Between JADE and Simulink Model for Multi-agent Control Using NoSQL Database Redis (Yulia Berezovskaya, Vladimir Berezovsky, and Margarita Undozerova)
Visualizing Academic Experts on a Subject Domain Map of Cartographic-Alike (Diana Purwitasari, Rezky Alamsyah, Dini Adni Navastara, Chastine Fatichah, Surya Sumpeno, and Mauridhi Hery Purnomo)
An Empirical Analysis of Spatial Regression for Vegetation Monitoring (Hemlata Goyal, Sunita Singhal, Chilka Sharma, and Mahaveer Punia)
Extracting Temporal-Based Spatial Features in Imbalanced Data for Predicting Dengue Virus Transmission (Arfinda Setiyoutami, Wiwik Anggraeni, Diana Purwitasari, Eko Mulyanto Yuniarno, and Mauridhi Hery Purnomo)
The Application of Medical and Health Informatics Among the Malaysian Medical Tourism Hospital: A Preliminary Study (Hazila Timan, Nazri Kama, Rasimah Che Mohd Yusoff, and Mazlan Ali)
Design of Learning Digital Tools Through a User Experience Design Methodology (Gloria Mendoza-Franco, Jesús Manuel Dorador-González, Patricia Díaz-Pérez, and Rolando Zarco-Hernández)
Fake Identity in Political Crisis: Case Study in Indonesia (Kristina Setyowati, Apneta Vionuke Dihandiska, Rino A. Nugroho, Teguh Budi Santoso, Okki Chandra Ambarwati, and Is Hadri Utomo)
Cloud Computing in the World and Czech Republic—A Comparative Study (Petra Poulová, Blanka Klímová, and Martin Švarc)
Data Quality Improvement Strategy for the Certification of Telecommunication Tools and Equipment: Case Study at an Indonesia Government Institution (E. A. Puspitaningrum, R. F. Aji, and Y. Ruldeviyani)
Evolution of Neural Text Generation: Comparative Analysis (Lakshmi Kurup, Meera Narvekar, Rahil Sarvaiya, and Aditya Shah)
Research on the Status and Strategy of Developing Financial Technology in China Commercial Bank (Ze-peng Chen, Jie-hua Xie, Cheng-qing Li, Jie Xiao, and Zi-yi Huang)
Understanding Issues Affecting the Dissemination of Weather Forecast in the Philippines: A Case Study on DOST PAGASA Mobile Application (Lory Jean L. Canillo and Bryan G. Dadiz)
Guideme: An Optimized Mobile Learning Model Based on Cloud Offloading Computation (Rasha Elstohy, Wael Karam, Nouran Radwan, and Eman Monir)
Model Development in Predicting Seaweed Production Using Data Mining Techniques (Joseph G. Acebo, Larmie S. Feliscuzo, and Cherry Lyn C. Sta. Romana)
A Survey on Crowd Counting Methods and Datasets (Wang Jingying)
Decentralized Marketplace Using Blockchain, Cryptocurrency, and Swarm Technology (Jorge Ramón Fonseca Cacho, Binay Dahal, and Yoohwan Kim)
A Expansion Method for DriveMonitor Trace Function (Dong Liu)
Load Prediction Energy Efficient VM Consolidation Policy in Multimedia Cloud (K. P. N. Jayasena and G. K. Suren W. de Chickera)
An Attribute-Based Access Control Mechanism for Blockchain-Enabled Internet of Vehicles (Sheng Ding and Maode Ma)
Intelligent Image Processing

An Investigation on the Effectiveness of OpenCV and OpenFace Libraries for Facial Recognition Application (Pui Kwan Fong and Ven Yu Sien)
Virtual Reality as Support of Cognitive Behavioral Therapy of Adults with Post-Traumatic Stress Disorder (Ivan Kovar)
Facial Expression Recognition Using Wavelet Transform and Convolutional Neural Network (Dini Adni Navastara, Hendry Wiranto, Chastine Fatichah, and Nanik Suciati)
Survey of Automated Waste Segregation Methods (Vaibhav Bagri, Lekha Sharma, Bhaktij Patil, and Sudhir N. Dhage)
Classification of Human Blastocyst Quality Using Wavelets and Transfer Learning (Irmawati, Basari, and Dadang Gunawan)
Affinity-Preserving Integer Projected Fixed Point Under Spectral Technique for Graph Matching (Beibei Cui and Jean-Charles Créput)
A New Optimized GA-RBF Neural Network Algorithm for Oil Spill Detection in SAR Images (Vishal Goyal and Aasheesh Shukla)
Survey of Occluded and Unoccluded Face Recognition (Shiye Xu)
A Survey on Dynamic Sign Language Recognition (Ziqian Sun)
Extract and Merge: Merging Extracted Humans from Different Images (Minkesh Asati, Worranitta Kraisittipong, and Taizo Miyachi)
A Survey of Image Enhancement and Object Detection Methods (Jinay Parekh, Poojan Turakhia, Hussain Bhinderwala, and Sudhir N. Dhage)
About the Editors
Sanjiv K. Bhatia works as a Professor of Computer Science at the University of Missouri, St. Louis, USA. His primary areas of research include image databases, digital image processing, and computer vision. In addition to publishing many articles in these areas, he has consulted extensively with industry for commercial and military applications of computer vision. He is an expert on system programming and has worked on real-time and embedded applications. He has taught a broad range of courses in computer science and has been the recipient of the Chancellor’s Award for Excellence in Teaching. He is also the Graduate Director for Computer Science in his department. He is a senior member of ACM. Shailesh Tiwari currently works as a Professor at the Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, India. He is an alumnus of Motilal Nehru National Institute of Technology Allahabad, India. His primary areas of research are software testing, implementation of optimization algorithms, and machine learning techniques in software engineering. He has authored more than 50 publications in international journals and the proceedings of leading international conferences. He also serves as an editor for various Scopus, SCI, and E-SCI-indexed journals and has organized several international conferences under the banner of the IEEE and Springer. He is a senior member of the IEEE and a member of the IEEE Computer Society. Su Ruidan is currently an Assistant Professor at Shanghai Advanced Research Institute, Chinese Academy of Sciences. He has completed his Ph.D. from Northeastern University in 2014. His research areas include machine learning, computational intelligence, software engineering, data analytics, system optimization, multi-population genetic algorithm. Dr. Su has served as Editor-in-Chief of the journal “Journal of Computational Intelligence and Electronic Systems” during 2012–2016. He has published more than 20 papers in international journals.
Dr. Munesh Chandra Trivedi is currently working as Associate Professor, Department of Computer Science & Engineering, National Institute of Technology, Agartala (Tripura). He worked as Dean Academics, HoD and Associate Professor (IT) at Rajkiya Engineering College, with the additional responsibility of Associate Dean UG Programs, Dr. APJ Abdul Kalam Technical University, Lucknow (State Technical University). He was also the Director (in charge) at Rajkiya Engineering College, Azamgarh. He has very rich experience of teaching undergraduate and postgraduate classes in government institutions as well as prestigious private institutions. He has 11 patents to his credit. He has published 12 textbooks and 107 research papers in international journals and in the proceedings of international conferences of repute. He has also edited 21 books for Springer Nature and written 23 book chapters for Springer Nature. He has received numerous awards, including a Young Scientist Visiting Fellowship, Best Senior Faculty Award, Outstanding Scientist, Dronacharya Award, Author of the Year and Vigyan Ratan Award from different national as well as international forums. He has organized more than 32 international conferences technically sponsored by IEEE, ACM and Springer. He has also worked as a member of the organizing committee in several IEEE international conferences in India and abroad. He was an Executive Committee Member of the IEEE UP Section, IEEE Computer Society Chapter India Council and also IEEE Asia Pacific Region 10. He is an active member of the IEEE Computer Society, International Association of Computer Science and Information Technology, Computer Society of India, International Association of Engineers, and a life member of ISTE.

K. K. Mishra is currently working as an Assistant Professor at the Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, India. He has also been a Visiting Faculty at the Department of Mathematics and Computer Science, University of Missouri, St. Louis, USA. His primary areas of research include evolutionary algorithms, optimization techniques and design, and analysis of algorithms. He has also authored more than 50 publications in international journals and the proceedings of leading international conferences. He currently serves as a program committee member of several conferences and an editor for various Scopus and SCI-indexed journals.
Advanced Communications and Security
A Novel Crypto-Ransomware Family Classification Based on Horizontal Feature Simplification

Mohsen Kakavand, Lingges Arulsamy, Aida Mustapha, and Mohammad Dabbagh
Abstract This research studies a distinct form of malware known as crypto-ransomware. Recent incidents around the globe indicate that crypto-ransomware has become an increasing threat due to its nature of encrypting victims' targeted information and keeping the decryption key in the deep Web until a ransom is paid, usually in cryptocurrency. In addition, current intrusion detection systems (IDSs) are not accurate enough to counter intelligently written crypto-ransomware that uses features such as polymorphism, environment mapping and partial file encryption, or that saturates the system with low-entropy file write operations in order to produce a lower encryption footprint, all of which can prevent the intrusion detection system (IDS) from detecting malicious crypto-ransomware activity. This research has explored diverse data preprocessing techniques to depict crypto-ransomware as images. In an effort to classify crypto-ransomware images, this research utilizes existing neural network methods to train a classifier that assigns new crypto-ransomware files to their family classes. In a broader context, the aim of this research is to create a crypto-ransomware early detection approach. Nevertheless, the primary contribution is the proof of the baselining horizontal feature simplification concept, whereby it provides an accurate real-time detection rate for crypto-ransomware with less system load on the user device.
M. Kakavand (B) · L. Arulsamy · M. Dabbagh
Department of Computing and Information Systems, Sunway University, Sunway City, Petaling Jaya, Malaysia
e-mail: [email protected]
L. Arulsamy
e-mail: [email protected]
M. Dabbagh
e-mail: [email protected]
A. Mustapha
Department of Software Engineering, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_1
Keywords Sophisticated crypto-ransomware · Horizontal feature simplification · System load · Computational cost
1 Introduction

The Internet of things (IoT) refers to the interconnected network of intelligent devices, sensors, embedded computers and so forth. IoT applications permeate most areas of everyday infrastructure, from health and food to smart cities and urban management. While the effectiveness and prevalence of IoT are increasing, security concerns in these sectors still remain a necessary consideration [1]. As cybersecurity threats continue to evolve, crypto-ransomware is becoming the number one menace for both consumers and businesses using Internet of things (IoT) devices worldwide. Moreover, crypto-ransomware has caused millions of dollars in losses, both in information and in money paid as ransom. Current intrusion detection systems (IDSs) are not accurate enough to counter attacks by intelligently written crypto-ransomware that uses techniques such as polymorphism, environment mapping, partial file encryption or saturating the system with low-entropy file write operations [2]. Thus, this research study proposes a novel crypto-ransomware classification based on a horizontal feature simplification (HFS) approach, which identifies crypto-ransomware family attacks by monitoring the hexadecimal coding patterns of different structured and procedural programming languages in order to distinguish crypto-ransomware families from non-malicious applications. Furthermore, this algorithm also helps to prevent sophisticated crypto-ransomware from infecting the host. The primary contribution of this research is to show that baselining horizontal feature simplification can provide a transformation of data into an image without compromising the integrity of the features.
2 Related Work

Crypto-ransomware development continues to dominate the threat landscape and has affected vital sectors (hospitals, banks, universities, government, law firms, mobile users) as well as varied organizations worldwide [3, 4]. Defending against crypto-ransomware is an active research area, and crypto-ransomware classification is one of its known challenges and opportunities and a current research topic [2, 5]. However, there are a few approaches to detect and remove the threat. One of the detection approaches is static-based analysis, which means analyzing an application's code prior to its execution to determine whether it is capable of any malicious activities. On the other hand, the fundamental flaw of signature-based detection is its inability to detect unknown malware which has yet to be turned into a signature. A malicious executable is only detectable once
it has first been reported as malicious and added to the malicious signature repository. Moreover, static-based detection is also ineffective against code obfuscation, high-variant output and targeted attacks [2]. Furthermore, static-based detection is not an effective stand-alone approach to detect crypto-ransomware. Therefore, past researchers have reviewed crypto-ransomware characteristics and developed several detection methods in order to mitigate crypto-ransomware [6].

Shaukat and Ribeiro proposed RansomWall [7], a layered defense scheme against cryptographic ransomware. It follows a mixture of static and dynamic analysis strategies to produce a new compact set of features characterizing the behavior of crypto-ransomware. When the initial RansomWall layers tag a process for suspected crypto-ransomware behavior, the files modified by that process are backed up for user data preservation until the process has been categorized as crypto-ransomware or benign. On the other hand, behavioral-based detection methods that rely on detecting mass file encryption can be effective but may come at a resource-intensive cost; this is because the file entropy needs to be calculated for every single write operation executed by an application [2]. In addition, these operations need to track file operations for each file separately over the life span of an observed process. Hence, such an approach may considerably deteriorate disk read and write performance and result in a high system load. Besides that, detecting crypto-ransomware by analyzing file rename operations to identify ransom-like file names or extensions may work on simple crypto-ransomware, but will not work on more intelligently written crypto-ransomware such as CryptXXX, which randomizes the file name, or Spore, which retains the original file name. Consequently, this will lead the model to produce a high false-positive rate.

Azmoodeh et al. [8] suggested a solution that uses a machine learning strategy to identify crypto-ransomware attacks by tracking Android device energy usage. In particular, the suggested technique monitors the energy consumption patterns of distinct processes in order to distinguish crypto-ransomware from non-malicious applications. However, the use of energy consumption to detect crypto-ransomware can trigger significant false negatives, whereby a crypto-ransomware sample is not identified and is marked as a non-malicious application [2]. Typically, this could occur because crypto-ransomware developers are aware of data transformation analysis techniques and have been known to use simple tricks to mask the presence of mass file encryption [2]. The use of energy consumption can also set off notable false positives, whereby a benign application such as a Web browser that uses high system resources leads the model to identify and mark the benign application as malicious.

Sgandurra et al. proposed EldeRan [9], a machine learning approach for dynamically analyzing and classifying crypto-ransomware. EldeRan observes a set of actions applied in the first phase of installation to check for crypto-ransomware characteristics. In addition, EldeRan operates without needing a complete family of crypto-ransomware to be accessible in advance. EldeRan, however, has some limitations.
The first limitation concerns the analysis and identification of crypto-ransomware samples that have been silent for some duration or are waiting for a trigger action performed by the user. Hence, EldeRan does not properly extract their features; this is due to certain crypto-ransomware being capable of environment mapping [2]. In other words, when crypto-ransomware is executed it will map the system environment before initiating its setup procedure. Typically, this is performed to determine whether it runs on a real user system or in a sandbox setting that might attempt to analyze it [2]. Another limitation of their approach is that in the current settings no other applications were running in the analyzed virtual machine (VM), except the ones coming packed with a fresh installation of Windows. This might not be a limitation by itself, but as in the previous cases, the crypto-ransomware might perform some checks so as to evade being analyzed.

Recently, [10] conducted research on ransomware threats, leading to the proposed Deep Ransomware Threat Hunting and Intelligence System (DRTHIS), which is capable of detecting previously unseen ransomware data from new ransomware families in a timely and precise way. However, DRTHIS is not capable of classifying some new threats such as the TorrentLocker attack.

In summary, the preliminary literature review shows that past studies are primarily focused on understanding distinct types of system design and layered architecture, as stated in Table 1. In terms of the detection engine, classifiers are used in order to increase the detection rate. Many system designs use a similar approach, as shown in Table 2, in order to detect crypto-ransomware behavior. However, the false-positive and false-negative rates toward intrusion samples differ due to the unique research model designs and layered architectures, and therefore the detection rates for false positives and false negatives toward intrusion samples differ. Therefore, the aim of this research is to create an approach baselining horizontal feature simplification to provide an accurate real-time detection rate for crypto-ransomware with less system load on the user device.

Table 1 Similarity between past models
Past models: use dynamic analysis
Advantage: high detection rate; classifies the ransomware family
Disadvantage: resource-intensive; impractical for commercial use; vulnerable to sophisticated attacks
Table 2 Past model summary

Homayoun et al. (2019). Model architecture: the Deep Ransomware Threat Hunting and Intelligence System (DRTHIS) approach is used to distinguish ransomware from goodware. Advantage: DRTHIS is capable of detecting previously unseen ransomware data from new ransomware families in a timely and accurate manner. Disadvantage: DRTHIS is not capable of classifying some new threats such as the TorrentLocker attack.

Shaukat and Ribeiro (2018). Model architecture: each layer tags the process of the Portable Executable file for behavior such as read, write, rename and delete operations. Advantage: able to detect common crypto-ransomware with a high detection rate. Disadvantage: resource-intensive, whereby the file entropy needs to be calculated for every single write operation; moreover, this operation will also deteriorate disk read and write performance, and the approach is vulnerable toward intelligently written crypto-ransomware [2].

Azmoodeh et al. (2017). Model architecture: tracks the energy consumption pattern of distinct procedures to classify crypto-ransomware from non-malicious applications. Advantage: outperforms other models such as k-nearest neighbors, neural network, support vector machine and random forest. Disadvantage: significant false positives due to certain characteristics, and weak against partially encrypted files [2].

Sgandurra (2016). Model architecture: analyzes a set of actions applied in the first phase of installation to check for crypto-ransomware, without needing a complete family of crypto-ransomware to be accessible in advance. Advantage: able to classify common variants of ransomware families. Disadvantage: the model does not properly extract their features, as certain crypto-ransomware are capable of environment mapping [2].

3 Objectives

The objective of this research is to develop a novel crypto-ransomware family classification based on horizontal feature simplification with a reduced system constraint approach. The term constraint is defined here as the process of identifying crypto-ransomware without overloading the user's machine. For example, encryption measures such as entropy change require the file entropy to be calculated for every single write operation executed by an application. Furthermore, these operations need to track file operations for each file separately over the life span of an observed process. Such an approach may considerably increase the system load, leading to deteriorating disk read and write performance. In summary, the objective is to produce a classification algorithm with a practical approach to feature representation that is able to distinguish the crypto-ransomware family with a low computational cost.
4 Methodology

This section describes the approaches that will be taken to achieve the proposed solution. Moreover, this section intends to describe and show the relationship between the various work activities that are performed during this research. Furthermore, the expected results from these activities will also be listed here.
4.1 Data Collection

The crypto-ransomware and goodware dataset was obtained from [11]; it consists of 100 working samples from 10 distinct classes of ransomware and 100 benign applications. The crypto-ransomware samples were gathered to represent the most common versions and variations presently found in the wild. Furthermore, each crypto-ransomware sample is grouped under a well-established family name, as there are several discrepancies between the naming policies of anti-virus (AV) suppliers and it is therefore not simple to obtain a common name for each ransomware family.
4.2 Android Application Package Conversion into Hexadecimal Codes

This segment highlights the approaches used by the data preprocessing module to transform the raw information into a comprehensible format that supports this research framework. Three data preprocessing approaches are utilized in this research. Generally, any binary file can be considered a series of ones and zeros. As shown in Fig. 1, the first step is to convert each Android application package to binary. After that, the binary file is converted to hexadecimal code. During this process the data retains the original integrity of the application. The reason for using binary-to-hexadecimal conversion is to reduce the code complexity, as shown in Fig. 2, which is beneficial for the next stage, the image transfiguration process.
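To make this stage concrete, the Python sketch below reads an application package as raw bytes and renders it as a hexadecimal string. It is only a minimal sketch of the idea; the paper does not publish its preprocessing code, and the input file name used here is hypothetical.

```python
# Minimal sketch of stage 1 preprocessing (assumed implementation, not the authors' code):
# read an APK as raw bytes and represent it as a hexadecimal string.
from pathlib import Path


def apk_to_hex(apk_path: str) -> str:
    """Return the hexadecimal representation of the file's raw bytes."""
    raw_bytes = Path(apk_path).read_bytes()   # the APK is just a byte sequence
    return raw_bytes.hex()                    # two hex characters per byte


if __name__ == "__main__":
    hex_string = apk_to_hex("sample.apk")     # hypothetical input file
    print(hex_string[:64])                    # preview the first 64 hex characters
```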
Fig. 1 Hexadecimal code conversion framework

4.3 Hexadecimal Code to Dynamic Image Transfiguration

In this process, the hexadecimal string is split into units of 6 characters each. Treating each unit as an index into a two-dimensional color map that stores the RGB values matching that particular unit, and repeating this process for every unit, yields a sequence of RGB (pixel) values from the stage 1 preprocessed file. Next, this series of pixel values is transformed into a two-dimensional matrix, which is used in the image transfiguration process, resulting in an RGB picture representation. As shown in Fig. 3, the width of the output image is fixed at 510 pixels, whereas the height is dynamic, determined by the hexadecimal-to-dynamic-image transfiguration algorithm. The reason for keeping the image width static and the height dynamic is to create a standard baseline feature dimension. From this part of the analysis, we found that a frequent amount of unique pattern appears in the images corresponding to each crypto-ransomware family; this can be seen in Fig. 4. As we dived further into analyzing the crypto-ransomware families, we discovered a complication: the image dimensions across the crypto-ransomware families are not standardized. This complication affects the convolutional neural network model, whereby the model would assign unequal weight to the stage 2 preprocessed images, causing the loss function of the model to increase and leading to bad predictions. In addition, general approaches to manipulating the images, such as center cropping, squashing or padding, do not work for this research dataset, because the images would lose a significant number of important features, which would lead to bad classification. Therefore, in this research we developed an algorithm which solves the problem faced by the stage 2 preprocessed images. The algorithm is explained in depth in the next stage of data preprocessing.
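The sketch below illustrates one possible reading of this transfiguration step. Only the fixed width of 510 pixels and the dynamic height come from the description above; interpreting each 6-character unit as three byte values (one per RGB channel), zero-padding the final row, and using NumPy and Pillow are assumptions made for illustration.

```python
# Sketch of stage 2 preprocessing (one possible interpretation, not the published algorithm):
# group the hex string into 6-character units, map each unit to an RGB pixel,
# and arrange the pixels into an image of fixed width 510 and dynamic height.
import numpy as np
from PIL import Image

WIDTH = 510  # fixed width used as the baseline feature dimension


def hex_to_image(hex_string: str) -> Image.Image:
    # Interpret every 6 hex characters as one pixel (here: 3 bytes -> R, G, B).
    units = [hex_string[i:i + 6] for i in range(0, len(hex_string) - 5, 6)]
    pixels = [(int(u[0:2], 16), int(u[2:4], 16), int(u[4:6], 16)) for u in units]

    # Pad the last row with black pixels so the sequence fills a WIDTH-wide matrix.
    height = -(-len(pixels) // WIDTH)  # ceiling division -> dynamic height
    pixels += [(0, 0, 0)] * (height * WIDTH - len(pixels))

    matrix = np.array(pixels, dtype=np.uint8).reshape(height, WIDTH, 3)
    return Image.fromarray(matrix, mode="RGB")
```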
Fig. 2 Android application package conversion into hexadecimal code output
Fig. 3 Hexadecimal code to dynamic image conversion flowchart
Fig. 4 Hexadecimal code to dynamic image transfiguration output for WannaLocker ransomware variant

4.4 Horizontal Feature Simplification

In this process, we used the algorithm we created, known as horizontal feature simplification (HFS), to further preprocess the images produced by stage 2 of the data preprocessing. The main condition for horizontal feature simplification is that the width of the image must be fixed. This rule is applied because if the image does not have a fixed number of features, the images will not be normalized, leading to bad predictions in this research. If the condition is met, the algorithm is executed. As shown in Fig. 5, the first step converts the stage 2 preprocessed image into a two-dimensional plane array in order to extract each row pixel vector. Next, a SimHash algorithm with a prime-number hash bit length is used to create a coordinate corresponding to each row pixel vector: the algorithm takes each row vector, passes it through a segmentation step, acquires the effective feature vectors and weighs each set of feature vectors (given a row pixel vector, the feature vectors are the pixels in the image and the weight is the number of times a pixel may appear). Furthermore, a DJB2 algorithm with a prime-number hash bit length is used to produce a 26-bit hash value, which is utilized to create an RGB color pixel. If there is a collision between two row pixel vectors, the RGB colors are added together in order to maintain the integrity of the image. In summary, horizontal feature simplification creates a static image dimension which is used in the convolutional neural network model to create an unbiased weight distribution in order to produce a better classification model.

In this part, we analyze the HFS data output. From this part of the analysis, we found that a frequent amount of unique pattern still appears in the images corresponding to each crypto-ransomware family, even after the images have been preprocessed from stage 2 to stage 3; this can be seen in Fig. 6. Besides, the pixel density per image is increased by 5% when prime numbers are used for the SimHash and DJB2 hash bit lengths, compared to non-prime numbers. Therefore, the number of characteristics in an image also increases, causing the classification model to produce a higher-quality prediction.

Fig. 5 Horizontal feature simplification flowchart
Fig. 6 Horizontal feature simplification output for Koler ransomware variant
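Because the paper does not give the HFS pseudocode, the exact hash bit lengths, or the output resolution, the sketch below is only an illustrative guess at how such a mapping could be wired together. The 13-bit SimHash-style signature, the 26-bit DJB2 hash split into three color channels, and the 100 x 100 output grid (chosen to match the classifier input described in the next subsection) are all assumptions, not the authors' algorithm.

```python
# Loose sketch of horizontal feature simplification (HFS); parameters are illustrative only.
import numpy as np

OUT_SIZE = 100  # assumed fixed output dimension fed to the classifier


def djb2(data: bytes, bits: int = 26) -> int:
    """Classic DJB2 hash of a byte string, truncated to the given number of bits."""
    h = 5381
    for b in data:
        h = ((h * 33) + b) & 0xFFFFFFFF
    return h % (1 << bits)


def simhash_coordinate(row: np.ndarray, bits: int = 13) -> int:
    """SimHash-style signature of a row pixel vector, used here as a flat coordinate."""
    counts = np.zeros(bits)
    for value in row.flatten():
        h = djb2(bytes([int(value)]), bits)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1   # bit-wise voting over the row
    signature = sum(1 << i for i in range(bits) if counts[i] > 0)
    return signature % (OUT_SIZE * OUT_SIZE)


def hfs(image: np.ndarray) -> np.ndarray:
    """Collapse a (height, 510, 3) image into a fixed OUT_SIZE x OUT_SIZE x 3 image."""
    out = np.zeros((OUT_SIZE * OUT_SIZE, 3), dtype=np.uint32)
    for row in image:
        coord = simhash_coordinate(row)              # where the row lands in the output
        h = djb2(row.tobytes())                      # 26-bit hash of the whole row
        rgb = [(h >> 16) & 0xFF, (h >> 8) & 0xFF, h & 0xFF]
        out[coord] += rgb                            # on collision, add the colors together
    return np.clip(out, 0, 255).astype(np.uint8).reshape(OUT_SIZE, OUT_SIZE, 3)
```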
4.5 Crypto-Ransomware Classification

Neural networks have been very effective at finding significance or recognizing patterns in a collection of images. The model used in this research is a sequential model, which allows a linear stack of layers to be built; each layer is treated as an object that feeds data to the next one. The next layer is the convolution layer, whose parameters accept a 100 * 100 * 3 array of pixel values. In addition, the feature map is passed through an activation layer called the rectified linear unit (ReLU). ReLU is a nonlinear operation that replaces all the negative pixel values in the feature map with zero, and it tends to perform better than other activation functions. The ReLU layer increases the nonlinear properties of our model, which means the proposed model is able to learn more complex functions rather than just a linear regression.
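A minimal Keras sketch of such a network is shown below. Only the 100 * 100 * 3 input shape, the sequential structure, the convolution layer and the ReLU activation come from the description above; the number of layers, the filter counts and the output size (ten ransomware families plus a benign class) are assumptions.

```python
# Minimal Keras sketch of the classifier described above (layer and filter choices assumed).
from tensorflow.keras import layers, models

NUM_CLASSES = 11  # assumed: 10 crypto-ransomware families plus a benign class

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(100, 100, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```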
5 Results and Discussion

In this research, we used a convolutional neural network model with expression, frequency and modularity. The dataset was divided into two directories, a training directory and a validation directory, using sklearn. After splitting all the data into those directories, a script was used to label which data belongs to the corresponding picture of each crypto-ransomware file and benign file by generating texture maps for training and validation and feeding this information to the proposed model. We then began training on the dataset and printing the test precision result for every iteration. The convolutional neural network model tends to overfit around 100 epochs; therefore, the ideal number of epochs is in the range of 20 in order to reduce the loss function and increase the accuracy of the model. The accuracy is 63%. Nevertheless, the overall accuracy is above average considering that the amount of live crypto-ransomware data is small, due to the dangerous nature of crypto-ransomware itself. Furthermore, during the behavioral testing phase of the model, a few hyperparameters were fine-tuned to create a better classification model without overfitting it.
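The training protocol might look roughly like the sketch below, which reuses the model from the previous sketch. Only the use of scikit-learn for splitting and the choice of about 20 epochs come from the text; the 80/20 stratified split, the batch size, and the randomly generated placeholder arrays standing in for the preprocessed images and labels are assumptions for illustration.

```python
# Sketch of the training/validation protocol (assumed details; `model` is defined above).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 100, 100, 3)).astype("float32") / 255.0  # placeholder data
labels = np.eye(11)[rng.integers(0, 11, size=200)]                                # placeholder one-hot labels

x_train, x_val, y_train, y_val = train_test_split(
    images, labels, test_size=0.2, stratify=labels.argmax(axis=1), random_state=42)

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=20, batch_size=16)       # roughly 20 epochs to limit overfitting
print("validation accuracy:", history.history["val_accuracy"][-1])
```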
6 Conclusion

Most crypto-ransomware classification methods involve executing the malware to record its behavior or using disassembly techniques to predict its behavior. Both malware execution and disassembly are time-consuming and have an elevated computing cost. This research can accomplish similar outcomes, with important efficiency improvements, with the assistance of image transfiguration. Behavioral or static-based assessment depends on the platform, so distinct systems need distinct classifiers, whereas the image-based strategy described here is independent of the platform; it classifies data based on the degree of resemblance between the hexadecimal patterns of the same types of malicious and benign files. Furthermore, the framework suggested in this research improves on prior work and paves the way for the application of state-of-the-art methods in data preprocessing for detecting sophisticated crypto-ransomware with less computational cost. We achieved an accuracy of 63 percent in detecting crypto-ransomware families; however, the horizontal feature simplification (HFS) algorithm has shown adequate performance in protection of the
user device against sophisticated crypto-ransomware. Therefore, our future work is to improve the proposed model in a real-time test bed with different performance metrics.
Characteristic Analysis and Experimental Simulation of Diffuse Link Channel for Indoor Wireless Optical Communication Peinan He and Mingyou He
Abstract Based on the existing solution of combining a direct link and a diffuse link to alternately transmit optical signals, the channel characteristics of the diffuse link are analyzed in depth using algorithmic and simulation methods, the influencing factors of the average optical receiving power in four scenarios are discussed, and an experimental simulation is carried out. The purpose is to obtain the main parameters of the diffuse link channel characteristics, to improve the reliability and practicability of a communication system and to provide a reference for the application of indoor wireless optical communication technology and the design and development of equipment and instruments. The simulation results show that the position of the emitter, the angle of the emitter (θ img) and the angle of the beam diffusion (θ max) are the factors affecting the average receiving power of the light. In the design of an indoor optical wireless communication system, the average receiving power can be improved and the expected performance of the communication system can be achieved by fully considering these factors. Keywords Wireless optical communication · Diffuse link channel · Channel characteristic analysis · Experimental simulation
1 Introduction Visible light communication (VLC) is a new wireless optical communication technology and one of the current research hotspots. This technology has realized a huge expansion of the communication spectrum and combines communication and lighting technologies. Compared with traditional wireless communication
technology, visible light communication technology has the advantages of high security, good confidentiality and abundant frequency resources. From the perspective of application, visible light communication technology is not only applicable to indoor wireless access, intelligent homes and other lighting and communication fields, but also to electromagnetically sensitive environments such as hospitals, aircraft cabins, train carriages and cars [1–5], as well as communication scenes with high confidentiality requirements. Aiming at the problem that optical signal transmission may be blocked by obstacles at any time during indoor point-to-point wireless optical communication, He proposes a solution that combines direct and diffuse links to transmit optical signals alternately [6, 7]. The main advantage of this scheme lies in its flexibility: the direct link channel can be used for point-to-point transmission of the light signal, the diffuse link channel can be used to transmit the signal by diffusely reflected light, and the two links can also be combined to transmit the light signal alternately, so as to realize continuous communication. Based on this scheme, the characteristics of the diffuse link channel are analyzed and simulated. The purpose of this paper is to obtain the main parameters of the characteristics of the diffuse link channel, to improve the reliability and practicability of the communication system and to provide a reference for the application of indoor wireless optical communication technology and the design and development of equipment and instruments.
2 Diffuse Link Channel In order to simulate the diffuse channel characteristics, a series of positions (x, y) in the effective communication area is plotted to simulate the range of light radiation. Assuming that the ceiling of the indoor room is Lambertian, the signal power received by the receiver is P_s, and the following equation can be obtained:

P_s = A h^2 \iint \frac{H(x, y)\,\mathrm{rect}(x, y)}{\left[h^2 + (x - x_1)^2 + (y - y_1)^2\right]^2} \, dx \, dy \quad (1)
Among them, (x_1, y_1) is the position of the receiver, A is the area of the detector, h is the distance between the transmitter and the receiver, and H is the irradiance. For the radiation position (x, y) on the ceiling, the first-order radiation illumination is

H(x, y) = \frac{\rho P h^2}{\left(h^2 + x^2 + y^2\right)^2} \quad (2)
In the formula, P is the average transmitting power and ρ is the ceiling reflectance parameter. The distance from (x, y) to (x_1, y_1) is

D = \sqrt{(x - x_1)^2 + (y - y_1)^2} \quad (3)
Receiving power can be given as

P_{s\text{-}DIFF} = \frac{A \rho P h^2 \tan^2\theta_a}{\left(h^2 + D^2\right)^2} \quad (4)

For simplicity, suppose ρ = 1 and x = y = x_1 = y_1. Similarly to Formula (4), the signal-to-noise ratio (SNR) of the diffuse (DIFF) link channel can be obtained, which can be expressed as

\mathrm{SNR}_{DIFF} = \frac{(r A P)^2 h^4 \tan^4\theta_a}{2 q I B \pi^3 \left(h^2 + D^2\right)^4 H_b \sin^2\theta_a} \quad (5)
Here, we can explore the relationship between θ_a and the SNR of the DIFF link channel, and between h and the SNR, by simulating the room. Suppose the transmitting power is 50 mW, the responsivity r is 0.6 A/W, the distance h ranges from 3 to 5 m, the data rate B is 10 Mbps, and the angle θ_a is 10–40°. The simulation results are shown in Figs. 1 and 2, respectively. As can be seen from Fig. 1, the SNR increases when θ_a changes from 10 to 40°. This means that the receiving power of the receiver increases accordingly. As can be seen from Fig. 2, when the room height is 3, 4 and 5 m, the SNR decreases with the increase of the distance between the receiver and the ceiling radiation area.
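The following Python sketch reproduces the qualitative SNR-versus-distance trend, assuming the reconstructed form of Eq. (5). Since q, I and H_b are not given numerically in the text, the values below are placeholders chosen only to show the shape of the curve, not to reproduce the paper's figures.

```python
# Rough numerical sketch of Eq. (5); several constants are placeholder assumptions.
import numpy as np

A = 1e-4        # detector area (m^2), assumed
P = 50e-3       # transmit power: 50 mW
r = 0.6         # responsivity: 0.6 A/W
B = 10e6        # data rate: 10 Mbps
q = 1.6e-19     # electron charge
I = 1e-6        # photocurrent (A), placeholder
H_b = 1e-3      # background irradiance term, placeholder
h = 4.0         # room height (m), one of the simulated values
theta_a = np.deg2rad(30.0)

D = np.linspace(0.5, 5.0, 10)
snr = ((r * A * P) ** 2 * h**4 * np.tan(theta_a) ** 4) / (
    2 * q * I * B * np.pi**3 * (h**2 + D**2) ** 4 * H_b * np.sin(theta_a) ** 2
)
for d, s in zip(D, snr):
    print(f"D = {d:4.2f} m  ->  SNR = {10 * np.log10(s):6.1f} dB")
```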
Fig. 1 Simulation of distance D and SNR in diffuse link communication system. The abscissa D is the distance; the ordinate SNR is the signal-to-noise ratio; the angles θ a are 10°, 20°, 40°
Fig. 2 Simulation of distance D and SNR in DIFF link communication system. The abscissa D is the distance; the ordinate SNR is the signal-to-noise ratio; the heights h are 3 m, 4 m, 5 m
In addition, as can be seen from Fig. 2, the three lines intersect at the point (4, 10^-2), and there is little difference between them. That is to say, the height of the room is not an important factor affecting the SNR of the DIFF link communication system.
3 Channel Characteristic Analysis In indoor scenes, diffuse light propagates in the area through multiple reflections off various objects, such as ceilings, walls, furniture and even people. In such a scenario, a transmitter faces the ceiling of an ordinary room. The first-order reflection of light occurs at a point on the ceiling and then on the surfaces of other objects. Because of the different lengths of the optical paths, the times at which the beams arrive at the receiver are also different. Therefore, the impulse response of the diffuse link can be regarded as the sum of all the light reflections. Diffuse link algorithms vary according to different environments or models, but basically, the impulse response is usually modeled as the sum of two components, that is, the first-order reflection and the higher-order reflections. As shown in Fig. 3, the impulse response of the first-order reflection is obviously higher than that of the other reflections, and the other, attenuated peaks are the impulse response of the higher-order reflections. The initial direction of light propagation depends on the light source location, the angle of emission (θ img) and the angle of beam diffusion (θ max) [8, 9]. References [10, 11] describe a simple method for exploring channel characteristics by introducing
Fig. 3 Step reflection pulse response of the diffuse link in a furnished room (room size: 5 m × 3 m × 4 m). Curves: impulse response of the first-order reflection; impulse response of the second- and higher-order reflections; SUM—impulse response of the sum of all-order reflections
some parameters, as they relate to the physical parameters of the channel, especially for the primary reflection. However, in this method, the light source is directed straight at the ceiling, with no deflection. In order to further understand the characteristics of the indoor diffuse channel and lay a solid theoretical foundation for exploring the new system, this paper introduces an improved method and analyzes the channel characteristics through scene-based simulation. In Fig. 4, the transmitter and receiver face the ceiling. By introducing the angle θ_0, which represents the normal direction of the emitter, and combining it with the beam diffusion angle θ max, which can be regarded as the maximum value of the half-angle of the horizontal light source, the direction of the light source can be determined. For the equivalent uniform distribution, the normalized intensity of a Lambertian light source and the angle at which the beam intensity is half of the total radiation intensity can be expressed as
Fig. 4 Optical paths in two scenarios. θ img —normal angle of light source; θ max —beam diffusion angle
\theta_{half} = 2 \sin^{-1}\!\left(\sqrt{\frac{1}{2}\cdot\frac{1}{m+1}}\right) \quad (6)
In [10], by modifying equation (x), θ max can be expressed as

\theta_{max} = 2 \sin^{-1}\!\left(\sqrt{\frac{1 - 0.4\,\theta_{half}/\cos^{-1}(0.5)}{2(m+1)}}\right) \quad (7)
As shown in Fig. 4, the normal angle of the light source (θ img) is the angle between the light source and the receiver and transmitter. So, the angle θ img can be defined as

\theta_{img} = \tan^{-1}\!\left(\frac{d_{TR}}{h_T + h_R}\right) - \theta_0 \quad (8)
where d_TR is the distance between the transmitter and the receiver, h_T is the height of the transmitter and h_R is the height of the receiver. In addition, TO' is the normal direction of the beams when the angle θ_0 is zero, TO is the normal direction of the beam, while |TO| is the normal of the light source. There are two cases, as illustrated in Fig. 4: (a) when angle θ img < θ max, the point B, which is the cross point of the ceiling and the line connecting the source with the receiver's image, is within the area that the light is illuminating; (b) when angle θ img > θ max, as the distance between the transmitter and the receiver, d_TR, becomes larger, point B falls outside the illuminated area. To describe the light propagation geometrically, the lengths of the different kinds of light paths can be calculated by

L_{fast} = \frac{h_T}{\cos(\theta_{max} + \theta_0)} + \sqrt{\left[d_{TR} - h_T \tan(\theta_{max} + \theta_0)\right]^2 + h_R^2} \quad (9)
Here, L_fast represents the fast path of light, which in both case (a) and case (b) is the line connecting points T, C and R. The slow path L_slow, in both cases the connection of points T, A and R, is calculated by

L_{slow} = \frac{h_T}{\cos(\theta_{max} + \theta_0)} + \sqrt{\left[d_{TR} + h_T \tan(\theta_{max} + \theta_0)\right]^2 + h_R^2} \quad (10)
As for the distance between the image of the receiver and the transmitter, the length L_img is

L_{img} = \sqrt{d_{TR}^2 + \left(h_R + h_T\right)^2} \quad (11)
And the path along which the strong part of the light beam propagates is

L_{strong} = \sqrt{d_0^2 + h_T^2} + \sqrt{\left(d_{TR} - d_0\right)^2 + h_T^2} \quad (12)
Moreover, in case (a), the shortest path is the one along points T, B and R, while in case (b), it is the one connecting points T, C and R. For both cases, if point O approaches point B, the strongest path T–O–R also becomes the shortest path. Additionally, considering the case in which the position of the receiver varies, the light paths are illustrated in Fig. 5. In this figure, there are two positions for the receiver while the source is kept the same; similarly, one can calculate the path length by
L_{strong\text{-}R_2} = \sqrt{d_0^2 + h_T^2} + \sqrt{h_T^2 + \left(d_{TR_1} - d_0\right)^2 + d_{R_1R_2}^2} \quad (13)
Since the distance between points O and A and the distance between points T and A are

|OA| = h_T \tan(\theta_{max} - \theta_0) \quad (14)

|TA| = \frac{h_T}{\cos(\theta_{max} - \theta_0)} \quad (15)

Fig. 5 Light paths when the position of the receiver changes
L_{slow\text{-}R_2} = \frac{h_T}{\cos(\theta_{max} - \theta_0)} + \sqrt{\left[h_T \tan(\theta_{max} - \theta_0) + d_{TR_1}\right]^2 + d_{R_1R_2}^2 + h_T^2}

L_{img\text{-}R_2} = \sqrt{\left(h_T + h_{R_2}\right)^2 + d_{TR_2}^2} \quad (16)
Like the cases discussed previously, in this scenario the shortest path will depend on the setting of the light source angle and the emitting angle. Through the simulations later in this chapter, the effect of the receiver positions on diffuse links will be discussed further. If one uses h_LOS(t) to express the impulse response of the line-of-sight channel, then for the diffuse channel the impulse response can be described as

h_{DIFF}(t) = h_1(t - t_1) + h_n(t - t_n) \quad (17)
where h_1(t) is the impulse response of the first-order reflection and h_n(t) represents the impulse response of the higher-order reflections. t_1 and t_n are, respectively, the starting times of the rise of the first-order reflection and of the higher-order reflections. For modeling, the gamma function is used to describe the impulse response of the channel mathematically; thus, the normalized channel response can be expressed as

h_1(t) = \frac{\lambda^{-a}}{\Gamma(a)} \cdot t^{a-1} \cdot \exp\!\left(-\frac{t}{\lambda}\right) \quad (18)
where Γ(a) is the gamma function, and the parameters λ and a, which were introduced in reference [10] together with the variance σ² and the mean delay t̄ given below, express the response with delay t_1 as

|H_1(f)|^2 = \frac{1}{\left[1 + (2\pi\lambda f)^2\right]^a} \quad (19)

where H_1(f) is the Fourier transform of h_1(t), and a, λ satisfy

\sigma^2 = a \cdot \lambda^2, \qquad \bar{t} = a \cdot \lambda \quad (20)
Therefore, the −3 dB bandwidth of the first-order reflection can be expressed as

f_1 = \frac{\sqrt{2^{1/a} - 1}}{2\pi\lambda} \quad (21)
Taking the spherical model to describe the higher-order reflection response, then

h_n(t) = \frac{\eta_n}{\tau} \cdot \exp\!\left(-\frac{t}{\tau}\right) \quad (22)
where

\eta_n = \frac{A_{eff}}{A_{room}} \cdot \frac{\rho_1 \cdot \rho}{1 - \rho} \cdot \sin^2(\mathrm{FOV}), \qquad \tau = -\frac{T_{av}}{\ln\rho}, \qquad T_{av} = \frac{4 V_{room}}{c \cdot A_{room}} \quad (23)
Here, η_n is the higher-order gain, FOV is the field of view, V_room is the volume of the room, T_av is the average time of transmission, A_eff is the receiving area, A_room is the total surface area of the room, c is the speed of light, and ρ is the mean reflectivity of the room objects. Therefore, taking the Fourier transform, the higher-order reflection can be expressed as

H_n(f) = \frac{\eta_n}{1 + j \cdot 2\pi\tau f} \quad (24)
Then, the −3 dB bandwidth of the higher-order reflections can be calculated by

f_n = \frac{1}{2\pi\tau} \quad (25)
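The model of Eqs. (17)–(25) can be evaluated numerically as in the sketch below. The room dimensions, reflectivities and gamma-model parameters are placeholders, and the reflection start delays t_1 and t_n are omitted for brevity.

```python
# Compact sketch of the impulse-response model; all numbers are placeholder assumptions.
import numpy as np
from math import gamma, log, pi, sin, radians, sqrt

def h1(t, a, lam):                        # Eq. (18): gamma-function model
    return (lam ** -a / gamma(a)) * t ** (a - 1) * np.exp(-t / lam)

def f1_3dB(a, lam):                       # Eq. (21)
    return sqrt(2 ** (1 / a) - 1) / (2 * pi * lam)

def higher_order(A_eff, A_room, V_room, rho1, rho, fov_deg, c=3e8):
    eta_n = (A_eff / A_room) * (rho1 * rho / (1 - rho)) * sin(radians(fov_deg)) ** 2  # Eq. (23)
    T_av = 4 * V_room / (c * A_room)
    tau = -T_av / log(rho)
    f_n = 1 / (2 * pi * tau)              # Eq. (25)
    return eta_n, tau, f_n

t = np.linspace(1e-9, 60e-9, 200)         # seconds
a, lam = 2.0, 3e-9                        # first-order shape/scale, assumed
eta_n, tau, f_n = higher_order(A_eff=1e-4, A_room=2 * (5 * 3 + 5 * 4 + 3 * 4),
                               V_room=5 * 3 * 4, rho1=0.8, rho=0.7, fov_deg=60)
h_diff = h1(t, a, lam) + (eta_n / tau) * np.exp(-t / tau)   # Eqs. (17) and (22), delays omitted
print(f"f1 = {f1_3dB(a, lam) / 1e6:.1f} MHz, fn = {f_n / 1e6:.1f} MHz, peak h = {h_diff.max():.3e}")
```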
As Fig. 6 shows, in agreement with the equations, it is clear that as the reflectivity of the room objects increases, the bandwidth becomes smaller: even though more light is collected, the decay time lengthens at the same time, and thus the bandwidth decreases. Moreover, the first-order reflection response dominates the diffuse
Fig. 6 Frequency responses of the first-order and higher-order reflections with reflectivity (relative magnitude in dB versus frequency in MHz)
channel bandwidth; certainly, the parameters related to the bandwidth reveal that the locations of the transmitter and receiver also influence the bandwidth.
4 Experimental Simulation and Discussion In order to understand the characteristics of the optical channel in the indoor environment and its influence on the design of an optical communication system, it is necessary to discuss the influence of the physical parameters, the field of view angle and the half-power angle.
4.1 Effect of Physical Parameters

4.1.1 Room Size
A room is modeled to simulate the performance of the diffuse link in such an environment, as shown in Fig. 7, by varying its size (enlarging it from 6 m × 3 m × 6 m to 10 m × 3 m × 10 m) while keeping the coordinates of the transmitter and receiver fixed [the transmitter is at Tx (3 m, 1 m, 2 m), and the receiver is at Rx (0.8 m, 1 m, 2 m)], in order to see how the change, especially in the width and length of the room, affects the diffuse channel. The results are shown as the impulse responses of the channels in terms of the different orders of reflections and their sum.
Fig. 7 Locations of the transceiver in a room: a 2D view and b 3D view (position of receiver Rx (0.8 m, 1 m, 2 m); position of transmitter Tx (3 m, 1 m, 2 m))
Table 1 Experimental parameters

Link                                             Average received power   Geometrical loss   DC channel gain
T1—direct transmitter vs. R1—direct receiver     2.73e−05 mW              −72.6 dB           5.45E−08 dB
T1—direct transmitter vs. R2—DIFF receiver       2.38e−05 mW              −73.2 dB           4.74E−08 dB
T2—diffuse transmitter vs. R1—direct receiver    1.84e−05 mW              −74.4 dB           3.67E−08 dB
T2—diffuse transmitter vs. R2—DIFF receiver      1.62e−05 mW              −74.9 dB           −79.1 dB

Notes T1—direct transmitter; T2—diffuse transmitter; R1—direct receiver; and R2—diffuse receiver
Figure 7 illustrates the 2D and 3D views of the room in which the locations of the transmitter and receiver were mapped in this scenario. For the cases simulated here, the parameters are shown in Table 1. In Fig. 8, there is a comparison of the impulse response of the first-order reflection in the four cases. As seen in the results, the responses started almost at the same time, at about 13 ns. This is because the place where the first-order reflection happens is on the ceiling for all four cases; with the simulation parameters unchanged apart from the width and length of the room, the light path from the transmitter to the ceiling is nearly the same, and thus the response start time stays the same. In addition, a blue line was added to the figure as an alignment marker to measure the start time of the response impulses. It is hard to see the difference in this figure, especially when comparing it with the results of the second-order reflection response shown in Fig. 9. By comparing the results of the impulse response of the second-order reflections, as shown in Fig. 9, it is much clearer to see the delay of the start time of the second-order reflection response in all cases with the aid of the blue line. As the size of the room is enlarged, the start of the response is delayed, while the duration of the response is also lengthened, since the slots between the peaks are broadened and the intensity of the response impulses becomes weaker and flatter. The results agree well with the model: when the size of the room gets bigger, in other words, as the light paths become longer, the light beams take a longer time to reach the subsequent objects. In the case of the 9 m × 3 m × 9 m room, the response decay time is about 5 ns longer than in the 6 m × 3 m × 6 m room; the average magnitude of the response peaks, which reveal the objects that the light reached, such as the side walls of the room, is higher in the 8 m × 3 m × 8 m room than in the 6 m × 3 m × 6 m room, and these peaks are spread over a long response period of more than 43 ns. Figure 10 shows the sum of the impulse responses of all cases. By calculating the average received power, it is apparent that the biggest value (4.69 nW) occurred in
Fig. 8 Comparison of impulse response of the first-order reflection by room size in: 6 m × 3 m × 6 m; 8 m × 3 m × 8 m; 9 m × 3 m × 9 m; and 10 m × 3 m × 10 m
the 10 m × 3 m × 10 m room. The trend shows that as the size gets bigger, the average received power increases. This can be explained as follows: the surface of the room increases, which creates more reflection area for the light source, and the calculation includes all-order reflections, so as the response time broadens, more light is counted in the calculation and thus the received optical power increases. However, corresponding to the model discussed previously with the aid of the equations, even though the gain of the channel becomes bigger along with the
Fig. 9 Comparison of the impulse response of the second-order reflections by room size in: 6 m × 3 m × 6 m; 8 m × 3 m × 8 m; and 9 m × 3 m × 9 m
increase of the room size, the decay time becomes longer; therefore, the −3 dB bandwidth actually decreases with this trend.
4.1.2 Empty Room and Furnished Room
Because of the nature of light and the effect of the reflection surfaces, it can be assumed that the diffuse channel response might vary under different conditions, such as whether the room is furnished or not. For this, a scenario of a furnished room was simulated to compare its channel response with that of an empty room. Figure 11 shows the layout in 2D and 3D views. Both rooms are of the same size, 5 m × 3 m × 6 m; the transmitter Tx is at position (3.8 m, 1 m, 3 m), while the receiver Rx is at (0.8 m, 1 m, 3 m). In the furnished room, a rack, three tables and a chair are added at positions (0 m, 1 m, 0 m), (1.5 m, 0 m, 0 m), (0.5 m, 0 m, 2.5 m), (3.7 m, 0 m, 2.5 m) and (2.2 m, 0 m, 0.5 m). The comparison of the results is shown in Fig. 12.
Fig. 10 Impulse response of the sum of all-order reflections
The obvious change is that the first-order responses of the two cases differ only slightly in value; however, the difference in the second-order response is as large as one order of magnitude, with the gain decreasing from 10⁻⁹ to 10⁻¹⁰. This change can be seen more clearly in Fig. 13. The first-order response pulse seen in the result for the furnished room came a bit earlier than the one in the empty room. This can be explained by the fact that, with the tables, chair and rack added, some paths of the light are shortened, so the light reaches the reflection surfaces faster and the reflection response occurs earlier. Also, since the second-order reflections mainly happen after the
Fig. 11 Layout of transmitters and receivers in the room: a 2D view of the empty room; b 3D view of the empty room; c 2D view of the furnished room; and d 3D view of the furnished room (positions: Tx (3.8 m, 1 m, 3 m); Rx (0.8 m, 1 m, 3 m))
first-order reflections from the ceiling, it is understandable that the rack might block some paths of the light so that the gain of the second-order reflection decreased. But the sum of the results shows that the first-order reflection response still dominates the total; in addition, it also reveals that even though furniture at some locations might block light paths, it increases the reflection surfaces within the room on the other hand, which causes more reflections to be counted, so that the sum of the average gain did not vary greatly in this scenario.
Fig. 12 Order reflection impulse responses of the empty and furnished rooms: a channel response of the empty room and b channel response of the furnished room (panels, versus time in ns: H0—first-order reflection response; H1—first-order reflection response; H2—second-order reflection response; SUM—impulse response of the sum of all-order reflections)
Fig. 13 Comparison of the impulse response of the second-order reflections (H2) in a room (5 m × 3 m × 6 m) with and without furniture
Fig. 14 Average received power versus half-power angle versus FOV
4.2 Effect of Field of View (FOV) Angle and Half-Power (HP) Angle By calculating the average received power at the receiver, with the results shown in Fig. 14, one can find the relationship between the HP angle and the received power [12–14]. As shown in the figure, the trend of this relationship is that when the HP angle increases, the average received power increases. This is reasonable, since the bigger the angle, the more directions of light paths are generated, and thus the receiver has more opportunities to catch light from different directions. Also, as the FOV increases, the average received power increases too. This is easy to understand, because the receiving area is enlarged and more of the light arriving at the receiver can be collected. In addition, the average power used in the radiation simulation was 1 W in this case. Moreover, it is apparent that there is a notch in the curves at the point of 45°, where the value of the average received power sharply decreases. This point reveals that there is a threshold for the HP angle: if the angle is so large that it exceeds the threshold, some of the light is misdirected away from the destination, so the light beams have to undergo more reflections before ending their journey at the receiver and, on their way, may suffer from blockage or even absorption, which causes the received power to decrease. This is confirmed by the 45–85° section of Fig. 14, where the average received power is much weaker than in the preceding section, in which the largest value (67.6 nW) is reached at 30° with the FOV at 80° in this case.
5 Conclusion To sum up, the following conclusions are drawn: 1. The calculation of the receiver's received power shows that the average optical received power is closely related to the FOV and the HP angle. When the HP angle increases, the average received power increases. In addition, with the increase of the FOV angle, the average received power also increases greatly. However, 45° is a threshold value: when the HP angle is greater than 45°, the average received power decreases sharply. In fact, a half-power angle of 30° is the best HP angle. 2. The channel characteristic analysis shows that in the indoor environment, different objects such as the ceiling, walls, furniture and people may cause the diffuse light to produce diffraction. At the same time, due to the difference in the lengths of the light paths, the times at which the reflected beams arrive at the receiver are also different. The diffuse link algorithm can adapt this situation to changing environments or modes. 3. The influence of the physical parameters is mainly manifested in two aspects: among the room sizes considered, 10 m × 3 m × 10 m gives the largest average received power; and the empty room is less affected by the diffracted light produced by the diffuse light, while the furnished room is more affected by it. Therefore, when designing an indoor optical wireless communication system, as long as the above factors are properly handled, the expected performance of the optical communication system can be achieved. Acknowledgements This study was supported by the International Student Fund of the University of Warwick, UK, by the Scientific Research Fund of the Civil Aviation Flight University of China (J2018-11) and by the Sichuan Science and Technology Project (2019YJ0721). I would like to thank Professor Roger J. Green, an expert in optoelectronic communications at the University of Warwick, and Professor Dominic O'Brien, an expert in optical communications at the University of Oxford, for their high appraisal of the results. I would like to thank the School of Engineering of the University of Warwick for providing detection equipment for my experimental testing.
References 1. M.E. Yousefi, S.M. Idrus, C.H. Lee, M. Arsat, A.S.M. Supa’At, N.M. Safri, Indoor Free Space Optical Communications for Aircraft Passenger Cabin (IEEE, 2001). 978-1-4577-00057/11,2011,:1-5 2. M.D. Higgins, M.S. Leeson, R.J. Green, An Analysis of Intra-Vehicle Optical Wireless Communications from a Passenger Perspective (ICTON, 2012), We.C4.4:1–4 3. M.D. Higgins, R.J. Green, M.S. Leeson, Optical wireless for intravehicle communications: a channel viability analysis. IEEE Trans. Veh. Technol. 61(1), 123–129 (2012)
4. M.D. Higgins, R.J. Green, M.S. Leeson, Optical wireless for intravehicle communications: incorporating passenger presence scenarios. IEEE Trans. Veh. Technol. 62(8), 3510–3517 (2013) 5. D.C. O'Brien, G.E. Faulkner, S. Zikic, N.P. Schmitt, High data-rate optical wireless communications in passenger aircraft: measurements and simulation. IEEE, CSNDSP08 Proceedings (2008), 978-1-4244-1876-3/08, pp. 68–71 6. P. He, M. He, Orthogonal space-time coding and modulation technology for indoor wireless optical communication. Aero. Comput. Tech. 48(1), 127–134 (2018) 7. P. He, M. He, Dual-links configuration and simulation of indoor wireless optical communication system. Opt. Commun. Technol. 43(7), 56–60 (2019) 8. A.G. Al-Ghamdi, J.M. Elmirghani, Analysis of diffuse optical wireless channels employing spot-diffusing techniques, diversity receivers, and combining schemes. IEEE Trans. Commun. 52(10), 1622–1631 (2004) 9. A.K. Majumdar, Non-line-of-sight (NLOS) ultraviolet and indoor free-space optical (FSO) communications. Adv. Free Space Optics (FSO) 186, 177–202 (2015) 10. H. Naoki, I. Takeshi, Channel modeling of non-directed wireless infrared indoor diffuse link. Electron. Commun. Japan 90(6), 8–18 (2007) 11. F. Miranirkhani, M. Uysal, Channel modeling and characterization for visible light communications. IEEE J. Photon. 7(6), 1–16 (2015) 12. A.T. Hussein, J.M.H. Elmirghani, 10 Gbps mobile visible light communications systems employing angle diversity, imaging receivers, and relay nodes. J. Optical Commun. Netw. 7(8), 718–735 (2015) 13. D.J.F. Barros, S.K. Wilson, J.M. Kahn, Comparison of orthogonal frequency division multiplexing and pulse-amplitude modulation in indoor optical wireless links. IEEE Trans. Commun. 60(1), 153–163 (2012) 14. M.T. Alresheedi, J.M.H. Elmirghani, 10 Gb/s indoor optical wireless systems employing beam delay, power, and angle adaptation methods with imaging detection. J. Lightwave Technol. 30(12), 1843–1856 (2012)
A Comparative Analysis of Malware Anomaly Detection Priynka Sharma, Kaylash Chaudhary, Michael Wagner, and M. G. M. Khan
Abstract We propose a classification model with various machine learning algorithms to adequately recognise malware files and clean (not malware-affected) files, with the objective of minimising the number of false positives. Malware anomaly detection systems are the system security component that monitors network and framework activities for malicious movements. They are becoming an essential component for keeping data frameworks protected with high reliability. The objective of malware anomaly detection is to model normal application behaviour so that attacks can be recognised through their deviating effects. In this paper, we present machine learning strategies for malware detection to distinguish normal and harmful activities on the system. This malware data analytics process was carried out using the WEKA tool on the figshare dataset, using the four most successful algorithms on the preprocessed dataset through cross-validation. Garrett's Ranking Strategy has been used to rank the various classifiers on their performance level. The results suggest that the Instance-Based Learner (IBK) classification approach is the most successful. Keywords Anomaly · Malware · Data mining · Machine learning · Detection · Analysis
1 Introduction Malware has continued to grow in volume and complexity, posing remarkable dangers to the security of computing machines and services [1]. This has motivated the increased use of machine learning to improve malware anomaly detection. The history of malware shows that this malicious threat has been with us since the beginning of computing itself [2]. The concept of a computer virus goes back to 1949, when the eminent computer scientist John von Neumann wrote a paper on how a computer program could reproduce itself [3]. During the 1950s, workers at Bell Labs gave life to von Neumann's idea when they made a game called "Core Wars". In the game, programmers would release software "organisms" that vied for control of the computer system. From these basic and benevolent beginnings, a gigantic and wicked industry was born [3]. Today, malware has infected a third of the world's computers [4]. Cybersecurity Ventures reports that losses due to cybercrime, including malware, are foreseen to hit $6 trillion per year by 2021 [4]. Malware is software designed to infiltrate or harm a computer system without the owner's informed consent. Many strategies have been used to defend against different malware. Among these, malware anomaly detection (MAD) is the most encouraging strategy to protect against dynamic anomalous behaviour. A MAD system classifies information into different categories, namely normal and anomalous [5]. Different classification algorithms have been proposed to design a powerful detection model [6, 7]. The performance of the classifier is a significant factor influencing the performance of the MAD model; thus, the choice of a precise classifier improves the performance of the malware detection framework. In this work, classification algorithms have been assessed using the WEKA tool. Four different classifiers have been evaluated through Accuracy, Receiver Operating Characteristic (ROC) value, Kappa, training time, False-Positive Rate (FPR) and Recall value. Ranks have additionally been assigned to these algorithms by applying Garrett's ranking strategy [8]. The rest of the paper is organised as follows. The next two subsections discuss anomaly detection and classification algorithms. Section 2 discusses related work, whereas Sect. 3 presents the chosen dataset and describes the WEKA tool and the different classification algorithms. Results and discussions are presented in Sect. 4. Finally, Sect. 5 concludes this research.
1.1 Anomaly Detection Anomaly detection is a type of technology that uses artificial intelligence to recognise unusual behaviour in a dataset. Datch frameworks characterise anomaly detection as "a strategy used to distinguish unpredictable instances in a perplexing domain". Ultimately, anomaly detection spots patterns in a way that a human reader
Fig. 1 Anomaly detection on various problem-solving domains
cannot. Anomaly detection bridges the gap between metrics and business processes to give more efficiency [9]. Ever since the intensification of big data, enterprises of all sizes have been in a state of vulnerability, and anomaly detection helps bridge measurements and business processes to give more efficiency. There are two phases in anomaly-based detection: phase 1 is training, and phase 2 is detecting [10, 11]. In the first phase, the machine learns attack patterns, and it then detects abnormal behaviour in the second phase. A key advantage of anomaly-based detection is its ability to detect zero-day attacks [12]. The limitations of anomaly-based detection are a high false alarm rate and the difficulty of deciding which features should be used for detection in the training phase [13]. Anomaly detection tackles these problems in numerous diverse ways, as depicted in Fig. 1. Anomaly detection can reach into the particulars of data identification where diminutive peculiarities cannot be seen by users observing datasets on a dashboard. Therefore, the best way to get continuous responsiveness to new data patterns is to apply a machine learning technique.
1.2 Classification Algorithms Classification algorithms in data mining are overwhelmingly applied in anomaly detection systems to separate attacks from ordinary activities in the system. Classification algorithms take a supervised learning approach; that is, they are trained on labelled data and do not require class labels at prediction time. There are essentially eight categories of classifiers, and every category contains diverse artificial intelligence algorithms. These categories are:
Bayes Classifier: Also known as belief networks, these belong to the family of probabilistic graphical models (GMs), which are used to represent knowledge about uncertain domains. In the graph, nodes denote random variables and edges denote probabilistic dependencies. A Bayes classifier predicts the class based on the values of the individual features [14].
Function Classifier: Develops the idea of neural networks and regression [I]. Eighteen classifiers fall under this category. The Radial Basis Function (RBF) Network and Sequential Minimal Optimization (SMO) are two classifiers which perform well with the dataset used in this paper. RBF classifiers can represent any nonlinear function effectively and do not use the raw data directly. The issue with RBF is the tendency to over-train the model [15].
Lazy Classifier: Requires the complete training data to be stored. While building the model, new examples are not incorporated into the training set by these classifiers. It is mostly used for classification on data streams [16].
Meta Classifier: Finds the optimal set of attributes to train the base classifier. The parameters used in the base classifier will be used for predictions. There are twenty-six classifiers in this category [8].
Mi Classifier: There are twelve multi-instance classifiers; none fits the dataset used in this paper. This classifier is a variation of the supervised learning procedure. These kinds of classifiers were originally made available through a separate software package [17].
Misc or Miscellaneous Classifier: Three classifiers fall under the misc category. Two classifiers, Hyperpipes and Voting Feature Interval (VFI), are compatible with our dataset [8].
Rules Classifier: Association rules are used for correct prediction of the class among all the behaviours, and this is linked with the level of accuracy. They may predict more than one outcome. Rules are mutually exclusive and are learnt one at a time [17].
Trees: Popular classification procedures in which a flowchart-like tree structure is created, where every node denotes a test on an attribute value and each branch expresses the outcome of the test. This is also known as Decision Trees. Tree leaves correspond to the predicted classes. Sixteen classifiers fall under this category [8].
2 Related Work Numerous researchers have proposed different strategies and algorithms for anomaly detection based on data mining classification methods. Li et al. [19] present a rule-based technique which exploits the comprehended examples to recognise malicious attacks [18]. Fu et al. [20] discuss the use of data mining in the anomaly detection framework, which is a significant direction in Intrusion Detection System (IDS) research. The paper presents improved association-based anomaly detection that relies on frequent pattern growth and fuzzy c-means
Table 1 An overview of publications that applied ML classifiers for malware detection

Publication/year: Evaluation of machine learning classifiers for mobile malware detection, 2016
  ML algorithms: BN, MLP, J48, KNN, RF; Sample size: 3450; Optimal classifier: RF

Publication/year: Using spatio-temporal information in API calls with ML algorithms for malware detection, 2009
  ML algorithms: IBK, J48, NB, Ripper, SMO; Sample size: 516; Optimal classifier: SMO

Publication/year: DroidFusion: A novel multilevel classifier fusion approach for Android malware detection, 2018
  ML algorithms: RT, J48, Rep Tree, VP, RF, R. Comm., R. Sub., Adaboost, DroidFusion; Sample sizes: 3799, 15,036, 36,183; Optimal classifier: DroidFusion
(FCM) network anomaly detection. Wenguang et al. proposed a smart anomaly detection framework based on web data mining, which is compared with other conventional anomaly detection frameworks [20]. However, for a complete detection framework, there is still some work left, such as improving the data mining algorithms, better handling the connection between the data mining module and other modules, improving the framework's adaptive capacity, achieving visualisation of the test outcomes, and improving the real-time efficiency and precision of the framework. Likewise, Panda and Patra [22] present a study of certain data mining techniques, for instance, machine learning, feature selection, neural networks, fuzzy logic, genetic algorithms, support vector machines, statistical methods and immunology-based strategies [21]. Table 1 presents an overview of papers that applied machine learning techniques for Android malware detection.
3 Dataset and Tool Description The Android malware dataset from figshare consists of 215 attributes, i.e., feature vectors extracted from 15,036 applications (5560 malware applications from the Drebin project and 9476 benign applications). This dataset has also been used to develop the multilevel classifier fusion approach in [1]. Table 2 shows that the dataset contains two classes, namely malware and benign: there are 5560 instances of malware and 9476 instances of benign.

Table 2 Dataset used for malware anomaly detection

Dataset   Features   Samples   Malware   Benign
Drebin    215        15,036    5560      9476
3.1 Waikato Environment for Knowledge Analysis (WEKA) WEKA is a data analysis tool developed at the University of Waikato, New Zealand, in 1997 [22]. It consists of several machine learning algorithms that can be used to mine data and extract meaningful information. The tool is written in the Java language and contains a graphical user interface to work with data files. It contains 49 data preprocessing tools, 76 classification algorithms, 15 attribute evaluators and 10 search algorithms for feature selection. It has three graphical user interfaces: "The Explorer", "The Experimenter" and "The Knowledge Flow". WEKA supports data stored in the Attribute Relation File Format (ARFF). It has a set of panels that can be used to perform specific tasks. WEKA also provides the ability to develop and integrate new machine learning algorithms.
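Outside WEKA, the same ARFF-formatted dataset can also be inspected with SciPy, as in the short sketch below; the file name is a hypothetical placeholder.

```python
# Reading an ARFF file outside WEKA; 'drebin-215.arff' is a placeholder path.
from scipy.io import arff
import pandas as pd

data, meta = arff.loadarff("drebin-215.arff")
df = pd.DataFrame(data)
print(meta.names()[:5])   # first few attribute names
print(df.shape)           # expected roughly (15036, 216): 215 features plus the class column
```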
3.2 Cross-Validation Like a single holdout validation set, cross-validation evaluates the model's predictive performance on unseen data, but it does so more robustly by repeating the trial multiple times, using all the various fragments of the training set as validation sets. This gives a more accurate indication of how well the model generalises to unseen data and thus helps avoid overfitting (Fig. 2).
Fig. 2 Cross-validation (10 folds) method application
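For readers who prefer Python, the 10-fold cross-validation step can be approximated with scikit-learn as sketched below; KNeighborsClassifier with k = 1 is used only as a stand-in for WEKA's IBK, and the data arrays are random placeholders.

```python
# 10-fold cross-validation sketch; placeholder data, IBK approximated by 1-NN.
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(200, 215)         # placeholder feature matrix with 215 attributes
y = np.random.randint(0, 2, 200)     # placeholder malware/benign labels

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=cv)
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```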
Fig. 3 Confusion matrix application in machine learning
3.3 Performance Metrics Used The performance measurements used to assess the classification strategies are depicted via the confusion matrix. It contains information regarding the test dataset, for which the true values are known. The confusion matrix displays the results of prediction as follows: 1. False Positive (FP): the model predicted a benign sample as a malware attack. 2. False Negative (FN): a wrong prediction; the prediction was benign, but it was a malware attack. 3. True Positive (TP): the model predicted a malware attack, and it was a malware attack. 4. True Negative (TN): the model predicted benign, and it was benign. A confusion matrix, as shown in Fig. 3, is a method for summarising the performance of a classification algorithm. Classification accuracy alone can be misleading if you have an unequal number of observations in each class or if there are more than two classes in your dataset. Computing a confusion matrix can give you a better idea of what the classification model is predicting [17].
• Accuracy = (TP + TN)/n
• True-Positive Rate (TPR) = TP/(TP + FN)
• False-Positive Rate (FPR) = FP/(TN + FP)
• Recall = TP/(TP + FN)
• Precision = TP/(TP + FP)
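As a worked example of these definitions, the snippet below computes the metrics from raw confusion-matrix counts; the counts themselves are made up for illustration.

```python
# Metrics from confusion-matrix counts; the counts are illustrative only.
def metrics(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / n,
        "tpr_recall": tp / (tp + fn),
        "fpr": fp / (tn + fp),
        "precision": tp / (tp + fp),
    }

print(metrics(tp=5490, tn=9350, fp=126, fn=70))
```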
4 Results and Discussions The Android malware dataset was used to evaluate anomaly detection. We have used WEKA to implement and evaluate anomaly detection. Feature ranking and file conversion into the arff file format were additionally completed using the WEKA tool. In every investigation, we set K = 4, where K represents the number of base classifiers; that is, we used four base classifiers in this paper. Besides, we took N = 10 for the cross-validation and the weight assignments, respectively. The four base classifiers are Instance-Based Learner (IBK), Logistic, Rotation Forest and Sequential Minimal
Optimization (SMO). The classifiers have been evaluated in the WEKA environment using 215 attributes extracted from 15,036 applications (5560 malware applications from the Drebin project and 9476 benign applications). Garrett's Ranking Technique has been used to rank the different classifiers according to their performance. Figure 4 shows the predictive model evaluation using the knowledge flow. The arff loader was used to load the dataset. The arff loader was connected to the "ClassAssigner" component (which permits choosing which column is to be the class) from the toolbar and placed on the layout. The class value picker picks a class value to be considered as the "positive" class. Next was the "CrossValidationFoldMaker" component from the evaluation toolbar, as described in Sect. 3.2. Upon completion, the outcomes were acquired by choosing the show-results option from the pop-up menu of the TextViewer component. Tables 3 and 4 illustrate the evaluation results of the classifiers. Table 3 shows the number of "statistically significant wins" each algorithm has against all the other algorithms on the malware detection dataset used in this paper. A win implies an accuracy that is superior to the accuracy of another algorithm, where the difference was statistically significant. From the results, we can see that IBK has notable success when compared to RF, SMO and Logistic.
Fig. 4 A predictive model evaluation using knowledge flow
Table 3 Results obtained with algorithm-based ranking (1 = Highest-Rank)

Classifier        ROC area   FPR     Accuracy   Kappa    MAE      Recall   Precision   Training time (s)   Rank
IBK               0.994      0.013   98.76      0.9733   0.013    0.988    0.988       0.01                1
Rotation Forest   0.997      0.020   98.51      0.9679   0.0333   0.985    0.985       98.63               2
SMO               0.976      0.027   97.84      0.9535   0.0216   0.978    0.978       34.46               3
Logistic          0.995      0.027   97.81      0.953    0.0315   0.978    0.978       22.44               4
Table 4 Predictive model evaluation using knowledge flow (values are mean (standard deviation) accuracy)

Dataset      IBK            Rotation Forest   SMO             Logistic
Drebin-215   98.76 (0.27)   98.50 (0.32)*     97.81 (0.37)*   97.86 (0.39)*

*Significance of 0.05
Each algorithm was executed ten times. The mean and standard deviation of the accuracy are shown in Table 4. The difference between the accuracy scores of RF, SMO and Logistic and that of IBK is significant at the 0.05 level, indicating that these three techniques are statistically different from IBK. Hence, IBK leads in accuracy for malware anomaly detection on the Drebin dataset.
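The significance check can be illustrated as follows. Note that the WEKA Experimenter applies a corrected resampled t-test; the plain paired t-test and the per-run accuracies below are simplified stand-ins for illustration only.

```python
# Simplified paired t-test over made-up per-run accuracies (not the paper's raw data).
from scipy import stats

ibk      = [98.8, 98.5, 99.0, 98.7, 98.9, 98.6, 98.8, 98.7, 98.9, 98.7]
logistic = [97.9, 97.7, 98.0, 97.8, 98.1, 97.6, 97.9, 97.8, 98.0, 97.8]

t_stat, p_value = stats.ttest_rel(ibk, logistic)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```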
5 Conclusion and Future Work In this paper, we proposed a novel machine learning anomaly-based malware detection approach for an Android malware data collection, which identifies malware attacks and achieves a zero false-positive rate. We achieved an accuracy rate as high as 98.76%. If this system is to become part of a highly focused commercial product, various deterministic exception mechanisms must be included. As we see it, malware detection by means of machine learning will not substitute the standard detection strategies used by anti-virus vendors; rather, it will come as an extension to them. Any commercial anti-virus product is subject to certain speed and memory limitations. In this respect, the most reliable algorithm among those introduced here is IBK.
References 1. Y. Yerima, S. Sezer et al., Droidfusion: a novel multilevel classifier fusion approach for android malware detection. J. IEEE Trans. Cybern. 49, 453–466 (2018) 2. I. YouI, K. Yim, Malware obfuscation techniques: a brief survey. in Proceedings of the 5th International Conference on Broadband, Wireless Computing, Communication and Applications, Fukuoka, Japan, 4–6 November 3. J. Grcar, John von Neumann’s analysis of Gaussian elimination and the origins of modern numerical analysis. J. Soc. Ind. Appl. Mathe. 53, 607–682 (2011) 4. P. John, J. Mello, Report: malware poisons one-third of world’s computers. Retrieved June 6, 2019, from Tech News World. https://www.technewsworld.com/story/80707.html (2014) 5. G. Guofei, A. Porras et al., Method and Apparatus for Detecting Malware Infections (Patent Application Publication, United Sates, 2015), pp. 1–6 6. A. Shamili, C Bauckhage et al., Malware detection on mobile devices using distributed machine learning. in Proceedings of the 20th International Conference on Pattern Recognition (Istanbul, Turkey, 2010), pp. 4348–4351
7. Y. Hamed, S. AbdulKader et al., Mobile malware detection: a survey. J. Comput. Sci. Inf. Sec. 17, 1–65 (2019) 8. B. India, S. Khurana, Comparison of classification techniques for intrusion detection dataset using WEKA. in Proceedings of the International Conference on Recent Advances and Innovations in Engineering, Jaipur, India, 9–11 May 9. M. Goldstein, S. Uchida, A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. J. PLOS ONE 11, 1–31 (2016) 10. L. Ruff, R. Vandermeulen et al., Deep semi-supervised anomaly detection. ArXiv 20, 1–22 (2019) 11. T. Schlegl, P. Seeböck et al., Unsupervised Anomaly Detection with (2017) 12. Generative adversarial networks to guide marker discovery. in Proceedings of the International Conference on Information Processing in Medical Imaging, Boone, United States, 25–30 June 13. A. Patch, J. Park, An overview of anomaly detection techniques: existing solutions and latest technological trends. Int. J. Comput. Telecommun. Netw. 51, 3448–3470 (2007) 14. V. Chandola, A. Banerjee, Anomaly detection: a survey. J. ACM Comput. Surv. 50, 1557–7341 (2009) 15. R. Bouckaert, Bayesian Network Classifiers in Weka. (Working paper series. University of Waikato, Department of Computer Science. No. 14/2004). Hamilton, New Zealand: University of Waikato: https://researchcommons.waikato.ac.nz/handle/10289/85 16. R. Mehata, S. Bath et al., An analysis of hybrid layered classification algorithms for object recognition. J. Comput. Eng. 20, 57–64 (2018) 17. S. Kalmegh, Effective classification of Indian news using lazy classifier IB1 and IBk from weka. J. Inf. Comput. Sci. 6, 160–168 (2019) 18. I. Pak, P. Teh, Machine learning classifiers: evaluation of the performance in online reviews. J. Sci. Technol. 45, 1–9 (2016) 19. L. Li, D. Yang et al., A novel rule-based intrusion detection system using data mining. in Proceeding of the International Conference on Computer Science and Information Technology. Chengdu, China, 9–11 July 20. D. Fu, S. Zhou et al., The Design and implementation of a distributed network intrusion detection system based on data mining. in Proceeding of the WRI World Congress on Software Engineering, Xiamen, China 19–21 2019 May 21. W. Chai, C. Tan et al., Research of intelligent intrusion detection system based on web data mining technology. in Proceedings of the International Conference on Business Intelligence and Financial Engineering. Wuhan, China, 17–18 October 22. M. Panda, M. Patra, Evaluating machine learning algorithms for detecting network intrusions. J. Recent Trends Eng. 1, 472–477 (2009)
Future Identity Card Using Lattice-Based Cryptography and Steganography Febrian Kurniawan and Gandeva Bayu Satrya
Abstract Unauthorized or illegal access to confidential data belonging to an individual or corporation is the biggest threat in information security. Many approaches have been proposed by other researchers to prevent credential data and identity theft, i.e., cryptography, steganography, digital watermarking, and hybrid systems. In the mid-90s, Shor's algorithm was introduced for use on quantum computers. This algorithm could break the well-known public-key cryptosystems. Shor's algorithm has been cleverly used in quantum computing, a new breakthrough in computer science, to solve problems that are intractable for classical computers. However, it can be a threat to security systems or cryptosystems. This research proposes a new hybrid approach using post-quantum cryptography and advanced steganography. The Nth degree truncated polynomial ring (NTRU) is one of the candidates for post-quantum cryptography that is claimed to be hard to break even with quantum computing. Least significant bit (LSB) is a spatial steganography technique performed by replacing bits of the cover image with message bits. The result and comparison of the proposed approach with different existing cryptosystems prove that this approach is promising for implementation in identity cards, banking cards, etc. Keywords Identity theft · Post-quantum cryptography · NTRU · Steganography · LSB · Identity card
1 Introduction Looking toward future communication, business in secured 5G networks flourishes on the confidentiality, integrity, and availability of personal or group business information. The task of ensuring the safety of business information is very simple in a closed environment, but it can be complex in an open environment. This drives companies to outsource their system security and maintenance in order to reduce costs and streamline operations. On the other hand, the outsourcing method opens at least two possible security-breach perspectives on the company's system, i.e., a process perspective and a technology perspective. By outsourcing its data and system maintenance to another party, the company allows its business partner to access and process critical data such as government data, medical data, and intellectual capital. Not only must the company get its business partners to commit to formalized security measures and policies, but it must also take steps to protect itself as a precaution in case a business partner has a security breach. So as to prevent data breaches, the company should ensure its data safety by implementing advanced information security for the open environment. Based on Risk-Based Security's data breach report issued in August 2019, there were more than 3,800 publicly disclosed breaches in the second quarter of 2019, exposing 3.2 billion compromised records which were either being held for ransom or stolen. A data breach occurs when confidential or private data and other sensitive information are accessed without authorization. There are two main risks in a security system, i.e., threats and vulnerabilities. Threats can occur at any second in various ways, e.g., gaining access, denial of service, man-in-the-middle attacks, etc., while vulnerabilities can be fixed by redesigning the security system so that adversary penetration testing cannot cause breakage. These problems can be addressed by developing new cryptography, steganography or digital watermarking, or by mixing all three systems into what is called a hybrid system. Many studies have proposed methods for information security systems that combine cryptography and steganography. Bloisi and Iocchi described a method for integrating Vernam cryptography and discrete cosine transform (DCT) steganography through image processing [4]. Narayana and Prasad proposed methods for securing an image by converting it into ciphertext using the S-DES algorithm and concealing this text in another image by using the least significant bit (LSB) steganographic method [14]. Another approach was used by Joshi and Yadav, where the Vernam cipher algorithm was utilized to encrypt a text and the encrypted message was then embedded inside an image using LSB with shifting (LSB-S) steganography [12]. Moreover, Abood proposed hybrid cryptography and steganography: the cryptography used the RC4 stream cipher, and the steganography used hash-LSB (HLSB) with RGB pixel shuffling [1]. Budianto et al. applied elliptic curve cryptography (ECC) to encrypt data in an identity card and then used LSB to embed the ciphertext into an image [6]. Considering quantum computing as an ultimate threat to a secured system or cryptosystem, research in information security is moving toward post-quantum cryptography. The provable-security aspect of post-quantum cryptography means that breaking the encryption or decryption requires a solution to NP-hard problems, even for attacks from
a quantum computer. Lattice-based cryptography is the most promising candidate cryptosystem for resisting attacks from quantum computing [5]. Recent developments in lattice-based cryptography include one-way functions [2], collision-resistant hash functions [9], and public-key cryptosystems [10, 11]. Furthermore, digital steganography is another approach that can be used in information security, e.g., in the spatial domain, the transform domain, and spread spectrum. LSB steganography in the spatial domain manipulates and stores secret information in the least significant bits of a cover file. Based on previous research, a hybrid system is one of the promising candidates for addressing vulnerabilities in a universal secured information system.

The contributions of this research are as follows: (i) presenting a new system architecture using lattice-based cryptography and advanced steganography; (ii) proposing NTRU cryptography for securing the text to be embedded into an image; (iii) using a new development of LSB steganography, i.e., spiral-LSB, for embedding secret text into an image; (iv) comparing the proposed system (NTRU) with existing cryptography, e.g., AES and RSA.

As for the rest of this paper, Sect. 2 reviews recent research on advanced cryptography and steganography. Section 3 explains the proposed architecture, including the encoding and decoding procedures. The detailed results and benchmarking against existing cryptography are provided in Sect. 4. Finally, Sect. 5 gives the conclusions of this research.
2 Literature Review

2.1 Hybrid Information Security

A new method in information security was presented by Joshi and Yadav for LSB steganography in gray images combined with Vernam cryptography [12]. First, the message was encrypted using the Vernam cipher algorithm, and then the encrypted message was embedded inside an image using the new image steganography. LSB with shifting steganography was proposed by performing a circular left shift operation and an XOR operation. The number of pixels in a 256 × 256 image is 65,536, and the number of hidden bits was likewise 65,536. Even if all LSB bits were extracted by an intruder, they would not obtain the message. The experimental results showed that all PSNR values were below 70 dB.

Mittal et al. combined and implemented RSA asymmetric key cryptography and the least significant bit steganography technique [13]. The original message was encrypted using RSA, the resulting ciphertext was taken as the input data to be embedded in the cover image, and the subsequent stego-image had the ciphertext embedded in it. The analysis revealed that, for preserving the implementation
of the LSB technique and the RSA data security algorithm, the most crucial requirement was that the sizes of the original image and the stego image be equivalent; the same applied to the plaintext and ciphertext used in RSA. Histogram, time complexity, and space complexity analyses were also provided as experimental results, but the process used to obtain them was not described thoroughly.

Rachmawanto et al. proposed a steganographic mobile application using the LSB method with added AES encryption to improve security [17]. First, the text to be hidden and the AES key are entered. Second, the cover image and the LSB key are read, and the embedding is performed using the LSB algorithm. In the data preprocessing, five different sizes of cover image were tested. The resulting PSNR and histogram averages were mostly under 70 dB.

Abood introduced the RC4 stream cipher for encryption and decryption based on the image matrix, and also proposed steganography using hash-LSB (HLSB) with RGB pixel shuffling [1]. RC4 only requires byte-length manipulations, so it is suitable for embedded systems. Despite the known vulnerabilities in RC4, the combination of RC4 and RGB pixel shuffling makes the scheme difficult to break. The image encryption and decryption processes used pixel shuffling. However, the PSNR and security quality values showed that this method still needed improvement.

Alotaibi et al. designed security authentication systems on mobile devices by combining hash, cryptography, and steganography mechanisms [3]. The hash function provides a message authentication service to verify authenticity and integrity, as in MD5 and SHA-1. During signup or login, a cover image is chosen by the user as a digital signature. The user then encrypts the password using the AES algorithm with the username as key, and the result is hidden in the cover image using the LSB algorithm. The study concluded with three recommended techniques: AES with LSB, hash with LSB, and the combination of hash, AES, and LSB. According to the results, all PSNR values were less than 40 dB.

To the best of the authors' knowledge, the preliminary stage of this research was carried out by surveying eleven relevant studies from the last five years. Table 1 compares this literature thoroughly in order to find a gap that can serve as the state of the art for this research in crypto-stegano systems.
2.2 Implementation of NTRU

This research implements the NTRU cryptosystem introduced by Hoffstein, Pipher and Silverman as a public-key cryptosystem for securing messages for general purposes [11]. The security of NTRU comes from the interaction of the polynomial mixing system with the independence of reduction modulo two relatively prime integers p and q. An NTRU cryptosystem depends on three integer parameters (N, p, q) and on sets ζ_f, ζ_g, ζ_t, ζ_m of polynomials of degree N − 1 with integer coefficients [11]. Although p and q do not need to be prime, this research assumes that gcd(p, q) = 1 and that q is always considerably larger than p.
Table 1 Recent studies in cryptography and steganography

| No. | Relevant study | Methodology: Phase 1 | Methodology: Phase 2 | Remarks |
|-----|----------------|----------------------|----------------------|---------|
| 1 | Joshi and Yadav [12] | First encrypted using Vernam cipher algorithm | LSB with shifting (LSB-S) | PSNR and histogram with different message sizes |
| 2 | Reddy and Kumar [18] | Text is encrypted using AES | LSB with LL sub-band of the wavelet-decomposed image | No evaluation of the stego image |
| 3 | Bukhari et al. [7] | LSB steganography | Double random phase encoding (DRPE) cryptography | PSNR and entropy with noise types (Gaussian, salt & pepper and speckle) |
| 4 | Mittal et al. [13] | RSA for the message | LSB for the images | Time complexity, space complexity, histogram |
| 5 | Patel and Meena [15] | Pseudo random number (PRN) cryptography | Dynamic key rotation which provides double-layer security | PSNR |
| 6 | Phadte and Dhanaraj [16] | Randomized LSB steganography | Encrypted using chaotic theory | Histogram and key sensitivity analysis |
| 7 | Rachmawanto et al. [17] | AES-128 bit for the text | LSB for the images | PSNR and histogram with different image sizes |
| 8 | Chauhan et al. [8] | Variable block size cryptography | LSB steganography | PSNR and entropy |
| 9 | Abood [1] | RC4 cryptography for image | Hash-LSB steganography | PSNR, histogram, security quality, elapsed time for secret and cover images |
| 10 | Saxena et al. [19] | Proposed encryption architecture using EI (secret image), k, and CI (cover image) | LSB | PSNR and entropy |
| 11 | Budianto et al. [6] | ECC for data information | LSB for person picture | PSNR |
This research operates in the ring R = Z[X]/(X^N − 1). An element F ∈ R is written as a polynomial or a vector. To create an NTRU key, the cryptosystem randomly selects two polynomials f and g (from ζ_f and ζ_g, respectively). The polynomial f must fulfill the additional requirement that it has inverses modulo q and modulo p; these inverses are denoted F_q and F_p, as in Eq. (1):

F_q · f ≡ 1 (mod q)  and  F_p · f ≡ 1 (mod p)    (1)

The public key is then computed with Eq. (2). The user's public key is the polynomial μ and the user's private key is the polynomial f:

μ ≡ F_q · g (mod q)    (2)

2.2.1 Encrypting the Message
Presume that Alice wants to send a message to Bob. Alice starts by selecting a message m from the set of plaintexts ζ_m. Next, Alice randomly chooses a polynomial t ∈ ζ_t and uses Bob's public key μ to compute the encrypted message ξ as in Eq. (3):

ξ ≡ p t · μ + m (mod q)    (3)

2.2.2 Decrypting the Message
When Bob has received the message ξ from Alice, he decrypts it using his private key f. To do this efficiently, Bob should precompute the polynomial F_p. To decrypt ξ, Bob first computes Eq. (4) and chooses the coefficients of ψ in the interval from −q/2 to q/2. Treating ψ as a polynomial with integer coefficients, Bob then recovers the message by computing Eq. (5):

ψ ≡ f · ξ (mod q)    (4)

m ≡ F_p · ψ (mod p)    (5)
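To make the ring arithmetic behind Eqs. (1)–(5) concrete, the following is a minimal, self-checking Python sketch. It is not a faithful NTRU implementation: to keep the inverses F_q and F_p available in closed form, the private polynomial f is deliberately chosen as a monomial (x^2, whose inverse in Z[X]/(X^N − 1) is x^{N−2}), whereas real NTRU samples a small random f and computes its inverses with the extended Euclidean algorithm; all parameter values here are toy assumptions.

```python
import numpy as np

def cyc_mul(a, b, N, q=None):
    """Multiply two polynomials in Z[X]/(X^N - 1); reduce coefficients mod q if given."""
    c = np.zeros(N, dtype=np.int64)
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] += a[i] * b[j]
    return c % q if q is not None else c

def center(a, q):
    """Lift coefficients mod q into the interval (-q/2, q/2], as in the decryption step."""
    a = a % q
    return np.where(a > q // 2, a - q, a)

N, p, q = 11, 3, 2048
rng = np.random.default_rng(0)

# Contrived key: f = x^2, so Fq = Fp = x^(N-2), because x^2 * x^(N-2) = x^N = 1 in the ring.
f = np.zeros(N, dtype=np.int64); f[2] = 1
Fq = np.zeros(N, dtype=np.int64); Fq[N - 2] = 1
Fp = Fq.copy()
g = rng.integers(-1, 2, N)                    # small random g

mu = cyc_mul(Fq, g, N, q)                     # Eq. (2): public key mu = Fq * g mod q

m = rng.integers(0, p, N)                     # message with coefficients mod p
t = rng.integers(-1, 2, N)                    # blinding polynomial t
xi = (p * cyc_mul(t, mu, N) + m) % q          # Eq. (3): xi = p*t*mu + m mod q

psi = center(cyc_mul(f, xi, N, q), q)         # Eq. (4): psi = f*xi mod q, coefficients centered
m_rec = cyc_mul(Fp, psi, N) % p               # Eq. (5): recover m = Fp*psi mod p
assert np.array_equal(m_rec, m % p)           # decryption succeeds for these toy parameters
```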
3 Proposed System

3.1 System Architecture

The embedding system begins with message encryption, producing the cipher, and with preparation of the cover image as the carrier, as can be seen in Fig. 1. To manipulate the last bits of the designated pixels using spiral-LSB, both the cipher and the cover image need to be converted to bits; the converted bits of the image are the values of the pixels' R, G, B channels. After the bit conversion, the cipher length is calculated, and then the cipher bits and the image bits are generated. To encode in a spiral pattern, the first step is to generate the list of pixel locations (x, y), with the center as the starting point. The cipher length is encoded into the carrier in the first n bytes (customizable) as an identifier of the amount of data embedded in the carrier, so the decoder knows where to stop decoding. Then, every bit of the cipher is placed into the last bit of each R, G, B value of the pixel. When the last bits of all pixels are filled, the encoder proceeds to the previous bit slot and repeats the process for the rest of the cipher bits. The spiral-LSB encoding is expected to be an improvement over conventional LSB.
Fig. 1 Proposed crypto-stegano architecture (the message is encrypted into a cipher and bit-converted to cipher bits; the cover image is bit-converted to image bits; the spiral-LSB encoder combines the cipher length, cipher bits, and image bits into the stegano image)

3.2 Encoding Procedure

Algorithm 1 is conducted with a spiral pattern to prevent an adversary or infiltrator from extracting the data embedded in the carrier. The extraction process is more complex than for the conventional LSB pattern, which uses the edge as the starting point; this improves the security of the embedded data because the pattern is far less predictable. Moreover, the spiral-LSB encoding uses the cipher length as the identifier of the amount of data hidden in the carrier. In order to get the data, the decoder must know this length, which is encoded in the first n bytes
(this experiment used the first 2 bytes). The iteration then decodes the embedded data according to how much data is hidden inside.

Algorithm 1 Secured Spiral Encoding
1: INITIALISE Message                                        ▷ determining the plain-text
2: Pixels ← LOAD image data                                  ▷ determining the cover image
3: Cipher ← CALL cryptosystem                                ▷ encryption with AES, RSA or NTRU
4: Length ← length of Cipher
5: Spiral ← GENERATE centered spiral pixel locations
6: PUT Length into the first 2 bytes
7: for x in Length * 8 do
8:    PUT Cipher[x] into the last bit of the R/G/B channel of Spiral[1] of Pixels
9:    NEXT channel
10:   if Spiral[1] channels filled then
11:      POP Spiral
12:   end if
13: end for
14: SAVE encoded Pixels as an image                          ▷ stego image
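A minimal Python sketch of Algorithm 1 is given below, assuming Pillow (PIL) for image access. It is a simplified illustration: it only writes the last bit of each R, G, B channel along the spiral and stops with an error when capacity runs out, whereas the procedure above additionally falls back to the previous bit slot once the last bits of all pixels are filled; the 2-byte length header matches the experiment described above.

```python
from PIL import Image

def spiral_order(w, h):
    """Yield (x, y) pixel coordinates spiralling outward from the image centre."""
    x, y = w // 2, h // 2
    yield x, y
    step, dx, dy = 1, 1, 0
    while True:
        for _ in range(2):
            for _ in range(step):
                x, y = x + dx, y + dy
                if 0 <= x < w and 0 <= y < h:
                    yield x, y
            dx, dy = -dy, dx              # rotate the walking direction by 90 degrees
        step += 1
        if step > w + h:                  # the spiral has left the image entirely
            return

def embed(cover_path, cipher: bytes, stego_path):
    """Spiral-LSB embedding: 2-byte length header followed by the cipher bits."""
    img = Image.open(cover_path).convert("RGB")
    payload = len(cipher).to_bytes(2, "big") + cipher
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    coords = spiral_order(*img.size)
    px = img.load()
    i = 0
    while i < len(bits):
        x, y = next(coords)               # raises StopIteration if the cover image is too small
        channels = list(px[x, y])
        for c in range(3):                # R, G, B
            if i < len(bits):
                channels[c] = (channels[c] & ~1) | bits[i]
                i += 1
        px[x, y] = tuple(channels)
    img.save(stego_path)
```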
3.3 Decoding Procedure

The decoder needs to know the cipher length to get the data inside the carrier; this security addition becomes one of the keys to extracting the data. For example, if the encoder sets the cipher length identifier to 2 bytes, then the decoder needs to read the first 2 bytes, starting from the first (center) pixel, to obtain the length of the embedded data. Without this parameter, the decoder does not know where to start and where to stop the data extraction. The decoder also needs to know the spiral pattern to obtain the pixel location sequence of the hidden data. The generated spiral pixel locations are used to extract the data, iterating according to the extracted cipher length as described in Algorithm 2.

Algorithm 2 Secured Spiral Decoding
1: Pixels ← LOAD stego image                                 ▷ the output from Algorithm 1
2: Spiral ← GENERATE centered spiral pixel locations
3: Length ← READ Length from the first 2 bytes
4: Cipher ← empty                                            ▷ to assign the variable of ciphers
5: for x = 1 to Length * 8 do
6:    Cipher ← Cipher + last bit of R/G/B channel of Spiral[1] of Pixels
7:    NEXT channel
8:    if Spiral[1] channels extracted then
9:       POP Spiral
10:   end if
11: end for
12: CALL cryptosystem                                        ▷ decryption with AES, RSA or NTRU
13: OUTPUT Message                                           ▷ extracting the plain-text
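The matching decoder, reusing spiral_order and the Pillow import from the embedding sketch above, is shown below; again this is a simplified sketch of Algorithm 2 under the same assumptions.

```python
def extract(stego_path) -> bytes:
    """Spiral-LSB extraction: read the 2-byte length header, then that many cipher bytes."""
    img = Image.open(stego_path).convert("RGB")
    px = img.load()

    def lsb_bits():
        for x, y in spiral_order(*img.size):
            for channel in px[x, y][:3]:          # R, G, B last bits, in embedding order
                yield channel & 1

    def take_bytes(bit_iter, n):
        out = bytearray()
        for _ in range(n):
            val = 0
            for _ in range(8):                    # MSB-first, matching the embedder
                val = (val << 1) | next(bit_iter)
            out.append(val)
        return bytes(out)

    bits = lsb_bits()
    length = int.from_bytes(take_bytes(bits, 2), "big")
    return take_bytes(bits, length)               # the recovered cipher, ready for decryption
```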
4 Result and Analysis

The results of the NTRU implementation can be seen in Table 2, compared against AES and RSA. The representative images were the well-known image processing test images Lena, Baboon, and Pepper, each of size 512 × 512 in .png format. The parameters used in this research were encoding time, decoding time, histogram, and PSNR (w.r.t. MSE). The test was carried out using 32 bytes of plaintext and various key lengths for each cryptosystem. This research provides two NTRU recommendations with different keys. NTRU gave a fairly constant result even when the key length was increased. As can be seen, increasing the key length in RSA caused a considerable increase in the time consumed by the encode and decode processes. In addition, increasing the key length of NTRU also resulted in a fairly constant PSNR.

The quality of the stegano images after embedding with encryption is verified by the histograms in Fig. 2. The results successfully demonstrate the differences in histogram between the original image and the images embedded with AES and RSA, and show that NTRU produces more stable stegano image quality, as seen from the difference between NTRU-439 and NTRU-743. Abiding by the proceedings rules on the number of pages, the detailed histogram results are not shown for every image; instead, the Lena image is used as a representative of the results.
Table 2 Validating sample images with spiral-LSB on each cryptography system

| Image  | Parameter             | AES-128 | RSA-3072 | RSA-7680 | NTRU-439 | NTRU-743 |
|--------|-----------------------|---------|----------|----------|----------|----------|
| Lena   | Encode time (ms)      | 260.625 | 445.047  | 714.479  | 408.889  | 453.681  |
|        | Decode time (ms)      | 182.577 | 439.030  | 685.992  | 386.814  | 434.599  |
|        | PSNR (dB)             | 82.971  | 72.237   | 68.215   | 73.989   | 72.557   |
| Baboon | Encode time (ms)      | 283.994 | 459.688  | 726.358  | 389.848  | 425.883  |
|        | Decode time (ms)      | 284.549 | 441.952  | 682.593  | 393.097  | 436.533  |
|        | PSNR (dB)             | 82.726  | 72.281   | 68.214   | 73.882   | 72.565   |
| Pepper | Encode time (ms)      | 356.599 | 501.394  | 770.478  | 431.288  | 476.118  |
|        | Decode time (ms)      | 273.602 | 439.756  | 681.508  | 294.788  | 329.983  |
|        | PSNR (dB)             | 83.108  | 72.153   | 68.252   | 73.953   | 72.543   |
|        | Length plain (bytes)  | 32      | 32       | 32       | 32       | 32       |
|        | Length cipher (bytes) | 64      | 768      | 1920     | 520      | 708      |
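For reference, the PSNR (w.r.t. MSE) values reported in Table 2 can be reproduced for any cover/stego pair with a few lines of Python; this sketch assumes 8-bit images already loaded as NumPy arrays of identical shape.

```python
import numpy as np

def psnr(cover: np.ndarray, stego: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB with respect to MSE, assuming 8-bit images of identical shape."""
    mse = np.mean((cover.astype(np.float64) - stego.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)
```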
Fig. 2 Comparison between the original, AES, RSA, and NTRU histograms for the Lena image: (a) histogram analysis of the original Lena; (b) Lena with AES; (c) Lena with RSA; (d) Lena with NTRU
5 Conclusion

This research proposed a new crypto-stegano system architecture to improve information security. The overall experiments showed that combining NTRU cryptography and spiral-LSB steganography outperforms conventional encryption in some aspects, including time performance. This result indicates that NTRU lattice-based cryptography is a candidate for implementation in future identity cards, banking cards, etc. Some modifications are still needed to attain optimal security in such implementations. It should also be noted that the experimental results may vary with hardware performance, and further improvements on the hardware side are still needed.
References 1. M.H. Abood, An efficient image cryptography using hash-lsb steganography with rc4 and pixel shuffling encryption algorithms, in 2017 Annual Conference on New Trends in Information Communications Technology Applications (NTICT) (March 2017), pp. 86–90. https://doi.org/ 10.1109/NTICT.2017.7976154 2. M. Ajtai, Generating hard instances of lattice problems (extended abstract), in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’96 (ACM, New York, NY, USA, 1996), pp. 99–108. https://doi.org/10.1145/237814.237838 3. M. Alotaibi, D. Al-hendi, B. Alroithy, M. AlGhamdi, A. Gutub, Secure mobile computing authentication utilizing hash, cryptography and steganography combination. J. Inf. Security Cybercrimes Res. (JISCR) 2(1) (2019). https://doi.org/10.26735/16587790.2019.001 4. D.D. Bloisi, L. Iocchi, Image based steganography and cryptography, in VISAPP, vol. 1 (Citeseer, 2007), pp. 127–134 5. Z. Brakerski, V. Vaikuntanathan, Lattice-based fhe as secure as pke, in Proceedings of the 5th Conference on Innovations in Theoretical Computer Science, ITCS ’14 (ACM, New York, NY, USA, 2014), pp. 1–12. https://doi.org/10.1145/2554797.2554799 6. C.D. Budianto, A. Wicaksana, S. Hansun, Elliptic curve cryptography and lsb steganography for securing identity data, in R. Lee (ed.), Applied Computing and Information Technology (Springer International Publishing, Cham, 2019), pp. 111–127. https://doi.org/10.1007/9783-030-25217-5_9 7. S. Bukhari, M.S. Arif, M.R. Anjum, S. Dilbar, Enhancing security of images by steganography and cryptography techniques, in 2016 Sixth International Conference on Innovative Computing Technology (INTECH) (Aug 2016), pp. 531–534. https://doi.org/10.1109/INTECH.2016. 7845050 8. S. Chauhan, K.J. Jyotsna, A. Doegar, Multiple layer text security using variable block size cryptography and image steganography, in 2017 3rd International Conference on Computational Intelligence Communication Technology (CICT) (Feb 2017), pp. 1–7. https://doi.org/10. 1109/CIACT.2017.7977303 9. O. Goldreich, S. Goldwasser, S. Halevi, Collision-free hashing from lattice problems. IACR Cryptol. ePrint Archive 9 (1996) 10. O. Goldreich, S. Goldwasser, S. Halevi, Public-key cryptosystems from lattice reduction problems, in B.S. Kaliski (ed.), Advances in Cryptology—CRYPTO ’97 (Springer, Berlin, Heidelberg, 1997), pp. 112–131. https://doi.org/10.1007/BFb0052231 11. J. Hoffstein, J. Pipher, J.H. Silverman, Ntru: a ring-based public key cryptosystem, in J.P. Buhler (ed.), Algorithmic Number Theory (Springer, Berlin, Heidelberg, 1998), pp. 267–288. https://doi.org/10.1007/BFb0054868 12. K. Joshi, R. Yadav, A new lsb-s image steganography method blend with cryptography for secret communication, in 2015 Third International Conference on Image Information Processing (ICIIP) (Dec 2015), pp. 86–90. https://doi.org/10.1109/ICIIP.2015.7414745 13. S. Mittal, S. Arora, R. Jain, Pdata security using rsa encryption combined with image steganography, in 2016 1st India International Conference on Information Processing (IICIP) (Aug 2016), pp. 1–5. https://doi.org/10.1109/IICIP.2016.7975347 14. S. Narayana, G. Prasad, Two new approaches for secured image steganography using cryptographic techniques and type conversions. Signal Image Process. Int. J. (SIPIJ) 1(2) (2010). https://doi.org/10.5121/sipij.2010.1206 15. N. Patel, S. 
Meena, Lsb based image steganography using dynamic key cryptography, in 2016 International Conference on Emerging Trends in Communication Technologies (ETCT) (Nov 2016), pp. 1–5. https://doi.org/10.1109/ETCT.2016.7882955 16. R.S. Phadte, R. Dhanaraj, Enhanced blend of image steganography and cryptography, in 2017 International Conference on Computing Methodologies and Communication (ICCMC) (July 2017), pp. 230–235. https://doi.org/10.1109/ICCMC.2017.8282682
17. E.H. Rachmawanto, R.S. Amin, D.R.I.M. Setiadi, C.A. Sari, A performance analysis stegocrypt algorithm based on lsb-aes 128 bit in various image size, in 2017 International Seminar on Application for Technology of Information and Communication (iSemantic) (Oct 2017), pp. 16–21. https://doi.org/10.1109/ISEMANTIC.2017.8251836 18. M.I.S. Reddy, A.S. Kumar, Secured data transmission using wavelet based steganography and cryptography by using aes algorithm. Proc. Comput. Sci. 85, 62–69 (2016). https://doi.org/10. 1016/j.procs.2016.05.177; International Conference on Computational Modelling and Security (CMS 2016) 19. A.K. Saxena, S. Sinha, P. Shukla, Design and development of image security technique by using cryptography and steganography: a combine approach. Int. J. Image Graph. Signal Process. 10(4), 13 (2018). https://doi.org/10.5815/ijigsp.2018.04.02
Cryptanalysis on Attribute-Based Encryption from Ring-Learning with Error (R-LWE) Tan Soo Fun and Azman Samsudin
Abstract The encouraging outcomes on the hardness of lattice-based problems in resisting recent quantum attacks have aroused recent work on attribute-based encryption (ABE) schemes. In October 2014, Zhu et al. presented an increasingly proficient ABE scheme based on the hardness of ring-learning with errors (R-LWE), referred to as the ABE_R-LWE scheme. The ABE_R-LWE scheme is assured under the selective-set model, and its hardness can be reduced to the shortest vector problem (SVP) in the worst case of the ideal lattice problem. However, there is a noteworthy defect in the structure of the ABE_R-LWE scheme: the key generation algorithm that serves as its core component is defenseless against collusion attacks. The contributions of this paper are as follows: (i) it discusses collusion attacks under the hardness of the R-LWE problem and demonstrates how illegitimate users can collude by pooling their insufficient attributes together to recover a user's private key and an encrypted message; (ii) it suggests several alternatives to prevent such attacks in the ring-LWE lattice setting.

Keywords Attribute-based encryption · Lattice · Ring-learning with error (R-LWE) · Collusion attacks
1 Introduction

Attribute-based encryption (ABE) has recently been hailed as a holy grail of modern cryptography, promising an efficient tool to defeat the long-standing performance bottlenecks of public key infrastructure (PKI). ABE can enforce fine-grained access control on ciphertext by treating an attribute set as a public key and associating it
T. S. Fun (B) Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu 88400, Malaysia e-mail: [email protected] A. Samsudin School of Computer Sciences, Universiti Sains Malaysia, Penang 11800, Malaysia © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_5
either with the encrypted data (the so-called ciphertext-policy attribute-based encryption scheme, CP-ABE) or with the user's private key (the so-called key-policy attribute-based encryption scheme, KP-ABE). In general, ABE designs fall into two primary streams: bilinear pairings on elliptic curves and lattice structures. The majority of former research aimed to design ABE schemes based on bilinear pairings on elliptic curves, either KP-ABE [1–4] or CP-ABE [5–11]. Recent researchers have proposed ABE schemes from the post-quantum cryptology perspective, based on the lattice approach [12–14]. Lattice problems are generally conjectured to withstand quantum attacks, and their provable security from worst-case hardness under a quantum reduction [15–17] forms a solid foundation for designing a secure ABE scheme.

While the vast majority of post-quantum ABE schemes are designed around the hardness of the learning with errors (LWE) problem [15–17], Zhu et al. [14] extended the pairing-based fuzzy identity-based encryption (FIBE) [1] into lattice cryptography with the aim of enhancing algorithm efficiency, naming the result the ABE_R-LWE scheme. Compared to former lattice-based ABE schemes constructed on general lattices, which still suffer from the quadratic overhead issue of the LWE problem, the ABE_R-LWE scheme is designed over ideal lattices, as originally proposed by Lyubashevsky et al. [18]. An ideal lattice is a special class of lattices generalizing cyclic lattices, which benefits performance efficiency compared to general lattices [19, 20]. The hardness of the ABE_R-LWE scheme originates from the R-LWE assumption, which has been proven capable of achieving the lattice worst-case scenario of the shortest vector problem (SVP) [18, 21]. However, the ABE_R-LWE scheme inherited a design flaw from FIBE, which has been proven to be IND-FID-CCA insecure [12]. This chapter demonstrates that the ABE_R-LWE scheme is inadequately protected against collusion attacks, showing that multiple unauthorized users can affiliate and aggregate their private keys to recover the original message.

The rest of this paper is structured as follows. Section 2 defines the lattice foundations and the collusion attack. Section 3 reviews the algorithms of the ABE_R-LWE scheme and performs cryptanalysis on it. Lastly, Sect. 4 suggests alternatives for improving the ABE_R-LWE scheme.
2 Preliminaries

This section presents the foundations of collusion attacks that are needed to demonstrate a cryptanalysis attack on the ABE_R-LWE scheme. The definitions of lattice, ideal lattice, the decision R-LWE_{d,q,χ} problem [16, 18], and the reduction of the R-LWE problem [16] to the worst case of SVP in ideal lattices over the ring R can be found in previous research [22, 23].
Definition 1 (Threshold (t, n) Attribute-Based Encryption, Threshold ABE [1, 19]): A threshold (t, n) ABE scheme is a fourfold of probabilistic polynomial time (PPT) algorithms consisting of Setup, KeyGen, Encrypt, Decrypt such that:
Setup(1^λ, U). With security parameter λ and universe attribute set U, the function generates the ABE's master key MK and public key PK.
KeyGen(MK, PK, W, t). Given MK, PK, a set of attributes W and a threshold value t, the function outputs a private key D_W for W.
Encrypt(PK, W′, t, M). Given PK, an attribute set W′, the threshold value t and a message M, the function produces the ciphertext C.
Decrypt(PK, C, W′, t, D_W). Given the public key PK, the ciphertext C for W′, the threshold value t, and the private key D_W for W, the function produces the message M if |W ∩ W′| ≥ t.
In a basic threshold (t, n) ABE scheme, a message is encoded with the n attributes of W′, such that a user can recover the original message correctly with his private key if it contains at least a threshold t of attributes in common with the attribute set W′ embedded in the ciphertext. Similar to other ABE schemes, the user's private key in the ABE_R-LWE scheme is generated with Shamir's (t, n) threshold secret sharing scheme [25], described as follows.

Definition 2 (Shamir's Threshold (t, n) Secret Sharing Scheme [19, 24], Shamir's Threshold (t, n) SSS): In the ABE_R-LWE scheme, Shamir's threshold (t, n) SSS is constructed with Lagrange polynomial interpolation over the ring R_q. The Lagrange coefficient L_{i,S} for i ∈ Z_q and a set S of elements in Z_q is formulated as L_{i,S}(y) = ∏_{j∈S, j≠i} (y − j)/(i − j).
Setup. Let P = {P_1, P_2, ..., P_n} be the participant set. Define the threshold value t and select the secret value SK ∈ R_q to be shared. Next, the dealer selects a polynomial of degree t − 1, d(y) = SK + Σ_{j=1}^{t−1} d_j y^j, where d_j ∈ R_q and d_j ≠ 0. The dealer further selects n distinct elements y_i of Z_q*, calculates the secret shares s_i = d(y_i) ∈ R_q and lastly sends a pair (y_i, s_i) to each participant P_i.
Pooling of Shares. Any t participants can pool their secret shares with the Lagrange interpolation formula d(y) = Σ_{i=1}^{t} s_i ∏_{j=1, j≠i}^{t} (y − y_j)/(y_i − y_j). Let Y = {y_i}_{i=1}^{t}; then the secret SK can be obtained as d(0) = Σ_{i=1}^{t} s_i ∏_{j=1, j≠i}^{t} (−y_j)/(y_i − y_j) = Σ_{i∈Y} s_i L_{i,Y}(0) ∈ R_q.

Definition 3 (Selective-Set Model [14]): The security of an ABE scheme is assured if every probabilistic polynomial time (PPT) adversary has at most a negligible advantage in the following selective-set game.
Init. The adversary A defines the attribute set W* that he wishes to be challenged upon. Setup. The challenger executes the algorithm Setup and subsequently delivers the public key PK to A. Phase 1. A issues adaptive queries q_1, q_2, ..., q_m for private keys D_{γj} for attributes γ_j of his choice, where the chosen γ_j fulfill the condition |γ_j ∩ W*| < t for all γ_j, j ∈ {1, 2, ..., m}. Challenge. A indicates its readiness to receive a challenge and proposes a message to encrypt. The challenger encrypts the message with the set of challenged attributes W*. The challenger flips a random binary coin r. If r = 1, the encrypted message is delivered to A; otherwise, the challenger generates and returns a random element of the ciphertext space. Phase 2. Rerun Phase 1. Guess. A returns a guess r′ of r. The adversary A's advantage is formulated as adv(A) = |Pr[r′ = r] − 1/2|.
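The sketch below illustrates Definition 2 (Shamir's threshold (t, n) SSS and its Lagrange reconstruction). For readability it works over a prime field Z_Q rather than the ring R_q used in the scheme, so it is an illustrative simplification only; the modulus Q and all values are assumptions.

```python
import random

Q = 2**31 - 1                         # a prime modulus standing in for the ring R_q

def make_shares(secret, t, n):
    """Shamir (t, n) sharing: d(y) = secret + d_1*y + ... + d_{t-1}*y^(t-1) mod Q."""
    coeffs = [secret] + [random.randrange(1, Q) for _ in range(t - 1)]
    return [(y, sum(c * pow(y, j, Q) for j, c in enumerate(coeffs)) % Q)
            for y in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at 0: sum_i s_i * prod_{j!=i} (-y_j)/(y_i - y_j) mod Q."""
    total = 0
    for yi, si in shares:
        num, den = 1, 1
        for yj, _ in shares:
            if yj != yi:
                num = num * (-yj) % Q
                den = den * (yi - yj) % Q
        total = (total + si * num * pow(den, Q - 2, Q)) % Q   # Fermat inverse (Q prime)
    return total

secret = 123456789
shares = make_shares(secret, t=4, n=10)
assert reconstruct(random.sample(shares, 4)) == secret        # any t shares recover the secret
```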
Next, we further define a collusion attack in the selective-set model as follows. Definition 4 (Collusion Attack [21]): A collusion attack on the ABE_R-LWE scheme is formulated as a game between a challenger and an adversary A as follows. Init. The adversary A defines the attribute set W* that he wishes to be challenged upon. Setup. The challenger executes the algorithm Setup and delivers the public key PK to A. Phase 1. A issues adaptive queries q_1, q_2, ..., q_m for private keys D_{γj} for his chosen attributes γ_j, where the chosen γ_j fulfill the condition |γ_j ∩ W*| < t for all γ_j, j ∈ {1, 2, ..., m}. Challenge. A produces a secret key D_{W*} by combining the D_{γj}. Given the public key PK, the ciphertext C for the attribute set W′ and the threshold value t, A decrypts a message M by choosing an arbitrary t-element subset of D_{W*} such that |W* ∩ W′| ≥ t and outputs a valid message M. The adversary A has then conducted a successful collusion attack on the scheme.
3 Cryptanalysis of the ABE_R-LWE Scheme

3.1 Review of the ABE_R-LWE Scheme

The ABE_R-LWE scheme extends the FIBE scheme into a lattice cryptosystem structured around the hardness of the R-LWE problem. In the ABE_R-LWE scheme, both the user's private key and the encrypted message are attached to attribute sets, and decryption works correctly if the attribute sets attached to the encrypted message and to the user's private key overlap in at least the threshold value t. As noted, the access structure of the ABE_R-LWE scheme is fairly straightforward compared to other pairing-based ABE schemes, which are able to handle non-monotonic and monotonic access structures including OR and AND gates. The ABE_R-LWE scheme consists of four algorithms as follows:

Setup(1^n, U). With security parameter n = 2^λ (λ ∈ Z+) and a universe set of attributes U of size u, choose an adequately large prime modulus q ≡ 1 mod 2n and a small non-negative integer p (typically p = 2 or p = 3). Set f(x) = x^d + 1 ∈ Z[x] and R_q = Z_q[x]/<f(x)>. With χ ⊂ R_q an error distribution, choose a uniformly random a ← R_q and SK ← R_q. Select a random error term e ← χ and define PK_0 = (a · SK + p·e) ∈ R_q. Next, select a uniformly random SK_i ∈ R_q^× together with SK_i^{−1} for each attribute i ∈ U, where SK_i^{−1} is the inverse of SK_i ∈ R_q^×. Choose a random error term e_i ← χ for each i ∈ U and compute PK_i = (a · SK_i^{−1} + p·e_i) ∈ R_q for each attribute i ∈ U. Lastly, output:
Public key: PK = (PK_0, {PK_i}_{i=1}^{u})
Master key: MK = (a, SK, {SK_i}_{i=1}^{u}, {SK_i^{−1}}_{i=1}^{u})

KeyGen(MK, PK, W, t). Given MK, PK, W ⊆ U of size w and t ≤ u, choose a polynomial d(y) = SK + Σ_{j=1}^{t−1} d_j y^j of degree t − 1 such that d(0) = SK, where each d_j ← R_q is a random element of R_q.
Set D_i = (d(i) · SK_i) ∈ R_q for all i ∈ W. Generate the private key D_W = {D_i}_{i∈W} for W.

Encrypt(PK, W′, t, M). Given PK, t, W′ ⊆ U of size w′ such that |w′| ≥ t and M ∈ {0, 1}^n ⊂ R_q, generate a uniformly random r ∈ R_q and error terms e_0 and e_i for each i ∈ W′ from χ ⊂ R_q. Output the ciphertext C = (C_0, {C_i}_{i∈W′}) where:
C_0 = (PK_0 · r + p·e_0 + M) ∈ R_q
C_i = (PK_i · r + p·e_i) ∈ R_q for all i ∈ W′

Decrypt(PK, C, W′, t, D_W). Given PK, the ciphertext C for W′, t, and the private key D_W for W: if |W ∩ W′| < t, output ⊥; otherwise choose an arbitrary t-element subset of W ∩ W′ and calculate M′ = (C_0 − Σ_{i=1}^{t} C_i · D_i · L_{i,W_D}(0)) ∈ R_q, where L_{i,W_D}(0) is the Lagrange coefficient; then output the plaintext M = M′ mod p.
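All operations above take place in the ring R_q = Z_q[x]/&lt;f(x)&gt;. The short sketch below shows the corresponding polynomial multiplication (written here for a quotient of the form x^n + 1, i.e., negacyclic convolution) so that products such as PK_i · r or C_i · D_i can be seen concretely; the arrays are placeholder values, not actual scheme keys, and n, q, p are toy assumptions.

```python
import numpy as np

def ring_mul(a, b, n, q):
    """Multiply two elements of Z_q[x]/(x^n + 1): negacyclic convolution, using x^n = -1."""
    c = np.zeros(n, dtype=np.int64)
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return c % q

# Illustrative only: forming C_i = PK_i * r + p * e_i with placeholder ring elements.
n, q, p = 8, 12289, 2
rng = np.random.default_rng(1)
PK_i = rng.integers(0, q, n)          # placeholder public-key component
r = rng.integers(-1, 2, n)            # small random element
e_i = rng.integers(-1, 2, n)          # small error term
C_i = (ring_mul(PK_i, r, n, q) + p * e_i) % q
```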
3.2 Collusion Attack on the ABE_R-LWE Scheme

Collusion resistance is a linchpin security requirement in any ABE design. Generally, collusion resistance means that multiple illegitimate users, whose associated attributes are individually insufficient to satisfy the access control policy, cannot form an alliance, aggregate their attributes to form a valid private key, and decrypt the message correctly. Zhu et al. [14] claimed that the ABE_R-LWE scheme is secure under the hardness of the R-LWE assumption in the selective-set model. However, we found that the ABE_R-LWE scheme cannot resist a collusion attack under the selective-set model. Firstly, encrypting each attribute separately as C_i = (PK_i · r + p·e_i) ∈ R_q and straightforwardly embedding each of these encrypted attributes as a separate component of the ciphertext leaves them vulnerable and exposes to others valuable information such as the total number of attributes included and the attribute list needed to recover the message. Secondly, the user's secret key D_W = {D_i}_{i∈W} is generated directly by combining the vectors associated with each attribute as D_i = (d(i) · SK_i) ∈ R_q for all i ∈ W. Such a construction leads the ABE_R-LWE scheme to fall prey easily to simple collusion attacks. Without a randomization technique, any unauthorized user can regenerate a private key by aggregating components of other users' private keys; subsequently, based on the information obtained from the ciphertext C_i and the public key PK, a t-element subset of W ∩ W′ can be selected at random and a message M recovered correctly. It should be noted that, similarly to the FIBE scheme, the ABE_R-LWE scheme does inherit a randomization technique from Shamir's secret sharing scheme by incorporating independently chosen secret shares into each d(i). However, this does not ensure that the ABE_R-LWE scheme is collusion resistant, because of the problems mentioned above. In the following section, we show that there exists a polynomial time adversary A who can perform a collusion attack against the ABE_R-LWE scheme.

Init. The adversary A defines a set of attributes W* ⊆ U of size w* with |w*| ≥ t that he wishes to be challenged upon.
For a simple example, let U = {att1, att2, ..., attu} be a universe attribute set of size u = 50 and threshold value t = 4. A defines the attribute set W* = {att1, att2, ..., attw*} of size w* = 10.

Setup. A receives the public key PK = (PK_0, {PK_i}_{i=1}^{u}) from the challenger, which runs the Setup algorithm of the ABE_R-LWE scheme.

Phase 1. A selects his choice of attributes γ_j where |γ_j ∩ W*| < t. Then, A sends adaptive queries q_1, q_2, ..., q_m in order to obtain the private keys D_{γj} based on his choice of attributes γ_j for all j ∈ {1, 2, ..., m}. For example, A sends adaptive queries based on his declaration of W* = {att1, att2, ..., attw*} of size w* = 10 during the Init phase, as follows:
• For query q_1, A selects the attributes γ_1 = {att1, att3, att4, att12, att23} and receives the private key D_{γ1} = {D_i}_{i∈γ1} = {D_att1, D_att3, D_att4, D_att12, D_att23}.
• For query q_2, A selects the attributes γ_2 = {att2, att5, att7, att28, att34, att49} and receives the private key D_{γ2} = {D_i}_{i∈γ2} = {D_att2, D_att5, D_att7, D_att28, D_att34, D_att49}.
• For query q_3, A selects the attributes γ_3 = {att6, att8, att24, att33, att47} and receives the private key D_{γ3} = {D_i}_{i∈γ3} = {D_att6, D_att8, D_att24, D_att33, D_att47}.
• For query q_4, A selects the attributes γ_4 = {att9, att10, att18, att21, att29} and receives the private key D_{γ4} = {D_i}_{i∈γ4} = {D_att9, D_att10, D_att18, D_att21, D_att29}.

Challenge. A first pools his obtained private keys D_{γ1}, D_{γ2}, ..., D_{γm} together to reconstruct a secret key D_{W*}. Given PK, the ciphertext C for W′ and t, A recovers M by choosing an arbitrary t-element subset of D_{W*} such that |W* ∩ W′| ≥ t, computing M′ = (C_0 − Σ_{i=1}^{t} C_i · D_i · L_{i,W*_D}(0)) ∈ R_q, where L_{i,W_D}(0) is the Lagrange coefficient, and then outputting a valid message M = M′ mod p.

For example, A reconstructs his private key D_{W*} by pooling the obtained keys D_{γ1}, D_{γ2}, D_{γ3} and D_{γ4}. Then D_{W*} = {D_att1, D_att2, ..., D_att10, D_att12, D_att18, D_att21, D_att23, D_att24, D_att28, D_att29, D_att33, D_att34, D_att47, D_att49}. The encrypted message is attached to the attribute set W′ = {att2, att3, att6, att8, att9, att11, att27, att39, att55} with a threshold value t = 4. Given a ciphertext C = (C_0, {C_i}_{i∈W′}) where:
C_0 = (PK_0 · r + p·e_0 + M) ∈ R_q
C_i = (PK_i · r + p·e_i) ∈ R_q for all i ∈ W′
Since |W* ∩ W′| ≥ t, A is able to recover a message M by choosing an arbitrary t-element subset of D_{W*}, for instance {D_att2, D_att3, D_att6, D_att8}, {D_att3, D_att6, D_att8, D_att9}, {D_att2, D_att3, D_att8, D_att9}, etc. Then A calculates M′ = (C_0 − Σ_{i=1}^{t} C_i · D_i · L_{i,W*_D}(0)) ∈ R_q, where L_{i,W_D}(0) is the Lagrange coefficient, and recovers a valid message M = M′ mod p. Thus, the adversary A successfully conducts a collusion attack on the ABE_R-LWE scheme.
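Below is a deliberately simplified integer illustration of the pooling step described above, reusing the Shamir helpers sketched in Sect. 2: when key components issued across different queries behave as mutually compatible shares of the same secret (the weakness exploited here), fewer-than-threshold components per query suffice once pooled. In the actual scheme the components are ring elements D_i = d(i) · SK_i rather than bare shares, so this is an analogy, not a reproduction of the attack; the attribute indices mirror the example above.

```python
# Reuses make_shares() and reconstruct() from the Shamir sketch in Sect. 2.
SK = 987654321
all_components = make_shares(SK, t=4, n=50)            # one share per attribute att1..att50

q1 = [all_components[i] for i in (0, 2, 3)]            # e.g. att1, att3, att4  -> 3 < t
q2 = [all_components[i] for i in (1, 4)]               # e.g. att2, att5        -> 2 < t

pooled = (q1 + q2)[:4]                                 # pooling reaches the threshold t = 4
assert reconstruct(pooled) == SK                       # the colluders recover the secret
```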
4 Conclusion

Generally, collusion attacks on ABE schemes can be prevented by applying a secret key randomization technique [5, 11, 23]. With secret key randomization, each component of a user's private key is tied to a random secret value that uniquely corresponds to that particular user, so that without knowledge of this random value a user cannot contribute his key components separately to a collusion attack. On the other hand, some researchers [11, 22, 25] have proposed a more stringent method to achieve the collusion resistance property of ABE by hiding the access policies or credentials, either in the ciphertext for CP-ABE or in the user's private key for KP-ABE [22]. In the ABE_R-LWE scheme, publishing a list of attributes leads to privacy issues, and disclosing the attribute set needed to decrypt the encrypted data makes the scheme vulnerable to attack. The KeyGen algorithm of the ABE_R-LWE scheme [14] can be further improved by ensuring that different polynomials are independently selected to generate the private keys for attributes att1, att2, ..., attw*. As a result, attackers cannot collude to retrieve the private key D_{W*}.

In conclusion, this chapter analyzed the ABE_R-LWE scheme [14], which extends the FIBE scheme into a lattice post-quantum cryptographic scheme. First, existing works in ABE research were briefly reviewed. Then, a collusion attack on the ABE_R-LWE scheme under the selective-set model was presented, demonstrating that the scheme is vulnerable to collusion attacks due to the design flaw inherited from the FIBE scheme. Subsequently, alternatives to prevent such attacks were suggested.
References 1. A. Sahai, B. Waters, Fuzzy Identity-Based Encryption, Advances in Cryptology–EUROCRYPT 2005; LNCS, vol. 3494, R. Cramer (Ed.). (Springer, Berlin, Heidelberg, 2005), pp. 457–473. https://doi.org/10.1007/11426639_27 2. S.-Y. Tan, K.-W. Yeow, O.W. Seong, Enhancement of a lightweight attribute-based encryption scheme for the internet of things. IEEE IoT J. 6(4), 6384–6395 (2019). https://doi.org/10.1109/ JIOT.2019.2900631 3. B. Chandrasekaran, R. Balakrishnan, An efficient tate pairing algorithm for a decentralized key-policy attribute based encryption scheme in cloud environments. Cryptography 2(3), 14 (2018). https://doi.org/10.3390/cryptography2030014 4. L. Ning, Tan Zheng, W. Mingjun, T.Y. Laurence, Securing communication data in pervasive social networking based on trust with KP-ABE. ACM Trans. Cyber-Phys. Syst. (Special Issue on Dependability in CPS) 3(9), 14 (2019). https://doi.org/10.1145/3145624 5. J. Bethencourt, A. Sahai, B. Waters, Ciphertext-policy attribute-based encryption, in ACM Transactions on Cyber-Physical Systems—Special Issue on Dependability in CPS, IEEE Symposium on Security and Privacy (SP ’07) (2007), pp. 321–334. https://doi.org/10.1109/SP. 2007.11 6. T. Nishide, K. Yoneyama, K. Ohta, Attribute-based encryption with partially hidden encryptorspecified access structures. Appl. Cryptogr. Netw. Secur. LNCS 5037, 111–129 (2008). https:// doi.org/10.1016/j.ins.2014.01.035
7. Z. Liu, Z. Cao, D.S. Wong, White-box traceable ciphertext-policy monotone access structures. IEEE Trans. Inf. Forens. Secur. 8(1), 76–88 (2013). https://doi.org/10.1109/TIFS.2012. 2223683 8. J. Venkata Rao, V. Krishna Reddy, C.P. Pavan Kumar Hota, Enhanced ciphertext-policy attribute-based encryption (ECP-ABE), in ICICCT 2019—System Reliability, Quality Control, Safety, Maintenance and Management (2019), pp. 502–509. https://doi.org/10.1007/978981-13-8461-5_57 9. D. Ziegler, J. Sabongui, P. Gerald, Fine-grained access control in industrial internet of things. ICT Syst. Secur. Priv. Protect. 562, 91–104 (2019). https://doi.org/10.1007/978-3-030-223120_7 10. B.-C. Chifor, I. Bica, V.-V. Patriciu, F. Pop, A security authorization scheme for smart home internet of things devices. Fut. Gener. Comput. Syst. 86, 740–749 (2018). https://doi.org/10. 1016/j.future.2017.05.048 11. H. Deng, Q. Wu, B. Qin, J. Domingo-Ferrer, L. Zhang, J. Liu, W. Shi, Ciphertext-policy hierarchical attribute-based encryption with short ciphertexts. Inform. Sci. 275, 370–384 (2014). https://doi.org/10.1016/j.ins.2014.01.035 12. S. Agrawal, D. Boneh, X. Boyen, Efficient lattice (H)IBE in the standard model, advanced in cryptology EUROCRYPT 2010. LNCS 6110, 1–40 (2010). https://doi.org/10.1007/978-3642-13190-5_28 13. Y. Wang, Lattice ciphertext policy attribute-based encryption in the standard model. Int. J. Netw. Secur. 16(4), 358–365 (2014) 14. W. Zhu , J. Yu, T. Wang, P. Zhang, W. Xie, Efficient attribute-based encryption from R-LWE. Chin. J. Electron. 23(4) (2014). https://doi.org/10.6688/JISE.2017.33.3.5 15. M. Ajtai, Generating hard instances of lattice problems, in Proceedings of the 28th Annual ACM Symposium on Theory of Computing (1996), pp. 99–108. https://doi.org/10.1145/237814. 237838 16. O. Regev, On lattices, learning with errors, random linear codes, and cryptography, in Proceedings of the 37th Annual ACM Symposium on Theory of Computing (2009), pp. 1–37. https:// doi.org/10.1145/1060590.1060603 17. O. Regev, The learning with errors problem. Invit. Surv. Chaos Computer Club CCC 3(015848), 1–23 (2010) 18. V. Lyubashevsky, C. Peikert, O. Regev, On ideal lattices and learning with errors over rings, advances in cryptology—EUROCRYPT. LNCS 6110, 1–23 (2010). https://doi.org/10.1007/ 978-3-642-13190-5_1 19. S.F. Tan, A. Samsudin, Lattice ciphertext-policy attribute-based encryption from ring-LWE, in International Symposium on Technology Management and Emerging Technologies (ISTMET), pp. 258–262 (2015). https://doi.org/10.1109/ISTMET.2015.7359040 20. S.F. Tan, A. Samsudin, Ciphertext policy attribute based homomorphic encryption (CPABHERLWE): a fine-grained access control on outsourced cloud data computation. J. Inf. Sci. Eng. (JISE) 33(3), 675–694 (2017). https://doi.org/10.6688/JISE.2017.33.3.5 21. V. Lyubashevsky, C. Peikert, O. Regev, A toolkit for ring-lwe cryptography, advances in cryptology—EUROCRYPT 2013. LNCS 7881, 35–54 (2013). https://doi.org/10.1007/9783-642-38348-9_3 22. K. Frikken, M. Atallah, Attribute-based access control with hidden policies and hidden credentials. IEEE Trans. Comput. 55(10), 1259–1270 (2006) 23. Z. Liu, Z. Cao, D.S. Wong, Fully collusion-resistant traceable key-policy attribute-based encryption with sublinear size ciphertexts, in Proceedings 10th International Conference Security and Cryptology (2014) 24. X.A. Wang, X. Yang, M. Zhang, Y. Yu, Cryptanalysis of a fuzzy identity based encryption scheme in the standard model. Informatica 23(2), 299–314 (2012) 25. A. 
Shamir, How to share a secret. Commun. ACM 22(11), 612–613 (1979)
Enhanced Password-Based Authentication Mechanism in Cloud Computing with Extended Honey Encryption (XHE): A Case Study on Diabetes Dataset Tan Soo Fun, Fatimah Ahmedy, Zhi Ming Foo, Suraya Alias, and Rayner Alfred

Abstract The recent advancement of cloud technologies promises a cost-effective, scalable and easier-to-maintain data solution for individuals, government agencies and corporations. However, existing cloud security solutions that depend exclusively on the conventional password-based authentication mechanism cannot effectively defend against ongoing password guessing and cracking attacks. A recent HashCat attack can brute-force any hashed eight-character password composed of any blend of 95 characters in less than 2.5 h. Several trivial approaches such as two-factor authentication, grid-based authentication and biometric authentication mechanisms have recently been enforced as additional or optional countermeasures against password guessing and cracking attacks. These approaches, however, can be frustrated by recent malware attacks capable of intercepting the One-Time Password (OTP) sent to the mobile device. Stolen passwords often do not trigger any alerts and can subsequently be exploited to access other users' cloud accounts (e.g. 61% of users reuse a single password to access different online accounts). To address these problems, this research aimed to implement an eXtended Honey Encryption (XHE) scheme for improving the assurance of the conventional password-based authentication mechanism in cloud computing. When an attacker tries to retrieve a patient's diabetes information by guessing the password, rather than rejecting the access attempt as customary security defence mechanisms do, the proposed XHE outputs an indistinguishable counterfeit patient record that closely resembles the legitimate patient's diabetes information for each incorrect guess of the legitimate password. In this way, the implemented XHE scheme hardens the complexity of password guessing and cracking attacks, as the attacker cannot distinguish which of his guessed passwords is the correct one.
T. S. Fun (B) · Z. M. Foo · S. Alias · R. Alfred Faculty of Computing and Informatics, Universiti Malaysia Sabah, 88400 Kota Kinabalu, Malaysia e-mail: [email protected] F. Ahmedy Faculty of Medicine and Health Sciences, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_6
A security message is then produced and delivered to alert the network administrator and the security response team. Furthermore, the proposed XHE scheme can potentially be applied to improve password-based authentication systems in other networks, including but not limited to the Internet of Things (IoT) and mobile computing.

Keywords Cloud security · Password-based authentication · Honey encryption
1 Introduction

Generally, the principles of cloud security mechanisms have adopted existing security models, including the confidentiality, integrity, availability (CIA) model and the authentication, authorization and accounting (AAA) model. The CIA model concerns ensuring the confidentiality and privacy of data with encryption algorithms (e.g. RSA, AES, etc.), data integrity with hashing algorithms (e.g. MD5, SHA-2, PBKDF2, etc.), and data availability with network defence approaches (e.g. firewalls, intrusion detection systems, etc.), whereas the AAA model focuses on the access control context, which involves identifying a user with pre-defined credentials (e.g. username, passport number, etc.), authenticating and verifying the user's claimed identity (e.g. password, fingerprints, retina scans, etc.) and authorizing access to cloud resources (e.g. data, servers, etc.). The CIA properties of cloud computing are currently achieved using the Transport Layer Security (TLS) and Secure Socket Layer (SSL) mechanisms, while the password-based authentication mechanism has become the de facto standard for enforcing the AAA model in cloud computing due to its practicality and ease of use.

The password-based authentication mechanism is a composition of identification (username) and authentication (secret words), also known as password-only or single-factor authentication. The strength of the password-based authentication mechanism, however, relies on it being computationally secure, under the assumption that the best-known approach to cracking the algorithms requires an unreasonably extensive amount of computing power and processing time [1–3]. With the advance of recent computing power and of progressive distributed processing and parallelism algorithms, the ordinary composition of username and password as the method of authenticating and controlling access to a cloud account poses extensive security risks, being susceptible to password-based attacks including hashed-password rainbow attacks, brute-force password guessing and birthday attacks [4–6]. Recent password cracking studies demonstrated that any hashed eight-character password comprising a combination of 95 characters could be cracked in 330 min at a speed of 0.35 trillion guesses per second [7] in 2012, a time subsequently reduced to 150 min in 2019 [8]. To address this problem, recent trivial approaches are to use submissive measures (such as forcing a user to select a stronger password, or educating and increasing user awareness of password-based attacks), or to implement two-factor or multi-factor authentication, grid-based authentication or biometric authentication mechanisms.
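A back-of-the-envelope check of the quoted figures is shown below; only the 2012 rate of 0.35 trillion guesses per second is taken from the text, while the 2019 rate is an assumption back-solved from the reported 150 min.

```python
alphabet, length = 95, 8
keyspace = alphabet ** length                        # 95**8 ~ 6.6e15 candidate passwords
for label, rate_per_s in [("2012 rig (0.35 trillion guesses/s)", 0.35e12),
                          ("assumed 2019 rig (~0.75 trillion guesses/s)", 0.75e12)]:
    minutes = keyspace / rate_per_s / 60
    print(f"{label}: about {minutes:.0f} minutes to exhaust the space")
# Prints roughly 316 and 147 minutes, consistent with the cited 330 and 150 minutes.
```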
The two-factor authentication approaches, however, are unable to withstand recent malware attacks, including Man-In-The-Middle (MITM), Man-In-The-Browser (MITB) and SSL stripping attacks, which are capable of intercepting the Transaction Authorization Code (TAC) or One-Time Password (OTP) sent to the mobile device [9, 10]. The grid-based authentication system is a challenge-response-based system in which the OTP is formulated using a predetermined pattern derived from a graphical grid on the displayed screen; it can, however, be thwarted by social engineering attacks. On the other hand, the biometric authentication mechanisms used to identify users by fingerprint, facial recognition, iris or voice recognition always require integrated or additional hardware. A biometric is permanently linked to the user and, if compromised, cannot be replaced or reset; if a biometric is eventually compromised in one application, it compromises essentially all applications where the same biometric is used. Besides, cracked passwords present broad security hazards in these existing mechanisms, since attacks using cracked passwords usually do not trigger any security alarms.

This project aimed to implement the eXtended Honey Encryption (XHE) scheme [1, 3] to increase the complexity of password-based authentication and access control mechanisms in the cloud computing platform. Considering that recent healthcare data breaches have tripled, affecting 15.08 million patient records recently compared to 5.58 million patient records in 2017 [11], the implementation uses the healthcare Pima Indians diabetes dataset as the patient account data, subsequently used to construct the message space and seed space of the XHE scheme. When a malicious attacker tries to access the patient's cloud account without authorization using a guessed password, rather than rejecting the access as in current online account practice, the implemented XHE algorithm produces a plausible fake patient record that is closely related to the authentic patient record, so that the attacker cannot determine whether the guessed password worked correctly or not, thereby hardening the complexity of password guessing and cracking attacks. In addition, a security alert is triggered to notify the security response team on a regular basis.
2 eXtended Honey Encryption (XHE) Algorithm in Securing Password and Diabetes Data in Cloud Computing

The Honey Encryption (HE) scheme is a probabilistic method that outputs a plausible-looking yet incorrect message when an attacker attempts to decrypt the ciphertext with an incorrectly guessed password, thereby hardening the complexity of the password guessing process. The idea of the Honey Encryption scheme was first presented by Juels and Ristenpart [12, 13, 16] in 2014 to enhance the security of credit card applications with an additional defence layer. Tyagi et al. [14] and Huang et al. [15] subsequently adopted the HE scheme in protecting text
messaging and genomic libraries, respectively. The HE scheme has since been improved against message-recovery attacks. Most recently, Tan et al. [1, 3, 18] improved the HE scheme for protecting file storage in cloud computing, naming the result the eXtended Honey Encryption (XHE) scheme. In this research, we further extend the work of Tan et al. [1, 3, 18] to harden the complexity of password-based authentication mechanisms in cloud computing. The construction of the XHE scheme [3], instantiated with the Pima Indians diabetes dataset, is presented in the following:

Plaintext Space (M_1, M_{2-i}, ..., M_{2-n}). Also called the message space. The message M_1 is a hashed 64-character password drawn from combinations of 95 alphanumeric characters. With an aggregate of 95^64 possibilities, the distribution over M_1 is denoted ψ_{m1}, and sampling from the M_1 distribution is written M_1 ←_{ψm1} M_1. Then, let M_{2-i}, ..., M_{2-n} be the patient's diabetes data, consisting of numerical fields with maximum lengths of four characters (e.g. the number of pregnancies with a maximum length of two characters, the diastolic blood pressure measurement with a maximum length of three characters, etc.); the distribution over M_{2-i}, ..., M_{2-n} is denoted ψ_{m2-i}, ..., ψ_{m2-n} and sampling from the distribution is written M_{2-i}, ..., M_{2-n} ←_{ψm2-i, ..., ψm2-n} M_{2-i}, ..., M_{2-n}.

Seed Space (S_1, S_{2-i}, ..., S_{2-n}). The seed space consists of a prefix seed S_1 and suffix seeds S_{2-i}, ..., S_{2-n} over n-bit binary strings. Each message in M_1 and M_2 is mapped to a seed in S_1 and S_{2-i}, ..., S_{2-n} respectively, such that Σ_{s1∈S1} p(s_1) = 1 and Σ_{s2-i∈S2-i} p(s_{2-i}) = 1, ..., Σ_{s2-n∈S2-n} p(s_{2-n}) = 1.

Distribution-Transforming Encoder (DTE). The DTE algorithm is composed of the DTE_sub1 and DTE_sub2 functions, defined as follows:
DTE_1(encode_sub1, decode_sub1). The encode_sub1 function takes the password of a patient, M_1 ∈ M_1, and outputs an array of prefix seed values s_1 from the seed space S_1. The deterministic decode_sub1 function takes a seed s_1 ∈ S_1 and outputs the corresponding plaintext M_1 ∈ M_1.
DTE_2(encode_sub2, decode_sub2). The encode_sub2 algorithm takes a patient's diabetes data, M_{2-i} ∈ M_{2-i}, ..., M_{2-n} ∈ M_{2-n}, and outputs a series of suffix seed values s_{2-i}, ..., s_{2-n} from the seed space S_{2-i}, ..., S_{2-n}. The deterministic decode_sub2 function takes seeds s_{2-i} ∈ S_{2-i}, ..., s_{2-n} ∈ S_{2-n} and outputs a message M_{2-i}, ..., M_{2-n} by running a binary search on the inverse sampling table and a linear search on the message space to locate the original password and diabetes data.

Inverse Sampling DTE (IS-DTE). The IS-DTE algorithm is composed of the IS-DTE_sub1 and IS-DTE_sub2 functions, defined as follows:
IS-DTE_1(IS-encode_sub1, IS-decode_sub1). The IS-encode_sub1 function executes the cumulative distribution function (CDF) F_{m1} with the pre-defined plaintext distribution ψ_{m1} and M_1 = {M_{1-1}, M_{1-2}, ..., M_{1-|M|}}. Denote F_{m1}(M_{1-0}) = 0; then output M_{1-i} such that F_{m1}(M_{1-(i-1)}) ≤ S_1 < F_{m1}(M_{1-i}), where S_1 ←$ [0, 1). Eventually, encode the input plaintext M_{1-i} by choosing an
arbitrary value uniformly from the set [F_M1(M1-(i-1)), F_M1(M1-i)). The IS-decode_sub1 function is the inverse of the CDF, specified as IS-decode_sub1 = F_M1^-1(S1).

IS-DTE_2(IS-encode_sub2, IS-decode_sub2). The IS-encode_sub2 function evaluates the cumulative distribution functions F_M2 = F_M2-i, ..., F_M2-n with M2-i = {M2-i-1, M2-i-2, ..., M2-i-|M|}, ..., M2-n = {M2-n-1, M2-n-2, ..., M2-n-|M|}. Setting F_M2(M2-0) = 0, it produces M2-i such that F_M2(M2-(i-1)) ≤ S2 < F_M2(M2-i), where S2 ←$ [0, 1). Lastly, it encodes the input message M2-i by choosing an arbitrary value uniformly from the set [F_M2(M2-(i-1)), F_M2(M2-i)), yielding the suffix seeds S2-i, ..., S2-n. The IS-decode_sub2 function is the inverse of the CDF, defined as IS-decode_sub2 = F_M2^-1(S2).

DTE-then-Cipher (HE[DTE, SE]). Also called DTE-then-Encrypt. The HE[DTE, SE] algorithm is a pair of functions, HEnc and HDec. HE encrypts a plaintext by executing the DTE function and then re-encrypts the DTE output with a deterministic symmetric encryption function, as follows.

HEnc(SK, M1, M2). Let H be a deterministic hash function and n an arbitrary number of bits. Given the symmetric key SK, a password M1 and its extension M2 = M2-i, ..., M2-n, choose values uniformly at random, s1 ←$ encode(M1), s2 ←$ encode(M2) and R ←$ {0, 1}^n, then generate the output ciphertexts C1 = H(R, SK) ⊕ s1 and C2 = H(R, SK) ⊕ s2.

HDec(SK, C1, C2, R). Given R, SK, C1 and C2 = C2-i, ..., C2-n, compute s1 = C1 ⊕ H(R, SK) and s2 = C2 ⊕ H(R, SK). Then recover the patient's password M1 = decode(s1) and the diabetes data M2 = decode(s2) by looking up the inverse sampling tables.
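To make the DTE-then-Encrypt step concrete, the following Python sketch illustrates the hash-and-XOR construction described above. It is a minimal illustration under stated assumptions rather than the authors' implementation: the seed length, the derivation of SK from the password and the use of SHA-256 as the hash H are choices made here for brevity.

```python
import hashlib
import os

SEED_BITS = 128  # assumed seed length; the paper leaves n arbitrary

def H(R: bytes, SK: bytes) -> int:
    """Deterministic hash H(R, SK), truncated to the seed length."""
    digest = hashlib.sha256(R + SK).digest()
    return int.from_bytes(digest[:SEED_BITS // 8], "big")

def henc(SK: bytes, s1: int, s2: int):
    """HEnc: XOR the DTE seeds s1, s2 with H(R, SK) for a fresh random R."""
    R = os.urandom(SEED_BITS // 8)          # R <-$ {0,1}^n
    pad = H(R, SK)
    C1 = pad ^ s1                           # C1 = H(R, SK) XOR s1
    C2 = pad ^ s2                           # C2 = H(R, SK) XOR s2
    return R, C1, C2

def hdec(SK: bytes, R: bytes, C1: int, C2: int):
    """HDec: recover the seeds; decode() would map them back to messages."""
    pad = H(R, SK)
    return pad ^ C1, pad ^ C2

if __name__ == "__main__":
    SK = hashlib.sha256(b"patient password").digest()   # key assumed derived from the password
    s1, s2 = 0x1234, 0x5678                              # placeholder seeds from encode_sub1/encode_sub2
    R, C1, C2 = henc(SK, s1, s2)
    assert hdec(SK, R, C1, C2) == (s1, s2)
    # Decrypting with a wrong key still yields *some* seed pair, which decode()
    # would map to a plausible-looking but incorrect password and diabetes record.
    wrong = hdec(hashlib.sha256(b"wrong guess").digest(), R, C1, C2)
    print(wrong != (s1, s2))
```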
3 Implementation and Performance Analysis Three dominant symmetric encryption algorithms were selected to execute the HEnc(K, M1, M2) and HDec(K, R, C1, C2) algorithms: the Advanced Encryption Standard (AES), Blowfish and the Tiny Encryption Algorithm (TEA). AES was developed in response to the obsolescence of DES and the processing burden of TripleDES, and it has been widely adopted in recent cryptosystems owing to its modest processing power and RAM requirements. Both Blowfish and TEA use a 64-bit block size, compared with the 128-bit block size of AES. The dataset used is the Pima Indians diabetes dataset [17], which consists of female patients' diabetes data: the number of pregnancies, blood pressure in mm Hg, skin thickness in mm, the plasma glucose concentration at 120 min in an oral glucose tolerance test, BMI, diabetes pedigree function, age and outcome. The measurements were recorded on an Intel® Core™ i5-5200U CPU @ 2.2 GHz with 4 GB RAM running a 64-bit operating system, and the reported values are the means of 100 runs of the respective algorithms. The web-based application that simulates legitimate patient logins
and attacker password-guessing attacks was developed with PHP 7.2.0 on the CodeIgniter framework. A MySQL database is used to store the diabetes patient records in a structured format and is deployed on a cloud computing platform, as shown in Fig. 1. As shown in Fig. 2, the symmetric TEA algorithm is the most efficient at generating the encryption key and at encrypting and decrypting the diabetes data, compared with AES and Blowfish, owing to its implementation simplicity. AES is built on a substitution-permutation network and therefore performed better than Blowfish, which is hampered by its less efficient Feistel structure. However, from the security viewpoint the TEA algorithm is susceptible to related-key differential attacks. This research therefore uses the AES algorithm to implement HEnc(K, M1, M2) and HDec(K, R, C1, C2) in the XHE scheme in cloud computing. The implementation of the XHE algorithm in securing the diabetes patients' data is demonstrated in Figs. 3, 4 and 5. When a legitimate patient enters a valid username and password, the XHE algorithm returns the legitimate user account. However, whenever an invalid password is entered, the XHE algorithm outputs
Fig. 1 The implementation of XHE algorithm in securing diabetes patient data in cloud computing
Fig. 2 Comparison of symmetric key algorithms: CPU time (ns) for key generation, encryption and decryption using AES, Blowfish and TEA
Fig. 3 The web-based login interface to access the diabetes data in cloud storage
the indistinguishable counterfeit patient's diabetes data and triggers an alert email to the legitimate user and the security incident response team.
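The per-algorithm timings summarised in Fig. 2 can be reproduced with a simple harness that averages 100 runs, as in the hedged sketch below. It assumes the PyCryptodome package for AES and Blowfish; TEA is omitted because it is not part of that library, and the absolute values will of course differ from the paper's measurements.

```python
# Minimal timing harness (assumes: pip install pycryptodome); averages 100 runs
# of key generation, encryption and decryption, mirroring the methodology of Fig. 2.
import os
import time
from Crypto.Cipher import AES, Blowfish

SAMPLE = os.urandom(1024)  # stand-in for one serialized diabetes record

def mean_ns(fn, runs=100):
    total = 0
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        total += time.perf_counter_ns() - start
    return total / runs

def bench(name, new_cipher, key_len, block):
    key = os.urandom(key_len)
    data = SAMPLE[: (len(SAMPLE) // block) * block]   # multiple of the block size
    ct = new_cipher(key).encrypt(data)
    print(name,
          "keygen:", mean_ns(lambda: os.urandom(key_len)),
          "encrypt:", mean_ns(lambda: new_cipher(key).encrypt(data)),
          "decrypt:", mean_ns(lambda: new_cipher(key).decrypt(ct)), "ns")

# ECB mode is used only to time the raw block ciphers, not as a recommendation.
bench("AES", lambda k: AES.new(k, AES.MODE_ECB), 16, 16)
bench("Blowfish", lambda k: Blowfish.new(k, Blowfish.MODE_ECB), 8, 8)
```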
4 Conclusion This paper presents the implementation of the XHE algorithm for securing diabetes patient data in cloud computing. Compared with recent works that focus on summative approaches or multi-factor and multi-level authentication, this research addresses the root problem of password guessing and cracking attacks by hardening the complexity of password guessing. Because an invalid password yields indistinguishable counterfeit diabetes data closely resembling the legitimate patient's data, the attacker cannot decide whether the conjectured password
Fig. 4 A sample of the legitimate patient's data generated by the XHE scheme when the user enters a valid username and password
is correct, or whether the accessed diabetes data belong to that particular patient. In future work, the XHE scheme can be developed further to harden password-based authentication and authorization systems on other platforms including, but not limited to, the Internet of Things (IoT) and mobile computing.
Fig. 5 A sample of the bogus patient's data generated by the XHE scheme when an attacker enters an invalid username and password
Acknowledgements This work was supported by Universiti Malaysia Sabah grant [SLB0159/2017]. The authors also thank the anonymous reviewers of this manuscript for their careful reviews and valuable comments.
References 1. S.F. Tan, A. Samsudin, Enhanced security for public cloud storage with honey encryption. Adv. Sci. Lett. 23(5), 4232–4235 (2017). https://doi.org/10.1166/asl.2017.8324 2. M. Jiaqing, H. Zhongwang, C. Hang, S. Wei, An efficient and provably secure anonymous user authentication and key agreement for mobile cloud computing. in Wireless Communications and Mobile Computing (2019), pp. 1–12. https://doi.org/10.1155/2019/4520685 3. S.F. Tan A. Samsudin, Enhanced security of internet banking authentication with EXtended honey encryption (XHE) scheme. in Innovative Computing, Optimization and Its Applications ed by I. Zelinka, P. Vasant, V. Duy, T. Dao. Studies in Computational Intelligence vol 741 (Springer, Cham, 2018) pp. 201–216. https://doi.org/10.1007/978-3-319-66984-7_12 4. W. Xiaoyun, F. Dengguo L, Xuejia Y. Hongbo, Collisions for hash functions MD4, MD5, HAVAL-128 and RIPEMD. Cryptology ePrint Archive Report 2004/199, 16 Aug 2004, revised 17 Aug 2004. http://merlot.usc.edu/csacf06/papers/Wang05a.pdf
5. M. Stevens, New collision attacks on SHA-1 based on optimal joint local-collision analysis. in Advances in Cryptology – EUROCRYPT 2013 ed by T. Johansson, P.Q. Nguyen. EUROCRYPT 2013. Lecture Notes in Computer Science vol 7881 (Springer, Berlin, Heidelberg, 2013), pp. 245–261. https://doi.org/10.1007/978-3-642-38348-9_15 6. A. Leekha, A. Shaikh, Implementation and comparison of the functions of building blocks in SHA-2 family used in secured cloud applications, J. Dis. Mathe. Sci. Cryptograp. 22(2) (2019). https://doi.org/10.1080/09720529.2019.1582865 7. D. Goodin, 25-GPU Cluster Cracks Every Standard Windows Password in < 6 Hours. (Arc Techica, 2012) 8. N. Hart, HashCat Can Now Crack An Eight-Character Windows NTLM Password Hash In Under 2.5 Hours (Information Security Buzz, 2019) 9. A. Mallik, M. Ahsan, M.Z. Shahadat, J.C. Tsou, Understanding Man-in-the-middle-attack through Survey of Literature. Indonesian J. Comput. Eng. Design 1, 44–56 (2019) 10. V. Haupert, S. Gabert, How to attack PSD2 internet banking. in Proceeding of 23rd International Conference on Financial Cryptography and Data Security (2019) 11. Protenus 2019 Breach Barometer, 15 M + Patient Records Breached in 2018 as Hacking Incidents Continue to Climb (Protenus Inc and DataBreaches.net, 2019) 12. A. Juels, T. Ristenpart, Honey encryption: encryption beyond the brute-force barrier. IEEE Sec.Priv. IEEE Press New York 12(4), 59–62 (2014) 13. A. Juels, T. Ristenpart, Honey encryption: security beyond the brute-force bound. in Advances in Cryptology—EUROCRYPT 2014 ed by P.Q. Nguyen, E. Oswald. EUROCRYPT 2014. Lecture Notes in Computer Science vol 8441 (Springer, Berlin, Heidelberg, 2014), pp. 293–310. https:// doi.org/10.1007/978-3-642-55220-5_17 14. N. Tyagi, J. Wang, K. Wen, D. Zuo, Honey Encryption Applications, Computer and Network Security Massachusetts Institute of Technology. Available via MIT (2015). http://www.mit.edu/ ~ntyagi/papers/honey-encryption-cc.pdf Retrieved 15 July 2017 15. Z. Huang, E. Ayday, J. Fellay, J.-P. Hubuax, A. Juels, GenoGuard: Protecting Genomic Data Against Brute-Force Attacks, IEEE Symposium on Security and Privacy (IEEE Press, California, 2015), pp. 447–462 16. J. Joseph, T. Ristenpart, Q. Tang, Honey Encryption Beyond Message Recovery Security (IACR Cryptology ePrint Archive, 2016), pp. 1–28 17. Pima Indians diabetes dataset, UCI Machine Learning Repository. Access Feb 2018 18. M. Edwin, S.F. Tan, A. Samsudin, Implementing the honey encryption for securing public cloud data storage. in First EAI International Conference on Computer Science and Engineering (2016)
An Enhanced Wireless Presentation System for Large-Scale Content Distribution Khong-Neng Choong, Vethanayagam Chrishanton, and Shahnim Khalid Putri
Abstract A wireless presentation system (WPS) is a video capturing and streaming device that enables users to wirelessly cast content from user devices such as PCs and smartphones onto a larger display device using screen mirroring technology. This technology is applicable to any collaborative environment, including meeting rooms, conference rooms and even classrooms. However, it is not designed to serve a large number of users, such as in an auditorium or lecture hall, particularly for distributing the projected content to hundreds of audience members. In this paper, we describe an enhanced WPS (eWPS) which extends the functionality of a WPS with a content sharing and distribution capability. Performance results showed that serving a page of presentation content took an average of 1.74 ms with an audience size of 125. Keywords Wireless presentation system · Enhanced WPS · Presentation management system · Content sharing · Content distribution
1 Introduction Wireless presentation system (WPS) [1–5] is becoming a common technology in meeting room to enable seamless presentation wirelessly from any devices, ranging from laptop, tablet to smartphone. It offers flexibility and mobility to presenters to present from any computing devices, and also to move around in the meeting room while conducting the presentation without worrying about the limited VGA/HDMI cable length as in the conventional presentation setup. However, existing WPS K.-N. Choong (B) · V. Chrishanton · S. K. Putri Wireless Innovation Lab, Corporate Technology, MIMOS Berhad Technology Park Malaysia, 57000 Kuala Lumpur, Malaysia e-mail: [email protected] V. Chrishanton e-mail: [email protected] S. K. Putri e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_7
Fig. 1 An overview of wireless presentation system
focuses mainly on providing flexibility and functionality to the presenter, while ignoring the potential needs of the audience. A typical wireless presentation system is shown in Fig. 1. The wireless presentation box provides Wi-Fi connectivity which allows the presenter device to connect and mirror its screen content to a display device such as an overhead projector or TV. The WPS box can also be enabled to distribute the presentation content currently being projected to the audience through webview clients. A webview client is basically a web browser which connects to the WPS and reads the updated projected content from it. The limitation of existing WPS is its inability to serve a large audience. WPS systems found in the market today typically state a capacity of 32 or 64 concurrently connected client devices, but do not explicitly mention the maximum supported number of webview clients. This means it would be a challenge to employ an existing WPS in an auditorium or lecture hall scenario for distributing content to an audience of 150 and above. In this paper, we describe an improved version of WPS, with an emphasis on the mechanism used to distribute content to a larger audience, along with performance measurements. Our enhanced WPS (eWPS) runs on top of a Wi-Fi infrastructure which consists of multiple access points and a network switch connected over a wired gigabit Ethernet backhaul. It utilizes an external web server which receives periodic screenshots from the eWPS and subsequently makes them accessible to the audience's web devices. As opposed to simulating a large-scale Wi-Fi-based system as in [6], we approach the challenges with a prototype implementation over a testbed, as in [7, 8], for practical performance studies.
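As a rough illustration of the webview-client behaviour described above, the sketch below polls the distribution server for the latest screenshot. The URL, polling interval and JSON field names are hypothetical placeholders, not the eWPS API; a real deployment would use whatever endpoint the distribution server actually exposes.

```python
# Hypothetical polling webview client (assumes: pip install requests).
# Endpoint and field names below are illustrative only.
import time
import requests

SERVER = "http://distribution-server.local"   # assumed address of the HTTP server

def poll_latest(poll_interval=2.0):
    last_seen = None
    while True:
        meta = requests.get(f"{SERVER}/latest.json", timeout=5).json()
        if meta.get("slide_id") != last_seen:          # a new screenshot was published
            img = requests.get(f"{SERVER}/{meta['image']}", timeout=5).content
            with open("current_slide.jpg", "wb") as fh:
                fh.write(img)                          # a browser would simply re-render it
            last_seen = meta.get("slide_id")
        time.sleep(poll_interval)

if __name__ == "__main__":
    poll_latest()
```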
This paper is organized as follows. Related work is discussed in Sect. 2. Section 3 describes the experimental setups and the test configurations/flow. Section 4 explains the performance results. Summary and conclusion are found in Sect. 5.
2 Related Works A diversity of products and solutions can be found in the literature offering the basic features of a WPS. Some of these products even offer content sharing/distribution. However, the number of supported client devices is below 64. Based on our latest review, none of these products is designed to support a larger audience. Table 1 summarizes the features of some prominent WPS found in the market and their differences. Basic mirroring refers to the ability of a user device to capture and stream screen/desktop activity to the connected display. Basic webviewing means the system is capable of distributing screenshots captured at the presenter/user device. Advanced webviewing goes beyond basic webviewing by further allowing users to navigate across all past/presented content. As an example, users who are late to the session are able to flip back to previously presented slides/content and later flip forward to the current/latest screenshot content. The wireless projector [1] is the most common and widely used product in the market. Even though it is a self-contained device, it has many shortcomings. Connecting laptops to wireless projectors is not a straightforward process, as each brand comes with its own configuration and execution steps. In addition, they support only basic mirroring and lack the ability to broadcast presentation content to the audience. Chromecast [2], Barco ClickShare [3] and DisplayCast [4] are standalone hardware appliances/systems which support only basic mirroring without any content sharing capability. The wePresent system [5] is one of the few in the market which provides basic webviewing capability. There is no product which supports advanced webviewing as proposed in this paper. Table 1 Features comparison of WPS products or solutions against the proposed eWPS
                        Basic mirroring  Basic webviewing  Advanced webviewing
Wireless projector [1]  Yes              No                No
Chromecast [2]          Yes              No                No
Barco ClickShare [3]    Yes              No                No
DisplayCast [4]         Yes              No                No
wePresent [5]           Yes              Yes               No
eWPS                    Yes              Yes               Yes
The use of IP multicast in multimedia applications such as live presentation is increasingly popular in both corporate and university/campus networks. It is therefore a potential mechanism to be explored in the design of eWPS. However, multicast over Wi-Fi is constrained by various challenges such as low data rates, high packet loss and unfairness against competing unicast traffic [9]. Moreover, not all Wi-Fi access points support multicasting [9]. On the end client devices, a customized application would be needed to subscribe to and receive these multicast packets. Our design allows any user device with a browser to work, without the need for any specific application installation. Hence, in this paper, we utilized unicast instead of multicast in the design and implementation of the current system.
3 Experimental Setup 3.1 Setup of Network Connectivity and Server The eWPS system is designed with scalability in mind to ensure continuous support for a growing audience size. As shown in Fig. 2, the eWPS system is therefore divided into two parts, namely the connectivity infrastructure section and the presentation management section. The connectivity infrastructure section focuses mainly on managing the networking aspect, from providing Wi-Fi accessibility and load balancing across the Wi-Fi access points (APs) controlled by the access controller (AC), to
Fig. 2 Network architecture of eWPS consists of two parts, namely the connectivity infrastructure and the presentation management section
Fig. 3 Layout of the auditorium
enabling Internet connectivity. Our design does not favor any specific brand, as long as the chosen Wi-Fi access points are matched to a designated access controller of the same brand/model. The presentation management section manages the capture of content from the source device and stores it at the distribution server (HTTP server) so that it is accessible to the end client devices over the connectivity infrastructure described earlier. All the above devices, which include the APs, the AC and the HTTP server, are connected via a 1000 Mbps Ethernet switch in a local area network (LAN) setup. Figure 3 shows the layout of the auditorium, measuring 54 × 40, where our experiment was conducted. Commercially available network equipment from Huawei was used to set up the connectivity infrastructure, which includes one access controller (AC) and six access points (APs) [4]. Two APs were located at the front of the auditorium, with the remaining four installed in the same row at the back of the auditorium. All six APs were configured to share a common SSID named “Smart_Presentation” over the non-overlapping channels 1, 6, 52, 56, 60, 64, 149 and 157. Six of these channels operate in the 5 GHz radio band, leaving only channels 1 and 6 in the 2.4 GHz band. The 2.4 GHz band was allocated to serve older phones which may not have 5 GHz radio support. The AC performs load balancing/distribution by directing different Wi-Fi stations (STAs), which are the end client devices, to associate with designated channels based on current radio conditions.
The distribution server runs on an Intel i5 micro-PC with 4 GB RAM and a 500 GB SATA hard disk. We chose this form factor and configuration for its light weight and ease of portability.
3.2 Test Event and Metrics A test run was conducted during an actual event, which lasted about 4 h from 8:30 a.m. The event was a workshop introducing the use of a new mobile app, with a general explanation followed by tutorial sessions. The flow of the workshop was as follows: 8:30 a.m.–10:45 a.m.: welcoming speech; 10:45 a.m.–12:15 p.m.: PowerPoint presentation and app demonstration. There were 15 and 52 pages of slides for sessions 1 and 2, respectively. Data from various monitoring sources were gathered for study. Both the radio status and the user devices were extracted from the AC at fixed intervals for the entire session. The radio status list provides detailed information on the wireless conditions and status of all connected APs, which includes the link type, uplink and downlink rates, the number of sent/received packets, the retransmission rate, the packet loss ratio, etc. The glances [10] system tool was set up on the HTTP server to monitor its resource utilization, including CPU and memory. The HTTP log of the distribution server was also used to study the server performance, the total number of accesses and the server resource utilization.
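For instance, the total number of accesses and the average request rate reported in the next section can be extracted from the distribution server's HTTP log with a short script such as the one below. It assumes a common Apache-style access log; the actual log format of the deployed server may differ.

```python
# Sketch: derive request counts and request rate from an Apache-style access log.
# Assumes lines such as: 10.0.0.7 - - [12/Nov/2019:10:45:01 +0800] "GET /slides/12.jpg HTTP/1.1" 200 412345
import re
from datetime import datetime

LOG_PATTERN = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})')

def summarise(path="access.log"):
    times, failures = [], 0
    with open(path) as fh:
        for line in fh:
            m = LOG_PATTERN.search(line)
            if not m:
                continue
            times.append(datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z"))
            if int(m["status"]) >= 400:
                failures += 1
    duration = (max(times) - min(times)).total_seconds() or 1.0
    print(f"total accesses: {len(times)}")
    print(f"failed requests: {failures}")
    print(f"mean request rate: {len(times) / duration:.2f} req/s")

if __name__ == "__main__":
    summarise()
```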
4 Results and Analysis There were 125 client devices connected to the given Wi-Fi infrastructure during the event. The radio list recorded that 70 devices, 56% of the total, connected to more than one AP. This could be the effect of the AC performing load balancing among the APs. It could also be caused by certain Wi-Fi stations leaving the venue and re-connecting to the SSID at a later time. Figure 4 shows the allocation of stations by radio band. Referring to the flow of the workshop in Sect. 3.2, the actual slide presentation started around 10:45 a.m., which corresponds to the sudden spike in Fig. 4. It can be observed that after 10:45 a.m. roughly 20+% of the stations were associated with the 2.4 GHz frequency band, leaving 70+% connected to the 5 GHz frequency band. This fits quite nicely with our frequency allocation as described in Sect. 3.1, where 2 out of the 8 channels (25%) are at 2.4 GHz. Figure 5 shows the allocation of stations by specific frequency channel. It can be observed that the distribution across the 2.4 GHz channels differs by about five stations on average. The station distribution on 5 GHz has a much bigger variance, with channel
Fig. 4 Allocation of Wi-Fi stations by radio band
Fig. 5 Allocation of Wi-Fi stations by both 2.4 and 5 GHz frequency channels
149 utilized by more than 25 stations, while channel 60 was utilized by fewer than five stations. This could be due to the seating location of the users, who could be closer to the AP configured to operate on channel 149. Figure 6 shows the average downlink rate of the stations by frequency band. The average downlink rates for 2.4 and 5 GHz were 50 and 150 Mbps, respectively. Again, this downlink rate fits nicely with the initial configuration, where 5 GHz was used to handle most of the traffic load. Figure 7 shows both the total number of slide accesses and downloads. An instant spike in the number of webview clients was observed around 10:45 a.m., which was the scheduled starting time of the second session. The sudden surge in slide accesses
Fig. 6 Average downlink rate by frequency band
Fig. 7 Total number of content access for both direct access and downloads
corresponded to the sudden network load seen in Figs. 4 and 5. The spike was mainly caused by the sudden connectivity requests from the audience immediately after the proposed system had been introduced in the session. A total of 9273 slide accesses were recorded for the entire session. Dividing this by the total session time of 1.5 h (session 2) shows that about 1.71 HTTP requests/sec were made on average. There were no failed HTTP requests or responses throughout the session. A total of 563 downloads were recorded. Again, all these requests were successfully received and responded to by our distribution HTTP server.
Figure 8 shows a sudden increase and then a gradual decrease in the number of webview clients, with a maximum of 93 clients recorded. Given a total of 9273 slide accesses, there was an average of 99 slide accesses per webview client. Given that the total number of slide pages for the entire session was 67, this shows that the eWPS also captured and shared the animated details on certain slide pages, which account for the remaining 32 slide accesses per client. Figure 9 shows the system resource utilization of the distribution server as recorded with the glances tool. The first utilization peak (at 20%), which occurred at the start of the session, should be ignored because it was mainly caused by the loading of the glances software itself. The second peak, at around 12.5%, is a correct representation of the actual CPU utilization for serving HTTP
Fig. 8 Total number of Webclients connected for content access
Fig. 9 System resource utilization of the distribution server
requests. This corresponded to 10:45 a.m., when there was a surge in the number of webview clients. Memory utilization was maintained below 5% throughout the session. The chart in general shows that both the CPU and memory were largely idle throughout the session. Based on such utilization, we foresee that the system is capable of serving roughly six times the workload experienced in the current setup. The mean access time (delay) for a webview client was 1.74 ms for a maximum of 125 stations. Assuming access time scales linearly with the number of stations, the setup should be capable of serving 1000 stations with an estimated average access time of 13.92 ms (1.74 ms × 1000/125). Such good performance was due to a few reasons. First, the backbone of the setup is a 1000 Mbps Ethernet. Second is the use of six Wi-Fi APs, which are more than sufficient for the client size. Further, the workload and traffic of these six APs were centrally managed by the AC. Lastly, the size of the content (screenshot images of 300 to 500 KB) has been optimized for distribution over the network.
5 Conclusion In this paper, an enhanced wireless presentation system which focuses on large-scale content distribution has been presented. This system makes use of Wi-Fi APs as the access infrastructure for end user devices/stations to request and receive presentation screenshots captured from the live presentation device. Results from our experiments showed that the system was capable of supporting the workload of an actual event with good performance. Moving forward, we plan to investigate the optimum number of APs needed to deliver content within an acceptable range of delay intervals. Compliance with Ethical Standards Conflict of Interest The authors declare that they have no conflict of interests. Ethical Approval This chapter does not contain any studies with human participants or animals performed by any of the authors. Informed Consent “Informed consent was obtained from all individual participants included in the study.”
References 1. Wireless Projector, https://www.epson.com.my/Projectors/Epson-EB-1785W-WirelessWXGA-3LCD-Projector/p/V11H793052 2. Chromecast, https://store.google.com/product/chromecast_2015?srp=/product/chromecast 3. Barco Clickshare, https://www.barco.com/en/clickshare
4. S. Chandra, L.A. Rowe, Displaycast: a high performance screen sharing system for intranets. in 20th ACM international conference on Multimedia (2012) 5. wePresent, https://www.barco.com/en/page/wepresent 6. M. Nekovee, R.S. Saksena, Simulations of large-scale WiFi-based wireless networks: Interdisciplinary challenges and applications. Fut. Gener. Comput. Syst. 26(3), 514–520 (2010) 7. Y. Yiakoumis, M. Bansal, A. Covington, J.V. Reijendam, S. Katti, N. Mckeown, BeHop: A testbed for dense Wifi networks. ACM Sigmobile Mobile Comput. Commun. Rev. 18(3), 71–80 (2014) 8. C. Adjih, E. Baccelli, E. Fleury, G. Harter, N. Mitton, T. Noel, R. Pissard-Gibollet, F. SaintMarcel, G. Schreiner, J. Vandaele, T. Watteyne, FIT IoT-LAB: A large scale open experimental IoT testbed. in IEEE 2nd World Forum on Internet of Things (WF-IoT) (Milan, 2015) 9. P. Pace, G. Aloi, WEVCast: Wireless eavesdropping video casting architecture to overcome standard multicast transmission in Wi-Fi networks. J. Telecommun. Syst. 52(4), 2287–2297 (2013) 10. Glances—An eye on your system, https://github.com/nicolargo/glances
On Confidentiality, Integrity, Authenticity, and Freshness (CIAF) in WSN Shafiqul Abidin, Vikas Rao Vadi, and Ankur Rana
Abstract A wireless sensor network (WSN) comprises several sensor nodes, such as magnetic, thermal, infrared and radar sensors, deployed in a particular geographical area. The primary aims of a sensor network are to transmit reliable, secure data from one node to another, between nodes and the base station, and from the base station to all nodes in the network, and to conserve the energy of the sensor nodes. On the other hand, there are several restrictions such as large energy consumption, limited storage/memory and processing ability, higher latency, and insufficient resources. The related security issues in wireless sensor networks are authenticity, confidentiality, robustness, integrity, and data freshness. The sensor nodes are susceptible to several attacks, such as DoS, Sybil, flooding, black hole and selective forwarding, which result in the leakage of sensitive and valuable information. It is therefore necessary to provide security against these critical attacks in the network. Wireless sensor networks were earlier used for military applications, with the objectives of monitoring friendly and opposing forces, battlefield surveillance and detection of attacks, but today wireless sensor networking has a huge number of applications - environmental, healthcare, home, industrial, commercial and counting. This paper is an extensive review of the security requirements and of the attacks that have to be avoided and resolved to achieve a secure network connection. It also emphasizes various limitations and defense strategies to prevent threats and attacks. The issues in applying wireless sensor networks for smooth and reliable transmission are also discussed. Sensor networks are popular for mission-critical tasks, and security is immensely required for such networks deployed in hostile environments. S. Abidin (B) HMR Institute of Technology & Management, (GGSIPU), New Delhi, Delhi, India e-mail: [email protected] V. R. Vadi Bosco Technical Training Society, New Delhi, India e-mail: [email protected] A. Rana Quantum University, Roorkee, Uttrakhand, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_8
Keywords Cryptography · Data confidentiality · Sybil · Data authentication · Black hole attack · Attacks on WSN
1 Introduction During the past years, wireless sensor networks have found a huge number of applications in various fields that can have a significant impact on society. A sensor network may consist of many different types of sensor nodes, such as magnetic, thermal, infrared and radar, which are able to monitor a wide range of environmental conditions, possess either unique or different features, and are designated with predetermined functions. A sensor network consists of a large number of tiny sensor nodes, whose number may range from a few to hundreds or thousands or even more, and possibly a few powerful nodes called base stations. These nodes are capable of communicating with each other over single or multiple channels. Multiple-channel communication is preferred over a single channel due to more reliable communication, fewer collisions and less interference, greater bandwidth, and improved network throughput. Base stations are responsible for aggregating the data, monitoring, and network management [1]; they act as a gateway between the sensor nodes and the outside network. The WSN also includes a task manager, network manager, security manager, communication hardware and compatible software as its related components. The principal working mechanism of a WSN is that the sensor nodes sense and accumulate data from varied locations in the environment, cooperate among themselves, process the information and then transmit the data to the main location, that is, the base station. Some important parameters need to be considered while designing a network: end-to-end latency [2] (should be low), energy consumption (should be low), packet delivery ratio (should be high), and throughput (should be high). These factors play a significant role in determining the efficiency of the network. Routing is another fundamental aspect of a sensor network: it determines the path between source and destination, leads to secure transmission of data packets, and can also help optimize application availability, productivity and responsiveness. The need for secure transmission resulted in the development of different routing protocols and algorithms [3]. The routing protocols fall into four broad categories: based on network structure, based on protocol operation, based on path establishment, and based on the initiator of communication. Wireless sensor networks are considered application-specific. There are numerous applications of wireless sensor networks, including sensor-based alarm/fire systems, smart homes and farming/harvesting, structural health monitoring, habitat monitoring, civil engineering, surveillance and emergency response systems, environmental research, disaster management, robotics, traffic monitoring, transportation and logistics, military, tracking, and industrial and consumer process monitoring and control applications [4]. The wide range of applications of sensor networks
requires each sensor to be highly miniaturized in terms of size, power consumption, and price. One of the primary concerns of a WSN is that the transmitted information should be safe and secure [5]. No unauthorized party should be allowed to access the network and its information. If no secure mechanism is implemented to protect the data exchanged between nodes, a third party that intercepts data packets may tamper with the original packets. Implementing such mechanisms requires addressing design challenges: intelligent applications must run with minimal memory and scarce computing power, and radio transmission bandwidth is also limited. In short, smart security mechanisms are required to protect against, detect, and recover from severe attacks. The level at which these mechanisms are implemented may differ according to the nature of the attack. The transmitted data need to be confidential, authentic, private, robust, and sustainable and should also maintain forward and backward secrecy [6]. WSN security differs somewhat from that of local and wide area networks because sensor nodes have limited computing power and battery, and are wireless, ad hoc, and unattended. Due to these limitations and constraints, a large number of WSN protocols rely on symmetric-key cryptography. Key management (using global, pairwise, and symmetric keys), encryption, and secure data aggregation are some of the commonly employed security solutions for wireless sensor networks. Different techniques are used to prolong the lifetime of a sensor network [7]. The security requirements for a WSN are confidentiality, integrity, availability, and freshness [20]. Additional requirements may include authentication, access control, privacy, authorization, non-repudiation, and survivability. The paper is organized as follows: Sect. 2 highlights constraints and limitations, Sect. 3 emphasizes the concerned security issues, requirements and the need for providing security in wireless sensor networks, Sect. 4 discusses the attacks/threats on WSN along with the defensive measures required to counter those attacks, and Sect. 5 provides the conclusion, which briefly summarizes the paper.
2 Constraints in Wireless Sensor Network In this section, the limitations and constraints of WSNs, which arise from the large number of interconnected sensor nodes and the sensor field, are discussed. A WSN is not resistant to attacks in which an adversary alters original data, injects corrupt data, captures data, floods a particular node with the same message repeatedly, or performs other such restricted activities, which often result in data loss, misrouting, delays and destruction of network resources, while the nodes are at risk of being physically damaged. A malicious node can be added to exhaust the capabilities of other sensor nodes, affecting the whole environment. Large energy consumption, limited storage/memory and processing ability, higher latency, lower reliability, limited scalability
and fault-tolerance, a risky operating environment, poor data aggregation [8], slow data processing speed, and limited computation and communication capabilities are some of the constraints of the network. Energy is a very important factor in a WSN: it determines the life of a node and is the most important criterion distinguishing a normal node from an attacker node. Therefore, attackers, cluster heads, and the base station are identified based on their different energy levels. Energy is lost whenever a packet is sent or received. The main reasons for energy loss in wireless sensor network communication are discussed below [9].
2.1 Control Packet Overhead Control packets consume more energy than ordinary packets during transmission, reception and listening; it is therefore useful to use fewer control packets for transmission, which reduces the overhead.
2.2 Collision A collision occurs when two or more stations attempt to exchange and transmit data concurrently. When two stations transmit at the same time, the colliding packets are discarded and must be retransmitted, thereby resulting in energy loss.
2.3 Idle Listening Idle listening refers to a sensor listening for incoming packets even when no data is being sent. This wastes the node's energy and depletes the lifetime of the wireless sensor network.
2.4 Overhearing Overhearing is when a node receives a packet for which it is not the addressee, which results in unnecessary traffic and, in turn, energy loss. Maximum lifetime (Max-life) routing, which balances the rate of energy consumption across sensor nodes according to their varying characteristics, and the energy-efficient routing algorithm (EEMR), which improves energy utilization by varying the activities of
the wireless communication modules of the sensor network, are some of the methodologies implemented for these energy-related issues [10].
3 Concerned Security Issues It is very important to know and understand the security issues in WSN before studying attacking techniques and the ways to prevent different attacks, so that counter-attack strategies may be developed more efficiently. A typical WSN consists of many nodes, each assigned a particular energy level, and the malfunction of any node can damage the whole working environment, thereby decreasing the lifespan of the network. Some of the basic security issues in WSN, namely confidentiality, integrity and authenticity, are discussed below.
3.1 Data Authentication Authenticity is the property that someone is who they claim to be and not someone else; authentication is the process of checking authenticity. In a WSN, authentication is the process by which a node makes sure whether the incoming data comes from the correct source or has been injected by an intruder or an unknown surrounding network [11]. Data authentication assures the receiver that the data packet has not been altered during transmission and that the data packets were transmitted by the exact source. Node authentication tells the user about the genuineness of the identity of the sender node. Without authentication, an intruder can inject wrong data into the environment without any obstacle. Generally, public-key cryptography is used to address this problem of data authenticity: the same public key is distributed to all the nodes of a particular network, so a sender node will only transmit data if the receiving node holds that public key, thus verifying the authenticity of the node.
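As a concrete illustration (not the scheme used by any particular WSN protocol), a message authentication code computed over the payload with a key shared between the two nodes lets the receiver verify both the origin and the integrity of a packet. A minimal sketch using Python's standard library, assuming the shared key has already been distributed securely:

```python
# Minimal sketch of symmetric data authentication with an HMAC.
import hmac
import hashlib

SHARED_KEY = b"pre-distributed demo key"   # placeholder key material

def tag(payload: bytes) -> bytes:
    return hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()

def send(payload: bytes) -> bytes:
    return payload + tag(payload)          # append the MAC to the packet

def receive(packet: bytes) -> bytes:
    payload, mac = packet[:-32], packet[-32:]
    if not hmac.compare_digest(mac, tag(payload)):
        raise ValueError("authentication failed: packet altered or from unknown sender")
    return payload

print(receive(send(b"temperature=36.7")))
```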
3.2 Data Confidentiality Confidentiality means keeping information secret from unauthorized parties. Sensor nodes transmit highly sensitive information, and if an attacker node views the data exchanged between two nodes, the confidentiality between those nodes is broken. It is therefore very important to ensure that no intruder can access sensitive data by intercepting the transmission between two particular nodes. The most basic approach to attaining data confidentiality is encryption under a particular key. The data is encrypted
with that key, transmitted, and then decrypted at the receiving node with the help of the same key, so even if the attacker views the data it cannot interpret it, as it does not possess that particular key [12].
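A minimal confidentiality sketch, assuming the third-party `cryptography` package is available on a gateway-class device; a real sensor mote would typically use a lighter-weight cipher:

```python
# Sketch: symmetric encryption so an eavesdropper without the key learns nothing.
# Assumes: pip install cryptography.
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # distributed to the communicating nodes beforehand
f = Fernet(key)

ciphertext = f.encrypt(b"blood_pressure=120/80")   # what travels over the radio link
plaintext = f.decrypt(ciphertext)                  # only a key holder can recover this
assert plaintext == b"blood_pressure=120/80"
```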
3.3 Robustness and Sustainability If a node is attacked and the attacker node replaces the normal node, the working of the whole environment will be affected, as the attacker node now holds the particulars of a node in that environment. The sensor network should be robust against various intruder attacks, and even if an attack succeeds, its impact should be contained or minimized. The system should be constructed so that it can tolerate and adapt to the failure of a particular node, and each node should be designed to be as robust as possible.
3.4 Data Integrity Integrity ensures that no intruder or unknown node can alter or modify the transmitted data: the data reach the acceptor node in their original form, unaltered during transmission. Data authentication is also related to data integrity, since authentication mechanisms typically provide integrity as well.
3.5 Data Freshness Data freshness signifies that no old data should be retransmitted between nodes that have already exchanged the same message, i.e., the data should be fresh. Each node in a WSN is assigned a particular energy level, and energy is spent whenever a node sends, receives, or processes data. In a wireless sensor network, there is a possibility that an intruder captures transmitted data and replays copies of the old data again and again to the nodes in the environment, thereby decreasing the residual energy of the nodes until they are gradually exhausted. An attacker can thus send expired packets throughout the network, wasting network resources and decreasing the lifetime of the system. Data freshness is achieved by including a nonce or a timestamp with each data packet.
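A simple way to realise this, sketched below under the assumption that node clocks are loosely synchronised, is to stamp each packet with a timestamp and a random nonce and to reject anything stale or already seen; in a full design the timestamp and nonce would also be covered by the packet's MAC.

```python
# Sketch: replay/freshness check using a timestamp plus a per-packet nonce.
import os
import time

MAX_AGE = 5.0          # seconds a packet is considered fresh (bounds clock skew and delay)
seen_nonces = set()    # nonces accepted within the freshness window

def make_packet(payload: bytes) -> dict:
    return {"payload": payload, "ts": time.time(), "nonce": os.urandom(8)}

def accept(packet: dict) -> bool:
    fresh = (time.time() - packet["ts"]) <= MAX_AGE
    replay = packet["nonce"] in seen_nonces
    if fresh and not replay:
        seen_nonces.add(packet["nonce"])
        return True
    return False       # stale or replayed packet is dropped, saving the node's energy

p = make_packet(b"reading=21.5")
print(accept(p))   # True  - first, fresh delivery
print(accept(p))   # False - an attacker replaying the same packet is rejected
```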
4 Attacks on WSN WSNs attract a number of severe attacks due to the unfriendly environment, insecure wireless communication channels, energy restrictions, and various other networking circumstances. A malicious or attacker node, once injected into the network, could spread among all the neighboring nodes, potentially destroying the network, disrupting services and taking over control of the entire network. Attacks on WSN are generally classified into two broad categories: passive attacks, against data confidentiality, and active attacks, against both data confidentiality and data integrity. Some of the important attacks are discussed below.
4.1 Denial-of-Service Attack This attack attempts to destroy or paralyze the network resources available to its intended users, either by consuming scarce or limited resources, altering configuration information, or physically destroying network components. The two general forms of the attack are those which crash services and those which flood services. The most important attacking technique is IP address spoofing, whose purpose is to mask or suppress the identity of the sender or to mimic the identity of another node. It results in unusually slow, interrupted network performance and in the inability of the sensor node to perform its designated functions [13].
4.2 Sybil Attack Large-scale, distributed, decentralized and several other types of protocols in WSN are particularly susceptible to such attacks. In an attack earlier known as pseudo-spoofing, a particular sensor node unjustifiably claims multiple identities and appears to reside at multiple places in the networking environment. The Sybil attack has three orthogonal dimensions: direct versus indirect communication, fabricated versus stolen identities, and simultaneity. Defending against such attacks necessitates a one-to-one correspondence between sensor nodes and their individual identities. The Sybil attack can be used to attack several types of protocols in WSN such as distributed protocols, data aggregation, misbehavior detection, voting, fair resource allocation protocols, and routing protocols [14, 15].
Fig. 1 Black hole attack
4.3 Black Hole Attack The attack occurs when a compromised node makes itself appear more attractive than the surrounding nodes with respect to the residual energy level and the routing algorithm used, creating a black hole; the surrounding assemblage is known as the black hole region. The accumulated incoming and outgoing traffic is diverted to this region and the data packets are unable to reach their expected destination. On the defensive side, this region also provides an opportunity to analyze and detect the aberrant activities by applying specially designed security analysis tools to the diverted traffic [16] (Fig. 1).
4.4 Selective Forwarding Selective forwarding is also known as the packet-dropping attack. A malicious node behaving as a normal node receives and selectively analyzes data packets. Such nodes allow only a few packets to pass from one node to another, but refuse to forward certain data packets, suppressing or modifying them and eventually dropping them, thereby increasing congestion and lowering the performance of the network. Such attacks target either the data flow or the control packets and are often used in combination with wormhole/sinkhole attacks [17].
4.5 Wormhole Attack This is one of the severe attacks in wireless sensor networks, in particular against location-based wireless security systems, since it does not require compromising any sensor node and can be initiated even if the system maintains the authenticity, confidentiality, integrity and non-repudiation of the data transferred. The attacker records the data packets (messages/bits) at one particular location in the network and
Fig. 2 Wormhole attack
tunnels them to another location in the network via an established link (wired or wireless), where the data packets can be retransmitted selectively. In effect, the attacker fakes a route and thereby destabilizes and interrupts the routing within the network [18] (Fig. 2).
4.6 Hello Flood Attack This is one of the simplest denial-of-service (DoS) attacks. The attacker, which is not an authorized node, broadcasts HELLO messages to all its neighboring authorized nodes, or simply bombards a targeted node with forged requests and connection success/failure messages, using a high transmission power. A response is expected from the receiving node, which assumes the sender node to be within its normal radio range. However, the attacker node is actually far away from the radio range of the network. The number of packets per second increases, the processing capacity of individual nodes decreases, and the network gets flooded with a tremendously large amount of traffic [19, 21] (Fig. 3; Table 1).
5 Conclusions Sensor networks are popular for mission-critical tasks, and security is immensely required for networks deployed in such hostile environments. Wireless sensor networking is an interesting, fast-emerging field and offers great scope for experimentation in research and development. Through this paper, we have highlighted the basic security issues in WSN, namely confidentiality, integrity, robustness, and authenticity. Various constraints on the network, with a specific focus on the energy constraint, and some applications of sensor networks have also been discussed. The major types of attacks, such as denial of service, black hole, wormhole, hello flood, Sybil and selective forwarding, and their defense strategies have been discussed briefly.
Fig. 3 Hello flood attack
Table 1 Attacks on layers and defensive measures

Layer            Attacks                      Defensive measures
Physical layer   Jamming                      Region mapping, spread-spectrum, priority messages, duty cycle, channel hopping
                 Tampering                    Encryption, tamper proofing, hiding
Data link layer  Collision                    Error-correcting codes, time diversity
                 Resource exhaustion          Limited rate transmission
                 Unfairness                   Small frames
Network layer    Selective forwarding         Multi-path routing, monitoring, upstream and downstream detection
                 Spoofed routing information  Egress filtering, authentication, monitoring
                 Sinkhole                     Redundancy checking, monitoring
                 Wormhole                     Authentication, probing, transmission time-based mechanism, geographical and temporal packet leashes, graphical, topological and connectivity-based approaches
                 Sybil                        Trusted certification, key validation, position verification, resource testing, authentication, redundancy
                 Hello flood                  Verification of the bi-directionality of the link, signal strength detection, identity verification
Transport layer  Flooding                     Client puzzles, rate limitation
                 De-synchronization           Authentication
References 1. G. Lu, B. Krishnamachari, C.S. Raghavendra, An adaptive energy-efficient and low-latency MAC for data gathering in wireless sensor networks. in IEEE IPDPS (2004) 2. B. Heinzelman, A.P. Chandrakasan, H. Balakrishnan, An application-specific protocol architecture for wireless microsensor networks. IEEE Trans. Wireless Commun. 1(4), 660–670 (2002) 3. S. Abidin, Key agreement protocols and digital signature algorithm. Int. J. Curr. Adv. Res. 6(8), 5359–5362 (2017) 4. N. Burri, P. von Rickenbach, R. Wattenhofer, Dozer: ultra-low power data gathering in sensor networks. in ACM/IEEE IPSN (2007) 5. Z. Jiang, J. Ma, W. Lou, J. Wu, A straightforward path routing in wireless ad hoc sensor networks. in IEEE International Conference on Distributed Computing Systems Workshops (2009), pp. 103–108 6. S. Bashyal, G.K. Venayagamoorthy, Collaborative Routing Algorithm for Wireless Sensor Network Longevity (IEEE, 2007) 7. J. Burrell, T. Brooke, R. Beckwith, Vineyard computing: sensor networks in agricultural production. Pervas. Comput. IEEE 3(1), 38–45 (2004) 8. S. Kim, S. Pakzad, D. Culler, J. Demmel, G. Fenves, S. Glaser, M. Turon, Health Monitoring of Civil Infrastructures using Wireless Sensor Networks. in ACM/IEEE IPSN (2007) 9. M. Ceriotti, L. Mottola, G.P. Picco, A.L. Murphy, S. Guna, M. Corra, M. Pozzi, D. Zonta, P. Zanon, Monitoring heritage buildings with wireless sensor networks: the torre aquila deployment. in ACM/IEEE IPSN (2009) 10. K. Lorincz, D. Malan, T.R.F. Fulford-Jones, A. Nawoj, A. Clavel, V. Shnayder, G. Mainland, S. Moulton, M. Welsh, Sensor networks for emergency response: challenges and opportunities. IEEE Pervas. Comput. Special Issue. Pervas. Comput. First Resp. (2004) 11. S. Abidin, A novel construction of secure RFID authentication protocol. Int. J. Sec. Comput. Sci. J. Malaysia 8(8), 33–36 (2014) 12. N.M. Durrani, N. Kafi, J. Shamsi, W. Haider, A.M. Abbsi, Secure Multi-hop Routing Protocols in Wireless Sensor Networks: Requirements, Challenges and Solutions (IEEE, 2013) 13. V. Bulbenkiene, S. Jakovlev, G. Mumgaudis, G. Priotkas, Energy loss model in Wireless Sensor Networks. in IEEE Digital Information Processing and communication (ICDIPC), 2012 Second International conference (2012), pp. 36–38 14. J.H. Chang, L. Tassiulas, Maximum lifetime routing in wireless sensor networks. IEEE/ACM Trans. Netw. 12(4), 609–619 (2004) 15. M. Zhang, S. Wang, C. Liu, H. Feng, An Novel Energy-Efficient Minimum Routing Algorithm (EEMR) in Wireless Sensor Networks (IEEE, 2008) 16. M. Saraogi, SecurityiIn Wireless Sensor Networks (University of Tennessee, Knoxville) 17. A. Jain, K. Kant, M.R. Tripathy, Security solutions for wireless sensor networks. in 2012, Second International Conference on Advanced Computing & Communication Technologies 18. C. Karlof, D. Wanger, Secure routing in wireless sensor network: attacks and countermeasures. in First IEEE International Workshop on Network Protocols and Applications (2013), pp. 113– 127 19. M. Ahuja, S. Abidin, Performance analysis of vehicular ad-hoc network. Int. J. Comput. Appl. USA 151(7) 28–30 (2016) 20. N. Kumar, A. Mathuria, Improved Write Access Control and Stronger Freshness Guarantee to Outsourced Data. (ICDCN, 2017) 21. W. Feng, J. Liu, Networked wireless sensor data collection: issues, challenges, and approaches. IEEE Commun. Surv. Tutor. (2011)
Networking Analysis and Performance Comparison of Kubernetes CNI Plugins Ritik Kumar and Munesh Chandra Trivedi
Abstract Containerisation has recently proved to be a better approach than virtualisation for deploying large-scale applications. Containers provide a small and compact environment, containing all libraries and dependencies, in which to run an application. It is also acknowledged that application deployment on a multi-node cluster is more efficient in terms of cost, maintenance and fault tolerance than single-server application deployment. Kubernetes plays a vital role in container orchestration, deployment, configuration, scalability and load balancing, and Kubernetes networking enables container-based virtualisation. In this paper, we discuss Kubernetes networking in detail and try to give an in-depth view of the communication that takes place internally. We provide a detailed analysis of all the aspects of Kubernetes networking, including pods, deployments, services, ingress, network plugins and multi-host communication. We also provide an in-depth comparison of various container network interface (CNI) plugins and compare the results of benchmark tests conducted on various network plugins with performance under consideration (Ducastel, Benchmark results of Kubernetes network plugins (CNI) over 10 Gbit/s network [1]). Keywords Networking analysis · Performance comparison · Kubernetes · CNI · Docker · Plugins · Containers · Pod · Virtualisation · Cluster
R. Kumar (B) · M. C. Trivedi Department of Computer Science and Engineering, National Institute of Technology, Agartala, Agartala, India e-mail: [email protected] M. C. Trivedi e-mail: [email protected] URL: https://bit.ly/2Zz4pB2 © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_9
1 Introduction With increasing traffic to online applications, scalability, load balancing, fault tolerance, configuration, rolling updates and maintenance remain areas of concern. The traditional method of solving these issues is virtualisation. Creating a virtual machine requires hardware resources, processing power and memory just as an operating system would, and this large resource requirement limits the number of virtual machines that can be created. Moreover, the overhead of running virtual machines remains a deep concern. Although virtual machines are still widely used in the Infrastructure as a Service (IaaS) space, Linux containers are dominating the Platform as a Service (PaaS) landscape [2]. Containerisation comes to the rescue given such disadvantages of virtualisation. In container-based virtualisation, we create a small, compact environment comprising all the libraries and dependencies needed to run the application. Various research studies have shown that containers incur negligible overhead, perform on par with native deployment [3] and take much less space [4]. In a real-world scenario, deploying a large-scale application as a single service is not feasible; rather, it should be fragmented into small microservices and deployed on a multi-node cluster inside containers using container orchestration tools such as Kubernetes and Docker Swarm. Kubernetes is highly portable, configurable and modular and is widely supported across public clouds such as Amazon AWS, IBM Cloud, Microsoft Azure, and Google Cloud [5]. Deployment on a multi-node cluster not only reduces the overhead but also makes deployment and maintenance economically viable. For instance, large-scale applications such as eBay and Pokemon Go are deployed on Kubernetes. Containers can be launched within a few seconds and can be short-lived: Google Search launches about 7,000 containers every second [6], and a survey of 8 million containers shows that 27% of containers have a lifespan of less than 5 min and 11% of less than 1 min [7]. Container networking becomes very important when containers are created and destroyed so often [8]. Systems like Kubernetes are designed to provide all the primitives necessary for microservice architectures, and for these the network is vital [9]. Using Kubernetes, we can create a tightly bound cluster with one master node and multiple slave nodes. In this chapter, we discuss in detail all the aspects of Kubernetes networking, the various third-party plugins that Kubernetes supports, container-to-container communication, container-to-host communication and host-to-host communication. At the same time, we compare the available CNI plugins on various parameters to provide an in-depth analysis (Sect. 4).
Fig. 1 Pod network
2 Components of Kubernetes Networking
2.1 Pod Network
In Kubernetes networking, the pod is the most fundamental unit, similar to what the atom is to matter or a cell is to the human body. A pod is a small environment consisting of one or more containers, an infrastructure (pause) container and volumes. The idea of the pod was introduced in Kubernetes to reduce the overhead when two containers need to communicate very frequently. The infrastructure container is created to hold the network and inter-process communication namespaces shared by all the containers in the pod [10]. Inside a pod, all the containers communicate through a shared network stack. Figure 1 depicts veth0 as the shared network stack between two containers inside a pod. Both containers are addressable with the IP 172.19.0.2 from outside the pod. As both containers have the same IP address, they are differentiated by the ports they listen on. Apart from this, each pod has a unique IP which is referenced by other pods in the cluster and through which data packets are transmitted and received. The shared network stack veth0 is connected to eth0 via the cbr0 bridge (a custom bridge network) for communication with other nodes in the cluster. An overall address space is assigned for the bridges in each node, and each bridge is then assigned an address within this space depending on the node. The concept of a custom bridge network is adopted instead of the docker0 bridge network to avoid IP address conflicts between bridges in different nodes or namespaces. Whenever a packet is transmitted, it goes via the router or external gateway which is connected to each of the nodes in the cluster.
Fig. 2 Pod network
Figure 2 shows two nodes connected via the router or external gateway. When a pod sends a data packet to a pod on a different node in the network, it goes via the router. On receiving a data packet at the gateway, routing rules are applied; the rules specify how data packets destined for each bridge should be routed. This combination of virtual network interfaces, bridges and routing rules is known as an overlay network. In Kubernetes, it is known as a pod network, as it enables pods on different nodes to communicate with each other.
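To make the shared-network-namespace behaviour concrete, the sketch below builds a minimal two-container pod with the official Kubernetes Python client. This is an illustrative example rather than something taken from the paper: the image names, pod name and namespace are placeholders, and a working kubeconfig is assumed.

```python
# Illustrative sketch: two containers in one pod share the pod's network
# namespace, so they reach each other on localhost and expose a single pod IP.
# Assumes the `kubernetes` Python client is installed and kubectl access works.
from kubernetes import client, config

def build_two_container_pod() -> client.V1Pod:
    web = client.V1Container(
        name="web",
        image="nginx:1.25",                      # placeholder image
        ports=[client.V1ContainerPort(container_port=80)],
    )
    sidecar = client.V1Container(
        name="sidecar",
        image="busybox:1.36",                    # placeholder image
        # The sidecar reaches the web container on localhost:80 because both
        # containers share the pod network stack (veth0 in Fig. 1).
        command=["sh", "-c",
                 "while true; do wget -qO- http://127.0.0.1:80 > /dev/null; sleep 5; done"],
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(name="shared-netns-demo"),
        spec=client.V1PodSpec(containers=[web, sidecar]),
    )

if __name__ == "__main__":
    config.load_kube_config()                    # uses the local kubeconfig
    core = client.CoreV1Api()
    core.create_namespaced_pod(namespace="default", body=build_two_container_pod())
```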
2.2 Service Networking
The pod network is robust but not durable, as pods are not long-lasting. A pod's IP address cannot be taken as an endpoint for sending packets in the cluster, because the IP changes every time a new pod is created. Kubernetes solves this problem with services. A service is a collection of pods and acts as a reverse-proxy load balancer [11]. The service contains a list of pods to which it redirects client requests. Kubernetes also provides additional tools to check the health of pods. A service also enables load balancing, i.e. external traffic is distributed equally among all the pods. There are four types of services in Kubernetes: ClusterIP, NodePort, LoadBalancer and ExternalName. Similar to the pod network, the service network is also virtual [12]. Each service in Kubernetes has a virtual IP address, and Kubernetes provides an internal cluster DNS which maps the service IP address to the service name. The service IP is reachable from anywhere inside the cluster. Each node in the cluster runs a kube-proxy. The kube-proxy is a network proxy that passes traffic between the client and the servers; it redirects client requests to the server. The kube-proxy supports three proxy modes: userspace, IPTables and IPVS. Whenever a packet is sent from a client to a server, it passes the bridge at the client node and then moves to the router. The kube-proxy running on the nodes determines the path along which the packet is to be routed, the IPTables rules are applied and the packet is forwarded to its destination. Thus, this makes up the service network. The service proxy system is durable and provides effective routing of packets in the network. The kube-proxy runs on each node and is restarted whenever it fails. The service network ensures a durable and secure means of transmitting data packets within the cluster.
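As a minimal illustration of the service abstraction described above, the following sketch (again using the Kubernetes Python client, with hypothetical names and labels) creates a ClusterIP service that load-balances across all pods carrying the label app=web.

```python
# Illustrative sketch: a ClusterIP service gives a stable virtual IP and DNS
# name in front of short-lived pods selected by label. Names are placeholders.
from kubernetes import client, config

def build_web_service() -> client.V1Service:
    return client.V1Service(
        metadata=client.V1ObjectMeta(name="web-svc"),
        spec=client.V1ServiceSpec(
            type="ClusterIP",
            selector={"app": "web"},             # pods matching this label become endpoints
            ports=[client.V1ServicePort(port=80, target_port=80)],
        ),
    )

if __name__ == "__main__":
    config.load_kube_config()
    core = client.CoreV1Api()
    svc = core.create_namespaced_service(namespace="default", body=build_web_service())
    # The kube-proxy on every node now programs rules (IPTables/IPVS) so that
    # traffic sent to the service's virtual IP is spread across the selected pods.
    print("service virtual IP:", svc.spec.cluster_ip)
```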
2.3 Ingress and Egress
The pod network and the service network are complete in themselves for communication between pods inside the cluster. However, the services cannot be discovered by external traffic. In order to access the services through HTTP or HTTPS requests, an ingress controller is used. Traffic routing is controlled by the rules defined in the ingress resource. Figure 3 depicts how external requests are directed to the services via ingress.
Fig. 3 Ingress
The egress network lets pods communicate with the outside world. It helps establish a communication path, on top of network policies, between the pods in the cluster and services outside it.
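For completeness, the sketch below defines a simple ingress rule with the Python client's networking API. The host name, service name and ingress class are hypothetical, a recent client version is assumed, and an ingress controller (e.g. NGINX) is assumed to already be installed in the cluster.

```python
# Illustrative sketch: route external HTTP traffic for demo.example.com
# to the ClusterIP service "web-svc" on port 80. All names are placeholders.
from kubernetes import client, config

def build_ingress() -> client.V1Ingress:
    backend = client.V1IngressBackend(
        service=client.V1IngressServiceBackend(
            name="web-svc",
            port=client.V1ServiceBackendPort(number=80),
        )
    )
    rule = client.V1IngressRule(
        host="demo.example.com",
        http=client.V1HTTPIngressRuleValue(
            paths=[client.V1HTTPIngressPath(path="/", path_type="Prefix", backend=backend)]
        ),
    )
    return client.V1Ingress(
        metadata=client.V1ObjectMeta(name="web-ingress"),
        spec=client.V1IngressSpec(ingress_class_name="nginx", rules=[rule]),
    )

if __name__ == "__main__":
    config.load_kube_config()
    client.NetworkingV1Api().create_namespaced_ingress(namespace="default", body=build_ingress())
```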
3 Kubernetes Architecture
A Kubernetes cluster comprises a master node and one or more worker nodes. The master node runs the kube-api server, kube-controller manager, kube-scheduler and etcd [11]. The kube-api server acts as an interface between the user and the Kubernetes engine; it creates, reads, updates and deletes pods in the cluster and provides a shared frontend through which all the components interact. The controller manager watches the state of the cluster and shifts it towards the desired state by maintaining the desired number of pods in the cluster. The scheduler is responsible for scheduling pods onto the most appropriate nodes in the cluster. etcd is a key-value database which stores the configuration data of the cluster and also represents the state of the cluster. Each worker node runs a kubelet. The kubelet is a Kubernetes agent running on each node which is responsible for communication between the master and the worker nodes; it executes the commands received from the master node on the worker nodes.
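The short sketch below, again using the Python client and assuming a configured kubeconfig, shows the kube-api server acting as the shared frontend: worker-node and pod state is read through it rather than from the nodes directly.

```python
# Illustrative sketch: query cluster state through the kube-api server.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Worker nodes registered by their kubelets.
for node in core.list_node().items:
    print("node:", node.metadata.name)

# Pods in all namespaces, with the pod IP assigned from the pod network.
for pod in core.list_pod_for_all_namespaces(watch=False).items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.pod_ip)
```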
4 Container Network Interfaces (CNI)
Kubernetes offers a lot of flexibility in the kind of networking we want to establish. Instead of providing its own networking implementation, Kubernetes takes a plugin-based, generic approach to container networking. CNI consists of a specification and libraries for writing plugins to configure network interfaces in Linux containers, along with a number of supported plugins [13]. The CNI plugin is responsible for IP address assignment in the cluster and for providing routes for communication inside the cluster. CNI is capable of addressing container IP addresses without resorting to network address translation. The various network plugins built on top of CNI which can be used with Kubernetes include Flannel, Calico, WeaveNet, Contiv, Cilium, Kube-router and Romana. A detailed comparison of the plugins is shown below [14–20].
Comparison of CNI plugins:
Flannel: language Go; IP version IPv4; network policy none; encryption none; network model Layer 2; Layer 2 encapsulation VXLAN; routing IP tables; Layer 3 encapsulation none; Layer 4 route distribution none; load balancing none; datastore K8s etcd; platforms Windows, Linux.
Calico: language Go; IP version IPv4, IPv6; network policy Ingress, Egress; encryption none; network model Layer 3; Layer 2 encapsulation none; routing IP tables, kube-proxy; Layer 3 encapsulation IPIP; Layer 4 route distribution BGP; load balancing available; datastore state stored in the K8s API datastore; platforms Windows, Linux.
WeaveNet: language Go; IP version IPv4; network policy Ingress, Egress; encryption NaCl crypto libraries; network model Layer 2; Layer 2 encapsulation VXLAN; routing IP tables, kube-proxy; Layer 3 encapsulation Sleeve; Layer 4 route distribution none; load balancing available; datastore storage in pods; platforms Linux.
Contiv: language Go; IP version IPv4, IPv6; network policy Ingress, Egress; encryption none; network model Layer 2 and Layer 3; Layer 2 encapsulation VXLAN; routing IP tables; Layer 3 encapsulation VLAN; Layer 4 route distribution BGP; load balancing available; datastore K8s etcd; platforms Linux.
Cilium: language Go; IP version IPv4, IPv6; network policy Ingress, Egress; encryption IPSec tunnels; network model Layer 2; Layer 2 encapsulation VXLAN, Geneve; routing BPF, kube-proxy; Layer 3 encapsulation none; Layer 4 route distribution none; load balancing available; datastore K8s etcd; platforms Linux.
Kube Router: language Go; IP version IPv4; network policy Ingress; encryption none; network model Layer 3; Layer 2 encapsulation none; routing IP tables, IPVS, IP sets; Layer 3 encapsulation GRE/IPIP; Layer 4 route distribution BGP; load balancing available (IPVS/LVS DR mode); datastore K8s etcd; platforms Linux.
Romana: language Bash; IP version IPv4; network policy Ingress, Egress; encryption none; network model Layer 3; Layer 2 encapsulation none; routing IP tables; Layer 3 encapsulation none; Layer 4 route distribution BGP, OSPF; load balancing available; datastore K8s etcd; platforms Linux.
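To give a concrete sense of how a node is wired to one of these plugins, the sketch below prints a node-level CNI network configuration similar to the one Flannel typically installs; the exact fields vary between plugin versions, so this should be read as an illustration of the conflist format rather than a definitive configuration.

```python
# Illustrative sketch: what a node-level CNI network configuration looks like.
# This mirrors a typical Flannel conflist (fields vary between plugin versions);
# kubelet hands this JSON to the CNI plugin binaries when creating a pod sandbox.
import json

flannel_conflist = {
    "name": "cbr0",
    "cniVersion": "0.3.1",
    "plugins": [
        {
            # Main plugin: Flannel delegates interface creation to the bridge
            # plugin and assigns the pod an address from the node's pod CIDR.
            "type": "flannel",
            "delegate": {"hairpinMode": True, "isDefaultGateway": True},
        },
        {
            # Chained plugin: supports hostPort mappings for pods.
            "type": "portmap",
            "capabilities": {"portMappings": True},
        },
    ],
}

# On a node this would typically live in a file under /etc/cni/net.d/.
print(json.dumps(flannel_conflist, indent=2))
```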
5 Table Showing Comparison of CNI Plugins
Conclusions from the above table:
1. All plugins support ingress and egress network policies except Flannel.
2. Only WeaveNet and Cilium offer encryption, thus providing security during transmission; encryption and decryption increase the transmission time of packets.
3. WeaveNet and Contiv offer both Layer 2 and Layer 3 encapsulation.
4. Romana does not offer any encapsulation.
5. Flannel, WeaveNet and Cilium do not support any Layer 4 route distribution.
6. All network plugins have an inbuilt load balancer except Flannel.
6 Performance Comparison
The main aim of all CNI plugins is to provide network abstraction for container clustering tools. The availability of so many network plugins makes it difficult to choose one, considering their respective advantages and disadvantages. In this section, we compare the CNI plugins, keeping performance under consideration, using benchmark results obtained by testing the network plugins on Supermicro bare-metal servers connected through a Supermicro 10 Gbit switch [1].
Testing Setup
1. The test was performed by Alexis Ducastel on three Supermicro bare-metal servers connected through a Supermicro 10 Gbit switch [1].
2. The Kubernetes (v1.14) cluster was set up on Ubuntu 18.04 LTS using Docker 18.09.2.
3. The test compares Flannel (v0.11.0), Calico (v3.6), Canal (Flannel for networking + Calico for firewall) (v3.6), Cilium (v1.4.2), Kube-Router (v0.2.5) and WeaveNet (v2.5.1).
4. The tests were conducted after configuring the servers and switch with jumbo frames activated (by setting the MTU to 9000).
5. For each parameter, the test was conducted three times and the average of the three runs was noted.
6.1 Testing Parameters
1. The performance was tested on the basis of the bandwidth offered by the plugins for different networking protocols such as TCP, UDP, HTTP, FTP and SCP.
2. The plugins were also compared on the basis of resource consumption, namely RAM and CPU usage.
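As a hedged illustration of how such bandwidth figures can be gathered, the sketch below drives iperf3 between two pods through kubectl exec. It is not the harness used in [1]; the pod names and namespace are placeholders, and iperf3 is assumed to be installed in both pods, with the server pod already running `iperf3 -s`.

```python
# Illustrative sketch (not the benchmark tooling of [1]): measure pod-to-pod
# TCP and UDP bandwidth with iperf3, driven through kubectl exec.
# Assumes two pods in the "bench" namespace: "iperf-server" and "iperf-client".
import json
import subprocess

def kubectl_exec(pod: str, command: list, namespace: str = "bench") -> str:
    result = subprocess.run(
        ["kubectl", "exec", "-n", namespace, pod, "--"] + command,
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def measure(server_ip: str, udp: bool = False, seconds: int = 10) -> float:
    cmd = ["iperf3", "-c", server_ip, "-t", str(seconds), "-J"]
    if udp:
        cmd += ["-u", "-b", "0"]                 # unlimited-rate UDP test
    report = json.loads(kubectl_exec("iperf-client", cmd))
    stream = report["end"]["sum" if udp else "sum_received"]
    return stream["bits_per_second"] / 1e6       # Mbit/s

if __name__ == "__main__":
    server_ip = kubectl_exec("iperf-server", ["hostname", "-i"]).strip()
    print("TCP Mbit/s:", round(measure(server_ip)))
    print("UDP Mbit/s:", round(measure(server_ip, udp=True)))
```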
6.2 Test Results
6.2.1 Bandwidth Test for Different Network Protocols
In this test, the different network plugins were tested with various network protocols, namely TCP, UDP, HTTP, FTP and SCP, and the bandwidth offered by each plugin for each protocol was recorded. The table below summarises the results. All results were recorded at a custom MTU of 9000.
Bandwidth per network protocol (Mbit/s):
Network plugin      TCP    UDP    HTTP   FTP    SCP
Flannel             9814   9772   9010   9295   1653
Calico              9838   9830   9110   8976   1613
Canal               9812   9810   9211   9069   1612
WeaveNet            9803   9681   7920   8121   1591
Cilium              9678   9662   9131   8771   1634
Kube router         9762   9892   9375   9376   1681
Cilium encrypted    815    402    247    747    522
WeaveNet encrypted  1320   417    351    1196   705
Note: The bandwidth is measured in Mbit/s (higher is better).
Conclusions from the bandwidth tests:
1. For the transmission control protocol (TCP), Calico and Flannel stand out as the best CNIs.
2. For the user datagram protocol (UDP), Kube-router and Calico offered the highest bandwidth.
3. For the hypertext transfer protocol (HTTP), Kube-router and Canal offered the highest bandwidth.
4. For the file transfer protocol (FTP) and the secure copy protocol (SCP), Kube-router and Flannel offered the highest bandwidth.
5. Between the encrypted plugins, WeaveNet offered higher bandwidth than Cilium for all of the network protocols.
Resource consumption per network plugin:
Network plugin                     RAM   CPU
Bare metal (no plugin, only K8s)   374   40
Flannel                            427   57
Calico                             441   59
Canal                              443   58
Cilium                             781   111
WeaveNet                           501   89
Kube router                        420   57
Cilium encrypted                   765   125
WeaveNet encrypted                 495   92
Note: The RAM usage (without cache) is reported in MB and the CPU usage in percent (lower is better).
Conclusions from the resource utilisation tests:
1. Kube-router and Flannel show very good resource utilisation, consuming the least CPU and RAM among all the network plugins.
2. Cilium consumes considerably more memory and processing power than the other network plugins.
3. For encrypted networking, WeaveNet offers better performance than Cilium.
7 Conclusion
The pods, services, ingress and CNIs collectively make Kubernetes networking durable and robust. The Kubernetes architecture (including the pods, services, kube-api server, kube-scheduler, kube-controller manager and etcd) manages the entire communication that takes place inside the cluster. The CNI model lets us choose from a variety of plugins according to our requirements. From the comparison of the various plugins, none stood out as the most efficient on all parameters. The following conclusions can be drawn from the comparison results:
1. Flannel is easy to set up, auto-detects the MTU, offers on average good bandwidth for transmitting packets under all the protocols and has very low CPU and RAM utilisation. However, Flannel does not support network policies and does not provide a load balancer.
2. Calico and Kube-router on average offer good bandwidth for transmitting packets, support network policies, provide a load balancer and have low resource utilisation. At the same time, neither of them auto-detects the MTU.
3. Among the encrypted networks, WeaveNet stands out as better than Cilium in most aspects, including the bandwidth offered to the networking protocols and the resource utilisation (CPU and RAM).
References 1. A. Ducastel, Benchmark results of Kubernetes network plugins (CNI) over 10 Gbit/s network. Available at https://itnext.io/benchmark-results-of-kubernetes-network-plugins-cniover-10gbit-s-network-updated-april-2019-4a9886efe9c4. Accessed on Aug 2019 2. L.H. Ivan Melia, Linux containers: why they’re in your future and what has to happen first, White paper. Available at http://www.cisco.com/c/dam/en/us/solutions/collateral/datacenter-virtualization/openstack-at-cisco/linux-containers-white-paper-cisco-red-hat.pdf. Downloaded in Aug 2019 3. R.R.W. Felter, A. Ferreira, J. Rubio, An updated performance comparison of virtual machines and linux containers, in Published at the International Symposium on Performance Analysis of Systems and Software (ISPASS) (IEEE, Philadelphia, PA, 2015), pp. 171–172 4. A.R.R.R. Dua, D. Kakadia, Virtualization vs containerization to support paas, in Published in the proceedings of IEEE IC2E (2014) 5. W.F. Cong Xu, K. Rajamani, NBWGuard: realizing network QoS for Kubernetes, in Published in proceeding Middleware ’18 Proceedings of the 19th International Middleware Conference Industry 6. A. Vasudeva, Containers: the future of virtualization & SDDC. Approved by SNIA Tutorial Storage Networking Industry Association (2015). Available at https://goo.gl/Mb3yFq. Downloaded in Aug 2019 7. K. McGuire, The Truth about Docker Container Lifecycles. Available at https://goo.gl/Wcj894. Downloaded in Aug 2019 8. Official Docker Documentation. Available at https://docs.docker.com/network/. Accessed on Aug 2019 9. R.J. Victor Marmol, T. Hockin, Networking in containers and container clusters. Published in Proceedings of netdev 0.1 (Ottawa, Canada, 2015). Downloaded in Aug 2019
10. A. Kanso, H. Huang, A. Gherbi, Can linux containers clustering solutions offer high availability. Published by semantics scholar (2016) 11. Official Kubernetes Documentation Available at https://kubernetes.io/docs/concepts/. Accessed on Aug 2019 12. M. Hausenblas, Container Networking from Docker to Kubernetes 13. Container Networking Interface. Available at https://github.com/containernetworking/cni. Accessed on Aug 2019 14. Flannel. Available at https://github.com/coreos/flannel. Accessed on Aug 2019 15. Calico. Available at https://docs.projectcalico.org/v3.8/introduction/. Accessed on Aug 2019 16. WeaveNet. Available at https://www.weave.works/docs/net/latest/overview/. Accessed on Aug 2019 17. Contiv. Available at https://contiv.io/documents/networking/. Accessed on Aug 2019 18. Cilium. Available at https://cilium.readthedocs.io/en/v1.2/intro/. Accessed on Aug 2019 19. Kube Router. Available at https://github.com/cloudnativelabs/kube-router. Accessed on Aug 2019 20. Romana. Available at https://github.com/romana/romana. Accessed on Aug 2019
Classifying Time-Bound Hierarchical Key Assignment Schemes Vikas Rao Vadi, Naveen Kumar, and Shafiqul Abidin
Abstract A time-bound hierarchical key assignment scheme (TBHKAS) ensures time-restricted access to hierarchical data. A large number of such schemes have been proposed in the literature. Crampton et al. studied the existing hierarchical key assignment schemes in detail and classified them into generic schemes on the basis of their storage and computational requirements. Such a generic classification helps implementers and researchers working in this area to identify a suitable scheme for their work. This work studies TBHKASs and classifies them in the same spirit. To the best of our knowledge, the proposed classification captures every existing TBHKAS in the literature. Furthermore, the proposed generic schemes are compared and analyzed in detail.
Keywords Hierarchical access control · Key management · Time-bound
1 Introduction
A hierarchical access control system defines access control in a hierarchy for controlled information flow with the help of a key called the access key. A hierarchy is composed of security classes with a partial-order relation between them. A security class consists of a number of users and resources sharing some common security primitives. Let there be m security classes in the hierarchy, say C1, C2, …, Cm, having a partially ordered relationship between them denoted by V. R. Vadi TIPS, New Delhi, India e-mail: [email protected] N. Kumar (B) IIIT Vadodara, Gandhinagar, India e-mail: [email protected] S. Abidin HMR Institute of Technology & Management, (GGSIPU), New Delhi, Delhi, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_10
the binary relation '≤'. Ci ≤ Cj denotes that class Ci is at a lower security level than class Cj, and users of class Cj are authorized to access the resources of class Ci and its descendants. There are other applications which require time-bound access control in the hierarchy, for example: first, a digital pay-TV system, where the service provider organizes the channels into several possible subscription packages and a user is authorized to access his subscription package for a fixed amount of time; second, an electronic journal subscription system, where a user can subscribe to any combination of available journals for a specified amount of time; and third, an electronic newspaper subscription system, where a user can access news online for a specified period of time. In a time-bound hierarchical key assignment scheme (TBHKAS), the total system time T is divided into z distinct time slots named t0, t1, . . . , tz−1. Each time slot tx, 0 ≤ x ≤ z − 1, is of the same size, for example, an hour or one month. The system time T is long enough that it does not constrain the system. In a TBHKAS, each user associated with a class can access the resources of his own class and those of its descendant classes for an authorized interval of time such as ta to tb, t0 ≤ ta ≤ tb ≤ tz−1. Tzeng [1] in 2002 proposed the first TBHKAS, which was based on RSA. A central authority (CA) is assumed in the system. The CA securely assigns secret information I(i, ta, tb) to each user at the time of registration, and the user is authorized to access the resources and data of the descendant classes within the specified time interval ta to tb. Class Ci's data is encrypted with a key ki,t for time slot t. To decrypt the data of an authorized class Cj for an authorized time slot t, ta ≤ t ≤ tb, the user needs to compute the decryption key kj,t using the secret information I(i, ta, tb) and public data. A user associated with class Ci is authorized to access class Cj's resources, Cj ≤ Ci, using the encryption key kj,t during the time interval ti to tj (0 ≤ ti ≤ tj ≤ z − 1). Once an encryption key expires, the user is no longer able to access any resource with the help of the expired key. Bertino et al. [2] immediately adopted Tzeng's scheme and showed that it is readily applicable to broadcasting XML documents securely on the Internet. However, Yi and Ye [3] found a collusion attack on Tzeng's [1] scheme, showing that three users can conspire to access some class encryption keys which they are not authorized to access. Another scheme was proposed by Tzeng [4] based on an anonymous authentication mechanism, so that the user's identity remains unknown while retrieving data from a website or a large database system over the Internet; the system uses a TBHKAS to control the authorization. Yeh [5] proposed another RSA-based TBHKAS with a key derivation cost similar to that of Tzeng's scheme [1] and claimed that the scheme is collusion secure. However, in 2006 Ateniese et al. [6] identified a two-party collusion attack on the scheme, which also carried over to a later scheme proposed by Yeh [7]. An alternative scheme was proposed by Wang and Laih [8] over the Akl-Taylor [9] scheme. It uses the idea of merging keys, where multiple keys corresponding to a class's users can be combined (known
as an aggregate key). Generating and deriving a key from an aggregate key requires only one modular exponentiation. Later, the scheme of Yeh and Shyam [10] improved on the Wang-Laih scheme and theoretically showed a polynomial improvement in both memory and performance requirements. Shyam [11] carried out a performance analysis of the two schemes and showed that the scheme in [10] performs better than the Wang-Laih scheme [8]. Another RSA-based time-bound key assignment scheme was proposed by Huang and Chang [12]. It uses a distinct key for each time period, so that if a user is revoked from a class all related communication can be stopped immediately; the objective is to reduce any possible damage over an insecure channel. The algorithms for key generation and derivation are simple, and dynamic operations such as class addition and deletion are given. The scheme improves the public storage cost compared with Tzeng's [1] scheme. However, a two-party collusion attack on the scheme was later given by Tang and Mitchell [13]. A new time-bound hierarchical key assignment scheme was proposed by Chien [14] based on low-cost tamper-proof devices, where even the owner cannot access the protected data on the tamper-resistant device. In contrast to Tzeng's [1] approach, the tamper-proof device mainly performs low-cost one-way hash functions. Compared with Tzeng's [1] scheme, and without public-key cryptography, Chien's scheme reduces both computation and implementation cost. The scheme uses indirect key derivation with independent keys. Yi [15] in 2005 showed that this scheme is insecure against a collusion attack whereby three users conspire to access some secret class keys that they should not know according to Chien's scheme, although in a manner not as sophisticated as the attack against Tzeng's scheme. In 2006, Alfredo et al. [16] gave a few solutions to the issues found in Chien's [14] scheme. Bertino et al. [17] in 2008 proposed another secure time-bound HKAS for secure broadcasting based on a tamper-resistant device. The security of this scheme relies on the hardness of the discrete logarithm problem over elliptic curves on a finite field, which requires more computation than Chien's [14] scheme. Sun et al. [18] in 2009 showed that the scheme in [17] is more vulnerable than Chien's [14] scheme, aside from its increased computation requirement. They also suggested a fix to Chien's scheme that avoids inefficient elliptic-curve cryptography.
1.1 Motivation
Researchers have done a lot of work on time-bound hierarchical access control (TBHAC) after Tzeng's [1] work; however, the majority of the schemes either have design issues, incur significant computation cost, or are not practical to implement. Researchers are still trying to find practical and cost-effective solutions to real problems. To move toward an effective solution, a classification and analysis of the existing TBHKASs is needed to help researchers working in this field move in the right direction. Crampton et al. [19] in 2006 classified the existing non-time-bound HKASs from the literature and gave a high-level comparison between them. Later, Kumar et al. [20] extended their classification to include indirect key derivation schemes with dependent keys. In this paper, we classify the TBHKASs from the literature and give a high-level comparison between them without reference to their implementation details.
1.2 Notations Used
We follow the notations of Crampton et al. [19]. The notations used in this paper are as follows:
• λ is a time interval between time slots t1 and t2, t1, t2 ∈ {t0, . . . , tz−1}
• Sx,λ is the secret information corresponding to class x and time interval λ
• Pub is the set of values stored in the public storage of the system
• kx,t is the key of the class labeled x for a time slot t
x · y1> k(y) using the above function g and produce k(y). Scheme 4 is an indirect scheme. It requires single-value private storage and comparable public storage with respect to scheme 2 and scheme 3. The advantages of such
schemes are their low key generation cost and low key derivation cost in comparison with the other schemes, since they use low-cost hash functions rather than exponentiation. The disadvantage of this type of scheme is that it requires some secure mechanism with each user, such as a tamper-proof device, so that all authorized derivation keys can be derived locally without revealing the secret information to the user. Schemes of this type in the literature include [14, 16–18, 21–25].
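As a hedged illustration of why hash-based indirect derivation is cheap, the sketch below implements the generic dual hash-chain idea that underlies several such schemes (e.g. the tamper-proof-device approach of Chien [14]). It is a simplified toy for a single class, not a faithful implementation of any cited scheme; the slot count, seeds and use of SHA-256 are assumptions.

```python
# Minimal sketch (not any specific published scheme) of dual hash-chain,
# time-bound key derivation: forward chain bounds the start slot, backward
# chain bounds the end slot, and H is a one-way function (SHA-256 here).
import hashlib

Z = 32  # total number of time slots (illustrative)

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def iterate(h: bytes, n: int) -> bytes:
    for _ in range(n):
        h = H(h)
    return h

class CentralAuthority:
    def __init__(self, class_secret: bytes, fwd_seed: bytes, bwd_seed: bytes):
        self.class_secret = class_secret
        self.fwd_seed = fwd_seed      # forward chain: f_t = H^t(fwd_seed)
        self.bwd_seed = bwd_seed      # backward chain: b_t = H^(Z-1-t)(bwd_seed)

    def key_for_slot(self, t: int) -> bytes:
        f_t = iterate(self.fwd_seed, t)
        b_t = iterate(self.bwd_seed, Z - 1 - t)
        return H(self.class_secret + f_t + b_t)

    def user_secret(self, ta: int, tb: int):
        # Material handed to a user (e.g. sealed in a tamper-proof device)
        # authorized for slots ta..tb: the chain values at the interval endpoints.
        return iterate(self.fwd_seed, ta), iterate(self.bwd_seed, Z - 1 - tb)

def derive_key(class_secret: bytes, f_ta: bytes, b_tb: bytes,
               ta: int, tb: int, t: int) -> bytes:
    # The chains can only be walked forward (hashing), so keys outside [ta, tb]
    # are not derivable from (f_ta, b_tb).
    assert ta <= t <= tb
    f_t = iterate(f_ta, t - ta)
    b_t = iterate(b_tb, tb - t)
    return H(class_secret + f_t + b_t)

if __name__ == "__main__":
    ca = CentralAuthority(b"class-x-secret", b"fwd-seed", b"bwd-seed")
    f_ta, b_tb = ca.user_secret(ta=3, tb=9)
    assert derive_key(b"class-x-secret", f_ta, b_tb, 3, 9, t=5) == ca.key_for_slot(5)
```

The user's material only lets the chains be walked forward, so keys outside the authorized interval remain protected by the one-wayness of the hash function; real schemes additionally handle the class hierarchy and keep the class secret inside the tamper-proof device.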
3 Discussion
In this section, we compare all the proposed time-bound generic schemes with respect to the private storage required by a user, the public storage required by the system and their user revocation cost. Table 1 gives a comparison of all the proposed time-bound generic schemes; it also lists the update cost incurred when a user is revoked in the hierarchy. In the table, |E| represents the number of edges in the hierarchy. From the table, we can see that scheme 1 requires a significant amount of private storage and incurs a significant update cost when a user is revoked from the hierarchy. In comparison with scheme 1, scheme 2 requires only single-value private storage for each user to store their secret information, but at the cost of a huge public store (i.e. m · z), and the update operations needed when a user is revoked affect all users in the hierarchy. With respect to scheme 2, scheme 3 requires a comparable update cost in private storage, with an additional update cost in the public store when a user is revoked, but the public storage required is significantly less, which makes this type of scheme better than the previous ones. Scheme 4 is an indirect scheme that is better suited to pay-TV-type applications. It is comparable to scheme 3, with the advantage that it requires less update cost in private storage when a user is revoked.
Table 1 Comparison of time-bound generic HKASs
Scheme    Private storage   Public storage   Direct/Indirect   Update cost (private)   Update cost (public)
Scheme 1  |↓x| · |λ|        All              Direct            |↑(↓x)|                 NIL
Scheme 2  1                 m · z            Direct            ALL                     NIL
Scheme 3  1                 m + z            Direct            |↑(↓x)|                 |↓x|
Scheme 4  1                 |E| + z          Indirect          |↓x|                    |↓x|
4 Conclusions and Future Work
We have proposed an extension to the work by Crampton et al. [19], who proposed a generic classification of non-time-bound HKASs. We extend their classification to classify TBHKASs in a way that, to the best of our knowledge, captures all the existing TBHKASs in the literature. We have also given a comparison of all the proposed generic TBHKASs in Table 1. From the table, we can conclude that scheme 3 is better if only direct TBHKASs are taken into consideration. Scheme 4 is more efficient than the other schemes but requires an extra mechanism with each user for secure computation. We can also see from Table 1 that TBHKASs still incur a significant amount of public storage and user revocation cost. In addition, many of the time-bound schemes do not come with security proofs. Another important open problem is that we still do not have a good dynamic TBHKAS. So, all in all, there are plenty of research opportunities in this area.
References 1. W.G. Tzeng, A time-bound cryptographic key assignment scheme for access control in the hierarchy. IEEE Trans. Knowl. Data Eng. 14(1), 182–188 (2002) 2. E. Bertino, B. Carminati, E. Ferrari, A temporal key management scheme for secure broadcasting of XML documents, in Proceedings of the ACM Conference on Computer and Communications Security (2002), pp. 31–40 3. X. Yi, Y. Ye, Security of Tzeng’s time-bound key assignment scheme for access control in a hierarchy. IEEE Trans. Knowl. Data Eng. 15(4), 1054–1055 (2003) 4. W.G. Tzeng, A secure system for data access based on anonymous and time-dependent hierarchical keys, in Proceedings of the ACM Symposium on Information, Computer and Communications Security (2006), pp. 223–230 5. J. Yeh, An RSA-based time-bound hierarchical key assignment scheme for electronic article subscription, in Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management (Bremen, Germany, Oct 31–Nov 5, 2005) 6. G. Ateniese, A. De. Santis, A.L. Ferrara, B. Masucci, Provably secure time-bound hierarchical key assignment key assignment scheme, in Proceedings 13th ACM Conference on Computer and Communications Security (CCS ’06) (2006), pp. 288–297 7. J. Yeh, A secure time-bound hierarchical key assignment scheme based on RSA public key cryptosystem. Inf. Process. Letters 105, 117–120 (2008). February 8. S.-Y. Wang, C. Laih, Merging: an efficient solution for a time-bound hierarchical key assignment scheme. IEEE Trans. Depend. Sec. Comput. 3(1), 91–100 (2006) 9. S.G. Akl, P.D. Taylor, Cryptographic solution to a problem of access control in a hierarchy. ACM Trans. Comput. Syst. 1(3), 239–248 (1983) 10. J. Yeh, R. Shyam, An efficient time-bound hierarchical key assignment scheme with a new merge function. Submitted to IEEE Trans. Depend. Sec. Comput. (2009) 11. R. Shyam, an efficient time-bound hierarchical key assignment scheme with a new merge function: a performance study, MS project posted at ScholarWorks (2009). http://scholarworks. boisestate.edu/cs_gradproj/1 12. H.F. Huang, C.C. Chang, A new cryptographic key assignment scheme with time-constraint access control in a hierarchy. Comput. Stand. Interf. 26, 159–166 (2004) 13. Q. Tang, C.J. Mitchell, Comments on a cryptographic key assignment scheme. Comput. Stand. Interf. 27, 323–326 (2005)
14. H.Y. Chien, Efficient time-bound hierarchical key assignment scheme. IEEE Trans. Knowl. Data Eng. 16(10), 1301–1304 (2004). Oct 15. X. Yi, Security of Chien’s efficient time-bound hierarchical key assignment scheme. IEEE Trans. Knowl. Data Eng. 17(9), 1298–1299 (2005). Sept 16. A.D. Santis, A.L. Ferrara, B. Masucci, Enforcing the security of a time-bound hierarchical key assignment scheme. Inf. Sci. 176(12), 1684–1694 (2006) 17. E. Bertino, N. Shang, S. Samuel, Wagstaff Jr., Efficient time-bound hierarchical key management scheme for secure broadcasting. IEEE Trans. Depend. Sec. Comput. Arch. 5(2), 65–70 (2008) 18. H.M. Sun, K.H. Wang, C.M. Chen, On the security of an efficient time-bound hierarchical key management scheme. IEEE Trans. Depend. Sec. Comput. 6(2), 159–160 (2009). April 19. J. Crampton, K. Martin, P. Wild, On key assignment for hierarchical access control, in Proceedings of 19th Computer Security Foundations Workshop (2006), pp. 98–111 20. N. Kumar, A. Mathuria, M.L. Das, On classifying indirect key assignment schemes for hierarchical access control, in Proceedings of 10th National Workshop on Cryptology (NWCR 2010) (2010), pp. 2–4 21. F. Kollmann, A flexible subscription model for broadcasted digital contents, cis, in 2007 International Conference on Computational Intelligence and Security (CIS 2007) (2007), pp. 589– 593 22. N. Kumar, A. Mathuria, M.L. Das, Simple and efficient time-bound hierarchical key assignment scheme. ICISS 191–198, 2013 (2013) 23. H.-Y. Chien, Y.-L. Chen, C.-F. Lo, Y.-M. Huang, A novel e-newspapers publication system using provably secure time-bound hierarchical key assignment scheme and XML security, in Book on Advances in Grid and Pervasive Computing, vol. 6104 (May 2010), pp. 407–417. ISBN 9783-642-13066-3 24. N. Kumar, S. Tiwari, Z. Zheng, K.K. Mishra, A.K. Sangaiah, An efficient and provably secure time-limited key management scheme for outsourced data. Concurr. Comput. Pract. Exp. 30(15) (2018) 25. Q. Xu, M. He, L. Harn, An improved time-bound hierarchical key assignment scheme, in Proceedings of the 2008 IEEE Asia-Pacific Services Computing Conference (2008), pp. 1489– 1494, ISBN: 978-0-7695-3473-2 26. W.T. Zhu, R.H. Deng, J. Zhou, F. Bao, Time-bound hierarchical key assignment: an overview. IEICE Trans. Inf. Syst. E93-D(5), 1044–1052 (2010)
A Survey on Cloud Workflow Collaborative Adaptive Scheduling Delong Cui, Zhiping Peng, Qirui Li, Jieguang He, Lizi Zheng, and Yiheng Yuan
Abstract Objectively speaking, cloud workflow requires task assignment and virtual resource provisioning to work together in a collaborative manner for adaptive scheduling, so as to balance the interests of both the supply and demand sides of the cloud service under the service level agreements. In this study, we present a survey of current cloud workflow collaborative adaptive scheduling from the perspectives of resource provisioning and job scheduling, together with the existing cloud computing research, and we look into the key problems to be solved and directions for future research.
Keywords Cloud workflow · Collaborative scheduling · Adaptive scheduling
1 Introduction
The scheduling of workflows in the cloud computing environment has recently attracted considerable attention from researchers [1], and important progress has been made on time minimization, fairness maximization, throughput maximization and the optimized allocation of resources. From the perspective of cloud service supply and demand, however, service level agreements (SLAs) and resource utilization rate are, respectively, the two most fundamental interests that cloud users and cloud service providers care about [2–4]. In a complex, transient and heterogeneous cloud environment, in order to balance the interests of the supply and demand sides under the premise of ensuring the user's service level agreement, it is necessary to schedule workflow tasks and virtualized resources cooperatively and adaptively. Once the supply and demand sides of a cloud service have agreed on the amount of work to be performed and on the service level agreements, cloud service providers are mainly concerned with how to maximize resource utilization with a given resource combination scheme, thereby minimizing operational costs, whereas cloud service users are mainly concerned with how to minimize the rental time D. Cui (B) · Z. Peng · Q. Li · J. He · L. Zheng · Y. Yuan College of Computer and Electronic Information, Guangdong University of Petrochemical Technology, 525000 Maoming, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_11
with a certain kind of task scheduling method, thus minimizing the usage cost. Therefore, a compromise solution is to achieve a balance between the supply and demand sides and to co-schedule cloud workflow tasks and virtualized resources. However, due to the cyclical and transient nature of cloud workflows, cloud computing resources exhibit a high degree of complexity and uncertainty, so workflow tasks and virtualized resources objectively need to be optimized online through collaborative, self-adaptive means [5–7]. Performing such co-adaptive scheduling of workflow tasks and virtualized resources in a rapidly changing cloud computing environment is very difficult [8]. From the perspective of optimized task allocation, scheduling various types of cloud workflow tasks on multiple processing units has been proved to be an NP-complete problem [9]. From the perspective of optimized resource provisioning, virtual unit placement needs to consider energy consumption, that is, to reduce the number of physical machines activated and network devices used; such placement can be abstracted as a bin-packing problem, which is NP-complete [10]. On the other hand, it is also necessary to consider the transmission of data between virtual units, that is, to reduce the use of network bandwidth; in this case, virtual unit placement can be abstracted as a quadratic assignment problem, which is also NP-complete [10]. Current studies on cloud workflow scheduling focus on the allocation of workflow tasks over fixed virtualized resources [11], on flexible resource supply under changing workflow load, or on how to integrate existing workflow management systems into cloud platforms [12, 13], but rarely on the adaptive scheduling of the collaboration between workflow task allocation and virtualized resource supply. The execution of a cloud workflow in a cloud computing environment mainly comprises task allocation and resource supply. The dependency and restriction relations between workflow tasks are usually described by a directed acyclic graph (DAG). Whether for task allocation or resource supply, depending on the completeness of the information about the external environment and the scheduling objects, the research methods used can generally be divided into three categories: static scheduling, dynamic scheduling and hybrid scheduling; the relationship among the three is shown in Fig. 1.
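To make the DAG view of a workflow concrete, the minimal sketch below (illustrative only, with made-up task names, runtimes and dependencies) builds a workflow DAG and computes a topological order and the critical-path length, the two quantities most static schedulers start from.

```python
# Illustrative sketch: a workflow as a DAG of tasks with estimated runtimes.
# Task names, runtimes and edges are made up for illustration.
from collections import defaultdict, deque

tasks = {"A": 3.0, "B": 2.0, "C": 4.0, "D": 1.0}          # estimated runtimes
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]   # dependencies (u before v)

def topological_order(tasks, edges):
    indeg = {t: 0 for t in tasks}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order

def critical_path_length(tasks, edges):
    # Longest runtime-weighted path; a lower bound on the workflow makespan.
    finish = {}
    pred = defaultdict(list)
    for u, v in edges:
        pred[v].append(u)
    for t in topological_order(tasks, edges):
        finish[t] = tasks[t] + max((finish[p] for p in pred[t]), default=0.0)
    return max(finish.values())

print(topological_order(tasks, edges))     # e.g. ['A', 'B', 'C', 'D']
print(critical_path_length(tasks, edges))  # 8.0 (the path A -> C -> D)
```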
2 Research on Task Allocation of Cloud Workflow and Analysis of Its Limitations
2.1 Static Scheduling Method
Shojafar et al. [14] proposed a hybrid optimized task scheduling algorithm based on fuzzy theory and a genetic algorithm to optimize the load balancing among virtual machines under finish-time and cost constraints. Abrishami et al. [15], building on their research on grid workflow scheduling, designed a unidirectional IC-PCP and a two-way deadline-constrained IC-PCP task allocation algorithm for IaaS clouds, and pointed out that the time complexity of these two algorithms is only polynomial.
Fig. 1 Diagram of cloud workflow scheduling research methods. Static scheduling: DAG features, structure, task length, edge length and other information are known; all resources have been instantiated and their performance is fixed; scheduling is conducted before the task runs; task execution time and communication time can be estimated. Hybrid scheduling: static planning is performed based on the estimated information before the task runs, but dynamic allocation is conducted only at run time. Dynamic scheduling: due to incomplete DAG and resource information, task execution time and communication time can only be obtained when the task is running; scheduling is conducted at run time, and at each scheduling point the task to be allocated is selected and deployed on the selected resource.
Fan et al., Tsinghua University, [16] designed a multi-objective, multi-task scheduling algorithm based on extended order optimization, which greatly reduces the time cost, and also proved the suboptimality of the algorithm. He et al., Dalian University of Technology, [17] proposed a workflow cost optimization model that accounts for communication overhead, and a hierarchical scheduling algorithm based on the model. Liang et al., Wuhan University, [18] proposed a multi-objective optimization algorithm for task completion time and reliability that combines simulated annealing with a genetic algorithm. Peng et al., Taiyuan University of Technology, [19], addressing the security threats faced by cloud workflow scheduling, used cloud models to quantify the security of tasks and virtual machine resources, measured the users' security satisfaction with the virtual machine resources allocated to a task through security-cloud similarity, established a cloud workflow scheduling model that considers security, completion time and usage cost, and then proposed a cloud workflow scheduling algorithm based on discrete particle swarm optimization.
2.2 Dynamic Scheduling Method
Szabo et al. [20] designed an evolutionary dynamic task allocation algorithm for data-intensive scientific workflow applications that takes into account the constraints of network transmission and execution time. Chen et al. [21] designed a dynamic task reordering and scheduling algorithm with priority constraints for the fair allocation of multiple workflow tasks. Li Xiaoping and his team at Southeast University have done a lot of research on dynamic task scheduling and recently published a cost optimization algorithm for cloud service workflows under preparation-time and deadline constraints [22]. The algorithm establishes a corresponding integer programming model and introduces a strategy in which individuals learn from the global optimal solution, to improve the global search and local optimization ability of the algorithm. Li et al., Anhui University, [23], aiming at the deficiencies of traditional data layout methods and taking into account the characteristics of data layout in hybrid clouds, designed a matrix partition model based on the destructiveness of data dependencies and proposed a data-center-oriented workflow task layout method in which workflow tasks with high mutual dependency are generally placed in the same data center, thereby reducing the transmission time of data sets across data centers. Peng et al. designed a scheduling algorithm for cloud user tasks based on the DAG critical path [24], along with improved versions of the algorithm [25–31].
3 Research on Resource Supply of Cloud Workflow and Analysis of Its Limitations
3.1 Static Scheduling Method
Rodriguez and Buyya [32] designed a general scheduling model for workflow cost minimization under execution-time constraints and a cloud resource supply and scheduling algorithm based on particle swarm optimization. Chen et al., Sun Yat-sen University, [33], addressing the deadline violations of the particle swarm optimization approach, designed a cloud workflow resource supply strategy based on a two-stage genetic algorithm on top of the general model: the algorithm first searches for optimized task execution times under the deadline constraint, and the feasible solution obtained is then used as the initial condition to search for the resource supply scheme with the lowest rental cost. Wang et al., Northeastern University, [34], addressing the low efficiency of using general workflow scheduling strategies to schedule instance-intensive workflows in cloud environments, proposed a cloud workflow scheduling algorithm based on quality-of-service constraints.
3.2 Dynamic Scheduling Method
Byun et al. proposed a hybrid cloud-based resource supply algorithm, BTS [35], and an improved version that accounts for the cyclical variation of virtual unit prices [36]. The core idea of the two algorithms is to set priorities according to the scheduling delay of each workflow task: tasks with smaller delay are assigned higher priority and more resources. Peng et al. proposed a resource scheduling algorithm based on reinforcement learning [37] by abstracting resource scheduling in the cloud environment as a sequential decision problem, and introduced two performance indexes, namely the segmented service level agreement and the unit-time cost utilization, to redesign the reward function. Aiming at the virtual machine placement problem, they designed a multi-objective comprehensive evaluation model of the virtual machine [38] and proposed a multi-objective particle swarm optimization algorithm to place virtual machines dynamically.
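To illustrate the sequential-decision framing mentioned above in the most generic terms, the sketch below implements plain tabular Q-learning against a toy provisioning environment. It is not the algorithm of [37]; the state and action spaces, reward shaping and workload dynamics are all invented for illustration.

```python
# Illustrative sketch: tabular Q-learning for a toy "how many VMs to run" decision.
# A generic textbook formulation, not the scheme of [37]; the load model, reward
# weights and action space are made up.
import random
from collections import defaultdict

ACTIONS = [-1, 0, 1]            # remove a VM, do nothing, add a VM

def reward(vms, load):
    """Toy reward: trade off SLA violation against VM rental cost."""
    capacity = 10 * vms
    sla_penalty = max(0, load - capacity)       # unserved load violates the SLA
    cost = 2 * vms                              # rental cost per time slot
    return -(sla_penalty + cost)

def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = defaultdict(float)                      # Q[(state, action)]
    for _ in range(episodes):
        vms, load = 1, 20
        for _ in range(50):                     # 50 decision epochs per episode
            state = (vms, load // 10)           # coarse state: (VM count, load bucket)
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            vms = max(1, min(10, vms + action))
            load = max(0, load + random.randint(-10, 10))   # random workload drift
            r = reward(vms, load)
            next_state = (vms, load // 10)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
    return q

if __name__ == "__main__":
    q = train()
    print("learned action for 3 VMs, load of about 40:",
          max(ACTIONS, key=lambda a: q[((3, 4), a)]))
```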
4 Hybrid Scheduling Method
Maheshwari et al. [39] designed a multi-site workflow scheduling algorithm for hybrid resource supply environments based on a prediction model of resource execution power and network throughput. Lee et al. [40] introduced a simple and efficient objective function to help users make decisions for task-package (bag-of-tasks) applications in hybrid cloud environments. Li et al. [41] designed a fair scheduling algorithm for heterogeneous resources in hybrid clouds and introduced a dominant-resource fairness index to achieve mutual restraint of the resource allocation among users and applications. For the problems of the low utilization of private clouds and the high cost of public clouds in hybrid cloud scheduling, Zuo et al. proposed two scheduling strategies, deadline-first and cost-first, based on performance and cost objectives [42], and established task and resource models for hybrid clouds which can adaptively select suitable scheduling resources according to the task requirements submitted by users: tasks with stricter deadline requirements are scheduled to the public cloud first, tasks with stricter cost requirements are scheduled to the private cloud first, and both strategies can meet the deadline and a given cost constraint.
5 Research on Adaptive Cooperative Scheduling and Analysis of Its Limitations
The University of Southern California has developed the Pegasus cloud workflow management system [43], which takes both scientific workflow tasks and the adaptive scheduling of cloud resources into account. However, the system still needs improvement in terms of local user queue detection and the customization of user requirements. Part of the work of EGEE, the EU's global grid infrastructure project, involves applying multi-objective reinforcement learning to optimize task allocation and elastic resource supply separately, and its successor project EGI proposes to achieve collaborative, integrated scheduling of the two [44]. Vasile et al. [45] designed a resource-aware hybrid scheduling algorithm based on virtual machine groups. The algorithm applies a two-stage task allocation: in the first stage, user tasks are assigned to the established virtual machine groups; in the second stage, a classical scheduling algorithm performs a second round of scheduling according to the amount of resources in each group. However, this algorithm ignores the dependencies between workflow tasks, leading to frequent transmission of user data among virtual machine groups and even to network congestion. Jiao et al. [46] proposed a hybrid Petri net model of
cloud workflow under cooperative design, applied the stochastic Petri net idea to analyze each process flow and the work efficiency of collaborative design, and proposed optimizations in terms of business conflict and model decomposition. However, this model is only suitable for the collaborative optimized scheduling of a single cloud workflow, so it is not close to real application scenarios. Table 1 summarizes the articles reviewed in this section.
Table 1 Summary of the reviewed literature on reinforcement learning in the cloud computing environment
Metric: Successful task execution, utilization rate, global utility value; SLA: Deadline; Workloads: Synthetic, Poisson distribution; Experimental platform: Custom testbed; Technologies: ANN, on-demand information sharing, adaptive; Reference: [47]
Metric: User request, VMs, time; SLA: Response time, cost; Workloads: Synthetic, Poisson distribution; Experimental platform: MATLAB; Technologies: Multi-agent, parallel Q-learning; Reference: [48]
Metric: CPU, memory; SLA: Throughput, response time; Workloads: Real, Wikipedia trace; Experimental platform: VMware, RUBiS, WikiBench; Technologies: GA; Reference: [49]
Metric: CPU, memory, I/O; SLA: Response time; Workloads: Real, ClarkNet trace; Experimental platform: Xen, TPC-C, TPC-W, SpecWeb; Technologies: CMAC, distributed learning mechanism; Reference: [50]
Metric: MIPS, CPU; SLA: Average SLA violation percentage, energy consumption; Workloads: Synthetic and real, CoMon project; Experimental platform: CloudSim; Technologies: Multi-agent; Reference: [51]
Metric: CPU utilization rate; SLA: Response time, energy consumption; Workloads: Synthetic, Poisson distribution; Experimental platform: Custom testbed; Technologies: Intelligent controller, adaptive; Reference: [52]
Metric: Utility accrual; SLA: Response time; Workloads: Synthetic, Poisson distribution; Experimental platform: Custom testbed; Technologies: Failure and recovery; Reference: [53]
Metric: System parameters; SLA: Throughput, response time; Workloads: Synthetic, workload mixes; Experimental platform: Xen, TPC-C, TPC-W, SpecWeb; Technologies: Parameter grouping; Reference: [54]
6 Conclusion and Prospect
In summary, related domestic and foreign research currently concentrates on cloud workflow task allocation or on the one-sided adaptive scheduling of virtualized resource supply, ignoring the inherent dependency between the two, so it is difficult to balance the interests of the supply and demand sides of a cloud service while guaranteeing the SLA. Research on the adaptive collaborative scheduling of the two is only beginning; the results are still limited, and both the depth of the research and the application of suitable research methods are lacking. It can nevertheless be predicted that adaptively scheduling the two in a collaborative manner will inevitably become one of the core problems to be solved urgently in big data processing, of which cloud workflow applications are a typical representative.
Acknowledgements The work presented in this paper was supported by the National Natural Science Foundation of China (No. 61672174, 61772145), the National Natural Science Foundation of China under Grant No. 61803108, the Maoming Engineering Research Center for Automation in Petrochemical Industry (No. 517013), and the Guangdong University Student Science and Technology Innovation Cultivation Special Fund (No. pdjh2019b0326, 733409). Zhiping Peng is the corresponding author.
References 1. F. Wu, Q. Wu, Y. Tan, Workflow scheduling in cloud: a survey[J]. J. Supercomput. 71, 3373– 3418 (2015) 2. J. Zhang, S.E. Zhuge, Q.K. Wu, Efficient fault-tolerant scheduling on multiprocessor systems via replication and deallocation. Int. J. Embed. Syst. 6(2–3), 216–224 (2014) 3. T. Krylosova, Implementing container-based virtualization in a hybrid cloud. Metro. Ammattikorkeakoulu 1–35 (2014) 4. R.N. Calheiros, Ra Buyya, Meeting deadlines of scientific workflows in public clouds with tasks replication. IEEE Trans. Parallel Distrib. Syst. 25(7), 1787–1796 (2015) 5. G. Menegaz, The future of cloud computing: 5 predictions. http://www.thoughtsoncloud.com/ future-cloud-computing-5-predictions/ 6. IDC, Virtualization and multicore innovations disrupt the worldwide server market. Document number: 206035 (2014) 7. R. Raju, J. Amudhavel, E. Saule, S. Anuja, A heuristic fault tolerant MapReduce framework for minimizing makespan in Hybrid Cloud Environment. in Proceedings of the Green Computing Communication and Electrical Engineering (ICGCCEE 2014) (IEEE, Piscataway, 2014) 8. G. Tian, C. Xiao, Z. XU et al., Hybrid scheduling strategy for multiple DAGs workflow in heterogeneous system. J. Sofw. 23(10), 2720–2734 (2012) 9. Z. Cai, X. Li, J. Gupta, Critical path-based iterative heuristic for workflow scheduling in utility and cloud computing. in Proceedings of 11th International Conference on Service-Oriented Computing (Springer Berlin Heidelberg, Berlin, Germany, 2013), 207–221 10. W. Jing, Z. Wu, H. Liu, J. Dong, Fault-tolerant scheduling algorithm for precedence constrained tasks. Tsinghua Sci.Technol. 51(S1),1440–1444 (2011) 11. E.N. Alkhanak, S.P. Lee, S.U.R. Khan, Cost-aware challenges for workflow scheduling approaches in cloud computing environments: Taxonomy and opportunities. Fut. Gener. Comput. Syst. 50, 3–21 (2015)
Lattice CP-ABE Scheme Supporting Reduced-OBDD Structure Eric Affum, Xiasong Zhang, and Xiaofen Wang
Abstract Ciphertext-policy attribute-based encryption (CP-ABE) schemes from lattices are considered a flexible way of ensuring access control, which allows publishers to encrypt data for many users without worrying about quantum attacks and exponential computational inefficiencies. However, most of the proposed schemes are inefficient, inflexible, and cannot support flexible expression of the access policy. To achieve efficient and flexible access policy expression, we construct a CP-ABE scheme from lattices which supports the reduced ordered binary decision diagram (reduced-OBDD) structure. This approach is entirely different but can achieve an efficient and flexible access policy. Encryption and decryption are based on walking on the links between the nodes instead of using the nodes. By adopting the strategy of using the reduced-OBDD and walking on its links, we obtain an optimized access structure for our policy which supports not only AND, OR, and threshold operations but also negative and positive attributes. We finally prove our scheme to be secure under the decision ring-learning with errors (R-LWE) problem in the selective set model. Keywords Lattice · Access policy · CP-ABE · Reduced-OBDD
1 Introduction
In a complex situation such as a content-centric network (CCN) environment, where data owners do not have control over their own published data, access control cannot be effectively enforced with the traditional one-to-one access control approach [1].
E. Affum · X. Zhang (B) · X. Wang School of Computer Science and Technology, University of Electronic Science and Technology of China, 611731 Chengdu, China e-mail: [email protected] E. Affum e-mail: [email protected] X. Wang e-mail: [email protected]
The effective way to achieve the required secure data sharing is to provide more scalable and flexible access control for this pervasive and distributed CCN environment [2]. Fortunately, the attribute-based encryption (ABE) cryptosystem has been proposed as a fine-grained access control mechanism for the future CCN. This work focuses on the ciphertext-policy ABE (CP-ABE) scheme for its special properties and advantages over other cryptographic schemes such as symmetric, asymmetric, and KP-ABE schemes. The CP-ABE scheme has the impressive property of maintaining and describing the access privileges of users in a more intuitive and scalable way. By using this scheme, and without prior knowledge of the receivers of the information, data can be shared according to the encrypted policy. There are two approaches to designing the algorithms of ABE schemes: bilinear maps over elliptic curves and the lattice-based approach. Almost all of the existing schemes, such as [3, 4], are based on bilinear maps with high computational complexity and also cannot address the problem of quantum attacks. To address the problem of quantum attacks, the author of [5] first introduced the idea of lattices into cryptography, and there has been recent progress in the construction of ABE schemes based on lattices. An LSSS CP-ABE access policy scheme with a lightweight ideal lattice based on the R-LWE problem was proposed by Tan et al. [6], which is capable of resisting collusion attacks. Yan et al. [7] used the LSSS access structure to propose an ideal multi-authority CP-ABE scheme. Wang et al. [8] achieved an effective encryption scheme based on R-LWE with efficient encryption and decryption run times and integrity support features based on chosen-ciphertext security. The authors of [9] proposed a CP-ABE scheme based on lattices. Their scheme is flexible and supports (k, n) threshold access policies on Boolean operations. Based on a full-rank differences function, the authors of [10] proposed a large-universe CP-ABE scheme to attain improvements in the expression of attributes and an unbounded attribute space. An efficient revocable ABE scheme was constructed by [11]; their revocation and granting of attributes is based on a binary tree approach. A single random vector parameter was selected for the nodes corresponding to attributes. In 2018, the authors of [12] proposed an attribute-based encryption scheme supporting a tree-access structure on ideal lattices. They used an expressive and flexible access policy based on Shamir threshold secret sharing technology, including "and," "or," and "threshold" operations. In order to construct more efficient lattice-based ABE and resolve the inefficiency issues in lattice ABE cryptosystems, the access structures and components such as the trapdoor design and the matrix dimension, which play a significant role in the construction of lattice-based ABE schemes, need to be optimized. The main contribution of our work is to propose a flexible lattice CP-ABE scheme supporting reduced-OBDD based on R-LWE. This scheme has a compact and optimized access structure with fewer nodes and links. Encryption and decryption are based on walking on the links instead of the nodes. This means that it has lower encryption and decryption computational time over rings. The proposed scheme supports threshold operations, Boolean operations, and multiple subscribers with positive and negative attributes. The remainder of the paper is organized as follows: The preliminaries are discussed in Sect. 2. We demonstrate our access structure and our scheme in Sect. 3.
The security analysis of the scheme is in Sect. 4. The performance analysis of the simulated result is evaluated in Sect. 5. Finally, this paper is concluded in Sect. 6.
2 Preliminaries
2.1 Lattice
Let Z and R denote the integers and the real numbers, respectively, and choose a real number x from R. We denote by ⌊x⌋ the largest integer less than or equal to x, by ⌈x⌉ the smallest integer greater than or equal to x, and define Round(x) = ⌊x + 1/2⌋. Let k denote a positive integer and [k] represent {1, . . . , k}. Bold capital letters such as A and bold lowercase letters such as x represent matrices and column vectors, respectively, and x^T represents a transposed vector. Let a_i represent the i-th column vector of A. The horizontal and vertical concatenations of A and x are written as [A|Ax] and [A; Ax], respectively. The maximal Gram–Schmidt length of the matrix A is written as ||Ã||. Assuming n is the security parameter, poly(n) represents a function f(n) = O(n^c) for some constant c, while Õ(·) hides logarithmic factors. A function f(n) is negl(n), i.e., negligible, if f(n) = o(n^{-c}) for every fixed constant c. An event is then said to occur with overwhelming probability if it occurs with probability at least 1 − negl(n). Given X and Y as two distributions over a domain D, the statistical distance between them is Δ(X, Y) = (1/2) Σ_{d∈D} |X(d) − Y(d)|. The two distributions are statistically close if the distance between them is negl(n).
Definition 1 (Lattice) A lattice is a set of points in m-dimensional space with a periodic structure that satisfies the following property: given n linearly independent vectors b_1, . . . , b_n ∈ R^m, the lattice generated by them is defined as L(b_1, . . . , b_n) = { Σ_{i=1}^{n} x_i b_i | x_i ∈ Z, 1 ≤ i ≤ n }, where b_1, . . . , b_n is a basis of the lattice.
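As a small illustrative aside (not part of the paper itself), the statistical distance defined above can be computed directly for finite distributions. The following Python sketch assumes the distributions are given as probability dictionaries; the function name is hypothetical.

```python
def statistical_distance(X, Y):
    # Delta(X, Y) = 1/2 * sum over d in D of |X(d) - Y(d)|,
    # with distributions given as dicts mapping outcomes to probabilities.
    domain = set(X) | set(Y)
    return 0.5 * sum(abs(X.get(d, 0.0) - Y.get(d, 0.0)) for d in domain)

X = {0: 0.5, 1: 0.5}
Y = {0: 0.6, 1: 0.4}
print(statistical_distance(X, Y))  # ~0.1, far from negligible, so X and Y are not statistically close
```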
2.2 Gaussian Sampling for Rings
To compute a solution to a hard problem, a trapdoor is used. In this paper, we rely on the lattice trapdoors introduced and implemented in [13]. Let m be a parameter for the generation of the trapdoor, and define A ∈ R_q^{1×m} as a uniformly distributed random row vector of ring elements. Given β ∈ R_q, it is computationally hard to find a short vector of polynomials ω ∈ R_q^{1×m} satisfying Aω = β. The solution must be spherically Gaussian distributed with distribution parameter s, i.e., ω ← D_{L,s}. The trapdoor T_A is kept secret while A is pseudo-random, and the construction benefits from the hardness assumption of ring learning with errors.
As shown in Algorithm 1, m = ⌈2n log q⌉ is determined by the length of the modulus q in base n. The parameters σ, q, and n are selected based on the security parameter λ. The public key A is computed by sampling the secret trapdoor T_A. For an efficient construction, a power-of-two base is used. Using a Gaussian distribution with parameter σ, we obtain a trapdoor of two short vectors, T_A = (ρ, u). For the preimage sampling algorithm, we implement and utilize the G-lattice algorithm proposed by [14]. Denoting g^T = (b_0, b_1, . . . , b_{m̄−1}) as the primitive vector proposed by [15], we generate an efficient preimage sampling G-lattice and define its efficiency in terms of the lattice A and its trapdoor T_A. For the Gaussian preimage sampling in Algorithm 2, to ensure a spherical Gaussian distribution, and hence as a preventive measure against information leakage, a perturbation vector p is used for the solution y. In summary, Ay = β, while y ← D_{Λ_q(A),σ_s}, p ∈ R^m, z ∈ R^{m̄}, and m = m̄ + 2, where σ_s, called the spectral norm, is the Gaussian distribution parameter for y. Given a constant C, the spectral norm satisfies σ_s > C · σ · (√(2nk) + √(2n) + 4.7).
Algorithm 1 Generation of trapdoor
Algorithm 2 Gaussian preimage sampling
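The trapdoor generation and preimage sampling algorithms are given here only as captions. As a rough illustration of the basic building block they rely on (and not the G-lattice sampler of [14] or the authors' implementation), the following Python sketch samples a one-dimensional discrete Gaussian by rejection; the tail cut and function names are illustrative assumptions.

```python
import math, random

def sample_discrete_gaussian(sigma, tail=12):
    # Toy rejection sampler for D_{Z,sigma}; production lattice libraries
    # use far more careful (and constant-time) samplers.
    bound = int(math.ceil(tail * sigma))              # cut the negligible tail
    while True:
        z = random.randint(-bound, bound)             # uniform candidate
        rho = math.exp(-z * z / (2 * sigma * sigma))  # unnormalized Gaussian weight
        if random.random() < rho:                     # accept with probability rho(z)
            return z

# A "spherical" Gaussian vector, of the kind required for a preimage y with Ay = beta:
sigma_s = 20.0
y = [sample_discrete_gaussian(sigma_s) for _ in range(16)]
print(y)
```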
2.3 Decision Ring-Learning with Errors Problem
Given n as the security parameter, let d and q be integers depending on n. Let R = Z[x]/(f), where f(x) = x^n + 1, and R_q = R/qR. Given a distribution χ over R_q depending on n, the decision learning with errors problem instance consists of access to an unspecified challenge oracle O, which is either a noisy pseudo-random sampler O_s for a random secret key s ← R_q, or a truly random sampler O_$. The decision R-LWE problem is to distinguish between sampling from O_s and from O_$:
O_s: given a uniformly distributed value s ∈ Z_q^n that is fixed across invocations, a fresh sample x_i ∈ Z_q from χ, and a uniform sample u_i ∈ Z_q^n, output a sample of the form
(u_i, v_i = u_i^T s + x_i) ∈ Z_q^n × Z_q.
O_$: output a truly uniform sample (u, v) drawn from Z_q^n × Z_q.
In the decision ring-LWE problem, repeated queries may be sent to the challenge oracle O. The attacker's algorithm decides the decision ring-learning with errors problem if |Pr[Attacker^{O_s} = 1] − Pr[Attacker^{O_$} = 1]| is non-negligible for a random s ∈ Z_q^n. For an integer x ∈ Z and noise distribution χ, the LWE problem is as hard as SIVP and GapSVP under a reduction.
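As an illustrative toy only (not a secure parameter set, and using the plain LWE formulation in which ring elements are replaced by vectors), the two oracles of the decision problem can be sketched in Python as follows; the names and the crude noise distribution are assumptions.

```python
import random

q, n = 12289, 16                      # toy modulus and dimension, illustrative only

def chi():
    # Toy "small noise" distribution standing in for the error distribution chi.
    return random.randint(-3, 3) % q

def sampler_Os(s):
    # Noisy pseudo-random sampler O_s: returns (u, v = <u, s> + e mod q).
    u = [random.randrange(q) for _ in range(n)]
    v = (sum(ui * si for ui, si in zip(u, s)) + chi()) % q
    return u, v

def sampler_Odollar():
    # Truly random sampler O_$: returns a uniform pair (u, v).
    return [random.randrange(q) for _ in range(n)], random.randrange(q)

s = [random.randrange(q) for _ in range(n)]
print(sampler_Os(s))       # distinguisher must tell these apart ...
print(sampler_Odollar())   # ... from these, given many queries
```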
3 Our Construction
Let the access policy be a Boolean function f(u_0, u_1, . . . , u_{n−1}), where n is the total number of attributes and u_i (0 ≤ i ≤ n − 1) denotes the attribute at a predefined sequential position of the access policy. The function f(u_0, u_1, . . . , u_{n−1}) is composed of the fundamental logical operations AND, OR, and NOT. An operation is considered a threshold gate T(t, n) if and only if t attributes of a subset of n can complete the operation successfully. To be able to decrypt a message in such a security system, a user must be able to complete some specific threshold operations. To construct a Boolean function for a given T(t, n ∈ N), where N is the attribute set, extract all subsets of N with t attributes and compute the C(n, t) subsets Com_1, Com_2, . . . , Com_{C(n,t)} by permutation and combination. This is followed by the construction of a separate conjunction for each subset, Con_1, Con_2, . . . , Con_{C(n,t)}. Finally, obtain the Boolean function f(t, n) = ∨_{i=1}^{C(n,t)} Con_i by a disjunctive operation on the C(n, t) conjunctions.
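As a rough illustration of this threshold-to-DNF expansion (a sketch only, with hypothetical attribute names, and not the paper's lattice construction), a short Python example:

```python
from itertools import combinations

def threshold_to_dnf(attributes, t):
    # Expand a T(t, n) threshold gate into a disjunction of conjunctions:
    # f = OR over all t-subsets Com_i of the AND of the attributes in Com_i.
    return [set(com) for com in combinations(attributes, t)]

def satisfies(dnf, owned):
    # A user satisfies the policy iff the owned set covers at least one conjunct.
    return any(con <= set(owned) for con in dnf)

N = ["a1", "a2", "a3", "a4"]
dnf = threshold_to_dnf(N, 3)                  # C(4, 3) = 4 conjuncts
print(dnf)
print(satisfies(dnf, ["a1", "a2", "a4"]))     # True: {a1, a2, a4} covers one conjunct
```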
3.1 Construction of Access Structure Based on Reduced-OBDD
To construct the reduced-OBDD for a Boolean function f(x_1, x_2, . . . , x_n) in Fig. 1a, Algorithms 3 and 4, based on Shannon's expansion theorem, are used. To obtain a unique ordered BDD, the variable ordering must be specified to give a specific diagram. Let N = {0, 1, 2, . . . , n} be the node numbers, with the condition that the low terminal node is 0 and the high terminal node is 1. However, the terminal nodes have a specific meaning and their attributes may not be considered. The variable ordering related to N is π = (x_0 < . . . with a non-negligible advantage ε > 0, then there exists an algorithm that can distinguish the (Z_q, n, ψ_α)-LWE problem with advantage ε. The LWE problem is provided as a sample oracle O, which can be truly random, O_$, or noisy pseudo-random, O_s, for some secret key s ∈ Z_q^n.
Initialize: The adversary A sends the access structure τ* = (Nd_id)_{id∈ID} to the challenger's simulator B.
Instance: B makes a request to the oracle, and the oracle responds by sending fresh pairs (u_l, υ_l) ∈ Z_q^n × Z_q until m·Σ_{i=1}^{s} s_i + 1 pairs are obtained, where i ∈ {1, 2, . . . , s}.
Target: A announces the set of attributes that it intends to challenge.
Setup: The public parameters are generated by B. Let us denote υ_0 as α. For A_{i,j} ∈ A_s*, B generates A_{i,j}
from the LWE samples indexed by l = Σ_{p=1}^{i−1} s_p + j, where i = 1, . . . , s and j = 1, . . . , s_i, and obtains A_i = (A_{i,1}, . . . , A_{i,s_i}). For each attribute a_{i,j} ∉ A_s*, B generates a random matrix R_{i,j}* ∈ Z_q^{m×m} by running the trapdoor algorithm and computes A_{i,j}* = A_i R_{i,j}*. Using A_{i,j} as input, B generates the random matrix R_{i,j}* ∈ Z_q^{m×m} together with a trapdoor T_{A_{i,j}*} for Λ_q^⊥(A_{i,j}*). Finally, B generates A_i = A_i^0 R_{i,j}, sets PP = ((A_i)_{i∈A_s}, α), and sends it to A.
Phase 1: A sends a private key request for a set of attributes A_G* = {a_1*, a_2*, . . . , a_j*}, where a_i* ∉ τ*, ∀i. B computes shares of α, and any user can get a share for a_i. For any legitimate path, there must exist an attribute a_j ∈ A_I satisfying one of the cases a_j ∈ A_s* ∧ a_j⁻ = a_j or a_j ∉ A_s* ∧ a_j⁻ = a_j. Generally, the attributes of A should satisfy the condition a_j ∉ A_s* ∧ a_j⁻ = a_j, and each attribute is assigned as follows: for a_j ∉ A_s* ∧ a_j = a_j, y_{a_j⁻} = m · y_{a_j}; for a_i = a_j, B runs the key generation algorithm to generate A_{i,j} = A_i R_{i,j}^{−1} and a trapdoor matrix T_A ∈ Z_q^{m+w̄} for Λ_q^⊥(A_{i,j}), and invokes the Gaussian algorithm to output d_{i,j} ∈ Z_q^{m+v̄} = Z_q^m. B sets the private key of a_j* as SK_u = (d_{i,j})_{a_{i,j}∈A_s} and sends it to A.
Challenge: A agrees to accept the challenge and submits the challenge messages (m_0, m_1) ∈ {0, 1} with the attribute set a_j*; B flips a coin to generate a random bit m ∈ {0, 1} and returns the ciphertext CT* = (C_0, {C_{i,j}}_{i∈τ}) to A, where C_0* = α^T s + θ + m⌊q/2⌋ mod q and C_{i,j}* = A_{i,j}^T · s_i + θ_{i,j} mod q. It is clear that the encrypted message CT* is a valid encryption of m under the access policy τ if O = O_s, whereas if O = O_$ the components υ and υ_{i,j} are uniform in Z_q and Z_q^m, so the ciphertext is uniform.
Phase 2: A continues by repeating Phase 1.
Decision: A outputs a guess m′ for m. If m′ = m, the challenger considers the samples to be O_s samples; otherwise it guesses them to be O_$ samples. Assuming the adversary A can correctly guess m with probability at least 1/2 + ε, then B can decide the decision ring-LWE problem with an advantage of
(1/2)·Pr[m′ = m | (w, u) ← O_s] + (1/2)·Pr[m′ = m | (w, u) ← O_$] = (1/2)·(1/2 + ε) + (1/2)·(1/2) = 1/2 + ε/2.
5 Performance Evaluation
The implementation was conducted on an Intel i7-8700 processor at 2.53 GHz with 8 GB of memory, running the 64-bit Windows 10 operating system. We simulated with the C++ PALISADE library [16]. The analysis of the implementation results entails the comparison of the execution time and storage capacity of the ciphertext and of the key generation, encryption, and decryption operations. The capacity analysis entails the public parameters (PP) size, secret key (SK) size, master key (MK) size, and the ciphertext
Table 3 Comparison in terms of access structures, operations, and supported attributes

Scheme | Access structure | AND | OR  | Threshold | Supporting attributes
[17]   | Tree             | Yes | Yes | Yes       | Positive
Ours   | Reduced-OBDD     | Yes | Yes | Yes       | Negative and positive
size. The parameters are set based on [17]. As represented in Table 1, the storage sizes of our PP, MK, SK, and ciphertext are smaller than those of scheme [17]. Table 2 summarizes the comparison of the execution times of our proposed scheme and [17]. s = (4/8/12/16/20) represents the different numbers of attributes used. Our construction is faster than scheme [17] in many aspects, as shown in Table 2. Although the key generation process is a little slower, the encryption and decryption processes are very fast. This is based on the efficiency of our optimized access structure and the technique of using the links instead of the nodes. Table 3 compares our scheme and scheme [17] in terms of access structure, supported operations, and supported attributes. Scheme [17] supports a tree access structure. However, we use a reduced-OBDD, which has been ordered and reduced with the special properties of non-redundancy, uniqueness, and canonicity. Our scheme supports not only AND, OR, and threshold operations but also negative attributes. In scheme [17], the key generation, encryption, and decryption operations are based on the number of nodes, but ours are based on the number of paths, which is less than the number of nodes. This makes our scheme more efficient than scheme [17]. On the whole, our scheme is practical with respect to execution time and storage size, and it is secure against quantum attacks.
6 Conclusion
To ensure an efficient and flexible CP-ABE scheme which is secure and resistant to quantum attacks, we proposed a lattice CP-ABE scheme supporting reduced-OBDD with a more efficient and flexible access structure for access policy expressions. This scheme supports multiple occurrences, Boolean operations, and attributes with positive and negative features in the access policy expression. Our scheme is secure under the decision ring-learning with errors problem in the selective set model. In future work, we will investigate how to revoke attributes and also improve the efficiency of the key generation, encryption, and decryption operations of our scheme.
Acknowledgements This work is supported by the National Natural Science Foundation of China under Grant 61502086 and the foundation from the State Key Laboratory of Integrated Services Networks, Xidian University (No. ISN18-09).
References 1. H. Yin, Y. Jiang, C. Lin, C.Y. Luo, Y. Liu, Big data: Transforming the design philosophy of future Internet. IEEE Net. 28(4), 1419 (2014) 2. B. Anggorojati, P.N. Mahalle, N.R. Prasad, R. Prasad, Capability based access control delegation model on the federated IoT network. in Proceedings of IEEE International Symposium on Wireless Personal Multimedia Communications (2012), pp. 604–608 3. A. Grusho et al., Five SDN-oriented directions in information security. in IEEE Science and Technology Conference (2014), pp. 1–4 4. E. Affum, X. Zhang, X. Wang, J.B. Ansuura, Efficient CP-ABE scheme for IoT CCN based on ROBDD. in Proceedings International Conference Advances in Computer Communication and Computer Sciences, 924 (2019), pp 575–590 5. M. Ajtai, Generating hard instances of lattice problems (extend abstract). in Proceedings of STOC ACM (1996), pp 99–108 6. S.F. Tan, A. Samsudin, Lattice ciphertext-policy attribute-based encryption from RingLWE (IEEE Int. Sym. Technol. Manage. Emer. Technol., Langkawi, 2015), pp. 258–262 7. X. Yan, Y. Liu, Z. Li, Q. Huang, A privacy-preserving multi-authority attribute-based encryption scheme on ideal lattices in the cloud environment. Proc. Netinfo Sec. 8, 19–25 (2017) 8. T. Wang, G. Han, J. Yu, P. Zhang, X. Sun, Efficient chosen-ciphertext secure encryption from R-LWE. Wirel. Pers. Commun. 95, 1–16 (2017) 9. J. Zhang, Z. Zhang, A. Ge, Ciphertext policy attribute-based encryption from lattices. in Proceedings of the 7th ACM Symposium on Information, Computer and Communications Security (2012), pp. 16–17 10. Y.T. Wang, Lattice ciphertext policy attribute-based encryption in the standard model’. Int. J. Netw. Sec. 16(6), 444–451 (2014) 11. S. Wang, X. Zhang, Y. Zhang, Efficient revocable and grantable attribute-based encryption from lattices with fine-grained access control. IET Inf. Secur. 12(2), 141–149 (2018) 12. J. Yu, C. Yang, Y. Tang, X. Yan, Attribute-based encryption scheme supporting tree-access structure on ideal lattices. in International Conference on Cloud Computing and Security (Springer, 2018), pp. 519–527 13. R.E. Bansarkhani, J.A. Buchmann, Improvement and efficient implementation of a lattice-based signature scheme selected areas in cryptography. SAC 8282, 48–67 (2013) 14. N. Genise, D. Micciancio, Faster Gaussian sampling for trapdoor lattices with arbitrary modulus (Cryptol. ePrint Arch., Report, 2017), p. 308 15. D. Micciancio, C. Peikert, Trapdoors for lattices: Simpler, tighter, faster, smaller. in EUROCRYPT (2012), pp. 700–718 16. The PALISADE Lattice Cryptography Libraryn https://git.njit.edu/palisade/ (PALISADE, 2018) 17. Y. Liu, L. Wang, L. Li, X. Yan, Secure and efficient multi-authority attribute-based encryption scheme from lattices. IEEE Access 7, 3665–3674 (2018)
Crypto-SAP Protocol for Sybil Attack Prevention in VANETs Mohamed Khalil and Marianne A. Azer
Abstract VANETs are considered a sub-category of MANETs. They provide vehicles with the ability to communicate with each other to guarantee safety and provide services for drivers. VANETs have many network vulnerabilities: working over wireless media makes them vulnerable to many kinds of attacks, and nodes can join or leave the network dynamically, changing its topology and affecting the stability of communication links. In VANETs, each car works as a node and a router, so if a malicious attacker joins the network, the attacker could send false messages to disrupt the network operation, and that is why VANETs are vulnerable to many types of attacks. Denial of service, spoofing, ID disclosure, and Sybil attacks can be launched against VANETs. In this paper, we present the cryptographic protocol for Sybil attack prevention (Crypto-SAP), which is a new protocol. Crypto-SAP uses symmetric cryptography to defend VANETs against Sybil nodes. Simulations were done to investigate how the Crypto-SAP protocol affects the network performance. Keywords Ad hoc · Sybil attacks · Symmetric cryptography · VANETs
M. Khalil (B) · M. A. Azer (B) Nile University, Giza, Egypt e-mail: [email protected] M. A. Azer e-mail: [email protected] M. A. Azer National Telecommunication Institute, Cairo, Egypt
1 Introduction
In vehicular ad hoc networks (VANETs), each vehicle acts as a host and a router at the same time. The on-board unit (OBU) is an embedded device in the vehicle which makes it work as a node inside the network. Another entity called the roadside unit (RSU) exists in VANETs for network management. VANETs are important for safety applications such as warning the drivers if people are crossing the street, giving an alarm when driving
so fast, giving information about traffic signals and road signs, and informing the driver if an accident happens or if there is a traffic jam so that another route can be taken. VANETs can support non-safety applications as well, such as giving access to some web applications. This type of network is characterized by high mobility, frequently changing information, variable network density, better physical protection, and no power constraints. Vehicles can communicate with each other, which is called vehicle-to-vehicle (V2V) communication, and they can communicate with RSUs as well, which is called (V2R) communication [1]. As VANETs are wireless networks, they are vulnerable to several types of attacks, such as channel jamming, which can result in denial of service (DoS). Another attack that can be launched against VANETs is the timing attack, in which a malicious vehicle delays forwarding messages, making them useless, as the receivers get the messages after the needed time. Ad hoc networks, in general, suffer from many security challenges such as "authentication and identification," which means each node in the network must have a single unique identity and be authenticated using this identity within the network. They also suffer from privacy and confidentiality challenges. These challenges can be solved by cryptography, and that is why we introduce the Crypto-SAP protocol to resolve them and prevent Sybil attacks. Sybil attacks are very dangerous for VANETs: a malicious vehicle acts as many different vehicles with different fake identities and locations, which can create the illusion of traffic congestion and make the network work badly. The rest of this paper is organized as follows. Section 2 presents the work that has been done in the literature to mitigate Sybil attacks. In Sect. 3, we present our proposed scheme for Sybil attack prevention in VANETs as well as the simulation results. Finally, conclusions and future work are presented in Sect. 4.
2 Literature Review In this section, we classify and present the different approaches that were used in the literature to prevent or detect Sybil attacks in VANETs.
2.1 Reputation Systems
Reputation systems were proposed [2, 3] to maintain trust between nodes in VANETs, based on each node keeping a scoring table for the other nodes inside the network. However, attackers are capable of changing reputation points by making use of newly created identities, especially in the case of multi-node Sybil attacks.
2.2 Time Stamp
In [4, 5], detecting Sybil attacks using a time-stamp series approach based on RSU support was presented. Two messages with the same time-stamp series mean that they are Sybil messages sent by one vehicle. However, timing is not accurate within communication, and small differences could be used by a Sybil attacker to overcome this technique. Another disadvantage occurs when RSUs are implemented at intersections: a malicious vehicle could obtain multiple time stamps with large time differences from other vehicles by stopping near the RSU.
2.3 Key Based
In [6, 7], Sybil attack detection using public key cryptography was proposed. The authors used a public key infrastructure for VANETs (VPKI) to secure communication and addressed a mechanism for key distribution. One drawback is overloading the network and processing resources while revoking certificates when a vehicle needs to join a new RSU network. Revoking the certificate of a malicious vehicle takes a long time as well.
2.4 Received Signal Strength
Researchers have noticed that Sybil nodes have high signal strength, so by investigating the received signal strength (RSS) values of vehicles, the authors of [8, 9] found a threshold that can be used to detect Sybil attacks. However, the transmission signal power can be tuned, which limits this technique.
2.5 Position Authors of [10, 11] presented a new technique to detect Sybil attacks through vehicles’ GPS positions. These positions are periodically broadcasted within the network. This method does not need any extra hardware. Therefore, it is a lightweight technique. However, by sending false GPS positioning information through manipulating network packets this security measure could be bypassed.
2.6 MAC Address
Another technique to detect Sybil nodes was proposed in [12], using MAC addresses to verify nodes' identities. This method protects the network against having multiple identities for the same MAC address. However, MAC addresses are shared inside the network and can be collected by a sniffer; a spoofed MAC address can then be used for authentication and for performing Sybil attacks.
3 Crypto-SAP Protocol
We present the Crypto-SAP scheme in this section. The basic prerequisites for the new protocol are presented in Sect. 3.1 and the new scheme algorithm is presented in Sect. 3.2. Section 3.3 provides a security analysis of the protocol using BAN logic.
3.1 Prerequisites
• There must be a law to install special OBUs in vehicles.
• A unique ID is applied for each OBU and it is known by the government.
• The MAC address vs. ID database is kept secret in a secure cloud database.
• The cloud database is wire-connected to RSUs.
3.2 Proposed Scheme Algorithm
The proposed protocol consists of three main stages. In the first stage, the system has to ensure the validity of the requester. In other words, the vehicle that wants to gain authentication to join the RSU network has to provide its MAC address, and this MAC address has to exist in the MAC database connected to the RSU. After verification, the RSU asks the neighboring RSUs to drop this MAC address from their authenticated lists so that the same vehicle is not registered twice on the road. The second stage is ID verification. The RSU sends the requesting node a message encrypted with the unique ID as the encryption key, "EncID(MSG)". The node receives the encrypted message, decrypts it automatically using its own ID, and obtains the message in plain text. It then runs a message authentication code on the message using the ID as the key and outputs a tag "t = HMACID(MSG)" using SHA2, the one-way hash function, encrypts the tag with the ID, and sends it to the RSU. The RSU decrypts the incoming message to get the tag and verifies it by checking whether the tag is the corresponding output tag for that ID.
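As an illustrative sketch only (not the authors' implementation), the ID-based challenge-response of this second stage can be expressed as follows. AES-GCM and HMAC-SHA-256 are assumed here as concrete instantiations of the AES and SHA2 primitives named above, the helper names are hypothetical, and the extra encryption of the tag and the session counter are omitted for brevity; the third-party `cryptography` package supplies the AES primitive.

```python
# pip install cryptography
import os, hmac, hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def key_from_id(vehicle_id: str) -> bytes:
    # The OBU ID acts as the shared symmetric secret; hash it to a 256-bit AES key.
    return hashlib.sha256(vehicle_id.encode()).digest()

def rsu_challenge(vehicle_id: str):
    # RSU side: encrypt a fresh challenge MSG under the ID, i.e. Enc_ID(MSG).
    key, nonce, msg = key_from_id(vehicle_id), os.urandom(12), os.urandom(16)
    return msg, nonce, AESGCM(key).encrypt(nonce, msg, None)

def obu_respond(vehicle_id: str, nonce: bytes, ct: bytes) -> bytes:
    # OBU side: decrypt with its own ID and compute t = HMAC_ID(MSG).
    key = key_from_id(vehicle_id)
    msg = AESGCM(key).decrypt(nonce, ct, None)
    return hmac.new(key, msg, hashlib.sha256).digest()

def rsu_verify(vehicle_id: str, msg: bytes, tag: bytes) -> bool:
    # RSU side: recompute the tag and compare; success releases the network key NK.
    expected = hmac.new(key_from_id(vehicle_id), msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

msg, nonce, ct = rsu_challenge("OBU-ID-0001")
tag = obu_respond("OBU-ID-0001", nonce, ct)
print(rsu_verify("OBU-ID-0001", msg, tag))   # True only if the OBU knows the ID
```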
In the third and final stage, the network key (NK) is given to the requesting node if the ID has been verified. NK is a symmetric key generated by the RSU so that the vehicles can communicate with each other securely after successful authentication. Each RSU has a different NK, and these NKs are changed periodically so that an old key cached in a malicious vehicle cannot cause problems. The RSU sends its network key encrypted with the ID of the vehicle. A vehicle can then communicate with other vehicles via messages encrypted with the NK given by the RSU and communicate with the RSU by encrypting messages with its own ID. The RSU sends the MAC address of the authenticated vehicle to the other authenticated vehicles in the network so that it is trusted. The flow chart of the proposed system is presented in Fig. 1, while the messages exchanged between the OBU and the RSU for authentication are illustrated in Fig. 2 and Table 1. A session counter is applied at the RSU to count the messages exchanged between itself and the vehicle in order to protect the network against replay attacks [13]. If there is a Sybil attack node in the network, the malicious vehicle will try to send a fake MAC address to the RSU to gain at least one more identity. It will then be asked to decrypt a message using the ID related to that MAC address (if it already exists in the database) and will not be able to do so, as IDs cannot be shared by any means over the network. That makes it impossible for a malicious vehicle to perform Sybil attacks. We have done some simulations to validate the system using network simulator 2 (NS2). The AODV routing protocol was selected in the simulations because route discovery is initiated only when a certain node needs to communicate
Fig. 1 Proposed scheme flowchart
Fig. 2 Messages exchanged between RSUs and vehicles
Table 1 Notations used in the proposed scheme

Symbol       | Meaning
||           | Appending notation
MSG          | Message created from RSU
SHA(MAC)     | The one-way hash function of the MAC address
EncID(MSG)   | Encrypt MSG with the ID
DecID(MSG)   | Decrypt MSG by the ID
RTAG         | Received tag from OBU
GTAG         | Generated tag at RSU
NK           | Network key for the RSU network
with another node, which results in less load on the network traffic. The advanced encryption standard (AES) has been used in the encryption and decryption process, while SHA2 has been used for hashing. Five scenarios were used to simulate Crypto-SAP against Sybil attackers in VANETs. The five scenarios had ten, twenty, thirty, forty, and fifty nodes. Each scenario was simulated in the presence of Sybil nodes representing ten, twenty, and thirty percent of the nodes. Figure 3 illustrates the packet delivery ratio in all scenarios, while Fig. 4 depicts the throughput within the simulation. It can be seen that the packet delivery ratio (PDR) decreases with the increase in the number of attacking nodes, which meets the expectations. On the other hand,
Fig. 3 Packet delivery ratio within the simulation
Fig. 4 Total throughput within the simulation
throughput seems to decrease only slightly with the increase in the number of attacking nodes. We have benchmarked our proposed scheme against the MAC address verification method [12]. For malicious node detection, it can be seen from Fig. 5 that the Crypto-SAP method outperforms the MAC verification method [12] in detecting Sybil attacks. The proposed Crypto-SAP protocol does not need any certification authorities (CAs). RSUs will need access control management to be able to query the information they need from the cloud database. It has been shown that the Crypto-SAP protocol needs only a few messages to be sent and received, which minimizes the delay and makes mobility management much easier. The scheme never transmits clear-text messages between an RSU and a vehicle or among the vehicles themselves, which solves one of the main security challenges in VANETs, confidentiality. This system can
Fig. 5 MAC-trust detection method versus Crypto-SAP
be used, besides preventing Sybil attacks, by the government to trace stolen vehicles, and it can help in forensics by analyzing the logs of RSU networks.
3.3 Security Analysis of Crypto-SAP Using BAN Logic
To prove the security of our protocol, we perform an analysis with the well-known BAN logic. The basic rules of the BAN logic security module are:
• If P (a network agent) believes that P and Q (another network agent) can communicate with the shared key K, and P receives message X encrypted with the key K ({X}K), then P believes that Q sent X. This step verifies the message origin.
• If P believes that Q transmitted X and P believes that X has not previously been sent in any other messages, then P believes that Q acts as if X is a true message. This step ensures the freshness of the message, as X may be replayed by an attacker.
• If P believes that Q has jurisdiction over X and P believes that Q acts as if X is a true message, then P believes X. This last rule verifies the origin's trustworthiness.
So, in our algorithm, we have a basic assumption: OBUi and the RSU already share a symmetric key (IDi) by law, as illustrated previously, and our main
goal is to make sure that this particular OBU has the particular ID corresponding to its MAC address.
Step 1: While the OBU tries to authenticate itself in the RSU network and get the network key (NK), it must first initialize the communication with the RSU with its MAC address, and there is a session counter to ensure the freshness of the message (1). The RSU will assume that it believes the particular OBU has jurisdiction over the MAC message.
Step 2: Once the message is received by the RSU, it sends an encrypted message (X) using IDi as a symmetric key to the OBU. The OBU will assume that it believes the RSU has jurisdiction over the encrypted message.
Step 3: The OBU can decrypt the message if and only if it has the ID corresponding to the MAC address sent in Step 1. It can then generate a tag containing the SHA of the ID and X and encrypt that tag with the ID. Finally, it sends the tag back to the RSU.
Step 4: The RSU decrypts the tag, generates the correct tag, and compares it with the received decrypted tag. If both tags are identical, then the assumption made in Step 1 is correct and the RSU verifies the MAC origin (2). The RSU also verifies the trustworthiness of the OBU (3).
Step 5: The OBU receives NK from the RSU encrypted using IDi, and the OBU verifies the correctness of the assumption made in Step 2 (4).
From (1), (2), (3), and (4) we can conclude that our algorithm is secure according to the BAN logic security module and our goal has been accomplished.
4 Conclusions
The different methods proposed in the literature to detect and prevent Sybil attacks in VANETs have been classified and presented. The cryptographic scheme for Sybil attack prevention (Crypto-SAP) protocol is presented as a new approach to protect VANETs from Sybil attacks. This protocol solves the confidentiality issue among the different nodes in the network. Simulations were done to check the performance and the ability of Crypto-SAP using NS2, and the results show that no Sybil attack succeeded against the proposed scheme, but the performance decreases as the number of simultaneous Sybil attack attempts increases. For future work, it is planned to use this approach in conjunction with other techniques to detect malicious behaviors of the nodes inside VANETs and isolate the malicious nodes out of the network.
References 1. A. Ullah et al., Advances in position based routing towards ITS enabled FoG-oriented VANET—A survey. in IEEE Transactions on Intelligent Transportation Systems (2019) 2. S. Buchegger, J. Le Boudec, A robust reputation system for P2P and mobile ad hoc networks. in Proceedings Wkshp. Economics of Peer-to-Peer Systems (2004)
3. S. Marti, T.J. Giuli, K. Lai, M. Baker, Mitigating routing misbehavior in mobile ad hoc networks. in Mobile Computing and Networking (2000), pp. 255–265 4. S. Park, et al., Defense against sybil attack in vehicular ad hoc network based on roadside unit support.. in MILCOM 2009 (IEEE, 2009) 5. K. El Defrawy, T. Gene Tsudik, Prism: Privacy-friendly routing in suspicious manets (and vanets). in 2008 IEEE International Conference on Network Protocols (IEEE, 2008) 6. C. Zhang, On Achieving Secure Message Authentication for Vehicular Communications. Ph.D. Thesis, Waterloo, Ontario, Canada 2010 7. A.H. Salem et al., The case for dynamic key distribution for PKI-based VANETs. arXiv preprint arXiv:1605.04696 (2016) 8. G. Jyoti, et al., RSS-based Sybil attack detection in VANETs. in Proceedings of the International Conference Tencon2010 (IEEE, 2010) 9. A. Sohail et al., Lightweight sybil attack detection in manets. IEEE Syst. J. 7(2) (2013) 10. H. Yong, J. Tang, Y. Cheng, Cooperative sybil attack detection for position based applications in privacy preserved VANETs. in GLOBECOM 2011. (IEEE, 2011) 11. G. Yan, O. Stephan, C.W. Michele, Providing VANET security through active position detection. Comput. Commun. 31(12) (2008) 12. P. Anamika, M. Sharma, Detection and prevention of sybil attack in MANET using MAC address. Int. J. Comput. Appl. 122(21), 201 13. K. Jonathan, Y. Lindell, Introduction to Modern Cryptography (CRC Press, 2014)
Managerial Computer Communication: Implementation of Applied Linguistics Approaches in Managing Electronic Communication Marcel Pikhart and Blanka Klímová
Abstract Managerial communication in the past few decades has been much influenced by new trends in computer science that bring new means of communication. The use of computers for communication is ubiquitous, and the trend toward computerization will become even stronger soon. However, the intercultural aspect of this communication must not be neglected, as it poses a potential threat to company management and to all communication processes in the company and the global business environment. Applied linguistics presents a potentially useful tool for IT developers of communication tools and apps, so that they take into consideration critical aspects of human communication which must be implemented in these modern communication tools and methods. Applied linguistics and its practical implementation can prove very useful and can reinforce managerial communication when modern technological tools are used. In the past few years, ICT departments have been focusing more on the technological aspects of communication when creating these tools; however, it is now clearly visible that there is an urgent need to embed more parameters in communication tools, and one of them is interculturality, because electronic or computer communication is now in its essence global and intercultural. The research conducted into intercultural communication awareness in ICT departments proves that bringing this topic to priority in the education of ICT students is crucial for further competitiveness and lossless transfer of information. Keywords Communication · Managerial communication · Business communication · Electronic communication · Computer communication · Intercultural communication
M. Pikhart (B) · B. Klímová Faculty of Informatics and Management, University of Hradec Kralove, Hradec Kralove, Czech Republic e-mail: [email protected] B. Klímová e-mail: [email protected]
1 Managerial Electronic Communication
Managerial electronic communication is ubiquitous [1] all over the world, and the trend of making communication more ICT based is evident [2]. Business communication and managerial competencies are considered to be culturally specific [3, 4], and this is a truism in today's global world of interconnectedness [5, 6]. Interculturality means that what is acceptable for one culture will not be acceptable in another, and one cultural context of communication may not overlap with another one, thus causing misunderstanding and conflicts in communication when electronic means of communication are used [7–9]. Business organizations and also ICT departments are based in a given culture, and therefore necessarily represent and reflect the dominant paradigm of the society in which they are based. The western, European or American model of communication is therefore not applicable to all global cultures, and ethnocentric communication paradigms are no longer valid or available to be used when creating communication tools and apps [10, 11]. The obsolete paradigm of communication based on the western model cannot lead to success and cooperation, nor does it support competitiveness in the global business world [12]. Therefore, it is crucial for the employees of ICT departments to understand new communication paradigms and the paradigm shift of the past few years so that they are well equipped with the modern tools which they will implement in their work [13, 14]. It will also improve managerial depth when using these communication tools, which reflect modern trends of electronic communication [15–17]. The utilization of managerial communication and its modern trends by ICT employees is crucial for the further development of a healthy business environment and future competitiveness in the global world [18–21]. Management supported by ICT departments is reflected in a coordination of human effort when utilizing human and technological tools to achieve desired aims [12, 22, 23]. It is an interaction between human and non-human tools and resources, and this interaction is always necessarily culturally rooted and reflects the cultural traits of the users, but also of the designers or creators of communication tools [24]. The famous definition of culture by Hofstede as collective mental programming [25] leads us to view ICT as an area for connecting information and society by the electronic tools which are used for storing, processing, warehousing, mining, and transferring information [26, 27].
2 Research Background
The research was based on a literature review of past research into the area of business electronic communication, and the second half of the research focused on the awareness and implementation of intercultural aspects in
international management in several multinational enterprises doing business in the Czech Republic. The qualitative research was carried out in two multinational corporations doing business in the Czech Republic and focused on potential issues which could influence company internal communication and potential problems which could arise between the foreign management and the Czech employees. It was conducted in the first half of 2019 in a Taiwanese electronics company and a Belgian ICT company: the top management was questioned (five respondents, all of them foreigners), as well as the middle management (19 respondents, all of them Czech), who communicate with the top management a few times a week. The data collection was done by interviews with guided and structured questions about the situation of internal communication using modern communication tools such as the intranet, presentation technology, instant messaging, emails, memos, reports, etc. Standard methods for the analysis of the answers were used.
2.1 Research Hypothesis
The hypothesis was that both the international and the Czech management are not aware of potential cultural issues which could influence the quality of internal electronic communication in the company.
2.2 Research Aim The aim of the research was to bring the attention of the management to the need for increasing the awareness of cultural aspects of doing business which is crucial in everyday managerial practice.
3 Research Results
The results of the research are as follows:
• Cultural differences which are transferred into communication are not considered by the top management to be very important. They also thought that these differences would influence communication and information loss only marginally.
• Cultural differences which are transferred into communication are very important for the middle management, and they see a potential for misunderstanding and loss of information during the communication process.
• Both the top and the middle management are aware that communication patterns have changed significantly, and they expect ICT employees to implement these changes in the other communication tools they are responsible for.
• Both the top and the middle management see the ICT department as responsible for flawless processes and communication strategies, because ICT manages and maintains communication from its technological aspect, and therefore it must be ready to accommodate the needs of the management and of the global communication environment.
Both researched companies, i.e., the Taiwanese and the Belgian one, are well-established global companies, so the communication paradigms used by them are more or less equalized and adapted to global patterns of communication; however, the management still sees a lot of room for improvement and also a lot of potential problems which could cause significant misunderstanding. The biggest issue we observed was that the company based in the Asian culture is a high-context culture as defined by Hofstede; therefore, it conveys meaning and ideas in radically different ways than Europeans do. Explicitness and openness are not valued in these high-context cultures; on the contrary, they are viewed as arrogance and even aggression. Moreover, European directness was viewed by the Asian management as inappropriate and rude, leading to offense being taken and communication being cancelled, even when using electronic means of communication. On the other hand, the Czech management viewed this Asian high-context background as a lack of vision and leadership and as providing no motivation for the employees. Another issue observed in these companies regarding electronic communication was the sense of collectivism in the Asian management. They usually transferred individual achievements to the whole team, which was not viewed positively by the Czech employees. When negotiating over the internet and instant messaging, an indirect answer from the Asian management was not interpreted as denial even if it had been meant so, but was incorrectly understood by the Czech management as space for further negotiation. The basic Asian way of showing disagreement by using indirect communication rather than confrontation was not understood, and therefore the information transfer was blurred. It was Hofstede who, as early as the 1980s, highlighted the importance for Europeans of trying to understand the means of communication of the Asian culture, because it is the only way to get through various communication pitfalls. It remains a mutual challenge for both Europeans and Asians to attempt to get over this obstacle. It is the ICT department that is now facing this challenge, as the communication is both global (i.e., Asia and Europe) and electronic (i.e., the Internet, apps, instant messaging, intranet). These two parameters are an underlying principle which must be considered; otherwise, our communication can never succeed.
3.1 Research Limitations
The research was conducted on a limited sample and further research is needed; however, we still claim that the results are significant and help us understand current trends in company electronic communication patterns and problems. Further research could focus on the practical consequences of improving communication efficiency in an intercultural environment, alongside the creation of a manual which could help the employees of ICT departments achieve better communication skills in the intercultural environment.
4 Discussion
The research proved the hypothesis: both the top and the middle management are aware of the urgent need for the change of communication patterns. Generally, however, the top and the middle management are not very aware of the potential problems in internal communication caused by improper information transfer through electronic media due to the cultural conditioning of our communication. The globalizing processes which are under way urge us to look for alliances and other forms of business cooperation across continents and cultures, and it is optimized communication which is the basic building block of this multinational cooperation. Communication in its various forms, such as verbal vs. nonverbal or direct vs. indirect, is the means which influences information transfer and knowledge transfer, but also regular everyday management in the global environment. It must not be forgotten that the main role of a subsidiary is to communicate the knowledge of the local market directly toward the parent company so that it can optimize its business performance. Therefore, knowledge of the local culture is a must.
4.1 Practical Implications
• Implementation of business communication courses for the management of the companies which operate in the global market.
• Implementation of business communication for the students of ICT so that they are well equipped with modern patterns of global communication and not only with their technical expertise.
• Implementation of intercultural business communication in the global business environment for students of business so that the graduates are ready and well equipped to work in such a varied world of international business.
• Further courses for ICT professionals so that they are able to implement new trends and improve communication channels within companies (internal communication) and among them (global business communication) as well.
5 Conclusion
Communication mistakes in electronic communication caused by the inappropriate interpretation of cultural values and symbols can deeply influence company profitability in a negative way, and this paper tried to show how the development of communication competencies in the management of multicultural companies could help in achieving better information transfer. The use of modern means of communication does not automatically create a better information transfer environment, since information quantity does not necessarily mean information quality. Technological means can enhance communication, but only if the management acknowledges that the most important factor of communication is the individual person, who exists in a certain cultural context, and that communication must be seen as a fragile part of any managerial activity.
Acknowledgements This paper is part of the SPEV project at the Faculty of Informatics and Management, University of Hradec Kralove, Czech Republic, and the authors would like to thank Ales Berger for his help when collecting data for this paper.
Advance Persistent Threat—A Systematic Review of Literature and Meta-Analysis of Threat Vectors

Safdar Hussain, Maaz Bin Ahmad, and Shariq Siraj Uddin Ghouri
Abstract Cyber adversaries have moved from conventional cyber threats to advanced, complex, targeted and well-coordinated attacks. These adversaries use Advance Persistent Threat vectors to penetrate the networks of classified and large business organizations through various evasive cyber techniques. This paper presents a systematic review of the literature produced by different researchers on the topic and also explicates and compares the most significant contributions made in the area of APT. The paper addresses the shortfalls in the proposed techniques, which form the areas for further research.

Keywords Advance Persistent Threat (APT) attack · Advance Persistent Adversary (APA) · Industrial control systems (ICS) · Review
1 Introduction

The world is faced with a very dynamic and evolving threat landscape in the cyber space domain. Ever since computer networks have existed, cyber adversaries and cyber criminals, as well as state-sponsored cyber offenders, have tried to exploit computer networks for notorious or personal gains. They have by far succeeded in infiltrating not only public networks but also classified secure networks. Although cyber security organizations have provided state-of-the-art solutions to the existing cyber threats, hackers have succeeded in causing colossal damage to many multibillion-dollar organizations.

S. Hussain (B) · M. B. Ahmad
Graduate School of Science & Engineering, PAF Karachi Institute of Economics and Technology, Karachi, Pakistan
e-mail: [email protected]
M. B. Ahmad
e-mail: [email protected]
S. S. Uddin Ghouri
Faculty of Electrical Engineering (Communication Systems), Pakistan Navy Engineering College—National University of Sciences and Technology (NUST), Karachi, Pakistan
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_15
According to one estimate, cyber adversaries will have cost business organizations over $2 trillion in damage in the form of data espionage and cyber fraud by 2019. Large business and military organizations have invested, and are expected to keep investing, millions of dollars in cyber security defence. According to another estimate, organizations plan to invest well over $1 trillion in cyber security within the period 2017–2021. Despite considerable investment, incidents of cyber infiltration and security breaches continue an upward rather than a downward trend. The cyber domain is an ever-changing threat landscape that constantly evolves and finds new dimensions to target enterprise organizations, either for data espionage or to cause permanent damage to hardware systems. Among the many threats that prevail today, the Advance Persistent Threat (APT) has emerged as the most damaging threat ever encountered by security specialists [1]. Contrary to spontaneous, untargeted cyber-attacks by anonymous hackers, an APT is a well-coordinated, high-profile cyber-attack that targets large commercial organizations and government entities such as military and state institutions [2]. Initially, APT was considered an unrealistic threat and unwarranted hype; however, it later proved to be a reality as more and more organizations became victims of APT attacks [3]. This led security researchers to conclude that enterprise business organizations, government and military institutions are no longer immune to data breaches despite heavy investment in information security [4]. Statistically speaking, cyber security remains a prime concern throughout the world. More than 15% of enterprise organizations have experienced a targeted attack, and more than 53% of these ended up losing classified sensitive data [5]. Organizations have seen more than 200% growth in the initiation of system or data recovery processes on the same day as, and in the week after, discovering a security breach. When it comes to the cost suffered by targeted organizations, an average loss of approximately $891K per targeted attack has been estimated. Targeted attacks follow the kill chain outlined in Fig. 1. This might suggest that by automatically blocking the reconnaissance phase, a multistage cyber-attack such as APT can be thwarted; however, this is not the case with APT. APT attacks have reached an unprecedented level of sophistication and nonlinearity in terms of their evolution and implementation. Therefore, a multiphase strategy, including continuous monitoring of communication, automated detection capabilities and monitoring of all types of threats, is required to thwart APT-type attacks. The ascendance of APT has made many organizations, including government agencies, more alert to the type of vulnerabilities that may exist within their networks and information systems. The complexity and inimitability of the attack warrant going beyond the perimeters of traditional network defence; this approach allows an organization to protect itself against an attacker that has already penetrated its network. APT has drawn considerable attention from the security community due to the ever-increasing and changing threat scenario it poses. This ever-changing threat landscape leads to a lack of clear and comprehensive understanding of the inner workings of APT, which is the research quandary addressed here.
Before proceeding further, it is imperative that we define APT in order to get a clear understanding of the intensity of this cyber-attack.
Fig. 1 Typical life cycle of APT attack
The National Institute of Standards and Technology (NIST) defines APT [6] as an adversary (state sponsored or otherwise) that has sophisticated levels of capability at its disposal and creates opportunities to achieve its objectives by using multiple attack vectors such as cyber, physical and deception. It establishes a strong footing in the physical infrastructure of the target organization. Its sole purpose is to extricate valuable information or, in some cases, inflict damage on the physical infrastructure of resident hardware. This definition forms the basic foundation for understanding APT and distinguishes it from other cyber-attacks. The paper is organized in five sections. Section 1 gives a brief introduction to the cyber-attack environment with a broad introduction to APT and its lethality. Section 2 presents the most renowned APT detection and prevention frameworks, including the inner workings of a targeted attack in the form of an iterative lifecycle process. Section 3 carries out a detailed critique of the presented frameworks in order to highlight the research gaps, which are summarized in Sect. 4. Section 5 presents the conclusion and future research directions.
2 Advanced Persistent Threat (APT)

The Advance Persistent Adversary (APA) continues to develop unprecedented technological skills and to orchestrate cyber-attacks of unparalleled dimensions. One of the threats the adversary has been able to formulate and bring into existence is the Advance Persistent Threat (APT), which is dissimilar to, and more lethal than, any traditional cyber threat. The threat persistently pursues its objectives repeatedly over a long and extended period of time and adapts to any efforts by the defender to detect it. It also maintains a communication channel between host and server in order to ex-filtrate its harvested data or inflict physical damage on a hardware entity as and when required [7]. APT attacks are unique and complex in nature, as their scope is very narrow in comparison to other common and unsophisticated attacks. The APA designs malware that aims to remain undetected over a long duration. This aspect makes APT harder to detect and defend against [1, 6]. Generally, an APT attack is composed of six iterative phases, as illustrated in Fig. 1 [5, 6] and defined as follows:
• Reconnaissance Phase: The first step cyber adversaries carry out is reconnaissance, during which intelligence is gathered on the targeted organization. This involves both human and technological aspects to gather as much information as possible about the target organization's network and is used to identify weak areas through which to infiltrate the network. In this phase, reconnaissance involves both technical intelligence gathering and gathering information from weak human links.
• Incursion/Penetration Phase: In this phase, the APA attempts to penetrate the organization's network using many diverse techniques. It employs social engineering tactics such as spear phishing and injects malicious code using SQL injection to deliver the targeted malware. It also exploits zero-day vulnerabilities in the software system to find an opening into the network.
• Maintaining Access Phase: Once inside the organization's network, cyber adversaries maintain access to their payload by deploying a remote administration tool (RAT). The RAT communicates with a command and control (C&C) server outside the organization. The communication between the host and the external C&C server is normally encrypted HTTP communication. This allows it to easily bypass the firewall and network defence systems by camouflaging itself in order to remain undetected.
• Lateral Movement Phase: In this phase, the APT malware moves to other uninfected hosts on the network. These hosts usually have higher privileged access, which provides a better chance of containing classified information as well as a better chance of data ex-filtration.
• Data Ex-filtration Phase: The last phase of the APT cycle is data ex-filtration. In this phase, the host uploads its harvested data to an external source or cloud. This process is either done in a single burst or takes place slowly, without the knowledge of the end user.
Anonymous cyber-attacks are designed to target larger scale systems with the aim of disrupting the normal operation of information systems [8]. In the case of APT, the target theatre is quite diverse. Its attack signatures are more distinctive than those of any other cyber-attack, which makes it highly target centric. APT becomes more challenging as it sometimes involves a combination of different attack vectors embedded with unique strategies customized for the particular target organization. It involves network penetration and data ex-filtration tailored specifically to the target network. The horizon of the attacker in the case of APT is fairly small, and a well-coordinated targeted attack is aimed mainly at government institutions and large-scale business organizations. Another characteristic of an APT attack is that its attack vector is highly customized and sophisticated. It involves a blend of tools and techniques which are often executed simultaneously to launch multiple attack vectors. It either exploits a zero-day vulnerability or attacks the target through malware, drive-by downloads or sink-hole attacks used to download the APT malware [7]. Its communication is designed to conceal itself among other data packets, which makes it harder to detect by normal IDS and anti-virus systems [6]. Another characteristic that defines APT is the objective of the attack, which ranges from business rivalry aiming to steal trade secrets to economic motivation or military intelligence gathering. The objectives of APT adversaries change over time depending upon the organization being targeted. Disruption of an organization's network, destruction of classified equipment in the case of a military organization and data pilferage are just a few examples that define APT objectives. APT groups are well staffed with ample technical and financial resources and may or may not operate with the support of state actors and machinery [9].
2.1 Target Organizations

Cyber-attacks have moved from being generalized to well-coordinated and targeted, and from being simple to being sophisticated. They are by nature more extensive and more serious threats than ever faced before [4]. With this change in the threat landscape, cyber adversaries have gone beyond the perimeters and moved to target-rich environments involving clandestine operations, international espionage, cyber operations, etc. These long-term, possibly state-sponsored, targeted campaigns mainly target the types of organization summarized in Table 1 [5].
2.2 Incursion Process of APT Attacker

Cyber adversaries normally use traditional methods of incursion into the targeted organization's network. This often involves social engineering tactics using spear phishing emails that lure an unsuspecting employee of the organization into clicking on a link [5]. In addition, opening an attachment that appears to come from a legitimate, trusted colleague of the same organization is another commonly used method of incursion into the network [7].
Table 1 Types of organizations targeted by APT attacks (targeted organization: attack target)
• Government, Military and Public Sector Organizations: confidential information pilferage; breach of security; disruption of services
• Power Systems: disruption of supply from the power grid system as well as the gas system
• Financial and Corporate Sector: pilferage of corporate and financial secrets
• Health Sector and Medical Systems: leakage of medical information and disruption of service
• Manufacturing Sector: leakage of corporate secrets and disruption of operations
• IT Industry: disruption of IT services/internet
Another method is exploiting multiple zero-day vulnerabilities, focusing on highly targeted systems and networks in order to run multiple attack vectors simultaneously. This includes downloading additional tools for the purpose of network exploration and assessing various vulnerabilities [10]. As discussed earlier, the objective of the cyber criminals is to remain inside the organization's network, undetected, for an indefinite period of time. This provides them with the opportunity to fully exploit the vulnerabilities of the host network and to harvest (as well as ex-filtrate) as much data from the organization as they can while staying under the radar [6]. This is achieved by specifically designing the APT to avoid detection at all costs, which may include evasion techniques that make the attack more difficult to detect and analyze [11].
2.3 Communication Mechanism Adopted by APT Attacker

One of the most essential parts of APT malware is its communication with its command and control (C&C) server, from where the persistent malware takes commands and to which harvested data is ex-filtrated [1]. The communication between host and server is often low and slow and is sometimes camouflaged within normal network data traffic [9]. The communication is mostly HTTP based, which makes it look like normal network traffic. Other communication mechanisms such as peer-to-peer (P2P) and IRC are also used by cyber criminals, who take advantage of them in terms of penetrability into the network and concealment of communication [11]. Use of the HTTP protocol for communication is quite high and amounts to more than 90% of the cases of APT infiltration [12]. In addition to HTTP-based communication, other existing protocols such as FTP, SMTP, IRC and traditional FTP-based email systems have been frequently leveraged to ex-filtrate or steal intellectual property, sensitive internal business and legal documents and other classified data from the penetrated organization [13].
Use of the HTTP protocol for communication across the network provides the attacker with two main advantages. Firstly, the protocol is the most widely used by organizations across the globe, and secondly, it generates a huge amount of web traffic which allows attackers to hide their malevolent activity and bypass the organization's firewall [2]. Various researchers [11, 14] have focused their detection strategies purely on detecting the communication between the APT host and its command and control (C&C) server. This is considered the most essential part of APT detection, as it is always essential for the attacker to maintain communication between the compromised host and the C&C server. Most of these detection approaches use supervised machine learning trained on APT malware. Different malware samples have been found to use different forms of communication between the compromised host and its command server. In one instance, a sample APT malware communicated with its C&C server by encoding commands and responses in cookies using base64 encoding [15]. In another instance, APT malware logged user commands using a key-logger, downloaded and executed code, and ex-filtrated the stored data to a remote HTTP C&C server periodically [16]. In other, and most common, APT cases, spear phishing is used to trick the user into downloading a payload or initiating a web session [17]. General web services such as blog pages have also been used by cyber criminals for infiltration purposes [18].
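As a concrete illustration of the kind of cookie-borne C&C encoding described above, the following is a minimal, hypothetical sketch (not the detection method of any of the cited works) that flags HTTP log records whose cookie values decode cleanly as printable base64 text, a pattern consistent with commands being exchanged via cookies; the record layout, field names and thresholds are assumptions.

```python
import base64
import binascii

def looks_like_base64(value: str, min_len: int = 24) -> bool:
    """Heuristic: long cookie values that decode cleanly as base64 text are suspicious."""
    if len(value) < min_len:
        return False
    try:
        decoded = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    # Require mostly printable output, since encoded C&C commands often decode to text.
    printable = sum(32 <= b < 127 for b in decoded)
    return printable / max(len(decoded), 1) > 0.8

def flag_suspicious_cookies(http_records):
    """Yield (host, cookie_name) pairs whose cookie values look like encoded commands."""
    for rec in http_records:               # rec: {"host": ..., "cookies": {name: value}}
        for name, value in rec.get("cookies", {}).items():
            if looks_like_base64(value):
                yield rec["host"], name

# Example usage with a hypothetical log record
records = [{"host": "10.0.0.5",
            "cookies": {"session": base64.b64encode(b"UPLOAD C:\\docs\\secret.xlsx").decode()}}]
print(list(flag_suspicious_cookies(records)))
```

Such a heuristic would of course produce false positives on legitimate opaque tokens; it only illustrates the general idea of inspecting cookie fields for encoded command channels.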
3 Different Attack Vectors

In addition to pilfering classified data from the target organization's network, APT has another serious threat dimension: attacks on hardware systems. The hardware system is normally an industrial control system or, in the case of a military organization, a weapon system. The sole purpose of this kind of APT malware is to permanently render the system dysfunctional by causing irreparable damage to its system controls.
3.1 APT—Industrial Threat Vector

Advanced cyber adversaries have not only attacked information systems for data pilferage but have also found ways to infiltrate and take control of industrial control systems (ICS). This is possible because of the tight coupling that exists between the cyber and physical components of industrial systems such as power grid systems, nuclear power plants, etc. This aspect of cyber-attack has far-reaching implications for human lives, which are heavily dependent on such systems for normal daily activities. ICS systems such as Supervisory Control and Data Acquisition (SCADA) systems, mainly found in energy sector corporations, have been the main target of cyber adversaries [19]. Despite the segregation approaches applied by organizations to protect their networks, cyber-attacks continue to persist.
Contrary to claims of isolated networks, complete network isolation remains a myth. Insider threats using micro-storage devices or non-permanent modems often prove fatal and provide access to restricted networks [9]. This allows malware to spread into deeply isolated networks, making it difficult, if not impossible, to assess the damage and the depth of infiltration into the network.
3.2 APT—Military Threat Vector

The APA also aims to target military installation networks and weapon systems. This is the core reason why APT is termed the most dangerous threat in terms of lethality. It does not only target large business enterprises; its primary targets are military organizations and their weapon systems. Just to give an example of how terrifying this threat can become, recently some authorized hackers seized control of a weapon system being acquired by the US military; the trial was conducted to assess the digital vulnerability of US military assets [20]. Today, military assets such as radars, fighter jets, satellites, missiles, submarines, etc., and the nuclear weapons delivery systems of weapons-manufacturing nations, have become heavily dependent on computer systems. This has allowed cyber adversaries to exploit vulnerabilities present in the core of these systems. The targeted attacks carried out by sophisticated adversaries show no linearity when it comes to the type of organization being attacked. Therefore, we state that APT can have catastrophic consequences if it remains undetected, and the risk of a weapon system being used in response to a false alarm or a slight miscalculation is very much real rather than a myth.
3.3 APT Datasets

One of the most important aspects of any research is the ability to acquire useful datasets for in-depth analysis. Data relating to APT has been difficult to acquire, as it is not publicly available and no organization is willing to share it, both because of intellectual property laws and because such disclosure would publicly jeopardize the goodwill of the organization. Therefore, the chances of obtaining specific datasets related to APT are quite minimal. Symantec Corporation, a large and renowned anti-malware solution provider, has the world's largest repository of internet threats and a vulnerability database [9]. It has over 240,000 sensors in over 200 countries that monitor cyber-related threats. The company has also gathered malicious code from over 133 million client servers and gateways, in addition to deploying honey-pot systems that collect cyber-attacks across the world [9]. The organization also claims to gather data on non-visible threats that possibly include APT attack data. In addition to using the datasets of Symantec Corporation, developing a honey-pot system and deploying it on a live server is another method of gathering APT datasets.
Generating live attack scenarios during implementation can also prove useful in gathering real-time datasets, which can eventually be used to train the model. One drawback of this approach is that it may yield only ordinary cyber-attacks and malware and no data useful for APT research. The other available option for gathering APT datasets is to use the open-source datasets available [21]. These are highly anonymized datasets that span numerous months and represent several successful authentication events as well as calibration datasets from users to computers.
4 Research Techniques and Gaps

Cyber security researchers have proposed a limited but significant set of research approaches to countering APT which are claimed to be state-of-the-art and comprehensive. However, in-depth analysis suggests that the proposed frameworks have limited comprehensiveness as well as an inability to adapt to the diverse and ever-changing cyber threat landscape. In this section, we carry out a comprehensive analysis of the major approaches proposed by prominent researchers on APT detection and prevention frameworks. We also summarize the strengths and weaknesses of each approach to give a comprehensive view of the APT threat landscape.
4.1 APT Detection Framework Using Honey-Pot Systems

The proposed framework is an implementation of a honey-pot system [4]. Honey-pot systems are computer applications that simulate the behaviour of a real system used in the organization's network. Their sole purpose is to attract cyber-attackers into attacking an isolated and monitored system. The system mimics a real live system of the organization, usually an information system, and studies the behaviour of the attacker as it exploits the weaknesses and vulnerabilities found in that information system. The cyber-attacks on the system are recorded in the form of logs, which are afterwards analyzed in order to gain a comprehensive understanding of the types and sophistication of the attacks. In this approach to APT detection, the researchers [4] suggested that a properly configured honey-pot system be connected to a devised alarming system, which would set off an alarm once an APT attack is detected. This would give early warning to the security experts in the organization to take appropriate countermeasures and thus protect the organization's vital assets. The approach offered may prove effective in detecting APT; however, it is a passive defence approach rather than an active one. The approach is simple and straightforward with low resource consumption, but it is limited to the system behaviour within its defined domain and cannot go beyond the prescribed domain area. The framework only focuses on incoming network traffic and disregards any check on network traffic leaving the network.
This leaves a grey area of vulnerability: network traffic going outside the organization's network remains unsupervised and undetected by the sensor.
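To make the honey-pot idea concrete, the following is a minimal sketch of a low-interaction decoy listener wired to a simple alert, in the spirit of the alarm-connected honey-pot described above; it is not the cited authors' system, and the ports, banners and log format are assumptions.

```python
import logging
import socket
import threading

logging.basicConfig(filename="honeypot.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

def serve_fake_service(port: int, banner: bytes) -> None:
    """Listen on a decoy port, log every connection and raise a simple alert."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", port))
    srv.listen(5)
    while True:
        conn, addr = srv.accept()
        logging.info("connection on port %d from %s:%d", port, *addr)
        print(f"ALERT: probe of decoy port {port} from {addr[0]}")  # alarm hook goes here
        try:
            conn.sendall(banner)        # mimic a real service banner
            data = conn.recv(1024)      # record whatever the attacker sends first
            logging.info("payload: %r", data)
        finally:
            conn.close()

# One decoy listener per thread; ports and banners are placeholders.
for port, banner in [(2222, b"SSH-2.0-OpenSSH_7.4\r\n"), (8080, b"HTTP/1.0 200 OK\r\n\r\n")]:
    threading.Thread(target=serve_fake_service, args=(port, banner), daemon=True).start()

threading.Event().wait()  # keep the main thread alive so the decoy listeners keep running
```

As the critique above notes, such a sensor only sees traffic directed at the decoy; it says nothing about outbound flows from already compromised hosts.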
4.2 Detection of APT Using Intrusion Kill Chains (IKC)

In this approach, the authors [22] propose to detect multi-stage cyber-attacks. The approach uses the properties of the Intrusion Kill Chain (IKC) to model the attack at an early stage. The model collects security event logs from various sensors, such as host intrusion detection systems (HIDS), firewalls and network intrusion detection systems (NIDS), for analysis and further processes them through a Hadoop-based log management module (HBLMM). The intelligent query system of this module then correlates the events with the IKC. Code and behavioural analysis is also carried out using the same module. The approach also offers prediction of the IKC by analyzing the collected sensor logs and maps each identified suspicious event to one of the seven stages of the attack model [23]. Analysis suggests that although this approach proves better and more efficient, it involves in-depth analysis of network- as well as host-related data flows and analysis of all mined system events. This process may prove to be a time-consuming and tedious task, as the amount of data collected would be phenomenal. Moreover, this approach is solely a passive one and mainly focuses on a solitary analysis unit, i.e. system event data. The possibility of an increase in false positive alarms also cannot be ruled out.
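The core idea of correlating sensor events with kill-chain stages can be sketched as follows; this is a toy illustration under assumed event labels, not the HBLMM pipeline of [22], and the stage names follow the commonly cited seven-stage kill chain.

```python
# Hypothetical mapping from sensor event types to the seven kill-chain stages.
KILL_CHAIN = ["reconnaissance", "weaponization", "delivery", "exploitation",
              "installation", "command_and_control", "actions_on_objectives"]

EVENT_TO_STAGE = {                      # assumed event labels from HIDS/NIDS/firewall logs
    "port_scan": "reconnaissance",
    "phishing_attachment": "delivery",
    "exploit_signature": "exploitation",
    "new_service_installed": "installation",
    "beacon_to_external_host": "command_and_control",
    "large_outbound_transfer": "actions_on_objectives",
}

def correlate(events):
    """Group suspicious events by host and report how far along the kill chain each host is."""
    progress = {}
    for ev in events:                   # ev: {"host": ..., "type": ...}
        stage = EVENT_TO_STAGE.get(ev["type"])
        if stage:
            progress.setdefault(ev["host"], set()).add(stage)
    for host, stages in progress.items():
        furthest = max(KILL_CHAIN.index(s) for s in stages)
        yield host, KILL_CHAIN[furthest], sorted(stages, key=KILL_CHAIN.index)

events = [{"host": "ws-42", "type": "phishing_attachment"},
          {"host": "ws-42", "type": "beacon_to_external_host"}]
for host, furthest, seen in correlate(events):
    print(f"{host}: furthest stage = {furthest}, stages observed = {seen}")
```

Hosts that have already progressed to the later stages would be prioritized for response; the false-positive concern raised above corresponds to benign events being mapped onto stages by an over-broad event dictionary.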
4.3 Countering Advanced Persistent Threats Through Security Intelligence and Big Data Analytics

This approach [3] offers a multidimensional defence against APT. Based on big data analytics, the researchers intend to detect the weak signals of APT intercommunication. They propose a framework that works on two sets of indicators: a compromise indicator, which prioritizes hosts based on suspicious network communication, and an exposure indicator, which estimates the possibility of an APT attack. The framework the researchers propose is called AUSPEX (named after an interpreter of omens in ancient Rome). It includes a human analyst who analyzes inter-system network communications to detect APT threats among internal hosts. The final outcome of the proposed framework is a list of internal hosts ranked according to the defined compromise and exposure indicators. At a later stage, the human analyst analyzes the internal host communication to detect APT signatures within the organization's network. Critically analyzing this framework, AUSPEX is based on combining big data analytics with security intelligence from internal as well as external information, and it proposes a novel approach to detecting APT communication.
However, big data analytics has been used in the past to detect security violations in varied sets of data, for example in detecting malware such as Stuxnet and Duqu. In this framework, the researchers focus on assisting the security analyst in analyzing large data sets that are most likely to be compromised. Although this approach yields the subset of hosts that are most likely to have been compromised, it is heavily centred on the human specialist, who analyzes the most likely APT-infected hosts within the context of big data analytics. This approach, although novel, may also generate more false alarms, as it does not define any rule set for analyzing compromised big data and relies purely on the skills of the human specialist. Secondly, the framework also falls short of an APT prevention strategy; rather, it confines itself to APT detection strategies using big data analytics.
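The general mechanism of ranking internal hosts by two indicators and handing the top of the list to a human analyst can be illustrated with the toy sketch below. The actual AUSPEX indicators are not specified here, so every feature name, weight and threshold in this sketch is an assumption made purely for illustration.

```python
# Hypothetical per-host features; the real compromise/exposure indicators may differ entirely.
hosts = [
    {"name": "srv-01", "beaconing_score": 0.8, "rare_domains": 12, "open_services": 3, "unpatched_cves": 5},
    {"name": "ws-17",  "beaconing_score": 0.1, "rare_domains": 1,  "open_services": 9, "unpatched_cves": 2},
]

def compromise_indicator(h):
    """Prioritize hosts whose network behaviour already looks suspicious."""
    return 0.7 * h["beaconing_score"] + 0.3 * min(h["rare_domains"] / 20, 1.0)

def exposure_indicator(h):
    """Estimate how exposed/attractive a host is to an APT attacker."""
    return 0.5 * min(h["open_services"] / 10, 1.0) + 0.5 * min(h["unpatched_cves"] / 10, 1.0)

ranked = sorted(hosts,
                key=lambda h: (compromise_indicator(h), exposure_indicator(h)),
                reverse=True)
for h in ranked:   # the top of this list is what would be handed to the human analyst
    print(h["name"], round(compromise_indicator(h), 2), round(exposure_indicator(h), 2))
```

The critique above applies directly: without an explicit rule set behind the scores, the quality of the shortlist, and hence the false-alarm rate, depends entirely on the analyst.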
4.4 APT Countermeasures Using Collaborative Security Mechanisms

In this approach, the researchers [24] present a framework for detecting APT malware that targets the system at an early stage of infiltration into the network. In this framework, an open-source version of Security Information and Event Management (SIEM) is used to detect Distributed Denial of Service (DDoS) attacks. This is achieved by analyzing system files and inter-process communication. The research revolves around the concept of function hooking to detect zero-day malware. It uses a tool called Ambush, an open-source host-based intrusion prevention system (HIPS). The proposed system observes all types of function calls in the operating system (OS) and inspects their behaviour for anything notably malevolent that might lead to the detection of zero-day malware. Critiquing this technique, we suggest that this approach of detecting zero-day attacks using OS function hooking might prove useful to security analysts in detecting APT. However, the proposed theoretical framework is primarily used for detecting DDoS attacks on system services and may not work for APT-based malware, as the cyber criminals are highly skilled in obfuscating intra-function calls. Moreover, this framework may yield more false positive alarms, as every function call is monitored by the open-source host-based intrusion prevention system (HIPS). This method may also lead to the anomaly database being updated with more false positives, thus rendering the database insignificant. The framework authors suggest comprehensively automating zero-day attack detection based on the concept presented in their research paper. The researchers in this case do not provide any pre-infiltration-phase detection (the ability of the framework to monitor communication before penetration into the network takes place). This area also needs to be included in the research so that the APT threat can be countered at the very initial stage and the approach made comprehensive. The major focus of this approach lies in detecting an APT attack after its successful infiltration into the network, and it thus falls short of comprehensiveness.
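Function hooking itself, i.e. interposing a monitor in front of an API so every call can be logged or vetted before it runs, can be illustrated in a very simplified way as below. This is a conceptual Python monkey-patching sketch, not the Ambush HIPS or any kernel-level hook; the choice of hooked function is arbitrary and harmless.

```python
import functools
import logging
import os

logging.basicConfig(level=logging.INFO)

def hook(module, name):
    """Replace module.<name> with a wrapper that logs every call before running the original."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        logging.info("call %s.%s args=%r kwargs=%r", module.__name__, name, args, kwargs)
        # A behavioural/anomaly check could run here before the call is allowed to proceed.
        return original(*args, **kwargs)

    setattr(module, name, wrapper)
    return original

hook(os, "listdir")            # observe directory-listing calls as a harmless stand-in for OS APIs
print(len(os.listdir(".")))    # this call is now logged before executing
```

The critique above maps onto this sketch directly: hooking every call produces a flood of benign log entries, which is exactly the false-positive and performance concern raised against monitoring all function calls.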
4.5 APT Detection Using a Context-Based Framework

In this framework, the researchers propose a conceptual model that can infer the malware based on contextual data. They propose [25] a conceptual framework based on the concept of an attack tree, which maps onto an attack pyramid. The attack pyramid forms a conceptual view of the attacker's goal within an organization's network. The attack tree structure is based on the work of Amoroso [27] and Schneier [26], who introduced the concept of correlating the attack with its lateral plane. The tree is formed by positioning the target of the attack as the root of the tree and the various means of reaching that goal as child nodes of the root. The attack tree depicts a visual view of the vulnerable elements in hierarchical order, as well as the likely paths of attack. This helps security experts obtain an overall picture of the security architecture under attack. The second element of the framework is the attack pyramid, an extended modification of the attack tree model. It positions the attacker's goal at the top of the attack pyramid, with the lateral planes corresponding to the environments in which the attack can take place, in order to locate the position of the attack. The detection framework infers the attack based on context and correlation rules, confidence and risk level to reach a conclusion about the seriousness of the threat posed. The detection rules are based on signatures, system policy and the correlation among them. It uses a mathematical correlation function that finds relationships between independent events and the corresponding attack pyramid plane. Because the framework is based on matching signatures and policy against various attack events, it is a passive approach and may not lead to the detection of APT-type attacks.
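The attack tree notion used by this framework, a root goal with alternative means as children, is easy to make concrete. The sketch below is a generic illustration of an attack tree data structure with hypothetical node labels; it is not the pyramid model of [25] and implements no correlation function.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class AttackNode:
    """A node of an attack tree: the root is the attacker's goal, children are ways to reach it."""
    label: str
    children: List["AttackNode"] = field(default_factory=list)

    def paths(self, prefix=()):
        """Enumerate every root-to-leaf attack path."""
        here = prefix + (self.label,)
        if not self.children:
            yield here
        for child in self.children:
            yield from child.paths(here)

# Hypothetical tree: the goal at the root, alternative means as children.
tree = AttackNode("ex-filtrate design documents", [
    AttackNode("compromise file server", [
        AttackNode("spear-phishing email to admin"),
        AttackNode("exploit unpatched service"),
    ]),
    AttackNode("compromise engineer workstation", [
        AttackNode("drive-by download"),
    ]),
])

for path in tree.paths():
    print(" -> ".join(path))
```

Each enumerated path is a candidate attack route whose intermediate steps can then be matched against observed events, which is where the signature-and-policy matching criticized above comes in.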
4.6 APT Attack Detection Using Attack Intelligence

The attack intelligence approach proposed by the researchers [28] monitors and records all system events that occur in the system. Behaviour and pattern matching is carried out between all recorded events and all known attack behaviours, and an alarm is set off whenever a match is found. The proposed approach also makes use of Deep Packet Inspection (DPI) in industrial control systems using intelligence tools like Defender and Tofio [28]. Although the approach is based on pattern matching between behaviours and events, similar to formal-method approaches in language recognition, it does not offer a state-of-the-art solution to APT traffic detection, as no real-time datasets are available to update the attack behaviour database. This may prove to be a big limitation, since the approach is only useful as long as the attack behaviour database is kept up to date with the latest rule base.
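The matching step itself, checking whether a known attack behaviour occurs within the stream of recorded events, can be reduced to ordered subsequence matching, as in the simplified sketch below. The signature names and event labels are invented for illustration and do not come from the cited work.

```python
# Hypothetical signature database: each known attack behaviour is an ordered event pattern.
SIGNATURES = {
    "credential-dumping chain": ["process_injection", "lsass_access", "outbound_transfer"],
    "ics-probe chain":          ["modbus_scan", "write_single_register"],
}

def matches(signature, events):
    """True if the signature occurs as an in-order subsequence of the recorded event stream."""
    it = iter(events)
    return all(step in it for step in signature)

def check_events(events):
    for name, pattern in SIGNATURES.items():
        if matches(pattern, events):
            print(f"ALERT: recorded events match known behaviour '{name}'")

check_events(["logon", "process_injection", "lsass_access", "dns_query", "outbound_transfer"])
```

The limitation noted above follows immediately: the detector is only as good as the SIGNATURES database, and without fresh APT data that database goes stale.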
4.7 Detection of Command and Control (C&C) Communication in Advanced Persistent Threat

Researchers [2, 11] have also proposed another novel method for detecting APT malware. This approach focuses primarily on monitoring the communication between the compromised host and its command and control (C&C) server and leaves out other detection aspects. It is a post-infiltration approach: similar to botnet communication, the communication with the C&C server usually takes place in the form of bulk HTTP web traffic, which makes it easier for an attacker to camouflage its traffic and avoid detection by human experts as well as firewalls. In this regard, various models have been proposed and tested for the detection of APT traffic within web traffic, with a claimed accuracy of 99.5% true positives. In one of the models presented [2], the researchers use an unsupervised machine learning approach to detect the C&C channel in web traffic. APT follows a communication pattern that is quite dissimilar to regular web traffic. The approach reconstructs the dependencies between web requests (the analysis is done by plotting a web request graph) and filters out the nodes related to regular web browsing. Using this approach, an analyst can identify malware requests without training a malware model. The first limitation of this framework is that it is a post-infiltration approach, which, in contrast to other approaches proposed by researchers, we consider a shortfall. Once it has infiltrated the organization's network, APT malware may cause a certain amount of damage in the form of data pilferage unless it is detected at an early stage. Secondly, once the malware has successfully infiltrated the network, it is difficult to detect without comprehensively analyzing all the communication that takes place from external sources and the internal network communication between two or more hosts. In addition, C&C traffic can adapt itself so that it mimics requests similar to web browsing traffic, thus hiding itself among the bulk of HTTP packets. Another drawback of the approach is that it may lead to an increase in false positive alarms due to the complexity of the web request graphs. Therefore, this approach, despite the high accuracy rate claimed by the researchers, would benefit from a supervised learning approach in which an APT detection model can learn to accurately detect APT communication with fewer false positives. A summarized form of the APT defence techniques outlined by the different researchers, including their shortfalls, is given in Table 2. In addition to the limitations identified in the frameworks proposed by various researchers, one area identified for further research is the creation of relevant training and testing datasets for APT [19]. Researchers argue that cyber-attacks constantly change and adapt to the defence mechanisms put in place by organizations, which favours unsupervised anomaly-based detection methods; datasets for learning and testing a model are therefore either unavailable or expensive to create. Consequently, training a model is difficult, which results in poor detection rates, and the chances of higher false positives and lower true negatives increase. The defence mechanisms adopted by the various researchers are summarized in Table 3.
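The general idea of a web request graph, linking each request to the page that triggered it and treating requests with no link back to user browsing as C&C candidates, can be sketched as follows. This is a simplified, referrer-based illustration of the concept, not the algorithm of [2]; the log format and the example URLs are assumptions.

```python
import networkx as nx

# Hypothetical HTTP log entries: (requested_url, referer_or_None)
requests = [
    ("http://news.example.com/", None),
    ("http://news.example.com/style.css", "http://news.example.com/"),
    ("http://cdn.example.net/lib.js", "http://news.example.com/"),
    ("http://203.0.113.7/update.php", None),   # no referrer chain back to user browsing
    ("http://203.0.113.7/update.php", None),
]

g = nx.DiGraph()
for url, referer in requests:
    g.add_node(url)
    if referer:
        g.add_edge(referer, url)      # edge: this request was triggered by that page

# Requests reachable from a page the user actually opened look like normal browsing;
# head requests with no inbound edge and no fan-out are candidate C&C endpoints.
user_entry_points = {u for u, r in requests if r is None and g.out_degree(u) > 0}
reachable = set()
for root in user_entry_points:
    reachable |= nx.descendants(g, root) | {root}

suspects = [n for n in g.nodes if n not in reachable]
print("possible C&C endpoints:", suspects)
```

The adaptation problem raised above is visible even in this toy version: malware that fabricates plausible referrer headers would re-attach itself to the browsing subgraph and escape the filter.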
Table 2 Summary of defence mechanisms against APT and their shortfalls

1. Honey-pot systems [4]
Strategy applied: deployment of a Windows-based low-interaction honey-pot system as an alarm indicator
Limitations: post-infiltration detection methodology; may only log normal cyber-attacks; no real-time detection and prevention

2. Detection of APT using Intrusion Kill Chains (IKC) [22]
Strategy applied: analysis of system event logs and correlation with the IKC
Limitations: passive approach mainly focusing on post-infiltration detection; no real-time prevention; time-consuming detection effort; high chance of false positives

3. Big data analytics [3]
Strategy applied: analysis of network flow structure for pattern matching
Limitations: post-infiltration detection methodology; involves human analysts carrying out analysis of priority threats, which may prove futile and generate higher false positives; no real-time prevention

4. Collaborative security mechanisms [24]
Strategy applied: monitoring of malware activities by implementing an Open Source SIEM (OSSIM) system
Limitations: post-infiltration detection methodology; monitors all abnormal processes accessing system software DLLs, which may prove tedious for the deployed application and thus less efficient; no real-time prevention

5. Context-based framework [25]
Strategy applied: matching signatures and policy with various attack events
Limitations: a passive, post-infiltration detection methodology; no real-time detection and prevention of APT attacks

6. APT attack detection using attack intelligence [28]
Strategy applied: deep packet inspection, pattern matching between behaviour and event
Limitations: non-availability of APT datasets to update the database; post-infiltration detection methodology

7. Analysis of communication between C&C server and compromised host [2, 11]
Strategy applied: analysis of HTTP communication packets for discovering the C&C server
Limitations: post-infiltration detection methodology; C&C traffic can adapt itself to mimic benign web browsing traffic, which may go undetected; the framework may yield high false positives; no real-time prevention

8. Industrial solutions to APT [28]
Strategy applied: deep packet inspection, sandbox analysis, DNS analysis, network flow analysis
Limitations: cannot be assessed at this point in time as very little literature is available on the industrial solutions provided by renowned cyber security organizations
4.8 Industrial Solutions to APT

Different security vendors such as Kaspersky, Symantec and others have also provided various solutions against APT-type threats. Defence mechanisms such as network flow analysis, deep packet analysis, sandbox analysis and DNS-based intelligence have been presented by large security organizations [28]. These security mechanisms still need to be tested for efficiency and cannot be commented on here, owing to the limited literature available on their product solutions. Secondly, they cannot be trusted for use in a country's critical organizations, as the threat of a covert channel being present is always there.
5 Conclusion and Future Research Work

Advance Persistent Threat (APT) is a sophisticated and intelligent cyber threat authored by a highly skillful and resourceful adversary. It is viewed as the most critical peril to private, public and military organizations. APT is quite different from a normal traditional cyber-attack, as its targets are selected systems and organizations. The APT malware tends to hide itself for a very long time and bypasses normal IDS. It has a rallying mechanism for maintaining communication with its C&C server outside the organization's network and for sending harvested organizational secrets outside the network. Various research frameworks relating to the topic have been analyzed and their shortfalls presented in this paper. Owing to the weaknesses in the analyzed detection frameworks, there is a need to propose a multilayered/multiphase comprehensive APT detection and prevention framework.
Table 3 APT defence mechanisms summarized. The eight framework approaches of Table 2 (honey-pot systems [4]; detection of APT using Intrusion Kill Chains (IKC) [22]; big data analytics [3]; collaborative security mechanism [24]; context-based framework [25]; APT attack detection using attack intelligence [28]; analysis of communication between C&C server and compromised host [2, 11]; industrial solutions to APT [28]) are marked against the following criteria: pre-infiltration malware detection, post-infiltration malware detection, high probability of false positives, passive detection approach, real-time prevention, and humanistic analysis.
We suggest that the framework should have defence-in-depth capability, i.e. a multilayer protection and detection system protecting the enterprise organization's network across different layers. The defence strategy should ensure that any APT attack does not bypass one or more of the defence layers. We also suggest that the framework offer a conceptual hybrid implementation strategy that uses AI-based technologies such as multi-agent systems or neural networks. AI technology offers the capability to design and implement a state-of-the-art self-learning framework that adapts to multidimensional, evolving cyber threats. It also offers solutions for designing an efficient defence system against APT or polymorphic malicious code. Once designed and implemented, such a framework should prove efficient in detecting APT threats without raising false alarms. A technology such as the multi-agent paradigm delivers the best system performance and ensures that a real-time mechanism for detecting and protecting against APT attacks exists. As far as APT-related datasets are concerned, we conclude that good quality training and testing datasets are hard to come by and are difficult, if not impossible, to generate. In this regard, we suggest that the best way to obtain high quality datasets is to generate one's own datasets using customized honey-pot systems deployed on an isolated server. A honey-pot solution will generate the needed dataset as and when required for analysis and for training an APT model. Additionally, custom attack scenarios can prove helpful, with deployed honey-pot systems logging a comprehensive training dataset. In order to verify the validity of this approach, the same method can also be used to gather datasets that can be applied during the implementation phase. On the other hand, there is no guarantee that the collected dataset can be classified as an APT dataset. Therefore, we also suggest that generating high quality APT-related datasets that can be used to train and test detection and prevention frameworks is a much needed area for future research. Furthermore, there is also a need to carry out analysis and propose an APT defence framework for industrial control systems such as Supervisory Control and Data Acquisition (SCADA) systems, and to explore different means of efficiently creating training and testing datasets for such frameworks. An APT prevention framework for military control systems is another avenue for future research work, provided access to military systems can be gained to carry out the research.
References

1. J.V. Chandra, N. Challa, S.K. Pasupuleti, Advanced persistent threat defense system using self-destructive mechanism for cloud security, in Engineering and Technology (ICETECH), 2016 IEEE International Conference (IEEE, 2016)
2. P. Lamprakis et al., Unsupervised detection of APT C&C channels using web request graphs, in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (Springer, 2017)
3. M. Marchetti et al., Countering Advanced Persistent Threats through security intelligence and big data analytics, in Cyber Conflict (CyCon), 2016 8th International Conference (IEEE, 2016)
4. Z. Saud, M.H. Islam, Towards proactive detection of advanced persistent threat (APT) attacks using honeypots, in Proceedings of the 8th International Conference on Security of Information and Networks (ACM, 2015)
5. I. Jeun, Y. Lee, D. Won, A practical study on advanced persistent threats, in Computer Applications for Security, Control and System Engineering (Springer, 2012), pp. 144–152
6. J. de Vries et al., Systems for detecting advanced persistent threats: A development roadmap using intelligent data analysis, in Cyber Security (CyberSecurity), 2012 International Conference (IEEE, 2012)
7. P. Chen, L. Desmet, C. Huygens, A study on advanced persistent threats, in IFIP International Conference on Communications and Multimedia Security (Springer, 2014)
8. R. Gupta, R. Agarwal, S. Goyal, A Review of Cyber Security Techniques for Critical Infrastructure Protection
9. F. Skopik, T. Pahi, A systematic study and comparison of attack scenarios and involved threat actors, in Collaborative Cyber Threat Intelligence (Auerbach Publications, 2017), pp. 35–84
10. J. Vukalović, D. Delija, Advanced persistent threats-detection and defense, in Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2015 38th International Convention (IEEE, 2015)
11. X. Wang et al., Detection of command and control in advanced persistent threat based on independent access, in Communications (ICC), 2016 IEEE International Conference (IEEE, 2016)
12. D. Research, Malware Traffic Patterns (2018)
13. M. Ask et al., Advanced persistent threat (APT) beyond the hype. Project Report in IMT4582 Network Security at Gjøvik University College (Springer, 2013)
14. I. Friedberg et al., Combating advanced persistent threats: From network event correlation to incident detection. Comput. Sec. 48, 35–57 (2015)
15. C. Barbieri, J.-P. Darnis, C. Polito, Non-proliferation regime for cyber weapons: A tentative study (2018)
16. S. McClure, Operation Cleaver (Cylance Report, December 2014)
17. R.G. Brody, E. Mulig, V. Kimball, Phishing, pharming and identity theft. Acad. Account. Finan. Stu. J. 11(3) (2007)
18. B. Stone-Gross et al., Your botnet is my botnet: analysis of a botnet takeover, in Proceedings of the 16th ACM Conference on Computer and Communications Security (ACM, 2009)
19. C. Wueest, Targeted Attacks Against the Energy Sector (Symantec Security Response, Mountain View, CA, 2014)
20. G. Coleman, Hacker, Hoaxer, Whistleblower, Spy: The Many Faces of Anonymous (Verso Books, 2014)
21. G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
22. E.M. Hutchins, M.J. Cloppert, R.M. Amin, Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. Leading Iss. Inf. Warfare Sec. Res. 1(1), 80 (2011)
23. P. Bhatt, E.T. Yano, P. Gustavsson, Towards a framework to detect multi-stage advanced persistent threats attacks, in Service Oriented System Engineering (SOSE), 2014 IEEE 8th International Symposium (IEEE, 2014)
24. N.A.S. Mirza et al., Anticipating Advanced Persistent Threat (APT) countermeasures using collaborative security mechanisms, in Biometrics and Security Technologies (ISBAST), 2014 International Symposium (IEEE, 2014)
25. P. Giura, W. Wang, A context-based detection framework for advanced persistent threats (IEEE, 2012)
26. B. Schneier, Attack trees. Dr. Dobb's J. 24(12), 21–29 (1999)
27. E.G. Amoroso, Fundamentals of Computer Security Technology (PTR Prentice Hall, New Jersey, 1994)
28. J.T. John, State of the art analysis of defense techniques against advanced persistent threats, in Future Internet (FI) and Innovative Internet Technologies and Mobile Communication (IITM), Focal Topic: Advanced Persistent Threats (2017)
Construction of a Teaching Support System Based on 5G Communication Technology

Hanhui Lin, Shaoqun Xie, and Yongxia Luo
Abstract With the advent of the 5G era, new-type classroom teaching has been given a new development opportunity, and it is urgent to design a teaching support system which utilizes high-speed network technology and new-type display technology. Existing 5G-supported educational research mainly concentrates on 5G educational application scenarios and on exploring implementation paths for intelligent education, but teaching support systems based on 5G communication technology have been less investigated. Based on an analysis of the application of front-projected holographic display technology, panoramic video technology, haptic technology, VR technology and AR technology, these technologies were integrated to construct a support system applied to classroom teaching, its operating mode was introduced, and expectations for this research are raised at the end.

Keywords 5G · Teaching application · System
1 Research Background

Under the policy guidance of "Internet +", various industries in China have made fruitful achievements in information construction. Importance has been attached to "Internet + education," and especially with the advent of the 5G era, new-type classroom teaching has been given a new development opportunity.

H. Lin
Center of Faculty Development and Educational Technology, Guangdong University of Finance and Economics, Guangzhou, China
S. Xie (B)
Center of Network and Formation, Guangdong University of Finance and Economics, Guangzhou, China
e-mail: [email protected]
H. Lin · Y. Luo
College of Educational Information Technology, South China Normal University, Guangzhou, China

© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_16
The past classroom teaching support systems used only text, pictures, low-resolution video and the like to support classroom teaching. These presentation modes were not vivid enough, and it was difficult for students to understand and master the learning contents within a short time, which was closely related to the low-speed communication networks of the past. In April 2018, China Mobile joined hands with ZTE to complete the first 5G phone call in China, thus opening the curtain on 5G. In December 2018, the first 5G bus came into being. In April 2019, the first "5G + 8K" super-high-definition live video broadcast made its debut in Shanghai. On June 6, 2019, the Ministry of Industry and Information Technology of the PRC formally issued 5G business licenses to China Telecom, China Mobile, China Unicom and China Broadcast Network, indicating that 5G has entered various aspects of our lives. With the support of 5G communication networks, the classroom teaching pattern has embraced a new development opportunity. In order to better support classroom teaching, promote the cultivation of innovative talents and improve teaching quality, it is urgent to use high-speed network technology and new-type display technology to design and develop a teaching support system based on 5G communication technology.
2 Literature Review

Recently, 5G-supported education has become a research hotspot, but research on 5G-supported education mainly focuses on 5G educational application scenarios. For instance, Zhao et al. [1] exploratively proposed the connotations and scenarios of applying 5G to education in 5G Application to Education: Connotation Explanation and Scenario Innovation (2019) and argued that, once teachers and students are used to the new characteristics brought by 5G, a heuristic mechanism should be used to reflect upon education and teaching, with the focus placed on the new requirements, challenges and opportunities students will face in future study, work and life. By analyzing the interactions among teachers, students, learning environment and learning resources in educational scenarios in the 5G era, Yuan et al. [2] expounded the reform of educational scenario elements under the 5G background in Educational Scenario Element Reform and Countermeasures in the 5G Era (2019); this reform is mainly embodied in the intelligence of teachers' teaching, the autonomy of students' learning, the ever-increasing enrichment of the learning environment and the enhanced diversification of learning resources. Based on the development and application of 5G and AI technologies, Zhang et al. [3] analyzed the evolution of 5G + AI technology and all kinds of application scenarios from the perspective of technology fusion in Incoming Road and Access Road: A New Review on Teaching and Learning in the 5G + AI Technology Field (2019). During the construction process of educational informatization 2.0, 5G + AI technology will be an important basis for the educational informatization ecosystem, with the technical traits of empowering, enabling and augmenting.
Construction of a Teaching Support System Based on 5G …
181
et al. discussed opportunities and challenges brought by 5G-empowered intelligent technologies to the intelligent educational ecosystem, how to cope with challenges, and innovate and implement possible paths of intelligent education; in Connotation, Function and Implementation Path of Intelligent Adaptive Learning Platform under the AI + 5G Background—Intelligence-Based Concept Construction of A Seamless Learning Environment [5], Lu took full advantages of various intelligent technologies to construct a big data-based intelligent online learning and education platform under the promotion of AI, 5G and the development of big data technology. However, a void is left in systematic researches on classroom teaching supported by high-speed network technology and new-type display technology, especially design and development of 5G communication technology-based teaching support system using 5G-based front-projected holographic display technology, panoramic video technology, haptic technology, VR technology and AR technology are not involved. Therefore, studying how to construct a teaching support system based on high-speed communication technology will be great realistic significance.
3 Teaching Application of Emerging Technologies 3.1 Teaching Application of Front-Projected Holographic Display Technology Front-projected holographic display, a 3D technology, refers to recording and reproducing real 3D image of an object based on the interference principle [6]. After then, guided by science fiction films and commercial propaganda, the concept of holographic display has extended into commercial activities like stage performance, display and exhibition. However, the holography we understand regularly is not holographic display in strict sense but a kind of holographic-like display technology, which uses Pepper’s ghost, edge blanking and other methods to realize 3D effect. On the Third World Intelligence Congress on May 16, 2019, front-projected holographic display technology was presented. The front-projected holographic display technology, supported by 5G network transmission technology, is quite suitable for teaching application especially for schools running in a remote way or in two different places. It can realize online interaction between teachers and students in different places, thus transcending time and space, and it has been innovatively applied to the teaching field.
3.2 Teaching Application of Panoramic Video Technology Panorama, also called 3D panorama, is an emerging rich media technology, and its main differences from traditional streaming media like video, sound and picture
182
H. Lin et al.
are “operability and interaction.” Panorama is divided into virtual reality (VR) and 3D live-action, where the former uses software such as maya, the produced scenario representatives simulating the reality are virtual Forbidden City, Hebei virtual tourism, Mount Tai virtual tourism, etc.; the latter uses digital single lens reflex (DSLR) or street view vehicle to shoot real pictures, which are specially spliced and processed so that students can be placed in a picturesque scene, and the most beautiful side will be displayed out. Teachers can present teaching contents and information vividly to students, innovative teaching applications and enrich teaching means.
3.3 Teaching Application of Haptic Technology On April 1, 2014, Baidu Institute of Deep Learning (abbreviated as IDL) and Baidu Video jointly announced and developed a complicated touch sensing technology, which could detect different frequency ranges of sense of touch in cell phone screen commonly used so that this screen became a sensor, and this sensor could not only accurately recognize multi-point touch control behaviors of the user but also could detect object texture on the screen which could be perceived by the user by touching, e.g., different body parts of human could perceive liquid. Touch interaction has become a standard interaction mode of smartphones and tablet PCs. Therefore, such kind of interaction technology used, we can turn visual contents into reliable sense of touch, so it will be of enormous value for enriching the experience of students.
3.4 Teaching Application of VR/AR Technology VR immersion-type teaching is a teaching pattern featured by multi-person synchronization, real-time interaction and placement of participants in the virtual world [7]. It provides scenarios for teaching contents and immersion-type, practice-based and interactive virtual reality teaching and practical training environment for students [8]. Nowadays, with the advent of 5G, VR teaching has been deeply associated with 5G communication technology so that virtual reality technology has been more extensively and effectively applied to teaching. Under the 5G background, edge computing is utilized to upload videos to the cloud end. The model is constructed using highconfiguration clusters at the cloud end, followed by real-time rendering and real-time downloading of images to the head-mounted display, and the whole process does not take over 20 ms. For courses with strong operability like laboratory courses, VR head-mounted display can exert a greater effect. After wearing the head-mounted displays, which provide different angles of view, students can observe the teacher from front side, lateral side and even hands of the teacher to do the experiment, so as to realize immersion-type teaching.
Construction of a Teaching Support System Based on 5G …
183
4 System Construction In order to overcome defects and deficiencies of the exiting systems, a teaching support system based on 5G communication technology was constructed. Featured by high-speed data interaction, diversified teaching modes and vivid display forms, this system could provide schools with modern classroom teaching tools. In order to solve the existing technical problems, the technical proposal adopted by this teaching support system is: teaching support system based on 5G communication technology, including high-speed wireless communication module, which is connected to front-projected holographic display module, panoramic video module and haptic technology module; high-speed wireless communication module is connected to VR module and AR module; high-speed wireless communication module is connected to high-speed processing memory module, which is then connected to students’ end and teachers’ end. The front-project holographic display module includes holographic display equipment which records and reproduces 3D image of the object according to principles of interference and diffraction, and it is very suitable for teaching application; panoramic video module includes high-definition shooting equipment, which consists of a group of high-definition image acquisition equipment, and they can realize 360° data acquisition; haptic technology module includes ultrasonic peripheral pad which can emit 40 kHz sound wave and create physical sense of touch by regulating ultrasonic wave so that the user can experience the sense of touch of different virtual objects like “edge” and “plane.” The VR module/AR module mainly includes head-mounted display which can totally place participants in a virtual world and provide them with immersiontype, practice-based and interactive virtual reality teaching and practical training environment. The high-speed wireless communication module is connected to a high-speed processing memory module, which is connected to teachers’ end and students’ end, where the former is used to manage the whole system and carry out teaching and the latter is used to classroom learning. This system is set with a high-speed wireless communication module, which is connected to front-projected holographic display module, panoramic video module, haptic technology module, VR module and AR module. The teacher controls and invokes equipment of modules like front-projected holographic display module, panoramic video module, haptic technology module, VR module and AR module through the teachers’ end to organize the classroom teaching. Under the guidance of the teacher, the students study at the students’ end. All teaching processes and teaching data are saved in the memorizer of the high-speed processing memory module.
184
H. Lin et al.
5 System Operation Mode The operation process of this system will be hereby listed combining Fig. 1. The system provides a teaching support system based on 5G communication technology, including high-speed wireless communication module 5, a 5G high-speed wireless communication module, and more specifically, 802.11ac communication module, which is commonly known as 5G module. As shown in Fig. 1, the high-speed wireless communication module 5 is connected to front-projected holographic display equipment 1, high-definition shooting equipment group 2 and ultrasonic peripheral pad 3, where front-projected holographic display equipment 1 is LED rotary 3D holographic video playing and animation display equipment, high-definition shooting equipment group 2 is 3D camera equipment group, and ultrasonic peripheral pad 3 can emit 40 kHz sound wave, create physical sense of touch by regulating ultrasonic wave and make it possible for users to experience sense of touch of different virtual objects like “edge” and “plane.”
2
1
3
front-projected holographic display equipment
high-definition shooting equipment group
ultrasonic peripheral pad
VR module
high-speed wireless communication module
AR module
4
6
5 students’ end
high-speed processing memory module
7 Fig. 1 The operation process of the system
8
teachers’ end
9
Construction of a Teaching Support System Based on 5G …
185
As shown in Fig. 1, the high-speed wireless communication module 5 is connected to VR module 4 and AR module 6, where VR module 4 is implemented using a head-mounted display which can place the participants totally in a virtual world and provide students with immersion-type, practice-based and interactive virtual reality teaching and practical training environment, and AR module 6 is also implemented using a head-mounted display and embeds the virtual world in the real world for the sake of interactive teaching. As shown in Fig. 1, the high-speed wireless communication module 5 is connected to the high-speed processing memory module 8, which is connected to teachers’ end 9 and students’ end 7, where the former is used to manage the whole system and carry out teaching and the latter is used for classroom learning. As shown in Fig. 1, teachers’ end 9 consists of personal computer, intelligent terminal, etc. Students’ end 7 consists of smartphone, tablet PC, etc. The two are connected to each teaching support device via the high-speed wireless communication module 5, so as to realize the goal of controlling the utilization of modern teaching equipment; the high-speed wireless communication module is connected to high-speed processing memory module 8, which is used for data saving and processing.
6 Conclusion The advent of 5G has generated revolutionary influences on various walks of life, and educational industry is no exception. We have been aware that all kinds of cutting-edge technologies such as front-projected holographic display technology, panoramic video technology, haptic technology, VR technology and AR technology, which are supported by 5G communication technology, will be applied to the educational industry. Especially when applied to classroom teaching, it can improve students’ learning efficiency, improve teaching quality and promote cultivation of innovative talents. These new-type technologies were integrated in this paper to construct a teaching support system based on 5G communication technology. The operation mode of this system was discussed, expected to provide a design idea for our peers. The subsequent research work of this research group will apply this system to teaching practice and repeatedly optimize it as the 5G communication technology enters campuses. Acknowledgements This work was supported by the Education Project of Industry-University Cooperation (201801186008), the Guangdong Provincial Science and Technology Program (2017A040405051), the Higher Education Teaching Reform Project of Guangdong in 2017, the “Twelfth Five-Year” Plan Youth Program of National Education Information Technology Research (146242186), the Undergraduate Teaching Quality and Teaching Reform Project of Wuyi University (JX2018007), the Features Innovative Program in Colleges and Universities of Guangdong (2018GXJK177, 2017GXJK180).
186
H. Lin et al.
References 1. X. Zhao, L. Xu, Y. Li, 5G in education: connotation and scene innovation—new thinking on optimization of educational ecology based on emerging information technology. Theory Educ. Technol. (4), 5–9 (2019) 2. Y. Lei, Zhang Yanli, L. Gang, The change of elements of educational scene in 5G era and the strategies. J. Dist. Educ. 37(3), 27–37 (2019) 3. K. Zhang, Z. Xue, T. Chen, J. Wang, J. Zhang, Incoming road and approaches: New thoughts on teaching and learning from the perspective of 5G + AI. J. Dist. Educ. 37(3), 17–26 (2019) 4. G. Lan, Q. Guo, J. Wei, Y.X. Yu, J.Y. Chen, 5G + intelligent technology: Construct a new intelligent education ecosystem in the “intelligence+” era. J. Dist. Educ. 37(3), 3–16 (2019) 5. W. Lu, Connotation, function and implementation path of intelligent adaptive learning platform in the view of AI + 5G: Based on the construction of intelligent seamless learning environment. J. Dist. Educ. 37(3), 38–46 (2019) 6. Da Chu, J. Jia, J. Chen, Digital Holographic Display Encyclopedia of Modern Optics, 2(4) edn., 2018, pp. 113–129 7. T.-K. Huang, C.-H. Yang, Y.-H. Hsieh, J.-C. Wang, C.-C. Hung, Augmented reality (AR) and virtual reality (VR) applied in dentistry. The Kaohsiung J. Med. Sci. 34(4), 243–248 (2018) 8. A. Suh, J. Prophet, The state of immersive technology research: A literature analysis. Comput. Hum. Behav. 86, 77–90 (2018)
Intelligent Hardware and Software Design
Investigating the Noise Barrier Impact on Aerodynamics Noise: Case Study at Jakarta MRT Sugiono Sugiono, Siti Nurlaela, Andyka Kusuma, Achmad Wicaksono, and Rio P. Lukodono
Abstract The high noise exposure at MRT station due to the noises of the speeding trains can cause health problems for humans. This research aims to reduce the noise impact due to the speeding trains by modifying the design of the noise barrier on the outdoor MRT in Jakarta. The first step conducted in this research is a literature review on aerodynamics noise, CAD model, Computational Fluid Dynamics (CFD) and Computational Aeroacoustics (CAA), and human comfort. Furthermore, it was conducted a design of 3D noise barrier model and 3D train model in one of the outdoor MRT stations in Jakarta using the CAD software. The simulation using the CFD and CAA was implemented to acknowledge the distribution of airflow and sound occurred. The vorticity configuration resulted from the simulation was used as the foundation to modify the noise barrier. The addition of holes on the noise barrier for every 5 m is able to decrease the noise impact significantly. One of the results is that the existing aerodynamic noise of 1.2 dB up to 3.6 dB can be reduced to close to 0 dB with only minor noises around the holes. Scientifically, it can be stated that a way to lower the noise made by the train movement is by creating the right design of a noise barrier that can neutralize the source of the noise. Keywords Aerodynamics · CFD · Noise barrier · Aeroacoustics · MRT
S. Sugiono (B) · R. P. Lukodono Department of Industrial Engineering, Brawijaya University, Malang, Indonesia e-mail: [email protected] S. Nurlaela Department of Urban Area Planning, ITS, Surabaya, Indonesia A. Kusuma Department of Civil Engineering, Universitas Indonesia, Jakarta, Indonesia A. Wicaksono Department of Civil Engineering, Brawijaya University, Malang, Indonesia © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_17
189
190
S. Sugiono et al.
1 Research Background Noise is an unwanted disruptive voice or sound. The noise intensity is defined by one decibel or dB while frequency is defined by the Hz unit. The noise intensity that can be tolerated by humans during the working hour (which is about 8 h) is at a maximum of 85 dB. A noise with high intensity can damage human hearing, for instance, by lowering the hearing range up to deafness. Besides, it can cause health problems, such as increased blood pressure and heartbeat that are potentially lead workers to suffer from a heart attack and digestive problems. Meanwhile, lowintensity noise can cause stress, headache, sleep deprivation, concentration loss, and declining performance of workers. Modes of transportation, such as cars, ships, airplanes, and trains are included as sources of noise that do not only affect the passengers but also people in its surroundings. Multiple researchers have discussed noise related to mass rapid transit (MRT), including Pamanikabud and Paoprayoon [8] that explained the noise level measurement on elevated MRT stations. This research also argued that MRT noise contributes significantly to sleep deprivation on the residence around railways. Roman Golebiewski [2] in his research entitled “Influence of turbulence on train noise” argued that the propagation on outdoor noise is caused by several aspects including air absorption, wave-front divergence, ground effect, diffraction at obstacles, aerodynamics turbulence, and refraction. Zhang et al. [12] stated that the interior noise in high-speed trains causes discomfort to the conductor and, therefore, becomes an important part of a design. Latorre Iglesias et al. [4] explained the model used to predict aerodynamics noise using photos of high-speed trains. This model is able to reduce the number of simulations using CFD and CAA. Jik Lee and Griffin [3], Dai et al. [1], and Vittozzi et al. [11] emphasized that noise caused by MRT affects the residents of its surrounding environment. Land Transport and Authority explained that the presence of a noise barrier will decrease 10 dB of noise in the surrounding environment. In fact, the MRT in Jakarta is still very noisy, therefore, further investigation is necessary. Based on the background above, it is very necessary to conduct a study on the impact of aerodynamics noise of MRT trains in Jakarta, Indonesia, especially for the outdoor elevated stations. The design of the noise barrier will be simulated using the CFD and CAA simulation to acknowledge the existing condition. The overview of the condition of train stations is an inseparable part of studying aerodynamics and aeroacoustic on MRT motion.
2 Material and Research Methodology 2.1 Aerodynamics Noise Noise is a sound unwanted by humans or a sound that is unpleasant to humans who are exposed to it. Aerodynamics noise is sound caused by airflow due to turbulence
Investigating the Noise Barrier Impact on Aerodynamics Noise …
191
or formulation of spiraling vortices. The direction and shape of airflow can also cause alternation in the level of noise from certain sources. In each working environment, there are many reasons to keep the sound at the appropriate level. The sound above this level is considered “noise”. Noises can disrupt labors’ attention, creating an unsafe working environment. An aeroacoustics investigation focuses on the sound source caused by turbulence or shifts in the aerodynamics surface. Lighthill [5] explained the basic aerodynamics noise theory based on the understanding of Navier–Stokes’ equations. Watson R., Downey O [9] in his book explained that the area or place wherein a sound is heard or measured is known as ‘sound field’. The sound field, based on the method and environmental effects, is classified into two categories: free field and diffuse field (Near-field and Far-field). The sound channeled from a contributing source is not combined into a unified sound, therefore, the sound radiation produces a variance in Sound Pressure Level (SPL) up to the measurement position that has been moved with a distance of one or double the longest sound dimension. If the level of acoustic sound pressures measured is close to the source, they usually show a quite big variance to the receiver’s position. In light of this, the position of the farfield receiver is the best choice, meanwhile, far from the source, acoustic pressure, and speed become merely related, such as in-plane wave. Fflowcs William Hawkings (FW-H) is a far-field acoustic model that implements the Lighthill–Curle equation to determine the sound signal propagation from the sound source to the receiver with the types of sound sources of the monopole, dipole, and quadruple. Figure 1 depicts the process of sound signal propagation to the receiver in the far-field x and t position of the time period of the sound signal. The primary unit of noise is the decibel (dB) as a logarithmic function. Generally, noise in aerodynamics is defined as Sound Pressure Level (SPL). The SPL formulation is written down in Eq. (1) below [9]: SPL(dB) = 20 log10 (P/Pref ) where Pref = sound power reference (=2 × 10−5 Pa).
Fig. 1 A far-field sound generation in turbulence current for M 1
(1)
192
S. Sugiono et al.
2.2 Research Methodology The objective of this research is to investigate the impact of the noise barrier on the residents in the station and also in the environment around the elevated MRT station, Jakarta. The design of the noise barrier and design of MRT station become the main aspect that must be implemented to reduce the existing noise. In this research, it was conducted an investigation on the noise barrier which is an elevated railroad fence on the elevated MRT station. In MRT Jakarta, there are seven elevated stations on a variance of heights between 10 m up to 30 m. Based on the initial observation, it was acquired that the sound sources are the trains rubbing against the railways, aerodynamic noise, and vehicles moving on the roads underneath the MRT stations. Figure 2 depicts the general overview of elevated MRT stations in Jakarta, and one shows the physical form of an elevated railroad fence made of concrete. Figure 3 depicts the steps of research implementation which includes observation and initial measurement, the 3D CAD design for trains and elevated train stations, the CFD and CAA simulation, and analysis/discussion. The CFD simulation was used to investigate the airflow, while the CAA was used to acknowledge the noise magnitude produced from the airflow. There are several instruments used to acquire the existing data, such as air velometer, camera, noise meter, and roll meter. The air velometer was to measure the wind velocity, temperature, and relative humidity. The noise meter was used to measure the sound pressure level (dB) on certain points. Other data collected were the dimension of trains (width = 2.9 m, height = 3.9 m, and length 20 m with a total of six rail cars), dimension of railroad noise barrier with a height of 1.5 m, thickness of 10 cm made of concrete, railroad width = 1.067 m, and a maximum outdoor speed = 80 km/hr. There were two types of simulations being compared, aerodynamics noise produced by one train and two trains passing.
3 Results and Discussion Overpass railways, or also known as a noise barrier on elevated railways or MRT, are used to protect the trains, as well as allocating the noise happening to the surrounding environment. However, on the other side, it will increase the noise at waiting rooms in MRT stations. Based on the result of field research observation in MRT Jakarta, the noise in elevated MRT stations came from the sound of friction between trains and railways, sound of vehicles around the stations, and additional sound from wind gust made by the speeding trains. Based on the measurement of the several existing elevated MRT stations, it is acquired that the noise existing in elevated MRT stations is considered high with an average = 82.66 dB, with maximum value during train arrivals of = 89.3 dB. Meanwhile, inside the train, the noise value experienced by the passengers when passing the overpass railways is 84.01 dB. In order to remove or reduce the noise impact, it is necessary to investigate the influence of the noise barrier or elevated railroad fence on noise distribution. Müller
Investigating the Noise Barrier Impact on Aerodynamics Noise …
193
Fig. 2 An example of elevated MRT station in Jakarta and noise barrier design
and Obermeier [7], Manela [6] and Kelly et al. [13] explained the presence of a strong and clear relationship between the flows of vorticity fluid and sound generating. Figure 4 is a contour of air vorticity configuration which is formed by the passing trains on maximum speed on the elevated railways, which is 80 km/hr. The first picture elaborates on the shift of air pattern for a single train while the second is when two trains were passing. From the picture, it can be explained that there is an interesting moment in which the vorticity will be more in between the two rail cars, yet in the ends, the trains, respectively, neutralize each other so that the vorticity value becomes less (red dashed circle).
194
S. Sugiono et al.
Fig. 3 Flowchart of research steps for reducing noise in the elevated MRT station
Fig. 4 Contour of vorticity value on the noise barrier for a single train and double trains passing
The vorticity existing in the CFD simulation result can be used to elaborate the aeroacoustics value in a form of sound pressure level (dB) value. Figure 5 has 4 graphics that explain the noise difference due to the aerodynamics when one train passed (5a) and when two trains passed against each other (5b). From Fig. 5a, it can be explained that the noise will increase gradually from the front end (1.2 dB) to the back of the rail car (3.6 dB) for the surrounding noise between the noise barriers and tend to have no impact (0 dB noise) to the sides of trains with other railways. Figure 5b can explain that two trains passing against each other generally will reduce the noise
Investigating the Noise Barrier Impact on Aerodynamics Noise …
195
Fig. 5 Sound pressure level (dB): a existing noise barrier—one train, b existing noise barrier—two trains passing, c modification noise barrier—single train, d modification noise barrier—two trains passing
that occurs due to the airflow, this can be further explained that the existing vorticity neutralizes each other due to the movement of train 1 and train 2. The size of noise value received by the passengers on the train and around the MRT stations must be reduced appropriately. The profile overview of the noise distribution created by the aerodynamics can be used as a foundation to reduce other noise sources, such as the one made by the friction between the wheels and railways. The noise release around the overpass railways can be implemented by modifying the shape of the existing noise barrier. Figure 5c and d show the impact of holes on noise barrier for a single train and double trains when passing against each other. From the two pictures, it can be explained that the modification is able to reduce the noise significantly (close to 0 dB) and the noise remains only in the noise barrier holes. Bendtsen et al. [10] and Ishizuka and Fujiwara [14] clearly explained the importance of material, shape, and size of the noise barrier to eliminate the occurring noise.
196
S. Sugiono et al.
4 Conclusion This paper has succeeded in simulating the aerodynamics condition for a train that moves alone or when passing against each other on elevated railways. The existence of noise barrier, like a canal, is proven to capitalizing the present noise sources so that it becomes more uncomfortable with noises in the elevated MRT stations = 89.3 dB when the train arrives and inside the train with a noise value of 84.01 dB. 89.3 dB. The experiment result of noise barrier modification by adding holes every 5 m on both the right and left sides of the train can reduce the noise significantly. For instance, the aerodynamic noise is able to be neutralized up to close to zero dB throughout the train, with a minor noise in the noise barrier holes. This research has become the foundation of future noise barrier design optimization. The future research on environmental impact due to noise barrier modification would be a distinctive challenge. Acknowledgements Thanks to the Ministry of National Education of the Republic of Indonesia for supporting this paper. The authors are also grateful to the Bioengineering research group and the Laboratory of Work Design and Ergonomics, Department of Industrial Engineering, the Brawijaya University, Malang Indonesia for their extraordinary courage.
References 1. W. Dai, X. Zheng, L. Luo, Z. Hao, Y. Qiu, Prediction of high-speed train full-spectrum interior noise using statistical vibration and acoustic energy flow. Appl. Acoust. (2019). https://doi.org/ 10.1016/j.apacoust.2018.10.010 2. R. Goł¸ebiewski, Influence of turbulence on train noise. Appl. Acoust. (2016). https://doi.org/ 10.1016/j.apacoust.2016.06.003 3. P. Jik Lee, M.J. Griffin, Combined effect of noise and vibration produced by high-speed trains on annoyance in buildings. J. Acoust. Soc. Am. (2013). https://doi.org/10.1121/1.4793271 4. E. Latorre Iglesias, D.J. Thompson, M.G. Smith, Component-based model to predict aerodynamic noise from high-speed train pantographs. J. Sound Vib. (2017). https://doi.org/10.1016/ j.jsv.2017.01.028 5. M.J. Lighthill, A new approach to thin aerofoil theory. Aeronaut. Q. (1951). https://doi.org/10. 1017/s0001925900000639 6. A. Manela, Sound generated by a vortex convected past an elastic sheet. J. Sound Vib. (2011). https://doi.org/10.1016/j.jsv.2010.08.023 7. E.A. Müller, F. Obermeier, Vortex sound. Fluid Dyn. Res. (1988). https://doi.org/10.1016/01695983(88)90042-1 8. P. Pamanikabud, S. Paoprayoon, Predicting mass rapid transit noise levels on an elevated station. J. Environ. Manage. (2003). https://doi.org/10.1016/S0301-4797(02)00219-0 9. R. Watson, O. Downey, The Little Red Book of Acoustics: A Practical Guide (Blue Tree Acoustics, 2008) 10. H. Bendtsen, E. Kohler, Q. Lu, B. Rymer, Acoustic aging of road pavements. in 39th International Congress on Noise Control Engineering 2010, INTER-NOISE 2010 (2010) 11. A. Vittozzi, G. Silvestri, L. Genca, M. Basili, Fluid dynamic interaction between train and noise barriers on High-Speed-Lines. Procedia Eng. (2017). https://doi.org/10.1016/j.proeng. 2017.09.035
Investigating the Noise Barrier Impact on Aerodynamics Noise …
197
12. X. Zhang, R. Liu, Z. Cao, X. Wang, X. Li, Acoustic performance of a semi-closed noise barrier installed on a high-speed railway bridge: Measurement and analysis considering actual service conditions. Measur. J. Int. Measur. Confeder. (2019). https://doi.org/10.1016/j.measurement. 2019.02.030 13. M.E. Kelly, K. Duraisamy, R.E. Brown, Predicting blade vortex interaction, airloads and acoustics using the Vorticity Transport Model. in AHS Specialists Conference on Aerodynamics 2008 (2008) 14. T. Ishizuka, K. Fujiwara, Performance of noise barriers with various edge shapes and acoustical conditions. Appl. Acoust. 65(2), 125–141 (2004). https://doi.org/10.1016/j.apacoust.2003. 08.006
3D Cylindrical Obstacle Avoidance Using the Minimum Distance Technique Krishna Raghuwaiya, Jito Vanualailai, and Jai Raj
Abstract In this article, we address the motion planning and control problem of a mobile robot, herein considered as a navigating in a workspace with obstacle. We use Lyapunov’s second method to control the motion of the mobile robot. The minimum distance technique, incorporated for the avoidance of the cylindrical obstacle in a 3D space, used for the first time. Here, the minimum distance between the center of the point mass, representing a mobile robot, and the surface of the cylinder is calculated; thus, only the avoidance of a point on the surface of the cylinder is considered. We propose a set of artificial potential field functions that can be used for the avoidance of the cylindrical obstacle, and for the attraction to the assigned target. The effectiveness of the suggested robust, continuous, nonlinear control inputs is verified numerically via a computer simulation. Keywords Cylindrical obstacle · Lyapunov functions · Minimum distance technique · Stability
1 Introduction Recently, there has been numerous research on mobile robots and its applications [1, 2]. The findpath problem, which essentially is a geometric problem has gained huge support and attention over the years. It involves the identification of a continuous path that allows a mobile robot to reach its predefined final configuration from its initial configuration while ensuring collision and obstacle avoidance that may exist in the space [3, 4]. There are various motion planning and control (MPC) algorithms for collision-free navigation of single and multiple robotic systems, and researchers have attempted a number of different techniques, strategies and schemes [1, 5, 6]. Multi-agent systems research is more favored for its efficiency with respect to time, cost, harnessing preferred behaviors, achieve course not executable by an individual K. Raghuwaiya (B) · J. Vanualailai · J. Raj The University of the South Pacific, Suva, Fiji e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_18
199
200
K. Raghuwaiya et al.
robot, to name a few [7]. There are three motion planning archetypes for robots working in a given domain with obstructions, namely (1) cell decomposition-based motion planning, (2) road maps, and (3) artificial potential field (APF) [1]. Here, we utilize the Lyapunov controllers, via the Lyapunov’s second method [1], which is an APF strategy for the control of point masses. Spearheaded by Khatib in 1986 [8], the impact-free way for a self-governing robot is dictated by setting up APFs with horrendous shafts around the obstruction and appealing posts around the objectives. The advantages of APF method include easier implementation, easier analytical representations of system singularities, limitations and inequalities, and the ease of modeling via the kinematic and dynamic equations of a robot. However, it is always a challenge in any APF method to construct total potentials without local minima. To tackle this, various analysts have effectively considered this issue by means of the utilization of special functions [9]. We introduce the avoidance of cylindrical obstacles via the minimum distance method used in ([6]) to construct parking bays and avoid line obstacles. It involves the computation of the minimum distance from the center of the mobile robot to a point on the surface of the cylinder and thus the avoidance of the resultant point on the cylinder surface. This strategy characteristically ensures that the repulsive potential field function is just for the nearest point on the outside of the cylindrical chamber to the center of the point mass at all times. Section 2 represents the kinematic model of the mobile robot; in Sect. 3, the APF functions are defined; subsequently, in Sect. 4, the Lyapunov function of the system is constructed and the robust nonlinear continuous control inputs for the mobile robot is extracted; in Sect. 5, the stability analysis of our system is performed; in Sect. 6, the motion control for the point mass representing the mobile robot is simulated and the results are presented to verify the effectiveness and robustness of the proposed controllers; and finally, the conclusion and future work are given in Sect. 7.
2 Modeling of the Point Mass Mobile Robot A simple kinematic model for the moving point mass is proposed in this section. A two-dimensional schematic representation of a point mass with and without obstacle avoidance is shown in Fig. 1. We begin with the following definition: Definition 1 A point mass mobile robot, Pi , is a sphere of radius rpi centered at (xi (t), yi (t), zi (t)) ∈ R3 for t ≥ 0. Specifically, the point mass represents the set Pi = (Z1 , Z2 , Z3 ) ∈ R3 : (Z1 − xi )2 + (Z2 − yi )2 + (Z3 − zi ) rpi2 At every t ≥ 0, define the instantaneous velocities of Pi as (vi (t) , wi (t) , ui (t)) = (˙xi (t) , y˙ i (t) , z˙i (t)). With the initial conditions taken at t = t0 ≥ 0 for Pi , the differential system governing Pi :
3D Cylindrical Obstacle Avoidance Using the Minimum Distance Technique
Z2
Target of
Trajectory of
Point mass robot
201
i
i
i
Z1 Fig. 1 2D illustration of Pi in the Z1 − Z2 plane
x˙ i = vi (t) , y˙ i = wi (t) , z˙i = ui (t) , xi0 := xi (t0 ) , yi0 := yi (t0 ) , zi0 := zi (t0 ) ,
(1)
for i = 1, . . . , n. The objective is to steer Pi to the target configuration in R3 .
3 Use of the APF Functions Using kinodynamic constraints, the collision-free trajectories of Pi are detailed out. We want to design the velocity controllers vi (t) , wi (t) and ui (t) for i = 1, . . . , n so that Pi explored securely in a 3D workspace toward its objective while avoiding cylindrical obstacles. To obtain a feasible collision-free path, we utilize the APF functions in the Lyapunov-based control scheme (LbCS) to design the new controllers.
3.1 Attractive Potential Field Functions Target Attraction In our MPC problem, we want Pi to navigate from some initial configuration and converge toward the center of its target. We characterize the fixed objective to be a sphere with focus (τi1 , τi2 , τi3 ) and span of rτi . The point mass mobile robot Pi needs to be attracted toward its predefined target, hence, we consider the accompanying Vi (x) = where i = 1, . . . , n.
1 (xi − τi1 )2 + (yi − τi2 )2 + (zi − τi3 )2 , 2
(2)
202
K. Raghuwaiya et al.
Auxiliary Function The seminal goal is to ensure that the point mass mobile robot stops at its assigned target. To ensure this, a function of the form G i (x) = Vi (x) ,
(3)
for i = 1, . . . , n is proposed to ensure the intermingling and convergence of Pi to its assigned objective. This function is then combined via multiplication with every one of the repulsive potential field functions.
3.2 Potential Field Functions for Obstacle Avoidance Workspace Boundary Limitations The motion of Pi is restricted within the confinement of a workspace which is a 3D framework of dimensions η1 × η2 × η3 . The walls of the robots workspace will be considered as fixed obstacles in the LbCS. Along these lines, for Pi to maintain a strategic distance from these, we propose the accompanying functions ⎫ W Si1 (x) = xi − rpi , ⎪ ⎪ ⎪ W Si2 (x) = η2 − (yi + rpi ) , ⎪ ⎪ ⎪ ⎬ W Si3 (x) = η1 − (xi + rpi ) , W Si4 (x) = yi − rpi , ⎪ ⎪ ⎪ ⎪ W Si5 (x) = zi − rpi , ⎪ ⎪ ⎭ W Si6 (x) = η3 − (zi + rpi ) ,
(4)
for i = 1, . . . , n. The workspace is a fixed, closed, and bounded region. Since η1 , η2 , η3 > 2 × rpi , the functions for the avoidance of the walls in the given workspace are positive. Cylindrical Obstacles The surface wall of the cylinder are fixed obstacles that needs to be avoided by Pi . We begin with the following definition: Definition 2 The kth surface wall is collapsed and buckled into a cylinder in the Z1 Z2 Z3 plane between the following coordinates, (ak , bk , ck1 ) and (ak , bk , ck2 ) and with radius rck . The parametric representation of the kth cylinder with height (ck2 − ck1 ) can be given as Cxk = ak ± rck cos χk , Cyk = bk ± rck sin χk and Czk = ck1 + λk (ck2 − ck1 ) where χk : R → (− π2 , π2 ) and λk : R2 → [0, 1]. To facilitate the avoidance of the surface wall of the cylinder, we adopt the MDT from [6], which computes the minimum distance between the center of Pi and the surface of the kth cylinder. The coordinates of this point can be expressed as Cxik = ak ± rck cos χik , Cyik = bk ± rck sin χik and Czik = ck1 + λik (ck2 − ck1 )
3D Cylindrical Obstacle Avoidance Using the Minimum Distance Technique
203
yi − bk 1 and λik = (zi − ck1 ) , and the saturation xi − ak ⎧ (ck2 − ck1 ) ⎨ 0 , if λik < 0 π π . functions are given by λik = λik , if 0 ≤ λik ≤ 1 and χik = − , ⎩ 2 2 1 , if λik > 1 For the avoidance of the closest point on the surface of the kth cylinder by Pi , the following function is constructed
where χik = tan−1
COik (x) =
1 (xi − Cxik )2 + (yi − Cyik )2 + (zi − Czik )2 − (rpi )2 , 2
(5)
where i = 1, . . . , n and k = 1, . . . , m.
4 Nonlinear Controllers Next, we design nonlinear control laws pertaining to system (1).
4.1 Lyapunov Function Using the tuning parameters, the Lyapunov function or the total energy function for system (1) is given below: L (x) :=
n
Vi (x) + G i (x)
i=1
m k=1
℘is ζik + COik (x) s=1 W Sis (x) 6
.
(6)
4.2 Nonlinear Controllers Next, we look at the time derivative of various components of (6) along the solution of the kinematic system (1) and force it to be at least negative semi-definite. Using the convergence parameters, αi1 , αi2 , αi3 > 0, and upon suppressing x, the constituents of the controllers are of the form
204
K. Raghuwaiya et al.
m 6 ζik ℘is ℘i1 ℘i3 fi1 = 1 + + + Gi (xi − τi1 ) − G i 2 COik W S S Si3 )2 (W ) (W is i1 k=1 ⎞ ⎛ s=1 (yi − bk ) − Cx sin χ 1 ∓ r m (x ) ik k ik ζik ⎜ i 2 ⎟ (xi − ak )2 + (yi − bk ) ⎟, ⎜ −G i ⎠ ⎝ (yi − bk ) COik ± (y − Cy ) r cos χ k=1 i ik k ik 2 2 (xi − ak ) + (yi − bk ) m 6 ζik ℘is ℘i2 ℘i4 fi2 = 1 + + − Gi (yi − τi2 ) + G i 2 COik W S S S )2 (W ) (W is i2 s=1 k=1 i4 ⎞ ⎛ (xi − ak ) ± r − Cx sin χ (x ) i ik k ik m ⎟ ⎜ (xi − ak )2 + (yi − bk )2 ζik ⎜ ⎟, −G i COik ⎝ ⎠ (xi − ak ) k=1 + (yi − Cyik ) 1 ∓ rk1 cos χik 2 2 (xi − ak ) + (yi − bk ) m 6 ζik ℘is ℘i5 ℘i6 fi3 = 1 + + + Gi , (zi − τi3 ) − G i 2 COik W S (W Si5 ) (W Si6 )2 is s=1 k=1 for i = 1, . . . , n. Consider the theorem that follows: Theorem 1 Consider a mobile robot, Pi whose movement is administered along a solution of process system (1). The key objective is to encourage the motion and control inside a restricted workspace with the ultimatum goal for convergence to its assigned target. The subtasks comprise of: convergence to predefined targets, avoidance of the fixed cylindrical obstacles, and the walls of the workspace. To intrinsically guaranty the stability in the sense of Lyapunov of system (1), we consider the accompanying continuous time-invariant velocity control inputs: 1 fi1 , αi1 1 wi = − fi2 , αi2 1 ui = − fi3 , αi3 vi = −
⎫ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎬ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎭
(7)
for i = 1, . . . , n.
5 Stability Analysis The stability analysis to system (1) is considered next. We begin with the following theorem: Theorem 2 Let (τi1 , τi2 , τi3 ) be the position of the target of Pi . Then, xe := (τi1 , τi2 , τi3 , 0, 0, 0) ∈ R6 is a stable critical point of (1).
3D Cylindrical Obstacle Avoidance Using the Minimum Distance Technique
205
Proof For i = 1, . . . , n: 1. Over D(L(x)) = {x ∈ R6n : COik (x) > 0, k = 1, . . . , m; W Sis (x) > 0, s = 1, . . . , 6}, the function L(x) is continuous and positive; 2. L(xe ) = 0; 3. L(x) > 0 for all x ∈ D(L(x))/xe . Along a solution of system (1), the time derivative of (6) is L˙ (1) (x) =
n fi1 x˙ i + fi2 y˙ i + fi3 z˙i . i=1
Using the ODEs for system (1) and the controllers given in (7), the accompanying negative semi-definite function is acquired L˙ (1) (x) = −
n αi1 vi2 + αi2 wi2 + αi3 ui2 ≤ 0. i=1
Now, L˙ (1) (x) ≤ 0 ∀x ∈ D(L(x)) and L˙ (1) (xe ) = 0. Furthermore, L(x) ∈ C 1 (D(L(x))). Hence, L(x) is a Lyapunov function for system (1).
6 Simulation Results To illustrate the effectiveness of the proposed continuous time-invariant nonlinear velocity control laws within the framework of the LbCS, this section will demonstrate a virtual scenario via a computer simulation. We consider a point mass in a 3D environment with a fixed cylindrical obstacle in its path. The robot starts from its initial configuration and navigates to reach its final configuration while avoiding the cylindrical obstacle. It is observed that the mobile robot, which could resemble a quadrotor aircraft, has a smooth trajectory as well as a smooth increase its altitude until it reaches its target configuration. Figure 2 shows the default 3D-view and Fig. 3 shows the top 3D view of the motion of the mobile robot. Table 1 gives every one of the estimations of the underlying conditions, requirements, and various parameters used in the reenactment.
206
K. Raghuwaiya et al.
Fig. 2 Default 3D motion view of the point mass mobile robot at t = 0, 100, 200, 600 units
The behavior of the Lyapunov function L and its related time derivative L along the system trajectory is shown in Fig. 4. Essentially, it shows the intervals over which system (7) increments or diminishes its pace of vitality dissemination. The trajectory of the mobile robot is collision-free as indicated by the continuous evolution of the Lyapunov function.
7 Conclusion This paper provides the cylindrical obstacle avoidance method that incorporated into the LbCS. Controllers for the robotic system were derived and extracted using the LbCS, which successfully tackle the problem of MPC of point mass mobile robots. The MDT proposed in [6] has been modified for the avoidance of the cylindrical obstacle for the mobile robot. The robust controllers produced a smooth, feasible trajectory of the system with an amicable convergence to its equilibrium state. The effectiveness and robustness of the given control inputs were demonstrated in virtual scenario by the means of a computer recreation. To the author’s knowledge, this algorithm for the avoidance of cylindrical obstacles in a 3D space has not been proposed in literature. Future research will include the amalgamation of cylindrical
3D Cylindrical Obstacle Avoidance Using the Minimum Distance Technique
Fig. 3 Top 3D motion view of the point mass mobile robot at t = 0, 100, 200, 600 units Table 1 Initial and final states, constraints and parameters Description Value Initial state of the point mass mobile robot Workspace Rectangular position Radius of point mass Constraints Target center, radius Center of cylinder Height and radius of cylinder Control and convergence parameters Avoidance of cylindrical obstacles Avoidance of workspace Convergence
η1 = η2 = 200, η3 = 100 (x1 , y1 , z1 ) = (10, 90, 20) rp1 = 5 (τ11 , τ12 , τ13 ) = (180, 100, 850), rτ1 = 5 (a1 , b1 , c11 ) = (80, 100, 0), c12 = 90 and rc1 = 50 ζ11 = 1 ℘1s = 50 for s = 1, . . . , 6 α11 = α12 = α13 = 0.01
There is one point mass mobile robot, (n = 1) and 1 cylindrical obstacle, (k = 1)
207
208
K. Raghuwaiya et al.
Fig. 4 Behavior of L(x) and its associated time derivative L˙ (1) (x) along a solution of system (7)
obstacles together with spherical-shaped obstacles in a dynamic environment with multiple mobile robots and finally the implementation of the quadrotor dynamic model into the MPC problem.
References 1. B.N. Sharma, J. Raj, J. Vanualailai, Navigation of carlike robots in an extended dynamic environment with swarm avoidance. Int. J. Robust Nonlinear Control 28(2), 678–698 (2018) 2. K. Raghuwaiya, B. Sharma, J. Vanualailai, Leader-follower based locally rigid formation control. J. Adv. Transport. 1–14, 2018 (2018) 3. J. Vanualailai, J. Ha, S. Nakagiri, A solution to the two-dimensional findpath problem. Dynamics Stab. Syst. 13, 373–401 (1998). Dec. 4. J. Raj, K. Raghuwaiya, J. Vanualailai, B. Sharma, Navigation of car-like robots in threedimensional space, in Proceedings of the 2018 5th Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE) (2018), pp. 271–275 5. J. Raj, K. Raghuwaiya, S. Singh, B. Sharma, J. Vanualailai, Swarming intelligence of 1-trailer systems, in Advanced Computer and Communication Engineering Technology, ed. by H. A. Sulaiman, M. A. Othman, M.F.I. Othman, Y.A. Rahim, N.C. Pee (Springer International Publishing, Cham, 2016), pp. 251–264 6. B. Sharma, New Directions in the Applications of the Lyapunov-based Control Scheme to the Findpath Problem. PhD thesis, University of the South Pacific, Suva, Fiji Islands, July 2008 7. K. Raghuwaiya, S. Singh, Formation types of multiple steerable 1-trailer mobile robots via split/rejoin maneuvers. N. Z. J. Math. 43, 7–21 (2013) 8. O. Khatib, Real time obstacle avoidance for manipulators and mobile robots. Int. J. Robot. Res. 7(1), 90–98 (1986) 9. J. Vanualailai, B. Sharma, S. Nakagiri, An asymptotically stable collision-avoidance system. Int. J. Non-Linear Mech. 43(9), 925–932 (2008)
Path Planning of Multiple Mobile Robots in a Dynamic 3D Environment Jai Raj, Krishna Raghuwaiya, Jito Vanualailai, and Bibhya Sharma
Abstract In this paper, we present a theoretical exposition into the application of an artificial potential field method, that is, the Lyapunov-based control scheme. A motion planner of mobile robots navigating in a dynamic environment is proposed. The dynamic environment includes multiple mobile robots, fixed spherical and cylindrical-shaped obstacles. The motion planner exploits the minimum distance technique for the avoidance of the cylindrical obstacles. The mobile robots navigate in a bounded environment, avoiding all the obstacles and each other whilst enroute to its target. The effectiveness of the suggested nonlinear velocity governing laws is verified by a computer simulation which proves the efficiency and robustness of the control technique. Keywords Cylindrical obstacles · Minimum distance technique · Stability
1 Introduction Autonomous navigation is an active and functioning exploration domain and has been so far over the most recent few decades. When contextualised with mobile robots, this motion planning and control (MPC) problem involves the planning of a collision-free path for a given mobile robot to reach its final configuration in a designated amount of time [1]. In comparison with single agents, multi-agent also needs to avoid each other whilst synthesising a robots motion subject to the kinodynamic constraints associated with the robotic system [2, 3]. Multi-agent research is always favoured over single agents. The swarm of robots is able to carry out tasks such as surveillance, transportation, healthcare and mining, resulting in a high rate of system effectiveness [4, 5]. The workspace for multi-agent research is no longer static but dynamic, hence, devising motion planning algorithms becomes inherently difficult. J. Raj (B) · K. Raghuwaiya · J. Vanualailai · B. Sharma The University of the South Pacific, Suva, Fiji e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_19
209
210
J. Raj et al.
There are numerous algorithms that addresses the MPC problem of mobile robots [2]. In this research, we utilise the artificial potential field (APF) method, classified as the Lyapunov-based control scheme (LbCS) for the control of point-mass mobile robots [6]. The LbCS provides a simple and effective methodology of extracting control laws for various systems. The advantage of the control scheme lies in the flexibility to consider system singularities, for example, workspace limitations, velocity constraints and obstacles [7]. We use the concept of the minimum distance technique (MDT) [8] to develop an algorithm for cylindrical obstacle avoidance. Using the architecture of the LbCS, we propose a motion planner for point-mass robots traversing in the presence of obstacles in a given workspace. Based on the APF method, the stability of the system is considered via the direct method of Lyapunov. This research article is organised as follows: Sect. 2 represents the kinematic model of the point-mass system; in Sect. 3, the APF functions are defined; in Sect. 4, the design of the Lyapunov function is considered and the robust continuous nonlinear control laws for the point-mass mobile robot is extracted; in Sect. 5, the stability analysis of our system is performed; in Sect. 6, the effectiveness and robustness of the derived controllers are simulated; and finally, the conclusion and future work are given in Sect. 7.
2 Modelling of the Point-Mass Mobile Robot A simple kinematic model for mobile robots is presented in this section and frontier the development of velocity controls to address the multi-tasking problem of multiple point-mass in a dynamic environment. Figure 1 represents a two-dimensional schematic of a point-mass mobile robot. Definition 1 A point-mass, Pi is a sphere of radius rpi and centred at (xi (t), yi (t), zi (t)) ∈ R3 for t ≥ 0. Specifically, the point-mass represents the set P i = (Z1 , Z2 , Z3 ) ∈ R3 : (Z1 − xi )2 + (Z2 − yi )2 + (Z3 − zi ) ≤ rpi2 At time t ≥ 0, let (vi (t) , wi (t) , ui (t)) = (˙xi (t) , y˙ i (t) , z˙i (t)) be the instantaneous velocities of Pi . Assuming the initial conditions at t = t0 ≥ 0, a system of first-order ODE’s for Pi is given below, : x˙ i = vi (t) , y˙ i = wi (t) , z˙i = ui (t) , xi0 := xi (t0 ) , yi0 := yi (t0 ) , zi0 := zi (t0 ) ,
(1)
for i = 1, . . . , n, with the principle objective to steer and navigate Pi to its goal configuration in R3 .
Path Planning of Multiple Mobile Robots in a Dynamic 3D Environment
Z2
Target of
2
Target of
211
1
Obstacle
2
1
Z1 Fig. 1 2D representation of Pi in a two-dimensional plane
3 Deployment of the APF Functions In this section, we will present and outline a disposition for yielding collision-free motions of Pi in a well-defined, bounded but dynamic environment. For Pi , the velocity controllers vi (t), wi (t) and ui (t) will be designed using the LbCS. The ultimate goal for Pi is to navigate safely in the dynamic environment and converge to its target configuration. We begin by unfolding the target, the obstacles and the boundary limitations.
3.1 Attractive Potential Field Functions Target Attraction The target of Pi is a sphere with centre (τi1 , τi2 , τi3 ) and radius rτi . For the attraction of Pi to this target, the work utilises, in a candidate Lyapunov function to be proposed, a function for the convergence to the target of the form Vi (x) =
1 (xi − τi1 )2 + (yi − τi2 )2 + (zi − τi3 )2 , 2
(2)
for i = 1, . . . , n. Auxiliary Function For Pi to converge to its designated target, an auxiliary function of the form (3) G i (x) = Vi (x) , for i = 1, . . . , n is proposed.
212
J. Raj et al.
3.2 Repulsive Potential Field Functions Workspace Boundary Limitations A really important consideration with any robot is the set of all possible points that it can reach. We refer to this volume as the workspace of the robot. Hence, we desire to confine the motion of Pi in a 3D framework of dimensions η1 × η2 × η3 . The boundary walls are treated as fixed obstacles. Hence, for the avoidance of these walls, we propose the following functions of the form ⎫ W Si1 (x) = xi − rpi , ⎪ ⎪ ⎪ W Si2 (x) = η2 − (yi + rpi ) , ⎪ ⎪ ⎪ ⎬ W Si3 (x) = η1 − (xi + rpi ) , (4) W Si4 (x) = yi − rpi , ⎪ ⎪ ⎪ ⎪ W Si5 (x) = zi − rpi , ⎪ ⎪ ⎭ W Si6 (x) = η3 − (zi + rpi ) , for i = 1, . . . , n, noting that they are positive within the rectangular cuboid. Moving Obstacles Every mobile robot turns into a moving deterrent for each other moving robot in the bounded environment. Therefore, for Pi to avoid Pj , we consider the following M Oij (x) =
2
2
2
2 1 xi − xj + yi − yj + zi − zj − rpi + rpj , 2
(5)
for i, j = 1, . . . , n, j = i. Spherical Obstacles Consider qa ∈ N spherical-shaped obstacles within a bounded environment which Pi needs to avoid. We consider the following definition: Definition 2 A stationary solid object is a sphere with centre (ol1 , ol2 , ol3 ) and radius rol . Thus, Or = (Z1 , Z2 , Z3 ) ∈ R3 : (Z1 − ol1 )2 + (Z2 − ol2 )2 + (Z3 − ol3 )2 rol 2 . For Pi to avoid these spherical obstacles, we consider FOl (x) =
1 (xi − ol1 )2 + (yi − ol2 )2 + (zi − ol3 )2 − (rpi + rol )2 , 2
(6)
where i = 1, . . . , n and l = 1, . . . , qa. Cylindrical Obstacles The surface wall of the cylinder are classified as fixed obstacles. Hence, the point-mass mobile robot Pi needs to avoid these walls. To begin, the following definition is made: Definition 3 The kth surface wall is collapsed into a cylinder in the Z1 Z2 Z3 plane with initial coordinates (ak , bk , ck1 ) and final coordinates (ak , bk , ck2 ) with radius rck . The parametric representation of the kth cylinder of height (ck2 − ck1 ) can be given as Cxk = ak ± rck cos χk , Cyk = bk ± rck sin χk and Czk = ck1 + λk (ck2 − ck1 ) where χk : R → (− π2 , π2 ) and λk : R2 → [0, 1].
Path Planning of Multiple Mobile Robots in a Dynamic 3D Environment
213
In order to facilitate the avoidance of the surface wall of the cylinder, we adopt the MDT from [8]: we compute the shortest distance between the centre of Pi and the surface of the kth cylinder, and then ensure avoidance of the resulting closest point on that surface. The coordinates of this point can be expressed as Cxik = ak ± rck cos χik, Cyik = bk ± rck sin χik and Czik = ck1 + λik(ck2 − ck1), where
$$\chi_{ik} = \tan^{-1}\!\left(\frac{y_i - b_k}{x_i - a_k}\right), \qquad \lambda_{ik} = \frac{z_i - c_{k1}}{c_{k2} - c_{k1}},$$
and the saturation functions are given by
$$\lambda_{ik} = \begin{cases} 0, & \text{if } \lambda_{ik} < 0,\\ \lambda_{ik}, & \text{if } 0 \le \lambda_{ik} \le 1,\\ 1, & \text{if } \lambda_{ik} > 1, \end{cases} \qquad \chi_{ik} \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right).$$
Therefore, for Pi to steer past the closest point on the surface wall of the kth cylinder, we consider the function
$$CO_{ik}(\mathbf{x}) = \frac{1}{2}\left[(x_i - Cx_{ik})^2 + (y_i - Cy_{ik})^2 + (z_i - Cz_{ik})^2 - r_{p_i}^2\right], \qquad (7)$$
where k = 1, ..., m and i = 1, ..., n.
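To make the MDT computation concrete, the following Python sketch clamps λik to [0, 1], builds the closest surface point and evaluates the repulsive function (7); it is illustrative only (not the authors' implementation), and it uses atan2 in place of the ± branch convention for χik. The example values are the first cylinder and the robot radius reported later in Table 1.

```python
import numpy as np

# Minimal sketch of the MDT for one axis-aligned cylinder with base centre
# (a, b, c1), top height c2 and radius rc; p is the robot centre, rp its radius.
def closest_point_on_cylinder(p, a, b, c1, c2, rc):
    x, y, z = p
    chi = np.arctan2(y - b, x - a)            # angle of p around the cylinder axis
    lam = (z - c1) / (c2 - c1)                # normalised height along the axis
    lam = min(max(lam, 0.0), 1.0)             # saturation of lambda to [0, 1]
    return np.array([a + rc * np.cos(chi),    # closest point on the curved wall
                     b + rc * np.sin(chi),
                     c1 + lam * (c2 - c1)])

def CO(p, a, b, c1, c2, rc, rp):
    """Value of Eq. (7): positive while the robot is clear of the cylinder wall."""
    c = closest_point_on_cylinder(p, a, b, c1, c2, rc)
    return 0.5 * (np.sum((p - c) ** 2) - rp ** 2)

# Robot of radius 5 outside the first cylinder of Table 1.
print(CO(np.array([320.0, 110.0, 30.0]), a=250, b=110, c1=0, c2=70, rc=50, rp=5))
```

The returned value shrinks towards zero as the robot approaches the wall, which is what drives the repulsive term in the Lyapunov function below.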
4 Design of the Nonlinear Controllers

Next, we design the Lyapunov function and extract the governing laws pertaining to system (1).
4.1 Lyapunov Function

Using the tuning parameters, for i, j = 1, ..., n,
(i) ℘is > 0, s = 1, ..., 6;
(ii) ϑil > 0, l = 1, ..., qa;
(iii) ζik > 0, k = 1, ..., m;
(iv) βij > 0, j ≠ i,
the Lyapunov function for system (1) is
$$L(\mathbf{x}) := \sum_{i=1}^{n}\left[V_i(\mathbf{x}) + G_i(\mathbf{x})\left(\sum_{s=1}^{6}\frac{\wp_{is}}{WS_{is}(\mathbf{x})} + \sum_{l=1}^{qa}\frac{\vartheta_{il}}{FO_{il}(\mathbf{x})} + \sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}(\mathbf{x})} + \sum_{\substack{j=1\\ j\neq i}}^{n}\frac{\beta_{ij}}{MO_{ij}(\mathbf{x})}\right)\right]. \qquad (8)$$
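As a rough illustration of how (8) is assembled, the Python sketch below evaluates the bracketed term for one robot, assuming only the boundary walls, the spherical obstacles and the other robots contribute; the cylinder terms ζik/COik would be added in exactly the same way (for instance with the CO function sketched above). The tuning values mirror those reported later in Table 1, and all names are illustrative rather than taken from the authors' code.

```python
import numpy as np

# Sketch of the bracketed term of Eq. (8) for robot i (cylinder terms omitted).
# p: robot position, target: its target centre, others: positions of the other
# robots, spheres: list of (centre, radius), eta: workspace dimensions, rp: radius.
def lyapunov_i(p, target, others, spheres, eta, rp, wp=50.0, theta=10.0, beta=20.0):
    x, y, z = p
    V = 0.5 * np.sum((p - target) ** 2)                   # attraction, Eq. (2)
    G = V                                                 # auxiliary function, Eq. (3)
    WS = [x - rp, eta[1] - (y + rp), eta[0] - (x + rp),   # boundary walls, Eq. (4)
          y - rp, z - rp, eta[2] - (z + rp)]
    rep = sum(wp / w for w in WS)
    for centre, ro in spheres:                            # spherical obstacles, Eq. (6)
        rep += theta / (0.5 * (np.sum((p - centre) ** 2) - (rp + ro) ** 2))
    for q in others:                                      # other moving robots, Eq. (5)
        rep += beta / (0.5 * (np.sum((p - q) ** 2) - (rp + rp) ** 2))
    return V + G * rep

p1 = np.array([30.0, 100.0, 20.0])                        # robot 1 of Table 1
p2 = np.array([30.0, 150.0, 50.0])                        # robot 2 acts as a moving obstacle
print(lyapunov_i(p1, np.array([400.0, 130.0, 75.0]), [p2],
                 [(np.array([100.0, 120.0, 5.0]), 20.0)],
                 eta=(500.0, 200.0, 100.0), rp=5.0))
```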
4.2 Nonlinear Controllers

By differentiating the different components of L(x) along t, we then extract the control laws for Pi. Using the convergence parameters αi1, αi2, αi3 > 0, the constituents of the control inputs are:
$$\begin{aligned} f_{i1} ={}& \left(1 + \sum_{s=1}^{6}\frac{\wp_{is}}{WS_{is}} + \sum_{l=1}^{qa}\frac{\vartheta_{il}}{FO_{il}} + \sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}} + \sum_{\substack{j=1\\ j\neq i}}^{n}\frac{\beta_{ij}}{MO_{ij}}\right)(x_i - \tau_{i1})\\ & - G_i\frac{\wp_{i1}}{(WS_{i1})^2} + G_i\frac{\wp_{i3}}{(WS_{i3})^2} - G_i\sum_{l=1}^{qa}\frac{\vartheta_{il}}{(FO_{il})^2}(x_i - o_{l1}) - 2G_i\sum_{\substack{j=1\\ j\neq i}}^{n}\frac{\beta_{ij}}{MO_{ij}}(x_i - x_j)\\ & - G_i\sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}}\left[(x_i - Cx_{ik})\left(1 \mp \frac{r_{c_k}\sin\chi_{ik}\,(y_i - b_k)}{(x_i - a_k)^2 + (y_i - b_k)^2}\right) \pm (y_i - Cy_{ik})\,\frac{r_{c_k}\cos\chi_{ik}\,(y_i - b_k)}{(x_i - a_k)^2 + (y_i - b_k)^2}\right],\\[4pt] f_{i2} ={}& \left(1 + \sum_{s=1}^{6}\frac{\wp_{is}}{WS_{is}} + \sum_{l=1}^{qa}\frac{\vartheta_{il}}{FO_{il}} + \sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}} + \sum_{\substack{j=1\\ j\neq i}}^{n}\frac{\beta_{ij}}{MO_{ij}}\right)(y_i - \tau_{i2})\\ & + G_i\frac{\wp_{i2}}{(WS_{i2})^2} - G_i\frac{\wp_{i4}}{(WS_{i4})^2} - G_i\sum_{l=1}^{qa}\frac{\vartheta_{il}}{(FO_{il})^2}(y_i - o_{l2}) - 2G_i\sum_{\substack{j=1\\ j\neq i}}^{n}\frac{\beta_{ij}}{MO_{ij}}(y_i - y_j)\\ & - G_i\sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}}\left[\pm(x_i - Cx_{ik})\,\frac{r_{c_k}\sin\chi_{ik}\,(x_i - a_k)}{(x_i - a_k)^2 + (y_i - b_k)^2} + (y_i - Cy_{ik})\left(1 \mp \frac{r_{c_k}\cos\chi_{ik}\,(x_i - a_k)}{(x_i - a_k)^2 + (y_i - b_k)^2}\right)\right],\\[4pt] f_{i3} ={}& \left(1 + \sum_{s=1}^{6}\frac{\wp_{is}}{WS_{is}} + \sum_{l=1}^{qa}\frac{\vartheta_{il}}{FO_{il}} + \sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}} + \sum_{\substack{j=1\\ j\neq i}}^{n}\frac{\beta_{ij}}{MO_{ij}}\right)(z_i - \tau_{i3})\\ & - G_i\frac{\wp_{i5}}{(WS_{i5})^2} + G_i\frac{\wp_{i6}}{(WS_{i6})^2} - G_i\sum_{l=1}^{qa}\frac{\vartheta_{il}}{(FO_{il})^2}(z_i - o_{l3}) - 2G_i\sum_{\substack{j=1\\ j\neq i}}^{n}\frac{\beta_{ij}}{MO_{ij}}(z_i - z_j), \end{aligned}$$
for i = 1, ..., n.
Theorem 1 Let the motion of the mobile robot Pi be controlled by the ODEs in system (1). The overall objective is to navigate Pi in a dynamic workspace and reach its final configuration. The subtasks include: convergence to predefined targets, avoidance of the fixed spherical and cylindrical obstacles, avoidance of the boundary walls, and avoidance of the other moving point-mass mobile robots. To guarantee stability in the sense of Lyapunov of system (1), we consider the following continuous velocity control laws:
$$v_i = -\frac{1}{\alpha_{i1}}f_{i1}, \qquad w_i = -\frac{1}{\alpha_{i2}}f_{i2}, \qquad u_i = -\frac{1}{\alpha_{i3}}f_{i3}, \qquad (9)$$
for i = 1, ..., n.
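Since each fi1, fi2, fi3 is the corresponding partial derivative of L(x) with respect to robot i's position, the control laws (9) can be prototyped without the closed-form expressions above. The sketch below approximates that gradient by central finite differences, which is a stand-in for the analytic terms and not the authors' method; α is taken from Table 1 and all names are illustrative.

```python
import numpy as np

# Sketch of Eq. (9): velocities are a scaled steepest-descent step on L.
# L_of_p is any callable returning L for a trial position of robot i
# (for example the lyapunov_i sketch above with the other arguments fixed).
def velocity_controller(L_of_p, p, alpha=(0.05, 0.05, 0.05), h=1e-4):
    v = np.zeros(3)
    for c in range(3):
        e = np.zeros(3)
        e[c] = h
        grad_c = (L_of_p(p + e) - L_of_p(p - e)) / (2.0 * h)   # ~ f_i1, f_i2, f_i3
        v[c] = -grad_c / alpha[c]                              # v_i, w_i, u_i
    return v

# Example with the attractive part only: the robot is steered towards its target.
print(velocity_controller(lambda q: 0.5 * np.sum((q - np.array([400.0, 130.0, 75.0])) ** 2),
                          np.array([30.0, 100.0, 20.0])))
```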
5 Stability Analysis

We begin with the following theorem:

Theorem 2 Let (τi1, τi2, τi3) be the position of the target of the point-mass mobile robot Pi. Then xe := (τi1, τi2, τi3, 0, 0, 0) ∈ R⁶ is a stable equilibrium point of system (1), with xe ∈ D(L(x)).

Proof For i = 1, ..., n:
1. Over the space D(L(x)) = {x ∈ R⁶ⁿ : WSis(x) > 0, s = 1, ..., 6; FOil(x) > 0, l = 1, ..., qa; COik(x) > 0, k = 1, ..., m; MOij(x) > 0, j ≠ i}, L(x) is positive and continuous;
2. L(xe) = 0;
3. L(x) > 0 for all x ∈ D(L(x)) \ {xe}.

Then, along a solution of system (1), we have
$$\dot{L}_{(1)}(\mathbf{x}) = \sum_{i=1}^{n}\left(f_{i1}\dot{x}_i + f_{i2}\dot{y}_i + f_{i3}\dot{z}_i\right).$$
Using (9), we obtain the following semi-negative definite time derivative of L(x) for system (1):
$$\dot{L}_{(1)}(\mathbf{x}) = -\sum_{i=1}^{n}\left(\alpha_{i1}v_i^2 + \alpha_{i2}w_i^2 + \alpha_{i3}u_i^2\right) \le 0.$$
Therefore, L˙ (1) (x) ≤ 0 ∀x ∈ D(L(x)) and L˙ (1) (xe ) = 0. Moreover, L(x) ∈ C 1 (D(L(x))), hence, for system (1), L(x) is classified as its Lyapunov function and xe is a stable equilibrium point. The above result does not contradict Brockett’s theorem [9] since we are only proving stability.
6 Simulation Results

To demonstrate the effectiveness and robustness of our proposed scheme, we simulate a virtual scenario, and the stability results obtained from the Lyapunov function are verified numerically. We consider the motion of point-mass mobile robots with fixed spherical and cylindrical obstacles in their paths. The mobile robots navigate towards their designated targets whilst ensuring collision-free manoeuvres with any obstacle. Figure 2 shows the default 3D view and Fig. 3 shows the top 3D view of the motion of the mobile robots. The values of the initial conditions, constraints and different parameters utilised in the simulation are provided in Table 1. The behaviour of the Lyapunov function and its time derivative are shown in Fig. 4.
Fig. 2 Default 3D motion of the point-mass mobile robots at t = 0, 22, 101, 500 units
Fig. 3 Top 3D motion of the point-mass mobile robots at t = 0, 22, 101, 500 units

Table 1 Parameters utilised in the numerical simulation. There are 2 point-mass mobile robots (n = 2), 4 spherical-shaped obstacles (qa = 4) and 3 cylindrical obstacles (m = 3)

Initial state of the point-mass mobile robots
• Workspace: η1 = 500, η2 = 200, η3 = 100
• Rectangular position: (x1, y1, z1) = (30, 100, 20); (x2, y2, z2) = (30, 150, 50)
• Radius of point-mass: rp1 = rp2 = 5

Constraints
• Target centre, radius: (τ11, τ12, τ13) = (400, 130, 75), rτ1 = 5; (τ21, τ22, τ23) = (400, 50, 50), rτ2 = 5
• Centre of cylinder, radius and height: (a1, b1, c11) = (250, 110, 0), rc1 = 50, c12 = 70; (a2, b2, c21) = (350, 70, 0), rc2 = 30, c22 = 85; (a3, b3, c31) = (350, 110, 0), rc3 = 30, c32 = 90
• Sphere centre and radius: (o11, o12, o13) = (100, 120, 5), ro1 = 20; (o21, o22, o23) = (120, 160, 20), ro2 = 20; (o31, o32, o33) = (110, 50, 70), ro3 = 20; (o41, o42, o43) = (160, 70, 50), ro4 = 20

Control and convergence parameters
• Avoidance of spherical obstacles: ϑil = 10, for i = 1, 2, l = 1, ..., 4
• Avoidance of cylindrical obstacles: ζik = 0.5, for i = 1, 2, k = 1, ..., 3
• Avoidance of workspace boundaries: ℘is = 50, for i = 1, 2, s = 1, ..., 6
• Inter-individual collision avoidance: βij = 20, for i, j = 1, 2, j ≠ i
• Convergence: αi1 = αi2 = αi3 = 0.05, for i = 1, 2
Fig. 4 Behaviour of L˙ (1) (x) and L(x)
7 Conclusion

In this paper, we explored the motion of point-mass mobile robots in a dynamic environment that includes spherical and cylindrical obstacles and requires inter-individual collision avoidance amongst the mobile robots. The robust velocity control inputs were derived from the LbCS, which produced feasible trajectories along which the mobile robots navigate safely in the dynamic environment whilst ensuring collision avoidance with any obstacles. To show the effectiveness and robustness of our scheme, a virtual scenario was simulated for the mobile robots. Future research will address tunnel-passing manoeuvres in 3D space using hollow cylinders.
References

1. J. Vanualailai, J. Ha, S. Nakagiri, A solution to the two-dimensional findpath problem. Dynamics Stability Syst. 13, 373–401 (1998)
2. B.N. Sharma, J. Raj, J. Vanualailai, Navigation of carlike robots in an extended dynamic environment with swarm avoidance. Int. J. Robust Nonlinear Control 28(2), 678–698 (2018)
3. J. Raj, K. Raghuwaiya, S. Singh, B. Sharma, J. Vanualailai, Swarming intelligence of 1-trailer systems, in Advanced Computer and Communication Engineering Technology, ed. by H.A. Sulaiman, M.A. Othman, M.F.I. Othman, Y.A. Rahim, N.C. Pee (Springer International Publishing, Cham, 2016), pp. 251–264
4. A. Prasad, B. Sharma, J. Vanualailai, A new stabilizing solution for motion planning and control of multiple robots. Robotica 34(5), 1071–1089 (2016)
5. K. Raghuwaiya, B. Sharma, J. Vanualailai, Leader-follower based locally rigid formation control. J. Adv. Transport. 2018, 1–14 (2018)
6. B. Sharma, J. Vanualailai, A. Prasad, Trajectory planning and posture control of multiple mobile manipulators. Int. J. Appl. Math. Comput. 2(1), 11–31 (2010)
7. J. Vanualailai, B. Sharma, A. Ali, Lyapunov-based kinematic path planning for a 3-link planar robot arm in a structured environment. Global J. Pure Appl. Math. 3(2), 175–190 (2007)
8. B. Sharma, New Directions in the Applications of the Lyapunov-based Control Scheme to the Findpath Problem. PhD thesis, University of the South Pacific, Suva, Fiji Islands, July 2008
9. R.W. Brockett, Asymptotic stability and feedback stabilization, in Differential Geometric Control Theory (Springer, Berlin, 1983), pp. 181–191
Autonomous Quadrotor Maneuvers in a 3D Complex Environment Jito Vanualailai, Jai Raj, and Krishna Raghuwaiya
Abstract This paper addresses collision-free avoidance maneuvers of a quadrotor aircraft. We use the artificial potential field method via a scheme known as the Lyapunov-based control scheme to extract the control laws that will be utilized to govern the autonomous navigation of the quadrotor. The hollow cylinder, which becomes an obstacle for the quadrotor, is avoided via the minimum distance technique: we compute the minimum Euclidean distance from the centre of the quadrotor to the surface wall of the cylinder and then avoid this resultant point. The quadrotor autonomously navigates itself past the obstacle to reach its target. The effectiveness of the proposed nonlinear control inputs is demonstrated in a virtual scenario via a computer simulation.

Keywords Cylindrical obstacle · Quadrotor · Minimum distance technique
1 Introduction Autonomous navigation of unmanned aerial vehicles (UAVs) has been an active area of research in the past decade and has attracted significant interest from both academic researchers and commercial designers. Among all the other types of UAVs, quadrotor UAVs are the most common and favoured. In essence, this is due to their advanced characteristics, simple structure and ease to assemble, and its vertical take-off and landing (VTOL) capabilities [1]. Despite the quadrotor being popular, there still exists problems at large that needs to be solved. The quadrotor is an under actuated system which has only four control inputs and six outputs. In addition, the attitude dynamics and position dynamics of a quadrotor are strongly coupled. To solve these problems, several control solutions have been proposed by researchers, including PID controllers, fuzzy PID, sliding mode control, linear quadratic regulator (LQR), and Lyapunov method [2]. J. Vanualailai (B) · J. Raj · K. Raghuwaiya The University of the South Pacific, Suva, Fiji e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_20
In this research, we will employ the Lyapunov-based control scheme (LbCS) [3, 4], which is an artificial potential field (APF) method, for the autonomous control of a quadrotor, an intelligent vehicle system (IVS). The governing principle behind the LbCS revolves around the construction of functions for target attraction and obstacle avoidance. With respect to the APF functions, the attractive functions are contemplated as attractive potential field functions and the obstacle avoidance functions as repulsive potential field functions. The rational functions are then constructed with positive tuning parameters in the numerator of the repulsive potential field functions [4, 5]. The advantage of employing the APF method lies in its simplicity and elegance in constructing functions out of system constraints and inequalities, its favourable processing speed, decentralization, and stability features [6, 7], although it inherently involves the existence of local minima. The workspace for the quadrotor is immersed with positive and negative fields, with the direction of motion facilitated via the notion of steepest descent. The environment to which the quadrotor will be exposed is complex and dynamic. First of all, the dynamics of the quadrotor itself is complex, hence making the motion planning and control (MPC) problem a challenging, computer-intensive, yet interesting problem. As a result, the dynamic constraints are also treated as repulsive potential field functions. We will also introduce obstacles in the workspace, in this case cylindrical obstacles, whereby the minimum distance technique (MDT) [8] will be employed for a smooth collision-free path by the quadrotor. Using the composition of the LbCS, we propose a motion planner for quadrotor aircraft navigating in a workspace cluttered with obstacles. The remainder of this research article is organized as follows: Sect. 2 presents the dynamic model of the quadrotor UAV; in Sect. 3, the APF functions are defined; in Sect. 4, the Lyapunov function is constructed, and the robust continuous nonlinear control inputs for the quadrotor are extracted; in Sect. 5, we demonstrate the effectiveness of the proposed controllers via a computer simulation; and finally, the conclusion and future work are given in Sect. 6.
2 Dynamic Model of a Quadrotor

In this section, we present the dynamic model of a quadrotor UAV, whose structure is shown in Fig. 1. The dynamic model of the quadrotor can be expressed by the following set of nonlinear differential equations:
Fig. 1 Schematic structural configuration of a quadrotor UAV

$$\begin{aligned} \ddot{x}_i &= \frac{U_{i1}}{m_i}(\cos\phi_i\sin\theta_i\cos\psi_i + \sin\phi_i\sin\psi_i) - \kappa_{i1}\frac{\dot{x}_i}{m_i},\\ \ddot{y}_i &= \frac{U_{i1}}{m_i}(\cos\phi_i\sin\theta_i\sin\psi_i - \sin\phi_i\cos\psi_i) - \kappa_{i2}\frac{\dot{y}_i}{m_i},\\ \ddot{z}_i &= \frac{U_{i1}}{m_i}\cos\phi_i\cos\theta_i - g - \kappa_{i3}\frac{\dot{z}_i}{m_i},\\ \ddot{\phi}_i &= \frac{l_i}{I_{ix}}U_{i2} - \frac{l_i\kappa_{i4}}{I_{ix}}\dot{\phi}_i, \qquad \ddot{\theta}_i = \frac{l_i}{I_{iy}}U_{i3} - \frac{l_i\kappa_{i5}}{I_{iy}}\dot{\theta}_i, \qquad \ddot{\psi}_i = \frac{1}{I_{iz}}U_{i4} - \frac{\kappa_{i6}}{I_{iz}}\dot{\psi}_i. \end{aligned} \qquad (1)$$
For the ith quadrotor UAV, (xi, yi, zi) represents the position of the quadrotor, (φi, θi, ψi) are the three Euler angles, namely the roll, pitch and yaw angles, respectively, g is the gravitational acceleration, mi is the total mass of the quadrotor structure, li is the half length of the quadrotor, Iix, Iiy, Iiz are the moments of inertia, κiι, ι = 1, ..., 6, are the drag coefficients, and Uiς, ς = 1, ..., 4, are the virtual control inputs. To resolve the under-actuation of the quadrotor system, we introduce the control inputs Ui1x, Ui1y and Ui1z to replace Ui1, so that $U_{i1} = \sqrt{U_{i1x}^2 + U_{i1y}^2 + U_{i1z}^2}$. The set of first-order ODEs for the quadrotor model is hence:
$$\begin{aligned} \dot{x}_i &= v_i, \qquad \dot{y}_i = w_i, \qquad \dot{z}_i = u_i,\\ \dot{v}_i &= \frac{U_{i1x}}{m_i}(\cos\phi_i\sin\theta_i\cos\psi_i + \sin\phi_i\sin\psi_i) - \kappa_{i1}\frac{v_i}{m_i},\\ \dot{w}_i &= \frac{U_{i1y}}{m_i}(\cos\phi_i\sin\theta_i\sin\psi_i - \sin\phi_i\cos\psi_i) - \kappa_{i2}\frac{w_i}{m_i},\\ \dot{u}_i &= \frac{U_{i1z}}{m_i}\cos\phi_i\cos\theta_i - g - \kappa_{i3}\frac{u_i}{m_i},\\ \dot{\phi}_i &= q_i, \qquad \dot{\theta}_i = p_i, \qquad \dot{\psi}_i = r_i,\\ \dot{q}_i &= \frac{l_i}{I_{ix}}U_{i2} - \frac{l_i\kappa_{i4}}{I_{ix}}q_i, \qquad \dot{p}_i = \frac{l_i}{I_{iy}}U_{i3} - \frac{l_i\kappa_{i5}}{I_{iy}}p_i, \qquad \dot{r}_i = \frac{1}{I_{iz}}U_{i4} - \frac{\kappa_{i6}}{I_{iz}}r_i. \end{aligned} \qquad (2)$$
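For illustration, a minimal forward-Euler step of system (2) can be written as below; the state ordering, function names and step size are assumptions for this sketch (not the authors' code), while the physical parameter values follow Table 1.

```python
import numpy as np

# One forward-Euler integration step of system (2), assuming the virtual inputs
# U = (U1x, U1y, U1z, U2, U3, U4) are supplied by the controller.
# State s = (x, y, z, v, w, u, phi, theta, psi, q, p, r).
def quadrotor_step(s, U, dt=0.01, m=2.0, l=0.2, g=9.8,
                   Ix=4.856e-3, Iy=4.856e-3, Iz=4.856e-2, kappa=(0.01,) * 6):
    x, y, z, v, w, u, phi, th, psi, q, p, r = s
    U1x, U1y, U1z, U2, U3, U4 = U
    dv = (np.cos(phi)*np.sin(th)*np.cos(psi) + np.sin(phi)*np.sin(psi)) * U1x/m - kappa[0]*v/m
    dw = (np.cos(phi)*np.sin(th)*np.sin(psi) - np.sin(phi)*np.cos(psi)) * U1y/m - kappa[1]*w/m
    du = np.cos(phi)*np.cos(th) * U1z/m - g - kappa[2]*u/m
    dq = (l/Ix)*U2 - l*kappa[3]*q/Ix
    dp = (l/Iy)*U3 - l*kappa[4]*p/Iy
    dr = (1.0/Iz)*U4 - kappa[5]*r/Iz
    ds = np.array([v, w, u, dv, dw, du, q, p, r, dq, dp, dr])
    return np.asarray(s, dtype=float) + dt * ds

# One step from the initial state of Table 1 under a near-hover thrust U_1z = m*g.
print(quadrotor_step([0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0],
                     U=(0.0, 0.0, 2.0 * 9.8, 0.0, 0.0, 0.0)))
```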
3 Deployment of the APF Functions

In this section, we formulate collision-free trajectories of the quadrotor UAV using control laws extracted via the LbCS. We design an attractive potential field function for attraction to the target and repulsive potential field functions for repulsion from obstacles. We enclose the ith quadrotor UAV, Ai, with a spherical protective region of radius li.
3.1 Attractive Potential Field Functions

Attraction to Target The designated target of Ai is a sphere with center (τi1, τi2, τi3) and radius rτi. For the attraction of the quadrotor UAV to this target, we design the target attractive potential field function of the form
$$V_i(\mathbf{x}) = \frac{1}{2}\left[(x_i - \tau_{i1})^2 + (y_i - \tau_{i2})^2 + (z_i - \tau_{i3})^2 + v_i^2 + w_i^2 + u_i^2 + q_i^2 + p_i^2 + r_i^2\right], \qquad (3)$$
for i = 1, ..., n. The function is not only a measure of the distance between the quadrotor and its target, but also a measure of convergence to the target.

Auxiliary Function In order to achieve the convergence of the quadrotor aircraft to its designated target and to ensure that the nonlinear controllers vanish at this target configuration, we consider the auxiliary function of the form
$$G_i(\mathbf{x}) = \frac{1}{2}\left[(x_i - \tau_{i1})^2 + (y_i - \tau_{i2})^2 + (z_i - \tau_{i3})^2\right], \qquad (4)$$
for i = 1, ..., n. This auxiliary function is then multiplied with each of the obstacle avoidance functions to be designed in the following subsection.
3.2 Repulsive Potential Field Functions

Modulus Bound on the Angles The roll (φi), pitch (θi) and yaw (ψi) angles of Ai are limited and bounded to prevent the quadrotor UAV from flipping over. These limits are treated as artificial obstacles, and for their avoidance we construct obstacle avoidance functions of the form
$$S_{i1}(\mathbf{x}) = \frac{1}{2}(\phi_{max} - \phi_i)(\phi_{max} + \phi_i), \qquad (5)$$
$$S_{i2}(\mathbf{x}) = \frac{1}{2}(\theta_{max} - \theta_i)(\theta_{max} + \theta_i), \qquad (6)$$
$$S_{i3}(\mathbf{x}) = \frac{1}{2}(\psi_{max} - \psi_i)(\psi_{max} + \psi_i), \qquad (7)$$
where i = 1, ..., n, φmax = π/2 is the maximum roll angle, θmax = π/2 is the maximum pitch angle, and ψmax = π is the maximum yaw angle.

Modulus Bound on the Velocities In order to avoid fast solutions and for safety reasons, the translational and angular velocities of the quadrotors are bounded. Again treated as artificial obstacles, for their avoidance we construct obstacle avoidance functions of the form
$$\begin{aligned} D_{i1}(\mathbf{x}) &= \tfrac{1}{2}(v_{max} - v_i)(v_{max} + v_i), & (8)\\ D_{i2}(\mathbf{x}) &= \tfrac{1}{2}(w_{max} - w_i)(w_{max} + w_i), & (9)\\ D_{i3}(\mathbf{x}) &= \tfrac{1}{2}(u_{max} - u_i)(u_{max} + u_i), & (10)\\ D_{i4}(\mathbf{x}) &= \tfrac{1}{2}(q_{max} - q_i)(q_{max} + q_i), & (11)\\ D_{i5}(\mathbf{x}) &= \tfrac{1}{2}(p_{max} - p_i)(p_{max} + p_i), & (12)\\ D_{i6}(\mathbf{x}) &= \tfrac{1}{2}(r_{max} - r_i)(r_{max} + r_i), & (13) \end{aligned}$$
where i = 1, ..., n.
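Each of (5)-(13) is the same bound-type measure, ½(limit − value)(limit + value) = ½(limit² − value²), which is positive while the variable is strictly inside its bound and vanishes on it. A short Python sketch (illustrative names; limits taken from the text and Table 1):

```python
import numpy as np

# Bound-type avoidance measure used in Eqs. (5)-(13): positive inside the bound.
def bound_obstacle(value, limit):
    return 0.5 * (limit - value) * (limit + value)

print(bound_obstacle(0.1, np.pi / 2))   # roll angle phi_i = 0.1 rad vs phi_max = pi/2
print(bound_obstacle(0.5, 1.0))         # translational velocity v_i = 0.5 vs v_max = 1
```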
Cylindrical Obstacles The surface walls of the cylinders are classified as fixed obstacles, so the quadrotor UAV Ai needs to avoid them. To begin, the following definition is made:

Definition 1 The kth surface wall is collapsed into a cylinder in the Z1Z2Z3 space between the coordinates (ak, bk, ck1) and (ak, bk, ck2), with radius rck. The parametric representation of the kth cylinder of height (ck2 − ck1) is Cxk = ak ± rck cos χk, Cyk = bk ± rck sin χk and Czk = ck1 + λk(ck2 − ck1), where χk : R → (−π/2, π/2) and λk : R² → [0, 1].
In order to facilitate the avoidance of the surface wall of the cylinder, we adopt the architecture of the MDT from [8]: we compute the minimum Euclidean distance from the center of Ai to the surface of the kth cylinder and avoid the resulting closest point on the surface of the cylinder. From geometry, the coordinates of this point can be given as Cxik = ak ± rck cos χik, Cyik = bk ± rck sin χik and Czik = ck1 + λik(ck2 − ck1), where
$$\chi_{ik} = \tan^{-1}\!\left(\frac{y_i - b_k}{x_i - a_k}\right), \qquad \lambda_{ik} = \frac{z_i - c_{k1}}{c_{k2} - c_{k1}},$$
and the saturation functions are given by
$$\lambda_{ik} = \begin{cases} 0, & \text{if } \lambda_{ik} < 0,\\ \lambda_{ik}, & \text{if } 0 \le \lambda_{ik} \le 1,\\ 1, & \text{if } \lambda_{ik} > 1, \end{cases} \qquad \chi_{ik} \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right).$$
Therefore, for Ai to avoid the closest point on the surface of the kth cylinder, we construct repulsive potential field functions of the form
$$CO_{ik}(\mathbf{x}) = \frac{1}{2}\left[(x_i - Cx_{ik})^2 + (y_i - Cy_{ik})^2 + (z_i - Cz_{ik})^2 - r_{\tau_i}^2\right], \qquad (14)$$
for i = 1, ..., n and k = 1, ..., m.
4 Design of the Nonlinear Controllers

In this section, the Lyapunov function will be proposed, and the nonlinear control laws for system (2) will be designed.
4.1 Lyapunov Function

First, for i = 1, ..., n, we design the Lyapunov function by introducing the following tuning parameters that will be utilized to form the repulsive potential field functions:
(i) ξis > 0, s = 1, ..., 3, for the avoidance of the sth artificial obstacle from the dynamic constraints on the angles;
(ii) γid > 0, d = 1, ..., 6, for the avoidance of the dth artificial obstacle from the dynamic constraints on the translational and rotational velocities;
(iii) ζik > 0, k = 1, ..., m, for the avoidance of the surface wall of the kth cylinder.
Using these tuning parameters, we now propose the following Lyapunov function for system (2):
$$L(\mathbf{x}) := \sum_{i=1}^{n}\left[V_i(\mathbf{x}) + G_i(\mathbf{x})\left(\sum_{s=1}^{3}\frac{\xi_{is}}{S_{is}(\mathbf{x})} + \sum_{d=1}^{6}\frac{\gamma_{id}}{D_{id}(\mathbf{x})} + \sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}(\mathbf{x})}\right)\right]. \qquad (15)$$
4.2 Nonlinear Controllers

To extract the control laws for the quadrotor, we take the time derivative of the various components of L(x), which upon suppressing x is
$$\dot{L}_{(2)}(\mathbf{x}) = \sum_{i=1}^{n}\left[f_{i1}v_i + f_{i2}w_i + f_{i3}u_i + f_{i4}v_i\dot{v}_i + f_{i5}w_i\dot{w}_i + f_{i6}u_i\dot{u}_i + g_{i1}q_i + g_{i2}p_i + g_{i3}r_i + g_{i4}q_i\dot{q}_i + g_{i5}p_i\dot{p}_i + g_{i6}r_i\dot{r}_i\right], \qquad (16)$$
where
$$\begin{aligned} f_{i1} ={}& \left(1 + \sum_{s=1}^{3}\frac{\xi_{is}}{S_{is}} + \sum_{d=1}^{6}\frac{\gamma_{id}}{D_{id}} + \sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}}\right)(x_i - \tau_{i1})\\ & - G_i\sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}}\left[(x_i - Cx_{ik})\left(1 \mp \frac{r_{c_k}\sin\chi_{ik}\,(y_i - b_k)}{(x_i - a_k)^2 + (y_i - b_k)^2}\right) \pm (y_i - Cy_{ik})\,\frac{r_{c_k}\cos\chi_{ik}\,(y_i - b_k)}{(x_i - a_k)^2 + (y_i - b_k)^2}\right],\\[4pt] f_{i2} ={}& \left(1 + \sum_{s=1}^{3}\frac{\xi_{is}}{S_{is}} + \sum_{d=1}^{6}\frac{\gamma_{id}}{D_{id}} + \sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}}\right)(y_i - \tau_{i2})\\ & - G_i\sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}}\left[\pm(x_i - Cx_{ik})\,\frac{r_{c_k}\sin\chi_{ik}\,(x_i - a_k)}{(x_i - a_k)^2 + (y_i - b_k)^2} + (y_i - Cy_{ik})\left(1 \mp \frac{r_{c_k}\cos\chi_{ik}\,(x_i - a_k)}{(x_i - a_k)^2 + (y_i - b_k)^2}\right)\right],\\[4pt] f_{i3} ={}& \left(1 + \sum_{s=1}^{3}\frac{\xi_{is}}{S_{is}} + \sum_{d=1}^{6}\frac{\gamma_{id}}{D_{id}} + \sum_{k=1}^{m}\frac{\zeta_{ik}}{CO_{ik}}\right)(z_i - \tau_{i3}), \qquad f_{i4} = 1 + \frac{\gamma_{i1}}{D_{i1}^2}, \qquad f_{i5} = 1 + \frac{\gamma_{i2}}{D_{i2}^2}, \qquad f_{i6} = 1 + \frac{\gamma_{i3}}{D_{i3}^2},\\ g_{i1} ={}& \frac{\xi_{i1}}{S_{i1}}, \qquad g_{i2} = \frac{\xi_{i2}}{S_{i2}}, \qquad g_{i3} = \frac{\xi_{i3}}{S_{i3}}, \qquad g_{i4} = 1 + \frac{\gamma_{i4}}{D_{i4}^2}, \qquad g_{i5} = 1 + \frac{\gamma_{i5}}{D_{i5}^2}, \qquad g_{i6} = 1 + \frac{\gamma_{i6}}{D_{i6}^2}. \end{aligned}$$
Next, we introduce the convergence parameters δiℓ > 0, ℓ = 1, ..., 6. We then utilize the condition $f_{i1}v_i + f_{i4}v_i\dot{v}_i = -\delta_{i1}v_i^2$ and make the necessary substitutions to derive the control inputs. Applying the same idea to the other components, the following nonlinear control inputs are generated for system (2):
$$\begin{aligned} U_{i1x} &= \frac{-m_i(f_{i1} + \delta_{i1}v_i) + f_{i4}\kappa_{i1}v_i}{f_{i4}(\cos\phi_i\sin\theta_i\cos\psi_i + \sin\phi_i\sin\psi_i)},\\ U_{i1y} &= \frac{-m_i(f_{i2} + \delta_{i2}w_i) + f_{i5}\kappa_{i2}w_i}{f_{i5}(\cos\phi_i\sin\theta_i\sin\psi_i - \sin\phi_i\cos\psi_i)},\\ U_{i1z} &= \frac{-m_i(f_{i3} + \delta_{i3}u_i - f_{i6}\,g) + f_{i6}\kappa_{i3}u_i}{f_{i6}\cos\theta_i\cos\phi_i},\\ U_{i2} &= \frac{-I_{ix}(g_{i1} + \delta_{i4}q_i) + g_{i4}l_i\kappa_{i4}q_i}{g_{i4}l_i}, \qquad U_{i3} = \frac{-I_{iy}(g_{i2} + \delta_{i5}p_i) + g_{i5}l_i\kappa_{i5}p_i}{g_{i5}l_i}, \qquad U_{i4} = \frac{-I_{iz}(g_{i3} + \delta_{i6}r_i) + g_{i6}\kappa_{i6}r_i}{g_{i6}}. \end{aligned} \qquad (17)$$
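A sketch of how (17) could be evaluated in code is given below; it assumes the intermediate terms fi1, ..., fi6 and gi1, ..., gi6 of (16) have already been computed for the current state, the parameter values are those of Table 1, and all names are illustrative. Note that the denominators can vanish for particular attitudes, so a practical implementation would guard these divisions.

```python
import numpy as np

# Sketch of the control inputs (17).  f and gq hold (f_i1..f_i6) and (g_i1..g_i6)
# from Eq. (16); angles = (phi, theta, psi); vel = (v, w, u, q, p, r).
def quadrotor_inputs(f, gq, angles, vel, m=2.0, l=0.2, g=9.8,
                     Ix=4.856e-3, Iy=4.856e-3, Iz=4.856e-2,
                     kappa=(0.01,) * 6, delta=(0.1, 0.2, 0.05, 2.0, 1.0, 0.5)):
    phi, th, psi = angles
    v, w, u, q, p, r = vel
    U1x = (-m * (f[0] + delta[0] * v) + f[3] * kappa[0] * v) / (
        f[3] * (np.cos(phi) * np.sin(th) * np.cos(psi) + np.sin(phi) * np.sin(psi)))
    U1y = (-m * (f[1] + delta[1] * w) + f[4] * kappa[1] * w) / (
        f[4] * (np.cos(phi) * np.sin(th) * np.sin(psi) - np.sin(phi) * np.cos(psi)))
    U1z = (-m * (f[2] + delta[2] * u - f[5] * g) + f[5] * kappa[2] * u) / (
        f[5] * np.cos(th) * np.cos(phi))
    U2 = (-Ix * (gq[0] + delta[3] * q) + gq[3] * l * kappa[3] * q) / (gq[3] * l)
    U3 = (-Iy * (gq[1] + delta[4] * p) + gq[4] * l * kappa[4] * p) / (gq[4] * l)
    U4 = (-Iz * (gq[2] + delta[5] * r) + gq[5] * kappa[5] * r) / gq[5]
    U1 = np.sqrt(U1x ** 2 + U1y ** 2 + U1z ** 2)    # combined thrust magnitude
    return U1x, U1y, U1z, U2, U3, U4, U1
```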
5 Simulation Results

In this section, we exhibit, via a virtual scenario, the effectiveness of our proposed control scheme. The quadrotor begins its flight from its initial position and initially pitches in order to initiate the movement. While in motion towards its target, it comes across a hollow cylinder. It first tries to navigate past the cylindrical obstacle by going around its side; however, to minimize the time and distance, it instead avoids the curved surface of the cylinder and goes over it. Since the cylinder is hollow, it pitches inside, but encounters the inside wall of the cylinder. It
Fig. 2 Default 3D motion view of the quadrotor at t = 0, 165, 260, 315, 450 units
Fig. 3 Top 3D motion view of the quadrotor at t = 0, 165, 260, 315, 450 units
Fig. 4 Front 3D motion view of the quadrotor at t = 0, 165, 260, 315, 450 units
then moves out of the cylinder and moves towards its target destination. In essence, A1, while in flight to its target, avoids the cylindrical-shaped obstacle. Figure 2 shows the default 3D view, Fig. 3 shows the top 3D view, and Fig. 4 shows the front 3D view of the motion of the quadrotor UAV. Table 1 provides all the values of the initial conditions, constraints, and different parameters utilized in the simulation.
6 Conclusion

In this paper, we present a set of robust nonlinear control laws derived using the LbCS for the control of the motion of quadrotors. The quadrotor, whilst in motion to its target, needs to avoid the cylindrical obstacle and achieve collision-free navigation. The effectiveness of our proposed control laws is exhibited using a virtual scenario via a computer simulation. The derived controllers for the control input for the
quadrotor ensured feasible trajectories and a nice convergence while satisfying all the constraints tagged to the quadrotor system. Future work on quadrotors will address the MPC of multiple quadrotors in a dynamic environment.

Table 1 Parameters of the quadrotor UAV (there is one quadrotor, i = 1)

Initial state of the quadrotor
• Position: (xi, yi, zi) = (0, 0, 0)
• Angles: φi = θi = ψi = 0 rad
• Translational velocities: vi = wi = ui = 0.5 m/s
• Rotational velocities: qi = pi = ri = 0 rad/s

Constraints
• Mass: mi = 2 kg
• Length: li = 0.2 m
• Gravitational acceleration: g = 9.8 m/s²
• Moments of inertia: Iix = Iiy = 4.856 × 10⁻³ and Iiz = 4.856 × 10⁻² kg m²
• Drag coefficients: κiι = 0.01, for ι = 1, ..., 6 (Ns/m)
• Target position: (τi1, τi2, τi3) = (450, 120, 145)
• Maximum translational velocities: vmax = wmax = umax = 1 m/s
• Maximum rotational velocities: qmax = pmax = rmax = 0.5 rad/s

Parameters for the quadrotor
• Dynamic constraints on the angles: ξis = 100 for s = 1, ..., 3
• Dynamic constraints on the velocities: γi1 = 1, γi2 = 2, γi3 = 0.5, γi4 = γi5 = γi6 = 100
• Avoidance of cylindrical obstacles: ζik = 1, for k = 1
• Convergence: δi1 = 0.1, δi2 = 0.2, δi3 = 0.05, δi4 = 2, δi5 = 1 and δi6 = 0.5
References

1. K.D. Nguyen, C. Ha, Design of synchronization controller for the station-keeping hovering mode of quad-rotor unmanned aerial vehicles. Int. J. Aeronaut. Space Sci. 20(1), 228–237 (2019)
2. H. Shraim, A. Awada, R. Youness, A survey on quadrotors: configurations, modeling and identification, control, collision avoidance, fault diagnosis and tolerant control. IEEE Aerospace Electron. Syst. Mag. 33(7), 14–33 (2018)
3. K. Raghuwaiya, S. Singh, Formation types of multiple steerable 1-trailer mobile robots via split/rejoin maneuvers. N. Z. J. Math. 43, 7–21 (2013)
4. B.N. Sharma, J. Raj, J. Vanualailai, Navigation of carlike robots in an extended dynamic environment with swarm avoidance. Int. J. Robust Nonlinear Control 28(2), 678–698 (2018)
5. J. Raj, K. Raghuwaiya, S. Singh, B. Sharma, J. Vanualailai, Swarming intelligence of 1-trailer systems, in Advanced Computer and Communication Engineering Technology, ed. by H.A. Sulaiman, M.A. Othman, M.F.I. Othman, Y.A. Rahim, N.C. Pee (Springer International Publishing, Cham, 2016), pp. 251–264
6. J. Vanualailai, J. Ha, S. Nakagiri, A solution to the two-dimensional findpath problem. Dynamics Stability Syst. 13, 373–401 (1998)
7. K. Raghuwaiya, B. Sharma, J. Vanualailai, Leader-follower based locally rigid formation control. J. Adv. Transport. 2018, 1–14 (2018)
8. B. Sharma, New Directions in the Applications of the Lyapunov-based Control Scheme to the Findpath Problem. Ph.D. thesis, University of the South Pacific, Suva, Fiji Islands, July 2008
Tailoring Scrum Methodology for Game Development Towsif Zahin Khan, Shairil Hossain Tusher, Mahady Hasan, and M. Rokonuzzaman
Abstract The closest comparison of video game development would be other types of software development such as application software development or system software development. Yet these comparisons do not do it any justice. A video game is more than a software or the sum of its parts, unlike other types of software. In video games there is a much larger emphasis on performance, system requirements are often subjective. Designing a video game also tends to be a lot harder due to the complex interactions between the objects and the end users. The same is true when it comes to testing video games. Whereas other software developers can get away with functional and meeting all the requirements, in due time, a video game also has to provide the target audience with entertainment value. Keywords Game development · Development methodology · Scrum methodology · Agile methodology · Tailoring
1 Introduction The video game industry is a relatively young industry while at the same time is fast growing and ever changing and is always at the forefront of technological innovation. As such, research for or about the video game development is in its infancy. T. Z. Khan (B) · S. H. Tusher · M. Hasan Department of Computer Science and Engineering, Independent University, Bangladesh, Dhaka, Bangladesh e-mail: [email protected] S. H. Tusher e-mail: [email protected] M. Hasan e-mail: [email protected] M. Rokonuzzaman Department of Electrical & Computer Engineering, North South University, Dhaka, Bangladesh e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_21
Most video game research attempts to look at video games from the outside, as a foreign entity and how it may harm, benefit or simply change or affect others, but comparatively very few are trying to work with the industry to find solutions to problems that have been plaguing both the veterans and novices to the industry since its inception. For example, there are no proposed or researched general development methodologies or guidelines specifically for the video game development addressing its many unique challenges and pitfalls. Figuring out a development methodology wholly unique to video games would be very costly. On the other hand, a much more feasible option would be to adopt guidelines or frameworks from the well-researched software engineering domain [1–3] and altering it to best suit the needs for specific work. In this case study, we attempt to adopt and tailor the well-known agile development methodology from the software engineering domain, and use it to development an actual video game and observe the effectiveness of this particular methodology for this particular project. We envision this paper being used to compare and inspire similar research efforts in the near future, leading to improved practices that will eventually be adopted by the industry itself.
2 Literature Review Currently, the library for video game research consists of mostly on video game violence [4], video game addiction [5], video game industry [6], video game effects [7], video game cultures [8, 9], video game benefits [10, 11], etc. Software development methodologies are different frameworks used to manage and plan the development process of an IT-based work. Research has shown that the implementation of software development methodology is actually very limited. Even when methodologies are used they are not used in their entirety as they are, rather different combinations and parts of methods are cherry picked to benefit a particular work in its particular context. Agile development methodology in itself is not a particular process to follow, but rather a guideline constructed to bring agility to the software development process. According to the context of the work, agility can be more beneficial in reaching the project goals rather than a rigorous development process. These contexts may consist of requirements not being fully realized during development, the context may change during development, the need to deliver working software iterations mid-development, etc. The four core factors of agile methodology are [12]: • • • •
• Individuals and interactions over processes and tools.
• Working software over comprehensive documentation.
• Customer collaboration over contract negotiation.
• Responding to change over following a plan.
3 Problem Statement The development methodology contributes to many of the problems that are currently plaguing the video game industry. Big budget game developments consisting of teams made up of hundreds of developers are still using development methodologies which were popular when a team made up of ten developers was considered a big team [13, 14]. Some common and reoccurring video game development problems are discussed below: Technology: Video games are technology dependent. Releasing games for any specific technology, for example, dedicated game consoles, means getting over the initial learning curve and finding work abounds for the shortcomings of that specific technology [13]. New generation of video game consoles is being introduced closer to the release of the previous generation of consoles with time [15]. This means that developers will have to face these issues even more frequently in the future [16]. Collaboration and Team Management: A game development team consists of people from many different career backgrounds such as programmers, plastic artists, musicians, scriptwriters and designers. Communication between people from so many different career backgrounds is often as hard as it is critical to successfully complete a video game project [13]. Nonlinear Process: Game development is never a linear process and requires constant iterations. Often the waterfall model with some enhancements to allow for iterations is used [13]. Schedule: Projects may face delays because proper targets or deliverables were not established. Even when estimated, estimates are often calculated wrong because of the time needed for communication, emergent requirements or lack of documentation. Since the game development process consists of teams made up of different disciplines and each team is dependent on the delivery of one or more other teams the required development time may be hard to estimate [13]. It is also hard to estimate deadlines and schedules as there are less historical data to look back on [16, 17]. Crunch Time: Crunch time is a term in the game industry to address the period of time when the whole development team has to overwork to get tasked finished and read before validation or project delivery deadlines. Crunch time may last 6–7 days and the work hours for per day may be over 12 h [13, 17]. Scope and Feature Creep: Feature creep is a term in the game addressing new functionality that is added during the development process which in turn increases the workload, and hence negatively impacts the schedule and deadlines [17]. Any new feature needs to be evaluated effectively as unmanaged feature creeps may increase errors, possible defects and ultimately the chances of failure. However, feature creeps are not completely avoidable during game development as video games are an interactive medium and things simply have to be changed mid-development to make it more enjoyable or entertaining for the end users.
Inadequate Documentation: Documentation helps avoid “feature creeps” and estimate the scope and time of the work. How much documentation is required depends on how complicated the work is [13].
4 Methodology In this section, we describe the scrum development process. Following are the roles required for scrum development methodology. Product Owner: Product owner is a single entity who is responsible to decide the priorities of the features and functionalities that are to be built. Scrum Master: A single entity who is responsible to teach the scrum values and processed to the team and actively works on improving the scrum experience on an extremely contextual basis. Development Team: In scrum, the development team is simply a cross-functional collection of individuals who together has the skills, resources and capabilities to design, build and test the product. Usually, the development team is made up of five to nine people who are self-organized and self-responsible to find out and implement the best ways to fulfill the goals set by the product owner [18]. The detail scrum framework is explained below [18]: • A sprint usually consists of an entire calendar month. • Product owner has a vision of the complete product. “The Big Cube” is to be made. • Then, the Big Cube is broken down into a prioritized list of features, “The Product Backlog.” • During “Sprint Planning” the product owner provides the team with the top most features from the “The Product Backlog.” • The team then selects the features they can complete during the sprint and breaks them down into the “Sprint Backlog” which is a list of tasks that is needed to complete the features (Fig. 1). • The “Sprint Execution” phase starts which lasts for mostly the entire sprint. • A daily short-meeting is held called the “Daily Scrum.” The development team and also the product owner must be present during the Daily Scrum to put into context what has already been made, what needs to made next and to make sure that the team is not deviating from the goals and requirements of the final product. • At the end of the sprint, the team by should have a portion of the final product that has the potential to be shipped and completely functional on its own. • Then, the “Sprint Review” starts where the development team and stakeholders simply discuss the product being built. • The sprint ends with the “Sprint Retrospective.” The development team tries to answer three key questions about the current sprint. “What the team did well”, “What the team should change” and “What the team should stop doing altogether.”
Fig. 1 Scrum framework
• After the end of one sprint, the whole process starts again from the Product Backlog.
• The process continues until the deadline or the completion of the requirements.
5 Solution Different development methodologies are better for different development contexts. If a project, for example, has very well-defined requirements which have a very low chance of changing during development and the means or skills to reach those requirements are well known then methodologies such as the tried and true, linearsequential design with extensive documentation such as the “Waterfall Model” would be best. As such the first step would be to understand the project we are working on.
5.1 Background of the Experiment Elevator Pitch and Tool used: Single-device two-player game for Android smartphones made up of multiple mini-games (as shown in Table 1). Table 2 lists down the tools used for the development purpose. Team and Skills: The team is made up of four undergraduate students of Computer Science at Independent University, Bangladesh. Although the team is made up of hobbyist artists, musicians and even game designers and developers this research
Table 1 Pitch breakdown Keywords
Description
Two players
The games are meant to be played by two players at the same time either competing with each other or playing as a team
Single-device The game is meant to be played on the same device which means external connections or peripherals such as Internet connectivity is not required Mini-games
A collection of small games that can be played in short bursts
Android
Although due to the flexibility of the game engine, Unity, being used the product can easily be built for iOS smartphones we decided to focus and test it only on Android smartphones
Smartphones All interactions with the game or app will be done through hand gestures
Table 2 Tools used Tools
Description
Unity
A very well-known and widely used game engine free to use for hobbyist game developers, arguably has the most amount of online resources compared to other game engines
Adobe Photoshop PS One of the oldest and most popular digital art and graphics editor software by Adobe Systems MediBang Paint Pro
Lightweight and free digital art and graphics designing software
is the first time they will be using these particular set of tools. Being full time students first the amount of time each individual can contribute to this research at any given time will vary extremely due to exam dates, family responsibilities, etc. Each individual will mostly work remotely from home. Time: The research is part of the senior project/thesis of two undergraduate students of the Independent University, Bangladesh and as per university rules this research has a production time of seven months. Stakeholders: The stakeholders were the individual members of the development team and the university faculty supervising this research. The faculty, in particular, had one requirement for the research. The requirement was that the team had to show significant progress on a weekly basis. The chosen methodology has to promote experimentation and exploration as the team is using tools that they have never used before. The work requirements will inevitably change as the team gains more knowledge on the tools and gains more understanding of the scope of the work and the effort and time required to complete it. The time and deliverables have to be flexible enough to account for the extremely varying availability of individual members of the research.
5.2 Tailoring Scrum Methodology Backlogs: Instead of keeping multiple backlogs for a very small team (smaller than the usual scrum team of five), it would inevitably and unnecessarily increase the development time, and the team has opted to simplify the multiple backlogs into one online spreadsheet. Below we describe simplified backlog: • A task code starting with the character “A” depicts an art asset, while “C” depicts a code asset. “M” depicts the integration of multiple assets into the main game. “Undefined” represents a feature that is yet to be broken down into smaller tasks. • The “Not Started” list is prioritized. • The heading for each column (“Not Started”, “Started” and “Done”) simply lists the tasks in each phase of development as the name suggests. “Done Done” depicts a task that was tested and the team is very sure will not go through any more iterations. • This list was updated every week at the end of every sprint (Table 3). Daily Scrum: The team could not work on the experiment full time and also worked remotely. Instead of daily sprint, the team opted to simply contact each other and provide support to each other whenever it was necessary during the Weekly Sprint. Potentially Shippable Product Increment: Due to the extremely varying amount of time each individual had to work on this research per week the team decided that it would be too difficult to set a goal to have a Shippable Increment of the product each week. Instead, the goal was set to build a “Product Increment Showing Progress” every week. This concept was also in line with the weekly update requirement of the faculty (product owner) who was overseeing this research. Finally, the process model of tailored scrum development if provided below (Fig. 2). Table 3 Tailored backlogs Not started
Started
Done
A2
Minigam1 character “Pink” animation
C2
Game select start button
A3
Game select button and start button
A1
Minigame1 character “Blue” animation
M1
A1 + C2
Undefined
Game over state
C1
Done Done Slide controls
Fig. 2 Tailored scrum methodology
6 Experiment Results Team Composition: Early in development the team realized that two of its members were unable to deliver as was needed by them. They were unable to make time for the research due to various reasons. After a few days of discussions, the aforementioned two members had quit the research, rendering the development team size only half as big as before. Due to the extremely flexible nature of the methodology, the team was able to re-scope and redistribute its resources and successfully finish development before the due date. Team Skills: In the beginning, the workflow was very slow and the deliverables per week were also very small in scope. At different points in the development, the team had reached different skill levels, at which point the team often rebuilt the most recent increment from scratch to increase modifiability or quality. Improvement in Programming Skills and Code Assets: In the beginning, the team had difficulties with even the most basic modules. The team was more concerned with showing progress at the end of the sprint rather than thinking about long time modifiability or reusability or testability. A particular mini-game initially used two code files to get everything done and was not reusable elsewhere in the project. The team gradually gained programming and development expertise in the language (C#) and the engine (Unity), respectively. Later, these codes were reworked in a completely object-oriented approach which increased future modifiability, testability and reusability greatly.
Nine out of the thirteen code files written for Game1 was reused elsewhere within the larger application. Approximately 70% of the code files were reused. As expected the sprint deliverables increased drastically in scope and ultimately benefited the work even though there were some overhead costs of these changes. Improvement in Graphics Art Skill and Art Assets: In the beginning, the team decided on the theme ponds and creatures found around ponds. This idea turned out to be very costly. Firstly, the team had to relate the game-play with the creatures and backdrop. Each art asset took a long time to create as these creatures and backgrounds had to be accurate enough to convey the idea, for example, a bee had to look like a bee to the players. When all the art assets were put together in the game things did not look very attractive or intuitive to play. Mid-development the team decided to redo the art assets for all the mini-games with a few simple concepts in mind: • Use few colors through the whole game. • Use two characters to represent the two players for each game. • Use bubbles of two different colors to represent different game objects (other than the player characters) through the whole game. • Make the two characters simple and cartoony enough so that they can be placed in a variety of scenarios and be easily animated for all the different games in different ways. • Use a simple color for the backdrop for all games that simply contrasts with all other game objects. • Similarly, to the improvements in code these decisions made it far easier and faster to create reusable graphics that were also better in quality and more intuitive for play (Fig. 3). Game-Play: The final product has three mini-games while the team has tested twice as many mini-games and has talked about thrice as much. During the early tests among students and faculties from different departments of Independent University, Bangladesh, a few things were clear. The games were extremely un-intuitive. This was due to both the game-play (and rule set) as well as the graphics. From the testers feedbacks, the team later decided to completely redo the graphics and set up some guidelines for the mini-games to make them easier to learn, remember and even teach to others. Below the game-play guidelines are provided: • Number and complexity of the game rules must be simple enough to explain in one or two sentences. • The “win condition” must remain consistent throughout the game. Changing attributes such as increasing the speed of play with time to increase the chances of meeting the game over condition, as long as the game rules still remain the same. • Each game will be played while holding the device in portrait orientation. Top half will always be the play area for Player1 while the bottom half will always be the play area for Player2.
Fig. 3 Graphic asset comparison. Initial versus final
• The rapid development cycle and flexibility of the methodology allowed for a random selection of testers to give quick feedback while the team was able to work on those feedbacks just as fast.
7 Conclusion The success of the research by a large part can be attributed to the methodology used which was extremely flexible yet disciplined process. The team was able to quickly adapt to new knowledge and unpredictable circumstances and take drastic but successful decisions relatively. The development process would have been far more efficient in the long run if a scrum master was included. On rare occasions, too much flexibility has led to the team becoming less effective, when they were de-motivated as the team could easily promise to deliver less than they were capable of for the next sprint. The methodology used was tailored particularly for this work and could be reused for similar small-scale work and an inexperienced development team, depending on the context, but will definitely not be suitable for work of larger scales and larger team sizes. Compliance with Ethical Standards Conflict of Interest The authors declare that they have no conflict of interests.
Ethical Approval This chapter does not contain any studies with human participants or animals performed by any of the authors. Informed Consent “Informed consent was obtained from all individual participants included in the study.”
Bibliography 1. L.N. Raha, A.W. Hossain, T. Faiyaz, M. Hasan, N. Nahar, M. Rokonuzzaman, A guide for building the knowledgebase for software entrepreneurs, firms, and professional students, in IEEE 16th International Conference on Software Engineering Research, Management and Applications (SERA), Kunming (2018) 2. S. Tahsin, A. Munim, M. Hasan, N. Nahar, M. Rokonuzzaman, Market analysis as a possible activity of software project management, in 2018 IEEE 16th International Conference on Software Engineering Research, Management and Applications (SERA), Kunming (2018) 3. M.M. Morshed, M. Hasan, M. Rokonuzzaman, Software Architecture Decision-Making Practices and Recommendations, in Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, Singapore (2019). 4. C.A. Anderson, A. Shibuya, N. Ihori, E.L. Swing, B.J. Bushman, A. Sakamoto, H.R. Rothstein, M. Saleem, Violent video game effects on aggression, empathy, and prosocial behavior in Eastern and Western countries: a meta-analytic review. Psychol. Bull. 136(2), 151 (2010) 5. D.L.K. King, D.H. Paul, F.D. Mark, Video Game Addiction (2013), pp. 819–825 6. A. Gershenfeld, M. Loparco, C. Barajas, Game Plan: The Insider’s Guide to Breaking in and Succeeding in the Computer and Video Game Business (St. Martin’s Griffin, 2007) 7. N.L. Carnagey, C.A. Anderson, B.J. Bushman, The effect of video game violence on physiological desensitization to real-life violence. J. Exp. Soc. Psychol. 43(3), 489–496 ( 2007) 8. A. Shaw, What is video game culture? Cultural studies and game studies. Games Culture 5(4), 403–424 (2010) 9. Y. Aoyama, H. Izushi, Hardware gimmick or cultural innovation? Technological, cultural, and social foundations of the Japanese video game industry. Res. Policy 32(3), 423–444 (2003) 10. D.E. Warburton, S.S. Bredin, L.T. Horita, D. Zbogar, J.M. Scott, B.T. Esch, R.E. Rhodes, The health benefits of interactive video game exercise. Appl. Physiol. Nutr. Metab. 32(4), 655–663 (2007) 11. K. Squire, Video Games and Learning: Teaching and Participatory Culture in the Digital Age. Technology, Education–connections (the TEC Series) (Teachers College Press, 2011), p. 272 12. K. Bect, M. Beedle, A.V. Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, Manifesto for Agile Software Development (2001) 13. R. Al-Azawi, A. Ayesh, M.A. Obaidy, Towards agent-based agile approach for game development methodology, in World Congress on Computer Applications and Information Systems (WCCAIS) (2014), pp. 1–6 14. R. McGuire, Paper burns: game design with agile methodologies, in Gamasutra—The Art & Business of Making Games (2006) 15. A. Marchand, H.-T. Thorsten, Value creation in the video game industry: industry economics, consumer benefits, and research opportunities. J. Interactive Market. 27(3), 141–157 (2013) 16. J.P. Flynt, O. Salem, Software Engineering for Game Developers, 1st edn. (Course Technology PTR, 2005) 17. F. Petrillo, M. Pmenta, F. Trindade, C. Dietrich, Houston, we have a problem …: a survey of actual problems in computer games development, in Proceedings of the 2008 ACM Symposium on Applied Computing (2008) 18. K.S. Rubin, Essential Scrum: A Practical Guide to the Most Popular Agile Process (AddisonWesley, 2012)
Designing and Developing a Game with Marketing Concepts Towsif Zahin Khan, Shairil Hossain Tusher, Mahady Hasan, and M. Rokonuzzaman
Abstract The video game industry is still in its infancy and lacks academic research concerning the actual process of developing a video game. Even though many aspiring video game developers dream of becoming a professional video game developer, someday reaching that goal is mostly bolstered by many self-experience failures as more formal academic resources are lacking. In this research, we have attempted to combine the well-established basic marketing concepts with the development process itself in an effort to bring some form of formal education into the video game development process. Keywords Game development · Marketing · Game design · Software design · Software development
1 Introduction The video game industry is still in its infancy. Yet the industry has a net worth of $118.6 billion [1]. Game such as “Clash of clans” [2] or “flappy bird” [3] earns a daily revenue of $50,000. Most papers on video games attempt to look at video games from the outside, as a foreign entity and how it may harm, benefit or simply change or affect others, T. Z. Khan (B) · S. H. Tusher · M. Hasan Department of Computer Science and Engineering, Independent University, Bangladesh, Dhaka, Bangladesh e-mail: [email protected] S. H. Tusher e-mail: [email protected] M. Hasan e-mail: [email protected] M. Rokonuzzaman Department of Electrical & Computer Engineering, North South University, Dhaka, Bangladesh e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_22
but comparatively very few are trying to work with the industry to find solutions to problems that have been plaguing both the veterans and novices of the industry alike. For example, there is no proposed or researched general development methodology or guidelines specifically for the video game development, addressing its many unique challenges and pitfalls. In this paper, we propose using some basic marketing concepts starting from the very development of the game or the product, so that in the long-run marketing the game can be more fruitful. In an attempt to do so, we have developed a video game with said concepts in mind and observe the final results and respective effects. We envision that this paper will both be used to compare and inspire similar research efforts in the future, leading to better practices that will eventually get adopted by the industry itself so that video games can be made in a more marketable manner with respect to traditional marketing concepts.
2 Literature Review Video Game Research Library: Currently, the library for video game research consists of mostly on video game violence [4, 5], video game addiction [6, 7], video game industry [8, 9], video game effects [4, 10], video game cultures [11, 12], video game benefits [13], etc. Marketing and Software Engineering: Although this paper is the first of its kind as per our knowledge, in recent times, a lot of research is being published with the goal of integrating marketing concepts with concepts from the Software Engineering Domain in an effort to enhance the latter domain [14, 15]. Aforementioned papers approach the research from mostly a theoretical point of view, such as the possibility of incorporating market research as a component of Software Project Management [16].
3 Problem Statement Ineffective Communication: Communication between the developers of any product and its end users is very important regardless of the medium or type of product [17]. The same concepts of communication can be applied to the game industry concerning the video game developers and video game players, respectively. Misunderstood Role of Marketing: Proper marketing begins even before the product has been developed in contrast to simply promoting a product postdevelopment. Developers can get better results by being more mindful of the marketing concepts through pre-production, production and post-production.
Unaligned Strategies: Marketing and development are often treated as separate entities in the game development. As a result, often the game is marketed toward audiences for whom the game was not meant for, diminishing returns. Undefined Game Portfolio: Game or product portfolio is the mission and vision statement that defines how the game developers or sellers of the product want the product to be ultimately envisioned by the customers. Vague or nonexistent portfolios give rise to contradictory marketing efforts that reduce the effectiveness of marketing and fail to create a focused image of the product in the consumer’s mind. Ambiguous Market Plan: An ambiguous or nonexistent marketing plan will have reduced success rate or reduced effectiveness. Proper marketing plans are mapped out even before the product has been announced to the consumers according to the game or product portfolio.
4 Solution
4.1 Effective Marketing Communication
Definition: Marketing communication is effective when proper feedback from the consumers is taken into account throughout development and even after launch, so that the product can be further developed or marketed accordingly. Effective marketing communication can be modeled with these ideas in mind:
• Understanding the target audience.
• Uncovering your unique selling proposition.
• Sharpening or associating your brand look.
• Consistency in messaging.
• Choosing a perfect media mix.
Demonstration: According to basic marketing concepts, there are a few tried-and-true sequences of processes for effective marketing communication.
Pre-development—market research:
• Market segmentation
• Target market
• Target market analysis
• SWOT analysis
During development—brand development:
• Brand positioning
• Brand identity
4.2 Market Research
Market analysis is a thorough analysis of the current and predicted marketplace, carried out so as to make better decisions regarding the development of new products and the improvement of old ones. Basic market research consists of market segmentation, target market selection and SWOT analysis.
Market Segmentation
Definition: Consumers are grouped on the basis of certain traits such as age, income, sex and usage. Consumers who belong to a certain group will respond similarly to marketing strategies.
Demonstration: We segmented our market according to the medium or platform the games are developed for. Our segments are smartphone, tablet, handheld, TV/console, browser-based and personal computer games. According to the secondary data analysis (Fig. 1), the market share of smartphone games is steadily increasing, while the market share of the other platforms is either staying consistent or steadily decreasing.
Target Market
Definition: A certain segment, at whom the product and marketing strategies are aimed, is chosen from the segments found in the market segmentation phase.
Demonstration: Further secondary data analysis of the market share of different operating systems (Fig. 2) shows that Android has a significant market share compared to its competitors. Android and iOS are used primarily by Android smartphones and Apple smartphones, respectively. Accordingly, we can conclude
Fig. 1 Global games market [1]: market share by platform (smartphone, tablet, handheld, TV/console, casual web games, PC), 2016–2019
Fig. 2 Game's market share based on OS platforms [18]: Android 81.70%, iOS 17.90%, with Windows, BlackBerry and other OSs each well below 1%
that Android smartphones have a much larger market share within the smartphone industry. Picking the right game genre is another challenging task. After analyzing secondary data (Fig. 3) and some games [6] in the market, we found that our idea, scope and market research drive us toward a single-device multiplayer game.
Target Market Analysis
Definition: Once the target market has been selected, it has to be further defined to guide both the development and the marketing efforts to cater effectively to this particular segment.
Fig. 3 Game's market share based on game genre [19]: monthly downloads, monthly active users, time spent per month and monthly revenue across genres (arcade, puzzle, simulation, casual, strategy, role playing, racing, dice)
Table 1 SWOT analysis
Strengths: New, innovative and unique idea. Multi-disciplinary team of programmers, game directors, sound designers and graphic artists who have years of experience in their own dedicated domains. The game assets can be reused to develop similar games for non-"touch" devices.
Weaknesses: Lack of experience developing software or games for profit. Lack of marketing experience. Specifically designed for "touch" devices, and hence the fundamentals of the game play will need to be redesigned if we wish to port to other types of devices.
Opportunities: Recent government initiatives to support and nurture local game development with training and/or funding. Applicable methods to create "buzz", such as bonuses for sharing the game with friends over social media.
Threats: Quick cloning of the game after release. Overshadowing of the game by other games with stronger marketing and/or appeal released in parallel.
Demonstration: Casual games are defined as games for casual gamers. From the consumer's perspective, these games usually have simple rules, demand less time or attention and require fewer learned skills. From the developer's perspective, these games require lower production and distribution costs.
SWOT Analysis
Definition: SWOT analysis stands for the Strengths, Weaknesses, Opportunities and Threats of a particular product.
Demonstration: Table 1 demonstrates the SWOT analysis.
4.3 Brand Development
Brand development is the act of creating a distinctive image of a particular product or brand in the eyes of the consumers, in an effort to stand out from its competitors. Basic branding consists of brand identity and brand positioning.
Brand Positioning
Definition: Brand positioning using points of parity (POP) and points of difference (POD) is a tool for brand development (Table 2).
Brand Identity
Definition: Brand identities are disparate elements of the brand, purposefully planned and developed to give a complete and focused image or identity for the entire brand (Table 3).
Table 2 POP and POD analysis
POP: Can be played on a single device such as a tablet or smartphone. Two players can share the same screen between them.
POD: Brand association with the two distinctive characters "Bloop" and "Ploop". Multiple game levels in a single-device multiplayer game.
Table 3 Brand identities
Logo: Different colors to associate with each game character and player; memorable; meaningful enough to generate word of mouth; distinctive.
Characters: Two distinctive colors represent the two different player characters; the two distinctive colors of these game characters are easily associated with the brand name "Bloop Ploop".
Image
5 Conclusion
The project has not been market tested, and we cannot infer by any means that by using these concepts to build a new game the developers are guaranteed to see significant returns on investment. Regardless, the research puts forth a concept of harmonizing traditional marketing concepts with a new-age medium, namely video games. Along with the concept, we have also demonstrated how to use these marketing concepts in the actual game development process by developing a small-scale video game ourselves, named "Bloop Ploop." More research is needed to refine the methods by which traditional marketing concepts can be used in designing a video game. In this way, we can be mindful of how and to whom we are going to market the product. The degree of effectiveness of this concept can only be evaluated after collecting the related real-world data post-launch. Hence, even more research is needed on the effect of these concepts post-launch as opposed to only during development.
Bibliography 1. The Global Games Market Reaches $99.6 Billion in 2016, Mobile Generating 37%, [Online]. Available: https://newzoo.com/insights/articles/global-games-market-reaches-99-6billion-2016-mobile-generating-37/
2. Clash of Clans, [Online]. Available: https://www.similarweb.com/app/google-play/com.superc ell.clashofclans/statistics 3. A. Dogtiev, in Flappy bird revenue—How much did Flappy Bird make? [Online]. Available: http://www.businessofapps.com/data/flappy-bird-revenue/ 4. N.L. Carnagey, The effect of video game violence on physiological desensitization to real-life violence. J. Exp. Soc. Psychol. 43(3), 489–496 (2007) 5. K.E. Dill, D.C. Jody, Video game violence: a review of the empirical literature. Aggress. Violent. Beh. 3(4), 407–428 (1998) 6. D.L.K. King, D.H. Paul, F.D. Mark, Video Game Addiction (2013), pp. 819–825 7. A.J.V. Rooij, T.M. Schoenmaker, A.A. Vermulst, R.J.V.D. Eijinden, D.V.D. Mheen, Online video game addiction: identification of addicted adolescent gamers. Addiction 106(1), 205–212 (2010) 8. B.L. Bayus, V. Shankar, Network effects and competition: an empirical analysis of the home video game industry. Strateg. Manag. J. 24(4), 375–384 (2002) 9. Y. Aoyama, H. Izushi, Hardware gimmick or cultural innovation? Technological, cultural, and social foundations of the Japanese video game industry. Res. Policy 32(3), 423–444 (2003) 10. C.A. Anderson, A. Shibuya, N. Ihori, E.L. Swing, B.J. Bushman, A. Sakamoto, H.R. Rothstein, M. Saleem, Violent video game effects on aggression, empathy, and prosocial behavior in Eastern and Western countries: a meta-analytic review. Psychol. Bull. 136(2), 151 (2010) 11. A. Shaw, What is video game culture? Cultural studies and game studies. Games Culture 5(4), 403–424 (2010) 12. K. Squire, Video Games and Learning: Teaching and Participatory Culture in the Digital Age. Technology, Education–Connections (the TEC Series) (Teachers College Press, 2011), p. 272 13. D.E. Warburton, The health benefits of interactive video game exercise. Appl. Physiol. Nutr. Metab. 32(4), 655–663 (2007) 14. L.N. Raha, A.W. Hossain, T. Faiyaz, M. Hasan, N. Nahar, M. Rokonuzzaman, a guide for building the knowledgebase for software entrepreneurs, firms, and professional students, in IEEE 16th International Conference on Software Engineering Research, Management and Applications (SERA), Kunming (2018) 15. M.M. Morshed, M. Hasan, M. Rokonuzzaman, Software architecture decision-making practices and recommendations, in Advances in Computer Communication and Computational Sciences. Advances in Intelligent Systems and Computing, Singapore (2019) 16. S. Tahsin, A. Munim, M. Hasan, N. Nahar, M. Rokonuzzaman, Market analysis as a possible activity of software project management, in 2018 IEEE 16th International Conference on Software Engineering Research, Management and Applications (SERA), Kunming (2018) 17. K.L. Keller, Strategic Brand Management: Building, Measuring and Managing Brand Equity (Pearson Esucation Limited, 1997) 18. J. Vincent, in 99.6 percent of new smartphones run Android or iOS While BlackBerry’s market share is a rounding error, [Online]. Available: https://www.theverge.com/2017/2/16/ 14634656/android-ios-market-share-blackberry-2016 19. M. Sonders, in New mobile game statistics every game publisher should know in 2016, SurveyMonkey Intelligence, [Online]. Available: https://medium.com/@sm_app_intel/new-mobilegame-statistics-every-game-publisher-should-know-in-2016-f1f8eef64f66
Some Variants of Cellular Automata Ray-Ming Chen
Abstract Cellular Automata (CA) shed some light on the mechanisms of evolution. Typical cellular automata have many assumptions, simplifications and limitations. Most of them have been extended, generalized or even lifted. In this chapter, I introduce further cellular automata from different perspectives with the aim of enriching the research scope and applications of CA. Keywords Cellular automata · Transition · Interactive CA
1 Introduction
The concept of CA [1] provides an approach to delving into how the physical and social worlds work and evolve. Though the settings and concepts [2, 3] of CA are easy to comprehend, they have a far-reaching power to simulate real or imaginary objects and their lives. Since CA were introduced, they have snowballed into a new subject of research. Many different generalizations of CA have emerged and developed. In this chapter, I look into CA—in particular, their transition rules—from other perspectives. Later on, I will design some CA to capture these perspectives. First of all, in Sect. 2, I look into the state-dependent transition rule. Unlike a universal transition rule for all the states, the transition rules will depend on the states of the cells. In Sect. 3, I study transition rules based on cellular automata with cells whose states are themselves rules. There is already some research on transition rules that take the environment into consideration. I will also focus on the mutation of states and the behaviour of camouflage induced by the environment. With or without explicit transition rules, the evolution of states consumes resources. Hence, resources per se become implicit transition rules. This mechanism will also be explored. In a dynamical system, the behaviour of one set of cellular automata might depend on other cellular automata, or they might be mutually dependent. I will study such mechanisms. Lastly, I will investigate some combinations of the above mechanisms.
R.-M. Chen (B)
School of Mathematics and Statistics, Baise University, 21, Zhongshan No. 2 Road, Baise, Guangxi Province, China
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_23
2 State-Dependent Transition
In this paper, we use CA to denote either a one-dimensional cellular automaton or cellular automata, depending on the context. A state might play an active role in forming or choosing its neighbourhood or rules. The neighbourhoods and transition rules are not predetermined; they depend on the values of the states and evolve accordingly.
2.1 State-Dependent Neighbourhoods
The concept of the neighbourhood of a state encapsulates the degree to which a state is influenced by other related states. A fixed radius assumes that the range of influence depends only on a fixed number of positions and makes no distinction in dependency between different states. This simplification might overlook the dependency of the cells. To give different weight to each cell, a mechanism is introduced: each array of cells at time tn decides the neighbourhoods used for the transition, which activate the related rule to decide the next array of cells at time tn+1.
Example 1 Let k be the length of a lattice of a one-dimensional CA. Let the lattice of cells, or an array of cells, be C = (C1, C2, . . . , Ck). Let v tn (Ci) denote the value (or state) in the cell Ci at time tn and let the array s(tn) = (v tn (C1), v tn (C2), . . . , v tn (Ck)). Define Right s(tn) (Ci) := h, the position of the first cell of C to the right of Ci such that v tn (Ch) = v tn (Ci), if such an h exists, and Right s(tn) (Ci) := i otherwise. Define Left s(tn) (Ci) := m, the position of the first cell of C to the left of Ci such that v tn (Cm) = v tn (Ci), if such an m exists, and Left s(tn) (Ci) := i otherwise. Define the neighbourhood of cell Ci as N bs(tn) (Ci) := {Cn : Lefts(tn) (Ci) ≤ n ≤ Rights(tn) (Ci)}. For example, assume that s(tn) = (0, 1, 1, 0, 1, 0, 1, 0). Based on the above definitions, one computes the neighbourhood of each cell as follows: N bs(tn) (C1) = {C1, C2, C3, C4}, N bs(tn) (C2) = {C2, C3}, N bs(tn) (C3) = {C2, C3, C4, C5}, N bs(tn) (C4) = {C1, C2, C3, C4, C5, C6}, N bs(tn) (C5) = {C3, C4, C5, C6, C7}, N bs(tn) (C6) = {C4, C5, C6, C7, C8}, N bs(tn) (C7) = {C5, C6, C7}, and N bs(tn) (C8) = {C6, C7, C8}. Now suppose the transition rule is: v tn+1 (Ci) = min{v tn (D) : D ∈ N bs(tn) (Ci)}. Then one has the transited state s(tn+1) = (0, 1, 0, 0, 0, 0, 0, 0).
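As an illustration (not part of the original paper), the following minimal Python sketch computes the state-dependent neighbourhoods of Example 1 and applies the min-transition; the function names are illustrative only.

def state_dependent_neighbourhood(s, i):
    # Cells between the nearest left and right cells that hold the same value as cell i.
    left, right = i, i
    for m in range(i - 1, -1, -1):        # first matching value to the left
        if s[m] == s[i]:
            left = m
            break
    for h in range(i + 1, len(s)):        # first matching value to the right
        if s[h] == s[i]:
            right = h
            break
    return range(left, right + 1)

def step_min(s):
    # Transition rule of Example 1: each cell takes the minimum over its neighbourhood.
    return [min(s[j] for j in state_dependent_neighbourhood(s, i)) for i in range(len(s))]

s = [0, 1, 1, 0, 1, 0, 1, 0]
print(step_min(s))   # [0, 1, 0, 0, 0, 0, 0, 0], matching Example 1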
2.2 State-Dependent Rules
The transition rule of a state is not predetermined; it depends on the array of the states. This indicates that the transition routes rely highly on the values of the states per se.
Example 2 Assume C = (C0, C1, C2, . . . , C7) is a lattice of cells of a one-dimensional CA. Each cell has only two states, 0 and 1, and the radius of its neighbourhood is 1. Let the value of a cell C at time tn be denoted by v tn (C). Assume the chosen transition rule from an array s(tn) to an array s(tn+1) is the rule whose number is v tn (C0) × 2^0 + v tn (C1) × 2^1 + v tn (C2) × 2^2 + · · · + v tn (C7) × 2^7. Now suppose that s(tn) = (1, 1, 0, 1, 0, 1, 0, 1) and N bs(tn) (C0) = (1, 1, 0), N bs(tn) (C1) = (1, 1, 0), N bs(tn) (C2) = (1, 0, 1), N bs(tn) (C3) = (0, 1, 0), N bs(tn) (C4) = (1, 0, 1), N bs(tn) (C5) = (0, 1, 0), N bs(tn) (C6) = (1, 0, 1), N bs(tn) (C7) = (1, 0, 1). Then s(tn+1) = R171(s(tn)) = (0, 0, 1, 0, 1, 0, 1, 1), where R171(s(tn)) means that Rule 171 is applied to the array s(tn).
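A short Python sketch (not from the paper) of this mechanism: the rule number is read off the state array itself and then applied as an elementary CA rule. The boundary handling (clamping the 3-cell window at the edges) and the neighbourhood coding (leftmost cell as the most significant bit) are assumptions chosen so that the numbers of Example 2 are reproduced.

def select_rule_number(s):
    # The array itself encodes the rule number: v(C0)*2^0 + v(C1)*2^1 + ... (Example 2).
    return sum(v * 2 ** i for i, v in enumerate(s))

def apply_elementary_rule(rule, s):
    # Apply the elementary rule; the 3-cell window is clamped at the borders and
    # coded with the leftmost cell as the most significant bit (an assumption).
    k, out = len(s), []
    for i in range(k):
        start = min(max(i - 1, 0), k - 3)
        a, b, c = s[start], s[start + 1], s[start + 2]
        out.append((rule >> (4 * a + 2 * b + c)) & 1)
    return out

s = [1, 1, 0, 1, 0, 1, 0, 1]
rule = select_rule_number(s)                 # 171
print(rule, apply_elementary_rule(rule, s))  # 171 [0, 0, 1, 0, 1, 0, 1, 1]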
3 Automated-Rule-Dependent Transition
A fixed, predetermined transition rule is not always the best strategy for an evolution. Transition rules might evolve as well. Here we devise a mechanism to automate such evolution. Let L be a lattice of cells with length k + 1 of a CA. Let T be the time scale of the evolution. Let h be the size of the neighbourhood of each cell. Let S be a set of states and s(tn) be the array of states at time tn. Let si,j denote the value in cell j at time ti. Let R(S, h) be the set of potential transition rules; for example, the one-dimensional CA with neighbourhood size h = 3 (i.e. radius 1) and S = {0, 1} has R({0, 1}, h = 3) = {Rule 0, Rule 1, . . . , Rule 255}, i.e. 2^(2^3) = 256 rules. Now we construct another CA, say CA′, to automate the potential transition rules. Let L′ be another lattice of cells with length k + 1. Let T′ be the same time scale as T. Let h′ be the size of each neighbourhood of a cell. Let S′ = R(S, h) be its set of states and r(tn) be the array of states at time tn. Let ri,j denote the value (a rule) in cell j at time ti, and let CA′ have its own transition rule. Now the activated transition rule for each state si,j is ri,j. This mechanism is shown in Fig. 1.
Example 3 Let CA be a cellular automaton with only two states {0, 1}. Let the size of the neighbourhood of each cell be h = 3 (i.e. radius 1). Let N btn (C) denote the neighbourhood of the cell C at time tn. Suppose the length of the lattice is 6. Suppose the array of states of CA at time tn is s(tn) = (1, 0, 0, 0, 1, 1). Suppose that the ordered set of all the neighbourhoods at time tn is N (s(tn)) ≡ (N bs(tn) (C0), N bs(tn) (C1), N bs(tn) (C2), N bs(tn) (C3), N bs(tn) (C4), N bs(tn) (C5)) = ((1, 0, 0), (1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 1)). Let a • b be the inner product of vectors a and b. Similarly, let CA′ have the same length of lattice of cells as CA and
Fig. 1 Transited states
S′ = R({0, 1}, h = 3) = {Rule 0, Rule 1, Rule 2, . . . , Rule 255}. Let the size of the neighbourhood of each cell be h′ = 3 (i.e. radius 1). Suppose the transition rule of CA′ is Rule 189889 and the array of rules (or states) at time tn is r(tn) = (9, 231, 20, 0, 20, 255). Let the ordered set of all the neighbourhoods be N (r(tn)) ≡ (N br(tn) (C0), N br(tn) (C1), N br(tn) (C2), N br(tn) (C3), N br(tn) (C4), N br(tn) (C5)) = ((9, 231, 20), (9, 231, 20), (231, 20, 0), (20, 0, 20), (0, 20, 255), (0, 20, 255)). Assume e = (255^0, 255^1, 255^2). Define N∗(r(tn)) := (b0 ≡ N br(tn) (C0) • e, b1 ≡ N br(tn) (C1) • e, b2 ≡ N br(tn) (C2) • e, b3 ≡ N br(tn) (C3) • e, b4 ≡ N br(tn) (C4) • e, b5 ≡ N br(tn) (C5) • e). Suppose k is a fixed positive integer and Y is an arbitrary set. For any non-negative integer n = c0 × |Y|^0 + c1 × |Y|^1 + c2 × |Y|^2 + · · · + ck−1 × |Y|^(k−1), define Rule_n^Y : {0, 1, 2, . . . , k − 1} → {c0, c1, c2, . . . , ck−1} by Rule_n^Y(m) := cm, where all the coefficients are non-negative integers lying between 0 and |Y| − 1 and where 0 ≤ m ≤ k − 1. Then s(tn+1) = (R9(1 × 2^0 + 0 × 2^1 + 0 × 2^2), R231(1 × 2^0 + 0 × 2^1 + 0 × 2^2), R20(0 × 2^0 + 0 × 2^1 + 0 × 2^2), R0(0 × 2^0 + 0 × 2^1 + 1 × 2^2), R20(0 × 2^0 + 1 × 2^1 + 1 × 2^2), R255(0 × 2^0 + 1 × 2^1 + 1 × 2^2)) = (R9(1), R231(1), R20(0), R0(4), R20(6), R255(6)) = (0, 1, 0, 0, 0, 1), where Rn(m) denotes the state after applying Rule n to the coded neighbourhood number m. Furthermore, N (s(tn+1)) = ((0, 1, 0), (0, 1, 0), (1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 0, 1)).
Moreover, it follows that r(tn+1) = (u0 ≡ Rule_189889^S′(b0), u1 ≡ Rule_189889^S′(b1), u2 ≡ Rule_189889^S′(b2), u3 ≡ Rule_189889^S′(b3), u4 ≡ Rule_189889^S′(b4), u5 ≡ Rule_189889^S′(b5)). Hence, s(tn+2) = (Ru0(0 × 2^0 + 1 × 2^1 + 0 × 2^2), Ru1(0 × 2^0 + 1 × 2^1 + 0 × 2^2), Ru2(1 × 2^0 + 0 × 2^1 + 0 × 2^2), Ru3(0 × 2^0 + 0 × 2^1 + 0 × 2^2), Ru4(0 × 2^0 + 0 × 2^1 + 1 × 2^2), Ru5(0 × 2^0 + 0 × 2^1 + 1 × 2^2)) = (Ru0(2), Ru1(2), Ru2(1), Ru3(0), Ru4(4), Ru5(4)).
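A simplified Python sketch of this two-level construction (not the paper's exact scheme): one array holds the cell states, a second array holds one elementary rule number per cell, and each cell updates with its own rule. The meta-rule that evolves the rule array is a placeholder (XOR of the rule neighbourhood) rather than the base-256 Rule 189889 of Example 3.

def window(arr, i):
    # Clamped 3-cell window around position i (boundary convention of Example 3).
    start = min(max(i - 1, 0), len(arr) - 3)
    return arr[start:start + 3]

def step_states(s, rules):
    # Each cell i applies its own elementary rule rules[i]; the neighbourhood is
    # coded with the leftmost cell as the least significant bit, as in Example 3.
    out = []
    for i in range(len(s)):
        a, b, c = window(s, i)
        out.append((rules[i] >> (a + 2 * b + 4 * c)) & 1)
    return out

def step_rules(rules):
    # Placeholder meta-rule (an assumption, not the paper's Rule 189889):
    # each rule cell becomes the XOR of the rules in its clamped window.
    return [(a ^ b ^ c) % 256 for a, b, c in (window(rules, i) for i in range(len(rules)))]

s = [1, 0, 0, 0, 1, 1]
rules = [9, 231, 20, 0, 20, 255]
print(step_states(s, rules))   # [0, 1, 0, 0, 0, 1], matching s(tn+1) in Example 3
print(step_rules(rules))       # next array of rules under the placeholder meta-rule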
4 Environment-Dependent Transition
In the strict sense, there exists no closed system; everything is somehow connected. If the effect from the environment is very small, then one could treat the whole system as an isolated system. Nonetheless, if the environment interacts closely with the system and has a high impact on it, then one needs to design some rules to capture their interactions. Here I only describe mechanisms for mutation and camouflage, since there is already some research on CA with environmental factors. Mutation is common and occurs very often—for example, nuclear radiation creates mutations in living creatures, or a social structure swerves unpredictably from tradition. Camouflage is also prevalent among natural or social mechanisms.
4.1 Mutation Under normal conditions or a closed system, a set of states is expected to remain the same under the whole course of evolution. However, when the environment has a high impact on the system, then the course of the evolution would change accordingly. For example, under harsh environment, the offspring of one species would change their genes or behaviours to adapt to the environment. Here I consider the mutation of the set of states. Let SE be a set of states under the environment E. Now when the environment changes to E , the set of states would change to SE . Such changes could be specified by some threshold rules. These rules might be derived from some other mechanisms too. Example 4 Let E(tn ) be the degree of temperature in an environment at time tn . Suppose s(tn ) =(yellow, blue, white) and E(tn ) = 40 ◦ F. The transition rule is: if the temperature shifts from 30–50 ◦ F range to 390–600 ◦ F range, then yellow will turn black, blue will remain blue and white will turn transparent (a mutated state). Now suppose the temperature at time tn+1 is E(tn+1 ) = 508 ◦ F. Then one has s(tn+1 ) =(black, blue, transparent). Such mutation is still under the realm of predictability, since it could still be specified by the threshold rules. For a random mutation, one could add more threshold rules for a single situation or introduce probability into the transition rule.
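A minimal Python sketch of the threshold rule of Example 4 (illustrative only; the mapping table simply restates the example):

# When the environment jumps from the 30-50 F band to the 390-600 F band,
# the state set itself mutates (Example 4).
MUTATION = {"yellow": "black", "blue": "blue", "white": "transparent"}

def mutate(states, temp_old, temp_new):
    if 30 <= temp_old <= 50 and 390 <= temp_new <= 600:
        return [MUTATION.get(x, x) for x in states]
    return list(states)

print(mutate(["yellow", "blue", "white"], 40, 508))   # ['black', 'blue', 'transparent']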
4.2 Camouflage In the case of camouflage, when the environment changes, CA would evolve as close to the environment as possible. This enables them to assimilate into the whole system and save themselves from being exposed to the enemies. Example 5 Suppose a CA has only two states: black and white and its environment has three states: brown, red and blue. Assume its array of states at tn is s(tn ) = (white, black) and assume the state of its environment at tn is brown. When CA observes the difference between itself and the environment, it transits from the array s(tn ) to s(tn+1 ) = (black, black) since black is a state closer to brown. If one considers a simultaneous transition, i.e. no time lag between the environment and CA, then CA needs some estimation or prediction of the outcome of the environment and acts accordingly.
5 Resource-Dependent Transition
The evolution of CA does not come for free; it consumes resources. The resources could be physical (electricity, gas, money, etc.) or non-physical (knowledge, information, love, etc.). Whether or not there exists an explicit transition rule, the transition is also implicitly constrained by the resources provided. In the case with no explicit rules, the resource constraint acts as an implicit rule for the transition. Many aspects involve the consumption of resources.
5.1 Length of Lattice
To create each cell to bear states, one needs space or capacity, for example, the memory of a computer or the strength of a power supply. An infinite lattice is too idealistic for real applications; it is conceptual. To implement it, one needs some capacity or resources. Suppose one lattice is constructed based on a power supply. The more power you provide, the longer the lattice you have. Then the amount of power at time t decides the length of the lattice. One needs to specify the transition rules for the states from one length of lattice to another. In the following, I give a very simple example.
Example 6 Let PWtn denote the amount of power available at time tn. Let Lentn(PWtn) denote the length of the lattice supplied by the power at time tn. Suppose each megawatt of electrical output creates exactly one cell. Assume PWtn = 8 megawatts (MW). Then one has Lentn(PWtn) = 8. Suppose that s(tn) = (1, 1, 0, 1, 0, 1, 1, 0) and the transition rule is to copy the previous state from left to right repeatedly. Suppose the power supplied at time tn+1 is 14 MW; then one has s(tn+1) = (1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1). Suppose at time tn+2 there is a power shortage and PWtn+2 = 5 MW; then one has s(tn+2) = (1, 1, 0, 1, 0).
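This copy-and-truncate behaviour can be sketched in a few lines of Python (illustrative only):

def resize_by_power(s, megawatts):
    # Example 6: one megawatt creates exactly one cell; the previous state is
    # copied from left to right, repeating cyclically, into the new lattice.
    return [s[i % len(s)] for i in range(megawatts)]

s = [1, 1, 0, 1, 0, 1, 1, 0]
s = resize_by_power(s, 14)   # [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
s = resize_by_power(s, 5)    # [1, 1, 0, 1, 0]
print(s)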
5.2 Cell-Powered Transition Suppose each cell is assigned or distributed with a specific amount of resources. The transition rule for each cell will be guided by the amount of the resources assigned or distributed. Example 7 Let s(tn ) = (1, 0, 0, 0, 1, 1). Suppose each cell is equipped with a circuit with functionality (or states): on and off. If it is on, then the resource is provided and off, otherwise. Suppose the transition rule is: when the resource at time tm is supplied, the state from time tm changes to its opposite one at time tm+1 and stays the same, otherwise. Let Cirt (C) be the state of the Circuit of the cell C at time t. Now assume Cirtn (C1 ) =on, Cirtn (C2 ) =off, Cirtn (C3 ) =on, Cirtn (C4 ) =on, Cirtn (C5 ) =off, Cirtn (C6 ) =off. Then one has s(tn+1 ) = (0, 0, 1, 1, 1, 1). This mechanism also entitles each cell to have its own set of states based on the resources it obtains. Example 8 Let L = (C1 , C2 , C3 ) be a lattice of cells. Let the value of a cell C at time tn be denoted by v tn (C). Let the resource-dependent set of states go as follows: SR1 = {1, 8, 18, 29}, the set of states given the resource level R1 . Similarly, SR2 = {2, 11, 12, 219}, SR3 = {12, 22, 25, 69}, SR4 = {41, 44, 65, 69}, SR5 = {0, 4, 51, 91}. Assume the resources assigned to each cell at time tn and tn+1 are: Restn (C1 ) = R2 , Restn (C2 ) = R1 , Restn (C3 ) = R4 ; Restn+1 (C1 ) = R5 , Restn+1 (C2 ) = R4 , and Restn+1 (C3 ) = R3 . Assume s(tn ) = (5, 19, 55). Suppose the transition rule is to choose the maximal state among all the states that maximize the distance between the to-be-transited state and potential transited state, i.e. v tn+1 (C) = max{argmax{|v tn (C) − d | : d ∈ SRestn (C) }}. Then s(tn+1 ) = (219, 1, 69) and s(tn+2 ) = (0, 69, 12).
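The cell-powered rule of Example 7 is easy to sketch in Python (illustrative only):

def cell_powered_step(s, circuits):
    # Example 7: a cell flips its state only when its own circuit supplies power.
    return [1 - v if on else v for v, on in zip(s, circuits)]

s = [1, 0, 0, 0, 1, 1]
circuits = [True, False, True, True, False, False]   # on/off per cell
print(cell_powered_step(s, circuits))                # [0, 0, 1, 1, 1, 1]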
5.3 Array-Powered Transition
Unlike the cell-powered transition, there is no resource designated for each cell. Instead, the resource is provided to the lattice as a whole. Then there are many situations to be considered: are the cells themselves competing for resources, or are they acting as a whole? Here I only consider the latter situation, i.e. the transition rule is not based on each individual cell but on the whole lattice. These act more like team-work cellular automata.
Example 9 Let the time scale be T = {t0, t1, t2, t3, t4}. Let L = (C0, C1, . . . , C6) be a lattice of cells. Given a state of resource, suppose the transition rule is to maximize the
distance between the states s(tn) and s(tn+1). Let RS = {RS(t) : t ∈ T} be a set of resource states, each measured in terms of an amount of information. Suppose at time tn the state is s(tn). Let us define the distance between two states in terms of the amount of information: for any α, β ∈ D ≡ {0, 1}^7, define dis(α, β) := Σ_{j=0}^{6} |α(j) − β(j)| × 2^j, where α(j) and β(j) are the j'th elements of α and β, respectively. The transition rule is s(tn+1) = argmax_{α ∈ D} {dis(s(tn), α) : dis(s(tn), α) ≤ RS(tn)}. Let RS = {RS(t0) = 100, RS(t1) = 27, RS(t2) = 0, RS(t3) = 31} and let the initial state be s(t0) = (1, 1, 0, 0, 1, 0, 1). Thus s(t1) = (1, 1, 1, 0, 1, 1, 0), s(t2) = (0, 0, 1, 1, 0, 1, 0), s(t3) = (0, 0, 1, 1, 0, 1, 0) and s(t4) = (1, 1, 0, 0, 1, 1, 0).
Resource constraints also provide a natural way to design a beginning or an ending for cellular automata. If the resources are present, the CA starts to evolve according to its transition rules; if not, it decays, reverses or ceases to proceed. Resource constraints could also serve as a way to specify the boundary of a CA and its transition rule. In addition, they could also serve as a time scale.
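Assuming the 2^j-weighted distance and the per-step budget RS(tn) reconstructed above, Example 9 can be reproduced with a small greedy Python sketch (illustrative, not from the paper); flipping the highest-weight cells first is optimal here because the weights are powers of two.

def dis(a, b):
    # Distance of Example 9: sum of |a_j - b_j| * 2^j over the seven cells.
    return sum(abs(x - y) * 2 ** j for j, (x, y) in enumerate(zip(a, b)))

def step(s, budget):
    # Flip the highest-weight cells first without exceeding the information budget.
    out = list(s)
    for j in reversed(range(len(s))):
        if 2 ** j <= budget:
            out[j] = 1 - out[j]
            budget -= 2 ** j
    return out

RS = [100, 27, 0, 31]
s = [1, 1, 0, 0, 1, 0, 1]
for budget in RS:
    nxt = step(s, budget)
    print(nxt, dis(s, nxt))   # reproduces s(t1)..s(t4); the distances are 100, 27, 0, 31
    s = nxt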
6 Cellular-Automata-Dependent Transition
The transition rules of one CA may depend on the transition rules of another CA. This is also a dynamical-transition-rule setting, as there are no predetermined and fixed rules. In a simple situation, there is a leading cellular automaton, say CAL, whose actions contribute to the transition rules of its following cellular automaton, say CAF. Suppose the transition rules of CAL are observable to both CAL and CAF. In a more complicated situation, the transition rules of CAL are known only to the leading CA and unknown to the following CA. Then the following CA must learn the leading CA's transition rules by observing the transitions of its states. This mechanism is described in Sect. 6.1. If there is no leading or following cellular automaton, but only mutually dependent transition rules (i.e. the transition rules of CA1 depend on those of CA2 and vice versa), then a dynamical process of observation or learning decides their transition rules. This mechanism is described in Sect. 6.2.
6.1 Leading and Following CA
There is a time lag, and the performance of one cellular automaton specifies or decides the rules for another cellular automaton. Such situations occur virtually everywhere, for example, in a food chain. We call them a leading and a following cellular automaton, respectively. There are two aspects to be considered:
1. Transited states: the following CA observes or estimates the states of its leading CA and acts afterwards. 2. Transition rules: the following CA observes or estimates the transition rules of the leading CA and acts afterwards.
6.1.1 Direct Observation: Leading States
The following CA will observe the states of the leading CA at time tn and then transit at tn+1 accordingly. Example 10 (Aspect One: State) Suppose the lengths of lattice of cells of the leading CAL and following CAF are the same. Suppose there are only two states: 0 and 1 for both CA. Assume the transition rule for the following CAF is: it will change the value in a cell C to the opposite value when its state in the cell C is the same as CAL in its cell C. Suppose the arrays of states of CAL and CAF at time tn are sL (tn ) = (1, 0, 0, 1, 0, 0, 1, 1, 0) and sF (tn ) = (0, 0, 1, 0, 0, 0, 1, 0, 1), respectively. Then sF (tn+1 ) = (0, 1, 1, 0, 1, 1, 0, 0, 1). Furthermore, if sL (tn+1 ) = (1, 1, 1, 0, 0, 1, 1, 0, 0), then sF (tn+2 ) = (0, 0, 0, 1, 1, 0, 0, 1, 1).
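A compact Python sketch of this follower rule (illustrative only):

def following_step(s_leader, s_follower):
    # Example 10: the follower flips a cell exactly where its state agrees with
    # the leader's state in the same cell.
    return [1 - f if f == l else f for l, f in zip(s_leader, s_follower)]

sL = [1, 0, 0, 1, 0, 0, 1, 1, 0]
sF = [0, 0, 1, 0, 0, 0, 1, 0, 1]
sF = following_step(sL, sF)
print(sF)                                                 # [0, 1, 1, 0, 1, 1, 0, 0, 1]
print(following_step([1, 1, 1, 0, 0, 1, 1, 0, 0], sF))    # [0, 0, 0, 1, 1, 0, 0, 1, 1]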
6.1.2 Direct Observation: Leading Transition Rules
The following CA will observe the transition rules of the leading CA at time tn and then make its transition at time tn+1 . Example 11 (Aspect Two: Transition Rule) Suppose there are only two states: 0 and 1 for both CA. Let sF (tn+1 ) = (0, 0, 1, 0). Assume there are two options of transition rules for CAL : rule a and rule b. The transition rule for the following CA is: if the leading CA adopt rule a, then CAF will keep its states unchanged and change to the opposite if rule b is adopted. Suppose the following CA observes that rule b is adopted by CAL for the transition from sL (tn ) = (0, 1, 1, 1) to sL (tn+1 ) = (1, 1, 0, 1). Then CAF acts accordingly and has sF (tn+2 ) = (1, 1, 0, 1).
6.1.3 Following CA's Learning
When direct observation of the states or transition rules of the leading CA is not possible, then the following CA must design some learning mechanisms or devise some approaches to approximate or extract the leading CA’s states or transition rules. Here we omit the unknown state part, and only focus on the unknown transition rules, i.e. we assume the states of the leading CA are always observable by the following CA. In addition, if there exist several solutions for such estimation, then the following CA must also devise a mechanism to make an optimal choice.
Example 12 Let us continue Example 11, but which rule among rule a and rule b is adopted by the leading CA is unknown to the following CA. Now suppose the following CA devises a mechanism: if the total number of state 1 in sL (tn+1 ) is greater or equal to the total number of state 1 in sL (tn ), then rule a is adopted by the leading CA and rule b, otherwise. Based on this mechanism, the following CA jumps to the conclusion that rule a is adopted by the leading CA and transits to sF (tn+2 ) = (0, 0, 1, 0), accordingly.
6.2 Interactive CA
Suppose there are two CA, CA1 and CA2, and there is no leading or following cellular automaton. CA2 bases its transition from its array s2(tn) to the array s2(tn+1) on either the state s1(tn) or the transition rule that took CA1's array of states from s1(tn−1) to s1(tn), and vice versa.
6.2.1 Direct Observation: Mutual States
Both CA base their transition rules on the states of the other CA and evolve interactively.
Example 13 Let CA1 and CA2 be two CA whose sets of states are both {0, 1}. Suppose the array of states of CA1 at time t0 is s1(t0) = (1, 0, 1, 0, 1) and that of CA2 is s2(t0) = (1, 1, 1, 0, 0). The transition rule for CA1 is: if the sum of the values of s2(tn) is greater than or equal to 3, then it changes the values in its last two cells to the opposite ones; if not, it changes the values in its first two cells to the opposite ones. The transition rule for CA2 is: if the values in cell 1 and cell 3 of s1(tn) are the same, then it changes the values in its first and second cells to the opposite ones; if not, it changes the values in its second and third cells to the opposite ones. The transition is shown in Table 1. Observe that this transition is stable and periodic with a length of 4.
Table 1 Interactive transition
t0: s1(t0) = (1, 0, 1, 0, 1), s2(t0) = (1, 1, 1, 0, 0)
t1: s1(t1) = (1, 0, 1, 1, 0), s2(t1) = (0, 0, 1, 0, 0)
t2: s1(t2) = (0, 1, 1, 1, 0), s2(t2) = (1, 1, 1, 0, 0)
t3: s1(t3) = (0, 1, 1, 0, 1), s2(t3) = (1, 0, 0, 0, 0)
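The two interacting rules of Example 13 can be simulated with the Python sketch below (illustrative only); running it for four steps brings the pair of arrays back to its initial value, confirming the period of 4.

def step_ca1(s1, s2):
    # CA1's rule: flip the last two cells if sum(s2) >= 3, otherwise flip the first two.
    out = list(s1)
    for i in ((-2, -1) if sum(s2) >= 3 else (0, 1)):
        out[i] = 1 - out[i]
    return out

def step_ca2(s1, s2):
    # CA2's rule: flip cells 1 and 2 if cells 1 and 3 of s1 agree, otherwise flip
    # cells 2 and 3 (positions counted from 1, as in the example).
    out = list(s2)
    for i in ((0, 1) if s1[0] == s1[2] else (1, 2)):
        out[i] = 1 - out[i]
    return out

s1, s2 = [1, 0, 1, 0, 1], [1, 1, 1, 0, 0]
for _ in range(4):
    s1, s2 = step_ca1(s1, s2), step_ca2(s1, s2)   # simultaneous update
    print(s1, s2)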
6.2.2 Direct Observation: Mutual Transition Rules
Both CA base their transition rules on the transition rules of the other CA and evolve interactively. The whole mechanism is similar to Examples 11 and 13.
6.2.3 Mutual Learning
When the states or the transition rules of either CA are unknown to the other one, then both CA must come up with some mechanisms or methods to obtain the transition rules of the other CA. All the mechanisms are similar to Example 12.
7 Mixed Systems
So far, we have seen many different transition rules from various perspectives. Based on these, one can start to design more complicated cellular automata systems which might involve many different sets of transition rules.
Example 14 Suppose the transition rule of a leading CAL is predetermined while the one for its following CAF depends on the resources produced by the transitions of CAL. Then this combination constitutes a composite cellular automata system.
8 Conclusions and Future Work
I have introduced various transition rules for cellular automata from different points of view: state-dependent, environment-dependent, resource-dependent and cellular-automata-dependent. Based on these mechanisms, one could start to put forward theories or implementations of them. In this paper, I give only some guidelines for designing cellular automata that would be much more applicable to real situations. In view of uncertainty, one could also couple probability or entropy with the models or calculations to settle the problem of optimal choices. In addition, one can combine these cellular automata with other generalized or non-conventional cellular automata if the real situations are more complicated.
References 1. S. Wolfram, Cellular Automata and Complexity (Westview Press, 1994) 2. A. Ilachinski, Cellular Automata—A Discrete Universe (World Scientific Publishing Co., 2001) 3. A. Adamatzky, Game of Life Cellular Automata (Springer, London Limited, 2010)
An Exchange Center Based Digital Cash Payment Solution Yong Xu and Jingwen Li
Abstract The paper proposes a digital cash (Dcash) online payment system based on an exchange center that represents the central bank in managing and monitoring the issuance, payment and refund of Dcash. Different kinds of Dcash can be exchanged once their issuers are registered at the exchange center. The paper then introduces the architecture of the system and a series of transaction procedures, and models the payment activity using Colored Petri Nets. Finally, by analyzing the state space report of the model, we verify and check the system and obtain the result that the flow charts of the entities involved are controllable, realizable and feasible. Keywords Digital cash · Online payment · Exchange center · Colored Petri Nets
1 Introduction
Usually the payment tools used in e-commerce are called electronic money (e-money) [1]. There are two kinds of e-money systems: online systems and offline systems [2, 3]. In this paper, we only focus on online payment systems. Online systems are divided into two kinds [3]: (1) traceable systems, such as First Virtual, CyberCash, NetBill, Minipay and so on [4–6]; (2) untraceable systems, such as NetCash [7]. However, there are many problems with these solutions, in particular the two listed below [7–9].
Lack of unified and standard security mechanisms. Most e-money systems can only be used within a specified, narrow scope, such as one game website or one e-commerce website. They are not universal, and they cannot be exchanged: one kind of e-money cannot be used in other systems and vice versa. The lack of a unified e-money system may result in a waste of resources, since many distributed applications will each need to establish their own e-money systems. It is also not conducive to the development of e-money-related technologies and standards.
Lack of supervision. With the development of electronic payment, e-money systems, as one kind of electronic payment system, have been used in a wide range of applications, and their influence is growing. However, the laws and means for supervising e-money are not yet clearly defined.
To overcome these two problems, this paper presents a solution to build a unified online digital cash system. We focus on digital cash's business process, not on digital cash technologies such as security, anonymity and signature algorithms; the system can adopt such mature technologies if needed. The main contributions of this work are as follows: (1) We define a unified digital cash payment model as an online payment system based on an exchange center, together with a series of transaction procedures for the model. (2) We model the solution using CPNs (Colored Petri Nets). By analyzing the state space report of the CPNs model, we show that the payment business process of the system is controllable, realizable and feasible.
The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 proposes the scheme of a unified digital cash payment system. Section 4 analyzes the model verification and checking of the payment CPNs model. Finally, Sect. 5 provides some concluding remarks.
Y. Xu (B) · J. Li
School of Economics and Commerce, South China University of Technology, 510006 Guangzhou, China
e-mail: [email protected]
J. Li
e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_24
2 Related Work
2.1 Research on E-Cash
2.1.1 Research on Online E-cash Systems
D. Chaum presented an online anonymous e-cash system. He was the first researcher to apply a blind signature algorithm to realize e-cash systems. In his solution, banks issue e-cash that is anonymous and record used e-cash to prevent e-cash from being reused [9]. Damgard proposed an e-cash system with provable security using cryptographic protocols and a zero-knowledge proof algorithm [16]. Many Internet payment systems are online systems:
• Credit-card payment: First Virtual, CyberCash. First Virtual is a system for making credit card payments over the Internet without exposing the credit card number to the merchant. It required no special software for a customer to make a purchase [5]. CyberCash is a system that allows customers to pay by credit card without revealing the credit card number to the merchant [4]. The main point of this scheme was to prevent merchant fraud and thus allow customers to do business with more merchants without fear of scams. However, CyberCash was not able to find its market.
• Micropayments: NetBill, Millicent and NetCash. NetBill is a system for micropayments for information goods on the Internet. A NetBill transaction transfers information goods from merchant to customer, debiting the customer's NetBill account and crediting the merchant's account for the value of the goods [6]. Millicent was designed by Digital Equipment's Systems Research Center to provide a mechanism for secure micropayments. The NetCash research prototype is a framework for electronic currency developed at the Information Sciences Institute of the University of Southern California [10].
• Multi-bank e-cash systems based on group signatures. Brands proposed an e-cash system based on many banks, in which each bank issues different e-cash signed with its own public and private keys and cleared by an accounting center [11–13]. Chaum and Camenisch respectively put forward different group signature solutions and algorithms [14, 15].
• Multi-bank e-cash systems based on proxy signatures. Zhou presented a multi-bank e-cash system based on proxy signatures, in which the central bank manages the merchant banks in alliance with a trusted third party using proxy signatures [16].
A number of recent contributions [17, 18] confirm an ongoing interest in the development of new payment systems.
2.2 Model Verification, Checking and Colored Petri Nets
CPNs are widely applied to model, verify and analyze communication protocols and network protocols [19–21]. CPN Tools have been used to model-check the absence of deadlocks and livelocks and the absence of unexpected dead transitions and inconsistent terminal states [22]. Ouyang used CPNs to verify and analyze the Internet Open Trading Protocol [23].
3 Activities of the Model
The scheme is based on the Schnorr proxy signature, improved by adding the identity information of the Digital Cash Exchange Center (DCEC) to the signature. Similar to the central bank's bill clearing house and the bank card switching center, DCEC can easily establish unified digital cash signatures, management and communication methods, as well as clearing and regulatory modes. The model is shown in Fig. 1. There are five kinds of entities in this solution: customer, merchant, bank of customer (BankC), bank of merchant (BankM) and DCEC.
Fig. 1 Architecture of unified Dcash system
3.1 Authority of Agency and Authority of Issue
Based on the authorization of the central bank, the functions of DCEC include: managing Dcash, undertaking e-signatures as the agent of the central bank, clearing for commercial banks and supervising banks, and maintaining the Dcash database. Commercial banks apply to the central bank for the authority of issue and then carry out their functions such as Dcash issue, customer registration, deposit, withdrawal and so on.
Register There are three kinds of registration that must be completed before using Dcash:
1. Issue banks register in DCEC and provide IDBANKi, mwi, the signature algorithms and the structures of Dcash.
2. Merchants register in BANKj and provide their certificates.
3. Customers register in BANKk and provide their certificates.
Issue The issuance of Dcash in our scheme is divided into two stages: (1) banks issue the Dcash; (2) customers deposit. In this section, we introduce how the banks issue the Dcash. In the following section, we introduce how customers obtain Dcash.
As we know, the existing Dcash issued by different organizations has different formats. The different data formats of different Dcashes are made compatible by registering in DCEC. The denominations of unified Dcash are formulated on the same principle as existing paper currency. DCEC and the banks are able to make batch digital signatures of Dcashes, and the Dcash can be produced without the customers' application, so that the requirement on networks is reduced.
Withdraw Withdraw means that customers get Dcash from their registered banks by pre-payment. In this solution, the issue mode of Dcash is completely changed. Each time, the value of issued Dcash is a multiple of some fixed denomination or a combination of fixed denominations, so that Dcash does not have to be produced on the fly according to the value the user requires. This makes withdrawal more convenient for customers: (1) Dcash does not have to be produced temporarily based on the value of the customer's requirements; (2) the process of withdrawal involves only the customers and their registered bank, without real-time interaction with DCEC.
Payment The process of Dcash payment is as follows:
(1) Customer U selects goods on the merchant's site and produces the original order.
(2) The merchant checks the request and generates the final order. It is then sent to U, who confirms the order. After that, U selects Dcash payment among several payment options.
(3) U uses the Dcash client to combine or split Dcash into a series of Dcash C in accordance with the payment amount SUM. Then C is sent to the merchant.
(4) Merchant D receives the Dcash C and makes a digital signature of the information about IDD, the order number, the total amount to pay and so on. Then D sends the information to its registered bank as a Dcash authorization request.
(5) BANKj decrypts the request and gets IDD, C, SINGD(IDD, C) and so on. BANKj verifies the merchant's signature and gets the Dcash according to the combination or split of C. BANKj makes a digital signature of C and IDBANKj. Then BANKj sends the signature to DCEC.
(6) DCEC verifies BANKj's signature. It then verifies its own signature in the original Dcash information. If the verification of the signature fails, the authorization fails. Otherwise, DCEC obtains the signature of the issuing bank, which is BankC, and verifies this signature. If the signature is verified, DCEC checks the Dcash database to confirm whether the corresponding serial numbers of the received Dcash are still in the available Dcash database. If the Dcash is available, the combination or split of the series of Dcash passes the authorization verification. When the issue bank's signature or DCEC's signature cannot be verified, the authorization of the Dcash fails. Only when all the Dcash serial numbers of C are
available and the split portions of Dcash are verified is the whole authorization completed; otherwise the authorization fails. In the failure case, DCEC sends the failure information back to BANKj, BANKj sends it to both D and BankC, and the customer then receives the failure information from BankC. If C is a combination of several Dcashes, DCEC marks these Dcashes "used" to prevent these serial numbers from being reused through retransmission. If there are split Dcashes in C, DCEC marks these serial numbers "split"; furthermore, DCEC marks the split Dcashes that are used to pay as "used". DCEC produces a new time stamp and makes a digital signature of the marked Dcashes C.
(7) BANKj stores both C and the signature which DCEC made over C in the merchant's account ACCOUNTD. Meanwhile, BANKj sends the information about the successful authorization to the merchant, which means the payment has been successful, and the merchant is able to deliver to the customer. After receiving the combination of original Dcashes of C, BANKj makes the corresponding deduction from ACCOUNTU. Meanwhile, BANKk sends the information about the successful authorization to the customer, which means the payment is successful. The customer is then able to check the merchant's delivery information.
Refund Refund means that merchants or customers apply to their registered banks to convert their Dcash into the corresponding value of existing currency. When using Dcash online, both merchants and customers manage their Dcash accounts through their registered banks. Therefore, the process of refund is easier.
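The core of DCEC's authorization check in steps (5)-(6) can be sketched in Python as follows; this is an illustrative simplification in which the signature checks and the names available and used are placeholders, not the authors' implementation.

def authorize(serials, available, used, signature_checks):
    # Sketch of DCEC's checks: every signature must verify and every Dcash serial
    # number in the payment must still be available (not previously spent).
    if not all(check() for check in signature_checks):
        return False                      # any failed signature fails the authorization
    if any(s in used or s not in available for s in serials):
        return False                      # reused or unknown serial number
    for s in serials:                     # mark the paid Dcash as used
        available.discard(s)
        used.add(s)
    return True

available, used = {"D001", "D002", "D003"}, set()
checks = [lambda: True, lambda: True, lambda: True]   # bank, DCEC and issuer signatures
print(authorize({"D001", "D002"}, available, used, checks))   # True
print(authorize({"D001"}, available, used, checks))           # False: D001 already spent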
3.2 Analysis of the Activities
According to the process described above, the activities involved in the whole process of using Dcash are shown in Table 1.
3.3 The CPNs Model of the Payment Activity
Assumptions To simplify the model, we make the following assumptions.
1. Both customer and merchant accept Dcash as a payment tool.
2. All communication channels are reliable, and all five roles are reliable, so we do not need a CA (Certificate Authority) in the whole trade process when the Dcash system is running.
3. We do not consider the particular contents and formats of messages, responses and verification results.
Table 1 Relationship between entities and activities (roles: Customer, Merchant, BankC, BankM, DCEC, Central Bank; activities: authority of agency, authority of issue, register^c, inquiry^d, issue, withdraw, payment, refund, clearing, logistics; the cells mark which roles take part in each activity)
a The symbol √ stands for the entity attending that step. For example, the roles BankC, BankM and DCEC appear in the activity Issue.
b The symbol √ may also indicate that the role or entity may attend the step, depending on the activity. For example, when Inquiry means that the customer inquires about his Dcash account, the roles Customer, BankC and DCEC attend the activity Inquiry.
c As mentioned above, registration is divided into three kinds: (1) customer registration involves the customer and the bank of customer; (2) merchant registration involves the merchant and the bank of merchant; (3) issue banks register in DCEC, which involves DCEC and the issue bank.
d There are two kinds of inquiry: (1) the customer's inquiry involves the customer, BankC and DCEC; (2) the merchant's inquiry involves the merchant, BankM and DCEC.
4. We only model the payment activity, because payment is the most complex of all the activities in the Dcash system. In this activity, there are five entities to be considered: customer, merchant, BankC, BankM and DCEC.
4 State Space Analysis of the CPN Model
This part analyses the state space report of the CPNs model. The state space (SS) report is generated by CPN Tools; part of the report is shown in Table 2. It shows that there are 187 nodes and 668 arcs in the state space, and that the number of nodes and arcs in the SCC graph is the same as in the SS report. This implies that there are no cycles in the state space, and hence no livelocks in the model of the payment activity. The table also shows that there is no dead transition instance, so the model of the payment activity has no dead transitions. In the table we find that there are 3 dead markings; all three are listed in Table 3. They have different characteristics, which we discuss in detail below.
Table 2 Part of the state space report
State space: Nodes 187; Arcs 668; Secs 0; Status Full
SCC graph: Nodes 187; Arcs 668; Secs 0
Liveness properties: Dead markings 3 [33, 103, 187]; Dead transition instances None; Live transition instances None
Table 3 Types of dead markings
Type 1 (dead marking 33): the number of invalid orders reaches the limit (3)
Type 2 (dead marking 103): the payment result is Success
Type 3 (dead marking 187): the number of Failure payment verification results reaches 3
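Although the authors obtain these numbers from CPN Tools, the idea of finding dead markings by exhausting the reachable state space can be illustrated with a generic breadth-first search over a small, hypothetical transition system (a sketch under the assumption of an explicit successor function, not the paper's CPN model):

from collections import deque

def dead_markings(initial, successors):
    # Explore the reachable state space and return the markings with no enabled
    # transition (dead markings), as reported in CPN Tools' state space statistics.
    seen, dead, queue = {initial}, [], deque([initial])
    while queue:
        state = queue.popleft()
        next_states = successors(state)
        if not next_states:
            dead.append(state)
        for n in next_states:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen, dead

# Toy successor function: a payment attempt that either succeeds or, after three
# retries, fails; "success" and "failure" are terminal (dead) markings.
def succ(state):
    kind, retries = state
    if kind != "running":
        return []
    if retries >= 3:
        return [("failure", retries)]
    return [("success", retries), ("running", retries + 1)]

reachable, dead = dead_markings(("running", 0), succ)
print(len(reachable), dead)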
4.1 Type 1: When the Number of Invalid Orders Verified by the Merchant Reaches 3
In this situation, the dead marking is marking 33. From CPN Tools, we get the values of all places. All states of the other places in dead marking 33 are empty or 1`[]. From this, we can conclude that the dead marking is expected.
4.2 Type 2: When the Payment Verification Result Is Success
In this situation, the dead marking is marking 103. The payment is verified as Success. This means that the customer has finished the payment for his order and will get his goods. At the same time, the merchant has received the Dcash and should deliver the goods to the customer. All entities have finished their flow charts and gone to the terminal state. All the counter places return to their initial values or are empty or 1`[]. From this analysis of all states of all places, we can conclude that dead marking 103 is expected.
4.3 Type 3: When the Payment Verification Result Is Failure
In this situation, the dead marking is node 187. The characteristic of this type is that the number of Failure payment verification results reaches 3. This means that some payment message (or Dcash) is invalid or is not sufficient for paying. From CPN Tools we get the values of all places. Similarly, from this analysis of all states of all places, we can conclude that dead marking 187 is expected. Obviously, the path from node 1 to node 187 is longer than the path from node 1 to node 103.
5 Conclusions
We have presented a unified Dcash payment system based on an exchange center, modeled it using CPN Tools and analyzed the state space report of the model. The contributions of the paper are summarized below.
(1) Definition of the unified Dcash payment model. We propose a unified model and a series of transaction procedures for the model, covering in detail the system preparation phase (system initialization, application for DCEC's agency agreement of signature from the central bank, and the authority of commercial banks' issue), the registering phase, the inquiry phase, the issuing phase, the withdrawal phase, the payment phase, the refund phase and the clearing phase. We analyze the relationships among the entities and the activities in the Dcash system. From the table of these relationships, we know that not all five entities are present in all activities, but they are all present in the payment phase, where each plays an important role. We therefore focus on the payment phase. Finally, we refine the detailed definitions of the flow charts (or state machines) of the five entities.
(2) Modeling the payment activity with CPN Tools. To simplify the model, we give a series of assumptions. We then take the five entities as the core places and declare their states as color sets. Based on these points and the workflows, we obtain the CPN model.
(3) Verification of the payment activity. By analyzing the state space of the CPN model, we know that the payment activity has no livelock or dead code and that all three dead markings are desirable terminal states. There is no unexpected marking in this model. So the payment workflows of the five entities are all controllable, realizable and feasible.
References 1. http://econc10.bu.edu/Ec341_money/Papers/papers_frame.htm 2. N. Asokan, A. Phillipe, J. Michael, S.M. Waidner, The state of the art in electronic payment systems. Computer 30(9), 28–35 (1997) 3. P. Anurach, Money in electronic commerce: digital cash, electronic fund transfer, and Ecash. Commun. ACM 39(6), 45–50 (1996) 4. J.D Tygar, Atomicity in Electronic Commerce. in Proceedings of the ACM Symposium on Principles of Distributed Computing’96 (Philadelphia, PA, USA, 1996), pp. 8–26 5. N. Borenstein, Perils and pitfalls of practical cyber commerce: the lessons of first victual’s first year. in Presented at Frontiers in Electronic Commerce (Austin, TX, 1994) 6. B. Cox, J.D. Tygar, M. Sirbu, NetBill security and transaction protocol. in Proceedings of the First USENIX Workshop on Electronic Commerce (1995) pp. 77–88 7. Y. Xu, Q.Q. Hu, Unified electronic currency based on the fourth party platform integrated payment service. Commun. Comput. Inform. Sci. 135, 1–6 (2011). 1 8. Y. Xu, C.Q Fang, A theoretical framework of fourth party payment. in The International Conference on E-Business and E-Government (iCEE, 2010), pp. 5–8 9. B.D. Chaum, Lind signatures for untraceable payments. in Advances in cryptology-rypto’82, (1983) pp. 199–203 10. G. Medvinsky, B.C. Neuman, Electronic currency for the internet, electronic markets. 3(9/10), 23–24 (1993) (invited); Also appeared in connexions 8 (6), 19–23 (1994, June) 11. I.B. Damgard, Payment systems and credential mechanisms with provable security against abuse by individuals. in Proceedings of Cryptology (1988). pp. 328–335 12. S. Garfinkel, Web Security, Privacy and Commerce, 2 ed. (O’Reilly Media Inc, 1005 Gravenstein Highway North, Sebastopol, CA, 2001) 95472.2002.1 13. T. Okamoto, K. Ohta, Universal electronic cash. in Advances in Cryptology-Crypto’91. Lecture Notes in Computer Science (Springer, 1992), pp. 324–337 14. D. Chaum, E. van Heyst, Group signatures. in Proceedings of EUROCRYPT’91. Lecture Notes in Computer Science (Springer, 1991) pp. 257–265 15. J. Camenisch, M. Stadler, Efficient group signatures for large groups. in Proceedings of CRYPTO’97. Lecture Notes in Computer Science 1294 (Springer, 1997) pp. 410–424 16. H.S. Zhou, B. Wang, L. Tie, An electronic cash system with multiple banks based on proxy signature scheme. J. Shanghai Jiaotong Univ. 38(1), 79–82 (2004) 17. Y. Mu, K.Q. Nguyen, V. Varadharajan, A fair electronic cash scheme. in Proceedings of ISEC 2001. Lecture Notes in Computer Science 2040 (Springer, 2001), pp. 20–32 18. A. Nenadic, N. Zhang, S. Barton, A security protocol for certified e-goods delivery. in Proceedings of the International Conference on Information Technology: Coding and Computing (IEEE Computer Society, 2004) 19. K. Jensen, in Coloured Petri Nets. Basic Concepts, Analysis Methods And Practical Use. Basic Concepts. Monographs in Theoretical Computer Science, vol 1 (Springer, 1997). 2nd corrected printing. ISBN: 3–540-60943-1 20. K. Jensen, in Coloured Petri Nets. Basic Concepts, Analysis Methods And Practical Use. Analysis Methods. Monographs In Theoretical Computer Science, vol 2 (Springer, 1997). 2nd corrected printing. ISBN: 3–540-58276-2 21. K. Jensen, in Coloured Petri Nets. Basic Concepts, Analysis Methods And Practical Use. Practical Use. Monographs In Theoretical Computer Science (Springer, 1997). ISBN: 3–54062867-3 22. J. Billington, G.E. Gallasch, B. Han, A coloured petri net approach to protocol verification. ACPN 2003, LNCS 3098 (2004) pp. 210–290 23. C. Ouyang, J. 
Billington, Formal analysis of the internet open trading protocol. in FORTE 2004 Workshops, LNCS 3236 (2004) pp. 1–15
Design and Implementation of Pianos Sharing System Based on PHP Sheng Liu, Chu Yang, and Xiaoming You
Abstract In order to realize the concept of the sharing economy, enrich public culture and decrease the idling rate of pianos, this paper designs and implements a PHP-based piano sharing system to meet an actual demand of society. Guided by the Yii 2.0 framework, the B/S architecture and MAMP are used as the system structure and the integrated development environment respectively, and PHP, Ajax and Bootstrap are used as the key development technologies. Through this platform, the requirements of both piano suppliers and piano users can be matched quickly, so as to meet users' multi-level and personalized piano trading objectives. Keywords Sharing piano · PHP · MySQL · Yii2.0 framework
1 Introduction With the maturity of Internet technology, the concept of the "sharing economy" is becoming more and more popular. After shared bicycles, shared electric cars and shared power banks, the concept of a "sharing culture" has begun to penetrate every aspect of people's lives. With the improvement of living standards in particular, music has an ever greater influence on people. Because a piano is expensive and often sits idle, its resources are easily under-utilized or even wasted. Therefore, the demand for "shared pianos" is also increasing. Most existing piano sharing schemes are operated by piano companies and have not fully solved the problem of idle family pianos. How to rely on the development of "Internet +" technology to create piano service trading opportunities for users
S. Liu · C. Yang · X. You (B) School of Management, Shanghai University of Engineering Science, 201620 Shanghai, China e-mail: [email protected] S. Liu e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_25
and customers based on the sharing economy has become a real problem that has to be considered [1]. At present, there is related research on sharing management systems such as shared bicycle and shared car management systems, but there are few reports on the design and implementation of a piano sharing system [2, 3]. Based on this, a piano sharing system is designed and developed in this paper. The system is guided by the Yii 2.0 framework; the B/S architecture and MAMP are used as the system structure and the integrated development environment respectively. PHP, Ajax and Bootstrap are used as the key development technologies [4–6], and development tools such as Sublime Text 3 and Navicat Premium are used to develop a flexible PHP-based piano sharing system that meets the requirements of piano sharing [7, 8].
2 System Business Process Analysis This system plays an intermediary role, providing a secure trading platform for the two parties: those who have idle pianos and those who need to use pianos. On the demand side, some families cannot afford to buy a piano but need to learn the piano. From the standpoint of the piano owner, the high maintenance cost and depreciation of an idle piano have a certain economic impact. Through this platform, the piano can be rented, for a fee, to more people who need to use it. The business process of the system is shown in Fig. 1. Sharing piano users can view nearby piano information through the piano map and select a piano to make a time reservation. After the relevant information has been checked, the system gives an estimated order price according to the user's credit score, the piano score and the duration of the reservation. After the reservation is completed, the sharing piano user waits for the confirmation of the piano owner. Upon the confirmation of the piano owner, the user can play the piano at the appointed place and time and pays for the piano by scanning the payment QR code offline. The user can then comment on the playing experience online, including the feeling of playing, the environment and an evaluation of the piano owner. The piano owner can provide piano information and personal information, complete the two-dimensional code for receiving payment, and then rent out the piano after online and offline audits. In the piano information interface, the rental price per hour can be seen after the evaluation is completed. After receiving the piano reservation information, the owner can check the information of the sharing piano user. If the credit score of the user is low, the owner has the right to refuse the reservation request made by the user. If the reservation is accepted, an offline collection is made after the piano is hired out. At the end of the hiring service, the piano owner can be evaluated in the piano sharing system. If the evaluation is less than the required level, the administrator can, according to the comments, deduct the piano owner's credit points after auditing in the background.
Fig. 1 System flowchart
3 System Design 3.1 System Overall Function Design The whole system includes three types of users: administrators, piano owners and sharing piano users. The functional module diagram of the system is shown in Fig. 2.
Fig. 2 System function module diagram
3.2 System Database Design The database is the core and foundation of the system and its design directly affects the quality of the whole system. (1) Conceptual Structural Design From the data requirement analysis, there are four entities in the system, namely, user, administrator, piano and piano owner. Through the integration of the dependence relationship between entities, the main E-R diagram of the system is designed as shown in Fig. 3. Because the system is a platform for individual piano rental, by default, a piano owner will provide a piano. So the relationship between the piano owner and the piano is one-to-one and the relationship between other entities is many-to-many. (2) Logic Structure Design According to the E-R diagram of the system, the design of logical structure for the database of the system is carried out. That is, the field names, types and lengths of the main tables in the database are described in detail. (a) user (user ID char (10), name char (8), sex char (2), age int, credit score int, status char (10), mailbox char (15), telephone char (11), identity card number char (20)) (b) piano owner (owner ID char (10), name char (8), sex char (2), age int, credit score int, status char (10), piano ID char (10), mailbox char (15), telephone char (11), ID card number char (20), payment code varchar (100)) (c) Reservation (Reservation ID char (10), Piano ID char (10), User ID char (10), Reservation Start Time Datetime, Reservation End Time Datetime, Reservation Status Char (10), Reservation Date Datetime, Reason varchar (100))
Fig. 3 System E–R diagram
(d) Assessment (piano ID char (10), timbre int, handle int, stability int, environment int, administrator ID char (10)) (e) Piano (Piano ID Char (10), Hour Price int, Piano Picture Long Blob, Piano Age int, Piano Score int, Detailed Address varchar (50), Longitude Decimal, Latitude Decimal) (f) Evaluation (Evaluation ID Char (10), User ID Char (10), Owner ID Char (10), Evaluation Time Datetime, Evaluation Content varchar (100), Score int).
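To make the logical design above concrete, the following is a minimal sketch of how one of the described tables could be created in MySQL from a Python script. The connection parameters, the table name reservation and the exact column definitions are illustrative assumptions based on the field list in (c), not the authors' actual schema.

```python
# Minimal sketch: create the reservation table described in (c) in a MySQL
# database from Python. Host, user, password and database name are placeholders.
import mysql.connector

DDL = """
CREATE TABLE IF NOT EXISTS reservation (
    reservation_id   CHAR(10) PRIMARY KEY,   -- Reservation ID
    piano_id         CHAR(10) NOT NULL,      -- Piano ID
    user_id          CHAR(10) NOT NULL,      -- User ID
    start_time       DATETIME,               -- Reservation start time
    end_time         DATETIME,               -- Reservation end time
    status           CHAR(10),               -- Reservation status
    reservation_date DATETIME,               -- Reservation date
    reason           VARCHAR(100)            -- Reason (e.g. for refusal)
)
"""

conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="sharing_piano")
cur = conn.cursor()
cur.execute(DDL)   # create the table if it does not already exist
conn.commit()
conn.close()
```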
4 Implementation of the System In the system, the Yii 2.0 framework is applied, the MVC pattern (Model, View and Controller) is followed, and business logic, data and interface display are separated when writing code. In addition, business logic is put into a backend folder so that it can be reused while personalized, attractive interfaces are built and the user interaction experience is improved. The Model is the core of the application, the View is used to display data, and the Controller is used to process data input. From the perspective of the database system, the Model is responsible for reading from and writing to the database; the View displays the data read from the database in the Controller and is usually created based on model data; the Controller processes and controls database data to support business operations, that is, it handles the parts that interact with the users [9, 10].
4.1 Implementation of Piano Map Module This module is displayed in the piano user system. After entering the system, the piano user can click on the piano map, a sub-menu under the piano order item in the navigation bar, to view pianos. This map assists the piano user in hiring a piano: on it, the user can see the pianos nearby (as shown in Fig. 4). After clicking on a blue bubble, the user can see the ID, detailed geographical location and piano score of the selected piano. This module uses the AMap (Gaode Map) API and the corresponding interface classes. The piano information and coordinate information are queried from the backend database and transmitted to the front end as a JSON array. The front end uses Ajax to receive the data, after which the AMap.Marker method of the API is used to mark the points on the map. The key code is as follows: var infoWindow = new AMap.InfoWindow({offset: new AMap.Pixel(0, -30)}); for(var i = 0, marker; i

If nb > na, then
D[m, na] + dis({x[m + 1], x[m + 2], … x[M]}, {y[na + 1], y[na + 2], … y[N]})
≤ D[m, na] + dis(–, {y[na + 1], y[na + 2], … y[nb]}) + dis({x[m + 1], x[m + 2], … x[M]}, {y[nb + 1], y[nb + 2], … y[N]})
= D[m, na] + nb − na + dis({x[m + 1], x[m + 2], … x[M]}, {y[nb + 1], y[nb + 2], … y[N]})
≤ D[m, nb] + dis({x[m + 1], x[m + 2], … x[M]}, {y[nb + 1], y[nb + 2], … y[N]}) (7)
where "–" means an empty set. Therefore, from (5) and (7), one can see that skipping D[m, nb] does not affect the result of edit distance computation. In the case where nb < na and D[m, nb] − D[m, na] ≥ na − nb, since one can always find an ma such that m ≤ ma ≤ M and
dis({x[m + 1], x[m + 2], … x[M]}, {y[nb + 1], y[nb + 2], … y[N]})
= dis({x[m + 1], x[m + 2], … x[ma]}, {y[nb + 1], y[nb + 2], … y[na]}) + dis({x[ma + 1], x[ma + 2], … x[M]}, {y[na + 1], y[na + 2], … y[N]}), (8)
from the fact that
dis({x[m + 1], x[m + 2], … x[ma]}, {y[nb + 1], y[nb + 2], … y[na]}) ≥ Max(ma − m − na + nb, 0) ≥ ma − m − (na − nb) (9)
we have
D[m, nb] + dis({x[m + 1], x[m + 2], … x[ma]}, {y[nb + 1], y[nb + 2], … y[na]}) ≥ D[m, nb] + ma − m − (na − nb) ≥ D[m, na] + ma − m,
D[m, nb] + dis({x[m + 1], x[m + 2], … x[M]}, {y[nb + 1], y[nb + 2], … y[N]}) ≥ D[m, na] + ma − m + dis({x[ma + 1], x[ma + 2], … x[M]}, {y[na + 1], y[na + 2], … y[N]}) ≥ D[m, na] + dis({x[m + 1], x[m + 2], … x[M]}, {y[na + 1], y[na + 2], … y[N]}) (10)
Therefore, from (5) and (10), one can see that skipping D[m, nb] does not affect the result of edit distance computation (Fig. 2). In other words, if (4) is satisfied, D[m, nb] can be ignored no matter whether nb is larger or smaller than na. The slope rule is very helpful for removing the computation redundancy in the DP process. There are some further rules that can be viewed as extensions of the slope rule and are also helpful for improving efficiency. (a) First Row Rule. That is, instead of (2), only one entry in the first row has to be initialized: D[0, 0] = 0
(11)
and other entries in the first row can be ignored. This is due to that D[0, n] − D[0, 0] = n ≥ |n − 0|.
(12)
Therefore, the other entries in the first row are bound to be removed by the slope rule. (b) Same Entry Rule. Suppose that D[m, n] has been determined. If x[m + 1] = y[n + 1], then D[m + 1, n + 1] = D[m, n]
(13)
must be satisfied. Moreover, one does not have to determine D[m + 1, n] from D[m, n] + 1. This rule can be proven from the fact that, if D[m, n + 1] is not deleted by the slope rule, then D[m, n + 1] = D[m, n] should be satisfied. Thus, D[m, n + 1] + 1 =
D[m, n] + 1 is larger than D[m, n] + dif (x[m + 1], y[n + 1]) = D[m, n] and (3) is simplified as D[m + 1, n + 1] = min{D[m + 1, n] + 1, D[m, n]}.
(14)
Also note that
D[m + 1, n] = min_{τ = 0,1,…,n} (D[m, τ] + dis(x[m + 1], {y[τ + 1] … y[n]}))
= min(min_{τ = 0,1,…,n−1} (D[m, τ] + dis(x[m + 1], {y[τ + 1] … y[n]})), D[m + 1, n] + 1) (15)
If D[m, n] is not deleted by the slope rule, then D[m, n − 1] < D[m, τ] + n − 1 − τ where τ < n − 1. Therefore,
D[m, τ] + dis(x[m + 1], {y[τ + 1] … y[n]}) > D[m, n] − (n − τ) + dis(x[m + 1], {y[τ + 1] … y[n]}) ≥ D[m, n] − (n − τ) + n − τ − 1, (16)
D[m + 1, n] > D[m, n] − 1, D[m + 1, n] ≥ D[m, n] (17)
From (14) and (17), we can conclude that (13) must be satisfied. Moreover, since D[m + 1, n + 1] = D[m, n], if D[m + 1, n] is not omitted by the slope rule, then D[m + 1, n] = D[m + 1, n + 1] must be satisfied and D[m + 1, n] cannot be equal to D[m, n] + 1. (c) Different Entry Rule. Suppose that D[m, n] has been determined. If x[m + 1] ≠ y[n + 1], then (i) If D[m, n + 1] is not empty, then only D[m + 1, n] = D[m, n] + 1
(18)
should be determined and one does not have to determine D[m + 1, n + 1] from D[m, n]. This is because, if both D[m, n] and D[m, n + 1] are not deleted by the slope rule, then D[m, n] = D[m, n + 1] should be satisfied and D[m, n] + dif(x[m + 1], y[n + 1]) = D[m, n] + 1 = D[m, n + 1] + 1. (19) (ii) If D[m, n + 1] is empty, then one has to compute D[m + 1, n] = D[m + 1, n + 1] = D[m, n] + 1.
(20)
(iii) Suppose that x[m + 1] = y[n + τ] but x[m + 1] ≠ y[n + k] where k = 1, 2, …, τ − 1, (21) if D[m, n + k] where k = 1, 2, …, τ − 1 are all empty, then we determine D[m + 1, n + k] = D[m, n] + k − 1.
(22)
If one of the entries among D[m, n + 1], D[m, n + 2], …, D[m, n + τ − 1] is active, then (22) is unnecessary to be computed. With these extension rules, the DP process can be further simplified.
2.3 Magic Number Rule The magic number rule deletes the entries in the DP matrix that cannot possibly achieve the minimal edit distance. Remember that the edit distance can be expressed by (5). Note that, if M − m ≥ N − n and x[m + 1] = y[n + 1], x[m + 2] = y[n + 2], …, x[m + N − n] = y[N], then dis({x[m + 1], x[m + 2], … x[M]}, {y[n + 1], y[n + 2], … y[N]}) = M − m − N + n.
(23)
If N − n > M − m and x[m + 1] = y[n + 1], x[m + 2] = y[n + 2], …, x[M] = y[n + M − m], then dis({x[m + 1], x[m + 2], … x[M]}, {y[n + 1], y[n + 2], … y[N]}) = N − n − M + m.
(24)
Moreover, if none of the elements among {x[m + 1], x[m + 2], …, x[M]} is equal to {y[n + 1], y[n + 2], …, y[N]}, then dis({x[m + 1], x[m + 2], · · · x[M]}, {y[n + 1], y[n + 2], · · · y[N ]}) = max(N − n, M − m)
(25)
Therefore, from (23) to (25), we have
|N − n − M + m| ≤ dis({x[m + 1], x[m + 2], … x[M]}, {y[n + 1], y[n + 2], … y[N]}) ≤ max(N − n, M − m) (26)
From (26), we can derive the magic number rule as follows:
[Magic Number Rule] If D[m0, j] + Max(N − j, M − m0) ≤ D[m0, j0] + |N − j0 − M + m0|,
(27)
then the entry D[m0 , j0 ] can be deleted. Note that, if (27) is satisfied, then from (26), D[m 0 , j] + dis({x[m 0 + 1], x[m 0 + 2], . . . x[M]}, {y[ j + 1], y[ j + 2], . . . y[N ]}) ≤ D[m 0 , j0 ] + dis({x[m 0 + 1], x[m 0 + 2], . . . x[M]}, {y[ j0 + 1], y[ j0 + 2], . . . y[N ]}) (28) must be satisfied. Therefore, from (5), deleting the entry D[m0 , j0 ] does not affect the result of edit distance computation. For example, in Fig. 1b (M = 5 and N = 6), D[2, 3] = D[3, 3] = D[3, 4] = 1. However, since D[3, 4] + Max(N − 4, M − 3) = 1 + 2 = 3, D[2, 3] + | N – 2 − M +3| = 1 + 2 ≥ 3, from the magic number rule in (27), D[2, 3] can be deleted. Moreover, in the next row, D[3, 4] = 2 and D[4, 5] = 1. Since D[4, 5] + Max(N − 5, M − 4) = 2 and D[3, 4] + | N −3 – M +4| = 4 ≥ 2, from (27), D[3, 4] can also be deleted (Fig. 3).
Fig. 2 Illustration of the slope rule
Fig. 3 Illustration of the magic number rule
Fig. 4 Flowchart of the proposed fast sequence edit distance computation algorithm
Like the slope rule, the magic number rule is also very helpful for improving the efficiency of edit distance computation. The overall flowchart of the proposed efficient edit distance computation, which applies the slope rule, the magic number rule, and the rules extended from the slope rule in (11)–(22), is shown in Fig. 4.
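For reference, the following is a minimal Python sketch of the baseline DP recurrence that the slope rule and the magic number rule prune. It computes the full matrix D and is therefore the unoptimized method the proposed algorithm is compared against, not the authors' pruned implementation.

```python
def edit_distance_dp(x, y):
    """Baseline dynamic-programming edit distance between sequences x and y.

    D[m][n] is the edit distance between the prefixes x[:m] and y[:n]; the
    proposed rules would skip (prune) most of these entries, but here every
    entry is computed.
    """
    M, N = len(x), len(y)
    D = [[0] * (N + 1) for _ in range(M + 1)]
    for m in range(M + 1):
        D[m][0] = m                       # delete all of x[:m]
    for n in range(N + 1):
        D[0][n] = n                       # insert all of y[:n]
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            diff = 0 if x[m - 1] == y[n - 1] else 1   # dif(x[m], y[n])
            D[m][n] = min(D[m - 1][n] + 1,            # deletion
                          D[m][n - 1] + 1,            # insertion
                          D[m - 1][n - 1] + diff)     # match / replacement
    return D[M][N]


# Example: two short DNA-like sequences
print(edit_distance_dp("ACGTAC", "AGTTC"))  # -> 2
```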
3 Simulations To evaluate the performance of the proposed algorithm, we coded it in Matlab and ran it on an Intel i5-7400 4-core CPU at 3.00 GHz with 24 GB RAM. We randomly generate DNA sequences with different lengths. These sequences are then mutated (by insertion, deletion, and replacement) with different ratios from 5 to 50%. In Figs. 5 and 6, we compare the running time of the traditional DP method and the proposed algorithm for different sequence lengths and different mutation ratios. In Fig. 5, we show the case where the mutation ratio is 10%. In Fig. 6, we show the ratios of the computation time of the original DP method to that of the proposed fast algorithm under different sequence lengths and different mutation ratios. One can
Fig. 5 The computation times of the proposed fast algorithm and the original DP method when the mutation ratio is 10%
Fig. 6 The ratios of the computation time of the DP method to that of the proposed fast algorithm
see that the proposed algorithm is much more efficient than the original DP method, especially for the long sequence length case and the high similarity case. Note that, when the mutation ratio is 0.05, the computation of the proposed algorithm is 1/6 of that of the original DP method when the sequence length is 2000. Moreover, when the sequence length is about 15,000, the computation of the proposed algorithm is 1/18 of that of the original DP method. In Fig. 7, we show how the proposed algorithm simplifies the computation of the DP matrix (the entries that are processed are marked by blue color and the entries that are not processed are blank). We can see that, compared to other algorithms, when using the proposed algorithm, only a small part of entries is processed. Therefore, the proposed algorithm has much less computation redundancy than other edit distance computation methods. In Table 1, the computation times of the
Fig. 7 The entries that should be processed using different algorithms: a Original DP, b Ukkonen's method, c Davidson's method, d Proposed
Table 1 Comparison time of the proposed algorithm and other methods for edit distance computation

Sequence length | DP method (ms) | Ukkonen's method (ms) | Davidson's method (ms) | Proposed algorithm (ms)
600             | 5.66           | 2.24                  | 2.47                   | 1.03
1000            | 15.72          | 6.60                  | 7.33                   | 2.66
2000            | 63.83          | 27.36                 | 34.52                  | 10.49
4000            | 256.30         | 113.61                | 163.10                 | 41.17
8000            | 1033.51        | 463.86                | 764.24                 | 166.99
original DP method, Ukkonen’s method [7], Davidson’s method [8], and the proposed algorithm are shown. The results show that the proposed algorithm requires much less computation time and can compute the edit distance between two sequences efficiently.
4 Conclusion In this paper, a very efficient algorithm to compute the edit distance between two DNA sequences is proposed. Several techniques, including the slope rule, the magic number rule, the first row rule, the same entry rule, and the different entry rule, were proposed to remove the computation redundancy. With these rules, only a very small part of the entries in the DP matrix has to be computed. This greatly reduces the time for determining the similarity between two DNA sequences and will be very helpful for biomedical signal processing.
References 1. M.S. Waterman, Introduction to Computational Biology: Maps, Sequences and Genomes (Chapman and Hall/CRC, London, 2018) 2. J.M. Bower, H. Bolouri, Computational Modeling of Genetic and Biochemical Networks (MIT press, 2004) 3. P. Pevzner, Computational Molecular Biology: An Algorithmic Approach (MIT press, 2000) 4. W.R. Pearson, Using the FASTA program to search protein and DNA sequence databases, in Computer Analysis of Sequence Data (Humana Press, 1994), pp. 307–331 5. J. Ye, S. McGinnis, T.L. Madden, BLAST: improvements for better sequence analysis. Nucleic Acids Res. 34(2), 6–9 (2006) 6. N. Bray, I. Dubchak, L. Pachter, AVID: A global alignment program. Genome Res. 13(1), 97–102 (2003) 7. E. Ukkonen, Algorithms for approximate string matching. Inf. Control 64, 100–118 (1985) 8. A. Davidson, A fast pruning algorithm for optimal sequence alignment, in IEEE International Symposium Bioinformatics and Bioengineering Conference (2001), pp. 49–56
Predicting Student Final Score Using Deep Learning Mohammad Alodat
Abstract The purpose of this paper is to create a smart and effective tool for evaluating students in the classroom objectively, overcoming the human subjectivity that results from instructors' lack of experience and students' over-trust in themselves. We provided instructors at Sur University with the "Program for Student Assessment (PISA)" tool to assess its positive impact on academic performance, self-regulation, and improvement of final exam scores. The study sample consisted of the students enrolled at Sur University College at the time of data collection in the 2018/2019 semester, with the purpose of testing the efficiency of four models in predicting students' final scores based on their mark in the first exam. The four tested algorithms were: Multiple Linear Regressions (MLR), K-mean cluster, modular feed-forward neural network and Radial Basis Function (RBF) (De Marchi and Wendland, Appl Math Lett 99:105996, 2020 [3]; Niu et al, Water 11(1):88, 2019 [12]). After comparing the four models' effectiveness in predicting the final score, results show that RBF has the highest average classification rate, followed by the neural network and K-mean cluster, while Multiple Linear Regressions had the worst performance. RBF has been used to create the Instructor Program for Student Assessment (PISA). Predicting student performance early will help students to improve their performance and help instructors modify their teaching style to fit their students' needs. Keywords Euclidean dissimilarity · K-mean cluster · PISA · RBF · Neural network · Deep machine learning
1 Introduction Traditional assessment of students is based either on students assessing themselves or on the instructor's assessment of the students [14]. However, traditional assessment is a subjective way of M. Alodat (B) IST Department, Sur University College, Sur, Oman e-mail: [email protected]; [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_39
assessment that depends on self-awareness, which might increase the problem of self-cognitive bias (over-trust) [8]. Self-cognitive bias appears clearly when weaker and less mature individuals tend to overrate themselves: they tend to accept positive results and reject negative outcomes. Weak assessment tools will mislead the learning process; for instance, when instructors are mistaken in their expectations of the final exam score, they attribute it in their recommendations to low student effort and low student capability, while students comment that the reason is hard exams and incompetent teachers [2]. In order to increase the quality and the chances of success after the first test of a course, it was necessary to predict the grades of the students currently enrolled. The benefits to the public (instructors, students, and administrators) of predicting the grades of currently enrolled students are as follows: (1) it helps students to evaluate themselves critically, to be more self-aware in their studies, and to create plans of action to improve their performance, which leads to academic accuracy and quality of education; (2) it helps the administrator to know the expected grades for classroom management and student retention; (3) it helps instructors to reduce effort and workload because instructors get feedback for every student [6].
2 Distance Numerical Measure 2.1 Euclidean Dissimilarities Euclidean dissimilarity is a numerical distance measure between objects: the smaller the amount of difference, the more alike the objects are [10, 13]. The degree of variation of Euclidean dissimilarities is extracted through averaging or by using the minimum or the maximum of the two values; in general, the distance value is a non-negative number. Euclidean dissimilarities help to find a preference between prediction models, to separate classes (good for observation), and to create good classifiers (for comparison). A semi-symmetric Euclidean dissimilarity belongs to the same class as a symmetric Euclidean dissimilarity when the values are close to zero; it is not a problem if it is smaller than or equal to 0.5 as long as it is the smallest distance possible for that object, but an interval of [1.5, 4] refers to a poor, non-proper distance measure. The equation for the Euclidean dissimilarity (interval, ratio) is as follows: ED = |(MaxPx − MinPx − 1)|
(1)
The degree of variation was divided into (1) matched Euclidean dissimilarities, of which there are two types: (a) symmetric Euclidean dissimilarity, where the degree of variation is a complete match, and (b) semi-symmetric Euclidean dissimilarity (EDSS), which is close to a match, starting from 0.5 down to values close to zero; and (2) non-matched Euclidean dissimilarities, of which there are two types: (a) Cut Dissimilarity (CD), over the interval [0.5, 1.5], and (b) Euclidean Dissimilarity of Wide (DDW), over the interval [1.5, 4].
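As a concrete illustration, the sketch below bins the per-student absolute difference between two models' predicted grades into the half-point categories used later in Tables 1 and 2 and also reports an overall Euclidean distance. This is only one plausible reading of the procedure described above; the binning rule, the grade scale and the function name are assumptions, not the author's implementation.

```python
import numpy as np

def variance_categories(pred_a, pred_b):
    """Bin |pred_a - pred_b| per student into half-point categories
    (0, 0.5, 1.0, 1.5, ...) and return the share of students in each,
    together with the overall Euclidean distance between the two predictions."""
    pred_a, pred_b = np.asarray(pred_a, float), np.asarray(pred_b, float)
    diff = np.abs(pred_a - pred_b)
    # Round each difference to the nearest half point (assumed binning rule).
    bins = np.round(diff * 2) / 2
    categories, counts = np.unique(bins, return_counts=True)
    shares = {c: n / len(diff) for c, n in zip(categories, counts)}
    euclid = float(np.linalg.norm(pred_a - pred_b))
    return shares, euclid

# Hypothetical predicted grades (e.g. on a 0-4 scale) from two models P1 and P3
p1 = [3.0, 2.5, 3.5, 1.0, 2.0]
p3 = [3.0, 2.0, 3.5, 2.0, 2.5]
print(variance_categories(p1, p3))
```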
2.2 Kernel Methods Kernel K-mean clustering is an unsupervised learning algorithm that groups n objects, based on their attributes, into k partitions (clusters), where k < n and k is the number of groups; objects within a group are homogeneous and objects in different groups are heterogeneous, which means that objects that are similar in nature and characteristics are put together. Its advantages are that it is the simplest clustering algorithm, it is fast, robust and efficient, and it gives better results than hierarchical clustering in core clustering cases.
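A minimal clustering sketch in Python, assuming scikit-learn is available. Note that scikit-learn's KMeans is the plain (non-kernel) variant; a kernel version would first map the data through a kernel, which is omitted here. The two illustrative features (first-exam mark and credit score) are hypothetical, since the paper does not list the exact attributes fed to the clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical student features: [first exam mark, credit score]
X = np.array([[25, 80], [30, 90], [12, 60], [15, 55], [28, 85], [10, 40]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each student
print(kmeans.cluster_centers_)  # centroid of each cluster
```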
2.3 Multiple Linear Regressions Multiple Linear Regression is one of the most popular machine learning algorithms; it is fast, simple, basic and easy to use. It fits the line that best covers all the data points; this line is known as the regression line and is represented by the following linear equation: Y = α + a0 X1 + a1 X2 + a2 X3
(2)
It is used to estimate real values from the variables and the relationship between the independent and dependent variables. Its disadvantages are that it is limited to linear relationships and relies on independent data, thus rendering it incapable of modelling complex relationships [5, 11].
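A minimal multiple linear regression sketch corresponding to Eq. (2), assuming scikit-learn; the three predictor columns and the final-score targets are made-up values for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical predictors X1, X2, X3 (e.g. first exam, attendance, credit score)
X = np.array([[22, 0.9, 80], [15, 0.7, 60], [28, 0.95, 90], [10, 0.5, 40]])
y = np.array([75, 55, 88, 35])          # hypothetical final scores

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)    # alpha and a0, a1, a2 of Eq. (2)
print(model.predict([[20, 0.8, 70]]))   # predicted final score for a new student
```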
3 Deep Machine Learning 3.1 Neural Network Neural networks are used in deep machine learning and fall into two types: supervised models and unsupervised models. They are designed to mimic the human brain through massive, parallel, distributed processing and have many applications. They consist of simple processing units; these units are computational elements called neurons or nodes. The neurons have a neurological characteristic: they store practical knowledge and experimental information and make it available to the user by adjusting the weights. In order to get an acceptable output response, all ANNs go through the following phases: (1) training, (2) cross-validation, and (3) testing. Training a neural network means feeding it with teaching patterns and letting it adjust the weights in its nodes, passing them to the other layers, to produce the output results. Their strength is the detection of complex nonlinear
relationships between independent variables, while their disadvantage is that they are difficult to follow and to correct [1, 4, 7, 9].
3.2 Radial Basis Function (RBF) A radial basis function (RBF) network is a nonlinear hybrid network that is popular as a result of its computational advantages and typically contains a single hidden layer of processing elements. It can be applied for data interpolation and approximation, and it is more appropriate for large and noisy data points than the other methods. An RBF can be used to produce smoother curves or surfaces from a large number of data points than the other fitting methods, can deal with large scattered (unorganized) data points, and can be extended to high-dimensional space [4, 7].
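The sketch below is one simple way to build an RBF network in NumPy: k-means picks the centres, Gaussian activations form the hidden layer, and a linear least-squares readout maps the activations to the final score. It is a generic illustration of the model family, with an assumed width parameter and made-up data, not the configuration used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(X, y, n_centers=3, sigma=1.0):
    """Fit a basic RBF network: Gaussian hidden units + linear readout."""
    centers = KMeans(n_clusters=n_centers, n_init=10, random_state=0).fit(X).cluster_centers_
    # Hidden-layer activations: Gaussian of the distance to each centre.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    H = np.exp(-(dists ** 2) / (2 * sigma ** 2))
    H = np.hstack([H, np.ones((len(X), 1))])          # bias column
    w, *_ = np.linalg.lstsq(H, y, rcond=None)         # linear readout weights
    return centers, w

def predict_rbf(X, centers, w, sigma=1.0):
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    H = np.exp(-(dists ** 2) / (2 * sigma ** 2))
    H = np.hstack([H, np.ones((len(X), 1))])
    return H @ w

# Hypothetical data: first-exam mark (scaled) -> final score (scaled)
X = np.array([[0.2], [0.4], [0.5], [0.7], [0.9]])
y = np.array([0.3, 0.45, 0.5, 0.75, 0.85])
centers, w = fit_rbf(X, y, n_centers=2, sigma=0.3)
print(predict_rbf(np.array([[0.6]]), centers, w, sigma=0.3))
```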
4 Results and Discussion The methodology used in this study relies on deep learning to predict students' performance. The effectiveness of the Radial Basis Function (RBF) method was tested by creating a quantity of data points and comparing it with three other models: neural network, K-mean cluster and Multiple Linear Regressions. We discuss some differences among the student-performance predictions obtained from the four models; the models are denoted as follows: neural network (P1), K-mean cluster (P2), Radial Basis Function (P3) and Multiple Linear Regressions (P4). We excluded the degrees in Euclidean Dissimilarity of Wide (DDW), the degrees of matched Euclidean dissimilarities below 90%, and the larger values of the Euclidean distance, as in Table 1. The best predictions among the methods for the student's grade in the final exam of a course, given the first exam, are shown in Table 2.

Table 1 Excluded worst degrees

Euclidean variance    | P1, P2, P3, P4 | P1, P2 | P2, P3 | P2, P4
Symmetric (0)         | 0.10           | 0.16   | 0.17   | 0.30
Semi-symmetric (0.5)  | 0.16           | 0.22   | 0.19   | 0.30
CDD (1.0)             | 0.36           | 0.33   | 0.34   | 0.21
CDD (1.5)             | 0.24           | 0.17   | 0.17   | 0.11
DDW (2.0)             | 0.02           | 0.09   | 0.01   | 0.02
DDW (2.5)             | 0.12           | 0.03   | 0.12   | 0.06
Euclidean distance    | 20.61          | 17.69  | 19.33  | 15.62
Table 2 Best predictive methods

Variance              | P1, P3        | P1, P4        | P3, P4
                      | No.   | %     | No.   | %     | No.   | %
Symmetric (0)         | 126   | 0.52  | 120   | 0.50  | 106   | 0.44
Semi-symmetric (0.5)  | 94    | 0.39  | 101   | 0.42  | 108   | 0.45
Total                 | 220   | 0.91  | 221   | 0.92  | 214   | 0.89
Euclidean distance    | 7.35          | 7.38          | 7.35
The degree of variation for symmetric Euclidean dissimilarities is best with the model pair P1, P3, which obtained a match in predicting the final scores of 126 students. The degree of variation for semi-symmetric Euclidean dissimilarities (EDSS) is best with the model pair P3, P4, which obtained a semi-match for 108 students. The degree of variation for matched Euclidean dissimilarities is best with the model pair P1, P4, which reached 92%. The Euclidean distance of the model pair P1, P3 equals that of P3, P4, and the worst pair is P1, P4. The degree of variation between models helps to extract the status of the material currently registered for each student at the University College in the 2018/2019 semester, such as violating, unknown, failure, withdrawn, incomplete, and transferred, as in Table 3. Students transferred to the college are extracted from the model pair P1, P3, which yields the largest number, 13 students. Students violating the study plan with the currently registered material are extracted from the model pair P3, P4, which yields the largest number, seven students. Students who fail, withdraw or are incomplete are extracted from the model pair P3, P4, which yields the largest number, 3 students. Students whose status is unknown, for whom there has been a sudden change in their lives such as psychological, family or financial problems, are extracted from the model pair P1, P3, which yields the lowest number, four students.

Table 3 The status of the material currently registered

Variance | Method      | P1, P3 | P1, P4 | P3, P4
1        | Violating   | 2      | 1      | 7
         | Unknown     | 4      | 5      | 17
         | Failure     | 1      | –      | 1
         | Withdrawn   | 1      | –      | 1
         | Incomplete  | –      | 1      | 1
         | Transferred | 5      | 6      | –
1.5      | Withdrawn   | –      | 1      | –
         | Transferred | 8      | 6      | –
         | Total       | 21     | 20     | 27
4.1 Indicators in the Status of the Materials The status of each student in the prerequisite for each registered subject indicates the following: (1) the presence of transferred students is due to the ease of graduation at the college compared with other colleges located in the Sultanate, or to the convenience of access and the reputation of the college; (2) the presence of unknown students indicates a change for better or worse in the students' surroundings, such as differences in the teaching style of the teaching staff and the provision of supportive management for students. The degree of variation between models is used to extract the status of the materials currently registered for each unknown student, as in Table 4. Table 4 shows that all of these students in the models will receive a high score at the end of the semester. This indicates a rise in study levels, time management, management support, and academic guidance, which will result in fewer withdrawals and an increase in the overall rate. The degree of variation between models was also used to extract the status of the materials currently registered for each student violating the study plan. Table 5 shows that all students who violate the study plan will receive a lower score at the end of the semester. This indicates the low scientific and practical level of these students and weak follow-up guidance and support by instructors and administration, which will lead to an increase in the number of withdrawals and a decrease in the overall rate, resulting in an increase in warnings, dismissals, and failure of the study. In order to solve this problem, it is necessary to stop violations of the study plan by closing the gap in the registration system and activating the role of the instructors. Table 4 The status of the materials related to unknown students Status
Materials
P3, P4
Unknown
Business systems
2
Up Data structures
1
Decision support
2
E-commerce
3
Down
Down
Up
Down
1 1
2
1
1
1
Object-oriented
2
Strategic information
1
17
P1, P3
1
Knowledge management
14
Up 1
IS innovation
Enterprise architecture Total
P1, P4
1
1
1
1 1
2
3
5
3
5
4
Table 5 The status of the materials related to students violated of the study plan Materials
Violating P1, P3
P1, P4
P3, P4
Up Down Up Down Up Down Business systems
1
2
E-commerce IS innovation
1
1
Enterprise architecture systems, IT audit and control and java programming
1
Total
2 2
1 1
1
6
7
5 Conclusion The Radial Basis Function (RBF) is one of the four models and was chosen to predict the student’s final exam mark. In comparison to students’ own expectations of their final exam score; RBF was more accurate and objective, which would assist decisionmakers (instructors and psychologists) in achieving student retention and increase profits, and assist students early to work harder during the semester. The results of the study indicate that both students’ and instructor’s perceive Instructor Program for Student Assessment (IPSA) as very satisfying and objective when compared with self-assessment or when evaluated by instructors. Future research should focus on investigating machine learning algorithms to predict the student’s performance by considering prerequisite and the place of training that suits his abilities and also extending the coverage of the dataset used in this paper.
6 Compliance with Ethical Standards The author declares that there is no conflict of interest and no fund was obtained. Institution permission and Institution Review Board (IRB) was taken from Sur University College. Written informed consent was obtained from all individual participants included in the study. The researcher explained the purpose and the possible outcomes of the research. Participation was completely voluntary and participants were assured that they have right to withdraw at any time throughout the study and non-participation would not have any detrimental effects in terms of the essential or regular professional issues or any penalty. Also, participants were assured that their responses will be treated confidentially. Ethical approval: This research paper contains a survey that was done by students’ participants as per their ethical approval. “All procedures performed in studies involving human participants were in accordance with the ethical standards
436
M. Alodat
of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.” Acknowledgements I would like to thank the management of Sur University College for the continued support and encouragement to conduct this research.
References 1. E. Akgün, M. Demir, Modeling course achievements of elementary education teacher candidates with artificial neural networks. Int. J. Assess. Tools Educ. 5(3), 491–509 (2018) 2. A.A. Darrow, C.M. Johnson, A.M. Miller, P. Williamson, Can students accurately assess themselves? Predictive validity of student self-reports. Update Appl. Res. Music Educ. 20(2), 8–11 (2002). 3. S. De Marchi, H. Wendland, On the convergence of the rescaled localized radial basis function method. Appl. Math. Lett. 99, 105996 (2020) 4. M. Gerasimovic, L. Stanojevic, U. Bugaric, Z. Miljkovic, A. Veljovic, Using artificial neural networks for predictive modeling of graduates’ professional choice. New Educ. Rev. 23(1), 175–189 (2011) 5. Z. Ibrahim, D. Ibrahim, Predicting students’ academic performance: comparing artificial neural network, decision tree and linear regression, in 21st Annual SAS Malaysia Forum, 5th September (2007) 6. B.A. Kalejaye, O. Folorunso, O.L. Usman, Predicting students’ grade scores using training functions of artificial neural network. Science 14(1) (2015) 7. K. Kongsakun, C.C. Fung, Neural network modeling for an intelligent recommendation system supporting SRM for Universities in Thailand. WSEAS Trans. Comput. 11(2), 34–44 (2012) 8. K. Leithwood, S. Patten, D. Jantzi, Testing a conception of how school leadership influences student learning. Educ. Admin. Quart. 46(5), 671–706 (2010) 9. I. Lykourentzou, I. Giannoukos, G. Mpardis, V. Nikolopoulos, V. Loumos, Early and dynamic student achievement prediction in e-learning courses using neural networks. J. Am. Soc. Inform. Sci. Technol. 60(2), 372–380 (2009) 10. Z. Miljkovi´c, M. Gerasimovi´c, L. Stanojevi´c, U. Bugari´c, Using artificial neural networks to predict professional movements of graduates. Croatian J. Educ. 13, 117–141 (2011) 11. M.F. Musso, E. Kyndt, E.C. Cascallar, F. Dochy, Predicting general academic performance and identifying the differential contribution of participating variables using artificial neural networks. Frontline Learn. Res. 1(1), 42–71 (2013) 12. W.J. Niu, Z.K. Feng, B.F. Feng, Y.W. Min, C.T. Cheng, J.Z. Zhou, Comparison of multiple linear regression, artificial neural network, extreme learning machine, and support vector machine in deriving operation rule of hydropower reservoir. Water 11(1), 88 (2019) - c, A neural network model for predicting children’s 13. M. Pavlekovi´c, M. Zeki´c-Sušac, I. Ðurdevi´ mathematical gift. Croatian J. Educ. Hrvatski cˇ asopis za odgoj i obrazovanje 13(1), 10–41 (2011) 14. K. Struyven, F. Dochy, S. Janssens, Students’ perceptions about evaluation and assessment in higher education: a review. Assess. Eval. Higher Educ. 30(4), 325–341 (2005)
Stance Detection Using Transformer Architectures and Temporal Convolutional Networks Kushal Jain, Fenil Doshi, and Lakshmi Kurup
Abstract Stance detection can be defined as the task of automatically detecting the relation between or the relative perspective of two pieces of text- a claim or headline and the corresponding article body. Stance detection is an integral part of the pipeline used for automatic fake news detection which is an open research problem in Natural Language Processing. The past year has seen a lot of developments in the field of NLP and the application of transfer learning to it. Bidirectional language models with recurrence and various transformer models have been consistently improving the state-of-the-art results on various NLP tasks. In this research work, we specifically focus on the application of embeddings from BERT and XLNet to solve the problem of stance detection. We extract the weights from the last hidden layer of the base models in both cases and use them as embeddings to train task-specific recurrent models. We also present a novel approach to tackle stance detection wherein we apply Temporal Convolutional Networks to solve the problem. Temporal Convolutional Networks are being seen as an ideal replacement for LSTM/GRUs for sequence modelling tasks. In this work, we implement models to investigate if they can be used for NLP tasks as well. We present our results with an exhaustive comparative analysis of multiple architectures trained on the Fake News Challenge (FNC) dataset. Keywords Stance detection · BERT · XLNet · Temporal convolutional networks · Fake news challenge
Kushal Jain and Fenil Doshi have made equal contributions to the work and Lakshmi Kurup was our supervisor. K. Jain (B) · F. Doshi · L. Kurup Dwarkadas J. Sanghvi College of Engineering, Mumbai 400056, India e-mail: [email protected] F. Doshi e-mail: [email protected] L. Kurup e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_40
1 Introduction The modern media has become home to a large amount of misleading and manipulative content, from questionable claims and alternate facts to completely fake news. In the current scenario, the major way in which information is spread is via social networking sites like Twitter and Facebook. To tackle such problems, many fact-checking websites like PolitiFact, Snopes, etc. were introduced. However, by the time these websites debunk fake news, it has already been read by millions of people. Therefore, in this information age, checking the authenticity of news and claims and quelling the spread of fake news at its source has become a necessity. Moreover, manual fact-checking requires a lot of resources, including dedicated and unbiased editors. Given the rapid advancements in Machine Learning and NLP, stance detection became an important part of automating this process of fact-checking. Since a lot of such polarizing content and information is disseminated through microblogging sites such as Twitter, [1] introduced a dataset that considered specific targets (usually a single word or phrase) and collected tweets about those targets. This dataset comprised manually labelled stances as well as sentiments of tweets and was a task in Semantic Evaluation in 2016. In 2017, the Fake News Challenge (FNC-1) was launched along with a dataset containing headlines and articles that support or refute the headlines. Furthermore, in 2017, Semantic Evaluation also included a task called RumorEval [2] that dealt with stance detection in Twitter threads. We make the following contributions in this research work: • Applying the recent state-of-the-art transformer and attention-based architectures BERT and XLNet to the problem of stance detection. • Using the embeddings from pretrained language models to train multiple recurrent neural networks that employ conditional as well as bidirectional encoding of textual information, and presenting an exhaustive comparative analysis of all the models. • Applying temporal convolutional networks to tackle the problem of stance detection and analyzing whether they can replace LSTM/GRU-based networks.
2 Related Work Previously, most of the work related to stance detection was centered around targetspecific stances where data was collected for a specific target and stance of different bodies were to be estimated in relation to the specified target. In the real world, not much data exists for a particular target word. Fake News Challenge [3] released a training dataset, the largest publicly available dataset for stance detection where the target was not a predefined single word but headlines with a series of words. The dataset was built upon the Emergent dataset [4].
The common approaches for dealing with the problem were to use handcrafted feature engineering techniques or using neural networks for automated feature selection. The baseline provided by the FNC organizers used hand-engineered features namely word/n-gram overlap, word-polarity, and refutation. In [5], the authors used another set of hand-engineered features-bag of words representation (Term frequency and Inverse document frequency) and cosine similarity between the two in their work. Bhatt et al. [6] combined the two approaches and employed a combination of character n-grams, word n-grams, weighted TF-IDF, polarity and additionally used the component-wise product and the absolute difference between skip-thought vector representation of length 4800. The winning solution in the FNC-1 used a combination of deep convolutional neural networks and gradient boosted decision trees with lexical features [7]. The next best solution by team Athene [8] comprised of an ensemble of five multi-layer perceptron (MLP) with six hidden layers each and handcrafted features. With the success of Word Embeddings and Recurrent Neural networks for NLP tasks, many research works incorporated Glove [9] and word2vec [10] embeddings along with LSTM/GRU cells into the network for this task. Chaudhry [11] represented the words as trainable GloVe vectors of 50 dimensions and trained LSTM network over them followed by an attention mechanism. Zeng [12] tried a similar approach with non-trainable GloVe embeddings and compared results between Bilateral Multiple Perspective Matching (BiMPM) network and attention models using RNN with GRU cells. Conforti et al. [13] used word2vec embeddings as word representations and proposed two architectures—Double Conditional Encoding and Co-matching attention, both followed by self-attention. The progress in approaches to deal with stance detection closely resembles the way in which the field of NLP has developed. Researchers have steadily attempted to apply state-of-the-art approaches to stance detection. This can be seen with the application of word embeddings, convolutions and attention for stance detection and the subsequent improvement in results with novel architectures incorporating these techniques.
3 Proposed Approach Most of the previous methods used for stance detection, as discussed above, are primarily based on static or context-free word embeddings. To the best of our knowledge, we are the first to apply deep contextual word embeddings extracted from the transformer-based models BERT and XLNet to the problem of stance detection. Moreover, we also propose a novel architecture that employs temporal convolutional blocks for the task. Using temporal convolution to solve stance detection is unprecedented, and we present our results on such a network for the first time. The structure of the paper from here on follows this pattern. In this section, we briefly introduce each of the techniques used, namely BERT, XLNet and Temporal
Convolutional Networks, and then explain how we use them for our models. In the next section, we provide exhaustive details about our experiments, implementation and results. Finally, we conclude with an analysis of the obtained results and discuss the future scope of our work.
3.1 BERT BERT, which stands for Bidirectional Encoder Representations from Transformers [14], introduced a novel pre-training approach based on masked language modelling and reported state-of-the-art results on multiple NLP tasks. The pre-training strategy used in BERT is different from the traditional strategy of autoregressive language modelling employed by models thus far. Previously, bidirectional LSTM-based language models trained a standard left-to-right language model or combined left-to-right and right-to-left models, like ELMo [15]. However, instead of predicting the next word after a sequence of words, BERT randomly masks words in the sentence and predicts them. More specifically, during pre-training, BERT masks 15% of the words in each sequence with a [MASK] token. It then attempts to predict the masked words based on the context provided by the words present on both sides of the masked word, hence giving a bidirectional representation. For our experiments, we have used the base model which consists of 12 transformer blocks, 12 attention heads and 110 million parameters. We extracted the weights of the last layer from the pretrained model for all the tokens in the text sequences present in the dataset. To reduce the computational overhead, we calculated these embeddings beforehand and saved them to disk, so that these embeddings can be used directly while training the recurrent models (discussed in later sections). To calculate these embeddings, we use a PyPI package called pytorch-transformers [16].
3.2 XLNet XLNet [17] was recently published by Google AI and reported state-of-the-art results on multiple NLP tasks, outperforming BERT on 20 tasks [17]. The major problem with BERT was that it corrupted the input with [MASK] tokens which are used only during pre-training and do not appear when we finetune BERT on downstream tasks [17]. This leads to a significant pre-train finetune discrepancy. To deal with the limitations of BERT, XLNet was trained using permutation language modelling. Permutation language models are trained to predict one token given preceding context like the traditional language model, but instead of predicting the tokens in sequential order, it predicts tokens in some random order. Unlike BERT which consists of transformer blocks, XLNet is based on another novel architecture called TransformerXL [18].
For our experiments, we used the base model of XLNet which consists of 12 Transformer-XL blocks and 12 attention heads. To extract the embeddings, we used a similar approach as we did for BERT.
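The following is a minimal sketch of how the last-hidden-layer token embeddings can be pulled out of the pretrained base model with the pytorch-transformers package and cached to disk; the file name and the example sentence are placeholders, and the same pattern with XLNetTokenizer/XLNetModel is assumed to cover the XLNet case.

```python
import torch
import numpy as np
from pytorch_transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

text = "police find mass graves with at least 15 bodies"   # placeholder headline
input_ids = torch.tensor([tokenizer.encode(text)])

with torch.no_grad():
    last_hidden_state = model(input_ids)[0]   # shape: (1, seq_len, 768)

# Cache the 768-dimensional token embeddings so training never re-runs BERT.
np.save("headline_0001_bert.npy", last_hidden_state.squeeze(0).numpy())
```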
3.3 Recurrent Architectures (LSTM) We use the embeddings extracted from the language models to train task-specific recurrent neural networks. We train three variants of architectures like in [19], which was, however, done on a different dataset • Independent Encoding: In this architecture, embeddings of headline and the article are fed independently to two different LSTM layers. The final hidden states of both the layers are concatenated and connected with a fully connected layer followed by a softmax transformation for predictions. • Conditional Encoding: Unlike independent encoding of inputs where the LSTM layers operated in parallel, here, they operate in sequence. First, the headline is passed through an LSTM layer. The final states of this layer are used as initial states for another LSTM layer through which the article is passed. The final states of this LSTM layer (article) are fed to a fully connected layer followed by a softmax layer for predictions. • Bidirectional Conditional Encoding: The architecture used here is like the previous one with the only difference being that bidirectional LSTMs are employed instead of the vanilla LSTM layers.
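As an illustration of the conditional encoding variant, here is a small Keras sketch in which the headline LSTM's final states initialise the article LSTM; the four output classes and the 128-unit layers follow the setup described in this paper, but the sequence lengths, layer names and remaining hyperparameters are assumptions rather than the authors' released code.

```python
from tensorflow.keras import layers, Model

SEQ_HEAD, SEQ_BODY, EMB_DIM, HIDDEN = 30, 200, 768, 128

headline_in = layers.Input(shape=(SEQ_HEAD, EMB_DIM), name="headline_embeddings")
article_in = layers.Input(shape=(SEQ_BODY, EMB_DIM), name="article_embeddings")

# Encode the headline and keep its final hidden/cell states.
_, state_h, state_c = layers.LSTM(HIDDEN, return_state=True)(headline_in)

# Condition the article encoder on the headline by reusing those states.
article_encoding = layers.LSTM(HIDDEN)(article_in, initial_state=[state_h, state_c])

x = layers.Dense(128, activation="relu")(article_encoding)
outputs = layers.Dense(4, activation="softmax")(x)   # agree/disagree/discuss/unrelated

model = Model([headline_in, article_in], outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```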
3.4 Temporal Convolutional Networks Temporal Convolutional Networks (TCN) are a relatively novel architecture for sequential tasks introduced in [20]. They have outperformed canonical recurrent models (with LSTM, GRU cells, and vanilla RNN) across multiple sequence modelling tasks [21]. TCN’s can be processed in parallel unlike RNN since convolutions can be performed in parallel [20]. Hence, compared to LSTM/GRU, these networks require a comparatively very low memory and are faster. TCN uses a combination of 1D Fully Convolution Network [22] and Causal Convolutions. These convolutions are dilated so that they can learn from bigger sequences for the tasks that need to remember longer history. Hence, we believe that a TCN would be able to capture important information from large sequences of text. Dilated convolutions enable an exponentially large receptive field, unlike usual convolutions which work on a linear receptive field. Causal convolutions are particularly useful for predicting the information related to future time steps since they ensure that there is no leakage of information from future to past. Since our task doesn’t necessarily obey the requirement of no leakage as the entire sequence can be used at a time to predict the output, we modified the architecture
to use non-causal convolutions. This ensures that, at a given time step, the model learns from the forward as well as the backward time steps. A single TCN block is a series of 1D convolution, activation and normalization layers stacked together with a residual connection. The convolution operation used is non-causal dilated convolution, the activation used is the Rectified Linear Unit (ReLU) [23], and weight normalization is applied to the convolution filters.
4 Experiments and Results 4.1 Training Dataset We perform our experiments on the FNC dataset. The key components of the dataset are the following: • Headline/Claim: A shorthand representation of an article, capturing only the essence of the article without any additional information. • Article Body: A detailed explanation of the headline or the claim. The article body might be related to the headline or be completely unrelated. • Stance: The output classes. The relation between the headline and the body can be classified into four classes: Agree, Disagree, Discuss, Unrelated. The total number of unique headlines and articles are around 1600 each. However, since each article is matched with multiple headlines, the total number of training examples reach up to 50,000.
4.2 Test Dataset The test dataset used in this work is the final competition dataset released by the organizers after the completion of the competition. This dataset was originally used to judge the submissions. The test dataset is structurally similar to the training set, however, the total number of tuples here, is around 25,000 with around 900 unique headlines and articles.
4.3 Pre-processing All the stop-words were removed from the text using NLTK [24]. Stemming or lemmatization was not performed on the text hoping that the embeddings from the pretrained models might capture important information from the text. Punctuations
were removed from the text so that their embeddings do not corrupt the word vectors. These words were later fed to the trained model to calculate the embeddings.
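A small sketch of the pre-processing step described above, assuming NLTK's English stop-word list; the punctuation handling shown here is just one straightforward way to do it.

```python
import string
from nltk.corpus import stopwords   # requires: nltk.download("stopwords")

STOP = set(stopwords.words("english"))

def preprocess(text):
    """Lowercase, strip punctuation and drop stop-words; no stemming is applied."""
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    return [tok for tok in text.split() if tok not in STOP]

print(preprocess("Police find mass graves with at least 15 bodies!"))
```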
4.4 Embeddings Calculation We use an open-source library called pytorch-transformers developed by Hugging Face for getting the BERT as well as XLNet embeddings [16]. We use the pretrained models bert_base_uncased and xlnet_base_cased, which give us a 768-length vector representation for each word based on its context. We used the base models in both cases due to the lack of computational resources.
4.5 Training 4.5.1
LSTM Models
We trained both the LSTM and the TCN models with a batch size of 128 on a total of 49,972 tuples in the training set. LSTM models were trained for 20 epochs and TCN models were trained for 30 epochs. To account for the unbalanced dataset, we used a weighted loss function. We calculated the class weights using a simple function from scikit-learn [25]. The hidden size, i.e. the number of hidden units in the LSTM layers, was chosen to be 128 in all the models. After combining the LSTM encodings in different ways (depending on the architectures discussed previously), a fully connected layer with 128 units was used before making the final predictions. ReLU was used as the activation function for all the models. We trained our LSTM models using Keras [26] and PyTorch [27]. The models were trained on a machine with 16 GB RAM and an NVIDIA GTX 960M GPU. Some models were also trained on CPU.
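The class-weight computation mentioned above can be done, for example, with scikit-learn's compute_class_weight; the label array below is a made-up miniature of the four FNC classes, not the actual training labels.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy stand-in for the stance labels of the training tuples
y = np.array(["unrelated"] * 10 + ["discuss"] * 5 + ["agree"] * 3 + ["disagree"] * 2)

classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes, weights))   # e.g. passed to Keras model.fit(...)
print(class_weight)
```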
4.5.2 TCN Model
We used 100-D GloVe embeddings [9] to represent the text sequences. To capture the relation between the headline and article texts, we simply concatenated them. The words in the concatenated sequence were converted into tokens and fed to a pretrained embedding layer, and these embeddings were then fed to a TCN layer. The TCN has 128 filters with a kernel size of 12 and one stacked layer with dilations (1, 2, 4, 8, 16, 32). A dropout rate of 0.1 was used for regularization, together with ReLU activation. The output of this layer was fed to a dense layer which produces the final predictions. We use the Keras-TCN [28] PyPI package for our experiments (Fig. 1).
Fig. 1 Architecture of TCN model
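A minimal sketch of this model, assuming the Keras-TCN package exposes the constructor arguments shown (nb_filters, kernel_size, nb_stacks, dilations, padding, dropout_rate), is given below; the vocabulary size, sequence length and GloVe embedding matrix are placeholders that would come from the tokenization step.

```python
# Sketch of the TCN classifier described above, built with the keras-tcn package;
# vocab_size, max_len and the GloVe embedding matrix are assumed to be prepared earlier.
from tensorflow.keras import layers, models
from tcn import TCN  # keras-tcn package

vocab_size, max_len, embed_dim = 20000, 500, 100   # illustrative values

model = models.Sequential([
    # In the actual experiments the embedding layer would be initialised with the
    # pretrained 100-D GloVe matrix (weights=[embedding_matrix], trainable=False).
    layers.Embedding(vocab_size, embed_dim, input_length=max_len),
    TCN(nb_filters=128, kernel_size=12, nb_stacks=1,
        dilations=(1, 2, 4, 8, 16, 32),
        padding='same',            # non-causal convolutions
        dropout_rate=0.1,
        return_sequences=False),
    layers.Dense(4, activation='softmax'),  # Agree / Disagree / Discuss / Unrelated
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```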
4.6 Results
We performed multiple experiments and present an exhaustive analysis of our competitive results. We compare and report the training and testing accuracy of all three models while using both BERT and XLNet embeddings to represent the text in the headlines and articles. The test dataset was completely unseen by the models. The maximum test accuracy of 70.3% was given by the bidirectional model trained on BERT representations. Overall, BERT embeddings outperformed XLNet on all three fronts, with the bidirectional models giving the best training and test accuracy (Tables 1 and 2).

Table 1 Our results showing training accuracy with a weighted loss function and test accuracy for the LSTM architectures after training for 20 epochs

Models | Accuracy | Independent | Conditional | Bidirectional
BERT   | Training | 93.3        | 85.7        | 84.5
BERT   | Test     | 65.4        | 70.1        | 70.3
XLNet  | Training | 80.7        | 82.0        | 83.2
XLNet  | Test     | 64.3        | 67.0        | 67.8
Table 2 Results obtained after training for 20 epochs

Models | Accuracy | TCN
GloVe  | Training | 83.64
GloVe  | Test     | 71.1
On the whole, the TCN model trained on GloVe embeddings gave the best accuracy compared to the recurrent models. Thus, it can be inferred that these networks are a viable approach for obtaining competitive results in NLP tasks. Our results also align with the findings in [20] that RNN architectures with LSTM/GRU units should not be the default choice, as is currently the case in many NLP-related tasks.
5 Conclusions and Future Scope
Our results show that BERT embeddings perform significantly better than those calculated using XLNet. While this seems anomalous, we still do not completely understand what the neural network is learning and need to analyze the results further before reaching any intuitive conclusion. As per our expectations, the conditional and bidirectional conditional models perform better than the independent models in most cases. We do see slightly anomalous results for the training accuracy of the BERT independent model: upon further investigation, we found that this model began to overfit after a certain point in time. Its test accuracy, which is comparatively lower than that of the subsequent models, confirms this hypothesis. We also experimented with Temporal Convolutional Networks to see whether they can outperform canonical RNN architectures due to their larger receptive field; they gave more accurate results on the test set with lower memory use and training time. Our research and results are limited by the computing power and resources that we had access to; further research with more resources is needed to confirm the conclusion in [20] that TCNs can replace existing LSTM/GRU architectures for sequence modelling tasks. In this work, we used word embeddings for each token in the headline and article text. In the future, we could use sentence vectors to capture the meaning of the whole text in one fixed-size vector, which would reduce the computational resources required due to the reduced dimensionality of the embeddings. Such models have recently become an integral part of fake news detection systems. TCN networks can also be trained with different sets of representations such as BERT, ELMo, etc., and a comparative analysis can be performed to find the best set of models. Models with greater accuracy and faster predictions will further help in improving these systems.
References 1. S. Mohammad, S. Kiritchenko, P. Sobhani, X. Zhu, C. Cherry, SemEval-2016 Task 6: Detecting Stance in Tweets (2016). SemEval@NAACL-HLT 2. A. Zubiaga, G.W. Zubiaga, M. Liakata, R. Procter, P. Procter, Analysing how people orient to and spread rumours in social media by looking at conversational threads. PloS One (2016) 3. D. Pomerleau, D. Rao, Fake News Challenge (2017). http://www.fakenewschallenge.org/ 4. W. Ferreira, A. Vlachos, Emergent: a novel data-set for stance classification (2016). HLTNAACL 5. B. Riedel, I. Augenstein, G.P. Spithourakis, S. Riedel, A simple but tough-to-beat baseline for the fake news challenge stance detection task (2017). abs/1707.03264 6. G. Bhatt, A. Sharma, S. Sharma, A. Nagpal, B. Raman, A. Mittal, Combining Neural, Statistical and External Features for Fake News Stance Identification (2018). WWW 7. W. Largent, Talos Targets Disinformation with Fake News Challenge Victory. https://blog.tal osintelligence.com/2017/06/talos-fake-news-challenge.html 8. A. Hanselowski, P.V. Avinesh, B. Schiller, F. Caspelherr, D. Chaudhuri, C.M. Meyer, I. Gurevych, A retrospective analysis of the fake news challenge stance-detection task (2018). COLING 9. J. Pennington, R. Socher, C.D. Manning, Glove: global vectors for word representation. EMNLP (2014) 10. T. Mikolov, K. Chen, G.S. Chen, J. Chen, Efficient Estimation of Word Representations in Vector Space (2013). abs/1301.3781 11. A.K. Chaudhry, Stance Detection for the Fake News Challenge: Identifying Textual Relationships with Deep Neural Nets 12. Q.Q. Zeng, Neural Stance Detectors for Fake News Challenge (2017) 13. C. Conforti, N. Collier, M.T. Pilehvar, Towards Automatic Fake News Detection: Cross-Level Stance Detection in News Articles (2019) 14. J. Devlin, M. Chang, K. Chang, K. Toutanova, BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018). abs/1810.04805 15. M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L.S. Zettlemoyer, Deep contextualized word representations (2018). abs/1802.05365 16. Hugging Face pytorch-transformers. https://github.com/huggingface/pytorch-transformers 17. Z. Yang, Z. Dai, Y. Yang, J.G. Carbonell, R. Salakhutdinov, Q.V. Le, XLNet: Generalized Autoregressive Pretraining for Language Understanding (2019). abs/1906.08237 18. Z. Dai, Z. Yang, Y. Yang, J.G. Carbonell, Q.V. Le, R. Salakhutdinov, Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (2019). abs/1901.02860 19. I. Augenstein, T. Rocktäschel, A. Vlachos, K. Bontcheva, Stance detection with bidirectional conditional encoding. EMNLP (2016) 20. S. Bai, J.Z. Kolter, V. Koltun, An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling (2018). abs/1803.01271 21. D. Paperno, G. Kruszewski, A. Lazaridou, Q.N. Pham, R. Bernardi, S. Pezzelle, M. Baroni, G. Boleda, R. Fernández, The LAMBADA dataset: word prediction requiring a broad discourse context (2016). abs/1606.06031 22. J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation. CVPR (2015) 23. A.F. Agarap, Deep Learning using Rectified Linear Units (ReLU) (2018). abs/1803.08375 24. E. Loper, S. Bird, NLTK: The Natural Language Toolkit (2002). cs.CL/0205028 25. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. VanderPlas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: machine learning in python. J. Mach. Learn. Res. 
12, 2825–2830 (2011) 26. F. Chollet, Keras, GitHub (2015). https://github.com/fchollet/keras
27. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, Automatic differentiation in PyTorch (2017) 28. P. Rémy, Philipperemy/keras-tcn (2019). Retrieved from https://github.com/philipperemy/ker as-tcn
Updated Frequency-Based Bat Algorithm (UFBBA) for Feature Selection and Vote Classifier in Predicting Heart Disease Himanshu Sharma and Rohit Agarwal
Abstract Heart disease (HD) is a major cause of mortality and morbidity in modern society and one of the leading causes of death worldwide. Detecting HD early and preventing death is a challenging task. Medical diagnosis is highly complicated and very important, and it must be performed efficiently and with high accuracy. Healthcare professionals are assisted in heart disease diagnosis by various data mining techniques. In this work, a heart disease prediction method is introduced that consists of preprocessing, feature selection, and a learning algorithm. Important features are selected using the updated frequency-based bat algorithm (UFBBA), in which frequency values are computed from the features: the more important a feature, the higher its frequency. The features selected by the UFBBA yield better accuracy than the other classifiers considered, and they are applied to a Vote classifier. The experimentation dataset is the Cleveland heart disease dataset from the University of California, Irvine (UCI) repository. The results are measured with respect to accuracy, f-measure, precision, and recall. Keywords Data mining · Knowledge discovery in data (KDD) · Cleveland dataset · Cardiovascular disease (CD)
1 Introduction
Heart disease (HD) is a major cause of mortality and morbidity in modern society and one of the leading causes of death worldwide. Detecting HD early and preventing death is a challenging task [1]. Heart disease
detection from “various factors or symptoms is a multi-layered issue which is not free from false presumptions often accompanied by unpredictable effects” [2]. Healthcare professionals are assisted in heart disease diagnosis by various data mining techniques. The diagnosis process is simplified by collecting and recording patient data and by using the experience and knowledge of specialists who work with the symptoms of the same disease. Healthcare organizations need to provide service at low cost. Since “valuable quality service denotes the accurate diagnosis of patients and providing efficient treatment”, poor clinical decisions may lead to disasters and hence are seldom entertained. Data mining techniques give cardiovascular disease prediction a new dimension: they support efficient data collection and computer processing, and they use sophisticated mathematical algorithms to segment data and to evaluate the probability of future events. Data mining is also termed knowledge discovery in data (KDD) [7, 8]. Data mining techniques are used to obtain implicit results and important knowledge from large amounts of data, and with them users can compute new and implicit patterns from massive data. In the healthcare domain, the knowledge discovered by medical physicians and healthcare administrators can enhance diagnosis accuracy, improve the quality of surgical operations, and reduce the effect of harmful drugs [3]. KDD is defined as “the extraction of hidden, previously unknown and potentially useful information about data” [4]. Data mining technologies give a user-oriented technique for computing hidden and novel patterns in data. Healthcare administrators can use this discovered knowledge to enhance the quality of service, reduce adverse drug effects, and suggest alternative therapeutic treatments at lower cost [5]. The heart disease prediction problem can be addressed by introducing various classification methods. This research identifies the important features and data mining techniques for heart disease prediction. The dataset is collected from the University of California, Irvine (UCI) machine learning repository; the Cleveland dataset is the most commonly used one due to its completeness [6]. After feature selection, data mining classification techniques are used to create the model. The model, named Vote, is a hybrid technique formed by combining logistic regression with Naïve Bayes. This chapter introduces the updated frequency-based bat algorithm (UFBBA) to compare multiple features at a time; classification is then performed by Vote.
2 Literature Review
Wu et al. [9] studied the impact of data processing techniques and proposed various classifiers. They classified the heart disease of patients through various experiments and used data processing techniques to try to enhance accuracy; on high-dimensional datasets, highly accurate results can be produced by Naïve Bayes (NB) and logistic regression (LR). Benjamin Fredrick David and Antony Belcy [10] applied three data mining classification algorithms, RF, DT, and NB, and identified the algorithm giving the highest accuracy; the RF algorithm performed best for heart disease prediction. Haq et al. [11] used a heart disease dataset to predict heart disease with a machine learning-based diagnosis system. Seven popular classifiers and three feature selection algorithms were used to select the important features, including artificial neural networks (ANNs), k-nearest neighbor (kNN), logistic regression, support vector machine (SVM), RF, DT, NB, minimum redundancy maximum relevance (mRMR), relief, and the least absolute shrinkage and selection operator (LASSO). The feature reduction techniques achieved better performance with respect to execution time and accuracy, and the proposed machine learning-based decision support system can effectively assist the diagnosis of heart patients. Kumari and Godara [12] implemented the repeated incremental pruning to produce error reduction (RIPPER) classifier using data mining techniques. A cardiovascular disease dataset was analyzed using DT, SVM, and ANN, and the performance was measured and compared using false positive rate (FPR), error rate, accuracy, sensitivity, true positive rate (TPR), and specificity. Thomas and Princy [13] predicted heart disease with various classification techniques. Patient age, blood pressure, gender, and pulse rate were used to compute the risk level, and data mining techniques such as the DT algorithm, kNN, and Naïve Bayes were used to classify the risk level of a patient; more attributes were used to enhance accuracy. Chadha and Mayank [14] extracted interesting patterns to predict heart disease using data mining techniques. Their paper presents the methodology and implementation of techniques such as ANNs, DT, and NB and discusses the results and conclusions drawn on the basis of accuracy and time complexity.
3 Proposed Methodology
The data mining process has the following steps: preprocessing, feature selection, selection of various feature combinations, and classifier model design. In this chapter, important features are selected using the updated frequency-based bat algorithm (UFBBA), in which frequency values are computed from the features: the more important a feature, the higher its frequency. The features selected by the UFBBA yield better accuracy than the other classifiers considered. For every attribute combination, feature selection, and modeling process, the performance of every model created by attribute selection and data mining techniques is recorded. After the whole process is completed, the results are displayed. Matrix laboratory (MATLAB) is used for the implementation, with the UCI Cleveland heart disease dataset described in Table 1; the workflow of the proposed system is shown in Fig. 1.

Table 1 Description of attributes from the UCI Cleveland dataset
Attribute | Type    | Description
Sex       | Nominal | Patient gender (1 for male and 0 for female)
Age       | Numeric | Patient age in years
Cp        | Nominal | Chest pain type, described by 4 values: 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic
Fbs       | Nominal | Fasting blood sugar >120 mg/dl; 1 if true and 0 if false
Chol      | Numeric | Serum cholesterol in mg/dl
Exang     | Nominal | Angina induced by exercise
Trestbps  | Numeric | Resting blood pressure
Restecg   | Nominal | Resting electrocardiographic result, 3 values: 0 = normal, 1 = ST-T wave abnormality, 2 = probable or definite left ventricular hypertrophy by Estes' criteria
Thalach   | Numeric | Maximum heart rate achieved
Slope     | Nominal | Slope of the peak exercise ST segment: 1 = up sloping, 2 = flat, 3 = down sloping
Oldpeak   | Numeric | ST depression induced by exercise relative to rest
Num       | Nominal | Heart disease diagnosis, represented by 5 values: 0 = absence, 1–4 = presence of heart disease
Ca        | Numeric | Count of major vessels (0–3) colored by fluoroscopy
Thal      | Nominal | Heart status, 3 values: 3 = normal, 6 = fixed defect, 7 = reversible defect
Fig. 1 Workflow of proposed system
3.1 Data Preprocessing
The collected data is first preprocessed. In the Cleveland dataset, six records contain missing values; removing them reduces the record count from 303 to 297. The multiclass diagnosis value is then transformed into a binary heart disease presence value: in the original data, 0 represents absence and 1, 2, 3, 4 represent presence, so the values from 1 to 4 are converted to 1. The resulting dataset contains only the values 0 (absence of heart disease) and 1 (presence). After reduction and transformation, of the 297 records, 139 are assigned the value 1 and 158 the value 0.
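A minimal pandas sketch of this preprocessing step is shown below; the file name processed.cleveland.data and the use of '?' as the missing-value marker follow the UCI distribution of the dataset and are assumptions about the local setup.

```python
# Sketch of the preprocessing step: drop records with missing values and binarize
# the diagnosis column ("num"); column names follow the UCI Cleveland description.
import pandas as pd

df = pd.read_csv('processed.cleveland.data', header=None, na_values='?',
                 names=['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',
                        'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num'])

df = df.dropna()                         # 303 records -> 297 records
df['num'] = (df['num'] > 0).astype(int)  # 0 = absence, values 1-4 -> 1 = presence
print(df['num'].value_counts())          # expected: 158 zeros, 139 ones
```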
3.2 Feature Selection
Thirteen features are used in the prediction of heart disease, including the patient's personal information such as "sex" and "age." The remaining 11 features are clinical attributes collected from different medical examinations. In the classification experiments, various feature combinations are selected. The classification model, named Vote, is a hybrid technique formed by combining logistic regression with Naïve Bayes. The lower bound is limited by applying a brute force method. This paper introduces the updated frequency-based bat algorithm (UFBBA) to compare multiple features at a time.
3.2.1 Updated Frequency-Based Bat Algorithm (UFBBA)
First, the original bat algorithm is made of real-encoded bats. Second, for each binary-encoded bat, the fitness is computed in a hierarchical fashion. The candidate solution for a binary bat is the amalgamation of a preprocessing technique, feature selection, and a learning algorithm, adaptively selecting the best possible solution for the UCI Cleveland dataset. It is represented as {p_1, ..., p_n, f_1, ..., f_m, c_1, ..., c_l}, where p_1, ..., p_n are the preprocessing techniques, out of which the best performing technique is selected for the particular UCI Cleveland dataset; each practical combination corresponds to attributes of the dataset. f_1, ..., f_m are the features, among which only the optimal subset is selected; each position represents a feature. In the binary version of the bat algorithm, the solution consists of 0 and 1 values: a position value of 1 means that the particular feature is selected, otherwise it is not selected. Furthermore, c_1, ..., c_l are the learning algorithms, and each possible combination applies to the UCI Cleveland dataset. When the bat position is updated according to the traditional bat algorithm, the next position takes real values. To produce the binary version, a Gaussian transfer function is applied to the updated real values to bind them between 0 and 1, and a random number is generated and used as the threshold value, as represented in Eq. 1:

F(S) = 1 if 1/(1 + exp(−S_new)) > rand, and 0 otherwise   (1)
Initially, a random population is generated in the starting phase of the evaluation of the bats, and the fitness of each bat is computed in a hierarchical fashion. The dataset is preprocessed: the first n bits encode the preprocessing techniques, out of which the appropriate technique is selected; after cleaning the dataset, feature selection is performed on the cleaned data to select the optimal set of features out of the next m bits. Finally, from the l learning algorithms, the appropriate model is selected, and then the fitness of each bat is calculated. The fitness function is given in Eq. 2; it comprises two major objectives, first to maximize the classification performance and second to reduce the number of features selected.

Fitness = c1 · C_Error + c2 · (NFs/N)   (2)

where C_Error is the classification error, c1 and c2 are random numbers such that c1 + c2 = 1, NFs is the number of features selected, and N is the total number of features. After computing the fitness value of every bat, the bat with the minimum fitness is saved
for further analysis. The positions and velocities of the bats are updated within each timestamp t, as shown in Eqs. 3–5:

f_i = f_min + (f_max − f_min)β   (3)

v_i^t = v_i^(t−1) + (x_i^(t−1) − x*)f_i   (4)

x_i^t = x_i^(t−1) + v_i^t   (5)
Important features are selected using the updated frequency-based bat algorithm (UFBBA), in which the frequency values are computed from the features: the more important a feature, the higher its frequency. For every bat, in the local search a new solution is generated by a random walk:

x_new = x_old + ε·A^t   (6)

nf_i = nf_min + (nf_max − nf_min)β   (7)

where β ∈ [0, 1] is a random vector, x* is the current global best location, obtained by comparing the solutions of all n bats, ε ∈ [−1, 1] is a scaling factor, and A is the loudness parameter of the bats. For every bat, the velocity and position are updated using these equations, which yields real-encoded values. Gaussian transfer functions are used to binarize these real-encoded values, and the fitness of the new binarized solution is computed in a hierarchical manner. Every possible feature combination is tested in every technique; from the 11 attributes, a possible combination of 8 features is selected, and a data mining technique is used to test every combination.
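The following is an illustrative sketch of the binary position update and the fitness computation described in Eqs. 1–5; the population size, frequency bounds and the classification-error value are placeholders, and the full hierarchical evaluation loop is omitted.

```python
# Illustrative sketch of the binary position update and fitness used by UFBBA
# (Eqs. 1-5); bounds, weights and the classifier call are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def binarize(position_real):
    """Eq. 1: sigmoid transfer of the real-valued position, thresholded by a random number."""
    prob = 1.0 / (1.0 + np.exp(-position_real))
    return (prob > rng.random(position_real.shape)).astype(int)

def fitness(selected_mask, classification_error, c1=0.8):
    """Eq. 2: weighted sum of classification error and fraction of selected features."""
    c2 = 1.0 - c1
    return c1 * classification_error + c2 * (selected_mask.sum() / selected_mask.size)

def update(x_i, v_i, x_best, f_min=0.0, f_max=2.0):
    """Eqs. 3-5: one velocity/position update for bat i toward the global best x_best."""
    beta = rng.random()
    f_i = f_min + (f_max - f_min) * beta
    v_i = v_i + (x_i - x_best) * f_i
    x_i = x_i + v_i
    return x_i, v_i
```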
3.3 Classification Modeling Using Data Mining Techniques
After feature selection, data mining classification techniques are used to create the model. The model, named Vote, is a hybrid technique formed by combining logistic regression with Naïve Bayes. The model performance is validated using tenfold cross-validation: the entire dataset is divided into ten subsets and processed ten times, with nine subsets used for training and one subset for testing in each iteration. The results of the ten iterations are averaged to obtain the final results. Stratified sampling is used to form the subsets, so every subset has the same class ratio as the main dataset.
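A minimal scikit-learn sketch of the Vote model and its tenfold stratified validation is given below; X and y are assumed to hold the selected Cleveland features and the binary labels produced by the preprocessing step.

```python
# Sketch of the Vote model (logistic regression combined with naive Bayes) evaluated
# with stratified tenfold cross-validation; X, y are the selected features and labels.
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import StratifiedKFold, cross_val_score

vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)), ('nb', GaussianNB())],
    voting='soft')  # average the predicted class probabilities of the two models

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(vote, X, y, cv=cv, scoring='accuracy')
print(scores.mean())
```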
4 Results and Discussion
The experimental results are discussed in this section. Four performance evaluation parameters are used to measure the classification model performance: accuracy, f-measure, precision, and recall. Accuracy is the proportion of correctly predicted instances among all instances; f-measure is the weighted mean of precision and recall; precision is the percentage of correct predictions for the positive class; and recall is the percentage of true positives that are identified. These performance measures are used to identify the important features and the data mining techniques that create the best performing model, based on the measured precision and accuracy values. Significant features are identified based on the behavior of the combined features. Highly accurate models for heart disease prediction are created by analyzing data mining techniques, and the performance of the model is strongly influenced by precision and accuracy. For further analysis, the performance of each classifier is measured individually and recorded (Table 2).

Table 2 Performance comparison metrics versus CD classification methods
Methods                  | Precision (%) | Recall (%) | F-measure (%) | Accuracy (%)
NBs                      | 83.33         | 87.76      | 85.54         | 75.91
Vote                     | 89.15         | 88.46      | 88.0          | 81.47
Vote + feature selection | 90.20         | 92.74      | 91.45         | 92.74
Fig. 2 Precision results evaluation of CD classification methods
Figure 2 shows the precision results of the three classifiers, NBs, Vote, and the proposed Vote + feature selection. The results demonstrate that the proposed Vote + feature selection classifier gives the highest precision value of 90.20%, whereas the existing methods NBs and Vote give lower precision values of 83.33% and 89.15%, respectively. The recall results of the three classifiers are shown in Fig. 3: the proposed Vote + feature selection classifier provides the highest recall value of 92.74%, while NBs and Vote give lower recall values of 87.76% and 88.46%, respectively. The f-measure comparison of the three classification methods is given in Fig. 4. The results disclose that the proposed Vote + feature selection classifier provides the highest f-measure of 91.45%, whereas NBs and Vote give 85.54% and 88.00%, respectively.
Fig. 3 Recall results evaluation of CD classification methods
F-Measure (%)
92 90 88
NBs
86
Vote
84
Vote+feature selecƟon 82 NBs
Vote
Methods
Vote+feature selection
Fig. 4 F-measure results evaluation of CD classification methods
Fig. 5 Accuracy results evaluation of CD classification methods
The accuracy comparison of the three classification methods is shown in Fig. 5. It shows that the proposed Vote + feature selection classifier provides the highest accuracy of 92.74%, whereas the existing methods NBs and Vote give 75.91% and 81.47%, respectively.
5 Conclusion and Future Work
Heart disease (HD) is a major cause of mortality and morbidity in modern society and one of the leading causes of death worldwide, and detecting HD early and preventing death is a challenging task. Medical diagnosis is highly complicated and very important, and it must be performed efficiently and with high accuracy. Healthcare professionals are assisted in heart disease diagnosis by various data mining techniques: the raw data is analyzed by these techniques, and the new insights given by this analysis enable highly accurate disease prevention. In the UFBBA algorithm, the frequency values are computed from the features: the more important a feature, the higher its frequency.
The features selected by the UFBBA yield better accuracy than the other classifiers considered, and the selected features are used for classification. The experimentation dataset of the proposed system is the Cleveland dataset from the University of California, Irvine (UCI) repository. The results are measured with respect to accuracy, f-measure, precision, and recall.
6 Future Work
(1) Heart disease could be predicted from patient data collected from remote devices by an efficient remote heart disease prediction system with high accuracy. (2) Various optimization and feature selection techniques can be utilized, with more experimentation, to enhance the performance of this prediction classifier.
References 1. M. Gandhi, S.N. Singh, Predictions in heart disease using techniques of data mining, in International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE) (2015), pp. 520–525 2. S. Oyyathevan, A. Askarunisa, An expert system for heart disease prediction using data mining technique: Neural network. Int. J. Eng. Res. Sports Sci. 1, 1–6 (2014) 3. Z. Jitao, W. Ting, A general framework for medical data mining, in International Conference on Future Information Technology and Management Engineering (FITME) (2010) 4. A.K. Sen, S.B. Patel, D.P. Shukla, A data mining technique for prediction of coronary heart disease using neuro-fuzzy integrated approach two level. Int. J. Eng. Comput. Sci. 1663–1671 (2013) 5. K. Srinivas, G.R. Rao, A. Govardhan, Analysis of coronary heart disease and prediction of heart attack in coal mining regions using data mining techniques, in 5th International Conference on Computer Science & Education (2010), pp. 1344–1349 6. M.S. Amin, Y.K. Chiam, K.D. Varathan, Identification of significant features and data mining techniques in predicting heart disease. Telematics Inform. 36, 82–93 (2019) 7. J. Soni, U. Ansari, D. Sharma, S. Soni, Predictive data mining for medical diagnosis: an overview of heart disease prediction. Int. J. Comput. Appl. 17(8), 43–48 (2011) 8. S.S. Ubha, G.K. Bhalla, Data mining for prediction of students’ performance in the secondary schools of the state of Punjab. Int. J. Innov. Res. Comput. Commun. Eng. 4(8), 15339–15346 (2016) 9. C.S.M. Wu, M. Badshah, V. Bhagwat, Heart disease prediction using data mining techniques, in 2nd International Conference on Data Science and Information Technology (2019), pp. 7–11 10. H. Benjamin Fredrick David, S. Antony Belcy, Heart disease prediction using data mining techniques. ICTACT J. Soft Comput. 09, 1824–1830 (2018) 11. A.U. Haq, J.P. Li, M.H. Memon, S. Nazir, R. Sun, A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob. Inf. Syst. (2018) 12. M. Kumari, S. Godara, Comparative study of data mining classification methods in cardiovascular disease prediction. Int. J. Comput. Sci. Technol. 2(2) (2011)
13. J. Thomas, R.T. Princy, Human heart disease prediction system using data mining techniques, in 2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) (2016), pp. 1–5 14. R. Chadha, S. Mayank, Prediction of heart disease using data mining techniques. CSI Trans. ICT 4, 193–198 (2016)
A New Enhanced Recurrent Extreme Learning Machine Based on Feature Fusion with CNN Deep Features for Breast Cancer Detection Rohit Agarwal and Himanshu Sharma
Abstract Breast cancer seriously threatens the health and lives of women: among female diseases, breast cancer ranks first in morbidity and second in mortality. Breast cancer mortality can be reduced by effective lump detection in the early stages. A mammogram-based computer-aided diagnosis (CAD) system enables the early detection, diagnosis and treatment of breast cancer, but the available CAD systems produce unsatisfactory results. A feature fusion-based breast CAD method that uses the deep features of a convolutional neural network (CNN) is proposed in this work. In the first stage, mass detection based on CNN deep features is proposed, with clustering performed by the enhanced recurrent extreme learning machine (ERELM) method. The recurrent extreme learning machine (RELM) is used for prediction, and the gray wolf optimizer (GWO) is used to optimize its weights. Deep, morphological, density and texture features are extracted in the next stage, and the malignant and benign breast masses are classified by an ERELM classifier built on the fused feature set. The proposed classification technique produces high efficiency and accuracy. Keywords Computer-aided diagnosis · Gray wolf optimizer · Mass detection · Deep learning · Recurrent extreme learning machine · Fusion feature
1 Introduction
Breast cancer is one of the most common causes of death in women. According to American Cancer Society statistics, 40,610 women in the USA were expected to die of breast cancer in 2017, and about 3.1 million women in the USA were living with breast cancer as of March 2017.
In clinical diagnosis, suspicious breast abnormalities are assessed using various imaging modalities such as breast ultrasound, dynamic contrast-enhanced magnetic resonance and diagnostic mammography. By interpreting these images, radiologists can avoid unnecessary biopsies. Computer-aided diagnosis (CADx) techniques are used to assist radiologists in this interpretation and enhance the accuracy of suspicious breast region detection [1]. The challenges in early breast cancer detection are being addressed by deep learning (DL) methods [2, 3], enabled by advances in computational technologies, image processing techniques and significant progress in machine learning. Recent CAD systems use DL and convolutional neural networks because of these advancements [4, 5], and CNNs give more accurate quantitative analysis than traditional CAD [6, 7]. Recent advances in DL methods have reduced the error rate by about 85% [8]. Recent CNN models help radiologists identify small breast cancers, and the lesion descriptions generated by a CNN help radiologists make accurate decisions [9]. In the future, radiologists may be able to read mammograms (MG) independently using CNN advancements. In this work, feature extraction is performed by a CNN and clustering by ERELM, with the mammogram divided into various sub-regions. A fused set of deep, morphological, density and texture features is used in the classification stage. The process of selecting important features determines the accuracy of the breast CAD system. After feature extraction, a classifier distinguishes malignant from benign breast masses; the proposed work uses ERELM as the classifier, which produces accurate results in the classification of multi-dimensional features.
2 Related Work
Zhang et al. [9] used a rejection option in a two-stage cascade framework based on random subspace ensembles and various image descriptors. A one-class classifier ensemble is used on the dataset in [10]. Bahlmann et al. [11] formed E and H intensity channels by transforming the color RGB image; twenty-two features were extracted and used in the classifier. Deep convolutional networks designed for image classification form an important category of machine learning models [12]; the latest CNN versions require long execution times [13]. An artificial neural network is used for the prediction and classification of breast cancer in [14]. Various segmentation techniques are derived from the wavelet transform: the wavelet transform is used for breast cancer detection, its degree of localization varies with scale, and it produces images at different scales [15]. Other methods such as C-means clustering have also been used, and it has been suggested that combining it with a genetic algorithm gives better segmentation efficiency for the extraction and detection of the affected region [16]. Researchers have also proposed other state-of-the-art algorithms [17–20] for breast cancer detection.
3 Proposed Methodology
This work uses the following five steps for breast cancer detection: breast image pre-processing, mass detection, feature extraction, training data generation and classifier training. A contrast enhancement algorithm is employed to increase the contrast between suspected masses and the surrounding tissue. Mass detection localizes the ROI, from which morphological, density, texture and deep features are extracted. During training, images from the breast image dataset are used to train the classifier with the extracted features and their labels, and the well-trained classifier is then used to evaluate a mammogram. The entire diagnosis process is shown in Fig. 1.
3.1 Breast Image Pre-processing
The noise in the original mammogram is eliminated by an adaptive mean filter algorithm to reduce the effect of noise on the subsequent analysis. A sliding window of fixed size is moved across the image, and the noise content in the window is estimated by calculating variance, mean and spatial correlation values. If the central pixel of the selected window is noisy, it is replaced by the mean value. A contrast enhancement algorithm is then employed to increase the contrast between suspected masses and the surrounding tissue: the image histogram is made uniformly distributed and the gray-scale range is enlarged, which enhances the contrast and produces clearer image details.
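As an illustration, the OpenCV sketch below applies a plain sliding-window mean filter followed by histogram equalization; it stands in for the adaptive mean filter described above (which replaces only the pixels judged noisy), and the file name is a placeholder.

```python
# Illustrative sketch of the pre-processing stage: a box (mean) filter for noise
# suppression followed by histogram equalization for contrast enhancement.
import cv2

img = cv2.imread('mammogram.png', cv2.IMREAD_GRAYSCALE)  # placeholder file name

denoised = cv2.blur(img, (3, 3))        # simple sliding-window mean filter
enhanced = cv2.equalizeHist(denoised)   # spread the gray-level histogram

cv2.imwrite('mammogram_preprocessed.png', enhanced)
```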
Fig. 1 Flowchart of the proposed mass detection process
3.2 Mass Detection
In mass detection, mass regions are separated from normal tissue; accurately extracted features produce precise mass segmentation. A mass detection method is proposed that uses the deep features of CNN sub-regions together with the US-ELM clustering technique. After pre-processing, the ROI is extracted from the image and divided by a sliding window into various non-overlapping sub-regions. Whether all sub-regions have been traversed is then checked: if not, deep features are extracted from the next sub-region; otherwise, the deep features are clustered. The mass detection process is completed by obtaining the boundaries of the mass area.
3.3 Extract ROI
A mammogram contains a large number of pixels with gray value 0, which carry no useful information for breast CAD. The background area of the mammogram therefore has to be separated from the ROI to enhance the efficiency of mammary image processing and to ensure diagnosis accuracy. In this work, the breast mass region is extracted using an adaptive mass region detection algorithm: the mammogram is scanned sequentially and the first and last non-zero pixels in every row and column are computed; they are denoted by x_s, x_d, y_s and y_d.
3.4 Partition Sub-region
This section describes the technique used to divide the ROI into several non-overlapping sub-regions. The search area for computing masses is fixed to the rectangular area [x_s, x_d, y_s, y_d] of the ROI, where W = x_d − x_s gives the length of the search rectangle and H = y_d − y_s gives its width. A sliding window of length w and width h is used to segment the search area: the window is moved over the rectangle with a certain step size, splitting the ROI into non-overlapping sub-regions of equal size w × h, which are then used for feature extraction. A 48 × 48 sliding window with a step size of 48 is used in this work, so the ROI is divided into N non-overlapping sub-regions (s_1, s_2, ..., s_N).
3.5 Extract Deep Features Using CNN
In this work, deep features are extracted from the sub-regions of the ROI using a CNN. The 48 × 48 image sub-regions captured in the previous steps are given as input to the CNN. An input image of dimension 48 × 48 × 3 is fed to a convolutional layer with 12 kernels, giving an output of dimension 40 × 40 × 12:

Conv^k(i, j) = Σ_{u,v} W^{k,l}(u, v) · input^j(i − u, j − v) + b^{k,l}

where W^{k,l} is the kth kernel and b^{k,l} is the bias of the kth layer. tanh is used as the activation function, and its output lies in the range [−1, 1]:

Output^k(i, j) = tanh(Conv^k(i, j))

A max pooling layer is connected to the output of the first convolutional layer, and further convolutional and max pooling layers are connected to one another until the output size reaches 2 × 2 × 6. The fully connected layer therefore has 2 × 2 × 6 = 24 neurons, which are used in the clustering analysis.
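A Keras sketch of a CNN consistent with the sizes quoted above (a 48 × 48 × 3 input, 12 kernels giving a 40 × 40 × 12 map, and a final 2 × 2 × 6 map, i.e., 24 deep features) is shown below; the intermediate kernel sizes and filter counts are assumptions, since the text does not specify them fully.

```python
# Sketch of a deep-feature extractor consistent with the quoted layer sizes;
# intermediate kernel sizes and filter counts are assumptions.
from tensorflow.keras import layers, models

feature_extractor = models.Sequential([
    layers.Conv2D(12, (9, 9), activation='tanh', input_shape=(48, 48, 3)),  # -> 40x40x12
    layers.MaxPooling2D((2, 2)),                                            # -> 20x20x12
    layers.Conv2D(12, (9, 9), activation='tanh'),                           # -> 12x12x12
    layers.MaxPooling2D((2, 2)),                                            # -> 6x6x12
    layers.Conv2D(6, (5, 5), activation='tanh'),                            # -> 2x2x6
    layers.Flatten(),                               # -> 24 deep features per sub-region
])
feature_extractor.summary()
```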
3.6 Clustering Deep Features Using ERELM
The features extracted by the CNN architecture are clustered using the ERELM algorithm, with the number of clusters set to 2, forming two categories of feature sub-regions: suspicious and non-suspicious mass areas. Supervised learning does not produce satisfactory results if the training data is very small; this can be improved by semi-supervised learning, which can also perform clustering. The internal relationships of an unlabeled dataset can be computed using the ERELM algorithm, a semi-supervised learning algorithm. The deep feature matrix X is given as input to the algorithm, and feature clusters are produced at the output. From the training set X, the Laplacian operator L is constructed, and the output matrix of the hidden-layer neurons is generated randomly. If the number of input neurons is greater than the number of hidden neurons, the output weights are computed from the expression

min_{β ∈ R^(nh × n0)} ||β||^2 + λ Tr(β^T H^T L H β)

where β represents the weights between the hidden and output layers. Otherwise, the output weights are computed from (I_0 + λ H^T L H)v = γ H^T H v. The embedding matrix is computed next, and the N points are classified into K categories using the k-means algorithm.
3.7 Enhanced Recurrent Extreme Learning Machine with GWO
For medical diagnosis, this study proposes a new computational framework, GWO-ERELM, which has two phases. In the first phase, GWO filters out irrelevant and redundant information by adaptively searching for the best combination of features in the medical data. In the proposed IGWO, the initial positions of the population are generated by a GA, and the current positions of the population are updated by GWO in the discrete search space. The first phase yields an optimal subset of features, on which ERELM classification, an efficient and effective technique, is performed. The proposed procedure has three steps.
Step 1: Finalize the parameters of the optimal network, such as the context neurons and the neuron count of the network approximation function. The biases and weights are first optimized by employing the recurrent extreme learning machine (RELM) with the GWO learning algorithm; this optimization improves the prediction accuracy.
Step 2: The prediction accuracy of ERELM is calculated using RMSE and R2. The proposed technique calculates the RMSE, MAE and MSE used as prediction measures, as shown below:

MSE = (1/N) Σ_{t=1}^{N} (X(t) − X̂(t))^2

RMSE = sqrt( (1/N) Σ_{t=1}^{N} (X(t) − X̂(t))^2 )

MAE = (1/N) Σ_{t=1}^{N} |X(t) − X̂(t)|

where t is the current iteration, N is the number of samples, X is the actual value and X̂ is the predicted value. In ELM, the biases and weights are selected randomly [21]. Simulation shows that minimizing the prediction error depends on these bias and weight values, so the GWO metaheuristic is used to optimize the weights: the input data, the network structure and the learning rate are given to the search algorithm, and GWO searches for the best weight and bias values. The mechanism of prediction is shown in Algorithm 1.
1. Input: Original image dataset with N image samples and the objective function
2. Output: Segmented part with the desired predicted value
3. Begin
4. Assign the input weights w_i and receive the biases b after optimization by GWO
5. Compute the output matrix H of the hidden layer, where H = [h_ij], i = 1, ..., N, j = 1, ..., K, and h_ij = g(w_j · x_i + b_j)
6. Compute the output weight matrix β = H^+ T, where H^+ is the Moore-Penrose generalized inverse of the matrix H
7. Feed the updated weights to the hidden and input layer context neurons
8. End
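A minimal NumPy sketch of steps 4–6 of Algorithm 1 is given below; here the input weights and biases are drawn at random, whereas in the proposed method they would be supplied by the GWO search, which is omitted.

```python
# Minimal sketch of the ELM steps in Algorithm 1: input weights/biases, hidden-layer
# output H, and output weights from the Moore-Penrose pseudo-inverse (beta = H^+ T).
# In the proposed method W and b would come from GWO instead of random initialisation.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, T, n_hidden=100):
    """X: (N, d) fused feature matrix, T: (N, c) one-hot target matrix."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # input-to-hidden weights
    b = rng.normal(size=(n_hidden,))              # hidden biases
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                  # output weights: beta = H^+ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```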
3.8 Classify Benign and Malignant Masses Based on Features Fused with CNN Features
This subsection proposes a fused deep feature-based ELM classifier for diagnosis. A CNN is used to extract deep features, and density, texture and morphological features are extracted from the mass area. The fused features are classified using an ELM classifier, which yields the malignant and benign results.
Feature Modeling: In the clinical domain, breast masses indicate the early stage of breast disease, and there are two classes of masses: benign and malignant. The most important mass properties are represented by the deep features extracted with a CNN. According to the experience of doctors, a malignant mass in mammography has the following characteristics: an irregular shape with blurred edges, an unsmooth surface possibly containing hard nodules, and an intensity that differs from that of the surrounding tissues. A benign mass in mammography has the following characteristics: regular shape and edges, a smooth surface without accompanying nodules, and a uniform distribution of density. The fused features can be modeled as F = [F_1, F_2, F_3, F_4], where F_1 denotes deep features, F_2 morphological features, F_3 texture features and F_4 density features.
Classification: A feed-forward neural network with a single hidden layer, termed ERELM, is proposed. The algorithm has better generalization performance, its performance is not affected by manual parameter setups, and it has high learning speed. In this work, the malignant and benign breast cancer results are obtained using the ERELM classifier.
4 Experiment Results and Discussion
This section analyzes the effectiveness of the fusion feature-based breast cancer diagnosis and of the US-ELM- and CNN-based detection methods.
Fig. 2 Result of precision rate
The dataset used for experimentation contains 400 mammograms, with 200 benign and 200 malignant images. To evaluate the experimental results, accuracy, f-measure, precision and recall are compared across the ELM, RELM and ERELM methods.
4.1 Precision Rate Comparison
Figure 2 compares the precision of the evaluated methods for different numbers of images in the specified dataset. As the number of images increases, the precision value also increases. The graph shows that the proposed ERELM provides higher precision than the previous methods and therefore produces better detection results.
4.2 Recall Rate Comparison
Figure 3 compares the recall of the ELM, RELM and ERELM methods for different numbers of images in the specified dataset. As the number of images increases, the corresponding recall value also increases. The graph shows that the proposed ERELM provides higher recall than the previous methods.
Fig. 3 Result of recall rate
The reason is that GWO produces the optimal ERELM parameters, which improves the breast cancer detection results.
4.3 F-Measure Rate Comparison
Figure 4 compares the f-measure of the ELM, RELM and ERELM methods for different numbers of images in the specified dataset. As the amount of data increases, the f-measure value increases correspondingly. The graph shows that the proposed ERELM provides a higher f-measure than the previous methods. Thus, the proposed ERELM algorithm is superior to the existing algorithms, partly because the image pre-processing improves the breast cancer detection results beyond those of the existing ELM and RELM methods.
Fig. 4 Result of f-measure rate
Fig. 5 Result of processing time
4.4 Accuracy Comparison
Figure 5 compares the ELM, RELM and ERELM methods for different numbers of images in the specified dataset; the x-axis shows the number of images and the y-axis shows the accuracy value. The graph shows that the proposed ERELM also requires lower processing time than the previous ELM and RELM methods. Thus, the proposed ERELM algorithm is superior to the existing algorithms, giving better cancer detection results with a high accuracy rate.
5 Conclusion and Future Work
In this work, fused deep features are used to propose a breast CAD system. Deep features are extracted with a CNN and applied to mass detection and diagnosis: US-ELM clustering of the CNN deep features of the sub-regions is used in the mass detection stage, and malignant and benign breast masses are classified using the ELM classifier in the mass diagnosis stage. A fused set of deep, morphological, density and texture features is used in the classification stage. The process of selecting important features determines the accuracy of the breast CAD system. After feature extraction, a classifier distinguishes malignant from benign breast masses; the proposed work uses ERELM as the classifier, which produces accurate results in the classification of multi-dimensional features. In the future, this method can be applied to practical problems and implemented in a parallel manner.
References 1. M.L. Giger, N. Karssemeijer, J.A. Schnabel, Breast image analysis for risk assessment, detection, diagnosis, and treatment of cancer. Annu. Rev. Biomed. Eng. 15, 327–357 (2013) 2. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015) 3. G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Bejnordi, F. Ciompi, M. Ghafoorian, et al., A survey on deep learning in medical image analysis (2017) 4. R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, R.X. Gao, Deep Learning and Its Applications to Machine Health Monitoring: A Survey (2016) 5. J.G. Lee, S. Jun, Y.W. Cho, H. Lee, G.B. Kim, J.B. Seo et al., Deep learning in medical imaging: general overview. Korean J Radiol. 4(18), 570–584 (2017) 6. M.A. Hedjazi, I. Kourbane, Y. Genc, On identifying leaves: a comparison of CNN with classical ML methods, in Signal Processing and Communications Applications Conference (SIU) 2017 25th (IEEE, 2017), pp. 1–4 7. T. Kooi, A. Gubern-Merida, J.J. Mordang, R. Mann, R. Pijnappel, K. Schuur, et al., A comparison between a deep convolutional neural network and radiologists for classifying regions of interest in mammography, in International Workshop on Digital Mammography (Springer, 2016), pp. 51–56
8. J. Wang, H. Ding, F. Azamian, B. Zhou, C. Iribarren, S. Molloi, et al., Detecting cardiovascular disease from mammograms with deep learning. IEEE Trans. Med. Imaging (2017) 9. R. Platania, S. Shams, S. Yang, J. Zhang, K. Lee, S.J. Park, Automated breast cancer diagnosis using deep learning and region of interest detection (BC-DROID, in Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (ACM, 2017), pp. 536–543 10. Y. Zhang, B. Zhang, F. Coenen, W. Lu, Breast cancer diagnosis from biopsy images with highly reliable random subspace classifier ensembles. Mach. Vision Appl. 24(7), 1405–1420 (2013) 11. Y. Zhang, B. Zhang, F. Coenen, W. Lu, J. Xiao, One-class kernel subspace ensemble for medical image classification. EURASIP J. Adv. Signal Process. (1), 17 (2014) 12. A. Krizhevsky, L. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 1097–1105 (2012) 13. T. Araújo, G. Aresta, E. Castro, J. Rouco, P. Aguiar, C. Eloy, A. Campilho, Classification of breast cancer histology images using convolutional neural networks. PLoS ONE 12(6), e0177544 (2017) 14. I. Saratas, Prediction of breast cancer using artificial neural networks. J. Med. Syst. 36(5), 2901–2907 (2012) 15. S. Bagchi, A. Huong, Signal processing techniques and computer-aided detection systems for diagnosis of breast cancer-a review paper. Indian J. Sci. Technol. 10(3) (2017) 16. S. Sharma, M. Kharbanda, G. Kaushal, Brain tumor and breast cancer detection using medical images. Int. J. Eng. Technol. Sci. Res. 2 (2015) 17. C. Bahlmann, A. Patel, J. Johnson, J. Ni, A. Chekkoury, ParmeshwarKhurd, A. Kamen, L. Grady, E. Krupinski, A.Graham, et al., Automated detection of diagnostically relevant regions in H&E stained digital pathology slides, in SPIE Medical Imaging (International Society for Optics and Photonics, 2012), pp. 831504–831504 18. P. Gu, W.-M. Lee, M.A. Roubidoux, J. Yuan, X. Wang, P.L. Carson, Automated 3d ultrasound image segmentation to aid breast cancer image interpretation. Ultrasonics 65 (2016) 19. F. Strand, K. Humphreys, A. Cheddad, S. Törnberg, E. Azavedo, J. Shepherd, P. Hall, K. Czene, Novel mammographic image features differentiate between interval and screen-detected breast cancer: a case-case study. Breast Cancer Res. 18(1) (2016) 20. H.D. Cheng, J. Shan, W. Ju, Y. Guo, L. Zhang, Automated breast cancer detection and classification using ultrasound images: a survey. Pattern Recogn. 43(1), 299–317 (2010) 21. R.L. Siegel, K.D. Miller, A. Jemal, Cancer statistics. CA Cancer J. Clin. 66(1), 7–30 (2016)
Deep Learning-Based Severe Dengue Prognosis Using Human Genome Data with Novel Feature Selection Method Aasheesh Shukla and Vishal Goyal
Abstract In recent years, the number of patients affected by dengue fever has increased. Outbreaks of dengue fever can be prevented by taking measures in the initial stages, and countries with high disease incidence require early diagnosis of dengue. To develop a prediction model, the outbreak mechanism should be clarified and appropriate precautions must be taken. Rising temperature, sea surface temperature, rapid urbanization and increased rainfall due to global warming are interacting factors that influence outbreaks, and human travel and displacement driven by urbanization and temperature increases cause dengue virus-infected mosquitoes to spread. Highly accurate classification can be achieved by deep learning methods, which are versatile, require only a small amount of tuning and produce highly interpretable outputs. The factors determined by these methods differentiate healthy subjects from dengue patients and are also used to visualize them; they increase the stability and accuracy of the boosting process in the construction of a dengue disease survivability prediction model and reduce overfitting problems. The proposed method can be incorporated into applications such as decision support systems in healthcare, tailored health communication and risk management. Keywords Machine learning · Data mining · Deep learning · Dengue virus · Particle swarm optimization · Feature selection
1 Introduction As per the records of the Union Health Ministry, over 80 people have been claimed by dengue and around 40,000 people are affected by dengue in our country. Till A. Shukla (B) · V. Goyal Department of Electronics & Communication Engineering, GLA University, Mathura 281406, India e-mail: [email protected] V. Goyal e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_43
September 30, the disease has claimed the lives of 83 people, as stated by the National Vector Borne Disease Control Programme (NVBDCP) under the Health Ministry. Last year, 325 people were killed by the mosquito-borne tropical disease. In Kerala, dengue claimed 35 people till 30 September and around 3,660 were affected in the state. In Maharashtra, 18 people were killed and 4,667 people were affected. It is very important to identify the disease in its initial stages, which is required to avoid unnecessary hospitalization of patients. The human immune system comprises adaptive and innate responses and is the first line of defence in the human body. Innate immune responses are triggered by pattern recognition receptors (PRRs). PRRs are present in antigen-presenting cells (APCs) and include Fc-receptors, toll-like receptors, lectins and complement receptors. Engagement of these receptors activates various forms of APC responses. The genetic background defines the adaptive and innate pathways. Associations between single nucleotide polymorphisms (SNPs) in multiple genes and the dengue infection phenotype have been computed. These genes include transporter associated with antigen processing (TAP), cytotoxic T lymphocyte-associated antigen-4 (CTLA-4), dendritic cell-specific intercellular adhesion molecule 3 (ICAM-3)-grabbing nonintegrin (DC-SIGN), vitamin D receptor (VDR), acute plasma glycoprotein mannose-binding lectin (MBL), human platelet-specific antigens (HPA), Fc gamma receptor IIA (FcγRIIa, a pro-inflammatory regulatory Fc receptor), cytokines (IL, IFN, TNF and others) and human leukocyte antigen (HLA) genes [1]. Classification and regression trees (CART), support vector machines (SVM) and linear discriminant analysis (LDA) classification algorithms have been used in ML-based dengue research, but no methodology predicts dengue severity based on genomic markers. In this study, PSO-based SNP feature selection and deep learning for prediction are proposed. This chapter is organized as follows: related works are reviewed in Sect. 2, and the proposed methodology is presented in Sect. 3. Section 4 discusses experimental results, and Sect. 5 presents the conclusion and future work.
2 Related Work The authors of [2] extract disorder mentions from clinical data by entity recognition, together with time expressions and other features. Their proposed model predicts the absence or presence of dengue, and frequency analysis is performed to correlate the manifestation of dengue symptoms with the occurrence of dengue. Multivariate Poisson regression, a statistics-based technique, is proposed in [3]. It is a well-established statistical methodology used to find relationships between linear parameters, while hidden knowledge is computed by data mining techniques.
A real-time dengue risk prediction model for small areas is proposed in [4]; it gives early warning instead of relying on a traditional statistical model. A decision tree algorithm is used in [5] to predict occurrences of dengue disease in a tribal area. Gomes et al. [6] used a radial basis function (RBF) kernel and gene expression data. Guo et al. [7] applied support vector regression (SVR) and compared various machine learning techniques. In a more recent work, Carvajal et al. [8] studied the incidence of dengue in the Philippines using meteorological factors, comparing several machine learning techniques such as general additive modelling, seasonal autoregressive integrated moving average with exogenous variables, random forest and gradient boosting. Fathima et al. [9] used random forests to discover significant clinical factors that predict dengue infection prognosis. Khan et al. [10] used traditional statistical techniques to identify the factors that predict severe dengue. Potts et al. [11] showed the results of applying traditional statistical models to the prediction of dengue severity in Thai children using early laboratory tests. Caicedo et al. [12] compared the performance of several machine learning algorithms for early dengue severity prediction in Colombian children. Lee et al. [13] used ML models, namely multivariate logistic regression and decision trees, for the specific case of differential diagnosis between dengue and chikungunya; they employed an adult cohort and included clinical symptoms. Laoprasopwattana et al. [14] conducted a prospective study on a small cohort of children in southern Thailand; under a standard logistic regression model, their study made around 70.6% correct predictions and showed 83.3% specificity. Paternina-Caicedo et al. [15] performed differential diagnosis using a decision tree on a dataset collected from children under 24 months and produced interesting results.
3 Proposed Methodology In the proposed method, deep learning is used for dengue prognosis. There are four stages in the proposed method: (i) data acquisition, (ii) data pre-processing, (iii) feature selection and (iv) patient classification. Figure 1 shows the entire process of the proposed method.
3.1 Patient Cohort and Clinical Definitions In this study, the database of patients with dengue symptoms was collected from hospitals in Recife, Brazil. The patients were given an explanation
Fig. 1 Deep learning-based dengue prognosis (pipeline: genome data → feature extraction and normalization → feature selection using PSO → deep learning → prognostic results)
of the proposed technique, and consent was obtained from willing patients. The ethics committee of FIOCRUZ reviewed and approved this study. Patients' blood was processed at the laboratory to obtain the results of the following tests: white blood cell count, platelet count, hemogram, serum albumin, serum alanine transaminase (ALT), serum aspartate transaminase (AST) and haematocrit. Table 2 summarizes the cohort information of the patients; no age bias was observed between the DF and SD groups. (1) Peripheral blood mononuclear cell (PBMC) isolation and genomic DNA extraction (features F): Centrifugation at 931 g for 30 min on a Ficoll-Paque PLUS gradient (Amersham Biosciences, Uppsala, Sweden) was used to isolate PBMC from blood. Mononuclear cells were collected from the interface and washed cold with phosphate-buffered saline. The supernatant was discarded after centrifugation at 335 g for 15 min. To remove residual red blood cells, the pellet was washed in ammonium-chloride-potassium (ACK) lysing buffer. PBMC were re-suspended in supplemented culture medium and cryopreserved at −80 °C. DNA was extracted from patients' PBMC using the Wizard DNA extraction kit following the protocol defined by the manufacturer. (2) The Illumina Golden Gate Genotyping Kit (Cat# GT-901-1001) was used to genotype the selected dengue patients. This protocol employs an allele-specific extension method, and the PCR amplification reaction is conducted at a high multiplex level. High-quality BeadChips and a multiplexed assay are combined in a precise scanning system. BeadChip carriers are placed in the imaging system tray to process the genome data, and a BeadArray Reader images the BeadChips using two high-resolution laser imaging channels that scan the BeadChips simultaneously to generate two images. The system has high throughput and data quality.
3.1.1 Data Pre-processing
The genome data contain the genotypes of 102 patients, measured at 322 polymorphic loci. The data are encoded as indicators for heterozygous, homozygous recessive and homozygous dominant genotypes, producing one feature per SNP genotype. Missing data form an additional category.
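As an illustration of this encoding step, the following sketch one-hot encodes a genotype table with pandas; the column names and genotype codes are hypothetical, since the paper does not publish its data layout.

import pandas as pd

# Hypothetical genotype table: one row per patient, one column per SNP locus,
# values coded as 'AA' (homozygous dominant), 'Aa' (heterozygous),
# 'aa' (homozygous recessive) or missing (NaN).
geno = pd.DataFrame({
    "SNP_001": ["AA", "Aa", None, "aa"],
    "SNP_002": ["aa", "aa", "Aa", None],
})

# Treat missing values as their own category, then build indicator features
# (one binary column per SNP genotype category).
encoded = pd.get_dummies(geno.fillna("missing"), prefix_sep="=")
print(encoded.head())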
3.2 Feature Selection In order to avoid overfitting of the classifier, dimensionality reduction is needed, which is also due to the small sample size [16]. The PSO algorithm is used for feature selection, which in turn reduces the dimensionality.

Algorithm 1. PSO-based Feature Selection
Input: D — dataset, M — features, N — instances, SWARMSIZE — number of gene subsets, pbestloc — gene subset's personal best position, pbest — fitness value associated with pbestloc, gbestloc — entire swarm's best position, gbest — fitness value corresponding to gbestloc.
Output: Feature subset of size m
1. n random subsets of genes are initialized and treated as particles; there are m features in each subset.
2. For every random subset, velocity and position are initialized.
3. The swarm's fitness value is computed as follows:
4. Using Eq. (1), the fitness value is set as the squared error, fitness = Σ_i (y_i − ŷ_i)^2    (1)
5. For every subset, pbestloc and pbest are initialized using the initial fitness value.
6. Repeat
7.   if the pbest value of a subset is greater than its current fitness then
8.     the subset's pbestloc and pbest are updated
9.   end if
10.  Among all subsets, the position with minimum fitness value is found and gbestloc, gbest are set.
11.  for i = 0 to SWARMSIZE − 1 do
12.    Using Eq. (2), the new velocity is computed as v_i = w·v_i + c1·r1·(pbestloc_i − x_i) + c2·r2·(gbestloc − x_i)    (2)
13.    Using Eq. (3), the location is updated as x_i = x_i + v_i    (3)
14.  end for
15.  The squared error is computed and the gene subset's current location is used for setting the fitness value.
16. Repeat until reaching the maximum number of iterations.
17. The best subset of genes is output by gbestloc.
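A minimal sketch of this PSO-driven feature subset search is given below, assuming a continuous particle position per feature that is thresholded into a subset and a simple least-squares fit as the fitness surrogate; the swarm parameters (w, c1, c2) are illustrative choices, not values reported in the paper.

import numpy as np

def pso_feature_selection(X, y, n_select=20, swarm_size=10, iters=50,
                          w=0.7, c1=1.5, c2=1.5, seed=0):
    """Return indices of a feature subset found by a simple PSO search."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def fitness(position):
        # Keep the n_select features with the largest position values.
        idx = np.argsort(position)[-n_select:]
        Xs = np.c_[np.ones(len(y)), X[:, idx]]           # add bias column
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)    # least-squares fit
        return np.sum((y - Xs @ coef) ** 2)              # squared error (Eq. 1)

    pos = rng.random((swarm_size, n_feat))
    vel = np.zeros_like(pos)
    pbest_loc, pbest = pos.copy(), np.array([fitness(p) for p in pos])
    gbest_loc, gbest = pbest_loc[pbest.argmin()].copy(), pbest.min()

    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest_loc - pos) + c2 * r2 * (gbest_loc - pos)  # Eq. (2)
        pos = pos + vel                                                            # Eq. (3)
        fit = np.array([fitness(p) for p in pos])
        better = fit < pbest
        pbest[better], pbest_loc[better] = fit[better], pos[better]
        if fit.min() < gbest:
            gbest, gbest_loc = fit.min(), pos[fit.argmin()].copy()

    return np.argsort(gbest_loc)[-n_select:]

# Toy usage with random SNP-like indicator data of the same shape as the study (102 x 322).
X = np.random.default_rng(1).integers(0, 2, size=(102, 322)).astype(float)
y = np.random.default_rng(2).integers(0, 2, size=102).astype(float)
print(sorted(pso_feature_selection(X, y, n_select=10)))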
3.3 Classification Based on Deep Learning (DL) The DL model is trained using 37 points from the original dataset, and the test set is formed using 25 samples. This study proposes pretrained deep learning architectures based on transfer learning. Prior to the training phase, various parameters of the CNN algorithm have to be defined [17], and the classification results are influenced by these parameters depending on the application. The set of parameters with high influence is identified by performing a prior random
test. In a full factorial experiment, these identified factors are analysed statistically to obtain the best results. The other, less influential parameters of the algorithm are left at their default values.
3.3.1 Convolutional Neural Network
Deep learning is a machine learning technique inspired by the deep structure of the mammalian brain [18]. The deep structure is characterized by multiple hidden layers, which allows features to be represented at different levels of abstraction. Deep networks can be trained layer by layer with an unsupervised, greedy single-layer training algorithm, and various deep networks are trained by this method due to its effectiveness. The convolutional neural network is a more powerful deep network with several hidden layers; its layers perform subsampling and convolution to extract high- and low-level features from the input. There are three layer types in the network: fully connected, pooling and convolutional layers. The following section explains every layer in detail.
3.3.2 Convolution Layer
In this layer, a kernel of size a ∗ a is convolved with an input image of size R ∗ C. The kernel is convolved with every block of the input matrix independently to generate an output pixel, and convolving the kernel with the input image generates N features of the output image. The kernel of the convolutional matrix is referred to as a filter, and the feature obtained by convolving the input data with the kernel, of size N, is referred to as a feature map. A CNN has several convolutional layers; the input and output of a convolutional layer are feature vectors. Every convolutional layer has n filters with which the input is convolved, and in the convolution operation the number of filters equals the depth of the generated feature maps. Each filter can be regarded as responding to a specific feature at a certain attribute of the dataset. C_j^(l) denotes the output of the lth convolution layer, which contains the feature maps and can be calculated as

C_i^(l) = B_i^(l) + Σ_{j=1}^{a_i^(l−1)} K_{i,j}^(l−1) ∗ C_j^(l−1)    (4)
where B_i^(l) is the bias matrix and K_{i,j}^(l−1) is the convolution filter or kernel of size a ∗ a, which connects the jth feature map in layer (l − 1) with the ith feature map in layer l. The output contains the feature maps. In Eq. (4), the first convolutional layer takes the input space, that is, C_i^(0) = X_i.
A feature map is generated by this kernel. After the convolutional layer, the outputs are transformed nonlinearly by applying an activation function:

Y_i^(l) = Y(C_i^(l))    (5)

where Y_i^(l) is the output of the activation function and C_i^(l) is its input. Sigmoid, rectified linear units (ReLUs) and tanh are commonly used activation functions. ReLU is used in this work and is denoted as Y_i^(l) = max(0, C_i^(l)). This function is commonly used in deep learning models because it reduces interaction and nonlinear effects. If ReLU receives a negative input, the output is set to 0; if it receives a positive value, the same value is returned. Faster training is achieved because the error derivatives do not saturate; in the saturation region of other activation functions they become very small and the weight updates vanish, which is termed the vanishing gradient problem. Full Connection: The final layer of a CNN is a traditional feed-forward network with one or more hidden layers. The softmax activation function is used by the output layer:

y_i^(l) = f(z_i^(l)),   z_i^(l) = Σ_{i=1}^{m_i^(l−1)} w_i^(l−1) y_i^(l−1)    (6)
where w_i^(l−1) are the weights. The fully connected layer tunes these weights to form a representation of every class. f is the transfer function, which provides the nonlinearity; in fully connected layers the nonlinearity is built into the neurons rather than into separate layers, as it is for the convolution and pooling layers. CNN training is initiated after computing the output signal and is performed with the stochastic gradient descent algorithm [19], in which a single example is picked randomly from the training set to estimate the gradients. The CNN parameters are computed by training.
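To make the architecture concrete, the sketch below builds a small 1D CNN over the encoded SNP feature vector with Keras and trains it with stochastic gradient descent; the layer sizes and hyperparameters are illustrative assumptions, since the paper does not report its exact configuration.

import numpy as np
import tensorflow as tf

n_features = 322            # encoded SNP indicators (illustrative)
x_train = np.random.rand(37, n_features, 1).astype("float32")
y_train = np.random.randint(0, 2, size=(37,))

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features, 1)),
    tf.keras.layers.Conv1D(16, kernel_size=5, activation="relu"),   # convolution layer, Eq. (4)-(5)
    tf.keras.layers.MaxPooling1D(pool_size=2),                      # pooling layer
    tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),                   # fully connected layer, Eq. (6)
    tf.keras.layers.Dense(2, activation="softmax"),                 # softmax output (DF vs. SD)
])

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=8, verbose=0)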
4 Experimental Results and Discussion The application of the designed convolutional neural network to genomic DNA data is presented in this section. Through the experiments, suitable learning parameters of the network are determined. The performance is compared with the existing SVM method based on measures such as accuracy, recall and precision.
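These measures can be computed directly from the predicted and true labels; the following sketch uses scikit-learn and assumes the CNN and SVM predictions are already available as arrays (the values shown are toy placeholders).

from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]           # ground-truth severity labels (toy values)
y_pred_cnn = [1, 0, 1, 1, 0, 0, 0, 0]       # hypothetical CNN predictions
y_pred_svm = [1, 1, 0, 1, 0, 0, 0, 1]       # hypothetical SVM predictions

for name, y_pred in [("CNN", y_pred_cnn), ("SVM", y_pred_svm)]:
    print(name,
          "accuracy=%.2f" % accuracy_score(y_true, y_pred),
          "precision=%.2f" % precision_score(y_true, y_pred),
          "recall=%.2f" % recall_score(y_true, y_pred))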
Fig. 2 Result of precision rate (precision (%) vs. number of data for SVM and CNN)
4.1 Precision Rate Comparison Figure 2 shows the precision comparison against the number of data in the specified dataset for the SVM method and the proposed CNN method. As the number of data increases, the precision value increases. From this graph, it can be seen that the proposed CNN provides higher precision than the previous SVM method and produces better disease classification results. The reason is that the proposed method applies PSO-based feature selection before classification.
4.2 Recall Rate Comparison Figure 3 shows the recall comparison against the number of data in the specified dataset for the SVM method and the proposed CNN. As the number of data increases, the corresponding recall value also increases. The graph shows that the proposed CNN method provides higher recall than the previous SVM method. The reason is that the CNN is trained on the optimal feature values selected by PSO, which improves the disease detection results. Fig. 3 Result of recall rate (recall (%) vs. number of data)
Fig. 4 Result of accuracy (accuracy (%) vs. number of data)
4.3 Classification Accuracy Comparison Figure 4 shows the accuracy comparison against the number of data in the specified dataset for the SVM method and the proposed CNN. The x-axis represents the number of data and the y-axis the accuracy value. From this graph, it can be seen that the proposed CNN provides higher accuracy than the previous SVM method. Thus, the proposed CNN algorithm is superior to the existing algorithm in terms of classification results, with a high accuracy rate. The reason is that the existing approach has a lower success rate and a higher probability of misdetecting disease data.
5 Conclusion In this work, dengue disease prognosis is performed by designing a convolutional neural network (CNN). This prediction method has several major advantages. The model can be applied at any stage, including the stage before infection, and it broadly works from a human tissue sample. The experimental results show the efficiency of the proposed method in predicting the severity of dengue. The model can select the optimum combination of loci and, based on a patient's genomic background, accurately predicts the development of a severe phenotype. The technique can accommodate a large volume of data and produces better performance as the number of data increases, so it is both adaptive and scalable. In genetically influenced diseases, a key element defining the clinical phenotype is the multivariate (multi-loci) genomic signature determined by the genetic context. The deep structure of the CNN produced better results; it extracts features powerfully at various levels and has good generalization capability. Consent The consent of the participants/patients whose data were used was obtained verbally.
References
1. Carvalho et al., Host genetics and dengue fever. Infect. Gen. Evol. (2017)
2. V. Nandini, R. Sriranjitha, T.P. Yazhini, Dengue detection and prediction system using data mining with frequency analysis. Comput. Sci. Inf. Technol. (CS & IT) (2016)
3. P. Siriyasatien, A. Phumee, P. Ongruk, K. Jampachaisri, K. Kesorn, Analysis of significant factors for dengue fever incidence prediction. BMC Bioinf. (2016)
4. T.-C. Chan, T.-H. Hu, J.-S. Hwang, Daily forecast of dengue fever incidents for urban villages in a city. Int. J. Health Geograph. (2015)
5. N.K. Kameswara Rao, G.P. Saradhi Varma, M. Nagabhushana Rao, Classification rules using decision tree for dengue disease. Int. J. Res. Comput. Commun. Technol. 3(3) (2014)
6. A.L.V. Gomes, L.J.K. Wee, A.M. Khan, et al., Classification of dengue fever patients based on gene expression data using support vector machines. PLoS One 5(6), Article ID e11267 (2010)
7. P. Guo, T. Liu, Q. Zhang, et al., Developing a dengue forecast model using machine learning: a case study in China. PLoS Neglect. Trop. Dis. 11(10), Article ID e0005973 (2017)
8. T.M. Carvajal, K.M. Viacrusis, L.F.T. Hernandez, H.T. Ho, D.M. Amalin, K. Watanabe, Machine learning methods reveal the temporal pattern of dengue incidence using meteorological factors in Metropolitan Manila, Philippines. BMC Infect. Dis. 18(1), 183 (2018)
9. A. Shameem Fathima, D. Manimeglai, Analysis of significant factors for dengue infection prognosis using the random forest classifier. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 6(2) (2015)
10. M.I.H. Khan, et al., Factors predicting severe dengue in patients with dengue fever. Mediterr. J. Hematol. Infect. Dis. 5(1) (2013)
11. J.A. Potts et al., Prediction of dengue disease severity among pediatric Thai patients using early clinical laboratory indicators. PLoS Negl. Trop. Dis. 4(8), e769 (2010)
12. W. Caicedo-Torres, A. Paternina, H. Pinzón, Machine learning models for early dengue severity prediction, in M. Montes-y-Gómez, H.J. Escalante, A. Segura, J.D. Murillo (eds.), IBERAMIA 2016. LNCS (LNAI), vol. 10022 (Springer, Cham, 2016), pp. 247–258
13. V.J. Lee et al., Simple clinical and laboratory predictors of chikungunya versus dengue infections in adults. PLoS Negl. Trop. Dis. 6(9), e1786 (2012)
14. K. Laoprasopwattana, L. Kaewjungwad, R. Jarumanokul, A. Geater, Differential diagnosis of chikungunya, dengue viral infection and other acute febrile illnesses in children. Pediatr. Infect. Disease J. 31(5) (2012)
15. A. Paternina-Caicedo, et al., Features of dengue and chikungunya infections of Colombian children under 24 months of age admitted to the emergency department. J. Trop. Pediatr. (2017)
16. Keogh, Mueen, Curse of dimensionality, in Encyclopedia of Machine Learning (Springer, 2011), pp. 257–258
17. V.O. Andersson, M.A.F. Birck, R.M. Araujo, Towards predicting dengue fever rates using convolutional neural networks and street-level images, in 2018 International Joint Conference on Neural Networks (IJCNN) (IEEE, 2018), pp. 1–8
18. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)
19. R.G.J. Wijnhoven, P.H.N. de With, Fast training of object detection using stochastic gradient descent, in Proceedings of International Conference on Pattern Recognition (ICPR) (Tsukuba, Japan, 2010), pp. 424–427
An Improved DCNN-Based Classification and Automatic Age Estimation from Multi-factorial MRI Data Ashish Sharma and Anjani Rai
Abstract In recent years, automatic age estimation has gained popularity due to its numerous forensic and medical applications. In this work, an automated multi-factorial age estimation technique is proposed based on MRI data of the hand, clavicle and teeth, broadening the covered age range from 19 years, as is usual for age assessment based on hand bones alone, up to 25 years when combined with the clavicle bones and wisdom teeth. Fusing age-relevant information from all three anatomical sites, this work uses an improved deep convolutional neural network. Furthermore, when used for majority age classification, we demonstrate that a classifier derived from our regression-based predictor is better suited than a classifier trained directly with a classification loss, particularly when considering that cases of minors being wrongly labelled as adults need to be limited. Mimicking how radiologists carry out age assessment, the proposed technique based on deep convolutional neural networks achieves improved results in predicting chronological age. These results will support forensic odontologists and other experts in assessing both age and dental development in children and youth with high accuracy. Keywords Information fusion · Multi-factorial · Convolutional neural network · Age estimation · Majority age classification · Magnetic resonance imaging
1 Introduction Age estimation of living persons lacking valid identification documents is currently a highly significant research field in forensic and legal medicine. Its primary demand originates from recent migration trends, where it A. Sharma (B) · A. Rai Department of Computer Engineering & Applications, GLA University, Mathura 281406, India e-mail: [email protected] A. Rai e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_44
is a legally relevant question to distinguish adult asylum seekers from adolescents who have not yet reached the age of majority. Widely used radiological methods for forensic age estimation in children and adolescents consider the corresponding biological development of bone and dental structures. This enables an expert to examine the progress of physical maturation reflected in the closing of epiphyseal gaps and the mineralization of wisdom teeth. Notwithstanding the biological variation between subjects of the same chronological age (CA), the hand is the most appropriate anatomical site to follow physical maturation in minors, because the epiphyseal gaps start closing at different times, with the distal bones finishing earlier and, for example, the radius finishing at an age of around 18 years. However, the age range of interest for forensic age assessment is between 13 and 25 years. The progress of physical maturation of young individuals can be used as a biological marker associated with aging. Estimating biological age (BA) from physical development is therefore a highly relevant subject in both clinical and legal (forensic) medicine applications. In clinical medicine, BA assessment is motivated by the diagnosis of endocrinological diseases such as accelerated or delayed development in adolescents, or by optimally planning the time-point of paediatric orthopaedic surgical interventions while bone is still developing. Examples of such interventions include leg-length discrepancy correction or spinal deformity correction surgery. In legal medicine, when identification documents of children or adolescents are missing, as may be the case in asylum-seeking procedures or in criminal investigations, estimation of physical maturation is used as a proxy to assess unknown chronological age (CA). Established radiological methods for assessment of physical maturation are based on the visual inspection of bone ossification in X-ray images. Ossification is best followed in the long bones, and radiologists mostly analyse hand bones because of the large number of measurable bones visible in X-ray images of this anatomical region, together with the fact that maturation is not synchronous for all hand bones. More specifically, the carpals and distal phalanges are among the first bones to complete ossification, while in radius and ulna, maturation may be followed up to an age of approximately 18 years. From the degree of ossification assessed by the radiologist, the physical maturation of an individual is then quantified by associating its development with the age of the subjects in the reference atlas who showed the same degree of ossification. In the remainder of this manuscript, we will refer to this measurement as biological age as estimated by radiologists (BAR). An important drawback of the widely used BAR methods is that the exposure to ionizing radiation cannot be justified in legal medicine applications for examining healthy children and adolescents without diagnostic reasons. Moreover, the reliance on subjective visual comparison with reference X-ray images makes these methods prone to high inter- and intra-rater variability. Finally, the population used in constructing the reference atlas dates from the middle of the last century, which, due to changes in nutritional habits as well as
the faster development of the current population, leads to this method being considered inaccurate and outdated. Over recent years, research in age estimation has shown great interest in magnetic resonance imaging (MRI) as an ionizing-radiation-free alternative for developing novel schemes for following ossification progress corresponding to the development of bone. Firstly, estimating CA based on the assessment of biological development is inherently limited because of the biological variation among subjects of the same CA [1]. This biological variation defines the lowest error any method for forensic age estimation can achieve. Although there is no apparent consensus in the literature, the biological variation is assumed to be up to one year in the forensically relevant age range considered in this manuscript. Also, because of the visual examination, established radiological methods for assessing biological development involve intra- and inter-rater variability [2], which can be eliminated by using software-based automatic age estimation. A novel trend in forensic age estimation research is to replace X-ray-based methods with magnetic resonance imaging (MRI), because legal frameworks in many countries prohibit the use of ionizing radiation on healthy subjects. Recently, automatic methods for age estimation based on MRI data have been developed [3, 4], although they also exclusively examine a single anatomical site, the hand. A significant disadvantage of the previously mentioned strategies proposed for multi-factorial age estimation is their use of ionizing radiation, which is legally restricted in healthy subjects for non-diagnostic reasons. However, because of the absence of a recognized forensic age estimation method that does not involve ionizing radiation, a few European countries have made an explicit exception to this rule in the case of asylum seekers. Recently, to overcome the drawback of ionizing radiation, a great deal of research has concentrated on using magnetic resonance imaging (MRI) for forensic age estimation [5]. It is currently unclear whether the same staging schemes developed for ionizing radiation-based techniques can also be used for MRI [6]. Consequently, dedicated MRI-based procedures have been developed for assessing biological development for each of the three anatomical sites [7]; however, these methods still depend on the idea of discretizing biological development into discrete stages and on subjective visual assessment. To enable objective age estimation without the disadvantage of intra- or inter-rater variability introduced by radiological visual assessment, automatic age estimation from X-ray images of the hand has already been proposed in the literature with various strategies. Overcoming the requirement for localization, [8] recently demonstrated a deep learning method involving convolutional neural networks (CNNs) for age estimation, which performed age regression on whole X-ray images of the hand. Evaluated on 200 images, the winner of the challenge used the deep Inception V3 CNN [9] with added gender information. In contrast to the large interest in automatic age estimation from hand X-ray images, to our knowledge no AI-based solutions have yet been proposed for estimating age from clavicle CTs, or, as a first step, from wisdom teeth OPGs.
Our group has previously contributed to the development of automated age estimation techniques from hand and wrist MRI. Later, we improved the performance of the age regression part by training a deep CNN (DCNN) for age estimation in [10]. To the best of our knowledge, no automatic image analysis method for multi-factorial age estimation, independently of the imaging modality, has been presented yet. In this work, novel methods for multi-factorial age estimation are investigated from MRI data of hand bones, clavicles and wisdom teeth. Inspired by how radiologists perform staging of different anatomical sites, our methods automatically fuse the age-relevant appearance information from the individual anatomical sites into a single chronological age. The proposed techniques are evaluated on an MRI database of 322 subjects by performing experiments assessing CA estimates in terms of regression, as well as discrimination of minority/majority age, defined as having passed the eighteenth birthday, in terms of classification. The results demonstrate the increase in accuracy and reduction in uncertainty when using the multi-factorial approach as compared with relying on a single anatomical site.
2 Proposed Methodology Following the established radiological staging approach involving different anatomical sites in a multi-factorial setting, after cropping of age-relevant structures we carry out age estimation from the cropped wisdom teeth, hand and clavicle bones, either by using RF or an IDCNN architecture. In the proposed method, multi-factorial age estimation is done with an IDCNN architecture predicting age (see Fig. 1). This nonlinear regression model relies on mapping appearance information from the hand and clavicle bones as well as the wisdom teeth to the continuous CA target variable. Thus, by extracting age-relevant information for the different anatomical sites obtained by cropping from the input MRI data, our approach imitates the established radiological staging methods developed for each site separately, yet without the requirement of defining discrete stages.
Fig. 1 IDCNN-based automated multi-factorial age estimation outline
In an end-to-end manner, we combine the information originating from the different anatomical sites to automatically estimate the age of a subject from its MRI data depicting the age-relevant anatomical structures.
2.1 Cropping of Age-Relevant Structures The motivation for cropping age-relevant structures in a separate preprocessing step is to simplify the problem of regressing age from appearance information, so that the approach is also suitable for data sets that are limited in size. Moreover, compared with downsampling of the original 3D images, which inevitably leads to the loss of important maturation evidence from the epiphyseal gap regions, cropping of the age-relevant structures also reduces GPU memory requirements and allows us to work at a much higher image resolution. Different automated landmark localization techniques as introduced in [11] can be used to accurately localize, align and crop age-relevant anatomical structures from bone and dental 3D MRI data (Fig. 1). By locating two anatomical landmarks per bone, for the hand MRI data we crop the same 13 bones that are used in the Tanner-Whitehouse RUS technique (TW2). In the clavicle MRI data, the two clavicle bones are cropped independently based on two known landmarks for each clavicle, respectively. The regions encapsulating the wisdom teeth are extracted from the dental MRI data using the locations of the centres of the second and third molars. In case of a missing wisdom tooth, we estimate its most likely location according to the second molars and extract the region enclosing the missing tooth as if it were present.
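A landmark-based crop of this kind reduces to fixed-size slicing around a detected point; the sketch below illustrates it with numpy on a synthetic volume, where the landmark coordinates and crop size are placeholder values rather than values from the paper.

import numpy as np

def crop_around_landmark(volume, center, size=(32, 32, 32)):
    """Extract a fixed-size sub-volume centred on a landmark (zero-padded at borders)."""
    half = [s // 2 for s in size]
    padded = np.pad(volume, [(h, h) for h in half], mode="constant")
    start = [int(round(c)) for c in center]          # centre index shifts by +half after padding
    slices = tuple(slice(st, st + s) for st, s in zip(start, size))
    return padded[slices]

# Toy usage: a synthetic hand MRI volume and a hypothetical radius-bone landmark.
volume = np.random.rand(288, 512, 72).astype(np.float32)
radius_landmark = (150.0, 300.0, 40.0)               # placeholder coordinates
crop = crop_around_landmark(volume, radius_landmark, size=(64, 64, 32))
print(crop.shape)    # (64, 64, 32)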
2.2 Deep CNN Construction Each DCNN block consists of three stages of two consecutive 3 × 3 × 3 convolution layers without padding and a max pooling layer that halves the resolution. Rectified linear units (ReLUs) are used as nonlinear activation functions. A fully connected (fc) layer at the end of the feature extraction (fb) block (fc^fb) gives a dimensionality-reduced feature representation for each cropped input volume independently (Fig. 2), which serves as a feature selection for that particular anatomical structure. In this work, three different strategies are investigated for fusing information from the anatomical sites inside the CNN architecture. The first approach is to fuse the three anatomical sites directly at the input by concatenating all cropped input volumes as channels before a single DCNN block (Fig. 3), followed by two fully connected layers fc^i and fc^o. In the second, middle fusion architecture, the sites are fused right after the DCNN blocks (one for each cropped volume)
Fig. 2 Individual bone/tooth feature extraction block used in the IDCNN constructions for multi-factorial age estimation
by concatenating the outputs of their fully connected layers fc^fb before the two fully connected layers fc^i and fc^o. Finally, in the late fusion architecture, the separate DCNN blocks are first combined through fully connected layers fc^i for each of the three anatomical sites independently, before fusing the sites with the final fully connected layer fc^o that produces the age prediction. For training, each training sample is given as s_n, n ∈ {1, …, N}, comprising 13 cropped hand bone volumes s_{n,h}^j, j ∈ {1, …, 13}, two clavicle regions s_{n,c}^l, l ∈ {1, 2} and four regions framing the wisdom teeth s_{n,w}^k, k ∈ {1, …, 4}, either with CA as target variable y_n for a regression task, or with a binary variable y_n which is 1 for a minor (m), i.e. CA is smaller than 18 years, and 0 for an adult (a), i.e. CA is larger than or equal to 18 years, in a classification task. Optimizing a regression DCNN architecture ϕ with parameters w is achieved by stochastic gradient descent minimizing an L_2 loss on the regression target y = (y_1, …, y_N)^T in Eq. (1):

ŵ = arg min_w Σ_{n=1}^{N} (y_n − ϕ(s_n; w))^2    (1)
To regularize the regression problem, a standard weight decay regularization term is used as well as dropout. For assessing whether a subject is a minor or an adult, the result of the regression DCNN architecture may be used for classification by thresholding the estimated age. In this work, we compare the classification results obtained from the regression prediction with the classification results obtained by training the same DCNN architecture with a multinomial logistic classification loss computed as

ŵ = arg min_w Σ_{n=1}^{N} Σ_{j∈{m,a}} −y_n^j log( e^{ϕ_j(s_n; w)} / Σ_{k∈{m,a}} e^{ϕ_k(s_n; w)} )    (2)
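The sketch below outlines, in PyTorch, one way such a late-fusion regression network and the losses of Eqs. (1) and (2) could be wired together; the crop sizes, channel counts and feature dimensions are illustrative assumptions rather than the configuration used in the paper.

import torch
import torch.nn as nn

class FeatureBlock(nn.Module):
    """fc^fb feature extractor for one cropped bone/tooth volume."""
    def __init__(self, feat_dim=32):
        super().__init__()
        def stage(cin, cout):
            return nn.Sequential(
                nn.Conv3d(cin, cout, 3, padding=1), nn.ReLU(),
                nn.Conv3d(cout, cout, 3, padding=1), nn.ReLU(),
                nn.MaxPool3d(2))
        self.stages = nn.Sequential(stage(1, 8), stage(8, 16), stage(16, 32))
        self.fc_fb = nn.LazyLinear(feat_dim)
    def forward(self, x):
        return torch.relu(self.fc_fb(self.stages(x).flatten(1)))

class LateFusionNet(nn.Module):
    """Per-site fc^i layers followed by a fusing fc^o layer predicting age."""
    def __init__(self, n_crops=(13, 2, 4), feat_dim=32, site_dim=64, n_out=1):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.ModuleList(FeatureBlock(feat_dim) for _ in range(n)) for n in n_crops)
        self.fc_i = nn.ModuleList(nn.Linear(n * feat_dim, site_dim) for n in n_crops)
        self.fc_o = nn.Linear(len(n_crops) * site_dim, n_out)
    def forward(self, sites):              # sites: list (hand, clavicle, teeth) of crop lists
        fused = []
        for blocks, fc, crops in zip(self.blocks, self.fc_i, sites):
            feats = torch.cat([b(x) for b, x in zip(blocks, crops)], dim=1)
            fused.append(torch.relu(fc(feats)))
        return self.fc_o(torch.cat(fused, dim=1))

# Toy forward/backward pass with random crops of size 24^3 for a batch of 2 subjects.
sites = [[torch.randn(2, 1, 24, 24, 24) for _ in range(n)] for n in (13, 2, 4)]
age = torch.tensor([16.5, 21.0])

model = LateFusionNet(n_out=1)
pred = model(sites).squeeze(1)             # first forward also materializes the lazy fc^fb layers
optim = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)  # weight decay regularization
loss = ((pred - age) ** 2).mean()          # L2 regression loss as in Eq. (1)
loss.backward()
optim.step()

# For majority-age classification, set n_out=2 and use nn.CrossEntropyLoss, matching Eq. (2).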
Fig. 3 Three DCNN architectures for multi-factorial age estimation with a early, b middle and c late fusion strategies
Once more, weight decay and dropout are used for regularization. To determine the importance of every bone or tooth and every anatomical site independently for different predicted ages in our multi-factorial age estimation technique, we compute the influence of the separate DCNN blocks on the network prediction. For every tested sample and its predicted age, we compute the mean activation value after the fully connected layer fc^fb at the end of each feature extraction block. To
visualize the relative importance of every bone or tooth for the predicted age, the mean activation values are normalized to sum to one. Furthermore, we visualize the relative importance of every anatomical site independently for the predicted age, by first computing the mean activation value of all feature extraction blocks contributing to the hand, clavicle and teeth sites separately, followed by a normalization of the three computed values to sum to one.
2.3 Improved Deep Convolutional Neural Network In this work, alterations are made to the training algorithm of the DCNN to make it more effective in terms of accuracy and complexity. Conventionally, there are two sets of parameters in a DCNN that must be adjusted (one in the convolutional layers and the other in the fully connected layers), whereas in the proposed methodology only the parameters in the convolutional layers are adjusted iteratively. Appropriate modifications are made in the fully connected layer to estimate the weight values without iteration. The architecture of the IDCNN is the same as that of the DCNN. The training procedure of the proposed IDCNN is specified below.
Stage 1: Execute the steps as discussed in the training algorithm of the DCNN.
Stage 2: Fix the stochastic gradient descent minimization in Eq. (1). The ideal value of the cross-entropy is zero, but this is not practically achievable; even in the traditional DCNN, the algorithm is considered to have converged when the loss reaches a value higher than zero. Hence, in this methodology, this value is fixed at 0.01, which is the most commonly used value in the literature.
Stage 3: As the regression estimate and the target value are known, the output value y is evaluated using Eq. (1).
Stage 4: With the estimated output value of the output layer, the NET value of the output layer can be evaluated using Eq. (2).
Stage 5: Since the input and the NET value are known, the weights of the hidden layer are estimated using Eq. (2).
Stage 6: The remainder of the training procedure continues as in the conventional DCNN.
Therefore, the weight values are estimated without any iteration, using a simple numerical procedure. It may be noted that this procedure is performed for only one cycle, which greatly decreases the computational complexity. Now, only the filter coefficients in the convolutional layers need adjustment. In the IDCNN, the complex weight update equations are not essential, which improves the practical feasibility of the proposed approach. In this work, we have explored three different strategies for fusing information from the anatomical sites within our CNN architecture. Consequently, making sure that we follow the spirit of deep CNNs, namely that the network is able to
extract all information relevant for an estimation task by itself: in our early fusion strategy, the input regions from all anatomical sites are fused by concatenating them before they are presented to the network. With a mean absolute error of 1.18 ± 0.95 years in regressing age, this strategy was outperformed by the other two; presumably, translation invariance, an important property of CNNs, cannot be fully exploited on our limited training data set due to the large variations in the relative position of anatomical sites when they are concatenated. In the other two strategies, we first extract age-relevant features in a CNN block and afterwards combine the features on two different levels. In the middle fusion strategy, information from all bones and teeth is fused right after the features are extracted. This strategy corresponds to a forensic expert being shown the images of the separate anatomical structures at the same time and mentally fusing all information when estimating age in a multi-factorial way. The performance of this strategy was similar to that of the third, late fusion architecture, which first combines information for each anatomical site independently, followed by fusing the three anatomical sites with a fully connected layer. The late fusion strategy is inspired by how forensic experts currently combine individual information from hand radiographs, wisdom teeth OPGs and clavicle CTs in practice when performing multi-factorial age estimation. We finally used the late fusion network architecture for our further evaluations because of its superior age regression performance in terms of a mean absolute regression error of 1.01 ± 0.74 years.
3 Experimental Results and Discussion The MRI data set was obtained from the Ludwig Boltzmann Institute for Clinical Forensic Imaging in Graz as part of a research study on MRI for forensic age estimation. The study includes male Caucasian volunteers and was performed in accordance with the Declaration of Helsinki and approved by the ethics committee of the Medical University of Graz (EK21399 ex 09/10). Every eligible participant gave written informed consent, and for underage participants written consent of the legal guardian was additionally obtained. Exclusion criteria were a history of endocrine, metabolic, hereditary or developmental disease. Our proposed multi-factorial estimation technique was evaluated on the data set of 3D MRIs from N = 322 subjects with known CA ranging between 13.0 and 25.0 years (mean ± std: 19.1 ± 3.3 years; 134 subjects were minors below 18 years at the time of the MRI scan). For every subject, we use as input for the DCNN architecture the three corresponding MRI volumes of the left hand, upper thorax and the jaw, which were taken in one MRI scanning session. The CA of subjects was determined as the difference between the birthday and the date of the MRI examination. T1-weighted 3D gradient echo sequences with fat saturation were used for acquiring the hand and clavicle data (physical voxel resolutions of 0.45 × 0.45 × 0.9 and 0.9 × 0.9 × 0.9 mm3, respectively), while teeth were scanned using a proton density weighted turbo spin echo sequence (0.59 × 0.59 × 1.0 mm3). The voxel size of
the entire input volumes was 288 × 512 × 72 for the hand, 168 × 192 × 44 for the clavicle and 208 × 256 × 56 for the wisdom teeth, respectively. Acquisition times of the hand, clavicle and wisdom teeth MR sequences were about 4, 6 and 10 min, respectively, although further acceleration through undersampling has been shown to be possible. To assess the results of the experiments, accuracy, f-measure, precision and recall are used to compare the DCNN and IDCNN methods.
3.1 Precision Rate Comparison Figure 4 shows the precision comparison against the number of data in the specified dataset for the DCNN and IDCNN methods. As the number of data increases, the precision value increases. From this graph, it can be seen that the proposed IDCNN provides higher precision than the previous method and produces better age detection results.
Fig. 4 Result of precision rate
Fig. 5 Result of recall rate
Fig. 6 Result of f-measure rate
3.2 Recall Rate Comparison Figure 5 shows the recall comparison against the number of data in the specified dataset for the DCNN and IDCNN methods. As the number of data increases, the corresponding recall value also increases. From this graph, it can be seen that the proposed IDCNN provides higher recall than the previous method. The reason is that the IDCNN estimates the weight parameters directly, which improves the age detection results.
3.3 F-Measure Rate Comparison Figure 6 shows the f-measure comparison against the number of data in the specified dataset for the DCNN and IDCNN methods. As the number of data increases, the f-measure value increases correspondingly. From this graph, it can be seen that the proposed IDCNN provides a higher f-measure than the previous method. Thus, the proposed IDCNN algorithm is superior to the existing algorithm in terms of results. The reason is that the preprocessing of the images improves the age detection rate beyond that of the existing DCNN method.
3.4 Accuracy Comparison Figure 7 shows the accuracy comparison against the number of data in the specified dataset for the DCNN and IDCNN methods. The x-axis represents the number of data and the y-axis the accuracy value. From this graph, it can be seen that the proposed IDCNN provides higher accuracy than the previous DCNN method. Thus, the proposed IDCNN algorithm is superior to the existing algorithm in terms of age detection results, with a high accuracy rate.
Fig. 7 Result of accuracy
4 Conclusion and Future Work In this work, a deep learning-based multi-factorial age estimation method is studied using MRI data gathered from about 322 subjects, covering the forensically relevant age range between 13 and 25 years and naturally fusing information from the hand bones, wisdom teeth and clavicle bones. The methodology aims to overcome the limitations of the techniques currently used in legal practice, i.e., the use of ionizing radiation, the subjectivity of applying separate staging schemes to the separate anatomical sites and the lack of agreement on how the information from the separate sites has to be combined in the final age estimate. After exploring different network architectures, we have demonstrated that multi-factorial age estimation is feasible by automatically fusing age-relevant information from every individual site. In this work, we also explored the legally relevant question of majority age classification, by comparing thresholded predictions from the same technique with the results obtained from a binary classifier trained with the IDCNN architecture. The results indicated that the regression-based strategy is better suited for this task; nevertheless, because of the large biological variation of subjects with the same chronological age, particular care must be taken in choosing the trade-off between minors being classified incorrectly as adults and adults being incorrectly classified as minors. Likewise, deep learning algorithms also experience the vanishing/exploding gradient problem found in artificial neural networks with gradient-based learning techniques and backpropagation, which can be addressed in future work. Consent Every eligible participant gave written informed consent.
References 1. T.J. Cole, The evidential value of developmental age imaging for assessing age of majority. Ann. Hum. Biol. 42(4), 379–388 (2015)
2. P. Kaplowitz, S. Srinivasan, J. He, R. McCarter, M.R. Hayeri, R. Sze, Comparison of bone age readings by pediatric endocrinologists and pediatric radiologists using two bone age atlases. Pediatr. Radiol. 41(6), 690–693 (2011) 3. D. Stern, C. Payer, V. Lepetit, M. Urschler, Automated age estimation from hand MRI volumes using deep learning, in MICCAI 2016, vol. 9901, LNCS, ed. by S. Ourselin, L. Joskowicz, M.R. Sabuncu, G. Unal, W. Wells (Springer, Cham, 2016), pp. 194–202 4. D. Stern, M. Urschler, From individual hand bone age estimation to fully automated age estimation via learning-based information fusion, in 2016 IEEE 13th International Symposium on Biomedical Imaging (2016), pp. 150–154 5. E. Hillewig, J. De Tobel, O. Cuche, P. Vandemaele, M. Piette, K. Verstraete, Magnetic resonance imaging of the medial extremity of the clavicle in forensic bone age determination: a new four-minute approach. Eur. Radiol. 21(4), 757–767 (2011) 6. P. Baumann, T. Widek, H. Merkens, J. Boldt, A. Petrovic, M. Urschler, B. Kirnbauer, N. Jakse, E. Scheurer, Dental age estimation of living persons: comparison of MRI with OPG. Forensic Sci. Int. 253, 76–80 (2015) 7. S. Serinelli, V. Panebianco, M. Martino, S. Battisti, K. Rodacki, E. Marinelli, F. Zaccagna, R.C. Semelka, E. Tomei, Accuracy of MRI bone age estimation for subjects 12–19. Potential use for subjects of unknown age. Int. J. Legal Med. 129(3), 609–617 (2015) 8. C. Spampinato, S. Palazzo, D. Giordano, M. Aldinucci, R. Leonardi, Deep learning for automated bone age assessment in X-ray images. Med. Image Anal. 36, 41–51 (2017) 9. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 2818–2826 10. D. ˇStern, C. Payer, V. Lepetit, M. Urschler, Automated Age estimation from hand MRI volumes using deep learning, in S. Ourselin, L. Joskowicz, M. Sabuncu, G. Unal, W. Wells (eds.), Medical Image Computing and Computer-Assisted Intervention MICCAI 2016, volume 9901 LNCS (Springer, Cham, Athens, 2016), pp. 194–202 11. C. Lindner, P.A. Bromiley, M.C. Ionita, T.F. Cootes, Robust and accurate shape model matching using random forest regression voting. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1862– 1874 (2015)
The Application of Machine Learning Methods in Drug Consumption Prediction Peng Han
Abstract Nowadays, drug abuse is a universal phenomenon. It can bring huge damage to the human body and cause irreversible harm. It is important to know what can lead to abuse so that it can be prevented. In order to prevent the abuse of these drugs, it is necessary to figure out which elements make people abuse a drug and how they relate to the abuse. According to the drug consumption data from the UCI database, the Big Five personality traits (NEO-FFI-R), sensation seeking, impulsivity and demographic information are considered to be the relevant elements of abuse. However, how they affect drug abuse is not clear, so they alone cannot predict the probability that a person is going to abuse a drug. There are many traditional ways to analyse the data based on scoring, such as giving every element a score, but they can only give an inaccurate predictive value. Machine learning is very popular nowadays because of its strong learning ability, high efficiency and high accuracy. In this paper, we build models for accurate prediction of drug abuse from personality traits and some other information, based on logistic regression, decision tree and random forest separately. We find that the sensation seeking of respondents and the country which they are from are the most important factors for drug abuse, and we can conclude that drug abuse does not only depend on a person's inner being, but is also affected by the environment they live in. Keywords Machine learning · Drug abuse · Logistic regression · Decision tree · Random forest
1 Introduction Nowadays, drug abuse has become a serious problem in almost every country around the world. Drug abuse means using a drug in a great quantity or with methods that can harm the user's body, such as by influencing their fertility. It can cause serious disease P. Han (B) College of Beijing University of Technology, NO. 100, Pingleyuan Road, Chaoyang District, Beijing, People's Republic of China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_45
and even cause death, especially for young teenagers, and drug abuse can destroy their whole life. Many drug users' personalities change under the long-term influence of a drug, which causes psychological harm, so that many of them engage in criminal activities that cause great damage to society. It is important to prevent drug abuse, and the best way is to find the potential population and educate and psychologically treat them before they have the thought of abusing drugs. It is just like finding the naughtiest kids in a school and educating them before they make a mistake; the problem then becomes how to identify the naughty kid. The traditional way to model the relationship between personality traits and drug abuse is correlation analysis. This kind of analysis can only tell whether there is a possible relationship between personality traits and drug abuse, but it cannot give a predictive value, the accuracy of the prediction is limited, and when there are too many related factors it costs a lot of time. However, machine learning is popular for its high accuracy and efficiency, which makes it suitable to solve this problem. Machine learning uses computers to finish a task effectively by relying on patterns and inference instead of explicit instructions. Machine learning algorithms can build a mathematical model from a large amount of simple data; the model can make a prediction for a set of data, and the result is accurate most of the time. Using machine learning for prediction is also efficient because you do not have to analyse every set of data yourself: the computer can do it faster and more accurately. The logistic model [1], decision tree [2] and random forest [3] are three classical methods in the machine learning field. The logistic model is a statistical model which uses the logistic function to model a binary dependent variable. Decision trees are very popular in machine learning [4] and operations research, and they can explicitly and directly show the decisions; a decision tree can predict the value of a target based on all the input variables, where the leaves represent the labels of the possible results and the branches represent the judgment conditions. The random forest model is an upgraded version of the decision tree model that helps deal with its over-fitting problem. The random forest model averages the prediction results of many decision trees trained on different variables of the same training set, so that the performance of the model is greatly improved. We use the logistic model, decision tree and random forest to build models for 3 drugs so that we can obtain an accurate prediction of whether a person has a tendency toward drug abuse from his personality traits and some other basic information. The prediction accuracy rate can reach 75% for all three drugs. We also compare the predictions of these three models, which shows which model makes the most accurate prediction. The rest of the paper is organized as follows. In Sect. 2, we describe the data we used for the analysis. In Sect. 3, we briefly describe the three models we used: the logistic model, the decision tree model and the random forest model. In Sect. 4, we summarize the whole process of training the models and how we obtained the results; we also compare the three models on all the results and draw a conclusion about the models in this section.
Table 1 Numbers and proportions of respondents who have used and never used drugs

              Amphet        Benzos         Coke
Have used     909 (48.2%)   885 (46.9%)    847 (44.9%)
Never used    976 (51.8%)   1000 (53.1%)   1038 (55.1%)
2 Data Description

We use the data donated by Evgeny M. Mirkes to build our prediction models; it is publicly available at https://archive.ics.uci.edu/ml/datasets/Drug+consumption+%28quantified%29. The data contain records for 1885 respondents. Each record includes 12 features: five NEO-FFI-R personality measurements (neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness), BIS-11 impulsivity, ImpSS sensation seeking, level of education, age, gender, country of residence, and ethnicity. We build models for three drugs (Amphet, Benzos, and Coke) with the aforementioned variables, respectively. The numbers and proportions of respondents who have used and never used each drug are given in Table 1. We chose these three drugs because the proportions of users and non-users are well balanced, giving us a better opportunity to build accurate models. We divide the data into training data (70%) and testing data (30%), corresponding to 1320 respondents for training and 565 respondents for testing.
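As a rough illustration of this preprocessing step, the sketch below loads the UCI file and performs the 70/30 split. The paper does not state its software, and the file name, column positions, and the way the seven usage classes are binarized are assumptions made here for illustration only.

```python
# Sketch only: file name, column layout, and the binarization of usage classes
# are assumptions; the paper does not describe its tooling.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("drug_consumption.data", header=None)   # UCI file, assumed local

# Columns 1-12 hold the 12 quantified features; column 13 is assumed to be one
# drug's usage class. "Never used" vs "have used" grouping is also an assumption.
X = df.iloc[:, 1:13]
y = (df.iloc[:, 13] != "CL0").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0)                 # 70/30 split as in the paper
print(len(X_train), len(X_test))                          # roughly 1320 / 565 respondents
```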
3 Models

3.1 Logistic Regression

Logistic regression is a common tool widely used in many areas. It uses a logistic function to model a binary dependent variable and can perform binary regression [5], meaning that the dependent variable has only two possible values, such as 0 and 1 or pass and fail. We conduct logistic regression for the use of Amphet, Benzos, and Coke with the aforementioned variables, respectively. The regression coefficients and p-values are given in Tables 2, 3 and 4. According to Table 2, age, gender, country, Oscore, Cscore, and SS are significant variables for Amphet; they play the key role in whether a person will abuse Amphet. From Table 3, age, gender, country, Nscore, Oscore, and Cscore are significant variables for Benzos. From Table 4, age, gender, country, Nscore, Oscore, Ascore, Cscore, and SS are significant variables for Coke. We summarize the training and testing accuracy of the logistic model together with the training and testing AUC in Table 5.
Table 2 Logistic regression results for Amphet

              Estimate   Std. error   z value   Pr(>|z|)
(Intercept)   0.071      0.129        0.548     0.584
Age           0.199      0.078        2.566     0.010
Gender        −0.588     0.136        −4.338    0.000
Education     −0.030     0.070        −0.431    0.666
Country       −0.498     0.101        −4.914    0.000
Ethnicity     0.237      0.362        0.653     0.514
Nscore        0.003      0.074        0.036     0.972
Escore        −0.119     0.076        −1.565    0.118
Oscore        0.247      0.074        3.338     0.001
Ascore        −0.028     0.067        −0.409    0.682
Cscore        −0.184     0.075        −2.454    0.014
Impulsive     0.059      0.085        0.694     0.488
SS            0.439      0.092        4.784     0.000
Table 3 Logistic regression results for Benzos

              Estimate   Std. error   z value   Pr(>|z|)
(Intercept)   0.315      0.141        2.241     0.025
Age           0.540      0.082        6.555     0.000
Gender        −0.354     0.140        −2.534    0.011
Education     0.012      0.071        0.169     0.866
Country       −0.976     0.107        −9.088    0.000
Ethnicity     0.521      0.401        1.301     0.193
Nscore        0.469      0.077        6.067     0.000
Escore        0.077      0.077        0.997     0.319
Oscore        0.217      0.075        2.884     0.004
Ascore        −0.062     0.069        −0.897    0.370
Cscore        −0.181     0.076        −2.373    0.018
Impulsive     0.025      0.087        0.289     0.772
SS            0.105      0.093        1.133     0.257
According to Table 5, the training and testing accuracies are basically the same for all three drugs: training accuracy and AUC fluctuate around 68% and 74%, and testing accuracy and AUC fluctuate around 67% and 73%. The performance of logistic regression is comparable on the training and testing data for all three drugs, which implies that the model is appropriate, without significant over-fitting or under-fitting.
Table 4 Logistic regression results for Coke

              Estimate   Std. error   z value   Pr(>|z|)
(Intercept)   0.043      0.136        0.316     0.752
Age           0.319      0.079        4.062     0.000
Gender        −0.387     0.136        −2.845    0.004
Education     0.032      0.070        0.456     0.648
Country       −0.552     0.102        −5.411    0.000
Ethnicity     0.495      0.389        1.270     0.204
Nscore        0.157      0.074        2.136     0.033
Escore        0.052      0.076        0.684     0.494
Oscore        0.188      0.074        2.555     0.011
Ascore        −0.176     0.067        −2.614    0.009
Cscore        −0.197     0.075        −2.620    0.009
Impulsive     0.110      0.085        1.288     0.198
SS            0.338      0.091        3.711     0.000
Table 5 Training/testing accuracy/AUC of logistic model

                     Amphet   Benzos   Coke
Training accuracy    0.680    0.697    0.684
Testing accuracy     0.678    0.687    0.676
Training AUC         0.744    0.755    0.734
Testing AUC          0.754    0.757    0.725
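The coefficient tables above resemble a standard GLM summary, but the paper does not name its software. The sketch below shows how the accuracy/AUC figures of Table 5 could be reproduced with scikit-learn; it reuses the train/test split from the Sect. 2 sketch and is an assumption, not the author's implementation.

```python
# Sketch: fit the logistic model and compute accuracy/AUC as reported in Table 5.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)

for name, X_, y_ in [("train", X_train, y_train), ("test", X_test, y_test)]:
    acc = accuracy_score(y_, logit.predict(X_))
    auc = roc_auc_score(y_, logit.predict_proba(X_)[:, 1])
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```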
3.2 Decision Tree

The decision tree is a commonly used model in machine learning. It uses several input variables to predict the value of a target variable: every interior node corresponds to one input variable, every possible outcome is carried on one of the edges of the tree, and every leaf is one possible result, whose probability depends on the path from the root to that leaf. A decision tree is built by splitting the source set according to splitting rules based on classification features [4]. The splitting process repeats on each derived subset, expanding the tree from a root node to many leaf nodes, and the recursion stops when further splitting no longer adds value to the predictions; this is the learning process of a decision tree [6].

We take the first drug as an example to show the model-building process. At first we grow a tree large enough to avoid under-fitting; then, to avoid over-fitting, we prune the tree [7]. For this we introduce a complexity parameter, an important value that balances the accuracy against the size of the decision tree. In Fig. 1, we present the relative error for different complexity parameters.
Fig. 1 Relative error for different complexity parameters
To obtain a more accurate model, we choose the complexity parameter at which the relative error reaches its lowest point. According to Fig. 1, we chose cp = 0.0077 for pruning; the tree after pruning is shown in Fig. 2. We build decision trees for the other two drugs in a similar way. The training and testing accuracy together with the training and testing AUC are shown in Table 6. Generally speaking, the performance of the decision tree is worse than that of logistic regression. From Table 6, we can see that the prediction accuracy differs somewhat among the three drugs. This is because the data sets have different scales, so we choose different pruning values when building the models, which introduces an uncontrollable error and causes the differences in prediction accuracy among the three drugs.
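The cp value suggests an rpart-style complexity-parameter pruning. A comparable sketch with scikit-learn's cost-complexity pruning (ccp_alpha) is shown below; it is an analogous but not identical knob, and the selection procedure here is an assumption, not the author's code.

```python
# Sketch: grow a large tree, then prune with a cost-complexity parameter
# (a proper validation split or cross-validation would be used in practice).
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

big_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = big_tree.cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    acc = accuracy_score(y_test, pruned.predict(X_test))   # simple hold-out check
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc

tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_train, y_train)
print(best_alpha, best_acc)
```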
3.3 Random Forest

Random forest is an ensemble learning method commonly used for classification, regression, and other tasks. It is formed from a number of decision trees at training time and outputs the mode of the classes or the mean prediction of the individual trees.
Fig. 2 Tree after the pruning
Table 6 Training/testing accuracy/AUC of decision tree model

                     Amphet   Benzos   Coke
Training accuracy    0.689    0.683    0.702
Testing accuracy     0.650    0.657    0.634
Training AUC         0.710    0.686    0.747
Testing AUC          0.670    0.665    0.688
The decision trees in a random forest are built on sub-datasets chosen randomly from the original data set. Random forest overcomes the over-fitting tendency of the decision tree [8] and brings the accuracy and stability of prediction to a new level. It has many advantages: it can produce classifiers with high accuracy for many kinds of data, and it can handle a large number of input variables without sacrificing accuracy. In a word, random forest is a very useful model in machine learning.

We take Amphet as an example to show how we build a random forest model. First, we build a random forest with 1000 decision trees. Such a model is heavily over-fitted, so we have to simplify it. The out-of-bag (OOB) error is a commonly used value for estimating the prediction error of a random forest model [9]; the relationship between the OOB error and the number of trees is shown in Fig. 3. To obtain a more accurate model, we look for the lowest value of the OOB error. Based on this figure, to avoid both over-fitting and under-fitting, we choose 385 trees. We build random forest models for the other two drugs in the same way, and the training and testing accuracy and AUC are given in Table 7. From the table, we can see that the training accuracy and AUC of the random forest are clearly higher than those of the decision tree and logistic models described above.
Fig. 3 Relationship between out-of-bag (OOB) error and the number of trees
Table 7 Training/testing accuracy/AUC of random forest model

                     Amphet   Benzos   Coke
Training accuracy    0.744    0.752    0.776
Testing accuracy     0.681    0.694    0.678
Training AUC         0.836    0.832    0.855
Testing AUC          0.742    0.760    0.742
The testing accuracy and AUC of the random forest model are comparable to those of the logistic model and better than those of the decision tree model.
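A minimal sketch of the procedure described around Fig. 3, tracking the OOB error as the number of trees grows, is given below. The library choice and the grid of tree counts are assumptions; the paper reports only that 385 trees were finally selected.

```python
# Sketch: random forest with OOB error tracked against the number of trees.
from sklearn.ensemble import RandomForestClassifier

oob_errors = []
for n_trees in range(50, 1001, 50):
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True,
                                bootstrap=True, random_state=0, n_jobs=-1)
    rf.fit(X_train, y_train)
    oob_errors.append((n_trees, 1.0 - rf.oob_score_))      # OOB error = 1 - OOB accuracy

best_n = min(oob_errors, key=lambda t: t[1])[0]             # the paper settles on 385 trees
final_rf = RandomForestClassifier(n_estimators=best_n, oob_score=True,
                                  random_state=0).fit(X_train, y_train)
print(best_n, 1.0 - final_rf.oob_score_)
```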
4 Summary

In this work, we built three models for three drugs in order to find a suitable way to predict drug abuse. The accuracy and AUC of the three models are given in Table 8. From the table, we can see that, among these three models, the random forest model has the best overall performance.
Table 8 Training/testing accuracy/AUC of all three models

                  Logistic   Tree    Random forest
Train accuracy    0.680      0.689   0.744
Test accuracy     0.678      0.650   0.681
Train AUC         0.744      0.710   0.836
Test AUC          0.754      0.670   0.742
Fig. 4 ROC curves of three models on training data set
The logistic regression model is comparable to the random forest model and also achieves good prediction accuracy, which is acceptable given its simplicity. The decision tree model takes last place, but it is the most intuitive model and the easiest to understand. The ROC curves [10] of the three models on the training data set are shown in Fig. 4, and the ROC curves on the testing data set are shown in Fig. 5. The area under the curve is the AUC value of a model, an important measure for deciding which model is better [11]. From Fig. 4, we can see that on the training data the random forest model is better than the logistic model, and the logistic model is better than the decision tree. From Fig. 5, we can see that on the testing data the random forest model is comparable to the logistic model, and both are better than the decision tree model.
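A short sketch of how ROC curves like those in Figs. 4 and 5 could be produced is shown below; it reuses the fitted models from the earlier sketches, and matplotlib/scikit-learn are assumptions rather than the author's stated tools.

```python
# Sketch: ROC curves for the three fitted models on the test data.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

models = {"Logistic": logit, "Tree": tree, "Random forest": final_rf}
for label, model in models.items():
    scores = model.predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"{label} (AUC={roc_auc_score(y_test, scores):.3f})")

plt.plot([0, 1], [0, 1], "k--")   # chance line
plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
plt.legend(); plt.show()
```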
Fig. 5 ROC curves of three models on testing data set
References 1. S.H. Walker, D.B. Duncan, Estimation of the probability of an event as a function of several independent variables. Biometrika 54(1–2), 167–179 (1967) 2. L. Rokach, O. Maimon, Data mining with decision trees: theory and applications. World Scientific Pub Co Inc. ISBN 978-9812771711 (2008) 3. T.K. Ho, Random decision forests (PDF), in Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14–16 August 1995 (1995), pp. 278–282. Archived from the original (PDF) on 17 April 2016. Retrieved 5 June 2016 4. S. Shalev-Shwartz, S. Ben-David, 18. Decision Trees. Understanding Machine Learning (Cambridge University Press, Cambridge, 2014) 5. S.H. Walker, D.B. Duncan, Estimation of the probability of an event as a function of several independent variables. Biometrika 54(1/2), 167–178 (1967). https://doi.org/10.2307/2333860. JSTOR 2333860 6. J.R. Quinlan, Induction of decision trees (PDF). Mach. Learn. 1, 81–106 (1986). https://doi. org/10.1007/BF00116251 7. Mansour, Y. Pessimistic decision tree pruning based on tree size, in Proceedings of 14th International Conference on Machine Learning (1997), pp. 195–201 8. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd ed. (Springer, 2008). ISBN 0-387-95284-5 9. G. James, D. Witten, T. Hastie, et al., An introduction to statistical learning: with applications in R, in An Introduction to Statistical Learning (2013)
10. T. Fawcett, An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861–874 (2006) 11. J.A. Hanley, B.J. McNeil, A method of comparing the areas under receiver operating characteristic curves derived from the same cases
Set Representation of Itemset for Candidate Generation with Binary Search Technique Carynthia Kharkongor and Bhabesh Nath
Abstract To date, many association mining algorithms have been used in a wide range of applications. The performance of these algorithms is greatly affected by the execution time and by the way the itemsets are stored in memory: most of the time is spent scanning the database and searching for frequent itemsets, and the search space keeps growing when the number of attributes is large, especially when millions of transactions are involved. Furthermore, the in-memory representation of itemsets plays a vital role when large databases are handled. To address this problem, this paper improves the algorithms by adopting a concise itemset representation and by reducing the time spent searching for frequent itemsets. Keywords Association rule mining · Apriori algorithm · Frequent itemset · Binary search
1 Introduction

The Apriori algorithm was first introduced by Agrawal [2] and is the basis of the mining algorithms. Since then, many improved versions of Apriori have been developed, such as [4, 16, 22]; however, these algorithms have not been able to solve all the problems of the mining algorithms. This paper addresses the memory requirement and efficiency of the Apriori algorithm by reducing the time spent searching the candidate itemsets [3]. The main task of association mining algorithms is divided into two steps:
• Generation of the frequent itemsets whose support count is greater than the minimum threshold.
• After the frequent itemsets are generated, generation of the rules according to the confidence threshold.

C. Kharkongor (B) · B. Nath
Tezpur University, Tezpur, India
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_46
For a given dataset with transactions T = {t1, t2, t3, …, tn}, each transaction contains a set of items called an itemset. An itemset I consists of a set of items I = {i1, i2, i3, …, in}, where each item represents an element. The two important metrics used in association rule mining are:
• Support: the frequency with which an item appears in the dataset, Support(X) = (number of transactions containing X) / (number of transactions in the dataset, N).
• Confidence: a metric for measuring the strength of a rule, Confidence(X → Y) = Support(X ∪ Y) / Support(X) [1, 15].
An itemset is said to be a frequent itemset if its support count is greater than the minimum threshold value [11, 15].
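The small sketch below illustrates the two metrics on a toy transaction list; the transactions are made-up illustration data, not taken from the paper.

```python
# Sketch: computing support and confidence for itemsets over a toy transaction list.
transactions = [{1, 2, 3}, {1, 2}, {2, 3}, {1, 3}, {1, 2, 3}]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({1, 2}, transactions))          # 0.6
print(confidence({1}, {2}, transactions))     # 0.75
```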
1.1 Apriori Algorithm

First, the database is scanned multiple times to find the support count of the items. The itemsets at the (k−1)th level are called large itemsets, L_{k−1}. The large itemsets L_{k−1} are self-joined to generate the L_k itemsets; a union operation is performed during the self-join in which only itemsets sharing a common prefix are joined. The support count of these itemsets is then measured, and they are called candidate itemsets. The next step of the Apriori algorithm is pruning: itemsets are pruned when their support count is less than the minimum threshold, and the surviving itemsets are called frequent itemsets. Furthermore, itemsets that have an infrequent subset are also pruned, which ensures that when an itemset is not frequent, its supersets are not considered either [2].
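The following sketch shows one level of this candidate generation (self-join on a common prefix followed by the subset-based prune). It is a simplified illustration, not the authors' implementation.

```python
# Sketch: Apriori-style candidate generation and pruning for level k.
from itertools import combinations

def generate_candidates(frequent_prev, k):
    """Join (k-1)-itemsets that share their first k-2 items, then prune."""
    frequent_prev = [tuple(sorted(s)) for s in frequent_prev]
    prev_set = set(frequent_prev)
    candidates = set()
    for a, b in combinations(frequent_prev, 2):
        if a[:k - 2] == b[:k - 2]:                                  # common prefix
            cand = tuple(sorted(set(a) | set(b)))
            # prune: every (k-1)-subset of the candidate must itself be frequent
            if all(tuple(sorted(sub)) in prev_set
                   for sub in combinations(cand, k - 1)):
                candidates.add(cand)
    return candidates

print(generate_candidates([{1, 2}, {1, 3}, {2, 3}], 3))   # {(1, 2, 3)}
```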
2 Related Works

A bitmap representation approach has been adopted by Bitmap Itemset Support Counting (BISC) [19] and by closed and maximal frequent itemset mining [17]. In BISC, the data are stored as bit sequences, which are treated as bitmaps [19]; the transaction sets are stored as bits, which reduces memory consumption, especially when large datasets are handled. The RECAP algorithm also provides a concise representation of itemsets by grouping together patterns that share regularities among attributes [12]. The TM algorithm compresses transaction IDs by mapping them into a different space and then counts itemsets by intersecting the resulting interval lists [14]. Similarly, the VIPER algorithm uses bit vectors for itemset representation [13]. Another representation, LCM, is a hybrid of arrays, prefix trees, and bitmaps [18]. When the dataset is sparse, the array technique works well for the FP-tree, and an array representation is used for itemsets in FPmax; using arrays saves the traversal time over all items [7].
3 Itemset Representation for Apriori Algorithm

The itemsets in memory are represented in a horizontal or vertical layout.
• Horizontal layout: the itemsets are represented using item IDs; in each transaction, the IDs of its items are stored, as shown in Fig. 1.
• Vertical layout: the itemsets are represented as a set of columns in which only the item IDs are stored, as shown in Fig. 2.
Depending on the type of representation, algorithms such as [12] use the horizontal layout, while [10] and [11] use the vertical layout. To represent the itemset I = {1, 5, 12, 25, 29, 31}, some of the data structures that can be used are as follows:
• Linked list: suppose the itemset I needs to be represented in memory. The total memory requirement is 6 × (2 × integers) regardless of the itemset size; if an integer takes 4 bytes, the memory requirement is 6 × (2 × 4) = 48 bytes. A linked-list representation is shown in Fig. 3.
• Array: the itemset I can be represented using an array in which each item is stored as an element. The memory requirement is 32 × integers (32 bit); if an integer takes 4 bytes, the total memory requirement is 32 × 4 = 128 bytes, as shown in Fig. 4. Algorithms such as [4] and [16] have used arrays for this representation.
• Bitmap: the itemset I can be represented using a bitmap by marking '1' if an item is present in the itemset and '0' if it is absent [20, 21]. The memory requirement to represent the itemset I is 32 × 1 byte = 32 bytes.
Fig. 1 Horizontal layout for the itemsets
Fig. 2 Vertical layout for the itemsets
Fig. 3 Linked list representation for the itemsets I = {1, 5, 12, 25, 29, 31}
Fig. 4 Array representation for the itemsets I = {1, 5, 12, 25, 29, 31}
Some algorithms, such as [5, 9, 10, 19], have used bitmaps for itemset representation, as shown in Fig. 5.
• Set representation: each item is stored by labeling it '1' if present or '0' if absent. With the set representation, the maximum number of attributes is the cardinality of the set, so a 4-byte integer can be used to represent the itemset I, as shown in Fig. 6 [8].
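The sketch below realises this set representation as bits of a single Python integer, mirroring the idea of Fig. 6; it is an illustration, not the authors' code.

```python
# Sketch: the set representation as an integer bit mask (bit k set <=> item k present).
def to_bitset(itemset):
    mask = 0
    for item in itemset:
        mask |= 1 << item
    return mask

I = {1, 5, 12, 25, 29, 31}
mask = to_bitset(I)
print(bin(mask))                 # bits 1, 5, 12, 25, 29, 31 are set
print(mask.bit_length() <= 32)   # fits in a 32-bit word for item IDs below 32
```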
Fig. 5 Bitmap representation for the itemsets I = {1, 5, 12, 25, 29, 31}
Fig. 6 Set representation for the itemsets I = {1, 5, 12, 25, 29, 31}
4 Problem Definition

The main challenge of the Apriori algorithm is how the candidate itemsets are represented in memory. With millions of transactions, the mining process becomes infeasible. If the database has 100 items, the total number of candidate itemsets that can be generated is 2^100 ≈ 10^30; if each integer takes 4 bytes, the memory consumption is about 4 × 10^30 bytes for a candidate itemset size of only 100. When the database is large, consisting of millions of attributes, storing these itemsets in main memory is impractical. Additional storage is required if the itemsets do not fit in main memory, which incurs cost and I/O operations for moving itemsets between main memory and secondary storage and back. Moreover, time is wasted searching for those itemsets in memory. All of this affects the overall performance of the algorithm.
5 Searching Techniques

In the Apriori algorithm, the itemsets are inserted into a list after candidate generation. To avoid redundant itemsets in the list, duplicates need to be removed, which requires searching the whole list. The two most commonly used searching techniques are linear search and binary search.
• Linear search: performs the search sequentially. When the item is at the first position, the complexity is O(1), which is the best case. When the item is at position n/2, where n is the total number of items, the number of comparisons is n/2; since the item is equally likely to be at any position, this gives the average case. When the item is at the last position, the number of comparisons is n, because the search starts at the first position and checks every element until it reaches the last one; hence the worst-case complexity is O(n).
• Binary search: works on a sorted list. It first divides the list in half around a middle point: the first half contains items smaller than the middle value and the second half contains items greater than it. The searched item is first compared with the middle value.
If it is less than the middle value, the first half of the list is divided again and compared; if it is greater, the second half is divided into two halves and compared. This process of halving the list continues until the item is found [4, 6, 23].
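A small sketch of binary search applied to the duplicate check described in this paper is given below, using Python's bisect module over a sorted candidate list (an illustration only).

```python
# Sketch: binary search over a sorted candidate list to skip duplicates on insertion.
import bisect

def insert_unique(sorted_list, item):
    """Insert `item` keeping the list sorted; skip it if already present (O(log n) search)."""
    pos = bisect.bisect_left(sorted_list, item)
    if pos < len(sorted_list) and sorted_list[pos] == item:
        return False                    # duplicate found, nothing inserted
    sorted_list.insert(pos, item)
    return True

candidates = []
for itemset in [0b1010, 0b0110, 0b1010, 0b0011]:   # itemsets as integer bit masks
    insert_unique(candidates, itemset)
print(candidates)                        # [3, 6, 10]; the duplicate was skipped
```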
6 Set Representation for Apriori Algorithm with Binary Search

In this paper, the itemsets are represented using the set representation for the Apriori algorithm [8]. To represent an itemset in memory, an item is marked '1' if it is present and '0' if it is absent. The set operations used for the Apriori algorithm are as follows:
1. Union operation: computed with the bitwise OR operation, as shown in Fig. 7.
2. Subset operation: checks whether an itemset is a subset of another itemset.
3. Superset operation: similar to the subset operation; checks whether a particular itemset is a superset of another itemset.
4. Membership operation: checks whether an item is part of an itemset.
5. Intersection operation: computed with the bitwise AND operation, as shown in Fig. 8.
6. Set difference operation: the difference of two itemsets contains only those items present in the first itemset but not in the second, as shown in Fig. 9.
The Apriori algorithm is implemented using the above operations for candidate itemset generation; a small sketch of these operations follows.
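The sketch below expresses the six operations as bitwise operations on integer masks, in the spirit of Figs. 7-9; it is an illustration, not the authors' implementation.

```python
# Sketch: the set operations of this section as bitwise operations on integer masks.
a = 0b101101   # itemset A as a bit mask (items 0, 2, 3, 5)
b = 0b100110   # itemset B as a bit mask (items 1, 2, 5)

union        = a | b                    # OR  -> items in A or B (Fig. 7)
intersection = a & b                    # AND -> items in both (Fig. 8)
difference   = a & ~b                   # items in A but not in B (Fig. 9)
is_subset    = (a & b) == a             # is A a subset of B?
is_superset  = (a & b) == b             # is A a superset of B?
contains_3   = bool(a & (1 << 3))       # membership test for item 3

print(bin(union), bin(intersection), bin(difference), is_subset, is_superset, contains_3)
```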
Fig. 7 Union operation
Fig. 8 Intersection operation
Fig. 9 Set difference operation
Table 1 Complexity difference between binary search and linear search

Complexity      Binary search   Linear search
Best case       O(1)            O(1)
Average case    O(log n)        O(n)
Worst case      O(log n)        O(n)
The candidate itemsets are then sorted in ascending order. As seen from Table 1, the complexity of binary search is theoretically better than that of linear search in all cases. In this paper, binary search is performed on the generated candidate itemsets to find duplicate or redundant itemsets; the sorted, unique itemsets are then inserted into the candidate item list. The efficiency of the algorithm improves when binary search is used.
7 Experimental Analysis

The candidate itemsets generated by the Apriori algorithm are tested on three synthetic datasets. The datasets have 50 attributes and sizes of 1000, 5000, and 20,000 transactions; the performance of the algorithm changes with the dataset size. The candidate itemsets are represented using both the set and the array representation with support counts of 2.5, 5, and 10%, and the set representation is compared with the array representation for candidate itemset generation. Theoretically, with the array representation each element takes 4 bytes, whereas with the set representation each element takes only 1 bit; moreover, the worst-case time complexity of linear search is O(n), whereas for binary search it is O(log n). Both searching techniques are applied to the set and array representations, and the results are shown in Tables 2, 3 and 4.
8 Conclusion

As seen from the results, the time requirement of the set representation is lower than that of the array representation, and so is its memory consumption. Furthermore, with binary search the performance of the set representation is better than that of the array representation with linear search. Therefore, the set representation with binary search is more efficient for candidate generation, which ultimately increases the overall performance of the mining algorithms.
Table 2 Candidate itemset generation using array and set representation with binary and linear search for the 1000-transaction dataset, respectively

                 Array, linear search      Array, binary search      Set, linear search        Set, binary search
Support count    Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)
2.5%             37881.2     21326         37658.61    21253         25858.4     13440         25633.7     13382
5%               24536       21198         24495       21188         15530.7     13358         15423.1     13293
10%              212.606     21152         206         21140         83.522      13258         80.88       13158
Table 3 Candidate itemset generation using array and set representation with binary and linear search for the 5000-transaction dataset, respectively

                 Array, linear search      Array, binary search      Set, linear search        Set, binary search
Support count    Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)
2.5%             653789      24635         639126      24418         597860      16383         591665      16157
5%               357076      22972         355526      22206         304867      14513         303867      14293
10%              943.788     21592         909.78      21342         484.9       13892         426.8       13600
Table 4 Candidate itemset generation using array and set representation with binary and linear search for the 20,000-transaction dataset, respectively

                 Array, linear search      Array, binary search      Set, linear search        Set, binary search
Support count    Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)  Time (ms)   Memory (Kbs)
2.5%             1247806     40210         1190246     38458         783429      31721         701665      29703
5%               920218      31693         901362      30693         479508      21376         477506      20083
10%              14949.7     27682         14639.7     26382         14370.6     16782         14178.3     16582
References 1. M. Abdel-Basset, M. Mohamed, F. Smarandache, V. Chang, Neutrosophic association rule mining algorithm for big data analysis. Symmetry 10(4), 106 (2018) 2. R. Agrawal, T. Imieli´nski, A. Swami, Mining association rules between sets of items in large databases, in ACM Sigmod Record, vol. 22 (ACM, 1993), pp. 207–216 3. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A.I. Verkamo et al., Fast discovery of association rules. Adv. Knowl. Discov. Data Mining 12(1), 307–328 (1996) 4. M. Al-Maolegi, B. Arkok, An improved apriori algorithm for association rules (2014). arXiv: 1403.3948 5. G. Antoshenkov, Byte-aligned bitmap compression, in Proceedings DCC ’95 Data Compression Conference (IEEE, 1995), p. 476 6. F. Bodon, A fast apriori implementation. FIMI 3, 63 (2003) 7. G. Grahne, J. Zhu, Efficiently using prefix-trees in mining frequent itemsets. FIMI 90 (2003) 8. C. Kharkongor, B. Nath, Set representation for itemsets in association rule mining, in 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS) (IEEE, 2018), pp. 1327–1331 9. N. Koudas, Space efficient bitmap indexing, in CIKM (2000), pp. 194–201 10. A. Moffat, J. Zobel, Parameterised compression for sparse bitmaps, in Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, 1992), pp. 274–285 11. T.T. Nguyen, Mining incrementally closed item sets with constructive pattern set. Expert Syst. Appl. 100, 41–67 (2018) 12. D. Serrano, C. Antunes, Condensed representation of frequent itemsets, in Proceedings of the 18th International Database Engineering & Applications Symposium (ACM, 2014), pp. 168– 175 13. P. Shenoy, J.R. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, D. Shah, Turbo-charging vertical mining of large databases, in ACM Sigmod Record, vol. 29 (ACM, 2000), pp. 22–33 14. M. Song, S. Rajasekaran, A transaction mapping algorithm for frequent itemsets mining. IEEE Trans. Knowl. Data Eng. 18(4), 472–481 (2006) 15. R. Srikant, R. Agrawal, Mining generalized association rules (1995) 16. R. Srikant, R. Agrawal, Mining sequential patterns: Generalizations and performance improvements, in International Conference on Extending Database Technology (Springer, 1996), pp. 1–17 17. V. Umarani, et al., A bitmap approach for closed and maximal frequent itemset mining. Int. J. Adv. Res. Comput. Sci. 3(1) (2012) 18. T. Uno, M. Kiyomi, H. Arimura, Lcm ver. 3: collaboration of array, bitmap and prefix tree for frequent itemset mining, in Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations (ACM, 2005), pp. 77–86 19. K. Wu, E.J. Otoo, A. Shoshani, Optimizing bitmap indices with efficient compression. ACM Trans. Datab. Syst. (TODS) 31(1), 1–38 (2006) 20. M.C. Wu, Query optimization for selections using bitmaps, in ACM SIGMOD Record, vol. 28 (ACM, 1999), pp. 227–238 21. M.C. Wu, A.P. Buchmann, Encoded bitmap indexing for data warehouses, in Proceedings 14th International Conference on Data Engineering (IEEE, 1998), pp. 220–230 22. Y. Ye, C.C. Chiang, A parallel apriori algorithm for frequent itemsets mining, in Fourth International Conference on Software Engineering Research, Management and Applications (SERA’06) (IEEE, 2006), pp. 87–94 23. M.J. Zaki, Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)
Robust Moving Targets Detection Based on Multiple Features Jing Jin, Jianwu Dang, Yangpin Wang, Dong Shen, and Fengwen Zhai
Abstract Moving target detection is the most basic part of intelligent video analysis, and its quality directly affects the accuracy of subsequent processing. To cope with the challenges of multi-modal backgrounds in complex video environments, this paper proposes a new detection method that combines a pixel-based feature with a region-based feature. A new regional textural statistic is obtained by fuzzy clustering of a statistical texture descriptor and is then fused with pixel intensity; the resulting information-rich feature vectors are used in the background model. An optimal threshold segmentation method provides an adaptive threshold for foreground detection. Experiments indicate that the method achieves the expected results and clearly outperforms other methods in scenes with multi-modal backgrounds. Keywords Moving detection · Fuzzy textural statistic · Kernel FCM · Optimal threshold segmentation
1 Introduction

In most computer vision applications, the detection of moving targets is a key task and plays an important role in intelligent video analysis systems. Its goal is to segment the moving foreground from the video scene; accurate detection results support the subsequent stages of tracking, classification, and other high-level processing. However, the field is still full of challenges due to multi-modal background interference, illumination changes, camera shake, foreground occlusion, shadows, and so on [1, 2].

Background modeling is the most widely used approach because of its high accuracy and real-time performance. The essence of background modeling is to establish a statistical representation of the scene that is robust enough to deal with scene changes. The Gaussian Mixture Model [3] is a classical parametric statistical model.

J. Jin (B) · J. Dang · Y. Wang · D. Shen · F. Zhai
Lanzhou JiaoTong University, Lanzhou, China
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_47
It uses multiple Gaussian models to fit the multi-peak distribution of pixel brightness variation, but it has a large computational cost. The CodeBook model [4] creates a codebook for each pixel in the time domain, and the codebooks of all pixels form a complete background model; it can handle multi-modal scenes well, but its memory consumption is large. The subspace statistical model builds the background model through robust principal component analysis [5]; it can separate moving targets accurately, but it relies on the assumption of a still background. The self-organizing background modeling method (SOBS) [6] maps each pixel in the background model to multiple locations of the model and exploits the spatial correlation of the pixel neighborhood in the update. The pixel-based adaptive segmenter (PBAS) [7], built on a feedback mechanism, introduces control ideas to adapt the foreground decision threshold and the background model update rate; it achieves high accuracy, but calculating and tuning several adaptive thresholds increases its complexity. ViBe [8] is a lightweight algorithm that initializes the model with a random sampling strategy in the first frame and updates the background model by a second random sampling; its propagation of spatial-domain information makes it robust to camera jitter, and it has high time efficiency.

Many new algorithms continue to be proposed. Reference [9] detects underwater moving targets by measuring their extremely low frequency emissions with inductive sensors. Minaeian et al. propose an effective and robust method to segment moving foreground targets from a video sequence taken by a monocular moving camera [10], which handles camera motion better. With the popularity of deep learning, related methods have also emerged for moving target detection: reference [11] trains a convolutional neural network (CNN) for each video sequence to extract a scene-specific background, which improves the accuracy of the background model, but training takes a long time and the hardware requirements also limit the application of deep learning methods. Combining multiple features for foreground detection is an important research direction [12], yet the methods above use either single-pixel features or only local features. This paper therefore proposes a new method that combines a pixel-based feature with a region-based feature: brightness is used as the basic feature of a single pixel and is combined with regional texture features for background modeling. Most importantly, the traditional statistical texture feature is fuzzy-clustered, which improves the discrimination and robustness of the regional texture feature.
2 Our Moving Targets Detection Method

This section presents our detection method in full. Section 2.1 introduces a new fuzzy textural statistic obtained by applying kernel FCM to the original statistical texture feature and describes its calculation in detail. Sections 2.2 and 2.3 describe the background modeling and the update mechanism, respectively.
2.1 Construction of Fuzzy Textural Statistic

Pixel-based features are effective for extracting the accurate shape of objects, but they are not effective or robust enough against dynamic backgrounds; combining them with region-based features inherits the advantages of both. Intensity, color, and gradient can be chosen as pixel-based features, while region-based features such as the Local Binary Pattern (LBP) have been attempted in some research. LBP has many advantages, such as fast calculation and gray-scale invariance, but it is not effective on consistent textures. Therefore, this paper proposes a fuzzy textural statistic that applies Kernel Fuzzy C-Means (KFCM) to a statistical texture descriptor: a fuzzy texture vector is obtained by fuzzy clustering of the Gray-Level Run-Length Matrix (GLRLM), and feature parameters are then calculated from it. The fuzzy texture vector combines richer texture information, making it more robust and more suitable for video processing. The final feature vector used in the background model consists of the region texture feature described by these parameters together with the pixel intensity.

Gray-level run-length representation captures the gray-scale correlation between pixels at different geometric locations; it is characterized by the conditional probability that successive pixels have the same gray level. For an image I of size M × N, the element of its gray-level run-length matrix R (of size L × N, where L is the gray quantization level) is R_θ(n, l). R_θ(n, l) is the run-length statistic, i.e., the probability that n (1 ≤ n ≤ N) consecutive pixels have the same gray level l (1 ≤ l ≤ L) starting from any location along the θ direction, where θ is the angle between pixels in the two-dimensional plane (θ = 0° in this paper). For a center pixel (x, y), the GLRLM R of its local region is expressed as a vector C_R in row-major order. Initializing cluster centers V_i = {v_{x_i}, v_{y_i}}, i = 0, 1, …, d−1, and clustering the cells of C_R with kernel FCM, we obtain the membership matrix M, whose element m(i, j) (i = 0, 1, …, d−1; j = 0, 1, …, L × N − 1) gives the membership of the jth cell of the GLRLM to the ith cluster center. In the kernel FCM algorithm, Eqs. (1) and (2) are used to iteratively update the membership matrix and the cluster centers until the change falls below a threshold ε or the maximum number of iterations is reached [13]:
$$m(i,j) = \frac{\left(\frac{1}{1-K(p_j, v_i)}\right)^{\frac{1}{r-1}}}{\sum_{k=0}^{d-1}\left(\frac{1}{1-K(p_j, v_k)}\right)^{\frac{1}{r-1}}}, \qquad 0 \le i < d,\; 0 \le j < L \times N \quad (1)$$
$$v_i = \frac{\sum_{j=0}^{L \times N - 1} m(i,j)^r\, K(p_j, v_i)\, p_j}{\sum_{j=0}^{L \times N - 1} m(i,j)^r\, K(p_j, v_i)}, \qquad 0 \le i < d \quad (2)$$

where r (r > 1) is a constant that controls the clustering fuzziness and $K(p_j, v_i) = \exp\!\left(-\frac{\|p_j - v_i\|^2}{\sigma^2}\right)$. After the membership matrix converges, the vector C_R (of length L × N) is converted into a d-dimensional vector F:
$$F = M C_R \quad (3)$$
Every element F(i) of F can be expressed as Eq. (4):

$$F(i) = \sum_{j=0}^{L \times N - 1} m(i,j)\, C_R(j), \qquad i = 0, 1, \ldots, d-1 \quad (4)$$
F(i) reflects the overall membership of the gray-level run-length vector to the ith cluster center. The vector F achieves effective dimensionality reduction while containing fused, richer texture information; this paper names it the Fuzzy Textural Vector (FTV). The membership matrix is fixed once the parameters are determined, so it can be computed in advance to preserve the time efficiency of the algorithm. The GLRLM uses statistics such as the run-length factors to describe texture; by analogy, two statistics, the fuzzy long run-length factor R_F1 and the fuzzy total run-length percent R_F2, are used to describe texture from the fuzzy textural vector. As shown in Eqs. (5) and (6), Q is the total number of runs of length 1:
$$R_{F1} = \sum_{i=0}^{d-1} v_{y_i}^2\, \bar{F}(i) \quad (5)$$
$$R_{F2} = \sum_{i=0}^{d-1} \bar{F}(i) / Q \quad (6)$$
where $\bar{F}$ is the normalized form of F:

$$\bar{F}(i) = \frac{F(i)}{\sum_{i=0}^{d-1} F(i)} \quad (7)$$
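The sketch below walks through this construction end to end: the GLRLM of a patch, a kernel fuzzy c-means pass over its cells, and the two statistics R_F1 and R_F2. The parameter values (d, r, σ, number of gray levels) and several details such as the cluster-centre initialization are assumptions; this is not the authors' code.

```python
# Sketch of the FTV construction of Sect. 2.1 (assumptions marked in the lead-in).
import numpy as np

def glrlm_0deg(patch, levels):
    """Gray-level run-length matrix along the 0-degree direction."""
    R = np.zeros((levels, patch.shape[1]))
    for row in patch:
        run, prev = 1, row[0]
        for g in row[1:]:
            if g == prev:
                run += 1
            else:
                R[prev, run - 1] += 1
                run, prev = 1, g
        R[prev, run - 1] += 1
    return R

def fuzzy_textural_vector(patch, d=4, r=2.0, sigma=8.0, levels=8, iters=20):
    R = glrlm_0deg(patch, levels)
    C = R.flatten()                                            # C_R in row-major order
    # each GLRLM cell j has coordinates p_j = (gray level, run length)
    P = np.array([(l, n + 1) for l in range(levels) for n in range(R.shape[1])], float)
    V = P[np.random.choice(len(P), d, replace=False)]          # initial cluster centres
    for _ in range(iters):
        K = np.exp(-np.sum((P[None] - V[:, None]) ** 2, axis=2) / sigma ** 2)
        M = (1.0 / (1.0 - K + 1e-9)) ** (1.0 / (r - 1.0))
        M /= M.sum(axis=0, keepdims=True)                      # memberships, Eq. (1)
        w = (M ** r) * K
        V = (w @ P) / w.sum(axis=1, keepdims=True)             # centre update, Eq. (2)
    F = M @ C                                                  # Eqs. (3)-(4)
    Fbar = F / F.sum()                                         # Eq. (7)
    RF1 = np.sum(V[:, 1] ** 2 * Fbar)                          # Eq. (5), run-length coordinate
    Q = R[:, 0].sum()                                          # number of runs of length 1
    RF2 = np.sum(Fbar) / max(Q, 1)                             # Eq. (6)
    return RF1, RF2

patch = (np.random.rand(8, 8) * 8).astype(int)                 # toy 8x8 patch, 8 gray levels
print(fuzzy_textural_vector(patch))
```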
To measure the texture characterization ability of the proposed fuzzy texture vector FTV, it is compared with other statistical texture features, namely the Gray-Level Co-occurrence Matrix (GLCM) and the Gray-Level Run-Length Matrix (GLRLM). A Support Vector Machine (SVM) is used to classify texture images from the FMD dataset, and the Correct Classification Percentage (CCP) is used as the evaluation criterion. The classification results are shown in Table 1: the statistics of FTV achieve a clear improvement in the correct classification rate, indicating better texture discrimination ability.
Table 1 Comparison of texture classification ability

Method   Feature                     CCPs (%)
GLCM     Energy                      82.3
GLCM     Correlation                 86.6
GLRLM    Long run-length factor      87.5
GLRLM    Total run-length percent    89.5
FTV      R_F1                        89.8
FTV      R_F2                        92.3
Fig. 1 Feature variation statistics of repeatedly perturbed pixels in video: (a) a frame of the video; (b) variation statistics of the different texture measurements
To verify the effectiveness of FTV in video processing, we select pixel A (on the lower-left grass), located in a repeatedly perturbed background region in Fig. 1a, and observe its variation over a period of 844 frames under different texture metrics. As shown in Fig. 1b, the fuzzy long run-length factor and the fuzzy total run-length percent of FTV are more stable and more robust descriptors of spatial texture in a video environment than the raw intensity feature and LBP.
2.2 Background Modeling

For a pixel p(x, y), the feature vector v = [I, R_F1, R_F2]^T consists of the raw intensity together with the fuzzy long run-length factor and the fuzzy total run-length percent computed in an m × m region centered at p by the process described in the previous subsection. These feature vectors constitute the sample set. The first frame is used for background model initialization, and the model of every pixel is established by random sampling in its 3 × 3 neighborhood:

$$M(x, y) = \{v_1, v_2, \ldots, v_n\} \quad (8)$$

where n is the model size.
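A minimal sketch of this per-pixel initialization is given below; the feature map is a stand-in array, and the model size and handling of image borders are assumptions.

```python
# Sketch: per-pixel background model initialised by randomly sampling feature
# vectors from the 3x3 neighbourhood of the first frame (Eq. (8)).
import numpy as np

def init_model(first_frame_features, n=20, rng=np.random.default_rng(0)):
    """first_frame_features: H x W x 3 array of [I, RF1, RF2] per pixel."""
    H, W, _ = first_frame_features.shape
    model = np.empty((H, W, n, 3))
    for y in range(H):
        for x in range(W):
            ys = np.clip(y + rng.integers(-1, 2, n), 0, H - 1)   # random 3x3 neighbours
            xs = np.clip(x + rng.integers(-1, 2, n), 0, W - 1)
            model[y, x] = first_frame_features[ys, xs]
    return model

features = np.random.rand(24, 32, 3)        # stand-in for the first frame's feature map
M = init_model(features, n=20)
print(M.shape)                               # (24, 32, 20, 3)
```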
2.3 Foreground Detection and Model Update

For the ith frame of the video sequence, let v be the feature vector of a pixel and v_i a sample in its background model. Each pixel is processed as follows.
1. Foreground detection in the current frame. Whether the current pixel belongs to the background is judged by its similarity to the corresponding sample set, using the Euclidean distance between feature vectors. A marker variable c_i is set to 1 if the Euclidean distance is smaller than a threshold R:
$$c_i = \begin{cases} 1, & \mathrm{distance}(v, v_i) < R \\ 0, & \text{otherwise} \end{cases} \qquad (1 \le i \le n) \quad (9)$$
In video processing, the scene constantly changes under the influence of the environment, so a fixed threshold R that cannot adapt to each frame is inaccurate and inevitably reduces detection accuracy. Therefore, this paper adopts an adaptive threshold. Threshold-based methods are common in image segmentation; considering computational efficiency, the optimal threshold segmentation method, which gives good segmentation results with a simple calculation, is used to compute the adaptive threshold. The calculation proceeds as follows:
(1) Find the maximum and minimum gray values zmax and zmin of the frame and initialize the threshold T = (zmax + zmin)/2;
(2) Divide the frame into foreground and background according to the threshold and compute the average gray values zF and zB of the two parts separately;
(3) Compute the new threshold T = (zF + zB)/2;
(4) If T no longer changes, take it as the final threshold; otherwise return to step (2) and continue iterating.
The adaptive threshold R_i of the ith frame f_i is then calculated as

$$T_{B_i} = \mathrm{Optimal\_S}(f_i - B_{ave}) \quad (10)$$
$$T_{F_i} = \mathrm{Optimal\_S}(f_i - f_{i-1}) \quad (11)$$

$$R_i = \partial\, T_{F_i} + (1 - \partial)\, T_{B_i} \quad (12)$$

where Optimal_S denotes the optimal threshold segmentation described above, B_ave consists of the per-pixel sample means of the current model, and the weighting factor ∂ is the proportion of foreground pixels in the previous frame f_{i−1}. The binarization result of pixel p(x, y) in the ith frame is determined by Eq. (13), where T is an empirical threshold:

$$B_i(x, y) = \begin{cases} 0, & \sum_{i=1}^{n} c_i \ge T \\ 1, & \text{otherwise} \end{cases} \quad (13)$$
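As a standalone illustration of the iterative optimal-threshold step used in Eqs. (10)-(12), the sketch below applies steps (1)-(4) to a difference image; the convergence tolerance and the toy input are assumptions, and this is not the authors' code.

```python
# Sketch: iterative mean-based ("optimal") threshold selection on a difference map.
import numpy as np

def optimal_threshold(img, eps=0.5):
    T = (img.max() + img.min()) / 2.0
    while True:
        fg, bg = img[img > T], img[img <= T]
        zF = fg.mean() if fg.size else T
        zB = bg.mean() if bg.size else T
        T_new = (zF + zB) / 2.0
        if abs(T_new - T) < eps:
            return T_new
        T = T_new

diff = np.abs(np.random.rand(120, 160) - np.random.rand(120, 160)) * 255
print(optimal_threshold(diff))
```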
2. Background model update. If a pixel in the current frame is judged to be background, its background model is updated with probability 1/ϕ in order to adapt to background change: a randomly selected sample in M(x, y) is replaced by the feature vector v of the current pixel. Meanwhile, the neighborhood update idea proposed in ViBe is used in our processing flow: we randomly select a pixel (x_nb, y_nb) in the 8-neighborhood and also update its model M(x_nb, y_nb) with the feature vector v of the current pixel.
3 Experiments

Experiments are performed on the CDNET benchmark library [14]. The quantitative indicators are F1 (F-measure), Re (recall), and Pr (precision). Five videos (PETS2006, Waving tree, highway, fountain, and canoe) are selected to test the performance of the algorithm; all of these scenes include background perturbation. The parameter settings of the compared algorithms are identical to those in the original references. Figure 2 shows the experimental results of the six algorithms in the five scenes.
Fig. 2 Detection results of the 6 algorithms (rows: input frame, GMM, KDE, CodeBook, ViBe, PBAS, ours; columns: PETS2006 frame 116, Waving tree frame 249, highway frame 241, fountain frame 1119, canoe frame 996)
“PETS2006” is a scene in which multiple pedestrians move through a subway station. The indoor lighting gives every moving object a projected shadow, and the flickering metal fence also interferes with foreground detection as the light changes. The pedestrian contours in the mask generated by GMM are incomplete, and KDE produces a lot of noise for both the fence and the shadows. Comparing the masks obtained by CodeBook, ViBe, and PBAS, it is obvious that our algorithm gets the better result in the multi-modal parts such as the moving shadows and the metal fence.

The tree in the “Wavingtree” scene is a multi-modal, complex background. Although the detection results of KDE and CodeBook retain a more complete foreground, a lot of noise is left in the swaying area. Owing to the introduction of multiple features and the neighborhood spread method, the proposed method outperforms the other four algorithms in the detection of dynamic background. Although PBAS can also remove the multi-modal background effectively, its time consumption is higher than that of the proposed algorithm because multiple thresholds have to be adjusted.

The challenges in the “Highway” scene are the tree on the left and the shadows of the moving vehicles. Compared with the other algorithms, the proposed algorithm handles the multi-modal background and the foreground integrity better. As for the shadows of the moving cars, none of the six algorithms gives an ideal result.

“fountain” is a very complicated scene because the fountain keeps spraying. In frame 1119, a white car drives from right to left. Our method removes the interference of the multi-modal fountain to the greatest extent and retains the most complete foreground region of the moving car, mainly because the proposed method introduces robust, stable fuzzy texture features and an adaptive threshold into background modeling. The “canoe” scene is similar: the combination of region-based and pixel-based features and the flexible model update mechanism help obtain a better detection result in the constantly shaking river area, and the adaptive threshold computed by optimal segmentation is useful for obtaining a complete canoe.

Comparing our method with the other algorithms quantitatively, the average performance of the six algorithms over the five scenes is shown in Table 2. It is evident that combining the robust fuzzy textural statistic with intensity information and adopting a flexible background modeling and updating mechanism achieves better comprehensive performance in this kind of scene. The average time efficiency of the algorithms is compared in Fig. 3. All of the algorithms achieve real-time performance; although the processing time of our method is higher than that of the lightweight ViBe because of the kernel FCM computation and the distances between feature vectors, the detection performance is increased by 10.6%.
effect on multi-modal background and foreground integrity. As for the shadows of moving cars in the scene, the effects of these six algorithms are not ideal. “fountain” is a very complicated scene because of the fountain that keeps spraying. In 1119th frame, a white car is driving from right to left. Our method removes the interference of multi-modal fountain in the greatest extent and retains the most complete foreground area of sports car. The main reason is that the proposed method introduces robust and stable fuzzy texture features and adaptive threshold in background modeling. The “canoe” scene is similar. The combination of region-based feature and pixel-based feature and flexible model update mechanism contribute to obtain better detection result in river area that keeps shaking. The adaptive threshold computed by optimal segmentation is useful for getting complete canoe. Comparing our method with other algorithms by quantitative analysis, the average performance of the six algorithms in the five scenes is shown in the Table 2. It is obviously that the paper that combines robust fuzzy texture statistic with color information and adopts flexible background modeling and updating mechanism is efficient to achieve better comprehensive performance in this kind of scenes. The comparison of the average time efficiency for the algorithms is shown in the Fig. 3. All of these algorithms are able to achieve real-time performance. Although process time of our method is more than light-weight algorithm Vibe because of calculation in Kernel FCM and distances between feature vectors, detection performance is increased by 10.6%. Table 2 Compare of average performance GMM
KDE
CodeBook
ViBe
PBAS
Ours
Re
0.68
0.58
0.76
0.78
0.79
0.83
Pr
0.64
0.49
0.61
0.73
0.78
0.84
F1
0.66
0.53
0.68
0.75
0.78
0.83
Fig. 3 Time efficiency of 6 algorithms (frame/s)
4 Conclusions

To enhance the precision of moving foreground extraction in video processing, this paper proposes a robust method for complicated multi-modal scenes. A fuzzy textural vector (FTV) is obtained by applying kernel FCM to a traditional statistical texture feature, and the fuzzy textural statistics are combined with intensity to construct feature vectors. Random sampling and neighborhood spread mechanisms are used in the modeling and update stages, and the judgment threshold on sample distances is calculated adaptively. Experiments have demonstrated the efficiency of the method in multi-modal scenes in comparison with other outstanding algorithms.

Acknowledgements This work was supported by the National Natural Science Foundation of China under grant No. 61562057 and the Gansu Provincial Technology Plan under grant No. 17JR5RA097.
References 1. A. Siham, A. Abdellah, S. My Abdelouahed, Shadow detection and removal for traffic sequences, in Proceeding of International Conference on Electrical and Information Technologies (Guangzhou, 2016, April) pp. 168–173 2. T. Bouwmans, Traditional and recent approaches in background modeling for foreground detection: an overview. Computer Science Review 11, 31–66 (2014) 3. C. Stauffer, W. Grimson, Adaptive background mixture models for real-time tracking. in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Fort, 1999, July) pp. 2246–2252 4. K. Kim, T.H. Chalidabhongse, D. Harwood et al., Real-time foreground-background segmentation using codebook model. Real-Time Imaging 3, 172–185 (2005) 5. J. Jing, D. Jianwu, W. Yangping et al., Application of adaptive low-rank and sparse decomposition in moving objections detection. J. Front. Comput. Sci. Technol. 12, 1744–1751 (2016) 6. L. Maddalena, A. Petrosino, The SOBS algorithm: what are the limits? in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (Rhode Island, 2012, June) pp. 21–26 7. M. Hofmann, P. Tiefenbacher, G. Rigoll, Background segmentation with feedback: the pixelbased adaptive segmenter, in Proceeding of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Washington, 2012, July) pp. 38–43 8. O. Barnich, V.D. Mac, Vibe: a universal background subtraction algorithm for video sequences. Image Processing 6, 1709–1724 (2011) 9. J. Wang, B. Li, L. Chen et al., A novel detection method for underwater moving targets by measuring their elf emissions with inductive sensors. Sensors 8, 1734 (2017) 10. S. Minaeian, L. Jian, Y.J. Son, Effective and efficient detection of moving targets from a UAV’s camera. IEEE Transactions on Intelligent Transportation Systems 99, 1–10 (2018) 11. M. Braham, M. Van Droogenbroeck, Deep background subtraction with scene-specific convolutional neural networks, in Proceeding of International Conference on Systems, Signals and Image Processing (Bratislava, May 2016), pp. 1–4 12. J. Zhai, Z. Xin, C. Wang, A moving target detection algorithm based on combination of GMM and LBP texture pattern, in Guidance, Navigation and Control Conference (2017), pp. 139–145
13. D.Q. Zhang, S.C. Chen, Kernel-based fuzzy and possibilistic—means clustering, in Proceeding of ICANN (2003), pp. 122–125 14. N. Goyette,P.M. Jodoin, F. Porikli et al., Change detection net: a new change detection benchmark dataset, in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Rhode Island, 2012, July) pp. 16–21
Digital Rock Image Enhancement via a Deep Learning Approach Yunfeng Bai and Vladimir Berezovsky
Abstract Digital Rock Images are widely used for rock core analysis in the petroleum industry, yet their resolution is often not fine enough for complex real-world problems. We propose Deep Neural Networks to increase the resolution and quality of Digital Rock Images. The results demonstrate that the proposed method can indeed produce Digital Rock Images of higher resolution, and the two-dimensional approach has potential to be extended to 3D, which matters greatly for three-dimensional Digital Rock reconstruction. Keywords Deep neural networks · Digital rocks images · Image enhancement · Image resolution
1 Introduction

Deep Neural Networks have been widely used to process 2D images in recent years and have become more and more popular. There are plenty of impressive applications of Deep Neural Networks in image recognition, image enhancement, image clustering, image segmentation, object detection, face recognition, style transfer, and other fields [1–5]; they have brought revolutionary achievements to image processing. In this paper, inspired by successful image processing applications of neural networks, we apply Deep Neural Networks to increase the resolution and quality of Digital Rock Images.

Rock core images play an important role in rock core analysis in the petroleum industry. Petroleum engineers are interested in obtaining a series of numerical descriptors or features that statistically describe porous materials [6].

Y. Bai (B) · V. Berezovsky
Northern (Arctic) Federal University, Severnaya Dvina Emb. 17, Arkhangelsk, Russia
e-mail: [email protected]
V. Berezovsky
e-mail: [email protected]

© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_48
If these features are valid, they may eventually be used to predict the physical properties of porous media, including porosity, specific permeability, and formation factor, which are important components of production studies [6]. Traditionally, rock pictures are studied by experts, who are interested in Digital Rock Images of better resolution: the better the resolution, the more accurate the numerical descriptors or features they can obtain. That is why we propose a machine learning method to supply Digital Rock Images of better resolution for overcoming complex real-world problems. We also plan to turn these enhanced pictures into a dataset for another part of our project and to apply a similar approach to 3D Digital Rock reconstruction.

The rest of this paper consists of five parts. Section 2 reviews several impressive related works. Section 3 introduces the machine learning implementation framework Tensorflow. Section 4 describes some details of our proposed method. Section 5 presents the experimental results and discussion. Section 6 concludes.
2 Related Works

In 2017, Pratik Shah and Mahta Moghaddam proposed a method to improve the spatial resolution of microwave images [7]. They eased the issue by providing additional information through learning, using a model with two stages: convolutional neural networks and a non-linear inversion approach [7]. The results showed that their method could produce images of higher resolution [7]. In 2019, Yang et al. proposed a deep recurrent fusion network (DRFN) [8] that uses transposed convolution instead of bicubic interpolation; the learned model has a larger receptive field and reconstructs images more accurately [8]. Extensive evaluations on benchmark datasets show that, even with fewer parameters, their approach outperforms most deep learning approaches in terms of visual quality and accuracy [8]. Also in 2019, Shen et al. proposed a multi-level residual up-projection activation network (MRUAN) consisting of a residual up-projection group, an upscale module, and a residual activation block [9]. Specifically, the residual up-projection group recursively mines hierarchical low-resolution feature information and high-resolution residual information, and the upscale module takes multi-level LR feature information as input to obtain HR features [9]. Extensive evaluations on benchmark datasets show that MRUAN performs favorably against state-of-the-art methods [9].
3 Tensorflow Tensorflow is a machine learning algorithm implementation framework and deployment system designed to study ultra-large-scale deep neural networks for the Google Brain project launched in 2011 [10]. Tensorflow can be easily deployed on a variety of systems. In recent years, the performance of Tensorflow has been tested by several researchers, and Tensorflow has become one of the most popular deep learning libraries [11–13]. Meanwhile, Tensorflow supports Python, which is very popular in academia. All of this enables researchers to develop their deep neural network models and test new ideas easily and quickly, without worrying about the underlying algorithms.
4 Proposed Methods After comparing the methods and applications mentioned in the related works, we chose to apply the SRCNN method to this project. The SRCNN method has been used to enhance medical images [7], radar images [14], and underwater images [15] and has demonstrated its effectiveness in these fields. That is why we are confident in our choice of approach to produce Digital Rock Images of higher resolution. The working process of SRCNN consists of two steps [7]. First, low-resolution images are interpolated to the desired scale with an interpolation method [7]. Second, the interpolated images are mapped to higher-resolution images by a Convolutional Neural Network whose filter weights are obtained by learning from a large number of images [7]. In our project, the filter weights of the Convolutional Neural Network are obtained by learning from a large number of Digital Rock images. For training the network, we used images from our cooperative laboratory. First, we cut these 400*500-pixel images into small patches of 72*72 pixels. Second, we randomly selected 80% of the patches to prepare the training data and the remaining 20% to prepare the testing data. Third, we generated low-resolution images of 36*36 pixels by downsampling and blurring. Finally, we upscaled these images back to 72*72 pixels with interpolation to fulfill the first step of SRCNN and then trained the Deep Neural Networks.
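As an illustration of the data-preparation step just described, the following is a minimal sketch in Python/TensorFlow. The directory name, the use of grayscale PNG inputs, area filtering as the downsampling/blurring step, and bicubic interpolation for the upscaling step are our assumptions rather than details given by the authors.

```python
import glob
import numpy as np
import tensorflow as tf

PATCH = 72          # high-resolution patch size used in the paper
SCALE = 2           # 72 -> 36 -> 72, as described above

def load_gray(path):
    """Read one rock image as a float32 grayscale array in [0, 1]."""
    raw = tf.io.read_file(path)
    img = tf.image.decode_png(raw, channels=1)
    return tf.image.convert_image_dtype(img, tf.float32)

def cut_patches(img, patch=PATCH):
    """Cut a 400x500 image into non-overlapping 72x72 patches."""
    h, w = img.shape[0], img.shape[1]
    patches = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            patches.append(img[y:y + patch, x:x + patch, :])
    return patches

def make_pair(hr_patch):
    """Downsample (with averaging, approximating the blur) to 36x36, then bicubic-upscale back to 72x72."""
    lr = tf.image.resize(hr_patch[None], [PATCH // SCALE, PATCH // SCALE], method="area")
    up = tf.image.resize(lr, [PATCH, PATCH], method="bicubic")[0]
    return up, hr_patch                       # (network input, target)

patches = []
for path in glob.glob("rock_images/*.png"):   # hypothetical directory
    patches.extend(cut_patches(load_gray(path)))

np.random.shuffle(patches)
split = int(0.8 * len(patches))               # 80% training / 20% testing
train_pairs = [make_pair(p) for p in patches[:split]]
test_pairs = [make_pair(p) for p in patches[split:]]
```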
5 Experiment Result and Discussion The network was trained on a general-purpose GPU, a GeForce GTX 1050 Ti with 768 NVIDIA CUDA cores and 4 GB of GDDR5 memory running at 7 Gbps; its architecture is Pascal.
The Deep Neural Networks were developed with Tensorflow. The upscaled patches and the corresponding original images in the training data were used to train the Deep Neural Networks: the upscaled patches served as input, and the network outputs were compared with the corresponding original images. The weights of the deep neural network were then adjusted according to the differences between the two groups until training was completed. The filter weights of the Convolutional Neural Network were randomly initialized by a Tensorflow function before training. After this, stochastic gradient descent was applied to minimize the mismatch. The model was trained several times, with the number of training iterations ranging from 200 to 600, and the networks were then tested on images from the testing data, which had never been used to train the Deep Neural Networks.
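A minimal Keras sketch of the network and the SGD training described above is given below. The three-layer 9-1-5 architecture with 64 and 32 filters follows the original SRCNN design rather than hyperparameters reported here, and the learning rate, epoch count and batch size are placeholders.

```python
import tensorflow as tf

def build_srcnn():
    """Three-layer SRCNN: patch extraction, non-linear mapping, reconstruction."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 9, padding="same", activation="relu",
                               input_shape=(72, 72, 1)),
        tf.keras.layers.Conv2D(32, 1, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(1, 5, padding="same"),
    ])

model = build_srcnn()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),  # assumed rate
              loss="mse")   # mismatch between network output and original patch

# x_train: bicubic-upscaled 72x72 patches, y_train: original 72x72 patches
# model.fit(x_train, y_train, epochs=300, batch_size=16)   # assumed settings
```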
6 Conclusion The test demonstrated the feasibility of SRCNN for producing Digital Rock Images of higher resolution, and its potential practical value in rock image analysis in the petroleum industry. One of our future works is to scale this GPU implementation to several GPUs with a bigger dataset. We also plan to extend this method to 3D Digital Rock reconstruction. Statement Compliance with Ethical Standards. Funding This study was funded by the Russian Foundation for Basic Research (grant number 16-29-15116) and the China Scholarship Council. Conflict of Interest The authors declare that they have no conflict of interest.
References 1. P. Neary, Automatic hyperparameter tuning in deep convolutional neural networks using asynchronous reinforcement learning. in 2018 IEEE International Conference on Cognitive Computing (ICCC). (IEEE, 2018). http://doi.ieeecomputersociety.org/10.1109/ICCC.2018. 00017 2. S. Park, S. Yu, M. Kim, K. Park, J. Paik, Dual autoencoder network for retinex-based low-light image enhancement. in IEEE Access. vol 6, (2018) pp. 22084–22093. https://doi.org/10.1109/ access.2018.2812809 3. Y. Li, T. Pu, J. Cheng, A biologically inspired neural network for image enhancement. in 2010 International Symposium on Intelligent Signal Processing and Communication Systems. (IEEE, 2010). https://doi.org/10.1109/ISPACS.2010.5704686 4. Y. Zhao, Y. Zan, X. Wang, G. Li, Fuzzy C-means clustering-based multilayer perceptron neural network for liver CT images automatic segmentation. in 2010 Chinese Control and Decision Conference. (IEEE, 2010). https://doi.org/10.1109/CCDC.2010.5498558
5. T. Kinattukara, B. Verma, Clustering based neural network approach for classification of road images. in 2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR). (IEEE, 2013). https://doi.org/10.1109/socpar.2013.7054121 6. R.M. Haralick, K. Shanmugam, Computer classification of reservoir sandstones. IEEE Transactions on Geoscience Electronics 11(4), 171–177 (1973). https://doi.org/10.1109/TGE.1973. 294312 7. P. Shah, M. Moghaddam, Super resolution for microwave imaging: A deep learning approach. in 2017 IEEE International Symposium on Antennas and Propagation & USNC/URSI National Radio Science Meeting. https://doi.org/10.1109/apusncursinrsm.2017.8072467 8. X. Yang, H. Mei, J. Zhang, K. Xu, B. Yin, Q. Zhang, X. Wei, DRFN: deep recurrent fusion network for single-image super-resolution with large factors. IEEE Trans. Multimedia 21(2), 2019. https://doi.org/10.1109/tmm.2018.2863602 9. Y. Shen, L. Zhang, Z. Wang, X. Hao, Y. –L. Hou, Multi-level residual up-projection activation network for image super-resolution. in 2019 IEEE International Conference on Image Processing (ICIP). https://doi.org/10.1109/icip.2019.8803331 10. Y. Bai, V. Berezovsky, Digital rock image clustering based on their feature extracted via convolutional autoencoders. in Proceedings of the International Conference on Digital Image & Signal Processing. ISBN (978-1-912532-05-6) 11. D.C. Ciresan, U. Meier, J. Masci, L.M.Gambardella, J. Schmidhuber, Flexible high performance convolutional neural networks for image classification. in Twenty-Second International Joint Conference on Artificial Intelligence (2011, June) pp. 1237–1242 12. C. Feng, The basement of CNN fully connected layer. Deep learning in an easy way—–learn core algorithms and visual practice (Publishing House of Electronics Industry, 2017) pp. 50 13. W. Huang, Y. Tang, Tensorflow combat. (Publishing House of Electronics Industry, 2017) pp. 1–2. ISBN 978-7-121-30912-0 14. Y. Dai, T. Jin, Y. Song, H. Du, SRCNN-based enhanced imaging for low frequency radar. in 2018 Progress in Electromagnetics Research Symposium (PIERS-Toyama). https://doi.org/10. 23919/piers.2018.8597817 15. Y. Li, C. Ma, T. Zhang, J. Li, Z. Ge, Y. Li, S. Serikawa, Underwater image high definition display using the multilayer perceptron and color feature-based SRCNN. Access vol 7, IEEE. https://doi.org/10.1109/access.2019.2925209
Enhancing PSO for Dealing with Large Data Dimensionality by Cooperative Coevolutionary with Dynamic Species-Structure Strategy Kittipong Boonlong and Karoon Suksonghong
Abstract It is widely recognized that the performance of PSO in dealing with multiobjective optimization problems deteriorates as problem dimensionality increases. The notion of cooperative coevolution (CC) allows PSO to decompose a large-scale problem into multiple subcomponents. However, the combination of CC and PSO tends to perform worse when interrelation among variables is exhibited. This paper proposes a dynamic species-structure strategy for improving the search ability of CC and incorporates it into the PSO algorithm. The resulting algorithm, denoted "DCCPSO", is tested on the standard test problems widely known as "DTLZ" with 3–6 optimized objectives. In a large-scale problem setting, the experimental results reveal that our proposed decomposition strategy helps enhance the performance of PSO and overcome problems pertaining to interrelation among variables. Keywords Cooperative coevolutionary algorithm · Particle swarm optimization · Non-separable problem · Problem decomposition · Dynamic species
1 Introduction The main obstacle to utilizing particle swarm optimization (PSO) in a multiobjective (MO) optimization framework is identifying the global best solutions in a large-scale problem domain with several optimized objectives. The cooperative coevolutionary algorithm (CCA) proposed by Potter and De Jong [1, 2] pioneered an applicable method for decomposing and handling large-scale problems simultaneously. This technique is motivated by the notion of divide-and-conquer.
According to Potter and De Jong's framework, a solution consisting of N variables is decomposed into N subcomponents which are subsequently assigned to N corresponding species. Evolution of these species is performed more or less independently of one another, while the objective function can only be evaluated after a complete solution is formed. Since any given species represents only a subcomponent of a complete solution, a full solution is constructed by merging the individuals of the considered species with those of the remaining species, the so-called "collaborators". However, the problem decomposition strategy and the structure of the problem at hand are crucial in determining the performance of CCA [3]. Many studies have effectively applied the concept of cooperative coevolution to particular optimization problems [4–7]. As observed from the literature, although CCA performs well for large-scale optimization problems, its performance drops rapidly once the number of optimized objectives increases in the presence of interdependency among variables, i.e. non-separable problems [8]. To address this potential drawback, Bergh and Engelbrecht [8], incorporating CCA with PSO, decomposed an N-variable problem into K species, each consisting of s variables, i.e. N = K × s. As a result, each species represents s variables instead of one as in the original CCA. However, the membership of the s variables in each species is never adjusted during the course of the algorithm. Subsequently, the splitting-in-half decomposition scheme was proposed by Shi et al. [3]. In this scheme, the solution chromosome is equally separated into two parts and the evolutionary processes of these parts are performed individually. The main objective of these two strategies is to enlarge the number of variables assigned to a species, i.e. the species-structure, in order to increase the chance of having interacting variables placed in the same species. However, CCA adopting these two static decomposition schemes still performs poorly when dealing with large-scale problems. Besides, these schemes require a user to have a priori knowledge about the structure of the optimized problem. Alternatively, a species-structure that can be altered as the algorithm run progresses, triggered by pre-specified rules, makes good sense for alleviating the above problems [9, 10]. Yang et al. [9] articulated that once the frequency of alteration of the grouping size increases, the probability of placing interacting variables within the same group also rises. In [9], a new dynamic decomposition strategy whose variable assignment method is based on probability was proposed. Their decomposition method enables the species-structure to be adjusted throughout the optimization run. The proposed method was integrated into CCDE and tested with a set of non-separable problems with up to 1000 variables. Zhang et al. [10] proposed a dynamic cooperative coevolutionary framework in order to solve non-separable large-scale global optimization problems. This framework appropriately incorporates two critical attributes of the variables, i.e., interaction relationship and contribution, in solving non-separable problems. To exploit the prominent features of CCA, this paper proposes a new dynamic decomposition strategy and integrates it into the PSO algorithm with extensive modifications for dealing with large data dimensionality where interdependency among variables exists. This paper is organized as follows.
The proposed dynamic species-structure and its procedures are discussed in Sect. 2. Section 3 explains the standard test problems employed together with the experimental setting. The simulation results are presented in Sect. 4, while our conclusions are stated in Sect. 5.
2 Cooperative Coevolutionary PSO with Dynamic Species Structure Strategy Particle swarm optimization (PSO), a derivative-free method, was originally introduced by Kennedy and Eberhart [11]. PSO mimics the behavior of a bird flock: it maintains a set of candidate solutions, called a swarm, similar to the population in a genetic algorithm (GA). Each candidate solution is represented by a particle, which also has its corresponding personal best solution. In PSO, each particle is updated according to its personal best solution and the global best solution, which is the best solution in the swarm. PSO was later developed for use in multi-objective optimization. The multiobjective particle swarm optimization (MOPSO) was originally developed by embedding Pareto domination [12]. Under Pareto domination, non-dominated solutions are regarded as the best solutions, and the number of non-dominated solutions is usually more than one. The repository, a set containing the non-dominated solutions, represents the global best solutions in multi-objective optimization. In MOPSO, each particle is updated by its personal best solution and one solution randomly picked from the repository. To incorporate our proposed strategy into MOPSO, a predetermined set of species-structures (SS) is identified. For simplicity, a geometric series with first term equal to 1 and common ratio equal to 2 is employed for determining SS. The number of species (NS) is computed by dividing the number of problem variables (N) by the selected SS, i.e. NS = N/SS. At the initial stage, the algorithm randomly selects a species-structure from SS. Then variables are assigned chronologically from the first species to the last species. In other words, variables are randomly selected and placed into the first species until the number of assigned variables meets the selected SS; the assignment process then shifts to the second species, and so forth. It is worth noting that, if the computation of N/SS leaves a remainder, the last species is not fully filled compared to the other species. Our rule is that, if the empty space of the species-structure is less than 50% of the selected SS value, this species is used in the next process, although it is not completely filled; in contrast, if the empty space is greater than 50%, all variables placed in this species are moved and combined into the previous species. The underlying idea of the proposed method is to allow the algorithm to experiment with diverse species-structures varying from the smallest to the largest. Besides, the species-structure is predetermined using a simple method, compared to other dynamic methods that require high-level information from a user. In this scheme, to guarantee that all species-structures will be used during the course of the algorithm run, the value of the species-structure is selected from the set SS without replacement.
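The selection and assignment rule just described can be sketched as follows; this is a hedged illustration, with function and variable names of our own, and the random shuffling is one way to realize the "randomly selected and placed" step.

```python
import random

def init_species(num_vars, ss_pool):
    """Pick a species-structure (SS) without replacement and assign variables to species."""
    ss = ss_pool.pop(random.randrange(len(ss_pool)))   # selection without replacement
    indices = list(range(num_vars))
    random.shuffle(indices)                            # variables placed at random
    species = [indices[i:i + ss] for i in range(0, num_vars, ss)]
    # 50% rule for a partially filled last species
    if len(species) > 1 and len(species[-1]) < 0.5 * ss:
        species[-2].extend(species.pop())              # merge into the previous species
    return ss, species

# Example: N = 1024 decision variables, SS drawn from a geometric series 1, 2, 4, ..., 1024
ss_pool = [2 ** k for k in range(11)]
ss, species = init_species(1024, ss_pool)
print(ss, len(species))
```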
In the dynamic process, changing the species-structure depends on the comparison of fitness between the solutions obtained from the current and the previous iterations. If the current solutions are fitter than the previous ones, the employed species-structure is reused in the next iterations; otherwise, a new species-structure is adopted. We use convergence detection as the trigger criterion for adjusting the species-structure. Given that Ai and Ai−1 are the non-dominated solution sets of the current and previous iterations, respectively, the condition for solution convergence is proposed as:

C(Ai, Ai−1) ≤ C(Ai−1, Ai)    (1)

where C(Ai, Ai−1) is the coverage ratio of solution set Ai over solution set Ai−1, while C(Ai−1, Ai) is the reverse value of C(Ai, Ai−1). The solution coverage C, used to assess two sets of solutions, can be stated as:

C(Ai, Ai−1) = |{ai−1 ∈ Ai−1 : ∃ ai ∈ Ai, ai ≺ ai−1}| / |Ai−1|    (2)
where ai ≺ ai−1 indicates that solution ai covers or dominates solution ai−1, and C(Ai, Ai−1) ∈ [0, 1]. If C(Ai, Ai−1) = 1, all solutions in set Ai−1 are covered by those in set Ai. Meanwhile, C(Ai, Ai−1) = 0 indicates that none of the solutions in set Ai−1 are dominated by those of set Ai. Therefore, C(Ai, Ai−1) > C(Ai−1, Ai) means that the solutions obtained from the current iteration are fitter than those of the previous iteration. As a result, C(Ai, Ai−1) ≤ C(Ai−1, Ai) triggers the condition of solution convergence and thus activates the dynamic mechanism to modify the species-structure. According to the cooperative coevolutionary concept, in order to evaluate a species of interest, the fitness of a solution must be calculated. Since one species contains only a part of a full solution, it must be combined with other species, the so-called "collaborators", to form a complete solution. In the literature, it is acknowledged that CCA performance also depends on the technique adopted for selecting the collaborators. In this study, we utilize a collaborator selection equipped with an elitism strategy. Figure 1 illustrates the process of the adopted collaborator selection technique. In Fig. 1, we consider an optimization problem with four variables; the species-structure is randomly selected and the resulting value is one. At the end of an iteration, PSO attains a set of non-dominated solutions and stores them in the repository. Suppose that species 1, which is filled with variable x1, is being evaluated; CCA then selects collaborators from the current archive. To form complete solutions, CCA merges the collaborators with the evaluated species in random order. This scheme helps protect solutions from being trapped in local optima, because randomly placing collaborators encourages CCA to explore new search space. The procedure of the proposed dynamic species-structure cooperative coevolutionary PSO, denoted "DCCPSO", is illustrated in Fig. 2.
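Equations (1) and (2) can be computed directly from two repositories of objective vectors. The sketch below assumes minimization and uses strict Pareto dominance for the "covers or dominates" relation, which is our reading of the definition above.

```python
def dominates(a, b):
    """Strict Pareto dominance for minimization: a dominates b."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def coverage(A_cur, A_prev):
    """C(A_cur, A_prev): fraction of A_prev covered by some member of A_cur (Eq. 2)."""
    covered = sum(1 for b in A_prev if any(dominates(a, b) for a in A_cur))
    return covered / len(A_prev)

def converged(A_cur, A_prev):
    """Eq. (1): trigger a new species-structure when the current set is not fitter."""
    return coverage(A_cur, A_prev) <= coverage(A_prev, A_cur)
```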
Fig. 1 Collaborator selection technique
3 Experimental Setting The scalable DTLZ test problems [13], whose number of optimized objectives can be adjusted, are used as the experimental cases. To examine our main objectives, the problem setting considers 3–6 optimized objectives and 1024 decision variables. In DTLZ1, the true Pareto-optimal solutions have objective function values lying on a linear hyper-plane. The search space contains many local Pareto-optimal fronts that could cause an optimization algorithm to be trapped before reaching the global Pareto-optimal front. Among the DTLZ problems, DTLZ2 is the simplest one, having a spherical true Pareto-optimal front. This problem can also be used to investigate an algorithm's ability in dealing with problems with a large number of objectives. DTLZ3 has many local optimal regions in the objective space, which places difficulty on the optimization algorithm; like DTLZ2, DTLZ3 has a spherical true Pareto-optimal front. The test problem DTLZ4 is designed for investigating the performance of an optimization algorithm in achieving a well-distributed set of solutions; again, DTLZ4 has a spherical true Pareto-optimal front like DTLZ2 and DTLZ3. For DTLZ5, the true Pareto-optimal front is a curve, and interaction between decision variables has been introduced into this problem. Similarly, DTLZ6 has a curved true Pareto-optimal front like DTLZ5. In addition, for DTLZ6, there is low density near the true Pareto-optimal front in the objective space, which makes it difficult for an optimization algorithm to reach the front. Lastly, DTLZ7 has a disconnected set of Pareto-optimal regions. This test problem has 2^(M−1) disconnected Pareto-optimal regions in the search space, where M is the number of optimized objectives. The parameter setting for DCCPSO is summarized and reported in Table 1.
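For reference, a sketch of one of these test functions, DTLZ2, following the standard formulation of Deb et al. [13]; the code is our restatement of that formulation, with M objectives and the remaining variables forming the distance-related group.

```python
import math

def dtlz2(x, m):
    """DTLZ2 with m objectives; x is a list of decision variables in [0, 1]."""
    k = len(x) - m + 1                       # size of the distance-related group
    g = sum((xi - 0.5) ** 2 for xi in x[-k:])
    f = []
    for i in range(m):
        fi = 1.0 + g
        for j in range(m - 1 - i):           # product of cosine terms
            fi *= math.cos(0.5 * math.pi * x[j])
        if i > 0:                            # one sine term for all but the first objective
            fi *= math.sin(0.5 * math.pi * x[m - 1 - i])
        f.append(fi)
    return f

# Example: x = [0.5] * 1024 gives g = 0, i.e. a point on the spherical front
# print(dtlz2([0.5] * 1024, 3))
```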
Fig. 2 Procedure of the proposed DCCPSO
Table 1 Parameter settings

Parameter | Setting and values
Test problems | DTLZ problems with 3–6 objectives
Chromosome coding | Real-value chromosome with 1024 decision variables
Population size | 100
Repository size | 100
Archive size for DCCPSO | 100
Number of generations used for termination condition | 1000
4 Simulation Results This study employed the average distance of objective vectors to the true Pareto front (M1) [14] and the hypervolume (HV) [15] as performance measures. To compute M1, the distance of the objective vector of a solution i in the repository to the true Pareto front (di) is the Euclidean distance between the objective vector of that solution and the nearest objective vector of a solution on the true Pareto front; M1 is the average of di over all solutions in the repository. The other criterion, the hypervolume (HV), measures not only the distance of the solutions to the true Pareto front but also the variety of the solutions; a high value of HV reflects good performance of an algorithm. The HV corresponds to an area, a volume, or a hypervolume, for two, three, and four-or-more objectives, respectively, between a pre-defined reference point and the solutions being evaluated. Tables 2, 3, 4 and 5 exhibit the results of M1 for the different test problems, while the results of HV are reported in Tables 6, 7, 8 and 9. Overall, according to the results reported in Tables 2, 3, 4, 5, 6, 7, 8 and 9, the proposed strategy is successful in boosting the performance of DCCPSO, since it outperforms the standard MOPSO regardless of the performance measure and the test problem employed.
Table 2 Results of M1 of DTLZ problems with 3 objectives

Test problems | MOPSO Mean | MOPSO Std. dev. | DCCPSO Mean | DCCPSO Std. dev.
DTLZ1 | 43,337.16 | 657.43 | 19,581.10 | 833.74
DTLZ2 | 66.31 | 5.04 | 4.30 | 1.15
DTLZ3 | 72,007.67 | 9727.25 | 28,342.65 | 2603.05
DTLZ4 | 104.14 | 14.51 | 33.03 | 2.13
DTLZ5 | 110.12 | 16.86 | 8.79 | 2.32
DTLZ6 | 803.91 | 17.17 | 381.41 | 26.93
DTLZ7 | 5.60 | 1.78 | 0.91 | 0.32
Table 3 Results of M1 of DTLZ problems with 4 objectives

Test problems | MOPSO Mean | MOPSO Std. dev. | DCCPSO Mean | DCCPSO Std. dev.
DTLZ1 | 43,715.34 | 748.53 | 22,453.74 | 867.75
DTLZ2 | 121.65 | 3.60 | 10.65 | 2.41
DTLZ3 | 86,942.95 | 2603.41 | 34,141.94 | 4473.11
DTLZ4 | 113.52 | 7.11 | 40.99 | 7.12
DTLZ5 | 119.64 | 8.85 | 43.06 | 7.26
DTLZ6 | 821.06 | 15.32 | 587.78 | 44.11
DTLZ7 | 15.52 | 2.25 | 3.18 | 0.79
Table 4 Results of M1 of DTLZ problems with 5 objectives

Test problems | MOPSO Mean | MOPSO Std. dev. | DCCPSO Mean | DCCPSO Std. dev.
DTLZ1 | 43,769.82 | 715.71 | 23,172.22 | 772.87
DTLZ2 | 122.75 | 4.32 | 19.84 | 5.69
DTLZ3 | 87,854.46 | 3910.63 | 34,259.48 | 3829.82
DTLZ4 | 116.59 | 12.00 | 46.06 | 6.26
DTLZ5 | 122.79 | 4.88 | 48.40 | 8.07
DTLZ6 | 819.71 | 9.19 | 510.70 | 52.21
DTLZ7 | 22.14 | 3.97 | 4.68 | 0.88
Table 5 Results of M1 of DTLZ problems with 6 objectives

Test problems | MOPSO Mean | MOPSO Std. dev. | DCCPSO Mean | DCCPSO Std. dev.
DTLZ1 | 44,043.65 | 1063.36 | 22,809.98 | 869.98
DTLZ2 | 124.28 | 4.22 | 16.75 | 6.01
DTLZ3 | 87,370.17 | 2814.89 | 35,106.89 | 4238.34
DTLZ4 | 121.01 | 8.00 | 53.51 | 10.25
DTLZ5 | 121.95 | 3.95 | 49.14 | 10.46
DTLZ6 | 821.24 | 17.43 | 516.18 | 56.84
DTLZ7 | 27.64 | 3.65 | 7.61 | 1.82
Considering the results presented in Tables 2 and 6, based on DTLZ1 and DTLZ3, DCCPSO has a superior ability to search through many local optima on both the linear hyper-plane and the spherical Pareto front. Besides, DCCPSO performs well in an environment with a large number of optimized objectives, since its performance measures on DTLZ2 are better than those of MOPSO.
Table 6 Results of HV of DTLZ problems with 3 objectives

Test problems | MOPSO Mean | MOPSO Std. dev. | DCCPSO Mean | DCCPSO Std. dev.
DTLZ1 | 0.7555 | 0.0237 | 0.9780 | 0.0198
DTLZ2 | 0.5229 | 0.0351 | 0.9998 | 0.0112
DTLZ3 | 0.5091 | 0.0499 | 0.9695 | 0.0316
DTLZ4 | 0.6239 | 0.0284 | 0.9868 | 0.0395
DTLZ5 | 0.6701 | 0.0457 | 0.9992 | 0.0065
DTLZ6 | 0.4199 | 0.0169 | 0.9392 | 0.0304
DTLZ7 | 0.0854 | 0.0098 | 0.5051 | 0.0798
Table 7 Results of HV of DTLZ problems with 4 objectives

Test problems | MOPSO Mean | MOPSO Std. dev. | DCCPSO Mean | DCCPSO Std. dev.
DTLZ1 | 0.8860 | 0.2015 | 0.9826 | 0.0508
DTLZ2 | 0.5283 | 0.0787 | 0.9841 | 0.0938
DTLZ3 | 0.5362 | 0.0722 | 0.9274 | 0.0927
DTLZ4 | 0.7493 | 0.1210 | 0.9406 | 0.0940
DTLZ5 | 0.5897 | 0.0887 | 0.9675 | 0.0969
DTLZ6 | 0.5204 | 0.0713 | 0.7637 | 0.0761
DTLZ7 | 0.0858 | 0.0105 | 0.5305 | 0.0808
Table 8 Results of HV of DTLZ problems with 5 objectives

Test problems | MOPSO Mean | MOPSO Std. dev. | DCCPSO Mean | DCCPSO Std. dev.
DTLZ1 | 0.8862 | 0.1080 | 0.9937 | 0.0508
DTLZ2 | 0.6278 | 0.0752 | 0.9921 | 0.0592
DTLZ3 | 0.5578 | 0.0598 | 0.9282 | 0.0928
DTLZ4 | 0.7258 | 0.0959 | 0.9351 | 0.0936
DTLZ5 | 0.5954 | 0.0885 | 0.9344 | 0.0840
DTLZ6 | 0.5499 | 0.0716 | 0.8188 | 0.0588
DTLZ7 | 0.0674 | 0.0093 | 0.5305 | 0.0808
Similarly, DCCPSO is able to achieve well-distributed solutions on the spherical Pareto front, as tested by DTLZ4, and once again outperforms the standard MOPSO. As mentioned earlier, one of the main goals of the proposed strategy is to enable PSO to effectively deal with non-separable problems.
Table 9 Results of HV of DTLZ problems with 6 objectives

Test problems | MOPSO Mean | MOPSO Std. dev. | DCCPSO Mean | DCCPSO Std. dev.
DTLZ1 | 0.9049 | 0.1841 | 0.9859 | 0.0947
DTLZ2 | 0.5824 | 0.0650 | 0.9913 | 0.0777
DTLZ3 | 0.5827 | 0.0707 | 0.9125 | 0.0912
DTLZ4 | 0.7973 | 0.1284 | 0.9135 | 0.0583
DTLZ5 | 0.5806 | 0.0663 | 0.9117 | 0.0933
DTLZ6 | 0.5391 | 0.0577 | 0.7580 | 0.0727
DTLZ7 | 0.0529 | 0.0130 | 0.4221 | 0.1317
Results from DTLZ5 and DTLZ6, which introduce a degree of interdependency among variables, reveal that our proposed dynamic solution decomposition strategy is capable of effectively dealing with the variable coupling problem; DCCPSO performs much better than the standard MOPSO in this respect. Lastly, DCCPSO also outperforms MOPSO in searching for a disconnected true Pareto front, as reported in the results of DTLZ7. In this paper, the proposed DCCPSO is tested within a large problem dimensionality environment by setting a large number of decision variables, i.e. 1024 variables, regardless of the test problem. The proposed dynamic problem decomposition strategy reveals its superior capability in this sense, regardless of the difficulty of searching for the true Pareto front. It can be observed that, with the number of decision variables fixed at 1024, DCCPSO's performance becomes slightly worse as the number of optimized objectives increases; however, its performance remains superior compared to that of the standard MOPSO.
5 Conclusion This study contributes to the literature by proposing a dynamic species-structure to improve the search ability of CCA, preventing CCA from getting trapped in local optima and overcoming issues related to interrelation among variables. The proposed strategy allows the species-structure to be adjusted throughout the course of optimization, which raises the probability that interrelated variables will be placed in the same species. Besides, with the proposed convergence detection method, our strategy encourages a species-structure that performs well to be used continuously in subsequent generations. The experimental results reveal that the proposed DCCPSO outperforms the standard MOPSO regardless of the performance measure, the test case, and the number of optimized objectives. Results from solving DTLZ1 and DTLZ3 show that DCCPSO is capable of obtaining true Pareto fronts of different shapes. Its superiority in dealing with non-separable problems is clearly presented when solving DTLZ4, and it is even more pronounced in the case of DTLZ5, whose interrelation problem is stronger than that of DTLZ4.
References 1. M.A. Potter, K.A. De Jong, A cooperative coevolutionary approach to function optimization, in Parallel Problem Solving from Nature — PPSN III: International Conference on Evolutionary Computation The Third Conference on Parallel Problem Solving from Nature Jerusalem, Israel, October 9–14, 1994 Proceedings, ed. by Y. Davidor, H.-P. Schwefel, R. Männer (Springer, Berlin Heidelberg, Berlin, Heidelberg, 1994), pp. 249–257 2. M.A. Potter, K.A. de Jong, Cooperative coevolution: an architecture for evolving coadapted subcomponents. Evol. Comput. 8(1), 1–29 (2000) 3. Y.-j. Shi, H.-f. Teng, Z.-q. Li, Cooperative co-evolutionary differential evolution for function optimization, in Advances in Natural Computation: First International Conference, ICNC 2005, Changsha, China, August 27–29, 2005, Proceedings Part II, ed. by L. Wang, K. Chen, Y.S. Ong (Springer Berlin Heidelberg, Berlin, Heidelberg, 2005) pp. 1080–1088 4. K. Boonlong, Vibration-based damage detection in beams by cooperative coevolutionary genetic algorithm. Adv Mech Eng. 6(Article ID 624949): 13 (2014) 5. J.C.M. Diniz, F. Da Ros, E.P. da Silva, R.T. Jones, D. Zibar, Optimization of DP-M-QAM transmitter using cooperative coevolutionary genetic algorithm. J. Lightwave Technol. 36(12), 2450–2462 (2018) 6. Z. Ren, Y. Liang, A. Zhang, Y. Yang, Z. Feng, L. Wang, Boosting cooperative coevolution for large scale optimization with a fine-grained computation resource allocation strategy. IEEE Trans. Cybernetics. 49(2), 1–14 (2019) 7. A. Pahlavanhoseini, M.S. Sepasian, Scenario-based planning of fast charging stations considering network reconfiguration using cooperative coevolutionary approach. J. Energy Storage. 23, 544–557 (2019) 8. F.v.d. Bergh, A. P. Engelbrecht, A Cooperative approach to particle swarm optimization. IEEE Trans. Evol. Comput. 8, 225–239 (2004) 9. Z. Yang, K. Tang, X. Yao, Large scale evolutionary optimization using cooperative coevolution. Inf. Sci. 178, 2985–2999 (2008) 10. X.Y. Zhang, Y.J. Gong, Y. Lin, J. Zhang, S. Kwong, J. Zhang, Dynamic cooperative coevolution for large scale optimization. IEEE Evol. Comput. 14 (in-press) 11. J. Kennedy, R. Eberhart, Particle swarm optimization, in Proceedings of IEEE International Conference on Neural Networks, vol 4. (1995), pp. 1942–1948 12. C.A.C. Coello, G.T. Pulido, M.S. Lechuga, Handling multiple objectives with particle swarm optimization. IEEE Tran. Evol. Comput. 8, 256–279 (2004) 13. K. Deb, L. Thiele, M. Laumanns, E. Zitzler, Scalable test problems for evolutionary multiobjective optimization, in EMO, AIKP, ed. by A. Abraham, L. Jain, R. Goldberg (Springer, Berlin Heidelberg 2005), pp. 105–145 14. E. Zitzler, K. Deb, L. Thiele, Comparison of multiobjective evolutionary algorithms: empirical results. Evol. Comput. 8, 173–195 (2000) 15. E. Zitzler, L. Thiele, Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Evol. Comput. 3, 257–271 (1999)
A New Encoded Scheme GA for Solving Portfolio Optimization Problems in the Big Data Environment Karoon Suksonghong and Kittipong Boonlong
Abstract Working in a big data environment inevitably involves dealing with a large search space, which requires a lot of computation resources. This paper proposes a new solution encoding method that helps to enhance genetic algorithm (GA) performance. The proposed mixed integer-real value chromosome is specifically designed for solving large-scale optimization problems whose optimal solution consists of only a few selected variables. In the experimental setting, a bi-criterion portfolio optimization problem is transformed and solved within the single-objective optimization framework. Besides, the proposed encoding method also allows GA to handle the practical cardinality constraint in an efficient manner. Further, our new encoding scheme does not require any additional development of the evolutionary operators: the decision-maker is free to adopt any standard existing crossover and mutation operator that is already well established. The simulation results reveal that the proposed method helps improve GA performance in both exploitation and exploration tasks. Keywords Chromosome encoding · Cardinality constraint · Portfolio optimization · Integer value coding · Genetic algorithm
1 Introduction In the field of finance, portfolio optimization is considered one of the most recognized multiple criteria decision making (MCDM) problems, which aims to find optimal results in complex scenarios including various practical constraints and competing, conflicting objectives and criteria. According to portfolio selection theory [1], an investor makes decisions on capital allocation to the available investment assets in order to simultaneously maximize expected return and minimize
risk, measured by the variance of returns, subject to several constraints. From the MCDM perspective, the optimal portfolio for an investor can be obtained by first optimizing this bi-criterion optimization problem subject to a set of constraints, and then selecting a single optimal solution from the obtained set of non-dominated solutions based upon other, higher-level information. Alternatively, this problem can be transformed into a single-objective counterpart by optimizing the "Sharpe ratio" [2], which is a measure of expected return per unit of risk. This approach allows investors to make the optimal choice without considering other higher-level information, such as risk preference and risk tolerance, by selecting the best scalar value among all possible solutions. Theoretically, increasing the number of invested assets within a portfolio helps reduce portfolio risk, since firm-specific risk is diversified away. In practice, however, the majority of investors tend to limit the number of invested assets within their portfolio in order to balance the diversification benefit against monitoring and transaction costs [3]. This practical aspect introduces the so-called "cardinality constraint" into the portfolio optimization problem. Solving the cardinality-constrained portfolio optimization problem, hereafter "CCPOP", in a big data environment raises challenges for evolutionary-based algorithms, especially from the computation resource utilization perspective [4]. In a situation of optimizing a CCPOP with thousands of available investment choices where only a few assets will be selected into the portfolio, computation resources are utilized unnecessarily to explore a large search space due to the large number of possible solutions. This paper proposes a new solution encoding method for the chromosome representation process of the genetic algorithm (GA). The proposed scheme helps to enhance GA performance since it reduces the search space and handles the cardinality constraint at the same time. In our approach, a solution vector is encoded with a combination of integer and real numbers, which reduces the search space better than other approaches [5, 6]. In addition, our mixed integer-real chromosome tends to balance well between the exploiting and exploring tasks of GA. Further, several modified processes during the chromosome representation stage allow the user to employ standard crossover and mutation methods rather than requiring newly developed operators for the new encoding scheme. This paper is organized as follows. The proposed encoding method, together with several prerequisite solution representation processes, is discussed in Sect. 2. Section 3 explains the formulation of a portfolio optimization problem with a cardinality constraint. The simulation results are presented in Sect. 4, while our conclusions are stated in Sect. 5.
2 The Proposed Encoding Method Solving a portfolio optimization problem with a cardinality constraint forces GA to search an unnecessarily large search space, because the optimal solution contains only a few selected variables. Using the conventional chromosome encoding method, a lot of computation resources are needed for performing the search, repairing solutions to handle constraints, and evolving solutions to obtain better solutions.
Within the big data context, therefore, this discourages GA from achieving the optima: not only is solution convergence slow, but the balance between exploiting and exploring tasks is poor. The main goal of the proposed encoding method is to manage the search space and handle the constraint simultaneously. In our method, a mixed integer-real number encoding is used for a fixed-size chromosome that limits the number of selected investment assets. As a result, the search space becomes smaller, which allows GA to perform better. As a numerical example, consider a situation where an investor constructs the optimal portfolio among the assets listed in the US NASDAQ index, consisting of 2196 investment choices, and limits her investment allocation to 5 assets. The number of possible solutions for the conventional chromosome encoding method is 1000^2196, or 10^6588, when each gene is encoded as a real number with 3 decimal digits. Meanwhile, with a cardinality constraint of 5 assets, our proposed method initializes a solution chromosome with 10 species, whereby the first 5 species are encoded with integer numbers identifying the asset indices to be invested in, and the latter 5 species are encoded with real numbers representing the investment proportions. As a result, the search space is reduced to [2196!/(2196 − 5)!] × 1000^5, or about 5.08 × 10^31. Figure 1 illustrates the mixed integer-real encoded chromosome with 10 species for handling the 5-asset cardinality constraint. In addition to lessening the search space, the exploitation and exploration capability of GA is enhanced by incorporating a few prerequisite processes before performing the evolutionary operators. First, to preserve an asset index that is selected repeatedly by the majority of solutions, the integer-value species and their corresponding real-value species are rearranged by placing the repeatedly chosen asset index at the very first place of the chromosome. Then, the remaining species containing dissimilar asset indices, together with their corresponding investment proportions, are randomly placed into the solution chromosome. The first process ensures that a good asset index will not be evolved away, which eventually promotes the speed of solution convergence, while the second process stimulates GA to explore new areas of the search space. In fact, our proposed encoding scheme has the restriction that an integer number cannot be repeated within a single solution chromosome. If such a case occurs, the embedded repairing mechanism is activated, replacing the repeated asset index with a newly drawn random asset index together with its corresponding real-number value.
Fig. 1 Mixed integer and real encoding
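A minimal sketch of this chromosome layout and the embedded repair mechanism follows; the names are ours and, unlike the description above, the repair below re-draws only the duplicated index and keeps its proportion, which is a simplification.

```python
import random

def random_chromosome(n_assets, k):
    """First k genes: distinct asset indices; last k genes: investment proportions."""
    idx = random.sample(range(n_assets), k)
    w = [random.random() for _ in range(k)]
    s = sum(w)
    return idx + [wi / s for wi in w]          # proportions normalized to sum to 1

def repair(chrom, n_assets, k):
    """Replace any duplicated asset index with a newly drawn one."""
    idx, w = chrom[:k], chrom[k:]
    seen = set()
    for i, a in enumerate(idx):
        while a in seen:                       # duplicate found -> re-draw
            a = random.randrange(n_assets)
        idx[i] = a
        seen.add(a)
    s = sum(w)
    return idx + [wi / s for wi in w]          # keep proportions summing to one
```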
Subsequently, the standard GA operators, namely fitness evaluation, crossover, and mutation, can be performed. Regarding the evolutionary operators, we employ the standard methods widely used in the literature with slight modifications. For the crossover process, a solution chromosome is divided into two parts, the integer-value and real-value species, which are evolved separately. One-point crossover is employed on the integer-value part representing the asset indices, whereas simulated binary crossover (SBX) [7] is applied to the real-value part. Figure 2 demonstrates an example of the prerequisite processes explained earlier for the case of mixed integer-real encoding with a cardinality constraint of five invested assets. The crossover process is also graphically explained in Fig. 2, where one-point crossover is performed from species 4 of the integer-value part onwards and SBX is applied to the entire real-value part. For the mutation method, the integer- and real-value species are likewise partitioned and mutated separately; standard techniques, namely bit-flipped mutation for the integer part and variable-wise polynomial mutation for the real part, are adopted. Figure 3 demonstrates the mutation procedure for the same case explained above. Additionally, a situation where the embedded repairing mechanism is activated is included: it can be seen in Fig. 3 that the integer-value species show an identical asset index after mutation, i.e., asset index 1395 is selected twice. As explained earlier, the embedded repairing mechanism replaces this species with a newly drawn random asset index, i.e., asset index 1486; the integer- and real-value species can then be combined in order to subsequently perform fitness evaluation.
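The split-and-recombine crossover described above can be sketched as follows. The SBX formula follows Deb's standard definition [7]; bounds handling and the subsequent repair/normalization are omitted for brevity.

```python
import random

def one_point(idx1, idx2):
    """One-point crossover on the integer (asset-index) part."""
    cut = random.randrange(1, len(idx1))
    return idx1[:cut] + idx2[cut:], idx2[:cut] + idx1[cut:]

def sbx(w1, w2, eta=15.0):
    """Simulated binary crossover on the real (proportion) part."""
    c1, c2 = [], []
    for a, b in zip(w1, w2):
        u = random.random()
        beta = (2 * u) ** (1 / (eta + 1)) if u <= 0.5 else (1 / (2 * (1 - u))) ** (1 / (eta + 1))
        c1.append(0.5 * ((1 + beta) * a + (1 - beta) * b))
        c2.append(0.5 * ((1 - beta) * a + (1 + beta) * b))
    return c1, c2

def crossover(p1, p2, k):
    """Split each parent into integer and real parts and recombine them separately."""
    i1, i2 = one_point(p1[:k], p2[:k])
    w1, w2 = sbx(p1[k:], p2[k:])
    return i1 + w1, i2 + w2   # offspring may still need repair and re-normalization
```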
3 Problem Formulation Consider the portfolio selection process where N investment assets are available and the decision-maker determines the proportion of investment allocated to each available asset. Based upon our previous work [8], the standard portfolio optimization problem can be formulated by assigning x, representing a portfolio solution, as an N × 1 vector of investment allocation ratios to the N available assets. Let the vector R of size N × 1 represent the expected returns of each of the N investment choices, and let Σ be the N × N variance-covariance matrix. According to Markowitz's mean-variance portfolio theory, investors prefer a portfolio that offers a high level of expected return and exhibits a low level of risk. Suppose that the expected return and variance of a portfolio are denoted by Rp(x) and Vp(x), respectively. The two objectives of the portfolio optimization problem can be expressed as follows:

Maximize Rp(x) = x^T R = ∑_{i=1}^{N} x_i R_i    (1)
Fig. 2 Prerequisite processes of chromosome representation and example of crossover (rearrangement of the repeated asset index, random positioning of the remaining species, separation into integer and real parts, integer-coded one-point crossover and real-coded crossover, and recombination of the offspring)
Minimize Vp(x) = x^T Σ x = ∑_{i=1}^{N} ∑_{j=1}^{N} x_i x_j σ_{i,j}    (2)
where x^T is the transpose of vector x and x_i is the proportion of investment allocated to security i. R_i is the expected return of asset i and σ_{i,j} is the covariance between assets i and j.

Fig. 3 Mutation operation on the mixed integer-real encoded chromosome (integer-coded and real-coded mutation, activation of the embedded repair mechanism when asset index 1395 is selected twice, normalization of the proportions so that their summation equals one, and recombination)

The cardinality constraint portfolio optimization problem (CCPOP) can
be transformed into a single-objective model as follows:

Prob. 1:    min_x F(x) = −Sharpe Ratio_p(x)

subject to    ∑_{i=1}^{N} x_i = 1,    x_i ≥ 0,    ∑_{i=1}^{N} q_i = K_max
where Sharpe Ratio_p(x) is the maximized criterion representing the expected return per unit of risk, i.e., Rp(x)/Vp(x). The first constraint ensures that all investment is fully allocated to the available assets. The second constraint implies that short selling is not allowed. For the third constraint, K_max is the maximum number of assets that can be held in the portfolio, with q_i = 1 for x_i > 0 and q_i = 0 for x_i = 0. Thus, by optimizing F(x) in Prob. 1, the optimal solutions of CCPOP can be obtained within a single algorithm run.
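Evaluating F(x) for a mixed integer-real chromosome then amounts to the following sketch (numpy-based). The synthetic return and covariance data at the bottom are purely illustrative, and the Sharpe ratio is taken as Rp(x)/Vp(x) exactly as defined above, without a risk-free rate.

```python
import numpy as np

def fitness(chrom, k, mean_returns, cov):
    """F(x) = -Sharpe ratio of the portfolio encoded by a mixed integer-real chromosome."""
    idx = np.asarray(chrom[:k], dtype=int)          # selected asset indices
    w = np.asarray(chrom[k:], dtype=float)
    w = w / w.sum()                                 # enforce sum(x_i) = 1
    r_p = float(w @ mean_returns[idx])              # expected portfolio return
    v_p = float(w @ cov[np.ix_(idx, idx)] @ w)      # portfolio variance
    return -r_p / v_p                               # minimize the negative Sharpe ratio

# Illustrative example with synthetic data for 2196 hypothetical NASDAQ assets
rng = np.random.default_rng(0)
mu = rng.normal(0.002, 0.001, 2196)
A = rng.normal(size=(2196, 60))
sigma = A @ A.T / 60                                # a positive semi-definite covariance
chrom = list(rng.choice(2196, 5, replace=False)) + list(rng.random(5))
print(fitness(chrom, 5, mu, sigma))
```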
4 Simulation Results The test problems are taken from http://w3.uniroma1.it/Tardella/datasets.html, which is publicly available. These data sets contain the historical weekly data of stocks listed in the EuroStoxx50 (Europe), FTSE100 (UK), MIBTEL (Italy), S&P500 (USA), and NASDAQ (USA) capital market indices between March 2003 and March 2007.
The data sets of the US S&P500 index and the NASDAQ index are used for testing our proposed method. There are 476 and 2196 available investment assets listed in the S&P500 and NASDAQ indices, respectively. The cardinality constraint limiting the number of invested assets is set to 4, 5, 6, 8, and 10. Regarding the algorithm settings, the population size is 100, while the number of elite individuals is 2. Fitness scaling [ref] is also used in this paper, with a scaling factor of 2. Stochastic universal sampling (SUS) selection is used for the selection of parent individuals. Simulated-binary crossover and variable-wise polynomial mutation are used for the real numbers in both the real encoding and the mixed integer-real encoding, while one-point crossover and bit-flipped mutation are employed on the integer-value chromosome. All parameter settings of the GA are summarized and reported in Table 1. As mentioned earlier, the main goal of the proposed encoding scheme is not only to lessen the use of unnecessary computation resources by reducing the search space but also to balance the exploitation and exploration tasks of GA. Table 2 highlights the superiority in the first aspect by revealing that, regardless of the type of time spent, the computation time of GA with the proposed encoding method is very much shorter than that with the standard encoding method. In fact, to conserve space, we report only the computation time of GA with standard encoding for solving the POP without a cardinality constraint, which is supposed to be faster than that for solving the CCPOP; nevertheless, it is still much slower than the proposed encoded GA in solving the CCPOP.
Table 1 Parameter settings of the experimented GA

Parameter | Setting and values
Test problems | US S&P500 index and US NASDAQ index
Chromosome coding | Real-value chromosome with N decision variables for real encoding; mixed integer-real chromosome with K = 4, 5, 6, 8, 10
Population size | 100
Number of elite individuals | 2
Scaling factor | 2.0
Selection method | Stochastic universal sampling selection
Crossover method (probability 1.0) | Real encoding: simulated-binary crossover (ηc = 15) [7]; integer encoding: one-point crossover
Mutation method (probability 0.025) | Real encoding: variable-wise polynomial mutation (ηm = 20) [7]; integer encoding: bit-flipped mutation
Number of generations used for termination condition | 1000
Table 2 Algorithm run time for the US NASDAQ problem

Encoding scheme | Algorithm time (s) | Objective calculation time (s) | Total time (s)
Standard real-encoding | 417.85 | 6035.38 | 6453.23
Proposed mixed integer-real encoding, K = 4 | 5.96 | 1.29 | 7.25
Proposed mixed integer-real encoding, K = 5 | 6.51 | 1.45 | 7.96
Proposed mixed integer-real encoding, K = 6 | 6.57 | 1.52 | 8.09
Proposed mixed integer-real encoding, K = 8 | 7.37 | 1.79 | 9.16
Proposed mixed integer-real encoding, K = 10 | 8.06 | 2.06 | 10.12
Similarly, Figs. 4 and 5 reveal that, by lessening the search space, the solutions of the proposed encoded GA converge much faster than those of the standard GA. Considering the optimal solution, finance theory suggests that the higher the number of assets held in the portfolio, the lower the risk of investment. In other words, a portfolio consisting of a large number of investment assets tends to have small risk and thus exhibits a high value of the Sharpe ratio. Tables 3 and 4 report the objective values obtained from the standard GA and the proposed encoded GA with different problem scales. As explained above, according to finance theory, the objective value (Sharpe ratio) of the unconstrained POP should be greater than that of the constrained one if the solutions are obtained by a similar method. Our results reported in Table 3 reveal that the objective value obtained from the standard GA for solving the unconstrained POP is lower than the objective value obtained from our proposed encoded GA, even though the latter is employed for solving the constrained POP.
Fig. 4 Solution convergence for the US S&P 500 problem
Fig. 5 Solution convergence for the US NASDAQ problem

Table 3 Computation results of the US S&P problem

Encoding scheme | Objective value (Average) | Objective value (SD) | Number of nonzero assets (Average) | Number of nonzero assets (SD)
Standard real-encoding | 24.8757 | 0.7365 | 473.70 | 4.57
Proposed mixed integer-real encoding, K = 4 | 26.9520 | 0.5653 | 4.00 | 0.00
Proposed mixed integer-real encoding, K = 5 | 30.4843 | 0.2196 | 5.00 | 0.00
Proposed mixed integer-real encoding, K = 6 | 32.2547 | 0.0946 | 6.00 | 0.00
Proposed mixed integer-real encoding, K = 8 | 35.2157 | 0.0935 | 8.00 | 0.00
Proposed mixed integer-real encoding, K = 10 | 36.8051 | 0.6258 | 10.00 | 0.00
Table 4 Computation results of the US NASDAQ problem

Encoding scheme | Objective value (Average) | Objective value (SD) | Number of nonzero assets (Average) | Number of nonzero assets (SD)
Standard real-encoding | 19.7013 | 1.7249 | 1338.13 | 99.91
Proposed mixed integer-real encoding, K = 4 | 37.3658 | 0.3015 | 4.00 | 0.00
Proposed mixed integer-real encoding, K = 5 | 43.6956 | 0.5735 | 5.00 | 0.00
Proposed mixed integer-real encoding, K = 6 | 47.9927 | 0.6883 | 6.00 | 0.00
Proposed mixed integer-real encoding, K = 8 | 52.1254 | 0.5891 | 8.00 | 0.00
Proposed mixed integer-real encoding, K = 10 | 54.6933 | 0.9475 | 10.00 | 0.00
This superiority is more pronounced when dealing with a big data environment: Table 4 exhibits that our proposed encoded GA outperforms the standard encoding remarkably, regardless of the cardinality constraint value.
5 Conclusion In this paper, a mixed integer-real chromosome representation is proposed for the GA encoding process. The main objective of the proposed method is to lessen the search space and to balance the exploitation and exploration tasks of GA. As a result, GA adopting our new encoding scheme is anticipated to exhibit superior performance in terms of the computation resources spent as well as the fitness of the solution. When optimizing a portfolio in a big data environment, our proposed encoded GA shows its capability through faster computation run time and faster solution convergence compared to the GA with the conventional encoding technique. In addition, the proposed encoded GA demonstrates its ability to balance the exploiting and exploring tasks by reporting better fitness solutions compared to those of the classical GA. Conflict of Interest The authors declare that they have no conflict of interest.
References 1. H. Markowitz, Portfolio selection. J. Finance 7, 77–91 (1952) 2. W.F. Sharpe, Capital asset prices: a theory of market equilibrium under conditions of risk. J. Finance 19, 425–442 (1964) 3. K. Suksonghong, K. Boonlong, K.-L. Goh, Multi-objective genetic algorithms for solving portfolio optimization problems in the electricity market. Int. J. Elec. Power 58, 150–159 (2014) 4. K. Suksonghong, K. Boonlong, Particle swarm optimization with winning score assignment for multi-objective portfolio optimization, in Simulated evolution and learning: 11th international conference, SEAL 2017, Shenzhen, China, November 10–13, 2017, proceedings, ed. by Y. Shi, K.C. Tan, M. Zhang, K. Tang, X. Li, Q. Zhang, Y. Tan, M. Middendorf, Y. Jin (Springer International Publishing, Cham, 2017), pp. 1003–1015 5. K. Liagkouras, K. Metaxiotis, A new efficiently encoded multiobjective algorithm for the solution of the cardinality constrained portfolio optimization problem. Ann. Oper. Res. 267, 281–319 (2018) 6. F. Streichert, H. Ulmer, A. Zell, Evaluating a hybrid encoding and three crossover operators on the constrained portfolio selection problem, in Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No. 04TH8753) (IEEE, 2004), pp. 932–939 7. K. Deb, Multi-objective optimization using evolutionary algorithms (John Wiley & Sons Inc., New York, 2001) 8. K. Suksonghong, K. Boonlong, Multi-objective cooperative coevolutionary algorithm with dynamic species-size strategy, in Applications of Evolutionary Computation. EvoApplications 2018. Lecture Notes in Computer Science, vol. 10784, eds by K. Sim, P. Kaufmann (Springer, Cham, 2018)
Multistage Search for Performance Enhancement of Ant Colony Optimization in Randomly Generated Road Profile Identification Using a Quarter Vehicle Vibration Responses Kittikon Chantarattanakamol and Kittipong Boonlong Abstract In general, the roughness of a road affects vehicle performance by contributing vibration to the vehicle. In road maintenance, the road profile can be used to detect damaged regions of the road; it is therefore important to identify the road profile. The road profile can in fact be evaluated from the vibration response of the vehicle, and its detection can be formulated as an optimization problem. Ant colony optimization (ACO) is used for optimization of the formulated problem. In order to enhance the performance of ACO in road profile identification, a multistage search (MS) is proposed. In the MS, the decision variables to be optimized are divided into a number of parts, and each part is evolved as an ACO process separately from the other parts of the variables. Four classes of randomly generated roads are used as test cases in the investigation. The simulation runs show that the MS can enhance the performance of ACO. Keywords Optimization · Ant colony optimization · Road profile identification
1 Introduction Vibration caused by the road profile affects the performance of a vehicle. In fact, the road profile can be evaluated from vibration signals measured at significant positions on the vehicle [1, 2]. Road profile identification can be formulated as an optimization problem in which the corresponding objective function is the numerical difference between the vehicle vibration responses due to the actual road profile and those due to a predicted road profile. Optimization methods are then required. Since it is difficult to obtain the functional derivative of the formulated objective, derivative-free methods are suitable for optimization in road profile identification, as in [1, 3].
Ant colony optimization (ACO) [4] is a derivative-free method that mimics the behavior of ants in finding the shortest path from their nest to a food source. ACO was originally proposed for combinatorial optimization, as in [5–7]. Thereafter, a real-encoded ACO for continuous domains was proposed by Socha and Dorigo [8], and many studies have since successfully applied real-coded ACO to optimization problems with continuous domains, such as [9–11]. In road profile identification, the decision variables represent the heights of the road profile at sampled locations, so there are a large number of decision variables to be optimized. This paper proposes a multistage search for ACO in road profile identification using the vibration response of a quarter vehicle model. In the multistage search, a full solution is divided into a number of solution parts, and one part of the full solution is evolved, generation by generation, in each solution search stage. The multistage search is quite different from the cooperative coevolution employed in the genetic algorithm (CCGA) [12–15], in which a solution is also divided into a number of parts, the so-called species. In CCGA, each species is evolved simultaneously with the other species. In contrast, each solution part in the multistage search is evolved separately from the other parts: the first part is optimized until its termination condition is satisfied, then the second part starts to be optimized, and so on.
2 Road Profile Identification Using a Quarter Vehicle Vibration Responses A quarter vehicle model, a two-degree-of-freedom system, is shown in Fig. 1. In the vehicle model, there are two independent coordinates of the system—the vertical displacement of the sprung mass (zs) and the vertical displacement of the tire mass (zt). The vehicle moves at a constant speed of 20 m/s and is excited by a road profile whose length is 100 m. In the model, the sprung mass (ms), tire mass (mt), equivalent spring constant of the suspension (ks), equivalent spring constant of the tire (kt), and equivalent damping constant of the suspension (cs) are 250 kg, 53 kg, 10 kN/m, 200 kN/m, and 1000 Ns/m, respectively. The vehicle model is driven over the randomly generated road profile. The vibration sensor, which is assumed to be an acceleration sensor, is mounted at the sprung mass so that the vibration signal is the acceleration of the sprung mass. The road profile identification is based on the fact that different road profiles contribute dissimilar vibration responses. By using the vibration signal, the road profile identification can be formulated as an optimization problem whose objective function is the numerical difference between the vibration signal obtained from the actual road profile and that obtained from a predicted road profile. The decision variables of the problem represent the sampled heights of the predicted road profile.
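For reference, the equations of motion of a standard two-degree-of-freedom quarter-car model with the notation of Fig. 1 can be written as below. The paper does not list these equations explicitly, so this is a hedged reconstruction of the conventional formulation rather than a quotation of the authors' model.

```latex
% Standard quarter-car equations of motion (assumed form, not quoted from the paper):
% m_s: sprung mass, m_t: tire (unsprung) mass, k_s, c_s: suspension stiffness and damping,
% k_t: tire stiffness, z_s, z_t: mass displacements, z_r: road profile input.
\begin{aligned}
m_s \ddot{z}_s &= -k_s\,(z_s - z_t) - c_s\,(\dot{z}_s - \dot{z}_t) \\
m_t \ddot{z}_t &= \phantom{-}k_s\,(z_s - z_t) + c_s\,(\dot{z}_s - \dot{z}_t) - k_t\,(z_t - z_r)
\end{aligned}
```

Under these assumptions, the measured vibration signal corresponds to the sprung-mass acceleration \(\ddot{z}_s\).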
Fig. 1 Road profile identification using a quarter vehicle vibration responses
3 Ant Colony Optimization with Multistage Search In ant colony optimization (ACO), two sets of solutions—a population and an archive—are used, and the sizes of the population and archive are defined first. Each variable of a solution in the archive is assigned a pheromone value, an indicator of the preference for that variable in the solution. The pheromone is evaluated directly from the objective function of the solutions in the population. When generating solutions in the population, each variable of a generated solution is created by using the corresponding variable from an individual selected from the archive. The selection of the individual from the archive is based on probabilities evaluated from the assigned pheromone. In order to enhance the performance of ACO for optimization in road profile identification, multistage search (MS) is proposed. The MS is applied to ACO by dividing the decision variables into a number of parts, each of which is evolved by an ACO process. The evolution of each part in MS is quite different from that in the cooperative coevolutionary (CC) concept. Each part in MS is evolved until it is finished, that is, once its termination condition is satisfied. The objective function of an individual of any part is calculated by using only the corresponding decision variables and the optimized decision variables from the previous parts. For instance, Fig. 2 shows an example of road profile identification using quarter vehicle vibration responses for an eight-sampled-height road profile. The road profile identification can then be formulated as an optimization problem with eight decision variables.
Fig. 2 Road profile identification using a quarter vehicle vibration responses of eight sampled height road profile: (a) a quarter vehicle vibration model; (b) eight sampled height road profile
Fig. 3 Comparison of optimization without and with multistage search: (a) optimization without multistage search; (b) optimization with multistage search
Figure 3 shows the comparison of optimization without and with multistage search for the example in Fig. 2. Without MS, all decision variables are optimized simultaneously, as shown in Fig. 3a. On the other hand, in MS, a solution is divided into four parts, each of which represents two decision variables, as illustrated in Fig. 3b. Starting with part 1, zr1 and zr2 are optimized, so that z*r1 and z*r2 are the optimized values of the road profile height at sampled indices i = 1 and 2. Part 2 is subsequently implemented to obtain z*r3 and z*r4, the optimized values at i = 3 and 4, and so on. A detailed description of MS in ACO is shown in Fig. 4.
Fig. 4 ACO with multistage search in an optimization problem with eight decision variables: the randomly generated initial population is divided into four parts, each part is evolved by an ACO process until it is finished, and the best values of the corresponding variables from all parts are combined into the output solution
Figure 4 displays ACO with MS for an optimization problem with eight decision variables. Similar to Fig. 3, the decision variables x1, x2, …, x8 to be optimized are divided into four parts, each of which represents two decision variables. In Fig. 4, after the initial population is randomly generated, the population is divided into four parts. Part 1, containing x1 and x2, is run through the ACO process until this part is finished, and the optimized x1 and x2, which are 0.2711 and 0.0626 from the best solution as shown in the figure, are subsequently obtained. Part 2 is then implemented, and the optimized x1 and x2 are also used in the objective calculation of individuals in this part. After part 2 is finished, part 3 is executed, and so on until part 4 is completely performed. The output of ACO with multistage search is the full solution obtained from the combination of the optimized corresponding variables from all parts, as shown in the figure. The MS is particularly proposed to reduce the solution search space. As shown in the figure, if normal ACO is used for the problem with eight decision variables, in which each variable is encoded as a four-decimal-digit number between 0.0000 and 1.0000, the number of possible solutions is approximately 10,000^8 = 10^32. With the proposed MS, the number of possible solutions is exponentially reduced to only about 5 × 10,000^2 = 5 × 10^8.
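As a rough illustration of this procedure, the following Python sketch shows how a multistage wrapper can drive an inner continuous optimizer part by part. It is a minimal sketch under stated assumptions: the inner `aco_optimize` is a placeholder (here a plain random search) standing in for the authors' real-coded ACO, and all function names and the toy objective are illustrative, not taken from the paper.

```python
# Hedged sketch of the multistage search (MS) wrapper described above.
import numpy as np

def aco_optimize(objective, n_vars, n_generations, pop_size=100, seed=0):
    """Placeholder continuous optimizer (simple random search) so the sketch
    stays self-contained; a real implementation would follow the archive/
    pheromone sampling scheme of ACO for continuous domains [8]."""
    rng = np.random.default_rng(seed)
    best_x, best_f = None, np.inf
    for _ in range(n_generations * pop_size):
        x = rng.random(n_vars)               # candidate in [0, 1]^n_vars
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x

def multistage_search(full_objective, n_vars, n_vars_per_stage, gens_per_var=20):
    """Optimize the variables part by part; earlier parts stay fixed."""
    fixed = []                                # optimized variables from previous stages
    for start in range(0, n_vars, n_vars_per_stage):
        size = min(n_vars_per_stage, n_vars - start)

        def stage_objective(part, fixed=tuple(fixed)):
            # Evaluate using only the already-optimized prefix plus this part;
            # the remaining (future) variables are not involved yet.
            candidate = np.concatenate([np.array(fixed), part])
            return full_objective(candidate)

        best_part = aco_optimize(stage_objective, size, gens_per_var * size, seed=start)
        fixed.extend(best_part)               # this part is now finished
    return np.array(fixed)

# Example usage on a toy objective (squared error to a hidden target profile):
target = np.linspace(0.1, 0.9, 8)
toy_objective = lambda x: float(np.sum((x - target[: len(x)]) ** 2))
solution = multistage_search(toy_objective, n_vars=8, n_vars_per_stage=2)
```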
4 Test Problems The road profile identification using the vibration response of the quarter vehicle in Fig. 1 is used as the test problem, with four test cases corresponding to the different classes A, B, C, and D of random road profiles [16]. Class A has the lowest roughness, while class D has the highest roughness. The random road profiles of all road classes are shown in Fig. 5.
Fig. 5 Four classes of randomly generated road profile
Each road profile is 100 m long and is randomly generated at a sampled distance (d) of 0.2 m, so that there are 500 decision variables to be optimized. In the numerical calculation of the objective function, the sampling time is equal to d/v = 0.01 s. As previously described, the objective function is the numerical difference between the vibration signal, i.e., the acceleration of the sprung mass (Fig. 1), of the actual road profile and that of a predicted road profile.
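To make the objective concrete, the Python sketch below simulates the quarter-car response to a candidate road profile and scores it against the measured acceleration. The parameter values follow Sect. 2, but the integration scheme, the sum-of-squared-errors form of the difference measure, and all function names are assumptions for illustration, not the authors' exact formulation.

```python
# Hedged sketch: quarter-car simulation and objective evaluation.
# Assumes the standard two-DOF quarter-car equations and a simple
# semi-implicit Euler integrator; the authors' numerical scheme may differ.
import numpy as np

MS, MT = 250.0, 53.0          # sprung and tire masses [kg]
KS, KT = 10e3, 200e3          # suspension and tire stiffness [N/m]
CS = 1000.0                   # suspension damping [Ns/m]
V, D = 20.0, 0.2              # vehicle speed [m/s], sampling distance [m]
DT = D / V                    # sampling time = 0.01 s

def sprung_mass_acceleration(road, dt=DT):
    """Simulate the quarter car over a sampled road profile and return
    the sprung-mass acceleration at each sample."""
    zs = zt = vs = vt = 0.0
    acc = np.empty(len(road))
    for i, zr in enumerate(road):
        a_s = (-KS * (zs - zt) - CS * (vs - vt)) / MS
        a_t = (KS * (zs - zt) + CS * (vs - vt) - KT * (zt - zr)) / MT
        vs += a_s * dt; vt += a_t * dt        # semi-implicit Euler step
        zs += vs * dt;  zt += vt * dt
        acc[i] = a_s
    return acc

def objective(predicted_road, measured_acc):
    """Numerical difference between the response of the predicted road and
    the measured response (sum of squared errors, assumed form)."""
    n = len(predicted_road)
    diff = sprung_mass_acceleration(predicted_road) - measured_acc[:n]
    return float(np.sum(diff ** 2))
```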
5 Simulation Results Parameter settings of ant colony optimization (ACO) for the simulation runs are displayed in Table 1. The number of decision variables to be optimized is 500. The numbers of variables in each solution search stage (NVE) used in ACO with multistage search (MS) are 250, 100, 50, 25, 10, and 2. The population and archive have equal sizes of 100. The number of generations in each solution search stage is equal to 20 × the number of decision variables, so that 10^6 solutions are generated in each run. Objective values of the optimized solutions are displayed in Table 2. From Table 2, the objective values become gradually worse from class A to class D due to the increase in road roughness.

Table 1 ACO parameter settings for simulation runs

| Parameters | Settings and values |
|---|---|
| Chromosome coding | Real-value chromosome with 500 decision variables |
| Number of decision variables in each solution search stage (NVE) | Normal ACO: 500; ACO with MS: 250, 100, 50, 25, 10, 2 |
| Population and archive sizes | 100 for both |
| Number of generations in each solution search stage | 20 × number of decision variables |
| Number of repeated runs | 10 |
Table 2 Objective values of optimized solutions

| NVE | Class A Avg | Class A SD | Class B Avg | Class B SD | Class C Avg | Class C SD | Class D Avg | Class D SD |
|---|---|---|---|---|---|---|---|---|
| 500 (normal ACO) | 856.55 | 22.62 | 869.99 | 22.40 | 970.80 | 21.13 | 1534.30 | 48.47 |
| 250 | 804.14 | 24.06 | 794.23 | 33.26 | 882.87 | 35.68 | 1428.94 | 24.46 |
| 100 | 672.33 | 15.49 | 689.19 | 19.98 | 770.80 | 26.57 | 1250.20 | 28.63 |
| 50 | 570.66 | 20.09 | 600.60 | 28.66 | 647.93 | 23.95 | 1043.03 | 43.73 |
| 25 | 443.31 | 21.31 | 477.87 | 7.84 | 500.17 | 23.69 | 820.96 | 48.09 |
| 10 | 267.98 | 3.79 | 268.78 | 9.82 | 299.08 | 22.70 | 503.00 | 39.76 |
| 2 | 33.18 | 3.87 | 36.39 | 3.23 | 33.75 | 4.15 | 63.49 | 8.32 |
Fig. 6 Comparison of actual road profile and road profiles from optimized solutions from normal ACO and ACO with MS with NVE = 2 of road profile class A
The objective values of the optimized solutions obtained from ACO with the MS are better than those obtained from normal ACO for all classes of road profile. As previously described, the MS reduces the number of possible solutions; in fact, the number of possible solutions is exponentially reduced as NVE decreases, so that the ACO with the smallest NVE contributes the best solutions for each class of randomly generated road, as displayed in the table. Figure 6 shows the comparison of the actual road profile and the road profiles from optimized solutions of normal ACO and of ACO with MS with NVE = 2 for the selected class of road profile, class A. From the figure, the road profile obtained from ACO with MS is obviously better than that obtained from normal ACO and is quite close to the actual road profile. Figure 7 shows the comparison of the vibration signals, i.e., the acceleration of the sprung mass used in the objective function evaluation, obtained from the road profiles in Fig. 6. In Fig. 7, the vibration signal due to the road profile obtained from ACO with MS is much closer to the vibration response due to the actual road profile than the vibration signal from the road profile obtained by normal ACO.
6 Conclusions To enhance the performance of ant colony optimization (ACO) in the identification of randomly generated road profiles using the vibration responses of a quarter vehicle, multistage search (MS) has been proposed. Four classes—A, B, C, and D—of randomly generated roads were considered.
Fig. 7 Comparison of vibration signal obtained from road profile in Fig. 6
The identification of the road profile was formulated as an optimization problem whose objective is the numerical difference between the vibration response at a measured point due to the predicted road profile and that due to the actual road profile. The simulation results revealed that the MS can enhance the performance of ACO in road profile identification regardless of the class of road profile. In addition, the MS with a smaller number of decision variables in each solution search stage (NVE) contributed better optimized solutions than the MS with a larger NVE, since the number of possible solutions with a smaller NVE is less than with a larger NVE.
References 1. Z. Zhang, C. Sun, R. Bridgelall, M. Sun, Road profile reconstruction using connected vehicle responses and wavelet analysis. J. Terramechanics. 80, 21–30 (2018) 2. Y. Qin, Z. Wang, C. Xiang, E. Hashemi, A. Khajepour, Y. Huang, Speed independent road classification strategy based on vehicle response: theory and experimental validation. Mech. Syst. Signal. Pr. 117, 653–666 (2019) 3. B. Zhao, T. Nagayama, K. Xue, Road profile estimation, and its numerical and experimental validation, by smartphone measurement of the dynamic responses of an ordinary vehicle. J. Sound Vib. 457, 92–117 (2019) 4. M. Dorigo, T. Stützle, Ant Colony Optimization (MIT Press, Cambridge, MA, 2004) 5. J.E. Bell, P.R. McMullen, Ant colony optimization techniques for the vehicle routing problem. Adv. Eng. Inform. 18(1), 41–48 (2004) 6. N.C. Demirel, M.D. Toksarı, Optimization of the quadratic assignment problem using an ant colony algorithm. Appl. Math. Comput. 183(1), 427–435 (2006)
7. Y. Hani, L. Amodeo, F. Yalaoui, H. Chen, Ant colony optimization for solving an industrial layout problem. Eur. J. Oper. Res. 83(2), 633–642 (2007) 8. K. Socha, M. Dorigo, Ant colony optimization for continuous domains. Eur. J. Oper. Res. 185(3), 1155–1173 (2008) 9. A. ElSaid, F.E. Jamiy, J. Higgins, B. Wild, T. Desell, Optimizing long short-term memory recurrent neural networks using ant colony optimization to predict turbine engine vibration. Appl. Soft Comput. 73, 969–991 (2018) 10. M.G.H. Omran, S. Al-Sharhan, Improved continuous ant colony optimization algorithms for real-world engineering optimization problems. Eng. Appl. Artif. Intel. 85, 818–829 (2019) 11. C.C. Chen, L.P. Shen, Improve the accuracy of recurrent fuzzy system design using an efficient continuous ant colony optimization. Int. J. Fuzzy Syst. 20(3), 817–834 (2018) 12. M.A. Potter, K.A. De Jong, A cooperative coevolutionary approach to function optimization. Lect. Notes Comput. Sci. 866, 249–257 (1994) 13. K. Boonlong, Vibration-based damage detection in beams by cooperative coevolutionary genetic algorithm. Adv. Mech. Eng. 6, 624949, 13 (2014) 14. J.C.M. Diniz, F. Da Ros, E.P. da Silva, R.T. Jones, D. Zibar, Optimization of DP-M-QAM transmitter using cooperative coevolutionary genetic algorithm. J. Lightwave Technol. 36(12), 2450–2462 (2018) 15. A. Pahlavanhoseini, M.S. Sepasian, Scenario-based planning of fast charging stations considering network reconfiguration using cooperative coevolutionary approach. J. Energy Storage. 23, 544–557 (2019) 16. M. Agostinacchio, D. Ciampa, S. Olita, The vibrations induced by surface irregularities in road pavements—a Matlab approach. Eur. Transp. Res. Rev. 6(3), 267–275 (2014)
Classification and Visualization of Poverty Status: Analyzing the Need for Poverty Assistance Using SVM Maricel P. Naviamos and Jasmin D. Niguidula
Abstract A household can fall into poverty for many reasons. This research study focuses on determining significant attributes that can identify poor household units by means of different non-income indicators such as household assets, housing conditions, and available facilities like electricity, water supply, and sanitation. The study also includes the magnitude of poverty assistance received by the identified poor household units. The researchers used supervised learning, specifically a Support Vector Machine (SVM), to classify households into two classes, poor and non-poor, based on training data. The training data consists of measurable quantitative attributes correlated using the Pearson correlation coefficient. To evaluate the model, the data is split into 80% training and 20% testing sets, and as shown in Table 2 the resulting accuracy is 71.93%. Keywords Poverty visualization · Poverty classification · Social protection program and services · Poverty assistance · Support vector machine
1 Introduction The Philippines is a nation in Southeast Asia comprising about 7,000 islands. Poverty has become one of the most significant challenges confronting the country and its citizens. Filipinos are experiencing serious difficulties getting by in such troublesome conditions, and more and more are falling into extreme poverty. The country is rich in natural resources and biodiversity because of its closeness to the equator; nevertheless, it is prone to earthquakes and storms, making it the third most disaster-prone country in the world. M. P. Naviamos (B) · J. D. Niguidula Technological Institute of the Philippines, Manila, Philippines e-mail: [email protected] J. D. Niguidula e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_52
The Rural Poverty Portal (RPP) reports that half of the poor in the Philippines live in the country's rural areas. The poorest of the poor are the indigenous peoples, landless workers, fishermen, small farmers, mountain communities, and women. Deforestation, exhausted fisheries, and unproductive farmland are serious issues for these groups of people, and lack of education and of educational opportunities are basic issues as well. Poverty statistics are the most fundamental pieces of information for assessing the poverty status of a nation and for planning an anti-poverty strategy [1]. The National Household Targeting System for Poverty Reduction (NHTS-PR), also called Listahanan, of the Department of Social Welfare and Development (DSWD) is a framework for identifying poor households. The framework ensures the generation and establishment of a socio-economic database of poor household units. It makes available to National Government Agencies (NGAs) and other social protection partners a database of poor household units as a basis for recognizing potential beneficiaries of the Social Protection Programs and Services (SPPS) [2]. The framework comprises a set of uniform and objective criteria to recognize poor individuals. An objective model to choose those who need assistance the most was seen by the DSWD as a fundamental tool to improve the delivery of SPPS. The framework consolidated three successive steps, namely geographic targeting, household unit assessment, and validation, to deliver the best outcomes. The implementation was done in stages over a three-year period [3]. In this research, data mining has been utilized to analyze large data sets and detect patterns in them. The study uses this data to train and test a model for accuracy. The data is tested using a Support Vector Machine (SVM) algorithm, which can identify the critical instances that determine the decision boundary. In this connection, the study aims to utilize the SVM algorithm in building a model that can determine significant attributes for distinguishing poor and non-poor household units in five selected provinces in the Philippines.
2 Methodology The sample raw data collected and utilized in this research is based on field interviews conducted in both rural and urban areas of the five selected provinces over a three-year period from 2014 to 2016 using a Household Assessment Form (HAF). The HAF is divided into three sections: household identification, socio-economic information, and the household roster. Household identification consists of the household address (municipality, barangay, district/province), length of stay, number of nuclear families in the household, and number of bedrooms/sleeping rooms, while the socio-economic section includes information regarding the type of construction materials used in the roof and outer walls, the type of building where the household resides, tenure status of the property occupied, toilet, water, and electrical facilities, assets (radio, TV, washing machine, microwave, refrigerator,
video player, air conditioning, living room set, dining set, vehicle, telephone/mobile, and computer), type of disability (hearing, visual, speech, orthopedic, multiple disabilities, mental, and other types of disability) if applicable, displacement experienced, and SPPS received. Lastly, the household roster information is used only for name matching and verification of poor household units, ensuring that there are no duplicate names of any kind and guaranteeing that only identified poor household units receive any type of SPPS from government and non-government agencies.
2.1 Data Pre-processing The raw data may contain noisy information, irrelevant attributes, and missing values. Thus, the data should be pre-processed through several steps before applying any data mining algorithm [4]. In this research study, a data cleaning process is carried out that includes identifying and correcting errors in the data, filling in missing data, and removing duplicates. The next step is to make sure that the data sets of poor and non-poor households have the same number of records; a random selection of the household sample is done in all five selected provinces to prevent biased results. There is a total of exactly 221,324 identified poor households, of which 80% is used as a training set and 20% as a testing set. In the training stage, the algorithm uses the values of both the predictor attributes and the target attribute for all instances of the training set to build a classification model. This model represents learned knowledge, essentially a relation between predictor attribute values and classes that permits the prediction of the class of an instance given its predictor attribute values. In the testing stage, once a prediction is made, the algorithm lets us compare it against the real class of the classified instance. One of the critical goals of a classification algorithm is to maximize the predictive accuracy achieved by the classification model when classifying instances in the test set that were unseen during training.
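A minimal pandas sketch of this cleaning-and-balancing step is given below. The column names and file path are hypothetical placeholders, since the paper does not publish its schema, and the balancing strategy (random undersampling of the majority class) is an assumption consistent with the description above.

```python
# Hedged sketch of the pre-processing described above (assumed column names).
import pandas as pd

df = pd.read_csv("haf_assessments.csv")          # hypothetical input file

# Basic cleaning: drop duplicate records and handle missing values.
df = df.drop_duplicates()
df = df.dropna(subset=["poverty_status"])        # target must be present
df = df.fillna(df.median(numeric_only=True))     # simple numeric imputation

# Balance the two classes by randomly undersampling the larger one.
poor = df[df["poverty_status"] == "poor"]
non_poor = df[df["poverty_status"] == "non_poor"]
n = min(len(poor), len(non_poor))
balanced = pd.concat([poor.sample(n, random_state=42),
                      non_poor.sample(n, random_state=42)])

# 80/20 split for training and testing.
train = balanced.sample(frac=0.8, random_state=42)
test = balanced.drop(train.index)
```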
2.2 Feature Selection Various irrelevant attributes may be present in the data to be mined and should be removed. Likewise, many mining algorithms do not perform well with an enormous number of features. Consequently, the researchers applied a feature selection strategy before any mining algorithm was applied. The crucial reasons why feature selection is used in this research are to avoid overfitting, to improve model performance, and to give quicker and more cost-effective models [4]. The Pearson correlation coefficient is a statistic that measures the relationship between two continuous attributes.
It is regarded as one of the best methods for evaluating the connection between the attributes of interest since it is based on covariance. It gives information about the degree of association, or correlation, as well as the direction of the relationship. The following ranges are used to judge the significance or impact of each score [5]:
- Perfect: if the value is close to ±1, it is said to be a perfect relationship; as one variable increases, the other variable tends to also increase (if positive) or decrease (if negative).
- High degree: if the coefficient value lies between ±0.50 and ±1, it is said to be a strong correlation.
- Moderate degree: if the value lies between ±0.30 and ±0.49, it is said to be a medium correlation.
- Low degree: when the value lies below ±0.29, it is said to be a small correlation.
- No correlation: when the value is zero.
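As an illustration of how such a correlation-based screening might be implemented, the sketch below ranks candidate attributes by the absolute value of their Pearson correlation with the poverty label. The threshold of 0.30 and the column names are assumptions for demonstration, not values taken from the paper.

```python
# Hedged sketch: Pearson-correlation-based feature screening (assumed names).
import pandas as pd

def select_features(df: pd.DataFrame, target: str = "is_poor", threshold: float = 0.30):
    """Keep numeric attributes whose |Pearson r| with the target meets the threshold."""
    corr = df.corr(numeric_only=True)[target].drop(target)   # r of each feature vs. target
    ranked = corr.abs().sort_values(ascending=False)
    selected = ranked[ranked >= threshold].index.tolist()
    return selected, ranked

# Example usage with the (hypothetical) balanced data frame from the previous sketch:
# labeled = balanced.assign(is_poor=(balanced["poverty_status"] == "poor").astype(int))
# features, ranking = select_features(labeled)
```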
2.3 Classification Data mining algorithms can follow three distinct learning approaches, and the researchers used a supervised learning approach in this research study since all labels are known. The classification task can be seen as a supervised method where every instance belongs to a class, which is indicated by the value of the target attribute, or simply the class attribute. The target attribute can take on categorical values, each corresponding to a class. Each instance consists of two parts: the predictor attribute values and the target attribute value. The predictor attributes should be relevant for predicting the class of an instance. In the classification task, the set of instances being mined is divided into two mutually exclusive and exhaustive sets called the training set and the test set.
2.4 Support Vector Machine The purpose behind the use of supervised learning such as SVM is to classify objects into two or more classes based on training data. The training data comprises measurable attributes that include the household assets, the government and non-government social protection programs received, and the physical type of house materials used. For each training instance, the predetermined class indicates whether a household is poor or non-poor. The researchers utilized the SVM algorithm on account of its capacity to handle complex problems and locate the optimal nonlinear decision boundary. To meet the objective of the research, the following steps are used:
(1) set the initial values of the model parameters, (2) set the value of the objective function, (3) compute the constraint equation for each row of the training data, and lastly, (4) use Python programming to view the results [6]. Once the model is calibrated and trained, a proper evaluation is done to check whether the training is successful or not. The following procedure is used to evaluate the SVM model: (1) compute the margin score for each training instance, (2) classify the margin score for each training instance, (3) create a confusion table, and lastly, (4) compute the percentage of correctly predicted instances or the error based on the predictions [6]. A sketch of this training-and-evaluation flow is given below.
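The following scikit-learn sketch mirrors those steps in spirit. It is a minimal illustration, not the authors' implementation, and the feature matrix X and label vector y (random placeholders here) are assumed to come from the pre-processing and feature-selection stages described earlier.

```python
# Hedged sketch: train a linear-kernel SVM and evaluate it with a confusion table.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# X: household attributes, y: 1 = poor, 0 = non-poor (placeholder random data here).
rng = np.random.default_rng(0)
X = rng.random((1000, 12))
y = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)
model = SVC(kernel="linear", C=1.0)                 # steps 1-2: model parameters / objective
model.fit(scaler.transform(X_train), y_train)       # step 3: solve the constrained problem

margins = model.decision_function(scaler.transform(X_test))   # margin score per instance
predictions = (margins > 0).astype(int)                        # classify by margin sign
print(confusion_matrix(y_test, predictions))                   # confusion table
print("accuracy:", accuracy_score(y_test, predictions))
```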
3 Results and Discussion

3.1 Data Table 1 shows the five selected provinces, Province A to Province E. These provinces were properly assessed by a group of field assessment officers to identify the magnitude of poor household units in each province, making sure that only qualified household units belonging to the poorest families can receive SPPS or poverty assistance from any government and non-government agencies. The percentage of poor households in each province is computed with the following formula:

Magnitude of Poor HH (%) = (Identified Poor HH / Assessed HH) × 100

As a result, Table 1 shows that Provinces C and D have the highest numbers of identified poor households. And although Province C is the second most populated province in the region with 173,288 households, compared to Province E which has only 59,039 households, Province E still results in a higher percentage of poor household units at 35.2%.

Table 1 The number of assessed household units

| Province | Assessed households | Identified non-poor households | Identified poor households | Magnitude of poor households (%) |
|---|---|---|---|---|
| Province A | 51,010 | 36,959 | 14,051 | 27.5 |
| Province B | 91,707 | 52,525 | 39,182 | 42.7 |
| Province C | 173,288 | 117,925 | 55,363 | 31.9 |
| Province D | 209,518 | 117,567 | 91,951 | 43.9 |
| Province E | 59,039 | 38,262 | 20,777 | 35.2 |
| Total | 584,562 | 363,238 | 221,324 | 37.9 |
Fig. 1 Identified poor and non-poor households
Figure 1 also shows that provinces like C and D, with an obviously high percentage of identified poor households, should receive more poverty assistance to overcome the situation and advance the improvement of poor areas around the nation [7]. In the Household Assessment Form (HAF) of the National Household Targeting System for Poverty Reduction (NHTS-PR), the poverty status of a household can be identified based on housing characteristics, which include the following: (1) the type of house the household resides in, (2) the number of available sleeping rooms, (3) the roof materials, and (4) the type of materials used for the outer walls. Figure 2 shows that Province D had the highest magnitude of households that use strong materials like tile, concrete, brick, stone, wood, and plywood for the outer walls.
Fig. 2 Type of house a household resides
Fig. 3 Type of available household facilities
Province D also had the highest magnitude of roofs made of light materials like cogon, nipa, and anahaw, and of duplex-type houses with zero to nine sleeping rooms. The figure also shows that Province A had the lowest magnitude of households that used either strong or light materials for the roof and outer walls, some of which reside in a cave, a boat, under a bridge, etc. There are only three types of facilities assessed by the NHTS-PR: (1) toilet, (2) electricity, and (3) water supply. Figure 3 shows that Province D had the highest magnitude of households with no electricity and whose main water source is a shared faucet or a community water system. It also shows that the province of Palawan had the highest number of households with a closed pit toilet. A closed pit toilet has a platform with a hole in it and a lid to cover the opening when it is not being used. The platform can be made of wood, cement, or logs covered with earth; a concrete platform keeps water out and lasts for many years [8]. The NHTS-PR also assesses household-owned items such as radio, television, video player, stereo, refrigerator, clothes washer, air conditioning, phone, microwave, sala set, dining set, vehicle or jeep, and motorcycle. Among all items, within the five provinces, the phone has the highest magnitude of ownership, probably because the phone nowadays is used not only for communication but also in many other forms such as video, camera, entertainment, news updates, etc. Items such as stereo, refrigerator, clothes washer, air conditioning, microwave, sala set, dining set, and vehicle or jeep show a lower level of ownership in a household. The number of identified poor household units from Figs. 1, 2, 3, and 4 can be used as a reference for why the province of Palawan, among all other provinces, has the highest magnitude of SPPS received from government and non-government agencies, as clearly shown in Fig. 5. The following is the list of all Social Protection Programs and Services:
Fig. 4 Types of items owned by the household
Fig. 5 Social protection program and services received
(1) Pantawid Pamilyang Pilipino Program (4Ps), (2) KC-NCDDP, (3) Sustainable Livelihood Program (SLP), (4) Social Pension Program, (5) Supplementary Feeding Program (SFP), and (6) Educational Assistance under the Assistance to Individuals in Crisis Situations (AICS), together with the non-government-agency-initiated programs: (1) Scholarship (CHED), (2) Skills Training (TESDA), (3) Universal Health Care Program (DOH), (4) WASH Program (DOH), (5) Nutrition Programs—National Nutrition Council (NNC), (6) Rural Electrification—DOE/NEA, (7) Resettlement/Socialized Housing—NHA, (8) Microcredit/Microfinance—DTI/Government Financial Institutions, (9) Assistance to Pantawid Farmers Beneficiaries, and lastly, (10) other SWD programs (for Local Government Units, Non-Government Organizations, and Civil Society Organizations). After identifying all necessary attributes in the Identification, Socio-economic, and Roster sections of the HAF, a random selection is done,
balancing the number of poor and non-poor households to prevent biased results. A total of 221,324 poor and 221,324 non-poor households are used to properly train and test the model. The combined set of poor and non-poor households is then split into an 80% training set and a 20% testing set; as a result, a total of 354,118 households are used for training and 88,530 for testing. Python is used to plot the data into the two target classes, the poor and non-poor households. An SVM with a linear kernel is applied to the linearly separable data to properly see how the two classes are positioned relative to each other.
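One way to produce such a two-class plot with a linear SVM boundary is sketched below. The two plotted attributes and the synthetic data are arbitrary illustrative choices, since the paper does not specify which pair of features it visualizes.

```python
# Hedged sketch: visualize the two classes and a linear SVM boundary in 2-D.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
poor = rng.normal(loc=[-1.0, -1.0], scale=0.8, size=(500, 2))      # placeholder data
non_poor = rng.normal(loc=[1.0, 1.0], scale=0.8, size=(500, 2))
X = np.vstack([poor, non_poor])
y = np.array([1] * 500 + [0] * 500)                                 # 1 = poor, 0 = non-poor

clf = LinearSVC(C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
boundary = -(w[0] * xs + b) / w[1]        # w0*x + w1*y + b = 0  ->  y = -(w0*x + b)/w1

plt.scatter(*poor.T, s=8, label="poor")
plt.scatter(*non_poor.T, s=8, label="non-poor")
plt.plot(xs, boundary, "k--", label="linear SVM boundary")
plt.legend(); plt.show()
```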
3.2 Training Performance Once the model is built, the next important task is to evaluate it; this delineates how good the predictions are. Tables 2 and 3 show the precision, recall, F1-score, and accuracy, as well as the confusion matrix, of the model. The following formulas are used to compute the performance metrics:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1-Score = 2 × (Recall × Precision) / (Recall + Precision)
Accuracy = (TP + TN) / (TP + FP + FN + TN)

In Table 2, precision means the proportion of correctly predicted positive observations to the total predicted positive observations.

Table 2 Precision, recall, and F1-score for poor and non-poor using the SVM classification algorithm

| Classification algorithm | Poor: Precision | Poor: Recall | Poor: F-score | Non-poor: Precision | Non-poor: Recall | Non-poor: F-score | Accuracy |
|---|---|---|---|---|---|---|---|
| SVM linear kernel | 79.47% | 59.30% | 67.92% | 67.43% | 84.61% | 75.05% | 71.93% |
Table 3 Confusion matrix for poor and non-poor households

| Actual class | Predicted: Class = yes (poor) | Predicted: Class = no (non-poor) |
|---|---|---|
| Class = yes (poor) | 26,311 | 18,051 |
| Class = no (non-poor) | 6,794 | 37,374 |
This implies, among all households that are labeled as poor, how many were actually identified as poor households. High precision corresponds to a low false-positive rate; as a result, the 79.47% precision means that the model is good enough to identify poor household units. Recall is the proportion of correctly predicted positive observations among all observations in the actual class; this means, of all the poor households, how many were properly identified as poor. The resulting 59.30% is still above 50%. The F1-score is the weighted average of precision and recall; consequently, if the numbers of false positives and false negatives are very different, the F1-score is more reliable than the accuracy, and in this case the score is 67.92%. Lastly, accuracy is the most intuitive performance measure and is simply the proportion of correctly predicted observations to the total observations. The accuracy can be used as an overall measure of the performance of the model, and Table 2 shows that the resulting accuracy is 71.93%. In Table 3, the confusion matrix is helpful for checking whether there is misclassification. There are 26,311 correctly predicted positive values, which means that the actual class indicates a poor household unit and the predicted class says the same thing. Likewise, there are 37,374 correctly predicted negative values, which means that the actual class is a non-poor household unit and the predicted class says the same thing. The false positives and false negatives are the values that occur when the actual class contradicts the predicted class.
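The reported figures can be checked directly from the confusion matrix in Table 3; the short sketch below recomputes them and reproduces the reported values up to rounding (the percentage formatting is my own, not the authors').

```python
# Recompute the metrics in Table 2 from the confusion matrix in Table 3.
TP, FN = 26_311, 18_051       # actual poor: predicted poor / predicted non-poor
FP, TN = 6_794, 37_374        # actual non-poor: predicted poor / predicted non-poor

precision = TP / (TP + FP)                           # ≈ 0.7947
recall = TP / (TP + FN)                              # ≈ 0.5930
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.6793
accuracy = (TP + TN) / (TP + FP + FN + TN)           # ≈ 0.7193

print(f"precision={precision:.2%} recall={recall:.2%} f1={f1:.2%} accuracy={accuracy:.2%}")
```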
4 Conclusion A household can fall into poverty for many reasons. The current findings in this research study suggest that a household unit can be measured based on the type of house, facilities that include water supply, electricity, and toilet, as well as the types of items owned. The results can also be used to further understand why there is a great need for poverty assistance in areas with extremely low income. Helping poor households is reason enough to expand the funding of government and non-government agency programs, and providing for the basic needs of poor households can also help the entire country by advancing economic recovery and employment goals. This can also reduce food insecurity, hunger, and poor health. Compliance with Ethical Standards Conflict of Interest The authors declare that they have no conflict of interest. Ethical Approval This chapter does not contain any studies with human participants or animals performed by any of the authors. Informed Consent Informed consent was obtained from all individual participants included in the study.
References 1. T. Fujii, Dynamic poverty decomposition analysis: an application to the Philippines. World Dev. 69–84 (2017) 2. Listahanan/National Household Targeting System for Poverty Reduction (NHTS-PR), 12 Feb 2018 [Online]. Available https://www.dap.edu.ph/coe-psp/innov_initiatives/listahanan-nat ional-household-targeting-system-for-poverty-reduction-nhts-pr/. Accessed 26 Aug 2019 3. L. Fernandez, Design and Implementation Features of the National Household Targeting System in the Philippines (2012) 4. J.A. Sunita Beniwal, Classification and feature selection techniques in data mining. Int. J. Eng. Res. Technol. 1(6) (2012) 5. X.L.L.W. Yashuang Mu, A Pearson’s correlation coefficient based decision tree and its parallel implementation. Inf. Sci. (2017) 6. K. Teknomo, Support Vector Machine Tutorial (2018) 7. Y.Z.Y.L. Yuanzhi Guo, Targeted poverty alleviation and its practices in rural China: a case study of Fuping county, Hebei Province. J. Rural Stud. (2019) 8. J. Conant, Sanitation and Cleanliness for a Healthy Environment
Comparative Analysis of Prediction Algorithms for Heart Diseases Ishita Karun
Abstract Cardiovascular diseases (CVDs) are the leading cause of death globally: more individuals die yearly from heart disease than from any other cause. An estimated 17.9 million people died from CVDs in 2016, constituting 31% of all global deaths [1]. Such high rates of death due to heart diseases have to be reduced, and this can be accelerated by predicting the risk of CVDs. If a person can be treated much earlier, before they have any symptoms, that can be far more beneficial in averting sickness. The paper addresses this issue of heart diseases by employing various prediction models and optimizing them for better outcomes. The accuracy of each algorithm leads to a comparative analysis of these prediction models, forming a solid base for further research and finer prognosis and detection of heart disease. Keywords Machine learning · Heart disease prediction · Classification · Disease prediction · Receiver operating characteristics curve · Comparative analysis · Predictive modeling
1 Introduction Dysfunctions of the heart and blood vessels are known as cardiovascular diseases. These include coronary heart disease (CHD), rheumatic heart disease, cerebrovascular disease, and other conditions. Eighty percent of CVD deaths are attributable to heart attacks and strokes [1]. Heart attacks and strokes are predominantly caused by a clot that prevents blood from flowing to the heart or brain. Typically, the underlying reason is the accumulation of fatty deposits in the inner linings of the blood vessels that supply the heart and brain. Blood clots and bleeding from a blood vessel in the brain can also trigger a stroke. Use of tobacco, physical inactivity, an unhealthy diet and obesity, excessive alcohol consumption, high blood pressure, diabetes, and an abnormally high concentration of fats in the blood are a few explanations for the occurrence of CVDs. While an upsurge has been observed I. Karun (B) Department of Mathematics, Christ (Deemed to be University), Bengaluru, India e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_53
in the rate of heart diseases around the globe due to unhealthy lifestyles, machine learning has proven to be an extremely beneficial tool for predicting the occurrence of heart diseases. This can enable an individual to live a disease-free life. Thus, in this research paper, attempts have been made to use various prediction algorithms to reduce the risk of heart diseases around the world after selecting the most important and relevant information. The paper is structured into distinct sections. Section 2 presents the literature review of past research. Section 3 deals with the methodology used to choose the best model for predicting heart diseases using various prediction algorithms. Section 4 explains the implementation of these algorithms in Python, and Sect. 5 presents the results derived. Finally, Sect. 6 concludes the work.
2 Literature Review In [2], the aim of the research was to examine the connection between blood pressure (Joint National Committee) and the cholesterol categories (National Program for Cholesterol Education) with coronary heart disease (CHD) threat, integrate them into coronary prediction algorithms and compare the discriminatory characteristics of this strategy with other non-categorical prediction features. A straightforward coronary malady forecast calculation was created utilizing categorical factors, which permits doctors to foresee multivariate CHD chance in patients without unmistakable CHD. The Framingham Heart Study in [3] generated predictive features for sex-specific coronary heart disease (CHD) to assess the chance of developing CHD event in a middle-class white populace. Concerns existed with respect to whether these capacities could be generalized to other populations as well. The Framingham features worked well in populations after careful assessment, taking into consideration distinct prevalence of risk variables and fundamental rates of development of CHD. Palaniappan and Awang [4] emphasize on advanced data mining techniques for effective decision making. The study has established a prototype Intelligent Heart Disease Prediction System (IHDPS) using data mining approaches, namely Naive Bayes, neural network and decision trees. Every one method empowers critical information and has its one of a kind quality in realizing the goals of the characterized mining objectives. Using medical profiles such as blood pressure, sex, age and blood sugar IHDPS can anticipate the probability of people acquiring a heart ailment. In this paper [5], C-reactive protein is an inflammatory marker that is thought to have importance in coronary event forecast. The inference drawn from the study is that C-reactive protein is a comparatively mild anticipator of coronary heart disease and proposals with respect to it utilize in foreseeing the likelihood of coronary heart illness may ought to be reviewed. In [6], a modern approach based on coactive neuro-fuzzy inference system (CANFIS) was displayed for forecast of heart illness. The suggested CANFIS model mixed the adaptive capacities of the neural network and the qualitative view of the fuzzy logic that is then incorporated with the genetic algorithm to determine the disease’s existence. The model’s performance was assessed in regards
with training performance and categorization accuracy, and the findings revealed that the suggested CANFIS model has an excellent ability to predict heart disease. To compare efficiency of diverse lipid measures for CHD forecast utilizing bias and calibration characteristics and renaming of peril categories and to evaluate incremental usefulness of apolipoproteins over conventional lipids for CHD forecast, a study has been conducted by Ingelsson et al. [7]. Here, in [8], conventional CHD hazard forecast systems require upgradation as the larger part of the CHD occasions happens within the “low” and “intermediate” risk bunches. On an ultrasound check, CIMT and existence of plaque are correlated with CHD and thus may possibly assist enhance the forecast of CHD danger. Adding plaque and CIMT to traditional dangerous factors improves CHD risk prediction in the Atherosclerosis Risk in Communities Study. The destinations of this enquiry in [9] were to create a coronary heart disease (CHD) chance representation amidst the Korean Heart Study (KHS) population and note the similarity or dissimilarity between it with the Framingham CHD hazard result. This research offers the primary proof that the peril function of Framingham over evaluates the risk of CHD in the Korean population where the rate of CHD is small. The Korean CHD chance demonstrate is a well-intended alteration and can be utilized to anticipate one’s hazard of CHD and give a valuable direct to recognize the bunches at tall chance for CHD amidst Koreans. Pattekari and Parveen [10] talk about developing an intelligent system using data mining modeling technique called Naive Bayes to bring to light and draw out the camouflaged proficiency related with diseases (cancer, diabetes and heart attack) from an ancient heart disease database. It can respond to complex disease diagnosis inquiries and hence help healthcare professionals design smart clinical choices that conventional frameworks are unable to create. It also helps to decrease the cost of therapy by offering efficient treatments. In paper [11], the authors seek to calculate the 10-year possibility in diabetic individuals of coronary heart disease (CHD) and how well fundamental and novel risk variables foretell the threat of CHD followed by the conclusion that while all diabetic grown-ups are at tall hazard for CHD, their disparity in CHD chance can be anticipated modestly well by essential hazard variables. No single novel threat marker significantly increases the entire risk assessment of CHD, but a battery of novel markers does. Also, their model could furnish rough calculation of the danger of CHD in individuals with type 2 diabetes for the primary prevention of this disorder. Wilson and Peter approach a new angle in [12] by claiming that the previous studies and surveys concentrated on conventional risk variables for cardiovascular disease and that scarce data on the role of overweight and obesity was accessible. In [13], the focus is on the HIV-infected sufferers with fat redistribution and the prediction of CHD in them. The authors talk about although how the risk of CHD is increased in fat redistribution patients infected with HIV, the pattern of fat allocation and sex is potentially significant elements in deciding the prospect in the inhabitants. Here, in [14], the goal was to develop and remotely approve a CHD genomic risk scores, in regards to lifetime CHD chance and respective to conventional clinical hazard scores. 
This was done because genetics plays a critical part in coronary heart disease, but the clinical usefulness of genomic risk scores relative to clinical risk scores, like the Framingham Risk Score (FRS), remains uncertain. The authors
in [15] perform a comparative analysis of plasma IL-6, fibrin D-dimer, and coagulation factors VII and XIIa. Activated inflammation and blood coagulation are thought to increase, and to be linked to, the danger of coronary thrombosis. They deduced that while D-dimer correlated with IL-6 and C-reactive protein, only D-dimer exhibited a major independent coronary risk association. In the research paper [16], the authors introduce a method that uses significant risk factors to predict heart disease. The most effective data mining instruments, genetic algorithms and neural networks, are included in this method. The framework was implemented in MATLAB and forecasts the threat of heart disease with an accuracy of 89%.
3 Methodology Prediction of heart diseases has been carried out using the methodology given in Fig. 1.
Fig. 1 Methodology used for prediction of heart diseases
To begin with, the dataset is thoroughly understood. There are multiple terminologies from the medical domain that need a clear understanding at the start. Once the terminologies are understood, a proper analysis of the data is carried out by finding the mean, median, standard deviation, count, percentile scores, minimum, and maximum to facilitate the description of the data. Heat maps are a creative way of using colors to visualize the correlation between attributes. Histogram analysis also helps in segregating the data into the required buckets. This is followed by data preprocessing, which is a useful step for removing unwanted values or redundant data. This stage also handles missing data and outliers. The data is then brought to a common scale to further reduce data redundancy. Once the data is ready, the attributes contributing the most toward the
prediction are chosen. Not every attribute has a significant contribution toward the prediction of heart diseases, and such attributes lower the accuracy if not handled beforehand. There are multiple ways of carrying this out. With this, the data preprocessing stage comes to an end. The data is now ready for prediction using different prediction algorithms. SVC, KNN, and random forest classifiers are used here for the dataset. Precision, recall, accuracy, and the receiver operating characteristic plot are used to assess the performance of each model. Finally, a comparative analysis is done to identify the best model for predicting heart diseases.
4 Experimentation The heart disease UCI dataset is a large dataset with 76 attributes; here, only a subset of it containing 14 attributes is used. The subset is thoroughly studied by finding the mean, standard deviation, count, twenty-fifth and seventy-fifth percentiles, minimum, and maximum for each attribute. These attributes are then categorized into different buckets using histogram analysis to find the range of every attribute and also for outlier detection. Further, a heat map is used to analyze the correlation between attributes, how they are impacted by each other, and how they could impact the prediction in general. With a good understanding of the data, the dataset is checked for missing values. Since the dataset is devoid of missing values, removal of data redundancy and outliers was targeted. Then, the dataset was split into training (80%) and testing (20%) sets for cross-validation. Cross-validation is used to model the training data and conduct testing on the test set to analyze how the outcomes of prediction on the training set generalize to the testing set. SVC, KNN, and random forest classifiers are chosen for modeling and prediction because of the ease with which they can be understood for further work and also for their robustness. Precision, recall, and accuracy for each model are calculated. A receiver operating characteristic plot is generated for each model to plot the true positives against the false positives. Finally, a comparative analysis is made based on the accuracy of the models to find the best-suited model for heart disease prediction. A sketch of this experimental pipeline is given below.
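The following scikit-learn sketch illustrates one way this pipeline could be realized. It is a hedged reconstruction, not the author's code, and the CSV path and column names (including the target column) are assumptions based on the common 14-attribute UCI heart disease subset.

```python
# Hedged sketch of the experimentation pipeline (assumed file/column names).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, accuracy_score, roc_auc_score

df = pd.read_csv("heart.csv")                     # hypothetical 14-attribute subset
X, y = df.drop(columns=["target"]), df["target"]  # "target": 1 = disease, 0 = no disease

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "SVC": SVC(probability=True),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    score = model.predict_proba(X_test_s)[:, 1]   # used for the ROC curve / AUC
    print(name,
          f"precision={precision_score(y_test, pred):.3f}",
          f"recall={recall_score(y_test, pred):.3f}",
          f"roc_auc={roc_auc_score(y_test, score):.3f}",
          f"accuracy={accuracy_score(y_test, pred):.3f}")
```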
5 Results and Discussions Table 1 shows the values obtained for precision, recall, ROC_AUC (receiver operating characteristic area under the curve), and accuracy. Figures 2, 3, and 4 show the receiver operating characteristic curve for each model along with the area under the curve. ROC_AUC is used for diagnostic test evaluation; it basically measures how well a parameter can distinguish between diseased and normal cases. Lastly, Fig. 5 is an accuracy plot for the three classification models, which clearly shows that the random forest classifier outperformed the rest.
Table 1 Results of classification models used

| Model | Precision | Recall | ROC_AUC | Accuracy |
|---|---|---|---|---|
| SVC | 0.796 | 0.771 | 0.867 | 0.769 |
| KNN | 0.638 | 1 | 0.904 | 0.796 |
| Random forest classifier | 0.809 | 0.844 | 0.899 | 0.829 |
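For completeness, ROC curves like those in Figs. 2–4 can be reproduced with a sketch such as the following, continuing from the fitted models in the earlier sketch; the plotting style is my own choice, not the author's.

```python
# Hedged sketch: plot an ROC curve per fitted model (continues the previous sketch).
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

for name, model in models.items():
    score = model.predict_proba(X_test_s)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, score)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc(fpr, tpr):.3f})")

plt.plot([0, 1], [0, 1], "k--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```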
Fig. 2 SVC receiver operating characteristic plot
Fig. 3 KNN receiver operating characteristic plot
6 Conclusion In this paper, the aim has been to predict the occurrence of heart disease using machine learning algorithms. Combating heart diseases has become a major concern given the alarming number of people who currently suffer from them. With improved prediction techniques, finding solutions and predicting problems early on has become an easier task. The work carried out demonstrates how the random forest classifier can be a helpful model in predicting the chances of suffering from a heart disease. Many other models could be used in future work to find better and more optimized solutions. The work is based on a subset of the entire database, and further work could be carried out on larger subsets to enhance the prediction results.
Fig. 4 Random forest classifier receiver operating characteristic plot
Fig. 5 Accuracy plot of algorithms
References 1. Cardiovascular Diseases (CVDs). World Health Organization, 17 May 2017, https://www.who. int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds) 2. P.W.F. Wilson et al., Prediction of coronary heart disease using risk factor categories.”. Circulation 97(18), 1837–1847 (1998) 3. R.B. D’Agostino et al., Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation. Jama 286(2), 180–187 (2001) 4. S. Palaniappan, R. Awang, Intelligent heart disease prediction system using data mining techniques, in 2008 IEEE/ACS International Conference on Computer Systems and Applications (IEEE, 2008) 5. J. Danesh et al., C-reactive protein and other circulating markers of inflammation in the prediction of coronary heart disease.”. New Engl. J. Med. 350(14), 1387–1397 (2004) 6. L. Parthiban, R. Subramanian, Intelligent heart disease prediction system using CANFIS and genetic algorithm. Int. J. Biol. Biomed. Med. Sci. 3(3) (2008) 7. E. Ingelsson et al., Clinical utility of different lipid measures for prediction of coronary heart disease in men and women. Jama 298(7), 776–785 (2007) 8. V. Nambi et al., Carotid intima-media thickness and presence or absence of plaque improves prediction of coronary heart disease risk: the ARIC (Atherosclerosis Risk in Communities) study. J. Am. Coll. Cardiol. 55(15), 1600–1607 (2010) 9. S.H. Jee et al., A coronary heart disease prediction model: the Korean heart study. BMJ Open 4(5), e005025 (2014) 10. S.A. Pattekari, A. Parveen, Prediction system for heart disease using Naïve Bayes. Int. J. Adv. Comput. Math. Sci. 3(3), 290–294 (2012) 11. A.R. Folsom et al., Prediction of coronary heart disease in middle-aged adults with diabetes. Diab. Care 26(10), 2777–2784 (2003) 12. P.W.F. Wilson et al., Prediction of first events of coronary heart disease and stroke with consideration of adiposity. Circulation 118(2), 124 (2008) 13. C. Hadigan et al., Prediction of coronary heart disease risk in HIV-infected patients with fat redistribution. Clin. Infect. Dis. 36(7), 909–916 (2003) 14. Gad Abraham et al., Genomic prediction of coronary heart disease. Eur. Heart J. 37(43), 3267–3278 (2016) 15. G.D.O. Lowe et al., Interleukin-6, fibrin D-dimer, and coagulation factors VII and XIIa in prediction of coronary heart disease. Arterioscler. Thromb. Vasc. Biol. 24(8), 1529–1534 (2004) 16. S.U. Amin, K. Agarwal, R. Beg, Genetic neural network based data mining in prediction of heart disease using risk factors, in 2013 IEEE Conference on Information & Communication Technologies (IEEE, 2013)
Sarcasm Detection Approaches Survey Anirudh Kamath, Rahul Guhekar, Mihir Makwana, and Sudhir N. Dhage
Abstract Sarcasm is a special way of expressing opinion, most commonly on social media websites like Twitter and product review platforms like Amazon, Flipkart, Snapdeal, etc., in which the actual and implied meanings differ. Generally, sarcasm is aimed at insulting someone or something in an indirect way or at expressing irony. Detecting sarcasm is crucial for many natural language processing (NLP) applications like opinion analysis and sentiment analysis, which in turn play an important role in analysing product reviews and comments. Detecting sarcasm poses many challenges and is not as simple and straightforward as sentiment analysis. In this survey, we focus on the various approaches that have been used to detect sarcasm, primarily on social media websites, and categorise them according to the technique used for extracting features. We also elucidate the challenges encountered while detecting sarcasm and consider a particular use case of detecting sarcasm in Hindi-English mixed tweets, for which an approach has been suggested. Keywords Natural language processing (NLP) · Long short-term memory network (LSTM) · Convolutional neural network (CNN) · Recurrent neural network (RNN) · Content and user embedding CNN (CUE-CNN) · Nonlinear subspace embedding (NLSE)
A. Kamath (B) · R. Guhekar · M. Makwana · S. N. Dhage Computer Engineering Department, Bharatiya Vidya Bhavan’s Sardar Patel Institute of Technology, Andheri (West), Mumbai 400058, India e-mail: [email protected] R. Guhekar e-mail: [email protected] M. Makwana e-mail: [email protected] S. N. Dhage e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_54
1 Introduction Sarcasm is a form of expression in which the implied and actual meanings of a sentence differ. It is used widely on social media websites like Twitter and Facebook, on online discussion forums like Reddit and in product reviews, for instance Amazon and Flipkart reviews. This form of expression is generally chosen to emphasise certain aspects; for instance, sarcastic reviews might be made about a product which did not live up to its hype or which fell far short of user expectations. The question which needs to be answered is "Why is sarcasm detection gaining importance and turning a lot of heads?" The answer is that traditional sentiment analysis fails to perform accurately when it encounters sarcastic tweets. Because the implied and actual meanings of the sentence differ, opinion mining built on sentiment analysis produces incorrect results. In order to deal with such situations, sarcastic comments need to be detected and processed separately for opinion mining and sentiment analysis. Conversation with digital assistants also becomes more engaging once they start understanding the irony and sarcasm in human speech. We focus on sarcasm detection from text alone, which is more challenging because non-verbal modes of communication like gestures and body language are not available. Therefore, our task is to extract cues for these non-verbal modes, which are used during face-to-face communication, from the text itself, for instance by counting the capitalised letters in a tweet, the number of vowels occurring together, etc. The use of slang and short forms further complicates the pre-processing of the text. Section 2 deals with the presence of sarcasm on social media websites, Sect. 3 briefly describes the different approaches used for detecting sarcasm, Sects. 4–7 describe the methods used in detail, Sect. 8 describes our suggested approach, Sect. 9 deals with the challenges faced while detecting sarcasm, and Sect. 10 concludes the paper.
2 Sarcasm on Social Media A lot of content is generated every day on social media websites like Twitter, Facebook, Tumblr, Instagram, Flickr, etc. The data generated is a heterogeneous mix of text, images and emoticons. Users use a combination of these to express their opinions and reviews about current happenings, products, etc. This diversity complicates any algorithmic manipulation of the data. Out of the above-mentioned platforms, Twitter turns out to be the most popular one in the research community. The tweets posted by users are used by companies to understand the users' perspective about their products and therefore help them improve their products and services. These tweets are also used by companies to post relevant advertisements depending on the user's preferences.
Sarcasm is becoming more and more popular because people find it very amusing to express their opinions using sarcastic expressions. A survey done by [1] discusses the types, characteristics, approaches and trends in the field of sarcasm detection. Some examples of sarcasm are as follows: 1. I work very hard only to be this poor. 2. This is just what I wanted! 3. Great! So now there is no electricity in our area.
3 Approaches Used 3.1 Stand-Alone Approach In this approach, only the sentence under consideration is used for detecting sarcasm; no extra information of any sort is made use of. Features such as unigrams, bigrams and trigrams have been employed in [2, 3]. In addition, the presence of hyperboles, interjections, POS (part-of-speech) tags, etc., is also considered for detecting the presence of sarcasm. In [4], Context2Vec, a sentence completion library, is used to find the word that should appear after a given trigram; this word is termed the expected word. Finally, the compatibility of the expected word and the observed word is checked: if they are found to be incompatible, the tweet has a high probability of being sarcastic. This approach can also be thought of as an isolation approach wherein only the tweet under consideration is used for extracting features.
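As an illustration of the stand-alone idea, the following minimal sketch (not the exact setup of [2, 3]) builds n-gram features from the tweet text alone and trains a simple classifier with scikit-learn; the two example tweets and their labels are placeholders invented for the example.

```python
# Hypothetical sketch: n-gram features for stand-alone sarcasm detection.
# 'tweets' and 'labels' are placeholder training data, not from any cited paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["I work very hard only to be this poor.", "The weather is pleasant today."]
labels = [1, 0]  # 1 = sarcastic, 0 = not sarcastic

# Unigrams, bigrams and trigrams extracted from the tweet text alone
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)
print(model.predict(["Great! So now there is no electricity in our area."]))
```

In practice, the same pipeline would be trained on a large labelled corpus rather than two toy examples.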
3.2 Behavioural Approach In this approach, the tweet under consideration is not the only source used for detecting sarcasm. A lot of features are also extracted from the profile of the user who posted the tweet, such as the Twitter age of the user, the number of past positive and negative tweets, the time at which the user generally tweets, the user's social graph, etc. [5].
3.3 Context-Based Approach This approach basically stresses the importance of considering the context in which the sarcastic tweet is posted. This helps in better feature extraction as the usage of sarcasm is highly dependent on the context. Sometimes, context is the only clue which helps us differentiate a sarcastic tweet from a non-sarcastic one, for example
This is just what I wanted! The presence of sarcasm in the above sentence depends on the previous context. If the person has been having a bad day, then the above sentence is obviously a sarcastic one. There also exists the possibility that the user genuinely got what he wanted and is not sarcastic about it.
3.4 Concept-Level Approach Sometimes, the presence or absence of sarcasm cannot be judged unless some fact about the subject being discussed is known. For example, "Shoaib Akhtar could have bowled a little faster" is definitely a sarcastic sentence, but we can say this only because we know that he is the fastest bowler in the world. For a person who is unaware of this, it is very difficult to judge whether the tweet is sarcastic or not.
3.5 Hybrid Approach A combination of more than one of the above-mentioned approaches is known as a hybrid approach; the individual approaches are combined with the aim of obtaining an even better one.
4 Methods Used Detection of sarcasm in text is an active area of research, and hence, a wealth of approaches exists. Every approach has its own uniqueness depending on the features under consideration, and subsequently, the classifier used. Broadly speaking, there are three approaches.
4.1 Rule-Based Approach The rule-based approach uses a set of rules which generally consider the part-of-speech tags of the tokenised words for feature extraction. In this approach, a set of predefined rules and an accompanying algorithm are used to extract features and judge whether a tweet is sarcastic or not [6]. The presence of certain parts of speech in a specific order, interjections, punctuation marks, n-grams, emoticons, hashtags, etc., is used for building the model which detects sarcasm.
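A toy illustration of what such a rule set might look like is given below; the interjection list, the punctuation rule and the hashtag rule are assumptions made for the sake of the example, not the actual rules of [6].

```python
# Illustrative rule-based check (assumed rules): a tweet is flagged as possibly
# sarcastic if it mixes an interjection with heavy punctuation, or if it
# carries a sarcasm-related hashtag.
import re

INTERJECTIONS = {"wow", "great", "yay", "oh", "sure"}   # toy interjection list

def rule_based_sarcasm(tweet: str) -> bool:
    tokens = re.findall(r"[a-z']+", tweet.lower())
    has_interjection = any(t in INTERJECTIONS for t in tokens)
    heavy_punctuation = bool(re.search(r"[!?]{2,}", tweet))        # "!!!", "??"
    sarcasm_hashtag = bool(re.search(r"#(sarcasm|not)\b", tweet.lower()))
    return sarcasm_hashtag or (has_interjection and heavy_punctuation)

print(rule_based_sarcasm("Wow, another Monday!!! #not"))  # True
```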
4.2 Machine Learning Approach Machine learning approaches use different techniques for extracting features, such as developing more effective word embeddings or using various types of neural networks like convolutional neural networks and long short-term memory networks, and these features are used to train the model which detects the presence of sarcasm. Within deep learning, LSTMs and RNNs are used for taking the context into consideration.
4.3 Hybrid Approach In this approach, a combination of the above two approaches is used to get the best of both worlds. Generally, certain rules are defined for extracting lexical and syntactic features, and these features are fed to a machine learning algorithm which is then tuned by tweaking its hyper-parameters. Feature engineering is the most important step in sarcasm detection: classifier accuracy depends on the features extracted and the technique used for extracting them. Numerous techniques have been used for this purpose, and they are described subsequently. Early work on this topic used lexical and semantic approaches which considered a sarcastic sentence in isolation and extracted features evident from the sentence alone. The approaches can be classified on the basis of several factors, for example the technique used for feature engineering or the content considered for detecting sarcasm. A few classifications are as follows.
5 Sarcasm Detection Considering Only Text 5.1 Stand-Alone Approach Sentiment analysis generally gives wrong results on sarcastic comments as it fails to identify the true sentiment conveyed. Bouazizi and Ohtsuki [7] use features like the existence of repeated vowels in some words, the use of rarely used words, the count of gestures like laughter, the count of interjections and the presence of sarcastic expressions for detecting sarcasm. Generally, sarcastic text has a positive comment in a negative circumstance or a negative comment in a positive circumstance. Considering the count and position of the different parts of speech in a sentence is a useful method for extracting features. The presence of hyperbole and interjections has also been made use of extensively for sarcasm detection [6]. The presence of various lexical factors like diminutives, demonstrative determiners, quotation marks and interjections has been used for detecting sarcasm in Portuguese texts in [8]. That work used posts from the website of a Portuguese newspaper, as against
the most popular datasets of tweets and Facebook posts. Unigrams, hyperbole and pragmatic features are among the most common features extracted from text. An approach suggested in [9] uses natural language processing techniques like part-of-speech tagging and the fraction of positive and negative words, and then uses composite features like lexical and hyperbolic features to build a hybrid model. Since the data at hand is huge, algorithms like map-reduce can be used to speed up execution. This approach does not take emoticons and smileys into account. The text analysed might convey one meaning while the emojis used show emotions opposite to it, so emoticons and smileys play a crucial role in flipping the polarity of a sentence. Social media posts are filled with emojis and emoticons, and the importance of considering them for detecting sarcasm has been stated in [10]. Till now, we have discussed methods which focus on engineering lexical and syntactic features from the text. Words are generally represented as vectors to feed them to neural networks like LSTMs, RNNs, etc. Learning the word embeddings from the corpus is an important process to get vectors which are more meaningful and can help improve predictions. Kumar et al. [11] discussed a deep learning-based approach using bidirectional LSTMs and convolutional neural networks. An approach to learn embeddings from the Amazon product reviews corpus is mentioned in [12]. In order to derive effective word embeddings, [12] considers affect-based features. Affect consists of sentiment and emotion; sentiment is binary valued, while emotion takes more values and hence gives us more information. LSTMs are recurrent neural networks which are used for time series and textual data, and they can be effectively used for learning word embeddings from the corpus. This led to the novel finding that sentiment-related features are more effective for shorter texts, whereas emotion-related features are more effective for detecting sarcasm in longer pieces of text. This helps us understand that sentiment analysis can be applied to shorter pieces of text, and for longer pieces of text, emotion can be considered instead of sentiment for extracting features and training a supervised learning model. The approach falls in the domain of deep learning as it involves the use of long short-term memory networks and GloVe word embeddings (GloVe stands for Global Vectors for Word Representation). The main motivation behind using recurrent neural networks and bidirectional RNNs is that they help capture the context of the words. Another technique to generate good word embeddings is discussed in [13], which uses the most frequently occurring words in the conversation history of the user to develop word embeddings which are then fed to a CNN. After feature extraction, classification is done using pre-trained models like NLSE, CUE-CNN, etc. Almost all the approaches are a combination of rule-based and machine learning techniques. The work of Abulaish and Kamal [14] is an example of such research: it uses a set of predefined rules to extract features together with a machine learning approach where rules are learnt rather than defined beforehand. The unique aspect of that paper is that the tweets considered are self-deprecating, and the approach is tailor-made to detect sarcasm in self-deprecating tweets, as opposed to other research papers.
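A minimal sketch of an embedding-plus-bidirectional-LSTM classifier of the kind described above is shown below, written with Keras; the vocabulary size, sequence length and randomly filled embedding matrix are placeholders standing in for real GloVe vectors and are not taken from the cited papers.

```python
# Sketch only: bidirectional LSTM over (frozen) pre-trained word embeddings.
import numpy as np
import tensorflow as tf

vocab_size, embed_dim, max_len = 10000, 100, 40
glove_matrix = np.random.rand(vocab_size, embed_dim)   # stand-in for real GloVe vectors

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(
        vocab_size, embed_dim,
        embeddings_initializer=tf.keras.initializers.Constant(glove_matrix),
        trainable=False),                               # keep pre-trained vectors fixed
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation="sigmoid"),     # sarcastic / not sarcastic
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.build(input_shape=(None, max_len))
model.summary()
```

Training would then require tweets tokenised into integer index sequences of length max_len along with binary labels.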
Another important application of sentiment analysis and sarcasm detection has been explored in [15] which uses techniques for sentiment analysis from online
feedback received from students on Twitter. This analysis helps the teacher to know the views of the students, that is, whether they like the topic being taught, whether they are interested, etc. Students were asked to provide online feedback on Twitter, where they were provided with learning-related emotions. Once the feedback was received, it was pre-processed, the presence of the sarcasm hashtag was checked, and it was then determined whether the emotion used meant the opposite of what was being said in the tweet. Unigrams were used for this purpose. Pandey et al. [16] use the SentiWordNet dictionary and a vectorisation approach to detect sarcasm in the Amazon Alexa dataset; counts of the different parts of speech are found and treated as features for detecting sarcasm. An improvement in the pre-processing of tweets by expanding slang and emoticons using a dictionary is given in [17]. This helps improve accuracy, as it simplifies NLP tasks which work better with meaningful pieces of text than with slang and emojis. Since the tweets in the dataset were tagged with human intervention, they can be considered highly accurate. The only shortcoming of this approach is that new acronyms, short forms, emojis and stickers keep popping up every day. The usage of slang and emojis is becoming increasingly popular with the growing use of texting and messaging applications like WhatsApp, Facebook Messenger, etc. Therefore, human intervention will be needed to manually update the dictionary to accommodate the newer slang surfacing on different social media websites. NLP solutions generally need to be restricted to a domain; it is very difficult to build general NLP models, as the corpus under consideration grows tremendously in size and the result is neither efficient nor accurate. Among the numerical quantities that can be deduced from tweets are sentiment scores; most statistical approaches count the occurrences of particular types of words and create features from these counts for classification. Rendalkar and Chandankhede [18] calculate sentiment scores for sentences and then use them to predict whether a post is sarcastic or not. This approach sums the sentiment scores of all the words in a Facebook post. It comes with the disadvantage that the sentiment of a word cannot be judged in isolation; the context becomes exceedingly important for judging whether the word has a positive spin or a negative one. A sentence completion approach which uses the context of the previous words has been suggested in [4], wherein the word predicted to follow an n-gram (generally a trigram), called the expected word, is compared with the actual word present in the text, called the observed word. The semantic distance between the observed and expected words is used as an indicator. The sentence completion model is trained on non-sarcastic text so that it places the most suitable word to complete the sentence. It is easy to find the expected word if it is the final word of the sentence; but if any information about the word is missing, the set of stored candidate words has to be searched to find the correct one.
Following are the two approaches to find the incongruous words in the sentence (a toy similarity check is sketched after this list):
• All the words in the sentence are considered as candidate incongruous words, and a similarity check is performed between the expected and observed words. A threshold value T is selected between the minimum and maximum of the similarity check results. If the similarity value is found to be less than T for a particular pair of expected and observed words, then the sentence is predicted to be sarcastic.
• In the previous approach, similarity values are calculated even for words which are not incompatible. The second approach is similar to the first, but here similarity values are calculated only for a set of selected words in the text.
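A toy version of this incongruity check is sketched below: the expected and observed words are compared by cosine similarity against a threshold T. The three-dimensional word vectors are invented purely for illustration; a real system would use Context2Vec or comparable embeddings.

```python
# Hedged sketch of the incongruity test between expected and observed words.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

word_vectors = {                      # hypothetical 3-d embeddings
    "delicious": np.array([0.9, 0.1, 0.0]),
    "terrible":  np.array([-0.8, 0.2, 0.1]),
}

T = 0.3                               # threshold between min and max similarity
expected, observed = "delicious", "terrible"
similarity = cosine(word_vectors[expected], word_vectors[observed])
is_sarcastic = similarity < T         # incongruous word pair suggests sarcasm
print(similarity, is_sarcastic)
```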
5.2 User Behaviour Approach None of the approaches we have looked at so far considers the behaviour and past history of the user. The behavioural modelling approach to detect sarcasm in tweets suggested in [5] expands the horizon for finding features: it takes into account the user's behaviour and various other attributes such as command over the language, Twitter age, time zone and familiarity with the social media platform, in addition to the features extracted from the tweet. Reference [5] also elucidates the ways in which sarcasm can be viewed depending on how it is used, for example to convey emotion or as a contradiction between the sentiments conveyed in the sentence, which gives us a better understanding of the usage and detection of sarcasm. This adds a lot of credibility to the features, because extracting the count of n-grams and the usage of capital letters alone might not be sufficient to say that the tweet under consideration is sarcastic. Some users use capital letters and slang very frequently simply because it is their way of typing and expressing their opinion. Therefore, in order to recognise an anomaly, we first need to understand what is normal. Bigonha et al. [19] proposed a new approach which reduces the amount of data to be considered by ranking Twitter users to find the significant ones; this approach depends mainly on the user's activity on Twitter. References [19, 20] perform network analysis to identify influential users and communities, then carry out content-based classification using the user's profile and field of interest, and finally use an SVM for sentiment classification.
5.3 Context-Based Approach In this approach, the context is considered using different techniques, which are discussed further in this section. Ghosh and Veale [21] use semantic and syntactic approaches for feature extraction, and these are then used as inputs to a neural network which performs the classification. They have used various combinations of neural networks and compared their accuracies. Apart from this, they have reviewed semantic modelling using a support vector machine (SVM) with input in the form of a parse tree which contains the semantic and syntactic information of
the text under consideration. Poria et al. [22] use a recurrent neural network (RNN) for feature extraction, whose output is then fed to an LSTM; word embeddings are generated which capture the grammatical and syntactic regularities. This approach has shown a lot of promise for detecting sarcasm. Bouazizi and Ohtsuki [23] use a combination of punctuation, syntactic, sentiment and pattern-related features, along with other features like the occurrence of uncommon words and interjections; ultimately, these are fed to a support vector classifier, a KNN classifier and a random forest. A combination of various approaches and a comparison of their performances can be found in the hybrid approach to detect sarcasm explained in [24]. Ghosh et al. [25] consider the case of tweets and discussion forums, where the context of a comment is tremendously important for determining sarcasm. Their method uses the immediately preceding and immediately succeeding comments to understand the context in which a comment is being made. Combined with detection of sarcasm in isolation, this has proved effective, as the model takes both sentence-level and word-level inputs. Tungthamthiti et al. [26] also consider one more very important factor, coherence, which measures the dependence between the sentences in a tweet or comment. Such features, combined with binary weights, improve accuracy and hence proved their worth in the feature engineering phase.
5.4 Concept-Level Approach This approach deals with incorporating factual information related to the tweet under consideration, because judging whether a tweet is sarcastic or not sometimes requires certain facts to be known. Tungthamthiti et al. [26] use factual and common-sense knowledge to detect sarcasm in tweets, showing how important it is to understand the context and the meaning of the words in which they appear. This is generally achieved using a concept net, which is basically a graph in which the relationships between the nodes store the information, as described in [26].
5.5 Hybrid Approach In this approach, a combination of more than one of the above-mentioned approaches is employed for detecting sarcasm. An example is mentioned in [27], which uses two predictors: one considers the past behaviour of the user, while the other considers the statement itself, termed the target phrase, which is then searched for in the user's past tweets to obtain context. The results of the two predictors are combined for classification using logical operators like AND and OR, which integrate the outputs of both predictors. Apart from the features discussed earlier, more robust features were introduced in [28], which broadly discusses tweet features, author-related
features and features related to the audience and environment, which were then fed to a logistic regression classifier used to predict sarcasm. From the above discussion, we understand that there exists no generic approach to detect sarcasm; it depends entirely on the social media platform and the kind of data under consideration. Malave and Dhage [29] stress the importance of and need for a generalised approach to detect sarcasm by considering the context and user behaviour together. Both aspects are very important for detecting sarcasm, because sarcastic expressions depend a lot on the characteristics of the user, such as familiarity with the particular social media platform, as well as on the context in which the sarcastic comment is made.
6 Sarcasm Detection Considering Text and Image With the advent of social media, it is not just text which is sarcastic; sarcasm is also conveyed via images and memes. Detecting sarcasm in images has always been a challenging task, since capturing all the cues from an image becomes very difficult, especially when the implicit and actual meanings of the image differ. Das and Clark [30] consider many of the factors generally associated with a Facebook post, including the caption, the sentiment conveyed by the image, the reactions (likes, dislikes, etc.), the description of the post and the comments made by users. This provides complete coverage of the attributes one could possibly associate with a Facebook post. Three approaches are used: rule-based, statistical and deep learning-based, the last involving CNNs and LSTMs. In the rule-based approach, indicators are specified to detect sarcasm; for example, the hashtag is one of the key indicators for sarcasm detection on platforms like Twitter. Such methods, which combine several other methods, give a more holistic view of the content to be adjudged. A CNN-based approach has been used in [31] to extract features from images on Flickr to detect sarcasm; it utilises visual hints from images instead of semantic features. Schifanella et al. [32] describe two approaches, one using only natural language processing (NLP) together with features extracted from the image, while the other considers unigrams for the textual input and uses deep learning approaches for extracting features from the images. The feature vectors obtained from the text and the image are concatenated and fed to a support vector machine (SVM). The work considers posts from two popular social media platforms, Instagram and Twitter, and compares the accuracy of the classifier first by considering the image and then by disregarding it. While using deep learning approaches, care needs to be taken that the model does not learn too much, that is, it should not overfit.
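The fusion step described for [32] can be pictured with the following sketch, in which text features and stand-in image features are concatenated and passed to an SVM; the captions, labels and random image vectors are placeholders, not data from the paper.

```python
# Sketch of a late-fusion idea: concatenate text features with (placeholder)
# CNN image features and feed the result to an SVM.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

captions = ["what a lovely traffic jam", "sunset at the beach"]
labels = [1, 0]                                        # 1 = sarcastic post, 0 = not

text_features = CountVectorizer().fit_transform(captions).toarray()
image_features = np.random.rand(len(captions), 512)    # stand-in for CNN features

fused = np.hstack([text_features, image_features])     # simple concatenation
clf = SVC(kernel="linear").fit(fused, labels)
print(clf.predict(fused[:1]))
```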
7 Sarcasm Detection in English and Hindi Texts Nowadays, people tweet using a mix of more than one language; for example, Hindi and English are commonly used together in tweets by many users. Although the languages differ, the English alphabet is often used for expressing opinions in Hindi. This expands the horizon of sarcasm detection, as new difficulties come up when dealing with data containing a mix of several languages. Swami et al. [33] make an effort to detect sarcasm in tweets containing a mixture of English and Hindi transliterated words; they have annotated the corpus at the token level and have made an excellent effort at detecting sarcasm using n-gram features and emoticons. Vyas et al. [34] elucidate methods for part-of-speech (POS) tagging of tweets containing a mixture of Hindi and English words and vividly describe the difficulties involved because of the mixing of the two languages at different levels; the usage of slang, colloquial short forms and spelling errors further compounds the problem. Bharti et al. [35] use a context-based approach for detecting sarcasm in Hindi texts. The proposed solution finds keywords from the news articles identified as relevant, and these keywords are used to find analogous tweets. It also extracts keywords from the input tweet, which are used to select suitable news articles in the news corpus. Both sets of keywords are then compared to find similarities; if most of the keywords are found to be similar, the news article and the input tweet can be assumed to share the same context. The algorithm also finds the count of positive and negative words in both sets, so if the news article has a greater count of positive keywords than the tweet, it suggests that the tweet negates the fact and is identified as sarcastic. A limitation of this approach is the non-availability of news timestamps. Desai and Dave [36] is one of the few works which use sentences in Devanagari script and consider two types of statements, those with markers and those without, to detect sarcasm. They trained an SVM for five classes, and the method was tested and found to have an accuracy of 84%.
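The news-context idea of [35] can be illustrated with the toy sketch below; the keyword-overlap threshold, the tiny polarity word lists and the example strings are assumptions made purely for illustration.

```python
# Hypothetical sketch of the news-context comparison: if a tweet shares most
# keywords with a news article but the article is more positive than the
# tweet, the tweet is flagged as sarcastic.
POSITIVE = {"win", "great", "achha"}        # toy positive words (English/Hindi)
NEGATIVE = {"lose", "kharab", "bura"}

def keywords(text):
    return set(text.lower().split())

def is_sarcastic(tweet, news_article, overlap_threshold=0.5):
    tw, news = keywords(tweet), keywords(news_article)
    overlap = len(tw & news) / max(len(tw), 1)
    if overlap < overlap_threshold:
        return False                        # tweet and article not about the same event
    news_pos = len(news & POSITIVE)
    tweet_pos = len(tw & POSITIVE)
    return news_pos > tweet_pos             # tweet appears to negate a positive fact

print(is_sarcastic("team played great match lose", "team win great match"))  # True
```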
8 Suggested Approach After surveying the above papers, we find that there is still a lack of sufficiently robust approaches for detecting sarcasm in code-mixed tweets, especially Hindi-English mixed tweets. We have therefore proposed a framework to detect the presence of sarcasm in Hindi-English mixed tweets, as shown in Fig. 1.
Fig. 1 Proposed framework to detect sarcasm in Hindi-English mixed tweets
8.1 User Input A module to accept input from the user's social media account (Twitter, Facebook or Instagram). It should be able to accept input given the username of the user's social media account or the ID of the tweet in question.
8.2 Pre-processing The tweet or post is pre-processed to remove URLs and other extraneous symbols which might hamper text classification accuracy. Pre-processing involves several other natural language processing (NLP) steps, such as splitting the tweet into tokens, removing the most common and frequently occurring n-grams, and deriving the root words and lemmas from the given words.
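A minimal sketch of such a pre-processing step is given below; the stopword list and the crude suffix-stripping stand-in for lemmatisation are illustrative assumptions only.

```python
# Sketch of tweet pre-processing: strip URLs and symbols, tokenise, drop very
# common tokens and reduce words to a crude root form.
import re

STOPWORDS = {"the", "is", "a", "to", "hai", "ka", "ki"}   # toy English/Hindi stopwords

def preprocess(tweet: str) -> list:
    tweet = re.sub(r"https?://\S+", " ", tweet)           # remove URLs
    tweet = re.sub(r"[^a-zA-Z#@ ]", " ", tweet)           # drop extraneous symbols
    tokens = [t for t in tweet.lower().split() if t not in STOPWORDS]
    return [t.rstrip("s") for t in tokens]                # naive stemming stand-in

print(preprocess("Wah! Kya baat hai, train late again... https://t.co/xyz"))
```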
8.3 User History-Related Features The data collected can be used to trace back through the user's profile for occurrences of the subject in the history of the user's communication on the social media platforms mentioned above, specifically Twitter. This may include the count of appearances of the subject in the history and many more such attributes from which the required features are obtained.
8.4 Punctuation-Related Features The occurrence of punctuation marks in a given tweet can be an indicator of the presence of sarcasm. Moreover, multiple punctuation marks of the same type are a common precursor of sarcasm in text.
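A few punctuation-related features of this kind could be computed as in the sketch below; the particular feature set is an assumption made for illustration, not a fixed specification of the framework.

```python
# Possible punctuation-related features for a tweet.
import re

def punctuation_features(tweet: str) -> dict:
    return {
        "exclamations": tweet.count("!"),
        "questions": tweet.count("?"),
        "repeated_marks": len(re.findall(r"([!?.])\1+", tweet)),   # e.g. "!!!", "??"
        "uppercase_words": sum(w.isupper() and len(w) > 1 for w in tweet.split()),
    }

print(punctuation_features("GREAT!!! Just what I needed??"))
```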
8.5 Composite Sentiment Score Calculation This gives an overall estimate of the sentiment of the given tweet. The tweet may contain frequently occurring words or idioms, which are matched against a database containing the words and idioms most often found in sarcastic sentences. In addition, text containing both positive and negative Hindi words, that is, words of opposite polarity, is used as a basis for calculating the sentiment score.
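A hedged sketch of such a composite score is shown below; the toy polarity lexicon mixing English and Hindi words is invented for illustration.

```python
# Composite sentiment score sketch: sum of word polarities plus a flag for the
# presence of words of opposite polarity in the same tweet.
POLARITY = {"great": 1, "love": 1, "achha": 1, "kharab": -1, "worst": -1, "late": -1}

def composite_score(tokens):
    scores = [POLARITY.get(t, 0) for t in tokens]
    has_both = any(s > 0 for s in scores) and any(s < 0 for s in scores)
    return sum(scores), has_both     # overall score and opposite-polarity flag

print(composite_score(["great", "train", "late", "again"]))   # (0, True)
```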
8.6 Consideration of Context Using Subject of Sentence and Hashtags Hashtags and the subject of the tweet or text help in determining the context to which the tweet refers, which in turn helps in gaining insight into the utterance.
8.7 Standardisation, Scaling Features The various features obtained from the above processes are scaled to have a standard deviation of around 1 and are centred at 0. The exact approach may vary depending on the data collected in those features. This process helps improve the learning ability of the model used subsequently; a combined illustration of this step and the classification step follows Sect. 8.8.
8.8 Classification All the extracted features are used to train a classifier such as a random forest or a support vector machine (SVM), which classifies the tweet or post as sarcastic or not.
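A combined sketch of Sects. 8.7 and 8.8 is given below: the extracted feature matrix is standardised and then used to train a random forest. The feature values and labels are placeholders; in practice, the columns of X would be the features from Sects. 8.3 to 8.6.

```python
# Scaling followed by classification in a single pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[3, 1, 0.6, 2], [0, 0, 0.1, 0], [5, 2, 0.9, 3], [1, 0, 0.2, 1]])
y = np.array([1, 0, 1, 0])     # 1 = sarcastic, 0 = not sarcastic

pipeline = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=100))
pipeline.fit(X, y)
print(pipeline.predict([[4, 1, 0.7, 2]]))
```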
9 Challenges Faced in Sarcasm Detection 9.1 Variety of Data Due to the continuous evolution of the types of data being generated on social media websites, such as videos, GIFs, memes and images with indirect implications, the task of sarcasm detection is becoming more and more difficult. Extracting features from all the different kinds of data associated with a post and then considering them together will become a major challenge. A lot of pre-processing will be needed, the feature engineering phase will become increasingly hard, and more advanced machine learning models will have to be used to deal with such diverse features.
9.2 Linguistic Diversity Due to the linguistic diversity of countries like India, where people generally use multiple languages in their posts, traditional Python libraries cannot be used directly. Heavy pre-processing steps will definitely be needed before some concrete features
can be extracted from the post. The presence of different scripts in the post will also compound this problem. For example, it will be a difficult task to extract features from a tweet containing a mixture of the English language and the Devanagari script.
9.3 User Idiosyncrasies User behaviour is gaining more and more importance as users have developed their own short forms and their own way of putting forth opinions. Some users use a lot of punctuation marks and capital letters in their posts which the traditional models might mistake for sarcasm. Therefore, more robust models are needed which can deal with such issues.
9.4 Variety of News There are a lot of incidents occurring every day in the world. In order to adjudge a tweet as sarcastic or not, the tweet first has to be linked to the incident under consideration, and only then can meaningful features be extracted from it. Sometimes, it becomes even more difficult, as the incident being referred to may be perceived by some users as a good thing and by others as a negative one. This creates difficulties in determining the sentiment associated with the corresponding hashtag, and in such situations, human intervention will be needed to judge the presence of sarcasm in the tweet.
10 Conclusion There has been a lot of work on detecting sarcasm from text, and most of the approaches differ in the way they extract features; the technique for classifying them remains almost the same throughout. All the approaches consider either all or a subset of the following: stand-alone features which only consider the tweet to be adjudged, user behaviour features extracted from the past activity of the user on Twitter, concept-level features which capture facts related to the tweet and context-level features which consider the comments in the thread. Many approaches considering only the textual part have been analysed, whereas methods extracting features from images are comparatively fewer in number. This is because it is difficult to extract features indicating sarcasm from images, as this involves many complexities, such as the expression of the person in the image and the characteristics or information associated with the image, which are needed to holistically understand the post and then judge whether it is sarcastic or not. There has not been a lot of work on detecting sarcasm in tweets which contain a mixture of English and Hindi due to the
linguistic complexity associated with such a corpus. For such a code-mixed corpus, count-based approaches have proven to be very successful in detecting sarcasm. A lot of future work can be done in this domain to try and extract the meaning of the words and then use that to detect sarcasm in addition to the count-based features.
References 1. A. Joshi, P. Bhattacharyya, M.J. Carman, Automatic sarcasm detection: a survey. ACM Comput. Surv. CSUR 50(5), 73 (2017) 2. C.C. Liebrecht, F.A. Kunneman, A.P.J. van Den Bosch, The perfect solution for detecting sarcasm in tweets# not (2013) 3. F. Kunneman, C. Liebrecht, M. Van Mulken, A. Van den Bosch, Signaling sarcasm: from hyperbole to hashtag. Inf. Process. Manag. 51(4), 500–509 (2015) 4. A. Joshi, S. Agrawal, P. Bhattacharyya, M.J Carman, Expect the unexpected: harnessing sentence completion for sarcasm detection, in International Conference of the Pacific Association for Computational Linguistics, pp. 275–287 (2017) 5. A. Rajadesingan, R. Zafarani, H. Liu, Sarcasm detection on twitter: a behavioral modeling approach, in Proceedings of the Eighth ACM International Conference on Web Search and Data Mining (ACM, 2015), pp. 97–106 6. S.K. Bharti, K.S. Babu, S.K. Jena, Parsing-based sarcasm sentiment recognition in twitter data, in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (ACM, 2015), pp. 1373–1380 7. M. Bouazizi, T. Ohtsuki, Opinion mining in twitter: how to make use of sarcasm to enhance sentiment analysis, in 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (IEEE, 2015), pp. 1594–1597 8. P. Carvalho, L. Sarmento, M.J. Silva, E. De Oliveira, Clues for detecting irony in user-generated contents: oh…!! it’s so easy, in Proceedings of the 1st international CIKM Workshop on Topicsentiment Analysis for Mass Opinion (ACM, 2009), pp. 53–56 9. K. Parmar, N. Limbasiya, M. Dhamecha, Feature based composite approach for sarcasm detection using mapreduce, in 2018 Second International Conference on Computing Methodologies and Communication (ICCMC) (IEEE, 2018), pp. 587–591 10. J. Subramanian, V. Sridharan, K. Shu, H. Liu, Exploiting emojis for sarcasm detection, in Social, Cultural, and Behavioral Modeling, eds. by R. Thomson, H. Bisgin, C. Dancy, A. Hyder (Springer International Publishing, Cham, 2019), pp. 70–80 11. A. Kumar, S.R. Sangwan, A. Arora, A. Nayyar, M. Abdel-Basset et al., Sarcasm detection using soft attention-based bidirectional long short-term memory model with convolution network. IEEE Access 7, 23319–23328 (2019) 12. A. Agrawal, A. An, Affective representations for sarcasm detection, in The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, 2018), pp. 1029–1032 13. S. Amir, B.C. Wallace, H. Lyu, P. Carvalho, M.J. Silva, Modelling context with user embeddings for sarcasm detection in social media. arXiv:1607.00976 (2016) 14. M. Abulaish, A. Kamal, Self-deprecating sarcasm detection: an amalgamation of rule-based and machine learning approach, in 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI) (IEEE, 2018), pp. 574–579 15. N. Altrabsheh, M. Cocea, S. Fallahkhair, Detecting sarcasm from students feedback in twitter, in Design for Teaching and Learning in a Networked World (Springer, 2015), pp. 551–555 16. A.C. Pandey, S.R. Seth, M. Varshney, Sarcasm detection of amazon alexa sample set, in Advances in Signal Processing and Communication (Springer, 2019), pp. 559–564
17. A.G. Prasad, S. Sanjana, S.M. Bhat, B.S. Harish, Sentiment analysis for sarcasm detection on streaming short text data, in 2017 2nd International Conference on Knowledge Engineering and Applications (ICKEA) (IEEE, 2017), pp. 1–5 18. S. Rendalkar, C. Chandankhede, Sarcasm detection of online comments using emotion detection, in 2018 International Conference on Inventive Research in Computing Applications (ICIRCA) (IEEE, 2018), pp. 1244–1249 19. C. Bigonha, T.N.C. Cardoso, M.M. Moro, M.A. Goncalves, V.A.F. Almeida, Sentiment-based influence detection on twitter. J. Braz. Comput. Soc. 18(3), 169–183 (2012) 20. B. Sluban, J. Smailovic, S. Battiston, I. Mozetic, Sentiment leaning of influential communities in social networks. Comput. Soc. Netw. 2(1), 9 (2015) 21. A. Ghosh, T. Veale, Fracking sarcasm using neural network, in Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 161–169 (2016) 22. S. Poria, E. Cambria, D. Hazarika, P. Vij, A deeper look into sarcastic tweets using deep convolutional neural networks. arXiv:1610.08815 (2016) 23. M. Bouazizi, T.O. Ohtsuki, A pattern-based approach for sarcasm detection on twitter. IEEE Access 4, 5477–5488 (2016) 24. N. Vijayalaksmi, A. Senthilrajan, A hybrid approach for sarcasm detection of social media data. Int. J. Sci. Res. Publ. IJSRP 7(5) (2017) 25. D. Ghosh, A.R. Fabbri, S. Muresan, The role of conversation context for sarcasm detection in online interactions. arXiv:1707.06226 (2017) 26. P. Tungthamthiti, S. Kiyoaki, M. Mohd, Recognition of sarcasms in tweets based on concept level sentiment analysis and supervised learning approaches, in Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing (2014) 27. A. Khattri, A. Joshi, P. Bhattacharyya, M. Carman, Your sentiment precedes you: using an authors historical tweets to predict sarcasm, in Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pp. 25–30 (2015) 28. D. Bamman, N.A. Smith, Contextualized sarcasm detection on twitter, in Ninth International AAAI Conference on Web and Social Media (2015) 29. N. Malave, S.N. Dhage, Sarcasm detection on twitter: user behavior approach, in Intelligent Systems, Technologies and Applications, eds. by S.M. Thampi, L. Trajkovic, S. Mitra, P. Nagabhushan, J. Mukhopadhyay, J.M. Corchado, S. Berretti, D. Mishra (Singapore, 2020), pp. 65–76 30. D. Das, A.J. Clark, Sarcasm detection on facebook: a supervised learning approach, in Proceedings of the International Conference on Multimodal Interaction: Adjunct (ACM, 2018), p. 3 31. D. Das, A.J. Clark, Sarcasm detection on flickr using a cnn, in Proceedings of the 2018 International Conference on Computing and Big Data (ACM, 2018), pp. 56–61 32. R. Schifanella, P. de Juan, J. Tetreault, L. Cao, Detecting sarcasm in multimodal social platforms, in Proceedings of the 24th ACM International Conference on Multimedia (ACM, 2016), pp. 1136–1145 33. S. Swami, A. Khandelwal, V. Singh, S.S. Akhtar, M. Shrivastava, A corpus of english-hindi code-mixed tweets for sarcasm detection. arXiv:1805.11869 (2018) 34. Y. Vyas, S. Gella, J. Sharma, K. Bali, M. Choudhury, Pos tagging of english-hindi code-mixed social media content, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 974–979 35. S.K. Bharti, K.S. Babu, S.K.
Jena, Harnessing online news for sarcasm detection in hindi tweets, International Conference on Pattern Recognition and Machine Intelligence (Springer, 2017), pp. 679–686 36. N. Desai, A.D. Dave, Sarcasm detection in hindi sentences using support vector machine. Int. J. 4(7), 8–15 (2016)
Web and Informatics
Interactive Animation and Affective Teaching and Learning in Programming Courses Alvin Prasad and Kaylash Chaudhary
Abstract Programming is an essential subject in computer science and information systems development, and its demand in the computer industry is even higher today. To the dismay of the industry, there is a very high dropout rate among programming students as they find coding difficult. The literature presents many ways to combat the attrition rate in programming courses, which include the way programming is taught (teaching) and received (learning) by students. This paper will focus on the teaching and learning aspects of programming courses. In particular, we discuss an interactive tool that can be used to teach and learn programming. This tool is developed using Adobe Flash CS5, specifically ActionScript 3.0, Adobe AIR and iPhone OS. The tutorials are animated and designed using natural language. Keywords Animation · Affective teaching · Home language · Programming
1 Introduction An article by CBC News published in 2016 states that there will be 200,000 communication and information technology jobs that will need filling in Canada, and it predicts that there will not be enough people to fill them [2]. According to the US Bureau of Labor Statistics, employment in the area of information technology is expected to increase by 13% by 2026 [12]. Another survey, across 43 countries and territories, reports that IT skills are in demand and that positions requiring them, including computer programming, are very difficult to fill [3]. Students do not want to take programming courses as they feel that programming is difficult to understand, which leads to a shortage of skilled workers in this specialisation.
Furthermore, the proposed study's goal is to motivate students and change their attitude towards programming. The findings from this research will benefit stakeholders such as curriculum developers, course facilitators and other researchers in making efficient and practical decisions. A system was developed to teach and learn programming in which the essential concepts from the literature are incorporated. The research intends to find out whether programming learning material developed in the students' mother tongue helps them grasp the subject of programming better or not. Thus, the research question for this paper is to determine the relationship between the independent variables, interactive animation and effective teaching and learning, and the dependent variable, student results in a programming course. The next section discusses the challenges faced in teaching and learning programming. Section 3 presents the methodology of this research, while Sect. 4 discusses the application and its architecture. The paper concludes in Sect. 5.
2 Challenges Faced in Teaching and Learning Programming Some of the problems in the teaching and learning of programming are related to the student's prior knowledge and learning environment before taking the course. According to Oroma et al., the background of the student, such as the type of skill acquired from previous training or the study environment, is an important factor in the selection of subjects at a tertiary institute [8]. Secondly, the lack of encouragement and motivation, and the painting of a negative picture in the minds of prospective students that programming is a tricky subject, is another issue that needs to be dealt with by instructors and parents. Mathematics also discourages students from doing computer programming: students find mathematics difficult and hold the stereotypical idea that programming is similar to mathematics, and hence they refrain from choosing computing science as a major. Furthermore, the medium or mode of study is another factor that leads to students not liking the subject. Different learners have different tastes, so a particular method might appeal to some and not to others. Together with the medium, the teacher also plays a leading role in a student's decision to pursue further studies; hence, a teacher's negative attitude towards the subject also demoralises the student [7]. Language is also a factor in students' understanding of the subject matter. The researchers in one study concluded that students whose first language is not the same as the language of instruction of the course did not perform well [9]. According to studies by Krpan et al. and Woei et al., teaching and learning computer programming is perplexing [6, 13]. Our understanding is that if someone hears the statement that 'programming is difficult' every day, it automatically becomes difficult for that person. Individuals need to understand that programming is just another language for programmers to communicate with computers. Information technology personnel, facilitated by computers, are trying to make every task easy,
solve most of our problems and help move the world into the new technological era. We can imagine that in the coming years computers will drive the world, and we will need drivers who can command them; to take this commanding position, we should know the language necessary to do so. According to Jacobsen, experts during a code week discussed the need for programmers with the ability to solve problems [5], and they predict an increase in the demand for programmers. Soon, education will comprise virtual learning [11], mobile learning [1], and gamification with digitalised classrooms and digital books [4]. Many applications, techniques and methods have been tested to encourage students to take programming courses. Live coding, virtual learning environments, massive open online courses, 3-dimensional folktales, online trainers and judges, visualisation, robotic toolkits, games, mobile learning, social media and other methods have been employed to promote interest in programming. Article [10] discusses these techniques and elaborates on some drawbacks which we want to overcome in the developed application.
3 Methodology Firstly, this research reviewed the innovative technologies and methods currently used in teaching and learning programming through a critical literature review. Secondly, animated applications were developed using Adobe Flash CS5, specifically ActionScript 3.0, Adobe AIR and the iPhone OS document type. The animation contains commands in English (the medium of teaching in schools and tertiary institutes in Fiji), Hindi (Fiji Hindi, spoken mostly by Fijians of Indian descent) and I-Taukei (spoken by indigenous Fijians). In this research, an experiment was conducted using this animation and application with first-year information technology students in the Western Division of Fiji.
4 Application Architecture The application used in this study as a teaching resource was developed using Adobe Flash CS5. We chose this platform because it allowed us to develop animations quickly and to play them on any platform with a Flash player. Adobe AIR 2 and the iPhone OS document type within Adobe Flash were used to develop the application for desktop, web and mobile devices, and ActionScript 3.0 was used for coding. The iPhone OS document type in Adobe allows developers to create applications deployed on the Apple iPhone and iPod touch, while the Adobe AIR 2 document type enables applications to be created and the same code to be bundled into native apps for Windows, Mac, iPhone, iPad, Kindle Fire, Nook Tablet and other Android devices. It enabled us to create an interactive environment using different controls. The majority of students in the new era have a mobile device, so the developed application can be used by students on the go as
well. While travelling to the university, sitting in the park, lying in bed trying to sleep, or at any other time they wish, students can go through the animation; it serves as a kind of entertainment for them while they learn programming at the same time. The animation includes interactivity, where students can move the mouse cursor over the code to find out what a particular piece of syntax signifies, as shown in Fig. 1. Together with that, drag-and-drop animations can be used to test knowledge after the explanation of a method or concept. Students can also go through the animation while doing tutorial and lab questions. In addition, the main application makes use of scenarios related to students' daily lives, such as characters in a village setting asked to collect mangoes from a nearby farm. As the learner starts with the first challenge, other characters with new and advanced methods are introduced. Every challenge adds a new line of code to the existing block of code, which leads to a complete program. Figure 2 shows the flow of the challenges in the application. Figure 3 shows an example scenario. The learner gets the first challenge, which is to create a basket by writing code, as shown in Fig. 4. We believe that this challenge will motivate the learners to overcome the contest. It will
Fig. 1 Explanation of individual lines of code
Fig. 2 Flow chart showing how the application works: read the scenario and accept the challenge, try the code challenge, run the code, debug using the error message (hint) or visualise the example if help is needed, and, once the challenge is complete, move to the next challenge
Fig. 3 Initial page with a scenario
lead to the application and understanding of different concepts in programming. If, for instance, a learner needs help, the application has example pages which explain what steps to take. Figures 5, 6, 7 and 8 show example pages from the app. The examples can be described in any of the three languages, and students have the option of going to the example page repeatedly, where the concept is explained in an animated format. After writing the correct code, the learner will be able to see their creation, and an option will be provided which leads to the next challenge. If the code entered is incorrect, an error message will appear with an explanation of the error and a hint which helps in better understanding of the challenge. While completing the challenges, learners ultimately write blocks of code which eventually solve a problem. As shown in Fig. 9, after completing five challenges learners can write a block of code in which they declare variables, initialise their values and write the formula to add the number of mangoes. While they write the code, the animation continues to show what they have done. After completing these exercises, learners will be able to use the concepts learnt to perform calculations such as addition, subtraction, multiplication and division on any number of variables.
5 Result The results of a total of 68 students were recorded before and after the use of animation in teaching. Exam results were recorded, and statistical analysis was carried out. As the sample size and the students are the same for both exams, a paired sample t-test was conducted to determine the effect of animation on the learning ability of students. Some assumptions have to be satisfied to conduct this test: first, the dependent variable (exam results) must be continuous; second, the independent variable (students) must comprise related groups; third, there should be no significant outliers in the differences between the two associated groups; and lastly, the distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed. The data in the sample satisfies the first and second conditions, that is, the dependent variable (exam results) is measured on a continuous scale, and the independent
Fig. 4 Challenge 1 writes the code to create a basket
variable (students) comprises the same students, who sat for both exams. For the third assumption, a specific procedure needs to be carried out to determine whether there are any outliers in the data. To proceed, the differences between the two results are found and divided into quartiles. We split the data into three quartiles (Q1, Q2 and Q3) using the quartile function in Excel and then found the inter-quartile range (IQR)
Fig. 5 Animation showing the coding environment
Fig. 6 Error message with hints
by subtracting the first quartile from the third quartile. Then, the upper bound and the lower bound are found by using the following formulas:

Upper Bound = Q3 + (1.5 × IQR)   (1)

Lower Bound = Q1 − (1.5 × IQR)   (2)
The first quartile (Q1) is four, while the third quartile (Q3) is seven, as shown in Table 1. The upper bound is 11.5, and the lower bound is −0.5. The third assumption requires that there be no significant outliers; according to the data obtained after conducting the outlier test, there are no noteworthy outliers, and thus, so far, three assumptions have been shown to hold for the sample data.
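The same bounds can be reproduced programmatically, as in the sketch below; the array of score differences is a placeholder, and NumPy's percentile interpolation may differ slightly from Excel's QUARTILE function.

```python
# IQR-based outlier bounds on the paired score differences.
import numpy as np

differences = np.array([5, 6, 4, 7, 6, 5, 8, 3])   # placeholder score differences
q1, q3 = np.percentile(differences, [25, 75])
iqr = q3 - q1
upper_bound = q3 + 1.5 * iqr
lower_bound = q1 - 1.5 * iqr
outliers = differences[(differences > upper_bound) | (differences < lower_bound)]
print(upper_bound, lower_bound, outliers)
```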
Fig. 7 Example page in three languages
Fig. 8 Two different challenges completed page
Fig. 9 Simple coding completed after completing five challenges
Table 1 Finding outliers for sample

First quartile (Q1)           4
Second quartile (Q2)          6
Third quartile (Q3)           7
Inter-quartile range (IQR)    3
Upper bound                   11.5
Lower bound                   −0.5
First value outlier           FALSE
Furthermore, the data used for a paired sample t-test should be approximately normally distributed. A histogram can be used to test for this, but a probability plot is better suited to judging the distribution. Normality was checked using regression analysis in Excel, which creates a normal probability plot displaying the R² value; a linear trendline is added to indicate how far the difference values deviate from it. In Fig. 10, the trendline (the theoretical percentiles of the normal distribution) is the thin black line, while the blue dots (the observed sample percentiles) are the plotted differences. The plot indicates that the data is almost normally distributed, with a few insignificant outliers. The R² value of 0.86 is close to 1, demonstrating that the data closely fit the regression line; thus, the differences are approximately normally distributed. We can now conduct a paired sample t-test, since the sample data satisfies the four requirements mentioned earlier, as shown in Table 2. The null and alternative hypotheses of the research are:
H0: There is no significant difference between the Pre_Score and Post_Score means of students.
Ha: There is a significant difference between the Pre_Score and Post_Score means of students.
Since the p-value is less than the significance level (p < 0.05), the null hypothesis is rejected; that is, there is a significant difference between the Pre_Score and Post_Score means of students. It is therefore concluded that the use of animation helps students understand the concepts in programming.
Fig. 10 Normal probability plot for sample
[Plot data: sample percentiles on the x-axis; R² = 0.8624]
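For readers who want to reproduce the normality check outside Excel, a hedged sketch using SciPy's probability plot is given below; the R² it reports plays the same role as the value shown in Fig. 10. The data array is again a placeholder, not the study's differences.

```python
import numpy as np
from scipy import stats

differences = np.array([5, 6, 4, 7, 6, 5, 8, 3, 6, 7])  # placeholder values

# probplot returns the ordered data against theoretical normal quantiles plus a
# least-squares fit (slope, intercept, r); r**2 corresponds to the R² in Fig. 10.
(osm, osr), (slope, intercept, r) = stats.probplot(differences, dist="norm")
print(f"R^2 of the normal probability plot: {r**2:.4f}")
```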
Table 2 t-test: paired two sample for means

                                  Pre_score        Post_score
Mean                              4.985294118      10.39705882
Variance                          6.19381036       6.60118525
Observations                      68               68
Pearson correlation               0.561131223
Hypothesised mean difference      0
df                                67
t Stat                            −18.82631138
P(T ≤ t) one-tail                 9.28691E-29
t critical one-tail               1.667916114
P(T ≤ t) two-tail                 1.85738E-28
t critical two-tail               1.996008354
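Table 2 has the layout of a spreadsheet t-test output. The same test can be reproduced in Python with SciPy, as in the sketch below; the arrays are placeholders standing in for the 68 paired pre- and post-scores.

```python
import numpy as np
from scipy import stats

# Placeholder scores; in the study each array would hold the 68 paired exam results.
pre_scores = np.array([4, 5, 6, 3, 7, 5, 4, 6])
post_scores = np.array([9, 11, 12, 8, 13, 10, 9, 12])

t_stat, p_two_tail = stats.ttest_rel(pre_scores, post_scores)
print(f"t = {t_stat:.3f}, two-tailed p = {p_two_tail:.3g}")

# Reject H0 at the 5% significance level when p < 0.05.
if p_two_tail < 0.05:
    print("Significant difference between Pre_Score and Post_Score means.")
```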
6 Conclusion Almost every day, new technology is invented, and its use improves the lives of billions of people. Yet very few students are interested in pursuing undergraduate studies in programming, as they feel it is very difficult to understand. Thus, to encourage more students to undertake this subject area, an animation was developed which teaches fundamental concepts in programming. We compared the exam results of a group of first-year programming students before and after they learnt programming with the animation. The result indicates that there is a significant relationship between the use of animated tools and academic performance. The results of this study will help students, course facilitators and researchers judge how feasible animation and affective teaching and learning are for teaching difficult subjects such as computer programming. Compliance with Ethical Standards Conflict of Interest The authors declare that they have no conflict of interest. Ethical Approval This research concerns students. The exam scores of students were used in the study as per their ethical approval. Informed Consent Informed consent was obtained from all individual participants included in the study.
IoT and Computer Vision-Based Electronic Voting System Md. Nazmul Islam Shuzan, Mahmudur Rashid, Md. Aowrongajab Uaday, and M. Monir Uddin
Abstract Focusing on complete transparency with maximum security, a novel type of advanced electronic voting system is introduced in this paper. Identification and verification of voters are assured by a microchip-embedded national identity (NID) card and biometric fingerprint technology, which is unique for every single voter. With the help of live image processing technology, the system becomes more secure and effective. As a vote is an individual choice among several candidates, influence from a second person is unacceptable; therefore, if the camera module of the voting machine detects multiple faces while a vote is being cast, the vote is automatically not counted. The Viola–Jones algorithm for face detection and the local binary pattern histogram (LBPH) algorithm for face recognition make the image processing more accurate and faster. Four connected machines work together to accumulate each successful vote in this system. To reduce corruption and to regain public faith in elections, this inexpensive and effective system can play a vital role. Keywords EVM · Image processing · Face detection · Live recognition · Fingerprint · Inexpensive · NID · Offline
Md. Nazmul Islam Shuzan (B) · M. Rashid · Md. Aowrongajab Uaday Department of Electrical and Computer Engineering, North South University, Plot-15, Block-B, Bashundhara, Dhaka 1229, Bangladesh e-mail: [email protected] M. Rashid e-mail: [email protected] Md. Aowrongajab Uaday e-mail: [email protected] M. Monir Uddin Department of Mathematics and Physics, North South University, Plot-15, Block-B, Bashundhara, Dhaka 1229, Bangladesh e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_56
1 Introduction Voting is a process, governed by a set of rules, through which a body is created to govern a specific community or a particular choice is decided. Most countries all over the world create their governmental bodies by voting. Ballot-box voting is currently practised in most countries. Because of the rapid expansion of technology, ballot papers and ballot boxes have been transformed into machines: electronic voting machines (EVM). Presently, more than 20 countries use electronic voting machines in their elections, such as the USA, Canada, Germany and India [10]. In Bangladesh, the EVM was first introduced in 2007 for an officers' club election [7]; after that, EVMs appeared in several elections, but the success rate has not been good enough [3]. Besides, a controversial situation was experienced during the last US presidential election in 2016 [2]. The suspected causes behind these incidents include unethical remote hacking of the machines, provocation at the time of voting, corruption of the election in-charge, and others. Because of these negative intentions, the proper voting process is violated. As a result, people are losing their faith in the voting system, and the percentage of participation is decreasing continuously. Considering these circumstances, this paper is motivated to provide relief from these situations and to increase voter participation by regaining their faith. This paper introduces a novel type of electronic voting machine which works fully offline and without any human interaction, so that immoral hacking can be prevented. Unauthorized human interference during the voting process can be prevented by identifying the voter instantly in the specific voting zone. Also, to provide a private fence, detection of multiple persons in the zone must be ensured; this is possible using image processing technology. As voting is a private decision, there is no need for any second person to be present in the voting zone with the voter. So, if any person other than the voter is detected by the camera set up on the voting machine while a vote is cast, the counting machine automatically does not count that vote. The counting machine thus acts as the decision maker that decides whether the vote will be counted or not, and also as a database to save the details of each and every vote. The counter machine is connected by wire with another machine which is used as the voting machine. The voting machine is basically for submitting votes: buttons on the voting machine are used to submit a vote for each candidate of an election. To verify the voter, an identification machine is connected with the voting machine; a biometric fingerprint sensor module and an NID scanner module are integrated to identify the voter. Only a successful authentication from the identification machine can initiate the voting machine. After the submission of a successful vote, the identification machine and the voting machine automatically return to reset mode and are ready to take another vote. These processes are explained elaborately in this paper so that a transparent and secure voting system can be employed in elections.
2 Background Srinivasu et al. [11] introduced an electronic voting machine which requires an Aadhaar card to identify voters: the EVM scans a QR code from the Aadhaar card, and voters also need to scan their fingerprint. There are also several papers that introduce biometric fingerprint voting systems [1, 5]. Okokpujie et al. [8] described an improved fingerprint algorithm, but their system is not able to secure the voting process: it helps to remove fake voting, but it is unable to identify the presence of multiple humans in the voting zone. In our system, we focus on interference by other humans at the voting spot to make the voting zone highly secure, and we introduce a novel voting process using face detection and recognition (FDR). Kone et al. [10] proposed a voting system which used an Aadhaar card and face-recognition biometrics with IoT. They presented statistics on online-based EVM systems which show that online-based EVM systems are not highly secure due to hacking possibilities, network scarcity and so on. This proposal influenced us to work on an offline-based EVM system, which reduces the error of fake voting; we retrieve all required data from the voter's NID card. Lavanya [6] explained the security issues of the voting zone; to address them, that system introduced an improved connection between the voting machine hardware and software. In our system, by contrast, we install a camera by which we can prevent the presence of multiple people in the voting circle. Selvarani et al. [9] outlined an idea for a mobile voting system in which any voter can vote using short messaging service. This method increases election participation, but it cannot provide high security. In our system, a voter is verified using their NID, fingerprint and face recognition.
3 Algorithms and Implementations The proposed system (see Fig. 1) is developed based on the previous NID and fingerprint verification systems. The NID is the technology which stores the documents related to the voter, such as name, birth date, birthplace, voter number and voting center. It also helps to verify whether a voter has already submitted his/her vote or not. After successful verification of the NID, the fingerprint mechanism (I) and face detection and recognition (FDR) (II) are automatically activated together. Biometric fingerprint verification is required to confirm the authentication of the genuine voter, because every NID stores a unique fingerprint, which is verified before activating the voting machine. The NID also stores an image of the voter's frontal face. The FDR module compares the voter's live frontal-face image with the image stored in the NID. While a vote is being cast, it additionally checks whether any individual other than the voter is present in the voting booth. It works by comparing the detection of eyes, nose and mouth. This procedure continues until the vote is cast.
Fig. 1 Operational flowchart
After successful authentication, the voting machine is activated and ready for a vote to be cast. Each candidate's election symbol is placed on the machine along with an individual button, and the voter votes for their preferred candidate by pressing the switch beside that candidate's symbol. The machine accepts a single vote after activation, so there is no chance for a voter to cast multiple votes. There is also a specific button to give a No Vote. After a vote is submitted, the machine deactivates itself after sending the data to the counting machine.
The counter machine takes signals from the voting machine and the FDR module. After a successful input from the voting machine, the identifier of the counter machine analyzes the data from the FDR module. If the FDR module identifies more than one person at any point during the voting process, the counter machine does not count the vote; otherwise, the counter machine counts the vote automatically. The counter machine can be accessed only with the authorization of a specific authorized person: specific biometric fingerprint and NID authentication are required to get access to it. This access allows the person to view the count but does not allow them to modify it.
3.1 Face Detection and Recognition
Application of Face, Eyes, Nose, and Lips Detection: Utilizing the Viola–Jones algorithm [12], we detect human faces, eyes, noses and lips. The algorithm is trained using a set of faces and non-faces to prepare a classifier that recognizes faces. Once the training is done, it is ready to detect any face appearing in the picture. At first, we gather 700–1000 human faces and label them as faces. Then, we input around 1000 non-face images and label them as non-faces in the algorithm. Afterward, we input a picture and the result tells whether it is a human face or not. The same procedure is applied to eye, nose and lip detection. When a voter enters the voting center, we first check whether human eyes, nose and lips are present; if not, we announce that the voter should make the face more frontal, which also helps in recognizing the face. The integral image is computed as

I(x, y) = O(A4) + O(A1) + O(A2) + O(A3),   (x, y) ∈ z    (1)

where I is the integral image and O is the original image. The summation of any rectangular area using the integral image is very efficient. A weak classifier is given by

h(x, f, p, θ) = 0 if p·f(x) < p·θ, and 1 otherwise    (2)

where f is the feature, p the polarity and θ the threshold. The strong classifier is then
Fig. 2 LBPH algorithm process 1
h(x) = sgn( Σ_{j=1}^{M} a_j h_j(x) )    (3)

where

a_j = log(1/β_t)    (4)
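To make the integral-image idea behind Eq. (1) concrete, the short sketch below builds an integral image with cumulative sums and reads a rectangle sum from four corner look-ups. It is a generic illustration of this Viola–Jones building block, not the authors' implementation; the image is a random toy array.

```python
import numpy as np

original = np.random.randint(0, 256, size=(6, 6))   # toy grayscale image O
integral = original.cumsum(axis=0).cumsum(axis=1)   # integral image I

def rect_sum(ii, top, left, bottom, right):
    """Sum of original pixels in the inclusive rectangle, using 4 corner look-ups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

# Both values agree: the rectangle sum comes from the integral image in O(1) time.
print(rect_sum(integral, 1, 1, 3, 4), original[1:4, 1:5].sum())
```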
Application of Facial Recognition: There are many face recognition algorithms, each with its own feature extraction process and its own way of comparing a stored image with the input image; among them, we have used the local binary pattern histogram (LBPH) [4].
1. Applying the LBPH algorithm operation (Fig. 2): The LBPH algorithm analyzes a sliding window that depends on the parameters radius and neighbors. The facial image is first converted to grayscale and analyzed as windows of 3 × 3 pixels. For each 3 × 3 window, the central value of the matrix is used as a threshold to determine new binary values: if a pixel's intensity is greater than the central value, it is set to 1, otherwise to 0. Only these binary values are considered, ignoring the central value. The binary values are then concatenated line by line into a new binary number (e.g., 10001101), which is converted to a decimal value and assigned to the central position of the matrix, i.e. a pixel of the original image. This is how the LBPH operator is applied.
2. Extracting Histograms: The image is then divided using the parameters Grid X and Grid Y (see Fig. 3), and the histogram of each region is extracted. After analyzing each region of the grayscale LBP image, we obtain a histogram with 256 positions (0–255) representing the occurrence of each pixel intensity. Each regional histogram is concatenated to form a new histogram, which is considered the final histogram of the image.
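The 3 × 3 thresholding step just described can be sketched as follows. This illustrative snippet computes the LBP code of one pixel; the window values are arbitrary and the snippet is not the authors' implementation.

```python
import numpy as np

def lbp_code(window):
    """LBP code of a 3x3 grayscale window: threshold neighbours against the centre."""
    center = window[1, 1]
    # Neighbours read clockwise starting at the top-left pixel.
    neighbours = [window[0, 0], window[0, 1], window[0, 2], window[1, 2],
                  window[2, 2], window[2, 1], window[2, 0], window[1, 0]]
    bits = ['1' if n > center else '0' for n in neighbours]  # "greater than centre" rule
    return int(''.join(bits), 2)   # e.g. '10001101' -> 141

window = np.array([[90, 120, 60],
                   [70, 100, 130],
                   [140, 80, 110]])
print(lbp_code(window))
```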
Fig. 3 LBPH algorithm process 2
Fig. 4 Live streaming FDR implementation
3. Recognition performance: The above algorithm is trained on each image of the dataset, and a unique histogram is stored for each trained image. For a test face, we apply the same algorithm and store its histogram, which is then compared with each trained histogram using Euclidean distance; the comparison with the minimum error is considered the best prediction or best match.
4. Implementation (Fig. 4): Using MATLAB R2018a, we implemented the voter face verification process, using the Viola–Jones and LBPH algorithms to detect and recognize the face. The training data are collected from the voter's NID card (eight images per person, to make the recognition more accurate), stored in a folder, and used for detection and feature extraction. We then start the streaming USB camera and repeat the above steps on live frames. For voting access, a single face is required rather than multiple faces, as described in Fig. 4.
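The matching step (step 3) can be sketched independently of MATLAB. The snippet below compares a test histogram against stored histograms by Euclidean distance and returns the closest identity; the histograms are random placeholders standing in for the concatenated per-region LBP histograms, and the voter labels are invented.

```python
import numpy as np

# Placeholder "trained" histograms (one per enrolled voter) and a test histogram.
rng = np.random.default_rng(0)
train_histograms = {"voter_001": rng.random(256 * 64), "voter_002": rng.random(256 * 64)}
test_histogram = rng.random(256 * 64)

def best_match(test_hist, trained):
    """Return the identity whose stored histogram has the smallest Euclidean distance."""
    distances = {name: float(np.linalg.norm(test_hist - h)) for name, h in trained.items()}
    name = min(distances, key=distances.get)
    return name, distances[name]

identity, error = best_match(test_histogram, train_histograms)
print(identity, round(error, 3))
```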
3.2 Voting Machine While implementing the hardware system, the machine is divided into four modules. As the whole system works offline, all four modules are interconnected with wires. For implementation purposes, four Arduino boards are used as the microcontrollers of the four modules. The four modules are:
1. Identification machine as Module-A
2. FDR machine as Module-B
3. Voting machine as Module-C
4. Counter machine as Module-D.
A separate power source and security procedures are ensured in each module, and only a single module is in active mode at a time. For example, successful identification by Module-A initializes Module-B and Module-C; after that, Module-A goes into idle mode until the whole process is completed. A database system is connected with the modules, and the information about the voters is stored in the database. Detailed descriptions of these modules are given here:
1. Module-A: In the identification machine, one NID scanner, one fingerprint scanner and one display are installed with an Arduino, which serves as the main processor of this machine only. At the very beginning, this module is active. The fingerprint scanner and NID scanner work in either order, so that either of the two can be used first and the other second. When a user/voter scans his/her NID, the machine checks the availability of the NID number in the database. If the NID number is available in the database and no vote has been cast before, the machine asks for the voter's fingerprint (Fig. 5).
Fig. 5 Identification machine
2. Module-B: After the successful signal from Module-A, the face detection and recognition machine initializes and starts detecting and recognizing the face(s) in front of the machine. It matches the live face against the face stored in the NID; if both faces match, a positive signal is sent by this machine to the counter machine. It also detects whether multiple faces are present in front of the machine. If multiple faces are detected, or a wrong face is detected which does not match the NID face, the FDR machine sends a negative signal to the counter machine. The counter machine holds this signal and waits for the signal from the voting machine.
3. Module-C: The voting machine consists of a number of buttons corresponding to the candidates of a voting area, plus another button to give a No Vote. The user/voter can vote only one time; after one vote is submitted, the machine sends a signal to the counter machine and deactivates itself so that another vote cannot be inserted.
4. Module-D: The final stage of this system is the counter machine. After receiving signals from both the FDR machine and the voting machine, the microcontroller analyzes the signal from the FDR machine first. If the signal is positive, meaning only a single, authorized face was detected, the microcontroller stores the data from the voting machine and counts the vote. If the signal from the FDR machine is negative, meaning multiple faces were detected, the microcontroller stores the vote in the rejected section and does not count it with the successful votes. After this storing process, the counter machine signals back to all the machines to reset them and get ready for a new vote to be cast (Fig. 6).
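The counter machine's decision rule described above (count a vote only when the FDR module reports exactly one recognised face, otherwise move it to the rejected section) can be summarised in a few lines. The sketch below is a simplified software model with invented names; the actual system runs on interconnected Arduino boards.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class CounterMachine:
    """Simplified model of Module-D: counts or rejects incoming votes."""
    counted: Dict[str, int] = field(default_factory=dict)
    rejected: int = 0

    def process(self, fdr_single_matching_face: bool, candidate: Optional[str]) -> str:
        # A vote is counted only if exactly one face was detected and it matched
        # the face stored on the NID; otherwise it goes to the rejected section.
        if candidate is None:
            return "no vote submitted"
        if fdr_single_matching_face:
            self.counted[candidate] = self.counted.get(candidate, 0) + 1
            return "counted"
        self.rejected += 1
        return "rejected (multiple or unmatched faces)"

counter = CounterMachine()
print(counter.process(True, "Candidate A"))    # counted
print(counter.process(False, "Candidate B"))   # rejected
```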
Fig. 6 Implemented voting and counting machine image
4 Results and Discussion
4.1 Face Detection and Recognition
To analyze the face recognition accuracy, we took two faces from our dataset and, using the above algorithm, created the LBP histogram of each; Fig. 7 shows these two training histograms. We then took a test image, applied the same algorithm and produced its histogram, shown in Fig. 8. This test histogram is compared with each of the training histograms, and Fig. 9 shows the clear difference between the test face and the two training-set histograms. Before making a prediction, the histogram results are observed: only when the difference between a training histogram and the test histogram is minimal do we call it the best prediction. Figure 10 shows the final best prediction result as a histogram. Using Euclidean distance, we measure the minimum error, i.e. the face recognition accuracy: each training-set histogram is subtracted from the test-face histogram, and the minimum result is called the best match or best prediction.
Fig. 7 Training set with LBP histogram
Fig. 8 Test face with LBP histogram
Fig. 9 Compared between trained and tested histogram
Fig. 10 LBP histogram best prediction
We thus obtain satisfactory face recognition results using the LBPH algorithm, and the accuracy level could be improved further by combining multiple face recognition algorithms. Face detection and recognition have become an important part of daily life. To make the voting system more efficient, we individually detect the eyes, nose and lips of the human face; only when each part of the face is detected do we run the face recognition process. Figure 11 illustrates the performance of this FDR system. In the recognition process, faces were trained on frontal views only; in the future, human faces should be detected in various alignments so that human presence can be found more easily, making the voting system more secure and accurate. We have discussed the challenges of this FDR system, and there are many opportunities for improving it. We hope our work will provide insight into the future work needed in this arena to make these systems practical and able to perform well in real-life scenarios.
Fig. 11 Detection process
4.2 Voting Machine In the hardware-implemented part, we used an Arduino as the microcontroller of each machine. Communication between two machines is made over wires, because the primary principle of this system is that the whole procedure is completed offline. The fingerprint scanner of the identification module is the most challenging part of this machine. It stores the training data in its flash memory and then computes a confidence rate to identify the trained finger: the higher the confidence rate, the more accurate the finger detection. Presented here is the confidence rate of a specific finger on the fingerprint sensor implemented in our machine. We carried out 20 random tests of a specific finger whose data had been stored beforehand. From the data, we can see that 50% of the 20 test results are above the average, and all 20 tests were successful. The minimum confidence rate among the successful tests is 52 and the maximum is 217. The machines we created are in three parts, but this could also be done with a single processor, which would make the machine more compact; however, we ensure security in every machine, so a three-phase security system is implemented. As all data are retrieved from and stored in an offline database, a power failure cannot damage the data or the process (Fig. 12).
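The summary statistics quoted above (minimum, maximum and the share of readings above the average) can be computed as below. The 20 readings listed are invented placeholders; only the bounds 52 and 217 come from the text.

```python
readings = [52, 217, 120, 95, 143, 88, 160, 110, 75, 130,
            99, 181, 66, 150, 105, 170, 80, 125, 140, 90]  # placeholder confidence values

average = sum(readings) / len(readings)
above_average = sum(1 for r in readings if r > average)

print(f"min={min(readings)}, max={max(readings)}, mean={average:.1f}")
print(f"{above_average} of {len(readings)} readings are above the average")
```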
Fig. 12 Confidence rate of fingerprint sensor
5 Conclusion Voting is one of the most significant democratic rights, and mismanagement and corruption are the main diseases of the current system. As the machine runs fully offline, the probability of jamming or hacking drops to zero. The hardware-implemented parts can be upgraded and developed with more advanced electronic equipment, although the cost will then vary. The detection and recognition accuracy of the algorithms increases with more training data and sufficient light in the voting zone. Above all, the described system is the ultimate choice for evaluating people's selection in an ethical way. Compliance with Ethical Standards Fund This research work was partially supported by the North South University (NSU) CTRG Research Grants under the Project Grant No.: NSU/MAT/CTRGC/47. Conflict of Interest The authors declare that they have no conflict of interest. Ethical Approval This chapter contains the picture of participants as per their ethical approval. Informed Consent Informed consent was obtained from all individual participants in the study.
References 1. S. Anandaraj, R. Anish, P. Devakumar, Secured electronic voting machine using biometric, in 2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS) (IEEE, 2015), pp. 1–5 2. C.D.: United states presidential election of 2016 (Jul 2019), https://www.britannica.com/topic/ United-States-presidential-election-of-2016
3. S. Correspondent, Evm proves prone to abuse (Dec 2018), https://www.thedailystar.net/ban gladesh-national-election-2018/evm-use-electronic-voting-machine-showing-problems-slo wing-down-polling-1680712 4. M. Jia, Z. Zhang, P.J.D. Song, Research of improved algorithm based on lbp for face recognition, in Chinese Conference on Biometric Recognition (Springer, 2014), pp. 111–119 5. D. Karima, V. Tourtchine, F. Rahmoune, An improved electronic voting machine using a microcontroller and a smart card, in 2014 9th International Design and Test Symposium (IDT) (IEEE, 2014), pp. 219–224 6. S. Lavanya, Trusted secure electronic voting machine, in International Conference on Nanoscience, Engineering and Technology (ICONSET 2011) (IEEE, 2011), pp. 505–507 7. Ltd., P.L.B., Electronic voting machine (evm), https://www.pilabsbd.com/portfolio-items/evm/ 8. K. Okokpujie, E. NO, S. John, E. Joy, Comparative analysis of fingerprint preprocessing algorithms for electronic voting processes, in IT Convergence and Security 2017 (Springer, 2017), pp. 212–219 9. X. Selvarani, M. Shruthi, R. Geethanjali, R. Syamala, S. Pavithra, Secure voting system through sms and using smart phone application, in 2017 International Conference on Algorithms, Methodology, Models and Applications in Emerging Technologies (ICAMMAET) (IEEE, 2017), pp. 1–3 10. K. Srikrishnaswetha, S. Kumar, M. Rashid, A study on smart electronics voting machine using face recognition and aadhar verification with iot, in Innovations in Electronics and Communication Engineering (Springer, 2019), pp. 87–95 11. L. Srinivasu, K. Rao, Aadhaar card voting system, in Proceedings of International Conference on Computational Intelligence and Data Engineering (Springer, 2017), pp. 159–172 12. P. Viola, M. Jones, Robust real-time face detection. Int. J. Comput. Vision 57(2), 137–154 (2004)
Lexical Repository Development for Bugis, a Minority Language Sharifah Raihan Syed Jaafar, Nor Hashimah Jalaluddin, Rosmiza Mohd Zainol, and Rahilah Omar
Abstract The Bugis community migrated from Sulawesi, Indonesia and settled in the Malay Peninsula in the seventeenth century as a minority community. As a minority community in a new territory dominated by a more powerful community, their mother tongue has become non-dominant in communication within the family and in other public domains, and has shifted to Malay. In this regard, this research investigates the factors that have led the Bugis community to shift their language to Malay. Due to language shift, the Bugis language has become non-dominant and is currently facing rapid attrition. Therefore, this research also attempts to contribute knowledge about Bugis language loss by developing a lexical repository for the language. To accomplish this, observation and interviews were carried out in a number of Bugis villages in Johore and Selangor in Malaysia to find out the factors which led to the language shift. The repository, on the other hand, was developed by collecting a large number of words through Rapid Word Collection (RWC) based on semantic domains. Currently, as many as five thousand words have already been collected and included in the repository. The development of this lexical repository can help in storing as many Bugis lexical items as possible, and storing these items can help preserve this minority language that is nearly extinct. Keywords Bugis · Minority language · Lexical repository · Language shift
S. R. Syed Jaafar (B) · N. H. Jalaluddin · R. Mohd Zainol · R. Omar National University of Malaysia, Bangi, Malaysia e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_57
1 Introduction Language plays a vital role in people's life. It is not merely used for communication, but also as a repository for each language's unique identity, cultural history, tradition and memory. Therefore, the United Nations Educational, Scientific and Cultural Organisation (UNESCO) has declared 2019 as The Year of Indigenous Languages (IY2019). This declaration is made for both speakers of the language and also for
others to appreciate the contribution these languages make to our world, which is rich in cultural diversity. With this in mind, we, who are from a dominant community in Malaysia, would like to take this opportunity to play our role in supporting the declaration made by UNESCO by preserving and restoring one of the minority languages spoken in Malaysia, namely Bugis. The Bugis homeland is Sulawesi, Indonesia; however, the Bugis people can also be found in other areas outside of Sulawesi, such as Malaysia and Singapore. Functional load means a language's ability to function in more than one social domain: the more domains it covers, the higher its functional load. For example, in Malaysia, Malay covers almost all the public domains, such as education, administration, local business and communication. Meanwhile, functional transparency is a feature which determines the level of functional load, as it shows the autonomy and control that the language has in a particular domain. A language is said to have a higher functional load when it does not have to share a particular function with other languages [1]. Although the Bugis people have spread to other areas, the language is now in danger of extinction. It has been abandoned not only by the younger generations but also by the elders. Communication within families at home, between parents and their children or grandparents and their grandchildren, is mostly done in Malay, the language spoken by the majority community in Malaysia. The Bugis language is not widely used in daily communication between parents or grandparents and their children and grandchildren because the latter's knowledge of the language is extremely limited. Observations showed that the younger generations responded to their grandparents in Malay even though the grandparents spoke Bugis to them, and that in some situations the younger generations could understand Bugis but in others they could not. Because the language used to communicate at home is not Bugis, transmission of the heritage language from the older generations to the younger one is reduced, and eventually the language may not be used at all. Moreover, the language is severely endangered, as most of the grandparents and older generations who could speak the language are no longer alive, and those who are still alive may understand the language but do not speak it to their children or among themselves. It is clear that the Bugis language is in danger, and once a language is in danger an attempt must be made to maintain it. Crystal (2000) has identified six main mechanisms of intervention through which language maintenance might be attempted: (1) to increase the prestige of its speakers, (2) to increase the wealth of its speakers, (3) to increase the power of its speakers, (4) to improve the presence of the language in the educational system, (5) to ensure the language can be written down, and (6) to provide its speakers with access to technology for the language. According to Crystal (2000), the first three mechanisms relate to the status of the language, while the rest relate to people's access to it. This research thus aims to discuss the factors that have led the language toward extinction. Language shift has been identified as the key problem behind the loss of the minority Bugis language in Malaysia, spoken by migrants in the presence of a dominant second language. The second aim of this research is to help in documenting and
preserving minority languages in Malaysia, with an initial attempt for the Bugis language. The last mechanism highlighted by Crystal (2000), that is, providing the language's speakers with access to technology, is implemented here to help preserve and restore the endangered language before it becomes extinct, by developing a lexical repository for Bugis. A repository is an online dictionary that helps to document and preserve a language for many generations. The lexical repository contains information not only about the language but also about culture, as it shows the language's unique identity, memory and cultural history.
2 Previous Literature The Bugis community was known for brave sailors, successful traders and travelers; therefore, they were not always in their hometown. For centuries, many Bugis have been leaving their hometown in Sulawesi, Indonesia, for other places in search of a new life and settlements. Mass migration of Bugis people occurred in the late seventeenth and early eighteenth centuries, and during the seventeenth century some Bugis people had already settled in the Malay Peninsula [2]. A search of the literature shows many studies focused on the early history of the Bugis people who migrated to Sumbawa, Jawa, Bali, Sumatera and Borneo, including the Malay Peninsula [2, 3]. Bugis people traveled from one place to another before they reached the Malay Peninsula; they stopped at Klang and Selangor and eventually settled there [4]. Apart from discussing the history of the Bugis, there are previous works concerning political issues related to the Bugis people. As a foreign power in the Malay Peninsula, the Bugis intervention in the politics of Malaya, especially in the eighteenth century, could be clearly seen when there was a rebellion by King Sulaiman to conquer the Johore-Riau sultanate, then under King Kecil. In search of a new and better territory, Daeng Bersaudara, the leader of the Bugis people, took this opportunity to help King Sulaiman conquer the Johore-Riau sultanate [4]. The intervention of the Bugis in the Johore-Riau sultanate gave the Bugis community space to settle in Johore. As we can see, quite a number of previous studies have focused on the history of the Bugis. From the search of the literature, discussion of the Bugis in Malaysia has not yet received much formal attention from scholars in various areas, particularly concerning language. The role and functions of the language are now becoming non-dominant in public domains; it is not used in important domains such as communication within the family. Hence, this research intends to focus on preserving the language from dying by developing a lexical repository for Bugis.
3 Materials and Methods As mentioned earlier, a lexical repository for Bugis is being developed in order to save the language of this minority community in Malaysia from becoming extinct. To develop the repository, a large number of Bugis lexical items were collected from native speakers during fieldwork in a number of Bugis villages located in two states of Peninsular Malaysia, namely Johore and Selangor, at the southern and west coast of the Peninsula, respectively. Two districts in Johore, Pontian and Pasir Gudang, where a large part of the Bugis community can be found, were visited. A good number of Bugis villages were visited at Pontian, such as Makuaseng, Kampung Sungai Kuali, Batu Hampar, Parit Keroma and Peradin, as well as Kampung Pasir Putih at Pasir Gudang. Data collection in Selangor was done in Pandamaran, Klang, during a Bugis cultural event to which many Bugis speakers from every part of Selangor were invited. To collect a large number of lexical items, Bugis speakers were asked to complete a Rapid Word Collection (RWC), a systematic word-collection method used to gather many words in a short time. The key to RWC is a questionnaire containing various semantic domains, such as animals, fruits and emotions, for which Bugis speakers note down words in their native language. This collection of words not only expresses the Bugis language but also captures the unique cultural identity of the community. Before the RWC was carried out, the Bugis leaders in each village were encouraged to call their community members to gather in one place to respond to the questionnaire. At the end of the visits to all the villages, a total of five thousand lexical items had been collected. On top of collecting lexical items for the repository, the fieldwork also gathered more information related to the ethnic culture of the community; visiting their houses was most useful for obtaining such information. Bugis family members of various ages, including grandparents and grandchildren as well as parents and their children, were interviewed and observed during our visits. Observation and interviews provided valuable data that helped identify the key problem behind the language loss.
4 Language Shift in Bugis Communication There are many reasons why language endangerment can occur, including language contact, social structures and others. In this research, the endangerment of the minority Bugis language is mainly due to language shift. As mentioned above, the Bugis people came to the Malay Peninsula and have had settlements here since the eighteenth century, and over the years they successfully formed and developed their community in Malaysia. As migrants, the Bugis community had to give way to the existing community, which had already established itself and dominantly occupied the territory. Settlement in an already
established community caused the Bugis to accept a language more dominant than theirs. Because of that, the Bugis had to adopt the Malay language, the more dominant language used by the large community around them. Over time, the Bugis community adapted to the use of the Malay language in their daily life, such as in communication with the local people, local business matters and other important social domains. This situation, where two languages are in use and one language is replaced by another, is called language contact; it can lead to what is known as language shift. When two languages come into contact, speakers of one language tend to shift to the other. According to Grenoble [5], language A is adopted by speakers of language B, and hence language A replaces language B as the number of speakers of language B decreases. From the observation and interview sessions with native speakers, the Bugis language has largely shifted due to bilingualism/multilingualism, the impact of education, migration and business domains, as discussed below. A large number of countries are bilingual/multilingual due to their linguistic diversity, including India, Nigeria, the Philippines, Indonesia, Malaysia and many others [6]. Bilingualism/multilingualism is the ability of one and the same speaker to speak more than one language. According to Fasold [6], bilingualism/multilingualism is the most basic condition for language shift in a minority community, though it is not always an absolute condition for language shift. This can be seen in the Bugis community, where the Bugis language used to be spoken with all members of the family; this has changed, and the Bugis language is now only used to communicate with parents, while Malay is used with children. The same situation has also occurred in other minority languages in Malaysia, for example Sindhi and Punjabi, whose speakers use their mother tongue to communicate with their grandparents, while Malay and English are used with their children [7, 8]. The language used to communicate between younger and older generations is different in the Bugis community: the younger generations are no longer fluent in their own mother tongue, and therefore communication with them is almost always in Malay, and sometimes in Malay and Bugis together. It was observed that only a small number of the younger generation could understand the Bugis language, and they would always respond in Malay. This inability to respond in Bugis is due to the impact of the education they have received. Education is always perceived to be very important, as it is believed to facilitate securing jobs in the future. In addition, education in public schools in Malaysia is conducted in Malay: every subject in public schools is taught in Malay, except in Chinese and Tamil schools. Children who were sent to public schools were therefore taught in Malay and became more fluent in that language than in their mother tongue. This further decreases proficiency in the mother tongue, which leads to language shift. It was also observed that children who were sent to boarding schools in town or elsewhere outside their village or community were far more fluent in Malay than in the Bugis language. It is clear that the education system in Malaysia has had a big impact on the language use of the younger generations of the Bugis.
Migration is another factor that is commonly associated with language shift. When a minority community migrates to an area where another community is dominant, language shift often occurs. This is also highlighted by Fishman [9], who stated that a powerful language has a high tendency to either dominate or replace another language when they are in contact. As is the case for a minority community like the Bugis, the Malay language dominates the Bugis language; as a result, the Bugis language has become less powerful, and in most situations in life Malay is widely used compared with Bugis.
5 Developing a Repository for Bugis Lexical Items Discussions of endangered languages have been a main focus in the area of computational linguistics and language technology. Some substantial works are [10–14] and many others. There is no doubt that computing technology has played a valuable role in saving and preserving endangered languages. For instance, language corpora databases have been created to compile Uralic languages containing natural spoken language data. Data documentation can be done either in written form or through technology that is able to keep large amounts of data and is accessible anywhere. With this in mind, the idea of developing a lexical repository for Bugis has been proposed, mainly for the language's survival, as the number of speakers of this language has been gradually dropping. There are at least a couple of reasons why the Bugis language has fewer speakers. The main reason is that the oldest generations who were able to speak the language have died, and only a few of them are still alive. Another reason is that the language is not being learnt much by the younger generations, because they have started to shift to another, more dominant language, namely Malay. Moreover, the younger generations were taught in Malay-medium government schools, and this has affected the language they use for communication at home. This research therefore aims to save the language from dying by documenting its lexical items in a repository. In the following discussion, the development process of the repository is explained.
5.1 Development Process of the Repository A conceptual model, namely the Systems Development Life Cycle (SDLC), is used to develop the repository for Bugis. There are five phases in this model: requirement analysis, designing, implementing, testing and evaluating. Though the phases of the SDLC have different functions, they share the same objective. Each of the five phases is described below:
1. Requirement analysis: This phase focuses on the main problem in the research. The system development is outlined based on the problems that have been identified.
2. Designing: At this phase, the development of the system is explained in detail, including designing and developing the screen layouts, the materials exhibited in the system and other things related to its development. The system was designed as a repository in which the collected data are stored, with special features automatically implemented within it. Responsive Web Design (RWD) is used; this is an approach to Web design that makes webpages render well on a variety of devices and window or screen sizes, using HTML and CSS to automatically resize, hide, shrink or enlarge a website so that it looks good on desktops, tablets and phones from anywhere, as long as they have access to the Internet. Besides that, a searching function is provided in the repository to help users find words by typing them into the search bar; the system finds matches as the user types (a minimal sketch of such a lookup is given at the end of this subsection). All lexical items were transcribed in phonetic symbols based on the International Phonetic Alphabet (IPA, revised 2015), together with their meanings in Malay and English (Table 1).
3. Implementation: Basically, this process involves installing the system and bringing it into operation and production.
4. Testing: At this stage, the prototype is ready to be used. The user tests and verifies the functions of the system; this is important to ensure the system is in order and works as planned.
Table 1 Searching function in the repository
Table 2 Advantages and disadvantages of SDLC [15]

Factors                      Waterfall   V-shaped    Evolutionary prototyping   Spiral      Iterative and incremental   Agile
Unclear user requirement     Poor        Poor        Good                       Excellent   Good                        Excellent
Unfamiliar technology        Poor        Poor        Excellent                  Excellent   Good                        Poor
Complex system               Good        Good        Excellent                  Excellent   Good                        Poor
Reliable system              Good        Good        Poor                       Excellent   Good                        Good
Short time schedule          Poor        Poor        Good                       Poor        Excellent                   Excellent
Strong project management    Excellent   Excellent   Excellent                  Excellent   Excellent                   Excellent
Cost limitation              Poor        Poor        Poor                       Poor        Excellent                   Excellent
Visibility of stakeholders   Good        Good        Excellent                  Excellent   Good                        Excellent
Skills limitation            Good        Good        Poor                       Poor        Good                        Poor
Documentation                Excellent   Excellent   Good                       Good        Excellent                   Poor
Component reusability        Excellent   Excellent   Poor                       Poor        Excellent                   Poor
5. Evaluation: This is the final process in the SDLC, which measures the effectiveness of the system and evaluates potential enhancements.
The SDLC has a number of models, namely Waterfall, V-Shaped, Evolutionary Prototyping, Spiral, Agile, and Iterative and Incremental. Each model has its advantages and disadvantages, as shown in Table 2; therefore, it is important to choose which model the development of the Bugis repository system should be based on. Considering the advantages and disadvantages of each model, this research has chosen the Agile model, as it is based on iterative and incremental development, where requirements and solutions evolve through collaboration between cross-functional teams. The advantages of this model are:
• It decreases the time required to deliver certain system features.
• Face-to-face communication and continuous input from the customer representative leave no space for guesswork.
• Each phase can be tested with the user in order to fulfil the user requirements specification.
• User participation in system development is increased, and thus the system can be easily understood by the user.
• The end result is high-quality software in the least possible time and a satisfied customer.
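As a rough illustration of the searching function described in the design phase (matching entries as the user types), the sketch below filters an in-memory lexicon by prefix over the headword and glosses. The entries, field names and values are invented placeholders, not actual repository data.

```python
from typing import Dict, List

# Invented sample entries: headword, IPA transcription, Malay and English glosses.
lexicon: List[Dict[str, str]] = [
    {"headword": "word_a", "ipa": "wordA", "malay": "rumah", "english": "house"},
    {"headword": "word_b", "ipa": "wordB", "malay": "air", "english": "water"},
    {"headword": "word_c", "ipa": "wordC", "malay": "makan", "english": "eat"},
]

def search(query: str) -> List[Dict[str, str]]:
    """Return entries whose headword or glosses start with the typed query."""
    q = query.lower().strip()
    if not q:
        return []
    return [entry for entry in lexicon
            if any(entry[field].lower().startswith(q)
                   for field in ("headword", "malay", "english"))]

# The result set updates as the user types more characters into the search bar.
print([e["english"] for e in search("wo")])
print([e["english"] for e in search("ma")])
```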
6 Conclusion To conclude, based on the data collected, the Bugis, a minority community that has been settled in Malaysia since the eighteenth century, has been influenced by the more powerful community in many ways. The powerful community has affected the minority community's life, especially its language. As discussed above, the mother tongue of the Bugis has fewer speakers due to the language shift that has occurred, driven by factors such as bilingualism/multilingualism, the impact of education and migration. Given these factors, the Bugis language could eventually become extinct; therefore, efforts must be taken to save this minority language from dying. As highlighted in this research, an initial attempt to help preserve the language has been made through the development of a lexical repository. The lexical repository for Bugis has been created in the hope that the younger generations of this minority community who wish to learn their mother tongue in the future will be able to access it, and that other communities interested in knowing and learning the Bugis language will have a reference preserved in the system. Compliance with Ethical Standards This research has been awarded by the National University of Malaysia under the grant: University Research Grant [grant code: GUP-2018-003]. Conflict of Interest The authors declare that they have no conflict of interest. Ethical Approval This chapter does not contain any studies with human participants or animals performed by any of the authors. Informed Consent Informed consent was obtained from all individual participants included in the study.
References 1. R. Pandharipande, Minority matters: issues in minority languages in India. Int. J. Multicultural Soc. 4(2), 213–234 (2002) 2. K. Rahilah, N. Nelmawarni, Sejarah Kedatangan Masyarakat Bugis Ke Tanah Melayu: Kajian Kes Di Johor. JEBAT 36, 41–61 (2009) 3. Soehartoko, Merantau bagi orang Wajo Makassar: ringkasan penelitian, Pusat Latihan Penelitian Ilmu-Ilmu Sosial Universitas Hasanuddin (1971) 4. D. Zuraidi, Malay-Bugis in the political history of Johor-Riau and Riau-Lingga kingdoms. Prosiding Seminar Internasional Multikultural dan Globalisasi 2012 (2012) 5. L.A. Grenoble, Endangered Languages. Elsevier Ltd. (2006) 6. R. Fasold, The sociolinguistics of society (Blackwell, Oxford, UK, 1987) 7. M.K. David, Language shift among the Sindhis of Malaysia. Doctoral Thesis. University of Malaya, Kuala Lumpur (1996) 8. M.K. Kundra, The role and status of English among the urban Punjabis in Malaysia. Master’s Dissertation, University of Malaya (2001)
9. J.A. Fishman, Language and ethnicity in minority sociolinguistic perspective (Multilingual Matters LTD., England, 1989) 10. C. Gerstenberger, N. Partanen, R. Micahel, W. Joshua, Utilizing language technology in the documentation of endangered uralic languages. Northern Eur. J. Lang. Technol. 4(3), 29–47 (2016) 11. C. Gerstenberger, N. Partanen, M. Rießler, Instant annotations in ELAN corpora of spoken and written Komi, an endangered language of the Barents Sea region, in Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages (2017) 12. P. Littell, A. Pine, H. Davis, Waldayu and Waldayu mobile: modern digital dictionary interfaces for endangered languages (2017) 13. W.P. Sze, S.J. Rickard Liow, M.J. Yap, The Chinese Lexicon Project: A Repository of Lexical Decision Behavioral Responses for 2,500 Chinese Character (2013) 14. I. Ullah, Digital dictionary development for Torwali, a less studied language: process and challenges, in Proceeding of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages, vol. 2 (Honolulu, Hawaii, USA, 2004, 2019) 15. M. Sami, Choosing the right Software development life cycle model. Retrieved August 17, 2019, from https://melsatar.blog/2012/03/21/choosing-the-right-software-developmentlife-cycle-model/ (2012)
Toward EU-GDPR Compliant Blockchains with Intentional Forking Wolf Posdorfer, Julian Kalinowski, and Heiko Bornholdt
Abstract With the introduction of the European Union's General Data Protection Regulation, service providers have to make various adjustments to their business processes and software. Under the GDPR, users can at any time request that their data be deleted when they no longer use a service, or have their data rectified when the service provider stores inaccurate or false data. However, blockchains and distributed ledgers all have in common that they are "append only". Moreover, every modification to historic data can be immediately detected through the now invalidated hash values stored in every block. In this chapter, we propose a transaction model based upon our previously devised dispute mechanism to cope with the GDPR requirements of deletion and rectification. By introducing the longest-chain-rule into a BFT-PoS Blockchain, we can achieve a validator-approved intentional fork to modify previous data. Keywords Blockchain · Byzantine fault tolerance · Proof of stake · Fork · GDPR · Right to erasure
1 Introduction Introduced through Bitcoin [13], the blockchain offers far more applications than just being a “cryptocurrency”. Applications can range from supply-chains, healthcare, order-books to freight handling and many more. All these directly benefit from the W. Posdorfer (B) · J. Kalinowski · H. Bornholdt Department of Informatics, University of Hamburg, Vogt-Kölln-Straße 30, 22527 Hamburg, Germany e-mail: [email protected] J. Kalinowski e-mail: [email protected] H. Bornholdt e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_58
decentralization through a peer-to-peer network, transparency of data and consensus, data integrity and data immutability through signatures and checksums (hashes). While these blockchain features are generally very favorable in many distributed systems for various use cases, the inability to delete data can be a problem. It is computationally infeasible to alter or delete anything in previous blocks, due to the recalculation of hashes that need to match the difficulty target in Proof-of-Work blockchains. The computational effort increases drastically when the data lies very far in the past. Blockchains do not solely use the computationally work-intensive Proof-of-Work (PoW) algorithms to reach consensus. The family of Proof-of-Stake (PoS) algorithms has in common that they all try to minimize the workload imposed on miners in one way or another. Byzantine-Fault-Tolerant PoS algorithms solve the Byzantine Generals Problem in a blockchain-specific setting. Through this leader-based consensus, they can guarantee instant block finality. By having one leader per block height, the blockchain cannot create forks that result in temporarily different states, making it ideal for business processes that cannot tolerate the eventual consistency of Bitcoin and other PoW technologies. However, the European Union's General Data Protection Regulation (GDPR) [8] contains the requirement for data to be deleted or altered. When a provider stores personal data of a user, the user has the right to demand that this data be deleted or corrected. Since on-chain storage of personal data is not feasible under the GDPR, providers usually opt for off-chain storage on central servers or for encrypting data with unique keys and deleting those keys upon request. This either results in more centralization through central servers, or in a legal gray area, since technically, deleting encryption keys does not prevent decryption through brute-forcing. In this paper, we present a different approach to handle the deletion and rectification of user data to comply with the EU-GDPR. Building upon previous work [14], we have devised a dispute mechanism and transaction model to delete or replace previous data by integrating the longest-chain-rule of PoW consensus algorithms into a BFT-PoS algorithm. By intentionally forking the chain at previous heights and having validators confirm and vote for the changes, we can make adjustments to the blockchain history. The remainder of the paper is structured as follows: Sect. 2 outlines the foundations of blockchain technologies, the general building blocks and a brief introduction to consensus algorithms. Section 3 introduces the relevant Articles 16 and 17 of the GDPR, while Sect. 4 summarizes the previously devised dispute mechanism. Afterward, Sect. 5 introduces the transaction model to rectify and delete data from the blockchain, and Sect. 6 discusses the limitations of this approach. In the end, we summarize our findings and give some insights for future work.
2 Blockchain A Blockchain is a decentralized data structure. Its internal state and consistency are constructed and retained via a consensus algorithm. The algorithm likewise guarantees that every participating node holds an identical state. Depending on the technology, all, most or some nodes hold a fully replicated state or parts of it [2]. Commonly, the state of the blockchain is modified via a transaction. Multiple transactions submitted in a close timeframe are grouped into a block. Every block contains a hash of its predecessor block and the hash over its own data. The block hash is constructed by hashing the respective predecessor's hash (Prev_hash), a Timestamp, the transactions' merkle-tree hash (TX Merkle Root) and a Nonce. Figure 1 depicts an exemplary blockchain data structure showing three blocks and their linking via the predecessor's hash. Transactions can be created by every network participant and are propagated to other nodes via a peer-to-peer network. Different blockchain technologies employ different transaction types, like currency transfers, trade deals, function calls in smart contracts or other use-case-relevant data.
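The following minimal sketch (illustrative only, not taken from the paper, and omitting the Merkle-tree computation itself) shows how such a block hash could be derived from the header fields named above using SHA-256:

import hashlib
import json

def block_hash(prev_hash, timestamp, tx_merkle_root, nonce):
    # Hash the header fields in a fixed, canonical order.
    header = json.dumps(
        {"prev_hash": prev_hash, "timestamp": timestamp,
         "tx_merkle_root": tx_merkle_root, "nonce": nonce},
        sort_keys=True).encode()
    return hashlib.sha256(header).hexdigest()

# Chaining two blocks via the predecessor's hash:
h5 = block_hash("00" * 32, 1568000000, "ab" * 32, nonce=42)
h6 = block_hash(h5, 1568000600, "cd" * 32, nonce=7)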
2.1 Consensus Blockchains guarantee transaction and block validity and finality through a distributed consensus algorithm. Depending on the algorithm, finality is either achieved instantly or eventually. Public Blockchains like Bitcoin [13] and Ethereum [5, 18] use algorithms from the Proof-of-Work (PoW) consensus family. To gain the right to append a block, a node (referred to as a miner in PoW) must solve a form of cryptographic challenge, e.g. calculate a nonce, so that a target hash value resulting from data and nonce is met. This challenge is a computationally expensive brute-force task, which fortunately is trivial to verify once performed. By changing the requirements for the target hash value, the network can self-adjust to a median time between two blocks (blocktime). This also ensures that only occasionally two distinct blocks compete for the same height. When two blocks emerge for the same height, it is called a fork. This non-intentional fork is resolved through the longest chain rule, which states that whichever chain is the
Fig. 1 Simplified blockchain data structure [15]: three linked blocks (Block 5, 6 and 7), each containing Prev_hash, Timestamp, TX Merkle Root and Nonce
Fig. 2 Blockchain with multiple forks
longest (e.g. which block has more successors) must be accepted as the single valid state. This also means that other variants of the state are immediately discarded once a longest chain is determined. Figure 2 shows an exemplary blockchain with two forks, where the blocks 8, 9 and 11 are discarded due to the longest chain rule. Proof-of-Stake algorithms, on the other hand, try to minimize the computational effort of brute-forcing cryptographic challenges by using stake or voting power [7, 9, 10]. Miners can stake their own assets (e.g. coins) to simplify the cryptographic challenge for themselves. This allows them to use less computing power to solve the challenge. Depending on the technology, additional restrictions are imposed on the assets, like the coin age [3, 10], to prevent whales (miners with large amounts of stake) from dominating the network and potentially impacting the system in negative ways.
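The brute-force search described above can be sketched as follows; this is an illustration with a simplified leading-zero target rather than Bitcoin's full difficulty encoding:

import hashlib

def mine(prev_hash, tx_merkle_root, difficulty=4):
    # Try nonces until the block hash starts with `difficulty` hex zeros.
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(
            f"{prev_hash}{tx_merkle_root}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest      # expensive to find, trivial to re-verify
        nonce += 1

# Fork resolution by the longest chain rule: keep the chain with most blocks,
# e.g. longest = max(candidate_chains, key=len)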
2.2 BFT-based PoS Consensus Private or consortium blockchains usually have different requirements to block finality and a different level of trust when it comes to participating nodes. This allows them to use consensus algorithms that work differently from the standard PoW or PoS of public blockchains. When participating nodes are mostly known or trustworthy, the blockchain can use Byzantine Fault Tolerant (BFT)-based PoS algorithms, which are usually based on the Crash-tolerant or PBFT algorithms [6, 11]. Since the BFT-PoS algorithms do not require solving a cryptographic challenge, nodes do not have to brute-force nonces. However, nodes are separated into two classes. Validators participate directly in the consensus by voting for and signing valid block proposals, while the other nodes are just "passively" participating, as they do not affect the consensus but can issue transactions into the network. Out of the validators, everyone takes turns becoming the proposer in a deterministic round-robin fashion. The proposer is responsible for proposing a single valid block for the current height, which the other validators must vote upon. For a block to become accepted by everyone, it needs to reach more than 2/3 (commonly written as +2/3) of the votes. Since there is only one proposer per height, there is also only one block proposal per height, which makes every accepted block instantly final, since there cannot be
competing blocks or forks. This makes BFT-PoS blockchains very suitable for high-performance networks with higher throughput rates and requirements for instant transaction and block finality.
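The acceptance condition can be stated compactly. The sketch below is illustrative (with equally weighted validators in the example); it checks whether the signed voting power strictly exceeds two thirds of the total:

def block_accepted(signed_power, total_power):
    # More than 2/3 of the total voting power must have signed the proposal.
    return 3 * signed_power > 2 * total_power

total = 4                                  # four validators with weight 1 each
assert block_accepted(3, total) is True    # 3 of 4 signatures suffice
assert block_accepted(2, total) is False   # 2 of 4 do not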
3 General Data Protection Regulation The General Data Protection Regulation (GDPR) of the European Union sets new standards when it comes to the acquisition, storing and protection of personal data, while also enabling broader sovereignty over one's own data [8]. Adopted in 2016 by the EU and coming into effect in mid-2018, it puts businesses in the position of having to apply new rules and regulations to the data they collect about users of their services [17]. It also requires new approaches from technologies to comply with the laws of the GDPR [1]. When it comes to blockchains, the most "troublesome" are Articles 16 and 17, since they directly oppose the principle of immutability of blockchains:
– Article 16 Right to Rectification—allows users to request rectification from the controller of incomplete, inaccurate or otherwise wrong personal data concerning him or herself.
– Article 17 Right to Erasure—allows users to request that all personal data stored by the controller be deleted in a timely manner.
To circumvent these hindrances, providers usually employ off-chain storage or encryption with key disposal. When storing content off-chain, providers create anchors or pointers in the blockchain that refer to external data stored somewhere else, e.g. central servers under the full control of the provider [12]. When a user requests data to be deleted or altered, the provider can simply change the off-chain content, thus not changing the blockchain but leaving an invalid anchor point, since the data was removed/altered. The other option providers can resort to is encrypting the content and storing it on-chain. When a user then requests deletion of his data, the provider deletes the key (or key-pair) that was used to encrypt the data. The data thus becomes undecryptable and remains as junk in the blockchain [4]. This method has yet to be legally vetted; even though the key has been disposed of, the data could potentially still be decrypted through brute-forcing.
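As a rough, illustrative sketch of the off-chain anchoring workaround described above (not a prescription of the paper or the GDPR), only a digest of the personal record is placed on-chain, while the record itself stays on provider-controlled storage:

import hashlib

def make_anchor(personal_record):
    # Only this digest would go on-chain; the record stays off-chain.
    return hashlib.sha256(personal_record).hexdigest()

off_chain_store = {}                       # stand-in for the provider's server
record = b'{"name": "Alice", "street": "..."}'
anchor = make_anchor(record)
off_chain_store[anchor] = record           # store off-chain, reference on-chain

# GDPR erasure request: delete the record; the on-chain anchor now dangles.
del off_chain_store[anchor]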
4 Dispute Mechanism In previous work [14], we have devised a dispute mechanism to contest previous transactions by allowing a Blockchain with a BFT-PoS consensus algorithm to intentionally create forks. In a virtual power plant, scheduling vectors are used to allocate
Fig. 3 Dispute mechanism (Block 11 is disputed and replaced by Block 11A in a fork; Blocks 12 and 13 are copied and adjusted to Blocks 12' and 13')
energy production among its producers. Creating an optimal scheduling vector is a difficult optimization task, but verifying that a scheduling vector is better than another one is trivial. The goal was to have a blockchain that only holds one scheduling vector (in the form of a transaction) per given time slot to avoid unnecessary clutter in the chain [16]. Due to network latency, excessive computational runtime or malicious intent, it is quite possible that a scheduling vector was deemed to be the best while an actually better vector had not yet been computed or broadcast correctly. In this case, the blockchain contains a block with a non-optimal scheduling vector inside it. Since the goal was to minimize clutter and the blockchain storage footprint, we did not opt for republication of the better vector in a successive block, but instead for contesting the currently included transaction. The dispute message M_dispute = {tx_original, tx_optimal, block_target} must include the optimal transaction, the original transaction and the target block for verification. Once the current proposer has received the dispute message, he will create a new proposal consisting of the replacement and adjustment commands to intentionally fork the chain at the desired target block. As can be seen in Fig. 3, Block 11 is disputed, while Block 12 and Block 13 need to be copied and their hashes adjusted. All validators must vote for this replacement in the normal BFT-PoS consensus rounds, by signing the new Block 11A and re-signing Block 12' and Block 13'. This guarantees that a +2/3 majority agrees to the changes and also commits them, ultimately resulting in the intentional fork and the replacement of the block containing the less optimal transaction.
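A simplified sketch of this copy/adjust step is given below. It is illustrative only: real blocks additionally carry timestamps, Merkle roots and the validators' signatures, which would have to be re-collected for every adjusted block.

import hashlib

def rehash(block):
    return hashlib.sha256(
        f"{block['prev_hash']}|{block['tx']}".encode()).hexdigest()

def intentional_fork(chain, target, better_tx):
    # Replace the transaction at `target`, then copy every later block and
    # adjust its prev_hash/hash so the links are valid again (11A, 12', 13').
    forked = [dict(b) for b in chain]
    forked[target]["tx"] = better_tx
    for h in range(target, len(forked)):
        if h > 0:
            forked[h]["prev_hash"] = forked[h - 1]["hash"]
        forked[h]["hash"] = rehash(forked[h])
    return forked

chain = [{"prev_hash": "0" * 64, "tx": "genesis"}]
chain[0]["hash"] = rehash(chain[0])
for tx in ("tx-a", "suboptimal-vector", "tx-b", "tx-c"):
    block = {"prev_hash": chain[-1]["hash"], "tx": tx}
    block["hash"] = rehash(block)
    chain.append(block)

forked = intentional_fork(chain, target=2, better_tx="optimal-vector")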
5 Rectifying and Deleting Data In order for the blockchain to comply with Article 16 Rectification and Article 17 Erasure, the dispute mechanism needs to be adapted. The process itself can remain the same: A special consensus message is used to create an intentional fork, which replaces previous target blocks and adjusts/recreates succeeding blocks to have a valid chain with valid hashes. The following adjustments have to be made:
5.1 Adapting the Transaction Verification In the virtual power plant scenario, scheduling vectors can be compared mathematically, for instance based on the overall scheduling cost (cost_S1 > cost_S2) or other quality criteria. This cannot easily be done for personal data, e.g. a home address or telephone number. In this case, validator nodes need some other type of proof that a new transaction is "better" than the originally persisted transaction.
5.2 Proof of Ownership Proving that a user owns a dataset is a necessity for any personal data that needs to be rectified or deleted. A service cannot allow just any user to delete any data they like. Validators and the proposer will need some form of verifiable proof that allows the service to determine that a user is eligible for this operation, while ideally not exposing the user's identity or personal information.
5.3 Transaction Format In order for a user to prove that he owns the original data in question, we propose that transactions be accompanied by a proof of ownership in the form of a public key (pub_X). By providing a signature of the data, a user proves that he owns the corresponding private key (priv_X), which is obviously never revealed. Ideally, every single transaction should use a different public/private key-pair in order to prevent traceability; key derivation functions with a main key-pair could be used to generate new keys for every transaction. This transaction format could also easily be extended to contain multiple public keys, which would mean that multiple private keys would need to be provided in order to modify or delete a transaction.
TX_normal := {data, pub_X}    (1)
To modify a transaction, the user needs to generate a signature of the data which can be verified by the previously attached public key (pub_X). Since the original transaction's data is completely replaced with data', we can either keep the original pub_X or replace it with a new public key pub_Y, where Y is consecutively used as a key derivator for a new key-pair. Figure 4 shows which parts of Message (2) are used to verify ownership and which parts are used to update it.
Fig. 4 Modifying a previous transaction with TX_modify (the original data is replaced by data'; the attached signatures are verified against pub_X)
TX_modify := {Index_tx, data', sign_pub_X(data'), sign_pub_X(pub_Y), pub_Y}    (2)
Analogous to the modification transaction, the delete transaction must also include a signature verifiable by the public key attached to the initial transaction. This again allows the validators to verify the ownership of the data. Additionally, a nonce and the signature of the nonce are appended to further verify the deletion of the original data. The original transaction can either be fully removed from the merkle tree or replaced with a null-transaction, depending on implementation details.
TX_delete := {Index_tx, nonce, sign_pub_X(nonce)}    (3)
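A minimal sketch of these transaction formats is given below. It assumes the pyca/cryptography package and Ed25519 signatures purely for illustration; the paper does not prescribe a concrete signature scheme, and the field names are hypothetical.

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

priv_x = Ed25519PrivateKey.generate()
tx_normal = {"data": b"original personal record",            # Eq. (1)
             "pub": priv_x.public_key()}

# Rectification: the owner signs the replacement data and (optionally) a new key.
priv_y = Ed25519PrivateKey.generate()
pub_y_raw = priv_y.public_key().public_bytes(
    encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw)
new_data = b"rectified personal record"
tx_modify = {"index": 0,                                      # Eq. (2)
             "data": new_data,
             "sig_data": priv_x.sign(new_data),
             "sig_new_key": priv_x.sign(pub_y_raw),
             "pub_new": priv_y.public_key()}

# A validator checks ownership with the public key from the original
# transaction; verify() raises InvalidSignature if the proof does not match.
tx_normal["pub"].verify(tx_modify["sig_data"], tx_modify["data"])
tx_normal["pub"].verify(tx_modify["sig_new_key"], pub_y_raw)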
Whenever an original dataset has been modified or deleted, the result is a corrupt merkle tree in the original block. The dispute mechanism therefore implies that the hash values of the merkle tree, the resulting new block hash and every following block hash need to be recalculated. During the consensus process, the new block hashes need to be approved through the normal voting process (signing the hashes) by the validators.
6 Limitations The presented method for rectifying and deleting data based upon the dispute mechanism unfortunately does not come without trade-offs. When dealing with blockchains that have a long history, the mechanism takes considerably longer the further back the data lies in the chain. Any block that succeeds the modified block needs to be re-verified, re-hashed and re-signed by all participating validators. The validators also need to broadcast their newly signed blocks to the others so they can achieve a +2/3 majority and move on with the consensus. Message propagation in BFT-PoS blockchains is of quadratic nature in the worst case or with unoptimized protocols. For every additional validator voting on the
changes, the number of messages rises significantly, which also affects round times. When choosing a set of validators, this limitation has to be kept in mind. On the other hand, the message size can be kept fairly small, as only the hashes need to be adjusted in the blocks (256 bits for SHA-256). Only the hashes need to be broadcast, since the transactions in almost all of the blocks remain untouched and are already persisted on every node. Furthermore, integrity and block finality guarantees are severely impaired. While PoW blockchains do not have a block finality guarantee at all, but rather a feasible computational finality, BFT-PoS blockchains have immediate block finality, as they cannot create forks and thus cannot create alternative application states. When integrating this dispute mechanism, it has to be kept in mind that transactions might not be final. Further restrictions and dependencies on transactions will need to be put in place to deal with rectification and deletion, which we have not covered in this paper.
7 Conclusion In this paper, we presented a method for applying Article 16—Right to Rectification and Article 17—Right to Erasure of the European Union's GDPR in a blockchain setting. By modifying the dispute mechanism from our previous work, we devised a transaction model that allows deletion and modification of previously finalized transactions, while enforcing a proof of ownership without revealing the identity of the user behind the transaction in question. Data owners need to integrate a public key into every transaction as a form of ownership. By signing a special modify or delete transaction, they can prove that they own the data and are entitled to make these modifications. The dispute mechanism integrates into a Byzantine-Fault-Tolerant Proof-of-Stake blockchain consensus and creates intentional forks. Validators have to vote on the changes using the normal consensus protocol to replace a previous block with another one. All hashes in succeeding blocks need to be recalculated and the blocks have to be re-signed by the validators (voting). Rebroadcasting and restarting the consensus for previously agreed-upon blocks is one of the bigger trade-offs this method brings, as it can take a significant amount of time, depending on the blockchain length, and a significant number of messages, depending on the validator set size. Nevertheless, storing personal data on the blockchain is likely not a good idea. There are better methods, e.g. decentralized cloud storage (IPFS), that are more suitable and do not introduce massive amounts of data into the chain, needlessly increasing its size. However, if data must be stored "on-chain" in a GDPR-compliant way and simple encryption of user data is not enough, the presented method shows a way to achieve this.
References
1. D.W. Allen, A. Berg, C. Berg, B. Markey-Towler, J. Potts, Some economic consequences of the GDPR. Available at SSRN 3160404 (2019)
2. A.M. Antonopoulos, Mastering Bitcoin: Unlocking Digital Cryptocurrencies (O'Reilly Media, Inc., 2014)
3. I. Bentov, C. Lee, A. Mizrahi, M. Rosenfeld, Proof of activity: extending Bitcoin's proof of work via proof of stake. IACR Cryptol. ePrint Archive 2014, 452 (2014)
4. M. Berberich, M. Steiner, Blockchain technology and the GDPR—how to reconcile privacy and distributed ledgers. Eur. Data Prot. L. Rev. 2, 422 (2016)
5. V. Buterin et al., A next-generation smart contract and decentralized application platform. White paper (2014)
6. M. Castro, B. Liskov et al., Practical Byzantine fault tolerance. OSDI 99, 173–186 (1999)
7. B.M. David, P. Gazi, A. Kiayias, A. Russell, Ouroboros Praos: an adaptively-secure, semi-synchronous proof-of-stake protocol. IACR Cryptol. ePrint Archive 2017, 573 (2017)
8. European Union: Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC. Offic. J. Europ. Union (OJ) 59 (2016). http://data.europa.eu/eli/reg/2016/679/oj. Accessed on 03 Sept 2019
9. A. Jain, S. Arora, Y. Shukla, T. Patil, S. Sawant-Patil, Proof of stake with Casper the friendly finality gadget protocol for fair validation consensus in Ethereum. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. (2018)
10. S. King, S. Nadal, PPCoin: peer-to-peer crypto-currency with proof-of-stake. Self-published paper (2012)
11. B. Lampson, H.E. Sturgis, Crash recovery in a distributed data storage system (Jan 1979). https://www.microsoft.com/en-us/research/publication/crash-recovery-in-a-distributed-data-storage-system/. Accessed on 03 Sept 2019
12. C. Lima, Blockchain-GDPR privacy by design: how decentralized blockchain internet will comply with GDPR data privacy (2008). https://blockchain.ieee.org/images/files/pdf/blockchain-gdpr-privacy-by-design.pdf. Accessed on 03 Sept 2019
13. S. Nakamoto, Bitcoin: a peer-to-peer electronic cash system (2008)
14. W. Posdorfer, J. Kalinowski, Contesting the truth—intentional forking in BFT-PoS blockchains, in International Conference on Practical Applications of Agents and Multi-Agent Systems (Springer, 2019), pp. 112–120
15. W. Posdorfer, J. Kalinowski, H. Bornholdt, W. Lamersdorf, Decentralized billing and subcontracting of application services for cloud environment providers, in Proceedings of the ESOCC 2018 Workshops (Springer, 2018), pp. 79–89
16. M. Stübs, W. Posdorfer, J. Kalinowski, Business-driven blockchain-mempool model for cooperative optimization in smart grids, in International Conference on Smart Trends for Information Technology and Computer Communications (Springer, 2019)
17. C. Tankard, What the GDPR means for businesses. Netw. Sec. 2016(6), 5–8 (2016)
18. G. Wood, Ethereum: a secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper 151, 1–39 (2018)
Incorum: A Citizen-Centric Sensor Data Marketplace for Urban Participation Heiko Bornholdt, Dirk Bade, and Wolf Posdorfer
Abstract Modern cities and their citizens rely on a vast availability of urban data to offer high living standards. In the past, various systems have been developed to address this problem by providing platforms for collecting, sharing and processing urban data. However, existing platforms are insufficient, as they do not put the needs of citizens first and do not offer solutions for connecting proprietary sensors as they are commonly found in households. We propose a Citizen-Centric Marketplace, in which citizens are first-class entities. With our Incorum approach, citizens will be able to easily collect, share and process data from their existing sensors while maintaining data sovereignty and privacy. Additionally, it is easily extensible with further services and applications which can be traded on the marketplace as well. This distributed marketplace offers incentives for active participation and improves the lives of citizens in a smart city. Keywords Smart city · Decentralized marketplace · Crowdsensing · Participatory sensing · Data space · Data sovereignty · Urban participation
1 Introduction Cities nowadays face a couple of challenges, mostly due to the advent of several megatrends occurring in parallel: The scarcity of natural resources, demographic change, climate change, rising demand for safety, security and privacy, as well as ongoing urbanization [5]. Such megatrends create needs, opportunities, and finally H. Bornholdt (B) · D. Bade · W. Posdorfer Department of Informatics, University of Hamburg, Vogt-Kölln-Straße 30, 22527 Hamburg, Germany e-mail: [email protected] D. Bade e-mail: [email protected] W. Posdorfer e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_59
lead to keen competition for shared resources and infrastructure among cities as well as within different sectors and among stakeholders within a city. Tackling at least some of these challenges requires a comprehensive understanding of the environment and a city's context, which in turn allows increasing efficiency, productivity, welfare, resilience and so forth. By now, urban areas at least are densely packed with all kinds of sensors. However, in contrast to the vision of a nervous system or the blood circuit, these sensors are not yet interconnected and the data is used only locally, if at all. Several data spaces (or marketplaces) have evolved over the last years to address this issue, but none of these aims at interconnecting everything. They are mostly operated by governments, business organizations or other central authorities, and they target companies and expert users instead of the average citizen. However, a city is made up of citizens and therefore we claim that citizen participation is the key to success. Citizens must participate in data spaces/marketplaces, which in turn must be tailored to their needs and preferences. They must be open for everyone, and sharing, finding and using data must be as easy as surfing the World Wide Web. For several reasons, sharing sensor data often has a negative connotation, mostly due to privacy risks [9]. In order to motivate citizens to share their data, they need to be incentivized and privacy issues need to be overcome. We envision an entirely decentralized marketplace approach. By allowing only the rightful owner of the sensor data to transfer and restrict the usage rights of the data, we can guarantee data sovereignty. Users can choose individually, without any imposed restrictions, which kinds of sensors (e.g. smart devices, weather stations) they want to integrate into the marketplace. The envisioned marketplace also allows sharing services and applications in order to provide full support for any sensor-related use case. For example, with custom service extensions we may give citizens the ability to restrict their devices to connect only to local storage and to disable all transmission to vendor clouds. Other services may transform, encrypt, classify, enrich or augment the data. For user interaction, we envision a sharing of applications (e.g. visualizations, citizen science apps, social sensor networks) similar to existing mobile app stores. In summary, the main contributions of this paper are: (i) we analyze requirements from a citizen's perspective to form a Citizen-Centric Marketplace (CCM) and (ii) present our vision of Incorum (from Latin incolae: people; forum: marketplace), which enables secure collecting, sharing and processing of sensor data, services and applications between citizens as well as the connection of existing proprietary sensors in order to establish a CCM. The paper is structured as follows: Sect. 2 defines the requirements for a CCM and Sect. 3 evaluates existing approaches with respect to these requirements. Section 4 presents our approach for a CCM: Incorum. Finally, Sect. 5 reflects on our approach, summarizes the paper and highlights prospects for future work.
2 Requirements This section presents requirements for a CCM from a citizen's perspective. For this purpose, we analyzed existing literature (notably [2, 7]), worked out application scenarios and use cases, inferred requirements and finally mapped these onto our vision of a marketplace in which citizens and sensor data are first-order entities. The result is a custom-tailored set of functional and non-functional requirements that are detailed in the following and later on revisited in Sect. 4. From a bird's eye view, the marketplace shall offer basic means for citizens (i.e. users) to integrate their sensor devices, to create offerings, to browse and search for offerings and to retrieve corresponding artefacts. Artefacts may be data (e.g. temperature time series), services (e.g. filters), applications (e.g. visualizations) or other resources (e.g. processing rules as mentioned in Sect. 4). Moreover, for each offering, a user shall define an exchange value and access restrictions as well as usage policies. Data sovereignty is one of the most essential cornerstones of the envisaged marketplace: users shall have as much control over their data as technically feasible, even if the data leaves the user's sphere. Additionally, everything above a certain threshold of originality, like services or applications, shall be either traded or disseminated for free to create and foster a community. Finally, the marketplace, i.e. the cognitive model, the metaphors, the interactions, and the frontends, shall be designed in a way that citizens with average IT skills can easily engage in trading. To fulfil such high-level functionality, we have more detailed requirements on a lower level. In order to enforce sovereignty, security and privacy, we require means for authentication and authorization, secure data transfer, as well as a completely decentralized system architecture without single points of failure and without the need for trusted or superior third parties. Well-defined data and interface models for arbitrary preprocessing services that act on data before it is published (e.g. anonymize, aggregate, encrypt), as well as interfaces for applications that locally offer additional value (e.g. visualization or computing services), are necessary to provide anchors for extensions. For easy integration of sensors, automatic discovery mechanisms are required. Integration of proprietary sensors, as they already occur in many households, as well as of external data spaces would allow access to additional sensor data. The marketplace should use techniques that reward data sharing and thus promote the development of a sharing society, whereby every citizen should gain an advantage from participation. Finally, to allow continuous evolution as well as integration/federation with other approaches, the use of open standards is mandatory.
3 Related Work Since more and more cities are dependent on sensor data and the exchange of information is becoming ever more important, several approaches have been developed in the recent past to solve arising challenges. This section presents three different approaches and demarcates these with respect to our requirements.
3.1 Industrial Data Space The Industrial Data Space (InDS) is a virtual data space for the secure exchange of information [7]. The most important user requirements taken into account are data sovereignty, decentralized data management, data economy, value creation, easy linkage of data, trust, a secure data supply chain and data governance. Since the goals of InDS overlap with ours, its requirements proved very helpful for our requirements analysis. The InDS reference architecture, as well as corresponding InDS instances, are intended to facilitate trading with all types of data and all kinds of users. They are tailored neither to the specifics of sensor data nor to the needs of citizens. Consequently, the approach is based on compromises in order to accommodate the requirements of all data and users.
3.2 Machine eXchange Protocol The Machine eXchange Protocol (MXProtocol) presents a concept for operating a Low-Power Wide-Area Network (LPWAN) for the Internet of Things [3]. The main idea is to connect sensors via (privately owned) gateway nodes to the Internet. Such gateways are open to other people and can be used for a small fee. Payments are processed via its own blockchain-based cryptocurrency, the Machine eXchange Coin (MXC). At the time of writing, only MXC and the MXProtocol are fully specified. The platform's global launch is planned for Q4 2019. The release of a data marketplace is planned for Q1 2021. The use of blockchain technology enables a completely decentralized marketplace, which also ensures equality between all participants. By paying for data and providing bandwidth on the gateways, users are motivated to buy gateways and share their data. Finally, the blockchain-based marketplace enables decentralized trading of data between participants. This marketplace is only a rough vision so far, and only a few details are publicly available. Hence, it is not clear whether this marketplace will be open or work with proprietary gateways and sensors only. Moreover, communication is based on LPWAN, but up to now, most consumer-grade sensors use Bluetooth, ZigBee, Wi-Fi or LTE. Furthermore, it is unclear how the communication overhead of the blockchain will be dealt with given the low bandwidth of LPWAN links. Finally, data sovereignty is not considered and there are no plans for a service and application infrastructure.
3.3 Streamr Streamr is an open-source platform for the free and fair exchange of real-time data [10]. It is a ready-to-use solution, which offers a marketplace with various existing data sources. After registration, users can purchase access to paid and free data sources and process the data in an online visual editor. Streamr describes itself as part of a revolution in which monolithic solutions (such as Azure Virtual Machines and Amazon Simple Storage Service) are replaced by decentralized blockchain-based solutions. Driven by the ambitious goal of offering any kind of service in a decentralized way, several requirements of a CCM are covered. Their vision also includes a data marketplace, which is referred to as a global universe of shared data streams to which anyone can contribute and subscribe. The use of a visual interface for selecting data sources and processing data makes it possible for users without technical background knowledge to participate in the marketplace. The approach also includes a decentralized architecture for storage (InterPlanetary File System) and computing power (Decentralized Applications), which can be used as additional services for the marketplace. However, up to now, the marketplace runs on a central architecture, which leads to the emergence of a superior third party. All user data and data streams are centrally processed, which does not allow for privacy. This state should only be temporary, though: in a future milestone, the marketplace shall be completely decentralized. Moreover, no solution is offered so far for connecting existing (proprietary) sensors.
4 Incorum—A Citizen-Centric Marketplace In this section, we introduce Incorum, a marketplace for trading sensor data and related resources between citizens, and our vision of a CCM. Incorum meets all requirements mentioned in Sect. 2 and addresses the open issues of related approaches. The heart of our approach is the so-called Incorum node. Basically, this can be any Internet-enabled device in the LAN or PAN of the citizen that is capable of running the Incorum platform (e.g. a Raspberry Pi or smartphone). Once the node has been put into operation, it will connect to other (remote) nodes to take part in an unstructured Peer-to-peer (P2P) network. Simultaneously, it will start to explore its logical environment to discover local sensors and subsequently start to exchange data on the user's behalf. The following two examples illustrate possible use cases. The first one introduces trading on the marketplace, while the second example exemplifies the benefits of the whole Incorum ecosystem. Example 1 Alice bought a weather station for her balcony to measure temperature, humidity and atmospheric pressure. Her neighbour Bob also has a weather station, but it measures temperature and particulate matter. Using Incorum, Alice and Bob
are able to trade the missing readings and thus save the purchase of expensive additional equipment. Example 2 Alice wears a smartwatch to monitor her vital signs, acknowledging that the collected data will inherently be sent to the smartwatch's vendor in order to be further analyzed. However, Alice is concerned about her privacy. Therefore, she sends all data via a proxy connection to her own Incorum node. Using pre-defined rules, the node can analyze the data packets, extract the information, and finally prevent the data from being forwarded to the vendor. An app running locally on the Incorum node now persists and analyzes Alice's vital signs and generates appealing visual representations. If required, the data and representations can also be shared with friends or a doctor via the marketplace. These two examples are depicted in Fig. 1 and give an overview of the created P2P network and the way the Incorum nodes work in terms of collecting, processing and exchanging sensor data while also blocking unwanted traffic on a citizen's behalf. Leaving the bird's eye perspective, Fig. 2 depicts the internals of the Incorum node, consisting of several components, which will be further detailed in the following subsections.
Fig. 1 Two Incorum nodes, each located on a citizen's LAN, collect and filter data from nearby sensors (e.g. health data, temperature, weather) and exchange them with other nodes on the citizen's behalf, while blocking unwanted traffic to vendor clouds
Fig. 2 The Incorum node consists of various components (Protocol Bindings for HTTP, CoAP and MQTT, Event Bus, Rule Engine, Marketplace Blackboard, Discovery, Service Manager with a sandboxed environment, Persistence, Lifecycle Manager and Web Interface), each responsible for a specific task. Together they act as a citizen's digital representative on the marketplace
4.1 Local Data Collection As soon as the Incorum node has been put into service and connected to the user's local network, the Discovery component starts searching for existing sensors. The different protocols that are used in sensor communication (e.g. HTTP, CoAP, MQTT, …) can be added via so-called Protocol Bindings. Sensors with standardized interfaces can be discovered and read comparatively easily. However, the majority of sensors, e.g. as currently used in smart homes, use proprietary interfaces and protocols and can thus usually only communicate with dedicated (smartphone) apps or the cloud services of the respective vendors. In order to access the sensor data, the Incorum node captures the traffic between the sensor and the app or cloud service. For this purpose, different mechanisms exist, e.g. defining the Incorum node as a proxy or as a wireless access point, tunnelling a VPN through the Incorum node, sniffing on the local network, or using the built-in support of one's own router to record all traffic. Encrypted connections are challenging, because the internals of the communication (e.g. the sensor readings) are obfuscated. We argue that some (low-cost) sensors that are only used in local settings (e.g. in a smart home) will not use encryption due to the higher resource requirements (e.g. battery, CPU, memory). If encryption is used, man-in-the-middle attacks can potentially be performed by installing self-signed certificates on the corresponding device (if not hindered by certificate pinning). Having access to all traffic, the node's Event Bus, depicted in Fig. 3, takes over. The left-hand side of the figure shows all traffic packets that are scanned by the node using Deep Packet Inspection (DPI) [8] and Complex Event Processing [6]. Since each sensor discloses a unique communication fingerprint, this process is based on rules that specify certain patterns to look for, be it in a packet's header or body or in a certain sequence of packets. Once a pattern matches, a complex event is generated and triggers further processing steps in which the sensor data is extracted. The extracted values can then be stored in the Persistence component to enable subsequent access or further processing of time series. For each sensor and supplier, such rules have to be defined; these rules can also be traded on the marketplace. This means that the rules only have to be created once by experienced users and can then
Fig. 3 Structure of the event bus that scans all traffic using deep packet inspection to extract data from (proprietary) sensors
be used by all others. This approach is similar to the work done by Haubeck et al. [4] in which behavioural patterns of industrial machines (e.g. improvements after certain updates) are exchanged within a P2P network to enable adaptation of behaviour.
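A hypothetical rule for this Event Bus could look as follows. The device name, the JSON field and the assumption of an unencrypted payload are all illustrative; real rules, as described above, would be created by experienced users and traded on the marketplace.

import re

RULES = [
    {"sensor": "balcony-weather-station",           # hypothetical device
     "reading": "temperature",
     "pattern": re.compile(rb'"temp_c":\s*(-?\d+(?:\.\d+)?)')},
]

def extract_readings(payload):
    # Run every rule against a captured packet payload and yield matched values.
    for rule in RULES:
        for match in rule["pattern"].finditer(payload):
            yield rule["sensor"], rule["reading"], float(match.group(1).decode())

packet = b'POST /upload {"device":"ws-01","temp_c": 21.4,"hum": 63}'
print(list(extract_readings(packet)))
# [('balcony-weather-station', 'temperature', 21.4)]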
4.2 Data Publishing All locally discovered sensors are called Internal Data Sources (IDSs). The user has to decide whether these shall be accessible over the marketplace or are to be used solely by local Incorum services and applications. For this purpose, newly discovered sensor nodes are presented to the user via the Incorum node's Web Interface. Here, the user can decide which data sources to publish by creating External Data Sources (EDSs) out of one or more IDSs. EDSs are advertised on the marketplace and other users can subscribe to these sources either free of charge or in exchange for money or other artefacts (e.g. data, rules, services, applications or other resources). As indicated in Fig. 4, the mapping of IDSs and EDSs is not 1 : 1. In fact, it is an n : m relationship: an IDS may be mapped to one or even multiple EDSs (e.g. in different qualities for different prices) and EDSs may combine data from multiple IDSs. Moreover, arbitrary tasks may be defined that filter, clean, transform, transcode, augment, aggregate or anonymize the data. By defining simple workflows in a web-based visual editor, the user is even able to build arbitrary processing chains that take the IDSs' data as input and deliver the processed data as output to the EDSs (a simple sketch of such a chain is given after Fig. 4). The tasks themselves are executed either in a local sandbox, where the Service Manager keeps track of the lifecycle management, or remotely as offered by third parties on the marketplace. This way, privacy can also be enforced: users just need to add respective processing tasks to the mapping of IDSs and EDSs. For example, a user might specify that the data will only be provided in aggregated form, e.g. an average daily temperature, or the start and end points of daily motion trajectories may be cut off to hide departure and destination.
Fig. 4 Mapping of internal to external data sources using data processing workflows (e.g. temperature and humidity combined and aggregated into a daily weather source; heart rate aggregated and anonymized)
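The following rough sketch illustrates such an IDS-to-EDS processing chain; the function and field names are hypothetical and not part of Incorum's actual interfaces.

from statistics import mean

def combine(sources):
    # Merge several IDS reading series into one record.
    return dict(sources)

def aggregate(record):
    # Reduce raw series to daily averages before publishing.
    return {name: round(mean(values), 1) for name, values in record.items()}

def anonymize(record):
    # Drop fields that could identify the household (hypothetical field name).
    record.pop("device_id", None)
    return record

internal_sources = {                       # IDSs: raw values collected locally
    "temperature": [20.4, 21.9, 23.1, 22.0],
    "humidity": [61, 63, 58, 60],
}

# EDS "daily weather": the output actually advertised on the marketplace.
eds = anonymize(aggregate(combine(internal_sources)))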
In order for data to be advertised on the market, the user must provide some meta-data:
• Type of data: describes the type of data involved (e.g. temperature or particulate matter) and, if applicable, for what period and area.
• Quality: indicates the quality of the data or the source, i.e. information about the sensor used (e.g. private or official measuring station), the accuracy of the measured values (e.g. concrete or diffuse values), the measurement interval, and so on.
• Terms and conditions of use: the owner decides whether the data is available to all or only to a certain subset of the users. In addition, the owner can determine what the data may be used for (e.g. exclude commercial use and thus make the data accessible only to private users).
• Price: every citizen who wants to retrieve data may have to pay a specific price as specified by the EDS's owner. The same data may also be offered in different qualities for different prices. Citizens who only want to know, for example, whether it is currently raining at a particular location, may have to pay less than citizens who want to know the exact rain rate or a history thereof.
Up to now, we rely on natural language to describe and find data, rules, services and applications of interest, but some kind of semantic annotation is envisaged as a prospect for future work.
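A hypothetical descriptor covering these four meta-data fields might look as follows; the concrete schema is not specified by the authors.

from dataclasses import dataclass, field

@dataclass
class OfferingMetadata:
    data_type: str                 # e.g. "temperature", incl. period and area
    quality: str                   # sensor class, accuracy, measurement interval
    terms_of_use: list = field(default_factory=list)   # e.g. ["non-commercial"]
    price_per_query: float = 0.0   # 0.0 means free of charge

offer = OfferingMetadata(
    data_type="temperature, hourly, balcony weather station",
    quality="private station, +/-0.5 degC, 10 min interval",
    terms_of_use=["private use only"],
    price_per_query=0.01)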
4.3 Data, Service and Application Sharing In order to search for and exchange data, users define corresponding Data Requests using the Incorum node's web-based visual editor; a request contains information about the desired data as well as an initial bidding price the user is willing to pay. The node then creates a Data Subscription on the Marketplace Blackboard. Subsequently, an automatic reverse auction is initiated by the nodes: each data provider holding the data of interest places a bid with the minimum price for which it is willing to fulfil the subscription. The requesting node then automatically selects the best offer. A distributed ledger is used to keep track of and ensure consensus over each participant's account balance. In addition to sensor data, services and applications can also be offered on the marketplace's blackboard to create added value for the user. Applications refer to software that is directly usable by users (e.g. for visualizing sensor data, anomaly detection, alerting), whereas services are meant to be integrated into other services or applications (e.g. a storage service for the offloading of sensor data). Services or applications can either be distributed as binaries to be run on the requester's node or can be offered as runtime services, which run only on the provider's node and can thus only be invoked remotely. Although not yet implemented, publishing and searching for services and applications follows a similar scheme as for data.
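A minimal, illustrative sketch of the winner selection in such a reverse auction is shown below; the data structures are assumptions, not Incorum's actual API.

def select_provider(bids, max_price):
    # Return (provider, price) of the cheapest acceptable bid, or None.
    acceptable = {p: b for p, b in bids.items() if b <= max_price}
    if not acceptable:
        return None
    provider = min(acceptable, key=acceptable.get)
    return provider, acceptable[provider]

print(select_provider({"nodeA": 0.05, "nodeB": 0.02, "nodeC": 0.08},
                      max_price=0.06))
# ('nodeB', 0.02)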
5 Conclusion As cities continue to grow and megatrends unfold, sensor data is becoming more and more important. It is therefore essential that citizens are motivated to share data and that the barriers are as low as possible. Existing approaches require sensors with special interfaces, which are in most cases not present in households. Existing solutions store data centrally, which is problematic because citizens are increasingly concerned about privacy, so decentralized solutions that support data sovereignty will be favored in the future. This paper presents Incorum as a vision for realizing a Citizen-Centric Marketplace that meets the requirements set out in Sect. 2. Incorum allows citizens to easily collect, share and process sensor data. The citizens' privacy and data sovereignty are backed by defining access restrictions as well as usage policies and by Incorum's completely decentralized architecture, in which data only leaves the personal sphere on a citizen's behalf. Moreover, straightforward integration of open and proprietary sensors is supported by the Incorum nodes, and their visual web interface makes it easy for citizens with average IT skills to participate in the marketplace. Since the node can be operated on a Raspberry Pi or smartphone, no expensive or special-purpose hardware is necessary. Citizens are motivated to participate in the marketplace by gaining access to new data and by receiving a return for the data they offer. The presented system not only enables the collection of data from many different sources, but also raises awareness among citizens about the data sent by their sensors (possibly without their knowledge) to third parties. Additionally, even services and applications can be traded on the marketplace, which allows more functions to be added, e.g. a firewall on the network to filter or block unwanted communication with the vendor clouds. Future work comprises the evaluation of concrete technologies to implement the different components of the marketplace. For example, an information-centric network can be used for routing data in the network, a blockchain for ensuring consensus over various states and accounting, and smart contracts for complying with the terms and conditions of data use [1, 11]. Different mechanisms for sniffing or intercepting the traffic of (proprietary) sensors (e.g. DPI, operation as a proxy or wireless access point) need to be evaluated with respect to ease of use, robustness, effectiveness, and efficiency. Further, user studies with non-technically inclined citizens must be conducted to verify the ease of use and understandability of the user interfaces and of the creation of filter and processing rules. Acknowledgements This work was partially supported by the DFG (German Research Foundation) under the Priority Programme SPP1593: Design for Future—Managed Software Evolution.
References
1. B. Ahlgren, C. Dannewitz, C. Imbrenda, D. Kutscher, B. Ohlman, A survey of information-centric networking. IEEE Comm. Magazine 50(7), 26–36 (2012)
2. R. Brunner, F. Freitag, L. Navarro, Towards the development of a decentralized market information system: requirements and architecture, in 2008 IEEE International Symposium on Parallel and Distributed Processing (IEEE, 2008), pp. 1–7
3. M. Foundation, Machine eXchange Protocol: premium network infrastructure, infinite data stream commissioned (2018)
4. C. Haubeck, H. Bornholdt, W. Lamersdorf, A. Chakraborty, A. Fay, Step-based evolution support among networked production automation systems. AT—Automatisierungstechnik 66(10), 849–858 (2018) (evolution of CPPS)
5. J. Höller, V. Tsiatsis, C. Mulligan, S. Karnouskos, S. Avesand, D. Boyle, From Machine-to-Machine to the Internet of Things: Introduction to a New Age of Intelligence (Elsevier, 2014)
6. D.C. Luckham, B. Frasca, Complex event processing in distributed systems. Computer Systems Laboratory Technical Report CSL-TR-98-754, Stanford University, Stanford, vol. 28, 16 (1998)
7. B. Otto, J. Jürjens, J. Schon, S. Auer, N. Menz, S. Wenzel, J. Cirullies, Industrial Data Space White Paper (Fraunhofer-Gesellschaft, Munich, Germany, 2016)
8. D. Serpanos, T. Wolf, Architecture of Network Systems (Elsevier, Amsterdam, 2011)
9. S. Spiekermann, A. Acquisti, R. Böhme, K.L. Hui, The challenges of personal data markets and privacy. Electron. Markets 25(2), 161–167 (2015)
10. Streamr, Unstoppable data for unstoppable apps: Datacoin by Streamr (2017)
11. G. Wood et al., Ethereum: a secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper 151, 1–32 (2014)
Developing an Instrument for Cloud-Based E-Learning Adoption: Higher Education Institutions Perspective Qasim AlAjmi, Ruzaini Abdullah Arshah, Adzhar Kamaludin, and Mohammed A. Al-Sharafi Abstract Cloud-based E-learning represents a new paradigm shift in information and communication technology. Although the field is still in its infancy, various educational institutions around the world have adopted cloud-based E-learning at all levels of education. Cloud-based E-learning has shown significant progress and success across all fields and subjects of education, including language teaching, mathematics, science, engineering, and technology. Concisely, this paper presents the development of an instrument that can evaluate the adoption of cloud-based E-learning based on two vital aspects: a systematic review approach and interaction with the points of view of stakeholders and experts. Practitioners in the field of education may find this instrument appealing because of its effectiveness of use. The purpose of this article is to develop a multi-dimensional instrument that can be applied to measure the adoption factors for cloud-based E-learning as perceived by HEIs, and to come up with a set of validated items. A systematic research design was implemented in multiple stages: six experts evaluated the instrument, which was piloted on 40 out of 100 respondents. Data collection was triangulated through in-person interviews, telephone interviews, and questionnaires. Finally, the instrument was empirically validated and checked for response bias, resulting in a valid instrument. Q. AlAjmi (B) College of Arts & Humanities, A' Sharqiyah University, Ibra, Oman e-mail: [email protected] Q. AlAjmi · R. A. Arshah · A. Kamaludin · M. A. Al-Sharafi Faculty of Computing, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Lebuhraya Tun Razak, 26300 Gambang, Pahang, Malaysia e-mail: [email protected] A. Kamaludin e-mail: [email protected] M. A. Al-Sharafi e-mail: [email protected] R. A. Arshah · A. Kamaludin · M. A. Al-Sharafi Faculty of Educational Technology, A‘Sharqia University, Ibra, Oman © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_60
Keywords Cloud-based E-learning · Cloud computing · E-learning · Instrument development
1 Introduction The rise of cloud computing since 2007 has shifted the paradigm in the technology sector, and cloud computing services are now being applied at all levels of education [1]. Cloud computing is considered a dynamic platform that can be used for cyber-infrastructure and that makes it easy to access services and applications. In essence, cloud computing is now well positioned to grow as a technology-based innovation that can handle the enormous amounts of information customarily sent to and stored on electronic devices. Various factors affect the adoption of cloud computing, as evidenced in various previous studies [2]. However, most of these studies were carried out in business organizations, and thus the respondents were IT professionals. The use of online applications for learning among learners has increased tremendously [3].
2 Literature Review The most vital factor affecting the adoption of cloud-based E-learning is data security [4]. Learners do not feel their data is safe online because of various frauds. Nevertheless, most of them are reluctant to adopt it because of factors other than security concerns [5], as the security issue has largely been overcome by cloud computing providers. In addition, when we were developing the instrument, most experts/respondents asserted that they were looking for services that could ensure file backup and storage rather than being driven by security concerns [6]. Furthermore, technical competence may be a constraint, and the cost of adopting CBEL may be exorbitant for learners [7]. During the development of the instrument, most of the respondents reported that their institutions had adopted a private cloud as a security measure. Our instrument was also centered, to some extent, on SaaS, because most of the respondents had adopted that model. There is another difficulty associated with adoption: the compatibility of cloud computing authorization with the various policies set by developers might hinder the adoption of cloud-based E-learning [8]. Multiple uncertainties also accompany the idea of cloud computing, such as standardization [9]. Government policies, the legal environment and aspects of competition are also hindrances to adopting cloud-based E-learning. As much as we are trying to come up with a viable instrument for the adoption of cloud-based E-learning, it is evident that numerous factors may inhibit the adoption process. Cost is a crucial factor for most students, especially in developing countries where technology has not yet hit the ground running [10]. Furthermore, some countries do
not allow online-based research work, which makes the adoption of cloud-based E-learning harder. Several authors have conducted surveys to understand the interest of learners and decision makers in adopting cloud-based E-learning in their institutions [11]. These surveys asked the targeted population not only about the fit and viability of CBEL factors but also about the cultural elements surrounding the implementation of CBEL. The instrument involved various parts: the first part covered the demographic information of the respondents. The second part was about cloud-based E-learning and included the technological factors, the viability factors, and the information culture factors. There were also parts on the adoption of cloud-based E-learning and on academic service quality. Most institutions were nevertheless thought to improve their effectiveness through cloud computing [12]. Therefore, the instrument could be used for various purposes. Cloud computing was also found to fit the requirements for offering academic services. The conclusion was that the instrument was reliable and led to a structure with two factors: comfort with using cloud-based E-learning and E-learning self-management. However, more work was required to establish predictive validity. Ingersoll and Smith [13] further developed the research and came up with a detailed explanation of the constructs relying on the existing literature. Smith also affirmed the reliability and validity of the instrument created by McVay. The authors asserted that the tool could be relied upon for measuring the self-management of learners and their comfort in adopting cloud-based E-learning [14]. Besides, many experts/respondents were of the view that the employees of the institution were skilled and hence the adoption of the instrument would be easier. After going through the disadvantages and advantages of the cloud-based E-learning readiness questionnaires, [15] increased the number of items from 12 to 38. From the factor analysis, it was evident that the questions focused on the following constructs: beliefs regarding cloud-based E-learning, confidence in prior skills, and readiness to interact. The survey questioned the respondents, in general, on the efficiency of the platform in providing appropriate learning skills and on the learners' beliefs in their ability to succeed with cloud-based E-learning.
3 Methodology
The process followed a systematic procedure selected for refining and understanding the instrument required for cloud-based E-learning adoption among HEIs. Furthermore, various articles were incorporated to understand the factors for CBEL adoption [3]. The following dimensions were extracted from the literature: (1) technological factors, referring to the technology aspect; (2) viability factors, referring to its importance to the HEIs; and (3) information culture factors, referring to information communication and utilization. It should be noted that
these dimensions and their items have been tested and validated in many contexts and have received further comments. Our instrument was therefore aimed at improving the ones that had been developed before. We set out to expand the existing surveys and come up with a more rigorous instrument through comprehensive systematic validation. Since learning through online platforms has become common, it is vital that the approach follow a tight process to ensure the validity and reliability of the instruments. Validity can be achieved if the instrument goes through a rigorous process covering content validity and criterion-referenced validity. However, it is evident that various instruments that claim to measure the adoption of cloud-based E-learning do not undergo rigorous procedures. The general steps followed for developing this instrument were: (1) defining constructs, (2) identifying dimensions, (3) grouping items for each dimension, (4) collecting data, (5) piloting, and finally (6) evaluating the reliability and validity of the instrument. During instrument measurement, the first-stage result was used as the starting point. The 233 items were customized for the adoption of cloud-based E-learning. The authors then pretested the instrument; the experts/respondents were selected carefully, and data were collected using the SurveyMonkey website. The pretesting step consisted of three phases: screening of the questionnaire by fellow researchers/experts, self-administration by university students, and qualitative consultation with experts in the field. The instrument was piloted with 40 respondents from various HEIs in Oman at various levels of rank. The main questionnaire was administered online via the survey website with follow-up by the authors. The items were measured using a five-point Likert scale from "Strongly Disagree" (1) to "Strongly Agree" (5). The study incorporated a two-phase approach to validate the survey measuring the adoption of cloud-based E-learning. The first phase tackled the demographic information, while the second addressed translation validity and criterion-referenced validity. In the first part, a team of experts surveyed constructs such as gender, age, academic level, and employment position. These questions were significant because they could help unravel the validity of the instrument with regard to the preferences of learners with different socio-economic backgrounds. The second part involved the analysis of the specific items, and statistical analysis regarding the reliability and validity of the survey instrument was conducted. Each part provided findings that were significant for the systematic refinement of the instrument. Our primary goal involved two aspects: to contribute toward developing a rigorous instrument for cloud-based E-learning adoption and to share our method for the future development of the survey.
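The paper does not reproduce the pilot data, but as an illustration of step (6), a common reliability check such as Cronbach's alpha can be computed over the Likert responses of the pilot respondents. The sketch below is a minimal, hypothetical Python (NumPy) example: the matrix shape and the randomly generated scores are assumptions for illustration, not the study's actual data or procedure.

```python
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of Likert scores."""
    k = responses.shape[1]                          # number of items in the subscale
    item_vars = responses.var(axis=0, ddof=1)       # variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical pilot data: 40 respondents answering a 6-item subscale on a 1-5 scale
rng = np.random.default_rng(0)
pilot = rng.integers(1, 6, size=(40, 6))
print(f"Cronbach's alpha = {cronbach_alpha(pilot):.3f}")
```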
4 Instrument Development
The first phase included the development of the survey, which relied heavily on the literature [16–18]. The approach executed in this phase followed the conventional method for ensuring face and content validity, and
the experts were required to review the clarity and completeness of the items to ensure that everything was properly placed. However, after developing the first version of the instrument, we intensified the rigor by initiating an analysis of item characteristics. Although we surveyed the capabilities of the technology, this was used sparingly because the questions asked about the necessary skills and frequency of use. The next phase involved the analysis of the items. The process included focus groups and interviews to comprehend the perceptions of the participants and the intended meaning of each particular item through a review known as cognitive testing [19]. Concisely, cognitive testing evaluates the degree to which the questions asked are consistently interpreted and understood by the individuals answering them. The interview questions were aimed at ensuring that the participants comprehended the questions appropriately. Cognitive testing therefore required the respondents to express in their own words the meaning and implication of the questions as asked by the interviewer. Since our concept involves online platforms, we did not carry out face-to-face interviews but adopted online questionnaires.
4.1 Items Extracted
The items were extracted from the previous literature review; through a systematic review, the authors came up with more than 100 factors considered relevant to the context of the study, which is the higher education context. A search in well-known digital databases returned the following results: Science Direct: 173 articles, Scopus: 96 articles, IEEE Xplore: 633 articles, Web of Science: 346 articles, EBSCO: 292 articles, for a total of 1540 articles. The search criteria used were "((((cloud computing) AND adoption factors) AND higher education) OR adoption theory)" for the period 2013–2017, written in English. After removing irrelevant articles, we found 950 articles focused on the educational context; from these, we extracted 147 factors, which we ordered by frequency to obtain the 100 most frequent factors. We then grouped these factors, removed irrelevant and similar factors, and arrived at 11 factors. Table 1 shows the list of initially extracted items with their sources. Finally, we built up the study questionnaire based on these items and factors.
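As a hypothetical illustration of the frequency-based ordering and grouping step described above (the factor names, counts and synonym map below are invented for the example and are not the study's actual extraction), the filtering can be sketched as follows:

```python
from collections import Counter

# Hypothetical list of factor mentions collected from the reviewed articles
mentions = ["relative advantage", "cost reduction", "compatibility",
            "relative advantage", "complexity", "IT readiness",
            "cost reduction", "relative advantage"]

# Count how often each factor appears and keep the most frequent ones
frequency = Counter(mentions)
most_frequent = [factor for factor, _ in frequency.most_common(100)]

# Manual grouping of similar factors would follow; here a toy synonym map
synonyms = {"cost saving": "cost reduction"}
grouped = {synonyms.get(f, f) for f in most_frequent}
print(sorted(grouped))
```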
4.2 Content Validity
Thereafter, we moved to the content validity stage, where we contacted several experts in the field of the study. They were given a content validity letter consisting of an introduction about the researchers and the aim of this research, the objectives of the content validation, and the evaluation ratings: 1 = "The item is not representative," 2 =
Table 1 Factors, items, and sources

Factors | Code | Number of items | Sources
Relative advantage items | RA | 6 | [20–26]
Complexity items | Com | 5 | [27–30]
Compatibility items | comt | 6 | [22, 23, 24, 25, 26, 27, 28, 29, 31, 32]
Fit items | Fit | 5 | [33–36]
Decision-makers items | DM | 6 | [9, 25, 26, 31, 37, 38, 39]
Cost reduction items | CR | 5 | [9, 40, 41, 42]
IT readiness items | IT | 6 | [21, 24, 26, 43, 44, 45, 46, 47]
Information integrity items | Info.int | 5 | [27, 28, 29, 32, 48, 49, 46, 47]
Information formality items | Info.Form | 5 | [9, 37, 38, 48, 49]
Information control items | Info.Cont | 5 | [24, 43, 44, 48, 49]
Information pro-activeness items | Info.Pro | 5 | [20, 21, 22, 48, 49, 45, 47]
"The item needs revision to be representative," and 3 = "The item is representative." Their evaluations of each item are stated in Table 2.
4.3 Pilot Study Procedures
The participants in the study were 100 learners at various stages of learning, comprising 45 males and 55 females. However, out of the total of 100, only 40 completed the item analysis of the questionnaire. We credited the participants for completing the questionnaire as part of their assignment. The participants' responses were taken into consideration for the conclusion and testing of the instrument. The participants first had to fill out the questionnaire sent to them via SurveyMonkey and provide comments and feedback. The respondents started by filling in the demographic information and then proceeded to the last part of the questionnaire. After completing the subscales, the participants were asked to go back through the items they had answered and to respond to three follow-up survey questions.
Table 2 Result from experts' review

Items | Exp 1 | Exp 2 | Exp 3 | Exp 4 | Exp 5 | Exp 6 | Mean | SD
RA 1 | 3 | 2 | 3 | 3 | 3 | 3 | 2.833333 | 0.408248
RA 2 | 3 | 3 | 2 | 3 | 3 | 2 | 2.666667 | 0.516398
RA 3 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
RA 4 | 3 | 2 | 2 | 3 | 3 | 3 | 2.666667 | 0.516398
RA 5 | 3 | 2 | 2 | 3 | 3 | 3 | 2.666667 | 0.516398
RA 6 | 2 | 2 | 3 | 3 | 3 | 3 | 2.666667 | 0.516398
Com1 | 3 | 2 | 2 | 3 | 3 | 2 | 2.5 | 0.547723
Com2 | 3 | 1 | 2 | 3 | 3 | 2 | 2.333333 | 0.816497
Com3 | 3 | 3 | 2 | 3 | 3 | 1 | 2.5 | 0.83666
Com4 | 2 | 3 | 2 | 3 | 3 | 1 | 2.333333 | 0.816497
Com5 | 3 | 2 | 2 | 3 | 3 | 2 | 2.5 | 0.547723
Comt1 | 3 | 2 | 3 | 3 | 3 | 3 | 2.833333 | 0.408248
Comt2 | 3 | 3 | 3 | 2 | 3 | 2 | 2.666667 | 0.516398
Comt3 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
Comt4 | 3 | 2 | 3 | 3 | 3 | 2 | 2.666667 | 0.516398
Comt5 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
Fit1 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
Fit2 | 3 | 2 | 3 | 2 | 3 | 3 | 2.666667 | 0.516398
Fit3 | 3 | 2 | 3 | 3 | 3 | 3 | 2.833333 | 0.408248
Fit4 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
Fit5 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
DM1 | 3 | 1 | 2 | 3 | 3 | 3 | 2.5 | 0.83666
DM2 | 3 | 3 | 2 | 2 | 3 | 3 | 2.666667 | 0.516398
DM3 | 3 | 3 | 2 | 2 | 3 | 3 | 2.666667 | 0.516398
DM4 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
DM5 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
DM6 | 3 | 2 | 3 | 3 | 3 | 3 | 2.833333 | 0.408248
CR1 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 0
CR2 | 3 | 2 | 3 | 3 | 3 | 3 | 2.833333 | 0.408248
CR3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 0
CR4 | 3 | 3 | 2 | 2 | 3 | 3 | 2.666667 | 0.516398
CR5 | 3 | 1 | 2 | 3 | 3 | 3 | 2.5 | 0.83666
IT1 | 2 | 3 | 3 | 3 | 3 | 3 | 2.833333 | 0.408248
IT2 | 3 | 2 | 2 | 2 | 3 | 2 | 2.333333 | 0.516398
IT3 | 3 | 3 | 2 | 2 | 3 | 3 | 2.666667 | 0.516398
IT4 | 2 | 3 | 2 | 2 | 3 | 3 | 2.5 | 0.547723
IT5 | 3 | 2 | 2 | 3 | 3 | 3 | 2.666667 | 0.516398
IT6 | 3 | 1 | 2 | 3 | 3 | 3 | 2.5 | 0.83666
Info.Integ1 | 3 | 2 | 2 | 3 | 3 | 3 | 2.666667 | 0.516398
Info.Integ2 | 3 | 3 | 2 | 3 | 2 | 3 | 2.666667 | 0.516398
Info.Integ3 | 3 | 2 | 2 | 3 | 3 | 3 | 2.666667 | 0.516398
Info.Integ4 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
Info.Integ5 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
InfoForm1 | 3 | 3 | 3 | 2 | 3 | 3 | 2.833333 | 0.408248
InfoForm2 | 3 | 3 | 3 | 3 | 2 | 3 | 2.833333 | 0.408248
InfoForm3 | 3 | 1 | 2 | 3 | 3 | 3 | 2.5 | 0.83666
InfoForm4 | 3 | 2 | 2 | 3 | 2 | 3 | 2.5 | 0.547723
InfoForm5 | 3 | 2 | 2 | 3 | 3 | 3 | 2.666667 | 0.516398
InfoCont.1 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
InfoCont.2 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
InfoCont.3 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
InfoCont.4 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
InfoCont.5 | 3 | 3 | 2 | 3 | 2 | 3 | 2.666667 | 0.516398
Info.Pro.1 | 3 | 3 | 2 | 3 | 3 | 3 | 2.833333 | 0.408248
Info.Pro.2 | 3 | 2 | 2 | 3 | 3 | 3 | 2.666667 | 0.516398
Info.Pro.3 | 3 | 2 | 3 | 3 | 3 | 3 | 2.833333 | 0.408248
Info.Pro.4 | 3 | 1 | 2 | 3 | 3 | 3 | 2.5 | 0.83666
Info.Pro.5 | 3 | 2 | 2 | 3 | 3 | 3 | 2.666667 | 0.516398
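The mean and standard deviation columns in Table 2 are the sample statistics of the six expert ratings per item. As a worked check, they can be reproduced as in the short Python sketch below, using the RA 1 row as an example; this is only an illustration of the calculation, not part of the original analysis.

```python
import statistics

# Expert ratings for item RA 1, taken from Table 2
ra1 = [3, 2, 3, 3, 3, 3]

mean = statistics.mean(ra1)   # 2.833333...
sd = statistics.stdev(ra1)    # sample SD with n - 1 in the denominator: 0.408248...
print(f"mean = {mean:.6f}, SD = {sd:.6f}")
```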
5 Findings
From Tables 1 and 2, we can see that all the items were accepted, with minor revisions for some of them. From analyzing the experts' responses, we found that the experts answered the questions based on their experience with the adoption of cloud-based E-learning. We observed that most respondents agreed that decision makers are interested in adopting CBEL. Regarding the information culture factors, most asserted that information integrity improves the open and effective sharing of information among staff. On adopting cloud-based E-learning, most respondents confirmed that they are comfortable recommending cloud computing in their institutions. The final outcome of the aforementioned phases was the development of a 32-question survey designed to measure the factors influencing CBEL adoption.
6 Conclusion
In conclusion, the instrument development reviewed the factors that affect the adoption of cloud computing in E-learning. These factors were found to be vital in influencing the opinion of decision makers when adopting cloud computing to build systems that can offer different services to various stakeholders. The numerous studies carried out previously tried to ensure that sufficient information is available concerning cloud computing; however, past information was insufficient regarding the adoption of cloud-based E-learning. The present study therefore aims to enrich that literature. The study can be used by decision makers who may need to utilize cloud computing in the public sector and to understand the factors that can influence their intention to adopt cloud computing in E-learning. Most of the respondents strongly agreed that cloud computing is viable for implementing CBEL services in their institutions.
Acknowledgments Funding The authors would like to express their sincere gratitude to Universiti Malaysia Pahang, for supporting this work through Research Grant Scheme RDU180702 (Kajian Impak dan Halatuju Pembelajaran Teradun di Malaysia) and Faculty of Educational Technology, A‘Sharqia University (ASU), Oman. Conflict of Interest The authors declare that they have no conflict of interest. Ethical approval This chapter does not contain any studies with human participants or animals performed by any of the authors. Informed consent Informed consent was obtained from all individual participants included in the study.
References 1. I.E. Allen, J. Seaman, Learning on Demand: Online Education in the United States, 2009 (ERIC, 2010) 2. H. Al-Samarraie, N. Saeed, A systematic review of cloud computing tools for collaborative learning: Opportunities and challenges to the blended-learning environment. Comput. Educ. 124, 77–91 (2018) 3. M. Anshari, Y. Alas, L.S. Guan, Developing online learning resources: Big data, social networks, and cloud computing to support pervasive knowledge. Educ. Inf. Technol. 21, 1663–1677 (2016) 4. T.-S. Hew, S.L.S.A. Kadir, Understanding cloud-based VLE from the SDT and CET perspectives: development and validation of a measurement instrument. Comput. Educ. 101, 132–149 (2016) 5. W.A. Al-Ghaith, L. Sanzogni, K. Sandhu, Factors influencing the adoption and usage of online services in Saudi Arabia. Electron. J. Inf. Syst. Dev. Countries 40 (2010) 6. Q. AlAjmi, R.A. Arshah, A. Kamaludin, A.S. Sadiq, M.A. Al-Sharafi, A conceptual model of E-learning based on cloud computing adoption in higher education institutions 7. M.S. Kerr, K. Rynearson, M.C. Kerr, Student characteristics for online learning success. Internet Higher Educ. 9, 91–105 (2006)
8. A. Alharthia, M.O. Alassafia, A.I. Alzahrania, R.J. Waltersa, G.B. Willsa, Critical success factors for cloud migration in higher education institutions: a conceptual framework. Int. J. Intell. Comput. Res. (IJICR) 8, 817–825 (2017) 9. H. Hassan, M.H.M. Nasir, N. Khairudin, I. Adon, Factors influencing cloud computing adoption in small and medium enterprises. J. ICT 16, 21–41 (2017) 10. M. Menzel, R. Ranjan, L.Z. Wang, S.U. Khan, J.J. Chen, CloudGenius: a hybrid decision support method for automating the migration of web application clusters to public clouds. IEEE Trans. Comput. 64, 1336–1348 (2015) 11. R. Almajalid, A survey on the adoption of cloud computing in education sector. arXiv:1706. 01136 (2017) 12. A. Abbas, K. Bilal, L.M. Zhang, S.U. Khan, A cloud based health insurance plan recommendation system: a user centered approach. Future Gener. Comput. Syst. Int. J. Sci. 43–44, 99–109 (2015) 13. R.M. Ingersoll, T.M. Smith, The wrong solution to the teacher shortage. Educ. Leadership 60, 30–33 (2003) 14. Q. Alajmi, A.S. Sadiq, A. Kamaludin, M.A. Al-Sharafi, Cloud computing delivery and delivery models: opportunity and challenges. Adv. Sci. Lett. 24, 4040–4044 (2018) 15. M. McVay, How to be a Successful Distance Education Student: Learning on the Internet (Prentice Hall, New York, 2001) 16. K.M. Faqih, Which is more important in E-learning adoption, perceived value or perceived usefulness? Examining the moderating influence of perceived compatibility, in 4th Global Summit on Education GSE. Kuala Lumpur. World Conferences (2016) 17. J. Bersin, Predictions for 2014, Bersin by Deloitte (2013) 18. R.S. El Hinnawi, The impact of relationship marketing underpinnings on customer’s loyalty: case study–bank of palestine, MS Degree Thesis, The Islamic University-Gaza, 2011 19. E. Yadegaridehkordi, L. Shuib, M. Nilashi, S. Asadi, Decision to adopt online collaborative learning tools in higher education: a case of top Malaysian universities. Educ. Inf. Technol. 1–24 (2018) 20. J.S. Ibrahim, Adoption of Cloud Computing in Higher Education Institutions in Nigeria (Universiti Utara Malaysia, 2014) 21. F. Makoza, Cloud computing adoption in higher education Institutions of Malawi: an exploratory study. Int. J. Comput. ICT Res. 9 (2015) 22. H.M. Sabi, F.-M.E. Uzoka, K. Langmia, F.N. Njeh, C.K. Tsuma, A cross-country model of contextual factors impacting cloud computing adoption at universities in sub-Saharan Africa. Inf. Syst. Front. 1–24 (2017) 23. A.N. Tashkandi, I.M. Al-Jabri, Cloud computing adoption by higher education institutions in Saudi Arabia: an exploratory study. Cluster Comput. 18, 1527–1537 (2015) 24. T. Oliveira, M. Thomas, M. Espadanal, Assessing the determinants of cloud computing adoption: an analysis of the manufacturing and services sectors. Inf. Manag. 51, 497–510 (2014) 25. C. Low, Y. Chen, M. Wu, Understanding the determinants of cloud computing adoption. Ind. Manag. Data Syst. 111, 1006–1023 (2011) 26. N. Alkhater, G. Wills, R. Walters, “Factors influencing an organisation’s intention to adopt cloud computing in Saudi Arabia, in Cloud Computing Technology and Science (CloudCom), 2014 IEEE 6th International Conference on, 2014, pp. 1040–1044 27. M. Stieninger, D. Nedbal, W. Wetzlinger, G. Wagner, M.A. Erskine, Impacts on the organizational adoption of cloud computing: a reconceptualization of influencing factors. Proc. Technol. 16, 85–93 (2014) 28. K. Charlebois, N. Palmour, B.M. 
Knoppers, The adoption of cloud computing in the field of genomics research: the influence of ethical and legal issues. PLoS ONE 11, e0164347 (2016) 29. L. Morgan, K. Conboy, Key factors impacting cloud computing adoption. Computer 46, 97–99 (2013) 30. R. Hussein, C. Ennew, W. Kortam, The adoption of web-based marketing in the travel and tourism industry: an empirical investigation in Egypt. J. Innov. Manag. Small Medium Enterprises 1 (2012)
31. H. Gangwar, H. Date, R. Ramaswamy, Understanding determinants of cloud computing adoption using an integrated TAM-TOE model. J. Enterprise Inf. Manag. 28, 107–130 (2015) 32. H.M. Sabi, F.-M.E. Uzoka, K. Langmia, F.N. Njeh, Conceptualizing a model for adoption of cloud computing in education. Int. J. Inf. Manag. 36, 183–191 (2016) 33. D.L. Goodhue, R.L. Thompson, Task-technology fit and individual performance. MIS Q. 213– 236 (1995) 34. F. Mohammed, O. Ibrahim, M. Nilashi, E. Alzurqa, Cloud computing adoption model for e-government implementation. Inf. Dev. 33, 303–323 (2017) 35. F. Mohammed, O. Ibrahim, N. Ithnin, Factors influencing cloud computing adoption for egovernment implementation in developing countries: Instrument development. J. Syst. Inf. Technol. 18, 297–327 (2016) 36. T.-P. Liang, C.-P. Wei, Introduction to the special issue: Mobile commerce applications. Int. J. Electron. Commer. 8, 7–17 (2004) 37. H.P. Borgman, B. Bahli, H. Heier, F. Schewski, Cloudrise: exploring cloud computing adoption and governance with the TOE framework, in 2013 46th Hawaii International Conference on System Sciences (HICSS), pp. 4425–4435 (2013) 38. A.Y.-L. Chong, B. Lin, K.-B. Ooi, M. Raman, Factors affecting the adoption level of c-commerce: an empirical study. J. Comput. Inf. Syst. 50, 13–22 (2009) 39. I. Sila, Factors affecting the adoption of B2B e-commerce technologies. Electron. Commer. Res. 13, 199–236 (2013) 40. O. Ali, J. Soar, J. Yong, X. Tao, Factors to be considered in cloud computing adoption. Web Intell. 309–323 (2016) 41. S. Arvanitis, N. Kyriakou, E.N. Loukis, Why do firms adopt cloud computing? A comparative analysis based on South and North Europe firm data. Telematics Inform. (2016) 42. S. Nunes, J. Martins, F. Branco, R. Gonçalves, M. Au-Yong-Oliveira, An initial approach to egovernment acceptance and use: a literature analysis of e-Government acceptance determinants, in 2017 12th Iberian Conference on Information Systems and Technologies (CISTI), pp. 1–7 (2017) 43. M.H. Kayali, N. Safie, M. Mukhtar, Adoption of cloud based E-learning: a systematic literature review of adoption factors and theories. J. Eng. Appl. Sci. 11, 1839–1845 (2016) 44. M. Odeh, A. Garcia-Perez, and K. Warwick, Cloud computing adoption at higher education institutions in developing countries: a qualitative investigation of main enablers and barriers (2016) 45. M.A. Al-Sharafi, A.A. Ruzaini, Q. Alajmi, F. Herzallah, Understanding online banking acceptance by jordanian customers: the effect of trust perceptions. J. Eng. Appl. Sci. 436–438 (2016) 46. M.A. Al-Sharafi, R.A. Arshah, F.A. Herzallah, Q. Alajmi, The effect of perceived ease of use and usefulness on customers intention to use online banking services: the mediating role of perceived trust. Int. J. Innov. Comput. 7 (2017) 47. Q. Alajmi, A. Sadiq, A. Kamaludin, M.A. Al-Sharafi, E-learning models: the effectiveness of the cloud-based E-learning model over the traditional E-learning model, in 2017 8th International Conference on Information Technology (ICIT), pp. 12–16 (2017) 48. C.W. Choo, P. Bergeron, B. Detlor, L. Heaton, Information culture and information use: an exploratory study of three organizations. J. Assoc. Inf. Sci. Technol. 59, 792–804 (2008) 49. C.W. Choo, C. Furness, S. Paquette, H. Van Den Berg, B. Detlor, P. Bergeron et al., Working with information: information management and culture in a professional services organization. J. Inf. Sci. 32, 491–510 (2006) 50. L. 
Chao, Handbook of Research on Cloud-Based STEM Education for Improved Learning Outcomes (IGI Global, 2016)
Gamification Application in Different Business Software Systems—State of Art Zornitsa Yordanova
Abstract The study aims at examining gamification together with its game elements, techniques and mechanisms, and how they may be used in business software implementation and operation in enterprise management. The research goes through the literature to clearly define gamification and to reveal concrete applications of it. The methodology employs a focus group of subject matter experts in business software applications who held a brainstorming session to identify which gamification elements from the literature analysis are appropriate and relevant to the enterprise information systems scoped in the study: ERP (enterprise resource planning), CRM (customer relationship management) and BI (business intelligence). The results first provide gamification elements for the implementation and management of business software systems, as well as an explanation of how the extracted gamification elements can be used specifically for the purposes and objectives of the business software. Keywords Gamification · Business software · Information systems · MIS · User engagement · ERP · CRM · BI
1 Introduction Gamification has recently attracted a great deal of interest from both practitioners and educators. Normally, gamification is used as a tool for better understanding of a particular material or topic and also to illustrate a specific scenario or a case in which demonstration and empathy are required [1]. It has been used for greater commitment to a cause (representing the topic as a game, not an obligation or responsibility). Other case studies show gamification utilization for increasing results and bringing reality closer through role-play (in which situations the user would be in an artificial environment and would not be able to show his potential) [2]. Gamification is used as well to accommodate intergenerational differences in the setting of objectives and Z. Yordanova (B) University of National and World Economy, 8mi dekemvri, Sofia, Bulgaria e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_61
tasks (particularly between the different thinking of millennials, born around and after 2000, and their possible leaders from previous generations). Gamification is applied for many other purposes; its main idea is the use of game elements in a non-game context in order to achieve concrete and better results. Thus, the generalized perceptions of gamification give researchers reason to believe that it would be a particularly useful and applicable tool for increasing learning outcomes and business results as a whole. Bearing in mind the scale of the concept and the wide spectrum and scope of its application, the proper usage and application of gamification is a difficult task, especially when particular goals are to be achieved. Despite the large quantity of research in recent years analysing gamification, the topic is still not well scoped. Moreover, the concept is attracting ever more interest from different industries for achieving different purposes. On the other hand, the large number of discussions, research works and case studies gives reason to regard the topic as a hot one. Currently, much research has examined the application of gamification in different contexts, for different purposes, in different industries. However, a deep understanding is still missing, since gamification has been studied mostly as a tool for learning purposes. Gamification in information systems is surely amongst the most researched areas of its application and possible impact on other industries and on multidisciplinary processes, especially those which involve many people [3]. This research focuses on how gamification may be of use in different management information systems, or business software, for handling different problems that these systems meet while processing and performing business operations. This is achieved by employing a focus group of subject matter experts in MIS implementation and maintenance who formulate game elements that are already in use for these purposes, as well as good practices in embedding gamification into MIS projects and products.
2 Theoretical Background
2.1 Gamification
Play as a phenomenon is older than culture, the economy and the entire socio-cultural and socio-economic system as we know them today; it predates human society and human civilization [4]. The author concluded that it is more than just a physiological phenomenon or a psychological reflex. It exceeds exclusively physical or purely biological activity. This is a significant feature: it has some meaning and does not serve only on its own, isolated from side factors and purposes; it is a function of the human being. The importance of games as a phenomenon is also confirmed by pedagogical research, which identifies games as an essential and critical element of the maturation process. After this brief preface, which puts games at the centre of
human development since its inception, the concept of gamification as such is brand new, yet significant and promising in its development. Amongst the first to define gamification as a modern concept is Pelling [5, 6], who saw it back in 2002 as a process that makes the interface of different products, in his case electronic transactions, more fun, faster and more playful. The gamification process as defined by Deterding et al. [7] is the use of game elements in non-game contexts. This definition is dealt with in more depth in another study by the authors [8], where they explain that it is a matter of game elements, not a game in general. While games are usually played, play itself is a different and broader category than the game itself. Games, on the other hand, are characterized by rules and competition, or the struggle for concrete and persistent results or goals on the part of those involved. The authors make a distinction between the terms "serious games" and "gaming". While serious games describe the use of incomplete games for non-entertainment purposes, the use of games and of gaming elements is a way of diversifying existing approaches for better performance. A link between the concept of gaming and serious games, however, is that both concepts use games for purposes other than their normal use for entertainment. In addition, gamification has also been defined, by Deterding himself, as a process of using gaming mechanisms and game thinking to solve problems. In another study, [8] claim that gamification as a term derives from the digital media industry. Lee and Hammer [9] believe that gamification is the use of gaming mechanisms, dynamics and frameworks to promote desired behaviour. Kapp [10] defines gamification as the use of game-based mechanics, aesthetics and playful thoughts to make people loyal, to motivate action, to encourage learning, and to solve problems. The key point of gamification is the inclusion of gaming tasks that players have to perform [11]. McGonigal [12] summarizes in a study that, since the beginning of the twenty-first century, a lot of research interest has focused on games as a phenomenon through which an element of joy and excitement can be conveyed into serious work situations and their solution. Spakova et al. [13] define gamification as "the process of doing activities in non-game contexts such as games". Another definition in the literature interprets gamification as an informal term for the use of video game elements in non-gaming systems to improve user experience and user engagement [14]. Huotari and Hamari [15] divide gamification into three parts: (1) implementing elements of the game in non-gaming activities, (2) making psychological changes and (3) visible changes in user behaviour. As a summary of the analysed definitions, it can be concluded that gamification is a concept for using game elements [7, 16, 17] in a different, non-game context [7, 17, 18] for the purpose of increasing consumer engagement [14, 16, 18] or handling educational challenges [19]. Again for the purpose of systemizing and summarizing, Jakubowski [20] concluded that he considered the following two definitions to be the most focused: (1) gamification is the use of game elements in non-game contexts [7]; (2) gamification is the process of using game thinking and game mechanics for consumer engagement and problem-solving [16]. Table 1 summarizes the definitions in the scientific literature [21].
For the purposes of this research, after the literature analysis performed on the concept of gamification, the author uses the following definition:
Table 1 Definitions of gamification in the scientific literature

Authors | Definition
Pelling [5], p. 1 | "A process that makes the interface of different products, more fun, faster and more playful"
Deterding et al. [7] | "Using game elements in non-game contexts"
Deterding et al. [7] | "The process of using gaming mechanisms and game thinking to solve problems by yourself"
Deterding et al. [7] | "Term for using video game elements in non-gaming systems to improve user experience and user engagement"
Lee and Hammer [9] | "Using gaming mechanisms, dynamics and frameworks to promote desired behaviour"
Kapp [10] | "Use of game-based mechanics, aesthetics and playful thoughts to make people faithful, to motivate action, to promote learning and to solve problems"
Spakova et al. [16] | "The inclusion of gaming tasks that players must perform"
Huotari and Hamari [14] | "Implementing elements of the game in non-games activities, making psychological changes and visible changes in user behaviour"
Zicherman and Cunningham [16] | "A process of using thinking and mechanics to engage users"
Burke [18] | "Using game mechanics and game design techniques in non-game contexts of design behaviour, skills development, or engaging people in innovation"
Werbach and Hunter [17] | "Using gaming elements and game design techniques in non-gaming contexts"
Huotari and Hamari [14] | "A process of improving the service with the ability to play games to maintain the overall value creation of the user"
Werbach [35] | "Process of turning activities into more playful situations"
Gamification is the use of game elements, techniques and mechanisms in a non-game context to achieve specific goals.
The analysis of all the definitions and understandings of gamification in the literature contributes by (1) unifying all the mentioned ingredients of a game within the concept, i.e., elements, techniques and mechanisms, and (2) clarifying that the gamification concept aims at delivering results on particular topics and already set goals.
2.2 Business Software Systems Technology and business information systems/software, in particular, are an essential component of everyday life and work of employees and overall for organizations. Information systems are becoming increasingly integrated in the business processes of enterprises, making their divisibility and their definition as separate objects of
the organization system increasingly difficult, but also increasingly important for the overall process of corporate governance and an element of the enterprise's economy [22]. Until recently, information systems were the primary responsibility of information technology (IT) departments and IT specialists. Today, information technology is more and more becoming a major tool of management, of business professionals, and of units that are directly responsible for organizational goals, not just technical provision. In this sense, IT knowledge is also needed to a great extent by economists in their roles as financiers, marketers, accountants, logistics specialists, managers, manufacturers, sales and human resources staff, traders, entrepreneurs, etc. Businesses are highly dependent on IT infrastructure and are increasingly demanding in terms of maintaining and processing vast amounts of user data, dynamic resource sharing, rapid adaptation to changing business requirements, ease of use and automation of services and processes, measurability of technology costs, operational provision of different information services, etc. [23]. Kisimov [24] examines information systems in terms of their management flexibility, especially for the purposes of business management. Under business management, Kisimov [24] understands the management of any system driven by business considerations using the following business environment components as leadership tools: business strategy, business policies, business processes, business-critical performance indicators and business-critical success factors. Business management is thus a form of governance in which business components determine how the management process, in this case the management information system, is implemented. The information systems in organizations that address the management of various business processes and overall management activities appear under different terms in the literature [25]: management information systems, business management information systems, business information systems [26], corporate information systems [23, 27], company information systems and management information systems. Despite the differences, the function of all of them is to support and provide the information processes in the organization and their management. For this study, Kisimov's definition [27] has been adopted, which summarizes that these systems usually serve to support the corporate business and include modules designed to support core corporate activities, which in practice outlines their broad scope. It is with this definition and within its scope that a wide range of information systems related to different aspects of company activities and their management are involved in the study. Petkov [25] concludes that an MIS is a complex system for providing information about the management activity in an organization or, in more detail, a constantly evolving system for transforming data from various sources (internal and external to the organization) into information presented in an appropriate form to managers at all levels and in all functional activities of the organization, to assist them in taking timely and effective decisions in the planning, management and control of the activities for which they are responsible. According to Ferguson [28], the organization's performance management is sought and achieved at strategic, tactical and operational levels, namely through MIS, ERP, CRM, supply chain management and BI systems.
688
Z. Yordanova
Information systems are also considered to be one of the strongest decision-making tools. Lucas [29] outlines four main trends that organizations need to cope with and steadily manage: (1) using information provision through MIS as part of their corporate strategy and to achieve their goals; (2) technology as a highly penetrating component of the day-to-day work of employees and of the business environment; (3) the use of technology to transform the organization, with continuous change according to the environment, competition, customer requirements and innovation progress; and (4) the use of personal computers as the main tool of management. All these arguments support and motivate this research in searching for a link between applying gamification and boosting the performance of already deployed business software systems.
3 Methodology
The methodology of the study employed a focus group with subject matter experts in business software implementation. The reason for using this specific research technique is its usefulness in qualitative research [30] for in-depth understanding of hidden possibilities for knowledge transfer between different fields [31]. Subject matter experts were chosen because of the knowledge of business information systems required for the research purposes, and also because they provide the understanding of different stakeholders, which permits the analysis to predict the reaction of potential groups in the field [32]. One focus group meeting was organized. It took place in May 2019 and was formed of four specialists in business application implementation, each of them with 10+ years' experience in multinational IT projects. The literature analysis performed above had been provided to the specialists beforehand to bring more knowledge and understanding of gamification as a concept, even though the four members of the focus group had declared their awareness of gamification. Awareness of gamification and of its application were both criteria for selecting the subject matter experts, as the main purpose was to elicit and summarize gamification already used in business software implementation and operations. To set a scope for the discussion, the group limited their brainstorming activities to these main business software systems: enterprise resource planning (ERP), customer relationship management (CRM) and business intelligence (BI). The focus group sat down in a meeting for 2 h. The approach to the discussion was based on complete affirmation and acceptance of the game elements, techniques and mechanisms in the different systems, for both their implementation and further usage. The assignment of a gamification element to each of the scoped business software systems during implementation and management was made based on the gamification tools described by Boer [33], since they are wide-ranging enough for achieving diverse results. These are: limitations; emotional reinforcement; storytelling; progressive relationships; challenges; opportunities for rewarding; cooperation in the field of
competition; cooperation; feedback; opportunity for a victory; achievements; leadership; avatars; levels; badges; points; fights; searches; collections; social graphics; combat the team; unlocking the content; virtual goods; gifts. The assignment was done on the basis of relevance and adequacy. During the meeting, the author of the paper took the role of a moderator so as to keep the scope, target and focus of the group [34]. The moderator also had the responsibility of taking notes and summarizing the discussed topics. The outcomes of the meeting are presented in the results section of the study and aim at delivering knowledge on already applied gamification practices. The added value and contribution of this approach is to demonstrate clearly the relevance and purposefulness of applying gamification in business software systems implementation and maintenance. The main reason for this is the characteristic of gamification of being extremely useful when it is applied to target specific and particular goals and not just to relieve tasks and operations.
4 Results and Discussion
The outcomes of the focus group meeting are presented in Table 2. They show the elicited game elements, techniques and mechanisms for the purposes of implementation and management of the different systems in scope. The results of the discussion also provided an understanding of where and how exactly these gamification tools could be used from a management point of view. They appeared most relevant and appropriate for the analysis, implementation and monitoring phases during implementation, and for engaging users to perform their tasks with higher quality. The results of this second deliverable of the discussion are presented with explanations from the experts of how exactly the examined gamification elements, techniques and mechanisms might be used. This way of presenting the outcomes aims at delivering more practical knowledge on the real application of gamification in business software management. The results are presented in Table 3.

Table 2 Gamification applied in business software systems

Business software system type | Game elements, techniques and mechanisms useful in the scoped business software systems
ERP | Virtual goods; opportunities for rewarding; feedback; achievements; gifts; challenges; levels; collections; progressive relationships; cooperation in the field of competition; storytelling
CRM | Fights; gifts; challenges; storytelling; levels; cooperation in the field of competition; achievements; competition; progressive relationships
BI | Collections; achievements; gifts; challenges; storytelling; emotional reinforcement; opportunities for rewarding; levels; virtual goods; cooperation in the field of competition
Table 3 Game elements, techniques and mechanisms usage

Game elements, techniques and mechanisms | Explanation for usage
Virtual goods | By promising virtual goods for performing tasks accurately, the overall performance of the business software system improves and users follow the right process more willingly
Cooperation in the field of competition | Most business software processes rely on good cooperation between users and different teams. Cooperation is therefore stimulated by competition, which brings better performance and minimizes mistakes
Achievements | Since users are usually assigned daily and routine duties and tasks in business software systems, different levels of achievement bring them more engagement and passion in their work
Levels | Since users are usually assigned daily and routine duties and tasks, different levels of achievement bring them more engagement and passion in their work, in addition to satisfying the natural human desire for achievement
Opportunities for rewarding | Rewarding has been recognized in the literature and in practice as one of the strongest motivators for completing tasks and achieving performance. Properly embedded opportunities for rewarding that are not directly related to payment give users another perspective and motivate their engagement better than extra payment on a monthly basis. In addition, this approach stimulates each of the performed tasks and the teamwork
Gifts | Gifts are almost as strong a motivator for achieving performance and results as payment. By providing gifts instead of extra payment, employers stimulate employees' inner engagement
Challenges | The newer generations are less and less susceptible to extra payment. The motivational factors for stimulating their work performance are a huge topic, and challenges have been recognized as a tool for handling this problem
Table 3 reflects the leading role of the gamification elements, techniques and mechanisms identified in the previous focus group meeting.
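As a purely hypothetical illustration of how elements such as points, levels and achievements from Table 3 could be attached to routine tasks in an ERP, CRM or BI system, a minimal sketch follows; the class, point values, level threshold and achievement names are invented for the example and are not part of the study.

```python
class GamifiedUser:
    """Tracks points, level and achievements for a business software user."""

    LEVEL_SIZE = 100  # points needed per level (assumed threshold)

    def __init__(self, name):
        self.name = name
        self.points = 0
        self.achievements = set()

    def complete_task(self, task, correct):
        # Reward accurate work more than merely finished work
        self.points += 10 if correct else 2
        if correct and task == "month-end closing":
            self.achievements.add("Accurate Closer")

    @property
    def level(self):
        return self.points // self.LEVEL_SIZE + 1

user = GamifiedUser("erp_clerk_01")
user.complete_task("month-end closing", correct=True)
print(user.points, user.level, user.achievements)
```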
5 Conclusion
The research provides general information on how gamification, i.e. game elements, techniques and mechanisms, may be used in business software implementation and management. As part of the enterprise assets, management information systems, or business software, are usually compromised by curved processes, bad practices, single-headed decision-making, etc. By embedding gamification within these processes, implementation and management problems might be solved or at least improved. The suitable gamification elements, techniques and mechanisms, according to the formed focus group and the provided gamification literature analysis and summary of gamification tools, are described in detail in the research. The most appropriate gamification tools, which were mentioned several times within different phases of implementation and management of the scoped business information systems, were: cooperation in the field of competition; achievements; virtual goods; levels; opportunities for rewarding; feedback. The outlined specific goals that may be achieved and supported by gamification during business software implementation and management were: stimulating engagement; stimulating better performance; enjoying the routine tasks and operations; stimulating proper work; and minimizing process curving.
Compliance with Ethical Standards This study was funded by NID NI 14/2018; BG NSF Grant No. M 15/4 -2017 (DM 15/1), KP-06 OPR01/3-2018. The authors declare that they have no conflict of interest. Informed consent was obtained from all individual participants included in the study. This chapter contains a brainstorming with subject matter experts as per their ethical approval and their personal data has been anonymized. Some parts of the literature analysis are based on previous work of the author.
References 1. J.Hamari, J. Koivisto, P. Parvinen, Introduction to Gamification Minitrack, in Proceedings of the 52nd Hawaii International Conference on System Sciences, p. 1445 (2019) 2. S. Schöbel, A. Janson, J.C. Hopp, J.M. Leimeister, Gamification of online training and its relation to engagement and problem-solving outcomes, in Academy of Management Annual Meeting (AOM) (Boston, Massachusetts, USA, 2019) 3. J. Koivisto, J. Hamari, The rise of motivational information systems: a review of gamification research. Int. J. Inf. Manag. 45, 191–210 (2019) 4. J. Huizinga, Homo ludens: a study of the play element in culture, ROUTLEDGE & KEGAN PAUL London, Boston and Henley, Second edition, (first edition published in German in Switzerland in I944) (1949) 5. N. Pelling, The (short) prehistory of gamification, funding startups and other impossibilities. Haettu (2011) 6. J. Andrews, Why use gamification: the power of games, retrieved from: Article Source https://www.zco.com/blog/why-use-gamification-the-power-of-games/, accessed on 03.12.2018 (2011)
7. S. Deterding, D. Dixon, R. Khaled, L. Nacke, From game design elements to gamefulness: defining “gamification”, in MindTrek, eds. by A. Lugmayr, H. Franssila, C. Safran, I. Hammouda, pp. 9–15 (2011). https://doi.org/10.1145/2181037.2181040 8. Deterding et al., Gamification: toward a definition, CHI 2011, 7–12 May 2011, ACM, Vancouver, BC, Canada. 978-1-4503-0268-5/11/05 (2011) 9. J. Lee, J. Hammer, Gamification in education: what, how, why bother? Acad. Exch. Q. 122, 1–5 (2011) 10. K.M. Kapp, The gamification of learning and instruction: game-based methods and strategies for training and education (Pfeiffer, San Francisco, CA, 2012) 11. Kiryakova et al., Gamification in education, in Conference: 9th International Balkan Education and Science Conference (Edirne, Turkey, 2014) 12. J. McGonigal, Reality is Broken: Why Games Make us Better and How They can Change the World (The Penguin Press, 2011) 13. Deterding et al., Gamification: Using Game Design Elements in Non-Gaming Contexts. CHI 2011, 7–12 May 2011, Vancouver, BC, Canada 14. K. Huotari, J. Hamari, Defining gamification—a service marketing perspective, in Proceedings of the 16th International Academic Mindtrek Conference (ACM, Tampere, Finland, New York, NY, USA, 3–5 Oct 2012), pp. 17–22 15. A. Shpakova, V. Dörfler, J. MacBryde, Changing the game: a case for gamifying knowledge management. World J. Sci. Technol. Sustain. Dev. 14, 143–154 (2017) 16. G. Zichermann, C. Cunningham, Gamification by design: implementing game mechanics in web and mobile apps (O’Reilly Media, Sebastopol, CA, 2011) 17. K. Werbach, D. Hunter, For the Win: How Game Thinking can Revolutionize Your Business (Wharton Digital Press, 2012) 18. B. Burke, Gamify: How Gamification Motivates People to Do Extraordinary Things, 1st edn. (Routledge, 2014) 19. Z. Yordanova, Gamification for handing educational innovation challenges, in Digital Transformation of the Economy: Challenges, Trends and New Opportunities. Advances in Intelligent Systems and Computing, eds. by S. Ashmarina, A. Mesquita, M. Vochozka, vol. 908 (Springer, Cham, 2020) 20. M. Jakubowski, Gamification in business and education, project of gamified course for university students. Dev. Bus. Simul. Exp. Learn. 41, 339–341 (2014) 21. Z. Yordanova, Educational innovations and gamification for fostering training and testing in software implementation projects, in Software Business. ICSOB 2019, eds. by S. Hyrynsalmi, M. Suoranta, A. Nguyen-Duc, P. Tyrväinen, P. Abrahamsson. Lecture Notes in Business Information Processing, vol. 370 (Springer, Cham, 2019) 22. Mangunwihardjo Abdurrahman, Sufian: establishing competitive advantage to improve business performance. Jurnal Ekonomi dan Bisnis Indonesia 03(01), 23–30 (2018) 23. R. Varbanov, Biznes v sredata na Veb 2.0 (Veb 2.0, Enterptise 2.0, Cloud computig, Saas), Biblioteka “Stopanski svJat”, Izdanie na Stopanska akademiJa “D. A. CENOV” Svishtov (2011) 24. V. Kisimov, Biznes upravlenie na dinamichni infomacionni sistemi, disertacionen trud (2009) 25. A. Petkov, Upravlenski informacionni sistemi, Primaks OOD, str. 11 (2004) 26. V. Mihova, A. Murdzheva, Analiz na vazmozhnostite za sazdavane na arhitektura na biznes inteligentna sistema za optimizirane barzodejstvieto na bazi danni, Mezhdunarodna konferencia „Prilozhenie na informacionnite i komunikacionni tehnologii v ikonomikata i obrazovanieto”, 2–3 dekemvri, 2011, UNSS, Sofia (2011) 27. V. Kisimov, Web 3.0 approach to corporate information systems evolution. Econ. Altern. 2, 5–19 (2012) 28. M.A. 
Ferguson, Business Process and Performance Management Framework for the Intelligent Business. Intelligent Business Strategies (2008) 29. H. Lucas, Information Systems Concepts for Management, 5th edn. (Mitchel McGraw-Hill Inc., 1994)
30. W. Nyumba, D. Mukherjee, The use of focus group discussion methodology: insights from two decades of application in conservation. Qual. Methods Eliciting Judgements Decis. Making 9(1), 20–32 (2018). https://doi.org/10.1111/2041-210X.12860 31. Z. Yordanova, Knowledge transfer from lean startup method to project management for boosting innovation projects’ performance. Int. J. Technol. Learn. Innov. Dev. 9(4) (2017) 32. J. Kahan, Focus groups as a tool for policy analysis, analysis of social issues and public policy, pp. 129–146 (2001) 33. P. Boer, Introduction to Gamification, available at: https://cdu.edu.au/olt/ltresources/downloads/whitepaper-introductiontogamification-130726103056-phpapp02.pdf (2013) 34. N. Stoimenov, Advanced computing for energy efficiency of milling processes. Prob. Eng. Cybern. Robot. 66, 83–91 (2015) 35. K. Werbach, (Re)Defining gamification: a process approach, in Persuasive Technology. PERSUASIVE 2014, eds. by A. Spagnolli, L. Chittaro, L. Gamberini. Lecture Notes in Computer Science, vol. 8462 (Springer, Cham, 2014)
Data Exchange Between JADE and Simulink Model for Multi-agent Control Using NoSQL Database Redis Yulia Berezovskaya, Vladimir Berezovsky, and Margarita Undozerova
Abstract This paper describes a way to exchange data between a Simulink model and JADE multi-agent control. The Simulink model is a predictive thermal model of a datacenter. The multi-agent control aims to optimize the energy consumption of the datacenter ventilation system. The data exchange is carried out via the NoSQL database Redis. The paper offers reasons for choosing Redis as middleware in the interaction of the multi-agent control with the Simulink model, and it describes the Simulink blocks and the JADE agent that were developed for interaction (reading/writing) with Redis. Keywords Co-simulation · Multi-agent control · Simulink · JADE · Redis · Simulink/JADE interface
1 Introduction
Constantly growing energy consumption is one of the great challenges of the modern world. A significant contribution to total world energy use comes from modern datacenters intended for the storage and processing of huge amounts of data. Computing power consumption is the largest and most important part of datacenter energy consumption, and there have been a number of achievements in reducing the energy consumption of this part [1]. Another important part in terms of energy is the datacenter cooling system, which often operates with some redundancy [2]; finer tuning of this system's operation would allow energy savings. We consider the cooling system as global cooling plus a set of local fans. The global cooling ensures appropriate environmental conditions inside the datacenter such as
Y. Berezovskaya (B) Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, Luleå, Sweden e-mail: [email protected]
V. Berezovsky · M. Undozerova M.V. Lomonosov Northern (Arctic) Federal University, Arkhangelsk, Russia e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_62
temperature and relative humidity. The set of local fans provides cooling of the datacenter computational nodes (CPUs); usually, one local fan cools one computational node. To save energy, each local fan has to work with the lowest power consumption sufficient to maintain a permissible temperature of the corresponding CPU, and the global cooling has to operate with the lowest power consumption sufficient to maintain a permissible temperature inside the datacenter. To optimize the energy consumption of the datacenter cooling system, it is reasonable to use multi-agent control. Its main attractive feature is peer-to-peer communicating controllers that collaborate to determine their actions [3]. Each local fan has its own controller, and the global cooling has a controller too. The controllers determine the operating modes of the corresponding units according to predefined conditions and the goals pursued by the controllers. The goal of every controller is to maintain an appropriate temperature value without letting its unit work excessively, thus saving energy. We aim to help researchers and engineers simulate and analyze the behavior of a datacenter from the energy consumption point of view, as well as its interaction with the surrounding environment. By creating a tool for co-simulation of multi-agent control and the thermal and energy behavior of a datacenter, we give the opportunity to test different agent decision algorithms. We propose an approach for the joint simulation of multi-agent control and a thermal model of a datacenter developed with different tools; for correct simulation, these models need to exchange some data. The multi-agent control developed with the Java Agent Development Framework (JADE) receives temperature values from the datacenter model developed in Simulink and sends control signals for the cooling system units. Such a simulation tool can create a substantial amount of data representing the behavior of datacenters over their operation time. It would be useful not only to exchange data between the models but also to store the data for further analysis. We use the NoSQL database Redis for data exchange between the models developed in JADE and Simulink; Redis also allows storing the history of control signals and the corresponding temperature values. So, for different scenarios, we can accumulate initial data for analytics tools using data mining and machine learning methods. The rest of this paper is organized as follows: Sect. 2 is the task description; Sect. 3 is a brief background study on existing ways of data exchange between JADE and Simulink; Sect. 4 describes the NoSQL database Redis and substantiates its use for this task; Sect. 5 describes the agent used for interaction with Redis (reading/writing); Sect. 6 describes the Simulink block used for interaction with Redis (reading/writing); and Sect. 7 presents the conclusion and some suggestions for future work.
2 Task Description The task of this work is to develop a system for co-simulation of a datacenter and multi-agent control for its cooling system. The goal of the co-simulation system is to study different decision-making scenarios of the multi-agent control. To get feedback from the controlled datacenter model, it is necessary to organize data exchange between the multi-agent control and the datacenter model. Figure 1a shows the block representation of the datacenter model. The main characteristics of this block are its inputs and outputs. The inputs of the datacenter model are the control signals generated by the multi-agent control: GF_CS, the control signal for global cooling, and LF_CSs, the vector of control signals for local fans. The outputs of the datacenter model are the temperature values it generates: T_SR, the temperature in the server room, and T_CPUs, the vector of CPU temperature values. For datacenter modeling, we have developed a library of blocks in Simulink. The library contains blocks that model the base components of a datacenter and some service blocks; it makes it possible to create instances of the blocks and to construct models of datacenters in different configurations [4, 5]. Figure 1b shows the block representation of the multi-agent control. Its inputs are the temperature values generated by the datacenter model (T_SR and T_CPUs), and its outputs are the control signals it generates (GF_CS and LF_CSs). Figure 1 shows that the inputs of one block need to be connected to the outputs of the other, and vice versa. To complete the task of this work, we need to determine how to exchange data between the datacenter model and the multi-agent control. In the proposed scheme of interaction, the datacenter model developed in Simulink sends temperature values to Redis; the multi-agent control developed in JADE receives the temperature values from Redis, makes decisions about control signals for the datacenter cooling system, and sends the signals to Redis; the datacenter model receives the control signals and, if necessary, changes the operating modes of the cooling system units. The block diagram of this interaction is shown in Fig. 2. Fig. 1 Block representation of datacenter model (a) and multi-agent control (b)
Fig. 2 Block diagram of the interaction of agents in the simulation of a datacenter model
3 Background Study The model of the data center was implemented in MATLAB/Simulink. However, the S-functions available in MATLAB/Simulink are not capable of handling multiple threads of execution, which is critical for multi-agent systems (MAS). It is therefore necessary to separate the MAS from Simulink, which overcomes the problem with multiple threads of execution. To do this, we need to introduce an intermediary that acts as middleware between Simulink models and agents. Analysis of the literature showed that the interaction of the two platforms, the Java Agent Development Framework (JADE) and MATLAB/Simulink, is usually implemented in one of two ways. The first way uses the MACSimJX middleware [6, 7], and the second uses TCP/IP communications [8–10]. The MACSimJX middleware was developed to enable communication between Simulink and JADE, thereby combining two powerful software tools for modeling MAS and the hardware of real-time systems [11]. However, the developers appear to have stopped supporting this tool at the version for MATLAB 2010. Adjusting the existing version of MACSimJX to modern versions of MATLAB could require a lot of time, so this tool is not suitable for our task. Plain TCP/IP communication does not quite fit our task either, because we need the ability to store the transmitted data. For our goals, we decided to use the NoSQL database Redis, which can receive temperature values from the datacenter model and save these values on disk. On the other hand, Redis can receive control signals from the multi-agent control and save them. For transmitting data from Simulink and JADE to Redis and back, we use TCP/IP communication. We need to develop blocks in Simulink for interaction with Redis, and with the same goal we need to create a special agent in JADE.
Fig. 3 Agent diagram with an agent dedicated to interaction with the Simulink model
The proposed multi-agent control architecture using Redis as an intermediary is presented in Fig. 3. Local fan agents interact with the distributed system of local fans, while cooler agents and global fan agents interact with the global cooling system. To interact with the data center model, an additional agent is needed; it receives data from the data center model for processing by the respective agents, and it also sends signals to the model for the devices of the cooling system.
4 NoSQL Database Redis Redis is an open-source NoSQL in-memory database distributed under a modified BSD license. It stores data structures of the key-value type and is mainly used as a key-value database, a cache, and a message broker. Compared to other databases, Redis shows very high performance [12]. It is the most popular key-value database according to the DB-Engines ranking [13]. It is written in C, and most major programming languages have Redis bindings. Redis supports the following abstract data types: strings, lists, sets, hash tables, geospatial data, HyperLogLogs, bitmaps, and streams. Redis has transactions, various levels of on-disk persistence, LRU eviction, Lua scripting, and built-in replication. With automatic partitioning using Redis Cluster and with Redis Sentinel, it provides high availability [14]. Most often, Redis is used for caching, queuing, and for enabling event-driven architectures (Pub/Sub). To ensure high performance, Redis optimizes data in memory so as to achieve high memory efficiency, low application network traffic, and low computational complexity [15]. By extending its architecture and introducing a cluster, Redis can guarantee high availability. A single-instance configuration of Redis is strongly consistent. In a distributed configuration, when the client works with replica nodes, the Redis Cluster is eventually consistent. Redis Cluster has the following features [16, 17]: • Scalability. Linear scalability up to 1000 nodes gives high performance. • Availability. Redis Cluster remains viable when the network splits into fragments, provided at least one reachable slave exists for each inaccessible master node and most of the master nodes are reachable.
• Relaxed write guarantee. Some write operations can be lost, despite the efforts of Redis Cluster to retain all write operations issued by the application. The basic tasks involved in interfacing the Simulink block with the JADE agent are key-lookup queries: searching for a specific sensor log among many other data sets. Two main tasks are assigned to Redis as an intermediary. First, it provides a cache of model data for fast retrieval of persistent state. Second, it stores short-lived data, such as the instantaneous data of a simulation.
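To make the exchange concrete, the following minimal Python sketch (using the redis-py client; the host, port, and helper function names are assumptions, while the "temperatures" and "signals" keys follow the listings in Sects. 5 and 6) illustrates the same key-value pattern that the paper implements with Jedis and a MATLAB S-function. It assumes a Redis server reachable at localhost:6379.

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Model side: publish the latest temperature readings under the "temperatures" key.
def write_temperatures(t_server_room, t_cpus):
    r.delete("temperatures")
    r.rpush("temperatures", t_server_room, *t_cpus)

# Control side: store the control signals decided by the agents under the "signals" key.
def write_signals(signals):
    r.delete("signals")
    r.rpush("signals", *signals)

def read_list(key):
    return [float(v) for v in r.lrange(key, 0, -1)]

write_temperatures(24.5, [61.0, 58.2, 64.7])
write_signals([0.4, 0.7, 0.7, 0.5])
print(read_list("temperatures"), read_list("signals"))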
5 The Agent for Interaction with Redis For JADE interaction with the Redis data store, it was decided to use Jedis, the Redis client for the Java language [18]. Using Jedis, one can send data to the store and receive data from it by key. Thus, a special agent can be introduced into the multi-agent system. It receives the data from the datacenter model collected in Redis and sends it to the corresponding agents. After the agents process this data, it sends the results back to the Redis data store so that the model can use them. To interact with the Redis data store and exchange data with the datacenter model, the RedisAgent class is implemented. A diagram of the RedisAgent class and its associated classes is shown in Fig. 4. The class contains two fields:
Fig. 4 RedisAgent and associated classes diagram
mainBehaviour, the main sequential behavior of the agent, and signalsList, a list of signals to the devices of the cooling system of the data center model.

public void receiveDataFromRedis(SequentialBehaviour receiveBehaviour) {
    Jedis jedis = new Jedis(REDIS_HOST, REDIS_PORT);
    // read the temperature values written by the Simulink model
    List<String> temperatures = jedis.lrange("temperatures", 0, temperaturesSize);
}

public void sendDataToRedis(SequentialBehaviour receiveBehaviour) {
    Jedis jedis = new Jedis(REDIS_HOST, REDIS_PORT);
    jedis.del("signals");
    // write the control signals produced by the agents
    for (int i = 0; i < signalsList.size(); i++) {
        jedis.lpush("signals", String.valueOf(signalsList.get(i).sign));
    }
    // reset the signals after they have been sent
    for (int i = 0; i < signalsList.size(); i++) {
        signalsList.get(i).sign = 0;
    }
}
Signals are recorded in the list under the signals key and can be extracted from the Redis store using this key. All of the above methods are used in behaviors. The primary behavior is LoopRedisAgentBehaviour. In this class, the following behaviors are added to the sequential behavior: • ReceiveDataFromRedisBehaviour: retrieves data from the Redis data store; • DelayBehaviour: introduces a temporary delay; • ReceiveDataFromAgentsBehaviour: receives control signals from the agents; • SendDataToRedisBehaviour: sends the received control signals to the Redis data store.
class LoopRedisAgentBehaviour extends BasicRedisAgentBehaviour {

    private static final long serialVersionUID = 1L;

    public LoopRedisAgentBehaviour(Agent a, SequentialBehaviour seq) {
        super(a, seq);
    }

    public void action() {
        _seq.addSubBehaviour(new ReceiveDataFromRedisBehaviour(_a, _seq));
        _seq.addSubBehaviour(new DelayBehaviour(_a, 1500));
        _seq.addSubBehaviour(new ReceiveDataFromAgentsBehaviour(_a, _seq));
        _seq.addSubBehaviour(new SendDataToRedisBehaviour(_a, _seq));
        _seq.addSubBehaviour(new DelayBehaviour(_a, 500));
        _seq.addSubBehaviour(new LoopRedisAgentBehaviour(_a, _seq));
    }
}
6 The Simulink Blocks for Interaction with Redis The Simulink block for writing to Redis implements the S-function DC_to_Redis:

function [s, x0, str, ts] = DC_to_Redis(t, x, u, fl)
switch fl
    case 0
        [s, x0, str, ts] = InitializeSizes;
    case 2
        s = Update(t, x, u);
    case 3
        s = Outputs(t, x, u);
    case 9
        s = Terminate(t, x, u);
    otherwise
        error(['Unhandled flag = ', num2str(fl)]);
end
function [s, x0, str, ts] = InitializeSizes
sz = simsizes;
sz.NumInputs = 1;
sz.NumSampleTimes = 1;
s = simsizes(sz);
x0 = [];
str = [];
ts = [-1 0];
% REDIS_HOST, REDIS_PORT and REDIS_KEY are assumed to be block (mask) parameters
r = tcpip(get_param(gcbh, 'REDIS_HOST'), str2double(get_param(gcbh, 'REDIS_PORT')));
r.terminator = 'CR/LF';
fopen(r);
if r.BytesAvailable > 0, fread(r, r.BytesAvailable); end
set_param(gcbh, 'UserData', r);

function s = Update(t, x, u)
r = get_param(gcbh, 'UserData');
key = get_param(gcbh, 'REDIS_KEY');
value = num2str(u(1));
% send a SET command over the raw connection to Redis
fprintf(r, sprintf('SET %s %s \n', key, value));
s = [];

function s = Outputs(t, x, u)
r = get_param(gcbh, 'UserData');
key = get_param(gcbh, 'REDIS_KEY');
fprintf(r, sprintf('GET %s \n', key));
while r.BytesAvailable == 0, pause(0.001); end
value = '';
while r.BytesAvailable > 0
    [chunk, ~, msg] = fread(r, r.BytesAvailable);
    value = [value char(chunk')];
end
s = str2num(value(5:end));

function s = Terminate(t, x, u)
r = get_param(gcbh, 'UserData');
fclose(r);
s = [];

The rest of the source code as well as examples can be retrieved from the GitHub repository [19].
7 Conclusion In this paper, the task of co-simulation of a datacenter and multi-agent control for its cooling system is considered. The proposed solution uses JADE to simulate the multi-agent system that controls the datacenter's cooling devices. The imitation model of the datacenter was created in the Simulink environment. Communication between the datacenter model and the agents is carried out through the NoSQL database Redis. The agent used for interaction with Redis (reading/writing) and the Simulink blocks used for interaction with Redis (reading/writing) are proposed. In future work, we are going to consider using an MQTT [20] broker as an intermediary between Simulink and JADE. MQTT is a TCP-based, machine-to-machine protocol designed for IoT devices. MQTT has low traffic overhead and low bandwidth requirements [21]. It implements the publisher/subscriber design pattern and enables IoT devices with limited resources to send or publish information on a given topic to a server that acts as an MQTT message broker. It is promising as the interface we are looking for.
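As a rough illustration of the publisher/subscriber pattern considered for future work, the sketch below uses the paho-mqtt Python client; the broker address, topic names, and payload format are assumptions, since the paper does not prescribe a concrete MQTT setup.

import paho.mqtt.client as mqtt

BROKER = "localhost"                      # assumed broker address
TOPIC_TEMP = "datacenter/temperatures"    # assumed topic names
TOPIC_SIGNALS = "datacenter/signals"

def on_message(client, userdata, msg):
    # Agent side: react to temperature updates published by the model.
    print(msg.topic, msg.payload.decode())

client = mqtt.Client()   # paho-mqtt 1.x style; 2.x additionally expects a CallbackAPIVersion argument
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(TOPIC_TEMP)
client.loop_start()                       # handle network traffic in a background thread

# Model side (possibly another process): publish a reading; agents publish signals likewise.
client.publish(TOPIC_TEMP, "T_SR=24.5;T_CPU1=61.0")
client.publish(TOPIC_SIGNALS, "GF_CS=0.4;LF_CS1=0.7")
client.loop_stop()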
References 1. J. Koomey, Growth in Data Center Electricity Use 2005 to 2010 (Analytics Press, Oakland, 2011) 2. Z. Wang et al., Optimal fan speed control for thermal management of servers, in ASME 2009 InterPACK Conference Collocated with the ASME 2009 Summer Heat Transfer Conference and the ASME 2009 3rd International Conference on Energy Sustainability (ASME Digital Collection, 2009), pp. 709–719 3. M. Wooldridge, An Introduction to Multiagent Systems (Wiley, 2002) 4. A. Mousavi, V. Vyatkin, Y. Berezovskaya, X. Zhang, Cyber-physical design of data centers cooling systems automation, in 2015 IEEE Trustcom/BigDataSE/ISPA, Vol. 3 (IEEE, Helsinki, 2015), pp. 254–260 5. A. Mousavi, V. Vyatkin, Y. Berezovskaya, X. Zhang, Towards energy smart data centers: simulation of server room cooling system, in IEEE 20th Conference on Emerging Technologies & Factory Automation (ETFA) (IEEE, New York, 2015), pp. 1–6 6. Y.S.F. Eddy, H.B. Gooi, S.S. Chen, Multi-agent system for distributed management of microgrids. IEEE Trans. Power Syst. 30, 24–34 (2015) 7. W. Khamphanchai, M. Kuzlu, M. Pipattanasomporn, A smart distribution trans-former management with multi-agent technologies, in IEEE PES Innovative Smart Grid Technologies (ISGT) (IEEE, New York, 2013), pp. 1–6 8. M.A. Haj-ahmed, Z.P. Campbell, M.S. Illindala, Substation automation for re-placing faulted CTs in distribution feeders, in IEEE International Conference on Power Electronics, Drives and Energy Systems (PEDES) (IEEE, New York, 2014) 9. M.T. Kuo, S.D. Lu, Design and implementation of real-time intelligent control and structure based on multi-agent systems in microgrids. Energies 6, 6045–6059 (2013) 10. F. Yang, R. Roche, F. Gechter, F. Gao, A. Koukam, An agent-based approach for battery management systems, in 2015 IEEE Energy Conversion Congress and Exposition (IEEE, New York, 2015), pp. 1367–1374 11. MACSIMJX. http://agentcontrol.co.uk/. Last accessed 04 Sept 2019 12. A.T. Kabakus, R. Kara, A performance evaluation of in-memory databases. J. King Saud Univ. Comput. Inf. Sci. 29(4), 520–525 (2016)
13. Solid IT. DB-Engines Ranking. https://db-engines.com/en/ranking. Last accessed 04 Sept 2019 14. Redis Labs. Redis Website. https://redis.io/. Last accessed 04 Sept 2019 15. M. Diogo, B. Cabral, J. Bernardino, Consistency models of NoSQL databases. Future Internet 11(2), 43 (2019) 16. Redis Labs. Redis Cluster Specs. 2018. Available online: https://redis.io/topics/cluster-spec. Last accessed 04 Sept 2019 17. Redis Labs. Redis Cluster Tutorial. 2018. Available online: https://redis.io/topics/cluster-tutorial. Last accessed 04 Sept 2019 18. Jedis. https://github.com/xetorthio/jedis. Last accessed 04 Sept 2019 19. https://github.com/valber-8/sim2jade. Last accessed 04 Sept 2019 20. MQTT—Message Queuing Telemetry Transport. http://mqtt.org/. Last accessed 04 Sept 2019 21. T. Yokotani, Y. Sasaki, Comparison with HTTP and MQTT on required network resources for IoT, in 2016 International Conference on Control, Electronics, Renewable Energy and Communications (ICCEREC) (IEEE, New York, 2016), pp. 1–6
Visualizing Academic Experts on a Subject Domain Map of Cartographic-Alike Diana Purwitasari, Rezky Alamsyah, Dini Adni Navastara, Chastine Fatichah, Surya Sumpeno, and Mauridhi Hery Purnomo
Abstract Visualizing bibliographic information helps academician users gain insights into science mapping and then define their next research plans. This paper focuses on expert visualization that lets users apply their cognitive skills to comprehend science mapping by exploring experts and domain expertise. To address the comprehension problem, we represent the knowledge domain and the involved players, or academic experts, in a cartographic-alike visual approach. First, to generate a base map of standardized knowledge domains, we identified semantic relatedness through word embedding on collected metadata texts of articles according to Scopus subject areas. Then, the expert coordinates were obtained after transforming article metadata with the base map, and the articles were labeled with subject domains. To make it cartographic-alike, the subject domain color on the map was set such that darker areas indicated more experts with interests in the particular subjects, while blended colors demonstrated mixed subjects. The experiments required two semi-manually collected datasets, Domain Data and Researcher Data, in the form of Scopus metadata. Our findings on the embedding process showed that
labeling articles, and hence experts, gave better performance when training on the four 1st-tier Scopus subject areas than on the 26 2nd-tier sub-domains, avoiding over-mixed subjects in the article contents. The visual result of the cartographically colored map encouraged the respondents to explore the research interests of the experts. After observing the color-blended map, users could be expected to initiate cross-domain collaboration plans. Keywords Visualizing experts · Subject domain map · Word embedding · Article labeling
1 Introduction Bibliographic information for scientific literature datasets contains scientific article metadata: title, authors, abstract, keywords, and references. With the increase of open datasets, science mapping, the process of mining and analyzing this metadata, helps users find emerging topics and make research plans. Since the beginning of bibliometric research, many studies have introduced visual approaches to gain insights into science mapping. Visualizing topics, a vital process in science mapping, can be roughly categorized into topic content, topic relationship, and topic evolution [1]. A cartographic approach was used to display the topic content of scientific literature from keywords and their semantic relations on a map [2]. Another visualization aspect, topic relationship, concerned representing co-citations in tree-based structures [3]; however, mechanisms of zooming, searching, or filtering became challenging issues, not to mention the explanation effort needed for users to understand the meaning of the tree structure. A map-related technique was also used with spatial autocorrelation analysis to generate a Delaunay triangulation network of keyword polygons; the map illustrated the topic evolution issue through categories inspired by emerging topic statuses under the influence of semantic relatedness among keywords [4]. Visual approaches have also considered article authors in network form [5]. Many visualization approaches describe only the relations within the knowledge domain, or only the relations between the involved authors or experts of the knowledge domain. Yet combining both, the knowledge domain and the involved players, defined here as academic experts, in one visualization is less studied. The previous descriptions concern visualizing topics as part of the knowledge domain in science mapping, often identified from article texts. Without any predefined information, clustering became a common approach, such as an expert map generated by the self-organizing map (SOM) algorithm on a dataset of researchers and their listed interests [6]. Note that with other datasets the SOM approach would produce different maps. Trends of research topics are intuitively observable through the same set of subject areas; however, the positions of subject domains on the map can be entirely different because the article metadata used as the domain source is periodically updated. For that reason, this paper proposes a system for visualizing experts on a
standardized map of subject domains. The subsequent sections explain the four main processes in the proposed system, followed by empirical experiments using metadata of local academicians manually collected from Scopus.
2 Related Works Visualizing scientific literature generally emphasizes title-abstract texts, citation networks, authors, and metadata that includes all three, through mechanisms of looking up, seeking relations, or finding temporal patterns [7]. Still related to over-time analysis, citations combined with temporal patterns allow users to track long-term developments and see research trends [8], whereas networks of citations or co-authorships have been utilized to predict the possible activities of researchers [9]. With topics as the focus, the concerned process includes defining research areas, knowing the relationships between those subject areas, and then predicting subject evolution. To incorporate the authors, mapping them to the defined research areas and then identifying relationships between them before predicting future collaborations is still interesting to explore with supporting visuals. Such information has been defined as author profiles [10], although that visual did not display authors on a map of subject areas. A map is expected to be a self-explanatory interface that supports users in understanding any information related to scientific literature. Hence, the process proposed here establishes a standardized and self-explanatory map for exhibiting experts and their research interests.
3 Proposed Methodology to Visualize Experts on Subject Domain Map There are four main processes in Fig. 1: collecting article metadata of researchers and of the domains, and preparing the base map of the knowledge domain according to Scopus subject areas; after transforming articles with reference to the base map, the scaling process makes the visualization display the subject areas of the experts. Users are expected to utilize the proposed system, called SIPETRUK (an Indonesian acronym), which provides mapping, recommending, and visualizing features to explore experts and their subject domains through a map. Later, we describe the context of use for SIPETRUK and other existing nationwide systems related to policies for increasing national research productivity. The process of preparing subject domains in Fig. 2 uses a dataset of article metadata from the predefined subject domains of Scopus subject areas. Terms in the title and abstract texts of articles in Domain Data were tagged based on parts of speech because the further process only required noun keywords. Preparing subject domains
Fig. 1 System architecture for visualizing academic experts on a subject domain map of cartographic-alike
Fig. 2 Transforming metadata based on information of subject domains
applied Word2Vec word embedding using a shallow neural network [11] on those keywords. It resulted in vectors of weight values of important words, stored in Domain Dictionaries. Vectors with closer values represent similar semantic relationships between noun keywords. Then, the raw title-abstract texts of the articles published by each researcher (from Researcher Data) and of recent Scopus articles per subject area (from Domain Data) were transformed with the Domain Dictionaries. The transformation results of the word embedding approach were kept as Domain-based articles, in which all represented articles became vectors with the same dimensions. To visualize the articles on a subject domain map, all vectors underwent a 2D transformation with t-Distributed Stochastic Neighbor Embedding (t-SNE) [12] and were turned into x-y coordinates more suitable for user observation. Article vectors tend to be high-dimensional because the number of keywords becomes the number of dimensions. t-SNE was used to project the article vectors into a
Fig. 3 Scaling in expert visualization with cartographic-alike using transformed data
low-dimensional space. It still maintains the local structure, so distances between points remain almost the same without turning them into crowded points. Two article points are located closer together if they have similar context, and it can be expected that both articles use the same keywords. The focus of the proposed system is visualizing the experts rather than their articles. Consequently, after the t-SNE transformation, expert coordinates were obtained by calculating the center of the article coordinates of each expert's collection. The steps in Fig. 3 display the expert coordinates and other meaningful information on the map, so that users can apply their cognitive skills to explore the visualization and comprehend the science mapping. In the visualization, labels of Scopus subject areas cover the base map and also become the information of the experts; consequently, expert labels are established from article labels, similar to the process of obtaining expert coordinates from article coordinates. Given the Word2Vec-transformed articles in Domain-based articles, a test instance, an expert article from Researcher Data (also Word2Vec transformed), was labeled by a lazy learning approach inspired by k-nearest neighbors (kNN) [13]. If an expert had more articles labeled with a certain subject than a threshold value, the expert was stated to have that subject as his or her research interest. For coloring the map in Fig. 3 to support user cognition, the base map was set to have scaled grids. Each grid color depends on the number of subjects and experts: various subjects influence the blending aspect of the coloring, while the number of experts in one subject affects the transparency aspect.
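A rough sketch of the projection and centroid steps described above is given below; the article vectors produced by the Word2Vec-based transformation are simply assumed to be available (random placeholders here), and only the use of t-SNE and of per-expert centroids mirrors the description.

import numpy as np
from sklearn.manifold import TSNE

# article_vectors: one row per article from the Word2Vec-based transformation (placeholder data)
# expert_ids: the expert (author) each article belongs to; both are assumed inputs
article_vectors = np.random.rand(300, 200)
expert_ids = np.random.randint(0, 20, size=300)

# Project the high-dimensional article vectors to 2-D while preserving local structure.
coords = TSNE(n_components=2, random_state=0).fit_transform(article_vectors)

# An expert's coordinate is the centroid of his or her article coordinates.
expert_coords = {e: coords[expert_ids == e].mean(axis=0) for e in np.unique(expert_ids)}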
4 Datasets For our experiments, we had two collections of articles, called Researcher Data and Domain Data, from Scopus metadata (Fig. 1). Subject details of the collected articles are listed in Table 1. For the first collection, Researcher Data, we used Scopus metadata of scientific literature filtered with the keyword "computer science" for selecting researchers or lecturers in our university, a public institution emphasizing scientific and engineering fields of study. The manually collected
Table 1 List of Scopus subject areas along with the labeled numbers of articles and experts Scopus subject area Scopus subject (Tier I) area (Tier II)
Scopus code
Number of articles
Number of faculty experts
Health Sciences-HS There are no auto-labeled articles of Researcher Data for Medicine (± 10,000 articles) (MEDI), Nursing (NURS), Veterinary (VETE) and Dentistry (DENT) color: Red (89) Health HEAL 89 FTE(6), FTIK(2) Profession Life Sciences-LS (± 10,000 articles) color: Green (35)
Physical Sciences-PS (± 20,000 articles) color: Blue (2399)
There are no auto-labeled articles of Researcher Data for Biochemistry, Genetics, and Molecular Biology (BIOC), Immunology and Microbiology (IMMU), and Neuroscience (NEUR) Agricultural and AGRI Biological Sciences
24
FTI(1)
Pharmacology, Toxicology and Pharmaceutics
11
FS(2)
PHAR
There is no auto-labeled articles of Researcher Data for Chemistry (CHEM) Chemical Engineering
CENG
84
FTI(2), FS(3)
Computer Science
COMP
1373
Earth and Planetary Sciences
EART
70
Energy
ENER
322
Engineering
ENGI
6
Environmental Science
ENVI
39
FS(2), FTI(2), FTI(1), FTSLK(1)
Materials Science
MATE
62
FTI(4), FS(1), FTSLK(1)
Mathematics
MATH
363
Physics and Astronomy
PHYS
80
FTE(39), FTIK(20), FMKD(10), FTI(10), FTK(2) FTI(10), FS(2), FTSLK(2), FTE(2), FMKD(1), FTE(13), FTI(9), FMKD(1), FS(1), FTIK(1), FTK(1) FTI(1)
FTE(13), FMKD(9), FTSLK(3), FS(2), FTI(2), FTIK(2) FTI(3), FMKD(2), FS(2), FTE(1) (continued)
Table 1 (continued) Scopus subject area Scopus subject (Tier I) area (Tier II)
Scopus code
Number of articles
Number of faculty experts
Social Sciences-SS (± 12,000 articles) color: Black (793)
Arts and Humanities
ARTS
41
Business, Management and Accounting
BUSI
132
FTI(5), FTIK(3), FTSLK(3), FBMT(1)
Decision Sciences
DECI
581
FMKD(15), FTIK(16), FTI(12), FTE(4), FTSLK(2), FTK(2)
Economics, Econometrics and Finance
ECON
31
Psychology
PSYC
8
FTE(3), FTI(1)
FMKD(3), FTIK(1) FTE(1)
There are no auto-labeled articles for Social Sciences (SOCI)
metadata contained a collection of text files for each researcher. One text file represented a BibTeX database file formed by a list of entries for articles published by a researcher. The focus of this paper was to visualize scientific mapping of experts. Therefore, we did not include metadata files of researchers who have less than ten published articles. It made the dataset contained 3182 articles of 200 researcher files from eight faculties with the following abbreviations in Indonesian language. • FTI (industrial technology) 46 experts • FTE (electrical technology) 44 experts • FTIK (information and communication technology) 39 experts • FMKD (mathematics, computation, and data science) 24 experts
• FTSLK (civil, environmental, and geoengineering) 14 experts • FS (basic science) 9 experts, • FTK (marine technology) 8 experts, and • FBMT (business and technology management) 1 expert
Whereas the second collection of Domain Data focused on 26 Scopus subject areas, with 51,939 bib-items, as shown in Table 1 and each subject area had ±2000 article metadata of title-abstract published from 2017 to 2018. There are two levels of categories in Scopus subject area, i.e., 1st tier subject of physical sciences has 2nd tier subjects of computer science among others. After labeling articles and then labeling experts, Table 1 shows that a researcher might have several Scopus subject areas. Labeling process was completed if the number of labeled articles in one subject ≥5. Our initial analysis showed that prolifically experts were existed in the faculties of FTI, FTE, FTIK, and FMKD. There were three prominent experts in Table 1 with h-Index at least 11 who had several interests until 5–8 domains of 2nd tier subjects.
5 Results and Discussions Some Python packages used in the experiments were BibtexParser for parsing raw Scopus metadata, Natural Language Toolkit (NLTK) for text processing, Gensim for word embedding, Scikit-learn for labeling, Seaborn and Matplotlib for visualization, in addition to Mpld3 for bringing the visual into web browser.
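Ahead of the parameter study in Sect. 5.1, a minimal sketch of how such an embedding is typically built with the Gensim package is shown below; the tokenized keyword lists and the concrete parameter values are illustrative assumptions, and only the parameter names (CBOW/Skip-Gram choice, min_count, window, 200 dimensions) correspond to the conditions examined in the experiments.

from gensim.models import Word2Vec

# tokenized_docs: one list of noun keywords per article (assumed to be prepared beforehand)
tokenized_docs = [["gamification", "learning"], ["sensor", "network", "energy"]]

model = Word2Vec(
    tokenized_docs,
    vector_size=200,   # called `size` in older Gensim releases
    sg=1,              # 1 = Skip-Gram, 0 = CBOW
    min_count=1,
    window=5,
)
vector = model.wv["energy"]   # 200-dimensional embedding of a keyword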
5.1 Parameter Setting for Word Embedding Word embedding with Word2Vec requires setting some parameters, such as the minimal term occurrence in the collection or the distance window used to check semantic relations with neighboring terms. The setting conditions with different values in Table 2 supported the process of preparing the subject domain map, which was necessary for transforming data based on the domain map. With the Gensim package, there are two approaches, continuous bag of words (CBOW) and Skip-Gram, as well as options to use all keywords with a minimal number of occurrences or to filter by a minimal document frequency (DF). This resulted in six conditions for word embedding with 200 dimensions. To observe its performance, each word embedding result was used for classifying articles in Domain Data with kNN (k = 100) and tenfold validation. Articles in Domain Data had labels of Scopus subject areas, while Researcher Data had not. Average precision over all 26 classes of 2nd-tier subjects became the performance indicator, since the labeling process prioritizes true positive data. However, the precision values were quite low, less than 50%, although Skip-Gram (parameter-4 in Table 2) gave a slightly better result than CBOW (parameter-6 in Table 2), since Skip-Gram weighting takes word usage in sentences into account rather than random context. The low precision values indicated that the content of the articles mixed subjects. Including more keywords in the embedding process was preferable for scientific texts with mixed subjects, based on the results without the DF threshold and with a lower minimal count (parameter-1 vs. parameter-4). Table 2 Parameter values for Word2Vec word embedding used in kNN of tier-2 subjects No
Word2Vec
Document Freq.
Min. Count
Window
Avg. Precision
# Terms
1
CBOW
Use min DF = 0.1
10
5
15.15%
–
40.96%
19.607
Skip Gram
Not Use DF, based on minimal word occurrences in min. count
5
41.16%
29.674
1
43.38%
74.984
2 3 4 5 6
CBOW
3
41.15%
5
43.23%
Table 3 kNN classification with selected word embedding parameters of tier-1 subjects Classes
Precision (%)
Accuracy (%)
Social Science (only 2nd tier)
75.00
81.46
40–60
20.5–77.5
Physical Science (only 2nd tier)
81.00
88.08
25–73
3.5–71.85
Life Science 71.00 (only 2nd tier) 56–75
63.06
Health Science (only 2nd tier)
81.00
69.46
59–91
65–80.5
Average
78.00
77.97
47–74.5
Mixed subjects: Physical Science (MATE, ENGI has lowest true positive)
For example, among the mixed subjects there was an article, "Smart meter based on time series modify and neural network for online energy monitoring," labeled with COMP although it could be tagged as ENER as well; note that both subjects belong to the same 1st tier. Therefore, with the setting condition of parameter-4 (Skip-Gram, no DF threshold, minimal word count = 1, and term distance window = 5), we classified the articles into the four classes HS, LS, PS, and SS, as shown in Table 3. The results of precision (considering only true positives) and accuracy (considering both true positives and true negatives) showed better performance, with average values of about 78%; this is an increase of more than 35% compared to labeling with 2nd-tier subjects as classes, which supports the mixed-subject issue in the article contents shown in the confusion matrix. The numbers of mismatched labels (off the diagonal) were quite high, and the lowest accuracy was 3.5%, for the ENGI subject. For coloring, if a grid had more than 20 articles of Physical Science (PS), the opacity of PS was set to 25% and the grid had a thicker PS color; however, if the grid had fewer than 20 articles of PS, the opacity of PS was 12.5% and the grid color became more transparent. Figure 5 demonstrates grids with blended colors and several gradation levels of transparency. After exploring the expert coordinates by mouse hover in the browser, coinciding areas of four subject domains were identified. In one boxed area of the example, there was one square grid with at least five experts having accumulated blended colors of HS-25% + PS-25% + SS-25%; e.g., expert(1) was labeled as SS:DECI-14. The experts were researchers from the faculties of FTE and FTIK, whose research subjects often relate to the keyword "computer science." Moreover, the grid example revealed that noticeable 2nd-tier subjects, among others, were DECI and ECON; both subjects have a strongly adjacent context to the keyword "computer science" as well. The experiments demonstrated a subject domain map with experts on top and colored areas in a cartographic-alike manner resulting from various combinations of blending and transparency. The visualization in Fig. 5 bridges mixed subject domains with the cartographic-alike approach, so users who explore it can be encouraged to strategize further collaborations.
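One simple way to realize the blending and transparency rule described above is plain alpha compositing over a white background; the sketch below is an illustration only (the compositing order and exact thresholding are assumptions), with the subject base colors taken from Table 1 of the Datasets section.

# RGB base colors of the 1st-tier subjects as listed in Table 1.
SUBJECT_RGB = {"HS": (1, 0, 0), "LS": (0, 1, 0), "PS": (0, 0, 1), "SS": (0, 0, 0)}

def grid_color(article_counts, threshold=20):
    """Blend subject colors over white; more articles of a subject means a stronger opacity."""
    r, g, b = 1.0, 1.0, 1.0                       # start from a white grid
    for subject, count in article_counts.items():
        if count == 0:
            continue
        alpha = 0.25 if count > threshold else 0.125
        cr, cg, cb = SUBJECT_RGB[subject]
        r = r * (1 - alpha) + cr * alpha          # standard alpha compositing
        g = g * (1 - alpha) + cg * alpha
        b = b * (1 - alpha) + cb * alpha
    return (r, g, b)

print(grid_color({"PS": 25, "SS": 5}))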
6 Conclusions We proposed a visual approach to display experts on a standardized subject domain map. The results of the proposed method demonstrated a visualization with a case study in the experiment. Experts of the experiment results were identified to have several related research interests as shown in blended and transparency colors on
the map. The selection of subject domains was influential in defining the base map and the labeling steps to produce better science mapping. In recent years, our government launched some systems inspired by traditional characters in our culture, such as the Science and Technology Index application (SINTA) and Akreditasi Jurnal Nasional (ARJUNA), in the Indonesian language, for maintaining a list of nationally accredited journals. The government then set some regulations to collect the scientific bibliographies of Indonesian researchers. Those activities have the final goal of increasing national research productivity, which is relevant to community problem-solving. The current work also reinforces this government objective by contributing a visual approach to science mapping that encourages user exploration of research interests. Therefore, the next tasks for the visualizing system SIPETRUK, named after another character from Indonesian folktales, relate to the recommending feature, which has not been thoroughly discussed here. Compliance with Ethical Standard Funding This work was funded by the Institute of Research and Community Service (Lembaga Penelitian dan Pengabdian Masyarakat, LPPM) Institut Teknologi Sepuluh Nopember (ITS) Surabaya with the grant number 1592/PKS/ITS/2019. Conflict of Interest The authors declare that they have no conflict of interest.
References 1. C. Zhang, Z. Li, J. Zhang, A survey on visualization for scientific literature topics. J. Vis. 21(2), 321–335 (2018) 2. A. Skupin, The world of geography: visualizing a knowledge domain with cartographic means. Proc. Natl. Acad. Sci. 101(Supplement 1), 5274–5278 (2004) 3. J. Zhang, C. Chen, J. Li, Visualizing the intellectual structure with paper-reference matrices. IEEE Trans. Vis. Comput. Graph. 15(6), 1153–1160 (2009) 4. K. Hu et al., Understanding the topic evolution of scientific literatures like an evolving city: using Google Word2Vec model and spatial autocorrelation analysis. Inf. Process. Manag. 56(4), 1185–1203 (2019) 5. F. van Ham, J.J. Van Wijk, Interactive visualization of small world graphs, in Information Visualization. 2004. INFOVIS 2004. IEEE Symposium, pp. 199–206 (2004) 6. Z. Huang, H. Chen, F. Guo, J.J. Xu, S. Wu, W.-H. Chen, Visualizing the expertise space, in Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004, p. 9 (2004) 7. P. Federico, F. Heimerl, S. Koch, S. Miksch, A survey on visual approaches for analyzing scientific literature and patents. IEEE Trans. Vis. Comput. Graph. 23(9), 2179–2198 (2017) 8. F. Heimerl, Q. Han, S. Koch, T. Ertl, CiteRivers: visual analytics of citation patterns. IEEE Trans. Vis. Comput. Graph. 22(1), 190–199 (2016) 9. T. Kurosawa, Y. Takama, Predicting researchers’ future activities using visualization system for co-authorship networks, in 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 332–339 (2011) 10. S. Latif, F. Beck, VIS author profiles: interactive descriptions of publication records combining text and visualization. IEEE Trans. Vis. Comput. Graph. 25(1), 152–161 (2019)
11. T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, CoRR, vol. abs/1301.3 (2013) 12. L. van der Maaten, G. Hinton, Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008) 13. T. Basu, C.A. Murthy, Towards enriching the quality of k-nearest neighbor rule for document classification. Int. J. Mach. Learn. Cybern. 5(6), 897–905 (2014)
An Empirical Analysis of Spatial Regression for Vegetation Monitoring Hemlata Goyal, Sunita Singhal, Chilka Sharma, and Mahaveer Punia
Abstract With the advancement of spatial and machine learning techniques, remote sensing datasets are rapidly being used in the agriculture domain. In this paper, districtwise time-series precipitation data and multi-date normalized difference vegetation index (NDVI) results of Rajasthan state are coupled with machine learning regression techniques to evaluate districtwise regression models and make out patterns of vegetation status. K-fold cross-validation has been used to measure the strength of the best regression model for the entire Rajasthan state and districtwise. The results conclude that the support vector machine regression model outperforms the others with 0.80 correlation, 0.800 RSquare, 0.040 RMSE, and 96.610% accuracy. The decision tree is the outstanding model for agriculture status in most districts of Rajasthan state with a low--medium range vegetation state. Keywords Spatial · NDVI · SPI · VCI · Regression · Vegetation
1 Introduction Spatial data is an efficient way to monitor vegetation, as it has the potential to identify the relations between hydro-meteorological precipitation, the standardized precipitation dataset, and the NDVI-based vegetation condition index dataset [1]. It allows recognizing vegetation states and patterns, correlating precipitation with satellite vegetation indices, predicting future vegetation, and refining the knowledge base of the vegetation domain.
With remote sensing, the NDVI-based VCI computes the condition or health index of the ground vegetation as a percentile [2, 3]. A time series of precipitation data for the major monsoon months of June to September (1981--2019) has been used to calculate the standardized precipitation index (SPI), together with multi-date AVHRR-NDVI-based vegetation condition index (VCI) outcomes for the July to October months (1997--2016), to make out the pattern of vegetation and, coupled with regression, to monitor the vegetation of Rajasthan [4].
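For reference, the VCI is conventionally derived from NDVI as a relative, percent-like measure (the standard formulation, stated here only as background):

\[
\mathrm{VCI} = 100 \times \frac{\mathrm{NDVI} - \mathrm{NDVI}_{\min}}{\mathrm{NDVI}_{\max} - \mathrm{NDVI}_{\min}}
\]

where \(\mathrm{NDVI}_{\min}\) and \(\mathrm{NDVI}_{\max}\) are the long-term minimum and maximum NDVI of the same pixel and period.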
2 Study Region, Materials, and Approach 2.1 Study Region The Rajasthan state consists of 33 districts. Because of the unevenness of precipitation, the SPI dataset acquires high variation with respect to vegetation patterns across all 33 districts, so the state has been selected as the study region for the regression dataset [5]. All the nominated rain gauge vector locations are depicted in Fig. 1.
Fig. 1 Rajasthan state-districtwise-rain gauge vector point location [5]
Table 1 Details of the features

Feature | Detail | Time range
VCIs | Derived from NDVI (minimum 1–maximum 100) | July–October (fortnightly)
SPIs | Derived from long-term precipitation dataset | June–September (daily)
Precipitation | District-stationwise precipitation data | June–September (daily)
2.2 Datasets and Its Characteristics The AVHRR-NDVI-based VCI dataset, in .tiff format at 4 km resolution, has been processed for the July--October months of 1997--2019. The rain gauge stationwise monthly precipitation dataset has been processed for the June--September months of 1997--2019. A set of 250 rain gauge stations provides 240 features of precipitation, SPIs, and VCI points for the entire boundary of Rajasthan. The dataset description is given in Table 1.
2.3 Data Conversion All time-series monthly precipitation data is standardized, transformed, and integrated rain gauge stationwise (253 points in 33 districts covering the entire boundary of Rajasthan). This precipitation data is a precondition for computing SPIs; SPI is computed on timescales 1 and 2 for each rain gauge station. All the data is normalized on a 4 km grid to cover the 33 districts of Rajasthan state. Table 2 lists the generalized discretization classes for precipitation, SPI, and VCI [6].
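As background on the SPI computation mentioned above (the standard McKee-style definition, not a formula taken from this paper): the accumulated precipitation \(p\) for the chosen timescale is fitted with a probability distribution, typically gamma, and its cumulative probability \(H(p)\) is mapped to the standard normal scale,

\[
\mathrm{SPI} = \Phi^{-1}\bigl(H(p)\bigr),
\]

where \(\Phi^{-1}\) is the inverse standard normal distribution function; a simpler standardized-anomaly approximation, \((p-\bar{p})/\sigma_{p}\), is also sometimes used.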
2.4 Spatial Features Measurement Spatial autocorrelation (Moran's index) and the statistical Pearson correlation have been used to select spatial features. Principal component analysis is used to reduce the number of features by grouping features with similar characteristics in the time-series dataset; it has also been used to find the temporal variability in the spatial dataset [7].
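Moran's index mentioned above has the standard definition (given here for completeness):

\[
I = \frac{N}{\sum_{i}\sum_{j} w_{ij}} \cdot
\frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i}(x_i - \bar{x})^{2}}
\]

where \(N\) is the number of rain gauge locations, \(w_{ij}\) is the spatial weight between locations \(i\) and \(j\), and \(x\) is the feature being tested.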
Table 2 Precipitation, SPI, and VCI classes for regression

Precipitation range (in between) | Precipitation class | SPIs range (in between) | SPIs class | VCIs range (in between) | VCIs class
0–50 | Extremely scanty | −3.00 to −2.51 | Extremely dry | 0–15 | Extremely low
51–100 | Scanty | −2.50 to −2.01 | Severely dry | 16–25 | Severely dry
101–150 | Moderately deficit | −2.00 to −1.51 | Moderately dry | 26–35 | Moderately dry
151–200 | Deficit | −1.50 to −1.00 | Dry | 36–45 | Dry
200–350 | Normal | −0.99 to 0.99 | Normal | 46–55 | Normal
351–400 | Excess | 1.00 to 1.49 | Wet | 56–65 | Wet
401–450 | Moderately excess | 1.50 to 1.99 | Moderately wet | 66–75 | Moderately wet
451–500 | Extremely excess | 2.00 to 2.49 | Severely wet | 76–85 | Severely wet
>500 | Abnormal | 2.50 to 3.00 | Extremely wet | 86–100 | Extremely wet
3 Methodology The approach used for this research is portrayed in Fig. 2; the sub-processing steps are represented as blocks leading up to the final results. To start, rain gauge station precipitation data for the July to October months (1997--2019) in vector format, the derived SPI data, and the related AVHRR VCIs for the Rajasthan region are used on a 4 km grid. In the next step, the missing values of precipitation and SPIs were filled in through the radial basis interpolation method, and then precipitation and SPIs were transformed onto the 4 km grid with the layered VCI data on the WGS84 spatial reference. Statistical Pearson correlation [8] and spatial autocorrelation (Moran's index) [7] were used to measure feature significance for the vegetation structure, as discussed in Sect. 2. A set of 240 features of precipitation, SPIs, and VCIs at 250 rain gauge vector locations has been processed to obtain the outcome. In order to normalize the precipitation and SPI datasets, averaging and interpolation techniques were used in the next step. Feature selection is done with PCA; the VCI dataset is a product dataset, so there is no need for noise removal. All the regression models (see Table 3) were trained and tested on the vegetation dataset with their respective default tuned parameters. Correlation, RSquare, RMSE, and accuracy measures have been applied for the selection of the best regression under K-fold cross-validation. Fig. 2 Methodology used for vegetation monitoring
[Fig. 2 flowchart: VCI images (raster) and precipitation data → SPI computation → normalization on a 4 km grid with WGS84 georeference → feature measurement → data cleaning → feature selection with PCA → training and testing of regression models → model evaluation and best-model selection]
Table 3 Spatial regression models for vegetation monitoring

Models | Methods | Packages | Default tuned parameters
Decision tree [9] | rpart | rpart | Minsplit = 20; maxdepth = 30; Minbucket = 7
Linear regression | lm | lm | None
Neural network [10] | nnet | neuralnet | Hlayers = 10; maxnwts = 10,000; maxit = 100
Random forest [11] | rf | randomForest | Ntree = 500; sampling = bagging
Support vector machine | ksvm | kernlab | Kernel: radial basis
3.1 Extraction of the Features Principal component analysis (PCA) is used to extract features that explain the variance in the data. It is used to reduce the dimensionality in order to discover unidentified trends and patterns in the high-dimensional AVHRR dataset.
3.2 Regression Models In order to accomplish spatial regression, different machine learning techniques are explored, considering that each model has unique characteristics and fits the data differently. Decision tree, linear regression, neural network, random forest, and support vector machine models were used. Table 3 gives the details of the machine learning models, which are available in the open-source software R (GNU GPL).
4 Model Selection Evaluating the above machine learning regression models on the time-series data of precipitation--SPIs--VCIs is essential. Statistical correlation, RSquare, RMSE, and accuracy are used to measure the performance of the different regression models. The accuracy measure evaluates the degree of a model's ability to correctly predict a previously unseen test instance. To evaluate the robustness of the regression models, k-fold cross-validation has been used.
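The error and goodness-of-fit measures used here follow their usual definitions (stated for completeness; the paper does not redefine them), with \(y_i\) the observed value, \(\hat{y}_i\) the prediction, and \(\bar{y}\) the mean of the observations:

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^{2}},
\qquad
R^{2} = 1 - \frac{\sum_{i}\bigl(y_i - \hat{y}_i\bigr)^{2}}{\sum_{i}\bigl(y_i - \bar{y}\bigr)^{2}}
\]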
5 Results The results of the regression prediction models have been validated and tested as shown below in Fig. 3. All five machine learning regression models have been run with their default tuning parameters under k-fold cross-validation. The computed correlation, RSquare, RMSE, and accuracy are shown in Table 4 for all the models, with the accuracy achieved on the best k-fold partition of the precipitation--SPI--VCI regression dataset. Table 4 shows that correlation, RSquare, RMSE, and accuracy are measured on K-fold training--testing partitions for the precipitation--SPI--VCI regression dataset. The obtained results confirm that the support vector machine regression model outperforms all the other models. Figure 3 shows the comparison of the models under K-fold cross-validation in terms of correlation, RSquare, RMSE, and accuracy for the precipitation--SPI model. The support vector machine performs outstandingly in all measures, with the highest accuracy of 96.61% and the minimum RMSE of 0.04 for the precipitation--SPI--VCI regression. The decision tree is the worst regression model, with 20.14% accuracy and the highest RMSE of 30.16. Fig. 3 Comparative measures for K-fold precipitation--SPI regression model
Table 4 Performance measures of regression models

Regression model | k-fold | Correlation | RSquare | RMSE | Accuracy
Decision tree | 60–40 | 0.53 | 0.33 | 30.16 | 20.14
Linear regression | 80–20 | 0.78 | 0.77 | 0.32 | 95.69
Neural network | 80–20 | 0.80 | 0.80 | 0.05 | 95.55
Random forest | 90–10 | 0.47 | 0.25 | 0.30 | 94.54
Support vector machine | 80–20 | 0.80 | 0.80 | 0.04 | 96.61
Fig. 4 Districtwise: spatial data mining regression model
The districtwise precipitation--SPI--VCI k-fold regression results, i.e., the spatial data mining algorithm that scored the highest accuracy among all models for each of the 33 districts of Rajasthan state (1997--2019), are depicted in Fig. 4. The results show that Sirohi and Dausa scored highest accuracy with the linear regression model, where vegetation is poor due to low precipitation. Churu and Hanumangarh districts scored highest accuracy with the neural network, where precipitation and vegetation conditions are extremely good owing to high precipitation. Bharatpur, Bikaner, and Alwar districts scored highest accuracy with the random forest, where vegetation is moderately green due to average precipitation. All remaining districts scored highest accuracy with the decision tree model; these cover scanty and deficit precipitation, where shrub-like vegetation occurs. In some districts, vegetation is good owing to irrigation from other sources such as ponds, canals, dams, and rivers along with precipitation.
6 Conclusion Spatial regression machine learning models have been applied to the spatial dataset of precipitation, SPIs, and VCIs to monitor the vegetation condition of Rajasthan state. The results conclude that the support vector machine outperforms all other regression models under k-fold cross-validation for the entire Rajasthan state, with 0.800 correlation, 0.800 RSquare, 0.040 RMSE, and 96.610% accuracy. For districtwise regression, it has been concluded that the decision tree and random forest models perform better in the low--medium range vegetation state, the neural network is better where vegetation is good, and linear regression is suitable where vegetation is nearly absent.
References 1. H. Goyal, C. Sharma, N. Joshi, An integrated approach of GIS and spatial data Mining in big Data. Int. J. Comput. Appl. 169(11), 1–6 (2017) 2. P.S. Thenkabail, J.G. Lyon, Hyperspectral Remote Sensing of Vegetation (CRC Press, 2016) 3. H. Goyal, C. Sharma, N. Joshi, Spatial data mining: an effective approach in agriculture sector. Int. J. Sci. Res. Rev. 6(4) (2017) 4. H. Goyal, N. Joshi, C. Sharma, An empirical analysis of geospatial classification for agriculture monitoring. Proc. Comput. Sci. 132, 1102–1112 (2018) 5. H. Goyal, C. Sharma, N. Joshi, An effective geospatial association rule approach in agriculture monitoring, in IEEE International Conference on Computing for Sustainable Global Development (BVICAM-INDIACOM 2018) (In print) 6. H. Goyal, N. Joshi, C. Sharma, Feature extraction in geospatio-temporal satellite data for vegetation monitoring, in Emerging Trends in Expert Applications and Security (Springer, Singapore, 2019), pp. 177–187 7. H. Goyal, C. Sharma, N. Joshi, Estimation of monthly rainfall using machine learning approaches, International Conference on Innovations in Control, Communication and Information Systems (ICICCI), IEEExplore Digital Library https://doi.org/10.1109/iciccis.2017.866 0837, March 2019 8. E. Frank, Y. Wang, S. Inglis, G. Holmes, I.H. Witten, Using model trees for classification. Mach. Learn. 32(1), 63–76 (1998) 9. A. Liaw, M. Wiener, Classification and regression by random forest. R News 2(3), 18–22 (2002) 10. M. Riedmiller, H. Braun, A direct adaptive method for faster backpropagation learning: The RPROP algorithm, in IEEE International Conference on Neural Nets, pp. 586–591 (1993) 11. E.M. Abdel-Rahman, F.B. Ahmed, R. Ismail, Random forest regression and spectral band selection for estimating sugarcane leaf nitrogen concentration using EO-1 Hyperion hyperspectral data. Int. J. Remote. Sens. 34(2), 712–728 (2013).
Extracting Temporal-Based Spatial Features in Imbalanced Data for Predicting Dengue Virus Transmission Arfinda Setiyoutami, Wiwik Anggraeni, Diana Purwitasari, Eko Mulyanto Yuniarno, and Mauridhi Hery Purnomo Abstract Since the movements of mosquito or human can potentially influence dengue virus transmission, recognizing location characteristics defined as spatial factors is necessary for predicting patient status. We proposed feature extraction that considers location characteristics through previous dengue cases and the high possibility of encounters between people with different backgrounds. The number of incoming populations, school buildings and population density was included as the location characteristics. Besides the information of the spatial factors, the number of dengue cases set within a particular time window was specified for virus transmission period. Our experiments obtained two datasets of dengue fever which were patient registry and location characteristics of Malang Regency. Manually recorded Registry Data only contained positive group data and not the negative group when the patients were healthy. Thus, the proposed extraction method also included the process of generating negative data from the existing positive data. Then, we preprocessed the data by cleaning, imputing, encoding, and merging, such that there were four A. Setiyoutami (B) · W. Anggraeni · D. Purwitasari · E. M. Yuniarno · M. H. Purnomo Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia e-mail: [email protected] W. Anggraeni e-mail: [email protected] D. Purwitasari e-mail: [email protected] E. M. Yuniarno e-mail: [email protected] M. H. Purnomo e-mail: [email protected] W. Anggraeni Department of Information Systems, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia D. Purwitasari Department of Informatics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia E. M. Yuniarno · M. H. Purnomo Department of Computer Engineering, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_65
features representing previous dengue cases and eight features describing location characteristics. The experiments demonstrated that using a subset of ranked features gave a better accuracy of 78.7% compared to using all features. Temporal-based features showed better performance, and the results improved further in wider locations where people meet. Keywords Imbalanced data · Temporal-based spatial features · Location characteristic · Predicting dengue virus transmission
1 Introduction
Dengue virus (DENV) is among the most important and most geographically widespread arboviruses (arthropod-borne viruses) causing human disease. Dengue fever is not only found in impoverished areas, but also affects prosperous neighborhoods in tropical and subtropical countries [1]. Dengue cases have been reported in several countries, such as Brazil [2], Bangladesh [3], Cambodia [4], Nigeria [5], and Indonesia, the focus of the current work [6]. Dengue virus has many manifestations of different severity, such as dengue fever (DF), dengue hemorrhagic fever (DHF) with plasma leakage, and DHF with shock, or dengue shock syndrome (DSS). Indonesia has been reported to have significant DHF cases in the Asia Pacific, and Malang Regency in East Java, one of the dengue-endemic areas, has the second highest number of dengue fever cases among the provinces. Therefore, we studied dengue cases in Malang Regency, building on our previous studies of location characteristics such as the number of dengue fever cases in neighboring community health centers [6], as well as natural elements of altitude [7], weather, and climate [8]. Factors used in other studies of dengue virus transmission include environmental factors in Yogyakarta [9], socio-economic and census factors in Sri Lanka [10], population density and mosquito populations in Colombia [11], and spatial features with entomological data in Thai villages [12]. Dengue virus spreads through the bite of the Aedes aegypti mosquito, which takes the blood of an infected person and then transmits the virus while biting a healthy person. As the movements of the mosquito vector are very restricted, the infection spreads through infectious human hosts. Therefore, it is necessary for a dengue prediction system to account for the location characteristics that can potentially become virus transmission places. Such a system can help people be more aware of their surroundings and minimize the possibility of dengue virus spread. Because a person could be a carrier when moving between locations visited daily [13], we consider that time and place influence virus transmission. This research focuses on extraction and selection methods for features related to spatial factors, together with the temporal issues that are significant in the transmission process.
2 Data Preprocessing
Our research obtained Registry Data of dengue fever patients from community health centers at the sub-district level, called Puskesmas, in Malang Regency from 2016 to mid-2019. There were 39 Puskesmas in 33 sub-districts, with one sub-district generally consisting of 6–18 villages, giving 390 villages altogether. Since people from different villages within one sub-district go to the same Puskesmas for health consultation, we hypothesized that these locations influence virus transmission. We also used location characteristics data related to geographical position, population, health, education, trade, and tourism from 2015 to 2017, which we call Location Data. Registry Data was manually recorded and only contained patient data, which became the positive group data. However, a prediction model requires both positive and negative data. This section explains how the prediction dataset was constructed from both sources, Registry Data and Location Data, as shown in Fig. 1.
2.1 Cleaning and Imputing Data Data preprocessing included removing duplicate entries and deleting irrelevant attributes due to manual record process by the staffs of Puskesmas. Inaccurate or missing data such as values of sick date, hematocrit and platelet count that exceed 100 were corrected in the data cleaning. Because of data incompleteness in Registry Data and Location Data, imputation method to fill in missing values was applied by estimating with model-based imputer from several relevant values as shown in Table 1. For example, either parameter value of P4, P5, or P6 could be estimated using the combination of those three values with age, gender, diagnosis, as well as discharged status. The listed parameters in Table 1 were employed based on previous study which correlated age, sex, and severity with admission time of dengue fever patients [14]. The admission process also considered platelet and hematocrit as the criteria for immediate referral to hospitals [15]. Another study stated platelet count, minimum hematocrit, and maximum hematocrit were required prior to the sick date imputation [16]. Noted that, we also standardized age value into year-based unit to accommodate age grouping when observing its influence in predicting patient
Fig. 1 Proposed extraction method to generate temporal-based spatial features (flowchart): Registry Data and Location Data → T1-Cleaning, imputation, and encoding raw data → Infected Data → T2-Generate features of previous dengue cases → T3-Generate negative data combined with features from Location Data and T4-Generate positive data combined with features from Location Data → T5-Rank features to reduce dimensions → T6-Predict dengue virus transmission
Table 1 Imputation of raw parameter values for Registry Data
Value code | Parameter value | Data | Values used in imputation
P1 | Days to hospitalization | Registry Data (possibly missing values) | P4, P5, P6, age, gender, diagnosis, discharged status
P2 | Sick date | Calculated from P3–P1 | No imputation
P3 | Hospitalization date | P2–P1 | No imputation
P4 | Platelet count | Registry Data (possibly missing values) | Use either the combination with P4 to P6, also age, gender, diagnosis, discharged status
P5 | Minimum hematocrit | Registry Data (possibly missing values) | Use either the combination with P4 to P6, also age, gender, diagnosis, discharged status
P6 | Maximum hematocrit | Registry Data (possibly missing values) | Use either the combination with P4 to P6, also age, gender, diagnosis, discharged status
status. Those essential values then became four features of Registry Data in the prediction system. Imputation was also needed for Location Data because not all sub-districts regularly conducted regional and population surveys. To resolve missing values in the number of incoming populations, population density, number of school buildings, number of traditional markets, and number of tourism sites, we again applied a model-based imputer using all of those values grouped by village name and year. In addition to those five parameter values, there were three other location characteristic features: geographic position, position toward forest, and topography [17]. Thus, eight features from Location Data supported the prediction.
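The paper does not name the model-based imputer that was used; as one hedged illustration, scikit-learn's IterativeImputer reproduces the idea of estimating each missing value from the other listed parameters. The column names below are illustrative, not taken from the original dataset.

```python
# Illustrative sketch only: estimate missing registry values from the other
# relevant columns, as described for Table 1. IterativeImputer regresses each
# column with missing values on the remaining columns.
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

registry = pd.DataFrame({
    "days_to_hospitalization": [3, None, 5, 2],
    "platelet_count":          [90, 120, None, 150],
    "min_hematocrit":          [38.0, None, 41.0, 36.5],
    "max_hematocrit":          [45.0, 44.0, None, 40.0],
    "age":                     [12, 34, 7, 51],
    "gender":                  [0, 1, 1, 0],   # already encoded
    "diagnosis":               [1, 2, 1, 3],   # already encoded
    "discharged_status":       [1, 1, 0, 1],   # already encoded
})

imputer = IterativeImputer(random_state=0)
registry_imputed = pd.DataFrame(imputer.fit_transform(registry), columns=registry.columns)
print(registry_imputed.round(1))
```

The same grouped, model-based idea can be applied to the five Location Data counts described above, fitting the imputer per village and year group.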
2.2 Categorical Data Encoding
Some parameter values in Registry Data and Location Data were encoded into numerical values as shown in Table 2, i.e., age grouping [18] and population data, the latter derived from a regulation on urban and rural area classification released by Statistics Indonesia (the Central Agency on Statistics). Other encoded feature values, shown in Table 3, were geographic position, topography, and position toward forest.
Table 2 Encoding values based on range for dengue features
Feature | Data | Range values for encoding
Age = {1, …, 5} | Registry | Encoded by age-group range (e.g., 5–14 years, …, ≥ 45 years)
Population density = {1, …, 8} | Location | Encoded by population-density range (top range > 8500 people)
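The exact cut points of Table 2 are not reproduced above; the short sketch below therefore uses placeholder bins only to illustrate the range-based encoding idea, with pandas.cut assigning each value an ordinal code.

```python
# Sketch of range-based encoding as in Sect. 2.2. The bin edges are
# illustrative placeholders, not the authors' published cut points.
import pandas as pd

age_years = pd.Series([2, 9, 17, 40, 63])
density = pd.Series([450, 1200, 3100, 9000, 15000])   # people per km2

# Age = {1, ..., 5}: ordinal code per age group
age_code = pd.cut(age_years, bins=[0, 4, 14, 24, 44, 200],
                  labels=[1, 2, 3, 4, 5]).astype(int)

# Population density = {1, ..., 8}: ordinal code per density range (> 8500 in top bin)
density_code = pd.cut(density,
                      bins=[0, 500, 1250, 2500, 4000, 5500, 7000, 8500, float("inf")],
                      labels=list(range(1, 9))).astype(int)

print(pd.DataFrame({"age": age_years, "age_code": age_code,
                    "density": density, "density_code": density_code}))
```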
Table 3 Extraction results of temporal-based spatial features
Feature | Feature name | Value | Score | Rank | Data
F1 | Age | Years old | 0.055 | 6 | Registry
F2 | Gender | Male; female | 0.015 | 11 | Registry
F3 | Prev. cases in Puskesmas | Numbers | 0.258 | 1 | Registry
F4 | Prev. cases in village | Numbers | 0.083 | 4 | Registry
F5 | Incoming populations | Numbers | 0.189 | 2 | Location
F6 | School buildings | Numbers | 0.151 | 3 | Location
F7 | Tourism sites | Numbers | 0.054 | 7 | Location
F8 | Traditional markets | Numbers | 0.043 | 8 | Location
F9 | Population density | Numbers/km2 | 0.083 | 5 | Location
F10 | Geographic position | 1: hills; 2: flat | 0.023 | 9 | Location
F11 | Topography | 1: valley; 2: beach; 3: slope; 4: plain | 0.008 | 12 | Location
F12 | Position toward forest | 1: inside; 2: edge; 3: outside | 0.018 | 10 | Location
Target | Sick status | Positive; Negative | – | – | –
The encoded values of these three features were assigned so that the category that appears most frequently receives the highest numerical value. For example, sick patients were most commonly found to live in plains according to the F11-Topography data and in flat areas according to F10-Geographic position.
3 Methodology for Extracting Temporal-Based Spatial Features
3.1 Data Definition
One instance in Registry Data represents one sick person, i.e., a patient who was infected with dengue fever. After process T1 in Fig. 1, there were 4294 rows of patient data and 11 columns or parameter values: number, name, age, gender, sub-district name, village name, Puskesmas name, sick date, hospitalization date, diagnosis, and discharged status. Location Data contained 1170 rows, where each instance represents one village, with 11 columns containing the location characteristics of year, sub-district name, village name, number of incoming populations, number of school buildings, number of tourism sites, number of traditional markets, population density, geographic position, topography, and position toward forest. The prediction system, defined as a classification problem for determining patient status, requires positive data to describe sick patients and negative data to describe the patients
who are healthy before the sick period. Each instance used in the prediction problem has 12 features and one target label for patient status with a positive or negative value (as shown in Table 3). As mentioned before, our datasets only had positive data since they were collected from the manual records of Puskesmas. Therefore, the extraction process also generates negative data based on the existing positive data. The positive data x_i^+ come from the period of year_n, x_i^+ ← f(D_n), while the negative data x_i^- are extracted from the period of year_{n-1}, x_i^- ← f(D_{n-1}). A previous study stated that the time interval between the first and second dengue infections varies between 1.6 and 2.6 years [19]. Hence, it is acceptable to hypothesize that a patient was in a healthy condition one year before being infected. For example, if the sick date of a patient x_i^+ was on February 9, 2018, all values of F1–F12 are obtained from year_n = 2018. Then, to overcome the imbalanced condition, we augment one instance representing negative data x_i^- whose values of F1–F12 are obtained from year_{n-1} = 2017.
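A minimal pandas sketch of this negative-data generation step is shown below. The data-frame, column, and function names are illustrative assumptions rather than the authors' implementation, and for brevity only the Location Data features are shifted to year n−1; the registry-derived features would be recomputed analogously.

```python
# Sketch: for every positive case in year n, create one negative instance
# whose location features are taken from year n-1, when the same person is
# assumed to be healthy.
import pandas as pd

def build_prediction_data(positive: pd.DataFrame, location: pd.DataFrame) -> pd.DataFrame:
    """positive: one row per dengue patient with 'village', 'year' and registry features.
    location: yearly location characteristics per village (F5..F12)."""
    pos = positive.merge(location, on=["village", "year"], how="left")
    pos["status"] = 1                      # positive label (sick in year n)

    neg = positive.copy()
    neg["year"] = neg["year"] - 1          # shift to the assumed healthy year n-1
    neg = neg.merge(location, on=["village", "year"], how="left")
    neg["status"] = 0                      # negative label

    return pd.concat([pos, neg], ignore_index=True)
```

Because every positive case contributes exactly one shifted negative instance, the resulting dataset is balanced by construction.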
3.2 Feature F3–F4 of Previous Dengue Cases in Puskesmas and Village
The duration of human DENV infectiousness is characterized by the following periods [20]:
• Intrinsic incubation period (IIP) [21]: the time between being infected and the onset of symptoms due to the infection, which lasts for 3–8 days.
• Illness/fever period (IFP): the time when the infected person becomes viremic or infectious, which lasts for 1–8 days.
• Extrinsic incubation period (EIP): the time it takes for a virus to be transmitted by a mosquito after consuming an infected blood meal [22], which lasts for 8–12 days.
Here, we determined the dengue virus transmission period, as described in Fig. 2, by summing the numbers of days from the periods IIP → IFP → EIP in one transmission cycle.
Fig. 2 Mechanism to define the proposed value for the dengue virus transmission period. The timeline spans Day −2 to Day 25: an infected mosquito bites a healthy person, the symptoms onset 3–8 days after the bite, the period of illness lasts 1–8 days after the symptoms onset, a mosquito then bites the infectious person and becomes infectious after the extrinsic incubation period (8–12 days), giving the earliest possibility of other people getting infected on Day 9 and the latest on Day 25.
At the earliest, other people could get infected on the 9th day after the first infected person got sick: considering IIP → IFP, the earliest possible day was calculated as the number of days between Day −2 and Day 7. At the latest, based on the consecutive periods IIP → IFP → EIP, the value fell on the 25th day, the number of days between Day −2 and Day 23. The first infected person can therefore be considered the infection source or virus carrier. Suppose an infected mosquito bit a healthy person (p-A) on July 1st and the symptoms would onset between July 3rd and July 8th. Hence, p-A became infectious from July 3rd to July 15th. The following conditions then represent the most likely scenarios for virus transmission:
• If a mosquito vector bit p-A on July 3rd, at the earliest the mosquito became infectious after the EIP on July 10th.
– If that mosquito bit a healthy person (p-B) on July 10th, at the earliest p-B became sick on July 12th, or 9 days after p-A got sick.
• However, if the mosquito vector bit p-A on July 15th, at the latest the mosquito became infectious after the EIP on July 26th.
– After biting p-B on the same day, at the latest p-B started to get sick on July 28th, or 25 days after p-A got sick.
Thus, we set the dengue virus transmission period between 9 and 25 days and used the features in Table 3 to explain how the virus transmits in a Puskesmas (F3) and in a village (F4); the steps to generate the temporal-based spatial features are listed in Table 4. Before computing the values of F3–F4, Infected Data was copied from Registry Data with four columns: sub-district name, village name, Puskesmas name, and sick date. The values of F3–F4 were then computed as summations over Infected Data after grouping the data by the names of the village and the Puskesmas. The following descriptions refer to Table 4: Line-3 and Line-4 are constant values from Registry Data for both positive and negative data; Line-5 to Line-8 are summation processes based on grouping by Puskesmas or village with a different year from Infected Data; Line-9 to Line-18 concern location characteristics from Location Data, with the observation year differing between positive and negative data; whereas Line-19 to Line-21 are constant values of Location Data with no dependency on positive or negative status.
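The pseudocode of Table 4 is not reproduced here; the short sketch below only illustrates the F3/F4 computation it describes, counting earlier cases whose sick dates fall 9–25 days before the current case within the same Puskesmas or village. The data-frame and column names are assumptions made for the illustration.

```python
# Sketch of the F3/F4 extraction: count previous cases in the 9-25 day
# transmission window, grouped by Puskesmas (F3) or village (F4).
import pandas as pd

TRANSMISSION_WINDOW = (9, 25)   # days, from IIP + IFP and IIP + IFP + EIP

def previous_cases(infected: pd.DataFrame, case: pd.Series, by: str) -> int:
    """infected: Infected Data with 'puskesmas', 'village', 'sick_date' columns."""
    lo, hi = TRANSMISSION_WINDOW
    gap = (case["sick_date"] - infected["sick_date"]).dt.days   # days since earlier cases
    in_window = gap.between(lo, hi)
    same_place = infected[by] == case[by]
    return int((in_window & same_place).sum())

# Usage for one patient record `row` of the prediction data:
# row["F3"] = previous_cases(infected_data, row, by="puskesmas")
# row["F4"] = previous_cases(infected_data, row, by="village")
```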
4 Results and Discussions
For the experiments, the Prediction Data consisted of 5458 rows with 12 columns drawn from Registry Data and Location Data. To reduce the dimensionality of the prediction task, we applied Random Forest to rank the features [23]; the resulting prediction performances are shown in Table 5. Generating the complete Prediction Data with positive and negative data required Registry Data from the year before each observation year,
Table 4 Pseudocode for extracting temporal-based spatial features
Table 5 Accuracy comparison with ranked features. Ten scenarios combine subsets of the top-ranked features (F3, F5, F6, F4, F9) or all features (F1–F12) with 60%, 80%, or 90% training data, and report the accuracy of the Random Forest, Naïve Bayes, and neural network classifiers (values ranging roughly from 64.7% to 78.7%).
e.g., the earliest records, from 2015, were utilized to support testing and training for 2016. Therefore, even though there were 4294 negative data points starting from 2015, the complete Prediction Data had fewer instances, 5458, starting in 2016. The experiments set several percentages for allocating training and testing data. With all features used in the prediction, 80% training data gave the better accuracy performance. Therefore, the next empirical experiments used the same training-data threshold but employed features ranked by the Random Forest method. As shown in Table 3, the five features with ranked score ≥ 0.1 were F3 (previous cases in Puskesmas), F5 (incoming populations), F6 (school buildings), F4 (previous cases in village), and F9 (population density). The scenarios in Table 5 then display the performances for combinations of these features; e.g., scenario 1 used F5–F6–F4–F9. Table 5 shows how important location characteristics are in predicting patient status, as the highest accuracy was produced by including features F5, F6, and F9 in the prediction. Along with previous dengue cases in the Puskesmas, the location characteristics evaluated as significant factors for dengue virus transmission included the number of incoming populations, reflecting the effect of migration; such human movement can result in a high possibility of disease spreading, as people who migrate may be carriers of some viruses [24]. Dengue virus transmission can also happen in schools, because Aedes aegypti mosquitoes bite mainly during the day and mosquito breeding sites may exist inside schools [25]. The other significant factor is population density, since virus transmission is more likely to occur in locations with denser populations owing to the low mobility of the mosquito vector. Table 5 also implies that previous cases in the Puskesmas (F3) had more influence on the prediction results than previous cases in the village, since scenario 3 had higher accuracy than scenario 1. The overall performances showed that the Random Forest (RF) classifier gave the better prediction model, one that does not depend as directly on the probabilistic values of the existing data as the Naïve Bayes results do. Since the Neural Network results also gave lower accuracy values, the experiments focused on RF. Figure 3 shows the Receiver Operating Characteristic (ROC) curves of mean true positive (TP) rate, comparing the prediction performances using features F3 + F5 + F6 + F9, F3 + F5 + F6, and all features, where the TP rate is plotted against the false positive (FP) rate. The TP rate, or sensitivity, is the proportion of sick people correctly identified as having dengue fever, while the FP rate, or 1-specificity, is the probability of healthy people being misidentified as infected with dengue virus. Based on the ROC curve comparison, the prediction with features F3 + F5 + F6 + F9 overall had the best performance compared to the other two feature sets, as it had the highest TP rate. We also analyzed the causes of false negative (FN) and false positive (FP) results in the prediction. FN results mostly occurred when the number of dengue cases in the Puskesmas was low, between zero and ten cases, while FP results were more likely in denser populations (>8500/km2).
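The ranking-and-evaluation loop described above can be sketched as follows. This is not the authors' code; a synthetic data frame stands in for the real 5458-row Prediction Data, and the feature names are the F1–F12 labels of Table 3.

```python
# Illustrative sketch of feature ranking with Random Forest importances and
# accuracy comparison between all features and a top-ranked subset.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

features = [f"F{i}" for i in range(1, 13)]
rng = np.random.default_rng(0)
prediction_data = pd.DataFrame(rng.random((500, 12)), columns=features)  # synthetic stand-in
prediction_data["status"] = rng.integers(0, 2, 500)

X, y = prediction_data[features], prediction_data["status"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
ranking = pd.Series(rf.feature_importances_, index=features).sort_values(ascending=False)
top = ranking.head(5).index.tolist()   # the paper keeps the five top-ranked features

rf_top = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr[top], y_tr)
print("all features :", round(accuracy_score(y_te, rf.predict(X_te)), 3))
print("top features :", round(accuracy_score(y_te, rf_top.predict(X_te)), 3))
```

On the real data the paper reports that the ranked subset reaches 78.7% accuracy, higher than using all twelve features.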
Fig. 3 ROC curve to compare the performances of prediction with different sets of features: all features, F3 + F5 + F6 + F9, and F3 + F5 + F6 (F3: prev. cases in Puskesmas, F5: incoming populations, F6: school buildings, F9: population density); TP rate (sensitivity) is plotted against FP rate (1-specificity).
5 Conclusions
We have demonstrated the correlation of spatial features, defined as location characteristics, with dengue virus transmission. Since infected humans become virus carriers, they may transmit the virus while moving between certain areas. In this paper, we extracted the dengue virus transmission period in nearby locations, categorized as community health centers (Puskesmas in Indonesian) and villages, as part of the spatial features. Since a Puskesmas operates at the sub-district level, it covers a larger area than a village. The extraction also included information on places where people from different social backgrounds often meet, represented by incoming populations, school buildings, and population density. Based on the extracted temporal-based spatial features, the prediction results were tied to the locations; for example, dengue cases transmitted within the same Puskesmas were more likely than within the same village. Therefore, as part of an early warning system, the government can carefully maintain environmental conditions at Puskesmas to minimize the possibility of dengue virus transmission. For future work, the dengue virus transmission period and the spatial factors that are significant to dengue virus transmission can be used to define a dengue epidemic model that detects the source of dengue infection, specifically the humans who spread the virus.
Acknowledgements This research was supported by the Indonesia Endowment Fund for Education (LPDP) from the Ministry of Finance under the Indonesian Education Scholarship for Master Program 2018 granted to Arfinda Setiyoutami with registry number 201812110113706.
References 1. D.J. Gubler, The global emergence/resurgence of arboviral diseases as public health problems. Arch. Med. Res. 33(4), 330–342 (2002) 2. A. Maitra, A.S. Cunha-Machado, A. de Souza Leandro, F.M. da Costa, V.M. Scarpassa, Exploring deeper genetic structures: Aedes aegypti in Brazil. Acta Trop. 195, 68–77 (2019) 3. J. Ledien et al., An algorithm applied to national surveillance data for the early detection of major dengue outbreaks in Cambodia. PLoS ONE 14(2), 1–11 (2019) 4. T. Shirin et al., Largest dengue outbreak of the decade with high fatality may be due to reemergence of DEN-3 serotype in Dhaka, Bangladesh, necessitating immediate public health attention. New Microbes New Infect. 29, 100511 (2019) 5. A.H. Fagbami, A.B. Onoja, Dengue haemorrhagic fever: An emerging disease in Nigeria, West Africa. J. Infect. Public Health 11(6), 757–762 (2018) 6. W. Anggraeni et al., Artificial neural network for health data forecasting, case study: number of dengue hemorrhagic fever cases in Malang Regency, Indonesia, in 2018 International Conference on Electrical Engineering and Computer Science (ICECOS), 2018, pp. 207–212 7. W. Anggraeni, I. Putu Agus Aditya Pramana, F. Samopa, E. Riksakomara, R. Pwibowo, L.T. Condro, Forecasting The Number Of Dengue Fever Cases In Malang Regency Indonesia using fuzzy inference system models. J. Theor. Appl. Inf. Technol. 15(1) (2017) 8. W. Anggraeni et al., Modified regression approach for predicting number of dengue fever incidents in Malang Indonesia. Proc. Comput. Sci. 124, 142–150 (2017) 9. T.W. Kesetyaningsih, S. Andarini, Sudarto, H. Pramoedyo, Determination of environmental factors affecting dengue incidence in Sleman District, Yogyakarta, Indonesia. African J. Infect. Dis. (2018) 10. P. Siriyasatien, S. Chadsuthi, K. Jampachaisri, K. Kesorn, Dengue epidemics prediction: a survey of the state-of-the-art based on data science processes. IEEE Access 6, 53757–53795 (2018) 11. L.D. Piedrahita et al., Risk factors associated with dengue transmission and spatial distribution of high Seroprevalence in schoolchildren from the urban area of Medellin, Colombia. Can. J. Infect. Dis. Med. Microbiol. J. Can. des Mal. Infect. la Microbiol. medicale 2018, p. 2308095 (2018) 12. M.P. Mammen Jr. et al., Spatial and temporal clustering of dengue virus transmission in Thai Villages. PLOS Med. 5(11), 1–12 (2008) 13. J.Y. Kang, J. Aldstadt, The influence of spatial configuration of residential area and vector populations on dengue incidence patterns in an individual-level transmission model. Int. J. Environ. Res. Public Health (2017) 14. J.E. Abello, J. Gil Cuesta, B.R. Cerro, D. Guha-Sapir, Factors associated with the time of admission among notified dengue fever cases in region VIII Philippines from 2008 to 2014. PLoS Negl. Trop. Dis. (2016) 15. L.W. Ang et al., A 15-year review of dengue hospitalizations in Singapore: Reducing admissions without adverse consequences, 2003 to 2017. PLoS Negl. Trop. Dis. 13(5), 1–13 (2019) 16. V. Wiwanitkit, P. Manusvanich, Can hematocrit and platelet determination on admission predict shock in hospitalized children with dengue hemorrhagic fever? A clinical observation from a small outbreak. Clin. Appl. Thromb. 10(1), 65–67 (2004) 17. J. Franklin, Predictive vegetation mapping: geographic modelling of biospatial patterns in relation to environmental gradients. Prog. Phys. Geogr. Earth Environ. 19(4), 474–499 (1995) 18. E.T. Suryani, Profile of dengue high fever in Blitar City at 2015–2017. J. Berk. Epidemiol. (2018) 19. K.B. 
Anderson et al., A shorter time interval between first and second dengue infections is associated with protection from clinical illness in a school-based cohort in Thailand. J. Infect. Dis. 209(3), 360–368 (2014) 20. L.B. Carrington, C.P. Simmons, Human to mosquito transmission of dengue viruses. Front. Immunol. 5, 290 (2014)
21. M. Chan, M.A. Johansson, The incubation periods of dengue viruses. PLoS ONE 7(11), 1–7 (2012) 22. Y.H. Ye et al., Evolutionary potential of the extrinsic incubation period of dengue virus in Aedes aegypti. Evolution (N. Y). 70(11), 2459–2469 (2016) 23. Y.C. Shiao, W.H. Chung, R.C. Chen, Using SVM and Random forest for different features selection in predicting bike rental amount, in 2018 9th International Conference on Awareness Science and Technology (iCAST), 2018, pp. 1–5 24. G.A. Williams et al., Measles among migrants in the European Union and the European Economic Area. Scand. J. Public Health 44(1), 6–13 (2016) 25. P. Ratanawong et al., Spatial variations in dengue transmission in schools in Thailand. PLoS ONE 11(9), 1–16 (2016)
The Application of Medical and Health Informatics Among the Malaysian Medical Tourism Hospital: A Preliminary Study Hazila Timan, Nazri Kama, Rasimah Che Mohd Yusoff, and Mazlan Ali
Abstract Medical and health informatics has proven to have a significant impact on the healthcare industry worldwide. Various medical and health informatics applications are available and show potential for implementation in supporting the healthcare industry. Even though multiple studies have previously been conducted on the roles of medical and health informatics in health care, little research is available on the role of medical and health informatics in medical tourism, specifically in Malaysia. Therefore, this study aimed to investigate and provide preliminary findings on the medical and health informatics applications used by Malaysian medical tourism hospitals. The study focuses only on medical and health informatics applications that are accessible on the official websites of medical tourism hospitals. A total of 21 hospitals registered with the Malaysian Health Tourism Council (MHTC) with “Elite Partner Membership” status were selected for sampling. The analysis found that most of the applications are utilized to enhance communication between users and the medical tourism hospitals. Nine (9) types of applications were identified: the website itself, followed by social network links, LiveChat, appointment booking, find a doctor, find a service, inquiry form, mobile applications, and finally a virtual tour. The level of implementation was found to be average and needs further progressive development so that the high potential and benefits of medical/health informatics are fully utilized for the medical tourism industry.
H. Timan (B) · N. Kama · R. C. M. Yusoff · M. Ali Razak Faculty of Technology and Advanced Informatics, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia e-mail: [email protected] N. Kama e-mail: [email protected] R. C. M. Yusoff e-mail: [email protected] M. Ali e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_66
Keywords Medical and health informatics · Medical tourism · Medical and health informatics application · Medical tourism hospital
1 Introduction The advent of information technology and information system revolutionized the healthcare industry since the last decade as early as in the 1950 s. As an impact, the healthcare ecosystems on the Internet are also facing rapid and explosive digital transformation enabling the transfer and communications of healthcare data as well as the delivery of services and applications between patients and healthcare providers [6]. The terms medical informatics and health informatics have become the topic of discussion for many years and numerous numbers of scholars carried out various definitions in explaining both terms. At the same time, the field of medical tourism is also progressing well aggressively every year, in-line with the advancement of medical and health informatics. There is no doubt that medical and health informatics is one of the main factors that influence the development of the medical tourism industry. From this perspective, the combination of health, information technology, and information science shaped a new way of developing the healthcare industry in general and specifically in the context of medical tourism. Besides, telecommunication and information play an essential role in the establishment of a robust arrangement to improve the political, social, and legal matters in the niche of medical tourism. Unfortunately, current literature shows a minimal study that has been conducted to investigate the user of health and medical informatics in the medical tourism industry, especially to create engagement with their patients through the Customer Relationship Management. Therefore, this study will focus on preliminary investigation toward the application of medical informatics among medical tourism hospital in Malaysia, uniquely accessible online through their official website related to Health Consumer Informatics. Therefore, this study will identify: 1. What are the categories’ of medical and health informatics application accessible for users on the websites of medical tourism hospitals in Malaysia? 2. What is the level of implementation for medical and health informatics application among medical tourism hospitals in Malaysia that are accessible through the website? 3. What is the purpose of implementing medical and health informatics application on the websites of medical tourism hospitals in Malaysia?
2 Medical and Health Informatics
2.1 What Is Medical and Health Informatics
The United States National Institutes of Health defines the term “medical informatics” as the information science that applies computers to various fields of healthcare and medicine, as well as to the analysis and propagation of medical data [7]. Another definition, by Hersh et al. [4], suggests that medical informatics is best viewed as services that help clinicians through the implementation of informatics applications. Medical informatics is considered a science that addresses how best to use information to improve health care. It is an interdisciplinary field of information science, computing, and healthcare dealing with the resources, equipment, and methods needed to expand the acquisition, storage, retrieval, and use of health and biomedical information [13]. According to Paul et al. [12], the integration of Medical Science and Information Science has brought into existence Medical Information Science, also referred to as health informatics or medical informatics. It deals mainly with the application of tools, techniques, and technologies in the healthcare system and among its stakeholders, including hospitals, clinical centers, nursing homes, diagnostic centers, and pharmaceutical labs. This has transformed health care into an information-based science in which all data and information related to health care should be treated carefully and systematically to ensure efficiency in the delivery of treatment and services, especially for communication. Health informatics, also known as healthcare informatics, biomedical informatics, or medical informatics, enables information to be put together, managed, used, and shared, and supports the promotion and delivery of healthcare services by utilizing the power of knowledge, skills, and tools [10]. A prior study by Rina et al. [14] provides a further understanding of “medical informatics”: it is a field that comprises the cognitive, information-processing, and communication tasks of medical practice, education, and research, and involves acquiring, storing, and using information in health care with the help of information technology, since most clinical routines require gathering, synthesizing, and acting on information. The rapid development of medical informatics has had a massive impact on the medical and health industry. Various healthcare informatics applications have been used in the Malaysian public healthcare system, as mentioned by Rina et al. [14], including telemedicine, clinical information systems, and information support for laboratory, radiology, and pharmacy. Earlier, various major ICT projects were initiated in Malaysia, including the Multimedia Super Corridor (MSC), the Telemedicine Blueprint (subsequently named telehealth), Clinical Decision Support Systems (CDSS), In-Patient and Out-Patient Management Systems, Laboratory Information Systems (LIS), and Health Management Information Systems (HMIS). However, little attention has been given by scholars to their roles
of supporting the medical tourism industry, especially to reach out to the medical tourist from abroad despite the comprehensive implementation of medical and health informatics.
2.2 Medical/Health Informatics and Medical Tourism
A group of researchers [13] investigated the implementation of medical and health informatics applications for the medical tourism industry in Iran. They examined the potential of health informatics applications in medical tourism and their ability to serve as an initiative for integration, interoperability, and knowledge extraction tasks at the low, middle, and high levels of decision-making. Based on their report, six (6) types of medical and health informatics application are suggested to have high potential for implementation to boost the field of medical tourism, as explained in Table 1. Meanwhile, a study by Hong [5] focused on the potential of telemedicine platforms for medical tourism, which were found not only to save patients the hassle of recognizing and associating themselves with the right health service provider but also to diminish language and cultural barriers. On top of that, as the number of medical tourists keeps increasing, a holistic e-hospital can serve as an efficient platform to entice more medical tourists. Telemedicine helps to reduce the barriers that exist between medical tourists and medical tourism providers.
3 Methodology
Studies on health informatics applications in the medical tourism industry are very few in number. Thus, there is a need to conduct more studies in this field. For this study, the list of medical tourism hospitals registered with the Malaysian Health Tourism Council (MHTC) with “Elite Partner Membership” status was selected for investigation. The “Elite Partner Membership” is associated with accreditation given by international healthcare accreditation agencies related to medical and health services and industries, such as the Joint Commission International (JCI), the Malaysian Society for Quality in Health (MSQH), the Australian Council on Healthcare Standards (ACHS), Accreditation Canada, and the CHKS Accreditation Unit (UK). The “Elite Partner Membership” is aimed at the most prestigious private healthcare services and facilities. This study was conducted qualitatively by evaluating the web content of the medical tourism hospitals through observation, focused on the availability of their medical and health informatics applications. The selected medical tourism hospital websites were investigated between 12th and 14th July 2019. Firstly, the list of medical tourism hospitals was downloaded from the official website http://www.mhtc.my owned by MHTC, which
Table 1 Potential medical and health informatics applications for medical tourism [13]
Medical informatics application | Description
Electronic Health Card | Mostly adopted by the European Union countries; contains information in two (2) categories, administrative and medical, aimed at supporting the patients’ decision-making
Telemedicine | A medical tourist is able to obtain legal information on their health status by teleconsultation with the physician and select the best treatment options before starting their medical travel plan
Virtual Social Network | A virtual medical tourism community; provides communication among medical tourists and healthcare providers through web forums, social groups and channels, question-and-answer systems, and group instant messaging
Electronic Health Record (EHR) | Developed based on technological and informational infrastructure from the medical perspective; increases the capability of medical tourism facilities to attract medical tourists from developed countries, whereby their personal medical and health information is accessible globally over the network
Data Mining and Knowledge Discovery | A process of analyzing large data sets to survey and extract previously unidentified patterns, styles, and relationships to produce knowledge for decision-making, used together with analytics tools
Medical Tourism Recommender Systems (RS) and Decision Support Systems (DSS) | Tools for reducing information overload and providing travel recommendations to tourists, appropriate in many domains through various functionalities and platforms such as web-based, mobile, and intelligent DSS, ranking systems, scheduling systems, routing applications, and positioning systems
was the official body responsible for the planning, development, and monitoring of the medical tourism industry in Malaysia. Subsequently, each Uniform Resource Locator (URL) for the hospitals was identified and visited to ensure that all links were accessible. Each of the websites was then visited to determine the types of medical and health informatics application available and accessible online. Next, each result was recorded and tabulated in a table for analysis using Microsoft Excel. A total of twenty-one (21) hospitals were selected for this study, as listed in Table 2.
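The observation itself was done manually; as a hedged illustration only, a short script like the one below can automate the link-accessibility check and the tabulation step described above, using a subset of the URLs from Table 2. The file name and the reachability criterion are assumptions, not part of the original study.

```python
# Illustrative sketch: verify that listed hospital URLs are reachable and
# record the results in a spreadsheet for tabulation.
import requests
import pandas as pd

hospitals = {
    "H1": "https://alphafertilitycentre.com/",
    "H5": "http://gleneagleskl.com.my/",
    "H12": "https://www.ijn.com.my/",
    # ... remaining hospitals from Table 2
}

rows = []
for code, url in hospitals.items():
    try:
        status = requests.get(url, timeout=10).status_code
        reachable = status < 400
    except requests.RequestException:
        reachable = False
    rows.append({"hospital": code, "url": url, "reachable": reachable})

pd.DataFrame(rows).to_excel("website_accessibility.xlsx", index=False)
```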
Table 2 List of “Elite Partner Membership” medical tourism hospitals
No. | Hospital | URL
H1 | Alpha Fertility Centre | https://alphafertilitycentre.com/
H2 | Ara Damansara Medical Centre | https://www.ramsaysimedarby.com/hospitals/admc/
H3 | Beverly Wilshire Medical Centre | https://www.beverlywilshiremedical.com/
H4 | Fertility Associates | https://fertilityassociates.com.my/
H5 | Gleneagles Kuala Lumpur | http://gleneagleskl.com.my/
H6 | Gleneagles Penang | http://www.gleneagles-penang.com/
H7 | KPJ Ampang Puteri Specialist Hospital | https://www.kpjampang.com/
H8 | KPJ Damansara Specialist Hospital | http://www.kpjdamansara.com.my/
H9 | KPJ Tawakkal Specialist Hospital | https://www.kpjtawakkal.com/
H10 | Loh Guan Lye Specialist Centre | http://www.lohguanlye.com/
H11 | Mahkota Medical Centre | http://www.mahkotamedical.com/
H12 | National Heart Institute | https://www.ijn.com.my/
H13 | Pantai Hospital Kuala Lumpur | https://www.pantai.com.my/kuala-lumpur
H14 | ParkCity Medical Centre | https://www.ramsaysimedarby.com/hospitals/pmc/
H15 | Penang Adventist Hospital | https://pah.com.my/
H16 | Prince Court Medical Centre | https://www.princecourt.com/
H17 | Subang Jaya Medical Centre | https://www.ramsaysimedarby.com/hospitals/sjmc/
H18 | Sunfert @ Bangsar South | https://www.sunfert.com/
H19 | Sunway Medical Centre | https://www.sunwaymedical.com/
H20 | Thomson Hospital Kota Damansara | https://thomsonhospitals.com/en
H21 | Island Hospital | https://www.islandhospital.com/en/home
4 Analysis and Discussion
The findings of this study are divided into two sections. Firstly, each website was visited and observed to identify the types of medical and health informatics applications that are accessible to its users. Furthermore, the level of medical and health informatics applications made available online was also identified, along with the purpose of providing those services.
4.1 Types of Medical and Health Informatics Applications Based on the observation and investigation, there are nine (9) types of medical and health informatics application and services that are available to access through the
official websites of medical tourism hospitals in Malaysia. Further discussion on the results of the study is as follows (Table 3): • Website The website has proven to bring many positive impacts to the medical tourism industry [1, 3, 8, 9, 11]. The result of website utilization among the entire medical tourism hospitals in Malaysia with the status of “Elite Partner Membership” found to be at the highest level. All hospitals provide websites that are accessible to their users. From the observation, all of the sites found to be functioning well and work as required. • Social Network The roles of a social network were undeniable especially in bringing together the community. The recent explosion of social network utilization giving a significant impact on various industries including the healthcare. Earlier study stated that the social media applications on the social networks help to improve the communication speed between and across healthcare stakeholders [2]. Based on the result of this study, 15 out of 21 hospitals provides at least one link to their official social media page, which represents 72% of the population selected for this study. Most of the hospitals offer links to their Facebook account besides Instagram, YouTube Channel, Twitter, and LinkedIn. • Live Chat The “LiveChat” box enables users of medical tourism hospitals to get real-time communication with the representation from the hospitals. This application ensures that every urgent inquiry needed for clarification can be entertained speedily and responsively by the hospitals. Unfortunately, only four (4) out of 21 hospitals provides this type of channel for communications. Not only that, the LiveChat provides limited information as it is only served for users to ask on appointment booking and scheduling. • Appointment Booking In the context of medical tourism, the patients separated geographically from the hospitals across the border. Thus, they need to have an ability in making an appointment online to reduce hassles and saving their time. In this study, it found that fifteen (15) hospitals or 70% provide appointment-booking system available on their official website. • Find a Doctor These services act as a directory on the list of physicians and healthcare practitioners available in medical tourism hospitals. It is an advantage for the users as they can choose the preferred doctor before setting for an appointment. Several numbers of 17 medical tourism hospitals provide these functions on their official website.
Table 3 Summary of types of medical and health informatics application and services that are accessible online on the website of medical tourism hospitals in Malaysia (H1–H21). Number of hospitals providing each application: Website 21, Social network 15, LiveChat 4, Appointment booking 15, Find a doctor 17, Find a service 2, Inquiry form 5, Mobile apps 7, Virtual tour 2.
However, it works limitedly as it only offers minimum information about the doctors, which not helping much in their decision-making. It is worth more if they could give doctor-rating systems, which will become guidance for users’ decision. • Find a Service Surprisingly, only two (2) hospitals provide searching systems for users to find the list of services offered by the hospitals. Even though it is one of the critical information to be provided in their official websites [15], it found that the medical tourism hospitals are not aware and taking necessary actions on it. • Inquiry Form From the result, it shows that only five (5) medical tourism hospitals from the list provide an online inquiry form as a medium of communication for their users. Most of the hospitals choose either “LiveChat” or “inquiry form” to be made available on their websites. However, one (1) hospital provides both. • Mobile Applications There are seven (7) medical tourism hospitals go further advanced by providing mobile applications as a channel of communication to be downloaded by the users. They are accessible by downloading from the Play Store and scan the QR-Code supplied on the websites. It is in-line with the new revolution of IT/IS usage that brings impact to various industry worldwide. It does help for fast access to information and reliable for users by using their mobile devices. • Virtual Tour For better visualization of the condition and scenery of the medical tourism hospitals, two (2) hospitals provide a virtual tour through their websites. It is beneficial for the medical tourist before they started their medical journey to avoid any disappointment as they are traveling from far.
4.2 Numbers of Medical and Health Informatics Applications Available Online
The findings above show that most of the medical tourism hospitals involved in this study provide at least four (4) types of medical/health informatics applications on their websites, recorded for eight (8) hospitals, followed by five (5) hospitals that provide six (6) types of applications. Moreover, six (6) hospitals offer two (2) applications, three (3) hospitals were recorded with four (4) applications, and finally two (2) hospitals provide at least one (1) application on their website to cater to their users’ requirements (Fig. 1).
Fig. 1 Number of medical/health informatics applications available online for each medical tourism hospital in Malaysia (bar chart of the number of applications against the number of medical tourism hospitals)
4.3 Purpose of Medical and Health Informatics Applications Available Online
Based on the study, the implementation level of the medical and health informatics applications available online among the “Elite Partner Membership” medical tourism hospitals in Malaysia needs more attention. Even though various forms of applications are provided on their websites, the functionality of each medical and health informatics application is still limited. Most of the applications provided are aimed at easing communication between the medical tourism hospitals and their current and potential medical tourism patients.
5 Conclusion
Overall, this study found that the implementation of the medical and health informatics applications available on the official websites of medical tourism hospitals is still at a low level. There appears to be no standard for which types of online services must be available on their websites. However, this preliminary study only provides a helicopter view of the overall landscape of medical and health informatics in the Malaysian medical tourism industry. The medical tourism hospitals in Malaysia should go further by providing services such as telemedicine and doctor-rating systems to strengthen their position and attract medical tourist patients. Last but not least, this will enhance the decision-making process of users and build up an excellent reputation for the medical tourism hospitals in particular and for the industry in general. For future research, the whole landscape of the medical tourism industry in Malaysia should be studied to find opportunities for medical and health informatics to take part; this may lead to robust development of the medical tourism industry in Malaysia.
References 1. J. Fernández-Cavia, C. Rovira, P. Díaz-Luque, V. Cavaller, Web quality index (WQI) for official tourist destination websites. proposal for an assessment system. Tourism Manage. Perspect. 9, 5–13 (2014) 2. F.J. Grajales, S. Sheps, K. Ho, H. Novak-Lauscher, G. Eysenbach, Social media: a review and tutorial of applications in medicine and health care. J. Med. Int. Res. 16(2) (2014) 3. D.M. Herrick, Medical Tourism: Global Competition in Health Care, National Center for Policy Analysis (2007) 4. W.R. Hersh, E. Care, I.S. An, Medical Informatics 288(16), 1955–1958 (2001) 5. Y.A. Hong, Medical tourism and telemedicine : a new frontier of an old business corresponding author. J. Med. Int. Res. 18, 5–7 (2016) 6. A. Iyengar, A. Kundu, G. Pallis, Healthcare informatics and privacy. IEEE Internet Comput. 22(2), 29–31 (2018) 7. Y. Jia, W. Wang, J. Liang, L. Liu, Z. Chen, J. Zhang, T. Chen, J. Lei, Trends and characteristics of global medical informatics conferences from 2007 to 2017: A bibliometric comparison of conference publications from Chinese, American, European and the Global Conferences’, Computer Methods and Programs in Biomedicine (Elsevier B.V.) 166, pp. 19–32 (2018) 8. D. Lonˇcari´c, L. Bašan, M. Jurkovi´c, Websites as tool for promotion of health tourism offering in Croatian specialty hospitals and health resorts 2 health tourism in Croatia, in Recent Advances in Business Management and Marketing, pp. 265–270 (2013) 9. N. Lunt, P. Carrera, Systematic review of web sites for prospective medical tourists. Tourism Rev. 66(1/2), 57–67 (2011) 10. A. Madden, Health informatics and the importance of coding. Anaesthesia Intensive Care Med. 15(2), 62–63 (2014) 11. A. Ngamvichaikit, R. Beise-Zee, Communication needs of medical tourists: an exploratory study in Thailand. Int. J. Pharm. Healthcare Market. 8(1), 98–117 (2014) 12. P.K. Paul, D. Chatterjee, M. Ghosh, Medical Information Science: Emerging Domain of Information Science and Technology (IST) for sophisticated Health & Medical Infrastructure Building–An Overview. Int. Sci. J. Sport Sci. 1(2), 2012 (2012) 13. P. Rezaei-hachesu, R. Safdari, M. Ghazisaeedi, Taha Samad, - Soltani, The Applications of Health Informatics in Medical Tourism Industry of Iran. Iran J. Public Health 46(8), 1147–1448 (2017) 14. R. Rina, M. Khanapi, N. Abdullah, ‘An analysis of application of health informatics in Traditional Medicine: A review of four Traditional. Int. J. Med. Inform. 84(11), 988–996 (2015) 15. M. Samadbeik, H. Asadi, M. Mohseni, A. Takbiri, A. Moosavi, A. Garavand, Designing a medical tourism website: A qualitative study. Iranian J. Public Health 46(2), 249–257 (2017)
Design of Learning Digital Tools Through a User Experience Design Methodology Gloria Mendoza-Franco, Jesús Manuel Dorador-González, Patricia Díaz-Pérez, and Rolando Zarco-Hernández
Abstract The insertion of digital technologies and the widespread use of mobile devices has allowed their use in learning processes. The wide exposure of young people to technology has modified their expectations regarding educational experiences, so it is necessary to propose new ways of learning. However, beyond just proposing new learning tools, it is necessary to think about the learning experience, for this, it is essential to have pedagogical strategies and know in detail the users’ specific needs of consultation, study, or learning. With this in mind, a project that aims to improve the learning process and application of ergonomics knowledge is presented, in which the design of a digital informal learning tool developed with a student-centered perspective is proposed. Keywords M-learning · E-book · User-centered design · Educational experience · Educational digital tools
1 Introduction Educational tools have constantly evolved at the pace that technology allows, which in turn modifies the expectations of students, their study habits, and their practices when doing technological projects. There is a great diversity of digital products G. Mendoza-Franco (B) · R. Zarco-Hernández Postgraduate Program in Industrial Design, UNAM, Mexico City, Mexico e-mail: [email protected] R. Zarco-Hernández e-mail: [email protected] J. M. Dorador-González Escuela Nacional de Estudios Superiores Juriquilla, UNAM, Queretaro, Mexico e-mail: [email protected] P. Díaz-Pérez Facultad de Estudios Superiores Aragon, UNAM, Nezahualcoyotl, Mexico e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_67
oriented to education, we can find, among the most representative, the e-learning, mlearning platforms and e-books. Each of them has specific characteristics and serves particular purposes. Nowadays, the young students grew up with a high influence of digital technologies, so their learning expectations are different [1]. It turns out that educational experiences must be focused precisely on that, on being complete experiences [2]. Reason why the didactic resources have diversified; the books which were the primary didactic tool have gone to the background. Currently, students of all grades prefer to search for information by “google it,” looking for explanatory videos, in articles with short texts; and above all, they seek to have access to information all the time, anywhere. It is for these reasons that digital tools seem to be the best option to solve the consultation and learning needs of students; however, in the design and development process, some issues arise that are worth addressing. Of course, one of them is didactics, it is important to consider that for these tools, to be successful and meet their objectives, they must be designed within a didactic or educational strategy. The development of educational technologies should not be focused on technology per se, but should take into account the pedagogical objective as the main one and consider technology as a means to achieve it [3], and only in this way, a successful development of digital tools can be achieved. Therefore, the technological and pedagogical component are equally important and both must be studied carefully [4]. On the other hand, there is technology; technological proposals to address educational problems have evolved greatly in the last 40 years, which represents some difficulties. It is necessary to generate reference frameworks that allow guiding the development of educational strategies using each one of the possible digital platforms [5]. This becomes complex when new strategies are rapidly developed. Many times, digital products reach students before they could be evaluated. This fact represents a great challenge. Technology-mediated educational tools change constantly, and new proposals are being developed almost instinctively trying to take advantage of the benefits that new technologies represent. Finally, there is the experience. Learning must be seen as an experience, that is, a set of interactions between various elements of a system that allow the student to appropriate knowledge [6]. So, it is essential to know in depth the students’ learning needs, but also their habits of consultation, study, and application of knowledge. Only by carrying out a student-centered design process, it is possible to make proposals that could successfully be inserted into the learning process. Digital technologies allow to design solutions very complex and that can cover the entire learning process; however, it is necessary to know in depth the context and needs to make sound decisions and then offer ideal tools [7].
2 Technology-Mediated Education

There are many technological approaches to learning, each aimed at particular objectives. With regard to the design of digital tools for learning, there are three major trends: e-learning, which started it all; m-learning, which has become increasingly important; and finally a tool that looks like a simple digital product but in fact has great potential: the e-book.
2.1 E-Learning Platforms

E-learning platforms were the first innovative proposal for learning tools based on digital technologies. The arrival of the Internet brought a series of new opportunities for knowledge and social exchange that could now take place at a distance. E-learning platforms have changed enormously since then; today they can be defined as platforms that use digital resources and devices to improve learning processes [8]. Although this definition seems very broad, it is the one that best reflects the objectives of e-learning. Among the advantages of e-learning platforms is that they enrich the resources available to students, who benefit from multimedia content that represents an innovative way to learn compared with traditional learning models [9]. Another advantage is that they allow students to learn outside the classroom, giving them autonomy: the student can choose when and where to learn [10]. For this reason, some authors consider e-learning to be the evolution of distance learning [8]. Finally, e-learning also helps to optimize economic resources; the main reasons for adopting this model are to improve the quality of learning, to improve access to education and training, to reduce the costs of education, and to improve the cost-effectiveness of education [11].
2.2 M-Learning and Ubiquitous Learning

With the arrival of mobile phones and their expansion in the market, new technological proposals for learning have also been built on this tool. At the beginning, these were simple learning games or interactions through text messages; today, thanks to the sophistication of mobile devices, m-learning is as complex as e-learning [12]. There are two main differences between the two approaches: the level of interactivity and the portability. Smartphones have introduced a whole new digital language through app design. These digital products allow fast and constant access to content, communication, and even interactivity with the
environment, so they have become part of everyday life [4]. Embedding educational strategies in apps makes it possible to take advantage of these benefits to build clear learning strategies. On the other hand, the portability offered by mobile solutions enables a ubiquitous learning environment, in which content and strategies are available anywhere, at any time [13], within a thumb's reach. This possibility generates new expectations in students, who demand "just enough, just for me, just in time" models [12]. This context has produced students who are impatient and creative, expect immediate results, personalize the things they choose, focus on themselves, and trust technology [14].
2.3 Enhanced E-Book

An e-book is basically a book in electronic format [15]. From this definition, one could think that it is not a new digital learning tool but a traditional tool in a new format. However, having the book as a digital resource has opened up new possibilities for its operation as a tool. Some studies show that this simple change of format completely modifies the experience of reading and learning with it [16]. Reading in digital formats even brings extra benefits, for example, alternative ways of accessing the content such as text to speech or the ability to change the font size [17]. Added to this, being able to read the book on an electronic device increases students' interest in reading. Digital formats also allow new functions to be added to the plain text: other types of content such as multimedia can be inserted, and links to external resources can be generated, which facilitates self-regulated learning and helps to extend the activity to the use of additional resources [16]. Therefore, an e-book can be considered a new learning tool, as long as it has additional features that improve the learning experience.
3 An Educational Digital Tool Design for Enhancing Ergonomics Application

Ergonomics is a fundamental discipline for all professionals in charge of product design. In Mexico, however, there is a lack of interest in learning about this discipline, and there are few educational tools that allow informal learning beyond textbooks and some Web platforms. After an inquiry process, it was found that students have a certain curiosity to learn about the subject, but their expectations regarding the informal learning experience are not met by the textbooks normally used. For this reason, we propose the design of a digital
tool, a mobile application, that meets the learning needs of the students while coming closer to their expectations of the learning experience. A user experience design method is used for this purpose.
3.1 User Research

With the objective of awakening the interest of university students of engineering and industrial design in ergonomics, a group of professors initiated a research project to identify the areas of opportunity and the deficiencies of the teaching-learning process of ergonomics in Mexico. As part of a diagnostic process, 93 surveys were applied to students (47) and recent graduates (46) of the industrial engineering and design programs of the National Autonomous University of Mexico. The results of two of the questions are particularly relevant at this stage of the project.

The first question asked respondents to use qualifiers to describe their perception of the content and organization of ergonomics textbooks; the results are shown in Fig. 1. The answers with the highest percentages lead to the following conclusions:
• More than half (56%) of respondents consider the books to be interesting. Although this is a majority, it is not a high one: almost half of the participants find no interest in consulting an ergonomics book
• 37% of participants consider the books to be "unclear"
• 30% of respondents think that ergonomics books contain too much information, while 26% say they are very extensive
• 30% of participants consider that ergonomics books have insufficient information.

Fig. 1 Qualifiers of ergonomics textbooks in percentage

The last two points seem contradictory; however, after discussing the issue with ergonomics professors, their teaching experience suggests that students search books for information that they can quickly apply to their design or engineering projects. That is why the books seem to offer too little information and too much information at the same time. Textbooks do not cover
Fig. 2 Simple frequency of desirable features
the need for students to consult recommendations that allow them to apply ergonomic knowledge to their projects immediately.

The second question on this topic was an open question in which participants were asked to describe the ideal characteristics of a material or tool for learning about ergonomics. The concepts were categorized on the basis of keywords mentioned by the respondents, and the resulting frequency analysis is presented in Fig. 2. Based on the participants' responses, a proposal was made to design a new type of teaching material that meets the needs of students and professionals when consulting information related to ergonomics. The proposal corresponds to a digital tool that is interactive and easy to use and that gives easy access to practice-oriented content, so that users can readily apply the information obtained. At the beginning of the project, one of the objectives was to write an e-book; however, the information collected made it clear that an e-book is not enough, and even an enhanced e-book might not be enough, so a deeper analysis of the users' needs was carried out in order to make a more accurate proposal.
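To make the keyword categorization and frequency count concrete, the short sketch below shows one way open answers can be grouped by keyword and tallied, in the spirit of the analysis summarized in Fig. 2. It is only an illustration: the category names and keywords are hypothetical and not the ones actually derived from the survey.

from collections import Counter

# Hypothetical keyword-to-category mapping; the real categories were built
# from the respondents' own wording.
CATEGORIES = {
    "interactive": ["interactive", "interaction"],
    "practical": ["practical", "applied", "example"],
    "digital/mobile": ["app", "digital", "mobile"],
    "concise": ["short", "brief", "concise"],
}

def categorize(answer):
    """Return every category whose keywords appear in an answer."""
    text = answer.lower()
    return {cat for cat, words in CATEGORIES.items() if any(w in text for w in words)}

def frequency_table(answers):
    """Simple frequency of desirable features across all answers."""
    counts = Counter()
    for answer in answers:
        counts.update(categorize(answer))
    return counts

print(frequency_table([
    "A short, interactive mobile app with practical examples",
    "Something brief that I can apply to my project",
]))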
3.2 Design Features Definition

At this stage, an analysis was carried out using the Persona method [18]. It was important to understand how the didactic tool would be used in a real context, in class or when carrying out a project. The analysis was focused on the students, since they are the target users. For the Persona method, three different student profiles were created: the outstanding student, who performs learning activities autonomously and is interested in learning more; the average student, who fulfills his tasks and carries out the projects but is not interested in devoting extra time to learning; and finally the disengaged student, who has difficulty with classes and projects and also has trouble understanding the texts he consults. From the analysis of needs and possible solutions in the Persona method, the requirements of the tool were obtained; they are shown in Table 1.

Table 1 Functions proposed for each requirement

Feature                 Functions
Customization           • Recent contents
                        • Labels to tag specific contents
Immediate access        • Search tool to easily find a specific topic
Dynamic visualization   • Two ways of visualizing contents: graphic and text
Interactivity           • Brief texts and use of multimedia
                        • The user decides what to read and in which order
                        • Visual tools to indicate the reading progress
Additional resources    • Additional academic resources shown for each topic
3.3 Interactive Handbook of Ergonomics

Based on the requirements obtained and the information acquired in the user research stage, the development of a mobile application was proposed. It should allow users:
• To access short contents oriented to immediate application in a design project
• To access assessment and measurement methods that they can apply with simple resources to solve ergonomic needs
• To access texts on ergonomics topics categorized by field of application, so that it is easier to reach them starting from the problem to be solved
• To access textbooks and other online resources categorized by topic of interest
• To self-regulate their learning: users decide what content they consult, in what depth, and in what order, and they can label these contents to classify them according to their own interests
• In addition, the application must work disconnected from the Internet, to make it more accessible and not condition its operation on connectivity
• The application must be easy to use

Based on these requirements, the proposal is defined as an m-learning strategy; although the content resembles an e-book, the interaction needed
brings it closer to a mobile application for educational purposes. A social component could be added to the application in the future to facilitate discussions and conversations with ergonomics experts. At this time, however, the priority is for the proposal to be low cost and very accessible, so it is left without this social component. To cover the desirable characteristics, the creation of four types of contents or elements is proposed (a minimal sketch of how they could be represented is given after this list):
• Checkpoints. Presented in the main menu. These are ergonomic recommendations sorted by thematic category. Each checkpoint consists of a single sentence and is accompanied by a short text that describes the ergonomic principle from which it is derived and how it is applied
• Methods. Presented in the main menu. The methods are classified thematically and synthetically present the procedure and materials needed to apply an ergonomic inquiry, measurement, or evaluation method
• Texts. Presented in a secondary menu. The texts are brief theoretical descriptions of the fundamental themes of ergonomics
• Other resources. Accessible from the texts. They correspond to external resources such as books, articles, or Web pages that address ergonomics in greater depth

Once the first prototype was developed to visualize the functions and the way of presenting the contents, a focus group session with design students was carried out to evaluate the value of the proposal.
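As an illustration of how these four content types and the tagging, search, and offline requirements could be represented inside the mobile client, the sketch below defines a minimal local content model. All names are hypothetical; the paper does not specify the actual implementation of the application.

from dataclasses import dataclass, field
from enum import Enum

class ContentType(Enum):
    CHECKPOINT = "checkpoint"   # one-sentence recommendation plus a short principle
    METHOD = "method"           # procedure and materials for an evaluation method
    TEXT = "text"               # brief theoretical description of a fundamental theme
    RESOURCE = "resource"       # external book, article, or web page

@dataclass
class ContentItem:
    item_id: str
    kind: ContentType
    category: str               # thematic category, e.g., a field of application
    title: str
    body: str
    tags: set = field(default_factory=set)   # user-defined labels for self-regulation

class LocalHandbook:
    """All content is bundled with the app, so no Internet connection is required."""

    def __init__(self, items):
        self.items = list(items)

    def search(self, term):
        """Search tool: find items whose title or body contains the term."""
        term = term.lower()
        return [i for i in self.items if term in i.title.lower() or term in i.body.lower()]

    def tag(self, item_id, label):
        """Let the user label a content item with a personal tag."""
        for i in self.items:
            if i.item_id == item_id:
                i.tags.add(label)

Under these assumptions, a published content bundle would simply be a serialized list of such items shipped with the application, which keeps it usable offline.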
4 Results

During the focus group, the participants talked about their information consultation habits and the problems they face when they turn to textbooks to find information for solving ergonomic needs in design projects. The following conclusions were obtained:
• Students prefer to check Web sites, videos, and other multimedia resources rather than books when learning about a new topic
• Students prefer books in digital formats because of the portability and flexibility they offer
• They would like mobile applications that facilitate the information consultation process
• They require ergonomics information that allows them to solve design problems in an agile way.

Subsequently, the mobile application proposal "Interactive Handbook of Ergonomics" was presented, and the participants were asked to explore the prototype. When their impressions were collected, the following results were obtained:
• The application correctly addresses the need for practical content
• The way of presenting the contents seems friendly and easy to use
• Checkpoints are the type of content students look for when they ask about ergonomics • The content tagging function seems useful for self-regulating learning • The methods look interesting, but students would like to have associated evaluation tools in other mobile applications. This point is an area of opportunity for the future • Being able to have a mobile application for consultation on ergonomics issues can support the application of ergonomic knowledge to design projects.
5 Conclusions

The development of digital learning tools has revolutionized the ways in which new knowledge is acquired, but it also presents new challenges for teachers and developers, who have to find the right ways to design such tools so that they are really useful and support the acquisition of new knowledge. At the same time, students' expectations are focused on more complete and ubiquitous learning experiences, which demands the development of new products for educational purposes. This paper presents an example of an educational tool designed with a user experience design method in which students were included in all stages of development, showing that in this way a more detailed understanding of users' learning needs can be obtained and, therefore, a tool can be generated that better suits the students' reality.

Compliance with Ethical Standards

Conflict of Interest This study was funded by UNAM-DGAPA-PAPIME PE109818. The authors declare that they have no conflict of interest.

Informed Consent Informed consent was obtained from all individual participants included in the study.
References

1. D. Furió, M.-C. Juan, I. Seguí, R. Vivó, Mobile learning vs. traditional classroom lessons: a comparative study. J. Comput. Assist. Learn. 31(3), 189–201 (2015)
2. D. Oblinger, J.L. Oblinger (eds.), Educating the Net Generation (EDUCAUSE, Boulder, CO, 2005)
3. J.V. Wertsch, Voices of the Mind: Sociocultural Approach to Mediated Action (Harvard University Press, 2009)
4. Y. Alioon, O. Delialioglu, A frame for the literature on M-learning. Proc. Soc. Behav. Sci. 182, 127–135 (2015)
5. M. Kearney, S. Schuck, K. Burden, P. Aubusson, Viewing mobile learning from a pedagogical perspective. Res. Learn. Technol. 20(1), n1 (2012)
6. S. Alexander, E-learning developments and experiences. Educ. Train. 43(4/5), 240–248 (2001)
7. J.L. Moore, C. Dickson-Deane, K. Galyen, E-learning, online learning, and distance learning environments: are they the same? Internet High. Educ. 14(2), 129–135 (2011)
8. A. Sangrà, D. Vlachopoulos, N. Cabrera, Building an inclusive definition of e-learning: an approach to the conceptual framework. Int. Rev. Res. Open Distrib. Learn. 13(2), 145–159 (2012)
9. T. Anderson, The Theory and Practice of Online Learning (Athabasca University Press, 2008)
10. R.A. Cole, Issues in Web-Based Pedagogy: A Critical Primer (Greenwood Publishing Group, 2000)
11. A.W. Bates, Restructuring the university for technological change, paper presented at What Kind of University?, 18–20 June, The Carnegie Foundation for the Advancement of Teaching, London (2001)
12. K. Peters, M-learning: positioning educators for a mobile, connected future. Int. Rev. Res. Open Distrib. Learn. 8(2) (2007)
13. M. Sarrab, M-learning in education: Omani undergraduate students perspective. Proc. Soc. Behav. Sci. 176, 834–839 (2015)
14. S. Carlson, The net generation goes to college. Chron. High. Educ. 52(7), A34 (2005)
15. N. Aharony, Factors affecting the adoption of e-books by information professionals. J. Librariansh. Inf. Sci. 47(2), 131–144 (2015)
16. E. Dobler, E-textbooks: a personalized learning experience or a digital distraction? J. Adolesc. Adult Lit. 58(6), 482–491 (2015)
17. D.H. Schunk, B.J. Zimmerman, Self-regulated Learning: From Teaching to Self-reflective Practice (Guilford Press, 1998)
18. T. Miaskiewicz, K.A. Kozar, Personas and user-centered design: how can personas benefit product design processes? Des. Stud. 32(5), 417–430 (2011)
Fake Identity in Political Crisis: Case Study in Indonesia

Kristina Setyowati, Apneta Vionuke Dihandiska, Rino A. Nugroho, Teguh Budi Santoso, Okki Chandra Ambarwati, and Is Hadri Utomo
Abstract The rapid development of information technology brings many positive effects to society, especially in terms of information distribution. However, it is also misused by people who have an interest in spreading hoaxes. Alongside the distribution of hoaxes, the phenomenon of echo chambers has emerged, which can lead the public to believe what is already in their minds and to ignore the facts. In Indonesia, the distribution of hoaxes with implications for riots has occurred in Papua as a result of Twitter posts whose content provoked the riots.

Keywords Fake identity · Echo chamber · Hoaxes · Post-truth era
K. Setyowati (B) · A. V. Dihandiska · R. A. Nugroho · T. B. Santoso · O. C. Ambarwati · I. H. Utomo, Department of Public Administration, Faculty of Social and Political Sciences, Universitas Sebelas Maret, Ir. Sutami Street, No 36 A, Surakarta 57126, Indonesia, e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021, S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_68

1 Introduction

Precise and accurate information is what society demands in the current technological era. This growing need is driven by the rapid development of information and communication technology, which is considered to bring many benefits to society as its users. This development has produced facilities that make it easy for the public to access information widely, without limits and without a long wait. Citing data from Statista, in 2017 there were an estimated 2.5 billion social media users, and with this growing number of users it is clear that social media platforms are the channel the public chooses to share information with one another. However, amid the use of social media as a platform for sharing information, some parties misuse it by sharing fake information, known as hoaxes. Indonesia is currently entering the post-truth era, a situation in which information or news is shaped by beliefs and opinions [1]. In this situation, access to social media can be used to build public opinion by sharing hoaxes, fake news, and even harmful negative content. In this era, people seem to be more convinced
that bad news is good news. The post-truth era is also considered an era in which opinions are based more on emotions and beliefs than on objective facts [2]. Together with the post-truth era, a phenomenon called the echo chamber has appeared. This phenomenon makes people communicate only with those who share the same thoughts [2]. The echo chamber works much like a sound echo: the same ideas are repeated across a variety of minds, which further strengthens one's view of an event. The phenomenon spreads more easily as information technology makes social media widely available and prone to misuse; social media can create its own reality. At present, timeline algorithms in social media are driven by users' interests, so the echo chamber phenomenon develops even further among users [3].

The development of social media as an information medium brings various benefits to the public, but problems also arise from its use. One of them is the large number of hoaxes in circulation; even educated people cannot always distinguish which news is true, whether it is an advertorial or a hoax [4]. Many hoaxes are distributed by buzzers on social media who spread hoaxes, hate speech, and bullying that appeal to emotions and keep the public from accepting objective facts [5]. This means that on social media the buzzer has great influence in inviting or building opinion in society so that people's thinking follows the opinion the buzzer constructs. One of the most widely used social media in Indonesia is Twitter, with the number of users in Indonesia reaching 19.5 million [6]. Twitter buzzers act as influencers who persuade followers about certain topics on Twitter. Twitter provides a biography field to briefly describe the user's identity, but buzzer accounts often cannot be identified and these fields tend to be empty [7]. In fact, buzzers who share hoaxes, hate speech, and slander often use fake identities to hide who they are when writing their tweets, in order to avoid the legal threats that face buzzers who share hoaxes with the public.

In Indonesia, the most recently felt impact of a hoax case was the emergence of riots at the Papuan Student Dormitory in Surabaya due to provocation against Papuan students. Police said the suspects' Twitter posts contained provocative content and amounted to hoaxes; the East Java Regional Police Chief, Inspector General Luki Hermawan, said that a total of five suspects' posts were hoaxes. Based on this description, this study aims to compare Twitter bots or social media users with fake identities against Twitter users who use real identities in distributing hoaxes on the issue of the riots in Papua that occurred in 2019.
2 Research Method

This is a mixed methods study, i.e., a procedure for collecting, analyzing, and "mixing" or integrating quantitative and qualitative data at several stages of the research process within a single study in order to obtain a better understanding of the research problem [8, 9]. The design used was a mixed methods sequential explanatory design, a design that consists of two distinct phases, quantitative and qualitative [10]. In this design, the researcher first collects and analyzes quantitative (numerical) data. Qualitative (text) data are then collected and analyzed to help explain or elaborate the quantitative results obtained in the first stage. The second, qualitative phase builds on the first, quantitative phase, and the two phases are connected at an intermediate stage of the study. This approach was used because the quantitative data and their analysis provide a general understanding of the research problem, while the qualitative data and their analysis refine and explain the statistical results by exploring participants' views in more depth [8, 10, 11]. In this study, the quantitative method was used to describe the outdegree and indegree counts of the tweets examined on Twitter, and the qualitative method was used to deepen the findings from those results. The authors used social network analysis (SNA) to identify the buzzer accounts and the genuine accounts on Twitter. SNA can be described as the study of human relations using graph theory [12]. With graph theory, SNA can examine the structure of social relations within a group to reveal informal relationships between individuals.
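As a concrete illustration of the SNA step, the snippet below builds a directed mention network and computes the indegree and outdegree of each account with the networkx library. The edge list is purely illustrative and is not the study's actual Twitter data.

import networkx as nx

# Illustrative mention edges (source account mentions target account).
mentions = [
    ("account_a", "jokowi"),
    ("account_a", "jokowi"),
    ("account_b", "jokowi"),
    ("account_b", "account_c"),
]

G = nx.MultiDiGraph()              # parallel edges keep repeated mentions
G.add_edges_from(mentions)

for node in G.nodes:
    out_deg = G.out_degree(node)   # how often the account mentions others
    in_deg = G.in_degree(node)     # how often the account is mentioned
    print(node, "indegree:", in_deg, "outdegree:", out_deg, "total:", in_deg + out_deg)

Accounts with a high outdegree and an indegree close to zero, as in Table 1, are the candidates for buzzer or fake-identity behavior, while accounts with a high indegree are the targets of the mentions.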
3 Result and Discussion

From these data, it is known that the Twitter accounts "polisimedsos" and "bob_bay" have high outdegree values (50 and 39, respectively). In the SNA analysis, a high-outdegree account is one that mostly mentions other Twitter accounts. These accounts posted heavily about the Papua hoax case, and the mentions they made were directed at the "Jokowi" account. A high-indegree account, in turn, is an account that is mentioned a lot by other users (Fig. 1; Table 1). The high-outdegree accounts polisimedsos and bob_bay have 2287 and 462 followers, respectively. In general, people are expected to use their real identities on social media accounts in order to avoid misuse of social media by irresponsible people. In a study by [13], which observed interactions between fake profiles and other Facebook users, there was a strong tendency for Facebook users to consider a profile fake if it appeared to be new on Facebook, that is, if the profile had few friends and an almost empty wall. As the number of friends increases, the skepticism of other Facebook users decreases. Having a higher number of friends and many social activities makes a profile socially attractive to other users, as determined by [14]. In those experiments, at the beginning of the trial, all the fake profiles received messages from the people to whom friend requests had been sent.
Fig. 1 Network relation
Table 1 Node relation total

Twitter account    Indegree    Outdegree    Total
Polisimedsos       0           50           50
Yusuf_dumdum       29          0            29
permadiaktivis     31          0            31
Savicali           29          0            29
Bob_bay            0           39           39
Jokowi             57          0            57
Similarly, Twitter users with fake accounts tend to produce a lot of output, or tweets, to attract public attention. In addition, fake-account users on social media currently tend to issue provocations or hoaxes through their accounts, as in the earlier Papua case, which started from a Twitter post that the authorities confirmed to be a hoax. The case had many impacts, including the riots that occurred in Papua, until the government, through the Minister of Communication and Information Technology (Menkominfo), decided to shut down all Internet connections in Papua. This was done to prevent further conflicts from triggering more riots in Papua. However, the people felt that this disrupted their means of communication and paralyzed activities in Papua.
4 Conclusion

Based on the description above, it is known that some users misuse social media to share hoaxes, as in the Papua case, which originated from users' tweets containing provocative hoaxes that had implications for the riots that occurred
in Papua. Through SNA, we can identify the relationships between nodes. The nodes with high outdegree values are Twitter accounts that use fake identities and whose tweets tend to share hoaxes. In this case, it is imperative that the government addresses and prevents the widespread distribution of hoaxes. As a public regulator, the government should pay more attention to how freedom of expression is exercised on social media.
References

1. N. Rochlin, Fake news: belief in post-truth. Library Hi Tech 35(3), 386–392 (2017)
2. J. Kristiyono, O.R. Jayanti, Fake news (hoax) and paranoid frame of mind of social media user, in 3rd International Conference on Transformation in Communications 2017 (IcoTiC 2017) (Atlantis Press, 2017, November)
3. M.Y. Alimi, Mediatisasi Agama, Post Truth dan Ketahanan Nasional: Sosiologi Agama Era Digital (2018)
4. D.R. Rahadi, Perilaku pengguna dan informasi hoax di media sosial. J. Manajemen dan Kewirausahaan 5(1) (2017)
5. A.E.W. Wuryanta, Post-Truth, Cyber Identity dan Defisit Demokrasi (2018)
6. Kominfo Indonesia. Available at: https://kominfo.go.id/index.php/content/detail/3415/Kominfo+%3A+Pengguna+Internet+di+Indonesia+63+Juta+Orang/0/berita_satker (2019)
7. R. Juliadi, The construction of buzzer identity on social media (a descriptive study of buzzer identity in Twitter), in 3rd International Conference on Transformation in Communications 2017 (IcoTiC 2017) (Atlantis Press, 2017, November)
8. T.C. Teddlie, The past and future of mixed methods research: from data triangulation to mixed model designs, in Handbook of Mixed Methods in Social and Behavioral Research, pp. 671–701 (2003)
9. W.E. Hanson, J.W. Creswell, V.L.P. Clark, K.S. Petska, J.D. Creswell, Mixed methods research designs in counseling psychology. J. Counsel. Psychol. 52(2), 224 (2005)
10. J.W. Creswell, V.L.P. Clark, M.L. Gutmann, W.E. Hanson, Advanced mixed methods research designs (2003)
11. G.B. Rossman, B.L. Wilson, Numbers and words: combining quantitative and qualitative methods in a single large-scale evaluation study. Eval. Rev. 9(5), 627–643 (1985)
12. M. Tsvetovat, A. Kouznetsov, Social Network Analysis for Startups: Finding Connections on the Social Web (O'Reilly Media, Inc., 2011)
13. K. Krombholz, D. Merkl, E. Weippl, Fake identities in social media: a case study on the sustainability of the Facebook business model. J. Service Sci. Res. 4(2), 175–212 (2012)
14. Y. Boshmaf, I. Muslukhov, K. Beznosov, M. Ripeanu, The socialbot network: when bots socialize for fame and money, in Proceedings of the 27th Annual Computer Security Applications Conference (ACM, New York, 2011, December), pp. 93–102
Cloud Computing in the World and Czech Republic—A Comparative Study

Petra Poulová, Blanka Klímová, and Martin Švarc
Abstract Cloud computing is a technology that has significantly influenced the current IT industry, both from the user's and the provider's point of view. The purpose of this article is to explore the use of cloud computing in the corporate environment in the Czech Republic and abroad. The authors used an online questionnaire survey to collect data from organizations in the Czech Republic, and for the foreign data they used research by the company RightScale. The results show that overall cloud adoption in Czech organizations is not as low as might be expected and in some respects even matches the global level. Nevertheless, companies in the world are more inclined to use cloud computing, they are less skeptical, and for them the advantages of the cloud prevail over the disadvantages.

Keywords Cloud computing · IT industry · Survey
P. Poulová (B) · B. Klímová · M. Švarc, Faculty of Informatics and Management, University of Hradec Kralove, Rokitanskeho 62, Hradec Kralove 500 03, Czech Republic, e-mail: [email protected]; B. Klímová, e-mail: [email protected]; M. Švarc, e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021, S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_69

1 Introduction

Cloud computing is a technology that has significantly influenced the current IT industry, both from the user's and the provider's point of view. Cloud computing is a very broad term, but according to Mell and Grance it can be characterized by five basic features common to all cloud services: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service [1, 2]. Cloud computing (CC) can be defined as the use of various services,
Fig. 1 Division of cloud computing according to [5]
such as software development platforms, servers, storage, and software, over the Internet, often referred to as "the cloud" [3]. Cloud-based applications are all around us [4]; many users do not even realize that the applications they use are built on this technology, an email client being one example. In practice, cloud applications are categorized according to two main aspects. The first divides applications by the type of cloud service they provide (e.g., software as a service, platform as a service, or communications as a service). The second divides cloud applications by the so-called deployment model, which defines how cloud services are provided to the end user; it includes public, private, and hybrid cloud computing. Figure 1 shows these divisions of CC [5, 6]. The purpose of this article is to explore the use of cloud computing in the corporate environment in the Czech Republic and abroad.
2 Methods

The authors used an online questionnaire survey to collect the data from organizations in the Czech Republic. The survey was conducted from February 2018 till June 2018. Because the questionnaire was distributed in the corporate environment, great emphasis was placed on its brevity and short duration; completing it took five minutes at most. Due to the difficulty of collecting foreign data, the authors used data collected and evaluated by RightScale, which carries out similar research every year. This article works with the research data from January 2017 [7].
3 Findings and Discussion

In this section, the authors compare the findings from the Czech Republic and abroad and illustrate them in figures. The individual figures are divided into six categories, which for the sake of comparability correspond to the categories of the research conducted by RightScale [7]. They are as follows:
• Key data
• Demographics
• Use of CC in companies
• Types of used cloud solutions
• Advantages and disadvantages
• Used/planned cloud providers
Table 1 provides an overview of the number of respondents. The demographic data in Fig. 2 show that the largest group of respondents in the world is the one with 1001 and more employees. In the Czech Republic, on the contrary, the largest group is the one with the smallest number of employees, i.e., between 1 and 100. This difference is most likely due to the more complex corporate hierarchy in larger organizations (1001+ employees), which makes it harder for a questionnaire email to reach the right person without getting lost. Another probable reason may be that larger organizations do not attach much weight to emails from an unknown person. It could also have been an advantage for RightScale that 20% of the surveyed organizations were among their clients [7].

Table 1 Overview of the number of the respondents in the world and in the Czech Republic

                                       World     Czechia
Total number of respondents            1002      52
Respondents (1001+ employees)          485       14
Respondents (under 1001 employees)     517       38
Margin of error                        3.07%     12.06%
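For reference, the world figure in Table 1 matches almost exactly what the standard formula for a proportion gives at a 95% confidence level with p = 0.5, while the Czech figure is in the same range; the sketch below reproduces the calculation. The exact confidence level and any finite-population correction used for the two surveys are not stated, so this is only an approximation.

import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

for label, n in [("World", 1002), ("Czech Republic", 52)]:
    print(label, round(100 * margin_of_error(n), 2), "%")

# World: about 3.1% (reported 3.07%); Czech Republic: about 13.6%
# (reported 12.06%), which suggests the Czech figure was computed
# with slightly different assumptions.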
Fig. 2 Respondents according to the size of a company
Fig. 3 Use of cloud computing in companies
Figure 3 shows the data about the use of cloud. Interestingly, the so-called planners, i.e., organizations that do not currently use cloud solutions but plan to do so in the future, form the smallest group in the Czech Republic; only a few organizations are currently considering a move to a cloud solution. There are also not many of the so-called cloud beginners in the Czech Republic, just under 8%, whereas in the world this group is quite numerous, reaching 22%. The world data, by contrast, contain a notable group of cloud-focused organizations. The second largest group, both in the Czech Republic and in the world, is the "cloud explorers", i.e., organizations that have deployed the cloud on multiple projects and focus on improving and expanding cloud utilization in the organization.

Figures 4 and 5 illustrate the types of cloud solutions used in the world and in the Czech Republic. The data obtained from the Czech Republic are proportionally almost identical to the data from the world; only the representation of the private cloud is higher, by about 12%. According to the survey, the lack of experts and security are perceived as the biggest disadvantages (Fig. 6). These two groups also ranked first in the world survey, but on

Fig. 4 Types of cloud solutions used in the world
Fig. 5 Types of cloud solutions used in the Czech Republic
Fig. 6 Disadvantages of cloud solutions
a noticeably smaller scale: while in the Czech Republic the average for these two groups is 40.5%, the world average is 20%. Another noticeable difference is the performance disadvantage, which worries twice as many respondents in the Czech Republic as in the world. According to the Czech respondents, the biggest advantage of the cloud is its better scalability, followed by faster access to infrastructure and IT efficiency. The smallest share of respondents sees benefits in high performance and business continuity; in the world survey, by contrast, over 38% of the respondents cited high performance as a benefit (Fig. 7). As far as the used or planned cloud providers are concerned, Amazon Web Services leads in the world by 5%, followed by Google Cloud and Microsoft Azure. None of the organizations surveyed in the Czech Republic uses DigitalOcean as a provider, and IBM's solution, with less than 5% representation, is not very popular among the Czech respondents either. On the other hand, AlgoCloud is not used in the world (Fig. 8).
Fig. 7 Advantages of cloud solutions
Fig. 8 Used/planned cloud providers
As for the services, the overwhelming majority of the respondents use the public cloud to store data. In the world, the relational database service, which ranks only fourth in the Czech Republic, comes first. Nearly the same shares in the Czech Republic and abroad were found for mobile services and for sending notifications (Fig. 9).
Fig. 9 Use of services in public cloud
4 Conclusion

The results of this survey show that overall cloud adoption in Czech organizations is not as low as might be expected and in some respects even matches the global level. Of the 52 respondents, 65.4% currently use the cloud, and another almost 6% plan to deploy a cloud in their organization within one year. This figure could, however, have been affected by the fact that the largest group of respondents (48.1%) were small organizations of up to one hundred employees, which face less pressure to adopt this technology. In the world survey conducted by RightScale, with which the Czech survey was compared, the largest share of respondents came from large companies (1001 and more employees). Overall, 80% of the respondents responded positively to the use of the cloud. Generally speaking, respondents in the world are more inclined to use cloud computing, they are less skeptical, and for them the advantages of the cloud prevail over the disadvantages.
Acknowledgements This study is supported by the SPEV project 2019, run at the Faculty of Informatics and Management, University of Hradec Kralove, Czech Republic.

Compliance with Ethical Standards

Conflict of Interest The authors declare that they have no conflict of interest.

Ethical Approval This chapter reports a survey of participants, which was conducted in accordance with the required ethical approval.

Informed Consent Informed consent was obtained from all individual participants included in the study.
References

1. L. Lacko, Osobni cloud pro domaci podnikani a male firmy (Personal Cloud for Home Business and Small Companies) (Computer Press, Brno, 2012)
2. P. Mell, T. Grance, The NIST Definition of Cloud Computing (Draft): Recommendations of the National Institute of Standards and Technology (2011). http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
3. Techopedia, Cloud Computing. https://www.techopedia.com/definition/2/cloud-computing
4. B. Klimova, P. Maresova, Cloud computing and e-learning and their benefits for the institutions of higher learning, in 2016 IEEE Conference on e-Learning, e-Management and e-Services, IC3e 2016 (Institute of Electrical and Electronics Engineers Inc., 2017), pp. 75–78
5. B.A. Sosinsky, Cloud Computing Bible (Wiley, Chichester, 2011)
6. S. Srinivasan, Cloud Computing Basics (Springer, New York, 2014)
7. RightScale 2017 State of the Cloud Report (2017). http://www.offis.com.au/static/media/uploads/download_files/rightscale-2017-state-of-the-cloud-report.pdf
Data Quality Improvement Strategy for the Certification of Telecommunication Tools and Equipment: Case Study at an Indonesia Government Institution

E. A. Puspitaningrum, R. F. Aji, and Y. Ruldeviyani

Abstract The problem underlying this study is the finding of the Supreme Audit Agency (BPK) in its licensing performance audit at the Ministry of Communication and Information Technology (KOMINFO) that the certification data of telecommunication tools and equipment in the e-certification information system database are incomplete, inaccurate, and invalid for supporting telecommunication tools and equipment certification services. Based on these conditions, the maturity level of data quality management is measured using the Modelo Alarcos de Mejora de Datos (MAMD) 2.0 framework. The measured data quality maturity level is level 1, while the expected level is level 2. The strategy for improving the quality of the certification data is prepared based on an analysis of the causes of the data problems, the gap between the current and expected conditions of data quality management, and the regulations related to data governance. The resulting recommendations are grouped into eight data discipline points that the Directorate of Standardization needs to implement in order to reach the desired level of data quality management maturity: data requirements management, technology infrastructure management, configuration management, historical data management, data security management, data quality monitoring and control, data life cycle management, and the establishment of standards, policies, and procedures.

Keywords Telecommunication · Tools and equipment certification data · Data quality management · Maturity of data quality management · MAMD
E. A. Puspitaningrum (B) · R. F. Aji · Y. Ruldeviyani, Faculty of Computer Science, Universitas Indonesia, Jakarta, Indonesia, e-mail: [email protected]; R. F. Aji, e-mail: [email protected]; Y. Ruldeviyani, e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021, S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_70
1 Introduction

According to Carretero et al. [1], developing an organization's potential and innovating in increasingly competitive markets is based on data. Because of this, organizations are becoming increasingly aware that the higher the level of data quality, the greater the benefits they can obtain. The Directorate General of Resources and Equipment of Post and Information Technology (SDPPI) has the task of formulating and implementing policies and technical standardization in the field of postal and information technology resources and equipment [2]. Based on its structure and its main tasks and functions, the Directorate General of SDPPI has, in addition to its policy, regulation, and guidance functions, a public service function. This public service function is carried out through the issuance of radio frequency spectrum licenses (including the handling of radio frequency spectrum interference complaints), competency testing and certification of radio operators, and the certification and testing of telecommunication tools and equipment. The public service of certifying telecommunication tools and equipment is carried out by the Directorate of Standardization.

In its audit of the effectiveness of the management of licensing services in the field of communication and information for 2016 up to the fourth quarter of fiscal year 2017 at the Ministry of Communication and Information (Phase 3), the BPK found that the e-certification information system used was incomplete, inaccurate, and invalid for supporting telecommunication tools and equipment certification services [3]. From the BPK's findings, the quality of the certification data in the e-certification information system database can be summarized as in Table 1.

Table 1 Data quality conditions

Description                                                         Data quality                         Amount of data
Invoices that do not have an application number                     Not complete, inaccurate, invalid    21
Payments that have no invoice                                       Inaccurate                           2
Experimental (dummy) data in the production database                Inaccurate                           10
A certificate with an APL_ID that is not in table T_Application     Inaccurate                           1
A certificate extended more than one time                           Not complete, inaccurate, invalid    3
A printed certificate with no payment information                   Not complete, inaccurate, invalid    2
Same certificate number used for different companies and equipment  Inaccurate                           2

The findings
are not in line with the Better Management Practices document, the agreed result of the discussion of criteria for the performance audit of the effectiveness of licensing services at the Ministry of Communication and Information; sub-criterion 3.3.2 of that document states that the licensing service information system database must be administered completely, accurately, and validly [4]. A strategy is therefore needed to maintain data quality, known as data quality management (DQM): a process covering planning, implementation, and control that applies quality management techniques to measure, assess, and ensure that data are good enough to be used [5]. DQM is not only about repairing damaged data; it also maintains data quality in every stage of the data life cycle so that the data meet the expectations of data users [5].

Previous studies on data quality management include Carretero et al. [1], who use the MAMD 2.0 framework, in which the process reference model is based on the principles of ISO 8000-61 and the evaluation model on ISO 8000-62; Larburu et al. [6], who introduced three quality of data (QoD) management techniques; Malange et al. [7], who used a technology-organization-environment framework; and Glowalla et al. [8], who used a combined conceptual life cycle model framework. The MAMD framework was chosen for managing the quality of the telecommunication tools and equipment certification data because it uses data disciplines that focus on processes, and it is hoped that these disciplines can improve the data quality of the certification process. The purpose of this research is to make recommendations for a data quality management strategy to improve the quality of the telecommunication tools and equipment certification data in the Directorate of Standardization.
2 Literature Review

2.1 Definition of Strategy

According to Cassidy [9], strategy refers to the level of global thinking about organizational information systems (IS) and their integration with other parts of the company. According to Ward and Peppard [10], an IS/IT strategy is a long-term, directed plan that decides what should be done with IT; it is primarily concerned with aligning the development of IS with business needs and seeking benefits from IT.
2.2 Data on Telecommunication Tools and Equipment Certification

According to DAMA International [5], data is a representation of facts as text, numbers, graphics, images, sound, or video. According to the Ministry of Communication and Information Technology [11], the certification of telecommunication tools and equipment is a series of activities for issuing certificates for telecommunication tools and equipment. The data processed during the certification process, whose quality must be maintained, include the certification application data, the telecommunication tools and equipment certificate data, the invoice (SP2) data, the SP2 payment data, and the receipt numbers of the certification applications.
2.3 Data Quality Management

According to DAMA International [5], data quality management (DQM) is a critical support process in the management of organizational change. A changing business focus, corporate business integration strategies, and mergers, acquisitions, and partnerships can all require the IT function to integrate data sources, create gold copies of data, retrofit existing data, or integrate data.
2.4 Data Quality Framework

Loshin [13] introduces in his book a framework model for managing data quality. The framework serves as a reference for measuring the maturity level of an organization's data quality and is also called the data quality maturity/capability model. It was developed on the basis of the Capability Maturity Model (CMM) created by the Software Engineering Institute at Carnegie Mellon University. The data quality maturity level according to Loshin ranges from 1 to 5. The framework includes eight dimensions, namely:
1. Data quality expectations
2. Dimensions of data quality
3. Information policy
4. Procedures to support information policy
5. Governance
6. Data standards
7. Technology
8. Performance management.
Carretero et al. [1] introduced another framework model for managing data quality, which likewise serves as a reference for measuring the maturity level of an organization's data quality. The MAMD framework is based on the principles of ISO 8000-61, complemented with specific data governance processes and specific data management processes. Its evaluation model is based on ISO 8000-62 and therefore follows ISO/IEC 33000. Carretero et al. [1] built MAMD on three related data disciplines: data management, data quality management, and data governance. The processes related to data governance are DG 1 to DG 8, the processes related to data management are DM 1 to DM 9, and the processes related to data quality management are DQM 1 to DQM 4. The MAMD data quality maturity level ranges from 0 to 5: immature, basic, managed, established, predictable, and innovating. The maturity level is calculated from the capability levels of the processes of the process reference model included in the evaluation. The capability level, in turn, is calculated by considering the degree of institutionalization of good practice and the process attributes described in ISO/IEC 33020. The ratings for each process attribute according to ISO/IEC 33020 are: "Not Achieved (N)", "Partially Achieved (P)", "Largely Achieved (L)", and "Fully Achieved (F)". The maturity model is shown in Table 2, and Table 3 summarizes the process attributes and capability levels that must be achieved according to ISO/IEC 33020, clause 5.2.

Table 2 Ordinal scale for rating capability levels
Source Carretero et al. [1]

Table 3 Capability levels and process attributes according to ISO/IEC 33020

Process capability level    Process attributes
Incomplete process          n/a
Performed process           PA.1.1. Process performance
Managed process             PA.2.1. Performance management
                            PA.2.2. Work product management
Source ISO/IEC [14]

Table 4 Comparison of data quality management frameworks
Similarity: both Loshin and MAMD are used to measure the maturity of data quality management in an organization
Difference: Loshin's maturity assessment coverage includes eight dimensional components, whereas MAMD's includes nine data management processes, four data quality management processes, and eight data governance processes

Based on the comparison in Table 4, the MAMD framework was chosen for improving the quality of the telecommunication tools and equipment certification data because it uses data disciplines that focus on processes; it is hoped that these disciplines can improve the data quality of the certification process.
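To illustrate how the N/P/L/F attribute ratings translate into a capability level, the sketch below applies a simplified reading of the ISO/IEC 33020 rule: level 1 requires PA.1.1 to be at least largely achieved, and level 2 additionally requires PA.1.1 to be fully achieved and both level-2 attributes to be at least largely achieved. This is a simplified illustration only, not a substitute for the standard or for the MAMD evaluation model.

# Ordinal ratings from ISO/IEC 33020: N < P < L < F
RANK = {"N": 0, "P": 1, "L": 2, "F": 3}

def capability_level(ratings):
    """Simplified capability level from process attribute ratings.

    ratings maps attribute ids ("PA.1.1", "PA.2.1", "PA.2.2") to N/P/L/F.
    """
    pa11 = RANK[ratings.get("PA.1.1", "N")]
    pa21 = RANK[ratings.get("PA.2.1", "N")]
    pa22 = RANK[ratings.get("PA.2.2", "N")]

    if pa11 < RANK["L"]:
        return 0    # incomplete process
    if pa11 == RANK["F"] and pa21 >= RANK["L"] and pa22 >= RANK["L"]:
        return 2    # managed process
    return 1        # performed process

print(capability_level({"PA.1.1": "F", "PA.2.1": "L", "PA.2.2": "P"}))   # 1
print(capability_level({"PA.1.1": "F", "PA.2.1": "L", "PA.2.2": "L"}))   # 2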
3 Theoretical Framework

Figure 1 shows the theoretical framework used to develop the data quality improvement strategy for the certification of telecommunication tools and equipment. The framework is based on the results of the literature review, and the arrows in the figure show the relationships between the influencing factors.

Fig. 1 Theoretical framework

The factors that influence the strategy for improving the quality of the telecommunication tools and equipment certification data are the causes of the data quality problems, the gap between the current condition of data quality management (obtained from the process assessment model) and the
data quality management required according to the process reference model, the Guideline of the Secretary General of the Ministry of Communication and Information Technology Number 1 of 2018, and the Minister of Communication and Information Technology Regulation Number 41/PER/MEN.KOMINFO/11/2007 concerning the General Guidelines for Governance of National Information and Communication Technology. This theoretical framework serves as a guide for the results and discussion in Section 5.

The study began by assessing the quality of the telecommunication tools and equipment certification data on the identified dimensions and analyzing the causes of the data quality problems. It continued with an assessment of the current maturity level of data quality management based on the interview results, followed by a measurement of the expected maturity level and an analysis of the gap between the current and expected states. Recommendations for improving data quality were then derived from this gap, following best practices in data quality management and including fixes for the causes of the data quality problems. For these recommendations, success indicators were mapped to measure the success of each recommendation, and an impact analysis was carried out to determine the implementation priority and the target time for implementing the strategy.
4 Research Methodology

4.1 Data Collection Procedures

This study uses a mixed methods research methodology with a concurrent triangulation design. In a concurrent triangulation design, quantitative and qualitative methods are used simultaneously in one phase to confirm, cross-validate, or reinforce findings within a single study, and both components are considered equally important [12]. At the data collection and analysis stage, the quantitative and qualitative methods are used together to confirm or strengthen the findings: the quantitative method is used to measure data quality maturity, while the qualitative method is used to examine the current data quality in each dimension so that the data quality problems (data anomalies) in each dimension can be identified. To obtain the data quality problems, questions and procedures were prepared and data collection was carried out face to face, interacting with the people involved in the case study.

The data sources used in this study are primary and secondary data. The primary data were obtained by observing data from the telecommunication tools and equipment certification databases to determine the current data quality, complemented by interview and observation techniques. The interview method
used structured interviews: the researchers knew with certainty the information to be explored and prepared a systematic list of questions. Resource persons were selected on the principle that they master the problem, have the data, and are willing to provide complete and accurate information; anyone acting as a source of data and information had to fulfill these requirements. The interviewees in this study were officials at the Directorate of Standardization who handle the certification process for telecommunication tools and equipment, officials in the SDPPI Control Directorate who manage the e-certification database and information systems, and e-certification programmers. Secondary data in this study relate to the research problem and were obtained from internal organizational documents on the current management of telecommunication tools and equipment certification data. These secondary data are used to support the assessment of the current process and to supplement the primary data.
4.2 Methods/Techniques for Analyzing Data Two data processing methods are used in this study: qualitative data are processed using thematic analysis, and quantitative data are processed using a process assessment model. The process assessment model follows the MAMD 2.0 framework guide. Data obtained through interviews, observation, document study, and benchmarking are interpreted to gain a deep understanding of the problems that occur. Data processing and analysis are then carried out by applying thematic analysis (qualitative method) and the process assessment model (quantitative method) to the interview transcripts to arrive at recommendations and conclusions. According to Carretero et al. [1], a process assessment determines the capability level of a process: different types of evidence must be examined and traced back to each instance of the business process selected for evaluation, and the resulting capability level yields a classification.
5 Results and Discussion 5.1 Assessment of Current Data Quality Dimensions Data quality assessment is carried out on the dimensions of data quality, namely completeness, accuracy, and validity. This assessment is intended to identify data
problems in each measured dimension. Data quality measurement is carried out on the main data of the telecommunication tools and equipment certification process, which include certification application data, telecommunication tools and equipment certificate data, invoice (SP2) data, SP2 payment information data, and receipt data. The measured data are the telecommunication tools and equipment certification data from 2016 to 2018, the same data that are the material for the BPK examination. Measurements are made by querying the database using structured query language (SQL), based on data quality criteria rules for each data quality dimension. The criteria are derived from information in the e-certification system database, the applicable laws and regulations related to the certification process of telecommunication tools and equipment, and the results of a focus group discussion (FGD) attended by the Directorate of Standardization as the owner of the business process, the Directorate of Control as the party responsible for system development and maintenance, and the application developer vendors. For each identified data problem per dimension, the cause of the problem is then traced. The causes of the data quality problems are identified through interviews with the e-certification system programmers. After the causes are obtained, they are analyzed to derive recommended solutions; causes of similar data problems are mapped so that they can be solved with the same solution. An example of recommendations based on solutions to the causes of data quality problems is shown in Table 6.
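To make the SQL-based measurement concrete, a minimal sketch is given below; the table and column names (certificate, certificate_number, issue_year) and the use of a local SQLite snapshot are illustrative assumptions and do not reproduce the actual e-certification database schema.

import sqlite3  # standard library; assumes a local snapshot of the data for illustration

conn = sqlite3.connect("ecertification.db")   # hypothetical database file
cur = conn.cursor()

# Completeness: fraction of certificate rows whose certificate number is missing
cur.execute("""
    SELECT CAST(SUM(CASE WHEN certificate_number IS NULL OR certificate_number = ''
                         THEN 1 ELSE 0 END) AS REAL) / COUNT(*)
    FROM certificate
    WHERE issue_year BETWEEN 2016 AND 2018
""")
print("share of missing certificate numbers:", cur.fetchone()[0])

# Validity/uniqueness: duplicate certificate numbers (the kind of cause addressed by R-03)
cur.execute("""
    SELECT certificate_number, COUNT(*) AS n
    FROM certificate
    GROUP BY certificate_number
    HAVING COUNT(*) > 1
""")
print("duplicate certificate numbers:", cur.fetchall())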
5.2 Current Maturity Level/Data Quality Management Capability Assessment The assessment of the maturity level of data quality management in this study uses the process assessment model from the MAMD framework. According to Carretero et al. [1], the MAMD evaluation model is based on ISO/IEC 33000; clause 4.2 of ISO/IEC 33010 requires that, to reach a maturity level, all processes included in the previous level must have the status "F" and the processes at that level must be at least in condition "L". Therefore, when an organization is assessed with respect to data quality management, the assessor must investigate, based on evidence, to what extent the data quality management processes of the reference process model are achieved. The list of interview questions covers the process attributes PA.1.1, PA.2.1, and PA.2.2 and the processes included in MAMD levels 1 and 2. This research documents and classifies the evidence according to the template produced as part of the changes made to MAMD. Based on the evidence gathered and following the guidelines provided, the results for each process were derived; they are shown as {N, P, L, F} in Table 5.
Table 5 Ranking of process attributes and maturity levels of data quality management certification telecommunications tools and equipment

          DM. 1   DM. 2   DM. 5   DM. 3   DM. 4   DQM. 2   DG. 2   DG. 4
PA. 1.1   L       F       F       F       F       L        F       L
PA. 2.1   P       F       L       L       F       N        L       P
PA. 2.2   N       L       P       L       L       N        P       N
In Table 5, maturity level 1 is evaluated through process attribute PA.1.1, where DM. 2, DM. 5, DM. 3, DM. 4, and DG. 2 reach the value "F" and DM. 1, DQM. 2, and DG. 4 reach the value "L". The maturity level therefore reaches 1 but cannot proceed to level 2, because not all processes in PA.1.1 reach the value "F". Consequently, the maturity level of data quality management for the certification of telecommunication tools and equipment is still at level 1.
5.3 Expected Maturity Level/Data Quality Management Capability The expected future maturity level of data quality management for the certification of telecommunication tools and equipment is formulated from a study of the related regulatory documents, namely the Minister of Communication and Information Technology Regulation Number 41/PER/MEN.KOMINFO/11/2007 concerning the General Guidelines for Governance of National Information and Communication Technology. The document study uses content review, a technique for drawing conclusions by identifying the characteristics of the message objectively and systematically. From the content review, several provisions were found that correspond to the characteristics of the MAMD maturity level assessment. The resulting expected maturity level was validated by the Head of the Telecommunication and Informatics Post Equipment Certification Section. From the validation it was decided that the maturity level of data quality management for the certification of telecommunication tools and equipment would be achieved in stages: in the early stage of 2019–2020, level 2 (two) is expected, increasing by 1 (one) level every year up to level 5 (five) by the end of 2023.
5.4 Data Quality Management Maturity Level Gap The data quality management maturity gap for telecommunication tools and equipment certification data is the gap between the current level of data quality management maturity and the expected level. The current level of data quality management maturity is level one
or basic, while the expected level for this year and next year is level two or managed. To reach level two, all processes must be in status "F" for the level-one process attribute PA.1.1 and the level-two process attribute PA.2.1, and at least in condition "L" for the level-two process attribute PA.2.2. For each data discipline to reach the expected values, and thus for the expected maturity level of data quality management for telecommunication tools and equipment certification data to be achieved, the Directorate of Standardization must meet the characteristics of the expected maturity level. Examples of characteristic targets that must be fulfilled by the Directorate of Standardization are shown in Table 7.
5.5 Recommendations Based on Data Management Policies According to Loshin [13], policy is also needed to maintain data quality within the data management and data quality management framework. The quality of telecommunication tools and equipment certification data is currently required to follow the policies issued by the Ministry of Communication and Information. Recommendations based on data management policies include:
1. The realization of data management, data governance, and data maintenance in the Regulation of the Minister of Communication and Information Technology Number 41/PER/MEN.KOMINFO/11/2007 concerning the General Guidelines for Governance of National Information and Communication Technology, Points 4.5.2.4, 4.6.2.5, and 4.7.2.3.
2. Data management in the Guideline of the Secretary General of the Ministry of Communication and Information Number 1 of 2018 concerning Information Technology Governance of the Ministry of Communication and Information, Chapter VII Point C.
An example of recommendations based on data governance policies is shown in Table 8.
5.6 Data Quality Improvement Recommendations Recommendations for improving data quality are prepared based on the results of the analysis of the causes of data quality problems (Table 6), the results of the gap analysis of data quality management (Table 7), and the policies issued by the Ministry of Communication and Information related to data governance (Table 8).
Table 6 Example of recommendations based on solutions to cause of data quality problems
Code cause: PMD-04, PMD-05
Recommended solution: Improve the ability to validate input data errors by creating functions that filter certificate numbers/receipt numbers/double invoice numbers before certificate data is entered in the database
Recommended code: R-03
Table 7 Examples of characteristic targets that must be fulfilled by the Directorate of Standardization
Data discipline: DG 2
Process attribute: PA. 2.1
Characteristic target code: TK-13
Characteristics: The Directorate of Standardization manages the performance of the process as follows:
• Analyze to determine the right environment for information storage
• Information must be accessed, transmitted, or manipulated only by authorized users who carry out official business; carefully review the criteria and procedures for access rights
• Written agreements and contracts must be made with third parties to establish requirements regarding information protection, usage restrictions, and incident reporting
• Transfer information from active storage to the archive once the need for access has been reduced
• When information reaches the end of the required retention period, it may be destroyed or permanently stored in an archive for ongoing historical reference or research purposes
The target characteristics resulting from the gap analysis, the recommended solutions to the data problems, and the recommendations based on data governance policies are grouped using a categorization method: recommendations and target characteristics are grouped into similar data discipline categories. An example of the data quality improvement recommendations is shown in Table 9.
Table 8 Example of recommendations based on data governance policies
Policy: Regulation of the Minister of Communication and Information Technology Number 41/PER/MEN.KOMINFO/11/2007 concerning General Guidelines for the Management of National Information and Communication Technology, Point 4.5.2.4 concerning Realization of Data Management
Recommendation: At the input stage, the procedures that must be carried out are: data access procedures, data transaction procedures to check their accuracy, completeness, and validity, as well as procedures for preventing data input errors
Recommended code: R-07
Table 9 Example of data quality improvement recommendations
Recommended codes: TK-13, TK-14, R-07, R-08, R-09, R-15, R-19
Recommendations: The Directorate of Standardization manages the performance of the data lifecycle management process as follows:
• Analyze to determine the right environment for information storage
• Written agreements and contracts must be made with third parties to establish requirements regarding information protection, usage restrictions and reporting incidents
• Transfer information from active storage to the archive because the need for access has been reduced
• When information reaches the end of the required retention period, it may be destroyed or stored permanently in the archives for ongoing historical references or research purposes
• Deletion of sensitive data is carried out using safe methods and techniques so that the Ministry of Communication and Information is protected from the risk of leakage and data abuse
5.7 Data Quality Improvement Strategies The data quality improvement recommendations from Subsection 5.6 are then mapped to success indicators that measure the success of each recommendation. An impact analysis is carried out to determine the implementation priority of the strategy and the target time for implementing it. The target time is aligned with the activity schedule and budget plan of the sub-directorate for post, telecommunications, and information technology certification and data. The results of the analysis of the strategies to improve the quality of telecommunication tools and equipment certification data are as follows:
• Target Time Q3 2019
The Directorate of Standardization refines the data quality strategy so that it is aligned with the needs and expectations of stakeholders through process performance that produces work products.
• Target Time Q4 2019
1. The Directorate of Standardization monitors the level of quality of the data used and corrects discrepancies found in the data through process performance that produces work products.
2. Increase the ability to validate input data errors by creating a function that filters certificate numbers/receipt numbers/multiple invoice numbers before the certificate data is entered in the database.
3. Decrease the number of incidents that cause downtime and decrease the total downtime per time period.
4. Increase the validation capability of the e-certification application related to the input of application and certificate data.
5. The Directorate of Standardization precisely manages the work products produced by the configuration management process.
6. Evaluate the e-certification database structure.
• Target Time Q1 2020
1. The Directorate of Standardization properly manages the work products produced by the technology infrastructure management process.
2. Set up a dedicated development server so that dummy data is not stored in the production database.
3. Conduct regular testing of the data backup and restore mechanism to ensure the integrity and validity of the procedure.
4. Equalize the time zone between the payment gateway server and the e-certification application server.
• Target Time Q2 2020
The Directorate of Standardization precisely manages the work products produced by the historical data management process.
• Target Time Q3 2020
1. The Directorate of Standardization properly manages the work products produced by the data security management process.
2. The Directorate of Standardization manages the performance of the data life cycle management process.
• Target Time Q4 2020
The Directorate of Standardization establishes regulations to control data quality, guarantee data quality, improve data quality, support related data, and provide resources consistently throughout the organization.
6 Conclusion The measurement shows that the maturity level of data quality management is currently at level 1, while the expected future maturity level of data quality management for telecommunication tools and equipment certification data is level 2. The current measurement results were compared with the expectations in a gap analysis, and the results of the gap analysis are seen as opportunities to increase the maturity level of data quality management. The recommended strategies are expected to help improve the quality of telecommunication tools and equipment certification data by implementing data quality management. The recommendations are grouped into eight data discipline points that need to be implemented by the Directorate of Standardization to reach the expected level of data quality management maturity: data requirements management, technology infrastructure management, configuration management, historical data management, data security management, data quality control and monitoring, data life cycle management, and standard definitions, policies, and procedures. Four points have a high impact and the other four a medium impact. Fulfillment of the recommendations is scheduled quarterly until the end of 2020. In the future, the proposed framework should be generalized so that it can be used by any government organization. It would also be interesting to improve the quality of public service data across all ministries so that the Indonesian government can improve the quality of public services. Acknowledgements This study was funded by Universitas Indonesia (under the PIT 9 Grant 2019 of "Optimization of Digital Business Contribution through Platform-based Project Management Model" (No: NKB0014/UN2.R3.1/HKP.05.00/2019). The authors declare that they have no conflict of interest. This chapter contains the interviews of participants. Informed consent was obtained from all individual participants included in the study.
References
1. A.G. Carretero, F. Gualo, I. Caballero, M. Piattini, MAMD 2.0: Environment for Data Quality Processes Implantation Based on ISO 8000-6X and ISO/IEC 33000 (Elsevier, 2016)
2. K. Kominfo, Regulation of the Minister of Communication and Information Technology Number 6 of 2018 concerning the Organization and Work Procedure of the Ministry of Communication and Information Technology [in Bahasa] (Kominfo, Jakarta, 2018)
3. BPK, Findings of performance checks on the effectiveness of management of licensing services in the field of communication and informatics in 2016, iv quarterly 2017 budget year in the ministry of communication and informatics stage 3 [in Bahasa] (BPK, Jakarta, 2017)
4. BPK & Kemkominfo, Better Management Practice (Kementerian Kominfo, Jakarta, 2017)
5. Dama International, The DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK Guide) (Technics Publications, LLC, United States of America, 2009)
6. N. Larburu, R. Bults, M. Sinderen, H. Hermens, Quality-of-data management for telemedicine systems, in The 5th International Conference on Current and Future Trends of Information and (Procedia Computer Science, 2015), pp. 451–458
7. S.N. Malange, E.K. Ngassam, S. Ojo, I.O. Osunmakinde, Methodology for improving data quality management in South African, in IST-Africa 2015 Conference Proceedings (IIMC International Information Management Corporation, 2015)
8. P. Glowalla, P. Balazy, D. Basten, A. Sunyaev, Process-driven data quality management—an application of the combined conceptual life cycle model, in 47th Hawaii International Conference on System Science (2014), pp. 4701–4709
9. A. Cassidy, Information System Strategic Planning, 2nd edn. (Auerbach Publications, Boca Raton, 2006)
10. J. Ward, J. Peppard, Strategic Planning for Information Systems, 4th edn. (Wiley, London, 2016)
11. K. Kominfo, Minister of Communication and Information Technology Regulation Number 18 of 2014 concerning certification of telecommunication tools and equipment [in Bahasa], Jakarta (2014)
12. G.C. Pheng, M.B. Nunes, F. Annansingh, Investigating information systems with mixed-methods research, in IADIS International Workshop on Information Systems Research Trends, Approaches and Methodologies (ISRTAM), Rome, Italy (2011)
13. D. Loshin, The Practitioner's Guide to Data Quality Improvement (Morgan Kaufmann OMG Press, 2011)
14. ISO/IEC, ISO/IEC 33020:2014 Information Technology - Process Assessment - Process Measurement Framework for Assessment of Process Capability, ISO (2014)
Evolution of Neural Text Generation: Comparative Analysis Lakshmi Kurup, Meera Narvekar, Rahil Sarvaiya, and Aditya Shah
Abstract In the past few years, various advancements have been made in Language Models owing to the formulation of new algorithms such as Generative Adversarial Networks (GANs), ELMo, and Bidirectional Encoder Representations from Transformers (BERT). Text Generation, one of the most important language modeling problems, has shown great promise recently due to the advancement of more efficient and competent context-dependent algorithms such as ELMo, BERT, and GPT-2 compared to the preceding context-independent algorithms such as word2vec and GloVe. In this paper, we compare various approaches to Text Generation, showcasing the benefits of each in its own unique form. Keywords Word2vec · GloVe · ELMo · BERT · GANs · GPT-2
1 Introduction Numerous efforts have been made in the past for Natural Language Text Generation. The most popular of them was using Long Short-Term Memory (LSTMs) and Recurrent Neural Networks (RNNs), where recent experiments have shown that they have a good performance in sequence-to-sequence learning and text data applications. The main advantage of using RNNs is that the activation outputs from neurons are propagated in both directions, creating loops in the architecture which act as the neurons memory center. This helps the neurons to remember the prior learned information. L. Kurup (B) · M. Narvekar · R. Sarvaiya · A. Shah Dwarkadas J. Sanghvi College of Engineering, Vileparle-W, Mumbai 400056, India e-mail: [email protected] M. Narvekar e-mail: [email protected] R. Sarvaiya e-mail: [email protected] A. Shah e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_71
LSTMs help overcome the Vanishing Gradient problem of RNNs. Moreover, the “cell state” of the LSTM helps to remember or forget the information more selectively. Word Embeddings is a way of representing text in the vector form, where words with the same meaning have a similar representation. Embeddings such as word2vec [1] and GloVe [2] have also given satisfactory results. In the word2vec method, words that share similar meanings are located in close proximity to the other words in the space. Word2vec uses vectors (a list of numbers) to represent words in a way that captures the semantic relationship between different words. The Continuous Skip Gram and the Continuous Bag of Words (CBOW) are the two major techniques. The former works better on larger datasets and on more infrequent words, while the latter is faster. A context window helps in determining the number of words in front and behind a given word that would be included as context words. GloVe is an unsupervised learning algorithm for obtaining vector representation of words. The Euclidean Distance is used for calculating the linguistic or semantic similarity of the words and the model is trained on the non-zero entries of a global word-word co-occurrence matrix, which tabulates how frequently words in a corpus occur together. Generative Adversarial Networks, a more recent approach, have found immense success in the Image Generation domain. But training GANs for Text Generation is comparatively more difficult. Past methods have involved using the maximum likelihood function or convolutional networks. Reinforcement Learning with GANs has shown promising results and produces more semantically correct sentences. Generative Pretrained Transformer which was introduced in 2018 was a combination of ULM-Fit and Transformer model. It used Generative Pretrained Language Modeling + task-specific fine-tuning for good transfer learning results. The key difference between GPT and ELMo was that ELMo [3] uses bi-directional LSTMs, whereas, GPT uses a multi-layer transformer decoder. ELMo feeds embeddings into customized models for specific tasks, whereas, GPT does fine-tuning of the base model, depending on the final end task. The major limitation of GPT was that it was unidirectional. This issue was overcome in ElMo and BERT. ELMo and BERT [4] have shown one of the most promising results as it not only captures the word, but also the context of the word. For example, for the sentences: (1) After reaching the tennis court, he realized he had forgotten his racket. (2) The loud racket caused by the party next door ruined the peaceful sleep of the neighbors. ELMo would generate different vectors for the different uses of the word racket. The first racket would be closer to words like bat, equipment, etc. The second racket would be closer to words like chaos, noisy, etc. So, instead of using a fixed embedding for each word, ELMo looks at the entire sentence before assigning each word in it, an embedding. This is performed by using a bidirectional LSTM network which is trained on a specific task to be able to create those embeddings. Apart from embeddings, ELMo has also shown prominent results in language modeling, a task to predict the next word in a sequence of words.
On the other hand, BERT [5] is a transformer-based model whose language model is bidirectional, i.e., both forward and backward. It is a state-of-the-art NLP model for performing sentence classification, language modeling, sentiment analysis, Named Entity Recognition (NER), and various other NLP tasks. Just like ELMo, BERT can also be trained to create contextualized word embeddings. ELMo and BERT can be considered an upgrade over the word2vec and GloVe methods: they prove to be more efficient for context-based Text Generation and language modeling because they are context-dependent, which leads to more syntactically and semantically correct text. In 2019, OpenAI released GPT-2 [6], the successor of the original GPT model. It has 1.5 billion parameters trained on around 40 gigabytes of data, with more transformer layers and parameters and a few architectural modifications compared to the original GPT. This advanced GPT-2 outperformed BERT in almost all aspects: while BERT had 340 million parameters, GPT-2 outperformed it with a massive 1.5 billion parameters. In short, BERT is very good at filling in the blanks, while GPT-2 is very good at essay writing. Thus, the immense number of parameters, trained on a large corpus of data, made GPT-2 one of the most efficient methods for Text Generation.
2 Working 2.1 Word2vec Word2vec [1] is the most basic word representation that uses an LSTM and RNN network to generate new text based on the dataset. After loading and pre-processing the data, this data is fed into the different layers of our architecture. The different words are converted to their respective vectors with the help of gensim. This data is passed into an LSTM network. The looped nature of an RNN coupled with the ability of LSTMs to avoid the long term dependency problem enhances the Text Generation. The validation performance is evaluated based on sentence-level BLEU [7, 8] score. Reference sentence: a nonconvex bi-directional designing adaptation is adopted to introduce convolutions. Text Generated: a nonconvex bidirectional designing adaptation adopted do introduce convolutionals called. Reference sentence: simple and effective applications directly improve the core domain understanding in the latest science applications. Text Generated: simple and effective improves core improve domain understanding rewritten late science applications, directly.
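For illustration, the sketch below shows how word vectors of this kind can be trained with gensim and how a generated sentence can be scored with sentence-level BLEU; it assumes gensim 4.x and NLTK, uses a toy corpus, and omits the LSTM generator itself, so it is a sketch rather than the exact code used in our experiments.

from gensim.models import Word2Vec                      # assumes gensim >= 4.0
from nltk.translate.bleu_score import sentence_bleu     # assumes nltk is installed

# Toy tokenized corpus standing in for the pre-processed training data
corpus = [["a", "nonconvex", "bidirectional", "designing", "adaptation"],
          ["simple", "and", "effective", "applications", "improve", "the", "core"]]

# Skip-gram word2vec vectors that the LSTM generator would consume as input
w2v = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)
vector = w2v.wv["adaptation"]                            # 100-dimensional embedding

# Sentence-level BLEU between a reference sentence and a generated one
reference = ["a nonconvex bi-directional designing adaptation is adopted".split()]
generated = "a nonconvex bidirectional designing adaptation adopted".split()
print(sentence_bleu(reference, generated, weights=(0.5, 0.5)))   # cumulative BLEU-2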
2.2 GloVe GloVe [2] is another basic word representation that uses word embedding and LSTM to train a neural network to generate text mimicking the style of the training dataset. By sampling a random seed sentence, we generate the next word, append it to the seed sentence, repeat, and then generate the next word. We will then have a new paragraph generated by the neural network. We use a pretrained GloVe embedding layer by downloading a 50d pretrained word vector. The main difference between word2vec and GloVe is that word2vec is a predictive model and GloVe is a count-based model. Word2vec learns their vectors to improve the loss of predicting the target words from the context words based on the vector representation. GloVe does dimensionality reduction on the co-occurrence count matrix to learn the vectors. GloVe works better than word2vec on a larger amount of data. We use BLEU score as an evaluation metric to calculate the resemblance degree between the generated texts and human-created texts. Reference sentence: random gaussian weights perform filtering of symbolic gpus in engineering that has observed to develop dropconnect that is neq to studied shortcomings This difference growing outperforms rank pca theory. Text Generated: random Gaussian weights perform filtering symbolic gpus engineering observed develop dropconnect neq consists endowing studied shortcomings difference growing outperforms rank pca theory. We also trained the model using GloVe embeddings on a dataset of Donald Trump speeches, but received a lesser BLEU score as the dataset contained a lot of indirect and direct speeches conversions, which made it difficult for our model to generate syntactically correct sentences. Reference Sentence: it’s obvious to anybody the hatred is unreal and artificial which makes detailed evenings redundant so no benefits of addition of the articles to university. Text Generated: it’s obvious to anybody the hatred unreal artificially redundant detail evenings benefits addition washer articles intervention tz university.
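A minimal sketch of loading the pretrained 50-dimensional GloVe vectors into a frozen Keras embedding layer is shown below; the file name glove.6B.50d.txt and the toy vocabulary are assumptions for illustration, and the LSTM generator built on top of this layer is omitted.

import numpy as np
import tensorflow as tf   # assumes TensorFlow 2.x with tf.keras

# Parse the downloaded GloVe file into a word -> vector lookup
embeddings_index = {}
with open("glove.6B.50d.txt", encoding="utf-8") as f:     # assumed local file
    for line in f:
        parts = line.split()
        embeddings_index[parts[0]] = np.asarray(parts[1:], dtype="float32")

word_index = {"random": 1, "gaussian": 2, "weights": 3}   # toy vocabulary
embedding_matrix = np.zeros((len(word_index) + 1, 50))
for word, i in word_index.items():
    vec = embeddings_index.get(word)
    if vec is not None:
        embedding_matrix[i] = vec

# Frozen pretrained embedding layer reused by the LSTM text generator
embedding_layer = tf.keras.layers.Embedding(
    input_dim=len(word_index) + 1, output_dim=50,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False)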
2.3 ELMo ELMo [3] is a deep contextualized word representation technique that is implemented using a deep bidirectional language model (biLM) pretrained on a large text corpus. Unlike word2vec and GLoVe, the ELMo vector which is assigned to a token or word depends semantically on the complete sentence. So the same word in two semantically different sentences will have different embeddings. Here, we generate ELMo embeddings using pretrained TensorFlow hub model for the given semantically different sentences with respect to the word “bank”. While creating the model, the trainable parameter is set to true so that all the variable LSTM cells can be trained.
Fig. 1 ELMo embeddings with vector values for different tokens generated on the above sentences
Sentence 1: She sat besides the bank
Sentence 2: Bank account has 300 rupees
Sentence 3: They were riding down the north bank of creek
Sentence 4: Child plays near the bank
Sentence 5: The bank accounts were cleared by him.
We further computed ELMo embeddings for the above five sentences and plotted them graphically; the output embeddings can be visualized with the help of matplotlib (Fig. 1). ELMo generates different embeddings for the word "bank" depending on the contextual meaning of the sentence. These embeddings are generated at the output of every hidden layer of the bidirectional LSTM. Since ELMo [9] is capable of capturing syntactic and semantic information of words from a large text corpus, it is widely used to correctly generate the next word in a sentence, a task known as language modeling. In language modeling, we predict the most probable next word in a sentence, which also depends semantically on the previous words generated in the sentence. One of the recent developments in language modeling is BERT—Bidirectional Encoder Representations from Transformers. BERT generates the next word in a sentence by taking into account both the left and right context of the words.
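A minimal sketch of obtaining such contextual vectors from the pretrained TF-Hub module is shown below; it assumes TensorFlow 1.x with the tensorflow_hub package and the public ELMo module, and is illustrative rather than the exact script used to produce Fig. 1.

import tensorflow as tf            # assumes TensorFlow 1.x style sessions
import tensorflow_hub as hub

# Pretrained ELMo module; trainable=True lets the LSTM cell weights be fine-tuned
elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

sentences = ["She sat besides the bank",
             "Bank account has 300 rupees",
             "They were riding down the north bank of creek"]

# "elmo" output: one 1024-dimensional contextual vector per token of each sentence
embeddings = elmo(tf.constant(sentences), signature="default", as_dict=True)["elmo"]

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    vectors = sess.run(embeddings)
    print(vectors.shape)           # (3, max_tokens, 1024); these can be plotted with matplotlib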
2.4 BERT BERT [5] is a new method for obtaining language representations which helps achieve state-of-the-art results on various NLP tasks. Released by Google in 2018, the model is trained on a large plain text corpus, specifically the complete Wikipedia dataset containing 2.5 B words and the BookCorpus dataset containing 800 M words, while taking the contextual representation of the text into account as well. Models like ULMFit and ELMo are based on the unidirectional transformer [10]. Consider the text They were riding down the north bank of creek, and let the masked word be [bank]. ELMo [4] will consider all the words to the left of, i.e., previous to, the masked word in order to predict it, so the prediction is based only on the part of the sentence before the masked word [bank], i.e., They were riding down the north ….., and not on the full sentence. BERT, on the other hand, is the first deeply bidirectional model, and thus the complete sentence They were riding down the north [mask] of the creek is considered to predict the masked word. Here we make an attempt to generate raw text using BERT through a PyTorch implementation of Google's pretrained model. We use the BERT-base-uncased model from BertForMaskedLM, which is fully pretrained for masked language modeling. To generate text based on masked words, we initially start from all masks and then repeatedly pick a location and mask the token at that location. Then, we generate the word at that location according to the probability defined by BERT, so the most probable word is generated at the masked location. This process is continued, and we stop when it converges to a random sample of raw text. Evaluation methods for unconditional raw text are generally not perfect, so here we measure the diversity of our samples with self-BLEU, i.e., we compute corpus BLEU where, for each generated sentence, we compute BLEU treating the other sentences as references. We also compute the percentage of n-grams that are unique among the generations. Here are some pieces of raw text as generated by our implemented BERT model. on her screen she saw an image of herself. i had never been seen with different people before. I smile despite myself. my head sticks up. he leaned forward and nuzzled my neck softly. the video is accompanied by a special video reinforcement. he also served on whitechapel borough council. jordan is willing to meet new borders: lessons from the arab - israeli: ramallah negotiations " (pdf). israelnetwork. com. in their second season hull kingston rovers finished 10th in the 71st division two of league one and were replaced by barrow who were struggling with salary shortages. no cctv or video system exists during the presence of the firemen. on the side of the right direction side of miller road is the lookout. for me, the past weeks were full of simple, clean things for me, but even so, i feel better - although not by so much. christopher parker
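The sketch below illustrates single-token masked-word prediction of the kind used in this procedure; our implementation relies on a PyTorch port of the pretrained model (BertForMaskedLM), and the equivalent Hugging Face transformers API shown here is an assumption rather than the original code.

import torch
from transformers import BertTokenizer, BertForMaskedLM   # assumes the transformers package

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "They were riding down the north [MASK] of the creek."
inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()

with torch.no_grad():
    logits = model(**inputs).logits

# Most probable token at the masked position according to BERT
predicted_id = logits[0, mask_pos].argmax(-1).item()
print(tokenizer.decode([predicted_id]))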
2.5 GPT-2 Generative Pretrained Transformer 2 (GPT 2) [6], released in February 2019 by OpenAI is the successor of the original GPT and is used to accurately predict the next token given a sequence of tokens in an unsupervised way. GPT-2 uses Byte Pair Encoding (BPE) on UTF-8 byte sequences. UTF-8 supports 231 characters in total. Using the byte sequence representation, GPT-2 is able to assign a probability to any unicode string, regardless of any preprocessing steps. Based on the content of the given conditioning text, it adapts to its style and generates synthetic text samples in response to the model. Instead of using an existing dataset, the GPT 2 is trained based on a web scraping technique in which the content for data comes from the links posted to Reddit which are rated at least 3 karma. Since the model uses text written by humans instead of an existing dataset so the samples generated by the model are more robust, creative and meaningful as compared to those generated by previous models. Consider the initial text, “It was raining heavily the previous day..” Reference Text: and they started running. They were just starting to make their way up to the hill to the end of the path. They wanted to start climbing so they said, “What? What do we want to do?” We were aware of all the ideas that the kids were working through and then we started thinking about making plans. Text Generated: and they were running. They were just starting to make their way up the hill to the end of the path. They wanted to start climbing so they would get there to get started, so they were like, “What? What do we want to do?” We had all these ideas that the kids were working through and then we had to start thinking about them and making plans. As shown above, GPT-2 and BERT produce the best results. They produce similar BLEU [7, 8] scores. However, GPT-2 performs slightly better than BERT by correctly generating punctuation marks and thus generating more syntactically correct sentences.
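For illustration, conditional generation with the pretrained GPT-2 model can be sketched as follows using the Hugging Face transformers package; the sampling parameters (max_length, top_k, temperature) are assumptions and not necessarily the settings used for the sample above.

from transformers import GPT2LMHeadModel, GPT2Tokenizer   # assumes the transformers package

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "It was raining heavily the previous day"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation conditioned on the prompt
output = model.generate(input_ids, max_length=80, do_sample=True,
                        top_k=40, temperature=0.8,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0], skip_special_tokens=True))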
3 Conclusion and Future Scope
Word2vec (refer to Table 1) and GloVe (refer to Tables 2 and 3), two of the earlier methods, are not very suitable for Text Generation. However, they do produce meaningful texts on datasets related to a single topic.

Table 1 BLEU score for word2vec implementation on arxiv_abstracts dataset [11, 12]
BLEU-1      BLEU-2      BLEU-3      BLEU-4
0.700000    0.557773    0.492144    0.427287
0.593587    0.356702    0.217844    0.000000
Table 2 BLEU score for GloVe implementation on arxiv_abstracts dataset [11]
BLEU-1      BLEU-2      BLEU-3      BLEU-4
0.661337    0.524324    0.414472    0.340235

Table 3 BLEU score for GloVe implementation on speeches dataset [13]
BLEU-1      BLEU-2      BLEU-3      BLEU-4
0.489529    0.312395    0.257277    0.217595
Table 4 Nearest neighbors to the word "bank" using GloVe and context embeddings using ELMo
Embedding: GloVe. Source: bank. Nearest neighbors: Banks, banked, banking, finance, currency, money
Embedding: ELMo. Source: The bank account has 300 rupees {…}; They were riding down the north bank of creek {…}. Nearest neighbors: {…} The bank accounts were cleared by him and …; She sat beside the bank {…}
Table 5 BLEU score for BERT implementation on BooksCorpus and English Wikipedia dataset
Self_BLEU   BLEU-1      BLEU-2      BLEU-3
0.42153     0.77431     0.64321     0.54167
0.51963     0.84915     0.73423     0.68167
After this came ELMo (refer to Table 4), which produced context-dependent word embeddings. BERT (refer to Table 5) was a further advancement, which has produced state-of-the-art results and has given a greater BLEU score than both word2vec and GloVe. Then came GPT-2, which outperformed all of the previous methods for Text Generation. Based on our implementation, BERT was observed to perform well for filling in the blanks, and GPT-2 performed well for story generation. GPT-2 (refer to Table 6) has produced the most accurate results, which are both syntactically and semantically correct. BLEU [7, 8], or Bilingual Evaluation Understudy, is a score for comparing a candidate translation of a text to one or more reference translations. A perfect match results in a score of 1.0 and a perfect mismatch results in a score of 0.0. The score was developed for evaluating the predictions made by automatic machine translation systems.

Table 6 BLEU score with GPT-2 implementation on dataset based on preprocessed Reddit articles
BLEU-1      BLEU-2      BLEU-3      BLEU-4
0.764706    0.692364    0.636376    0.590079
BLEU cumulative scores refer to calculating the individual n-gram scores at all orders from 1 to n and combining them by the weighted geometric mean. Therefore, we have used BLEU for evaluating the quality of the text against a reference sentence. Although BERT [9] has shown a significant improvement compared to all the previous models, XLNet [14], released in June 2019, has proved to outperform BERT on most NLP tasks. XLNet [15], which is based on Transformer-XL, is a state-of-the-art generalized autoregressive (AR) pretrained model. BERT is basically an example of an auto-encoding (AE) model, in which some of the input tokens are replaced with special [mask] tokens and the model is trained to predict those masked tokens. However, while predicting the tokens, BERT assumes that the masked tokens are independent of each other and thus does not consider the dependency between them. XLNet, on the other hand, combines the best of both AR language modeling and AE. Consider a sequence of tokens [Silicon, Valley, is, a, global, center, for, high, technology, and, innovation], and let [Silicon, Valley] be the two tokens selected by BERT and XLNet to be [masked]. In this case, XLNet is able to capture the dependency between the tokens [Silicon] and [Valley] together, while BERT fails to consider this explicitly and instead treats the tokens as [Silicon, city] and [Valley, city]. Thus, XLNet captures more dependencies than BERT, which gives much better performance. We further plan on implementing Generative Adversarial Networks (GANs) for Text Generation. GANs require a lot of computational power and are particularly hard to train: the non-convergence of the model parameters, coupled with the occasional mode collapse of the generator, makes them one of the more intensive algorithms to implement, and overfitting is also a major problem associated with GANs. With the correct Generator and Discriminator networks, we will try to produce results as good as BERT and GPT-2.
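To make the scoring concrete, the sketch below computes cumulative BLEU-1 to BLEU-4 scores of the kind reported in the tables above, together with the self-BLEU diversity measure used for the BERT samples; the sentences and the NLTK smoothing choice are illustrative assumptions.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1
reference = ["it was raining heavily the previous day".split()]
candidate = "it was raining hard the previous day".split()

# Cumulative BLEU-n: uniform weights over 1-grams ... n-grams, weighted geometric mean
for n in range(1, 5):
    weights = tuple([1.0 / n] * n)
    score = sentence_bleu(reference, candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.4f}")

# Self-BLEU: each generated sentence is scored against all the other generations
generated = ["he leaned forward and nuzzled my neck softly".split(),
             "the video is accompanied by a special video reinforcement".split(),
             "no cctv or video system exists during the presence of the firemen".split()]
self_bleu = [sentence_bleu(generated[:i] + generated[i + 1:], sent,
                           smoothing_function=smooth)
             for i, sent in enumerate(generated)]
print("self-BLEU:", sum(self_bleu) / len(self_bleu))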
References
1. T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems (2013), pp. 3111–3119
2. J. Pennington, R. Socher, C.D. Manning, GloVe: global vectors for word representation, in Empirical Methods in Natural Language Processing (2014), pp. 1532–1543
3. M.E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep contextualized word representations. arXiv preprint arXiv:1802.05365 (2018)
4. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
5. A. Wang, K. Cho, BERT has a mouth, and it must speak: BERT as a Markov random field language model. arXiv preprint arXiv:1902.04094 (2019)
6. A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, Language models are unsupervised multitask learners (2019)
7. A. Wang, J. Hula, P. Xia, R. Pappagari, R. Thomas McCoy, R. Patel, N. Kim, I. Tenney, Y. Huang, K. Yu, S. Jin, B. Chen, B. Van Durme, E. Grave, E. Pavlick, S.R. Bowman, Can you tell me how to get past sesame street? Sentence-level pretraining beyond language modeling. arXiv preprint arXiv:1812.10860v5 (2019)
8. K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: a method for automatic evaluation of machine translation, in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311–318
9. J. Howard, S. Ruder, Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146v5 (2018)
10. M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, Ł. Kaiser, Universal transformer. arXiv preprint arXiv:1807.03819 (2019)
11. R. McDermott, Trump-speeches, GitHub repository (2016). https://github.com/ryanmcdermott/trump-speeches/blob/master/speeches.txt
12. Y. Wang, RNN_Sequence, GitHub repository (2017). https://github.com/alanwang93/RNN_Sequence/blob/master/data/arxiv/arxiv_abstracts.txt
13. Y. Zhu, S. Lu, L. Zheng, J. Guo, W. Zhang, J. Wang, Y. Yu, Texygen: a benchmarking platform for text generation models. arXiv preprint arXiv:1802.0188v1 (2018)
14. https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf. Accessed 2nd July 2019
15. Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. Salakhutdinov, Q.V. Le, XLNet: generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237 (2019)
Research on the Status and Strategy of Developing Financial Technology in China Commercial Bank Ze-peng Chen, Jie-hua Xie, Cheng-qing Li, Jie Xiao, and Zi-yi Huang
Abstract Financial technology is a technology-driven financial innovation. The emergence and application of financial science and technology in recent years have brought about significant changes in the overall financial system, increased competition in the banking market, and also provided a new direction for China’s financial development. At present, the scale of domestic commercial banks has slowed down, their profitability has declined, and the amount of non-performing loans has continued to increase. Bank operating pressures have continued to increase and operations are in a critical period of transition. Financial science and technology gives full play to the enabling role of technological innovation in financial development, broadens the boundaries of financial development, and further promotes the changes in business concepts, business models, and service methods of commercial banks, and thus plays a role in the transformation of financial science and technology competitiveness in banking operations. It is of utmost importance that domestic and foreign commercial banks have increased investment in R&D for financial science and technology and promote the application of new technologies. At present, China has developed into the world’s largest mobile payment and Internet market. Many populations, as well as changes in demand and habits, have also laid a solid foundation for commercial banks to develop financial technology. To this end, in this new phase of financial and technological integration and win-win situation, if commercial banks can grasp the development opportunities of financial technology as quickly as possible and strengthen the development of financial technology, they will inject new business into the new round of business development. To enhance the competitiveness of bank operations, so as to achieve business transformation. This article analyzes the domestic banking business situation and the impact of financial science Z. Chen · J. Xiao Industrial Commercial Bank of China Ltd Guangdong Yangjiang Branch, 529500 Yangjiang, China J. Xie · C. Li (B) Industrial Commercial Bank of China Ltd Guangdong Branch, 510120 Guangzhou, China e-mail: [email protected] Z. Huang Lingnan College, Sun Yat-Sen University, 510120 Guangzhou, China © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_72
and technology, further analyzes and summarizes the status quo and experience of domestic and foreign commercial banks in the development of financial technology, and combines domestic commercial banks’ actual conditions to propose relevant development proposals. Keywords Finance science and technology · Commercial banks · Development status · Countermeasures In recent years, the financial reform caused by the development of financial science and technology has attracted worldwide attention. Financial and scientific innovation has accelerated the change of the market competition pattern of banks and has reconstructed the mode of banking business development. For commercial banks, whether they can grasp the wave of development of financial technology is the key to enhance competitiveness and achieve business transformation.
1 Introduction In 2017, the total assets of China's commercial banks were 252.4 trillion yuan and the total liabilities were 232.9 trillion yuan, with year-on-year growth rates that slowed by 4.5 and 5.2 percentage points, respectively, so the growth of scale slowed down. In 2017, the net profit of Chinese commercial banks was 1.7 trillion yuan, an increase of 5.99% over the same period of the previous year; profit rose slightly, but profitability declined and the recovery was not stable. ROA and ROE were 0.92% and 12.56%, respectively, a year-on-year decrease of 0.06 and 0.82 percentage points, so ROA and ROE continued to decline. The balance of non-performing loans was 1.7 trillion yuan, a year-on-year increase of 15.77%, and the non-performing loan ratio was 1.74%, the same as the previous year, so non-performing loans still increased. At present, the domestic economic recovery is still not stable and is in the transition period between old and new economic drivers. In addition, financial institutions are in a period of strong regulation, and the regulatory authorities have further tightened their control of financial irregularities, resulting in constrained bank development and balance sheet expansion [1]. The financial technology changes of recent years have affected the overall operation and business innovation of commercial banks and have had a particularly large impact on domestic commercial banks, especially in the fields of the Internet and mobile payment, which have undergone disruptive changes; this has further exposed the dilemma of the traditional business model of commercial banks. According to Accenture's forecast data, as financial technology gradually affects the banking industry, by 2020 approximately 30% of the global banking industry's operating income will be affected by financial technology, with deposits affected by approximately 17.4%, credit cards and payments by about 6%, loans by about 4.9%, and asset management by about 3.4%. At
present, China's Internet companies and financial technology companies rely on their platform and technological advantages, as well as a keen sense of the market and a service concept that pays great attention to the customer experience. This has given rise to many financial innovations and has created large-scale diversion pressure on the customers, businesses, and channels of commercial banks, and has even had a subversive impact on the traditional banking service model. It can be seen that China's commercial banks must now build a new core competitiveness: technological innovation capability. The use of big data, cloud computing, blockchain, and other cutting-edge financial technologies has become an important direction for commercial banks, and financial technology will promote changes in commercial banks' business concepts, business models, and service methods to meet the current developments in the business situation. At present, some domestic commercial banks with forward-looking business ideas have begun to take action to actively develop financial technology and to build cross-border financial platforms such as e-commerce, P2P, direct banking, smart investment advisory, and life services, so as to meet the needs of the new economy and actively integrate into the development of financial technology.
2 Development of Financial Science and Technology in Commercial Banks at Home and Abroad and Its Enlightenment Financial technology is a technology-driven financial innovation that can provide society with an equal and distinctive new supply, inject new vitality into financial development, and reshape customer behavior, business models, and financial services. It can provide new financial products and services to new groups of people under new scenarios: it can reconstruct existing financial formats, such as smart investment advisory, blockchain, online loans, and mobile banking, which enhance the distribution of financial scenarios, aggregate payments, and innovative insurance; and it can expand the objects covered by financial services, such as credit information and online loans, enabling groups that previously could not obtain financial services to obtain them, for example through crowdfunding, because the Internet and other technologies enable individuals or businesses to obtain financing.
2.1 Foreign Major Bank Financial Technology Development In 2017, the global banking industry generally increased investment in R&D of financial science and technology and promoted the application of new technologies. This has become a universal choice for major banks worldwide. Among them,
convenient real-time payment services, the expansion and optimization of online channel service functions, and the development and application of cutting-edge technologies such as artificial intelligence and blockchain are the key areas for the development of financial technology in international banking. At present, the main modes adopted by foreign commercial banks to develop financial technology are the following.
2.1.1
National Strategy to Determine Development
At present, Germany has determined the strategy for the development of financial technology from the national strategic level, and has ensured the development and application of financial science and technology through the development of business incubators and banks. In March 2017, the German Ministry of Finance established the FinTech Commission to further encourage the development of the financial technology industry. This committee is composed of 20 members of the financial technology, banking, insurance and research institutions, and specializes in the application of digital technologies in the financial sector. At the same time, Frankfurt fintech incubator was established in Frankfurt in January 2017, which attracted 13 major corporate partners such as Deutsche Bank and Frankfurt Savings Bank to enter it and to achieve cooperation.
2.1.2
Bank’s Own All-Round Development of Financial Technology
Large foreign banks with strong individual strength have positioned themselves as financial technology companies and have used their own resources to develop financial technology in all aspects. Among them, JPMorgan Chase has made outstanding achievements: it has positioned itself as both an investment bank and a technology company, its investments in blockchain, artificial intelligence, and big data technologies exceed US$9 billion each year, and it employs 40,000 technicians, including programmers and system engineers, accounting for one-sixth of all employees. At the same time, JPMorgan Chase fostered financial technology innovation through activities such as the "residentization plan," the "FinLab Challenge," and strategic investment cases. Through these, JPMorgan Chase gained many technological innovation ideas and proposals that promote the vitality of the organization, fill the gaps in its own products and services, and further win follow-up business opportunities and continuing customer resources.
2.1.3
Banking Group Development
Some banks, under the spur of external fierce competition, cooperated with each other to cope with the impact of financial technology. In June 2017, more than 30 mainstream banks in the USA jointly launched the P2P real-time payment network
Zelle. This payment service has been embedded in the mobile apps of the participating banks, so a customer does not need to install a new app or enter a bank account number; simply providing the transaction partner's registered e-mail address or mobile phone number allows real-time payment or collection transactions. This convenient service addresses the weak points of Venmo, PayPal, and other third-party payment companies, breaks the dominance of such third-party payment companies, and has allowed the banks' apps to make great progress in the field of real-time payment.
2.1.4
The Individual Business Lines of Individual Banks Are Scattered
Some banks still develop individual business lines separately. At present, such models are mainly found at smaller banks and are mainly used to promote the development of advantageous products and further consolidate the advantages of a particular business segment. For example, NatWest, part of the Royal Bank of Scotland group, launched a paperless mortgage loan business in the UK, fully using electronic channels to complete mortgage applications, arranging mortgage consultants to guide customers through the process, and providing online electronic signatures for customers to sign the mortgage contract. BNP Paribas has launched an app based on virtual reality technology that provides customers with service experiences such as querying bank transaction records and visualizing the purchase process of real estate. In October 2017, Bank of America Merrill Lynch introduced CashPro Assistant, a smart analysis and forecasting tool based on artificial intelligence and API technology, to customers of CashPro cash management. Customers can extract the required account information online for centralized use and analysis, thereby improving corporate financial analysis efficiency.
2.1.5
Promoting the Application of Results While Actively Nurturing Technological Innovation Capabilities
While promoting the application of financial scientific and technological achievements, major foreign commercial banks also attach importance to nurturing sustainable technological innovation capabilities. For example, Citibank, JP Morgan Chase, and Bank of America have all set up science and technology laboratories to explore commercial applications for commercial banks. Some commercial banks hire external technical experts to form committees, set up science and technology ventures and innovation ecosystems, and establish financial technology venture capital funds to build and improve cooperation mechanisms between banks and external scientific and technological forces and to tap the intellectual resources of continuous technological innovation. For example, at the beginning of 2017, HSBC Group set up a scientific and technical advisory committee composed of senior scientific experts and entrepreneurs from the USA, China, India, and Israel, with the group’s chief operating officer serving as chairman of the committee, to guide work on the
HSBC Group’s IT strategy, digital development, information security, technological innovation, and related infrastructure. Overall, foreign banks are advancing various technological innovations with an open and positive attitude and are committed to leading the industry in innovation, in cooperation within and outside the industry, and in the formulation of new standards. At the same time, by working to overcome the inertia of internal development thinking, they have actively sought to cooperate with external scientific and technological forces for joint development.
2.2 Domestic Major Bank Financial Technology Development In recent years, China’s fintech investment has grown rapidly. In 2016, China recorded 281 fintech investment and financing deals, accounting for 56% of the world’s total, with a total value of RMB 87.5 billion, accounting for 77% of the world’s total; the US International Trade Administration believes that the overall development of China’s financial technology market ranks second in the world.
2.2.1
Financial Technology Investment Increases
In recent years, major domestic banks have increased their investment in financial technology in order to gain a competitive advantage. For instance, China Merchants Bank draws 1% of the previous year’s pre-tax profit into a special fund for financial technology innovation projects, encouraging the whole bank to use emerging technologies for financial innovation; Hua Xia Bank sets aside 1% of total annual pre-tax profit as a technological innovation fund to support innovation projects and research and to reward related personnel, in order to attract more people to participate in financial technology innovation and development.
2.2.2
Individual Banks Have Identified Strategic Priorities for Development and Built Support Platforms
Some large domestic banks have taken the initiative to adapt to changes in the environment and identified the priorities of their development strategies. They have taken advantage of cutting-edge technologies such as big data and cloud computing, upgraded their technologies, accelerated the creation of core competitiveness in scientific and technological innovation, and integrated new business
development requirements and new IT architecture to better serve the development of financial technology. For instance, the Bank of China seeks to create development advantages and break development bottlenecks by injecting technological elements into the whole process and the entire business field, gradually building a digital bank featuring rich ecological scenarios, online and offline collaboration, an excellent user experience, flexible product innovation, efficient operation management, and intelligent digital risk control. Ping An Group is considering spinning off its technology and Internet subsidiaries onto the market. Its future dual-drive strategy of “Funding + Technology” is expected to give Ping An Group a leap forward in profitability and value; it will focus on the two major areas of financial technology and medical technology and strive to become the world’s leading financial technology company.
2.2.3
Strengthening Cooperation with Financial Technology Companies
Under this competitive environment, the willingness of domestic commercial banks to cooperate with financial technology companies has gradually increased and such cooperation has accelerated. Financial technology companies export their technological capabilities, injecting new vitality into commercial banks and upgrading their technological capabilities. For example, the four major banks have taken the lead in developing strategic cooperation with Tencent, Alibaba, Baidu, and Jingdong, and some small and medium-sized banks have also started similar cooperation. For example, Nanjing Bank, positioned around a strategy of Internet finance and big data, entered a three-party strategic cooperation with Alibaba Group and Ant Financial Group and successfully launched its “Xinyun+” Internet finance open platform. The cooperation and development of banks and financial technology companies will further promote changes in the banking ecology.
2.3 Financial Science and Technology Development Enlightenment Against the background of major changes in the external business environment and the industry’s operating situation, large international banks have changed with the times: they have advanced their financial technology deployment early and achieved good results. This provides useful inspiration for domestic commercial banks in developing financial technology. From the development of financial technology by domestic and foreign banks, it can be seen that such development requires a clear strategic direction and continuous investment in resources and innovative R&D practices, while also
satisfying customer needs and delivering efficient financial services and a good customer experience in order to ensure lasting success.
2.3.1
Determine the Bank’s Strategic Positioning
The major banks all regard the development of financial technology as a core strategy and incorporate it into the development plan of the bank or group as a whole, or of a business line. Combining their own core advantages with realistic environmental conditions, the major banks pursue financial technology strategies with different characteristics in order to maintain their competitive advantage. It can be seen that banks developing financial technology need to be good at using external forces, effectively utilizing external resources to expand their own technologies and service capabilities, and improving their ability to transform and apply financial technology.
2.3.2
Continuous Resource Input
Banks that have developed financial technology well generally treat it as a development goal that receives unanimous attention and support from the bank’s highest level. On this basis, they make continuous, large-scale investments of resources such as personnel, material resources, and policy support and, building on their actual situation, establish sustained mechanisms for technological R&D and for transforming and applying the results, thereby securing basic safeguards for the development of financial technology.
2.3.3
Good at Developing with the Help of External Forces
Major international commercial banks have always paid attention to establishing mutually beneficial cooperation with other institutions and market entities in the research, development, and application of financial technology, and have achieved good results. Domestic commercial banks, by contrast, have only begun to try to cooperate with external science and technology forces for joint development in the past two years.
2.3.4
Domestic Banks Have Advantages in Scale Application
Customer demand and experience are the fundamentals of banking financial technology development. From the perspective of domestic bank financial technology, domestic banks currently have stronger innovation capabilities at the business application end than foreign banks; in addition, China has a large population and, especially in recent years, lifestyles have become increasingly digital. The degree of
acceptance of financial services is high and the overall market has obvious advantages. New types of financial technology products and market operations are easily applied and promoted on a large scale.
3 Existing Development Issues At present, the development of China’s financial technology is still dominated by financial technology companies that are not traditional financial institutions. Although commercial banks have advantages such as large customer bases, long-term customer stickiness, and complete risk management systems, there are also obvious problems: the first-mover advantage is not obvious, traditional development thinking is difficult to change, and talent constraints and other development issues remain [2].
3.1 Development of First-Mover Advantage Is Not Obvious At present, the main players in the financial technology field are traditional financial institutions (such as banks, brokers, and insurance companies), Internet companies (such as e-commerce, social media, search engine, and portal companies), startups specializing in financial technology R&D and application models (such as companies focused on the research and application of blockchain; financial technology companies that develop credit models, risk pricing models, risk control models, and investment consulting models; and firms that use artificial intelligence technology to help financial institutions reform their information systems), and financial companies and platforms that specialize in financial technology applications (such as independent third-party payment companies, P2P platforms, and Internet-based insurance, credit information, wealth management, investment advice, and crowdfunding businesses). At present, the development of financial technology is still dominated by non-financial institutions, and commercial banks’ first-mover advantage in the development of financial technology is not obvious. In the vanguard of financial technology development, Internet companies such as Ali, Tencent, and Baidu have made significant progress at this stage and have emerged as competitors with global influence. To catch up and overtake them, commercial banks must increase capital investment and gain competitive advantage in a new round of competition.
3.2 Development Thinking Has Not Yet Completely Changed The two core elements of financial technology are data and technology, and different ways of using them distinguish financial technology thinking from the operating thinking of traditional commercial banks. Financial technology emphasizes the application of technology, focuses on the construction and accumulation of information technology, and gives full play to the enabling role of technological innovation in finance: new information-based technologies are applied along the financial industry chain to optimize financial functions and expand the boundaries of financial services. At present, the attitude of commercial banks toward the development of financial technology is still not entirely clear; there are contradictions in their acceptance of new things and in their concepts. Some financial institutions still treat financial technology with an attitude of business competition, resistance, and defensiveness. Moreover, the traditional development thinking of commercial banks has not been effectively reversed. Although commercial banks are changing their development ideas, the large span between industries and their lack of familiarity with integrating science and technology make it difficult to break traditional conventions and move beyond established dependencies, and the financial technology development of most commercial banks still remains at the level of form.
3.3 There Is Still a Gap in the Main Technologies for Development The comprehensive application of technologies such as computers, the Internet, cloud computing, artificial intelligence, and big data is an important foundation for the development of financial technology, and their popularization and application in banking services ultimately rest on the development of the underlying technology. Although many banks now label their business with financial technology, judging from the current state of development, commercial banks tend to be followers in financial technology and the related businesses, with few initiatives for innovation and development. Coupled with the slow pace of technological innovation at banks at this stage and their inability to quickly update systems and applications, most commercial banks have gaps in technology development compared with enterprising financial technology companies, and technical gaps remain in cloud computing, big data processing, and artificial intelligence applications.
3.4 Development Lacks Top-Level Design and Effective Organization Promotion Due to the short development history of financial technology and the fragmented development of the earlier period, commercial banks have not paid full attention to the development of financial technology, have not formed top-level design strategies, and lack effective organizational promotion frameworks and efficient innovation processes. Most commercial banks do not yet have a specialized functional department taking the lead in planning and promoting financial technology; the related functions are scattered across the retail business department, the electronic banking department, and the science and technology department. Under the traditional development system, the business innovation cycle is difficult to match with the ever-changing market rhythm; coupled with the failure to sustain large-scale resource input, commercial banks have not yet formed effective organizational promotion or sustainable mechanisms for technology research and development and application transformation.
3.5 Development Faces Talent Constraints Currently, the domestic banking industry mainly recruits employees with backgrounds in economics, finance, law, and management. Their professional backgrounds are oriented toward the asset, liability, and intermediary businesses of major banks, and their accumulated work experience mainly comes from the business side, from application experience and business backgrounds, rather than from technology research and development. From the perspective of financial technology development, commercial banks need compound talents who understand both technology and finance, and such high-end compound talents are still scarce in commercial banks. The quality of financial technology personnel and the effective allocation of human resources determine the success or failure of development. At the same time, there are very few experts who have conducted in-depth studies in frontier fields such as artificial intelligence and blockchain, so talents in the relevant fields need to be introduced.
4 Commercial Bank Financial Technology Development Strategy Thanks to China’s huge population base and market size, commercial banks have a large number of business customers as well as business licenses and other business qualifications, which gives them a natural advantage in developing financial technology. Commercial banks can therefore combine domestic business characteristics with their own advantages, take business development as an entry point, step outside the traditional banking framework to improve their thinking, and accelerate the development of financial technology.
4.1 Establish a Scientific Concept of Financial Technology Development The thinking of financial technology differs from that of traditional commercial banking: it focuses on integrating technologies to reshape platforms, data, and services so as to fully assist financial institutions in achieving digital transformation. Customer needs and experience are the fundamental orientation of financial technology development, which needs to focus on the integration of service application scenarios. For this reason, commercial banks must change their concept of financial technology development, step out of the existing banking business framework and business model, and comply with residents’ current financial needs and changing trends. They need to truly adopt a “customer-centric” development philosophy, assess the situation, actively take the initiative to develop new ideas for financial technology, and adjust their own business strategies in a timely manner [2].
4.2 Rebuilding a Sound Organizational Structure and Development Model Commercial banks should combine the development characteristics of financial technology, refer to the strategies and experiences of domestic and foreign counterparts, and integrate the Bank’s operating resources and customer resources to implement top-level design of the entire financial technology development framework and development strategy, and start commercial bank-related businesses. The design of the transformational architecture of the field provides a feasible development plan for accelerating the strategic layout of commercial banks in the field of financial technology.
First, develop from the top level: innovate the top-level fintech institutions, establish the bank’s highest-level fintech development decision-making body, determine sustainable and stable strategic goals, and guide the development of fintech from the strategic direction. Second, restructure the financial technology institutions: set up a dedicated financial technology development department, strengthen the unified promotion and planning of financial technology across the whole bank, and clarify the responsibilities of the relevant departments in the development of financial technology so as to promote synergy among departments. Third, re-engineer business processes: existing business processes should be sorted out as soon as possible to establish efficient processes suited to the trend of financial technology. Fourth, reconstruct a reasonable appraisal mechanism: because the development of financial technology differs greatly from traditional banking, traditional appraisal methods cannot reflect the business characteristics of the bank’s financial technology personnel; the responsibilities of the relevant departments and personnel should be optimized, fault-tolerance mechanisms tried, and assessment rules established that suit the development of banking financial technology. Fifth, determine sustainable and steady strategic support and resource input: consensus needs to be reached at the highest level to ensure effective and sustained large-scale resource input, and targeted measures must be taken so that the bank’s resources are accurately invested in the related fields and achieve the expected effect, fundamentally establishing the basic guarantee mechanism for commercial banks’ financial technology development.
4.3 Strengthen the Development of Financial Science and Technology Applications 4.3.1
Improve the Development of Its Own Financial Technology Applications
The weak underlying technological innovation capability is the biggest shortcoming in the development of commercial banks’ financial technology. Commercial banks should increase investment in areas such as research, experimentation, and commercialization according to their endowments to achieve effective resource allocation, respond to routine technology updates and applications, and pursue differentiated development of technological innovation. First, commercial banks with more resources can invest in high-tech projects to develop financial technology suited to themselves, for example by establishing science and technology laboratories, exploring and tracking new technologies, and introducing
project incubation and similar mechanisms, with rewards for successfully incubated projects, encouraging the head office and branches to actively explore the application of new technologies so as to create the bank’s own core competitiveness in financial technology and achieve leapfrog development. Second, actively participate in leading conferences of the financial industry, increase understanding of the industry’s latest technologies at all levels of the bank, and share international and domestic fintech cases. Third, explore the establishment of merger and acquisition funds related to financial technology development, absorb advanced financial technology in the market by acquiring projects or enterprises, and broaden in all directions the channels through which commercial banks introduce financial technology.
4.3.2
Strengthen External Cooperation
Commercial banks developing financial technology can strengthen cooperation with advanced domestic and foreign financial technology companies in order to achieve complementary advantages, jointly develop new markets, and create a win-win situation. The first is to use cooperation to draw on and analyze the unique business models and technological innovation and research capabilities of financial technology companies so as to fill gaps in their own services and product chains and create synergies. The second is to use their advantages in risk control to provide financial services such as depository management and payment settlement for third-party payment agencies and online loan platforms, helping to improve the credit level and market reputation of financial technology companies, while through cooperation tapping business demand points in the market that may currently be unmet and providing the corresponding financial services. The third is to conduct in-depth cooperation with professional science and technology assessment agencies, intellectual property assessment agencies, science and technology universities, and excellent incubators or parks in the industry to research and design exclusive financing products and financial services, so as to identify and reach the source resources of science and technology innovation, absorb different corporate innovation ideas and businesses, fill gaps in their own products and services, win continuing customer resources and business opportunities, and seize development opportunities.
4.4 Building a Financial Technology Manpower Construction System With the development of financial technology, commercial banks’ demand for financial technology talent has changed greatly. Therefore, it is necessary to strengthen the recruitment, allocation, use, and cultivation
of innovative financial technology talent. Commercial banks can draw on a variety of knowledge and skills, spanning financial business knowledge, network information technology, marketing skills, and the use of financial technology, for example by choosing finance and computer double-degree graduates at recruitment or by training young employees to become well-rounded Internet finance talents. At the same time, an environment conducive to gathering financial technology talent should be created, with favorable treatment in terms of policies, assessments, and performance. On the one hand, internal training can be strengthened to cultivate talents that meet the bank’s own development conditions, and financial technology talents can be encouraged to rotate through various departments to integrate more closely with the business; on the other hand, financial technology talents can be introduced from universities and society through talent exchanges, and the quality of human resources can be strengthened through the integration of industry, teaching, and research, exchange training, and so on, thereby providing adequate personnel support for the development of commercial banks’ financial technology [2].
References 1. M. Jia, C. Li, Thinking of the operational transformation of China’s commercial banks from the perspective of the new bank supervision model. Southern Fin. (11), 27–31 (2011) 2. Z. Chen, X. Jie, C. Li, Thinking of commercial banks developing finance technology under the new situation. Int. Fin. (2), 37–41 (2018)
Understanding Issues Affecting the Dissemination of Weather Forecast in the Philippines: A Case Study on DOST PAGASA Mobile Application Lory Jean L. Canillo and Bryan G. Dadiz
Abstract Mobile phone-based technologies are becoming the trend of the digital generation. Prior studies have shown that the simplicity of mobile devices makes them conducive to weather forecast dissemination. However, only a few studies explore the issues that may affect the delivery of the forecast via mobile phone-based apps. To address this, a case study was conducted on the DOST PAGASA mobile app. Using a qualitative approach, it was found that weather forecast information issues and mobile app usability issues greatly influence the dissemination of the weather forecast through the mobile app. App usability as well as user satisfaction proved critical in this study. Keywords Weather apps · Human–computer interaction · Mobile phone-based technology
1 Introduction The study aims to understand the issues that affect the dissemination of weather forecasts via an Android application. Recent Information and Communication Technology (ICT) modernizes not only the way people communicate globally but also improves the delivery of information, especially the distribution of weather forecast information. Moreover, the contemporary use of emerging mobile phone-based technologies in forecasting has altered the traditional way of broadcasting weather warnings, such as tuning into radios and watching television. Numerous meteorological agencies around the globe, like the Bureau of Meteorology (BOM) in Australia and the Central Weather Bureau (CWB) in Taiwan, have already exploited this technology in their forecasting services. L. J. L. Canillo (B) · B. G. Dadiz Philippine Atmospheric Geophysical and Astronomical Services Administration, College of Information Technology Education, Technological Institute of the Philippines, Manila, Philippines e-mail: [email protected] B. G. Dadiz e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_73
In the report of the Broadband Commission, it was found that in 2017, 48% of the global population, or 3.58 billion people, were connected online, and an estimated five billion were unique mobile subscribers [1]. Meanwhile, in the Philippines, the formation of the Department of Information and Communications Technology (DICT) has led to the improvement of ICT infrastructure in the country [2]. Based on research, there are roughly 177 million mobile phone subscriptions in the Philippines [3]; for a population of 101 million, there are 130 million cellphones [4], and the number of subscribers is growing rapidly every year. The WeAreSocial 2018 report also found that the Philippines is still one of the developing countries most active in social media usage, and it is also the most disaster-prone country in the world, with an average of 20 tropical cyclones yearly [5]. In adapting to the rapid development of digital ICT and varying climate risk, the Philippine Atmospheric Geophysical and Astronomical Services Administration (PAGASA) is employing a mobile weather app to hasten the dissemination of weather forecasts, particularly during severe weather or tropical cyclones in the Philippine Area of Responsibility (PAR). Public awareness and other behavioral aspects are critical in this study; no matter how much the government delivers, it still rests on the people to use the information wisely [5]. If people are well informed of the weather forecast, the major result will be a disaster-resilient community, leading to improved national security, less economic damage, and reduced climate change risk. To achieve this, the proponent conducted semi-structured interviews, observations, and document reviews to determine the different issues that may affect the delivery of weather forecasts via the DOST PAGASA Android App.
2 Literature Review 2.1 Weather Apps and the Forecast Dissemination Mobile technology is the norm of today’s generation. This manifests in the rapid evolution of different mobile applications that accompany every human activity; people have become attached to their mobile devices [6], which have thus become an essential part of their lives [7]. Moreover, modern ICT has contributed to the expansion of major network technologies globally: 29% of the world’s population has access to a 4G mobile network, 31% to 3G, and 40% to 2G; in the Asia Pacific, 34% of the population has access to 4G, 25% to 3G, and 41% to 2G [1]. According to data, the Philippines ranked 110th out of 187 countries in terms of broadband penetration, fourth among the ASEAN five countries (Thailand, Singapore, Malaysia, Philippines, and Indonesia), and 89th in terms of mobile penetration [8]. Numerous mobile phone-based technologies depend on the Internet and broadband connection speed to deliver information [2]; typically, weather forecast information dissemination requires the Internet, available telecommunication networks, and sufficient broadband speed. A report defines broadband as a transmission capacity of
Table 1 Mobile apps and their weather forecast information features
BOM weather [10]: current conditions, forecast, warnings, radar images, and location service
Taiwan weather [11]: current conditions, radar and satellite images, current observations, air quality, forecast, location services, personalized alarm assistant, weather check-in, unit conversion, alert push notifications, and language options
DOST PAGASA mobile app [12]: notification, general flood advisories, thunderstorm advisories, rainfall warnings, local weather bulletins, satellite and radar images, surface maps, and social media sharing capability
During the time of research (October 2018)
Fig. 1 Weather forecast information dissemination process
at least 1.5 or 2.0 Mbps, or a data connection speed of at least 256 Kbps [2]. In addition, mobile technology has been assimilated into forecast dissemination for reasons such as high portability [9] and its status as a universally known means of communication, making it significant in ensuring people’s safety during the passage of tropical cyclones. The forecast information included in the mobile app is shown in Table 1. Collaboration among various meteorological organizations yields quality weather forecast information; thus, in this study, they were identified as the weather forecast information source. The product, the “weather forecast information,” is then disseminated through mobile apps for the benefit and consumption of forecast customers; this process is illustrated in Fig. 1.
2.2 Weather Forecast App in the Market The utilization of weather apps is becoming the trend in weather forecasting. Prior research examined diverse ways of publicizing the weather forecast via mobile, such as the use of modern wireless technology and Java technology [13], personalized cloud-based execution of short-term weather forecasts [14], an iOS application of a weather computational server [15], iForecast real-time weather observation [16], a location-based weather app [17], and so on. According to Google, out of 200 billion apps in the Play Store, there were 540 free and 231 paid weather apps during the time of research.
Table 2 Mobile weather apps in the market (Google Play Store)
DOST PAGASA mobile app: overall rating 3.8; number of installs 100,000+
BOM weather: overall rating 4.7; number of installs 1M+
Taiwan weather: overall rating 4.1; number of installs 1M+
During the time of research (October 2018)
Moreover, various meteorological organizations utilize their “official mobile apps” in addition to their current forecasting services [18]. Table 2 shows the different weather apps in terms of ratings and installs in the market.
2.3 Weather Forecasting Practices Ebert mentioned in her presentation that keeping up with technology, the “voice of the customer,” and the stakeholders has helped strengthen the weather information value chain at BOM [19], while at CWB, advanced meteorological equipment, modern ICT infrastructure, and relationships with the press and Emergency Management Offices (EMO) have fortified and improved their weather forecast information services [20]. Meanwhile, in the Philippines, Republic Act 10692, or the “PAGASA Modernization Act of 2015,” paved the way for the modernization of the bureau’s ICT and meteorological equipment, whereas the amended Presidential Decree No. 1149 reinforces PAGASA’s mandate to provide National Meteorological and Hydrological Services (NMHS) throughout the country [21].
3 Conceptual Framework In conducting the study, the weather forecast information dissemination practices of various meteorological organizations, such as the Bureau of Meteorology (BOM) [22] and the Central Weather Bureau (CWB) [23], were examined, including their weather app features in the market. The research also emphasizes the utilization of the weather app in various sectors of society as well as its usability. Although technical aspects are critical in this study, social behaviors and forecasting practices were carefully considered as well; the relationship among these variables is shown in Fig. 2.
Fig. 2 Weather forecast information dissemination via mobile app
4 Methodology In answering the research question, the proponent gathered data using semi-structured interviews, observations, document reviews, and focus group discussions. An informed consent form was presented before the start of each interview, following the Robson guidelines [24], and Web-based interviews were used for a better response rate [25]. Furthermore, the proponent conducted document reviews about users’ experiences in using the mobile app. In validating the collected data on mobile app usability, the Goal Question Metric (GQM) model [26] was applied; analysis of the data was done using spreadsheets (MS Excel) and affinity diagrams [27]. Using the gathered data, the purpose of the research is to determine the issues that affect weather forecast dissemination via a mobile phone app, in this case the DOST PAGASA Android App.
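As an illustration of how such a GQM-style tabulation could be scripted alongside the spreadsheet analysis, the following Python sketch groups review snippets under the three quality characteristics used later in Table 4 (effectiveness, efficiency, and satisfaction) and tallies them; the keyword lists and the scoring rule are illustrative assumptions, not part of the study's actual protocol.

# Minimal sketch (assumption): tallying app-review snippets against the three
# GQM quality characteristics used in this study. Keywords and reviews are
# illustrative placeholders, not data from the actual case study.
from collections import Counter

CHARACTERISTIC_KEYWORDS = {
    "effectiveness": ["accurate", "does not work", "no update", "wrong"],
    "efficiency": ["slow", "load", "fast", "responsive", "refresh"],
    "satisfaction": ["love", "useless", "clean ui", "frustrating", "best"],
}

def classify_review(text):
    """Return the quality characteristics whose keywords appear in a review."""
    text = text.lower()
    return [c for c, words in CHARACTERISTIC_KEYWORDS.items()
            if any(w in text for w in words)]

def tally(reviews):
    """Count how many reviews touch each quality characteristic."""
    counts = Counter()
    for review in reviews:
        counts.update(classify_review(review))
    return counts

if __name__ == "__main__":
    sample = [
        "Became useless after the update, none of the tabs work",
        "The radar view takes longer to load after the upgrade",
        "Nice clean UI and very responsive",
    ]
    print(tally(sample))  # Counter({'efficiency': 2, 'satisfaction': 2})

A script like this only complements the affinity diagrams; the qualitative judgment about which characteristic a review reflects still rests with the researcher.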
4.1 Description of the Study Area The subject area, the Philippine Atmospheric Geophysical and Astronomical Services Administration (PAGASA), is one of the attached agencies of the Department of Science and Technology (DOST) under its Scientific and Technical Services Institutes. To reduce disasters caused by weather calamities, the agency has integrated numerous technological applications into its forecasting services, such as a website, social media pages, telephone inquiries, press conferences/briefings, and a mobile app. The following are the major services delivered by the agency:
• Weather Forecast and Tropical Cyclone Warning
• Flood Forecasting and Warning Services
• Climatological and Farm Weather Services
• Research and Development
• Astronomical Services
• Information, Education and Public Outreach.
4.2 Data Collection Method There was a total of 19 research participants in this study, 15 of them a mix of PAGASA employees and outsiders (not affiliated with the agency). Four (4) selected participants took part in the focus group discussions. Table 3 shows the set of questions used during the interviews. The research participants were diverse in terms of age, gender, socio-economic status, and IT experience so as to gain a profound understanding of the issue.
5 Analysis In ensuring the focus of the study, the analysis emphasizes the weather app’s usability and the individual experiences of the respondents in using the app. Based on the data collected, all participants were Android users, and it was also mentioned during the discussions that the respondents were interested in the mobile app features, while others browsed the app based on their specific needs. The following weather app features were remarkable to them:
1. Weather forecast notifications,
2. Thunderstorm advisories,
3. Rainfall warnings,
4. Surface maps and satellite images, and
5. Social media sharing capability of the app.
Table 3 Research questions (RQ)
RQ1. What smartphone do you use?
RQ2. What weather app do you use?
RQ3. How did you hear about the app?
RQ4. What do you like about the app? Dislike?
RQ5. How much time do you spend on the app?
RQ6. What might keep people from using the app?
RQ7. What app features do you like the most?
RQ8. Why do you like these features?
RQ9. What app features you dislike? Why?
RQ10. What do you think of the app? What is the most appealing about the app?
RQ11. What’s the hardest part about using this app?
RQ12. What could be done to improve the app?
RQ13. Would you keep using the app? Why? Why Not?
RQ14. Would you share the app?
RQ15. Anything else you’d like to share about the app?
Table 4 Weather forecast application reviews in terms of quality characteristics
DOST PAGASA mobile app
Effectiveness: “Does not work as it should”; “No update on weather info or flood warning”
Efficiency: “Send notification about the weather advisories”; “I wasn’t able to view the weather update”
Satisfaction: “Became useless after the update, none of the tabs work”
BOM weather
Effectiveness: “It seems to be accurate…”; “I constantly have problems with detailed forecast not displaying…”
Efficiency: “The radar view takes longer to load after the upgrade…”; “Love the quick view and scroll through weeks weather…”
Satisfaction: “Nice clean UI and very responsive…”
Taiwan weather
Effectiveness: “This is the best Taiwan weather app. All info, radar images, warnings in one…”; “I wish the USA had accurate weather like this. If it’s says it’s going to rain, it’s going to rain…”
Efficiency: “Excellent refresh of the app…”
Satisfaction: “Love it. Best I have seen from any country”
Source Google Play Store (October 2018)
Furthermore, the app was found to be useful and informative. However, based on the data, the PAGASA weather app ranked third in terms of overall rating and number of app installs, and it also fell behind in terms of the highest app review rate (equivalent to 5 stars), garnering only 12% of the total 5-star rates of the BOM Weather app and 32% of those of the Taiwan Weather app. The app user reviews and experiences were analyzed based on effectiveness, efficiency, and satisfaction. Reviews were selected using the “Most Helpful First” sorting in the Play Store. In this study, the proponent selected five user reviews for each mobile app. Table 4 shows the weather app usability analysis based on the GQM model.
6 Results 6.1 Weather Forecast Information Issues • Weather forecast information errors. There are cases wherein the duty personnel extend their hours and multi-task; more than 8 h of work and occasional irate callers result in employee exhaustion, which leads to typographical errors in the published forecast.
Fig. 3 Value chain model of app user’s positive reaction and behavior
• Extravagant weather forecast information. Rich information such as high-resolution images prolongs the loading time of the app, causing errors in displaying the whole forecast. • User element of interest in the weather forecast. Some app users tend to choose the forecast features that are remarkable to them. Noticeably, when they are satisfied, they tend to react positively, such as writing good reviews about the app or expressing confidence in its reliability; if not, they react differently, giving negative feedback or showing frustration in using the app. This user value chain is illustrated in Fig. 3.
6.2 Mobile App Usability Issues • Effectiveness of the weather forecast. App users rely on the weather app when the forecast information is accurate and correct. Moreover, the simplicity of the app makes it more useful and navigable in delivering forecast information. • Efficiency of the weather forecast. Weather update notification features were found interesting by app users. Users also show appreciation of the app when they can easily view the forecast information; nevertheless, they expressed disappointment when the forecast information takes a long time to load. • User’s satisfaction. The simplicity of the app makes it easier to navigate the forecast features; however, some users easily get bored with the app interface, particularly the unresponsive table that does not fit small mobile display sizes; likewise, dysfunctional or broken links also add to users’ frustration.
7 Conclusion Weather forecast dissemination plays a significant role in ensuring public safety during inclement weather. This study found that even though users demonstrated reliance on the forecast information provided by the app, weather forecast information issues and mobile app usability issues have greatly influenced dissemination via the weather app. User satisfaction was found to be crucial; hence, a new hypothesis emerged that app users are satisfied with the accuracy of
the forecast information even though the app interface is simple and minimalistic. To probe this, there is a need to conduct further studies and future research with a much bigger scope and a larger group of samples.
8 Further Studies The data collected in this study were diverse. In addition, mobile app usability was measured using only the GQM model, so results may vary with other models; likewise, user experiences may differ in other countries due to differences in the available ICT infrastructure and other cultural factors. Ethical Approval This chapter does not contain any studies with human participants or animals performed by any of the authors. This chapter contains a study of a total of 19 research participants, of which 15 are a mix of PAGASA employees and four are outsiders (not affiliated with the agency), as per their ethical approval. Informed Consent Informed consent was obtained from all individual participants included in the study.
References 1. GSM Association, The mobile economy 2018, Executive summary 2. National Broadband Plan: Building Infostructures for a Digital Nation, Department of Information and Communication Technology (DICT) (Diliman, Quezon City, 2017) 3. Manabat R.G. & Co, “IT report: Philippines, 2018 investment guide” by KPMG in the Philippines 4. P. Wallace, Telecommunications for Nation Building: National Consensus for Solutions and Progress (The Wallace Business Forum, 2017) 5. T.A. Cinco, R.G. de Guzman, A.M. Ortiz, R.J. Delfino, R.D. Lasco, F.D. Hilario, E.L. Juanillo, R. Barba, E.D. Ares, Observed trends and impacts of tropical cyclones in the Philippines. Int. J. Climatol.. 6(14), 4638–4650 (2016). https://doi.org/10.1002/joc.4659 6. A.M. Lagmay, B.A. Racoma, K.A. Aracan, J. Alconis-Ayco, I.L. Saddi, Disseminating nearreal-time hazards information and flood maps in the Philippines through Web-GIS. J. Environ. Sci. 59, 13–23 (2017) 7. V. Konok, D. Gigler, BM Bereczky, Á Miklósi, Humans’ attachment to their mobile phones and its relationship with interpersonal attachment style. Comput. Hum. Beha. 61, 537547 (2016) https://doi.org/10.1016/j.chb.2016.03.062 8. V.N. Inukollu, D.D. Keshamoni, T. Kang, M. Inukollu, Factors influencing quality of mobile apps: Role of mobile app development life cycle. (2014). CoRR, abs/1410.4537 9. Broadband Commission for Sustainable Development, The State of Broadband: Broadband Catalyzing Sustainable Development (2017) 10. M.V. Japitana, R.G. Damalerio, Android Application Development for Data Capturing and Mapping Land Features: A Mapping and Project Monitoring Tool for Phillidar 2.B.14 Project (College of Engineering & Information Technology Thesis, Caraga State University, Butuan City: Phil-LiDAR,), pp. 1–10. 2.B.14 Project
11. BOM Weather, Retrieve from http://www.bom.gov.au/, Google Play Store. Retrieve from https://play.google.com/store/apps/details?id=au.gov.bom.metview&hl%E2%80%89=%E2% 80%89en. Accessed September 2018 12. Taiwan Weather App, Google Play Store. Retrieve from https://play.google.com/store/apps/det ails?id=org.cwb&hl=en. Accessed October 2018 13. DOST PAGASA Android App, Google Play Store: https://play.google.com/store/apps/details? id=dost.pagasa.gov.ph.gc.ccs. Accessed October 2018 14. Z.M, Kalin, K. Modri, Weather Forecast Presentation Based on Leading Edge Mobile Phones and Java Technologies Master’s thesis, University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia, p. 16 15. D.K. Krishnappa, D. Irwin, E. Lyons, M. Zink, Cloudcast: Cloud Computing for Short-Term Mobile Weather Forecasts (Master’s thesis, Electrical and Computer Engineering Department, University of Massachusetts Amherst) 16. T. Duarte, A. Oliveira, R. Trancoso, C. Palma, IOS Application for Detailed Weather Prevision in Continental Portugal (Master’s thesis, METEO-IST, Departamento de Engenharia Mecânica Instituto Superior T´ecnico, Lisboa, Portugal) 17. J. Li, Z. Yang, C. Deng, C. Wei, Research And Implementation of Mobile App for Interactive Weather Service Based on SOA Framework (Master’s thesis, Zhejiang Meteorological Service, Zhejiang, Hangzhou) 18. B.G.Patel, Design and implementation of location-based weather application on android platform. Int. J. Res. Appl. Sci. Eng. Technol. (IJRASET) 5(XII) (2017). ISSN: 2321–9653, 1420-1429 19. Free weather apps, Google Play Store: Retrieved from https://play.google.com/store/apps/cat egory/WEATHER/collection/topselling_free. Paid weather apps: Retrieved from 20. https://play.google.com/store/apps/category/WEATHER/collection/topselling_paid. Accessed September 2018 21. B. Ebert, High-Impact Weather @BOM. Retrieved from http://climateextremes.org.au/wpc ontent/uploads/2018/06/[email protected]. Accessed September 2018 22. C.L. Hsin, T.-C. Yeh, Overview of The Central Weather Bureau (CWB): The Organization and Mission to the Weather-Related Disaster Mitigation. Central Weather Bureau. Retrieved from https://www.ncdr.nat.gov.tw/itp2006/download/02.reports/01.Taiwan/04CWBToday2005.pdf 23. Philippine Atmospheric Geophysical and Astronomical Services Administration (PAGASA) Website and Annual Report. Retrieved from http://bagong.pagasa.dost.gov.ph/. Accessed October 2018 24. Bureau of Meteorology (BOM) Website, Retrieve from http://www.bom.gov.au/. Accessed September 2018 25. Central Weather Bureau (CWB) Website, Retrieve from https://www.cwb.gov.tw/V7e/. Accessed September 2018 26. C. Robson, Real World Research. (Wiley, 2011) 27. V. Diaz de Rada, J.A. Dominguez-Alvarez, Response quality of self-administered questions: A comparison between paper and web. Soc. Sci. Comput. Rev. 32(2), (2014). Sage Publications
Guideme: An Optimized Mobile Learning Model Based on Cloud Offloading Computation Rasha Elstohy, Wael Karam, Nouran Radwan, and Eman Monir
Abstract There is growing interest in using mobile learning systems to improve the connection between various partners in educational institutions. With mobile learning, there is a variety of users, services, education contents, and resources; in any case, how to deploy m-learning applications is quite a challenging demand. On the other side, the proven success of cloud computing as a large-scale economic paradigm with virtualization appears to resolve issues such as storage capacity, resource pooling, elasticity, and offloading. This research draws on cloud computing resources and capabilities to propose an effective mobile learning model. We specifically address a case study for students and their learners applied to Egyptian schools. Guideme is implemented on the Android platform with support for text and content offloading facilities. Furthermore, we investigate the performance of the proposed model and conclude that the Guideme model can improve responsivity by leveraging a public cloud server by about 1.7% for light computation offloading and 11% for intensive computation offloading. Keywords Mobile learning · Cloud computing · Offloading computation · Response time
R. Elstohy (B) Department of Information Systems, Obour Institutes, El Obour, Egypt e-mail: [email protected] W. Karam · N. Radwan Department of Information Systems, Sadat Academy for Management Sciences, Cairo, Egypt e-mail: [email protected] N. Radwan e-mail: [email protected] E. Monir Department of Scientific Computing, Faculty of Computers and Information, Banha University, Benha, Egypt e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_74
1 Introduction These days, e-learning is broadly used by educational institutions to support their learning processes and provide anytime services for students to access training materials and information. E-learning has been implemented by several educational institutions in Egypt [1]. As Egyptian society becomes increasingly reliant on technology, schools are putting more effort into technological means of communication. Access to PC and Internet technologies is expanding all over the world: in workplaces, schools, and homes, both teachers and guardians are provided with electronic communication to access learning materials and data. Meanwhile, cell phones or smartphones are widely used and provide extraordinary communication and multimedia capabilities, which makes the delivery of learning exercises and out-of-class cooperation an increasingly practical methodology [2]. Even though e-learning affords many benefits, including flexibility, variety, mobility, and others, one explanation for why cell phones are not used to communicate among educational institutions may be a lack of trust in performance [3]. On the other side, unreliable infrastructure becomes a major issue when implementing m-learning: institutions face difficulties in the procurement of servers/PCs, storage, and stable networks [4]. There are also issues related to professional skills, as most organizations do not have the expert staff for planning and creating such apps to manage e-learning. However, as the IT world improves, cloud computing is gradually becoming a new paradigm of innovation, featuring pooling of resources: a provider’s computing resources are pooled to serve many clients, with various virtual and physical resources dynamically assigned and reassigned according to customer demand. Therefore, cloud computing saves network storage and server time, provisioned automatically as needed without requiring human interaction with the service provider and stakeholders. Currently, a body of work and research in cloud computing aims to enhance computing capabilities to benefit constrained cellular phones by means of full access to software, computing management services, and cloud infrastructures. Consequently, cloud computing offers great options for massive tasks that require more time to compute. One popular technique of cloud computing is offloading [4]. Computation offloading helps improve mobile application performance, reduces power consumption, and accelerates responsivity [5]. Offloading, also referred to as remote execution, involves executing computation-intensive code of an application on a remote server so that the system can take advantage of a sufficient power supply and powerful hardware to improve its responsiveness and reduce battery power consumption [6].
While offloading genuinely augments the computation of constrained cell phones, the performance of application execution still needs to be monitored. In this paper, we discuss previous studies focusing on m-learning models based on mobility and pervasive computing; analysis of other research shows that recent work covers only limited subsets of computation offloading and its impact on reducing response time under some circumstances. Accordingly, we propose a new m-learning model based on mobile services that benefits from offloading by migrating intensive tasks on the fly. The proposed model depends on a public cloud and a conventional client/server model, depending on network conditions. We investigate the efficiency of the two provided modes in terms of responsivity; evaluation of the prototype’s efficiency showed improvement when migrating to the public cloud in most cases. The rest of this paper is organized as follows: Sect. 2 provides a summary of m-learning systems, Sect. 3 discusses the proposed model, called “Guideme,” and its components, and Sect. 4 details the model implementation steps. The prototype efficiency assessment is the focus of Sect. 5. Finally, Sect. 6 concludes the paper and outlines future work.
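As a rough illustration of how responsivity could be measured when comparing local execution with offloaded execution, the following Python sketch times the same task in both modes; the workload, endpoint URL, and timing approach are assumptions for illustration only and are not the actual Guideme benchmark.

# Minimal sketch (assumption): comparing response time of local execution
# versus offloading the same task to a remote server. The endpoint URL and
# workload are placeholders, not the actual Guideme deployment.
import time
import urllib.request

def run_locally(n):
    """A stand-in compute-intensive task executed on the device."""
    return sum(i * i for i in range(n))

def run_offloaded(n, endpoint):
    """Send the same task to a remote (cloud or on-premise) server."""
    with urllib.request.urlopen("%s?n=%d" % (endpoint, n), timeout=10) as resp:
        return resp.read()

def timed(fn, *args):
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 5_000_000
    print("local execution: %.3f s" % timed(run_locally, n))
    # The offloaded call needs a reachable server; the URL below is hypothetical.
    # print("offloaded: %.3f s" % timed(run_offloaded, n, "http://example-server/compute"))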
2 Previous Works Many models have emerged to support m-learning by running cloud-based mobile applications. For instance, the study by Rogers [7] examined communication technologies for encouraging parental involvement in middle schools, uncovering obstacles that limit the use of communication technology. That study evaluated the role of two related communication technologies, telephone and e-mail, one common communication technology, school Web sites, and the contact practices among parents and instructors of middle-school students. The outcomes demonstrate that many parents still depend on traditional types of communication, including landline telephones, printed pamphlets, and face-to-face contact, which reveals that teachers and guardians alike are not exploiting the benefit, ease, and speed of electronic communication methods such as e-mail and Web sites. After deeper analysis, it can be noticed that none of the above models considers performance parameters with a high impact on user satisfaction. Service response time appears to be one of the strongest factors affecting satisfaction, especially when posting and requesting heavy tasks that consume a large share of device capabilities and therefore of the user's time. This motivated us to construct a new model focused on responsivity that can satisfy client requests and needs and enhance device performance. From the mobile service point of view, our "Guideme" model offers new ways to boost the performance of mobile learning models in a manner not addressed before.
2.1 Cloud Computing Researchers define cloud computing as an information technology service paradigm in which computer services, both software and hardware, are provided to customers on request in a self-service fashion, independent of location or device characteristics, as illustrated in Fig. 1. The resources required to provide the requested quality of service levels are shared, quickly provisioned, dynamically scalable, virtualized, and released with minimal interaction between service providers and clients. Clients pay for the service as an operating cost without incurring any significant initial capital expenditure [8]. Our research utilizes Microsoft Azure as the public cloud provider, benefiting from its "pay as you go" model for cost customization. On the other side, a conventional desktop server is utilized as an on-premise server acting as a fallback mode in case of failure or network delay.
Fig. 1 Cloud computing architecture
2.2 Offloading Computation Computation offloading is the process of sending computation-intensive application components to a remote server [9]. Recently, several computation offloading systems with different methodologies have been suggested for cell phone applications. These applications are partitioned at different levels of granularity, and parts are sent (offloaded) for remote execution in order to extend and enhance cell phone capabilities [9–11]; the general offloading procedure is illustrated in Fig. 2.
3 Proposed Model Architecture Figure 3 illustrates the basic architecture of the proposed m-learning service model. Guideme depends mostly on the offloading concept discussed in Sect. 2 and on two important resources: the first is public cloud computing and the second is a nearby desktop server reached through access protocols, and the cell phones must be connected to the network via a wireless connection, 4G cellular radio, or Wi-Fi. Guideme has two basic modules. The first is developed as the native client side, instantiated as an interface app for students, installed and configured on their Android phones. Using this app, students can upload photos of any diary content to their learners, asking for confirmation or help, and parents can easily contact learners about any matter related to their child's progress. At the client side, the offloading service module dynamically selects the appropriate offloading resource, whether it is the
Fig. 2 Mobile offloading procedure
Fig. 3 Guideme architecture
public cloud or the nearby desktop server in case of a disconnection condition. However, adaptation of offloading is always required: the decision is taken at runtime as to whether mobile code should be offloaded and which parts ought to be executed remotely. For example, if the remote public server becomes inaccessible because of an unstable network connection, the running tasks are returned to the device; otherwise they are executed on another available registered server. The remote server constantly receives requests from multiple cell phones and produces multiple clones to handle them; each cloud clone is responsible for handling and answering mobile customer requests and stores the Guideme content and text inquiries. On the other side, the learner can receive content and text messages through a GUI; Fig. 4 illustrates the Guideme model use case. Additionally, a monitoring algorithm has been constructed for task detection, assessment of network connection quality, and delay calculation, and the server status is managed through the decision-making engine module.
4 Implementation The proposed model is divided into two parts. The client part consists of a set of services: one for offloading images (Sect. 4.1), another for offloading text from parents (Sect. 4.2), and the offloading module service, which includes the decision-making engine (Sect. 4.3). The client part is deployed and configured on the client cell phones and has been developed using Android Studio, an application development environment based on IntelliJ IDEA, known as an integrated development environment (IDE).
Fig. 4 Guideme use case
This IDE includes extensible plug-ins and a base workspace to customize the Java environment. The IDE [12, 13] can be integrated with a database engine, Google Cloud Messaging, and App Engine. The following sections present brief descriptions of the major modules.
4.1 Content Service Module The content service is a main feature of this model. Camera capabilities are mostly used to snap pictures and tag them. The camera app is one of the most popular mobile phone apps and makes heavy use of the camera hardware; Snapchat and Instagram are two examples that implement their own custom camera view rather than using the built-in mobile device application. To access camera features, permission has to be set in the Android Manifest file by adding the camera permission and the <uses-feature> declarations for the camera features used by the app. To set image capture settings, the Camera class is used, with methods to start and stop the preview and to snap photos; this class manages the camera service and the actual camera hardware. The standard intent ACTION_IMAGE_CAPTURE is used to capture a picture and return it, and the EXTRA_OUTPUT parameter is set in order to store the captured images in the gallery and title them. Finally, while the client can view stored images within the Image Gallery, the application stores the images
Fig. 5 Sequence diagram of content service activity
inside the cloud server; Fig. 5 presents a sample sequence diagram of the content service module.
4.2 Texting Service Module In order to contact the learner, a complaint class has been created and is instantiated in the onCreate() method; an OnClickListener instance is created to handle the HTTP get() and post() methods that bind the student name, e-mail address, department name, and phone number parameters. Both aforementioned services utilize an Azure SQL database to migrate data to the Azure portal, with synchronization to a SQL Server database engine [14].
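As a rough illustration of the texting service, the sketch below shows the shape of the HTTP POST carrying the parameters named above (student name, e-mail, department, phone). The actual client is the Android app described in Sect. 4; the endpoint URL and JSON field names here are assumptions for illustration only, shown in Python for brevity.

```python
# Minimal sketch (not the paper's Android/Java code) of the HTTP POST issued
# by the texting service. The endpoint URL and field names are assumed.
import requests

def send_complaint(base_url: str, student_name: str, email: str,
                   department: str, phone: str, message: str) -> int:
    """Post a complaint/message record; returns the HTTP status code."""
    payload = {
        "studentName": student_name,   # parameters named in the paper
        "email": email,
        "department": department,
        "phone": phone,
        "message": message,            # assumed free-text body
    }
    resp = requests.post(f"{base_url}/complaints", json=payload, timeout=10)
    return resp.status_code

# Hypothetical usage:
# send_complaint("https://example-guideme-api.azurewebsites.net",
#                "Student A", "parent@example.com", "Science",
#                "0100000000", "Please confirm this week's homework.")
```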
4.3 Offloading Service Module The offloading module consists of several components, including a network status monitor, an offloading decision engine, a resource monitor, and an offloading proxy that connects the remote execution manager to the decision engine. The decision engine is constructed in order to estimate the delay caused by offloading. Two types of measurements are made prior to execution: the average response time of the application running on the public cloud server, and the time consumed if offloading is done to the on-premise server, both obtained after detecting the network conditions.
There are two conditions for selecting the appropriate offloading mode:
Condition 1: there is a good network connection to the cloud, where
Tpublic = Tnet + Tcloud (1)
Condition 2: there is a good network connection to the on-premise server, where
Tserver = Tnet + Ton-premise (2)
Offloading to the public cloud is beneficial in the case Tpublic < Tserver.
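A minimal sketch of this decision rule, assuming the monitoring components have already collected response-time samples for each target, is shown below (the function and parameter names are illustrative, not the paper's implementation):

```python
# Sketch of the offloading decision rule from Eqs. (1) and (2). In practice the
# network delay to each target can be measured separately; a single Tnet term
# is kept here to match the equations above.
from statistics import mean

def choose_offload_target(net_delay_s: float,
                          cloud_exec_samples_s: list[float],
                          onpremise_exec_samples_s: list[float]) -> str:
    """Return 'public-cloud' or 'on-premise' based on estimated response time."""
    t_public = net_delay_s + mean(cloud_exec_samples_s)      # Eq. (1)
    t_server = net_delay_s + mean(onpremise_exec_samples_s)  # Eq. (2)
    return "public-cloud" if t_public < t_server else "on-premise"

# Example with values of the same order as the benchmark in Table 1:
# choose_offload_target(0.3, [4.36, 4.4], [5.1, 5.0])  -> 'public-cloud'
```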
5 Evaluation In order to characterize the advantages of using our model, we carried out a set of experiments and studied the impact of the proposed model on the applied services in terms of responsivity [15]. From this perspective, the goal of our experiments is twofold: (1) to investigate whether our model is suitable for offloading real implemented tasks using different types of Android devices; (2) to compare the performance of tasks offloaded to the public cloud against the same tasks offloaded to the on-premise server in terms of response time. To achieve the first goal, a suite of micro-benchmarks was conducted on a Samsung device and an HTC device. Both were fully charged, and all background tasks appearing in the multitasking menu were killed; since nothing should interfere with the tests, the devices were additionally placed in Airplane mode to keep them from fetching mail or accepting calls. Our benchmark consisted of a client installed on the aforementioned cell phones and a server application running on the Azure cloud with a SQL Server engine. Using 11 real benchmark runs, we gathered and then analyzed task execution times across round trips; the evaluation results are validated via statistical modeling as given in Table 1. To achieve the second goal and measure responsivity, we measured the total response time as defined in Eqs. (1) and (2); the experiments were executed 11 times at different intervals over a stable Wi-Fi connection. In these experiments we monitored the execution time of each service separately: in on-premise server mode we used an automatic timer started immediately before sending the request and stopped immediately after receiving the response, and for the public cloud we recorded the Azure metrics dashboard, which offers a helpful performance monitoring layout, as in Fig. 6.
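The timer-based measurement just described can be summarized by the following sketch (in the prototype the timer runs inside the Android client; here it is shown in Python purely for illustration, and invoke_service is a placeholder for the actual text- or content-offloading call):

```python
# Sketch of the response-time measurement: start a timer right before sending
# the request, stop it right after the response arrives, repeat 11 times, and
# summarize. 'invoke_service' is a placeholder, not the paper's code.
import time
from statistics import mean, stdev

def measure_response_times(invoke_service, runs: int = 11) -> dict:
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        invoke_service()                      # send request, wait for response
        samples.append(time.perf_counter() - start)
    return {"mean_s": mean(samples),
            "median_s": sorted(samples)[len(samples) // 2],
            "stdev_s": stdev(samples)}
```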
Table 1 Evaluation benchmark recorded values (response times in seconds)

Device | System function | Cloud server response time | Median | Standard deviation | On-premise server response time
Galaxy J7 prime | Text offloading | 2.46 | 2.5 | 0.332483766 | 2.5
Galaxy J7 prime | Content offloading | 4.36 | 4.4 | 0.492612085 | 5.1
HTC Desire816G | Text offloading | 2.41 | 2.45 | 0.260128174 | 2.3
HTC Desire816G | Content offloading | 5.75 | 5.65 | 0.634647759 | 6
Fig. 6 Cloud response time metrics
During the experiments, it was noticed that device capabilities may affect responsivity. More precisely, the processing power of the devices affects overall performance, as appears in the comparison between the Samsung and HTC devices in our benchmark in Fig. 7. Additionally, we observed that the computational complexity of a service significantly influences offloading: requesting and posting through the text offloading service consumes about 2.4 s, while invoking the content service and posting a captured image takes on average 4.3 s from the public server and about 4.7 s in the case of the on-premise server, as illustrated in Fig. 8.
Fig. 7 Guideme performance benchmark
Fig. 8 Guideme performance evaluation optimization
6 Conclusion and Future Works In this paper, an effective mobile learning model was presented, and we specifically addressed a case study for students and their learners applied in Egyptian schools. Guideme is implemented on the Android platform with support for content and text offloading to two resources: public cloud computing and a nearby desktop server. The offloading module dynamically selects the appropriate resource, and the decision is taken at runtime according to the adaptation conditions. Furthermore, we investigated the performance of the proposed model, which proves efficient in terms of response time; by using the public cloud, we optimize the performance of the provided services and strengthen robustness and availability.
Future work may conduct more experiments on other real mobile devices with different features and characteristics. More conditions and rules should be applied in the offloading decision model to achieve better outcomes.
References 1. S. Husain, Online Communication Between Home and School. Case Study: Improve the Usability of the Unikim Communication Platform in the Primary Schools of Tierp Municipality (Uppsala University, Master, 2012) 2. N. Mechael, Mobile Phones and the Preservation of Well-being in Egyptian Families. (Master of Science, Columbia University, 2011), pp. 1–10 3. N. Selviandro, Z. Hasibuan, E-learning: a proposed model and benefits by using E-learning based on cloud computing for educational institution. in Proceedings of Springer–-Verlag Berlin Heidelberg Conference (2011) 4. Y. Cao, R. Klamma, Mobile Cloud Computing: A Comparison of Application Models (Aachen University, 2011), available at “http://arxiv.org/abs/1107.4940 5. A. Nasser, A. Ali, H. El-ghareeb, Hybrid e-government framework based on cloud computing and service oriented architecture. J. Theor. Appl. Inf. Technol (2018) 6. S. Torghbeh, A Lightweight Mobile Could Computing. Framework for resource Intensive Mobile Application. Kuela Computer (University of Malaya, Malaya, 2014) 7. R. Rogers, V. Wright, Assessing technology’s role in communication between parents and middle schools. Electron. J. Integ. Technol. Educ. 7, 36–58 (2008) 8. K. Somaiya, The future of cloud. J. Adv. Res. Electri. J. Electron. Instru. Eng. 1(3) (2012) 9. K. Akherfi, M. Gerndt, H. Harroud, Mobile cloud computing for computation offloading: Issues and challenges. J. Appl. Comput. Inf. Saudi Arabia (2018) 10. K. Delic, J. Riley, Enterprise Knowledge Clouds. Next Generation Km Systems. In Proceedings of Inform Process, Knowledge Management (Cancun, Mexico, 2009), pp. 49–53 11. J. Pandit, O’Riordan, A model for contextual data sharing in smartphone applications. Int. J. Pervas. Comput. Commun. 12(3), 310–331 (2016) 12. Android Studio Inc. [online], available at: http://www.androiddocs.com/tools/studio/index. html 13. Android studio Inc. [online], available at: https://developer.android.com/studio/profile/net work-profiler 14. Microsoft, Inc. Azure Cloud Server, 2019, [online] available at: https://azure.microsoft.com/ en-in/services/sql-database/ 15. Chen et al., J. Cloud Comput. Adv. Syst. Appl. 6(1) (2017)
Model Development in Predicting Seaweed Production Using Data Mining Techniques Joseph G. Acebo, Larmie S. Feliscuzo, and Cherry Lyn C. Sta. Romana
Abstract Production trends nowadays can be predicted by identifying hidden patterns in variables and factors underlying the production processes. In the industry sector, data mining is a valuable data analysis method that can help forecast variabilities in the different industrial processes, particularly, the production area. Useful knowledge can be analyzed from data in databases that can be very helpful in identifying specific factors that improve the quantity and quality of products. Thus, this study utilized data from an agency database that was analyzed using data mining techniques to develop a model on seaweed production. Seaweed production is continuously increasing its share in the market but is also faced with problems and constraints. This study determined significant variables that can predict high seaweed production and compared classification accuracies of Naïve Bayes, J48, CART, logistic regression and CHAID algorithms. A prediction model on seaweed production is generated that can benefit the seaweed industry sector in designing interventions to increase its production. Validation of the result through other data mining techniques is recommended by this study. Keywords Classification algorithm · Data mining · Decision tree · Seaweed production
J. G. Acebo (B) Eastern Samar State University, Salcedo, Eastern Samar, Philippines e-mail: [email protected] L. S. Feliscuzo · C. L. C. Sta. Romana Cebu Institute of Technology-University, Cebu City, Philippines e-mail: [email protected] C. L. C. Sta. Romana e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_75
1 Introduction Seaweed is a significant marine component that thrives together with other aquatic life such as mangroves and coral reefs. It has substantial uses that include both environmental and commercial as it can be converted to various forms of marketable products. The Philippines is one of the dominant producers of a variety of seaweed species [1] but is, however, faced with limited production of marketable seaweed species due to several factors. Seaweed production is currently a booming industrial activity, as it continues to rise in its share in the world market. However, it is also bombarded with problems in terms of farming methods which encompasses both individual and environmental concerns [2]. Thus, this study made use of data from an agency database that was analyzed using data mining techniques to develop a model on seaweed production. This study determined significant attributes that best contribute to high seaweed production and compared classification accuracies of several algorithms, and then obtained a suitable prediction model. Data mining becomes an increasingly valuable data analysis method in the industry sector. Production trends can be predicted through data mining to identify and analyze hidden patterns in the variables which control the processes and improve production rate. Currently, data mining [3] is applied to different areas in the production and manufacturing industry to extract knowledge for application in predicting maintenance, design, fault detection, quality control, production and decision support systems.
2 Methods 2.1 Research Design A quantitative data analysis was applied in this study which involves the process of collecting and evaluating measurable and verifiable data to be tested to generate new information and build the predictive models through data mining methods.
2.2 Research Method This study follows the cross-industry process for data mining (CRISP-DM) methodology [4]. This methodology involves an organized process to help plan a data mining project. It has six major iterative phases which are shown in Fig. 1.
Fig. 1 Cross-industry standard process for data mining
2.3 Applying Data Mining Algorithms Data from the agency database used as the final dataset consisted of information on individual local seaweed farms and farmers, seaweed production rate and other related information on seaweed farming status related to the seaweed industry. Data includes twenty-two (22) input variables which are farm and farmer-related features, and one (1) output variable which is the seaweed production rate taken from both dried and fresh seaweed harvests. Seaweed production rate is categorized into high, average and low. Each variable in the dataset has 1131 instances. The final dataset was split into two: 75% as a training dataset and 25% as a test dataset. The following algorithms were utilized in analyzing the dataset. Naïve Bayes. The classifier of Naive Bayesian is generated from Bayes’ theorem [5] following the assumptions of independence between variables. A Naive Bayesian classifier, as stated in [5], can be easily built, having no problematic iterative parameter estimation that makes it suitable for huge industrial datasets. Regardless of its ease, the algorithm often does well and is commonly applied because it often performs better than other classification methods. Bayes theorem follows the procedure of computing posterior probability that is P(c|x) out of P(c), P(x) as well as P(x|c) and considers the result of the predictor (x) on a particular class (c) as independent from the given values of added input variables, and this assumption is referred to as the class conditional independence. This is shown as follows:
P(c|x) = P(x|c) ∗ P(c) / P(x)
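As a small illustration of this posterior computation under the class-conditional-independence assumption (a sketch only; the study's own analysis was done in R, and the numbers below are invented, not taken from the seaweed dataset):

```python
# Tiny illustration of Bayes' rule P(c|x) = P(x|c)*P(c)/P(x) with the
# naive independence assumption over features. All probabilities are made up.
def naive_bayes_posteriors(priors, likelihoods, observation):
    """priors: {class: P(c)}; likelihoods: {class: {feature: {value: P(value|c)}}}."""
    unnormalised = {}
    for c, p_c in priors.items():
        p = p_c
        for feature, value in observation.items():
            p *= likelihoods[c][feature].get(value, 1e-9)  # independence assumption
        unnormalised[c] = p
    evidence = sum(unnormalised.values())                  # P(x)
    return {c: p / evidence for c, p in unnormalised.items()}

# Hypothetical example with the three production-rate classes:
priors = {"low": 0.3, "average": 0.4, "high": 0.3}
likelihoods = {
    "low":     {"good_seedlings": {"no": 0.7, "yes": 0.3}},
    "average": {"good_seedlings": {"no": 0.5, "yes": 0.5}},
    "high":    {"good_seedlings": {"no": 0.2, "yes": 0.8}},
}
print(naive_bayes_posteriors(priors, likelihoods, {"good_seedlings": "yes"}))
```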
J48. An application of the C4.5 decision tree algorithm, the J48 algorithm tends to classify a data instance by generating a decision tree taken from the values of the attributes in the training set. It identifies an attribute that causes the categorization of the various instances most accurately, the time it crosses the training set. Further, the probable input values with certainty are apportioned to the particular branch by eliminating it. Decision trees are generated by C4.5 [6] algorithm for classification making C4.5 often a statistical classifier. Simple CART. A classification technique that creates binary decision trees, simple classification and regression tree (CART) [3] generate only two children, where entropy is applied to select the appropriate splitting attribute. By disregarding the missing data, the said algorithm can best treat the training dataset. CART creates classification tree through binary splitting of attribute and using the Gini index in choosing the splitting attribute [7]. Logistic Regression. Logistic regression [8] is a method for dataset analysis where many independent variables cause an outcome. A dichotomous variable measures the outcome which provides only two possible outcomes. The goal of this method is to look for the best model to describe the association between the dichotomous characteristic and set independent variables. CHAID. Used for ascertaining associations between variable of categorical response and other categorical variables, chi-square automatic interaction detector analysis [9] is best appropriate for identifying patterns in datasets with many categorical variables and for visualizing relationships of summarized data.
2.4 Analyzing and Visualizing the Model To apply the different algorithms and analyze their corresponding accuracy, sensitivity and specificity, this study used R Programming Language [10], a freeware programming language and software environment [11]. This language is frequently used [12] as an educational language and research tool for data analysis and statistical software development. A confusion matrix and predictive model were visualized out of the algorithm with the highest accuracy. Also, the variable importance plot is generated to provide the most important variables where the top variables contribute more to the model than the bottom ones and also have high predictive power in classifying the seaweed production rate.
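The accuracy, sensitivity, and specificity reported in the next section are derived from the confusion matrix. A minimal, language-neutral sketch of those per-class metrics is given below (shown in Python; the study itself used R, and the counts are made up, not the values of Table 2):

```python
# Sketch of accuracy/sensitivity/specificity computed from a 3x3 confusion
# matrix (classes low/average/high). The counts are illustrative only.
import numpy as np

cm = np.array([[240, 10,  5],    # rows: actual low / average / high
               [  8, 300, 12],   # cols: predicted low / average / high
               [  4,   7, 545]])

def per_class_metrics(cm: np.ndarray):
    total = cm.sum()
    for k in range(cm.shape[0]):
        tp = cm[k, k]
        fn = cm[k, :].sum() - tp
        fp = cm[:, k].sum() - tp
        tn = total - tp - fn - fp
        yield {"class": k,
               "sensitivity": tp / (tp + fn),   # recall for this class
               "specificity": tn / (tn + fp)}

accuracy = np.trace(cm) / cm.sum()
print(accuracy, list(per_class_metrics(cm)))
```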
3 Results and Discussion Table 1 depicts the accuracy, sensitivity, and specificity results of the different classification algorithms; accuracy is computed from sensitivity and specificity [13]. Comparing the results of the different algorithms, CHAID has the highest accuracy of 96%, with a high sensitivity of 0.95 and specificity of 0.97. Naïve Bayes has an accuracy of 0.93, followed by J48 with 0.92, logistic regression with 0.85, and CART with 0.80. Out of the five algorithms, Table 1 shows that CHAID has the highest value, which implies that it can correctly classify whether a specific seaweed farm has a high, average, or low production rate. The accuracy value is the proportion of true positive results in a particular population, and the 96% accuracy of CHAID indicates that the predictive model generated by the algorithm is acceptable. The confusion matrix for the seaweed production rate is shown in Table 2, with only 46 errors for the 1131-instance dataset, a very low error rate of 4.05%. Correctly classified instances are read diagonally: the classifier correctly predicted 86% of the average class, 92% of the high class, and 91% of the low class. This result is acceptable in terms of the accuracy of the model. Figure 2 illustrates the predictive model generated by the CHAID algorithm. It can be gleaned from the model that the root node is "unavailability of good seedlings"; the nonexistence of this criterion in a seaweed farm is said to predict a high seaweed production rate. This result is associated with reports of farmers having insufficient stocks of suitable cuttings and being left with the only option of a single harvest cycle, causing a 50% decline in production in recent years. Good seedlings become unavailable due to massive pest seaweed infestation [2]. It confirms that seaweed farmers with
Table 2 Confusion matrix for the CHAID algorithm for seaweed production rate
Algorithm
Accuracy
Sensitivity
Specificity
Naïve Bayes
0.93
0.92
0.92
J48
0.92
0.92
0.93
CART
0.80
0.85
0.84
Logistic regression
0.85
0.85
0.87
CHAID
0.96
0.95
0.97
(a)
(b)
239
(c) 39
396 7
Transaction) transaction. • struct Products: All product listings are stored using the mapping (uint => Product). Every time a user wants to add an item to their inventory for sale, he can create it here. Information on the item that is kept in Swarm is linked to this listing via the unique hash that Swarm generates when the content is uploaded to Swarm. • uint transactionCount: Stores how many transactions have been stored in struct. Useful for making unique transaction IDs and for searching transaction. • address private escrow: Address of who created the contract. • bool markedForKilling: used to safely terminate contract if it is ever necessary.
Functions: • addProduct: We have several functions that retrieve and store information from the Products struct (such as getProductsCount and getProductPrice that return price or product quantity). addProduct adds a new listing/entry to our products map. • newTransaction: When a new transaction is created, this adds a new entry in the transaction map and prepares it with everything—like addresses, amount, product ID—so the buyer can
deposit payment afterward. It will then return to the buyer and seller the transaction ID number necessary to keep track of the transaction.
• getTransactionAddresses: Returns the address of the seller and of the creator of the entry to ensure this is an entry created by a valid seller.
• getTransactionDetails: Returns details given a transaction ID. This transparency allows anyone to view transaction records. This could be made private by encrypting the information as well in a future version.
• deposit: Allows the buyer to deposit payment for the transaction into the smart contract's escrow service. The contract will then hold the funds until the item has been received by the buyer and both the buyer and seller are happy with the transaction.
• returnBalance: Refunds the balance to the buyer. This is a private function that can only be called by other functions in the contract, not by any external user. This automates the refund process in the case of a problem such as the seller never shipping.
• cancelTransaction: This function can be called by both the buyer and seller given a transaction ID. If the buyer calls it, a flag is set that the buyer wants a refund. If the seller then calls this function as well, another flag is set for the seller, and the money is automatically returned to the buyer using the returnBalance private function. Another case is if the buyer wants to cancel but the seller does not respond or mark the item as shipped within 10 days; then the money is returned to the buyer. Here, we can see the escrow process being fully automated. Do note that these flags can only be set by the correct parties, as the contract checks the address of the caller. This ensures that the seller cannot set the buyer flags and cheat the system.
• ship: Marks an item as shipped. Can only be called by the seller of a specific transaction.
• payBalance: Pays the balance by releasing the funds held by the smart contract address for this transaction to the seller once the item was marked as received by the buyer.
• accept: This can be called by both buyer and seller, and when both parties call the function it means both are happy and the funds can be released to the seller. Another version has instead an "Issue Payment" that releases the funds once the seller has marked the item as shipped and the buyer is okay with the contract releasing the funds to the seller.
• checkBalance: Returns whether the buyer has deposited funds into the escrow.
• searchBalance: Can be used by the buyer to search for a transaction ID with pending status. This is a very expensive operation, gas-wise, because searching through the transaction map will depend on how big it is at the time the search is executed. This is a good example of how gas allows fair usage of the miners' EVM power to run code from a smart contract. Do note that a big misconception on gas usage is that read-only and constant operations do not consume gas. All operations consume gas, but the user does not pay for the gas usage when it runs in the local EVM. However, if a constant function is called from a transaction then it is not free and the caller will be charged. Because of this and the nature of loops, loops are very expensive to run if not run locally. With this in mind, one way to search a balance without spending gas would be to run it locally instead of through the contract.
• cancelship: Allows cancelation of an item that was previously marked as shipped so the funds can be returned to the buyer.
• kill, safekill, markToKill: These functions allow for termination of the deployed contract if there is ever a need, by first stopping new transactions, allowing a certain time for ongoing ones to finish, and then refunding any pending funds held by the contract to the appropriate parties. The reason this is important is that killing a contract that is currently holding funds can be risky, as the funds could be lost if handled incorrectly. These functions ensure that will not happen.
With these and other functions, the contract allows communication between the Frontend Node.js server and the Blockchain so information can be read and written. Note that we only need one smart contract address to hold several transaction records and balances. There is no need to deploy an individual smart contract per transaction, as this would be extremely wasteful, and bad.
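To make the escrow flow that these functions implement explicit, the following is a plain-Python sketch of the state transitions (newTransaction, deposit, ship, accept, cancelTransaction). It is not the paper's Solidity contract: gas, addresses-as-keys signature checks, and timeouts are omitted, and the method names only mirror the contract functions listed above.

```python
# Language-neutral sketch of the escrow state machine behind the contract
# functions described above. Illustrative only; not the deployed Solidity code.
from dataclasses import dataclass

@dataclass
class Transaction:
    buyer: str
    seller: str
    amount: int
    deposited: bool = False
    shipped: bool = False
    buyer_accept: bool = False
    seller_accept: bool = False
    buyer_cancel: bool = False
    seller_cancel: bool = False
    status: str = "pending"                     # pending -> paid / refunded

class Escrow:
    def __init__(self):
        self.transactions: dict[int, Transaction] = {}
        self._next_id = 0

    def new_transaction(self, buyer, seller, amount) -> int:
        tid = self._next_id
        self._next_id += 1
        self.transactions[tid] = Transaction(buyer, seller, amount)
        return tid

    def deposit(self, tid, caller):
        t = self.transactions[tid]
        assert caller == t.buyer and not t.deposited
        t.deposited = True                      # contract now holds the funds

    def ship(self, tid, caller):
        t = self.transactions[tid]
        assert caller == t.seller and t.deposited
        t.shipped = True

    def accept(self, tid, caller):
        t = self.transactions[tid]
        assert caller in (t.buyer, t.seller) and t.shipped
        if caller == t.buyer:
            t.buyer_accept = True
        else:
            t.seller_accept = True
        if t.buyer_accept and t.seller_accept and t.status == "pending":
            t.status = "paid"                   # payBalance: release to seller

    def cancel_transaction(self, tid, caller):
        t = self.transactions[tid]
        if caller == t.buyer:
            t.buyer_cancel = True
        elif caller == t.seller:
            t.seller_cancel = True
        if t.buyer_cancel and t.seller_cancel and t.status == "pending":
            t.status = "refunded"               # returnBalance: refund the buyer
```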
6 Swarm: Decentralizing Everything Most traditional Web applications use a centralized database to store all the data used in the site. This is due to relational database being very fast to query data therefore speeding up loading times. However, the whole point of the project was to make the system as decentralized as possible, so instead of using a centralized database system like MySQL, we instead decided to use Swarm, a decentralized storage system heavily coupled with the Ethereum network layer [17]. Swarm is a storage system that provides scalable, reliable, and cost-effective data storage. Swarm is based on storage servers, rather than file servers; the storage servers are optimized for cost-performance and aggregated to provide high-performance data access. Swarm uses a striped log abstraction to store data on the storage servers. This abstraction simplifies storage allocation, improves file access performance, balances server loads, provides fault-tolerance through computed redundancy, and simplifies crash recovery. [18].
Swarm is very easy to use and deploy. In our case, the Frontend Node.js server can upload data, such as images and large product descriptions, to Swarm that would otherwise be expensive to store directly on the Ethereum Blockchain. This enables a truly decentralized system with no single point of failure or censorship. One more point to make about Swarm is the issue of data persistence. To protect the data in the case of a node failure, it is in the benefit of the group, for speed and security, that each marketplace frontend keeps a Swarm node to protect data. There are two key scripts when working with Swarm, the upload script and the download script. They function as follows:

Upload script:
1 swarm --bzzaccount
2 tar -czvf swarm_data.tar swarm_data
3 hash=$(swarm up swarm_data.tar)
4 echo $hash

Download script:
1 cd swarm_download_data
2 mkdir $1
3 cd $1
4 swarm down bzz:/$2 swarm_data.tar
5 tar -xzvf swarm_data.tar
6.1 Upload Script (Left) Line 1 connects to the Swarm node installed on user’s machine. The Swarm node is installed when the application is deployed. It uses the geth address for that user. Line 2 makes a tar file swarm_data.tar from the directory swarm_data. Whenever a user is creating a product listing, the images and descriptions are stored under this directory. In line 3, swarm up swarm_data.tar uploads the tar file swarm_data.tar to the local Swarm node which is subsequently synced to the remote nodes. This operation returns a hash value of the stored content which is stored in the variable hash. Finally, line 4 returns the hash value to the module which called this script. In our case, it is returned to Node.js server script.
6.2 Download Script (Right) First, the script enters the directory swarm_download_data in line 1. Our server always maintains this directory as the root for storing the downloaded Swarm contents. In line 2, inside that directory, a new directory is created under the name of the first argument passed to this script. The first argument passed is always the product ID. Hence, the information about each product is stored under the directory named with this product ID. Next, the script enters into that directory created in line 3. Line 4 downloads the content from the Swarm node under the name swarm_data.tar using the hash value that was provided as the second argument to the script. Line 5 then untars the downloaded file. After this line, the Node.js server goes into each product folder and parses the image and description information of that product and displays it to the user.
7 A Discussion on the Working System Consumers do not care whether their toaster runs on a MySQL database, a hamster wheel, or Blockchain Technology. They just want to eat nice, tasty toast. It is up to the engineers to figure out how to make the best toast. A true sign of when technology has succeeded is when it can be used on day-to-day life without the consumer even realizing it. The day we have a Blockchain toaster that works better than a nonBlockchain toaster is the day we know Blockchain has truly succeeded in the toaster business. Similarly, the goal of our Blockchain market is to be better than currently available marketplaces. Aside from removing censorship by decentralizing the marketplace and reducing fees, we also aim at unifying storefronts for the same seller on different frontend Node.js servers. So that if a seller wants to sell something on eBay and Amazon that they do not run into the problem mentioned earlier of creating a race condition on selling the same item twice and forcing one transaction to be delayed or canceled due to being out of stock. This can be achieved using a shared inventory, or ledger, between several frontend marketplaces, which is exactly what the Blockchain can do as each frontend accesses it. It also allows sellers to announce their products freely among any frontend they wish to do so with. This also means that if Amazon or eBay were to adopt Blockchain for their inventory tracking they could phase in the decentralized marketplace while also maintaining their own Web sites for their own products. This will aid in phasing out current technology, or coexisting until one proves to be better than the other. One other point of discussion in our system is the shipped flag. Currently, we have the seller manually set the shipped flag and then buyer manually set an issue payment/delivered flag once they receive the item. Because the money is held in escrow there is no benefit to the buyer for withholding the release of funds, but malicious buyers could do this for no reason. This is why ideally we would only allow something to be marked as shipped/delivered when a carrier service such as
FedEx, DHL, USPS, or UPS mark something as shipped with their own Blockchain addresses. Then using Internet of things (IoT) devices, these could be tracked reliably. Ultimately, we believe this is where the industry is headed due to the amount of money already invested in tracking packages.
8 Ongoing Challenged and Future Work While the system is fully operational, there are still challenges and work to be done. Some of these challenges are the empty/fake package problem. This is when the item is marked as shipped and delivered by the postal carrier, but then buyer claims to have received an empty box from the seller. Who is at fault here? Was the item stolen during transit? Did the seller lie and ship an empty box trying to defraud the buyer? Alternatively, did the buyer receive the item but is lying by claiming he received an empty box? There is no easy solution to this even when humans are involved. Supply Chain Blockchain Projects such as the IBM–Walmart project mentioned earlier aim to tackle this problem by using IoT devices to keep track of an item from start to end. This is the true solution, but for now using a third party that can help arbitrate is the only choice. Similar to Open Bazaar’s multisignature system, we allow a third party to be called and moderate a dispute and ideally side with the correct party. However, this is only used when it becomes necessary and a third party does not need to be picked when using the system. Otherwise, no human factor needs be involved, and is not involved for the majority of the transactions. For when it is used however, we trust the third party will work as it has in many places since these arbitrators earn a fee and live on their reputation as fair arbitrators. Do note that trusting the carrier as a release of funds does not make it a centralized figure with the IoT approach; furthermore, one way to have the marketplace work without the delivered flag is for the escrow contract to hold the funds until the buyer releases them. If buyer does not release them, then he does not get them back, but the seller also does not get them. Because of this the buyer has no interest in holding the escrow funds hostage, but a malicious buyer could do this regardless. To solve this, both buyer and seller can be required to have a deposit that is held until the transaction is marked resolved. Now the buyer loses additional funds by holding the payment hostage instead of releasing it, or in the case of a dispute, allow the arbitrator to step in decide who to release the funds to. Both cases would then trigger a release of the deposit. Another major challenge is the legality of selling or purchasing specific products. Because the market does not have a centralized system to monitor what is sold, there is a risk of selling items that may be illegal to purchase by certain parties. This may be used to workaround sanctions or trade agreements. For example, US citizens can buy night vision goggles without any special permits, but they are not allowed to export these items. Because there is no easy way to control who is buying what, it is possible someone from France could be buying night vision goggles. Solutions to this are left to the seller to comply with International, Federal, State, and Local laws when selling an item. The other solution would be to introduce a centralized aspect,
perhaps in the form of masternodes, to deal with these issues. Another example could be taxation on items sold. This is an issue with regular online retailers as well where both parties are supposed to report and pay taxes; however, the majority of people do not pay taxes for these purchases when they technically are supposed to do so. The solution is simple. It is not the marketplace’s job to regulate these things, but the entities using it to ensure they are following the law. For example, VISA is not responsible for someone using their credit card to buy an item they are not supposed to buy, that is, the seller and buyer’s responsibility. Another challenged we are actively exploring solutions for is that while we have a working system that can retrieve all stored product listings, this search can be expensive in terms of gas cost due to having to iterate through our products map and filter the results; therefore, it is not very feasible to search the products based on user preferences. As mentioned, gas is always consumed—even by read-only or constant functions—but is not charged to the user when it runs in the local EVM; however, if the constant function is called from a transaction, then they do pay for this gas. A solution to these expensive gas operations are keeping separate local functions that can run these queries on the local EVM avoiding all the major gas fees. Another way one could solve this would be to maintain a copy of the Blockchain data on Swarm to speed up these searches and instead only have to check if the data in Swarm, and the data in the Blockchain match, if we are to have a truly decentralized search engine it is key that we do not outsource this computation outside Swarm or the Blockchain. Finally, creating our own Blockchain with no gas fees would solve transaction fees altogether for consumers. Instead marketplaces would be encouraged to mine these transactions to continue operating the marketplace. Whether they are mining transactions from their frontend, or from others, they have a motivation to run the functions to keep the marketplace running. This approach is typically used with success in closed and private Blockchains. The biggest challenge however, as mentioned before, is transaction times when it comes to transactions per second. Ethereum can only handle 20 transactions per second, but the goal is to at least achieve around 200 transactions as PayPal does. Furthermore, there are emerging feeless Blockchains that do not require gas as Ethereum does such as EOS and IOTA [19]. Solutions for this include exploring other Blockchains and following where technology heads for third-generation Blockchains. Future work along this line also includes exploration into using multiple cryptocurrencies to transact that may be more stable or to allow FIAT (such as USD) to be used in transactions. This could be achieved with our own stable coin that we tether to the US dollar or other FIAT currencies.
9 Conclusion We have presented a truly decentralized marketplace using Blockchain, cryptocurrency, and Swarm Technology built on the Ethereum Blockchain. We discussed the technology and provide a walkthrough of the system architecture and workflow. The
main advantages of decentralizing a marketplace are the ability to avoid product censorship, compared to current e-commerce platforms like Amazon, and the removal of middleman fees in transactions by using an escrow service created with Ethereum smart contracts that offers an alternative to current payment gateways like PayPal. We used Swarm to maintain a decentralized database for Web hosting in coordination with the information stored in the Blockchain. Because reproducible research is a key principle of cumulative science and promotes collaboration between researchers [20], and because we believe not just in a decentralized open market but in decentralized project development, we are open sourcing the entire project and making it freely available for the community to use. There have been other attempts and papers written on similar ideas, but unlike those, we have a working product ready to be deployed globally. It is available on multiple repositories including Docker Hub (search: UNLVCS), GitHub (https://github.com/UNLVCS), and Zenodo. There is also a 25-minute video with a live demo on YouTube named "UNLV Blockchain Day: Decentralized Blockchain Marketplace Presentation." The future is now.
References 1. S. Nakamoto et al., Bitcoin: a peer-to-peer electronic cash system (2008) 2. S. Haber, W.S. Stornetta, How to time-stamp a digital document, in Conference on the Theory and Application of Cryptography (Springer, 1990), pp. 437–455 3. D. Bayer, S. Haber, W.S. Stornetta, Improving the efficiency and reliability of digital timestamping, in Sequences II (Springer, 1993), pp. 329–334 4. R.C. Merkle, Protocols for public key cryptosystems, in IEEE Symposium on Security and Privacy (IEEE, 1980), pp. 122–122 5. A. Back et al., Hashcash-a denial of service counter-measure (2002) 6. C. Dwork, M. Naor, Pricing via processing or combatting junk mail, in Annual International Cryptology Conference (Springer, 1992), pp. 139–147 7. V. Buterin et al., Ethereum white paper: a next generation smart contract & decentralized application platform, First version (2014) 8. S. Williams, 20 real-world uses for blockchain technology. https://www.fool.com/investing/ 2018/04/11/20-real-world-uses-for-blockchain-technology.aspx. Accessed 03 May 2019 9. M. Corkery, N. Popper, From farm to blockchain: Walmart tracks its lettuce, in The New York Times, p. B1. Print, published: 24 Sept 2018 10. D. Siegel, Understanding the dao attack. https://www.coindesk.com/understanding-dao-hackjournalists. Accessed 03 May 2019 11. Transactions speeds: How do cryptocurrencies stack up to visa or paypal? howmuch.net a cost information website. https://howmuch.net/articles/crypto-transaction-speeds-compared. Accessed 03 May 2019 12. E-commerce worldwide—statistics & facts. https://www.statista.com/topics/871/onlineshopping/. Accessed 03 May 2019 13. Number of digital shoppers in the United States from 2016 to 2021 (in millions). https:// www.statista.com/statistics/183755/number-of-us-internet-shoppers-since-2009/. Accessed 03 May 2019 14. Explosives, weapons, and related items. https://sellercentral.amazon.com/gp/help/external/ 200164950. Accessed 03 May 2019
15. Bitcoin overtakes ethereum in node numbers. https://www.trustnodes.com/2019/01/09/ bitcoin-overtakes-ethereum-in-node-numbers. Accessed 03 May 2019 16. C. Dannen, Introducing Ethereum and Solidity (Springer, 2017) 17. K.R. Özyılmaz, A. Yurdakul, Designing a blockchain-based iot infrastructure with ethereum, swarm and lora (2018). arXiv:1809.07655 18. J.H. Hartman, I. Murdock, T. Spalink, The swarm scalable storage system, in Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No. 99CB37003) (IEEE, 1999), pp. 74–81 19. S. Popov, The Tangle, iota Whitepaper (2018) 20. J.R. Fonseca Cacho, K. Taghva, Reproducible research in document analysis and recognition. in Information Technology-New Generations (Springer, 2018), pp. 389–395
A Expansion Method for DriveMonitor Trace Function Dong Liu
Abstract Based on the DriveMonitor Trace Function, and after analyzing the data format of the trace data files, this paper presents methods for data extraction into databases, data channel expansion, data fusion, and data reconstruction. The expanded functions make data analysis and processing more convenient for the electrical engineers concerned. Keywords SQLite database · Data channel expansion · Data fusion · Data reconstruction
1 Introduction As Siemens' commissioning software for frequency conversion drives, DriveMonitor [1] can modify all parameters and record curve data over a serial RS232 connection to the drive device. Through serial RS485 or Profibus DP, the software can network with more than 256 frequency conversion devices, which can work together with PLC automation devices. DriveMonitor can modify several commonly used parameters, and all parameters in expert mode, by combining the parameter read/write modes of the frequency converter, the unit system, and all built-in models. In particular, its curve data recording (trace) function can completely record the internal data of the frequency conversion drive under different triggering conditions, provide more detailed data for the electrical engineer, support further parameter tuning, and finally satisfy complex and flexible processes. Based on the curve data recording function of DriveMonitor [2], this paper presents advanced data processing functions, derived from detailed analysis and a large number of tests, such as data information extraction, data channel expansion, data information fusion, and data information reconstruction. These expand the basic functions and make data processing more efficient: for example, the data storage capacity D. Liu (B) Electric Department, Capital Engineering & Research Incorporation Limited, Beijing 100176, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_78
is no longer limited to 8 kB, the number of data channels is no longer limited to eight, and data from different record files can be spliced and fused seamlessly. These extended functions make it more convenient for frequency conversion drive commissioning and maintenance staff to analyze on-site problems and summarize on-site data.
2 Data Information Extraction Every curve data file recorded by DriveMonitor contains two parts: the actual curve data and the data file configuration information. The actual curve data record the core data of the drive, including the reference current P350, the reference voltage P351, the reference frequency P352, and so on. The configuration information includes the basic device information, the DriveMonitor curve parameter settings, the trigger condition of the recording, the recording time, the names of the curve variables, and so on. These reference parameters are the per-unit base data of the Siemens frequency conversion drive, which is built on a 16-bit fixed-point processor, and they are also important parameters of its internal model algorithms. If the actual curve data are bound to the basic device information, the data become traceable, and the reference values provide the scale needed to convert the data from per-unit values to actual values [3]. The basic export function provided by DriveMonitor can only export text data and supports only simple processing; more flexible processing requires manually checking the device information, re-normalizing data recorded with different sample times, and manual interpolation, so more advanced data analysis is impossible, which indirectly restricts the ability of maintenance staff to analyze data and solve problems. Based on long-term experience of drive commissioning in the metallurgical automation industry, this paper proposes extracting all valid data from the data record files into a database and using the database to manage all data. Considering generality and practicality, the lightweight SQLite database is adopted, which supports data processing on both Windows and Linux and provides a basic platform for further functional expansion. Every data record file contains the actual curve data, the device information, the DriveMonitor curve parameter settings, the trigger condition, the recording time, the curve variable names, and so on. When this information is extracted into the database, the actual 16-bit fixed-point data format is retained, and the actual curve data are saved in a separate table to preserve the data losslessly. Apart from the actual data values, all other curve information is recorded in the database in the format of the standard file header and saved as a separate table, which ensures the integrity and retrievability of the curve data information. The two database tables are shown in Fig. 1.
Fig. 1 Table information of databases: (a) actual curve-data information table; (b) header data information table
After the curve data are extracted into the database, the two parts of the DriveMonitor curve record file can be managed efficiently by common database retrieval methods. This not only enables relational retrieval across the data tables but also effective decomposition and fusion of the data, especially when many control fields are involved, and advanced tools such as MATLAB and Maple can then be used for mass data analysis, signal processing, system model building, and so on. In the DriveMonitor curve data record database, the header table and the data table are extracted automatically from the existing data files by the software [4]. In order to achieve more advanced data processing functions, an "index table" is also established to manage the data fragments of every trace record file efficiently. The corresponding data structure relationship is shown in Fig. 2.
Fig. 2 Index map of databases: the index table links each trace file to its entries in the head table (Trace Info 1 … n) and in the data table (Trace Data 1 … n)
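The paper does not publish its database schema, so the following is only a minimal sketch of the three-table layout just described (index table, header table, data table), using Python's built-in sqlite3 module; the table and column names are illustrative assumptions.

```python
# Minimal sketch of the index/header/data table layout described above.
# Table and column names are assumptions, not the author's actual schema.
import sqlite3

conn = sqlite3.connect("drivemonitor_traces.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS trace_index (
    trace_id     INTEGER PRIMARY KEY,
    source_file  TEXT,
    device_name  TEXT,
    trigger_time TEXT
);
CREATE TABLE IF NOT EXISTS trace_header (      -- one row per recorded channel
    trace_id        INTEGER REFERENCES trace_index(trace_id),
    channel_no      INTEGER,
    signal_name     TEXT,
    sample_time_ms  REAL,
    reference_value REAL                        -- per-unit base, e.g. P350/P351/P352
);
CREATE TABLE IF NOT EXISTS trace_data (         -- raw 16-bit fixed-point samples
    trace_id   INTEGER REFERENCES trace_index(trace_id),
    channel_no INTEGER,
    sample_no  INTEGER,
    raw_value  INTEGER
);
""")
conn.commit()
```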
3 Data Channel Extension The built-in curve recording function of frequency conversion drive device is usually limited by the number of data channels and the length of recording, and different devices have different limits. The number of data channels and the length of recording time are a pair of contradictions that are difficult to solve, which have been plaguing the majority of frequency conversion drive debugging and maintenance personnel [4]. Aiming at the problem that the number of curve data channels is limited, through long-term exploration and testing, this paper summarizes a complete set of reading and writing methods of DriveMonitor curve data records. By improving those functions, the number of curve data channels can be expanded arbitrarily by users, and the length of data recording time can also be changed by users, which expands the limits of different devices. Specifically, in the database in Fig. 2, the user adds the custom data to the “data table,” adds the name, channel number and other information of the data to the “header table,” and according to the definition format of the curve data file of DriveMonitor software, the data record file can be regenerated. In this way, users can expand more customized channel data in the existing offline curve data record files according to their own needs, instead of being limited to the existing eight channel data, such as adding the ninth channel data and displaying the name, as shown in Fig. 3. Through the data channel expansion function, it is very convenient to display the same batch of variable curve data records in a DriveMonitor software much more at the same time, which makes the data comparison easier. Using this function, virtual signals can be established and compared with recorded signals; in addition,
Fig. 3 Diagram of data channel expansion
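A sketch of the channel-expansion step described above is given below: a user-defined ninth channel is appended to the data table and announced in the header table, after which a new trace file can be regenerated in DriveMonitor's format. It reuses the illustrative schema sketched in Sect. 2; the 16-bit per-unit scaling factor is an assumption for illustration, and the regeneration of the binary trace file itself is not shown.

```python
# Sketch of adding a custom (e.g. ninth) channel to the trace database so that
# a new DriveMonitor record file can later be regenerated. Schema and scaling
# are illustrative assumptions, not the paper's implementation.
import sqlite3

def add_channel(db_path: str, trace_id: int, channel_no: int,
                signal_name: str, reference_value: float,
                sample_time_ms: float, samples: list[float]) -> None:
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "INSERT INTO trace_header VALUES (?, ?, ?, ?, ?)",
            (trace_id, channel_no, signal_name, sample_time_ms, reference_value))
        conn.executemany(
            "INSERT INTO trace_data VALUES (?, ?, ?, ?)",
            [(trace_id, channel_no, i, int(round(v / reference_value * 0x4000)))
             for i, v in enumerate(samples)])  # 0x4000 = 100% per-unit (assumed scaling)
    conn.close()

# Example: add a computed/virtual signal as channel 9 of trace 1.
# add_channel("drivemonitor_traces.db", 1, 9, "n/f(virtual)", 1500.0, 1.2,
#             [0.0, 10.5, 21.0, 31.5])
```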
this function can also be used to compare information with recorded signals. It also lays a good data foundation for later data information fusion and data information reconstruction.
4 Data Information Fusion Data information fusion extracts several curve data records onto one common time scale. Head fusion and data fusion are carried out through time comparison and stitching [5]: the different header information is analyzed, checked, synthesized, and merged to form a unified header, and the curve data of the same signal from several records are converted according to the current per-unit reference values and then joined together to form one continuous record. After data information fusion, the resulting curve data record can still be opened and viewed with DriveMonitor, which shows the running state of the equipment more clearly and makes commissioning of the equipment and the process more convenient. The structure of data information fusion is shown in Fig. 4.
4.1 Data Mosaic

For the same variable of the same device with the same reference setting value, multiple recordings produce a number of curve data record files; data information fusion is then realized by directly copying and splicing all the data. This makes it easier to observe the device's actions and its performance response across all data streams. For two data files that contain the same variable and include a pre-trigger interval, the pre-trigger samples carry timestamps earlier than the trigger time, because the starting point of the data timestamps is counted from the trigger instant.
Fig. 4 Structure of data fusion (the heads of trace files 1 and 2 are merged by head fusion; Data 1 and Data 2 are joined by data fusion into one trace file)
Fig. 5 Conversion for data fusion (Data 1 to Data n of trace files 1 to n are converted to a new reference unit and then fused into one trace file)
In data information fusion, the start timestamp of the second file must therefore be adjusted before the two data records can be fused correctly and completely. If the unitary values differ, they are unified first and then the file data are spliced. For the fusion of N source data files with entirely different reference units, the user selects one of the reference units or defines a custom one; all other data are then converted according to the selected reference unit, and finally the data mosaic is performed, as shown in Fig. 5.
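As an illustration of the mosaic step, the following is a minimal sketch of how two recorded segments could be rescaled to a common reference unit, shifted onto absolute time, and concatenated; the array layout and variable names are assumptions for the example only, not the DriveMonitor file format.

```python
import numpy as np

def mosaic_segments(segments, new_reference):
    """Splice several recordings of one signal onto a common reference unit.

    Each segment is a dict with:
      't_rel'     - sample times relative to the trigger (s), may be negative
      'trigger_t' - absolute trigger time (s)
      'values'    - per-unit values recorded with 'reference' as 100 %
      'reference' - reference (unitary) value used during that recording
    """
    abs_time, values = [], []
    for seg in segments:
        # Shift relative timestamps (including pre-trigger samples) to absolute time.
        abs_time.append(seg["t_rel"] + seg["trigger_t"])
        # Rescale from the old reference unit to the selected new one.
        values.append(seg["values"] * seg["reference"] / new_reference)
    t = np.concatenate(abs_time)
    v = np.concatenate(values)
    order = np.argsort(t)              # keep the fused record in time order
    return t[order], v[order]
```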
4.2 Key Technologies of Data Fusion

4.2.1 Establishing Virtual Channel
In metallurgical automation, the frequency conversion drive device, as level-zero equipment, usually works under continuous operating conditions. The maximum data storage space of a Siemens frequency conversion drive device is 8 kB, so it cannot record curve data continuously and can only work in the triggered recording mode. Every curve data file is therefore time-discontinuous, and when data information fusion is carried out, data gaps and interruptions appear. During fusion, the user can choose to fill all the missing data with zeros so that they stand apart from the real process data; however, the fused record then contains both real and virtual data, and the artificially added invalid information is mixed with the valid information, so the real operation of the device can no longer be judged. For this situation, this paper proposes to establish an additional virtual channel while the auxiliary virtual data are filled in, so that the artificially added signal is reflected explicitly in the fused data. By referring to the virtual channel signal, users can clearly distinguish the real data from the virtual data in the fused record.
Fig. 6 Virtual channel of data fusion (actual signal n/f(act) and virtual channel n/f(null) plotted over time t, vertical axis in per cent)
As shown in Fig. 6, for example, after the actual speed feedback signal segments from different time periods are spliced, real and virtual data can be distinguished by establishing a virtual channel n/f(null) that labels the invalid data segments: the virtual channel stays at one level over the real data segments and at another level over the virtual data segments.
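A minimal sketch of this idea is given below: the gaps between recorded fragments are filled with zeros on a common time grid, and a separate validity channel records which samples are artificial. The function and variable names are illustrative assumptions.

```python
import numpy as np

def fuse_with_virtual_channel(fragments, dt):
    """Fuse time-discontinuous fragments onto one grid with step dt.

    Each fragment is a (start_time, values) pair sampled with step dt.
    Returns the common time axis, the fused signal (gaps filled with 0)
    and a virtual channel that is 1 where data were filled artificially.
    """
    t0 = min(start for start, _ in fragments)
    t1 = max(start + len(v) * dt for start, v in fragments)
    n = int(round((t1 - t0) / dt))
    time = t0 + dt * np.arange(n)
    fused = np.zeros(n)                 # gaps are filled with zeros
    virtual = np.ones(n)                # 1 = artificially added sample
    for start, values in fragments:
        i = int(round((start - t0) / dt))
        fused[i:i + len(values)] = values
        virtual[i:i + len(values)] = 0  # 0 = real recorded sample
    return time, fused, virtual
```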
4.2.2 Adjusting Sampling Interval
When DriveMonitor collects data, the sampling interval is set by the user, so even recordings of the same parameters of the same control device may use totally different interval times. In the process of data information fusion, the data therefore have to be adjusted according to their interval times before they can be copied and fused. For curve data records of the same signal with different sampling frequencies, either the minimum sampling time or a user-defined sampling time can be used for the fusion. When the minimum sampling time is used, the data signal is first held constant between two sampling instants, by means of the first-order signal-holding function inside the frequency converter, until the value of the next cycle arrives; the head of each data segment is then aligned by absolute time. To facilitate alignment and analysis with respect to the actual operation of the equipment, the head data can also be fine-tuned before fusion. The data fusion function additionally provides more advanced interpolation options, such as Newton-Lagrange interpolation and spline interpolation, which make the fused data signal agree better with the actual operation of the field equipment.
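To illustrate the resampling step, the sketch below brings two recordings with different sampling intervals onto the finest common grid, using either a hold of the previous value or linear interpolation from NumPy; it is a simplified example under these assumptions, not the tool's actual implementation.

```python
import numpy as np

def resample(t, v, t_new, mode="hold"):
    """Resample samples (t, v) onto the new time axis t_new."""
    if mode == "hold":
        # Hold each value until the next recorded sample arrives.
        idx = np.searchsorted(t, t_new, side="right") - 1
        return v[np.clip(idx, 0, len(v) - 1)]
    # Otherwise use linear interpolation between recorded samples.
    return np.interp(t_new, t, v)

# Two recordings of the same signal with different sampling intervals.
t_a, v_a = np.arange(0.0, 1.0, 0.010), np.random.rand(100)   # 10 ms
t_b, v_b = np.arange(1.0, 2.0, 0.025), np.random.rand(40)    # 25 ms

dt = 0.010                                   # fuse on the finest interval
t_common = np.arange(0.0, 2.0, dt)
fused = np.where(t_common < 1.0,
                 resample(t_a, v_a, t_common),
                 resample(t_b, v_b, t_common, mode="hold"))
```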
5 Data Information Reconstruction

A metallurgical automatic process-control system is usually a repetitive process system. Through data information reconstruction, the whole process data of the equipment can be recorded completely and serve as the data basis for equipment analysis and maintenance. With enough data, a complete description of the process can be reconstructed. However, a curve data recording mode based on a trigger mechanism cannot meet the requirement of recording the whole process. This paper therefore provides a time-sharing data information reconstruction method, which can greatly improve the efficiency of information reconstruction. Data information reconstruction collects every data fragment of the process and records the whole process completely by the data fusion method, which realizes fully digital tracking of the process. The reconstruction based on time-sharing data information merges the device data collected in different time periods under the trigger mechanism and obtains complete data information by splicing and fusing the fragments according to their logical and time-sequence relationships, thereby scanning the operation data of the equipment over its full operating conditions, as shown in Fig. 7. Data reconstruction is similar to the panoramic continuous-shooting function of a modern digital camera: sections of data of the running equipment are grabbed continuously and then spliced, so that the complete operation information of the equipment is obtained and the operation under full operating conditions is recorded. The result of data information reconstruction is shown in Fig. 8. Reconstructing the data information of one piece of equipment is only the first step of system data rebuilding; reconstructing the data information of multiple pieces of equipment synchronously on a continuous production line makes the operation of the whole line clearer and more convenient for systematic analysis. Especially in a continuous production process, the relationship between upstream and downstream equipment is very close, and sometimes they cannot even be debugged and trouble-shot separately, so reconstructing the data information of the whole process is all the more necessary. For example, in the multi-stand continuous rolling system of a bar and wire production line, the torque, speed and other signals are coupled among several frequency drive devices, which brings great difficulties to on-site debugging and fault analysis. Based on the curve data record files of the frequency conversion drive devices, after data information reconstruction the signal coupling between the devices and the stack-pull relationship between them can be observed very clearly, which provides the necessary information for field debugging and daily maintenance and can greatly shorten the process adjustment time. As another example, for the frequency drive devices of the upper and lower rolls on a heavy-plate mill production line, data information reconstruction provides strong data support for load impact analysis and compensation, load observation and other functions.
Fig. 7 Diagram for data reconstruction (process fragments 1-6 are captured into trace files, each with a head and a data part)
Fig. 8 Diagram of data reconstruction (trace files 1-6 are fused into one trace file with a new head and the concatenated data 1-6)
To sum up, distributed data recording of the continuous production equipment on the same production line, combined with the data information reconstruction function, can reproduce the whole continuous production process from a technological point of view and enables continuous analysis.
6 Conclusions

For the curve data recording function of the Siemens DriveMonitor software, the basic features of this function have been analyzed in detail and a more powerful post-processing method has been given. Through the data information extraction function, the data recorded in the device can be archived in a database, which realizes cross-platform reading and writing and also provides the basic data source for developing further functions. Through data channel expansion and data information fusion, the restrictions of the DriveMonitor software and of the device can be overcome, data expansion is realized, and the originally intermittent data fragments become continuous on the time axis. From the point of view of the metallurgical automation system, data flow fusion and reconstruction of a single equipment process are realized by synthesizing, analyzing and processing the data of a single device from different time periods, and on this basis data reconstruction of the whole process equipment is realized. The data processing and analysis method proposed in this paper can better meet the needs of professionals in the metallurgical automation field.
References

1. Siemens Service, DriveMonitor Introduce [EB/OL]. (2004-9). http://www.ad.siemens.com.cn/download/docMessage.aspx?ID=1263&loginID=&srno=&sendtime=
2. Siemens Service, SIMOVERT MASTERDRIVES Using Book [EB/OL]. (2004-6). http://www.ad.siemens.com.cn/download/docMessage.aspx?ID=6900&loginID=&srno=&sendtime=
3. D. Liu, W. Mao, Y. Wang, C. Yue, Continual fault data record method for simatic driver. Metallurg. Ind. Autom. 38(5), 53–57, 72 (2014)
4. D. Liu, Marking process analyze in plate marking machine, in International Conference on Machine Learning & Cybernetics (IEEE, 2010)
5. D. Liu, Simulation and character recognition for plate marking machine, in International Conference on Signal Processing Systems (IEEE, 2010)
Load Prediction Energy Efficient VM Consolidation Policy in Multimedia Cloud

K. P. N. Jayasena and G. K. Suren W. de Chickera
Abstract Multimedia cloud computing is a dynamic development of cloud computing that upgrades various kinds of media services, including text, image and video, on the Internet. The most significant challenges arising from its growing popularity are energy utilization and green cloud computing. To improve the energy efficiency of multimedia cloud data centers, we enhance the virtual machine consolidation (VMC) framework. VMC has two phases, virtual machine (VM) selection and VM allocation, and many researchers have proposed solutions for the two phases separately. Building on these two policies, we propose the fast up and slow down (FUSD) load prediction-based energy-efficient VMC for data-intensive jobs in a multimedia cloud infrastructure. Simulation results in CloudSim with real cloud platform trace data illustrate that the proposed load prediction policy achieves better energy consumption, service level agreement violation (SLAV) and number of VM migrations (VMM) than existing traditional VMC policies. The proposed VMC policy can be used for large-scale multimedia cloud platforms where QoS assurance, low SLAV and low energy consumption are indispensable. Keywords Energy efficiency · Load prediction · Multimedia cloud · SLA violation · VMC
K. P. N. Jayasena (B) Department of Computing and Information Systems, Sabaragamuwa University of Sri Lanka, Balangoda, Sri Lanka e-mail: [email protected] G. K. S. W. de Chickera Department of Accounting, Sri Lanka Institute of Advanced Technological Education, Galle, Sri Lanka e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_79
1 Introduction

Multimedia cloud [1] is an emerging computing paradigm that effectively processes various multimedia applications and delivers multi-QoS provisions for clients. Multimedia users can employ cloud data centers to deploy and process multimedia applications effectively. Yet it is a challenging task to minimize the massive energy usage incurred by a multimedia cloud while meeting the QoS requirements of client applications [2]. Improving the energy efficiency of data centers [3, 4] has been treated as an important factor in the last few decades because of its performance, economic and environmental impact. Power consumption accounts for about 10% of the operational expenses of data centers, and according to the Gartner group it may increase to up to 50% in the next decade. Therefore, academia, industry and government all aim to reduce data center power consumption. To overcome the problem of high energy usage, virtualization technology enables efficient use of the existing data center resources through a method referred to as VMC, which gathers numerous VMs onto a single physical server. VMC relies on live migration, i.e., moving a virtual machine between physical servers with minimal downtime, which is an effective means of enhancing resource efficiency and reducing power usage in data centers [5]. The VMC problem can be divided into four subproblems: underload detection of cloud servers, overload detection of cloud servers, VM selection and VM allocation [6]. One crucial aspect of planning a multimedia cloud platform is minimizing energy consumption. The papers [7-9] proposed algorithms for a centralized dynamic VM consolidation (DVMC) policy, in which a single controller knows the current resource availability of the physical machines. A second perspective on DVMC concerns physical machine (PM) selection techniques, which can be divided into threshold-free and threshold-based DVMC algorithms. A third perspective considers the expected future resources, distinguishing predictive from non-predictive DVMC algorithms. A fourth perspective considers destination PM selection strategies, which mainly focus on a random selection of the target PM from the PMs appropriate for a specified VM [10, 11]. Fourier transformation and support vector machine predictors, as well as periodic and mined patterns, have been used to predict future resource demands. Our prediction policy differs from these predictive methods in that it can learn and forecast the resource demands of online VMs without any previous information about the VM hosts. To the best of our knowledge, this is the first work that presents a load prediction algorithm for the multimedia cloud. In our research we use the FUSD [12] algorithm to overcome the VMC issues, and the significant contributions are as follows. We implement a VMC strategy that can efficiently mitigate overload in the cloud architecture while reducing the number of cloud servers used in the multimedia cloud architecture. We then present a load forecast technique that is able to detect the expected resource usage of the cloud entirely without
observing the functions of the VMs in the multimedia cloud. Finally, we evaluate the performance of the load prediction algorithm against traditional policies and evolutionary algorithms such as the adaptive genetic algorithm (AGA) [13].
2 The Preliminary Studies

Figure 1 illustrates the basic concept of VMC; it contains three stages labeled A1, A2 and A3. In stage A1, the k-means technique is adopted to distribute all hosts into three clusters: underloaded, overloaded and normally loaded. Stage A2 shows the operation of the VM selection mechanism, and stage A3 is the VM allocation process.

Virtual machines: VM = {VM_1, VM_2, ..., VM_{N_vm}}, where N_vm is the number of VMs required for this setup, and each VM is represented as a triple

$$VM_i = \left(VM_i^{CPU},\, VM_i^{Memory},\, VM_i^{Bandwidth}\right), \quad i = 1, 2, 3, \ldots, N_{vm} \qquad (1)$$

Physical machines: PM = {PM_1, PM_2, ..., PM_{N_pm}}, where N_pm is the number of required PMs, and each PM is represented as the triple

$$PM_j = \left(PM_j^{CPU},\, PM_j^{Memory},\, PM_j^{Bandwidth}\right), \quad j = 1, 2, 3, \ldots, N_{pm} \qquad (2)$$
2.1 VM Selection

The objective of VM selection is to discover the clusters of overloaded VMs and the clusters of underloaded VMs. The VM selection problem is to find the BestVMsList together with its energy-efficiency evaluation structure.
Fig. 1 VMC
Best V Ms List with its efficiency of energy valuation structure. The VM selection policy is defined VM Selection = {Coverload , Cunderload , Best V Ms List, E, FU S D} for the proposed energy efficient problem. Best V Ms List refers the set of Nv m VMs associate with N p m hosts separately. It function represented by BestVMsList(Nvm , t) = t, P Mvm 1 P Mvm 2 . . . P Mvm Nv m where t denotes as the start time, P Mvm is a vector of P M, V M and vm denotes as the best migrant VM in the particular host P M. E(N pm , t, Δt) = E1, E2, . . . , E N pm is the energy consumption model by the N pm P Ms in a resource queue. FU S D refers the VM selection policy according to load prediction strategy. Total Energy(t, Δt) =
(E(P Mi , t, Δt))
(3)
where E(P M1 , t, Δt) = (E(V M j , t, Δt)) TotalE(t, Δt) consists with the energy consumption in the cloud data center within Δt time from the starting time t. The power usage of VM in the specific host can be represented by E(V M j , t, Δt). The power usage of VM in specific host can be denoted by E(V M j , t, Δt) = E(cpu j , t, Δt) + E(ram j , t, Δt)
(4)
E(cpu j , t, Δt) define as power consumption occurred by CPU and E(ram j , t, Δt) represent as consumption occurred by RAM. In this project, we used selection procedures [14], such as minimum migration time (MMT), maximum correlation (MC), and random selection (RS) policy to compare with the load prediction procedures.
2.2 VM Allocation Policy

VM allocation places the VMs selected for migration onto other active hosts. The strategy can be represented as

$$\text{VMAllocation}(m, t) = \left(C_{overload}, C_{underload}, C_{sleep}, BestVMsList, E, FUSD\right) \qquad (5)$$

where FUSD denotes the VM allocation policy with the load prediction strategy, analogous to the VM selection procedure. SLA(t, Δt), SLAV(t, Δt) and ASLAV(t, Δt) are the performance metrics used to evaluate the method, with ASLAV representing the mean SLA violation rate, and TotalE(t, Δt) is the estimated power usage of the cloud data center in a period Δt starting from time t. The quantity ESV(t, Δt) is used to estimate the energy efficiency:

$$ESV(t, \Delta t) = \text{TotalE}(t, \Delta t) \times SLAV(t, \Delta t) \qquad (6)$$
It is necessary to minimize ESV(t, Δt) in order to obtain an energy-efficient VM allocation procedure.
3 Load Prediction for Energy Efficient Approach

3.1 Skewness Algorithm

In this algorithm, n denotes the number of cloud resources considered, re_i the utilization of the i-th cloud resource, and r̄e the mean utilization of all cloud resources of server S. The resource skewness of a server S is defined as

$$\text{skewness}(S) = \sqrt{\sum_{i=1}^{n}\left(\frac{re_i}{\overline{re}} - 1\right)^2} \qquad (7)$$
The skewness is used to measure the inequality in the deployment of different resources on a cloud server.
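A minimal sketch of Eq. (7) in code, assuming the square-root form above and utilizations given as fractions in [0, 1]:

```python
import math

def skewness(utilizations):
    """Resource skewness of a server, per Eq. (7).

    utilizations: per-resource usage fractions, e.g. [cpu, ram, bandwidth].
    A perfectly even server (all resources equally used) has skewness 0.
    """
    mean = sum(utilizations) / len(utilizations)
    return math.sqrt(sum((u / mean - 1.0) ** 2 for u in utilizations))

# Example: CPU heavily used, memory and network lightly used.
print(skewness([0.9, 0.3, 0.2]))   # large value -> uneven resource usage
print(skewness([0.5, 0.5, 0.5]))   # 0.0 -> balanced usage
```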
3.2 Load Prediction Algorithm

In the FUSD algorithm, the future resource needs of the VMs are predicted from previous statistics:

$$EL(t) = \alpha \cdot EL(t-1) + (1-\alpha) \cdot O(t), \qquad 0 \le \alpha \le 1 \qquad (8)$$

EL(t) denotes the estimated load and O(t) the observed load at time t. The constant α represents a trade-off between stability and responsiveness. The algorithm uses this equation to forecast the load on the VM server; the load is estimated and forecast every minute.
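The following sketch illustrates the fast-up/slow-down idea around Eq. (8): two different α values are used depending on whether the observed load is rising or falling. The concrete values below follow the experimental settings in Sect. 4.1; everything else is an assumption for the example.

```python
class FusdPredictor:
    """EWMA load predictor with separate rise/fall smoothing (Eq. (8))."""

    def __init__(self, alpha_up=-0.2, alpha_down=0.7):
        # A negative alpha_up makes the estimate overshoot a rising load
        # ("fast up"); alpha_down keeps the estimate high while the load
        # falls ("slow down"), which is conservative for consolidation.
        self.alpha_up = alpha_up
        self.alpha_down = alpha_down
        self.estimate = None

    def update(self, observed):
        if self.estimate is None:
            self.estimate = observed
        else:
            a = self.alpha_up if observed > self.estimate else self.alpha_down
            self.estimate = a * self.estimate + (1.0 - a) * observed
        return self.estimate

predictor = FusdPredictor()
for load in [0.2, 0.4, 0.8, 0.5, 0.3]:       # CPU utilization each minute
    print(round(predictor.update(load), 3))
```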
3.3 Temperature

If the power consumption of a host is higher than the hot threshold, that host is treated as a hotspot. A hotspot server is overloaded, so some of the VMs running on it must be migrated away. The temperature of a hotspot P is defined as the square sum of its resource utilizations beyond the hot threshold. Let RE be the set of overloaded resources of server P and r_t the hot threshold. The temperature is defined as:
$$\text{temperature}(P) = \sum_{r \in RE} (r - r_t)^2 \qquad (9)$$
We classify a host as a coldspot if the utilization of all its resources is lower than a cold threshold. A coldspot server is frequently idle and should therefore be turned off to save energy. The FUSD-based VMC strategy is as follows (a sketch of the hotspot-mitigation part is given after this list):

1. Order the hosts and select the host with the highest temperature.
2. On that host, select a VM whose removal would reduce the host's temperature.
3. If there are several such VMs, select the one that increases the skewness the least, and find a destination host that can accommodate this VM without becoming a hotspot.
4. If there are several such hosts, choose the one whose skewness decreases the most after accepting the VM.
5. If no such host exists, continue with the next VM on the hot host.
6. A host is regarded as a coldspot if the sum of its utilizations over the different resources is below a given cold threshold.
7. Begin with the coldspot hosts that have the lowest memory utilization below the cold threshold.
8. Migrate all their VMs to other hosts that will not become hot or cold after the migration (to avoid future hotspots).
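The sketch below illustrates steps 1-5 under simplifying assumptions: each host exposes per-resource utilizations and the utilization vectors of its VMs, and the threshold value follows Sect. 4.1. It is an illustrative rendering of the strategy, not the authors' implementation.

```python
import math

HOT = 0.9                      # hot threshold from Sect. 4.1

def skew(util):
    mean = sum(util) / len(util)
    return math.sqrt(sum((u / mean - 1) ** 2 for u in util)) if mean else 0.0

def temperature(util, hot=HOT):
    # Eq. (9): squared excess of each overloaded resource above the threshold.
    return sum((u - hot) ** 2 for u in util if u > hot)

def plan_one_migration(hosts):
    """hosts: host_id -> {'util': [cpu, ram, ...], 'vms': {vm_id: [cpu, ram, ...]}}.
    Returns (hot_host, vm, destination) for one hotspot-relieving move, or None."""
    hot_id = max(hosts, key=lambda h: temperature(hosts[h]["util"]))
    if temperature(hosts[hot_id]["util"]) == 0:
        return None                                   # no hotspot to relieve
    hot = hosts[hot_id]
    # Steps 2-3: pick the VM whose removal leaves the smallest skewness.
    remaining = lambda v: [u - d for u, d in zip(hot["util"], hot["vms"][v])]
    vm = min(hot["vms"], key=lambda v: skew(remaining(v)))
    # Steps 3-5: destination whose skewness drops most and that stays non-hot.
    best, best_gain = None, None
    for h, info in hosts.items():
        if h == hot_id:
            continue
        new_util = [u + d for u, d in zip(info["util"], hot["vms"][vm])]
        if max(new_util) >= HOT:
            continue                                  # would become a hotspot
        gain = skew(info["util"]) - skew(new_util)
        if best is None or gain > best_gain:
            best, best_gain = h, gain
    return (hot_id, vm, best) if best is not None else None
```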
4 Implementation and Simulation

This section presents the performance evaluation of the experiments conducted in this research. We compare our approach with traditional policies and with an evolutionary computing-based policy.
4.1 Experiment Environment

It is necessary to analyze massive-scale virtualized data centers of a multimedia cloud platform; however, experimenting on such a real environment is expensive and time-consuming. Therefore, we executed our algorithm and evaluated the quality of our solution in the CloudSim simulation environment [15]. Table 1 lists the characteristics of the physical machines, and Table 2 lists the Amazon EC2 instance types used as virtual machines in the experiment. Workload data: we use data from the CoMon project [16], which comprises the CPU consumption of thousands of VMs from servers at 500 geographic locations around the world. The workload traces were collected in March and April 2011 with a 5-min time interval and have been used in previous research.
Table 1 Servers configuration

Server                  CPU model         RAM (GB)   Frequency (MHz)   Cores
HP ProLiant DL380 G4    Intel Xeon 3040   4          1860              2
HP ProLiant ML310 G5    Intel Xeon 3075   4          2660              2

Table 2 Amazon EC2 instance types

Virtual machine instance type   MIPS   RAM (GB)
High-CPU medium                 2500   0.8
Extra-large                     2000   3.7
Micro                           500    0.61
Small                           1000   1.70
Simulation parameters: The parameters used in the experiment are cold threshold = 0.25, green computing = 0.4, hot, warm threshold = 0.9, 0.65, respectively, and consolidation limit = 0.05. We evaluate the FUSD load forecast algorithm with ↑ α = −0.20, ↓ α = 0.70, and based on practical experience, we select the default parameters.
4.2 Evaluation Measurements

• Energy consumption: Recent studies have shown a linear relationship between energy consumption and CPU utilization. The power model is defined as

$$f_1(x) = \sum_{i=1}^{N_{vm}} \left(\left(p_{max_i} - p_{min_i}\right) \times U_{cpu_i}(t) + p_{min_i}\right) \times B_i \qquad (10)$$

where p_min_i is the minimum energy consumption of PM_i and f_1(x) is the total energy consumption of the PMs, with

$$p_{min_i} = p_{max_i} \times 0.6 \qquad (11)$$
• SLAV time per active host (SLATAH): the fraction of time during which an active host experiences a CPU utilization of 100%,

$$SLATAH = \frac{1}{N_{pm}} \sum_{i=1}^{N_{pm}} \frac{T_{s_i}}{T_{a_i}} \qquad (12)$$
where N_pm is the number of physical hosts, T_a_i is the active time of physical host i, and T_s_i is the time span during which the utilization of host i reaches 100%.
• SLA performance degradation due to migration (PDM): the performance degradation during VM migrations that affects the SLA,

$$PDM = \frac{1}{N_{vm}} \sum_{j=1}^{N_{vm}} \frac{C_{Pd_j}}{C_{Pr_j}} \qquad (13)$$

where N_vm is the number of VMs, C_Pd_j is the performance degradation of VM j caused by migrations, and C_Pr_j is the total CPU capacity requested by the j-th virtual machine.
• SLA violation (SLAV): an important metric for measuring the performance of a policy, which should be as small as possible,

$$SLAV = SLATAH \times PDM \qquad (14)$$
• Number of VM migrations (VMM): when VMs are transferred from one physical host to another, performance may degrade, so for better QoS it is important to minimize the number of migrated VMs. The number of VMM at time instant t is defined as

$$F_3(x) = \sum_{i=1}^{N_{vm}} B_i(t) \qquad (15)$$

where F_3(x) is the number of VMM at time t and B_i(t) is a binary variable that equals 1 if VM_i is migrated and 0 otherwise. In this research we also introduce two combined metrics, Energy-SLAV (ESV) and Energy-SLAV-Migration (ESM), to capture the simultaneous reduction of energy, SLAV and the number of VMM:

$$ESV = SLAV \times \text{Energy} \qquad (16)$$

$$ESM = \text{Migration} \times \text{Energy} \times SLAV \qquad (17)$$

A small computational sketch of these metrics is given below.
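The sketch assumes per-host active/saturated times and per-migration degradation figures are available from the simulation output; all variable names are assumptions for the example.

```python
def slatah(saturated_time, active_time):
    """Eq. (12): mean fraction of active time spent at 100% CPU per host."""
    return sum(s / a for s, a in zip(saturated_time, active_time)) / len(active_time)

def pdm(degradation, requested):
    """Eq. (13): mean relative performance degradation caused by migrations."""
    return sum(d / r for d, r in zip(degradation, requested)) / len(requested)

def combined_metrics(energy_kwh, migrations, saturated_time, active_time,
                     degradation, requested):
    slav = slatah(saturated_time, active_time) * pdm(degradation, requested)  # Eq. (14)
    esv = slav * energy_kwh                                                   # Eq. (16)
    esm = migrations * energy_kwh * slav                                      # Eq. (17)
    return {"SLAV": slav, "ESV": esv, "ESM": esm}

# Toy example with two hosts and two VMs.
print(combined_metrics(energy_kwh=83, migrations=1399,
                       saturated_time=[120, 30], active_time=[3600, 3600],
                       degradation=[50, 20], requested=[2500, 1000]))
```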
4.3 Results Analysis

The proposed policy is assessed against several existing policies (MAD-MC, IQR-RS, IQR-MMT, MAD-MMT, MAD-RS) and against an evolutionary algorithm designed for VM consolidation (AGA-MMT, AGA-MC, AGA-RS) on the CloudSim 3.0 platform. The overload detection methods (IQR, MAD and AGA) and the VM selection methods (RS, MC and MMT) are adopted in this experiment. The results for power consumption, number of VM migrations, ESM, SLAV and ESV are presented in Table 3.
Table 3 Characteristics of the workload (CPU utilization)

Policy          Energy consumption (KWH)   SLAV×10^6   ESV×10^6   VM migration   ESM
IQR-MC [14]     177                        701         1241       23,035         28,597
IQR-MMT [14]    189                        315         595        26,476         15,751
MAD-MMT [14]    185                        331         612        26,292         16,089
IQR-RS [14]     180                        719         293        23,988         31,011
MAD-MC [14]     176                        739         1302       23,691         30,836
MAD-RS [14]     178                        735         1306       24,082         31,448
AGA-MMT         139                        314         437        24,508         10,708
AGA-MC          151                        623         942        25,100         23,633
AGA-RS          162                        710         1153       25,100         28,931
FUSD-FUSD       83                         296         247        1399           345
Figure 2 illustrates that the FUSD has the minimum energy consumption. The VM migration value is the most critical factor that affects SLAV and power consumption. The comparison between the traditional policy and AGA from the view of VM migration value illustrated in Fig. 5. When one of the variables is reduced (SLATAH or PDM), the SLAV is also reduced. Figure 3 shows FUSD algorithm that expands SLA violation comparison with others. The ESV value was implemented by multiplying SLAV and energy consumption referred in Sect. 4.2 and used for the concurrently enhancement of SLAV and energy consumption presented in Fig. 4. The ESM metric evaluation results also presented that the FUSD-FUSD mechanism has the best performance compared to others.
Fig. 3 SLA violation
Fig. 4 ESV
Fig. 5 Number of VM migrations
5 Conclusion

In this research, a load prediction-based VMC policy called FUSD is proposed and examined with respect to VM migrations, SLAV and power consumption. We propose an energy-efficiency framework that adopts the FUSD load prediction method for data-intensive tasks in the multimedia cloud, improving the VM migration behaviour of the VM consolidation mechanism. Skewness theory is used to quantify the uneven usage of the various cloud resources among the VMs in the data center, to identify the currently available cloud resources, and to forecast the future load in order to mitigate overload on the cloud servers. The outcomes of the experiments are as follows: the VM consolidation policy focuses on efficient VM migration, the number of VM migrations can be reduced by the FUSD VM consolidation mechanism, and the FUSD-based load prediction VM policy achieves the highest energy efficiency compared with traditional and
evolutionary computing-based policies. The proposed algorithm has the minimum ESM and ESV values, demonstrating that it achieves lower power consumption with a fair value of SLAV. In future work, we plan to evaluate the performance of the FUSD policy in a real cloud infrastructure and to compare it with further heuristic and meta-heuristic algorithms. Compliance with Ethical Standards The authors declare that they have no conflict of interest.
References

1. M.S. Hossain, C. Xu, A. Artoli, M. Murshed, S. Göbel, Cloud-based multimedia services for healthcare and other related applications. Future Gener. Comput. Syst. 66, 27–29 (2017)
2. G. Han, W. Que, G. Jia, L. Shu, An efficient virtual machine consolidation scheme for multimedia cloud computing. Sensors 16(2), 246 (2016)
3. M.A. Sharkh, A. Shami, An evergreen cloud: Optimizing energy efficiency in heterogeneous cloud computing architectures. Veh. Commun. (2017)
4. F.T. Chong, M.J. Heck, P. Ranganathan, A.A. Saleh, H.M. Wassel, Data center energy efficiency: improving energy efficiency in data centers beyond technology scaling. IEEE Design Test 31(1), 93–104 (2014)
5. E. Arianyan, H. Taheri, S. Sharifian, Novel energy and sla efficient resource management heuristics for consolidation of virtual machines in cloud data centers. Comput. Electr. Eng. 47, 222–240 (2015)
6. S.Y.Z. Fard, M.R. Ahmadi, S. Adabi, A dynamic vm consolidation technique for qos and energy consumption in cloud environment. J. Supercomput. 1–22 (2017)
7. D. Deng, K. He, Y. Chen, Dynamic virtual machine consolidation for improving energy efficiency in cloud data centers, in Cloud Computing and Intelligence Systems (CCIS) (IEEE, 2016), pp. 366–370
8. F. Ahamed, S. Shahrestani, B. Javadi, Security aware and energy-efficient virtual machine consolidation in cloud computing systems, in Trustcom/BigDataSE/ISPA (IEEE, 2016), pp. 1516–1523
9. G.B. Fioccola, P. Donadio, R. Canonico, G. Ventre, Dynamic routing and virtual machine consolidation in green clouds, in 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom) (IEEE, 2016), pp. 590–595
10. M. Pedram et al., Hierarchical, portfolio theory-based virtual machine consolidation in a compute cloud. IEEE Trans. Serv. Comput. (2016)
11. F. Farahnakian, T. Pahikkala, P. Liljeberg, J. Plosila, N.T. Hieu, H. Tenhunen, Energy-aware vm consolidation in cloud data centers using utilization prediction model. IEEE Trans. Cloud Comput. (2016)
12. Z. Xiao, W. Song, Q. Chen, Dynamic resource allocation using virtual machines for cloud computing environment. IEEE Trans. Parallel Distrib. Syst. 24(6), 1107–1117 (2013)
13. M. Kaliappan, S. Augustine, B. Paramasivan, Enhancing energy efficiency and load balancing in mobile ad hoc network using dynamic genetic algorithms. J. Netw. Comput. Appl. 73, 35–43 (2016)
14. A. Beloglazov, J. Abawajy, R. Buyya, Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Gen. Comput. Syst. 28(5), 755–768 (2012)
15. R. Buyya, R. Ranjan, R.N. Calheiros, Modeling and simulation of scalable cloud computing environments and the cloudsim toolkit: challenges and opportunities, in International Conference on High Performance Computing Simulation (2009), pp. 1–11
16. K.S. Park, V.S. Pai, Comon: a mostly-scalable monitoring system for planetlab. ACM Sigops Oper. Syst. Rev. 40(1), 65–74 (2006)
An Attribute-Based Access Control Mechanism for Blockchain-Enabled Internet of Vehicles

Sheng Ding and Maode Ma
Abstract Rapid development of wireless technology and Internet of things (IoT) has indirectly promoted the development of Internet of vehicles (IoV). The data collected by the sensors equipped on the vehicles become richer and more detailed. However, most data are sensitive and private which is relevant to the privacy of the driver and the core information of the system. To protect the data from unauthorized access, fine-grained access control must be enforced. Due to the centralized and complicated access management, traditional access control mechanisms are unfit for IoV systems. In this paper, we propose an attribute-based access control scheme for blockchain-enabled IoV systems. A new type of transaction has been designed to represent the authorization of attributes. The proposed scheme could simplify the access management, and effectively prevent the systems from single point of failure and data tampering. We also improve the access control part to decrease its computation overhead for resource-constrained members in the IoV systems. The security analysis demonstrates that the proposed scheme can resist various attacks and the performance analysis shows our scheme can be implemented efficiently. Keywords Internet of vehicles · Blockchain · Attribute-based access control
1 Introduction Internet of vehicles (IoV) is an extended application of Internet of things (IoT) in intelligent transportation system (ITS), which is essentially a data sensing and processing platform, collecting data from the driver, other vehicles, and the external environment, and using it for safe driving, traffic control, crash response, etc. [1]. Based on the real-time sharing data between vehicles, the IoV system can respond promptly to reduce the risk of accidents. For example, once a moving vehicle has a sudden emergency brake, it will send timely messages to inform the surrounding S. Ding · M. Ma (B) School of Electrical & Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_80
905
906
S. Ding and M. Ma
vehicles. However, the development of IoV is still in an early stage, many problems have not been effectively solved [2]. As a result of the rapid increase in the number of IoV security incidents, the security of IoV has gained extensive attention from industry and academia. IoV systems involve mass data, the majority of them is extremely sensitive and relevant to the privacy of the driver and the critical data of the system, such as the information of vehicle condition, driving habit, traffic information, location, and password. If these sensitive data have been accessed by malicious users, the safety of vehicles will be facing great threats. A malicious user may gain access to a target vehicle during software upgrading by forging its identity or credential. Therefore, it is indispensable to protect vehicles from unauthorized access as it will bring about privacy disclosure [3]. However, the access control mechanisms applied in the IoV systems are too weak to meet the strong security demands for the present. It is not appropriate to adopt discretionary access control (DAC) and identity-based access control (IBAC) to enforce access control in the IoV systems, because it is impractical to formulate the access control list (ACL) for each vehicle due to the huge quantity of intelligent vehicles. Mandatory access control (MAC) is generally implemented by a central administrator. As a result, it will lead to the tricky problem of the single point of failure. Furthermore, the vehicles in IoV systems are fast moving and widely distributed, and the centralized access control mechanism may not be suitable for IoV systems. Attribute-based access control (ABAC) is flexible and fine-grained. It can abstract identities or roles into a set of attributes granted by the attribute authority. The data owner can formulate an access policy to specify with which attributes the data requestor is able to obtain the authorized access. In this way, access management could be effectively simplified due to the less quantity of the attributes. To solve the single point of failure problem, blockchain technology provides new possibilities [4]. The data in the blockchain cannot be tampered. It is a promising trend to combine blockchain technology with IoV systems [5] because blockchain technology is expected to build a trust among vehicles and improve the security of the overall system. It can help the IoV systems to build an open and credible database. Billions of intelligent vehicles could build distributed trust through blockchain [6]. In this paper, we propose an attribute-based access control mechanism for blockchain-enabled IoV systems (ABACB). The outstanding features of the proposal can be summarized as follows: 1. By the proposed ABACB scheme, the system do not need to formulate any ACL or assign any role for each one in the IoV systems. Each member will be granted a set of attributes according to its identity, characteristic, and role. To obtain the access authorization, a satisfied set of attributes must be possessed. 2. A new kind of transaction is defined to express the attribute issuance and record it by the blockchain technology. Once the data is written to the blockchain, it will not be tampered with. Anyone could access the information at any time when needed.
An Attribute-Based Access Control Mechanism …
907
The rest of the paper is organized as follows. In Sect. 2, the system model of the proposed solution is provided. The detailed description of our attribute-based access control scheme for blockchain-enabled IoV systems is presented in Sect. 3. We implement security analysis and performance evaluation in Sect. 4 and Sect. 5, respectively. Finally, we conclude in Sect. 6.
2 System Model In this section, we present the system model of the proposed access control scheme ABACB, as described in Fig. 1. The system model mainly includes three entities, which are attribute authorities, data owners, and data requestors. Attribute authorities: Each member of the IoV system needs to register to the attribute authority first. The attribute authority will grant a secret key based on identity-based cryptography and appropriate attributes to each member according to its characteristics and role. With the secret key, they could realize mutually authentication and negotiate a session key. The attribute authority can serve as a consensus node of the consortium blockchain. Each attribute authorization is expressed as an attribute transaction written in the blockchain. Each new transaction will be first put into the transaction pool of the attribute authority which generates it. After the selected transactions are packed into a block, other consensus nodes must verify the validity of the block before writing it into the blockchain. Once successfully written, no one could tamper it unless all the consensus nodes agree on a new consensus. Data owners: The intelligent vehicles are the major players of the IoV system as the data owners. They are also the users of the blockchain, who are not involved in the verification and consensus of blocks. The blockchain is read only by them.
Fig. 1 An overview of the system model
To protect the data from unauthorized access, each vehicle may formulate an access policy with attributes to stipulate who could access it. Data requestors: The data collected by the sensors equipped on the intelligent vehicles are valuable for the participants of the vehicular system, such as the roadside unit (RSU) and the vehicles around. For example, RSU needs to interact with vehicles for identification or speed measurement, while the vehicles around need to timely obtain the location and the speed information to avoid crashes [7]. To obtain the access authorization, the data requestors need to use the attributes granted by the attribute authority to demonstrate that they have a set of attributes satisfying the access policy of the data owner. By our ABACB scheme, the IoV system allows (n – 1)/3 in n attribute authorities to be Byzantine nodes. Each attribute authority possesses a pair of public key and secret key. The attribute authority could use the public key to generate its official address, and use the secret key to sign the block. The secret key of each attribute privilege is kept secret, so no one can spoof the signature of each block. The data requestors are untrusted, and they may collude to obtain the access authorization when none of them has a set of attributes which satisfies the access policy formulated by the data owner.
3 Proposed Scheme

3.1 System Initialization

The system initialization algorithm first generates a set of global parameters. All members of the system agree on the same elliptic curve defined over the finite field F_q. E is a cyclic subgroup of this elliptic curve in which the elliptic curve discrete logarithm problem (ECDLP) is hard to solve, and G is a base point of E with order r. Two hash functions H_0: {0, 1}* → Z_r* and H_1: {0, 1}* → {0, 1}^λ are also selected.
3.2 Registration After the registration with the attribute authority, each member will receive a unique number as its ID. Then the attribute authority will issue a secret key based on identitybased cryptography to each member according to the ID of it.
3.3 Address Generation

To generate an address for an attribute application, each vehicle first selects k ∈ Z_r* as a secret key (SK); the corresponding public key (PK) is kG. The vehicle then hashes PK‖ID (‖ denotes concatenation) and encodes the result with Base58Check encoding to obtain the corresponding address. Hence, the address can be expressed as Address = Base58Check[H_1(PK‖ID)]. The address of each attribute authority is constant: each of them uses its secret key to create its official address AA by the same method.
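The following is a minimal sketch of the address derivation, assuming the third-party ecdsa and base58 Python packages; the curve, hash choices and helper names are illustrative assumptions consistent with the description above, not the authors' implementation.

```python
import hashlib
from ecdsa import SigningKey, NIST256p   # pip install ecdsa
import base58                            # pip install base58

def generate_address(identity: str):
    """Derive Address = Base58Check[H1(PK || ID)] for a fresh key pair."""
    sk = SigningKey.generate(curve=NIST256p)        # secret key k
    pk_bytes = sk.get_verifying_key().to_string()   # public key kG
    digest = hashlib.sha256(pk_bytes + identity.encode()).digest()  # H1(PK || ID)
    address = base58.b58encode_check(digest).decode()
    return sk, pk_bytes, address

sk, pk, addr = generate_address("VEH-0001")
print(addr)
```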
3.4 Attribute Application

Each member can use a self-generated address and its ID to apply for an attribute i. The attribute authority first verifies whether the applicant should be granted the attribute and then generates an attribute transaction between its own official address and the address provided by the applicant:

$$AA \xrightarrow{\;i\;} Address$$

The attribute authority then hashes this transaction and signs the hash result together with a timestamp using its secret key, which can be expressed as

$$Sig_{SK}\left(H_0\left(AA \xrightarrow{\;i\;} Address \,\|\, \text{timestamp}\right)\right)$$

Finally, the attribute authority packs the attribute transaction, the timestamp and the signature together and puts the data packet into its own transaction pool.
3.5 Block Generation

The consensus nodes periodically select a block creator from among themselves, which packs the attribute transactions in its transaction pool into a block. The block creator then broadcasts the new block to the other nodes to reach a consensus. We use the PBFT [8] protocol as the consensus algorithm; the detailed process is described in Fig. 2. In the pre-prepare phase, the new block is verified by each consensus node and then broadcast to the others. When a node receives 2f identical copies of the block, it broadcasts a commitment message to the others. Once a node receives 2f +
Fig. 2 Reach a consensus by the PBFT protocol
1 commitment messages, the new block will be recognized and appended to the blockchain.
3.6 Access Control

In order to obtain the access authorization, a data requestor needs to prove that it has a set of attributes that satisfies the access policy formulated by the data owner. The detailed process between them is described in Fig. 3.

1. The data requestor first uses any identity-based authentication and key agreement (AKA) protocol to negotiate a session key K with the data owner. Their subsequent communication is encrypted with K using any symmetric key encryption algorithm. The data requestor then initiates an access request to the data owner with its identity ID.
2. The data owner returns a random number N ∈ Z_r and its own access policy P.
3. The data requestor first selects a set of attributes S that satisfies the access policy formulated by the data owner. It then uses each secret key SK_i whose corresponding address has been granted attribute i ∈ S to sign the random number N specified by the data owner, and returns the attribute set S together with each signature and public key pair (Sig_SK_i(N), PK_i), i ∈ S, to prove its ownership of each attribute in S.
4. After receiving the messages returned by the data requestor, the data owner first checks whether the submitted attribute set satisfies its access policy. If it does, the data owner hashes each PK_i‖ID and encodes the result to obtain each corresponding address. The data owner can then search the blockchain for the latest record of each address. If these addresses have been granted the attributes
Fig. 3 Implementing access control
in the satisfied set, the data owner uses each public key PK_i to verify the corresponding signature Sig_SK_i(N) by checking whether Ver_PK_i(Sig_SK_i(N)) = N. If the result is the random number N, it is demonstrated that the data requestor indeed possesses the address as well as the attribute. If the data requestor has enough attributes, the data owner authorizes its access, and the subsequent data sharing can be encrypted with the session key K.
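A minimal sketch of the challenge-response check in step 4, using the same assumed ecdsa/base58 helpers as before; the blockchain lookup is represented by a plain dictionary and all names are illustrative.

```python
import hashlib
import base58
from ecdsa import VerifyingKey, NIST256p, BadSignatureError

def verify_attribute_proof(pk_bytes, requestor_id, signature, nonce,
                           attribute, ledger):
    """ledger: latest on-chain records, address -> set of granted attributes."""
    # Recompute the address the requestor claims: Base58Check[H1(PK || ID)].
    digest = hashlib.sha256(pk_bytes + requestor_id.encode()).digest()
    address = base58.b58encode_check(digest).decode()
    # The blockchain must show that this address was granted the attribute.
    if attribute not in ledger.get(address, set()):
        return False
    # The signature over the challenge nonce must verify under PK.
    vk = VerifyingKey.from_string(pk_bytes, curve=NIST256p)
    try:
        return vk.verify(signature, nonce)
    except BadSignatureError:
        return False
```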
4 Security Evaluation 4.1 Security Analysis In order to implement correct attribute-based access control, our ABACB scheme must be capable of resisting collusion attack. The data requestors are untrusted, and they may collude to obtain the access authorization when none of them has a set of
attributes which satisfies the access policy formulated by the data owner. It obviously disables the access control. Suppose that the data owner, Bob, has specified an access policy {Attx AND (Atty OR Attz )}. If a data requestor has both attributes Attx and Atty , or Attx and Attz can obtain the access authorization. Suppose that there are two data requestors, Alice and Eve. Unfortunately, Alice has only the attribute Attx and Eve has only the attribute Atty . Obviously, neither of them can get the access authorization. However, if Alice and Eve collude with each other, they will form a satisfied set of attributes. By our ABACB scheme, the data owner Bob will hash PKi submitted by Alice together with its identity IDA and encode the result to get the corresponding address, which can be expressed as: Address = Base58Check[H1 (PKi IDA )]. Eve may intend to transfer its attribute Atty to Alice, including an address which has been issued the attribute Atty , the corresponding public key and the signature of the random number to enforce collusion attack. Bob can easily find that the attribute Atty does not belong to Alice due to the difference between their identity IDs. If Alice submits the address, the corresponding public key, and the signature number of the random to Bob, Bob can detect that the address Base58Check H1 PKAtt y IDA is not the same address as Base58Check H1 PKAtt y IDE submitted by Alice which has been issued the attribute Atty and then terminate the communication with Alice. Therefore, the proposed access control scheme has its ability against collusion attacks.
4.2 Formal Verification by AVISPA

AVISPA is a widely used and accepted tool for automated security analysis of Internet protocols and applications. The simulation uses the OFMC back-end with a limited number of sessions, and the Dolev-Yao model is used as the intruder model. Under this model the intruder can completely control the network and acts as the intermediary of all messages exchanged by the agents; messages sent by agents may therefore be intercepted, analyzed or modified whenever the intruder knows the relevant secret keys, and the intruder can send false messages to an arbitrary agent while pretending to be a normal agent of the network. The simulation results demonstrate that ABACB can withstand these kinds of intruders and that all the expected security goals are achieved, as can be seen in Fig. 4.
5 Performance Analysis A proof of concept numerical study on the proposed access control scheme has been carried out to check its validity and evaluate its computation overheads.
Fig. 4 Results of formal verification by AVISPA
We construct blockchain to simulate the IoV system over the Hyperledger Fabric platform. It features a modular architecture that provides a high degree of confidentiality, flexibility, and scalability. The proposed ABACB scheme is run in a desktop with Intel Pentium G620 CPU @ 2.60 GHz and 1 GB RAM. The operation system is Ubuntu 16.04LTS. Hyperledger Fabric is a promising platform for distributed ledger solutions. The performance of the platform has been extensively analyzed by both industry and academia. Therefore, in order to avoid duplication, in this paper, the computational cost analysis is mainly focused on the access control part. The impact of the computation overhead in the process of the access control over the overall performance of the system has been evaluated with the analysis results shown in Fig. 5. From Fig. 3, it is clear that, using each secret key, whose corresponding address has been issued with the attribute in the satisfied set, to sign the random number selected Fig. 5 Computation overheads
by the data owner accounts for the vast majority of the computation overhead of the data requestor. The computation overhead increases linearly with the number of attributes in the satisfied set. In addition to verify the signatures returned by the data requestor, the data owner needs to hash each PK submitted by the data requestor together with its ID to get each corresponding address. The computation overhead for the data owner is also in direct proportion to the number of attributes in the satisfied set. To our knowledge, a NIST256P signature cost 2.87 ms to compute and 6.34 ms to verify in a high-quality C++ implementation of the elliptic curve digital signature algorithm (ECDSA). From Fig. 5 we can see that, the computation overhead of the data owner exceeds that of the data requestor with the increase of the number of attributes. However, although the number of attributes reach 50, the computation overhead is reasonable for each IoV member. Furthermore, an access policy with 50 attributes is rarely used. Hence, our ABACB can be efficiently implemented in the IoV systems.
6 Conclusion We proposed an attribute-based access control scheme for blockchain-enabled IoV systems called ABACB. A new kind of transaction which represents the authorization of attributes has been defined. The data requestor only needs to prove that it has enough attributes to obtain the access authorization, which makes the access control more fine-grained and flexible. The security analysis demonstrates that our ABACB can resist various malicious attacks. The performance analysis demonstrates its efficiency to show that it is acceptable to be used in the IoV systems. Acknowledgements This work is funded by the NRF Systemic Risk and Resilience Planning Grant for the project of NRF2018-SR2001- 005 by National Research Foundation, Singapore.
References

1. J. Kang, R. Yu, X. Huang et al., Privacy-preserved pseudonym scheme for fog computing supported internet of vehicles. IEEE Trans. Intell. Transp. Syst. 19(8), 2627–2637 (2017)
2. A. Dorri, M. Steger, S.S. Kanhere et al., Blockchain: A distributed solution to automotive security and privacy. IEEE Commun. Mag. 55(12), 119–125 (2017)
3. Y. Sun, L. Wu, S. Wu, et al., Security and privacy in the internet of vehicles, in 2015 International Conference on Identification, Information, and Knowledge in the Internet of Things (IIKI) (IEEE, 2015), pp. 116–121
4. M. Iansiti, K.R. Lakhani, The truth about blockchain. Harvard Bus. Rev. 95(1), 118–127 (2017)
5. Y.N. Liu, S.Z. Lv, M. Xie et al., Dynamic anonymous identity authentication (DAIA) scheme for VANET. Int. J. Commun. Syst. 32(5), e3892 (2019)
6. M. Singh, S. Kim, Branch based blockchain technology in intelligent vehicle. Comput. Netw. 145, 219–231 (2018)
7. B. Fan, S. Leng, K. Yang, A dynamic bandwidth allocation algorithm in mobile networks with big data of users and networks. IEEE Netw. 30(1), 6–10 (2016)
8. M. Castro, B. Liskov, Practical Byzantine fault tolerance, in OSDI 1999, vol. 99, issue 1999, pp. 173–186
Intelligent Image Processing
An Investigation on the Effectiveness of OpenCV and OpenFace Libraries for Facial Recognition Application

Pui Kwan Fong and Ven Yu Sien
Abstract The complexity and applicability of facial recognition in the growing domains of security and authentication, payment, advertising and health care have accelerated the research and development of this niche in the artificial intelligence (AI) field. A facial recognition system utilizes technologies to recognize a human face through mapping of distinct facial features from raw data such as photography and video to information of known faces stored in the database. Recognizing faces in real-world images is a challenging task. Generally, a facial recognition system involves multiple processes such as facial detection, feature extraction, face matching and confidence scoring in which various techniques and algorithms are available in the form of libraries and frameworks. In this paper, an investigation is conducted using two pipelines to identify the effectiveness of two prevalent opensource libraries, OpenCV and OpenFace, on real-time face recognition system. In the OpenCV pipeline, Viola –Jones algorithm is chosen for face detection, while feature extraction is conducted using local binary pattern histogram (LBPH). Face detection is performed using dlib, while feature extraction is conducted using deep learning algorithm in the OpenFace pipeline. Performance measures of both pipelines are evaluated using Labelled Faces in the Wild (LFW) as a benchmark data set. In addition, a locally populated data set consisting of images of personnel who are given access to the office door system is used to experiment on the applicability of these pipelines on real streaming data. OpenFace achieved an AUROC of 0.98, while OpenCV scored 0.94. Both experiments inferred that facial recognition system with OpenFace pipeline outperforms OpenCV in all aspects of performance measures. Further development demonstrates the successful implementation of OpenFace pipeline to a functional prototype in replacing the manual door access control. Keywords Facial recognition · OpenFace · OpenCV · Biometric authentication P. K. Fong (B) · V. Y. Sien HELP University, Kuala Lumpur, Malaysia e-mail: [email protected] V. Y. Sien e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_81
1 Introduction Face recognition is a computer technique used for capturing the features of a human face for user authentication. The face is a complex multidimensional structure which requires effective computing techniques for recognition. Facial recognition has been known as a challenging field due to various aspects such as quality of the image such as lighting and face positions; isolating the face image from background image; identifying the face by matching it to the database in which the target face may have different expressions and physical appearance (i.e. with and without spectacles) and unique feature extraction from the face to distinguish one person from another. A generic facial recognition system pipeline consists of four stages: identify the face in an image; preprocessing such as normalization and alignment on the face detected; extract facial features and compare the target face with known faces in the database to make a prediction on the identity of the face. In order to develop a face recognition application, several algorithms relevant to image processing and machine learning have to be chained. Face recognition has been widely implemented across various fields through multiple methodologies and platforms such as programming software development, cloud-based application programming interfaces (API) as well as software libraries [1]. As the applicability of face recognition increases in human–computer interaction, entertainment and general identity verification, the emergence of APIs and software libraries has taken over traditional programming development. Both API and software libraries have their own strengths in which the former provides easy implementation while the latter enables customization for facial recognition application [2]. This paper will focus on software libraries for facial recognition. Two libraries, namely Open Source Computer Vision Library (OpenCV) and OpenFace are chosen as OpenCV, are the most powerful computer vision library with more than 500 functions [3], and OpenFace utilizes one of the emerging algorithms, deep neural networks [4]. As both libraries can be implemented in the facial recognition application, the accuracy and performance of these libraries may be different. A study is conducted to investigate the effectiveness of these libraries in a facial recognition for a door access system.
2 Background and Related Works 2.1 Conventional and Biometric Door Access Systems A conventional door access system allows any individual with an access card or PIN number to unlock the door. The conventional PIN number method is still widely used due to its lower cost and the simplicity of setup and maintenance.
A common problem arises when the card is lost or the PIN number is forgotten by the user. In addition, these methods of unlocking the door system may not be secure, as a PIN number is not unique: any individual who knows the PIN number can gain access to the facility without being identified by the system. To increase the level of security, PIN numbers with complex combinations must be changed frequently. However, it is normally difficult to remember all the PIN numbers when individuals hold more than one account with different PINs. In order to overcome these difficulties, a variety of biometric authentication methods have been studied and developed to increase the security and usability of the biometric system [5]. A survey conducted by one of the biometric service providers, Veridium, found that speed and security are the two main reasons for expanding the use of biometric authentication in the workplace [6]. A growing preference for adopting biometrics in other domains such as banking, check-in and various mobile applications shows the value of face recognition, iris recognition and fingerprint authentication for gaining access and for verification. As these features are unique to each individual, the difficulty of password management can be eliminated while increasing the security level.
2.2 Face Recognition System A facial recognition system determines the identity of an input face image by matching it to a database of known individuals [7]. Generally, the facial recognition procedure can be broken down into three major steps, namely face detection, feature extraction and face recognition [8]. A face detection algorithm is used to identify the presence of any face(s) in the image and where they are located. In some libraries, face alignment is performed to improve accuracy, as facial recognition systems generally experience difficulties in recognizing the same face under different orientations. Feature extraction is performed on the aligned face to obtain important features that are useful for recognition. The detected face patch is transformed into a feature vector or fiducial points depending on the algorithm implemented. Finally, recognition is performed by comparing the features of the detected face to the face features stored in a database using classification algorithms. Figure 1 illustrates the overall workflow of a facial recognition system.
Fig. 1 General steps in face recognition system [11]
2.3 OpenCV OpenCV is an open-source computer vision software library consisting of more than 2500 optimized algorithms, primarily for image detection, recognition, identification, tracking and other image processing procedures [3]. Due to its high applicability across various domains and platforms, well-established companies such as Intel, IBM, Yahoo and Google have implemented this library for detecting intrusions, stitching street-view images, navigating robots and inspecting product labels. The OpenCV library provides different types of feature extraction and recognition algorithms, of which the most commonly used are eigenfaces, Fisherfaces and the local binary pattern histogram (LBPH). Among these algorithms, LBPH has been shown to produce highly discriminative features and more accurate recognition results while requiring less computational complexity than other face recognition algorithms [9]. LBPH represents image pixels using local binary patterns combined with a histogram. Four parameters, namely radius, neighbours, grid X and grid Y, are adjusted to obtain the binary number based on thresholding against each neighbouring pixel. Figure 2 illustrates a simple computation to obtain a feature representation for one pixel, which is later converted to a histogram representing the whole image. Images with a high similarity of histogram representation are considered a match.
Fig. 2 LBPH procedure [12]
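The four LBPH parameters just described (radius, neighbours, grid X and grid Y) map directly onto OpenCV's LBPH recognizer. The sketch below is only an illustration of that API under stated assumptions: it requires the opencv-contrib-python package (which provides the cv2.face module), and the file names and labels are hypothetical, not the authors' actual data.

```python
import cv2
import numpy as np

# Hypothetical training data: aligned grayscale face crops and integer identity labels.
faces = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in ["alice_01.png", "bob_01.png"]]
labels = np.array([0, 1])

# The four LBPH parameters mentioned in the text: radius, neighbours, grid X and grid Y.
recognizer = cv2.face.LBPHFaceRecognizer_create(radius=1, neighbors=8, grid_x=8, grid_y=8)
recognizer.train(faces, labels)

# Predict the identity of a new face; a lower confidence value means a closer histogram match.
probe = cv2.imread("unknown.png", cv2.IMREAD_GRAYSCALE)
label, confidence = recognizer.predict(probe)
print(label, confidence)
```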
2.4 OpenFace OpenFace is a facial recognition model based on deep learning, developed by Amos et al. [4] in the Python programming language. They believe that this open-source library provides promising facial recognition results similar to Facebook's DeepFace and Google's FaceNet.
In the high-level architecture overview shown in Fig. 3, the model can be separated into two main components: feature extraction and classification. As images are unstructured data, feature extraction is performed to obtain a set of feature vectors for classification. In OpenFace, feature extraction is trained offline using Google's FaceNet model, a neural network with a triplet loss function. The trained neural network is then used for feature extraction, and its output feeds the second component, classification of the facial images. In recognizing new images, OpenFace integrates a face detection model from dlib to separate the target face from the background image. Subsequently, the detected face goes through an affine transformation to ensure that each face is aligned in an orientation suitable for feature extraction. The normalized face image is then fed into the trained neural network to extract a 128-dimensional facial embedding that is subsequently used for classification, by default with a support vector machine (SVM). Since feature extraction is a crucial step which determines the accuracy of the system, local features such as the distances between eyes, nose and mouth are insufficient. Extracting complex features from a huge face data set incurs high computational complexity and cost when conventional feature extraction techniques are used. To increase the efficiency of facial recognition, the OpenFace library provides a network trained offline on 500,000 images, which reduces the time required to perform feature extraction.
Fig. 3 High-level architecture of OpenFace [4]
In addition, this one-time feature extraction phase producing the 128 facial embeddings converts the high-dimensional image into low-dimensional data, which reduces the complexity of the recognition system. The training data can be updated instantaneously, whereby the model can be retrained in minimal time. Apart from this, OpenFace integrates dlib and an affine transformation, which enables this library to handle inconsistent and bad lighting as well as different facial positions.
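The overall pipeline described above (dlib face detection, alignment, a 128-dimensional embedding, then an SVM) can be sketched with off-the-shelf components. The snippet below is only an approximation under assumptions: it uses dlib's bundled ResNet embedding model in place of OpenFace's FaceNet network, and the model file names, image paths and labels are placeholders, not the authors' setup.

```python
import dlib
import numpy as np
from sklearn.svm import SVC

# Pretrained model files distributed with dlib (the paths are assumptions).
detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
embedder = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def embed(path):
    """Detect the first face in an image and return its 128-dimensional embedding."""
    img = dlib.load_rgb_image(path)
    dets = detector(img, 1)
    if not dets:
        return None
    shape = shape_predictor(img, dets[0])
    return np.array(embedder.compute_face_descriptor(img, shape))

# Hypothetical labelled gallery of door-access personnel.
X = np.stack([embed(p) for p in ["alice_01.jpg", "bob_01.jpg"]])
y = ["alice", "bob"]

clf = SVC(kernel="linear", probability=True).fit(X, y)
print(clf.predict([embed("probe.jpg")]))
```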
3 Design and Implementation In acquiring face images, a video camera is used to capture images of individuals while the application is in standby mode. As the individual approaches the mounted camera, a proximity sensor is triggered, and face detection is activated. Results from the face detection algorithm are sent to OpenCV or OpenFace for recognition. The implemented pipelines for both OpenCV and OpenFace are shown in Figs. 4 and 5. Instead of using the default designs, both pipelines are designed to use the same face detection, alignment and recognition algorithms in order to investigate the efficiency of the OpenCV and OpenFace feature extraction algorithms. The face data set of each user is created by taking a minimum of 10 images from different random angles under the same lighting condition. For this research, a total of 33 faces are identified and each of the images is labelled with the user's name. A total of 535 images are fed into the respective system for image pre-processing and feature extraction to produce a feature vector representing each image. A Web application is developed to package this facial recognition system using OpenCV and OpenFace. ReactJS with JavaScript ES6 is used to generate the front-end graphical user interface (GUI), node WebSockets to process the video feed from the server and the Redux library to manage the GUI state. Meanwhile, ExpressJS and MongoDB are used for the back-end video processing and communication with the server, which hosts both the OpenCV and OpenFace recognition systems developed in the Python programming language.
Fig. 4 OpenCV library pipeline for real-time face recognition system
Fig. 5 OpenFace library pipeline for real-time face recognition system
4 Results and Discussions In this section, comparison results are presented using a real data set processed by the OpenCV and OpenFace feature extraction algorithms. To ensure unbiased comparisons, both sets of extracted features are fed into the same classifier. Experiments are performed to investigate the performance of the OpenCV and OpenFace feature extraction methods using precision, recall and F-measure rates. Stratified k-fold in Python scikit-learn is used to handle the uneven numbers of images per person. A fivefold cross-validation is performed on separate data sets to train and test the data. The experiments were carried out by adapting protocols and techniques from [10]. Tables 1 and 2 summarize the precision, recall and F-measure rates for OpenCV and OpenFace. On average, the classifier trained using features from the OpenFace library achieved higher precision, recall and F-measure rates than the one using features from OpenCV, by 0.038, 0.06 and 0.06, respectively. Considering the small data set used in both experiment settings, OpenFace, which uses deep learning, shows a better result in all three performance measures. Inferring from this example, a deep learning algorithm is useful for feature extraction, which plays an important role in increasing the effectiveness of a facial recognition application. To further verify the effectiveness of OpenCV and OpenFace, the area under the curve of the receiver operating characteristics (AUROC) shown in Fig. 6 is computed from the classifier results. Figure 6 shows comparisons of the ROC curves and AUROC values for both facial recognition systems. Features from OpenFace outperform OpenCV with a higher AUROC by 0.04 (0.98 versus 0.94). These results show that the deep learning algorithm in OpenFace is able to extract more significant patterns from the images than the LBPH algorithm used in OpenCV.
Table 1 Performance measures using OpenCV pipeline
K-fold   Precision   Recall   F-measure
1        0.928       0.909    0.904
2        0.898       0.864    0.859
3        0.845       0.848    0.833
4        0.984       0.955    0.960
5        0.924       0.886    0.884

Table 2 Performance measures using OpenFace pipeline

K-fold   Precision   Recall   F-measure
1        0.947       0.947    0.942
2        0.919       0.917    0.905
3        0.976       0.967    0.967
4        0.942       0.955    0.946
5        0.986       0.977    0.978
Fig. 6 ROC of facial recognition system using OpenFace and OpenCV
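The evaluation protocol described in this section (stratified fivefold cross-validation with precision, recall, F-measure and ROC analysis, with both feature sets fed to the same classifier) can be reproduced with scikit-learn as sketched below. X and y stand for the extracted feature vectors and identity labels, and the linear SVM is an assumption; the sketch also assumes every class appears in each test fold so that the multi-class AUROC is defined.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

def evaluate(X, y, n_splits=5):
    """Stratified k-fold evaluation of one feature pipeline with a common classifier."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for fold, (tr, te) in enumerate(skf.split(X, y), start=1):
        clf = SVC(kernel="linear", probability=True).fit(X[tr], y[tr])
        pred = clf.predict(X[te])
        proba = clf.predict_proba(X[te])
        print(fold,
              precision_score(y[te], pred, average="macro"),
              recall_score(y[te], pred, average="macro"),
              f1_score(y[te], pred, average="macro"),
              roc_auc_score(y[te], proba, multi_class="ovr", average="macro"))
```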
5 Conclusion This paper presented an investigation on the effectiveness of using the open-source frameworks OpenCV and OpenFace for a facial recognition system. Accuracy results using a real data set obtained from the School of Information and Communication Technology, HELP University show a positive output for both open-source frameworks. As OpenFace outperforms OpenCV in all the performance measures, this framework is implemented in a facial recognition system located outside the staff room in the School of Information and Communication Technology. This system however needs to be improved, as the small data set may not be representative of the system's performance in general. Testing on the prototype built upon the OpenFace pipeline as shown in Fig. 5 produced a minimum accuracy of 90% for the images of the 33 people registered in the database. Thus, future work should focus on fine-tuning the accuracy in order to increase the applicability of this facial recognition system for security purposes that require identification and authentication. Compliance with Ethical Standards Funding The study was funded by HELP Internal Research Grant Scheme (grant number 046). Conflict of Interest The authors declare that they have no conflict of interest. Ethical Approval This chapter contains the use of human face images to train and test the face recognition model, which has been conducted as per the ethical approval. Informed Consent Informed consent was obtained from all individual participants included in the study.
References 1. D.N. Parmar, B.B. Mehta, Face recognition methods & applications. arXiv:1403.0485 (2014) 2. P. Masek, M. Thulin, Evaluation of face recognition apis and libraries (University of Gothenburg, Sweden, 2015) 3. G. Bradski, The opencv library. Dr. Dobb’s J. Softw. Tools (2000) 4. B. Amos, B. Ludwiczuk, M. Satyanarayanan, OpenFace: a general-purpose face recognition library with mobile applications. CMU School of Computer Science, 6(2) (2016) 5. S. Mayhew, Survey shows growing preference for use of biometric authentication in the workplace. Retrieved from https://www.biometricupdate.com/201902/survey-shows-growing-preference-for-use-of-biometric-authentication-in-the-workplace (2019) 6. Z. Rui, Z. Yan, A survey on biometric authentication: toward secure and privacy-preserving identification. IEEE Access 7, 5994–6009 (2018) 7. R. Jafri, H.R. Arabnia, A survey of face recognition techniques. JIPS 5(2), 41–68 (2009) 8. W.L. Chao, Face Recognition (GICE, National Taiwan University, Taipei, 2007) 9. T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: application to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 12, 2037–2041 (2006) 10. J. Bergstra, N. Pinto, D.D. Cox, SkData: data sets and algorithm evaluation protocols in Python. Comput. Sci. Discov. 8(1), 014007 (2015) 11. S. Narang, K. Jain, M. Saxena, A. Arora, Comparison of face recognition algorithms using OpenCV for attendance system. Int. J. Sci. Res. Publ. 8(2), 268–273 (2018) 12. K. Salton, Face recognition: understanding LBPH algorithm. Retrieved from https://towardsdatascience.com/face-recognition-how-lbph-works-90ec258c3d6b (2017)
Virtual Reality as Support of Cognitive Behavioral Therapy of Adults with Post-Traumatic Stress Disorder Ivan Kovar
Abstract The subject of this article is the use of virtual reality in psychological cognitive behavioral therapy of adults with post-traumatic stress disorder. A high percentage of adults suffer from this disorder during their lives. At the beginning of the paper, there is a theoretical description of virtual reality and an explanation of post-traumatic stress disorder. The main part of the paper presents the research itself. The 6-month research works with five adult respondents who suffer from anxiety disorders caused by traumatic events. The question of this paper is to find out whether it is possible to speed up regular psychological cognitive behavioral therapy and reduce the use of pharmaceuticals with the help of virtual reality. The results of psychological tests after the treatment with the use of virtual reality are auspicious because there are verifiable decreases in depression and anxiety levels. Conclusion, evaluation, and discussion of the research are included. Keywords Post-traumatic stress disorder · Cognitive behavioral therapy · Anxiety disorder · Virtual reality
1 Introduction This research introduces the use of virtual reality (VR) as a tool to overcome post-traumatic stress disorder (PTSD) in adults suffering from anxiety disorders. Outcomes from various studies show that current ways of dealing with PTSD are not ideal, and usually the use of mood stabilizers and anxiety medications is the primary option for dealing with this problem [1, 2]. Currently, psychotherapy and pharmacotherapy (especially antidepressants) are used to treat PTSD. Antidepressants belong to the group of psychotropic drugs which are often used in the therapy of the most common depressive disorders and anxiety diseases. I. Kovar (B) Faculty of Applied Informatics, Tomas Bata University, Nad Stranemi 4511, Zlin 760 05, Czech Republic e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_82
In clinical practice, mood stabilizers, anxiolytics, or antipsychotics are also used. However, the clinical response to psychopharmaceuticals is not optimal. An important role in the treatment of post-traumatic stress syndrome is also played by placebo, which has similar success to commonly used drugs. This may mean that the basic solution should be to eliminate the trauma from the mind and to eliminate stress, so VR was used to find out whether it can help in these areas [1]. The lifetime prevalence of PTSD is 5.0–6.0% for men and 10.4–12.3% for women in the European population [2]. The question of this paper is to find out whether VR can be used as a tool to help with cognitive behavioral therapy of people suffering from PTSD. The research worked with five adult respondents during a 6-month research investigation. The VR environment provides a safe and physically painless form of therapy. A Samsung Gear VR headset, version SM-R325, and a Samsung Galaxy S7 Edge smartphone were used in this research for creating the VR environment.
2 Description of PTSD PTSD arises as a reaction to a traumatic event, where the victim repeatedly experiences the event in thoughts, dreams, and fantasies. PTSD is defined as a mental disorder that arises from sudden events threatening life or personal integrity, such as war, flood, fire, severe injury, car accident, abuse, rape, kidnapping, life-threatening disease, or changes in interpersonal relationships and social roles (job loss, partner infidelity, divorce) [3]. People of any age can suffer from PTSD. This includes people who have been through physical assault, sexual abuse, a disaster, an accident, or war, among many others. Based on data from the National Center for PTSD, about 8 out of every 100 people will experience PTSD during their lives. Not all people with PTSD had to go through a dangerous situation themselves; some of them developed PTSD based on a friend's or relative's experience [3]. Symptoms of PTSD include:
• nightmares caused by the trauma, connected with difficulty sleeping;
• unwanted repeated memories;
• inappropriate reactions, e.g., angry outbursts;
• feeling intense;
• avoidance and numbing;
• frequent mood changes and negative or self-destructive thoughts;
• difficulties with concentration [4].
Any treatment depends on the seriousness of the symptoms, their timing and how soon they occur after the traumatic event. There are three types of treatment used for PTSD [5].
Cognitive behavioral therapy focused on the trauma involves gradually “exposing” the patient to emotions and situations that remind him/her of the trauma. Then, the psychotherapist tries to replace distorted and irrational thoughts about the experience with a more balanced picture [6]. Family therapy is suitable especially for close relatives of the patient, to understand what the patient is going through, how they can help him/her, and how to work through relationship problems together as a family [6]. Medication is usually prescribed to patients with PTSD to help them with secondary symptoms of depression or anxiety. This is not the best option for addressing the causes of PTSD [6].
3 Description of VR VR can be described as an electronic system that creates an artificially generated 3D computer environment [7]. For the research, the Samsung Gear VR unit, version SM-R325, which can be seen in Fig. 1, and the smartphone Samsung Galaxy S7 Edge were used to create the 3D environment. The Samsung Gear VR was created by Samsung Electronics in collaboration with the company Oculus. In the simplest description, it is a head-mounted housing unit for a compatible Samsung Galaxy smartphone that enables the VR effect. This VR set can be described as a system where the smartphone is located in the VR unit and works as the headset's display and processor. The headset offers three degrees of freedom of tracking with the use of a gyroscope, a magnetometer, and an accelerometer, which means it is possible to track the rotation of the head and the position of the remote Bluetooth controller. The remote controller is used to interact with the VR environment. This VR set was used because it has a resolution good enough to engage the human imagination; it is a portable device which is easy to transport from one location to another; and it is possible to use this VR set also in a seated or lying position. The VR unit weighs 345 g, and the whole VR set weighs 502 g. The VR unit has a field of view of 101° [8, 9]. Fig. 1 VR unit Samsung Gear VR with a remote controller
4 Description of the Research The data of five people (25–47 years old) who suffer from PTSD due to previous traumatic events were collected. The goal of this paper was to determine whether it is possible, with appropriate and targeted use of VR, to speed up regular psychological cognitive behavioral therapy and to help with the mental state of the respondents.
4.1 Background and the Process of the Research At the start of the analysis, we talked independently to all the respondents and received more detailed information about their life situation and their mental problems. The collected data can be seen in Table 1. From February 2019 to July 2019 (6 months), all investigated respondents were exposed to VR affected psychological cognitive behavioral therapy. The effect of the VR affected psychological cognitive behavioral therapy on the psychic state of the respondents was checked and the results were measured. Respondent 1 can be seen during her VR therapy in Fig. 2. Unique 360° VR video material was created personally and individually for each respondent exactly according to his/her needs. A special 360° camera, GoPro Fusion, was used for recording these 360° videos retracing the respondent's traumatic event. There were 25 meetings in total with each respondent. Every meeting lasted three hours, and the meetings were held repeatedly every week during the 6-month period. The structure of every meeting during the research was the same. All the participants went through four well-known psychological anxiety tests and questionnaires at the beginning of each meeting: the metacognition questionnaire-30 (MCQ-30), the social avoidance and distress scale (SADS), the trail making test (TMT), and the PTSD checklist for DSM-5 (PCL-5). The VR affected psychological cognitive behavioral therapy then followed. During this VR therapy, unique personalized multimedia content was displayed to the respondent with the use of the VR set. This VR affected psychological cognitive behavioral session was supplemented with 5.1 surround sound to improve the overall VR experience. When the two-hour VR affected psychological cognitive behavioral therapy finished, there was a relaxation phase of about twenty minutes to relax, calm down, and get rid of the tension. A psychologist and a general practitioner participated in the respondent's VR therapy once per month; they were consulted on the respondent's measured psychological data and the overall health condition of the examined respondent. In Fig. 3, respondent 2 can be seen during her VR therapy while she was going through a difficult emotional and traumatic situation from her past.
Fig. 2 Respondent 1 during her VR affected therapy
Table 1 Overview of general data of the respondents involved in the research

                          Resp. 1        Resp. 2           Resp. 3        Resp. 4      Resp. 5
Gender                    Female         Female            Male           Male         Female
Age                       25             47                40             27           32
Medication                No             Yes               No             No           Yes
Trauma                    Direct         Indirect          Direct         Direct       Indirect
Traumatic event           Car accident   Death of mother   Car accident   House fire   Death of a child
First contact with PTSD   2012           2008              2012           2014         2016

Ethical approval: This chapter contains the study of human participants. Informed consent was obtained from all participants included in the study. All procedures performed in studies involving human participants were in accordance with the ethical standards of Tomas Bata University and the Czech research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
Fig. 3 Respondent 2 during her VR affected therapy
4.2 Results and Discussion The mental status of the respondents was inspected with the help of four psychological tests.
Metacognition Questionnaire-30 (MCQ-30) This questionnaire rates individual differences in five factors: positive beliefs about worry, negative beliefs about uncontrollability and danger, cognitive self-consciousness, the need to control thoughts, and cognitive confidence. These factors are important in the metacognitive model of mental disorders. Poor metacognition may boost pathological concerns, promote manifestations of anxiety, and encourage obsessive and compulsive symptoms. The score for each section can range from 6 to 24 points, while the total score can range from 30 to 120 points, with a higher score illustrating a higher degree of incorrect metacognitions [10]. The measured results of MCQ-30 are visualized in Fig. 4.
Fig. 4 Visualization of the data from MCQ-30 measurement
Social Avoidance and Distress Scale (SADS) Each question of this psychological questionnaire is associated with a specific aspect of PTSD anxiety. Respondents have to answer personally whether each question is true or false for them. If they are not sure about the answer, they should choose the answer which better describes their feelings at the current moment; it is crucial to choose what they feel first and not to think about the answer for a long time. The final score is based on the true or false answers. This questionnaire has a total score from 0 to 28 points, where a higher score indicates a higher level of PTSD anxiety [11]. The results of the psychological measurement of SADS can be seen in graphical form in Fig. 5.
Fig. 5 Results of SADS measurement in a graphic form
Trail Making Test (TMT) This is one of the most popular neuropsychological tests. It measures speed of reaction, task switching, visual searching, and mental flexibility, and it is very important for the respondents to train these attributes to get used to VR. The test is also sensitive in detecting cognitive impairment associated with dementia. The test consists of two parts of the psychological evaluation. On a sheet of paper, numbers and letters are randomly located. The aim in trail part A is to draw a line which connects, step by step, the first 25 numbers. The task in trail part B is to alternate letters and numbers, for example 1, A, 2, B, 3, C… The time which is required to finish both parts is the outcome of this psychological test [12]. A time of 29 s is the average time to finish TMT trail part A, and a time greater than 78 s is considered inadequate for this part of the test. For TMT trail part B, a time greater than 273 s is considered inadequate, with an average time of 75 s [13]. The collected results of the measurements can be seen in Fig. 6 (TMT trail part A) and in Fig. 7 (TMT trail part B).
Fig. 6 Visualization of the collected data from TMT (trail part A)
Fig. 7 Visualization of the collected data from TMT (trail part B)
PTSD Checklist for DSM-5 (PCL-5) This test consists of 20 questions, and the answers to these questions show the presence of PTSD symptoms. The questions of the PCL-5 correspond with the new DSM-5 psychiatric manual which was released in 2013. The PCL-5 can be used in a few cases. One of them is to help with creating a provisional diagnosis of PTSD. Another instance where it can be used is for measuring and following symptoms during a longer period of time. This test has been confirmed as a tool for the inspection of symptom changes during treatment. The result of this test is a final score between 0 and 80 points. If the respondent achieves a score of 33 points or higher, it means that he or she needs to consider this disorder very seriously, and it is suitable to be involved in cognitive behavioral therapy or prolonged exposure. If the respondent achieves a score which is less than 33 points, he or she is classified as a patient suffering from a moderate level of PTSD [14]. This questionnaire is usually used several times to identify the change in PTSD symptoms over a longer time. A decrease of 5 points can be classified as a change that is not caused by coincidence, and a reduction of 10–20 points can be classified as a clinically relevant change [15]. From the collected PCL-5 measurements, it can be seen that the difference Δmax between the first meeting and the last meeting is more than 10 points for every respondent, which indicates a clinically significant shift (respondent 1 Δmax = 12 points, respondent 2 Δmax = 16 points, respondent 3 Δmax = 14 points, respondent 4 Δmax = 12 points, respondent 5 Δmax = 11 points). Figure 8 represents the measured data from PCL-5 in graphical form and Table 2 in numerical form. It is possible to see a significant change in the shape of the curves. With longer-lasting VR therapy, even better results can be expected.
Fig. 8 Visualization of data from PCL-5 measurement
Table 2 Numerical measured data from PCL-5 test

Session number   Resp. 1   Resp. 2   Resp. 3   Resp. 4   Resp. 5
1                46        49        41        45        50
2                45        48        38        45        50
3                45        49        38        46        51
4                46        48        39        45        49
5                45        47        38        44        50
6                44        47        37        44        48
7                42        48        35        45        48
8                43        47        36        44        47
9                42        46        35        43        45
10               41        45        34        42        44
11               41        44        35        41        44
12               40        44        34        41        45
13               39        45        33        42        43
14               39        43        33        40        43
15               40        43        34        40        41
16               38        41        32        38        42
17               37        40        30        37        44
18               35        39        28        36        41
19               37        39        30        36        40
20               36        40        29        37        38
21               35        38        28        35        40
22               35        37        28        34        39
23               36        35        29        32        39
24               34        35        27        32        40
25               34        33        27        33        39
Δmax             12        16        14        12        11
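As a check on the arithmetic behind Table 2, the minimal sketch below recomputes each respondent's Δmax as the difference between the first and last PCL-5 session and applies the 10-point threshold for a clinically relevant change quoted above from [15].

```python
# First- and last-session PCL-5 scores transcribed from Table 2.
pcl5 = {
    "Resp. 1": (46, 34),
    "Resp. 2": (49, 33),
    "Resp. 3": (41, 27),
    "Resp. 4": (45, 33),
    "Resp. 5": (50, 39),
}

for name, (first, last) in pcl5.items():
    delta = first - last  # Δmax between the first and the last meeting
    clinically_relevant = delta >= 10
    print(f"{name}: Δmax = {delta} points, clinically relevant change: {clinically_relevant}")
```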
5 Conclusion After the 6-month experiment, it can be said that there are confirmable reductions in depression level, and we can answer the question: can VR speed up regular psychological cognitive behavioral therapy and reduce the use of pharmaceuticals? The initial research shows positive indicators that the answer is yes. Based on our psychological tests, there are proven results about an improvement of the mental state of the respondents. The biggest difference between the first and last meeting of each psychological test is expressed in the quantity Δmax. In MCQ-30, the biggest Δmax was detected for respondent 3 with Δmax = 14 points. The most significant difference in SADS was found for respondent 5 with Δmax = 10 points. The biggest value in TMT (trail part A) was Δmax = 16 s for respondent 1, and Δmax = 26 s for respondent 5 in TMT (trail part B). In the last test, PCL-5, the most marked difference was identified for respondent 2 with Δmax = 16 points. Based on the collected results, it can be said that VR affected psychological cognitive behavioral therapy has a clinically significant effect on our involved respondents. Based on our research and other research papers worldwide, it can be predicted that VR has a generally positive impact on human psychological health and that the use of VR will be rapidly extended to more and more areas of human health [16, 17]. An extension of this research with the use of the professional 8K 360° camera Insta 360 Pro, for an even better and more believable VR experience, is planned. It is also relevant to include a larger sample of respondents in this further planned research. Compliance with Ethical Standards Funding This study was funded by grant number IGA/CebiaTech/2019/003 from the Internal Grant Agency of Tomas Bata University in Zlin. Conflict of Interest The authors declare that they have no conflict of interest. Ethical Approval All procedures performed in studies involving human participants were in accordance with the ethical standards of Tomas Bata University and the national Czech research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed Consent Informed consent was obtained from all individual participants included in the study.
References 1. P. Havlikova, Aktualni a potencialni farmakologicke moznosti lecby post traumaticke stresove poruchy. Psychiatrie pro praxi. Psychiatrická nemocnice Sternberk 18(2), 56–58 (2017). http://www.remedia.cz/Clanky/Farmakoterapie/Posttraumaticka-stresova-porucha/6-L-g8.magarticle.aspx 2. J. Prasko, B. Paskova, N. Soukupova, V. Tichy, Post-traumaticke stresove poruchy: I. dil klinicky obraz a etiologie. Psychiatrie pro praxi (4), 157–160 (2001). https://www.psychiatriepropraxi.cz/pdfs/psy/2001/04/04.pdf 3. Post-Traumatic Stress Disorder (Anxiety Canada). https://anxietycanada.com/disorders/posttraumatic-stress-disorder/ 4. M. Smith, L. Robinson, R. Segal, J. Segal, Post-Traumatic Stress Disorder (PTSD): Symptoms, Treatment, and Self-help for PTSD. Help Guide: Your Trusted Guide to Mental Health & Wellness (2019). https://www.helpguide.org/articles/ptsd-trauma/ptsd-symptoms-self-help-treatment.htm 5. Posttraumaticka Stresova Porucha—PTSD (Galaxy). https://www.psychowalkman.cz/ucinky/dalsi-ucinky/posttraumaticka-stresova-porucha-ptsd/
6. Post-Traumatic Stress Disorder (National Institute of Mental Health: Transforming the Understanding and Treatment of Mental Illnesses, 2019). https://www.nimh.nih.gov/health/topics/post-traumatic-stress-disorder-ptsd/index.shtml 7. B. Sobota, F. Hrozek, Virtualna realita a jej technologie (Datakon Znalosti, Ostrava, 2013). ISBN 978-80-248-3189-3 8. V. Highfield, T. Mcmullan, J. Bray, Samsung Gear VR Review: What You Need to Know (Alphr, 2018). https://www.alphr.com/samsung/samsung-gear-vr/1002842/samsung-gear-vr-review 9. M.-C. Juan, I. Garcia-Garcia, R. Molla, R. Lopez, Users’ perceptions using low-end and high-end mobile-rendered HMDs: a comparative study. Computers 7(1) (2018). https://doi.org/10.3390/computers7010015, http://www.mdpi.com/2073-431X/7/1/15. ISSN 2073-431X 10. A. Wells, S. Cartwright-Hatton, A short form of the metacognition’s questionnaire: properties of the MCQ-30. Behav. Res. Therapy 42(4), 385–396 (2004). https://doi.org/10.1016/s0005-7967(03)00147-5, http://linkinghub.elsevier.com/retrieve/pii/S0005796703001475. ISSN 00057967 11. J. Sobanski, K. Klasa, K. Rutkowski, Social avoidance and distress scale (SAD) and fear of negative evaluation scale (FNE)—Reliability and the preliminary assessment of validity. Psychiatr Pol. 47(4), 691–703 (2013) 12. R. Walrath, M.W. Hertenstein, T. Koulenti, et al., Trail Making Test. Encyclopedia of Child Behavior and Development (Springer Boston, MA, US, 2011), pp. 1499–1500. https://doi.org/10.1007/978-0-387-79061-9_2934, http://www.springerlink.com/index/10.1007/978-0-387-79061-9_2934. ISBN 978-0-387-77579-1 13. E. Heerema, C. Chaves. Administration, scoring and interpretation of the trail making test: how effective is the trail making test in identifying dementia. VeryWell Health (2018). https://www.verywellhealth.com/dementia-screening-tool-the-trail-making-test-98624 14. M.J. Bovin, B.P. Marx, F.W. Weathers, M.W. Gallagher, P. Rodriguez, P.P. Schnurr, T.M. Keane, Psychometric properties of the ptsd checklist for diagnostic and statistical manual of mental disorders—fifth edition (PCL-5) in veterans. Psychol. Assess. 28(11), 1379–1391 (2016). https://doi.org/10.1037/pas0000254, http://doi.apa.org/getdoi.cfm?doi=10.1037/pas0000254. ISSN 1939-134X 15. A.R. Ashbaugh, S. Houle-Johnson, C. Herbert, W. El-Hage, A. Brunet, M. Mazza, Psychometric validation of the English and French versions of the posttraumatic stress disorder checklist for DSM-5 (PCL-5). PLOS ONE 11(10) (2016). https://doi.org/10.1371/journal.pone.0161645, http://dx.plos.org/10.1371/journal.pone.0161645. ISSN 1932-6203 16. A. Bourla, S. Mouchabac, W. El Hage, F. Ferreri, E-PTSD: an overview on how new technologies can improve prediction and assessment of Posttraumatic Stress Disorder (PTSD). Eur. J. Psychotraumatol. 9(sup1) (2018). https://doi.org/10.1080/20008198.2018.1424448, https://www.tandfonline.com/doi/full/10.1080/20008198.2018.1424448. ISSN 2000-8198 17. C. Botella, B. Serrano, R. Banos, A. Garcia-Palacios, Virtual reality exposure-based therapy for the treatment of post-traumatic stress disorder: a review of its efficacy, the adequacy of the treatment protocol, and its acceptability. Neuropsychiat. Disease Treatment. https://doi.org/10.2147/ndt.s89542, https://www.dovepress.com/virtual-reality-exposure-based-therapy-for-thetreatment-of-post-traum-peer-reviewed-article-NDT. ISSN 1178-2021
Facial Expression Recognition Using Wavelet Transform and Convolutional Neural Network Dini Adni Navastara, Hendry Wiranto, Chastine Fatichah, and Nanik Suciati
Abstract Facial expression recognition is one of the applications of machine learning. It categorizes an image of a facial expression into one of the facial expression classes based on the features extracted from the image. The Convolutional Neural Network (CNN) is one of the classification methods which also extracts patterns from an image. In this research, we applied the CNN method to recognize facial expressions. The wavelet transform is applied before the images are processed by the CNN to improve the accuracy of facial expression recognition. The facial expression images are taken from the Karolinska Directed Emotional Faces (KDEF) dataset, which contains seven different facial expressions. The preprocessing of the images includes converting the image to grayscale, changing the image resolution to 256 × 256 pixels, and applying data augmentation with horizontal reflection and zoom in. The experimental results of facial expression recognition using CNN with the wavelet transform achieve 84.68% accuracy and without the wavelet transform achieve 81.6%. The best result is 89.6% accuracy, which is obtained with the data split based on the photo session, using the wavelet transform, the RMSprop optimizer with learning rate 0.001, and without data augmentation. Keywords Convolutional neural network · Data augmentation · Facial expression recognition · Wavelet transform
D. A. Navastara (B) · H. Wiranto · C. Fatichah · N. Suciati Department of Informatics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia e-mail: [email protected] H. Wiranto e-mail: [email protected] C. Fatichah e-mail: [email protected] N. Suciati e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_83
1 Introduction Machine learning has become part of the daily life of people around the world. The development of machine learning enables a computer to learn and predict possible patterns that can be used to help with people's daily needs. Applications based on machine learning methods help to solve problems in a more efficient way. Some of the machine learning applications are image classification, information retrieval, voice recognition, and medical diagnosis [1]. Human facial expression recognition is also one of the machine learning applications. It plays an important part, for it displays human emotions and moods and can be used to find changes in behaviors and feelings [2]. It can contain rich information about human behavior; thus, the human face plays a primary role in interpersonal communication [3]. Human facial expression recognition categorizes an image of a facial expression into one of the facial expression classes based on the features extracted from the image [1, 4]. The main purpose of facial expression recognition (FER) is to introduce a natural way of communication in man-machine interaction. FER is the basis of affective computing, as it is used to recognize human expressions effectively [5]. Many working fields in the world use image classification to improve the quality of their products, including business, finance, health, government, transportation, and many more. Many companies, researchers, and universities keep improving machine learning to get better and faster results. Recently, deep learning algorithms have achieved higher results than conventional learning algorithms in many cases. The Convolutional Neural Network (CNN) is one of the deep neural networks commonly used for image classification. In this research, a combination of the wavelet transform and the CNN method is developed for facial expression recognition. The images are transformed into the wavelet domain using the wavelet transform before being processed by the CNN. The purpose of the wavelet transform is to improve the accuracy of facial expression recognition using CNN. Preprocessing steps are applied, such as changing the image format to grayscale, changing the image resolution to 256 × 256 pixels, and applying data augmentation with horizontal reflection and zoom in. The dataset used in this research is taken from the Karolinska Directed Emotional Faces (KDEF), which contains seven different facial expressions. We compare the experimental results of facial expression recognition using CNN with and without the wavelet transform.
2 Literature Review 2.1 Convolutional Neural Network The Convolutional Neural Network (CNN) is one of the deep learning algorithms, an improvement over the Multi-Layer Perceptron (MLP). The MLP, known as the Fully Connected Layer in a CNN, is a model inspired by how human neurons work. Every neuron is connected to the others and transfers information. Every single neuron takes its input, applies a dot product with its weights and adds a bias value. The result of this operation becomes the argument of the activation function for the neuron's output [6]. The first research underlying this algorithm was done by Hubel and Wiesel [7], who studied the visual cortex of the cat's eyes. CNNs are designed to process two-dimensional data like images or sound [8]. The architecture of a CNN can be divided into two parts, the feature extraction layers and the Fully Connected Layer. The feature extraction layers translate the input data (images) into features using multiple convolutional layers and pooling layers. The purpose of the convolutional layer is to extract the features of the input image, while the pooling layer reduces the dimensionality of the data and reduces the number of parameters and the computational complexity of the model [8]. The pooling layer also ensures that the features from the convolution remain the same even when the object's position is translated. The Fully Connected Layer is an MLP that has multiple hidden layers, activation functions, an output layer, and a loss function, and it is used to classify the features from the feature extraction layers. The Fully Connected Layer is implemented at the end of the network [1, 8].
2.2 Wavelet Transform The wavelet transform is a transformation method that adopts ideas from the Fourier transform. The wavelet transform converts a signal in the time domain into a signal in the time and frequency domain (which in this case is formed into a translation and scale domain) [9]. The Discrete Wavelet Transform (DWT) is one of the methods used in digital image processing that can be used for image transformation and image compression. The DWT derives from and simplifies the continuous wavelet transform, representing a sequence of sampled numbers from a continuous function [1, 10]. Let an image f(x, y) have dimensions M × N. The two-dimensional DWT transform pair is defined as shown in Eqs. (1) and (2):

$$W_{\varphi}(j_0, m, n) = \frac{1}{\sqrt{M N}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \varphi_{j_0, m, n}(x, y) \quad (1)$$

$$W_{\psi}^{i}(j, m, n) = \frac{1}{\sqrt{M N}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, \psi^{i}_{j, m, n}(x, y) \quad (2)$$

where $W_{\varphi}$ are the approximation coefficients, $W_{\psi}^{i}$ are the detail coefficients, $M$ and $N$ are the subband sizes, $j$ is the resolution level, and $i$ indexes the subband set $\{H, V, D\}$ [3].
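For illustration, a single-level 2-D DWT of the kind defined in Eqs. (1) and (2) can be computed with the PyWavelets library, which returns the approximation (LL) subband and the three detail subbands. The Haar mother wavelet and the random input used here are assumptions made only to keep the sketch self-contained; the text does not name a specific wavelet.

```python
import numpy as np
import pywt

# A 256 x 256 grayscale face image (random data stands in for a real image here).
image = np.random.rand(256, 256)

# Single-level 2-D DWT: cA is the LL (approximation) subband,
# (cH, cV, cD) are the detail subbands, each 128 x 128.
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")
print(cA.shape, cH.shape, cV.shape, cD.shape)
```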
2.3 KDEF Dataset The Karolinska Directed Emotional Faces (KDEF) is a set of 4900 pictures of human facial expressions in total [11]. The set of images contains 70 individuals displaying seven different emotional expressions. Figure 1 shows the expressions included in the dataset: afraid, angry, disgust, happy, neutral, sad, and surprised. Each expression is taken from five different angles, which are shown in Fig. 2: full left profile, half left profile, straight, half right profile, and full right profile.
Fig. 1 The seven different expressions contained in the dataset
Fig. 2 The five different angles for expression [11]
3 The Proposed Method There are three main steps in this facial expression recognition system that is shown in Fig. 3.
3.1 Preprocessing Before being fed to the network for training, the images are preprocessed. The preprocessing includes converting the image to grayscale, cropping the face area, and resizing the image to 256 × 256 pixels. The resizing is needed to make the computation faster.
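A minimal sketch of this preprocessing step with OpenCV is given below. The Haar-cascade face detector used for cropping is an assumption, since the paper does not state how the face area is located.

```python
import cv2

def preprocess(path, size=256):
    """Convert to grayscale, crop the face area and resize to size x size pixels."""
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        face = gray  # fall back to the full frame if no face is detected
    else:
        x, y, w, h = faces[0]
        face = gray[y:y + h, x:x + w]
    return cv2.resize(face, (size, size))
```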
Fig. 3 Main stages of facial expression recognition
3.2 Wavelet Transform After being preprocessed, the image data are transformed into the wavelet domain using a Discrete Wavelet Transform (DWT). The size of the images is reduced to a quarter, i.e., 128 × 128 pixels. The DWT process produces four different sub-bands, namely High-High (HH), High-Low (HL), Low-High (LH), and Low-Low (LL). These four sub-bands are the input of the proposed network.
3.3 CNN Training The training process uses the training data to build the CNN model, with the Adam optimizer and a learning rate that varies across the testing scenarios. The CNN architecture is shown in Fig. 4. The architecture was designed after several experiments changing the layers and network parameters such as kernel size, filter size, zero padding, and strides.
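The exact layer configuration of Fig. 4 is not reproduced here, but the sketch below shows a Keras CNN of the same general shape: convolution and pooling layers followed by a fully connected classifier ending in a seven-way softmax. Stacking the four 128 × 128 subbands as input channels is an assumption about how the subbands are fed to the network, and the RMSprop setting matches the best configuration reported later in Sect. 4.4.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Feature extraction layers (convolution + pooling) followed by a fully connected classifier.
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 4)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),  # seven KDEF expression classes
])

model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```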
4 Experimental Results 4.1 Data The input data are preprocessed before being transformed using the Discrete Wavelet Transform. The preprocessing result is shown in Fig. 5, and the resulting wavelet sub-bands are shown in Fig. 6. Besides the original data, the experiment also uses more data variations through data augmentation. The augmentation is done with horizontal reflection and random zoom in, as shown in Fig. 7.
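The horizontal-reflection and zoom-in augmentation described above can be expressed with Keras' ImageDataGenerator, as in the sketch below; the 20% zoom range and the dummy arrays are assumptions, since the paper does not quantify the zoom or show its data layout.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Horizontal reflection plus a random zoom of up to 20% (the exact range is an assumption).
augmenter = ImageDataGenerator(horizontal_flip=True, zoom_range=0.2)

# Dummy stand-ins for the preprocessed subband tensors and one-hot expression labels.
x_train = np.random.rand(8, 128, 128, 4)
y_train = np.eye(7)[np.random.randint(0, 7, size=8)]

batch_x, batch_y = next(augmenter.flow(x_train, y_train, batch_size=4))
print(batch_x.shape, batch_y.shape)
```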
4.2 Experiments on Data Split This experiment is used to test the model performance when a person's face has never been used to train the model before. There are two ways of splitting the data into train data and test data. The first one is splitting the data based on the individual, which results in an accuracy of 74.74%. The second one is splitting the data based on the photo session. This split is done by using all session A data to train the model and then choosing 1000 session B images at random to test the model, which ensures that every individual appears in the training data at least once. The accuracy of this split is 84.68%, as shown in Table 1.
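For illustration, the two splitting strategies can be expressed as follows: the individual-based split holds out whole persons (here via scikit-learn's GroupShuffleSplit with the person identity as the group), while the session-based split trains on session A and tests on a random sample of session B. The metadata arrays below are assumptions standing in for the real KDEF file list.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical metadata aligned with the image tensor: person identity and photo session.
person_id = np.array(["p01", "p01", "p02", "p02", "p03", "p03"])
session = np.array(["A", "B", "A", "B", "A", "B"])
indices = np.arange(len(person_id))

# Split based on the individual: whole persons are held out for testing.
gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train_ind, test_ind = next(gss.split(indices, groups=person_id))

# Split based on the photo session: train on session A, test on a random subset of session B.
train_ses = indices[session == "A"]
rng = np.random.default_rng(0)
test_ses = rng.choice(indices[session == "B"], size=2, replace=False)
print(train_ind, test_ind, train_ses, test_ses)
```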
Fig. 4 The CNN architecture
Fig. 5 a Original image; b input image after preprocessing
Fig. 6 a HH subband; b HL subband; c LH subband; d LL subband
Fig. 7 a Original image; b the result of horizontal reflection; c the result of zoom in
4.3 Experiments on Wavelet Transform In this experiment, we compare the results of facial expression recognition with and without the Discrete Wavelet Transform (DWT). The purpose of the DWT is to increase the accuracy of the CNN model. In Table 2, the experimental results show that the proposed method without DWT obtains an accuracy of 81.6% and with DWT achieves an accuracy of 86.8%.
Table 1 Comparison between data split based on the individual and data split based on the photo session

Data                         Accuracy (%)   Class       Precision (%)   Recall (%)
Based on the photo session   84.68          Afraid      77              66
                                            Angry       89              83
                                            Disgust     85              83
                                            Happy       94              97
                                            Neutral     84              95
                                            Sad         77              75
                                            Surprised   84              92
Based on the individual      74.74          Afraid      63              55
                                            Angry       79              67
                                            Disgust     82              75
                                            Happy       91              92
                                            Neutral     66              85
                                            Sad         68              63
                                            Surprised   77              84
Table 2 Comparison between data without wavelet transform and data with wavelet transform

Data              Accuracy (%)   Class       Precision (%)   Recall (%)
Without wavelet   81.6           Afraid      77              60
                                 Angry       87              76
                                 Disgust     82              85
                                 Happy       92              95
                                 Neutral     77              88
                                 Sad         71              76
                                 Surprised   85              88
With wavelet      86.8           Afraid      83              70
                                 Angry       91              81
                                 Disgust     88              85
                                 Happy       98              99
                                 Neutral     85              93
                                 Sad         78              83
                                 Surprised   85              95
4.4 Experiments on CNN Parameters This experiment is used to find the optimal parameters that produce the best accuracy. Four optimizers are compared: SGD, RMSprop, Adam, and Adagrad. Each optimizer is tested with four different learning rates: 0.1, 0.01, 0.001, and 0.0001. The best parameter setting is the RMSprop optimizer with learning rate 0.001, which results in an accuracy of 89.6%, as shown in Table 3.

Table 3 Best result on CNN parameters experiment

Optimizer   Accuracy (%)   Class       Precision (%)   Recall (%)
RMSprop     89.6           Afraid      76              85
                           Angry       91              85
                           Disgust     89              88
                           Happy       97              99
                           Neutral     97              96
                           Sad         83              85
                           Surprised   94              89
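A minimal sketch of the optimizer and learning-rate sweep described above is given below. The small stand-in network, the dummy data and the single training epoch are assumptions made only to keep the example self-contained; they are not the architecture of Fig. 4 or the authors' training schedule.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def build_model():
    """A small stand-in for the CNN of Fig. 4."""
    return tf.keras.Sequential([
        layers.Conv2D(16, (3, 3), activation="relu", input_shape=(128, 128, 4)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(7, activation="softmax"),
    ])

# Dummy subband tensors and one-hot labels so the sweep runs end to end.
x = np.random.rand(16, 128, 128, 4)
y = np.eye(7)[np.random.randint(0, 7, size=16)]

optimizers = {
    "SGD": tf.keras.optimizers.SGD,
    "RMSprop": tf.keras.optimizers.RMSprop,
    "Adam": tf.keras.optimizers.Adam,
    "Adagrad": tf.keras.optimizers.Adagrad,
}
for name, opt in optimizers.items():
    for lr in (0.1, 0.01, 0.001, 0.0001):
        model = build_model()
        model.compile(optimizer=opt(learning_rate=lr),
                      loss="categorical_crossentropy", metrics=["accuracy"])
        history = model.fit(x, y, epochs=1, verbose=0)
        print(name, lr, history.history["accuracy"][-1])
```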
4.5 Experiments on Data Augmentation This experiment is used to find out whether using more data variation through data augmentation can increase the model accuracy. The first experiment uses the original data only, and the second one uses the original and augmented data. The experimental results using the original data only achieve an accuracy of 89.6%, and using the original and augmented data obtain 89%, as shown in Table 4. In this research, the use of data augmentation obtains a similar result to using the original data only. In Table 4, the experimental results show that the best precision and recall are obtained for the Happy class. This can happen because the happy expression is not visually similar to any other expression label; therefore, the model can distinguish it well. The worst precision and recall are for Afraid and Sad because both labels are similar and hard to distinguish.
Table 4 Comparison between using the original data only and using the original and augmentation data

Data                             Accuracy (%)   Class       Precision (%)   Recall (%)
Original data only               89.6           Afraid      76              85
                                                Angry       91              85
                                                Disgust     89              88
                                                Happy       97              99
                                                Neutral     97              96
                                                Sad         83              85
                                                Surprised   94              89
Original and augmentation data   89.0           Afraid      75              83
                                                Angry       98              87
                                                Disgust     86              90
                                                Happy       94              99
                                                Neutral     93              93
                                                Sad         83              87
                                                Surprised   95              83

5 Conclusion Based on the experimental results, the data split based on the photo session yields a better accuracy of 84.68% compared to the data split based on the individual. This is because the model is trained with the faces of all individuals in the dataset. The data using the Discrete Wavelet Transform (DWT) gives a better result, with an accuracy of 84.68%, than the data without DWT. Data augmentation with horizontal reflection and random zoom in is not very effective in increasing the accuracy of the model, as shown by the model trained with the original data only, which achieves an accuracy of 89.6%. The accuracy of this model is not much different from the model trained with the original and augmented data, which results in an accuracy of 89%. The facial expression recognition system has been successfully implemented with an accuracy of 89.6%, which is obtained with the data split based on the photo session, using the Discrete Wavelet Transform, the RMSprop optimizer with learning rate 0.001, and without data augmentation. Acknowledgements This work was supported by the Institute of Research and Community Service (Lembaga Penelitian dan Pengabdian Masyarakat, LPPM) Institut Teknologi Sepuluh Nopember (ITS) Surabaya with the grant number of 1748/PKS/ITS/2018.
References 1. T. William, R. Li, An ensemble of convolutional neural networks using wavelets for image classification. J. Softw. Eng. Appl. 11, 69–88 (2018) 2. C. Soladié, N. Stoiber, R. Séguier, A new invariant representation of facial expressions: definition and application to blended expression recognition, in 2012 19th IEEE International Conference on Image Processing, Orlando, FL, USA (2012) 3. Y.-l. Xue, X. Mao, F. Zhang, Beihang University facial expression database and multiple facial expression recognition, in 2006 International Conference on Machine Learning and Cybernetics, Dalian, China, China (2006) 4. C. Primasiwi, H. Tjandrasa, D.A. Navastara, Deteksi Ekspresi Wajah Menggunakan Fitur Gabor dan Haar Wavelet. Jurnal Teknik ITS 7(1), 20–22 (2018) 5. N. Perveen, S. Gupta, K. Verma, Facial expression recognition using facial characteristic points and Gini index, in 2012 Students Conference on Engineering and Systems, Allahabad, Uttar Pradesh, India (2012)
6. C. Fatichah, W.F. Lazuardi, D.A. Navastara, N. Suciati, A. Munif. Image spam detection on instagram using convolutional neural network, in Intelligent and Interactive Computing (Springer, Singapore, 2019), pp. 295–303 7. D. Hubel, T. Wiesel, Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968) 8. K. O’Shea, R. Nash, An introduction to convolutional neural networks (2015) 9. N. Suciati, A.B. Anugrah, C. Fatichah, H. Tjandrasa, A.Z. Arifin, D. Purwitasari, D.A. Navastara, Feature extraction using statistical moments of wavelet transform for iris recognition, in 2016 International Conference on Information & Communication Technology and Systems (ICTS) (IEEE, 2016), pp. 193–198 10. A. Wichert, Wavelet Transform (INESC-ID/IST, University of Lisboa, Portugal, 2014) 11. D. Lundqvist, The Karolinska Directed Emotional Faces—KDEF (Department of Clinical Neuroscience, Psychology Section, Karolinska Institute, Solna, Sweden, 1998)
Survey of Automated Waste Segregation Methods Vaibhav Bagri, Lekha Sharma, Bhaktij Patil, and Sudhir N. Dhage
Abstract Waste segregation is a major aspect of any efficient waste management system. Since manual segregation is such a tedious task, there has been extensive research into the development of automated techniques for the same. The techniques are categorized as two distinct approaches—hardware-based approach which primarily employs the use of sensors, and software-based approach, which uses image processing and deep learning algorithms to perform the classification. This paper aims to highlight and study all the existing techniques and analyse the advantages and disadvantages of each approach. Keywords Automated waste sorting · Waste management · Sensors · Image processing
1 Introduction In 2016, 2.01 billion tonnes of solid waste was produced globally. It is projected to reach 3.40 billion tonnes in 2050, a 70% increase from 2016. With a continual increase of solid waste, it is becoming more and more difficult to manage. Improper handling of this waste can adversely affect the environment and the well-being of people. The traditional methods used for the same are now becoming inefficient to manage the solid waste. Thus waste management is becoming an increasingly V. Bagri (B) · L. Sharma · B. Patil · S. N. Dhage Computer Engineering Department, Sardar Patel Institute of Technology, Andheri (West), Mumbai 400059, India e-mail: [email protected] L. Sharma e-mail: [email protected] B. Patil e-mail: [email protected] S. N. Dhage e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_84
important area of research. New methods to collect, segregate, transport, recycle and dispose of waste are being studied [1, 2]. The aim is to optimize each of the stages of waste management, to help make the entire process cleaner and more efficient. Waste segregation is an integral part of the waste management process as it helps to streamline the process greatly. Municipal solid waste consists of a variety of waste types that each need to be handled and managed differently. Segregation also plays a key role in aiding recycling, which is an essential practice in the contemporary scenario. Segregation also makes the handling of waste less expensive compared to handling non-segregated waste. Automating the segregation of waste has numerous advantages including time efficiency and increased accuracy. Thus, various techniques have been researched and developed over time which leverage technology to help automate the segregation of waste. Some of these approaches use specific properties of the materials, measured with sensors, electromagnetic rays, etc., to separate the different waste items. Other approaches use image processing to simplify the segregation process; object identification and classification algorithms play an important role in these approaches. This paper aims to review such works to help understand the recent advancements in waste segregation processes. The organization of the paper is as follows. Section 2 gives a detailed explanation of the waste segregation techniques, with part A being hardware-based techniques and part B being software-based techniques. Section 3 considers both approaches as a whole and provides advantages and disadvantages of each approach. Section 4 concludes the survey.
2 Waste Segregation Techniques

With the increasing awareness of waste management and advancements in technology, various techniques have been proposed to segregate waste quickly and efficiently. As shown in Fig. 1, these techniques can be broadly classified as hardware based and software based. A few techniques belonging to both categories are discussed in this section.

Fig. 1 Classification of methods to segregate waste
2.1 Hardware-Based Techniques

These techniques of segregation depend on various types of sensors and electromagnetic rays that leverage the material properties of the items to be separated from each other. Pereira et al. [3] propose a smart bin that performs segregation of wet and dry waste using capacitance as a differentiator. The bin comprises two copper plates kept at a 45° angle to each other. When a waste item is placed between these plates, it changes the capacitance between them. Wet waste has higher capacitance
than dry waste and can be separated by dropping it into a designated chamber. The dry waste can be further screened for plastic; an infrared sensor is used for this, as plastic can be differentiated based on its absorptivity when placed under IR light. The automated waste segregator (AWS) proposed by Chandramohan et al. [4] is another system that leverages capacitance for classification of dry and wet waste. AWS has a metal detection system for metallic waste and a capacitive sensing module for dry and wet waste as mentioned earlier. When waste is pushed through the flap of the bin, an IR proximity sensor senses it and the microcontroller comes out of low-power mode. First, the waste is passed through a parallel LC circuit whose parallel impedance value changes if the waste is metallic, and this data is returned as a proximity value. The remaining waste is passed through the capacitive sensing module, which measures the change in the count value between the capacitive plates. If the change is greater than 30, the waste is wet; otherwise, it is dry. The collection unit rotates to collect the respective type of waste using a provided lookup table. This system has certain limitations, including detection of only one waste item at a time, with priority being metallic, wet and dry. Another limitation is that it cannot handle items with exceptions in their capacitive change. Pan et al. [5] suggest the use of odour sensors in their garbage bin, which is powered by solar energy. The odour sensor can detect food waste and segregate it. Toxic waste can also be identified by its emissions, and the user is notified. The bin has two compartments for recyclable and non-recyclable waste. A faulty classification by the user is detected using an infrared sensor and signalled by a buzzer. This sensor is further used to detect if the bin is full, and the user is notified. Separation of metal waste from the remaining waste by the use of metal and ultrasonic sensors to aid recycling is suggested by Aahash et al. [6]. The presence of a waste item is identified by the ultrasonic sensor. The metal sensor consists of a metal detector that works on the principle of electromagnetic induction and is used to classify whether the waste item is metallic or not. The system has two doors which open depending on the item classified. This embedded system is built using
Arduino UNO, and Embedded C is used as the programming language. This system is simple, but its scope is limited to the classification of metallic and non-metallic waste. Another approach to segregating metallic and non-metallic waste is the use of eddy currents. Rahman and Bakker [7] made use of an electromagnetic sensor to detect the presence of magnetic materials in the waste. The waste materials are made to pass through a tube, on either side of which a coil is placed. When an alternating current passes through the coils, a magnetic field is created. Thus, when a magnetic material passes through the tube, it creates a magnetic flux that is in phase with the field, and this change is picked up by the sensor. Similarly, when a conductive, non-magnetic material passes through the tube, an out-of-phase magnetic flux is created. Non-conductive and non-magnetic materials produce no effect. In this way, segregation takes place, but the scope is still limited to segregation of metallic and non-metallic waste. Karaca et al. [8] use a hyperspectral imaging system to classify various types of plastics and paper along with metal and glass. The waste items are placed on a conveyor belt that is illuminated by two 1000 W quartz halogen lamps to prevent errors caused by variations in ambient light. The imaging system consists of a shortwave infrared (SWIR) camera and a SWIR spectrometer, which scan the waste items on the conveyor belt. After the reflectivity of the items is calculated from the captured data, it is further processed by removing noise using Savitzky-Golay filtering. Post-processing is done to correct multiplicative effects using the standard normal variate method and baseline effects using the asymmetric least squares method. The mean classification accuracy of the model on four datasets is 93.01, 96.65, 93.63 and 93.03%. There has also been research on separating different types of the same material, such as plastics, wood and metals. Edward et al. [9] have surveyed the use of infrared spectroscopy with the addition of laser-aided identification and marker systems for sorting plastics for recycling. In infrared spectroscopy, the plastics are irradiated with near-infrared waves. Using the amount of light reflected from each plastic's surface, their absorption bands can be identified and the plastics can be sorted. Similarly, X-rays can be used for PVC segregation. Laser Zentrum Hannover developed a method which uses the heat impulse response of plastics as a sorting factor. They also mention the use of marker systems in which either the containers or the plastics are marked, for example, containers marked using invisible ink or plastics marked using a molecular marker or different dyes. They finally conclude that a hybrid system that matches the plastic intake and optimizes the cost-effectiveness of the recycling facility is ideal. Wood waste can be reused only if it has not been treated with various chemicals to enhance performance. Fellin et al. [10] suggest the use of Energy Dispersive X-ray Fluorescence (ED-XRF) to segregate untreated wood from treated wood. Chemically treated wood contains elements such as copper, lead, mercury and arsenic, to name a few. ED-XRF is a method used to find the elements and their composition in a substance. The wood-based materials (WBM) are exposed to X-rays, which excite the atoms of the elements. The intensity of the radiation signal emitted by them is proportional to their concentration.
The authors have presented a table that gives the minimum and maximum natural values of these elements after careful review
of literature. These values are used to identify treated WBM, which is then separated. This method is quick and cost effective compared to previously used methods like Atomic Absorption Spectroscopy or Inductively Coupled Plasma Spectrometry, but it reported 22% of cases resulting in a false positive or negative. Takezawa et al. [11] suggest combining eddy currents and X-rays for effective separation of light metal alloys, especially wrought aluminium alloys. This technique uses the Beer-Lambert law to find the attenuation constant μ of X-rays incident on the aluminium alloy samples; the attenuation constant is the basis for identifying different alloys. Eddy currents measure the impedance of the alloy samples against a reference sample, and each sample is displayed as leading or lagging if its impedance differs from that of the reference. Both methods could separate 7 alloys into 3 groups individually, but when used together they could identify 6 groups. This approach is useful for closed-loop recycling of aluminium and other light metal alloys. Grzegorzek et al. [12] also worked on sorting aluminium alloys. The materials were scanned with a camera and a laser beam to study their spectral emission signatures and then classified using various algorithms such as Naive Bayes, SVM and KNN, of which SVM performed the best. Table 1 gives a summary of all the hardware-based approaches surveyed, with certain additional points of differentiation among them.
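The decision logic in these sensor-driven segregators reduces to simple thresholding on the measured quantities. The sketch below illustrates the kind of rule used by the AWS system [4], written in Python for readability; only the capacitive count-change threshold of 30 and the metallic/wet/dry priority come from the surveyed description, while the function name, the metal threshold value, and the sensor readings are illustrative assumptions.

```python
def classify_item(metal_proximity, capacitive_count_change,
                  metal_threshold=0.5, wet_threshold=30):
    """Threshold-rule sketch for a sensor-based segregator (after [4]).

    metal_proximity: reading derived from the parallel LC circuit
    capacitive_count_change: change in count between the capacitive plates
    Priority follows the AWS description: metallic, then wet, then dry.
    The metal_threshold value is illustrative; only the wet/dry threshold
    of 30 comes from the surveyed paper.
    """
    if metal_proximity > metal_threshold:
        return "metallic"
    if capacitive_count_change > wet_threshold:
        return "wet"
    return "dry"


# Example: low metal proximity and a large capacitance change -> "wet"
print(classify_item(metal_proximity=0.1, capacitive_count_change=42))
```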
2.2 Software-Based Techniques

These techniques primarily employ image processing and neural networks to accurately classify the scanned objects as waste. These waste objects are then segregated on the basis of the class they have been assigned to. Zhihong et al. [13] proposed the use of the Fast R-CNN [14] algorithm for object detection and image classification. It was composed of two subnets, a Region Proposal Network and VGG-16. The system was able to identify plastic bottles in a stream of moving items and separate them out with the help of robotic arms. Since the waste it scans is placed on conveyor belts, every item is scanned multiple times, thereby improving the accuracy of segregation. One limitation of the system, though, is that it only removes plastic bottles from a pile of waste; it does not categorize different types of waste and segregate them. Sudha et al. [15] designed a system which classified images of individual objects as biodegradable or non-biodegradable. They used Caffe, a deep learning framework, for implementation. Caffe was preferred over other similar frameworks such as Torch because it takes less time to train a large dataset in the case of a standard architecture like the one they used. The limitation of the system comes from the single-object classification scenario; that is, the system can only classify one object at a time, making it practically infeasible. Setiawan et al. [16] made use of the Scale Invariant Feature Transform (SIFT) algorithm for object detection and classification. Along with the image of the object,
Table 1 Comparison of hardware-based techniques

Technique/properties used | Type of materials segregated | Sensors/electromagnetic rays used | Limitations
Capacitance, infrared rays [3] | Wet and dry waste, plastics | Infrared sensor, copper plates for capacitor | Certain objects do not follow the conditions specified for segregation; only plastic is separated for recycling
Capacitance, parallel impedance [4] | Wet and dry waste, metals | LC circuit with LDC1000, copper plates as capacitor | Can detect only one item at a time, with priority being metallic, wet and dry; cannot handle exceptions
Electromagnetic induction [6] | Metallic and non-metallic | Metal sensors, ultrasonic sensors | Limited segregation
Infrared spectroscopy, laser-aided identification and marker systems [9] | Plastics | Near-infrared rays, X-rays, laser | Efficiency of sorting techniques is not well understood; expensive
Energy dispersive X-ray fluorescence (ED-XRF) [10] | Wood-based materials | Oxford Instruments X-MET 5100 | 22% false positive or negative cases found
Eddy currents [7] | Metallic and non-metallic | Electromagnetic sensor | Limited segregation; miss rate increases at higher particle feed rates
Hyperspectral imaging system [8] | Plastic, paper, metal, glass | SWIR camera, SWIR spectrometer | Tested on a limited dataset
Impedance, attenuation constant [11] | Wrought aluminium alloys | X-ray | Cannot distinguish certain alloys from each other
the system was given an image of the product label to ease the classification task between organic and non-organic waste. On the basis of the keypoint descriptors in both images, the objects were classified. This shows the utility of the product label in classifying waste objects. The same, though, cannot be applied to waste items in general, because many items do not have a product label. Omar et al. [17] designed an innovative trash can that is embedded with an image processor to determine which part of the can a particular item is to be placed in. Thus, it segregates the waste into various categories. The classification is based on the first two of Hu's seven invariant image moments, used with the K nearest neighbour (KNN) algorithm. While the results indicate a high accuracy (up to 98%), the testing dataset is very small (20 items), leaving the accuracy of the system in doubt.
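As a rough illustration of the Hu-moment/KNN pipeline of [17], the sketch below extracts Hu's invariant moments with OpenCV and classifies them with a K nearest neighbour model. The placeholder images, the class labels, the use of only the first two moments, and the choice of k are assumptions made for illustration, not the exact settings of the surveyed system.

```python
import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def hu_features(gray_image, n_moments=2):
    """First n_moments of Hu's seven invariant moments, log-scaled."""
    hu = cv2.HuMoments(cv2.moments(gray_image)).flatten()
    hu = -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)
    return hu[:n_moments]

# Placeholder images standing in for segmented waste-item photographs.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(20)]
labels = ["can"] * 10 + ["bottle"] * 10          # hypothetical class labels

X = np.array([hu_features(img) for img in images])
knn = KNeighborsClassifier(n_neighbors=3).fit(X, labels)
print(knn.predict(X[:1]))                        # classify one query item
```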
This approach of using image processing with the KNN algorithm for classification purposes was also adopted by Rahman et al. [18]. They used it to classify different kinds of recyclable waste paper. Using image processing, various features of an image of the paper were extracted, such as skewness, kurtosis, dispersion and entropy. When the KNN algorithm was applied to these features, the waste papers were separated into multiple classes according to the type of paper. An alternative way of segregating recyclable waste paper is the use of DNA computing [19]. DNA computing algorithms also greatly reduce the computational time required for matching classes. Similar to the strands of DNA, an image pixel is considered to be made up of 4 components: R (red), G (green), B (blue) and I (intensity). These strands of image DNA are compared with the trained dataset and a result is obtained. This achieves high parallelism and very efficient results. In both of these approaches, the lighting condition is a major dependency: it must be ensured that the lighting is consistent across the enrolment and identification phases of these systems, otherwise the accuracy can be compromised. Chinnathurai et al. [20] have designed a semi-autonomous bot that uses content-based image retrieval to identify objects as recyclable or non-recyclable. The bot has four parts: a drivetrain for movement, an image acquisition system for capturing images and sending them to a remote image processing server, an image processing server that classifies the image and sends a response back to the bot, and an HMI for manual control of the bot. The approach used to classify is to store a list of image features of pre-identified recyclable objects. A picture of the new item is taken, and its features are compared with the existing list of features to see if a match is found in the stored recyclable objects database. The database is created beforehand, and the images are indexed on the basis of colour features and stored in an inverted database. Since the existing database is limited, it limits the number of objects that can be identified and thus classified. Sakr [21] performed a comparative study on the use of CNN and Support Vector Machines (SVM) for the classification of different categories of waste. A 7-layer network, called AlexNet, was used for convolution purposes. The bag-of-features technique was coupled with SVM for image classification. Using an 8 × 8 window, the entire image was scanned and features were extracted. Similar features were then grouped together with the help of the K-means algorithm and given as input to the SVM classifier. The results indicated that SVM performed with a higher accuracy than CNN (94.8% compared to 83%). The limitation of the system was that the training set was not vast, so it lacked variety. Moreover, due to low GPU memory, the images had to be scaled down from 256 × 256 to 32 × 32, which led to overfitting problems. The Intelligent Garbage Classifier [22] is a system that classifies and segregates solid waste products using computer vision. For classification using computer vision, images of waste material are given as input to the system. Various image processing algorithms such as thresholding, Gaussian blur, border detection and watershed segmentation are applied to extract the useful part of the image (the actual object) and eliminate background and noise. Then, to characterize the objects, two shape descriptors are used, namely Hu's moments and the Fourier descriptor.
After the distance
of both the descriptors to the known classes is calculated, the K-local hyperplane nearest neighbours (HKNN) algorithm is used to determine which class the image belongs to, and it is classified accordingly. The actual segregation task is performed by a robotic arm that is controlled using the Lynx6Arm service. Koyanaka and Kobayashi [23] introduced a novel way of segregating lightweight metal waste products by their apparent density and 3D shape. The data was captured by means of a 3D imaging camera and was given as input to the sorting algorithm. The algorithm worked on the basis of multivariate analysis. All the items were initially sorted manually and their respective parameters noted to create a database of sampled fragments. Subsequent unknown fragments were identified on the basis of this database. A drawback of this approach was that the data analysis to prepare the identification algorithm was very time consuming. Hence, the authors introduced a neural network [24] layer for this process, which greatly reduced the time required without affecting the sorting accuracy (up to 85%). A higher accuracy can also be obtained by modifying the multivariate database and re-training the neural network. Huang et al. [25] also proposed a similar technique, with object colour and shape being the major distinguishing factors. They made use of triangulation scanning, in which a triangle is formed between the camera, the laser beam and the emitter. These approaches, however, are limited to segregation of lightweight metals and cannot be generalized to plastics, since the weights and shapes of such objects are largely non-differentiable. Table 2 gives a summary of all the software-based approaches surveyed, with certain additional points of differentiation among them.
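The bag-of-features SVM classifier that [21] compared against a CNN can be sketched as follows: local descriptors are clustered with K-means into a visual vocabulary, each image is encoded as a histogram over that vocabulary, and an SVM is trained on the histograms. The window size, vocabulary size, descriptor choice, and placeholder data below are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def patch_descriptors(gray, window=8):
    """Split a greyscale image into non-overlapping window x window patches
    and use the flattened patches as local descriptors."""
    h, w = gray.shape
    patches = [gray[i:i + window, j:j + window].ravel()
               for i in range(0, h - window + 1, window)
               for j in range(0, w - window + 1, window)]
    return np.array(patches, dtype=np.float32)

def bag_of_features(gray, kmeans):
    """Histogram of visual words (normalized) for one image."""
    words = kmeans.predict(patch_descriptors(gray))
    hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
    return hist / max(hist.sum(), 1)

# Placeholder data standing in for the waste-image dataset.
rng = np.random.default_rng(1)
images = [rng.integers(0, 256, (32, 32)).astype(np.float32) for _ in range(30)]
labels = ["plastic", "paper", "metal"] * 10       # hypothetical labels

vocab = KMeans(n_clusters=16, n_init=10, random_state=0)
vocab.fit(np.vstack([patch_descriptors(img) for img in images]))
X = np.array([bag_of_features(img, vocab) for img in images])
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X[:3]))
```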
3 Discussion

With the progress in technology over the last few years, the methods available for automatic waste segregation have also increased manifold. The research indicates that there are two main levels of segregation of waste. Segregation can be done at the source level by the user throwing the waste, so that the task of the municipal corporation becomes easier when handling it. Other suggested systems perform the task at an industry level, but require various specialized machinery such as conveyor belts. Hardware-based approaches are very efficient. Since they deal with parameters concerning the properties of materials, they generally provide very accurate (almost 100%) results. At the source level, with the use of sensors, various types of smart waste bins can be designed for individuals [3–6]. The bins are fitted with small sensors that identify the kind of waste being placed in them and accordingly segregate it or keep it in separate sections within the bin. At the industry level, the waste is made to pass through large conveyor belts, on which certain properties of the passing material are tested [7–11]. If the material satisfies a particular property, it is segregated from the rest by means of a robotic arm or an air pump.
Table 2 Comparison of software-based techniques

Algorithm/technique used | Categories of materials segregated | Sensors used | Accuracy (%) | Limitations
Fast R-CNN [13] | Plastic bottles | Camera, KUKA robotic arms | 91 | Only works for removal of plastic bottles, no other segregation
Hu's image invariant moments with KNN algorithm [17] | Inorganic waste like cans, bottles, cutlery | Camera | 98 | Dependent on lighting; results unreliable since dataset was too small (60 items)
SIFT algorithm [16] | Organic and inorganic items with product label | Camera | 89.9 | Most waste items in real life do not have product labels, hence limited scope
KNN algorithm on features extracted via image processing [18] | Recyclable waste papers | Camera | 93 | Lighting has to be consistent across enrolment and identification phases
DNA computing algorithms (RGBI as strands of DNA) [19] | Recyclable waste papers | Camera | 95.17 | Lighting has to be consistent across enrolment and identification phases
Content-based image retrieval using bag of features [20] | Recyclable and non-recyclable | Camera, Qik2s12v10 (for drivetrain) | – | Limited size of database limits classification capacity
CNN and SVM [21] | Plastic, paper and metal | Camera | SVM: 94.8, CNN: 83 | Training set not very vast; low GPU memory, so images scaled down from 256 × 256 to 32 × 32
HKNN [22] | Recyclable and non-recyclable | Camera, Lynx6Arm | – | Dataset taken is small, hence results cannot be generalized
Multivariate analysis with neural networks [23, 24] | Lightweight metals | 3D imaging camera, linear laser, weight sensor | 85 | Limited to lightweight metals, cannot be used with other materials
The hardware-based approaches are accurate as well as easy to use, yet there is still a need for software-based approaches. This is owing to the high cost of the specialized hardware needed to design these segregators. There is no single sensor that works for all categories of waste. Hence, a separate sensor is needed to segregate metallic and non-metallic materials, another to segregate wet and dry waste, another for plastics, and so on. This adds to the cost as well as the design complexity. Ideally, a hybrid system fitted with all the different types of sensors could provide very accurate and complete results. But owing to the expensive nature of sensors and the different techniques of segregation for each category of waste, this is not practically possible right now using hardware-based approaches. The sensors also need to be compatible with each other to make this possible. This brings us to the software-based approaches. Humans distinguish between various categories of waste on the basis of their vision. The research done in this area uses algorithms to develop such vision capabilities in machines to perform the segregation task. Expectedly, a camera is the primary instrument/sensor used in all the approaches, while robotic arms and air pumps are used to do the actual segregation. Due to the large computational requirements needed to perform this form of segregation, it is difficult to design systems that perform source-level segregation. Still, certain bots have been proposed that run various image processing and classification algorithms to carry out the segregation task [20, 22]. The main focus of this approach is at the industry level. The waste materials are placed on a conveyor belt and a camera takes a photo of the material [13, 15–19, 21]. The photo can be from multiple angles or from 3D cameras [23, 24]. Subsequently, the image is classified using a variety of approaches ranging from feature matching using KNN algorithms to DNA computing algorithms to neural networks. This classification is independent of the composition of the waste material and can work uniformly for metals, plastics, wet waste, etc. The drawback of this approach, though, is that it is not completely accurate. The results are highly dependent on the training dataset that has been provided to the processing module. Moreover, GPU limitations restrict the performance of most systems, as processing a large dataset requires a lot of computational power, and such power can only be provided by supercomputers. This makes the practical implementation of such systems too expensive. Also, since this segregation is dependent on vision-based techniques, the accuracy depends on the current state of the item; that is, if its shape or colour changes from that provided in the initial dataset, the system will fail to recognize it.
4 Conclusion This paper performs a review of the existing techniques in automatic segregation of waste materials for the purpose of efficient waste management. The approaches have been broadly classified into hardware-based and software based, and a vast variety
of techniques (eddy current based, X-ray based, image processing based) have been surveyed. In particular, the prevalent methods for automatic segregation of waste in the period 2000–2019 have been studied. The pros and cons of both approaches have been analysed. The two approaches have been further categorized as source based and industry based, on the basis of where the particular technique is used. From the survey, it can be concluded that an ideal automatic waste segregator can be hardware based or software based, but the limitations need to be overcome. An ideal hardware-based segregator needs a variety of sensors that are capable of working together. An ideal software-based segregator needs to be trained with a large enough dataset so that it can recognize all types of waste objects. While both of these are not possible right now, future research might make them a reality.
References 1. S. Gundupalli, S. Hait, A. Thakur, A review on automated sorting of source-separated municipal solid waste for recycling. Waste Manage. 60, 56–74, 02 (2017) 2. A.W. Larsen, Survey on existing technologies and methods for plastic waste sorting and collection (2012) 3. W. Pereira, S. Parulekar, S. Phaltankar, V. Kambl, Smart bin (waste segregation and optimisation), in 2019 Amity International Conference on Artificial Intelligence (AICAI) (2019), pp. 274–279 4. A. Chandramohan, J. Mendonca, N.R. Shankar, N.U. Baheti, N.K. Krishnan, M.S. Suma, Automated waste segregator, in 2014 Texas Instruments India Educators’ Conference (TIIEC) (2014), pp. 1–6 5. P. Pan, J. Lai, G. Chen, J. Li, M. Zhou, H. Ren, An intelligent garbage bin based on nbiot research mode, in 2018 IEEE International Conference of Safety Produce Informatization (IICSPI) (2018), pp. 113–117 6. G. Aahash, V. Ajay Prasath, D. Gopinath, M. Gunasekaran, Automatic waste segregator using arduino. Int. J. Eng. Res. Technol. (IJERT) Iconnect 6(7) (2018) 7. Md. Abdur Rahman, M.C.M. Bakker, Hybrid sensor for metal grade measurement of a falling stream of solid waste particles. Waste Manage. 32(7), 1316–1323 (2012) 8. A.C. Karaca, A. Ertürk, M.K. Güllü, M. Elmas, S. Ertürk, Automatic waste sorting using shortwave infrared hyperspectral imaging system, in 2013 5th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) (2013), pp. 1–4 9. Edward and Bruno. Automated sorting of plastics for recycling (2000) 10. M. Fellin, M. Negri, R. Zanuttini, Multi-elemental analysis of wood waste using energy dispersive x-ray fluorescence (ed-xrf) analyzer. Eur. J. Wood Wood Prod. 72(2), 199–211 (2014). Mar 11. T. Takezawa, M. Uemoto, K. Itoh, Combination of x-ray transmission and eddy-current testing for the closed-loop recycling of aluminum alloys. J. Mater. Cycles Waste Manage. 17(1), 84–90 (2015). Jan 12. M. Grzegorzek, D. Schwerbel, D. Balthasar, D. Paulus, Automatic sorting of aluminum alloys based on spectroscopy measures (2011) 13. C. Zhihong, Z. Hebin, W. Yanbo, L. Binyan, L. Yu. A vision-based robotic grasping system using deep learning for garbage sorting, in 2017 36th Chinese Control Conference (CCC) (2017), pp. 11223–11226 14. R. Girshick, Fast R-CNN (2015). arXiv:1504.08083
15. S. Sudha, M. Vidhyalakshmi, K. Pavithra, K. Sangeetha, V. Swaathi, An automatic classification method for environment: friendly waste segregation using deep learning, in 2016 IEEE Technological Innovations in ICT for Agriculture and Rural Development (TIAR) (2016), pp. 65–70 16. W. Setiawan, A. Wahyudin, G.R. Widianto, The use of scale invariant feature transform (sift) algorithms to identification garbage images based on product label, in 2017 3rd International Conference on Science in Information Technology (ICSITech) (2017), pp. 336–341 17. L. Omar, R. Oscar, T. Andres, S. Francisco, Multimedia inorganic waste separator, in 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) (2013), pp. 1–4 18. M. Osiur Rahman, A. Hussain, E. Scavino, H. Basri, M.A. Hannan, Intelligent computer vision system for segregating recyclable waste papers. Expert Syst. Appl. 38(8), 10398–10407 (2011) 19. M. Osiur Rahman, A. Hussain, E. Scavino, M.A. Hannan, H. Basri, DNA computer based algorithm for recyclable waste paper segregation. Appl. Soft Comput. 31, 223–240 (2015) 20. R. Cucchiara, C. Grana, M. Piccardi, A. Prati, Detecting moving objects, ghosts, and shadows in video streams. IEEE Trans. Pattern Anal. Mach. Intell. 25(10), 1337–1342 (2003). Oct 21. G.E. Sakr, M. Mokbel, A. Darwich, M.N. Khneisser, A. Hadi, Comparing deep learning and support vector machines for autonomous waste sorting, in 2016 IEEE International Multidisciplinary Conference on Engineering Technology (IMCET) (2016), pp. 207–212 22. A. Salmador, J. Pérez Cid, I. Rodríguez Novelle, Intelligent garbage classifier. Int. J. Interact. Multimed. Artif. Intell. 1(1), 31–36 (2008) 23. S. Koyanaka, Kenichiro Kobayashi, Automatic sorting of lightweight metal scrap by sensing apparent density and three-dimensional shape. Res. Conserv. Recycling 54(9), 571–578 (2010) 24. S. Koyanaka, K. Kobayashi, Incorporation of neural network analysis into a technique for automatically sorting lightweight metal scrap generated by elv shredder facilities. Res. Conserv. Recycling 55(5), 515–523 (2011) 25. J. Huang, T. Pretz, Z. Bian, Intelligent solid waste processing using optical sensor based sorting technology, in 2010 3rd International Congress on Image and Signal Processing, vol. 4 (2010), pp. 1657–1661
Classification of Human Blastocyst Quality Using Wavelets and Transfer Learning Irmawati, Basari, and Dadang Gunawan
Abstract Embryo culture and transfer are the procedures of maturing the embryo and transferring it into the uterus. These are stages in the series of in vitro fertilization (IVF) processes. The selection of good-quality embryos to be implanted is a problem because the blastocyst image is a very intricate texture whose good or poor quality is hard to determine visually. This research implements the pre-trained Inception-v3 network to predict blastocyst quality, with added image pre-processing using wavelets. Using only 249 human blastocyst microscope images, we developed an accurate classifier that can classify blastocyst quality with transfer learning. In experiments with twenty epochs, the training accuracy for raw blastocyst images alone is 95%, and the best training accuracy, obtained with Daubechies 6-tap pre-processing, is 99.29%. Our model was then tested on 14 blastocyst images and classified the two grades with a best accuracy of around 64.29%. Keywords Human blastocyst · Quality classification · Transfer learning · Wavelets
Irmawati · Basari · D. Gunawan (B)
Department of Electrical Engineering, Universitas Indonesia, Depok, Indonesia
e-mail: [email protected]
Irmawati e-mail: [email protected]
Basari e-mail: [email protected]
Basari
Biomedical Engineering Program, Department of Electrical Engineering, Universitas Indonesia, Depok, Indonesia
© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_85

1 Introduction

Medically, in vitro fertilization (IVF) is a process of fertilizing an egg by sperm cells in a fertilization tube. After the egg is successfully fertilized and is in the active
phase, it will be transferred into the uterus; this is the final stage of the IVF process. During embryo culture, the embryo formed after fertilization undergoes a ripening process until it reaches the blastocyst stage, the embryonic development stage 5–6 days after fertilization. After embryo culture, a mature embryo is implanted into the uterus to develop. However, not all embryos can be transferred; it depends on the quality of the blastocyst. The factors influencing the success rate of pregnancy are the number of blastocysts transferred, the grade of the leading embryo transferred, and the mean grade score of the transferred blastocysts [1]. One cause of failure of implantation of the embryo on the uterine wall is that the embryologist chooses an embryo of poor quality. The probability of pregnancy can be increased by transferring more than one embryo; however, multiple pregnancies increase complications for mother and baby. One solution to reduce multiple pregnancies is to transfer a single embryo, but this can reduce the chances of pregnancy [2]. One of the problems in the IVF process is the method of selecting and determining the grade of blastocyst morphology. At present, the selection of blastocysts to be transferred is based on observations under the microscope by embryologists: these are still manual, subjective, and lack precision. A solution to this problem is to apply machine learning to medical imaging and image processing, building neural networks based on deep learning for medical image classification [3]. Currently, deep learning architectures perform well at learning the representations needed for image classification, detection, and processing tasks, primarily owing to convolutional neural networks [4]. When we classify images, we often choose to build our model from scratch so that it matches the input image data we have, but building a deep learning model requires extensive computing resources and a lot of training data. Very deep convolutional networks have become the centre of recent progress in image recognition. The Inception architecture produces excellent performance with relatively low computation, and [3] proposes updates to the inception module to further increase ImageNet classification accuracy. In this paper, we used the Inception-v3 architecture as modified by [4], with the fully connected layers removed from the original network. This concept is called transfer learning; it is used because the training data are so limited that full training from scratch would cause overfitting. With small datasets [5–7], pre-trained networks perform better than random initialization. This is usually the case in medical image classification, because it is difficult to obtain large datasets to train from scratch. This research aims to implement the transfer learning concept using Inception-v3 and to add image pre-processing with Daubechies wavelets for predicting human blastocyst quality. In this research, the authors used 236 images from [1], which were also used by [8]. Prediction, recognition and classification of objects or textures in biomedical images can be done by deep learning methods such as Convolutional Neural Networks
(CNNs). Many methods have been proposed that adopt transfer learning and augmentation techniques for the classification and prediction of medical images, to solve small-dataset problems. Talo et al. [9] proposed transfer learning for automatic classification using the ResNet34 model without pre-processing on Magnetic Resonance (MR) images; their deep transfer learning method obtained 100% accuracy on 613 brain images. Inception-v3 is an obvious choice for classification: [10] applied a fine-tuned pre-trained Inception-v3 model to classify two classes of breast tissue, achieving 90% accuracy. In [11], the authors proposed a survival neural network model based on CNN and RNN to improve clinical outcome predictions by analyzing images from patients with locally advanced non-small-cell lung cancer. Several recent works combine wavelet filters with CNNs to improve the quality of texture or microscope images and thereby increase training accuracy. In [12], the authors proposed an efficient discrete wavelet transform (DWT) to characterize hardwood species; the microscopic images of the hardwood types are decomposed using Daubechies wavelet filters. They obtained the best classification accuracy of 96.80% with db3, better than a local binary pattern filter. Fujieda et al. [13] proposed a combined CNN and wavelet architecture, the wavelet CNN model, which generalizes the pooling and convolution layers to perform spectral analysis with the wavelet transform. The wavelet CNN model has fewer parameters than AlexNet yet obtained an accuracy of 59.8%, whereas AlexNet reached 57.1%. The authors of [14] proposed a classification model that adds a Gabor filter to recognize images; the model uses a covariance matrix to obtain robust features from images of different sizes. Cicconet et al. [15] introduced a cell-tracking and division-detection method based on Morlet-wavelet filters; the method applies only to sequences of frames containing one (centralized) embryo. The paper [8] used a CNN model combined with a Canny edge detector as image pre-processing to detect the highest-quality embryo, obtaining a detection accuracy of 84.62%. Differing from the models mentioned above, we propose a prediction model of blastocyst quality using transfer learning and a wavelet filter. Transfer learning is intended to prevent overfitting caused by our small dataset, and the wavelet filter aims to reduce noise in the blastocyst image. The proposed process has three parts: pre-processing using Daubechies wavelets, augmentation, and transfer learning using the Inception-v3 architecture.
2 Materials and Methods

In this part, we present the framework of the proposed work (see Fig. 1).

Fig. 1 Flowchart representation of the methods used for the proposed research

Given a raw blastocyst image, we first decompose it using the Daubechies wavelet. After the decomposition, an augmentation process applies a range of image manipulation operations, such as flips, zooms, random rotation, and random brightness. In the last step, there
is a transfer learning process in which we use the Inception-v3 model as a pre-trained network, remove its final fully connected layers, and add several additional layers with random initialization [6] to learn from the given medical data.
2.1 Dataset

HMC microscopy images were acquired from the SFU Data Centre [1]. The dataset contains human blastocyst images with two grades (good and poor) and was used for training, validation, and testing. The blastocyst dataset contained 157 good-quality images, and the remaining 78 were of poor quality. The dataset was divided into 80% training data and 20% validation data. The remaining 14 images, consisting of both good and poor blastocyst images, were used for testing. Some images are pre-processed using Daubechies wavelets and compared with raw images as input.
2.2 Wavelets

Applications of the wavelet transform in digital image processing include compression, filtering, and texture analysis. In image processing, texture classification is well studied with spectral analysis that exploits the repetitive structures in many textures; in this research, the blastocyst image is a texture image that usually contains little shape information of the kind used to distinguish objects in image classification tasks. A wavelet transformation is a description of an image or signal using a wavelet function. Wavelet transformations are attractive in various fields for the following reasons: their complexity is linear, the selected wavelet coefficients are sparse, and wavelets can adapt to many types of functions, such as continuous functions. In this research work, we used Daubechies wavelet functions, namely Daubechies 2, Daubechies 4, and Daubechies 6. We chose the Daubechies wavelets because they form orthonormal bases of compactly supported wavelets. The Daubechies wavelet ψ_r produces good results because of the following characteristics [16]:
• ψ_r has compact support on the interval [0, 2r + 1]
• ψ_r has about r/5 continuous derivatives
• ∫_{−∞}^{∞} ψ_r(x) dx = … = ∫_{−∞}^{∞} x^r ψ_r(x) dx = 0.

For integer r, the orthonormal basis [17] for L²(ℝ) is defined as Eq. (1) below:

φ_{r,j,k}(x) = 2^{j/2} φ_r(2^j x − k),  j, k ∈ ℤ    (1)

where j is the scaling index, k is the shifting index, and r is the filter index. So the approximation of a function f ∈ L²(ℝ) is defined as Eq. (2) below:

f_j(x) = ∑_k ⟨f, φ_{r,j,k}⟩ φ_{r,j,k}(x)    (2)

Equation (3) below gives the fluctuations:

d_j(x) = f_{j+1}(x) − f_j(x)    (3)
Figure 2 shows the result of our pre-processing using the Daubechies wavelet filter with coefficients two, four, and six. Fig. 2 Raw blastocyst microscopy image (a) and Daubechies wavelets applied to the raw image (b–d)
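A minimal sketch of this pre-processing step using the PyWavelets library: a single-level 2-D discrete wavelet transform with a Daubechies filter (db2, db4 or db6) decomposes the image, and here the image is rebuilt from the approximation band only as a simple denoiser. Whether the authors keep or threshold the detail bands is not specified, so that part is an assumption.

```python
import numpy as np
import pywt

def daubechies_preprocess(image, wavelet="db6"):
    """Single-level 2-D DWT; return an image rebuilt from the
    approximation band only (detail bands zeroed) as a simple denoiser."""
    cA, (cH, cV, cD) = pywt.dwt2(image.astype(np.float32), wavelet)
    zeros = np.zeros_like(cH)
    return pywt.idwt2((cA, (zeros, zeros, zeros)), wavelet)

# Placeholder array standing in for a blastocyst microscope image.
raw = np.random.default_rng(0).uniform(0, 255, (256, 256))
for w in ("db2", "db4", "db6"):
    filtered = daubechies_preprocess(raw, wavelet=w)
    print(w, filtered.shape)
```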
Fig. 3 Augmentation result using the Keras generator
2.3 Augmentation

Generally, training on a small dataset causes overfitting: the model learns the features of the training set but does not generalize to the validation set. Augmentation is applied to the training data to mitigate this problem by increasing the amount of data available for training. The augmentation techniques manipulate the data by zooming, flipping, and rotating [15]. A result of randomly generated blastocyst images is shown in Fig. 3.
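A minimal Keras sketch of this augmentation stage covering the operations named above (flips, zoom, random rotation and random brightness); the specific parameter ranges and the directory layout are illustrative assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation generator covering flips, zoom, rotation and brightness.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.2,
    rotation_range=30,
    brightness_range=(0.8, 1.2),
)

# Hypothetical directory layout: blastocyst/train/good and blastocyst/train/poor.
train_generator = train_datagen.flow_from_directory(
    "blastocyst/train",
    target_size=(299, 299),   # Inception-v3 input size
    batch_size=16,
    class_mode="binary",
)
```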
2.4 Transfer Learning

Building a model that can predict the quality of a blastocyst image is not easy, especially with the small dataset that we have. Transfer learning repurposes a model trained on one task for another related task, and it can rapidly improve performance when modelling the new task. Transfer learning is a general learning problem rather than an area specific to deep learning. However, transfer learning is
common in deep learning, and it works only if the features learned by the model on the first task are general. Transfer learning is usually done by taking a standard ImageNet architecture along with its previously trained weights and then fine-tuning it on the target task. However, ImageNet classification and the diagnosis of medical images differ considerably. Raghu et al. [14] found that transfer learning offers limited performance improvements and that a much smaller architecture can perform comparably to the standard ImageNet models. The Inception-v3 model is one of the most widely used pre-trained networks for transfer learning; it updates the inception module to further boost ImageNet classification accuracy. Inception-v3 [3] is a deep convolutional architecture designed for classification tasks on ImageNet [15]. Figure 4 shows a diagram of the improved inception module.

Fig. 4 Inception V3 base module [3]

This architecture is based on reducing the convolutions to at most 3 × 3, increasing the overall depth of the network, and using a width-addition technique at each layer for better feature succession.
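A minimal Keras sketch of the transfer-learning setup described here: the pre-trained Inception-v3 network is loaded without its fully connected top, the convolutional base is frozen, and a small randomly initialized head is added for the binary good/poor prediction. The head architecture and optimizer settings are assumptions for illustration, not the authors' exact configuration.

```python
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

# Pre-trained ImageNet weights, final fully connected layers removed.
base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(299, 299, 3))
base.trainable = False            # keep the pre-trained features fixed

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),   # good (near 0) vs poor (near 1)
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# model.fit(train_generator, epochs=20, validation_data=val_generator)
```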
2.5 Tools

The features described above were implemented on Windows 10 using Python 3.0, Keras, TensorFlow and matplotlib version 1.5. All of these libraries were used to perform the classification through deep learning.
3 Results and Discussion

A common problem in machine learning is classification between two or more classes, and in this research we classify the quality of a blastocyst into a good or poor grade. Since this research uses two classification
categories (good or poor), we use binary cross-entropy as the loss function and quantify the performance of the model with a probability between 0 and 1. Figure 5 shows samples of predicted images from our two test cases. The probability of good and poor quality is represented in the range 0–1: a number near 0 indicates high confidence of good quality, and a number near 1 indicates high confidence of poor quality. Table 1 compares the training accuracy and loss values for the raw images and the Daubechies wavelets when using transfer learning; with raw images as input, we get a lower accuracy than when we add the wavelet filter. Table 2 reports the test accuracy; prediction using the wavelet (db6) pre-processing achieves higher accuracy than the raw input images, wavelet (db2), and wavelet (db4). Based on the training results in Table 1, the best model obtained 99.29% accuracy with a loss value of 0.04. According to Table 2, the best test result was an accuracy of 64.29% for the model using pre-processing with wavelet (db6).
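The grading decision from the sigmoid output can be read off with a simple threshold on the predicted probability, mirroring the 0-to-1 interpretation above; the 0.5 cut-off in this sketch is an assumption, since the paper does not state the threshold used.

```python
def grade_from_probability(p, threshold=0.5):
    """Map the sigmoid output to a blastocyst grade: values near 0 mean
    high confidence of good quality, values near 1 mean poor quality."""
    return "poor" if p >= threshold else "good"

for p in (0.08, 0.47, 0.93):
    print(p, "->", grade_from_probability(p))
```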
Fig. 5 Display of the correctly classified images
Table 1 The training accuracy result using 20 epochs

Method used | Training accuracy (%) | Training loss
Raw image | 95.00 | 0.11
Wavelet (db2) | 97.14 | 0.12
Wavelet (db4) | 97.86 | 0.08
Wavelet (db6) | 99.29 | 0.04
Table 2 Test result for prediction of 14 blastocyst images

Method used | Test accuracy (%)
Raw image | 42.86
Wavelet (db2) | 50.00
Wavelet (db4) | 57.14
Wavelet (db6) | 64.29
4 Conclusion

We have employed an Inception-v3 architecture with a Daubechies wavelet as a pre-processing filter and are able to classify human blastocyst quality images into two classes. Augmentation techniques increase the number of input images, making deep learning feasible. Based on our simulation results, the best training accuracy we obtained is 99.29% for a model using the Daubechies wavelet (db6), and our model predicts blastocyst quality on the test set with a best accuracy of 64.29%. The prediction test results are still low because only a small amount of poor-quality image data was available for training, so future work needs ways to address the shortage of data, such as up-sampling and down-sampling to balance the dataset.

Acknowledgements This research publication is supported by Hibah PIT9 Number NKB0052/UN2.R3.1/HKP.05.00/2019 from Universitas Indonesia.
References 1. P. Saeedi, D. Yee, J. Au, J. Havelock, Automatic identification of human blastocyst components via texture. IEEE Trans. Biomed. Eng. 64(12), 2968–2978 (2017) 2. J.C. Rocha et al., Using artificial intelligence to improve the evaluation of human blastocyst morphology, in IJCCI 2017—Proceedings of 9th International Joint Conference on Computational Intelligence (2017), pp. 354–359 3. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision (2015) 4. P. Lakhani, D.L. Gray, C.R. Pett, P. Nagy, G. Shih, Hello world deep learning in medical imaging. J. Digit. Imaging 31(3), 283–289 (2018) 5. N. Tajbakhsh et al., Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans. Med. Imaging 35(5), 1299–1312 (2016) 6. H.-C. Shin et al., Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016) 7. P. Lakhani, B. Sundaram, Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284(2), 574–582 (2017)
8. Irmawati, Basari, D. Gunawan, Automated detection of human blastocyst quality using convolutional neural network and edge detector, in 2019 1st International Conference on Cybernetics and Intelligent System (ICORIS), vol. 1 (2019), pp. 181–184 9. M. Talo, U.B. Baloglu, Ö. Yıldırım, U. Rajendra Acharya, Application of deep transfer learning for automated brain abnormality classification using MR images. Cogn. Syst. Res. 54, 176–188 (2019) 10. N. Singla, K. Dubey, V. Srivastava, Automated assessment of breast cancer margin in optical coherence tomography images via pretrained convolutional neural network. J. Biophotonics 12(3), 1–8 (2019) 11. Y. Xu et al., Deep learning predicts lung cancer treatment response from serial medical imaging. Clin. Cancer Res. 25(11), 3266–3276 (2019) 12. A.R. Yadav, R.S. Anand, M.L. Dewal, S. Gupta, Performance analysis of discrete wavelet transform based first-order statistical texture features for hardwood species classification. Procedia Comput. Sci. 57, 214–221 (2015) 13. S. Fujieda, K. Takayama, T. Hachisuka, Wavelet convolutional neural networks for texture classification (2017) 14. M. Raghu, C. Zhang, J. Kleinberg, S. Bengio, Transfusion: understanding transfer learning for medical imaging (2019) 15. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition (2009), pp. 2–9 16. C.N. Vasconcelos, B.N. Vasconcelos, Increasing deep learning melanoma classification by classical and expert knowledge based image transforms (2017) 17. A. Cohen, I. Daubechies, J.-C. Feauveau, Biorthogonal bases of compactly supported wavelets. Commun. Pure Appl. Math. 45(5), 485–560 (1992)
Affinity-Preserving Integer Projected Fixed Point Under Spectral Technique for Graph Matching Beibei Cui and Jean-Charles Créput
Abstract Establishing the pairwise geometric correspondence between two feature point sets from a reference and a query graph is one of the vital techniques in image processing for identifying and matching targets. In this paper, we introduce a combined graph matching method that uses a candidate-assignment affinity-preserving algorithm and an integer projected fixed point algorithm improved by a spectral technique to realize one-to-one correspondence. Extensive comparison experiments show that it is superior to several image matching algorithms in the presence of deformation noise and outliers. Keywords Image processing · Real and synthetic images · Spectral matching algorithm
B. Cui (B) · J.-C. Créput
CIAD, University of Bourgogne Franche-Comté, UTBM, 90010 Belfort, France
e-mail: [email protected]
J.-C. Créput e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021
S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_86

1 Introduction

Graph matching is an essential problem in computer science. It can be applied to a variety of issues such as pattern recognition, machine learning, and computer vision [1]. Establishing the correspondence between two feature sets is the most basic issue in image matching. Based on feature point extraction [2, 3], two sets of feature points are extracted from the given images; the main task is then to find the corresponding feature point pairs between the reference image and the query image while maintaining the relationships with the other features. Obtaining accurate solutions to this classic problem is challenging. The recent revival of combinatorial optimization methods for feature matching changed this situation; graph matching is mostly expressed as an integer quadratic programming (IQP) problem, for which obtaining an exact solution is computationally intractable. Therefore, the
graph matching problem is confirmed to be non-deterministic polynomial-time hard (NP-hard), and an approximate solution must be sought. Since graph matching based on IQP is NP-hard, various approximate methods have been used to attempt to solve the pairwise similarity correspondence mapping problem. Leordeanu and Hebert provided a spectral matching (SM) algorithm [4] based on the main strong cluster of the adjacency matrix, found using its principal eigenvector. Cour et al. presented a new spectral relaxation technique named spectral matching with affine constraint (SMAC) [5]; it includes a normalization procedure that significantly improves matching accuracy over existing graph matching scores. Zass and Shashua introduced the hypergraph matching (HGM) algorithm [6], where a hypergraph represents the complex relationships. Leordeanu et al. solved the matching problem using the integer projected fixed point (IPFP) algorithm [7], which finds a discrete solution with climbing and convergence properties. The reweighted random walks for graph matching (RRWM) algorithm was introduced by Cho et al. [8]; it combines mapping constraints with a reweighted jumping scheme. Our work provides an improved algorithm for graph matching using affinity preserving and a reconstructed integer projected fixed point improved by a spectral technique (APRIP), based on the IQP formulation. By considering the unary and second-order terms, this algorithm determines the mapping between two graphs to establish a correspondence. It reflects the geometric similarity relationship between the pairwise matching features while retaining as many attributes as possible. The paper is organized as follows. In Sect. 2, the problem formulation of graph matching is described in detail. The proposed APRIP algorithm is presented in Sect. 3. The performance of the proposed algorithm is evaluated in Sect. 4. Section 5 presents the conclusion.
2 Problem Formulation

Given two sets of features P and Q in a reference graph GP and a query graph GQ, with feature points i, j ∈ P and a, b ∈ Q, we denote their set of candidate correspondence pairs by L. If a corresponding point pair formed between P and Q belongs to L, it can be characterized as an inlier; otherwise, such anomalous match pairs are treated as outliers. The main problem is therefore to identify a suitable mapping constraint C, used for corresponding one feature point of one set to one or more feature points of the other set. Under this mapping constraint, each candidate assignment such as e1 expresses how well a feature i ∈ P corresponds to a feature a ∈ Q. Meanwhile, each pair of candidate assignments (e1, e2) expresses how compatible the correspondences e1 = (i, a) and e2 = (j, b) are. The affinity matrix M, also known as a similarity matrix, is a basic statistical construct used to organize the mutual similarities between sets of feature points. The measurement of affinity can be interpreted as the probability of a pairwise correlation; for example, if two feature points have close attributes, then their affinity
score will be much bigger than that of two feature points with less similar correspondence. That is to say, if two correspondences are compatible under the mapping constraints C, the corresponding affinity entry is set to 1; otherwise, M(e1, e2) = 0 for incompatible pairs, where M(e1, e2) is the affinity of the pairwise assignment between the reference image and the query image. If all the candidate assignments are considered as nodes forming an undirected graph, then the pairwise affinity M(e1, e2) can be considered as the weight on an edge and the individual affinity M(e1, e1) as the weight on a node; therefore, M is the affinity matrix of that undirected weighted graph. Typically, M is an n × n sparse, positive and symmetric matrix, where M(e1, e2) = M(e2, e1), n = m * N_P, N_P is the total number of feature points in P, and m is the average number of candidate correspondences per feature between the reference and query images. Each feature in the reference image can have several candidate assignments in the corresponding query image. The goal of graph matching is to find a proper mapping constraint C between the reference and query graphs GP = (P, E_P) and GQ = (Q, E_Q), where E represents the edges, ij ∈ E_P, ab ∈ E_Q. A graph matching score S [9] can be defined by the following IQP problem:
S = ∑_{ij∼ab} f(ij, ab) = ∑_{ij∼ab} M(ia, jb) = x^T M x    (1)
where f(·,·) measures the similarity between the attributes of corresponding edges of graphs GP and GQ, and x is an indicator vector that encodes the correspondence between P and Q. The purpose of the graph matching problem can then be regarded as computing an optimal solution x* [10] that maximizes the graph matching score, as shown in the following equation:

x* = arg max (x^T M x),  x ∈ {0, 1}^{N_P N_Q},  ∀i: ∑_{a=1}^{N_Q} x_{ia} ≤ 1,  ∀a: ∑_{i=1}^{N_P} x_{ia} ≤ 1    (2)
where N_P and N_Q denote the total numbers of features of GP and GQ. Usually, we impose a one-to-one constraint on x, such that x_ia = 1 means feature i in P matches feature a in Q, and x_ia = 0 otherwise. The affinity matrix M [4], which consists of the relational similarity values between edges and nodes, is constructed to represent the attribute values. Since the matching score here is purely pairwise geometric, the individual affinity is set to M(e1, e1) = 0, as there is no information about unary affinity; that is, all diagonal values of the affinity matrix are zero. For the pairwise affinity M(e1, e2), the following pairwise-distance score is used:

M(ia, jb) = 4.5 − (d_ij − d_ab)^2 / (2σ_d^2),   if |d_ij − d_ab| ≤ 3σ_d   (3)
where d_ij and d_ab are the Euclidean distances between the corresponding points of the two graphs, and the parameter σ_d controls the sensitivity of the score to deformation.
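To make the construction of the pairwise affinity concrete, the following is a minimal NumPy sketch of Eq. (3); the function name, the candidate indexing ia = i·N_Q + a, and the dense n × n layout are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def pairwise_affinity(P, Q, sigma_d=0.5):
    """Build the affinity matrix M of Eq. (3).

    P, Q: (N_P, 2) and (N_Q, 2) arrays of 2-D feature points.
    Candidate assignments are enumerated as ia = i * N_Q + a, so M is
    (N_P*N_Q) x (N_P*N_Q); diagonal entries stay 0 (no unary affinity).
    """
    dP = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)  # d_ij
    dQ = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=-1)  # d_ab
    nP, nQ = len(P), len(Q)
    M = np.zeros((nP * nQ, nP * nQ))
    for i in range(nP):
        for a in range(nQ):
            for j in range(nP):
                for b in range(nQ):
                    if i == j or a == b:
                        continue  # skip identical / conflicting assignments
                    diff = abs(dP[i, j] - dQ[a, b])
                    if diff <= 3 * sigma_d:
                        M[i * nQ + a, j * nQ + b] = 4.5 - diff**2 / (2 * sigma_d**2)
    return M
```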
3 Algorithm

The proposed iterative APRIP algorithm mainly consists of three parts: the affinity-preserving process, the initial solution produced by an efficient spectral method, and the improved integer projected fixed point algorithm. Drawing on the idea of PageRank [11, 12], where each outgoing hyperlink of node i is normalized to 1/d_i so that all pages hold the same total output weight, the affinity matrix M is converted in this paper to a row-stochastic matrix P. We define

P = M / max_{ia,jb}(M_{ia,jb} − min M_{ia,jb}),   (4)
so that all corresponding pairs are normalized by the same maximum degree. This not only maintains the original affinities but also converts the affinity matrix into a random transition matrix, so the relative affinity relations of the candidate assignment pairs are retained very well. The spectral technique for graph matching with pairwise constraints resembles a greedy algorithm: it splits the set of candidate assignments into accepted and rejected assignments. First, we initialize the solution vector x and the set of all candidate assignments L. The details are shown in Algorithm 1, where x is the principal eigenvector of M and e_k is one assignment in L. This discriminating loop removes outliers and stops after sufficient assignments have been selected. In this paper, the output x of spectral matching is used as the initial solution x* of the subsequent improved integer projected fixed point algorithm.

Algorithm 1 Spectral matching algorithm
Require: x, e_k, L
1: while L ≠ ∅ do
2:   e* = argmax_{e_k ∈ L} x(e_k);
3:   if x(e*) = 0 then
4:     return x;
5:   else
6:     set x(e*) = 1 and remove e* from L;
7:     remove assignments in conflict with e* from L;
8:   end if
9: end while
10: return x
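A minimal Python sketch of Algorithm 1 is given below, assuming the same candidate indexing as the affinity sketch above; the use of np.linalg.eigh for the principal eigenvector is an implementation choice, not prescribed by the paper.

```python
import numpy as np

def spectral_matching(M, n_P, n_Q):
    """Greedy discretization of the principal eigenvector of M (Algorithm 1).

    Returns a binary indicator vector x of length n_P * n_Q, where the
    assignment ia is indexed as i * n_Q + a.
    """
    vals, vecs = np.linalg.eigh(M)          # M is symmetric
    v = np.abs(vecs[:, np.argmax(vals)])    # principal eigenvector
    assert len(v) == n_P * n_Q

    x = np.zeros_like(v)
    L = set(range(len(v)))                  # remaining candidate assignments
    while L:
        e = max(L, key=lambda k: v[k])      # e* = argmax over remaining assignments
        if v[e] == 0:                       # no informative assignments left
            break
        x[e] = 1
        i, a = divmod(e, n_Q)
        # remove e* and every assignment in conflict with it (same i or same a)
        L = {k for k in L
             if divmod(k, n_Q)[0] != i and divmod(k, n_Q)[1] != a}
    return x
```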
Our improved integer projected fixed point (IIPFP) algorithm is a series of linear assignment problems in which the next solution is obtained from the previous one.
After taking the output x of spectral matching as the initial solution x*, a corresponding quadratic score S* is obtained under the discrete constraints described in Formula (1). The projection operation P_d, realized with the Hungarian method, enforces the one-to-one discrete constraint, since every binary vector in the given discrete domain has the same norm. The pseudo-code is shown in Algorithm 2, where t is the maximum number of iterations and k is the current iteration. The algorithm is essentially a series of linear assignment problems in which the next solution x_{k+1} is found from the previous solution x_k. In step 6, the linear approximation is maximized in the discrete domain, and the intermediate vector y_{k+1} is the discrete vector produced by the projection P_d; its role is to provide an ascent direction. Along this direction, the original quadratic score can be further maximized, since the optimal point along it is found in steps 7–10. The simplified discriminant condition r = min{1, |C/D|} saves runtime and yields a more precise solution. The loop from step 11 ensures that the quadratic score S* moves closer and closer to the optimal discrete solution, since the binary solution returned is never worse than the initial one. Finally, the algorithm reaches a stable optimal solution x*.

Algorithm 2 IIPFP
Require: x, M, t
1: k = 0;
2: x* ← x;
3: S* = (x*)^T M x*;
4: while k ≤ t do
5:   x_k ← x*;
6:   y_{k+1} = P_d(M x_k);
7:   C = x_k^T M (y_{k+1} − x_k);
8:   D = (y_{k+1} − x_k)^T M (y_{k+1} − x_k);
9:   r = min{1, |C/D|};
10:  x_{k+1} = x_k + r (y_{k+1} − x_k);
11:  if y_{k+1}^T M y_{k+1} ≥ S* then
12:    S* = y_{k+1}^T M y_{k+1};
13:    x* = y_{k+1};
14:  end if
15:  if x_{k+1} = x_k then
16:    return x*;
17:  end if
18:  k = k + 1;
19: end while
20: return x*
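The following sketch mirrors Algorithm 2, assuming SciPy's Hungarian solver as the projection P_d; the guard for D = 0 is an added safeguard that is not part of the pseudo-code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_project(v, n_P, n_Q):
    """Projection P_d: map a continuous score vector onto a one-to-one
    binary assignment by maximizing the linear score (Hungarian method)."""
    rows, cols = linear_sum_assignment(v.reshape(n_P, n_Q), maximize=True)
    y = np.zeros(n_P * n_Q)
    y[rows * n_Q + cols] = 1.0
    return y

def iipfp(M, x0, n_P, n_Q, t=50):
    """Improved integer projected fixed point iteration (Algorithm 2)."""
    x_star = x0.copy()
    S_star = x_star @ M @ x_star
    x_k = x_star
    for _ in range(t):
        y = hungarian_project(M @ x_k, n_P, n_Q)        # step 6
        C = x_k @ M @ (y - x_k)                          # step 7
        D = (y - x_k) @ M @ (y - x_k)                    # step 8
        r = 1.0 if D == 0 else min(1.0, abs(C / D))      # step 9 (D == 0 guard added)
        x_next = x_k + r * (y - x_k)                     # step 10
        if y @ M @ y >= S_star:                          # steps 11-14
            S_star, x_star = y @ M @ y, y
        if np.allclose(x_next, x_k):                     # steps 15-17
            break
        x_k = x_next
    return x_star
```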
4 Experiment

In this section, we perform graph matching experiments on real image matching, synthetic image matching, and CMU house image matching. The proposed APRIP algorithm is compared with state-of-the-art methods such as RRWM, SM, IPFP, HGM, and HADGA.
Table 1 Contrast experimental data

Methods   Accuracy   Score   Time (s)
RRWM      73.61      96.52   0.16
SM        64.45      77.23   0.02
IPFP      69.77      97.81   0.61
APRIP     72.40      95.80   0.06
All of these algorithms share the same input images, affinity matrices, and ground truths as input for each comparison group. After the primary image matching process, the greedy algorithm is typically applied as the final post-discretization step for all compared algorithms.
4.1 Real Image Matching

In the real image matching experiment, we use the dataset created by Cho for RRWM,1 which consists of 30 different kinds of image pairs. All ground truths of the corresponding candidates are manually pre-labeled. Accuracy and objective score are the main judging criteria for matching. Based on the matching theory above, the optimal objective score is obtained from Formula (2). Accuracy is obtained by dividing the number of correctly detected matches by the total number of ground-truth matches:

Accuracy = Σ(x* ∗ X_GT) / Σ X_GT(:),   (5)

where ∗ denotes the element-wise product of the matching result x* and the ground-truth indicator X_GT. Table 1 shows the final average results over the 30 image pairs for the RRWM, SM, IPFP, and APRIP algorithms. From Table 1, we can see that the proposed APRIP algorithm performs much better than SM. Although APRIP is roughly on par with RRWM in accuracy and score, it requires less computing time than RRWM on real images. Moreover, APRIP improves on IPFP in both running time and accuracy, mainly because the spectral technique provides an effective initial solution and therefore reduces the number of iterations. Figure 1 shows the visual map of the feature point connections for image matching. APRIP outperforms RRWM and IPFP on some pictures of the dataset; in most cases, the matching results of RRWM, SM, IPFP, and APRIP are comparable. The correct matches and wrong matches are marked with yellow lines and black lines, respectively.
1 https://cv.snu.ac.kr/research/RRWM/.
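A small sketch of the accuracy measure in Eq. (5), assuming x* and X_GT are binary assignment matrices of the same shape:

```python
import numpy as np

def matching_accuracy(x_star, X_GT):
    """Accuracy of Eq. (5): correctly detected matches over all ground-truth matches.
    x_star and X_GT are binary assignment matrices of shape (N_P, N_Q)."""
    return np.sum(x_star * X_GT) / np.sum(X_GT)
```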
Fig. 1 From the left to the right: a RRWM algorithm, b SM algorithm, c IPFP algorithm, and d APRIP algorithm for graph matching. The yellow lines represent the correct matching pairs, and the black lines represent the wrong matches
4.2 Synthetic Image Matching

For synthetic image matching, we artificially construct two images that have already passed the feature point extraction step, so each image finally contains n_in inlier nodes and n_out outliers. This experiment is divided into three interference tests: deformation noise, outliers, and edge density. The deformation noise is generated with the Gaussian distribution N(0, σ²), where σ varies from 0 to 0.2 with an interval of 0.02 while the number of inliers n_in is set to 20, 30, and 40, respectively, as shown in Fig. 2. In the outlier test, the number of outliers varies from 0 to 20 with an interval of two while the deformation noise σ is set to 0, 0.1, and 0.2, as shown in Fig. 3. In the edge density test, shown in Fig. 4, the edges of the reference image are adjusted by varying the edge density from 0.1 to 1 with an interval of 0.1. As these results show, the APRIP and RRWM algorithms have almost identical objective score and accuracy, but APRIP costs less time. Overall, the proposed method (red line) performs well under changes of inlier count and deformation noise, and it responds relatively better in terms of running time.

2 http://vasc.ri.cmu.edu/idb/html/motion/.
Fig. 2 From top to bottom are the deformation noise tests according to the inlier points change under the evaluation of a accuracy, b objective score, and c time
4.3 CMU House Image Matching

In the experiment using the CMU house sequence dataset,2 matching is performed between different viewing angles of the same object. The 110 pictures in this dataset are divided into different sequence gaps (from 10 to 100 with an interval of 10), so we finally obtain ten sets of image pairs. Each set consists of an initial fixed-position picture (sequence 1) and its varying viewpoints after rotation at different angles. To evaluate matching accuracy, 30 iconic feature points were manually tracked and marked on all frames as ground truth. The results summarized in Table 2 show a failure case in which RRWM obtains a much better score than APRIP, although both achieve good accuracy. Figure 5a shows a pair of selected test pictures before and after rotation; however, there are many mismatched pairs for APRIP in this case, as shown in Fig. 5b. From these results, we conclude that the choice of dataset has a clear influence on the experimental results: APRIP fails to achieve a better score due to the drastic rotation.
Fig. 3 From top to bottom are the outliers tests according to the deformation noise change under the evaluation of a accuracy, b objective score, and c time
Fig. 4 From left to right are the edge density tests under the evaluation of a accuracy, b objective score, and c time

Table 2 CMU house image matching

Methods   Accuracy   Score    Time (s)
RRWM      92.61      100.00   0.53
APRIP     91.66      79.51    0.49
Fig. 5 CMU house dataset matching result
5 Conclusion

In this paper, a new graph matching algorithm was proposed. First, the affinity-preserving step normalizes the affinity matrix by the same maximum degree, which preserves the relative affinity relations between the reference and query graphs very well. Second, the spectral matching algorithm is applied and its output is used as the initial solution of the next step. Third, the improved integer projected fixed point algorithm realizes the main matching iteration loop. Finally, the greedy algorithm is typically used for the final post-discretization. The experimental results show that the proposed algorithm outperforms some existing algorithms in matching accuracy, objective score, and running time in the presence of deformation noise and outliers.
References 1. J. Lee, M. Cho, K.M. Lee, Hyper-graph matching via reweighted random walks, in CVPR 2011 (IEEE, 2011), pp. 1633–1640 2. D.G. Lowe, Object recognition from local scale-invariant features. ICCV 99(2), 1150–1157 (1999) 3. J. Matas, O. Chum, M. Urban, T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004) 4. M. Leordeanu, M. Hebert, A spectral technique for correspondence problems using pairwise constraints, in Tenth IEEE International Conference on Computer Vision, vol. 2 (2005), pp. 1482–1489 5. T. Cour, P. Srinivasan, J. Shi, Balanced graph matching, in Advances in Neural Information Processing Systems (2007), pp. 313–320 6. R. Zass, A. Shashua, Probabilistic graph and hypergraph matching, in 2008 IEEE Conference on Computer Vision and Pattern Recognition (2008), pp. 1–8 7. M. Leordeanu, M. Hebert, R. Sukthankar, An integer projected fixed point method for graph matching and map inference, in Advances in Neural Information Processing Systems (2009), pp. 1114–1122 8. M. Cho, J. Lee, K.M. Lee, Reweighted random walks for graph matching, in European Conference on Computer Vision (Springer, Berlin, Heidelberg, 2010), pp. 492–505 9. J. Lee, M. Cho, K.M. Lee, A graph matching algorithm using data-driven markov chain Monte Carlo sampling, in 2010 20th International Conference on Pattern Recognition (2010), pp. 2816–2819 10. Y. Suh, M. Cho, K.M. Lee, Graph matching via sequential Monte Carlo, in European Conference on Computer Vision (2012), pp. 624–637
11. H. Tong, C. Faloutsos, J.Y. Pan, Fast random walk with restart and its applications, in Sixth International Conference on Data Mining (ICDM’06) (2006), pp. 613–622 12. A.D. Sarma, A.R. Molla, G. Pandurangan, E. Upfal, Fast distributed pagerank computation, in International Conference on Distributed Computing and Networking (2013), pp. 11–26
A New Optimized GA-RBF Neural Network Algorithm for Oil Spill Detection in SAR Images Vishal Goyal and Aasheesh Shukla
Abstract The marine ecosystem is seriously affected by the illegal discharge of oil spills. Marine oil spills can be detected through SAR image processing, where the most important indicators are detection accuracy and efficiency. In the marine environment, oil spills are detected using Synthetic Aperture Radar (SAR) images, which are not affected by cloud cover or weather conditions. The backscatter value of a very calm sea area, however, is often close to that of an oil spill, since an oil spill dampens capillary and short-gravity waves. Various techniques are used for oil spill detection; they detect dark areas, which have a high probability of being an oil spill. These methods involve a great deal of non-linearity, which makes the process complex. Neural networks can handle non-linear data effectively in a multi-dimensional input space, and their use in remote sensing is increasing; they do not require a well-organized explicit relation between input and output but compute this relationship themselves. In this work, a new optimized Radial Basis Function (RBF) neural network algorithm based on a Genetic Algorithm, termed the GA-RBF algorithm, is proposed. The structure and weights of the RBF neural network are optimized by the genetic algorithm, with hybrid encoding and optimization carried out simultaneously. Oil spill detection is performed by training on various SAR image samples. Experiments show that the proposed technique achieves high efficiency and accuracy.

Keywords Genetic algorithm · Radial basis function (RBF) · Tamura · Feature · Gray-level co-occurrence matrix · Oil spill · Extreme learning machine
V. Goyal (B) · A. Shukla Department of Electronics and Communication Engineering, GLA University, Mathura 281406, India e-mail: [email protected] A. Shukla e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_87
1 Introduction

As one of the most significant sources of marine pollution, oil spills have caused serious environmental and economic impacts on the ocean and coastal zones [1]. Oil spills near the coast can be caused by ship accidents, explosions of oil rig platforms, broken pipelines, and the intentional discharge of tank-cleaning wastewater from ships. The NEREIDs program, supported by the European Commission, was the first rigorous attempt to use shipping, geological, and metocean data to characterize oil spills in one of the major oil exploration zones of the world, prior to any major oil spill accident. Based on these data, oil spill models were built to simulate the evolution and trajectories of oil spills, investigate the vulnerability of the coastal zone, and find suitable measures to mitigate their impact on the environment [2]. Early warning and near real-time monitoring of oil spills play a significant role in cleaning up oil spills and mitigating their impact on coastal environments [2]. Synthetic aperture radar (SAR) is one of the most promising remote sensing systems for oil spill monitoring, because it can provide valuable information about the position and size of the spill [1]. In addition, its wide coverage and all-day, all-weather capability make SAR very suitable for large-scale oil spill monitoring and early warning [3]. In their early stages, studies of oil spill detection were mostly based on single-polarization SAR images. The theoretical basis of SAR oil spill detection is that the presence of oil on the sea surface dampens short-gravity and capillary waves, so the Bragg scattering from the sea surface is largely weakened. The ideal sea-surface wind speed for oil spill detection is 3–14 m/s [4]. As a result, oil spills appear as "dark" areas in SAR images. However, some other man-made or natural phenomena can produce very similar low-scattering regions on the sea surface, e.g., biogenic slicks, waves, currents, and low-wind zones. Traditional oil spill detection procedures use intensity, morphological texture, and auxiliary information to distinguish mineral oil from its look-alikes, with a processing chain divided into three main steps [4]: (1) dark spot detection; (2) feature extraction; and (3) classification between mineral oil and look-alikes. Several studies aiming at semi-automatic or automatic oil spill detection can be found in the literature [5]. These studies first detect, manually or with edge filtering, dark areas in the image that could be oil spills. If not supported by visual inspection [6], dark area detection requires a threshold wind speed sufficient to generate the sea state [7]. The degree of the sea-state conditions is therefore incorporated into the estimation of the strength of the contrast signal that an oil spill yields. Once dark areas are detected, statistical classification techniques (e.g., Bayesian) are applied to characterize the dark areas as oil spills or "look-alike" objects. For this purpose, various spectral and spatial features of the dark regions (geometric, contextual, backscattering, and so on) must first be estimated. In most studies, classification methods are usually
applied only to the dark areas, treating them as objects [7], while dark area detection methods rely on pixel-based processing. The transition from the detection step to the characterization step requires user interaction in terms of masking, coding, and selecting the dark objects before proceeding to the classification stage. Therefore, this study proposes a new optimized Radial Basis Function (RBF) neural network algorithm based on a genetic algorithm (GA), called the GA-RBF algorithm, which uses the genetic algorithm to optimize the weights and structure of the RBF neural network; it adopts hybrid encoding and optimizes both simultaneously.
2 Proposed Methodology

The developed method was applied to an ERS 1 image captured on 1/6/1992 (orbit 4589, frame 2961). The image represents a rough sea surface, capable of producing a strong contrast signal in the presence of oil spills. It also contains look-alikes in the left part, caused by different sea states (local wind falls in a large swell wave). In the implemented experiments, it was observed that the number of inputs significantly influences the computational time, due to the increased size and complexity of the neural network. An ERS scene (about 120 Mb) requires 5 h of processing, while an image window of 4–16 Mb requires 2–5 min. The method was therefore applied to image windows of 4–16 Mb to test its performance in terms of time requirements and result quality. The main aspects considered for oil spill detection using neural networks were data preparation, network architecture selection, parameter estimation, and network performance assessment. In a previous study, a detailed evaluation of the contribution of features to oil spill detection was performed [5]. Features that have been shown to lead to successful oil spill detection were extracted from the SAR image, and a preparation step was necessary for these features to be useful to the neural network. An initial network was then chosen and trained for each configuration. The image results were compared with reference data to assess the accuracy of the method. The network architecture was changed iteratively, adding a node (input layer or neuron) and re-evaluating the method.
2.1 Preparing the Data

Preparing the data includes feature extraction and normalizing the data into a specific interval (for instance [0, 1]) according to the minimum and maximum values of each feature. The purpose of feature extraction is to map an image to a feature space that can serve as the basis for further processing [8]. In order for the neural network to be functional and the classification procedure to be straightforward, the inputs of the neural network were images. Thus, several images were produced from the original SAR image, each presenting a
Fig. 1 Inputs to neural networks
texture or geometry key feature. Five images were chosen according to their performance in oil spill classification: the original SAR image, the shape texture, the asymmetry, the mean difference to neighbors, and the power-to-mean images (Fig. 1). The shape texture image refers to the texture based on the spectral information provided by the original image layer, calculated as the standard deviation of the different mean values of the previously created image objects. The power-to-mean ratio is defined as the ratio of the standard deviation to the mean value of the objects.
2.2 Proposed GA-RBF Approach for Classification

For both MLP and RBF neural networks, an initial network topology was chosen. The selection of the most appropriate topology for each NN was organized through a hill-climbing approach, which uses as a search point a solution built from a previous topology. The constructive algorithm was used, in which the initial topology
Fig. 2 The flow chart of genetic algorithm
was the simplest one, and nodes were added afterwards. The performance of each topology was assessed, and the procedure was repeated iteratively until a predetermined stopping criterion was reached. The constructive algorithm was chosen over other hill-climbing algorithms (for example, pruning) because it is very easy to define the initial NN topology and it is significantly faster in terms of training time.
2.3 The Basic Theory of the Genetic Algorithm

A genetic algorithm starts from a population that represents a set of potential solutions; the population is composed of a certain number of gene-encoded individuals, i.e., entities with characteristic chromosomes. The main issues in constructing a genetic algorithm are the encoding method for the solutions and the design of the genetic operators. For different optimization problems, different encoding methods and genetic operators must be used, so these choices, together with the level of understanding of the problem to be solved, are the key factors that determine whether the use of a genetic algorithm can succeed. It is an iterative method: in every cycle it holds a set of candidate solutions, sorts them by their quality, then selects some of the solutions according to certain criteria and applies genetic operators to them to produce a new generation of candidate solutions. The genetic algorithm procedure is shown in Fig. 2.
2.4 The Basic Theory of the RBF Neural Network

An RBF network is a three-layer feed-forward network in which the hidden-layer space is constructed by using radial basis functions as the "basis" of the hidden layer. Around the computed RBF centers, the hidden layer exhibits a non-linear, locally distributed response. The
Fig. 3 The topology structure of RBF neural network
three layers are the input, hidden, and output layers. The topology of the RBF network is shown in Fig. 3. The activation function of the hidden layer is the radial basis function, with the Gaussian function used in general. Consider a network with n inputs and m outputs, with s neurons in the hidden layer; w_ij represents the weight of the connection between the input and hidden layers, and w_jk the weight of the connection between the hidden and output layers. The training procedure of the RBF network is divided into two stages: the first step is to learn the weights w_ij without a teacher (unsupervised), and the second step is to learn the weights w_jk with a teacher (supervised). Determining the number of hidden-layer neurons is a key issue; generally, training starts from 0 neurons and the number of hidden neurons is increased automatically by checking the error, repeating this procedure until the required accuracy or the maximum number of hidden-layer neurons is reached.
2.5 The Idea of the GA-RBF Algorithm

Compared with a BP network, an RBF neural network adjusts its hidden layer self-adaptively during training according to the specific problem: the distribution, category, and size of the training samples decide the allocation of the hidden-layer neurons, and the centers and widths of these neurons are identified dynamically. In contrast, the architecture of a BP network does not change once it has been fixed, and computing the number of hidden layers and their neurons is very difficult; RBF networks therefore exhibit superior performance to BP networks. The fundamental components of using a genetic algorithm to improve the RBF network are the chromosome coding, the definition of the fitness function, and the construction of the genetic operators. The GA-RBF optimization algorithm can be viewed as an adaptive system: it automatically adjusts its network structure
Fig. 4 The flow chart of GA-RBF algorithm
and connection weights without human intervention, making it possible to combine the genetic algorithm with the neural network naturally, as shown in Fig. 4.
2.6 Chromosome Encoding

Assume that s represents the maximum number of neurons in the hidden layer of the RBF neural network and m the number of output neurons. The binary coding of the hidden-layer neurons is

c_1, c_2, ..., c_s   (1)

Binary encoding is used to encode the number of hidden-layer neurons: each c_i takes the value 1 or 0, where 1 corresponds to the existence of the neuron and 0 to its non-existence, with s as the upper limit. The real-valued encoding of the weights is

w_11 w_21 ... w_s1 w_12 w_22 ... w_s2 ... w_1m w_2m ... w_sm   (2)

Real-number encoding is used to encode the weights between the hidden and output layers, where w_ij represents the weight of the connection between the ith output neuron and the jth hidden neuron. The real-valued encoding of the thresholds is

θ_1 θ_2 ... θ_m   (3)

Real-number encoding is also used for the thresholds of the output-layer neurons, where θ_j is the threshold of the jth output neuron. Combining the structure bits, connection weights, and thresholds gives the entire chromosome coding string, expressed as

c_1 c_2 ... c_s w_11 w_21 ... w_s1 w_12 w_22 ... w_s2 ... w_1m w_2m ... w_sm θ_1 θ_2 ... θ_m   (4)
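As an illustration of the hybrid encoding in Eqs. (1)–(4), the following sketch generates one random chromosome; the dictionary layout is an assumed convenience, not the authors' data structure.

```python
import numpy as np

def random_chromosome(s_max, m):
    """Hybrid chromosome of Eq. (4): s_max structure bits (binary), followed by
    s_max * m real-valued output weights and m real-valued output thresholds."""
    return {
        "structure":  np.random.randint(0, 2, size=s_max),  # c_1 ... c_s
        "weights":    np.random.randn(s_max, m),             # w_jk (hidden -> output)
        "thresholds": np.random.randn(m),                    # theta_1 ... theta_m
    }
```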
2.7 Constructing Genetic Operators

1. Selection Operator. A proportional selection operator is chosen in this work. In genetic algorithms, roulette-wheel selection is generally used: an individual with a high fitness value has a high probability of being selected, while individuals with low fitness may still be selected occasionally, which corresponds to "survival of the fittest".
2. Crossover Operator. A single-point crossover operator is used. Two new individuals are generated by crossing over two selected parent individuals and are added to the new generation; this procedure is repeated until the new generation reaches the maximum population size. We use single-point crossover even though the whole system uses hybrid encoding, since the crossover operation is the same for binary and real encoding. An elitism strategy is also used, that is, several individuals with the highest fitness are passed directly to the next generation; this prevents the loss of the best individuals during evolution.
3. Mutation Operator. Mutation operates on the hybrid encoding, applying different operations to the different code systems: bit-flip mutation is used for the binary encoding and Gaussian mutation for the real encoding, in which a random Gaussian number is added to some genes of the chromosome.
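A compact sketch of the three operators, reusing the chromosome layout assumed above; it presumes non-negative fitness values and treats the flip probability and mutation strength as external choices.

```python
import numpy as np

def roulette_select(population, fitness):
    """Proportional (roulette-wheel) selection: pick one parent with
    probability proportional to its fitness (assumed non-negative)."""
    p = np.asarray(fitness, dtype=float)
    p = p / p.sum()
    return population[np.random.choice(len(population), p=p)]

def single_point_crossover(a, b):
    """Single-point crossover on the binary structure part of two chromosomes."""
    point = np.random.randint(1, len(a["structure"]))
    child = {k: v.copy() for k, v in a.items()}
    child["structure"][point:] = b["structure"][point:]
    return child

def mutate(c, p_flip=0.02, sigma=0.1):
    """Bit-flip mutation on the binary part, Gaussian mutation on the real parts."""
    flips = np.random.rand(len(c["structure"])) < p_flip
    c["structure"][flips] ^= 1
    c["weights"] += sigma * np.random.randn(*c["weights"].shape)
    c["thresholds"] += sigma * np.random.randn(*c["thresholds"].shape)
    return c
```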
2.8 Calculating the Fitness

Genetic selection is based on the evaluation of a fitness function, which directly affects the performance of the genetic algorithm; choosing the fitness function is therefore crucial, as it directly affects the convergence speed of the genetic algorithm and whether the optimum solution can be found. The original data are split into training and testing sets, and the training error of the network is used together with the number of hidden-layer neurons to compute the fitness of each chromosome of the RBF neural network. If E represents the training error, s the number of hidden-layer neurons, and s_max the upper limit on the number of hidden-layer neurons, the fitness is defined as

F = C − E × s / s_max,   (5)

where C is a constant. With this expression, a chromosome obtains a high fitness value when both the size of the network and the training error are small.
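The fitness of Eq. (5) reduces to a one-liner; the constant C = 100 here is an arbitrary illustrative value.

```python
def fitness(train_error, n_hidden, s_max, C=100.0):
    """Fitness of Eq. (5): F = C - E * s / s_max, so a smaller training error and
    a smaller hidden layer both yield a larger fitness. C is an assumed constant."""
    return C - train_error * n_hidden / s_max
```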
3 Parameters of the RBF Neural Network

Three parameters can be adjusted in a classical RBF neural network: the centers and widths of the basis functions in the hidden layer, and the weights of the connections between the hidden and output layers. The following rules are adopted for constructing the classical RBF neural network.

1. Basis Function Centers. If the problem is well represented by the distribution of the training samples, the centers of the basis functions are selected by experience, with d denoting their spacing. The width of the Gaussian functions is expressed as

σ = d / √(2s)   (6)
2. Basis Functions. The basis functions are selected using the K-means clustering method, with the center of each cluster taken as a basis function center. The LMS method is then used directly to compute the weights of the output layer, which is a linear unit. The training error is measured by Eq. (7), and the optimal network is the one that minimizes it:

e = Σ_{k=1}^{n} (t_k − y_k)²   (7)

Here, e is the error function, t_k is the actual (target) value, and y_k is the output of the neural network.
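A minimal sketch of this classical RBF construction, assuming the spread d in Eq. (6) is taken as the maximum distance between the K-means centers and using a least-squares solve for the LMS output weights:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def train_rbf(X, T, s):
    """Train a simple RBF network: K-means centres, width from Eq. (6),
    and output weights by linear least squares (LMS solution)."""
    centres, _ = kmeans2(X, s, minit='++')
    # assumption: d is the maximum pairwise distance between the centres
    d = np.max(np.linalg.norm(centres[:, None] - centres[None, :], axis=-1))
    sigma = d / np.sqrt(2 * s)                        # Eq. (6)
    Phi = np.exp(-np.linalg.norm(X[:, None] - centres[None, :], axis=-1) ** 2
                 / (2 * sigma ** 2))                  # Gaussian hidden activations
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)       # output weights w_jk
    return centres, sigma, W

def rbf_predict(X, centres, sigma, W):
    Phi = np.exp(-np.linalg.norm(X[:, None] - centres[None, :], axis=-1) ** 2
                 / (2 * sigma ** 2))
    return Phi @ W

def training_error(Y, T):
    """Eq. (7): e = sum_k (t_k - y_k)^2."""
    return np.sum((T - Y) ** 2)
```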
3.1 Basic Steps of the GA-RBF Algorithm

The following describes the steps of the GA-RBF neural network algorithm.

Step 1. The RBF neural network is set up with a given number of hidden-layer neurons; the basis function centers are computed with a K-means clustering algorithm, and their widths are computed using Eq. (6).
Step 2. The GA parameters are set: population size, crossover rate, mutation rate, selection mechanism, crossover and mutation operators, objective-function error, and iteration count.
Step 3. A population P of size N is initialized randomly, and every individual is encoded according to a network using Eq. (4).
Step 4. Each initially constructed RBF neural network is trained using the training samples, and the output error E of the network is computed using Eq. (7).
Step 5. For every network, the fitness of the chromosome is computed using Eq. (5), based on the number of hidden-layer neurons and the training error.
Step 6. The chromosomes are sorted by fitness value, and the best fitness in the population is selected and denoted by F_b. It is then verified whether the stopping condition is satisfied (the error falls below E_min or G ≥ G_max); if it is, go to Step 10, otherwise go to Step 7.
Step 7. Several of the best individuals are reserved directly for the next generation (elitism).
Step 8. Pairs of chromosomes are selected for single-point crossover to generate members of the next generation, two members at a time; this procedure is repeated until the new generation reaches the maximum population size P_s. The binary and real parts are coded separately at this stage.
Step 9. The population of the new generation is mutated, with different mutation techniques used for the real-number and binary coding. A new population is generated; set P = New P and G = G + 1, and go back to Step 4.
Step 10. The optimum structure of the neural network has been obtained; the genetic algorithm iteration is terminated, which corresponds to the end of the optimization.
Step 11. The weights learned for the new neural network are not yet sufficient, so the weights are further refined with the LMS method. The algorithm then ends.

In the new model, the network structure is optimized, the number of hidden-layer neurons is computed, the basis function centers are computed, and the thresholds and connection weights are optimized. These modifications save network running time and improve the convergence and training speed, so the operating efficiency of the network also increases. In the proposed method, the network topology is chosen with one input and one output node. For each network topology, the presence of oil spills is identified from the classified images, and every produced image is compared with the reference dataset. A photo-interpretation method is used to produce the reference dataset, and confusion matrices are used for the comparison: a confusion matrix is produced for every image and the overall accuracy is computed as the ratio of correctly classified pixels to the total number of pixels.
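Putting the pieces together, the skeleton below follows Steps 1–11 at a high level; evaluate_rbf_error is a hypothetical helper (wrapping the RBF training and error sketches above), and the elitism size and loop details are simplifying assumptions.

```python
import numpy as np

def ga_rbf(X, T, s_max, m, pop_size=20, G_max=50, E_min=1e-3):
    """Skeleton of the GA-RBF loop: evolve the hidden-layer structure and
    output weights, then refine the best network afterwards (Step 11).
    Relies on the helper sketches above (random_chromosome, roulette_select,
    single_point_crossover, mutate, fitness)."""
    population = [random_chromosome(s_max, m) for _ in range(pop_size)]
    best, best_fit = None, -np.inf
    for G in range(G_max):
        fits, errs = [], []
        for c in population:
            # evaluate_rbf_error is a hypothetical helper that trains the RBF
            # encoded by c on (X, T) and returns its Eq. (7) training error
            err = evaluate_rbf_error(c, X, T)                       # Step 4
            fits.append(fitness(err, int(c["structure"].sum()), s_max))  # Step 5
            errs.append(err)
        i_best = int(np.argmax(fits))
        if fits[i_best] > best_fit:
            best, best_fit = population[i_best], fits[i_best]
        if min(errs) < E_min:                                       # Step 6
            break
        elite = [population[i] for i in np.argsort(fits)[-2:]]      # Step 7
        children = []
        while len(children) < pop_size - len(elite):                # Step 8
            a = roulette_select(population, fits)
            b = roulette_select(population, fits)
            children.append(single_point_crossover(a, b))
        population = elite + [mutate(c) for c in children]          # Step 9
    return best   # Steps 10-11: refine the returned network's weights by LMS
```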
4 Experimental Results

To evaluate the proposed algorithm, the data selected for this paper are derived from 43 SAR images of marine oil spills acquired by the Envisat satellite. All data come with an accurate delineation of the oil spill regions, as illustrated in Fig. 5; oil spill regions in the ground truth are represented by pixels with a value of zero. A PC with Windows 10, 8 GB of memory, and a 256 GB SSD (solid-state disk) is used for the experiments. From these 43 images, we randomly select 35 images and their corresponding ground truths for training the genetic-algorithm-based Radial Basis Function (GA-RBF) network, and the remaining 8 images are taken as test images. In the
Fig. 5 a Original oil spill image and b its ground truth
model, 35 nodes are determined in the hidden layer, and the activation function is a sigmoid function. To compute the accuracy of the proposed algorithm, the region fitting error (RFE) is introduced [9], defined as

RFE = ((P ∪ GT) − (P ∩ GT)) / GT,   (8)

where P stands for the result of the proposed algorithm and GT stands for the ground truth. A small RFE value indicates a highly accurate segmentation, and a perfect segmentation has an RFE equal to zero. Some results are shown in Fig. 6, where red dashes denote the difference from the ground truth, and the RFE results are given in Table 1. As Table 1 shows, for the four original SAR images in Fig. 6a the proposed algorithm effectively extracts the oil spill regions, which are basically consistent with the ground truth, and all RFE values are acceptable.
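A small sketch of the RFE computation on binary masks; note that it assumes masks where True marks oil-spill pixels, whereas the dataset described above stores spill pixels as zeros, so a conversion step would be needed.

```python
import numpy as np

def region_fitting_error(P, GT):
    """RFE of Eq. (8) on binary masks: |P ∪ GT| - |P ∩ GT|, divided by |GT|.
    P and GT are boolean arrays of the same shape (True = oil-spill pixel)."""
    union = np.logical_or(P, GT).sum()
    inter = np.logical_and(P, GT).sum()
    return (union - inter) / GT.sum()
```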
5 Conclusion

In this research, SAR images are used to investigate oil spill identification by neural network-based optimization. The neural networks are fed the original image together with the derived feature images. For classification, GA-RBF requires only a small amount of memory, and its generalization is far better than that of a plain RBF network. Images with various oil spill types and sea states require additional examination, and the performance of recurrent networks and Support Vector Machines also deserves further investigation.
Fig. 6 Results on the original images: a four original SAR images, b four ground truths corresponding to a, c four results of the proposed algorithm on the input images

Table 1 RFE of four SAR images using the proposed algorithm compared with ground truth

Image   Upper left   Upper right   Lower left   Lower right
RFE     0.14         0.07          0.13         0.07
References 1. M. Fingas, The Basics of Oil Spill Cleanup (Lewis Publisher, Boca Raton, FL, USA, 2001) 2. T.M. Alves, E. Kokinou, G. Zodiatis, R. Lardner, C. Panagiotakis, H. Radhakrishnan, Modelling of oil spills in confined maritime basins: the case for early response in the Eastern Mediterranean Sea. Environ. Pollut. 206, 390–399 (2015) 3. W.H. Alpers, A. Espedal, Oils and surfactants, in Synthetic Aperture Radar Marine User’s Manual, ed. by R.J. Christopher, J.R. Apel (US Department Commerce, Washington, DC, USA, 2004) 4. A.H.S. Solberg, Remote sensing of ocean oil-spill pollution. Proc. IEEE 100, 2931–2945 (2012) 5. K. Topouzelis, V. Karathanassi, P. Pavlakis, D. Rokos, Oil spill detection: SAR multi-scale segmentation & object features evaluation, in 9th International Symposium on Remote Sensing— SPIE, 23–27 Sept 2002, Crete, Greece 6. J. Lu, H. Lim, S. Liew, M. Bao, L. Kwoh, Ocean oil pollution mapping with ERS synthetic aperture radar imagery, in IEEE IGARSS 1999 Proceedings (1999), pp. 212–214 7. F. Del Frate, A. Petrocchi, J. Lichtenegger, G. Calabresi, Neural networks for oil spill detection using ERS-SAR data. IEEE Trans. Geosci. Remote Sens. 38(5), 2282–2287 (2000) 8. I. Kanellopoulos, G. Wilkinson, F. Roli, J. Austin, Neurocomputation in Remote Sensing Data Analysis (Springer, 1997) 9. X. Yu, H. Zhang, C. Luo, H. Qi, P. Ren, Oil spill segmentation via adversarial $f$-divergence learning. IEEE Trans. Geosci. Remote Sens. 56(9), 4973–4988 (2018)
Survey of Occluded and Unoccluded Face Recognition Shiye Xu
Abstract Computer vision has developed rapidly in recent years alongside the development of AI, as people want AI to be able to see. Among all aspects of computer vision, face recognition is one of the most important for AI. It is mainly used to recognize faces for security purposes, such as face-based payment. In addition, more and more criminals deliberately cover their faces so that surveillance cameras cannot recognize them; computer face recognition may help with this problem, since recognizing faces manually is both limited and exhausting for humans. In the last decade, face recognition has mainly relied on linear or nonlinear models or early neural network methods, which achieve relatively low accuracy and use small databases. Moreover, some of these methods cannot handle pictures that are occluded or of low quality. This paper reviews recent research trends in face recognition for unobstructed, occluded, and low-quality pictures. To make it easier to follow, tables of different applications, databases, and methods are provided. The paper concludes by discussing some challenges that still exist in face recognition as well as some expectations for future research.

Keywords Face recognition · Occluded · Low-quality · Unobstructed · Databases
1 Introduction

Face recognition has been a very active research area in computer vision for many years. There exist many different face image databases, related competitions, and models for solving this problem. The problem has many aspects, and here we discuss three of them: the original picture, meaning a picture that contains only the face and is neither occluded nor of low quality; the occluded picture; and the low-quality picture. Many ways of solving the problem for the original picture already exist, ranging from early methods such as

S. Xu (B) University of Nottingham, Ningbo, China e-mail: [email protected]
© Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_88
Table 1 Typical applications of face recognition

Areas                              Specific applications
Entertainment                      Video games, virtual reality, training programs; human-robot interaction, human-computer interaction
Smart cards                        Drivers' licenses, entitlement programs; immigration, national ID, passports, voter registration; welfare fraud
Information security               TV parental control, personal device logon, desktop logon; application security, database security, file encryption; intranet security, internet access, medical records; secure trading terminals
Law enforcement and surveillance   Advanced video surveillance, CCTV control; portal control, post-event analysis; shoplifting, suspect tracking, and investigation
Library                            Access control system
PCA [1] or LDA [1] to deep learning. The methods have changed from year to year, and by now nearly all work uses deep learning, which gives high performance for face recognition. However, for occluded and low-quality pictures there are still problems, as the accuracy is not as high as for the original pictures. The main concern is that the computer may not be able to understand what is occluded or how to turn a low-quality picture into a high-quality one: the computer only receives pixel values, which do not directly represent the information needed for the occlusion problem. Although some problems still exist in face recognition, it already has many real-world applications in many areas, including entertainment, security, and others. Table 1 lists some applications of face recognition in the real world.
2 Literature Review

2.1 Unobstructed Method

2.1.1 Database
Over the past twenty years, many databases for face recognition research have been created by different countries and institutions. The most widely used ones are listed in Table 2. Among them, the FERET database is clearly used the most by researchers, not only because it contains many images but also because it provides many other properties of the images: it has pictures with different expressions, poses, illuminations, and ages that can be used when a detailed algorithm for face recognition, rather than just the standard techniques, needs to be studied, since solutions may be needed for detecting images with different properties. Besides FERET, the other databases listed in Table 2 are all worth studying, especially the CMU-PIE database, which is quite useful for assessing the accuracy of a model; if only one database were used, researchers might have difficulty establishing whether the model can really detect the pictures. There are also many other databases that focus more on gray-level images, images with different poses and illuminations, or 3D images. Researchers can therefore choose the database that benefits them most and also try other databases to check whether their models can handle different images and what accuracy they achieve on different databases. Table 2 lists some of the databases used in face recognition.

Table 2 Database [2]

Database                      Details
FERET database                14,126 images with 1199 individuals with different expressions, poses, illuminations and ages
XM2VTS database               1180 color images, 2360 side-profile images
ORL database                  400 images of 40 people with different poses, expressions, and accessories
BioID database                1512 images with different illuminations and background where the locations of eyes have been marked
MIT face database             2592 images of 16 individuals with different poses, illuminations, and sizes
UMIST face database           564 images of 20 people with different poses and a range of race/sex/appearance
CMU face detection database   208 images with 441 faces of which 347 are profile views
CMU-PIE database              75,000 images of 337 individuals with different expressions, illuminations and poses
Bern database                 10 gray images of 30 individuals with different poses
AR database                   126 individuals' images with different illuminations, poses, and expressions
Yale face database            165 grayscale images of 15 individuals with different facial expression or configuration
Yale face database B          5850 images of 10 individuals with 9 poses and 64 illuminations
LFW database                  13,233 images of 5749 individuals
WDRef database                99,773 images of 2995 individuals
CelebFaces database           202,599 images of 10,177 individuals
Ours database                 2.6 M images of 2622 individuals
Facebook database             4.4 M images of 4030 individuals
Google database               200 M images of 8 M individuals
2.1.2 Traditional Scenario
In the early years, before the establishment of deep learning, researchers basically used PCA to reduce the dimensionality of the feature space, describing each sample with a smaller number of features. PCA can be seen as a way to reduce the computer's workload: each picture may have more than 10,000 pixels and a database may contain more than 1000 pictures, so handling the raw data could require a space of more than 10,000,000 values, which is too large for an ordinary computer. PCA can reduce this number to around 1000 or 10,000, since many pixels in the images carry little information. Besides PCA, another method called LDA is used to emphasize the difference between images of different classes: it maximizes the difference between images in different classes while minimizing the difference between images in the same class. It is mostly used for dimensionality reduction and can also be used for classification. These two methods are linear projection methods; since face appearance is nonlinear, nonlinear projection methods such as kernel PCA or kernel LDA can also be used. There is also a method named ICA [3], used for separating data, which is likewise useful in face recognition. However, these methods depend on the scenarios of the training and testing sets and cannot handle the problems of illumination, expression, pose, and so on, so they are hardly used these days. After this stage, researchers began to use hand-crafted features together with a classifier. Many hand-crafted features have been used to recognize images, such as HOG, SIFT, Gabor [4], and LBP features. Among these, the most typical is the LBP feature, which is simple but efficient; it solves part of the illumination problem, though it still does not handle pose and expression. As for the classifier, there are many choices, such as neural networks, SVMs, or various Bayesian classifiers; these are all mature solutions that by themselves do not have a decisive influence on face recognition. Table 3 shows the performance of traditional methods on different databases.
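Before turning to the reported results, here is a minimal NumPy sketch of the PCA-style dimensionality reduction described above (an eigenfaces-like projection); the function name and the use of SVD are illustrative choices rather than any specific cited method.

```python
import numpy as np

def pca_project(X, k):
    """Eigenfaces-style PCA: project flattened face images onto the top-k
    principal components to reduce dimensionality before classification.
    X has shape (n_images, n_pixels)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data gives the principal directions (rows of Vt)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    return Xc @ components.T, mean, components

# a nearest-neighbour classifier in the reduced space is then a common choice
```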
Table 3 Traditional method [5]

Method              Databases     Accuracy (%)
DLA [6]             Property DB   90.3
Fisher faces [7]    YALE          99.6
PDBNN [8]           SCR           100
EGM [9]             FERET         80
WPA [10]            MIT           80.5
PCA [1]             AR-faces      70
LDA [1]             AR-faces      88
Direct LDA [11]     ORL           90.8
IFS [12]            MIT           90
ICA [3]             FERET         89
RBF [13]            ORL           98.1
DF-LDA [14]         ORL           96
HMM [15]            FERET         97
Th-infrared [16]    Property DB   98
Hyperspectral [17]  Property DB   92
Gabor EFM [4]       FERET         99
Thermal [18]        Property DB   93
Th-spectrum [19]    Equinox       86.8
DCV [20]            YALE          97.33

2.1.3 Deep Learning Method
After deep learning achieved tremendous results in the ILSVRC-2012 competition, most researchers began to apply it to face recognition. The first step was CNN technology, which can extract more useful features than hand-crafted ones. At this stage, researchers mostly experimented with different network structures or different input data and then used CNNs to train classical recognition models. Not long after, the improvements shifted mainly to the loss functions, which help the CNN extract features that better distinguish two different images. DeepFace, proposed by Facebook in 2014, is a milestone of deep learning in face recognition. It mainly uses the Softmax function for optimization and uses feature embedding to obtain robust face feature vectors. It also uses a backbone network with multiple local convolutions to learn different characteristics of the same faces; however, this backbone only increases the computation without increasing the accuracy. Despite these disadvantages, DeepFace uses a dataset of 4 M images of 4000 people, far more than a dataset of 400 images of 40 people, and its accuracy is nearly 97.5%, which was a huge success in
face recognition and demonstrates that a large dataset is beneficial. Google then released FaceNet, which uses a triplet loss function instead of Softmax and obtains a compact 128-dimensional face feature; this reduces the data size and achieves higher accuracy than DeepFace, reaching 99.63% on the LFW database. After this, many other deep learning methods followed, such as VGGFace, Face++, and others. These methods not only increase the accuracy of the models but also increase the number of images used to build them, since a model trained on only a few images is less convincing than the others. In recent years the development of deep learning has accelerated further, with accuracy reaching about 99.8% in some models, which demonstrates the importance of applying deep learning to face recognition. Table 4 lists some deep learning methods used on the LFW and YTF databases and their accuracy; a sketch of the triplet loss mentioned above follows the table.

Table 4 Deep learning method [21]

Method                                                                                                      Database   Accuracy (%)
Fisher vector faces [22]                                                                                    LFW        93.1
DeepFace (Taigman et al., DeepFace: closing the gap to human-level performance in face verification) [23]   LFW        97.35
Fusion (Taigman et al., web-scale training for face identification) [24]                                    LFW        98.37
DeepID-2 [25]                                                                                               LFW        99.15
DeepID-3 [26]                                                                                               LFW        99.53
FaceNet [27]                                                                                                LFW        98.87
FaceNet + Alignment [27]                                                                                    LFW        99.63
VGGFace [28]                                                                                                LFW        98.95
Light CNN [29]                                                                                              LFW        98.8
SparseConvNets (Sun et al., sparsifying neural network connections for face recognition) [30]               LFW        99.55
Range loss [31]                                                                                             LFW        99.52
CoCo loss [32]                                                                                              LFW        99.86
Arcface [33]                                                                                                LFW        99.83
Video Fisher vector faces [34]                                                                              YTF        83.8
DeepFace (Taigman et al., DeepFace: closing the gap to human-level performance in face verification) [23]   YTF        91.4
DeepID-2, 2+, 3 [25, 26]                                                                                    YTF        93.2
FaceNet + Alignment [27]                                                                                    YTF        95.1
VGGFace (K = 100) [28]                                                                                      YTF        91.6
VGGFace (K = 100) + Embedding learning [28]                                                                 YTF        97.3
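As referenced before the table, here is a minimal sketch of a FaceNet-style triplet loss on precomputed embeddings; the margin value and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style triplet loss on L2-normalized embeddings:
    pull anchor-positive pairs together and push anchor-negative pairs
    apart by at least a margin. Shapes: (batch, d)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.mean(np.maximum(d_pos - d_neg + margin, 0.0))
```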
2.2 Obstructed Scenario

These pictures mainly contain a human face that is not fully visible or appears in different poses, for example when some black ink on a picture prevents you from recognizing whose picture it is. Such pictures are clearly much harder to recognize than unobstructed ones. The main challenge is that the computer may not be able to tell which part is the face and which part is the ink, and it is hard to teach the computer what "occluded" means: it may treat a picture of a person wearing glasses as occluded, which we would not, or treat a very noisy picture as a recognizable human face even though a human could not recognize it. The computer can handle operations on pixel values but may not be able to interpret the difference between two pixels. Below are some basic approaches to this problem as well as the databases that have been used.
2.2.1 Database

The databases used here are mainly those listed in Table 2 that contain images with occlusion. In this section, we mainly focus on the following databases: FERET, CMU-PIE, Multi-PIE, and LFW.
2.2.2 Existing Approach

There are many approaches to occluded face recognition. More than seven kinds of features and many classifiers have been used for this problem; the most widely used features are PCANet [35] and DeepID (Sun et al., Deep Learning Face Representation from Predicting 10,000 Classes) [36], both of which are deep learning methods, since deep learning is now the most effective way to solve this problem. As for classifiers, many are useful, such as CRC [37], NN, GRRC [38], RNR [39], CESR [40], and MEC [41]; of these, NN and CRC appear to be the most effective for this problem. As Table 5 shows, for the Extended Yale B database the best method is NN + PCANet regardless of the percentage of occlusion, while for the LFW database the best methods are CRC + DeepID and NN + PCANet. For Extended Yale B, when the occlusion rate is lower than 80% the accuracy is above 88%, which is very high and shows that detection on this database is very mature; for LFW, however, the accuracy is only high when less than 20% of the picture is occluded, which shows that further methods are still needed for this database. Table 5 lists some of the best methods for occluded pictures on the Extended Yale B and LFW databases.
Table 5 Some best methods for occluded pictures

Database          Occluded rate (%)   Best method                                                                                              Accuracy (%)
Extended Yale B   0–30                NN + PCANet [35]                                                                                         100
                  40                  NN + PCANet [35]                                                                                         99.90
                  50                  NN + PCANet [35]                                                                                         99.63
                  60                  NN + PCANet [35]                                                                                         99.21
                  70                  NN + PCANet [35]                                                                                         96.60
                  80                  NN + PCANet [35]                                                                                         88.19
                  90                  NN + PCANet [35]                                                                                         50.73
LFW               0                   CRC [37] + DeepID (Sun et al., deep learning face representation from predicting 10,000 classes) [36]   100
                  10                  CRC [37] + DeepID [36]                                                                                   98.74
                  20                  CRC [37] + DeepID [36]                                                                                   78.99
                  30                  CRC [37] + DeepID [36]                                                                                   53.48
                  40                  NN + PCANet [35]                                                                                         34.18
                  50                  NN + PCANet [35]                                                                                         24.64
                  60                  NN + PCANet [35]                                                                                         16.87
2.3 Low-Quality Picture Scenario

2.3.1 Database

Since there are no dedicated databases of low-quality pictures, researchers mainly resize the images to a small scale or add a defined amount of blur to turn them into low-quality images. There are two types of datasets: constrained datasets, collected under controlled environments, and unconstrained datasets, where the data are collected without the subjects' cooperation, yielding random poses, varying resolutions, and different subjective quality levels. Table 6 lists some databases used for low-quality picture recognition.
2.3.2 Performance

It can be seen from the datasets that in most cases the unconstrained data seem to give higher accuracy than the constrained data. Also, for the FRGC and FERET datasets, the accuracy clearly increases when the data (pixels) are fewer, which suggests that simply enlarging the data does not obviously increase recognition accuracy. Regarding pose and illumination variations, the performance would be expected to degrade; however, when tested on the Multi-PIE dataset, the accuracy drops but not dramatically, staying at around 60%, and some
Table 6 Database

Constrained
Name            Type             Individuals                        Images                        Variations
FRGC            Static/3D        222 (training), 466 (validation)   12,776                        Background
CMU-PIE         Static           68                                 41,368                        Pose, illumination, expression
CMU Multi-PIE   Static           337                                750,000                       Pose, illumination, expression
Yale-B          Static           10                                 5850                          Pose, illumination
CAS-PEAL-R1     Static           1040                               30,900                        Pose, illumination, accessory, background
AR              Static           126                                4000                          Illumination, expression, occlusion
ORL             Static           40                                 400                           Illumination, accessory
HeadPose        Static           15                                 2790                          Pose, accessory

Unconstrained
Name            Type             Individuals                        Images                        Variations
PaSC            Static + video   293/265                            9376 (static), 2802 (video)   Environment
Scface          Static           130                                4160                          Visible and infrared spectrum
EBOLO           Video            9 and 213 distractors              114,966
UCCSface        Static           308                                6357
YTF             Video            1595                               3425 (video sequences)
LFW             Static           5749                               13,000
CFPW            Static           500                                6000                          Pose
it is still near 90%. As for the unconstrained data, most accuracies are lower than for the constrained data, at around 60%, although for some data they reach 90%. The challenge for unconstrained low-quality face recognition lies in the training process and the recognition rate: without controlled conditions it is difficult to collect ground-truth images of consistent quality, and the algorithms cannot capture the real face distribution in the images, so they may fail to recognize faces accurately under unconstrained conditions. Table 7 lists some of the methods (mostly neural networks) evaluated on different databases together with their accuracy.

Table 7  Performance

Constrained (FRGC, FERET, CMU-PIE, CMU Multi-PIE):
  Gallery resolution (pixels) | Probe resolution (pixels)    | Accuracy (%)
  56 * 48 [42]                | 7 * 6                        | 77.0
  64 * 64 [43]                | Random real-world image blur | 84.2
  128 * 128 [44]              | 16 * 16                      | 100.0
  32 * 32 [45]                | 8 * 8                        | 81.0
  64 * 64 [43]                | Random artificial blur       | 97.1
  128 * 128 [46]              | 64 * 64, Gaussian blur       | 88.6
  128 * 128 [46]              | 64 * 64, Gaussian blur       | 87.4
  48 * 40 [47]                | 7 * 6                        | 78.0
  32 * 28 [48]                | 8 * 7                        | 98.2
  80 * 60 [49]                | Random artificial blur       | 61.3
  48 * 40 [50]                | 8 * 6                        | 53.0
  65 * 55 [51]                | 20 * 18                      | 89.0
  30 * 24 [52]                | 15 * 12                      | 52.7
  48 * 48 [53]                | 16 * 16                      | 81.5

Unconstrained:
  Database | Gallery resolution (pixels) | Probe resolution (pixels) | Accuracy (%)
  SCface   | 128 * 128 [44]              | 16 * 16                   | 12.2
  UCCSface | 80 * 80 [54]                | 16 * 16                   | 59.0
  LFW      | 72 * 64 [55]                | 12 * 14                   | 66.19
  LFW      | 112 * 96 [56]               | 12 * 14                   | 98.25
  LFW      | 224 * 224 [57]              | 20 * 20                   | 90
  YTF      | 112 * 96 [56]               | 12 * 17                   | 93
3 Challenges

3.1 Pose (Facial Expression and Posture)

The methods discussed here are designed mainly for pictures in which the face is frontal rather than turned in another direction; frontal images provide more information, while images of faces turned away do not provide the same information. Also, if there are expressions on the face, the detection
will also be affected, as expressions hide some information, which makes it harder for the recognition method to identify the true face. Both factors still need to be considered in face recognition.
3.2 Change (Aging, Plastic Surgery, and Skin Color)

Apart from the challenge of pose, changes in the human face are another important factor in face recognition. For people who have aged or become heavily tanned, the method may fail to recognize the face correctly, because wrinkles or a change of skin color make the face harder to match. Likewise, plastic surgery can introduce small changes to a face, and recognizing the new face is also challenging.
3.3 Security

The security issue mainly concerns spoofing: now that face recognition is widely deployed, some people use face-spoofing techniques to make the computer believe that the image presented is a real person. This can lead to serious problems, because many people pay with their faces; if payment can be triggered simply by showing an image, anyone who obtains that image could pay with it, which harms the legitimate owner.
4 Conclusion

Although face recognition for non-occluded, occluded, and low-quality pictures is fairly mature and can reach high accuracy, the accuracy is still not high enough in every setting, and several other challenges remain to be solved. This paper has reviewed existing face recognition methods for non-occluded, occluded, and low-quality pictures, their accuracy on different databases, and their advantages and disadvantages, in the hope of helping others learn this area and of making face recognition more mature in the future.
References 1. A.M. Martínez, A.C. Kak, Pca Versus Lda. IEEE Trans. Pattern Anal. Mach. Intell. 23(2), 228–233 (2001) 2. W. Zhao et al., Face recognition: a literature survey. ACM Comput. Surv. 35(4), 399–458 (2003) 3. M.S. Bartlett, J.R. Movellan, T.J. Sejnowski, Face recognition by independent component analysis. IEEE Trans. Neural Netw. 13(6), 1450–1464 (2002) 4. C. Liu, Gabor-Based kernel PCA with fractional power polynomial models for face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 5, 572–581 (2004) 5. A.F. Andrea et al., 2d and 3d face recognition: a survey. Pattern Recognit. Lett. 28(14), 1885– 1906 (2007) 6. M. Lades et al., Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Comput 3, 300–311 (1993) 7. P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces versus fisherfaces: recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 7, 711–720 (1997) 8. S.H. Lin, S.Y. Kung, L.J. Lin, Face recognition/detection by probabilistic decision-based neural network. IEEE Trans. Neural Netw. 8(1), 114–132 (1997) 9. L. Wiskott et al., Face recognition by elastic bunch graph matching. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 775–779 (1997) 10. C. Garcia, G. Zikos, G. Tziritas, Wavelet packet analysis for face recognition. Image Vis. Comput. 18(4), 289–297 (2000) 11. H. Yu, J. Yang, A direct LDA algorithm for high-dimensional data—with application to face recognition. Pattern Recognit. 34(10), 2067–2070 (2001) 12. H. Ebrahimpour-Komleh, V. Chandran, S. Sridharan, Face recognition using fractal codes, in International Conference on Image Processing (2001) 13. Er.M.J. Meng et al., Face recognition with radial basis function (RBF) neural networks. IEEE Trans. Neural Netw. 13(3), 697–710 (2002) 14. J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos, Face recognition using LDA-based algorithms. IEEE Trans. Neural Netw. 14(1), 195–200 (2003) 15. F. Perronnin, J.-L. Dugelay, An introduction to biometrics and face recognition, in Image: ELearning, Understanding, Information Retrieval, Medical (World Scientific), pp. 1–20 (2003) 16. X. Chen, P.J. Flynn, K.W. Bowyer, PCA-based face recognition in infrared imagery: baseline and comparative studies, in 2003 IEEE International SOI Conference. Proceedings (Cat. No. 03CH37443) (2003) 17. Z. Pan et al., Face recognition in hyperspectral images. IEEE Trans. Pattern Anal. Mach. Intell. 25(12), 1552–1560 (2003) 18. D.A. Socolinsky, A. Selinger, Thermal face recognition over time, in Proceedings of the 17th International Conference on Pattern Recognition, ICPR, IEEE (2004) 19. P. Buddharaju, I. Pavlidis, I. Kakadiaris, Face recognition in the thermal infrared spectrum, in 2004 Conference on Computer Vision and Pattern Recognition Workshop (2004) 20. H. Cevikalp et al., Discriminative common vectors for face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 27(1), 4–13 (2005) 21. W. Mei, W. Deng, Deep face recognition: a survey. arXiv preprint arXiv:1804.06655 (2018) 22. K. Simonyan, A. Vedaldi, A. Zisserman, Learning local feature descriptors using convex optimisation. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1573–1585 (2014) 23. Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf, Deepface: closing the gap to human-level performance in face verification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014) 24. Y. Taigman, M. Yang, M.A. Ranzato, L. 
Wolf, Web-scale training for face identification, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015) 25. Y. Sun, Y. Chen, X. Wang, X. Tang, Deep learning face representation by joint identificationverification, in Advances in Neural Information Processing Systems (2014) 26. Y. Sun et al., Deepid3: face recognition with very deep neural networks. arXiv preprint arXiv: 1502.00873 (2015)
27. F. Schroff, D. Kalenichenko, J. Philbin, Facenet: a unified embedding for face recognition and clustering, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015) 28. O.M. Parkhi, A. Vedaldi, A. Zisserman, Deep Face Recognition. BMVC (2015) 29. X. Wu et al., A light cnn for deep face representation with noisy labels. IEEE Trans. Inf. Forensics Secur. 13(11), 2884–2896 (2018) 30. Y. Sun, X. Wang, X. Tang, Sparsifying neural network connections for face recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) 31. X. Zhang, Z. Fang, Y. Wen, Z. Li, Y. Qiao, Range loss for deep face recognition with long-tailed training data, in Proceedings of the IEEE International Conference on Computer Vision (2017) 32. Y. Liu, H. Li, X. Wang, Rethinking feature discrimination and polymerization for large-scale recognition. arXiv preprint arXiv:1710.00870 (2017) 33. J. Deng, J. Guo, N. Xue, S. Zafeiriou, Arcface: additive angular margin loss for deep face recognition, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019) 34. O.M. Parkhi, K. Simonyan, A. Vedaldi, A. Zisserman, A compact and discriminative face track descriptor, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014) 35. T.H. Chan et al., PCANet: a simple deep learning baseline for image classification? IEEE Trans. Image Proc. 24(12), 5017–5032 (2015) 36. Y. Sun, X. Wang, X. Tang, Deep learning face representation from predicting 10,000 classes, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014) 37. L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which helps face recognition? in 2011 International Conference on Computer Vision IEEE (2011) 38. M. Yang et al., Gabor feature based robust representation and classification for face recognition with gabor occlusion dictionary. Pattern Recognit. 46(7), 1865–1878 (2013) 39. Q. Jianjun et al., Robust nuclear norm regularized regression for face recognition with occlusion. Pattern Recognit. 48(10), 3145–3159 (2015) 40. R. He, W.S. Zheng, B.G. Hu, Maximum correntropy criterion for robust face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1561–1576 (2011) 41. R. Liang, X.X. Li, Mixed error coding for face recognition with mixed occlusions, in International Conference on Artificial Intelligence (2015) 42. L. Pei et al., Face recognition in low quality images: a survey. arXiv E-prints Web. (1 May 2018) 43. R. Gopalan et al., A blur-robust descriptor with applications to face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 34(6), 1220–1226 (2012) 44. M. Haghighat, M. Abdel-Mottaleb, Low resolution face recognition in surveillance systems using discriminant correlation analysis, in 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017) (2017) 45. P. Sinha et al., Face recognition by humans: nineteen results all computer vision researchers should know about. Proc. IEEE 94(11), 1948–1962 (2006) 46. J. Li, C. Zhang, J. Hu, W. Deng, Blur-robust face recognition via transformation learning, in Asian Conference on Computer Vision (Springer) (2014) 47. S. Shekhar, V.M. Patel, R. Chellappa, Synthesis-based robust low resolution face recognition. arXiv preprint arXiv:1707.02733 (2017) 48. J. Jiang, R. Hu, Z. Han, L. Chen, J. 
Chen, Coupled discriminant multi-manifold analysis with application to low-resolution face recognition, in International Conference on Multimedia Modeling (Springer) (2015) 49. H. Zhang, J. Yang, Y. Zhang, N.M. Nasrabadi, T.S. Huang, Close the loop: joint blind image restoration and recognition with sparse representation prior, in 2011 International Conference on Computer Vision IEEE (2011) 50. S. Biswas, K.W. Bowyer, P.J. Flynn, Multidimensional scaling for matching low-resolution face images. IEEE Trans. Pattern Anal. Mach. Intell. 34(10), 2019–2030 (2011)
51. S. Biswas et al., Pose-robust recognition of low-resolution face images. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 3037–3049 (2013) 52. P. Moutafis, I.A. Kakadiaris, Semi-coupled basis and distance metric learning for crossdomain matching: application to low-resolution face recognition, in IEEE International Joint Conference on Biometrics (2014) 53. F. Yang et al., Discriminative multidimensional scaling for low-resolution face recognition. IEEE Signal Proc. Lett. 25(3), 388–992 (2017) 54. Z. Wang, S. Chang, Y. Yang, D. Liu, T.S. Huang, Studying very low resolution recognition using deep networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016) 55. M. Jian, K.-M. Lam, Simultaneous hallucination and recognition of low-resolution faces based on singular value decomposition. IEEE Trans. Circ. Syst. Video Technol. 25(11), 1761–1772 (2015) 56. K. Zhang, Z. Zhang, C.W. Cheng, W.H. Hsu, Y. Qiao, W. Liu, T. Zhang, Super-identity convolutional neural network for face hallucination, in Proceedings of the European Conference on Computer Vision (ECCV) (2018) 57. S.P. Mudunuri, S. Sanyal, S. Biswas, Genlr-Net: deep framework for very low resolution face and object recognition with generalization to unseen categories, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (2018)
A Survey on Dynamic Sign Language Recognition Ziqian Sun
Abstract Sign Language Recognition (SLR) plays a significant role in solving the communication problems of people who are deaf. The goal of all SLR systems is the same: to improve recognition accuracy. This paper presents a survey on dynamic SLR, covering the two main categories of systems (with particular attention to HMMs), the main datasets in different languages, and methods used for data preprocessing. Keywords Device-based SLR · Vision-based SLR · HMM · Preprocessing · Dataset
1 Introduction

There are many people in the world whose hearing and speech are impaired and who therefore cannot communicate fluently in spoken language; instead, they use sign language to communicate with others. Research on Sign Language Recognition (SLR) over the past few decades falls mainly into two categories: electromechanical devices based on optical, magnetic, and acoustic sensing, and techniques based on computer vision and image processing. This paper introduces these main categories of continuous SLR and some related research in Sect. 2.1. In the area of vision-based SLR, it focuses on HMM models and lists some advanced models used in SLR that originated from the HMM. In Sect. 2.2, the paper also covers two important steps before training and testing: data collection and data preprocessing. Section 3 concludes the paper.
Z. Sun (B) Harbin Engineering University, Harbin, China e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_89
2 Literature Review

2.1 Main Categories in Continuous SLR

The first category relies on electronic devices such as data gloves to obtain the position and movements of the user's hands and elbows and feed them into the computer for final processing. Back in 1993, Fels et al. [1] were the first researchers to devote themselves to building a Sign Language Recognition system. Their Glove-Talk idea drew data from a CyberGlove, and the classification technique they chose was a neural network (NN). Two years later, to identify the ASL alphabet, Liang et al. [2] also used the CyberGlove to support their idea. They additionally improved the functionality of traditional data gloves in their system, stressing the recognition of a sequential flow of signs in American Sign Language (ASL). The system developed by Ouhyoung [3] likewise used data gloves and dealt well with the critical end-point detection problem. Lim et al. [4] introduced the virtual button, a new interface for hand-gesture recognition that can also be used in Sign Language Recognition; in the virtual button, different wrist shapes are used to monitor and recognize finger and hand movements. Savur et al. [5] proposed an American Sign Language system based on surface electromyography (sEMG). The sEMG data, specifically the twenty-six ASL gestures that represent the twenty-six English letters, are acquired from the subject's right forearm to satisfy the needs of the ASL recognition project. It is not convenient for users to wear such glove-like or other cumbersome devices, because the cables connecting them to the computer make the user feel less natural and, as a result, lead to non-standard gestures that reduce recognition accuracy. Shukor et al. [6] designed a data glove for Malaysian Sign Language detection. This glove captures finger flexion using 10 inclined sensors, recognizes hand motions with a built-in accelerometer, and sends the translated information to a mobile phone over a Bluetooth component; there is also a micro-controller inside. It could be improved by using more training and testing data and fundamental frameworks such as HMM. In hand-gesture recognition, Siddiqui et al. [7] developed a cheap, wrist-wearable human-computer interaction (HCI) device based on acoustic measurements. They compared the classification performance of four classifiers, decision tree (DT), k-nearest neighbors (kNN), support vector machine (SVM), and linear discriminant analysis (LDA), among which LDA gave the highest average accuracy, and the cost of their device is very low.
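To make the device-based pipeline concrete, the following is a minimal sketch of the kind of classifier comparison reported by Siddiqui et al. [7], written with scikit-learn on a placeholder feature matrix. The feature extraction from the acoustic signals, the array names, and the cross-validation settings are assumptions for illustration, not the authors' actual code.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# X: one row of hand-crafted features per gesture sample, y: gesture labels.
# Random placeholders stand in for features extracted from the wrist-worn sensor.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 5, size=200)

classifiers = {
    "DT":  DecisionTreeClassifier(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "LDA": LinearDiscriminantAnalysis(),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```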
The other category, in which more researchers are interested, is based on computer vision and image-processing techniques. In this category, many researchers use the Hidden Markov Model (HMM) or new models derived from it. The HMM is one of the most successful methods for sequence recognition, and continuous Sign Language Recognition is a sequence-recognition task, so HMMs can reach high accuracy in SLR. Starner et al. [8] used HMMs to build a high-accuracy computer-vision-based method; they used the traditional dual-camera setup, one camera tracking the user's hands and the other fixed on the desk to capture static information. Wang et al. [9] proposed an improved HMM-based framework called Light-HMM; it improves on the classic model thanks to properly evaluated hidden states and fewer frames, and with Light-HMM classification they reached an accuracy of 83.6%. Kumar et al. [10] used the Coupled Hidden Markov Model (CHMM), which offers interaction in the state space, to build a new multi-sensor fusion framework for SLR. Compared with the classical HMM, which provides interaction in the observation state, the CHMM has the advantage of modeling the relationship between different inter-modal dependencies; with the CHMM, the best recognition accuracy is about 90.80%. To determine the number of states in the HMM before building the classifier, Li et al. [11] combined entropy-based k-means with an ABC-based HMM, a new method they proposed, and the final accuracy was about 91.3%. Fatmi et al. [12] used a Myo armband sensor with a GHMM and reached 96.15%. An HMM modified with KNN was used by Fok et al. [13], with an accuracy of around 93%. Wu et al. [14] worked on ASL using LibSVM with an accuracy of 85.24%, while Sarhan et al. [16] investigated Arabic Sign Language with an accuracy of 80.47% using an HMM. Yang et al. [15] applied a Chinese Kinect-based dataset and reached an accuracy of 87.80% with a level-building-based fast Hidden Markov Model (Table 1). Besides HMMs, researchers also use many other methods in SLR. Akmeliawati et al. [17] proposed an automatic sign language translation system based on computer vision with a specially produced color glove; the glove is colored both on the fingertips and on the palm. The signer's movements are recorded by a camera that collects frames at regular intervals, and the glove's colors simplify feature extraction; the final recognition rate was over 90%. Caridakis et al. [18] designed a system combining Markov chains, HMMs, and self-organizing maps for segmentation and feature extraction. Lahamy et al. [19] used a range camera able to capture entire 3D point clouds simultaneously, an important step in dynamic gesture recognition. Nguyen et al. [20] proposed a tracking, training, and ASL recognition system for unconstrained environments using a pseudo-2D HMM (P2-DHMM) together with a Kalman filter and hand-blob analysis. Ravikiran et al. [21] used boundary-trace-based finger detection to build an automatic capture-and-translate sign language system that converts sign sequences to speech; the method consists of three steps, edge detection with the Canny operator, clipping, and boundary tracing, and the recognition accuracy reaches 95%.
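Since several of the works above share the same basic recipe, one HMM per sign word and classification by maximum likelihood, here is a minimal sketch of that recipe using the hmmlearn library. The feature sequences (for example, per-frame hand position and shape descriptors), the number of hidden states, and the helper names are assumptions for illustration, not any of the cited systems.

```python
import numpy as np
from hmmlearn import hmm

def train_sign_hmms(sequences_by_sign, n_states=5):
    """Train one Gaussian HMM per sign.

    sequences_by_sign : dict mapping a sign label to a list of feature
                        sequences, each of shape (T_i, n_features).
    """
    models = {}
    for sign, seqs in sequences_by_sign.items():
        X = np.vstack(seqs)                  # hmmlearn expects concatenated frames...
        lengths = [len(s) for s in seqs]     # ...plus the length of each sequence
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
        m.fit(X, lengths)
        models[sign] = m
    return models

def classify_sequence(models, seq):
    """Label a new feature sequence with the sign whose HMM scores it highest."""
    return max(models, key=lambda sign: models[sign].score(seq))
```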
Table 1  Some advanced HMMs proposed for continuous SLR

Paper              | Model                                                        | Accuracy (%) | Dataset
Wang et al. [9]    | Light-HMM                                                    | 83.6         | 3 sign language datasets collected with a Microsoft Kinect sensor: the first containing 370 daily signs of China, the second containing 1000 signs from Chinese sign language (CSL), the third with a vocabulary size of 1000
Kumar et al. [10]  | Coupled hidden Markov model (CHMM)                           | 90.8         | Indian sign language (ISL) dataset containing 25 dynamic sign words
Li et al. [11]     | Entropy-based K-means algorithm combined with ABC-based HMM  | 91.3         | Datasets on Taiwan sign language
Fatmi et al. [12]  | Myo armband sensor combined with GHMM                        | 96.15        | Datasets on American sign language (ASL)
Fok et al. [13]    | HMM modified with KNN                                        | 93           | Datasets on American sign language (ASL)
Wu et al. [14]     | LibSVM                                                       | 85.24        | Datasets on American sign language (ASL)
Yang et al. [15]   | Level-building-based fast hidden Markov model                | 87.80        | Chinese Kinect-based dataset
Sarhan et al. [16] | HMM for Arabic sign language                                 | 80.47        | Datasets on Arabic sign language
2.2 Datasets

1. Data preprocessing

Data preprocessing should be carried out to prepare the data before it is used for training and classification, and ultimately to improve recognition accuracy. Unlike speech recognition, Sign Language Recognition has to handle multiple data streams, such as the shape, position, orientation, and movement of the hands, rather than a single stream of speech signal. Another difficulty is that the basic units of SLR are hard to define, unlike phonemes in speech recognition, because if all basic units are extracted from each stream, the number of combined units becomes too large [1]. The main challenge for SLR now is how to handle large-vocabulary sign problems. Fang et al. [30] proposed transition-movement models (TMMs) to handle the transition parts between two adjacent signs in large-vocabulary SLR. They also improved a temporal clustering algorithm derived from k-means, using dynamic time warping to cluster the transitions dynamically. Finally, they compared their TMM with a direct HMM: given the same amount of data, the TMM achieved higher accuracy, because the HMM splits transition parts into the two adjacent signs while the TMM clusters transition models. The latter can effectively solve the transition-movement issue for a large vocabulary, because transition movements depend only on the end of the preceding sign and the start of the following sign. Huang et al. [31] proposed a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), to address the challenge of converting continuous SLR into an isolated-sign problem. They designed a two-stream 3D-CNN for video feature extraction and stressed the importance of eliminating temporal segmentation, because transitional movements of the hands, head, and body are diverse and hard to detect, which easily results in inaccurate segmentation that affects the subsequent steps; moreover, labeling each isolated fragment is highly time-consuming. In addition, Suharjito et al. [32] introduced two methods for preprocessing data, skin segmentation and edge segmentation, which are needed because noise in the raw data greatly reduces accuracy. For skin segmentation they used two types of skin detection, one in the YCbCr color space and the other in the HSV color space, while for edge segmentation they used the CLAHE technique.
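As an illustration of the preprocessing steps just mentioned, the snippet below performs a simple YCbCr skin segmentation and a CLAHE contrast enhancement with OpenCV. The threshold values and function names are common defaults chosen for this sketch, not the exact settings used by Suharjito et al. [32].

```python
import cv2
import numpy as np

def skin_mask_ycbcr(bgr_frame):
    """Return a binary mask of likely skin pixels using fixed YCbCr thresholds."""
    ycrcb = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2YCrCb)
    # Widely used (approximate) skin range on the Cr and Cb channels.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    # Clean up small holes and speckles before handing the mask to feature extraction.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

def enhance_contrast(bgr_frame):
    """CLAHE on the luminance channel, often applied before edge-based segmentation."""
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
```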
2. Common Datasets

Data collection is always the first step in Sign Language Recognition, and datasets are needed for training and testing. There are currently seven main large sign language datasets [22]: the American Sign Language Lexicon Video Dataset [23], MSR Gesture3D [24], the Auslan dataset [25], the LTI-Gesture Database [26], the RWTH German Fingerspelling Database [27], the DEVISIGN Chinese Sign Language dataset [28], and the Indian Sign Language dataset [29]. The American Sign Language Lexicon Video Dataset (ASLLVD) contains more than 3300 ASL signs, all recorded as videos in citation form; every sign is produced by between one and six signers, for a total of nearly 9800 tokens. The MSR Gesture3D dataset contains 12 dynamic hand gestures defined by ASL, performed by 10 subjects, with each gesture executed 2 or 3 times; a depth camera is used to capture the dynamic hand-gesture depth sequences. Auslan Signbank, containing around 26 fingerspellings and 7797 sign words, uses handshapes and movements, facial expressions, and body expressions as a visual means of communication; in this dataset each sign consists of five main parts: orientation, handshape, movement, location, and facial expression. The LTI-Gesture Database contains 14 dynamic gestures, with 364 recorded video sequences in total (84 for testing, 280 for training), but it is not freely available. The RWTH German Fingerspelling Database contains 35 gestures representing the letters of the alphabet, of which 5 are dynamic and the rest static; it comprises 1160 images from 20 different signers. DEVISIGN is a Chinese Sign Language dataset with a subset (DEVISIGN-G) containing 26 letters and 10 numbers performed by 8 different signers. The Indian Sign Language dataset holds around 7500 sign gestures generated with Kinect sensors and Leap Motion; it also includes 50 dynamic gestures consisting of both single- and double-hand dynamic signs, all available for download (Table 2).
letters and 10 numbers performed by 8 different signers. In the Indian Sign Language dataset, there are around 7500 sign gestures which are generated by Kinect sensors and Leap Motion. This dataset also has 50 dynamic ones that consists of both single and double hand dynamic sign gestures, which are all available to download (Table 2). Table 2 Common datasets utilized in SLR Name
Genre
Information inside
American sign language lexicon video dataset (ASLLVD) [23]
American sign language (ASL)
The quantity of ASL signs in American sign language lexicon video dataset (ASLLVD) is over 3300, which are all videos in citation form. Every signs are produced by at least one signers and at most six signers, for a total of nearly 9800 tokens
MSR gesture3D dataset [24] ASL
Contains 12 dynamic hand gestures defined by the ASL: “z”, “j”, “where”, “store”, “pig”, “past”, “hungry”, “green”, “finish”, “blue”, “bathroom”, “milk” and 10 subjects, each one performing each gesture 2 times or 3 times
Auslan Signbank (a site) [25]
Australian sign language (Auslan)
Contains around 26 finger spellings and 7797 sign words, each of which is made up of 5 main parts: handshape, orientation, location, movement and facial expression
LTI-gesture database [26]
Dynamic gestures
Contains 14 dynamic gestures in the form of video sequence, each of which is 106 × 96 grey-scale pixel
RWTH German fingerspelling database [27]
German sign language (GSL)
There are 35 gestures in it, each gesture is recorded by 20 signers, which represent German umlauts and letters
DEVISIGN [28]
Chinese sign language (CSL) The subset (DEVISIGN-G) of it consists of 26 letters and 10 numbers, which are performed by 8 different subjects to get rid of the influence generated by different subject (continued)
A Survey on Dynamic Sign Language Recognition
1021
Table 2 (continued) Name
Genre
Indian sign language dataset Indian sign language (ISL) [29]
Information inside It used Kinect sensors and leap motion to generate around 7500 sign gestures. Besides, there are also 50 dynamic ones produced by 10 different signers (8 males, 2 females)
3 Conclusion

Continuous Sign Language Recognition is not an easy task to accomplish. A variety of factors make it a tough problem, such as feature extraction against unclear backgrounds and the lack of unified sign languages and datasets across countries and regions. This paper has introduced the two main categories of continuous SLR, device-based SLR and vision-based SLR, some common datasets used for different sign languages, and some methods for preprocessing data, including TMMs, LS-HAN, skin segmentation, and edge segmentation. In the discussion of vision-based SLR, we also compared the accuracy of different kinds of advanced HMMs.
References 1. S.S. Fels, G.E. Hinton, Glove-talk: a neural network interface between a data-glove and a speech synthesizer. IEEE Trans. Neural Netw. 4(1), 2–8 (1993) 2. R.H. Liang, M. Ouhyoung, A real-time continuous alphabetic sign language to speech conversion VR system (The Eurographs Association, Wiley, 1995) 3. R.H. Liang, A real-time continuous gesture recognition system for the Taiwanese sign language, in Proceedings of The Third IEEE International Conference on Automatic Face and Gesture Recognition (1998) 4. J.M. Lim et al., Recognizing hand gestures using wrist shapes, in International Conference on Consumer Electronics IEEE (2010) 5. C. Savur, F. Sahin, Real-time American sign language recognition system using surface EMG signal, in IEEE International Conference on Machine Learning and Applications IEEE (2016) 6. A.Z. Shukor et al., A new data glove approach for Malaysian sign language detection. Procedia Comput. Sci. 76, 60–67 (2015) 7. N. Siddiqui, R.H.M. Chan, A wearable hand gesture recognition device based on acoustic measurements at wrist, in International Conference of the IEEE Engineering in Medicine and Biology Society IEEE (2017) 8. T. Starner, J. Weaver, A. Pentland, Real-time American sign language recognition using desk and wearable computer based video. IEEE Trans. Pattern Anal. Mach. Intell. 20(12), 0–1375 (1998) 9. H. Wang et al., Fast sign language recognition benefited from low rank approximation, in 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG) IEEE (2015) 10. P. Kumar et al., Coupled HMM-based multi-sensor data fusion for sign language recognition. Pattern Recognit. Lett. 86, 1–8 (2017)
11. T.H.S. Li, M.C. Kao, P.H. Kuo, Recognition system for home-service-related sign language using entropy-based $K$-means algorithm and ABC-based HMM.” IEEE Trans. Syst. Man Cybern. Syst. (1) (2015) 12. R. Fatmi, S. Rashad et.al, Comparing ANN, SVM and HMM based machine learning methods for american sign language recognition using hidden markov models and wearable motion sensors, in 2019 IEEE 9th Annual Computing and Communication Workshop and Conference, pp. 290–297 13. K.Y. Fok et al., A real-time ASL recognition system using leap motion sensors, in 2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), IEEE (2015) 14. J. Wu, L. Sun, R. Jafari, A wearable system for recognizing American sign language in real-time using IMU and surface EMG sensors. IEEE J. Biomed. Health Inf. 20(5), 1 (2016) 15. W. Yang, J. Tao, Z. Ye, Continuous sign language recognition using level building based on fast hidden Markov model. Elsevier Science Inc. (2016) 16. N.A. Sarhan, Y. Elsonbaty, S.M. Youssef, HMM-based Arabic sign language recognition using kinect, in” Tenth International Conference on Digital Information Management IEEE (2016) 17. R. Akmeliawati, P.L. Ooi, Y.C. Kuang, Real-time Malaysian sign language translation using colour segmentation and neural network, in Instrumentation and Measurement Technology Conference Proceedings, IMTC 2007. IEEE (2007) 18. G. Caridakis et al., Automatic sign language recognition: vision based feature extraction and probabilistic recognition scheme from multiple cues, in International Conference on Pervasive Technologies Related to Assistive Environments ACM (2008) 19. H. Lahamy, D. Litchi, Real-time hand gesture recognition using range cameras, in International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences—ISPRS Archives, vol. 38 (2010) 20. N.D. Binh, et.al., Real-time hand tracking and gesture recognition system, in GVIP 05 Conference CICC (Cairo, Egypt), pp. 19–21 (December, 2005) 21. J. Ravikiran, et.al., Finger detection for sign language recognition, in The International Multi Conference of Engineers and Computer Scientists 2009 (IMECS), vol. I (2009) 22. L. Zheng, B. Liang, A. Jiang, Recent advances of deep learning for sign language recognition, in International Conference on Digital Image Computing: Techniques and Applications IEEE (2017) 23. Athitsos, et.al., The American sign language lexicon video dataset, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops (2008) 24. J. Wang et al., Robust 3D action recognition with random occupancy patterns. Computer Vision—ECCV 2012 (2012) 25. SignBank and Auslan. http://www.auslan.org.au/about/dictionary/ 26. I. Ney, P. Dreuw, T. Seidl, D. Keysers, Appearance-based gesture recognition (2005) 27. RWTH German Fingerspelling database. http://www-i6.informatik.rwthaachen.de/dreuw/fin gerspelling.php 28. X. Chai, et.al., The devisign large vocabulary of Chinese sign language database and baseline evaluations, in 2014, Technical report VIPL-TR-14-SLR-001. Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS (2014) 29. P. Kumar et al., A multimodal framework for sensor based sign language recognition. Neurocomputing (2017) S092523121730262X 30. G. Fang, W. Gao, D. Zhao, Large-vocabulary continuous sign language recognition based on transition-movement models. IEEE Trans. Syst. Man. Cybern. Part A Syst. 
Hum 37(1), 1–9 (2007) 31. J. Huang et al., Video-based sign language recognition without temporal segmentation (2018) 32. Suharjito, et.al., The comparison of some hidden markov models for sign language recognition, in 1st 2018 Indonesian Association for Pattern Recognition International Conference, INAPR 2018—Proceedings, pp. 6–10 (2018)
Extract and Merge: Merging Extracted Humans from Different Images Minkesh Asati, Worranitta Kraisittipong, and Taizo Miyachi
Abstract Selecting human objects among the various types of objects in images and merging them with other scenes is routine manual work for photo editors. In this work, we propose an application, built on Mask R-CNN (for object detection and mask segmentation), that groups people extracted from different images into a single image with a new background. First, it extracts the full body of a person, without any obstacles such as a dog standing in front, from each of several images, for example three people from three different images. Then it merges the extracted human instances into a single image and places them on a new background. The application adds no overhead to Mask R-CNN and runs at five frames per second. It can extract human instances from any number of images and merge them, and it can extract more than one person from a single picture. It also works with video, processing it frame by frame. We structured the code to accept videos of different lengths as input; the length of the output video equals that of the longest input video. We wanted to create a simple yet effective application that can serve as a base for photo editing and do the most time-consuming work automatically, so that editors can focus on the design part. Another application could be to group friends who cannot be physically together in a single picture after extracting each friend from a different image. We show one-person and two-person extraction with placement onto two different backgrounds, as well as a video example with single-person extraction. Keywords Human extraction · Extract and merge · Mask R-CNN · Person detection · Instance segmentation
M. Asati (B) · W. Kraisittipong · T. Miyachi Tokai University, Kanagawa, Japan e-mail: [email protected] W. Kraisittipong e-mail: [email protected] T. Miyachi e-mail: [email protected] © Springer Nature Singapore Pte Ltd. 2021 S. K. Bhatia et al. (eds.), Advances in Computer, Communication and Computational Sciences, Advances in Intelligent Systems and Computing 1158, https://doi.org/10.1007/978-981-15-4409-5_90
1 Introduction

To understand images, computer software should be able to describe the content of images and videos and recognize the faces or bodies of the people appearing in them. Understanding the content may also involve extracting a description of the images, which could cover objects, text, position, and so on. Within computer vision research, human behavior analysis holds a unique position. Extracting a person from video is a fascinating image segmentation task, but accurately masking an object requires a lot of training and a large labeled mask dataset, and training time grows as the dataset grows. In our application, we utilize the instance segmentation algorithm proposed in Mask R-CNN, an extension of Faster R-CNN (object detection) that masks the specific pixels of the object of interest in the image. We used pre-trained Mask R-CNN network weights trained on the MS COCO dataset (Fig. 1), so we describe that dataset briefly here. In 2014, Lin et al. introduced the Microsoft Common Objects in Context, or MS COCO, dataset [1]. It is a large-scale dataset containing multiple objects in everyday scenes. Compared with the ImageNet dataset [2], COCO has more labeled objects per category but fewer categories. Objects in the COCO dataset are labeled using per-instance segmentation for more precise object localization. The dataset contains 91 common object categories and a total of 2,500,000 labeled instances in 328,000 images. The goal of COCO is to improve scene-understanding tasks and advance state-of-the-art object recognition. In this work, we use the Mask R-CNN algorithm to detect people in an image and generate their masks, and we contribute an algorithm that extracts the detected people, all or some of them, from different images (for example, one person from each of three images) and then places them together in a single image with a new background. Today, extracting people and merging them into a single image is still manual work in photo editing, and it becomes even harder in video editing. We propose a quick and accurate system that does this work automatically, so that anybody can use it in image editing for various purposes.
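Since the per-instance annotations of MS COCO underlie the pre-trained weights used later, the following is a minimal sketch of how such annotations can be inspected with the pycocotools package; the file path is a placeholder and the snippet is illustrative only, not part of the proposed application.

```python
from pycocotools.coco import COCO

# The annotation file ships with the MS COCO release; the path is a placeholder.
coco = COCO("annotations/instances_val2017.json")

person_cat = coco.getCatIds(catNms=["person"])      # category id of 'person'
img_ids = coco.getImgIds(catIds=person_cat)         # images containing people
ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=person_cat, iscrowd=None)
for ann in coco.loadAnns(ann_ids):
    mask = coco.annToMask(ann)                      # per-instance binary mask (H x W)
    print(ann["category_id"], ann["bbox"], int(mask.sum()), "mask pixels")
```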
Fig. 1 Samples of annotated images in the MS COCO dataset from [1]
2 Related Work

2.1 Instance Segmentation Application

There are many approaches to instance segmentation based on segment proposals. For example, Li et al. [3] combined the segment proposal of [4] with an object detection system [5] for fully convolutional instance-aware semantic segmentation (FCIS) [3]. However, FCIS still has problems with overlapping instances compared to Mask R-CNN (Fig. 2), because Mask R-CNN is based on parallel prediction of masks and class labels rather than on segment proposals. Instance segmentation with Mask R-CNN has appeared in various fields. Anantharaman et al. proposed utilizing Mask R-CNN for the detection and segmentation of common oral diseases [7]. Recently, Tan et al. [8] proposed robot-assisted training in laparoscopy to improve the experience of surgeons; their training system uses Mask R-CNN to perform semantic segmentation of surgical tools and targets, enhancing the automation of training feedback, visualization, and error validation, and statistical analysis shows that trainees' skills improve when using the system. Using data collected from Google Earth, Nie et al. introduced an application for inshore ship detection based on Mask R-CNN [9]. However, we found very few works closely related to our application. Yuen experimented with Mask R-CNN deep-learning image segmentation techniques [10]; the inspiration was a client's image-blending project, so he implemented a solution that allows users to take a photo of themselves and blend it with historical photos. The challenge is that the photos come from end users and can have any background, so extracting and blending the user is tricky. Using Mask R-CNN to capture the people, he extracts them by their relevant pixels and merges them into the target historical photos. After extraction, he dilates the mask using the OpenCV dilation API to make the person stand out and tunes the color to black-and-white to match the tone of the historic image. The resulting blended images are impressive and could be used in other image processing or behavior understanding tasks (Fig. 3).
Fig. 2 FCIS+++ [3] exhibits spurious edges on overlapping objects compared with Mask R-CNN with ResNet-101-FPN [6]
Fig. 3 An example from [10], where a person object is extracted from its source image and then blended with a historical image
Fig. 4 Classification result from [11]
Beyond extracting and blending people, another related work segments parts of the human body for a fashion application. The idea is to take a raw image, segment the articles of clothing, and match them against a database to find similar accessories. Michael Sugimura [11] built a custom multi-class image segmentation model to classify bags, boots, and tops in an image (Fig. 4). The application starts with object detection and then compares the detected objects in an image against a known database for matching. There are two models in this work: the first performs object detection to localize objects, and the second performs comparison based on the localized objects. Given 100 training images, this work trains the Mask R-CNN model using pre-trained weights from the MS COCO dataset [1]. The application uses pixel-level segmentation instead of bounding boxes because the cleaner object detection output makes the comparison stage easier and more optimized. Figure 4 shows an example of the object detection as well as the extraction of a top, boot, and bag. The related work above shows that image segmentation applications and deep learning can uncover new insights and enable even more advanced applications.
2.2 Adobe Photoshop Tools

In Adobe Photoshop, many tools are used for extracting a person from a photo; we describe some of them below:
Select Subject: Select Subject is an edge-detection-based tool in Adobe Photoshop. It automatically selects the foreground instances in an image, but the selection must be refined manually to become accurate. Magic Wand Tool: Known simply as the Magic Wand, this very popular tool has been in Adobe Photoshop for a long time. Its basic idea is to select pixels based on tone and color, unlike Select Subject, which is based on edge detection.
3 Problem Definition

We want to point out some limitations of the related work in Sect. 2. Tools in Adobe Photoshop such as Select Subject can be useful for extracting a person, but merging still requires many manual actions. Select Subject is mainly useful when there is only one object in the image; with multiple objects it often fails, and most of the work must be done manually, because it uses edge detection as its base algorithm, so in elaborate or crowded scenes where objects overlap, its detection results are fairly random. Unlike Select Subject, our application performs segmentation pixel by pixel, so it is more precise and can carry out the extraction very accurately; it also merges the extracted objects automatically, layer by layer, so the user does not need to do the extraction manually or deal with layers when merging. We did find one application [10], a Medium post, that is very similar to ours, in which the author extracts a human object from an image, places it into an image of a historical place, and then does step-wise color blending (see Fig. 5). Our approach is different because our objective is to merge human objects extracted from different images, which is a crucial task in photo editing and requires more precision. Moreover, our application goes further in several respects: the application of Fig. 5 cannot extract human objects from multiple inputs, it is limited to photographs, and it does not work with videos. Its main problem is that the first result comes out blurry and needs to be sharpened with other tools; after sharpening the human instances, the color also has to be toned to black-and-white, as shown in Fig. 5, so the extract-and-blend workflow requires many processing steps and is not easy to run. In the work of [11], the idea is to segment articles of clothing and match them against a database to find similar items. It requires data annotation to label the classes, which are top, shoes, and bag, and labeling all of the information in the images is
Fig. 5 Extraction and blending in [10]
entirely manual work that requires a lot of time and effort. Compared with the works described above, our extract-and-merge technique is more straightforward. Selecting (extracting) image instances from different images and merging them together is frequent and crucial work for photo editors, and it becomes even harder with videos, so we wanted to do it automatically. It could also be used for advertising by travel agencies, because it lets them attract more customers by offering a service in which customers can visualize themselves in the beautiful places they have always wanted to visit. It brings imagination into reality.
4 Proposed Method

In this paper, we propose a merger application that detects and extracts only the human objects from two or more images and places them together in a single image with a new background. As input, the application takes two or more images containing various objects (e.g., humans, animals, bikes) and a background image; as output, we get a single image with the target background containing only the human objects extracted from the different input images. Since our application uses Mask R-CNN for instance segmentation and mask generation, and Mask R-CNN is based on the Faster R-CNN architecture, we begin by briefly reviewing Faster R-CNN [12] and Mask R-CNN [6]. Faster R-CNN consists of two stages. The first stage, called the Region Proposal Network (RPN), proposes candidate object bounding boxes. The second stage, which is essentially Fast R-CNN [13], uses RoIPool to extract features from each candidate box and performs classification and bounding-box regression. For faster inference, the features used by both stages can be shared. Faster R-CNN has two outputs for each candidate object, a bounding-box offset and a class label; Mask R-CNN adds a third branch that outputs the object mask, a binary mask indicating which pixels inside the bounding box belong to the object. However, the additional mask output is distinct from the class and box outputs and requires extraction of a much finer spatial layout of the object; to do this, Mask R-CNN uses a Fully Convolutional Network (FCN) (Fig. 6).
Fig. 6 Proposed method flow diagram
Now we explain our end-to-end application structure. The application takes two inputs and gives one output. The first input is n (n > 1) images containing various types of objects and scenes; the second input is a background image, such as an image of a dream destination; the output is a single image in which all the human objects extracted from the various input images are placed together on the target background, i.e., the second input image. Our application has three stages. In the first stage, we perform instance segmentation and generate a mask for each detected object in each input image using Mask R-CNN. In the second stage, we identify the human objects among all detected objects and then select the expected number of them based on their area: for example, if we want to extract two people from an image, the system selects the two people with the largest areas. In the third stage, we extract the people selected in the second stage and place them on the target background image one by one (layer by layer). Here, we explain the three stages of our application in detail. Before feeding the input images into the first stage, we resize all input images and the background image to the same size. The first stage is straightforward, because we use Mask R-CNN without any changes to its architecture for object detection (bounding-box offsets), object classification (object name), and mask generation (the pixels belonging to the object). In the second stage, we calculate the area of every detected human object from the bounding-box offsets (Y1, X1, Y2, X2) with the following formula:

Area = (Y2 - Y1) * (X2 - X1)    (Rectangle Area = Height * Length)    (1)
We store these areas with the corresponding person IDs in descending order, so that we can extract the main people, those most prominent in the image, and select the people to extract (if we expect to extract n people from an image, we take the first n person IDs). We repeat this entire second stage for each input image. In the third stage, because all input images, the background image, and the generated masks have the same size, we can compare them pixel by pixel and replace pixels of the background image with pixels of an input image wherever a human object was identified. So, for each selected human object in an input image, we replace the background pixel value with the image pixel value wherever the mask value is true (i.e., the pixel belongs to the object), and we do this layer by layer. We repeat this entire third stage for each input image. Note that the people from the last input image end up in the top layer and are the most visible when objects lie on top of each other.
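The second and third stages can be summarized in a few lines of NumPy. The sketch below assumes the detection results are already available in the format produced by a Mask R-CNN implementation (boxes as (y1, x1, y2, x2), masks as a boolean H x W x N array, and COCO class IDs where 1 means person); the function names and the exact result layout are illustrative assumptions, not the authors' released code.

```python
import numpy as np

PERSON_CLASS_ID = 1  # 'person' in the COCO label map

def select_largest_people(result, n_people):
    """Stage 2: keep the n largest detected persons by bounding-box area."""
    rois, class_ids = result["rois"], result["class_ids"]
    person_idx = np.where(class_ids == PERSON_CLASS_ID)[0]
    areas = [(rois[i][2] - rois[i][0]) * (rois[i][3] - rois[i][1]) for i in person_idx]
    order = np.argsort(areas)[::-1]                 # biggest area first
    return [person_idx[k] for k in order[:n_people]]

def merge_people(background, images, results, n_people=1):
    """Stage 3: paste the selected persons onto the background, layer by layer."""
    output = background.copy()
    for image, result in zip(images, results):      # later images end up on top
        for i in select_largest_people(result, n_people):
            mask = result["masks"][:, :, i]         # boolean H x W mask of this person
            output[mask] = image[mask]              # copy only the person's pixels
    return output
```

All images and masks are assumed to have been resized to the same shape beforehand, as described above.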
5 Implementation and Results

5.1 Hardware Environment

In this project, we used an i5 processor (6 cores), 16 GB of RAM, 3 TB of storage, and a graphics card with 12 GB of memory. We were able to process around three frames per second when feeding one image into the network at a time, and five frames per second when feeding two images into the network in parallel.
5.2 Implementation

In the first stage of our implementation, we used the open-source implementation [14] of Mask R-CNN in Python, Keras, and TensorFlow. It is based on a Feature Pyramid Network (FPN) with a ResNet-101 backbone. We used pre-trained weights for this network trained on the MS COCO dataset, which covered 62 categories of objects. Most of this implementation follows Mask R-CNN, but it deviates in a few places in favor of code simplicity and generalization. There are mainly three differences. First, all images are resized to the same size to support training multiple images per batch. Second, the bounding boxes that come with a dataset are ignored and generated on the fly, to support training on multiple datasets, because some datasets provide bounding boxes and some provide only masks; this also makes it easy to apply image augmentations, such as rotation, that would be harder to apply to bounding boxes. Third, a lower learning rate is used instead of the 0.02 of the original paper, which the implementers found too high, often causing the weights to explode, especially with a small batch size; they relate this to differences in how Caffe (the original implementation) and TensorFlow compute gradients (sum versus mean across batches and GPUs), or perhaps the official model uses gradient clipping to avoid the issue. They also use gradient clipping, but do not set it too aggressively. The second and third stages are implemented in Python and its libraries for extracting and merging the extracted objects. We structured the code so that it can extract a single person or multiple people out of all detected human objects in an image. To handle any number of inputs (images or videos), we place the extracted objects onto the background image layer by layer (as in Photoshop, to get a clean look), avoiding any mixing, so that the edges of human objects remain visible even when objects lie on top of each other. For video inputs, the output video length is a crucial aspect, because the input videos may have different lengths; to solve this, we iterate our main application over frames until the frames of the longest input video are exhausted.
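For reference, running inference with the open-source implementation [14] typically looks like the sketch below. The configuration subclass, the weights path, and the surrounding glue are assumptions for illustration, and the exact API should be checked against the repository.

```python
# Sketch of stage 1 using the Matterport Mask R-CNN package [14] (assumed installed,
# with COCO pre-trained weights saved locally as "mask_rcnn_coco.h5").
import cv2
import mrcnn.model as modellib
from mrcnn.config import Config

class InferenceConfig(Config):
    NAME = "coco_inference"
    NUM_CLASSES = 1 + 80          # background + COCO classes, as configured for the released weights
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir="./logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True)

image = cv2.cvtColor(cv2.imread("input_1.jpg"), cv2.COLOR_BGR2RGB)
result = model.detect([image], verbose=0)[0]
# result["rois"], result["masks"], result["class_ids"], result["scores"] feed stages 2 and 3.
```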
Fig. 7 Example with two input images; one person extracted from each image
Fig. 8 Example with two input images; two people extracted from one image and one from the other
Fig. 9 Example with three input images; one person extracted from each image
5.3 Results

The photos shown in Figs. 7, 8, and 9 are personal photos of the authors, and only the authors (Minkesh, Worranitta, Taizo) appear as humans, with different background scenes. In Fig. 7, two input images captured in the wild are used; one human object is extracted from each and they are placed together on a beach background, with the man from input 1 in layer 1 and the woman from input 2 in layer 2. In Fig. 8, two input images captured in the wild are used, extracting two people from the first input image and one person from the second; the woman from input 2 is in the bottom layer and the two men from input 1 are in the top layer. In Fig. 9, three input images captured in the wild are used, extracting one person from each. The man standing on the left has a larger area than the other man in input 1, so when one person is extracted from each image, only the left-hand person is extracted. Figures 10 and 11 show the frame-by-frame merging and extraction of a single person from two input videos (one person from each). Figure 10 has 21 cells in three rows, each cell consisting of two frames (one from each input video), and Fig. 11 has 21 cells, each consisting of one frame of the output video; each cell of Fig. 11 shows the result of its corresponding cell in Fig. 10.
Fig. 10 Total of 21 images in three rows. Each image consists of two frames: the upper one is a part (0:26–0:28) of video [15] and the lower one is a part (37:42–37:44) of video [16]
Fig. 11 21 frames of one output video
6 Conclusion

Merging human objects with other scenes is a mainstream task in post-production for movies and dramas, yet it is still largely done manually. Here we address one of those problems: merging people together, optionally with a replaced background, into a single image. Our system automatically extracts people, free of any obstructions if present, and merges them layer by layer to produce a realistic-looking result. We structured the code so that it can extract a single person or multiple people from an image, and it can handle any number of input images or videos. However, we could not improve the accuracy of human detection itself, because our application depends on Mask R-CNN for object detection and mask generation. Although our pipeline adds no overhead on top of Mask R-CNN and runs at five frames per second, this is not sufficient for real-time applications. Accuracy and speed could be further improved by training the network on masked human objects only, rather than on all types of objects.

Compliance with Ethical Standards This study was funded by the Japan International Cooperation Agency under the Innovative Asia Program 2017. The authors declare that they have no conflict of interest.
References

1. T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C.L. Zitnick, Microsoft COCO: common objects in context. ECCV, Part V, LNCS 8693, 740–755 (2014)
2. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: a large-scale hierarchical image database. CVPR (2009)
3. Y. Li, H. Qi, J. Dai, X. Ji, Y. Wei, Fully convolutional instance-aware semantic segmentation. CVPR (2017)
4. J. Dai, K. He, Y. Li, S. Ren, J. Sun, Instance-sensitive fully convolutional networks. ECCV (2016)
5. J. Dai, Y. Li, K. He, J. Sun, R-FCN: object detection via region-based fully convolutional networks. NIPS (2016)
6. K. He, G. Gkioxari, P. Dollar, R. Girshick, Mask R-CNN. ICCV (2017)
7. R. Anantharaman, M. Velazquez, Y. Lee, Utilizing Mask R-CNN for detection and segmentation of oral diseases. BIBM (2018)
8. X. Tan, C.-B. Chng, Y. Su, K.-B. Lim, C.-K. Chui, Robot-assisted training in laparoscopy using deep reinforcement learning. IEEE Robot. Autom. Lett. 4 (2019)
9. S. Nie, Z. Jiang, H. Zhang, B. Cai, Y. Yao, Inshore ship detection based on Mask R-CNN. IGARSS (2018)
10. H.K. Yuen, Image blending with Mask R-CNN and OpenCV (22 July 2018). Retrieved from https://medium.com/softmind-engineering/image-blending-with-mask-r-cnnand-opencv-eb5ac521f920
11. M. Sugimura, Stuart Weitzman boots, designer bags, and outfits with Mask R-CNN (8 October 2018). Retrieved from https://towardsdatascience.com/stuart-weitzman-boots-designer-bags-and-outfits-with-mask-r-cnn-92a267a02819
12. S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks. NIPS (2015)
13. R. Girshick, Fast R-CNN. ICCV (2015)
14. Matterport Inc., Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow (2017). Retrieved from https://github.com/matterport/Mask_RCNN
15. Conde Nast Traveler, 70 people recite their country's tourism slogan. YouTube (26 June 2018). https://www.youtube.com/watch?v=yMoNUHofmPI
16. NDTV, Watch: PM Modi's Q&A session in London with Prasoon Joshi. YouTube (19 April 2018). https://www.youtube.com/watch?v=WYEyFMaef4M
A Survey of Image Enhancement and Object Detection Methods Jinay Parekh, Poojan Turakhia, Hussain Bhinderwala, and Sudhir N. Dhage
Abstract Image enhancement is a classical problem in computer vision. Image enhancement techniques aim to increase the size and quality of low-resolution (LR) images, producing high-resolution (HR) images. Various techniques have been developed over the years, ranging from traditional upscaling methods to neural networks that generate output using trained models and datasets. Object detection, in turn, has broad use cases in modern inference systems, from face detection to text detection. This paper surveys various techniques of image enhancement and object detection, along with their contributions and methodologies.

Keywords Image enhancement · Image super resolution · Generative adversarial networks (GANs) · Convolutional neural networks · Deep learning methods · Image quality · Object detection
1 Introduction

Image enhancement covers the procedures of changing images, whether they are digital or conventional photographs or illustrations. Traditional analog image editing is known as photo retouching, using tools such as an airbrush to modify photographs, or editing illustrations with any traditional art medium. Graphics software programs, which can
be broadly grouped into 3D modelers, raster graphics editors, and vector graphics editors, are the primary tools with which a user may manipulate, enhance, and transform images. Many image-editing programs are also used to render images from scratch. In computer vision, the process of improving the quality of an image is deliberately controlled by software. It is very straightforward, for example, to make an image lighter or darker, or to increase or decrease the spacing between pixels. Advanced photo enhancement software likewise supports various channels for adjusting images in different ways.

Object detection is a computer technology, related to computer vision and image processing, that deals with recognizing instances of semantic objects of a particular class (for example, people, buildings, or vehicles) in digital images and videos. The use cases are endless, be it pedestrian detection, people counting, face detection, object tracking, anomaly detection, self-driving cars, or video surveillance, and the list goes on. Object detection methods are mainly classified into machine learning methods and deep learning methods [1]. In this survey, we touch on some of the important methods and the observations reported in various papers.
2 Image Enhancement Techniques

The fundamental motive of any image enhancement algorithm is to alter the characteristics and features of the input image so that the output image becomes acceptable for a particular area of interest. Maini and Aggarwal [2] give a detailed description of some of the traditional methods of image enhancement. Lee proposed a preprocessing algorithm for the enhancement of low-light images [3]. Enhancement techniques can generally be subdivided into two broader categories.
2.1 Traditional Image Enhancement Techniques

Some of the traditional image enhancement techniques are as follows:

Frequency Domain Methods In this case, the image is first transferred to the frequency domain by applying a Fourier transform; the enhancement algorithm is applied there, and an inverse Fourier transform then brings the result back to the spatial domain.

Spatial Domain Methods In this case, we directly alter the intensity values of the pixels in the image by some function in order to obtain an application-specific enhanced image. Some of the spatial domain techniques mentioned in [2, 4] are as follows:
Negative of Image This is a relatively simple and computationally inexpensive transformation in which each pixel value of the input image is subtracted from 255 (for 8-bit gray-level images) in order to obtain the negative of the image. The lighter and more subtle details present in the relatively darker areas are intensified in the negative image. The mathematical formula is: s = 255 − r; s: output image; r: input image.

Thresholding This technique is used to remove noise that might have crept into the image. It is also used in image segmentation, which differentiates the object under consideration from its surroundings. A threshold can be decided by trial and error or by rigorous experimental observation. All pixel values greater than or equal to the threshold are changed to the highest level, i.e., 255, and all pixel values less than the threshold are changed to 0. The resulting image is therefore a binary image comprising only two levels. The mathematical representation is: s = 255 if r ≥ θ; s = 0 if r < θ; s: output image; r: input image; θ: threshold value.

Logarithmic Transformation Here the values of dark pixels are increased while higher-level values are simultaneously compressed. Images with a wide range of intensity values are compressed: a very narrow range of low-intensity pixel values in the input image is transformed to a wider range of high-level pixel values. This accentuates the fine details in the image by intensifying the contrast while diminishing the intensity values. The mathematical representation is: s = c · log(1 + r); s: output image; r: input image; c: constant.

Power-Law Transformations Gamma correction is the transformation in which each pixel is raised to a constant power γ and then multiplied by a constant c. This type of transformation is mostly used in monitor displays and magnetic resonance imaging (MRI) scans. The mathematical representation is: s = c · r^γ; s: output image; r: input image; c and γ: constants.

Gray-Level Slicing This intensifies a particular range of gray levels in an image. Application areas of this technique include enhancing the characteristics of different images and highlighting defects in X-ray images. Two approaches are commonly used:

• Non-preservation of background: A high value of 255 is set for all gray levels that lie in the range of interest, and a low value of 0 is set for all other levels. This produces a binary image comprising two levels.
• Preservation of background: The desired range of gray levels is brightened while the background of the image is preserved at the same time; these are the two important functions performed in this variant. The output in this case is a gray-level image.
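The point operations above (negative, thresholding, logarithmic, and power-law transformation) can be expressed in a few lines of array code. The following is a minimal sketch assuming 8-bit grayscale images stored as NumPy arrays; it is illustrative only and not code from the surveyed papers.

```python
# Minimal sketch of classical point operations on an 8-bit grayscale image.
import numpy as np

def negative(img):
    # s = 255 - r
    return 255 - img

def threshold(img, theta):
    # s = 255 if r >= theta else 0 (binary output image)
    return np.where(img >= theta, 255, 0).astype(np.uint8)

def log_transform(img):
    # s = c * log(1 + r), with c chosen so the output spans 0..255
    c = 255.0 / np.log(1.0 + 255.0)
    return (c * np.log1p(img.astype(np.float64))).astype(np.uint8)

def gamma_correct(img, gamma, c=1.0):
    # s = c * r^gamma, computed on intensities normalized to [0, 1]
    r = img.astype(np.float64) / 255.0
    return np.clip(255.0 * c * np.power(r, gamma), 0, 255).astype(np.uint8)
```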
Bit Plane Slicing Pixel values are represented by 8 bits in binary. A large part of the information related to the overall appearance and shape of the image is stored in the higher-order (first) 4 bits, while the finer details are stored in the lower-order (last) 4 bits. Thus, eight separate images can be formed by taking one bit from every pixel, so the contribution of each bit to the image can be studied and analyzed in detail. The images produced by the less significant bits are not clear, whereas those produced by the more significant bits show much more clearly the shape and general structure of the object.

Histogram Equalization This technique is used in image segmentation, compression, and enhancement. A histogram is a plot of the number of pixels at each gray level. The method merges various insignificant gray levels into one by approximation, thereby reducing the total number of levels on the x-axis.
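As an illustration, bit-plane slicing and histogram equalization can each be implemented in a few lines. This is a minimal sketch assuming 8-bit grayscale NumPy arrays and using the standard cumulative-histogram (CDF) mapping for equalization; it is not code from the surveyed papers.

```python
# Minimal sketch of bit-plane slicing and histogram equalization
# for an 8-bit grayscale image stored as a NumPy array.
import numpy as np

def bit_planes(img):
    """Return the 8 bit planes, from the least to the most significant bit."""
    return [((img >> b) & 1) * 255 for b in range(8)]

def equalize_histogram(img):
    """Map gray levels through the normalized cumulative histogram (CDF)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    # Standard equalization mapping: stretch the CDF to cover 0..255.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```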
2.2 Image Super-Resolution Technique

Super resolution (SR) is the reconstruction of high-resolution (HR) images from one or more low-resolution (LR) observations of the same scene. SR can be categorized into single-image super resolution (SISR) and multi-image super resolution (MISR), depending on the number of input images [5]. Reconstruction-based super resolution is possible because an LR image can be mapped patch by patch to construct the high-resolution image [6].

POCS: Super Resolution Zamani used the projection onto convex sets (POCS) technique and compared various resampling algorithms such as replica, bilinear, bicubic spline, Lanczos, and POCS. Of all these methods, POCS gives the highest accuracy on the natural scene statistics (NSS) metric [7]. Luft et al. proposed a technique to improve the perceptual quality of images that contain depth information: a mask is created by subtracting a low-pass filtered copy from the input image, which effectively acts as a high-pass filter. This yields additional semantic information about the spatial relationship between the objects in the scene [8]. The POCS algorithm finds an element of a feasible region defined by the intersection of several convex constraints, starting from an arbitrary point.

GANs and Two-Way GANs The major advantage of a generative adversarial network (GAN) is that it can produce extremely realistic images of any object, because the GAN model optimizes its objective function to a high accuracy. Another advantage of GANs is that they do not require many of the prior and posterior probability calculations often necessary for competing approaches. The disadvantage of GANs is that training these networks requires a great deal of computational power, which is not always feasible. The loss function these
networks try to optimize has no closed form, so optimizing it is very hard and requires a great deal of trial and error; the networks are also known to be unstable. To address these flaws, the authors of [9] proposed a method based on a framework of two-way generative adversarial networks (GANs) with several significant improvements over the existing GAN model. The improvements suggested in [9] include enhancing the convolutional neural network with global features and adding an adjustable weighting scheme for the Wasserstein GAN (WGAN) that can be changed as required. The dataset used for training and testing was the MIT-Adobe 5K dataset, which contains 5000 images, each retouched by five experienced photographers using global and local modifications. The model takes as input the photos and the characteristics the user desires in the resultant image. It then finds the common features present in the photos, which helps the model build an enhancer so that the image is enhanced while sharing similar attributes with the input image. The method trained with the HDR dataset gave the highest accuracy compared with the other methods, and observers found the resultant images uniform in texture and contrast and clear. The authors also performed a user study with 20 participants and 20 images enhanced with their model, DPED, CycleGAN, NPEA, and CLHE, in which 81% of the users preferred their image enhancement model over the others. The only drawback observed in the model was that it could increase the noise if the input was very dark and already contained a considerable amount of noise.

Sodanil and Intarat [10] describe a technique known as homomorphic filtering, which is used to enhance the low-resolution footage obtained from CCTV cameras throughout the world. This technique is useful for removing multiplicative noise and is also used to correct the irregular illumination frequently found in images. The first step of the method is to divide the video clip into image frames; the horizontal and vertical parts are then separated as preprocessing, after which the actual homomorphic filtering takes place. The results show good performance in terms of peak signal-to-noise ratio (PSNR). Hence, this technique has wide application in the security domain, especially in enhancing CCTV footage, because the amount of disturbance in the original image is reduced. The performance of the proposed method, measured in terms of the HMMOD process, has a numerical value of 23.97%.
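Homomorphic filtering as described above (log transform, frequency-domain high-frequency emphasis, inverse transform) can be sketched as follows. The Gaussian high-emphasis filter and the gain parameters are illustrative assumptions, not the exact filter used in [10].

```python
# Minimal sketch of homomorphic filtering for illumination correction of a
# single grayscale frame. The image is modeled as illumination * reflectance;
# taking the log turns the product into a sum, so a frequency-domain filter
# can suppress slowly varying illumination while boosting reflectance detail.
import numpy as np

def homomorphic_filter(img, sigma=30.0, gamma_low=0.5, gamma_high=1.5):
    img = img.astype(np.float64) / 255.0
    log_img = np.log1p(img)

    # Build a Gaussian high-emphasis filter in the (centered) frequency domain.
    rows, cols = img.shape
    u = np.arange(rows) - rows / 2
    v = np.arange(cols) - cols / 2
    V, U = np.meshgrid(v, u)
    dist2 = U**2 + V**2
    h = (gamma_high - gamma_low) * (1 - np.exp(-dist2 / (2 * sigma**2))) + gamma_low

    spectrum = np.fft.fftshift(np.fft.fft2(log_img))
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * h)))

    out = np.expm1(filtered)
    out = (out - out.min()) / (out.max() - out.min() + 1e-8)
    return (out * 255).astype(np.uint8)
```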
2.3 Image Super Resolution Using Deep Learning

SRCNN Super resolution of an image is a technique in which a low-resolution (LR) image is scaled up to a high-resolution (HR) image; single-image super resolution is a long-standing challenge in computer vision. Convolutional neural networks (CNNs) have likewise been applied to it, with large improvements in accuracy [11]. With CNNs in mind, Dong
et al. [12] proposed a deep learning method for single-image super resolution (SR) that learns an end-to-end mapping between low- and high-resolution images, represented as a deep convolutional neural network (CNN), and called it SRCNN [11]. The properties of SRCNN are that its structure is simple and its accuracy is high compared with most modern methods, and that it uses fewer filters and layers yet achieves superior speed, even on a CPU. The reason for its speed is its fully feed-forward design and the absence of any optimization problem to solve at inference time. Experiments show that there is scope for better restoration quality when

• the datasets are larger and more diverse, and
• the model is deeper.

On the contrary, larger datasets and models pose difficulties for earlier methods. The proposed network is also compatible with three-channel color images, which further enhances super-resolution performance. SRCNN has contributed in the following ways:

• It uses a convolutional neural network for super resolution and implements end-to-end mapping between LR and HR images with minimal pre- and postprocessing.
• A connection exists between SRCNN and traditional sparse-coding-based SR methods, which provided insights for the design of the network structure.
• The authors showed that deep learning can be applied to the super-resolution challenge in the computer vision domain.

SRGAN The method proposed by Ledig et al. [13], the super-resolution generative adversarial network (SRGAN), is capable of recovering photo-realistic images from 4× downsampling. It uses a perceptual loss function comprising a content loss and an adversarial loss, together with a discriminator network that distinguishes between the ground truth and the enhanced output image. The adversarial loss pushes the solution toward the natural image manifold, implemented by a discriminator network trained to differentiate SR images from ground truth, while the content loss is driven by perceptual similarity rather than similarity in pixel space [13]. GANs are a strong framework for generating images of superior perceptual quality and for producing realistic natural images; they help keep reconstructions in the region of the search space with a high probability of containing ground-truth images, resulting in closeness to the natural image manifold [13]. The main contributions by the authors are:

• A new state of the art for SR with upscaling factors of 4× (measured by PSNR and structural similarity) was achieved with their 16-block deep ResNet optimized for MSE.
• To obtain a new perceptual loss, SRGAN replaces the MSE-based content loss with a loss computed on feature maps of the VGG network, which is less sensitive to changes in pixel space.
Fig. 1 Comparison of PSNR/perceptual index values of various image enhancement algorithms
• The mean opinion score (MOS) obtained by SRGAN on images from three different benchmark datasets confirms that it is, by a very large margin, the state of the art among SR methods with high upscaling factors (4×).

ESRGAN SRGAN was a groundbreaking method capable of generating textures to produce photo-realistic images, but at times the details are not clear or sharp enough. To overcome this and develop an enhanced SRGAN (ESRGAN), the authors studied three important parts of SRGAN (network architecture, adversarial loss, and perceptual loss) and improved them. They introduced the residual-in-residual dense block (RRDB), without batch normalization, as the building element of the main network. They also built their system on the idea of the relativistic GAN, so that the discriminator estimates the relative realness of an image rather than an absolute value, and they enhanced the perceptual loss by using features before the activation stage, which results in consistent brightness and better texture recovery [14]. The proposed ESRGAN delivers impressive results and outperforms previous methods in sharpness and detail. PSNR-oriented methods produce blurry images containing disturbing noise, whereas ESRGAN produces sharper images and recovers fine structural details in architectural images while other methods add undesired textures. ESRGAN has proven to recover more natural textures, e.g., skin, fur, building structure, and grass fineness (Fig. 1).
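To make the SRCNN idea above concrete, the following is a minimal sketch of a three-layer SRCNN-style network in Keras. The 9-1-5 kernel sizes and 64/32 filter counts are the commonly cited configuration and are assumptions here; data preparation and training are omitted.

```python
# Minimal sketch of an SRCNN-style network: patch extraction, non-linear
# mapping, and reconstruction, applied to a bicubically upscaled input.
from tensorflow import keras
from tensorflow.keras import layers

def build_srcnn(channels=1):
    inputs = keras.Input(shape=(None, None, channels))
    x = layers.Conv2D(64, 9, padding="same", activation="relu")(inputs)  # feature extraction
    x = layers.Conv2D(32, 1, padding="same", activation="relu")(x)       # non-linear mapping
    outputs = layers.Conv2D(channels, 5, padding="same")(x)              # reconstruction
    return keras.Model(inputs, outputs)

model = build_srcnn()
# SRCNN-style training minimizes pixel-wise MSE against the HR ground truth,
# which is also what maximizes PSNR.
model.compile(optimizer=keras.optimizers.Adam(1e-4), loss="mse")
model.summary()
```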
3 Object Detection Techniques

Object detection is a computer technology related to image processing that deals with recognizing instances of semantic objects of a specific class (for example, people, buildings, or vehicles) in digital images and videos. Object detection is used everywhere nowadays. Research on the recognition of both familiar
and unfamiliar faces generally uses good-quality images of the target individuals; however, recent developments in security systems bring a particular problem of image quality [15]. The use cases are endless, be it pedestrian detection, people counting, face detection, object tracking, anomaly detection, self-driving cars, or video surveillance, and the list goes on. Object detection methods are mainly classified into machine learning methods and deep learning methods [1]. In this survey, we touch on some of the important methods and the observations reported in various papers.
3.1 Segmentation-Based Object Detection

Semertzidis et al. proposed an approach based on the concept of exemplars. They created their own dataset from CCTV footage and manually created a segmentation mask for each detected person. For segmentation, they used dice similarity and KNN, both resulting in a similar aggregated accuracy of 82% [16]. Shan et al. came up with a unique solution for face recognition that combines a pose-variability compensation method with adaptive principal component analysis (APCA) [17].

Classification tells us that an image belongs to a particular class; it does not consider the detailed pixel-level structure of the image and consists of making a prediction for an entire input. We can instead divide the image into different parts called segments. It is not a great idea to process the whole image at once, as there will be regions in the image that contain no information; by isolating the image into segments, we can use only the significant segments for processing. Semantic segmentation makes dense predictions, inferring a label for every pixel, so that each pixel is labeled with the class of its enclosing object. Object detection provides not only the classes but also the spatial locations of those classes, and it accounts for overlapping objects. Instance segmentation adds identification of object boundaries at the detailed pixel level. There are various segmentation methods, such as edge-detection segmentation, region-based segmentation, and clustering-based segmentation. Segmentation can be done using RCNN, K-means, KNN, and other methods. Here we survey the three methods given below.

RCNN RCNN is widely used to tackle the problem of object recognition and detection. It draws a boundary around each object present in the given image. It works in two stages: a region proposal stage and a classification stage. The classification stage consists of extracting feature vectors and applying a set of linear SVMs. To address the problem of choosing among a huge number of candidate regions, a selective search is used to extract only 2000 regions from the image; this is known as region proposal. In this manner, rather than attempting to classify an enormous number of regions, we work with just 2000. The selective search algorithm can be performed in the following steps:
• Generate the initial sub-segmentation (many candidate regions).
• Use a greedy algorithm to recursively combine similar regions.
• Use the generated regions to produce the final region proposals.

These proposed regions are then fed into the convolutional neural network, which produces a 4096-dimensional feature vector as output. The CNN extracts a feature vector for each region, which is then used as input to the set of SVMs that yields a class label. The algorithm additionally predicts four offset values to refine the bounding box and increase its accuracy. The drawback of RCNN is that it requires a lot of time to train and therefore cannot be used for real-time problems.

K-Means The K-means method is based on clustering and partitioning such that the pixels within a cluster are more similar to one another than to the pixels in other clusters. K-means is an iterative method that segments an image into k clusters. In color-based segmentation using K-means, the Euclidean distance metric is used to group pixels. The segmentation steps are as follows:

• Initially, k cluster centers are chosen, either randomly or according to some predefined criterion.
• Each pixel is assigned to the cluster whose center is closest to it.
• Cluster centers are recomputed by calculating the average of all the pixels in the cluster.

The drawback of K-means is that the algorithm provides good segmentation output for small values of k; as k increases, the segmentation becomes very coarse, with many clusters appearing at scattered locations in the image.

KNN K-nearest neighbor is a straightforward procedure that gives good classification accuracy. Considering the k points nearest to x under a particular distance metric, the classification of x is the class label held by the majority of those k neighbors. The decision boundaries differ depending on the distance metric used. The steps are as follows:

• Specify a positive integer k and the new sample.
• Select the k entries in the dataset that are most similar to the new sample.
• The most common class among those entries determines the classification of the new sample.

In KNN-based color segmentation, pixels in an image can be classified by computing the distance between each pixel and a set of color markers; a pixel matches the color marker to which it has the minimum distance.
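The K-means segmentation steps described above can be sketched in a few lines of code. This is an illustrative implementation on RGB pixel values (the convergence test and the default choice of k are simplified assumptions), not code from the surveyed papers.

```python
# Minimal sketch of K-means color segmentation: each pixel is assigned to the
# nearest of k cluster centers in RGB space, and centers are recomputed as the
# mean of their assigned pixels until the assignments stabilize.
import numpy as np

def kmeans_segment(image, k=3, iters=20, seed=0):
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]

    for _ in range(iters):
        # Euclidean distance of every pixel to every cluster center.
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Replace each pixel with its cluster center to visualize the segmentation.
    segmented = centers[labels].reshape(h, w, c).astype(image.dtype)
    return segmented, labels.reshape(h, w)
```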
3.2 Machine Learning-Based Object Detection

Fahad et al. discuss various identification techniques for identifying a person, namely feature-based, holistic-based, and hybrid-based, of which holistic-based identification gave excellent recognition results. They also examined three face recognition algorithms and reported accuracy rates of 85% for the artificial neural network (ANN), 75% for principal component analysis (PCA), and 65% for singular value decomposition (SVD) [18].

Artificial Neural Network (ANN) An ANN is defined as "a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs." Like the brain, an artificial neural network mimics the biological neural network of the human body. The first ANN model was created in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts, who built a computational model for neural networks based on threshold logic, i.e., logic grounded in mathematics and algorithms. An ANN learns from past experience, so it must be trained. Training is done by adjusting all of the "weights" using two procedures known as forward propagation and backpropagation. In forward propagation, training samples are fed into the ANN through its inputs and the corresponding outputs are recorded. In backpropagation, working, as the name suggests, from the output units through the hidden units to the input units, the weights are adjusted according to the error margin of the outputs obtained at each layer, in order to reduce that error. The trainer already knows the expected output values for the inputs, so after receiving the network's output it checks whether the true output and the output produced by the ANN are the same. If not, an error value is computed and sent back into the ANN. At each layer, the error value is examined and used to adjust the thresholds and weights for subsequent inputs. In this way, the error margin is gradually reduced, and the ANN learns to analyze the inputs and produce accurate results.

Principal Component Analysis (PCA) PCA is a statistical procedure for extracting insights and patterns from a dataset. PCA transforms the dataset to reveal hidden relationships, similarities, or differences, after which dimension reduction, data compression, or feature extraction can be performed on its output. PCA is, however, best known and used for reducing the dimensionality of a dataset: the more dimensions the data has, the harder it is to process. Hence, dimensionality-reduction methods such as PCA and LDA are applied to extract powerful new features from the data, and these new features or components are used instead of the original ones. Although some information is discarded, the selected components should be sufficient for processing. To analyze and build a new
dataset (reduced in dimensions) from the original one with PCA, the following steps are generally used:

• Get the dataset.
• Calculate the covariance matrix of the data.
• Calculate the eigenvectors and eigenvalues of the covariance matrix.
• Choose the principal components.
• Construct the new feature dataset from the chosen components.
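The steps above map directly onto a few NumPy calls. The following is a minimal sketch (eigendecomposition of the covariance matrix, keeping the top components), not code from the surveyed papers.

```python
# Minimal sketch of PCA: center the data, compute the covariance matrix,
# take the eigenvectors with the largest eigenvalues, and project onto them.
import numpy as np

def pca(data, n_components):
    # data: array of shape (n_samples, n_features)
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # covariance matrix is symmetric
    order = np.argsort(eigenvalues)[::-1]             # sort by decreasing variance
    components = eigenvectors[:, order[:n_components]]
    projected = centered @ components                 # new, reduced-dimension dataset
    return projected, components
```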
Singular Value Decomposition (SVD) Singular value decomposition is a matrix factorization technique used in many areas of science and technology. Moreover, owing to the great advances in AI, data mining, and theoretical computer science, SVD has become increasingly significant. Matrix factorization is the representation of a matrix as a product of matrices; there are many different matrix factorizations, each used for a different class of problems.
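As a small illustration of the factorization (low-rank reconstruction from the largest singular values, a common use in compression and feature extraction), the following NumPy sketch is given; it is not code from the surveyed papers.

```python
# Minimal sketch of singular value decomposition and rank-k reconstruction.
import numpy as np

def svd_rank_k(matrix, k):
    # A = U * diag(S) * Vt; keeping the k largest singular values gives the
    # best rank-k approximation of A in the least-squares sense.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

a = np.random.default_rng(0).normal(size=(6, 4))
approx = svd_rank_k(a, k=2)
print(np.linalg.norm(a - approx))   # reconstruction error of the rank-2 approximation
```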
4 Conclusion

The traditional methods comprising the digital negative, thresholding, and bit-plane slicing are too simple to enhance the color images that dominate today's use. These point operations, which transform pixel values by some specific function, usually cater to a single problem. Although they are less effective than the highly capable algorithms built on machine learning and deep learning, they are computationally far less intensive. They can be used as an initial step, and images preprocessed this way may give better results when passed through the advanced algorithms; combining traditional methods with advanced algorithms may also reduce computation and provide better and faster results. We also found that well-lit areas provide clearer images, but even a small variation in exposure can make the objects in an image difficult to detect [19], and rapid motion of objects in the scene causes video frames to be blurred and unclear [20]. The main aim of supervised SR algorithms is to minimize the mean squared error (MSE) between the ground truth and the reconstructed high-resolution (HR) image. This is convenient because minimizing MSE also maximizes PSNR, a common measure used to evaluate and compare SR algorithms [21]. However, the ability of PSNR and MSE to capture perceptually relevant differences, such as high texture detail, is very limited, as they are defined on pixel-wise image differences. Therefore, metrics such as structural similarity (SSIM) are also taken into consideration [13, 21, 22].
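The MSE/PSNR relationship mentioned above can be made explicit with a short computation. The SSIM call below assumes scikit-image is available; the random test images are purely illustrative.

```python
# Minimal sketch of the evaluation metrics discussed above: PSNR derived from
# MSE, and SSIM computed with scikit-image.
import numpy as np
from skimage.metrics import structural_similarity

def psnr(reference, reconstructed, max_value=255.0):
    mse = np.mean((reference.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                       # identical images
    # PSNR = 10 * log10(MAX^2 / MSE): lower MSE means higher PSNR.
    return 10.0 * np.log10(max_value**2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
rec = np.clip(ref.astype(int) + rng.integers(-5, 6, size=ref.shape), 0, 255).astype(np.uint8)
print("PSNR:", psnr(ref, rec))
print("SSIM:", structural_similarity(ref, rec, data_range=255))
```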
References

1. M.F.E.M. Senan, S.N.H.S. Abdullah, W.M. Kharudin, N.A.M. Saupi, CCTV quality assessment for forensics facial recognition analysis, in 2017 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence (IEEE, 2017), pp. 649–655
2. R. Maini, H. Aggarwal, A comprehensive review of image enhancement techniques. arXiv preprint arXiv:1003.4053 (2010)
3. S.-W. Lee, V. Maik, J. Jang, J. Shin, J. Paik, Segmentation-based adaptive spatio-temporal filtering for noise canceling and MPEG pre-processing, in 2005 Digest of Technical Papers. International Conference on Consumer Electronics, 2005. ICCE (IEEE, 2005), pp. 359–360
4. S.S. Bedi, R. Khandelwal, Various image enhancement techniques—a critical review. Int. J. Adv. Res. Comput. Commun. Eng. 2(3) (2013)
5. W. Yang, X. Zhang, Y. Tian, W. Wang, J.-H. Xue, Q. Liao, Deep learning for single image super-resolution: a brief review. IEEE Trans. Multimed. (2019)
6. L.C. Pickup, Machine learning in multi-frame image super-resolution. PhD thesis, Oxford University (2007)
7. N.A. Zamani, M.Z.A. Darus, S.N.H.S. Abdullah, M.J. Nordin, Multiple-frames super-resolution for closed circuit television forensics, in 2011 International Conference on Pattern Analysis and Intelligence Robotics, vol. 1 (IEEE, 2011), pp. 36–40
8. T. Luft, C. Colditz, O. Deussen, Image enhancement by unsharp masking the depth buffer, vol. 25 (ACM, 2006)
9. Y.-S. Chen, Y.-C. Wang, M.-H. Kao, Y.-Y. Chuang, Deep photo enhancer: unpaired learning for image enhancement from photographs with GANs, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
10. M. Sodanil, C. Intarat, A development of image enhancement for CCTV images, in 2015 5th International Conference on IT Convergence and Security (ICITCS) (IEEE, 2015), pp. 1–4
11. J. Kim, J.K. Lee, K.M. Lee, Accurate image super-resolution using very deep convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 1646–1654
12. C. Dong, C.C. Loy, K. He, X. Tang, Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015)
13. C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., Photo-realistic single image super-resolution using a generative adversarial network, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 4681–4690
14. X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, C.C. Loy, ESRGAN: enhanced super-resolution generative adversarial networks, in Proceedings of the European Conference on Computer Vision (ECCV) (2018)
15. A.M. Burton, S. Wilson, M. Cowan, V. Bruce, Face recognition in poor-quality video: evidence from security surveillance. Psychol. Sci. 10(3), 243–248 (1999)
16. T. Semertzidis, A. Axenopoulos, P. Karadimos, P. Daras, Soft biometrics in low resolution and low quality CCTV videos (2016)
17. T. Shan, S. Chen, C. Sanderson, B.C. Lovell, Towards robust face recognition for intelligent-CCTV based surveillance using one gallery image, in 2007 IEEE Conference on Advanced Video and Signal Based Surveillance (IEEE, 2007), pp. 470–475
18. S. Fahad, S. Ur Rahman, I. Khan, S. Haq, An experimental evaluation of different face recognition algorithms using closed circuit television images, in 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP) (IEEE, 2017), pp. 51–54
19. B. Boom, Face recognition's grand challenge: uncontrolled conditions under control. Citeseer (2010)
20. C. Henderson, S.G. Blasi, F. Sobhani, E. Izquierdo, On the impurity of street-scene video footage (2015)
21. C.-Y. Yang, C. Ma, M.-H. Yang, Single-image super-resolution: a benchmark, in European Conference on Computer Vision (Springer, 2014), pp. 372–386
22. Z. Wang, A.C. Bovik, H.R. Sheikh, E.P. Simoncelli et al., Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004)