English Pages XVI, 663 [679] Year 2021
Advances in Intelligent Systems and Computing 1181
Ajith Abraham Patrick Siarry Kun Ma Arturas Kaklauskas Editors
Intelligent Systems Design and Applications 19th International Conference on Intelligent Systems Design and Applications (ISDA 2019) held December 3–5, 2019
Advances in Intelligent Systems and Computing Volume 1181
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, perception and vision, DNA and immune based systems, self-organizing and adaptive systems, e-learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia.

The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results.

** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Ajith Abraham · Patrick Siarry · Kun Ma · Arturas Kaklauskas
Editors

Intelligent Systems Design and Applications: 19th International Conference on Intelligent Systems Design and Applications (ISDA 2019) held December 3–5, 2019
Editors

Ajith Abraham, Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR), Auburn, WA, USA
Patrick Siarry, Université Paris-Est Créteil Val-de-Marne, Créteil Cedex, France
Kun Ma, School of Information Science and Engineering, University of Jinan, Jinan, Shandong, China
Arturas Kaklauskas, Department of Construction Management and Real Estate, Vilnius Gediminas Technical University, Vilnius, Lithuania
ISSN 2194-5357    ISSN 2194-5365 (electronic)
Advances in Intelligent Systems and Computing
ISBN 978-3-030-49341-7    ISBN 978-3-030-49342-4 (eBook)
https://doi.org/10.1007/978-3-030-49342-4

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Welcome to the 19th International Conference on Intelligent Systems Design and Applications (ISDA’19), held on the World Wide Web. ISDA’19 is hosted and sponsored by the Machine Intelligence Research Labs (MIR Labs), USA. Due to the xenophobic attacks that spread in South Africa during early September 2019, several authors requested us to withdraw their papers; hence, we had to change the venue to online mode. ISDA’19 brings together researchers, engineers, developers and practitioners from academia and industry working in all interdisciplinary areas of computational intelligence and system engineering to share their experience and to exchange and cross-fertilize their ideas. The aim of ISDA’19 is to serve as a forum for the dissemination of state-of-the-art research, development and implementations of intelligent systems, intelligent technologies and useful applications in these two fields. ISDA’19 received submissions from 33 countries, and each paper was reviewed by at least five reviewers; based on the outcome of the review process, 62 papers were accepted for inclusion in the conference proceedings (40% acceptance rate). First, we would like to thank all the authors for submitting their papers to the conference and for their presentations and discussions during the conference. Our thanks go to the program committee members and reviewers, who carried out the most difficult work by carefully evaluating the submitted papers. Our special thanks to the following plenary speakers for their exciting plenary talks:

• Michael Pecht, University of Maryland, USA
• Yukio Ohsawa, University of Tokyo, Japan
• Karim Djouani, University Paris Est Creteil (UPEC), Paris, France
• Mourad Fakhfakh, National School of Electronics and Telecommunications of Sfax, Tunisia
• Kaushik Das Sharma, University of Calcutta, India
• Ali Siadat, Ecole Nationale Supérieure d’Arts et Métiers (ENSAM), France
• Fabio Scotti, Università degli Studi di Milano, Italy
We express our sincere thanks to the organizing committee chairs for helping us to formulate a rich technical program. Enjoy reading the articles!

Ajith Abraham
Patrick Siarry
General Chairs

Kun Ma
Arturas Kaklauskas
Program Chairs
Organization
Program Committee

Ajith Abraham – Machine Intelligence Research Labs, USA
Laurence Amaral – Federal University of Uberlandia
Babak Amiri – The University of Sydney
Mauricio Ayala-Rincon – Universidade de Brasilia
Nashwa El-Bendary – Arab Academy for Science, Technology and Maritime Transport, Egypt
Heder Bernardino – Universidade Federal de Juiz de Fora
José Everardo Bessa Maia – State University of Ceará (UECE)
Mohammad Reza Bonyadi – The University of Adelaide
János Botzheim – Budapest University of Technology and Economics
Alberto Cano – Virginia Commonwealth University
Paulo Carrasco – University of Algarve
Oscar Castillo – Tijuana Institute of Technology
Turgay Celik – University of the Witwatersrand
Isaac Chairez – UPIBI-IPN
Lee Chang-Yong – Kongju National University
Francisco Chicano – University of Málaga
Mario Giovanni C. A. Cimino – University of Pisa
Phan Cong-Vinh – Nguyen Tat Thanh University
Gloria Cerasela Crisan – “Vasile Alecsandri” University of Bacau
Haikal El Abed – German International Cooperation (GIZ) GmbH
El-Sayed M. El-Alfy – King Fahd University of Petroleum and Minerals
Wilfried Elmenreich – Alpen-Adria-Universität Klagenfurt
Carlos Fernandez-Llatas – Universitat Politècnica de València
Amparo Fuster-Sabater – Institute of Applied Physics (C.S.I.C.), Madrid, Spain
Terry Gafron – Bio-Inspired Technologies
Elizabeth Goldbarg – Federal University of Rio Grande do Norte
Stefan Gruner – University of Pretoria
Biju Issac – Teesside University
Isabel Jesus – Institute of Engineering of Porto
Jerry Chun-Wei Lin – Western Norway University of Applied Sciences
Simone Ludwig – North Dakota State University
Ana Madureira – Departamento de Engenharia Informática
Vukosi Marivate – University of Pretoria
Efrén Mezura-Montes – University of Veracruz
Jolanta Mizera-Pietraszko – Wroclaw University of Technology
Paulo Moura Oliveira – UTAD University
Ramzan Muhammad – Maulana Mukhtar Ahmad Nadvi Technical Campus
Akila Muthuramalingam – KPR Institute of Engineering and Technology
Janmenjoy Nayak – Veer Surendra Sai University of Technology
Varun Ojha – University of Reading
George Papakostas – Human–Machines Interaction (HMI) Laboratory, Department of Computer and Informatics Engineering, EMT Institute of Technology
Konstantinos Parsopoulos – University of Ioannina
Carlos Pereira – ISEC
Eduardo Pires – UTAD University
Dilip Pratihar – Department of Mechanical Engineering
Radu-Emil Precup – Politehnica University of Timisoara
Oscar Gabriel Reyes Pupo – UCO
José Raúl Romero – University of Cordoba
Keun Ho Ryu – Chungbuk National University
Ozgur Koray Sahingoz – Istanbul Kultur University
Neetu Sardana – Jaypee Institute of Information Technology
Mansi Sharma – Indian Institute of Technology, Delhi
Tarun Kumar Sharma – Amity University, Rajasthan
Mohammad Shojafar – University of Padua, Italy
Patrick Siarry – Université de Paris 12
Antonio J. Tallón-Ballesteros – University of Huelva
Shing Chiang Tan – Multimedia University
Sanju Tiwari – NIT
Jih Fu Tu – Department of Electronic Engineering, St. Johns University
Eiji Uchino – Yamaguchi University
Leonilde Varela – University of Minho
Gai-Ge Wang – School of Computer Science and Technology, Jiangsu Normal University
Lin Wang – University of Jinan
Additional Reviewers

Adly, Mohammad
Ahuactzin, Juan-Manuel
Bagnall, Anthony
Barbudo Lunar, Rafael
Berkich, Don
Crisan, Gloria Cerasela
Das Sharma, Kaushik
Diniz, Thatiana
Gabriel, Paulo
Goyal, Ayush
Kassem, Abdallah
Mckinlay, Steve
Pérez, Eduardo
Ramírez, Aurora
Salado-Cid, Rubén
Tiago Da Cunha, Italo
Timm, Nils
Contents
Data Jackets as Communicable Metadata for Potential Innovators – Toward Opening to Social Contexts . . . . . . . . . . . . . . . . . Yukio Ohsawa, Sae Kondo, and Teruaki Hayashi
1
A Proposal Based on Instance Typicality for Dealing with Nominal Attribute Values in Instance-Based Learning Environments . . . . . . . . . S. V. Gonçalves and M. C. Nicoletti
14
Dataset for Intrusion Detection in Mobile Ad-Hoc Networks . . . . . . . . . Rahma Meddeb, Bayrem Triki, Farah Jemili, and Ouajdi Korbaa
24
Visual Password Scheme Using Bag Context Shape Grammars . . . . . . . Blessing Ogbuokiri and Mpho Raborife
35
Peak Detection Enhancement in Autonomous Wearable Fall Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mario Villar and Jose R. Villar
48
Automated Detection of Tuberculosis from Sputum Smear Microscopic Images Using Transfer Learning Techniques . . . . . . . . . . . Lillian Muyama, Joyce Nakatumba-Nabende, and Deborah Mudali
59
Comparative Performance Analysis of Neural Network Base Training Algorithm and Neuro-Fuzzy System with SOM for the Purpose of Prediction of the Features of Superconductors . . . . . Subrato Bharati, Mohammad Atikur Rahman, Prajoy Podder, Md. Robiul Alam Robel, and Niketa Gandhi
69
Automatic Detection of Parkinson’s Disease from Speech Using Acoustic, Prosodic and Phonetic Features . . . . . . . . . . . . . . . . . . . . . . . Rania Khaskhoussy and Yassine Ben Ayed
80
A Deep Convolutional Neural Network Model for Multi-class Fruits Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Laith Alzubaidi, Omran Al-Shamma, Mohammed A. Fadhel, Zinah Mohsin Arkah, and Fouad H. Awad
90
Distributed Architecture of Snort IDS in Cloud Environment . . . 100
Mondher Essid, Farah Jemili, and Ouajdi Korbaa

Turing-Style Test Approach for Verification and Validation of Unmanned Aerial Vehicles’ Intelligence . . . 112
Marwa Brichni and Said El Gattoufi

Big Data Processing for Intrusion Detection System Context: A Review . . . 122
Marwa Elayni, Farah Jemili, Ouajdi Korbaa, and Basel Solaiman

Hardware Accelerator for Real-Time Holographic Projector . . . 132
Mohammed A. Fadhel, Omran Al-Shamma, and Laith Alzubaidi

Automatic Lung Segmentation in CT Images Using Mask R-CNN for Mapping the Feature Extraction in Supervised Methods of Machine Learning . . . 140
Luís Fabrício de F. Souza, Gabriel Bandeira Holanda, Shara S. A. Alves, Francisco Hércules dos S. Silva, and Pedro Pedrosa Rebouças Filho

Structures Discovering for Optimizing External Clustering Validation Metrics . . . 150
Marcos A. Spalenza, Juliana P. C. Pirovani, and Elias de Oliveira

The Influence of NER on the Essay Grading . . . 162
Elias Oliveira, James Alves, Jessica Brito, and Juliana Pirovani

Evaluation of Acoustic Features for Early Diagnosis of Alzheimer Disease . . . 172
Randa Ben Ammar and Yassine Ben Ayed

P-Median Problem: A Real Case Application . . . 182
M. B. Bernábe-Loranca, R. González-Velázquez, Erika Granillo-Martinez, M. Romero-Montoya, and Ricardo A. Barrera-Cámara

Towards Context-Aware Business Process Cost Data Analysis Including the Control-Flow Perspective . . . 193
Dhafer Thabet, Nourhen Ganouni, Sonia Ayachi Ghannouchi, and Henda Hajjami Ben Ghezala

Drone Authentication Using ID-Based Signcryption in LoRaWAN Network . . . 205
Sana Benzarti, Bayrem Triki, and Ouajdi Korbaa
Extrinsic Plagiarism Detection for French Language with Word Embeddings . . . 217
Maryam Elamine, Fethi Bougares, Seifeddine Mechti, and Lamia Hadrich Belguith

Using Opinion Mining in Student Assessments to Improve Teaching Quality in Universities . . . 225
Aillkeen Bezerra de Oliveira, André Luiz F. Alves, and Cláudio de Souza Baptista

Automated Threat Propagation Model Through a Topographical Environment Modelling . . . 235
Kilian Vasnier, Abdel-Illah Mouaddib, Sylvain Gatepaille, and Stephan Brunessaux

Solving Lorenz ODE System Based Hardware Booster . . . 245
Hassan Al-Yassin, Mohammed A. Fadhel, Omran Al-Shamma, and Laith Alzubaidi

Object Recognition Software Using RGBD Kinect Images and the YOLO Algorithm for Mobile Robot Navigation . . . 255
Douglas Henke dos Reis, Daniel Welfer, Marco Antonio de Souza Leite Cuadros, and Daniel Fernando Tello Gamarra

Paper Co-citation Analysis Using Semantic Similarity Measures . . . 264
Mohamed Ali Hadj Taieb, Mohamed Ben Aouicha, and Houcemeddine Turki

Assessment of the ISNT Rule on Publicly Available Datasets . . . 278
J. Afolabi Oluwatobi, Gugulethu Mabuza-Hocquet, and Fulufhelo V. Nelwamondo

An Autonomous Fallers Monitoring Kit: Release 0.0 . . . 287
Enrique de la Cal, Alvaro DaSilva, Mirko Fáñez, Jose Ramón Villar, Javier Sedano, and Victor Suárez

Random Forest Missing Data Imputation Methods: Implications for Predicting At-Risk Students . . . 298
Bevan I. Smith, Charles Chimedza, and Jacoba H. Bührmann

Noise Reduction with Detail Preservation in Low-Dose Dental CT Images by Morphological Operators and BM3D . . . 309
Romulo Marconato Stringhini, Daniel Welfer, Daniel Fernando Tello Gamarra, and Gustavo Nogara Dotto

An Effective Approach to Detect and Prevent Collaborative Grayhole Attack by Malicious Node in MANET . . . 318
Sanjeev Yadav, Rupesh Kumar, Naveen Tiwari, and Abhishek Bajpai
Hand-Crafted and Learned Features Fusion for Predicting Freezing of Gait Events in Patients with Parkinson’s Disease . . . 336
Hadeer El-ziaat, Nashwa El-Bendary, and Ramadan Moawad

Signature of Electronic Documents Based on Fingerprint Recognition Using Deep Learning . . . 346
Souhaïl Smaoui, Manel Ben Salah, and Mustapha Sakka

Comparison of a Trajectory Controller Based on Fuzzy Logic and Backstepping Using Image Processing for a Mobile Robot . . . 355
Rodrigo Mattos da Silva, Thiago Rodrigues Garcia, Marco Antonio de Souza Leite Cuadros, and Daniel Fernando Tello Gamarra

The Use of Area Covered by Blood Vessels in Fundus Images to Detect Glaucoma . . . 365
J. Afolabi Oluwatobi, Gugulethu Mabuza-Hocquet, and Fulufhelo V. Nelwamondo

Complexity of Rule Sets Induced from Data with Many Lost Values and “Do Not Care” Conditions . . . 376
Patrick G. Clark, Jerzy W. Grzymala-Busse, Zdzislaw S. Hippe, Teresa Mroczek, and Rafal Niemiec

ReLU to Enhance MDLSTM for Offline Arabic Handwriting Recognition . . . 386
Rania Maalej and Monji Kherallah

Histogram Based Method for Unsupervised Meeting Speech Summarization . . . 396
Nouha Dammak and Yassine BenAyed

Deep Support Vector Machines for Speech Emotion Recognition . . . 406
Hadhami Aouani and Yassine Ben Ayed

Biometric Individual Identification System Based on the ECG Signal . . . 416
Sihem Hamza and Yassine Ben Ayed

Bayesian Anomaly Detection and Classification for Noisy Data . . . 426
Ethan Roberts, Bruce A. Bassett, and Michelle Lochner

How to Trust the Middle Artificial Intelligence: Uncertainty Oriented Evaluation . . . 436
Marwa Brichni and Said El Gattoufi

Design the HCI Interface Through Prototyping for the Telepresence Robot Empowered Smart Lab . . . 446
Ramona Plogmann, Qing Tan, and Frédérique Pivot
Ant Colony Optimization on an OBS Network with Link Cost and Impairments . . . 456
Francois Du Plessis, M. C. Du Plessis, and Tim Gibbon

The Categorical Integration of Symbolic and Statistical AI: Quantum NLP and Applications to Cognitive and Machine Bias Problems . . . 466
Yoshihiro Maruyama

Vehicle Routing Problem with Fuel Station Selection (VRPFSS): Formulation and Greedy Heuristic . . . 477
Jhonata Soares de Freitas and André Gustavo dos Santos

Requirements Change Requests Classification: An Ontology-Based Approach . . . 487
Zaineb Sakhrawi, Asma Sellami, and Nadia Bouassida

An Efficient MPLS-Based Approach for QoS Providing in SDN . . . 497
Manel Majdoub, Ali El Kamel, and Habib Youssef

HoneyBees Mating Optimization Algorithm for the Static Bike Rebalancing Problem . . . 509
Mariem Sebai, Ezzeddine Fatnassi, and Lilia Rejeb

A Hybrid MAC Protocol for Heterogeneous M2M Networks . . . 520
Abdelfetteh Lachtar, Marwa Lachtar, and Abdennaceur Kachouri

Hybrid Approach for Trajectory Identification of Mobile Node via Lagrange Interpolation and Kalman Filtering Framework . . . 530
Pranchal Mishra, Ayush Tripathi, Abhishek Bajpai, and Naveen Tiwari

Post-Truth AI and Big Data Epistemology: From the Genealogy of Artificial Intelligence to the Nature of Data Science as a New Kind of Science . . . 540
Yoshihiro Maruyama

Interoperable Decision Support System Based on Multivariate Time Series for Setup Data Processing and Visualization . . . 550
M. L. R. Varela, Gabriela Amaral, Sofia Pereira, Diogo Machado, António Falcão, Rita Ribeiro, Emanuel Sousa, Jorge Santos, and Alfredo F. Pereira

Cross-Model Retrieval Via Automatic Medical Image Diagnosis Generation . . . 561
Sabrine Benzarti, Wahiba Ben Abdessalem Karaa, and Henda Hajjami Ben Ghezala

Gap-Filling of Missing Weather Conditions Data Using Support Vector Regression Method . . . 572
Heba Osman, Nashwa El-Bendary, and Essam El Fakharany
Automating the Process of Faculty Evaluation in a Private Higher Institution . . . 582
Adewole Adewumi, Olamide Laleye, Sanjay Misra, Rytis Maskeliūnas, Robertas Damaševičius, and Ravin Ahuja

A Web Based System for the Discovery of Blood Banks and Donors in Emergencies . . . 592
Babajide Ayeni, Olaperi Yeside Sowunmi, Sanjay Misra, Rytis Maskeliūnas, Robertas Damaševičius, and Ravin Ahuja

Smart City Waste Management System Using Internet of Things and Cloud Computing . . . 601
Aderemi A. Atayero, Segun I. Popoola, Rotimi Williams, Joke A. Badejo, and Sanjay Misra

Employability Skills: A Web-Based Employer Appraisal System for Construction Students . . . 612
Afolabi Adedeji, Afolabi Ibukun, Ojelabi Rapheal, Sanjay Misra, and Ravin Ahuja

A Prognosis Method for Esophageal Squamous Cell Carcinoma Based on CT Image and Three-Dimensional Convolutional Neural Networks . . . 622
Kaipeng Fan, Jifeng Guo, Bo Yang, Lin Wang, Lizhi Peng, Baosheng Li, Jian Zhu, and Ajith Abraham

Age Distribution Adjustments in Human Resource Department Using Shuffled Frog Leaping Algorithm . . . 632
Tarun K. Sharma and Ajith Abraham

Selection of Cloud Service Provider Based on Sampled Non-functional Attribute Set . . . 641
Mehul Mahrishi, Kamal Kant Hiran, and Ruchi Doshi

Image Processing Techniques for Breast Cancer Detection: A Review . . . 649
Mahendra G. Kanojia, Mohd. Abuzar Mohd. Haroon Ansari, Niketa Gandhi, and S. K. Yadav

Author Index . . . 661
Data Jackets as Communicable Metadata for Potential Innovators – Toward Opening to Social Contexts Yukio Ohsawa1(B) , Sae Kondo2 , and Teruaki Hayashi1 1 Department of Systems Innovation, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo 113-8656, Japan [email protected] 2 Research Center for Advanced Science and Technology, The University of Tokyo, 4-6-1 Komaba, Meguro-Ku, Tokyo 153-8904, Japan
Abstract. Data Jackets are human-made metadata for each dataset, reflecting people’s subjective or potential interests. By visualizing the relevance among DJs, participants in the market of data think and talk about why and how they should combine the corresponding datasets. Even if the owners of data hesitate to open their data to the public, they can present the DJs in the Innovators Marketplace on Data Jackets, which is a platform for innovation. Here, participants communicate to find ideas to combine/use/reuse data, or future collaborators. Furthermore, explicitly or implicitly required data can be searched using tools developed on DJs, which has enabled, for example, analogical inventions of data analysis methods. Thus, we realized a data-mediated birthplace of seeds in business and science. In this paper, we show a new direction to collect and use DJs to fit social requirements externalized and collected in living labs. The effect of living labs here is to enhance participants’ sensitivity to the contexts in the open society according to the authors’ practices, and applying DJs to these contexts means developing the process of evidence-based innovation, i.e., the loop of living humans’ interaction to create dimensions of performance in businesses.

Keywords: Innovation · Data jackets · Living lab
1 Introduction

Since innovation appeared as such changes of the combinations of the factors of production as cannot be affected by infinitesimal steps or variations on the margin [1], it does not mean just inventing a product. Innovation is the process of commercial application of new technology, combining it with materials, methods, and resources, toward opening up a new market. Rogers, after his theory of the diffusion of innovation involving various stakeholders in the process of innovation and the expansion of the opened market, pointed out that leading consumers play the role of innovators [2]. Here, not only the creators or developers of new products but also users play the important role to discover new value
of a product via using it and diffuse the value to the majority in the market. According to von Hippel [3], leading consumers invent, not only use and diffuse, technologies. All in all, innovation has evolved into a term referring to the thoughts and the interaction of stakeholders in the market, including consumers. This point distinguishes innovation from a child’s talent of value sensing acquired in the growth of mind [4] or a part of sensemaking that can be supported by information systems using data [5]. That is, innovation is the interaction of stakeholders of potential markets via combining elements and “doing” the ideas in real life, to cause a change that creates a dimension of performance [6] of products, the life of users, or the society. Innovators Marketplace on Data Jackets (IMDJ [7]) is a method following the above redefinition of innovation, where participants interact by combining data jackets, shown in Sect. 2, to invent and execute ideas of data usage. In IMDJ, participants communicate to create solutions to satisfy data users’ requirements by sharing, combining, and using data without violating constraints of owners (e.g., data protection and confidentiality as a business resource). IMDJ has been used in science and business, as stated in Sect. 2, and is now at the stage of spreading into daily human life. In this paper, Living Lab on Data Jackets (LLDJ) is proposed as a modification of IMDJ for opening the communication and thoughts to a deeper and wider range of latent requirements than in IMDJ. We still aim at aiding innovation, which is not sheer invention but the process of humans’ interaction to externalize new dimensions of performance. The role of the living lab here is to open participants’ sensitivity to the requirements of people in the society who may not attend the workshop. In Sect. 2, a logical description of data jackets and humans’ process of communication for reasoning toward satisfying requirements are shown.
IMDJ is briefly reviewed as a method to realize this process, and its limit is shown from the viewpoint of the gap between the requirements and the theory obtained in the reasoning. The living lab is introduced in Sect. 3 as an approach to coping with this limit by deepening and widening participants’ sensitivity to requirements. LLDJ is proposed in Sect. 4. This is not necessarily an improvement to replace IMDJ, but the addition of a new direction from the viewpoint of people’s daily living. The visualized sequence of utterances in a round-table discussion, shown as preliminary evidence, implies the effect of LL in LLDJ.
2 Data Jackets as Communicable Metadata

2.1 Data Jacket: The Definition and Its Role in Satisfying Requirements

A data jacket (DJ hereafter, first introduced in [7]) is a piece of digest information about a dataset that does not open the content of the data but includes the title, the abstract, and the variables, which may represent the subjective expectation of the data owner or potential data users about the utility of the data. The idea comes from the jacket of a movie DVD in a shopping store, where only superficial information about the movie is shown for exhibition. The content of data should be hidden to reduce the risk of being leaked to anyone who may harm the benefits of stakeholders. Such a policy of secure data management has been used in IMDJ, where each data owner takes part in submitting DJs, introduced below. In contrast to real data, DJs are easy to write and disclose for highlighting the latent utility of corresponding data, via showing potential links between datasets. For example, DJs about personal health and food consumption can be disclosed
although the data may be confidential and combined for understanding the relevance of weather and health linked via “time” and “place” that are common variables between the two datasets or via the concept “daily behavior” common between them (Fig. 1).
Fig. 1. A snapshot of on-line IMDJ [9]. Solutions (squares e.g., “We can have..”) are proposed combining DJs (large cards e.g. DI1039) responding to requirements (e.g., “what are….”).
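Although the paper does not prescribe any implementation, the digest structure of a DJ (title, abstract, variables) and the variable-sharing links visualized in Fig. 1 can be sketched as a small record. The names `DataJacket` and `shared_variables`, and the toy datasets, are hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class DataJacket:
    """Digest of a dataset: only superficial information, never the raw data."""
    title: str
    abstract: str
    variables: set = field(default_factory=set)

def shared_variables(dj_a, dj_b):
    """Variables common to two DJs, i.e., candidate links between the datasets."""
    return dj_a.variables & dj_b.variables

# Toy DJs echoing the health/food-consumption example in the text.
health = DataJacket("Personal health", "Daily gamma-GTP readings per person",
                    {"person_id", "date", "gamma_gtp"})
food = DataJacket("Food consumption", "Daily purchases per person",
                  {"person_id", "date", "item"})

print(sorted(shared_variables(health, food)))  # ['date', 'person_id']
```

Here the shared variables "time" ("date") and "person" play exactly the role of the dotted links between DJs described in the text: they tell participants which datasets could be joined without revealing any data content.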
See Fig. 2 for examples of simple DJs. More formally, a DJ is defined as follows, by relaxing the constraint on V_i and the redundancies in [8]:

DJ_i (i ∈ [1, N]): the i-th data jacket (N: the number of datasets in the market of data)
DJ_i := {V_i, F_i, P_i}, where the elements are defined as follows:
  V_i: the set of variables in DJ_i
  F_i: elements of V_i expressed as functions over other elements of V_i
  P_i: the set of predicates that relate elements of V_i
G: the goal, i.e., the requirement, incompletely defined as a relation over terms corresponding to events or entities in the target world
T: the theory, i.e., a model described by a set of Horn clauses, each of which is given using predicates in P_G below. T is represented over elements of P_G, F_G, and V_G, which compose the set of DJs in DJcom(G) in Eq. (1), satisfying Eq. (2) (where [v] for a variable v means the range of the value of v), if a conclusion G′ derived by theory T subsumes goal G. This means a formal expression G′ is related to the informal expression G of the goal, and T is completely defined, which intuitively means all the clauses in T are supported by data corresponding to some DJcom(G).

DJcom(G) := {DJ_a, DJ_b, …, DJ_L} ⊆ {DJ_1, DJ_2, …, DJ_N}, where
V_G := V_a ∪ V_b ∪ … ∪ V_L, F_G := F_a ∪ F_b ∪ … ∪ F_L, P_G := P_a ∪ P_b ∪ … ∪ P_L   (1)
4
Y. Ohsawa et al. ∃
v ∈ VG [∀ Vx ∈ {Va , Vb , . . . VL }, ∃ vx ∈ Vx |[v] ∩ [vx ] = ∅].
(2)
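As an illustrative sketch, the combinability condition of Eqs. (1)-(2) can be expressed in a few lines of Python. All DJ contents, variable names, and value ranges below are invented for illustration; the check only implements the range-intersection condition of Eq. (2).

```python
# Hypothetical sketch of Eq. (2): a set of DJs is combinable if some variable
# in V_G has a value range that overlaps a variable range in every DJ of the
# set. Variable ranges are given as (min, max) tuples.

def ranges_overlap(r1, r2):
    """[v] ∩ [v_x] ≠ ∅ for ranges given as (min, max) tuples."""
    return max(r1[0], r2[0]) <= min(r1[1], r2[1])

def combinable(djs):
    """Check Eq. (2): ∃ v ∈ V_G, ∀ V_x, ∃ v_x ∈ V_x with [v] ∩ [v_x] ≠ ∅."""
    # V_G is the union of all variable sets, as in Eq. (1)
    # (duplicate names keep the last range; enough for this sketch)
    v_g = {name: rng for dj in djs for name, rng in dj["V"].items()}
    for rng in v_g.values():
        if all(any(ranges_overlap(rng, rx) for rx in dj["V"].values())
               for dj in djs):
            return True
    return False

# Two invented DJs sharing the variable "date" over overlapping periods
dj_health = {"V": {"date": (2015, 2019), "gamma_gtp": (0, 300)}}
dj_weather = {"V": {"date": (2010, 2018), "air_temperature": (-10, 40)}}

print(combinable([dj_health, dj_weather]))  # the shared "date" range links them
```

Note that Eq. (2) as stated only requires ranges to intersect; an implementation for real DJs would also need the name matching and predicate combination described in 2.1.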
For example, suppose G is the requirement to know the influence of weather on health, represented as "health ← weather". By relating health to γ-GTP_high(person ID, date) and weather to hot(date), G corresponds to G' in clause (3):

$$G': \exists\, \text{person ID}\ \{\gamma\text{-GTP\_high}(\text{person ID}, \text{date}) \leftarrow \text{hot}(\text{date})\} \tag{3}$$

G' can be derived from the combination of clauses (4) and (5), which form T:

$$\gamma\text{-GTP\_high}(\text{person ID}, \text{date}) \leftarrow \text{beer\_consume}(\text{person ID}, \text{date}) \tag{4}$$

$$\exists\, \text{person ID}\ \{\text{beer\_consume}(\text{person ID}, \text{date}) \leftarrow \text{hot}(\text{date})\} \tag{5}$$
Here hot(date) and γ-GTP_high(person ID, date) can respectively mean air_temperature(date) − air_temperature(date−10) > α [deg] and γ-GTP(date) − γ-GTP(date−10) > β [u/l] for constants α and β. The values of α and β are obtained using data represented by DJs. For example, α can be obtained from Data B below.

Data B (represented by DJ_1, that is, DJ(B) in Fig. 2), weather: variables {date, address, air temperature, etc.}; a function such as air_temperature(date) in F_1 is also in V_1, defined over date in V_1, and a predicate such as hot in P_1 is defined on air_temperature and date.

In Fig. 2(a), each dotted line connects the appearances of the same variable in multiple DJs to combine predicates, corresponding to sharing a variable among all V_x used for deriving G', as in Eq. (2). If the obtained T is not satisfactory (here, the low confidence in Fig. 2(a) and (b)), other variables, such as address in Fig. 2(b), in a DJ already used as in Fig. 2(a), are additionally employed. Furthermore, as in Fig. 2(c), new DJs may be added to DJcom(G) to obtain a satisfactory T and evaluate it against the data corresponding to the DJs.

2.2 Innovators Marketplace on Data Jackets

Our approach toward realizing such reasoning as in 2.1 or Fig. 2 has been IMDJ, as summarized in the introduction. For aiding participants' thought about the connectivity among DJs, a gaming board is made using KeyGraph [10], where words or variables shared by multiple DJs are highlighted and positioned as bridges between the DJs. IMDJ starts with the set of DJs and the gaming board obtained from that set, followed by the process of proposing and evaluating solutions based on combined DJs to meet the requirements of data users, following the procedure exemplified in 2.1.
Below we show a few of the results obtained so far, which externalized the dimension of performance in data-driven business decisions, that is, the explanation of changes rather than the detection or prediction realized by machine learning technologies. Here, a TJ stands for a Tool Jacket [12], in which a tool for using data (an AI method, data visualization, or simulation) is summarized in the form of a DJ, i.e., the title, the abstract, and the input/output variables. Example 3 was realized by analogy from Example 2 using DJ store [11], where the links between DJ3 and Req 2 had been learned from past IMDJ logs, by diverting the idea of using TJ1 from change explanation for purchases in the market to earthquakes.
Fig. 2. The connection of DJs for combining data in IMDJ. To refine the performance of data mining, variables such as "address" (from (a) to (b)) or data such as DJ(B) or DJ(D) are imported/exported.
Y. Ohsawa et al.
Fig. 3. Innovators' Marketplace on Data Jackets of two types. Solid arrows show the flow without action planning, dotted arrows the flow with action planning. Action planning may cause requirement revision, which breaks the ideas created in IMDJ.
Example 1 (Skill development in sports)
Req 1: Evaluate and improve the defense skill of a soccer team [13]
DJ1: wide-view video
DJ2: body direction
Sol1: visualize "lines" of teammates along which a ball can be passed quickly, which explains the skill of a defensive team in managing the changes in the offensive team

Example 2 (Change explanation in businesses)
Req 2: Detect and explain causes of customers'/investors' behavioral shifts [14, 15]
DJ3: data on the market, e.g., position of sale in a supermarket or stock prices
TJ1: Tangled String or graph-based entropy
Sol2: user interface for explaining changes in the consumption market with visualized "explanatory" changes implying the latent dynamics in the market

Example 3 (Change explanation in nature)
Req 3: Detect precursors of and explain changes in earthquakes [16]
DJ4: sequence of earthquakes in Japan
DJ5: location of seismographs in Japan
DJ6: (the way of using) position of sale data (as in Example 2)
TJ2: regional entropy on seismic information, based on the idea of TJ1
Sol3: entropy-based detection of precursors from the sequence of earthquakes

However, it turned out that solutions tended not to be satisfactory enough to attract participants in IMDJ to realize the proposed solutions, even when they were highly evaluated by the participants. We hypothesize that the problem was in the lack of correspondence between G and a predicate in G' derived by T, because we provided no explicit
user interface urging subscribers of DJs to write their subjective expectations as predicates, i.e., elements of P; they just filled the DJ with their expectations about the utility of the data in natural language. Such expectations may partially cover some potential relations among variables, and the post-process of IMDJ, called Action Planning, introduced additional details of the planned use of data. In the Action Planning phase, the latent requirement that may be the reason for the requirements presented as G in IMDJ was obtained, and the solution corresponding to T was revised to meet this new goal. However, the new goal was just one level deeper (higher in Fig. 3) than G, which may not reach the level of DJs, and the solution T obtained in IMDJ may get lost due to the goal revision. In such a case, it has been difficult to reach a shared awareness of the value of the data-based solutions to be obtained.
3 Living Lab for Enhancing the Sensitivity to the Open Society

We expect to satisfy the requirements not satisfied by the previous IMDJ, for the reason in 2.2, by inviting citizens to join the workshop in Living Labs discussed below, to both deepen and widen the causal desires that explain the originally presented requirements. In this section, let us discuss the expected effects of combining LL and DJs.

3.1 Living Labs and Their Effects

In recent years, the living lab (LL hereafter) has been attracting the attention of industry, government, and academia as a way to create new solutions and services by solving problems together. LL was born as a social participatory method that works from the viewpoint of consumers, mainly in northern Europe, and is regarded as a framework for the participation of various stakeholders supporting innovation and sustainable development in the community. Therefore, LL is expected to be a mechanism for promoting wide-ranging social participation and changing individual consciousness near living spaces, by introducing new, sometimes deepened aspects into communication about problems and solutions in daily life. Through this effect of LL, the LLDJ proposed below aims to overcome the problem of IMDJ mentioned in Sect. 2.2 by inviting citizens and working people in the target region to (1) widen the scope of communication, and (2) deepen the communication about potential requirements, to reinforce the possibility that presented goals reach the level of DJs.

Studies on LL have so far been led by Europe. In particular, interest has recently turned to LLs that aim to create innovations and reach users with ICT at the core. Følstad (2008), who organized 32 references of this type, pointed out the need to elucidate processes and methods [17]. In response, Leminen (2012) and Almirall & Wareham (2011) conducted analyses from the perspectives of management, participation methods, and the roles of the parties involved [18, 19].
These methods have not yet been elucidated, because the definition of innovation has not been explicitly clarified in each study. In other words, since the effect of LL is not clear, no evaluation index has been established. This point is improved, and the meaning of introducing LL for improving IMDJ becomes clear, in Sect. 3.2.
3.2 Lessons for Living Labs from Organizational Citizenship Behavior

Living lab activities are considered voluntary and organized social contribution activities, and as a viewpoint for evaluating their effects, we focus on the effects of LL that contribute to Organizational Citizenship Behavior (OCB). OCB has been defined as individual behavior that is discretionary, not directly or explicitly recognized by the formal reward system, and that in the aggregate promotes the effective functioning of the organization [20]. In the sense that OCBs are not part of the job description but are performed by an employee's personal choice for positive contribution to overall organizational effectiveness, Contextual Performance (non-task-related work behaviors and activities contributing to the social and psychological aspects of the organization [21]) and Extra-role Behavior (behavior attempting to benefit the organization beyond existing role expectations [22]) are all in our target to realize by LL. The explanatory scales on the five-factor model in [20] have also been developed, specified, or extended in applications to industrial and governmental organizations. For example, the mediating effect of political skill as a scale includes the ability to sense the influence of individuals on others and the intentions of others, as well as the ability to build social agility and human relations [23]. On the other hand, it is known that LL activities can result in the networking of participants and the expression of potential requirements. Since these results are thought to be related to the above-mentioned regulatory and explanatory factors of OCB, it is hypothesized that LL contributes to the enhancement of OCB via these factors. By introducing LL, enhancing OCB, and taking advantage of its effects, we can expect to raise the sensitivity of participants to deep and wide potential requirements, which did not work well in the conventional IMDJ. As shown in Fig.
4, the abstracts of OCB and LL collected from Wikipedia are visualized into one graph by KeyGraph to see the contact points between them, among the 117 words visualized. The words "work", "social", "personal", "life", "experience", "evaluation", and "context" are shared between OCB and LL, to which concepts related to collaborative problem solving such as "problem", "conflicts", and "multidisciplinary" are linked. This implies, although the abstracts are weak as evidence, that evaluating the performance of individual persons in social contexts in problem detection (i.e., requirement sensing), and of the organization in solving problems from multidisciplinary viewpoints, are among the effects of living labs that can be expected from the perspective of OCB. This was the missing element. Thus, in future workshops, we plan to introduce the process in Sect. 4 as Living Lab on Data Jackets (LLDJ).
4 Living Lab on Data Jackets

The proposed new process consists of the four simple steps below.

Step 0) Set the topic Z, without a solution for requirements. Collect the initial participants in LL (P_LL) for Step 1.
Step 1) Open the LL relevant to topic Z (from the viewpoint of daily life), which means to communicate requirements and to propose solutions for the requirements. The requirements are deepened to latent requirements by asking the reasons for the requirements before proposing any solution (see the regulation mentioned below).
Fig. 4. The KeyGraph visualization of LL (right half) and OCB (left)
Step 2) Make the set R_LL of requirements obtained in Step 1 (KeyGraph can be used here, as stated later with Fig. 6).
Step 3) Search DJs using words in the requirements in R_LL as the query to DJ store [11], on which an IMDJ starts, applying R_LL as the initial requirements. Collect the participants in IMDJ (P_IMDJ) relevant to these DJs and to the initial requirements.
Step 4) The solution(s) and added requirements in Step 3 are returned to Step 1. Call participants relevant to these added items additionally to the LL.

For externalizing deeper requirements, the communication is regulated by a rule that each solution in Steps 1 and 3 must be proposed after asking a deep reasoning question, i.e., "why do you require it?", based on the limit-handling framework in [24, 25]. LLDJ and IMDJ are compared in Fig. 5 and Table 1. As in Fig. 5, LLDJ contributes to solutions for more general social issues than sheer IMDJ, where each requirement is shown by a participant and a solution is usually addressed to a few requirements.

Let us use KeyGraph in Fig. 6 to visualize the utterances in the first, second, third, and last quarters of a round-table discussion held for two hours, inviting five workers in the Ota ward in Tokyo, two governmental workers, and three professors from universities in Tokyo. Ota has more than 3,000 manufacturing firms, 50% of which have fewer than four workers, which makes it hard to employ young staff members. The topic
Fig. 5. The structure of Living Lab with Data Jackets (LLDJ)
Table 1. The comparison of IMDJ versus LLDJ.

Participants:
  IMDJ: fixed members, including data providers/experts, data users, and data scientists.
  LLDJ: members revised by cycles (Steps 1 to 4), including ordinary citizens in LL and others similar to IMDJ.
Visualization as common reference for participants:
  IMDJ: KeyGraph showing DJs and links between them.
  LLDJ: KeyGraph of words in LL and co-occurrence links between them, in addition to the graph for IMDJ; both are revised by cycles.
The communication:
  IMDJ: proposing the use of multiple data, with corresponding DJs connected in KeyGraph and combined into one.
  LLDJ: presenting and deepening requirements in the living context of people in LL, followed by IMDJ initiated by the requirements from the LL.
Structure of requirements:
  IMDJ: one or two layers.
  LLDJ: can exceed two layers.
of the discussion was "Networking of Young People and Middle/Small Firms," which produced 340 utterances. The white rectangles show words proposing requirements (or suggesting
problems) in the region from either side (young people or firm managers), and the black ones show the solutions or deepened latent requirements behind the requirements presented in the previous quarter. For example, it was pointed out in the 1st quarter that there are problems in the education of students, which was deepened into the requirement to clarify the utility of school lessons for working in each job category. Concerning the words in the questionnaires by the government (2nd quarter), participants came to require clarification of the influence of students' concerns about the mood in workplaces and the evaluation of workers on the students' choice of what to learn and where to work (3rd quarter). In the 4th quarter, methods for education (e.g., OJT) and management were proposed, with open problems corresponding to deepened requirements. This result shows an example where communication inviting real living sites meets our aim of deepening the requirements, and this effect can be aided by the visualization of words as in Step 2.
Fig. 6. A sequence of graphs on KeyGraph for the four segments of a round-table discussion
5 Conclusions

We first redefined innovation based on the original definition by Schumpeter, and also redefined data jackets, on which the effect of IMDJ for innovation and the problem
for IMDJ are shown. Then LLDJ was proposed as a method to deepen and widen the requirements shown in communication, inviting local aspects of daily living to externalize general issues that can be closer to DJs than a requirement in IMDJ. This effect is due not only to covering a wider range of requirements, but also to the tendency for data to be collected for general purposes. In future work, we plan to design new DJs that are revisable and extensible, reflecting new expectations about data usage, to further reinforce the effects of, and take advantage of, LLDJ.

Authors' Contributions. Ohsawa invented DJ and IMDJ, and organizes this project of LLDJ. Kondo has been executing Living Lab at the University of Tokyo, which led her to the finding that the effects of Living Lab go via the enhancement of the participants' sensitivity to interests in the open society. Hayashi contributed to the creation of technologies supporting IMDJ, e.g., the DJ Store and Action Planning.
References

1. Schumpeter, J.A.: Theorie der wirtschaftlichen Entwicklung. Duncker & Humblot (1912)
2. Rogers, E.M.: Diffusion of Innovations, 5th edn. Free Press (2003)
3. von Hippel, E.: Democratizing Innovation, New edn. The MIT Press (2006)
4. Donaldson, M.: Human Minds: An Exploration. The Penguin Press, Allen/Lane (1992)
5. Dervin, B.: From the mind's eye of the user: the sense-making qualitative-quantitative methodology. In: Glazier, J.D., Powell, R.R. (eds.) Qualitative Research in Information Management, Englewood, CO, pp. 61–84 (1992)
6. Drucker, P.F.: The discipline of innovation. Harvard Bus. Rev. 63(3), 67–73 (1985)
7. Ohsawa, Y., Kido, H., Hayashi, T., Liu, C.: Data jackets for synthesizing values in the market of data. Procedia Comput. Sci. 22, 709–716 (2013)
8. Ohsawa, Y., Hayashi, T., Kido, H.: Restructuring incomplete models in innovators marketplace on data jackets. In: Magnani, L., Bertolotti, T. (eds.) Handbook of Model-Based Science, pp. 1015–1031. Springer (2017)
9. Iwasa, D., Hayashi, T., Ohsawa, Y.: Development and evaluation of a new platform for accelerating cross-domain data exchange and cooperation. New Gener. Comput. 38(1), 65–96 (2019)
10. Ohsawa, Y.: KeyGraph: visualized structure among event clusters. In: Ohsawa, Y., McBurney, P. (eds.) Chance Discovery, pp. 262–275. Springer (2003)
11. Hayashi, T., Ohsawa, Y.: Data jacket store: structuring knowledge of data utilization and retrieval system. Trans. Japan. Soc. Artif. Intell. 31(5), A-G15_1 (2016)
12. Hayashi, T., Ohsawa, Y.: Meta-data generation of analysis tools and connection with structured meta-data of datasets. In: Proceedings of the 3rd International Conference on Signal Processing and Integrated Networks, pp. 226–231 (2016)
13. Takemura, K., Hayashi, T., Ohsawa, Y., Aihara, D., Sugawa, A.: Computational coach support using soccer videos and visualization. IEICE-TR 117(440), 93–98 (2018). (in Japanese)
14. Ohsawa, Y.: Graph-based entropy for detecting explanatory signs of changes in market. Rev. Socionetw. Strat. 12(2), 183–203 (2018). https://doi.org/10.1007/s12626-018-0023-8
15. Ohsawa, Y., Hayashi, T., Yoshino, T.: Tangled string for multi-timescale explanation of changes in stock market. Information 10(3), 118 (2019)
16. Ohsawa, Y.: Regional seismic information entropy for detecting earthquake activation precursors. Entropy 20(11), 861 (2018)
17. Følstad, A.: Living labs for innovation and development of information and communication technology: a literature review. EJ. Virtual Organ. Netw. 10, 99–131 (2008)
18. Leminen, S., Westerlund, M., Nyström, A.G.: Living labs as open-innovation networks. Technol. Innov. Manag. Rev. 2(9), 6–12 (2012)
19. Almirall, E., Wareham, J.: Living labs: arbiters of mid- and ground-level innovation. Technol. Anal. Strateg. Manag. 23(1), 87–102 (2011)
20. Organ, D.W.: A restatement of the satisfaction-performance hypothesis. J. Manag. 14(4), 547–557 (1988)
21. Borman, W.C., Motowidlo, S.J.: Expanding the criterion domain to include elements of contextual performance. In: Schmitt, N., Borman, W.C. (eds.) Personnel Selection in Organizations, pp. 71–98. Jossey-Bass, San Francisco (1993)
22. Van Dyne, L., Cummings, L.L., McLean Parks, J.: Extra-role behaviors: in pursuit of construct and definitional clarity. Res. Organ. Behav. 17, 215–285 (1995)
23. Ohshima, R., Miyazaki, G., Haga, S.: The mediating effect of political skill in influencing the effect of the big five personality domains on organizational citizenship behavior. Jpn. Assoc. Ind./Organ. Psychol. J. 32(1), 31–41 (2018)
24. Eris, O.: Effective Inquiry for Innovative Engineering Design. Kluwer Academic, Dordrecht (2004)
25. Eris, O., Bergner, D., Jung, M., Leifer, L.: ConExSIR: a dialogue-based framework of design team thinking and discovery. In: Ohsawa, Y., Tsumoto, S. (eds.) Chance Discoveries in Real World Decision Making (SCI 30), pp. 329–344. Springer, Heidelberg (2006)
A Proposal Based on Instance Typicality for Dealing with Nominal Attribute Values in Instance-Based Learning Environments S. V. Gonçalves(B) and M. C. Nicoletti Centro Universitário Campo Limpo Paulista - PMCC, Rua Guatemala 167, Limpo Paulista, SP 13231-230, Brazil [email protected]
Abstract. Instance-Based Learning (IBL) is a research area with focus on supervised algorithms that use the training instances as the expression of the learned concept. Usually training instances are described by vectors of attribute values and an associated class. Attributes can be of different types, depending on the values they represent, and usually are of discrete or continuous type. A subtype of the discrete type is known as nominal. Attributes of nominal type usually represent categories, and there is no order among their values. This paper proposes and investigates an alternative strategy for dealing with nominal attributes during the classification phase of the well-known instance-based algorithm NN (Nearest Neighbor). The proposed strategy is based on the concept of typicality of an instance, which can be taken into account as a possible tiebreaker in situations where the instance to be classified has more than one nearest neighbor. Experiments using the proposed strategy and the default random strategy used by the conventional NN show that the typicality-based strategy can be a convenient choice to improve accuracy when data instances have nominal attributes among the attributes that describe them. Keywords: Instance-Based learning · Nearest neighbor · Typicality
1 Introduction

Instance-Based Learning (IBL) [1, 2] is a research area within Machine Learning (ML) [5, 8, 11] whose main focus is on supervised algorithms that use training instances as the representation of the learned concept. By doing so, the so-called generalization process, which usually happens during the learning phase of ML algorithms, is postponed in an IBL environment until the classification phase. Generally, the learning phase of a typical IBL algorithm consists of just storing the training set. One of the most successful algorithms, which has been used as the basis for many other instance-based algorithms, is the Nearest Neighbor (NN) [7]. The NN usually employs a distance function to determine the degree of similarity between each of the stored instances and the new instance to be classified. Despite its good classification accuracy in many different application areas [6, 9], the NN algorithm does not work

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 14–23, 2021. https://doi.org/10.1007/978-3-030-49342-4_2
efficiently with attributes that have nominal values, or attributes that, for some reason, have missing values [5]. Besides, in addition to the inconvenience of not being measurable, nominal attributes usually have domains with few distinct values. This particularity implies that the number of possible value combinations for describing data instances is much smaller when compared to the numbers associated with continuous attributes. Therefore, if the instance set has a high predominance of nominal attributes, when the distance function is used for determining the stored instance closest to the new instance to be classified, ties in the values returned by the function occur very often. The research work reported in this paper investigates possible tie situations among the values of the distance function when a new instance to be classified is input to the NN algorithm, and proposes an alternative tiebreaker based on the typicality of each instance, as opposed to the tiebreaker suggested by traditional algorithms. The remainder of this paper is organized as follows: Sect. 2 presents a review of the NN algorithm. Section 3 approaches the problem related to attributes characterized as nominal. Section 4 discusses a criterion for dealing with tie situations. Section 5 presents the experiments conducted for evaluating the contribution of the proposed typicality-based strategy; the section also presents an analysis of the obtained results. Finally, Sect. 6 concludes the report on the work done and highlights some possible directions for the continuity of the work.
2 Brief Review of the Nearest Neighbor Algorithm

The Nearest Neighbor (NN) [7], briefly presented in Fig. 1 since it is the focus of the work described in this paper, is one of the earliest IBL algorithms and is still successfully employed in many different tasks. In spite of presenting a few drawbacks, mostly related to the fact that it stores the whole set of instances and is very sensitive to the presence of noisy data, it is very popular, due to the conceptual idea it is based upon and because it is easily implementable.

Training algorithm:
• store the training set with N training instances, T_NN = {(x_1, θ_1), (x_2, θ_2), ..., (x_N, θ_N)}, where:
  (a) x_i (1 ≤ i ≤ N) is an M-dimensional vector of attribute values: x_i = (x_i1, x_i2, ..., x_iM)
  (b) θ_i ∈ {1, 2, ..., S} is the correct class of x_i (1 ≤ i ≤ N).

Classification algorithm:
• given an instance x_q to be classified, the decision rule implemented by the NN decides that x_q has class θ_j if
  d(x_q, x_j) ≤ d(x_q, x_i), 1 ≤ i ≤ N,
  where d is an M-dimensional distance metric.

Fig. 1. High-level pseudocode of the NN, based on the description given in [10].
The algorithm assumes that all stored instances correspond to points in an M-dimensional space R^M, where R is the set of real numbers. In Fig. 1, the decision rule establishing θ_j as the class of a new data instance x_q, i.e., d(x_q, x_j) ≤ d(x_q, x_i), 1 ≤ i ≤ N, is referred to as the 1-NN rule, since the decision about the class of the new instance is based on one instance only, the one nearest to the new instance. The process of classifying a new instance can be extended by considering a larger number (k) of nearest neighbors of the instance to be classified, which gives rise to the algorithm version known as k nearest neighbors (k-NN), which is among the most influential data mining algorithms in the research community [16]. In the original proposal of the NN algorithm, in a tie situation, such as when there are several instances with the same degree of similarity to the instance to be classified, it is recommended, when possible, to adopt the most frequent class among these instances as the class of the new instance.
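The 1-NN rule with the recommended tie-breaker can be sketched as follows. This is a minimal illustration, not the authors' implementation: the instances and the Euclidean metric are invented for the example, and ties are broken by the most frequent class among the tied neighbors, as recommended above.

```python
# Minimal 1-NN sketch: find the minimum distance, collect all stored
# instances at that distance, and break ties by the most frequent class.
from collections import Counter
import math

def nn_classify(training, xq):
    """training: list of (vector, class) pairs; xq: vector to classify."""
    dists = [(math.dist(x, xq), c) for x, c in training]
    d_min = min(d for d, _ in dists)
    # classes of ALL instances at the minimum distance (the tie set)
    tied = [c for d, c in dists if d == d_min]
    # recommended tie-breaker: the most frequent class among the tied ones
    return Counter(tied).most_common(1)[0][0]

# Invented training set: three identical points with classes 0, 0, 9
train = [((1.0, 1.0), 0), ((1.0, 1.0), 0), ((1.0, 1.0), 9), ((4.0, 4.0), 6)]
print(nn_classify(train, (1.0, 1.1)))  # three tied neighbors; class 0 wins 2:1
```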
3 Considerations About Instances with Nominal Attributes

Algorithms that use distance functions to calculate the similarity between instances, such as the NN, k-NN, and several other instance-based algorithms, are unable to deal with the fact that nominal attributes cannot be quantified, which prevents their use in mathematical expressions and, subsequently, their numerical analysis. Several proposals of distance functions for attributes of nominal type can be found in the literature. Reference [8] presents a detailed review of several proposals of distance functions, grouped by the type of data they are more suitable to deal with. A convenient choice for implementing the distance function in applications where data instances are described by mixed types of attribute domains is a distance function that combines different distance functions for different types of attributes, as does the Heterogeneous Euclidean-Overlap Metric (HEOM) defined by Eq. (1), as presented in [15], where x and y are two data instances described by M attributes, and x_a and y_a are the values of attribute a in instances x and y, respectively.

$$\mathrm{HEOM}(x, y) = \sqrt{\sum_{a=1}^{M} d_a(x_a, y_a)^2} \tag{1}$$

where

$$d_a(x_a, y_a) = \begin{cases} 1 & \text{if } x_a \text{ or } y_a \text{ is unknown, else} \\ \mathrm{overlap}(x_a, y_a) & \text{if } a \text{ is nominal, else} \\ \mathrm{rn\_diff}_a(x_a, y_a) & \text{if } a \text{ is continuous} \end{cases}$$

The overlap and rn_diff_a functions are defined by Eqs. (2) and (3), respectively. In Eq. (3), range_a = max_a − min_a, where max_a and min_a represent the maximum and minimum values of attribute a in the training set.

$$\mathrm{overlap}(x_a, y_a) = \begin{cases} 0 & \text{if } x_a = y_a \\ 1 & \text{otherwise} \end{cases} \tag{2}$$

$$\mathrm{rn\_diff}_a(x_a, y_a) = \frac{|x_a - y_a|}{\mathrm{range}_a} \tag{3}$$
The d_a function defined above returns the distance value 1 (i.e., the maximum distance) in case of missing attribute values. When the attribute is nominal, d_a passes the task on to the overlap function (Eq. (2)), which returns the value 0 if both attribute values (x_a and y_a) are the same, and the value 1 otherwise. When the attribute is continuous, d_a passes the task on to the rn_diff_a function, which calculates the range-normalized difference, as shown in Eq. (3). As pointed out by the authors, the normalization serves to scale the attribute values down, to the point where differences are almost always less than one.
4 Dealing with Ties Based on the Typicality of Instances

When a new instance is to be classified, sets of data instances predominantly described by nominal attributes are more prone to ties, as far as the distance function values are concerned. To exemplify the situation, Table 1 presents a few data instances from the Led7digit dataset, downloaded from the KEEL Repository [3]. For the considerations and discussion that follow, the chosen instances are training instances that have been stored as the expression of the concept in an IBL environment.

Table 1. Instances from the Led7digit data set.

#ID  dp1  dp2  dp3  dp4  dp5  dp6  dp7  Class
003   1    1    1    0    1    1    1     0
004   1    1    1    0    1    1    1     0
126   1    1    1    0    1    1    1     9
158   1    1    0    1    1    1    1     6
As can be confirmed in Table 1, instances 003 and 004 have the same attribute values and the same class, i.e., they are identical instances and, as such, in a typical ML environment, one of them could be removed without any impact on the classification task of an instance-based algorithm. The instance identified as 126 has the same seven attribute values as instances 003 and 004, but they differ in the class they represent. Such occurrences are identified as contradictory instances (i.e., 126 and 003 are contradictory instances, as are 126 and 004). Any new instance to be classified whose nearest neighbor is instance 126 will also have instances 003 and 004 as nearest neighbors, which leads to a tie situation as far as the values of the distance function are concerned, considering that the new instance will be at the same distance from three instances: two identical instances belonging to class 0 and one belonging to class 9. In this situation, the approach used in this work, which takes typicality into account, will assign class 0 to the new instance. If instead of two identical instances only one
18
S. V. Gonçalves and M. C. Nicoletti
of them was present, the random rule would be used, although the situation would be detected as contradictory. Another tie situation to be considered for analysis can be observed during the classification phase of an instance-based algorithm. Let the set of stored instances be {003, 004, 126, 158} and consider a new instance of unknown class to be classified, identified as 500 and described by the vector of attribute values [1 1 1 1 1 1 1]. The instance 500 differs from instances 003, 004 and 126 in relation to their corresponding values associated to the attribute dp4 only. Also, instance 500 differs from instance 158 in relation to the value of their dp3 attribute only. So, considering that instance 500 is at the same distance from the four stored instances, a tie situation arises. In this situation class 0 will be assigned to the instance 500, taking into account the typicality of the identical instances 003 and 004. In the strategy proposed to deal with tie situations adopted for the experiments presented in Sect. 6, identical instances are not removed from the set of instances to be stored because they reinforce the concept of typicality of the concept represented by the instances. Typicality has been the object of several studies, such as those in [12–14]. The concept of typicality proposed in [12] is based on the concepts of family resemblance, investigated in a study in the internal structure of categories, conducted by the authors Rosh and Mervis who explored in their study “the hypothesis that the members of categories which are considered most prototypical are those with most attributes in common with other members of the category and least attributes in common with other categories” (see also [4]). 
The previous hypothesis was customized for an IBL environment by Zhang in [17], who commented that "the more similar an instance is to other concept instances and the less similar it is to instances of contrast concepts, the higher its family resemblance, and the more typical it is of its concept." Typicality is based on the assumption that instances cannot be treated equally but, instead, as having a degree of 'representativeness' in relation to the concept they represent. Two basic concepts for determining the typicality of an instance are: (1) the intra-conceptual similarity, defined as the average similarity of one instance to other instances of the same class, and (2) the inter-conceptual similarity, defined as the average similarity of one instance to instances of a different class. In a ML environment where data are assumed to be normalized, the similarity between two instances x and y can be defined as similarity(x, y) = 1 − distance(x, y). In the experiments described in Sect. 5, the distance function used was the HEOM, as presented in Sect. 3, slightly modified with the inclusion of a weighting factor given by the inverse of the number of attributes (M) that describe the instances, as shown in Eq. (4) and proposed in [17]. The weighting factor equalizes the results for typicality regardless of the number of attributes that describe instances.

similarity(x, y) = 1 − sqrt( (1/M) · Σ_{a=1}^{M} d_a(x_a, y_a)² )    (4)

Instance Typicality in IBL Environments
19

The typicality of an instance is defined as the value resulting from dividing its intra-conceptual similarity value by its inter-conceptual similarity value, as shown in Eq. (5). Therefore, the greater the intra-conceptual similarity and the lower the inter-conceptual similarity, the greater the value of an instance's typicality. As pointed out in [17], "the typicality of typical instances is much larger than 1, boundary instances have typicality close to 1, and instances with typicality less than 1 are either noise or exceptions".

typicality(x) = avg{ similarity(x, y) | class(x) = class(y) } / avg{ similarity(x, y) | class(x) ≠ class(y) }    (5)
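Equations (4) and (5) can be sketched together in Python; this is a minimal illustration under the assumption of range-normalized attributes, with all names (similarity, typicality) hypothetical rather than the authors' implementation:

```python
import math

def similarity(x, y, nominal_flags, ranges):
    # Eq. (4): 1 minus the HEOM distance weighted by 1/M,
    # where M is the number of attributes describing the instances
    M = len(x)
    def d_a(a, b, nom, rng):
        if a is None or b is None:
            return 1.0                      # missing value: max distance
        if nom:
            return 0.0 if a == b else 1.0   # overlap for nominal attributes
        return abs(a - b) / rng if rng else 0.0
    s = sum(d_a(a, b, nom, rng) ** 2
            for a, b, nom, rng in zip(x, y, nominal_flags, ranges))
    return 1.0 - math.sqrt(s / M)

def typicality(i, instances, classes, nominal_flags, ranges):
    # Eq. (5): average intra-class similarity divided by
    # average inter-class similarity of instance i
    intra = [similarity(instances[i], instances[j], nominal_flags, ranges)
             for j in range(len(instances))
             if j != i and classes[j] == classes[i]]
    inter = [similarity(instances[i], instances[j], nominal_flags, ranges)
             for j in range(len(instances))
             if classes[j] != classes[i]]
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))
```

An instance close to its own class and far from the contrast class gets a typicality well above 1, matching the characterization quoted from [17].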
Figure 2 shows the typicality of the stored instances and a new instance x9 = [1.5 2.5] (dashed circle) to be classified. The figure also shows the calculations performed to obtain the typicality of an instance, using the most atypical instance of the set (x3) as an example. Typical instances usually indicate the profile of the concept and represent it better than atypical instances. Based on that, in situations in a ML environment where a new instance is at the same distance from two or more nearest neighbors of different classes, the tiebreaker of choosing the class of the most typical instance as the class of the new instance implies choosing the most common class in that particular region of the instance space.
Fig. 2. Typicality values associated to the eight instances {x1 , x2 , x3 , x4 , x5 , x6 , x7 , x8 } and a new instance (x9 ) to be classified.
The distances between x9 and x3, x9 and x5, x9 and x6, and x9 and x7 are all the same. According to the NN's random-choice tiebreaker rule, instance x9 has a 25% chance of being classified as class 1. If the tiebreaker rule adopted is based on the order in which the instances were stored, x9 would be classified as class 1, given that instance x3 was the first stored among the four nearest neighbors of x9. By the typicality-based tiebreaker rule, x9 would be classified as class 2, because x6 is the nearest neighbor of x9 with the highest typicality.
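The NNtip+ tiebreaker just described can be sketched as follows; this is a hypothetical Python fragment, with the distance function and typicality values assumed to be supplied by the caller:

```python
def nn_classify(new_x, stored, classes, typicalities, distance):
    """1-NN with the NNtip+ tiebreaker: among all stored instances
    at the minimum distance from new_x, return the class of the
    one with the highest typicality value."""
    dists = [distance(new_x, s) for s in stored]
    d_min = min(dists)
    tied = [i for i, d in enumerate(dists) if d == d_min]
    best = max(tied, key=lambda i: typicalities[i])
    return classes[best]
```

Replacing max by min in the last selection would yield the NNtip− variant used for comparison in the experiments.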
5 Methodology, Experiments, Results and Analysis

To investigate the performance of the NN tiebreaker criterion based on the value of instance typicality, a group of 20 sets of instances, described in Table 2, was first selected from the KEEL Repository [3], with a focus on promoting a diverse group of data sets in terms of number of instances, number and types of attributes, and presence or absence of identical instances and contradictory instances in the set.

Table 2. Description of the main characteristics of the 20 sets of instances used in the experiments, where columns: #I: number of instances, #Identical: identical instances, #Contradictory: contradictory instances, #CA: continuous attributes and #NA: nominal attributes.

Dataset            #I     #Identical  #Contradictory  #CA  #NA
Australian           690       0           0           8    6
Balance              625       0           0           0    4
Banana             5,300       8           1           2    0
Breast               277      14           6           0    9
Car                1,728       0           0           0    6
Chess              3,196       0           0           0   36
Coil2000           9,822   1,442         119           1   84
Contraceptive      1,473      48          67           2    7
Flare              1,006     701          78           0   11
Haberman             306      17           6           3    0
Hayes.roth           160      67           9           0    4
Housevotes           231      72           0           0   16
Led7Digit            500     354          61           0    7
Lymphography         148       0           0           0   18
Mammographic         830     266          45           1    4
Marketing          6,876     625         620           0   13
Monk.2               432       0           0           0    6
Mushroom           5,644       0           0           0   22
Titactoe             958       0           0           0    9
Considering that the work focuses on nominal attributes, 14 of the 20 data sets have their instances described only by nominal attributes, 4 of the 20 have attributes of both types, i.e., nominal and continuous, and 2 data sets have their instances described only by continuous attributes. The low number of data sets with continuous attributes only is due to the nature of the attribute values (real numbers) that describe their instances: a tie in an IBL environment where the stored instances are vectors of real numbers is not a frequent event. However,
if the number of attributes that describe the instances is quite small, a tie may eventually occur, particularly when the real values in question are integer values. Each data set was used in a 10-fold cross-validation process [10], taking into account three different tiebreaking strategies. As discussed earlier in this work, a tie happens when the distance function used by the NN algorithm during its classification phase identifies more than one instance as the nearest neighbor of the instance to be classified. The three tiebreaker strategies considered are: (1) the class of the new instance is randomly chosen among the classes of its nearest neighbors; (2) the new instance is classified with the class of its nearest instance with the lowest value of typicality (identified as NNtip−); (3) the new instance is classified with the class of its nearest instance with the highest value of typicality (identified as NNtip+). Results from the 20 experiments involving the execution of 10-fold cross-validation processes can be seen in Table 3.

Table 3. Accuracy values, where columns: Dataset: name of the dataset, NN Accuracy (%): using as tiebreaker criterion the random choice of a nearest neighbor instance, NNtip− Accuracy (%): tiebreaker criterion with the lowest value of typicality and NNtip+ Accuracy (%): tiebreaker criterion with the highest value of typicality.

Dataset         NN Accuracy (%)  NNtip− Accuracy (%)  NNtip+ Accuracy (%)
Australian           82.1739          82.1739             82.1739
Balance              56.9925          44.6568             87.9948
Banana               87.5849          87.5660             87.6038
Breast               72.1534          82.5168             75.4565
Car                  75.4655          70.0235             63.0189
Chess                85.2152          84.3681             87.1318
Coil2000             89.5235          89.4523             89.3607
Contraceptive        44.0646          44.1336             44.2692
Flare                69.6077          58.3504             72.1434
Haberman             65.0323          64.3763             66.0000
Hayes.roth           70.6250          78.1250             68.7500
Housevotes           91.4789          91.6144             90.8753
Led7Digit            63.6000          29.0000             72.6000
Lymphography         87.4118          83.5518             87.4118
Mammographic         73.7727          70.1262             79.1317
Marketing            28.2027          23.0951             29.8298
Monk.2               90.9147          49.6957             81.7030
Mushroom            100.0000         100.0000            100.0000
Titactoe             67.0885          64.8791             72.9866
Average              74.8371          69.7602             76.7554
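The evaluation protocol (10-fold cross-validation of a pluggable classifier) can be sketched as below; this is a simplified illustration, not the authors' experimental code, and the fold construction and the classify signature are assumptions:

```python
import random

def ten_fold_accuracy(instances, classes, classify):
    # 10-fold cross-validation: each fold is held out once for
    # testing while the classifier uses the remaining nine folds
    idx = list(range(len(instances)))
    random.Random(0).shuffle(idx)          # fixed seed for repeatability
    folds = [idx[i::10] for i in range(10)]
    correct = 0
    for fold in folds:
        train = [i for i in idx if i not in fold]
        for i in fold:
            pred = classify(instances[i],
                            [instances[j] for j in train],
                            [classes[j] for j in train])
            correct += (pred == classes[i])
    return 100.0 * correct / len(instances)
```

Running this harness once per tiebreaker strategy (random, NNtip−, NNtip+) on each data set yields accuracy figures comparable in form to those of Table 3.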
As the numbers in Table 3 confirm, NNtip+ obtained the highest average accuracy among the three strategies over the 20 datasets, i.e., 76.7554%, against the average accuracy of 74.8371% reached by the random strategy and the average accuracy of 69.7602% reached by NNtip−. The difference in results between choosing the nearest neighbor with the lowest typicality and the nearest neighbor with the highest typicality reinforces the influence of typicality as a good criterion for breaking ties. Analyzing the results for the instance sets individually, NNtip+ had the best results in 10 of the 20 data sets and, in 5 of them, the accuracy results were 5% higher. In 3 of the 20 data sets, the NN implementing the random tiebreaker had the best accuracy values. NNtip− obtained the best accuracy results in four data sets; in 2 of them, particularly, it had markedly better results. It is important to mention that in two data sets there was no occurrence of ties and, as a consequence, the NN with each of the three strategies achieved the same accuracy. In the experiments based on the Lymphography set, the results given by the NN and NNtip+ had the same accuracy, which could be a coincidence provoked by the NN's random tiebreaker.
6 Final Remarks

This paper proposes a strategy to be used during the classification phase of instance-based learning algorithms and, particularly, of the well-known Nearest Neighbor (NN) algorithm. The strategy is based on the concept of typicality of an instance, used as a tiebreaker in situations where the instance to be classified has more than one nearest neighbor. The motivation for the proposal was the fact that the strategy adopted by the original NN in tie situations is a random choice among the nearest neighbors of the new instance to be classified. As pointed out in previous sections, the concept of typicality is particularly relevant when instances are described by attributes of nominal type. Several other proposals of distance functions for nominal attributes, which can be considered more refined alternatives to the simplistic value overlap used by the HEOM, can be found in the literature. Among them is the Value Difference Metric (VDM) [14], which defines the distance between two values xa and ya of an attribute a so as to reflect the fact that the two values are closer if they are used for describing instances of the same class. However, empirical results indicate that the HEOM distance function has higher classification accuracy than the original Euclidean distance function and also preserves approximately the same computational efficiency [15]. The results of the conducted experiments, presented in Sect. 5, are evidence that the use of the typicality values of the instances involved in a tie situation tends to promote a better choice of the most representative instance of the concept among them. The help typicality provides can particularly favor data described by nominal attributes.
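For illustration, a minimal sketch of the VDM idea for a single nominal attribute (with exponent q = 2); the function name and signature are hypothetical, and smoothing of empty counts is omitted:

```python
from collections import Counter

def vdm_distance(v1, v2, values, classes, q=2):
    """Value Difference Metric (VDM) for one nominal attribute:
    two attribute values are close when they occur with similar
    class distributions in the training data."""
    def class_dist(v):
        # conditional class distribution P(class | attribute value v)
        labels = [c for val, c in zip(values, classes) if val == v]
        n = len(labels)
        counts = Counter(labels)
        return {c: counts[c] / n for c in set(classes)}
    p1, p2 = class_dist(v1), class_dist(v2)
    return sum(abs(p1[c] - p2[c]) ** q for c in set(classes))
```

Two values that always occur with the same class end up at distance 0, even though a plain overlap comparison would treat them as maximally different.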
The work described in this paper will be continued by considering another tiebreaker criterion, also involving the use of the information contained in the classification performance records associated with each stored instance, as implemented by algorithms of the IBL
family [2]. Instance-based learning algorithms as well as any other ML algorithm can benefit from pre-processing the set of instances aiming at the removal of contradictory instances. Acknowledgment. Authors are grateful to UNIFACCAMP, C. Limpo Paulista, SP, Brazil, for the support received. The first author is also thankful to the CNPq. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
References

1. Aha, D.W., Kibler, D., Albert, M.K.: Instance-based learning algorithms. Mach. Learn. 6, 37–66 (1991)
2. Aha, D.W.: Lazy Learning. Springer, Dordrecht (2013)
3. Alcalá-Fdez, J., Fernandez, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: KEEL data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Multiple-Valued Log. Soft Comput. 17, 255–287 (2011). https://sci2s.ugr.es/keel/datasets.php
4. Barsalou, L.: Ideals, central tendency, and frequency of instantiation as determinants of graded structure in categories. J. Exp. Psychol.: Learn. Memory Cogn. 11, 624–629 (1985)
5. Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F.: Guide to Intelligent Data Analysis. Springer (2010)
6. Brighton, H., Mellish, C.: Advances in instance selection for instance-based learning algorithms. Data Min. Knowl. Discov. 6, 153–172 (2002)
7. Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 13, 21–27 (1967)
8. Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. ASA/SIAM Publishers (2007)
9. Grochowski, M., Jankowski, N.: Comparison of instance selection algorithms II, results and comments. LNCS, vol. 3070, pp. 580–585. Springer (2004)
10. Moreno-Torres, J.G., Sáez, J.A., Herrera, F.: Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Trans. Neural Netw. Learn. Syst. 23, 1304–1312 (2012)
11. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
12. Rosch, E., Mervis, C.B.: Family resemblances: studies in the internal structure of categories. Cogn. Psychol. 7(4), 573–605 (1975)
13. Schul, Y., Burnstein, E.: Judging the typicality of an instance: should the category be accessed first? J. Pers. Soc. Psychol. 58(6), 964–974 (1990)
14. Stanfill, C., Waltz, D.: Toward memory-based reasoning. Commun. ACM 29(12), 1213–1228 (1986)
15. Wilson, D.R., Martinez, T.R.: Improved heterogeneous distance functions. J. Artif. Intell. Res. 6, 1–34 (1997)
16. Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.-H., Steinbach, M., Hand, D.J., Steinberg, D.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14, 1–37 (2008)
17. Zhang, J.: Selecting typical instances in instance-based learning. In: Proceedings of the Ninth International Machine Learning Conference, pp. 470–479 (1992)
Dataset for Intrusion Detection in Mobile Ad-Hoc Networks Rahma Meddeb(B) , Bayrem Triki, Farah Jemili, and Ouajdi Korbaa ISITCom, MARS Research Laboratory, LR17ES05, Universite de Sousse, 4011 Hammam Sousse, Tunisia {rahma.elmeddeb,ouajdi.korbaa}@mars.rnu.tn, [email protected], jmili [email protected]
Abstract. Mobile Ad-Hoc Networks (MANETs) represent an evolutionary tendency in mobile computing to guarantee future communication. They are among the key networks employed to switch from wired to wireless networks. The open environment and the large distribution of nodes usually result in susceptibility to many malicious attackers. Therefore, it is crucial to develop an efficient Intrusion Detection System (IDS) to defend against attacks. In this study, we present the most prominent models for incorporating machine learning in the MANET scenario. We devise a new process to create a quality dataset and generate a high-quality IDS assessment dataset designed for machine learning analysis. This study highlights two key contributions to guarantee high detection of malicious nodes in MANETs. The first one deals with data collection, which aims to detect intruders: a fuzzy classification is proposed through the use of a Fuzzy Inference System (FIS) to identify and describe the node behavior. The second one employs a supervised machine learning algorithm to evaluate the dataset. Keywords: Mobile Ad-Hoc Network · Intrusion Detection System · Data collection · Fuzzy Inference System · Labeled dataset · Evaluation dataset · Machine learning algorithm
1 Introduction
Recent advances in the field of mobile devices, as well as the evolution of users' requirements, have enabled unlimited service access. One of the leading technologies which have become widely known in this field is Mobile Ad-Hoc Networks. They have been designed to provide efficient communication without relying on the deployment of network infrastructure, which is expensive and impractical. However, a MANET is insecure because the network topology changes unpredictably as nodes access and exit the network. Accordingly, there is no clear line

This work has been done as part of the Tunisian National Research Project PACTEProfiler.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 24–34, 2021. https://doi.org/10.1007/978-3-030-49342-4_3
of security, and a malicious node can quickly attack and carry out malicious activity, which is quite difficult to detect [1]. In this case, intrusion detection is defined as the process of monitoring for malicious behavior; the scheme or system that performs it is the Intrusion Detection System (IDS). Several IDSs have been developed and dedicated to wired networks [2]. Nonetheless, these systems cannot be used with MANETs because of their complicated network properties. In this context, researchers have proposed intrusion detection models for MANETs. A Network-based Intrusion Detection System (NIDS) is the most efficient way to defend against attacks in a MANET [3]. In fact, it needs a representation of the network traffic, which can be provided by datasets, in particular a labeled dataset in which each data value is indicated as belonging to a normal or abnormal class. The purpose of this article is to develop a new labeled dataset; we put the detection of mobile malicious nodes under the scope. In addition, we highlight the features that mark the malicious nodes and define the properties of relevant data. This paper aims at updating the proposed dataset to keep it effective and obtain a high-quality evaluation dataset. The attack signature is created out of fuzzy dynamic signatures. This paper is divided into different sections. Section 2 gives a literature review of the existing evaluation datasets. Section 3 introduces the suggested architecture and highlights our contribution to detecting intruders. The simulation results are presented in Sect. 4, which demonstrates that the proposed solution is adequate in terms of intrusion detection in MANETs. The closure of this paper presents our findings and mentions the implications for further investigation.
2 Network-Based IDSs: Problem Statement
Network-based intrusion detection systems may be defined in terms of anomaly-based and signature-based approaches. An anomaly-based model learns normal behavior from training data and flags deviations from the learned behavior as malicious activity. A signature-based (misuse-based) model uses signatures of known attacks and matches incoming network traffic with these signatures to detect intruders [4]. Thus, both of them need a representation of network traffic, which can be provided by a dataset. Moreover, benchmark datasets are a kind of basis to compare and evaluate the quality of various NIDSs. Malowidzki et al. [5] address missing datasets as an important problem for intrusion detection and set up specifications for existing datasets. Tarrah et al. presented a large variety of works which apply existing supervised machine learning techniques and a list of datasets appearing in the research literature [6]. Haider et al. proposed an IDS dataset based on fuzzy qualitative modeling according to five enterprise traffic scenarios [7]. However, it is difficult to find appropriate anomaly-based datasets to evaluate a mobile network. This lack of adequate IDS datasets motivated the creation of an advanced new evaluation dataset dedicated to MANETs. A labeled dataset based on a FIS, in which each data value is assigned to the normal or abnormal class, is proposed.
3 Optimization Model IDS for MANET
The existing datasets are perceived as critical due to the lack of benchmark evaluation datasets for mobile networks. Accordingly, we propose an optimization model for an intrusion detection system dedicated to mobile networks. For further clarification, we relate our approach to the study that most closely correlates with it, a recent paper by Meddeb et al. [8], in which the authors present three algorithms for building an evaluation dataset for detecting DoS attacks. The main differences between the present approach and that previous work are that we use a labeled dataset to improve the precision and performance of detection, and that we employ a machine learning algorithm that can be trained and used to detect malicious nodes. The suggested architecture (see Fig. 1) aims to create a standardized process for generating quality IDS datasets, and also to construct high-quality assessment datasets useful for machine learning analysis. Two main contributions are elaborated to guarantee high intrusion detection. The first one is related to data collection, which is aimed at detecting intruders. The second one lies in utilizing a machine learning algorithm for an evaluation dataset.
Fig. 1. Optimization model IDS for MANET
3.1 Dataset Generation Process
This process focuses on data collection mechanisms and the generation of the attack scenarios. It is made of the following components:

– Data Collection Training Module: responsible for collecting traffic data in a mobile network;
– Data Preprocessing Module: responsible for obtaining relevant features;
– Fuzzy Signature Module: responsible for analyzing the collected data against suspicious activities, defining the class labels and the patterns of a signature.

Data Collection Training Module: It handles traffic collection and abstraction. Due to the specificity of MANETs, the nodes communicate over wireless links, and each node can function as a sender, a receiver or a router. During the data collection process, the network cannot guarantee the same route to reach the selected destination; therefore, some supplementary nodes are required. We consider two kinds of mobile nodes. The first kind includes the nodes that are used as routers to forward packets. The second kind encompasses nodes with special capacities that are used to collect data for the IDS. These IDS nodes are supposed to be protected in order to trust the integrity of the collected data; such a node acts as both a routing node and a data collection node. In the process of network monitoring, an IDS node monitors the network and collects audit data specific to its neighboring nodes, in the form of Node and Network Characteristic Values (NCF). The resulting NCF dataset contains all traffic that goes through the network, and all the features described below are part of it. The collected data are then processed by the Data Preprocessing Module, where we seek to find and select useful features to describe the malicious nodes.
Data Preprocessing Module: It defines the relevant features and the preprocessing methods applied to them. A prior and necessary step is to know which features must be considered to calculate the final evaluation by the IDS node. Feature selection is required to capture the impact of a malicious node on routing behavior and on the mobile network topology. We select suitable performance metrics according to each requirement defined by the intruder. To this end, we propose to define a typical dataset, a thorough investigation of attacks and a selection of the most recent DoS attacks against MANETs. The study is limited to four types of attacks: two attacks fit into the Packet Dropping category (Blackhole 'B' and Grayhole 'G'), one fits into Routing Disruption (Wormhole 'W'), and one last attack is involved in Resource Consumption (Flooding 'F'). Blackhole and Grayhole are implicated in dropping packets: the Blackhole attack drops all received packets meant for forwarding, whereas Grayhole drops packets at certain frequencies. The attacker provides the highest sequence number (SN) and the lowest hop count number (HC) in routing control packets to attract the mobile source node and drop the packets. These attacks inject faked Route Reply (RREP) packets to the source node advertising,
that is, providing the shortest path to the destination. A Wormhole attack is expected to decrease the HC of affected connections. Besides, a Flooding attack injects faked Route Request (RREQ) packets into the network; therefore, all the mobile nodes in the MANET consume their energy transmitting unnecessary RREQ packets. We mainly evaluate four features starting from the NCF dataset: the averages of Sequence Number (SN), Hop Count (HC), Data Packet Dropped Rate (DPDR), and Packet Delivery Ratio (PDR). We define the relevant features using these performance metrics and consider that the FIS will examine these measures.

Fuzzy Signature Module: Employs a Fuzzy Inference System to identify and describe the node behavior (see Fig. 2). The FIS uses the Mamdani type to explain the label values of data points [9]; in fact, it is designed to generate fuzzy rules of related normal/abnormal behavior. The Crisp Input consists of the relevant features normalized in the Data Preprocessing Module: the averages of DPDR, PDR, SN, and HC are used as input parameters. The passage from the Crisp Input into the fuzzy input is called Fuzzification; for example, membership functions are assigned for the Grayhole attack. The inference processing is the principal component of the knowledge-based decision making and is expressed by fuzzy rules. Based on a given rule defined in the fuzzy rule base, it may be interpreted as an inference rule in the FIS with Mamdani (see Table 1). The rule evaluation takes the fuzzy inputs from the fuzzification phase and the rules from the knowledge base and then calculates the fuzzy outputs. The Crisp Outputs are expressed as three categories referring to the Verity Levels: Low (0–4), Medium (4–8) and High (8–15). The Crisp Output is the Verity Level parameter, which determines the label values of each data point.
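A heavily simplified sketch of the Mamdani-style inference described above, in Python; the membership functions, the three illustrative rules, and the midpoint-based defuzzification are all assumptions for illustration only (a full Mamdani system would aggregate clipped output sets and compute a true centroid):

```python
def tri(x, a, b, c):
    # triangular membership function supported on [a, c], peaking at b
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative fuzzy sets for features normalized to [0, 1]
LOW    = lambda x: tri(x, -0.5, 0.0, 0.5)
MEDIUM = lambda x: tri(x, 0.0, 0.5, 1.0)
HIGH   = lambda x: tri(x, 0.5, 1.0, 1.5)

# Verity Level output categories: Low (0-4), Medium (4-8), High (8-15),
# represented here by their midpoints for a simplified defuzzification
VERITY_MID = {"Low": 2.0, "Medium": 6.0, "High": 11.5}

def verity_level(pdr, dpdr):
    # Three illustrative Mamdani-style rules (cf. Table 1):
    #   IF PDR is Low  AND DPDR is High THEN Verity is Low
    #   IF PDR is High AND DPDR is Low  THEN Verity is High
    #   IF PDR is Medium                THEN Verity is Medium
    strengths = {
        "Low": min(LOW(pdr), HIGH(dpdr)),    # AND = min in Mamdani inference
        "High": min(HIGH(pdr), LOW(dpdr)),
        "Medium": MEDIUM(pdr),
    }
    total = sum(strengths.values())
    if total == 0:
        return 0.0
    # weighted average of category midpoints (centroid-style defuzzification)
    return sum(s * VERITY_MID[c] for c, s in strengths.items()) / total
```

A node dropping most packets (low PDR, high DPDR) lands in the low Verity range, matching the interpretation that low Verity Level indicates malicious behavior.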
The Verity Level of each node is calculated so that it can be considered as an input parameter of the membership functions. It is expressed on a scale from 1 to 15, ranging from highly vulnerable to less vulnerable, covering three categories of node behavior: malicious, normal and best behavior. Therefore, a low value of Verity Level indicates more malicious behavior of a node compared with the behavior of its neighboring nodes. Each of these labels represents a fuzzy set over the possible Crisp values. Let us take our NCF dataset, where we would like to classify a scenario based on its behavior. On the basis of Multi-Label Classification, each scenario can be placed in one of the behavior categories; accordingly, each fuzzy input variable can be assigned one category. Therefore, the labeled dataset can be described by each pattern of signatures with a unique value, and the dataset includes the label values of the data points from each class label. The Labeled-NCF dataset is required to train and evaluate malicious behavior.

3.2 Dataset Evaluation
Fig. 2. Attack signatures detection process

Machine learning methods prove effective for anomaly-based intrusion detection. The model provided by the supervised learning process can make predictions on new data instances. The good performance of the proposed model can be explained by the choice of an adequate classification algorithm. K Nearest Neighbor (KNN) prediction is a non-parametric method dedicated to classification [10]. The fundamental principle is classification by the majority vote of the neighbors: a sample is assigned to the most common class among its K nearest neighbors. The KNN starts from the idea of making decisions by searching for one or more similar cases already resolved. Usually, the Euclidean distance is applied as the distance metric, giving the distance d formula below:

d(x, y) = sqrt((y1 − x1)² + · · · + (yp − xp)²) = ( Σ_{i=1}^{p} (yi − xi)² )^{1/2}    (1)
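The KNN prediction step with the Euclidean distance of Eq. (1) can be sketched as follows; this is a minimal illustration with hypothetical names, not the authors' implementation:

```python
import math
from collections import Counter

def euclidean(x, y):
    # Eq. (1): Euclidean distance between two feature vectors
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(y, train_x, train_c, k=3):
    # rank stored behavior vectors by distance to y and take the
    # majority vote among the K nearest neighbors
    neighbors = sorted(range(len(train_x)),
                       key=lambda i: euclidean(train_x[i], y))[:k]
    votes = Counter(train_c[i] for i in neighbors)
    return votes.most_common(1)[0][0]
```

The parameter k corresponds to the K whose value is selected by cross-validation, as described next.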
We assume that we have determined the K neighbors (x1, c(x1)), ..., (xK, c(xK)) of a node behavior vector y to which we intend to attribute the class c(y). A way to combine the K nearest neighbors and their K classes is majority voting. We select the value of K that minimizes the classification error; in our situation, the choice of the parameter K is crucial and determines the model choice. Our target is the value of K (number of neighbors) with maximum classification accuracy over the neighboring points. Using cross-validation, we build k different models and are thus able to make predictions on all of our data. We split our dataset into k = 10 parts and build ten different models. Each model is trained on nine parts and tested on the tenth: the first model is trained on parts 1 to 9 and tested on part 10, the second model is trained on parts 2 to 10 and tested on part 1, and so on. The section below provides a full description of the experiments and the obtained findings.

Table 1. Fuzzy rule base for Grayhole [8].
Rules  PDR     DPDR           SN              HC              Verity level
1      Low     High           Medium          Medium          Low
2      Low     High           High or Medium  Low             Low
3      Low     High           High            Low or Medium   Low
4      Low     Medium         Medium          Medium          Low
5      Low     Medium         High or Medium  Low             Low
6      Low     Medium         High            Low or Medium   Low
7      Low     Low            Medium          Medium          Low
8      Low     Low            High or Medium  Low             Low
9      Low     Low            High            Low or Medium   Low
10     Medium  Low or Medium  Low             High or Medium  Medium
11     Medium  Low or Medium  Low or Medium   High            Medium
12     Medium  Low or Medium  Medium          Medium          Medium
13     High    High           Medium          Medium          Low
14     High    High           High or Medium  Low             Low
15     High    High           High            Low or Medium   Low
16     High    Low or Medium  Medium          Medium          High
17     High    Low or Medium  Low or Medium   High            High
18     High    Low or Medium  Low             High or Medium  High

4 Experiments and Results
We apply the method to a network-based IDS in a MANET with the AODV routing protocol. Experimental measurements of several important performance metrics are used to characterize the network features under normal and malicious behavior. In addition, the various metrics were studied as average values for different simulation times and for various simulation scenarios. The simulation is carried out using Opnet Modeler 14.5 (see Table 2). The various types of malicious nodes were implemented by modifying the MAC layer of the ad-hoc network running the AODV protocol. Our NCF dataset contains 1510 samples and is composed of five classes. For each one, we collected the audit data specific to neighboring nodes; the samples are distributed evenly across categories, with 302 samples for each class. In this part, we discuss the obtained findings. The machine learning algorithm is applied to determine the optimal dataset: the NCF dataset (KNN) is compared with the Labeled-NCF dataset (Fuzzy-KNN). We
Dataset for Intrusion Detection in Mobile Ad-Hoc Networks
31
Table 2. Opnet configuration parameters [8]

Parameter | Definition/value
Version | Opnet 14.5
Number of mobile nodes | 50 nodes
Transmission range | 250 m
Routing protocol/application protocol | AODV/FTP, HTTP
Simulation area size/topology | 1 km × 1 km
Simulation duration | 1 h
Node placement (mobility) | Random waypoint
Malicious nodes | 2 to 6 nodes
MAC layer/channel type | Wireless LAN MAC/wireless channel
Traffic sources (type)/size of data packet | CBR (UDP)/512 bytes

Table 3. The results of the performance measures for Fuzzy-KNN [8]

Fuzzy:
Predicted/actual | Sensitivity | Specificity | (+) Predictive | (−) Predictive | Accuracy
Normal    | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Blackhole | 1.000 | 0.900 | 0.714 | 1.000 | 0.950
Grayhole  | 0.900 | 1.000 | 1.000 | 0.975 | 0.950
Wormhole  | 0.700 | 1.000 | 1.000 | 0.930 | 0.850
Flooding  | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Balanced accuracy | | | | | 0.92

KNN:
Normal    | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Blackhole | 1.000 | 0.912 | 0.734 | 1.000 | 0.950
Grayhole  | 0.860 | 1.000 | 1.000 | 0.952 | 0.920
Wormhole  | 0.700 | 1.000 | 1.000 | 1.000 | 0.860
Flooding  | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Balanced accuracy | | | | | 0.92

Fuzzy-KNN:
Normal    | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Blackhole | 1.000 | 0.925 | 0.769 | 1.000 | 0.962
Grayhole  | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Wormhole  | 0.715 | 1.000 | 1.000 | 1.000 | 0.870
Flooding  | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
Balanced accuracy | | | | | 0.94
divide the dataset into training and testing data with a 90–10 split: the training data comprises 90% of the total instances, and the testing data covers the remaining 10%. Previous classification methods select the k value either by setting a fixed constant for all test data or by conducting cross-validation to estimate k (here, k = 10) for each test data point.
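The 10-fold selection of K described in the text (train each model on nine parts, test on the tenth, keep the K with the best accuracy) can be sketched as follows. The synthetic two-class data and all function names below are ours, standing in for the NCF feature vectors; this is an illustrative sketch, not the paper's implementation:

```python
import random
from collections import Counter

def euclidean_sq(x, y):
    # Squared Euclidean distance; monotone in Eq. (1), so ranking is identical.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def knn_predict(train, y, k):
    neighbours = sorted(train, key=lambda p: euclidean_sq(p[0], y))[:k]
    return Counter(c for _, c in neighbours).most_common(1)[0][0]

def cross_val_accuracy(data, k, folds=10):
    """Split `data` into `folds` parts; train on nine parts, test on the tenth."""
    parts = [data[i::folds] for i in range(folds)]
    correct = total = 0
    for i in range(folds):
        test = parts[i]
        train = [s for j in range(folds) if j != i for s in parts[j]]
        correct += sum(knn_predict(train, x, k) == c for x, c in test)
        total += len(test)
    return correct / total

# Hypothetical, well-separated two-class data (not the NCF dataset).
random.seed(0)
data = [((random.gauss(m, 0.3), random.gauss(m, 0.3)), c)
        for m, c in [(0.0, "normal"), (2.0, "malicious")] for _ in range(50)]
best_k = max(range(1, 16, 2), key=lambda k: cross_val_accuracy(data, k))
print(best_k, round(cross_val_accuracy(data, best_k), 3))
```

Because every sample appears in exactly one test fold, the cross-validated accuracy uses all of the data, as the text notes.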
Fig. 3. Performance evaluation of Fuzzy-KNN
The value of K is set to 10 based on the accuracy attained in the experiment. The accuracy of the model is the number of correctly classified test instances out of the total number of test instances. Under the FIS, the threshold value for the attack scenarios can only be defined in the form of if-then-else rules; the threshold value is set to 7.5. Comparing the Verity Level against this threshold identifies the node's behavior: when the Verity Level is greater than the threshold value, the node is not malicious; in the opposite case, the elaborated pattern is considered malicious. Based on the accuracy and the different alarm rates, we can build up a well-founded performance assessment. The different alarm rates for attacks by the two models are shown in Table 3. The confusion matrix can be helpful in understanding the bias of the proposed detector towards a particular class of attacks. Based on the results shown in Table 3, when we directly apply the FIS [8] or the KNN algorithm (without using the third module), the accuracy equals 92%. Nevertheless, the Fuzzy-KNN classifier (which followed all the modules to generate our dataset), trained and tested in the same way, achieved an accuracy of 94%. Our simulation results demonstrate that the proposed Fuzzy-KNN is more capable of detecting a specific attack, with a higher true positive rate and a lower false positive rate. Hence, our approach can be assessed by regressing the predicted against the observed values (see Fig. 3). It is noteworthy that the various measured observations for the wormhole attack are lower than the estimated values, whereas for the other malicious nodes the predicted and the observed data are nearly identical. The gap between the predicted and measured values of the wormhole attack is thus remarkable.
This may be justified as follows: two malicious nodes form a tunnel, retrieve data packets from one part
and pass them on to another part of the network through this malicious behavior. This attack selects the route which contains a malicious node, as it has a much higher probability of packet dropping. This result confirms that an IDS that trains a sequence of models classifies much more accurately than one that directly uses the three modules without referring to the labeled dataset. By using cross-validation, this classifier demonstrates better performance in some attack categories, and the accuracy varies between 0.87 and 1.
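For reference, per-class sensitivity, specificity and a balanced accuracy of the kind reported in Table 3 can be derived from a confusion matrix as sketched below. The counts are illustrative, not the paper's raw results, and we use one common definition of balanced accuracy (the mean of the per-class sensitivities); function names are ours:

```python
def one_vs_rest_rates(conf, cls):
    """Sensitivity and specificity of class `cls` from a confusion matrix.

    `conf[actual][predicted]` holds counts; classes are the dict keys."""
    classes = list(conf)
    tp = conf[cls][cls]
    fn = sum(conf[cls][p] for p in classes if p != cls)
    fp = sum(conf[a][cls] for a in classes if a != cls)
    tn = sum(conf[a][p] for a in classes for p in classes
             if a != cls and p != cls)
    return tp / (tp + fn), tn / (tn + fp)

def balanced_accuracy(conf):
    # Mean of the per-class sensitivities (recall averaged over classes).
    return sum(one_vs_rest_rates(conf, c)[0] for c in conf) / len(conf)

# Illustrative 3-class matrix (counts are made up, not the paper's data).
conf = {"normal":   {"normal": 30, "grayhole": 0,  "wormhole": 0},
        "grayhole": {"normal": 2,  "grayhole": 27, "wormhole": 1},
        "wormhole": {"normal": 0,  "grayhole": 9,  "wormhole": 21}}
print(round(balanced_accuracy(conf), 3))  # -> 0.867
```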
5 Conclusion
MANETs are applied in several fields and require cost-effective, secure communication. Thus, a defensive mechanism against attacks should be designed. The main contribution of this work is the integration of a typical dataset: we propose an approach to collecting traffic data containing normal and malicious behaviors. In order to take advantage of the optimal performance of a labeled dataset, we determine the relevant fuzzy features. The obtained results show that the proposed dataset is effective in detecting the three types of attacks with a high detection rate. Future work will include further analysis of the simulation results with more complex network scenarios. We also intend to extend the parameters used for identifying intrusive activities to minimize false detection rates.
References

1. Meddeb, R., Triki, B., Jemili, F., Korbaa, O.: A survey of attacks in mobile ad hoc networks. In: ICEMIS 2017, pp. 1–6, Monastir (2017)
2. Gaied, I., Jemili, F., Korbaa, O.: Intrusion detection based on neuro-fuzzy classification. In: 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), pp. 1–8. IEEE (2015)
3. Ring, M., Wunderlich, S., Scheuring, D., Landes, D., Hotho, A.: A survey of network-based intrusion detection data sets. Comput. Secur. 86, 147–167 (2019)
4. Jyothsna, V., Prasad, K.M.: Anomaly-based intrusion detection system. In: Computer and Network Security. IntechOpen (2019)
5. Malowidzki, M., Berezinski, P., Mazur, M.: Network intrusion detection: half a kingdom for a good dataset. In: NATO STO SAS-139 Workshop, Portugal (2015)
6. Glass-Vanderlan, T.R., Iannacone, M.D., Vincent, M.S., Bridges, R.A., et al.: A survey of intrusion detection systems leveraging host data. arXiv preprint arXiv:1805.06070 (2018)
7. Haider, W., Hu, J., Slay, J., Turnbull, B.P., Xie, Y.: Generating realistic intrusion detection system dataset based on fuzzy qualitative modeling. J. Netw. Comput. Appl. 87, 185–192 (2017)
8. Meddeb, R., Triki, B., Jemili, F., Korbaa, O.: An effective IDS against routing attacks on mobile ad-hoc networks. In: New Trends in Intelligent Software Methodologies, Tools and Techniques, vol. 297, pp. 201–214. IOS Press (2018)
9. Verma, N.K., Singh, V., Rajurkar, S., Aqib, M.: Fuzzy inference network with Mamdani fuzzy inference system. In: Computational Intelligence: Theories, Applications and Future Directions, vol. I, pp. 375–388. Springer (2019)
10. Swathi, D., Lakshmi, D.S.: Network intrusion detection using fast k-nearest neighbor classifier. In: UGC Sponsored National Seminar on Cyber Security with Special Focus on Cyber Crimes & Cyber Laws (NSCS-2014) (2014)
Visual Password Scheme Using Bag Context Shape Grammars

Blessing Ogbuokiri¹ and Mpho Raborife²

¹ School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa
[email protected]
² Department of Applied Information Systems, University of Johannesburg, Johannesburg, South Africa
[email protected]
Abstract. In this paper, we implemented the similar images generated by bag context shape grammars as distractors in a prototype visual password scheme. A bag context shape grammar is a shape grammar that uses spatial rules to generate images in a regulated manner; that is, during the generative process, bag context is used as a technique to control when a shape grammar rule should be applied. The prototype visual password scheme is used to measure user experience, namely whether users can remember their passwords immediately after enrolment and one week after enrolment. This ascertains whether the similar images generated using bag context shape grammars serve well as distractors for a visual password scheme. The prototype visual password scheme is also built to resist shoulder surfing and guessing attacks. The outcome of this study shows that bag context shape grammars are good for the generation of similar images as distractors for visual password schemes.
Keywords: Visual password · Formal language · Shape grammar · Bag context shape grammar

1 Introduction
Authentication in computers is the process of verifying the genuineness of a digital credential. The most common methods of authentication are passwords and biometrics. The former involves the use of numbers and/or characters, called alphanumeric passwords. Alphanumeric passwords are sometimes forgotten, misplaced or attacked due to the human tendency to use easy-to-remember passwords. Biometrics involves the use of a fingerprint, iris, voice pattern, etc., on an installed hardware device. When people grow old or have an accident, their biometric features change. Criminals have taken advantage of these authentication problems to impersonate people's digital credentials and perpetrate crimes. Globally, more than 556 million people have become victims of these crimes in the last 13 years [10,17].
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 35–47, 2021. https://doi.org/10.1007/978-3-030-49342-4_4
36
B. Ogbuokiri and M. Raborife
According to the United States federal trade commission [3], 40% of initial contacts in most fraud cases were by email, and 20% by Internet websites. In this context, it is estimated that 332,646 victims were affected by financial losses totalling $110 billion [3]. In 2015, it was also estimated that R57.8 million was lost due to unauthorized access to computers in South Africa [22]. In the light of the above, visual passwords have been proposed as a replacement for alphanumeric passwords or biometric authentication systems [7,13]. Visual Password Schemes (VPSs) are authentication systems that use images for enrolment and authentication [15,16,24]. They are classified into recognition-based or cued-recall schemes [24]. VPSs have memory advantages over alphanumeric passwords, because such passwords rely on pure recall, which can easily fail due to human error [12]. Studies have shown that pictures are recognized with 98 per cent accuracy, better than words and sentences, after a long time [22]. According to Shepard [20], a seventeen percent recognition error was recorded after viewing 10,000 pictures. This could be part of the reason why pictures are seen as more easily recalled than passwords [14]. However, the use of visual passwords for authentication has been fraught with many challenges, such as: biased pass-image selection (people tend to select faces from their own race or background, attractive faces, or the faces of models), which makes it prone to guessing attacks; static image representation during enrolment and authentication, which makes shoulder surfing attacks easy; and the large memory space and high-bandwidth connection needed for passing images to and from the server during authentication [13]. To solve some of the challenges identified, the approach of picture grammars for the generation of abstract images for visual passwords was introduced [13]. A picture grammar is an abstract structure with which one can generate a set of images [9].
The use of these grammars for image generation does not always produce images in a regulated manner; some of these grammars generate images without considering the similarity between them. However, research has shown that control can be added to the grammar-based image generation process using a technique called bag context [4]. Bag context is a technique used to control the derivation (generation process) of an image. Shape grammars extended with this technique are called Bag Context Shape Grammars (BCSGs). The idea of BCSGs was first proposed by S. Ewert (personal communication, University of the Witwatersrand). BCSGs are a type of shape grammar that generates an infinite number of images in a regulated manner by controlling when a shape grammar rule should be applied during the generation process. The essence of this paper is to implement the similar images generated by BCSGs as distractors in a prototype visual password scheme (VPS). A distractor is an image in a visual password system that diverts the attention of a user from the desired area of focus during login, or an incorrect image choice that looks similar [13]. The prototype visual password scheme is used to measure user experience, namely whether users can remember their passwords immediately after enrolment and one week after enrolment. This is to ascertain whether the similar images generated using BCSGs are good as distractors for a visual password
Visual Password Scheme Using Bag Context Shape Grammars
37
scheme. The prototype visual password scheme is also built to resist shoulder surfing and guessing attacks. The outcome of this study shows that BCSGs are good for the generation of similar images as distractors for visual password schemes. The remainder of the paper is organised as follows. Section 2 presents the visual password design, followed by the usability study in Sect. 3. In Sect. 4, we present the performance analysis of the VPS, and finally, Sect. 5 concludes the paper and outlines future work.
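To make the regulation idea concrete, the following toy sketch (ours, not the authors' interpreter) uses string rewriting in place of spatial shape rules: a rule may fire only when the bag value lies in its permitted interval, and each application adds the rule's bag adjustment, so the bag regulates the derivation:

```python
import random

class Rule:
    """A rewriting rule guarded by a bag interval [lo, hi] and an adjustment."""
    def __init__(self, lhs, rhs, lo, hi, adj):
        self.lhs, self.rhs = lhs, rhs
        self.lo, self.hi, self.adj = lo, hi, adj

def derive(start, rules, bag=0, max_steps=100):
    form = [start]
    for _ in range(max_steps):
        # A rule is applicable only if the current bag value is in its interval.
        applicable = [(i, r) for i, s in enumerate(form) for r in rules
                      if r.lhs == s and r.lo <= bag <= r.hi]
        if not applicable:
            break
        i, r = random.choice(applicable)
        form[i:i + 1] = list(r.rhs)   # rewrite the chosen symbol
        bag += r.adj                  # update the bag
    return "".join(form)

# The growth rule may fire only while the bag is below 3; once the bag
# reaches 3, only the terminating rule is applicable.
rules = [Rule("S", "aS", 0, 2, +1),
         Rule("S", "b", 3, 3, 0)]
print(derive("S", rules))  # -> "aaab": the bag forces exactly three growth steps
```

The same guard-and-adjust mechanism, applied to spatial shape rules instead of strings, is what lets a BCSG generate images in a regulated manner.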
2 VPS Design

Here, we present the building blocks of the VPS in Sect. 2.1.

2.1 Structure
The VPS is divided into three major parts, namely, enrolment, authentication and admin. Each part is discussed briefly below.

1. Enrolment: This part is where the generation of images (distractors, or decoys) and the registration of the password begin. The generation is done by the BCSG interpreter embedded in the VPS, which models shape grammar rules into an image. A user is allowed to select a predefined initial image. Every initial image has its own predefined rules, which can generate infinitely many different similar images. The system uses the selected initial image as a sample to generate nine distractors using the predefined rules. The nine distractors are randomly displayed on the screen, and the user selects the image of choice to use as a password. The system shuffles the images on the screen and requests that the user confirm the image selection. The user is asked to supply an identification (ID) number as a unique identifier, in case the user later wants to change his password. The selected image is converted to a vector using the spatial colour distribution descriptor (SpCD) [19] and stored as an Extensible Markup Language (XML) file; XML defines a set of rules for encoding documents in a format that is both human and machine readable. The initial shape that was used to generate the image and the ID number are also saved.

2. Authentication: This part involves the verification of the image password selected by the user. The system uses the initial shape saved in step 1 and the appropriate predefined rules to regenerate the nine image distractors generated during enrolment and displays them on the screen. One of the nine images corresponds exactly to the image password selected during enrolment.
When a user selects an image, the system converts it to a vector using SpCD and matches it against the saved vector in the XML file. Then the system
shuffles the images and requests that the user select another image. This process is repeated three times. If the results of the three attempts are the same, that is, the selected vector matches the one in the XML file on all three attempts, then access is granted. When a user does not remember his or her password, the system requests that the user supply his ID number in order to change the password.

3. Admin: This part can override any account; that is, it can deny or grant access at any time.

Next, we present the architecture of the system in a diagram (see Sect. 2.2).

2.2 System Architecture
In this section, the system architecture is shown in Fig. 1. The architecture represents how the components of the VPS communicate with each other. Next, we discuss the human-computer interaction (HCI) design of the VPS.
Fig. 1. The VPS architecture
2.3 HCI Design
According to cognitive psychology, the way in which humans receive, process and store information, solve problems and acquire skills is very crucial [23]. Research in this area has shown that recognition is easier than recall; as such, humans
Visual Password Scheme Using Bag Context Shape Grammars
39
remember images easily after a long period [23]. In order to design an interactive and easy-to-use system, our VPS interface is designed with the way in which humans receive, process and store information in mind, so as to enhance recognition of the image password [11].

Layout. The VPS layout is designed based on the International Telecommunication Union (ITU E.1.161) standard and recommendation for the arrangement of digits, letters, and symbols on telephones and other devices that can be used for gaining access to a telephone network [8]. The standard number of push buttons is ten digits, 0 to 9, and the standard arrangement is a 3 × 4 array. The VPS layout was designed to randomly display images in a 3 × 3 array; this display arrangement mimics the ITU E.1.161 push-button arrangement, leaving out the last row. Each image is of size 130 pt × 130 pt, and the 3 × 3 array is organised in a 550 pt × 550 pt frame. Hence, during rendering, the system automatically picks up the size of the device screen and displays the frame at the center of the screen.

GUI. Designing the graphical user interface (GUI) is very crucial, as the choice of colour, font type, and font size can affect users' interest [1]. According to colour psychology [23], the majority of people see blue as their favourite colour. This is because most colour-blind people can see the colour blue, and it is associated with nature (e.g. clean water, clear sky, etc.). This motivated our choice of blue as the font colour on a white background. A bolded Sans Serif font of size 14 is used to assist users with visual impairments; Sans Serif is a category of font that does not use serifs, the small lines at the ends of characters.

Login GUI Design. The login design is where the verification is done (see Fig. 2). It is made up of five parts, as listed below:

1. Admin – denies and grants access to users at any time.
2. Login frame – holds and displays the nine distractor or decoy images for verification.
3. Create New Password – allows a user to generate and select a new image password.
4. Forgot Password – requests the user's identification number for an image password change.
5. Number of Trials – keeps count of login attempts by a user.

Enrolment GUI Design. This module is made up of three major parts, which are listed below:

1. Select Shape – allows a user to choose the type of shape he wants to use for the image generation.
2. Image Type – allows the user to choose the type of image he wants to generate using the selected shape.
3. Image Frame – holds the randomly displayed nine distractor images.
Fig. 2. The VPS login design
2.4 Password Design
Password formation is done by first partitioning the picture into 64 (8 × 8) equal blocks. Then, the image is transformed into a matrix of 64 coefficients using SpCD. A zigzag scan [18] is performed on the 64-coefficient matrix in order to reduce it to a vector, which is saved for verification (see Fig. 3).
Fig. 3. The VPS password formation
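The zigzag reduction of the coefficient matrix to a vector can be sketched as below. We show only the traversal (the SpCD coefficients themselves come from [19]); the function name is ours, and a 3 × 3 example is used for readability while the same code handles the 8 × 8 (64-coefficient) case:

```python
def zigzag_scan(matrix):
    """Read an n x n coefficient matrix in zigzag order into a flat vector.

    Anti-diagonals (constant i + j) are traversed in alternating directions,
    as in the classic JPEG coefficient ordering."""
    n = len(matrix)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda ij: (ij[0] + ij[1],
                                   -ij[1] if (ij[0] + ij[1]) % 2 else ij[1]))
    return [matrix[i][j] for i, j in order]

# Values placed so that the zigzag traversal reads them in increasing order.
m = [[1, 2, 6],
     [3, 5, 7],
     [4, 8, 9]]
print(zigzag_scan(m))  # -> [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Applied to the 8 × 8 SpCD matrix, this yields the 64-element vector saved for verification.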
2.5 Strength Analysis
Here, we discuss the ability of the VPS to resist shoulder surfing and guessing attacks. Shoulder surfing occurs when an attacker learns a user's password by watching the user log in; that is, the attacker takes up a position from which the user's login details can be seen during login. The attacker can achieve this by watching with the naked eye, a special camera, a video recorder, binoculars, etc. [25]. A guessing, or brute-force, attack involves trying every possible combination until the correct password is found [6,25]. Resistance to shoulder surfing and guessing attacks has improved significantly over the years [2,6,25]. To prevent these attacks in our VPS, we implemented similar images which act as distractors to the target image, and to a human attacker who
observes from a distance. Also, the distractors are randomly displayed at every trial during authentication: the nine images are shuffled each time they are displayed on the screen. The user has to identify the image he selected earlier and select it again, and has to make three accurate trials for access to be granted. If the user fails at any stage of the three attempts, he will not be notified of the particular stage at which he made the mistake. Although a large password space contributes greatly to the security of a system and is the main defence against a brute-force attack, most recognition-based graphical passwords tend to have a small password space [5]. This is because picture passwords are mostly used on mobile devices, where the password space must be limited [5]; an increased password space is not realistic for users [5]. Interestingly, it is more difficult to carry out a brute-force attack against visual passwords than against alphanumeric passwords [21].
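The shuffled three-trial check described above can be sketched as follows. All names and the trivial image-to-vector stand-in are ours (the real scheme compares SpCD vectors); this is an illustrative sketch, not the implemented VPS:

```python
import random

def authenticate(distractors, stored_vector, pick, trials=3):
    """Shuffle the nine images before each trial; grant access only if the
    user's pick matches the stored password vector on every trial.

    `pick` is a callable standing in for the user's selection."""
    for _ in range(trials):
        random.shuffle(distractors)    # new arrangement every trial
        chosen = pick(distractors)     # user clicks one image
        if to_vector(chosen) != stored_vector:
            return False               # no hint about which trial failed
    return True

# Toy stand-ins: an "image" is its id, its "vector" a trivial encoding.
def to_vector(image):
    return [image]

images = list(range(9))           # nine generated distractors
stored = to_vector(4)             # image 4 was chosen at enrolment
always_right = lambda shown: 4    # a user who remembers the password
always_wrong = lambda shown: shown[0] if shown[0] != 4 else shown[1]
print(authenticate(images, stored, always_right),
      authenticate(images, stored, always_wrong))  # -> True False
```

Shuffling before every trial is what prevents a shoulder surfer from learning a fixed screen position instead of the image itself.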
3 Usability Study
In this section, we look into users' ability to remember a password immediately after enrolment and at least one week after enrolment.

3.1 Participants
A hundred and eleven novice users, who were not familiar with the VPS, were trained to use it. The participants were 67 males and 44 females with an average age of 25 years. These participants were selected because they are experienced computer users who use computers for at least 4 to 5 h a day, either for personal research or for work activities.

3.2 Materials
Most of the computers used in testing the VPS displayed nine images of equal size in a frame at the center of a screen of standard size. The nine images are randomly displayed at every trial; the system displays the images randomly to determine whether a novice could learn, remember, and select the image password successfully. The time taken for a successful authentication is recorded by the system.

3.3 Procedure
The participants were divided into four groups, and each group was trained separately by the researcher. After the initial training and an explanation of how the system works and how to identify an image, the researcher demonstrated the system by showing the participants how to create a password and how to log into the computer with the password.
Then, the participants were guided to create their passwords individually. All the participants created their passwords successfully. The participants were then asked to use the password they created to log into the computer; at this point, they were no longer guided. A number of the participants were able to remember their passwords and log in successfully immediately after enrolment. The time for each successful login was recorded, and the participants were interviewed to get their perception of the system. Furthermore, one week after enrolment, a follow-up was initiated: the participants were reminded to log in, to determine whether they could still remember their image passwords. The outcome is discussed in Sect. 3.4. The Chi-Square (X²) test was used to analyse the association between a participant's ability to log into the computer immediately and after one week, that is, whether those who successfully logged in both immediately and one week after did so by chance or because they remembered their passwords, and vice versa.

Hypothesis One
– H0: The null hypothesis assumes that there is no association between a participant's ability to log in immediately and after one week.
– H1: The alternative hypothesis claims that there is some association between a participant's ability to log in immediately and after one week.

3.4 Results
All participants were able to complete the process of login and authentication. The outcome is grouped and summarised in Table 1, also called the observed data. The X² test is based on a test statistic that measures the divergence of the observed data from the values that would be expected, or expected values (see Table 2). The expected value for each cell is generated from the observed data in Table 1 as

expected value = (row total × column total) / n   (1)

where n is the total number of observations in the table. If the expected value for each cell is greater than or equal to 5, then the X² test is suitable for the experiment; otherwise, another test statistic must be used. For example, the value in column two, row two of the expected values table (see Table 2) is calculated using Eq. 1 to be 39 × 46/111 = 16.16216. From Table 1, 24 participants were not able to remember their password either immediately or after one week, 22 remembered their passwords immediately but forgot them one week later, 15 participants could not remember their password immediately but remembered it after one week, and 50 participants were able to log in immediately and also after one week. From the results, 72 (64.9%) of the total number of participants were able to log in immediately, while 39 (35.1%) could not.
Table 1. Observed data

Immediately | One week after: No | One week after: Yes | Total
No    | 24 | 15 | 39
Yes   | 22 | 50 | 72
Total | 46 | 65 | 111

Table 2. Expected values

Immediately | One week after: No | One week after: Yes | Total
No    | 16.16216 | 22.83784 | 39
Yes   | 29.83784 | 42.16216 | 72
Total | 46 | 65 | 111
The results suggest that the majority of the participants were able to log into the system immediately. There were 65(58.6%) of the total number of participants who were able to log into the system after one week successfully and there were 46 (41.4%) of the total number who were not able to log into the system after one week. There was a decrease in the total number of participants who were able to log into the system immediately compared to the total number that logged into the system after one week. The X2 test statistic is computed using Eq. 2. X2 =
k (Oi − Ei )2 i
Ei
(2)
where the observed value for each cell in Table 1 is denoted by Oi, and the expected value for each cell in Table 2 is denoted by Ei. The variable i indexes a specific cell in the table, and k represents the total number of cells, excluding the total column and row. For example, using Eq. 2, we obtain the X² test statistic as

X² = (24 − 16.16216)²/16.16216 + (15 − 22.83784)²/22.83784 + (22 − 29.83784)²/29.83784 + (50 − 42.16216)²/42.16216
   = 3.80096 + 2.68991 + 2.05885 + 1.45703
   = 10.00675
Then, we proceed to calculate the probability value (p-value) of the X² test statistic. The p-value supports the decision whether to accept or reject H0 and serves as evidence against H0; H0 is rejected if the p-value is ≤ 0.05. According to the experiment, the p-value = 0.00156 using the z table.
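The computation above can be checked mechanically. The short sketch below (pure Python; function name and layout are ours) recomputes the statistic and its df = 1 p-value from the observed table:

```python
import math

def chi_square_2x2(table):
    """Pearson X² (no continuity correction) for a 2x2 contingency table,
    with the df = 1 p-value obtained from the normal tail via erfc."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    n = sum(row)
    chi2 = sum((table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(2) for j in range(2))
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi² with df = 1
    return chi2, p

observed = [[24, 15],   # no login immediately: no / yes after one week
            [22, 50]]   # logged in immediately: no / yes after one week
chi2, p = chi_square_2x2(observed)
print(round(chi2, 5), round(p, 5))  # X² ≈ 10.007, p ≈ 0.0016
```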
3.5 Discussion
The results of the X² test show that there is a statistically significant (X² = 10.00675, p-value = 0.00156) relationship, or association, between a participant's ability to log into the system immediately after enrolment and one week after. We therefore reject H0 and accept the alternative hypothesis H1: there is an association between the two variables. That is to say, those who remembered their passwords immediately and one week after did not do so by chance. The odds (odds ratio¹) of a successful login immediately are 3.64 times the odds of an unsuccessful login. The odds ratio is significant (p-value = 0.002) with a confidence interval² of (1.61, 8.23).
4 Performance Analysis
We tested the VPS performance in terms of its ability to produce the right result in the shortest possible time. We recorded the times of 20 successful logins on the machine on which the application was built, giving an average of 12.6 s and a standard deviation of 3.2671. Then, we compared these times to the average of 20.7 s over the 72 successful logins on different machines, with a standard deviation of 8.2619. The purpose is to determine whether the performance of the system is machine-dependent, that is, whether the performance in terms of time depends on the machine or not. The Independent Samples t test is used to compare the means of the two independent groups of times (in seconds) to determine whether there is statistical evidence that the associated means are significantly different.

Hypothesis Two
– H0: The null hypothesis assumes that the time means are not significantly different.
– H1: The alternative hypothesis assumes that there is a significant difference in the time means.

Therefore, H0 is rejected if the p-value is ≤ 0.05.

4.1 Result
The independent t test statistic is calculated using Eq. 3:

t = (X̄1 − X̄2 − (μ1 − μ2)) / (sp √(1/n1 + 1/n2)),   where sp = √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2))   (3)

¹ The measure of an association between an exposure and an outcome.
² Used to estimate the precision of the odds ratio.
where X̄1 = mean of sample 1, X̄2 = mean of sample 2, μ1 − μ2 = difference between the two population means, s1² = sample variance of sample 1, s2² = sample variance of sample 2, n1 = size of sample 1, and n2 = size of sample 2. Note: sample 1 is from the main machine, while sample 2 is from the different machines. The experiment performed using Eq. 3 showed that the independent t test statistic is −4.3195 and the p-value = 0.000004.

4.2 Discussion
The results of the t test (t = −4.3195, p-value = 0.000004 using the t table) imply that there is a statistically significant difference between the time means. We reject the null hypothesis H0 and accept the alternative hypothesis H1. This simply means that the run time of the VPS can be affected by the machine used.
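Equation 3 can likewise be recomputed from the summary statistics reported above (a pure-Python sketch; the function name is ours). With the rounded means and standard deviations given in the text, the statistic comes out near −4.28; the small difference from the reported −4.3195 presumably stems from the paper using unrounded values:

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance (Eq. 3),
    under H0: mu1 - mu2 = 0."""
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                   / (n1 + n2 - 2))
    return (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Summary statistics reported in the text (login times in seconds).
t = pooled_t(12.6, 3.2671, 20, 20.7, 8.2619, 72)
print(round(t, 4))  # ≈ -4.28 with these rounded inputs
```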
5 Conclusion and Future Work
In this paper, we implemented the VPS using the images generated by BCSGs. The similar images, which act as distractors, were randomly displayed to improve resistance to shoulder surfing and guessing attacks in the VPS design. The VPS was tested for human usability, which showed that a majority of participants could remember their image password after one week. The performance of the VPS in terms of run time was also tested; it was observed that the VPS performance can be affected by the type of machine used. Finally, the findings in this paper suggest that BCSGs are good for the generation of similar images as distractors for visual password schemes. In the future, if this idea is implemented in a robust software tool, it will find applications in industry where password security is important.

Acknowledgements. The authors would like to thank the Department of Science and Technology (DST) and the Council for Scientific and Industrial Research (CSIR) Inter-bursary support programme, South Africa, for funding this research.
References 1. Dix, A., Finlay, J., Abowd, G.D., Beale, R.: Human Computer Interaction. Pearson Prentice Hall, Upper Saddle River (2004) 2. Awais, A., Muhammad, A., Kashif, H.M., Ramzan, T.: Secure graphical password techniques against shoulder surfing and camera based attacks. In: International Journal of Computer Network and Information Security, November 2016, pp. 11–18. IEEE (2016) 3. Commission, U.F.T., et al.: Consumer sentinel network data book for January–December 2014 (2015) 4. Drewes, F., Du Toit, C., Ewert, S., Van Der Merwe, B., Van Der Walt, A.P.: Bag context tree grammars. In: Developments in Language Theory, pp. 226–237. Springer (2006)
46
B. Ogbuokiri and M. Raborife
5. Haichang, G., Xiyang, L., Sidong, W., Honggang, L., Ruyi, D.: Design and analysis of a graphical password scheme. In: Fourth International Conference on Innovative Computing, Information and Control, pp. 675–678. IEEE (2009) 6. Haichang, G., Zhongjie, R., Xiuling, C., Liu, X., Uwe, A.: A new graphical password scheme resistant to shoulder-surfing. In: 2010 International Conference on Cyberworlds, pp. 18–23. IEEE (2010) 7. Hassanat, A.B.: Visual passwords using automatic lip reading. arXiv preprint arXiv:1409.0924 (2014) 8. International-Telecommunication-Union: Series E: overall network operation, telephone service, service operation and human factors. Telecommunication Standardization Sector of ITU, pp. 1–14 (2004) 9. Jiang, T., Li, M., Ravikumar, B., Regan, K.W.: Formal grammars and languages. In: Algorithms and Theory of Computation Handbook, pp. 20–20. Chapman & Hall/CRC (2010) 10. Kara, M.: Identity theft in United States of America, Bureau of justice statistics, US (2015). http://www.ojp.gov/ 11. Katsini, C., Fidas, C., Raptis, G.E., Belk, M., Samaras, G., Avouris, N.: Influences of human cognition and visual behavior on password strength during picture password composition. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018, pp. 73–87. no. 87. ACM, New York (2018). https://doi.org/10.1145/3173574.3173661 12. Norman, D.A.: The Design of Everyday Things: Revised and Expanded Edition. Basic Books, New York (2013) 13. Okundaye, B., Ewert, S., Sanders, I.: A novel approach to visual password schemes using tree picture grammars. In: Proceedings of the 2014 PRASA, RobMech and AfLaT International Joint Symposium, pp. 247–252 (2014) 14. Paivio, A., Rogers, T.B., Smythe, P.C.: Why are pictures easier to recall than words? Psychon. Sci. 11(4), 137–138 (1968) 15. Renaud, K., De Angeli, A.: Visual passwords: cure-all or snake-oil? Commun. ACM 52(12), 135–140 (2009) 16. Revett, K.: Behavioral Biometrics: A Remote Access Approach. 
Wiley, Hoboken (2008) 17. Rob, D.: Identity theft victims statistics. Identity theft and scan prevention services official website (2016). http://www.identitytheft.info/victims.aspx 18. Savvas, A.C., Yiannis, S.B., Mathias, L.: Img(rummager): an interactive content based image retrieval system. In: SISAP 2009 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications, Prague, Czech Republic, 29–30 August, pp. 151–153. IEEE Computer Society (2009) 19. Savvas, A.C., Yiannis, S.B., Mathias, L.: SpCD - spatial color distribution descriptor - a fuzzy rule based compact composite descriptor appropriate for hand drawn color sketches retrieval. In: 2nd International Conference on Agents and Artificial Intelligence, pp. 58–63, Artificial Intelligence (2010). http://hdl.handle.net/11728/ 10155. Accessed 20 Sept 2018 20. Shepard, R.N.: Recognition memory for words, sentences, and pictures. J. Verbal Learn. Verbal Behav. 6(1), 156–163 (1967) 21. Sreelaja, N.K., Sreeja, N.K.: An image edge based approach for image password encryption. In: Security Communication Networks 2017, pp. 5733–5745. Wiley Online Library (2017) 22. Stander, A., Dunnet, A., Rizzo, J.: A survey of computer crime and security in South Africa. In: ISSA, pp. 217–226 (2009)
23. Sternberg, R.J., Sternberg, K.: Cognitive Psychology. Wadsworth (2011) 24. Wiedenbeck, S., Waters, J., Birget, J.C., Brodskiy, A., Memon, N.: PassPoints: design and longitudinal evaluation of a graphical password system. Int. J. Hum.-Comput. Stud. 63(1), 102–127 (2005) 25. Wiedenbeck, S., Waters, J., Sobrado, L., Birget, J.C.: Design and evaluation of a shoulder-surfing resistant graphical password scheme. In: Proceedings of the Working Conference on Advanced Visual Interfaces, AVI 2006, Venezia, Italy, 23–26 May, pp. 177–184. ACM (2006)
Peak Detection Enhancement in Autonomous Wearable Fall Detection
Mario Villar1 and Jose R. Villar2(B)
1 University of Granada, Granada, Spain
[email protected]
2 Computer Science Department, Faculty of Geology, University of Oviedo, Oviedo, Spain
[email protected]
Abstract. Fall Detection (FD) has drawn the attention of the research community for several years. A possible solution relies on on-wrist wearable devices that include tri-axial accelerometers and perform FD autonomously. This type of approach makes use of an event detection stage followed by some pre-processing and a final classification stage. The event detection stage is basically performed using thresholds or a combination of thresholds and finite state machines. In this research, a novel event detection method is proposed that avoids the use of user-predefined thresholds; this fact represents the main contribution of this study. It is worth noticing that avoiding the use of thresholds makes solutions more general and easier to deploy. Moreover, a new set of features is extracted from a time window whenever a peak is detected, and the window is classified with a Neural Network. The proposal is evaluated using UMA Fall, one of the publicly available simulated fall detection data sets. Results show improvements in the event detection using the new proposal, which outperforms the baseline method; however, the classification stage still needs improvement. Future work includes introducing a finite state machine in the event detection method, adding extra features and a pre-classification of the post-peak interval, and a better training configuration of the Neural Networks.
Keywords: Fall Detection · Event detection · Classification · Wearable devices
1
Introduction
The mean age of Europe's population is rising, which means that society needs to solve some challenges to allow healthy aging; the detection of fall events or
This research has been funded by the Spanish Ministry of Science and Innovation, under project MINECO-TIN2017-84804-R, and by the Grant FC-GRUPIN-IDI/2018/000226 project from the Asturias Regional Government.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 48–58, 2021. https://doi.org/10.1007/978-3-030-49342-4_5
Fall Detection (FD) is among the challenges to solve. Interested readers can find complete reviews on FD in [4,5]. There is a wide spread of studies concerning this topic, such as analyzing video recordings from cameras [21] or using wearable devices (WD) and the data from their sensors in the detection [12]. In senior houses, the ubiquity of WDs reduces the effort carers need to check on the inhabitants and also allows studying their level of activity during each day. Besides, autonomous on-wrist wearable devices including FD, such as smart-watches, might play a crucial role in helping the elderly to continue living on their own. These devices can be easily worn because people usually carry watches or bracelets; and because some smart devices are programmable, deploying intelligent services on them is feasible. In this research, we focus on smart-watches with built-in tri-axial accelerometers (3DACC), which is by far the most chosen option in FD with wearables [2,10,11,20,22]. Thresholds have been proposed as the elementary decision system [2,8,9,11,13]; in all these cases, the time-series (TS) values are analyzed until they surpass a predefined combination of thresholds or the features extracted from the current sliding window fall out of the predefined range. Machine Learning is used in the majority of wearable-based FD to learn the patterns related to falls and to Activities of Daily Living (ADL). Modelling techniques such as Support Vector Machines (SVM), K-Nearest Neighbours, Neural Networks or Decision Trees have been widely used. For instance, SVMs were used to classify features extracted from sliding windows [20,22]. A comparison of different classification algorithms has been presented in [10]. There are also studies concerned with the dynamics of a fall event [1,7]. The former proposed the use of these dynamics as the basis of the FD algorithm [1], with moderate computational constraints but a high number of thresholds to tune.
This solution is appealing when developing solutions to run on smart-watches due to their lack of computational power. The proposal of Abbate et al. has been modified in a series of papers [14,15,18] to adapt the sensor placement to a wrist. We refer to this event detection as on-wrist Abbate. This sensor location introduces changes in the peak detection, forcing the introduction of an over-sampling data balancing stage (using the Synthetic Minority Over-sampling Technique, SMOTE) and a modification of the feed-forward Neural Network learning process. The main contribution of this study consists of a new event detection mechanism for the high-intensity fall events, that is, those that arise when the user falls from an upright position, whether walking, standing still or running. The mechanism is based on the partial maximum peak detection method [16], where the threshold to detect the peaks is automatically determined for each user. Interestingly, this new event detection makes use of no user-predefined threshold, which represents a step ahead for the event detection mechanisms in the literature. We refer to this event detection mechanism as MAX-PEAK. Furthermore, a set of transformations is calculated for the windows surrounding the fall event candidates, and a feed-forward Neural Network classifier is used to label the instances.
The structure of the paper is as follows. The next section deals with the description of the on-wrist Abbate event detection method and its transformations. Section 3 details the MAX-PEAK and FSM-MAX-PEAK, together with the proposed transformations and the modelling method. Section 4 describes the UMA Fall data set and the experimentation. Section 5 shows and discusses the obtained results. Finally, conclusions are drawn.
2
The Baseline Event Detection
As proposed in [1,15], the on-wrist Abbate method is a simple finite state machine (FSM); see Fig. 1. The data gathered from a 3DACC located on the wrist is processed using a sliding window. A peak detection algorithm is executed based on a predefined threshold th1. When a peak is found, the sliding window data is analyzed in order to extract several features, which finally classify the peak as either FALL or NOT FALL.
Fig. 1. The on-wrist Abbate FSM. Whenever the acceleration value is higher than a predefined threshold, the state changes to Post Peak. If no more peaks are detected, the bouncing timer fires and the state moves to Post Fall, which is supposed to be a calm period without any other peak. Once this second timer fires, the sliding window, still located around the last detected peak, is used to compute several transformations and to classify the sample. Finally, the FSM returns to the initial state.
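The state flow of Fig. 1 can be sketched roughly as follows. The state names mirror the figure, but the timer lengths and the threshold default are illustrative placeholders, not the paper's exact values; the boolean returned signals when the Activity Test (feature extraction and classification) should run.

```python
from enum import Enum, auto

G = 9.8  # gravity, m/s^2


class State(Enum):
    IDLE = auto()       # waiting for an acceleration peak
    POST_PEAK = auto()  # bouncing timer running after a peak
    POST_FALL = auto()  # supposed calm period before classifying


def step(state, a, timer, th1=3 * G, bounce=10, calm=20):
    """One tick of a simplified on-wrist Abbate FSM.

    a is the current acceleration magnitude; timers are counted in
    samples. Returns (new_state, new_timer, classify_now)."""
    if state is State.IDLE:
        return (State.POST_PEAK, 0, False) if a > th1 else (state, 0, False)
    if state is State.POST_PEAK:
        if a > th1:
            return State.POST_PEAK, 0, False  # a new peak resets the timer
        return (State.POST_FALL, 0, False) if timer >= bounce else (state, timer + 1, False)
    # POST_FALL: once the calm timer fires, classify the window and reset
    return (State.IDLE, 0, True) if timer >= calm else (state, timer + 1, False)
```

Driving this function once per sample reproduces the Initial → Post Peak → Post Fall → Activity Test cycle described in the caption.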
The Activity Test state computes several transformations on the so-called peak-window, which is determined as follows. Let us assume that gravity is g = 9.8 m/s². The magnitude of the acceleration at time t is a_t = √(a_tx² + a_ty² + a_tz²), where a_tx, a_ty and a_tz are the acceleration components along each axis. A peak occurs at a peak time (pt) whenever a_t is higher than th1 = 3 × g and there is no other acceleration value above that threshold in the period (t − 2500 ms, t]. The impact end (ie) denotes the end of the fall event: it is the last time moment for which the a_t value is higher than th2 = 1.5 × g. Finally, the impact start (is) denotes the starting time point of the fall event, computed as the starting time of a sequence where a_t ≤ th2. The impact start must belong to the
interval [ie − 1200 ms, peak time]. If no impact end is found, then it is fixed to peak time plus 1000 ms. If no impact start is found, it is fixed to peak time. Whenever a fall-like peak is found, the following transformations are computed:
AAMV Average Absolute Acceleration Magnitude Variation, computed as AAMV = Σ_{t=is}^{ie−1} |a_{t+1} − a_t| / N, with N the number of samples in the interval.
IDI Impact Duration Index, IDI = impact end − impact start.
MPI Maximum Peak Index, MPI = max_{t∈[is,ie]}(a_t).
MVI Minimum Valley Index, MVI = min_{t∈[is−500,ie]}(a_t).
PDI Peak Duration Index, PDI = peak end − peak start, with peak start the time of the last magnitude sample below th_PDI = 1.8 × g occurring before pt, and peak end the time of the first magnitude sample below th_PDI = 1.8 × g occurring after pt.
ARI Activity Ratio Index, calculated as the ratio between the number of samples that are not in [th_ARIlow = 0.85 × g, th_ARIhigh = 1.3 × g] and the total number of samples in the 700 ms interval centered at (is + ie)/2.
FFI Free Fall Index, the average acceleration magnitude in the interval [t_FFI, pt], where t_FFI is the time of the first acceleration magnitude below th_FFI = 0.8 × g occurring up to 200 ms before pt; if none is found, it is set to pt − 200 ms.
SCI Step Count Index, measured as the number of peaks in the interval [pt − 2200 ms, pt].
The on-wrist Abbate FSM is a very challenging event detector that has been successfully used in several studies [14,17,18]. However, it has several drawbacks: the high number of thresholds and the difficulty of improving the feature set. Concerning the number of thresholds, the peak detection threshold is the main one and can be easily tuned; however, there are many more, some of which are acceleration values and others time intervals that need to be set. It would be desirable to obtain an event detection method and a set of transformations free of thresholds: that is the main aim of this research.
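A few of the peak-window transformations above can be sketched as follows. This is a rough illustration of AAMV, IDI, MPI and MVI over an already-located impact interval; the function name, index conventions and the N used in AAMV are our reading of the definitions, not the authors' code.

```python
def peak_features(a, is_, ie, fs):
    """AAMV, IDI, MPI and MVI over the impact interval [is_, ie].

    a is the acceleration-magnitude series, is_/ie are sample indices
    of impact start and impact end, fs is the sampling frequency (Hz)."""
    seg = a[is_: ie + 1]
    n = len(seg)
    # AAMV: mean absolute sample-to-sample variation over the interval
    aamv = sum(abs(seg[i + 1] - seg[i]) for i in range(n - 1)) / n
    idi = (ie - is_) / fs                        # impact duration, seconds
    mpi = max(seg)                               # highest magnitude
    # MVI: the valley search starts 500 ms before the impact start
    mvi = min(a[max(0, is_ - fs // 2): ie + 1])
    return aamv, idi, mpi, mvi
```

The remaining transformations (PDI, ARI, FFI, SCI) follow the same pattern, each with its own threshold and search window.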
3
A New Fall Detection Approach
3.1
The Event Detection Stage
For the purpose of detecting peaks in the 3DACC magnitude, the first stage is to smooth the signal using a sliding window of size ¼ FREQ, with FREQ being the sampling frequency. Afterwards, we apply the S1 transformation proposed in [16]. For the current problem, the S4 and S5 transformations were too complex for a smart-watch and need too-wide data windows in order to estimate the entropy. From the remaining transformations, we chose S1 because of its simplicity and the similar performance among all of them. Eq. 1 defines the calculation of S1, where k is the predefined number of samples and t is the current sample timestamp.
It is worth noticing that, although we analyze the window [a_{t−2k−1}, a_t] at time t, the peak candidate is a_{t−k}, the center of the interval. The S1 transformation represents a scaling of the TS, which makes the peak detection easier using a predefined threshold α.

S1(t) = (1/2) × (max_{i=t−2k}^{t−k−1} a_i + max_{i=t−k+1}^{t} a_i)    (1)
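Eq. (1), together with the walking-bout threshold heuristic the text describes next, can be sketched as follows. The function names and the border handling are ours; this is an illustration of our reading of the method, not the authors' implementation.

```python
def s1(a, k):
    """Eq. (1): S1 at candidate a[t-k] is half the sum of the maxima of
    the k samples before the candidate and the k samples after it."""
    out = [0.0] * len(a)
    for t in range(2 * k, len(a)):
        left = a[t - 2 * k: t - k]     # [a_{t-2k}, ..., a_{t-k-1}]
        right = a[t - k + 1: t + 1]    # [a_{t-k+1}, ..., a_t]
        out[t - k] = 0.5 * (max(left) + max(right))
    return out


def walking_alpha(a_walk, k):
    """Adaptive threshold from a short walking bout: alpha = 3 * std of
    S1 while walking (population std; our reading of the heuristic)."""
    s = s1(a_walk, k)[k: len(a_walk) - k]  # drop the unfilled borders
    m = sum(s) / len(s)
    return 3.0 * (sum((x - m) ** 2 for x in s) / len(s)) ** 0.5
```

A candidate is then kept as a peak when its S1 value exceeds the user-specific alpha and is the largest in its 2k neighbourhood.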
The algorithm for detecting peaks is straightforward: a peak occurs at time t if the value S1(t) is higher than α and is the highest in its 2k neighbourhood. In the original report, all the parameters (k, α) were carefully determined for each problem in order to optimize the peak detection. The use of predefined thresholds is just what we are trying to avoid; therefore, in this research we define simple heuristics to determine them automatically. On the one hand, the value of k is set from the sampling frequency (the number of samples per second). On the other hand, we define the α threshold as follows. We take walking as the reference activity, so the current user u needs to walk normally during a short period. The mean μ_w^u and the standard deviation σ_w^u for this period are calculated. We also compute the values of S1 for this walking period, calculating its mean (μ_wS1^u) and standard deviation (σ_wS1^u). The TS is then normalized with these statistics; the threshold is set to α = 3σ_wS1^u, which means (for a normal distribution) that a value that is statistically above the upper limit of S1 when walking is a peak candidate. With these settings, S1 is automatically set according to the current device and the user's performance. From now on, we refer to this solution as MAX-PEAK.
3.2
The New Set of Transformations
Whenever a high-intensity fall occurs, there are three main parts: the activity being ordinarily carried out before the fall event, the fall itself, which we identify as a peak, and what happens next. Because there is no public data set of real falls for healthy participants, we cannot say accurately what happens after a fall; we can make the hypothesis that what follows is a period of relative calm, without special activity, perhaps with some erratic movements of the hands. Therefore, we divide the [a_{t−2k−1}, a_t] window in two: before (IB = [a_{t−2k−1}, a_{t−k−1}]) and after (IA = [a_{t−k+1}, a_t]) the peak. For each of these sub-intervals we propose to compute the following transformations:
AAMV Average Absolute Acceleration Magnitude Variation, computed as AAMV = Σ_{t=s}^{e−1} |a_{t+1} − a_t| / N, with N the number of samples in the interval [s, e].
E Energy of the Acceleration Magnitude, E = Σ_{t=s}^{e} a_t² / N.
Mean Mean Activity, the mean of the acceleration magnitude in the interval [s, e].
SD Standard Deviation of the acceleration magnitude in the interval [s, e].
Therefore, we have 4 transformations for each of the two intervals (a total of 8 transformations), none of which relies on thresholds of any kind. All of these
transformations are well known in the context of Human Activity Recognition and Fall Detection.
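As an illustration, the eight threshold-free features might be computed as follows. The function name and slicing conventions are ours; a sketch under the interval definitions given above, not the authors' code.

```python
def window_features(a, t, k):
    """AAMV, energy, mean and standard deviation for the sub-windows
    IB (before the peak candidate a[t-k]) and IA (after it)."""
    ib = a[t - 2 * k - 1: t - k]    # IB = [a_{t-2k-1}, ..., a_{t-k-1}]
    ia = a[t - k + 1: t + 1]        # IA = [a_{t-k+1}, ..., a_t]

    def feats(w):
        n = len(w)
        aamv = sum(abs(w[i + 1] - w[i]) for i in range(n - 1)) / n
        energy = sum(x * x for x in w) / n
        mean = sum(w) / n
        sd = (sum((x - mean) ** 2 for x in w) / n) ** 0.5
        return [aamv, energy, mean, sd]

    return feats(ib) + feats(ia)    # 8 values: 4 per sub-window
```

The resulting 8-dimensional vector is what feeds the Neural Network classifier in the next section.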
4
Experimental Design
The publicly available simulated-falls data set UMA Fall [3] is used in this study. This data set includes several activities, transitions and simulated falls from up to 17 participants. There is no fixed number of repetitions of each activity or simulated fall. Each participant wore several 3DACCs, notably one on a wrist; the sampling frequency was 20 Hz. Altogether, 208 of the 531 TS available in this data set are simulated falls, belonging to lateral, forward or backward falls. The experimentation is divided into two main parts: the first part is devoted to comparing the event detection methods, while the second one aims to evaluate the fall detection algorithm using the feature subset proposed in the previous section. The comparison of the event detection evaluates both methods (the on-wrist Abbate and the MAX-PEAK) for each participant in the UMA Fall data set. We use the well-known counters True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) to evaluate the performance of the event detection methods. These counters are updated according to whether the peaks are detected or not in each TS. Additionally, for every peak, the 4 features for each k-length sub-interval are calculated and stored for later use. From now on, this second data set is referred to as 4x2TRNS. For the second part of the experimentation, the 4x2TRNS data set is scaled to the interval [0.0, 1.0]. Then, this scaled data set is used to train and test a two-class feed-forward Neural Network (NN) (using only the non-fall data) and also a two-class feed-forward NN for each type of fall (lateral L-NN, forward F-NN and backward B-NN). The training part includes all the instances except those generated for the current participant u; these instances are used for testing. The sensitivity and specificity of the results for all the participants are used to measure the performance of the method.
For each of the classifiers the best parameter subset will be found using 300 iterations.
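The evaluation protocol above, leaving out one participant at a time and scoring with sensitivity and specificity, can be sketched as follows. The names are ours and the actual training code is not published in the paper; None plays the role of the NA entries reported when a participant has no falls.

```python
def sens_spec(tp, tn, fp, fn):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP); None when a
    ratio is undefined (e.g. no falls for this participant)."""
    sens = tp / (tp + fn) if (tp + fn) > 0 else None
    spec = tn / (tn + fp) if (tn + fp) > 0 else None
    return sens, spec


def loso_splits(participant_of):
    """Leave-one-subject-out: for each participant u, train on every
    other participant's instance indices and test on u's."""
    for u in sorted(set(participant_of)):
        train = [i for i, p in enumerate(participant_of) if p != u]
        test = [i for i, p in enumerate(participant_of) if p == u]
        yield u, train, test
```

Iterating over `loso_splits` once per participant yields the 17 train/test partitions the experimentation describes.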
5
Obtained Results and Discussion
Results obtained from the first part of the experimentation, regarding the comparison of the two methods for detecting falls, are shown in Table 1. As mentioned before, neither the number of TS nor the number of falls is the same for each participant, which in practice means that the FP and FN have to be examined in relative terms.
At first glance, it can be clearly seen that the MAX-PEAK method is able to detect many more falls (the number of FN is almost naught). Actually, 29.774% of the fall TS result in FN with on-wrist Abbate, while MAX-PEAK misses only 0.672%. Something to emphasize is that the standard deviation of the distribution of FN over the participants is much bigger for on-wrist Abbate than for the MAX-PEAK method (0.3351 against 0.0190). This difference highlights that on-wrist Abbate not only suffers from a higher FN rate but also does so more irregularly. On the other hand, when talking about FP there is also a clear difference between both methods and, on the contrary, MAX-PEAK has a higher FP rate than on-wrist Abbate. Indeed, MAX-PEAK has an overall ratio of 76.480% of FP whereas on-wrist Abbate's ratio is just 9.310%. This is due to the fact that Abbate's method includes a finite state machine which allows filtering out the peaks that do not belong to a real fall. For instance, in walking TS some strong peaks can be found (see Fig. 2), but they are usually very close to each other: a finite state machine will surely help to remove those peaks that cannot logically be a fall (for example, pairs of peaks that are closer than two seconds, as it is physically impossible to fall twice in such a short period of time). The implementation of a finite state machine for MAX-PEAK is left for further work and research.
Fig. 2. One of the walking TS for participant 1. The red points are the detected peaks, while the blue vertical lines split the signal into 2-s intervals.
The results obtained from classifying the extracted 4x2TRN data set and the 4 models (NN, L-NN, F-NN and B-NN) are shown in Table 2. The values NA in that Table are due to the fact that several participants did not perform the corresponding type of simulated fall. For instance, participant 5 did not simulate lateral falls, while participant 7 did not simulate any type of fall. The 4x2TRN data set is highly unbalanced (more than 10 Not Fall instances for each Fall one);
Table 1. Event detection results, aggregated over the 17 participants.

Method          | TN  | FP  | FN | TP
On-wrist Abbate | 286 | 35  | 56 | 154
MAX-PEAK        | 68  | 253 | 2  | 208
therefore, a balancing method was used. More specifically, the SMOTE method [6] was used, generating 500 samples from each class. Nevertheless, the results are highly unsatisfactory, as the Specificity is negligible in all cases. It seems that the models almost always propose a fall for each peak. Clearly, this part of the proposal needs further attention and improvement. Firstly, the 4 features might not be representative; perhaps it would be more interesting to introduce more features (such as the Signal Magnitude Area [19]). Moreover, it might be interesting to label the post-peak interval with an activity level (high, medium or low) and use this label as an input to the final classifier. Additionally, especially if the number of transformations is increased, Principal Component Analysis should first be performed to select the most interesting and representative axes. However, we have not introduced all these issues in this study because we need to introduce the finite state machine in the event detection first. All of these proposals are left to future work.
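A minimal SMOTE-style oversampler in the spirit of [6] can be sketched as follows. This is a rough, pure-Python illustration, not the reference implementation; the function name and parameters are ours. Each synthetic point is a random interpolation between a minority sample and one of its k nearest minority neighbours.

```python
import random


def smote_like(X, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples from the list of
    feature vectors X (SMOTE-style interpolation)."""
    rnd = random.Random(seed)
    dims = range(len(X[0]))
    out = []
    for _ in range(n_new):
        i = rnd.randrange(len(X))
        # indices of X sorted by squared distance to X[i]; dist[0] is i itself
        dist = sorted(range(len(X)),
                      key=lambda j: sum((X[j][d] - X[i][d]) ** 2 for d in dims))
        j = rnd.choice(dist[1: k + 1])      # one of the k nearest neighbours
        gap = rnd.random()                  # interpolation factor in [0, 1)
        out.append([X[i][d] + gap * (X[j][d] - X[i][d]) for d in dims])
    return out
```

In the experiment, a call such as `smote_like(fall_instances, 500)` would correspond to drawing the 500 balanced samples per class mentioned above.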
Table 2. Results for the classification task: Sens and Spec stand for Sensitivity and Specificity. NA stands for Not Available: for these participants there were no simulated falls; therefore, both the TP and FN have a value of 0 and, hence, the Sensitivity is not computed.

Pid | NN (Sens, Spec) | L-NN (Sens, Spec) | F-NN (Sens, Spec) | B-NN (Sens, Spec)
1  | 1.0000, 0.2710 | 1.0000, 0.0935 | 1.0000, 0.1402 | 1.0000, 0.1308
2  | 1.0000, 0.4343 | 1.0000, 0.0859 | 1.0000, 0.3939 | 1.0000, 0.0657
3  | 1.0000, 0.3571 | 1.0000, 0.0476 | 1.0000, 0.0655 | 1.0000, 0.1548
4  | 1.0000, 0.4688 | 1.0000, 0.0573 | 1.0000, 0.1042 | 1.0000, 0.2552
5  | 1.0000, 0.4386 | NA, 0.1959 | NA, 0.2887 | NA, 0.1237
6  | 1.0000, 0.2308 | 1.0000, 0.0000 | 1.0000, 0.3077 | 1.0000, 0.0769
7  | NA, 0.4388 | NA, 0.1939 | NA, 0.1429 | NA, 0.0612
8  | NA, 0.5223 | NA, 0.1401 | NA, 0.0382 | NA, 0.2102
9  | 1.0000, 0.4670 | 1.0000, 0.3208 | 1.0000, 0.1934 | 1.0000, 0.0660
10 | NA, 0.2632 | NA, 0.4211 | 1.0000, 0.2895 | NA, 0.3505
11 | 1.0000, 0.3597 | NA, 0.4029 | NA, 0.0935 | NA, 0.2014
12 | 1.0000, 0.4171 | 1.0000, 0.1991 | 1.0000, 0.1137 | 1.0000, 0.1659
13 | 1.0000, 0.5970 | 1.0000, 0.7761 | 1.0000, 0.5224 | 1.0000, 0.4179
14 | 1.0000, 0.1250 | 1.0000, 0.3750 | 1.0000, 0.2500 | 1.0000, 0.1250
15 | 1.0000, 0.4731 | 1.0000, 0.2688 | 1.0000, 0.3118 | 1.0000, 0.3548
16 | 1.0000, 0.3888 | 1.0000, 0.1367 | 1.0000, 0.0148 | 1.0000, 0.1779
17 | 1.0000, 0.2756 | 1.0000, 0.0321 | 1.0000, 0.0577 | 1.0000, 0.0962
6
Conclusions
This research analyzes an improvement in peak detection for fall detection systems. More specifically, the developed method for detecting peaks amplifies the signal and uses statistics gathered while the user walks to automatically set the firing threshold, adapted to the current user. To our knowledge, this is the first study that proposes an event detection method that automatically adapts to the user's behaviour. The results from the experimentation show that the peak detection clearly outperforms the baseline method, but suffers from a high percentage of peaks coming from ADLs. Nevertheless, the automatic threshold set-up works extremely well in adapting to each participant. The modelling part still needs refinements and improvements, which are left to future work. Future work includes developing a finite state machine for the peak detection that filters successive peaks within two or three seconds, and evaluating with several public FD data sets. Moreover, several transformations should extend the 4x2TRN data set for each interval, Principal Component Analysis will be applied before classifying, and a new stage will label the post-peak interval
with the level of activity. Introducing Autoencoders and Deep Learning is also part of future work.
References 1. Abbate, S., Avvenuti, M., Bonatesta, F., Cola, G., Corsini, P.: A smartphone-based fall detection system. Pervasive Mob. Comput. 8(6), 883–899 (2012) 2. Bourke, A., O'Brien, J., Lyons, G.: Evaluation of a threshold-based triaxial accelerometer fall detection algorithm. Gait Posture 26, 194–199 (2007) 3. Casilari, E., Santoyo-Ramón, J.A., Cano-García, J.M.: UMAFALL: a multisensor dataset for the research on automatic fall detection. Procedia Comput. Sci. 110, 32–39 (2017). https://doi.org/10.1016/j.procs.2017.06.110 4. Casilari-Pérez, E., García-Lagos, F.: A comprehensive study on the use of artificial neural networks in wearable fall detection systems. Expert Syst. Appl. 138 (2019). https://doi.org/10.1016/j.eswa.2019.07.028 5. Chaudhuri, S., Thompson, H., Demiris, G.: Fall detection devices and their use with older adults. J. Geriatr. Phys. Ther. 37, 178–196 (2014) 6. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002) 7. Delahoz, Y.S., Labrador, M.A.: Survey on fall detection and fall prevention using wearable and external sensors. Sensors 14(10), 19806–19842 (2014). https://doi.org/10.3390/s141019806 8. Fang, Y.C., Dzeng, R.J.: A smartphone-based detection of fall portents for construction workers. Procedia Eng. 85, 147–156 (2014) 9. Fang, Y.C., Dzeng, R.J.: Accelerometer-based fall-portent detection algorithm for construction tiling operation. Autom. Constr. 84, 214–230 (2017) 10. Hakim, A., Huq, M.S., Shanta, S., Ibrahim, B.: Smartphone based data mining for fall detection: analysis and design. Procedia Comput. Sci. 105, 46–51 (2017). https://doi.org/10.1016/j.procs.2017.01.188 11. Huynh, Q.T., Nguyen, U.D., Irazabal, L.B., Ghassemian, N., Tran, B.Q.: Optimization of an accelerometer and gyroscope-based fall detection algorithm. J. Sens. 2015 (2015). Article ID 452078 12. Igual, R., Medrano, C., Plaza, I.: Challenges, issues and trends in fall detection systems. BioMed. Eng. Online 12(1), 66 (2013) 13. Kangas, M., Konttila, A., Lindgren, P., Winblad, I., Jämsä, T.: Comparison of low-complexity fall detection algorithms for body attached accelerometers. Gait Posture 28, 285–291 (2008) 14. Khojasteh, S.B., Villar, J.R., de la Cal, E., González, V.M., Sedano, J., Yazğan, H.R.: Evaluation of a wrist-based wearable fall detection method. In: 13th International Conference on Soft Computing Models in Industrial and Environmental Applications, pp. 377–386 (2018) 15. Khojasteh, S.B., Villar, J.R., Chira, C., González, V.M., de la Cal, E.: Improving fall detection using an on-wrist wearable accelerometer. Sensors 18(5), 1350 (2018) 16. Palshikar, G.K.: Simple algorithms for peak detection in time-series. Technical report, Tata Research Development and Design Centre (2009) 17. Tsinganos, P., Skodras, A.: A smartphone-based fall detection system for the elderly. In: Proceedings of the 10th International Symposium on Image and Signal Processing and Analysis (2017)
18. Villar, J.R., de la Cal, E., Fañez, M., González, V.M., Sedano, J.: User-centered fall detection using supervised, on-line learning and transfer learning. Prog. Artif. Intell. 2019, 1–22 (2019). https://doi.org/10.1007/s13748-019-00190-2 19. Villar, J.R., González, S., Sedano, J., Chira, C., Trejo-Gabriel-Galán, J.M.: Improving human activity recognition and its application in early stroke diagnosis. Int. J. Neural Syst. 25(4), 1450036–1450055 (2015) 20. Wu, F., Zhao, H., Zhao, Y., Zhong, H.: Development of a wearable-sensor-based fall detection system. Int. J. Telemed. Appl. 2015, 11 (2015). https://doi.org/10.1155/2015/576364 21. Zhang, S., Wei, Z., Nie, J., Huang, L., Wang, S., Li, Z.: A review on human activity recognition using vision-based method. J. Healthc. Eng. 2017 (2017). https://doi.org/10.1155/2017/3090343 22. Zhang, T., Wang, J., Xu, L., Liu, P.: Fall detection by wearable sensor and one-class SVM algorithm. In: Huang, D.S., Li, K., Irwin, G.W. (eds.) Intelligent Computing in Signal Processing and Pattern Recognition. Lecture Notes in Control and Information Sciences, pp. 858–863. Springer, Berlin (2006)
Automated Detection of Tuberculosis from Sputum Smear Microscopic Images Using Transfer Learning Techniques
Lillian Muyama, Joyce Nakatumba-Nabende(B), and Deborah Mudali
Makerere University, Kampala, Uganda
[email protected], [email protected], [email protected]
Abstract. Tuberculosis is a contagious disease and is one of the leading causes of death especially in low and middle income countries such as Uganda. While there are several ways to diagnose tuberculosis, sputum smear microscopy is the commonest method practised. However, this method can be error prone and also requires trained medical personnel who are not always readily available. In this research, we apply deep learning models based on two pre-trained Convolutional Neural Networks: VGGNet and GoogLeNet Inception v3 to diagnose tuberculosis from 148 Ziehl-Neelsen stained sputum smear microscopic images from two different datasets. These networks are used in three different scenarios, namely, fast feature extraction without data augmentation, fast feature extraction with data augmentation and fine-tuning. Our results show that using Inception v3 for fast feature extraction without data augmentation produces the best results with an accuracy score of 86.7%. This provides a much better approach to disease diagnosis based on the use of diverse datasets from different sources and the results of this work can be leveraged in medical imaging for faster tuberculosis diagnosis.
Keywords: Tuberculosis · VGGNet · Inception v3 · Transfer learning

1 Introduction
Tuberculosis (TB) [7] is an infectious disease that causes ill health for approximately ten million people each year and is the ninth leading cause of death worldwide [26]. In a survey carried out by WHO and the Global Fund in partnership with the Ministry of Health National Tuberculosis and Leprosy Program (NTLP), it was found that there has been a 60% increase in TB prevalence in Uganda with the number of cases going up from 159 to 253 per 100,000 people in 2015 [5]. Unfortunately, Uganda has very few trained medical personnel. For example, in 2014, Uganda had only 12 doctors per 100,000 people [9]. The most common means of diagnosing tuberculosis is sputum smear microscopy [15].

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 59–68, 2021. https://doi.org/10.1007/978-3-030-49342-4_6

It has been the main diagnosis method for pulmonary tuberculosis in low
and middle income areas because it is a simple, rapid and inexpensive technique. Sputum samples stained with certain dyes that are retained mainly by mycobacteria are examined under a microscope to determine the presence of bacteria. However, the tuberculosis diagnosis process requires very experienced and trained medical personnel, which Uganda severely lacks [16], to analyze the samples and determine whether a patient has tuberculosis or not. Hence, there is a need for automatic tuberculosis detection, so that a timely diagnosis can still be achieved even where resources, especially trained personnel, are limited. In recent times, deep learning methods have come to the forefront of medical diagnostics, and Convolutional Neural Networks (CNNs) are leading the charge in this regard [4]. While relatively new compared to traditional machine learning techniques such as support vector machines and random forests, CNNs have been proven to give far better performance on the same tasks than the aforementioned methods and are currently revolutionizing the Computer Vision field [3]. A few researchers have already attempted to apply these deep learning methods to automatically detect Tuberculosis using sputum smear microscopic images [10,13,18,19]. Nevertheless, there is a need for further study to improve on the performance of the models that have already been developed by addressing some of the existing gaps: for instance, the use of a small dataset to train a convolutional neural network built from scratch, the use of images obtained from a single source or with a singular background, or the exclusion of images with occluded bacilli, which would not be the case in the real world. It was therefore imperative to develop a new strategy for automatically detecting tuberculosis while taking all these deficiencies into consideration.
The main contribution of this paper is the use of transfer learning on a diverse dataset with images from different sources and with different backgrounds. Because the training dataset is small, CNNs that have been previously trained on a large dataset are used, since they have already learned patterns from previous tasks [14]; the assumption is that the pre-trained CNNs serve as a generic model of the visual world [24]. Furthermore, a comparative analysis of two of the most important CNN architectures is made on a dataset of sputum smear microscopic images, which had not been done before. The rest of the paper is structured as follows: Sect. 2 presents the most relevant work done previously in relation to this research. Section 3 describes the methodology that was used. Sections 4 and 5 present the results of this study and a discussion of these results, respectively. Section 6 concludes the paper and outlines possible future work.
2 Related Work
Adgaonkar [2] proposed the use of image processing techniques and neural network classifiers to automatically detect Tuberculosis bacilli in auramine-stained sputum specimens. The input was an RGB image of sputum obtained using a photomicroscope. Neural networks were used in the recognition process
and the study yielded a sensitivity score of 93.5%. The authors in [10] proposed a patch-wise detection strategy: patches from the input image were classified one at a time for the presence or absence of bacilli. A simple 5-layer fully-convolutional neural network architecture was used. The proposed method basically took a microscopy image with the proper zoom level as input and returned the locations of suspected bacilli as output. Quinn [19] evaluated the performance of deep Convolutional Neural Networks on three different microscopy tasks, i.e., malaria diagnosis in thick blood smears, tuberculosis diagnosis in sputum samples, and intestinal parasite eggs in stool samples. The images used were captured using a low-cost smart phone. For the tuberculosis diagnosis, the images were made from fresh sputum, stained using Ziehl-Neelsen stain and examined under x1000 magnification. Surgitha and Murugesan [21] proposed a color segmentation and classification method for the automated detection of Tuberculosis using sputum smear images. The approach segmented the bacilli from the image based on their characteristics using Particle Swarm Optimization, a technique dependent on pixel intensities. After segmentation, the candidate bacilli underwent morphological operations; connected components analysis was then used to group them together. Classification was done using the random forest technique. Lopez [13] presented a method for automatically classifying light field smear microscopy patches using RGB, R-G and grayscale patches as inputs to three different convolutional neural network models for identifying Mycobacterium Tuberculosis. The dataset used consisted of both negative and positive patches. The best result in the patch classification test was achieved using the R-G input and the CNN with three convolution layers while implementing regularization; it achieved an Area under the ROC curve of 99%.
In their paper [18], Panicker et al. presented an automatic method for the detection of tuberculosis from sputum smear microscopic images. They used image binarization and subsequently classified the detected regions using a convolutional neural network. A dataset of 22 images was used, and they achieved recall and precision scores of 97.13% and 78.4%, respectively. However, previous work had some gaps, such as not taking into consideration images with overlapping bacilli, using images obtained exclusively from one source (e.g., smart phones), and the use of relatively small datasets even though the CNNs were built from scratch. This research aimed to address these gaps. Due to the size of the dataset, pre-trained CNNs were used, as they had already been trained on a very large set of images. Additionally, this research makes use of a diverse dataset, i.e., images from multiple sources, namely smart phones, digital microscopes and cameras, with different backgrounds. Also, images with single, few, many and occluded bacteria were used. Lastly, this research is the first to employ transfer learning techniques, in this case using pre-trained models, to detect tuberculosis in sputum smear microscopic images.
3 Methods

3.1 The Datasets
The data used in this research consisted of Ziehl-Neelsen sputum smear microscopic images obtained from two datasets: – The Artificial Intelligence Research Laboratory situated at the College of Computing and Information Sciences1 . These images were captured exclusively using smart phones and were used in this research [19]. – An online database [28] which contains images captured using various phones, cameras and microscopes. The images were all in JPEG format and they were normalized before usage. Figures 1 and 2 show a sample of the positive and negative images that were used for training the model.
Fig. 1. Sample of the positive images used.
Fig. 2. Sample of the negative images used.
Structuring the Dataset: The training data set comprised eighty percent of the original dataset, since this is the de-facto standard [6]. This is because the training set needed to be large enough to achieve low variance over the model parameters, and it also allowed for another 80/20 split to create the validation set [6]. It had both positive and negative images, and its purpose was to train the CNNs. The test data set comprised the remaining twenty percent of the original dataset. Again, this is because enough data was needed to observe low variance among the performance results [6]. It also had both positive and negative images, and its purpose was to test the performance of the CNNs.

1 http://air.ug/microscopy
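The 80/20 split described above, followed by a further 80/20 split of the training portion to carve out the validation set, can be sketched as follows. This is an illustrative Python sketch, not the authors' code; the file names are hypothetical, and the resulting counts differ slightly from the paper's 88/30/30 split, which was done per class.

```python
# Illustrative 80/20 train/test split, then a second 80/20 split of the
# training portion to obtain a validation set (file names are hypothetical).
import random

def split_dataset(filenames, seed=42):
    rng = random.Random(seed)
    files = list(filenames)
    rng.shuffle(files)
    n_test = int(0.2 * len(files))          # 20% held out for final testing
    test, train_val = files[:n_test], files[n_test:]
    n_val = int(0.2 * len(train_val))       # 80/20 split of the remainder
    val, train = train_val[:n_val], train_val[n_val:]
    return train, val, test

train, val, test = split_dataset([f"img_{i}.jpg" for i in range(148)])
print(len(train), len(val), len(test))      # 96 23 29
```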
3.2 Model
Convolutional Neural Networks [11] are deep Artificial Neural Networks that are used for image recognition problems. They have the ability to detect and learn features directly from data instead of relying on traditional feature engineering [27]. Currently, they are at the forefront of image processing techniques and are the main reason why deep learning is receiving much wider attention. A CNN has multiple convolutional and subsampling layers, optionally followed by fully connected layers. Pre-trained Convolutional Neural Networks are CNNs that have been trained on large datasets. These have been proven to be better feature extractors [20] than other state-of-the-art methods and, in some cases, have given better accuracy than CNNs built from scratch [8]. This is because CNNs built from scratch usually need a very large dataset for training to achieve reasonably good results [8,13]. Additionally, the training process is expensive and time consuming.

3.3 Transfer Learning
For this research, two pre-trained Convolutional Neural Networks, VGGNet and GoogLeNet Inception v3, are used. This is because, as seen from previous research comparing the performance of various pre-trained CNN architectures [12,17], these two tend to perform the best. VGGNet is a 16-layer CNN developed by the Visual Geometry Group [25], a world-renowned research group. It strictly uses 3 × 3 filters with stride and pad of 1, along with 2 × 2 maxpooling layers with stride 2. The number of filters doubles after each maxpool layer, which reinforces the idea of shrinking spatial dimensions while growing depth. VGGNet has been proven to work well on both image classification and localization tasks and has reinforced the notion that CNNs require a deep network of layers in order for the hierarchical representation of visual data to work. Inception v3 is the third version of GoogLeNet [22,23]. It has 42 layers and 11 inception modules, where each module consists of pooling layers and convolutional filters with the rectified linear unit as the activation function.

3.4 Training Step
The training setup was considered carefully, including the activation functions, preprocessing, weight initialization, regularization, gradient checking, and hyper-parameter optimization. In data preprocessing, tasks were carried out to transform the image data before feeding it as input into the algorithm; these included normalizing the image inputs, data augmentation, etc. To introduce non-linearity into the model, the ReLU activation function was used because it runs much faster than the sigmoid and tanh functions [1]. To prevent the model from overfitting, dropout regularization was used. Additionally, the RMSProp optimizer was used to minimize the loss function.
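The three training ingredients named above (ReLU activation, dropout regularization, and an RMSProp update) can be illustrated with a minimal NumPy sketch. These are illustrative stand-ins, not the Keras calls the paper used; all shapes and hyper-parameter values are arbitrary.

```python
# NumPy stand-ins (illustrative only) for ReLU, dropout, and one RMSProp step.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def dropout(x, rate=0.5, training=True):
    if not training:
        return x                             # dropout is disabled at test time
    mask = rng.random(x.shape) >= rate       # keep each unit with prob. 1 - rate
    return x * mask / (1.0 - rate)           # inverted-dropout scaling

def rmsprop_step(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    cache = decay * cache + (1 - decay) * grad ** 2  # running avg of squared grads
    return w - lr * grad / (np.sqrt(cache) + eps), cache

h = dropout(relu(rng.standard_normal((4, 8))))       # forward: ReLU then dropout
w, cache = rmsprop_step(np.ones(8), np.full(8, 0.5), np.zeros(8))
print(h.min() >= 0.0, w.shape)               # True (8,)
```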
4 Experiments and Results
A total of 148 images was used: 88 for training, 30 for validation and 30 for testing. These were evenly distributed across the two classes; the training data comprised 44 positive images and 44 negative images. These images were all in JPEG format. After collecting the images, a data folder was created with three sub-folders, i.e., training, validation and test. In turn, these sub-folders each had two additional sub-folders for positive and negative images (with and without Tuberculosis bacilli, respectively). As a baseline for the research, classification of the sputum smear microscopic images was done using both Logistic Regression and a convolutional neural network with four convolution layers built from scratch. The results are presented in Table 1.

Table 1. Performance for baseline models.

Model                 Accuracy  Precision  Recall
Logistic regression   0.567     0.667      0.609
Simple CNN            0.729     0.726      0.728
For the implementation of the deep learning experiments, the Keras library was used with a TensorFlow backend. Three experimentation scenarios were used and the results are detailed below.

4.1 Fast Feature Extraction Without Data Augmentation
Under fast feature extraction without data augmentation, the CNNs were used to extract features from the images. This was done by using the representations already learned by the CNNs to extract interesting features from the new data [12]. The results are shown in Table 2.

Table 2. Performance for fast feature extraction without data augmentation.

Model         Accuracy  Precision  Recall
VGGNet        0.767     0.722      0.867
Inception v3  0.867     0.789      1.000
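The fast-feature-extraction scenario just described (run the frozen pre-trained base once, cache the features, and train only a small classifier on top) can be illustrated with a minimal NumPy sketch. Everything here is illustrative: a fixed random projection stands in for the pre-trained VGGNet/Inception v3 base, and the data and labels are synthetic.

```python
# Conceptual sketch of fast feature extraction: the "pre-trained" base is run
# ONCE over the data; only a small logistic-regression head is trained on the
# cached features. A fixed random projection stands in for the real CNN base.
import numpy as np

rng = np.random.default_rng(1)
images = rng.standard_normal((100, 64))        # stand-in for flattened images

W_frozen = rng.standard_normal((64, 32))       # "pre-trained" base: never updated
features = np.maximum(0.0, images @ W_frozen)  # extract features ONCE, then cache

true_w = rng.standard_normal(32)               # toy labels, separable by design
labels = ((features @ true_w) > 0).astype(float)

w, b = np.zeros(32), 0.0                       # small trainable classifier head
for _ in range(500):                           # ordinary logistic-regression training
    z = np.clip(features @ w + b, -30, 30)     # clip for numerical stability
    p = 1.0 / (1.0 + np.exp(-z))
    grad = p - labels
    w -= 0.1 * features.T @ grad / len(labels)
    b -= 0.1 * grad.mean()

accuracy = (((features @ w + b) > 0) == (labels > 0.5)).mean()
print(round(float(accuracy), 2))
```

The key property being illustrated is speed: because `W_frozen` never changes, `features` is computed a single time, and each training iteration touches only the tiny head.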
4.2 Fast Feature Extraction with Data Augmentation
The next set of experiments used fast feature extraction with data augmentation. Data augmentation is a technique used in deep learning to prevent overfitting [12]. It adds more data to the training set by creating new images through rotation or reflection of the image, zooming in and out, shifting, distortion, etc. The results for fast feature extraction with data augmentation are shown in Table 3.

Table 3. Performance for fast feature extraction with data augmentation.
Model         Accuracy  Precision  Recall
VGGNet        0.800     0.795      0.808
Inception v3  0.722     0.754      0.808
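The augmentation idea (each training image spawning rotated and mirrored copies so the training set grows without new labels) can be sketched in a few lines. This is illustrative only; the paper used Keras-style augmentation with zooms, shifts and distortions as well.

```python
# Simple NumPy sketch of data augmentation: each source image yields
# rotated and mirrored variants, enlarging the training set.
import numpy as np

def augment(image):
    """Return the original image plus rotated/flipped variants."""
    variants = [image]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]   # 90/180/270 deg rotations
    variants += [np.fliplr(image), np.flipud(image)]      # horizontal/vertical mirrors
    return variants

img = np.arange(16).reshape(4, 4)
augmented = augment(img)
print(len(augmented))     # 6 variants per source image
```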
4.3 Fine-Tuning
Fine-tuning slightly adjusts the more abstract representations of the pre-trained models and makes them more relevant to the new problem [12]. The results from fine tuning of the model are shown in Table 4. Table 4. Performance for fine-tuning the pre-trained CNNs.
Model         Accuracy  Precision  Recall
VGGNet        0.796     0.795      0.789
Inception v3  0.768     0.748      0.808
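The mechanics of fine-tuning can be made concrete with a toy sketch: early layers of the pre-trained model stay frozen, a few top layers are unfrozen but updated with a much smaller learning rate, and the new head learns at the normal rate, so the pre-trained representations are only slightly adjusted. Layer names, rates and gradients below are purely illustrative.

```python
# Conceptual sketch of fine-tuning via per-layer learning rates:
# frozen layers get lr = 0, unfrozen top layers a tiny lr, the head a normal lr.
import numpy as np

params = {"base_early": np.ones(4), "base_top": np.ones(4), "head": np.zeros(4)}
lr = {"base_early": 0.0,     # frozen: never updated
      "base_top": 1e-4,      # unfrozen, but only nudged gently
      "head": 1e-2}          # the new classifier head learns at the normal rate

grads = {name: np.full(4, 1.0) for name in params}   # pretend gradients
for name in params:
    params[name] = params[name] - lr[name] * grads[name]

# frozen layer unchanged; top layer barely moved; head moved the most
print(params["base_early"][0], params["base_top"][0], params["head"][0])
```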
5 Discussion of Results
As a baseline, classification of the images was done using Logistic Regression and a simple CNN built from scratch. Accuracies of 56.7% and 72.8% were obtained from the two models, respectively. For the research experiments, the pre-trained CNNs were used in three scenarios. First, fast feature extraction on the images was carried out without data augmentation. The accuracies obtained were 76.7% and 86.7% for VGGNet and Inception v3, respectively. Next, data augmentation was introduced. The accuracy then increased to 80% for VGGNet; however, there was a decrease for Inception v3 to 77.2%. Lastly, fine-tuning, which is a complementary technique to feature extraction, gave an accuracy of 79.6% for VGGNet and 76.8% for Inception v3, almost the same performance as in the previous scenario for both CNNs. From these results, it can be seen that in all three scenarios the CNN models generally performed considerably better than Logistic Regression, supporting the theory that CNNs generally perform better than traditional
machine learning approaches [3]. Also, overall, the pre-trained CNNs performed better than the simple CNN built from scratch, supporting the theory that pre-trained CNNs sometimes perform better than CNNs built from scratch, especially in the case of small datasets [8]. As for the pre-trained models, it can be seen that in the first scenario (fast feature extraction without data augmentation) Inception v3 outperforms VGGNet, while the reverse is true for the other two scenarios, i.e., fast feature extraction with data augmentation and fine-tuning. Overall, using Inception v3 for fast feature extraction without data augmentation produces the best results when comparing the two CNNs across all three scenarios. It should also be noted that Inception v3 fast feature extraction without data augmentation has the shortest execution time. When compared to previous work and the baseline models, this performance is very good. In their paper [10], Kant and Srivastava achieved recall and precision scores of 83.78% and 67.5%, compared to the 100% recall and 78.9% precision in this paper. These results could be explained by the fact that Inception v3's architecture is considerably more complex than VGGNet's. While data augmentation is a particularly useful technique for enlarging a dataset and making the model invariant to various transformations, for the dataset used in this research these techniques did not bring any new information that could help improve the learning and generalization abilities of the models used. Additionally, when fine-tuning a pre-trained CNN, there are two things to take into consideration: the size of the training set, and the similarity of the new task to the one the pre-trained model was initially trained on. A dataset may be considered small if it has less than 1000 images per class [2]. In our case, the dataset is not only small, but the dataset of microscopic sputum smear images is also very different from the ImageNet dataset.
This could explain why fine-tuning did not have a noticeable positive effect on the results. For it to work and give the best results, a larger dataset would be needed, as this would help the CNNs gain a better understanding of the underlying representation in the data.
6 Conclusion
The performance shown by the models in this study goes a long way in advancing the research area of automatically detecting diseases using machine learning techniques. This is especially relevant in developing countries, where access to medical care is still a far cry from what it should be and where the number of medical personnel is small compared to the general population. Even where medical personnel are present, this model could significantly reduce the workload and increase the pace at which Tuberculosis is diagnosed, which can be a matter of life and death. Furthermore, this research highlighted the differences between using VGGNet and Inception v3. It was seen that using Inception v3 for fast feature extraction without data augmentation not only uses the least computing resources but also gives the best results. The high recall score is also very desirable, since it shows that people who have TB have a very high chance of being diagnosed correctly, which is essential in the medical field.
As for future work, a similar model could be developed for Fluorescence microscopic images as well as other kinds of stained images. Additionally, a larger and more diverse dataset can be used to train the model. This could contain images from different sources (phones, digital microscopes, cameras) as used in this research, with different zoom levels and resolution etc. It could also contain microscopic images with more diverse backgrounds and at different autofocusing levels. This would be a step closer to fully automating the TB detection process. The concept of ensemble learning could also be explored in this scenario. Furthermore, other pre-trained CNNs such as ResNet and AlexNet could be used on the same dataset to see how they compare with VGGNet and Inception v3 not just in terms of accuracy but also in terms of computing resources. Acknowledgments. This work was supported by the SIDA project 381 under the Makerere-Swedish bilateral research programme 2015–2020.
References

1. Activation functions in Neural Networks. https://www.geeksforgeeks.org/activation-functions-neural-networks/
2. Adgaonkar, A., Atreya, A., Mulgund, A.D., Nath, J.R.: Identification of Tuberculosis Bacilli using Image Processing. Int. J. Comput. Appl. (IJCA) ICONET-2014, 0975–8887 (2014)
3. A Beginner's Guide to Convolutional Neural Networks (CNNs). https://skymind.ai/wiki/convolutional-network
4. Bakator, M., Radosav, D.: Deep learning and medical diagnosis: a review of literature. Multimodal Technol. Interact. 2(3) (2018). https://doi.org/10.3390/mti2030047
5. Bwambale, T.: Tuberculosis prevalence rises by 60% - survey. http://www.newvision.co.ug/newvision/news/1460677/tuberculosis-prevalence-rises-survey
6. Damien: How to split a dataset. https://www.beyondthelines.net/machine-learning/how-to-split-a-dataset
7. Fogel, N.: Tuberculosis: a disease without boundaries. Tuberculosis 95(5), 527–531 (2015). https://doi.org/10.1016/j.tube.2015.05.017
8. Gupta, D.S.: Transfer learning & the art of using pre-trained models in deep learning. https://www.analyticsvidhya.com/blog/2017/06/transfer-learning-the-art-of-fine-tuning-a-pre-trained-model
9. Health Access Corps: Healthcare in Uganda. Let's do the numbers. http://healthaccesscorps.org/blog/2014/12/3/healthcare-in-uganda-lets-do-the-numbers
10. Kant, S., Srivastava, M.M.: Towards automated tuberculosis detection using deep learning. In: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1250–1253. IEEE, Bangalore, India (2018). https://doi.org/10.1109/SSCI.2018.8628800
11. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Columbus, OH, USA (2014). https://doi.org/10.1109/CVPR.2014.223
12. Lopes, U.K., Valiati, J.F.: Pre-trained convolutional neural networks as feature extractors for tuberculosis detection. Comput. Biol. Med. 89, 135–143 (2017). https://doi.org/10.1016/j.compbiomed.2017.08.001
13. Lopez, Y.P., Filho, C.F.F.C., Aguilera, L.M.R., Costa, M.G.F.: Automatic classification of light field smear microscopy patches using Convolutional Neural Networks for identifying Mycobacterium Tuberculosis. In: CHILECON. IEEE, Pucon, Chile (2017). https://doi.org/10.1109/CHILECON.2017.8229512
14. Marcelino, P.: Transfer learning from pre-trained models. http://towardsdatascience.com/transfer-learning-from-pre-trained-models-f2393f124751
15. Molicotti, P., Bua, A., Zanetti, S.: Cost-effectiveness in the diagnosis of tuberculosis: choices in developing countries. J. Infect. Dev. Countries 8(01), 024–038 (2014). https://doi.org/10.3855/jidc.3295
16. Mwesigwa, A.: Uganda crippled by medical brain drain. http://www.theguardian.com/global-development/2015/feb/10/Uganda-crippled-medical-brain-drain-doctors
17. Özgenel, Ç.F., Sorguç, A.G.: Performance comparison of pretrained convolutional neural networks on crack detection in buildings. In: 35th International Symposium on Automation and Robotics in Construction (ISARC) (2018). https://doi.org/10.22260/ISARC2018/0094
18. Panicker, R.O., Kalmady, K.S., Rajan, J., Sabu, M.K.: Automatic detection of tuberculosis bacilli from microscopic sputum images using deep learning methods. Biocybern. Biomed. Eng. 38, 691–699 (2018). https://doi.org/10.1016/j.bbe.2018.05
19. Quinn, J.A., Nakasi, R., Mugagga, P.K.B., Byanyima, P., Lubega, W., Andama, A.: Deep convolutional neural networks for microscopy-based point of care diagnostics. In: Machine Learning for Healthcare Conference, pp. 271–281. Los Angeles, California (2016)
20. Razavian, A., Azizpour, H., Sullivan, J., Carlsson, S.: CNN features off-the-shelf: an astounding baseline for recognition. In: CVPR 2014 Deep Vision Workshop, pp. 512–519. IEEE, Columbus, OH, USA (2014). https://doi.org/10.1109/CVPRW.2014.131
21. Surgitha, G.E., Murugesan, G.: Detection of tuberculosis bacilli from microscopic sputum smear images. In: ICBSII. IEEE Press, Chennai, India (2017). https://doi.org/10.1109/ICBSII.2017.8082271
22. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Press, Boston (2015). https://doi.org/10.1109/CVPR.2015.7298594
23. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Press, Las Vegas (2016). https://doi.org/10.1109/CVPR.2016.308
24. Transfer Learning Using Pretrained ConvNets. http://www.tensorflow.org/tutorials/images/transferlearning
25. Visual Geometry Group. http://www.robots.ox.ac.uk/vgg/research/verydeep/
26. World Health Organization: Global Tuberculosis Report 2017. World Health Organization, Geneva, Switzerland. https://www.who.int/tb/publications/global report/gtbr2017 main text.pdf
27. Zheng, L., Yang, Y., Tian, Q.: SIFT meets CNN: a decade survey of instance retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 40(5), 1224–1244 (2017)
28. ZNSM-iDB: Ziehl Neelsen Sputum Smear Microscopy Image Database. https://14.139.240.55/znsm/
Comparative Performance Analysis of Neural Network Base Training Algorithm and Neuro-Fuzzy System with SOM for the Purpose of Prediction of the Features of Superconductors Subrato Bharati1(B) , Mohammad Atikur Rahman1 , Prajoy Podder2 , Md. Robiul Alam Robel3 , and Niketa Gandhi4 1 Department of EEE, Ranada Prasad Shaha University, Narayanganj, Bangladesh
[email protected], [email protected] 2 Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh [email protected] 3 Department of CSE, Cumilla University, Cumilla, Bangladesh [email protected] 4 University of Mumbai, Mumbai, Maharashtra, India [email protected]
Abstract. In this paper, the Neural Network based training algorithm has been discussed briefly. The correlation coefficients (training, validation, testing) obtained using the Levenberg-Marquardt algorithm for superconductivity have been observed graphically. The same operation has been performed applying the Bayesian Regularization algorithm and the Scaled Conjugate Gradient algorithm. Mean square error and regression have been calculated for training, validation and testing using the Bayesian Regularization, Scaled Conjugate Gradient and Levenberg-Marquardt algorithms on the superconductor dataset. The target variable is the critical temperature of the superconductor. The regression values of the Scaled Conjugate Gradient, Bayesian Regularization and Levenberg-Marquardt algorithms for the superconductor dataset are 0.809214, 0 and 0.854644, respectively, which shows that the Levenberg-Marquardt algorithm provides the comparatively largest regression (R) value among them in the validation state. Error histograms with 20 bins have been presented visually from simulations of the Bayesian Regularization, Levenberg-Marquardt, and Scaled Conjugate Gradient algorithms. A Neuro-Fuzzy system structure and Self-Organizing Maps (SOM) have also been implemented in this paper, which demonstrates the strength of the proposed work. The main benefit of SOM is that it is a useful multivariate visualization technique that permits multidimensional data to be exposed as a 2-dimensional map. Keywords: Artificial Neural Network (ANN) · Levenberg-Marquardt · Bayesian Regularization · Scaled Conjugate Gradient · Self-Organizing Maps (SOM)
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 69–79, 2021. https://doi.org/10.1007/978-3-030-49342-4_7
1 Introduction

Superconductors are materials that conduct current with zero resistance. There are many applications of superconducting materials; one of them is the Magnetic Resonance Imaging technique. Superconducting DC power cables can be used to improve the efficiency of electricity transmission because these cables dissipate very little power. Superconductivity has also been fruitfully applied in a considerable number of large-scale applications in the field of physics: for example, superconducting magnets, superconducting accelerator cavities and sensors are used in accelerators such as the Large Hadron Collider (LHC) at CERN (the European Nuclear Research Center). Superconductors are also used in wireless receiver systems, superconducting magnetic energy storage, etc. According to researchers, there are two important issues behind the wide use of superconductors: (i) a superconductor has zero resistance, but it conducts current in this way only at or below its Tc (critical temperature); (ii) scientific theories and postulates cannot predict the superconducting critical temperature well [1, 2]. The Neural Network based training algorithm has been discussed in this paper for analyzing the superconductor dataset in order to predict its features or attributes. An Artificial Neural Network (ANN) is a standard statistical approach that can discover the interactions between variables with very high precision or accuracy [3, 4]. An ANN contains three layers, namely input, hidden, and output layers, and is therefore referred to as a three-layer network. The input layer contains the independent variables, which are connected to the hidden layer for processing. In the output layer, the classification or prediction procedure is completed and the results are produced with a small estimation error [5–7]. In ANNs, several regularization methods are used with the backpropagation training algorithm to achieve a small error.
This causes the network response to be smoother and less likely to overfit the training patterns [8, 9]. Among regularization methods, Bayesian regularization (BR) and Levenberg–Marquardt (LM) are capable of obtaining lower MSE than some other algorithms [10–12]. In this paper, a superconductor dataset prepared by Japan's National Institute for Materials Science (NIMS) has been used; 21,263 superconductors remain after data preprocessing. Features (or predictors) are derived based on the superconductors' elemental properties. These features may help in predicting the superconducting critical temperature. The main contributions of the paper can be summarized as follows: (1) A neural network based training algorithm has been applied to the superconductor dataset. (2) The Bayesian Regularization, Levenberg-Marquardt, and Scaled Conjugate Gradient algorithms have been used, and the mean square error and regression coefficient have been calculated for the testing, training and validation cases. Iteration stops after reaching the maximum number of epochs. The Levenberg–Marquardt (LM) algorithm is a combination of two iterative methods, Gauss-Newton and gradient descent, applied to solve nonlinear least-squares problems. The Bayesian regularization algorithm is more robust than standard back-propagation neural networks; it can reduce or eliminate the need for prolonged cross-validation.
(3) The dataset has been represented using a Neuro-Fuzzy network with a hybrid structure that has 1396 nodes. (4) SOM (Self-Organizing Maps) has been introduced for the purpose of predicting the features of superconductors.
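The SOM idea mentioned in contribution (4) — assign each input to its best-matching unit (BMU) on a 2-D grid and pull the BMU and its grid neighbours toward the input — can be sketched with a minimal NumPy implementation. This is illustrative only (grid size, learning rate, neighbourhood width and the random data are arbitrary, not the paper's setup).

```python
# Minimal NumPy Self-Organizing Map sketch: a 5x5 grid of weight vectors is
# trained so that nearby grid cells respond to similar inputs, which is how a
# SOM exposes multidimensional data as a 2-D map.
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 5, 5, 3                   # 5x5 map of 3-D weight vectors
weights = rng.random((grid_h, grid_w, dim))
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

def train_step(x, lr=0.5, sigma=1.0):
    dists = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)   # best-matching unit
    grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
    h = np.exp(-grid_dist**2 / (2 * sigma**2))              # neighbourhood kernel
    weights[...] += lr * h[..., None] * (x - weights)       # pull toward input

data = rng.random((200, dim))
for x in data:
    train_step(x)
print(weights.shape)   # (5, 5, 3)
```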
2 Literature Review

Valentine Stanev extracted a list of approximately 16,400 compounds, of which 4000 have no reported critical temperature, from the Superconductors database [13]. In order to predict the critical temperature for all compounds, Yao Liu et al. developed a prediction model based on the support vector regression (SVR) machine learning method using structural and electronic parameters [14]. Sajjad Ali Haider [15] recommended an Artificial Neural Network based process for predicting critical currents on a superconducting Nb film at various fields and temperatures. They presented a descriptive analysis of feedforward, cascaded and layer-recurrent networks using six different learning procedures, such as LM, Bayesian regularization, conjugate gradient, BFGS, and the resilient backpropagation algorithm. Previously, no other researcher had worked on superconductors using an ANN with training, validation and testing phases; there is limited work on superconductors in general. Deepika Garg et al. [16] proposed a Bayesian regularized neural network decision tree ensemble model. Based on their simulations, they claimed that the proposed method achieves a noteworthy reduction in time complexity while maintaining high accuracy. Alomari et al. [17] proposed a forecasting model to predict next-day solar photovoltaic power with the help of the Bayesian Regularization (BR) and Levenberg-Marquardt (LM) algorithms. The authors showed good results with outstanding real-time performance for next-day forecasting, with an RMSE of 0.0706 using the BR algorithm with 28 hidden layers. The Levenberg-Marquardt algorithm with 23 hidden layers provided an RMSE of 0.0753 for the same inputs in their experimental results.
Alomari et al. concluded that the Bayesian regularization algorithm gives comparatively better performance in their proposed real-time prediction scheme for PV power generation.
3 Artificial Neural Network Artificial neural networks (ANN) are computing systems loosely inspired by the biological neural networks that constitute animal brains. In ANN implementations, the signal at a connection between artificial neurons is a real number, and the output of each artificial neuron is computed by some non-linear function of the sum of its inputs. The connections between artificial neurons are known as 'edges'. Typically, artificial neurons are organized into layers [18, 25].
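As a minimal illustration of this layered structure (a generic sketch, not the specific network used in this paper), a feedforward pass can be written as:

```python
import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    """One ANN layer: each neuron emits a non-linear function of the
    weighted sum of its inputs; the weights W play the role of the 'edges'."""
    return activation(W @ x + b)

# A tiny 3-input -> 4-hidden -> 1-output network with random weights
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0])
hidden = dense_layer(x, W1, b1)
output = dense_layer(hidden, W2, b2)
```

Training such a network then amounts to adjusting W1, b1, W2, b2 to minimize an error function, which is what the algorithms in the following sections do.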
4 Neuro-Fuzzy Fuzzy logic facilitates approximate human reasoning capabilities in knowledge-based systems by providing an inference morphology. Theoretically, fuzzy logic
72
S. Bharati et al.
delivers a mathematical strength to capture the uncertainties related to human cognitive processes and overcomes the limitations of conventional approaches. ANNs can work like parallel distributed computing networks. Hybrid systems combine fuzzy logic, neural networks, genetic algorithms, and expert systems; the resulting hybrid system is denoted a fuzzy neural, neural fuzzy, or neuro-fuzzy network. The training dataset is the preliminary element for implementing a Neuro-Fuzzy design. Each row of the training dataset is a preferred input/output pair of the target system to be modeled. Note that the number of rows of training data is equal to the number of training data pairs [19].
5 Levenberg-Marquardt Algorithm Like the quasi-Newton approaches, the Levenberg-Marquardt algorithm [20] was designed to approach second-order training speed without having to compute the Hessian matrix. When the performance function has the form of a sum of squares, as is typical in training feedforward networks, the Hessian matrix can be approximated as H = J^T J and the gradient as g = J^T e, where J is the Jacobian matrix of the network errors e with respect to the weights; the weight update is then Δw = -(J^T J + μI)^(-1) J^T e, with damping parameter μ [21].
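A single Levenberg-Marquardt step on a toy least-squares problem can be sketched as follows; the finite-difference Jacobian, the fixed damping parameter mu, and the linear model are illustrative simplifications, not the paper's MATLAB setup:

```python
import numpy as np

def lm_step(residual_fn, w, mu=1e-2, eps=1e-6):
    """One Levenberg-Marquardt step: w_new = w - (J^T J + mu*I)^-1 J^T e,
    where J is the Jacobian of the residuals e with respect to w."""
    e = residual_fn(w)
    J = np.zeros((e.size, w.size))
    for i in range(w.size):          # finite-difference Jacobian
        dw = np.zeros_like(w)
        dw[i] = eps
        J[:, i] = (residual_fn(w + dw) - e) / eps
    H = J.T @ J + mu * np.eye(w.size)   # Gauss-Newton Hessian plus damping
    return w - np.linalg.solve(H, J.T @ e)

# Toy problem: fit y = a*x + b to noise-free data generated with a=2, b=1
x = np.linspace(0, 1, 10)
y = 2 * x + 1
residuals = lambda w: (w[0] * x + w[1]) - y

w = np.array([0.0, 0.0])
for _ in range(20):
    w = lm_step(residuals, w)
```

Because the damped Gauss-Newton matrix J^T J + mu*I is always invertible, the step is well defined even when J^T J is singular; in full LM, mu is additionally adapted between iterations.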
6 Bayesian Regularization Bayesian regularization is more robust than ordinary back-propagation networks and can eliminate or reduce the need for lengthy cross-validation. It is a procedure that converts a nonlinear regression into a "well-posed" statistical problem in the manner of a ridge regression [22]. In the Bayesian framework, the network weights are considered random variables. The density function for the weights can be updated according to Bayes' rule, as described in Eq. 1:

P(w|X, α, β, M) = P(X|w, β, M) P(w|α, M) / P(X|α, β, M)    (1)

where X is the data, w the weight vector, M the particular network model used, and α and β the regularization hyperparameters.
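In practice, Bayesian regularization minimizes a weighted objective F = beta*E_D + alpha*E_W (sum-of-squares data error plus a penalty on the weights), with alpha and beta re-estimated from the evidence. The hedged sketch below shows only gradient descent on this objective for a linear model; the hyperparameter re-estimation step is omitted, and alpha, beta, and the data are illustrative:

```python
import numpy as np

def regularized_loss(w, X, y, alpha, beta):
    """F = beta * E_D + alpha * E_W: sum-of-squares data error E_D
    plus sum-of-squares weight penalty E_W."""
    e = X @ w - y
    return beta * np.sum(e ** 2) + alpha * np.sum(w ** 2)

def gradient_step(w, X, y, alpha, beta, lr=2e-3):
    """One gradient-descent step on F (alpha/beta updates omitted)."""
    grad = 2 * beta * X.T @ (X @ w - y) + 2 * alpha * w
    return w - lr * grad

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, -2.0, 0.5])          # noiseless toy targets

w = np.zeros(3)
for _ in range(2000):
    w = gradient_step(w, X, y, alpha=0.1, beta=1.0)
```

The weight penalty shrinks the solution slightly toward zero, which is exactly the "ridge" behavior that makes the problem well posed.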
7 Scaled Conjugate Gradient The Scaled Conjugate Gradient (SCG) is a supervised learning algorithm with a superlinear convergence rate. SCG uses second-order information from the neural network, and its performance is benchmarked against that of the ordinary backpropagation procedure [23]. The idea is to approximate the term s_k = E''(w_k) p_k of the conjugate gradient method with the non-symmetric finite-difference estimate

s_k = E''(w_k) p_k ≈ (E'(w_k + σ_k p_k) - E'(w_k)) / σ_k,  where 0 < σ_k ≪ 1    (2)
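The key trick in Eq. 2, approximating the Hessian-vector product by a finite difference of gradients, can be illustrated on a toy quadratic objective (this is only the approximation step, not Møller's full SCG loop):

```python
import numpy as np

def hessian_vector_product(grad_fn, w, p, sigma=1e-5):
    """Eq. 2: s_k = E''(w) p ~ (E'(w + sigma*p) - E'(w)) / sigma,
    a Hessian-vector product obtained without ever forming the Hessian."""
    return (grad_fn(w + sigma * p) - grad_fn(w)) / sigma

# Toy objective E(w) = 0.5 w^T A w, whose gradient is A w and Hessian is A
A = np.array([[2.0, 0.5], [0.5, 1.0]])
grad = lambda w: A @ w

w = np.array([1.0, -1.0])
p = np.array([0.3, 0.7])
s = hessian_vector_product(grad, w, p)   # approximates A @ p
```

This costs only one extra gradient evaluation per step, which is why SCG scales to networks where the full Hessian would be prohibitively large.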
Comparative Performance Analysis of Neural Network Base Training Algorithm
73
8 SOM Self-organizing maps (SOM) are introduced in order to organize input vectors and to demonstrate how they can be effectively clustered in the input space. Two different approaches can be applied for clustering with a two-dimensional SOM: (i) the input space can be transformed into the map space and clusters established there; (ii) clusters can be formed in the weight space by aggregating the weight vectors. Several methods are available for computing the numerical value of each processing element in the map space during clustering, as well as for approximating the probability density function (PDF) of the input vectors [24].
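A minimal SOM training loop (best-matching-unit search plus a neighborhood-weighted update of the weights) might look like the following sketch; the grid size, learning rate, neighborhood radius, and data are illustrative choices, not those of the MATLAB tool used in this paper:

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=20, lr=0.5, radius=1.5, seed=0):
    """Self-organizing map: for each input, find the best-matching unit
    (BMU) and pull it and its grid neighbors toward that input."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows * cols, data.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for _ in range(epochs):
        for x in data:
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2 * radius ** 2))       # neighborhood function
            weights += lr * h[:, None] * (x - weights)
    return weights

# Two well-separated clusters in 2-D
data = np.vstack([np.full((10, 2), 0.1), np.full((10, 2), 0.9)])
W = train_som(data)
```

Counting how many inputs select each unit as their BMU after training produces exactly the kind of "hits" plot discussed for Fig. 6 below.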
9 Simulation Results The superconductor dataset is used in this paper. It contains 21263 instances with 81 attributes used to predict and analyze superconductivity. These attributes include the number of elements, mean atomic mass, entropy of atomic mass, range of atomic mass, mean electron affinity, critical temperature, etc. Figure 1 shows the hybrid Neuro-Fuzzy system architecture used for simulation, with the following properties: number of nodes: 1396; number of linear parameters: 656; number of nonlinear parameters: 1296; total number of parameters: 1952; number of training data pairs: 21263; number of checking data pairs: 0; number of fuzzy rules: 8. The input data is an 81x3000 matrix, representing static data: 81 samples of 3000 elements. The target data is a 1x3000 matrix, representing static data: 1 sample of 3000 elements.
Fig. 1. Neuro-Fuzzy hybrid structure of superconductor
Table 1 reports the MSE and correlation coefficient (R) for training, validation, and testing using the Levenberg-Marquardt algorithm, where validation and testing each use 15% of the samples; the training, validation, and testing samples are therefore 2100, 450, and 450 respectively. The Bayesian Regularization algorithm is covered in Table 2, where validation and testing again each use 15% of the samples; the training R is 0.960023 and the MSE is 0.23. The Scaled Conjugate Gradient algorithm is covered in Table 3, where
74
S. Bharati et al.
validation and testing each use 15% of the samples; the validation R is 0.809214 with an MSE of 0.51. For the testing network, the MSE is 1 and the correlation coefficient R is 0.783944.

Table 1. Calculation of MSE and R for training, validation, and testing using the Levenberg-Marquardt algorithm on the superconductor dataset

            Samples   MSE    R
Training    2100      0      0.931108
Validation  450       1      0.854644
Testing     450       0.96   0.849070
Table 2. Calculation of MSE and R for training, validation, and testing using the Bayesian Regularization algorithm on the superconductor dataset

            Samples   MSE    R
Training    2100      0.23   0.960023
Validation  450       0      0
Testing     450       1      0.835412
Table 3. Calculation of MSE and R for training, validation, and testing using the Scaled Conjugate Gradient algorithm on the superconductor dataset

            Samples   MSE    R
Training    2100      0      0.842486
Validation  450       0.51   0.809214
Testing     450       1      0.783944
Figure 2 (a) and (b) visualize error histograms with 20 bins using the Levenberg-Marquardt and Bayesian Regularization algorithms respectively for the superconductivity dataset, where the X-axis indicates the error (targets - outputs) and the Y-axis indicates instances. Blue, green, and red bars denote training, validation, and test respectively, and the zero-error points are 0.000306 and 5.752 respectively. Figure 2 (c) illustrates the error histogram with 20 bins using the Scaled Conjugate Gradient algorithm; zero error occurs at an X-axis value of -2.296, where the error is the numerical difference between targets and outputs. Figure 3 (a) visualizes the Levenberg-Marquardt training state for superconductivity; the X-axis spans 23 epochs and the Y-axis shows the gradient and validation failures. At epoch 23 the gradient is 128.0178 and the number of validation checks is 6. Figure 3 (b) visualizes the Bayesian Regularization training state
Fig. 2. Visualization of error histogram using (a) Levenberg-Marquardt algorithm (b) Bayesian Regularization algorithm (c) Scaled Conjugate Gradient algorithm for superconductivity
for superconductivity; the X-axis spans 101 epochs and the Y-axis shows the gradient and validation failures. At epoch 101 the gradient is 436.833 and the number of validation checks is 6. Figure 3 (c) visualizes the Scaled Conjugate Gradient training state for superconductivity; the X-axis spans 143 epochs and the Y-axis shows the gradient and validation failures. From Fig. 3 (c), the value of the gradient is 114.0222 at epoch 143.
Fig. 3. (a) Levenberg-Marquardt algorithm (b) Bayesian regularization algorithm (c) Scaled conjugate gradient algorithm
In Fig. 4(a), the plot indicates the iteration at which the validation performance reached a minimum. Figure 4(a) does not point to any major problems with the training; the validation and test curves are very similar. Figure 4(a) shows the MSE curve over 23 epochs, where the best validation performance is 225.0304; after rescaling, the MSE value obtained is 1 (Table 1 and Fig. 4(a)) at epoch 17. In Fig. 4(b), the plot likewise indicates the iteration at which the validation performance reached a minimum, and again no major training problems are apparent; the validation and test curves are very similar. Figure 4(b) shows the MSE curve over 101 epochs, where the best validation performance is 71.8817; after rescaling, the MSE value obtained is 0 (Table 2 and Fig. 4(b)) at epoch 101. In Fig. 4(c), the plot indicates the iteration at which the validation performance reached a minimum, with no major training problems and very similar validation and test curves. Figure 4(c) shows
Fig. 4. Superconductivity performance curve using (a) Levenberg-Marquardt algorithm (b) Bayesian regularization algorithm (c) Scaled conjugate gradient algorithm
the MSE curve over 143 epochs, where the best validation performance is 310.6094; after rescaling, the MSE value obtained is 0.51 (Table 3 and Fig. 4(c)) at epoch 137. The first three plots (Fig. 5 (a)) show the training, validation, and testing data. The dashed line in each plot denotes the perfect result (outputs = targets). The solid line represents the best-fit linear regression between outputs and targets. The regression (correlation coefficient) value R indicates the correlation between outputs and targets: R = 1 indicates an exact linear relationship, while R close to zero indicates no linear relationship between outputs and targets. Table 1 and Fig. 5(a) report the values of R for training, validation, and test using the Levenberg-Marquardt algorithm, where the overall R is 0.92566.
Fig. 5. (a) Visualization of correlation of coefficient (1. Training, 2. Validation, 3. Test, 4. All) using the Levenberg-Marquardt algorithm for superconductivity (b) Visualization of correlation of coefficient (1. Training, 2. Validation, 3. Test) using Bayesian Regularization algorithm for superconductivity (c) Visualization of correlation of coefficient (1. Training, 2. Validation, 3. Test, 4. All) using scaled conjugate gradient algorithm for superconductivity
The first three plots (Fig. 5(b)) show the training, validation, and testing data. As before, the dashed line in each plot denotes the perfect result (outputs = targets), the solid line represents the best-fit linear regression between outputs and targets, and R indicates the correlation between targets and outputs: R = 1 indicates an exact linear relationship, while R close to zero indicates no linear relationship. Table 2 and Fig. 5(b) illustrate the regression values (R) for training, validation, and test using the Bayesian
Regularization algorithm, where the overall R is 0.93951 and the validation R is 0. Table 3 and Fig. 5(c) report the values of R for training, validation, and test using the Scaled Conjugate Gradient algorithm, where the overall R is 0.81912 and the validation R is 0.80921. Figure 6 (a, b) visualizes the SOM topology and SOM neighbor connections respectively for the prediction of superconductivity. Figure 6 (c) displays a weight plane for each element of the input vector (one, in this case), showing the weights that connect each input to the respective neurons; darker colors denote larger weights. The default topology of the SOM is hexagonal. Figure 6 (d) illustrates the neuron locations in the topology and indicates how many of the training data are associated with each neuron. The topology is a 10-by-10 grid, so there are 100 neurons in the SOM hits plot. The maximum number of hits associated with any neuron is 19, meaning that neuron's cluster contains 19 input vectors. Figure 6 (e) shows the training vectors as green marks. As a first step, the SOM spreads the initial weights across the input space.
Fig. 6. Visualization of (a) SOM topology for superconductor dataset using neural network (b) SOM Neighbor Connections for superconductor dataset using neural network (c) SOM neighbor weight distances for superconductor dataset using neural network (d) SOM hits plot for superconductor dataset using neural network (e) SOM weight positions for superconductor dataset using neural network
10 Conclusion The superconductivity of superconductors has been analyzed according to its 81 attributes and predicted using neural networks trained with the Scaled Conjugate Gradient algorithm,
78
S. Bharati et al.
Levenberg-Marquardt, and Bayesian Regularization algorithms. The performance results of all algorithms are reported in terms of MSE and correlation coefficient, and the performance curve for best validation performance is visualized, along with the training state, error histogram, and correlation coefficient; Self-Organizing Maps (SOM) are also visualized and investigated for superconductivity. Finally, a conclusion is drawn about the best algorithm for the prediction of superconductivity: Levenberg-Marquardt provides the best prediction performance for the superconductor dataset, with a validation R of 0.854644, while Scaled Conjugate Gradient provides the second-highest validation R of 0.809214. The mean squared error has been calculated and the superconductivity of superconductors has been predicted in this paper.
References 1. Zafeiris, D., Rutella, S., Ball, G.R.: An artificial neural network integrated pipeline for biomarker discovery using Alzheimer’s disease as a case study. Comput. Struct. Biotechnol. J. 16, 77–87 (2018) 2. Pasini, A.: Artificial neural networks for small dataset analysis. J. Thorac. Dis. 7, 953–960 (2015) 3. Franceschini, S., Gandola, E., Martinoli, M., Tancioni, L., Scardi, M.: Cascaded neural networks improving fish species prediction accuracy: the role of the biotic information. Sci. Rep. ISSN 2045-2322, Springer (2018) 4. Alaniz, A.Y., Sanchez, E.N., Loukianov, A.G.: Discrete-time adaptive back stepping nonlinear control via high-order neural networks. IEEE Trans. Neural Netw. 18, 1185–1195 (2007) 5. Mondal, M.R.H., Bharati, S., Podder, P., Podder, P.: Data analytics for novel coronavirus disease. Inform. Med. Unlocked 20, 100374 (2020). https://doi.org/10.1016/j.imu.2020. 100374 6. Khomfoi, S., Tolbert, L.M.: Fault diagnostic system for a multilevel inverter using a neural network. IEEE Trans. Power Electron. 22, 1062–1069 (2007) 7. Okut, H., Gianola, D., Rosa, G.J.M., Weigel, K.A.: Prediction of body mass index in mice using dense molecular markers and a regularized neural network. Genet. Res. Camb. 93, 189–201 (2011) 8. Vigdor, B., Lerner, B.: Accurate and fast off and online fuzzy ARTMAP-based image classification with application to genetic abnormality diagnosis. IEEE Trans. Neural Netw. 17, 1288–1300 (2006) 9. Hassenzahl, W.V.: Applications of superconductivity to electric power systems. IEEE Power Eng. Rev. 20(5), 4–7 (2000) 10. Kayri, M.: Predictive abilities of Bayesian regularization and Levenberg–Marquardt algorithms in artificial neural networks: a comparative empirical study on social data. Math. Comput. Appl. 21, 20 (2016) 11. Hagan, M.T., Menhaj, M.B.: Training feedforward networks with the Marquardt algorithm. IEEE Trans. Neural Netw. 5, 989–993 (1994) 12. 
Saini, L.M.: Peak load forecasting using Bayesian regularization, Resilient and adaptive back propagation learning based artificial neural networks. Electr. Power Syst. Res. 78, 1302–1310 (2008) 13. Stanev, V., Oses, C., Kusne, A.G., Rodriguez, E., Paglione, J., Curtarolo, S., Takeuchi, I.: Machine learning modeling of superconducting critical temperature. npj Comput. Mater. 4, 1–14 (2018). https://doi.org/10.1038/s41524-018-0085-8 14. Liu, Y., Zhang, H., Xu, Y., Li, S., Dai, D., Li, C., Ding, G., Shen, W., Qian, Q.: Prediction of superconducting transition temperature using a machine-learning method. Mater. Technol. 52(5), 639–643 (2018). https://doi.org/10.17222/mit.2018.043
15. Haider, S.A., Naqvi, S.R., Akram, T., Kamran, M.: Prediction of critical currents for a diluted square lattice using artificial neural networks. Appl. Sci. 7(3), 238 (2017). https://doi.org/10.3390/app7030238 16. Garg, D., Mishra, A.: Bayesian regularized neural network decision tree ensemble model for genomic data classification. Appl. Artif. Intell. 32(5), 463–476 (2018) 17. Alomari, M.H., Younis, O., Hayajneh, S.M.A.: A predictive model for solar photovoltaic power using the Levenberg-Marquardt and Bayesian regularization algorithms and real-time weather data. Int. J. Adv. Comput. Sci. Appl. 9(1), 347–353 (2018) 18. Bharati, S., Podder, P., Mondal, M.R.H.: Hybrid deep learning for detecting lung diseases from X-ray images. Inform. Med. Unlocked 20, 100391 (2020). https://doi.org/10.1016/j.imu.2020.100391 19. Figueiredo, M., Gomide, F.: Design of fuzzy systems using neurofuzzy networks. IEEE Trans. Neural Netw. 10(4), 815–827 (1999). https://doi.org/10.1109/72.774229 20. Chen, L.V.: Levenberg–Marquardt backpropagation training of multilayer neural networks for state estimation of a safety-critical cyber-physical system. IEEE Trans. Ind. Inform. 14(8), 3436–3446 (2018) 21. Smith, J.S., Wu, B., Wilamowski, B.M.: Neural network training with Levenberg-Marquardt and adaptable weight compression. IEEE Trans. Neural Netw. Learn. Syst. 30, 1–8 (2018). https://doi.org/10.1109/tii.2017.2777460 22. Al Khafaf, N., El-Hag, A.: Bayesian regularization of neural network to predict leakage current in a salt fog environment. IEEE Trans. Dielectr. Electr. Insul. 25(2), 686–693 (2018) 23. Chel, H., Majumder, A., Nandi, D.: Scaled conjugate gradient algorithm in neural network based approach for handwritten text recognition. In: International Conference on Computational Science, Engineering and Information Technology, vol. 204, pp. 196–210 (2011) 24. Kohonen, T.: Self-Organizing Maps. Springer, Heidelberg (2012) 25.
Bharati, S., Rahman, M.A., Mondal, R., Podder, P., Alvi, A.A., Mahmood, A.: Prediction of energy consumed by home appliances with the visualization of plot analysis applying different classification algorithm. In: Satapathy, S.C., Bhateja, V., Nguyen, B.L., Nguyen, N.G., Le, D.-N. (eds.) Frontiers in Intelligent Computing: Theory and Applications. AISC, vol. 1014, pp. 246–257. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-9920-6_25
Automatic Detection of Parkinson’s Disease from Speech Using Acoustic, Prosodic and Phonetic Features Rania Khaskhoussy(B) and Yassine Ben Ayed MIRACL: Multimedia InfoRmation System and Advanced Computing Laboratory, Sfax University, Sfax, Tunisia [email protected], [email protected]
Abstract. Parkinson’s disease (PD) is a neurodegenerative disease ranked second after Alzheimer’s disease. It affects the central nervous system and causes a progressive and irreversible loss of neurons in the dopaminergic system, that insidiously leads to cognitive, emotional and language disorders. But until day there is no specific medication for this disease, the drug treatments that exist are purely symptomatic, that’s what encourages researchers to consider non-drug techniques. Among these techniques, speech processing becomes a relevant and innovative field of investigation and the use of machine-learning algorithms that provide promising results in the distinction between PD and healthy people. Otherwise many other factors such as feature extraction, number of feature, type of features and the classifiers used they all influence on the prediction accuracy evaluation. The aim of this study is to show the importance of this last factor, a model is suggested which include feature extraction from 3 types of features (acoustic, prosodic and phonetic) and classification is achieved using several machine learning classifiers and the results show that the proposed model can be highly recommended for classifying PD in healthy individuals with an accuracy of 99.50% obtained by Support Vector Machine (SVM). Keywords: Parkinson’s disease · Speech processing learning acoustic prosodic and phonetic
1
· Machine
Introduction
Parkinson’s disease (PD) is described in 1817 by the doctor James Parkinson; the PD profoundly affects the lives of patients and their families, It’s afflicts more than six million people in the world [1]. This disease affects the central nervous system causing a loss of neurons from the dopaminergic system which is found in the substantia nigra of the human brain, it’s the system responsible for the manufacture, release of dopamine and controlling the execution of plans engines [2]. This cognitive dysfunction affects several motor activities among c The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 80–89, 2021. https://doi.org/10.1007/978-3-030-49342-4_8
Automatic Detection of Parkinson’s Disease from Speech
81
these activities, speech production [3]; for example, word-finding difficulty may lead to slower or more irregular speech patterns [4]. [5] proved that PD usually causes a weakening of the voice in about 90% of patients and affects people whose age is over 50 years. This pathological evolution makes specialists more interested in techniques or methods external to their domain to obtain more information for following this disease. For this purpose, voice measurements play an important role in the early detection of PD, and researchers have developed several methods aimed at distinguishing healthy people from people with PD by analyzing voice recordings using different types of voice measurements: acoustic, prosodic, and phonetic. In the literature, there is a large body of knowledge on the recognition of Parkinson's disease from speech using different methods, each using its own number and types of characteristics. [6] aimed to discriminate healthy people from people with PD using the first twenty MFCC coefficients extracted from 40 subjects, which led to an overall correct classification of 77.50% using a support vector machine (SVM). [7] used the same database and extracted multiple types of features, then selected ten highly uncorrelated measures; an exhaustive search of all possible combinations of these measures found that four of the combinations provide a classification rate reaching 91.4%. These measurements include jitter, shimmer and their variations, Harmonics-to-Noise Ratio (HNR), Noise-to-Harmonics Ratio (NHR), Recurrence Period Density Entropy (RPDE), Detrended Fluctuation Analysis (DFA), Correlation Dimension (D2), and Pitch Period Entropy (PPE).
In the same way, to distinguish healthy subjects from PD patients, [8] used a hybridization of SVM and k-nearest neighbor classifiers applied to characteristics of different types, classified into groups defined by [8]; to estimate the performance of the model, a leave-one-subject-out (LOSO) cross-validation technique was employed. Most studies use combinations of voice quality measures across several domains. [9] successfully classified Parkinson's diseased and healthy people using acoustic and prosodic features with a rate of 92%; another combination, using the phonetic features found by [10], allowed a high accuracy of 98.0%. In this study, we extracted 3 types of features (acoustic, prosodic, and phonetic): the 39 Mel-Frequency Cepstral Coefficients (MFCCs), Zero Crossing Rate (ZCR), fundamental frequency F0, energy E of the signal, and the three formants (F1, F2, F3), from 40 subjects, of whom 20 were diagnosed with PD. The 39 MFCC, F0, F1, F2, and F3 were compressed by calculating their average value over each voice recording in order to obtain the same number of features for all voice recordings. Subsequently, classification was performed using several machine learning classifiers: Support Vector Machine (SVM), Random Forest (RF), and a Neural Network, specifically the Multilayer Perceptron (MLP). Then, to test and validate our results, we used another independent database containing 28 PD patients. The remainder of this paper is organized as follows. Section 2 describes the database used in this work. Section 3 describes the proposed approach
82
R. Khaskhoussy and Y. B. Ayed
which includes feature extraction and the machine learning (ML) algorithms applied in this study. The obtained results are presented in Sect. 4 and the main conclusions in Sect. 5.
2 Data Acquisition
The dataset used in this study was collected and used by [8] and belongs to 20 patients with PD (6 women and 14 men) and 20 healthy people (10 women and 10 men). The PD patients have been suffering from this disease for 0 to 6 years. Ages range between 43 and 77 (mean 64.86, standard deviation 8.97) for the PD patients and between 45 and 83 (mean 62.55, standard deviation 10.79) for the healthy people. All voice recordings were taken with a Trust MC-1500 microphone with a frequency range between 50 Hz and 13 kHz. The microphone was set to 96 kHz, 30 dB and fixed at a distance of 15 cm from the subjects. Three types of sustained vowels (/a/, /o/ and /u/), numbers, and other words were recorded for each of the 40 participants (20 PD and 20 healthy), and the analyses were done on these voice samples; all recordings were made in stereo-channel mode and saved in WAV format. Employing the same recording devices and the same physicians, another dataset was also collected, containing 28 PD patients asked to pronounce the sustained vowels /a/ and /o/ three times each, which makes a total of 168 recordings. These patients have been suffering from PD for 0 to 13 years, and their ages range between 39 and 79 (mean 62.67, standard deviation 10.96). This dataset was used to test and validate the results obtained using the first dataset.
3 Proposed Approach
The first step in this work was to build a dataset containing voice sample recordings of healthy people and patients with PD, so we used the PD dataset collected by [8]. We then extracted from each voice sample three types of features, acoustic, prosodic, and phonetic: the 39 MFCC, Zero Crossing Rate (ZCR), fundamental frequency F0, energy E of the signal, and the three formants (F1, F2, F3). For the acoustic features, we extract from each voice recording the 39 MFCC coefficients (13 MFCC, 13 delta MFCC, and 13 delta-delta MFCC). Since the recordings do not have the same duration, a normalization step is used, which computes the average value of each coefficient over the frames of a signal in order to obtain a set of 39 MFCC coefficients for each signal. The fundamental frequency F0 and the three formants (F1, F2, F3) are estimated by the Linear Predictive Coding (LPC) method. All the features are extracted with MATLAB software and used as the input vector to the different machine learning algorithms. The different steps of our model are shown in Fig. 1 and described in the next paragraphs.
Fig. 1. Speech processing architecture for automatic detection of Parkinson’s Disease.
3.1 Acoustic Features Extraction
Mel Frequency Cepstral Coefficient (MFCC). The MFCCs are the coefficients of the Mel cepstrum. The Mel cepstrum is the cepstrum computed on the Mel bands (scaled to the human ear); it is a transformation of a signal from the time domain to another domain analogous to the time domain. MFCCs are the acoustic parameters most used in various areas of automatic speech recognition systems [11], because they are considered quite good at representing the signal [12]. The calculation of the MFCC parameters uses a non-linear frequency scale, named the Mel scale, that takes into account the characteristics of the human ear. This scale was developed by Stevens and Volkmann in 1940 following a study of human auditory perception [13]. It is a logarithmic scale that represents the perception of sound similarly to the human ear [14]. The technique for calculating the MFCCs is shown in Fig. 2 and described in the next paragraphs.
Fig. 2. Functional schema for extraction the 39 Mel Frequency Cepstral Coefficients (MFCC).
Pre-emphasis. Pre-emphasis increases the higher frequencies by applying a first-order difference equation to the voice samples {S_n, n = 1 ... N} [6,14]:

S'_n = S_n - k · S_{n-1}    (1)

where k is the pre-emphasis coefficient, which should be within the range 0 ≤ k ≤ 1 [14].
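Equation 1 can be applied directly to a sampled signal; in this sketch the value k = 0.97 is an assumed common choice (the text only requires 0 ≤ k ≤ 1):

```python
import numpy as np

def pre_emphasis(signal, k=0.97):
    """Eq. 1: S'_n = S_n - k * S_{n-1}; the first sample is kept as-is."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - k * signal[:-1])

s = np.array([1.0, 1.0, 1.0, 1.0])   # a constant (low-frequency) signal
out = pre_emphasis(s)                # the constant part is strongly attenuated
```

A constant signal is almost cancelled (each output sample becomes 1 - k), while rapid sample-to-sample changes pass through, which is exactly the high-frequency boost described above.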
Framing. Analysis of the voice signal over long time periods shows that it is not stationary, so it is essential to resort to short-time analysis: over an interval of 10 ms–30 ms the voice signal can be considered stable [15], since the rate of movement of the voice articulators is limited by physiological constraints. In frame blocking, the voice signal is therefore divided into frames of N samples of typical duration (from 10 to 30 ms), with neighboring frames separated by M samples (M < N) [6,15].

Hamming windowing. Voice signal processing is only possible on a limited number of samples, so a window of finite duration is applied to each frame of the signal. One of the most used windows is the Hamming window, which tapers the signal towards zero at the beginning and at the end of each frame by applying the following equation to the voice samples {Sₙ, n = 1 . . . N} [6,14]:

S′ₙ = {0.54 − 0.46 · cos(2π(n − 1)/(N − 1))} · Sₙ   (2)

Fast Fourier transform (FFT). The FFT transforms each frame of N samples from the time domain to the frequency domain. It is a fast algorithm that implements the discrete Fourier transform (DFT) [15]. The DFT is defined on the set of N samples (Sₙ) as follows:

Sₙ = Σₖ₌₀^(N−1) Sₖ · e^(−2πjkn/N),  n = 0, 1, 2, . . . , N − 1   (3)
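The framing, windowing and FFT steps can be sketched together in NumPy; the 16 kHz sampling rate, 25 ms frame length and 10 ms hop are illustrative choices, not values taken from the paper:

```python
import numpy as np

def frame_signal(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split the signal into overlapping frames of N samples, shifted by M < N."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def windowed_spectra(frames: np.ndarray) -> np.ndarray:
    """Apply a Hamming window (Eq. 2) to every frame, then an FFT (Eq. 3)."""
    n = frames.shape[1]
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(n) / (n - 1))
    return np.fft.rfft(frames * window, axis=1)

fs = 16000                                            # assumed sampling rate
signal = np.random.randn(fs)                          # one second of noise as a stand-in
frames = frame_signal(signal, frame_len=400, hop=160) # 25 ms frames, 10 ms hop
spectra = windowed_spectra(frames)
print(frames.shape, spectra.shape)                    # (98, 400) (98, 201)
```

`rfft` returns only the non-redundant half of the spectrum, which is all that the filter-bank step needs.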
Filter bank analysis. Psychophysical research has shown that the human ear's resolution of frequencies does not follow a linear scale across the audio spectrum [14]. Therefore, for each frequency measured in Hz, a subjective pitch is measured on the Mel scale [6]. The Mel-frequency scale is linearly spaced below 1000 Hz and logarithmic above 1000 Hz, and the filters have a triangular form [6]. Figure 3 shows the general form of the filter bank. The Mel of a given frequency f is calculated as follows [14]:

Mel(f) = 2595 · log₁₀(1 + f/700)   (4)
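Equation (4) and its inverse (used to place the triangular filter centres at equal Mel spacing) can be written directly:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Mel(f) = 2595 * log10(1 + f / 700)  (Eq. 4)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping, used to place the triangular filter centres."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(hz_to_mel(1000.0))  # close to 1000: the scale is roughly linear up to 1 kHz
```

By construction the two mappings are mutual inverses, so a filter bank laid out uniformly in Mel and mapped back to Hz becomes denser at low frequencies, mimicking the ear's resolution.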
Logarithm/DCT. The MFCC coefficients are then obtained by applying a discrete cosine transform to the log filter bank amplitudes (mⱼ) [6,14]:

cᵢ = √(2/N) · Σⱼ₌₁^N mⱼ · cos(πi(j − 0.5)/N)   (5)

where N is the number of filter bank channels.
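Equation (5) can be sketched as a direct sum; the 26 filter-bank channels and 13 kept coefficients are typical choices assumed here, not values stated in the paper:

```python
import numpy as np

def dct_cepstra(log_mel_energies: np.ndarray, n_ceps: int = 13) -> np.ndarray:
    """c_i = sqrt(2/N) * sum_{j=1..N} m_j * cos(pi*i*(j - 0.5)/N)  (Eq. 5)."""
    n = len(log_mel_energies)
    j = np.arange(1, n + 1)
    return np.array([
        np.sqrt(2.0 / n) * np.sum(log_mel_energies * np.cos(np.pi * i * (j - 0.5) / n))
        for i in range(1, n_ceps + 1)
    ])

log_m = np.log(np.linspace(1.0, 2.0, 26))  # 26 filter-bank channels (toy values)
ceps = dct_cepstra(log_m, n_ceps=13)
print(ceps.shape)  # (13,)
```

A useful sanity check is that a flat filter-bank output produces all-zero cepstral coefficients, since a constant has no spectral shape for the DCT to encode.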
Automatic Detection of Parkinson’s Disease from Speech
85
Fig. 3. Mel-scale filter bank form [14].
δMFCC and δδMFCC. The δMFCC coefficients are calculated by the following equation:

δc(i) = α · Σⱼ₌₁² j · (c(i + j) − c(i − j))   (6)

where α is a constant ≈ 0.2. The δδMFCC coefficients are calculated as follows:

δδc(i) = δc(i + 1) − δc(i − 1)   (7)
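Equations (6) and (7) can be sketched as follows; edge padding (repeating the first and last value) is an assumption made here to handle the frame boundaries, which the paper does not specify:

```python
import numpy as np

def delta(c: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """delta_c(i) = alpha * sum_{j=1..2} j * (c(i+j) - c(i-j))  (Eq. 6)."""
    padded = np.pad(c, 2, mode="edge")
    return alpha * sum(
        j * (padded[2 + j : len(c) + 2 + j] - padded[2 - j : len(c) + 2 - j])
        for j in (1, 2)
    )

def delta_delta(d: np.ndarray) -> np.ndarray:
    """delta_delta_c(i) = delta_c(i+1) - delta_c(i-1)  (Eq. 7)."""
    p = np.pad(d, 1, mode="edge")
    return p[2:] - p[:-2]

c = np.arange(10.0)       # a toy cepstral trajectory with constant slope 1
d = delta(c)              # interior values: 0.2 * (1*2 + 2*4) = 2.0
dd = delta_delta(d)       # a constant-slope trajectory has zero acceleration inside
print(d[4], dd[5])        # 2.0 0.0
```

On a linear trajectory the delta is constant in the interior and the delta-delta vanishes, matching the interpretation of Eqs. (6) and (7) as velocity and acceleration of the cepstral coefficients.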
Zero Crossing Rate (ZCR). The ZCR is an interesting parameter that has been used in several speech recognition systems. The number of zero crossings in a region represents the number of times that the signal, in its amplitude/time representation, passes through the central value of the amplitude (generally zero), divided by the number of samples in that region [16]. Figure 4 shows some examples of zero crossings. The calculation is done as follows [16]:

ZCR = (1/(N − 1)) · Σₙ₌₁^(N−1) sgn(s(n) · s(n − 1))   (8)

where

sgn(s(n) · s(n − 1)) = 1 if s(n) · s(n − 1) < 0 (a sign change, i.e. a zero crossing), and 0 otherwise.
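Equation (8) amounts to counting sign changes between consecutive samples; the 8 kHz rate and 101 Hz test tone below are illustrative values:

```python
import numpy as np

def zero_crossing_rate(s: np.ndarray) -> float:
    """Fraction of consecutive sample pairs whose product is negative (Eq. 8)."""
    crossings = (s[1:] * s[:-1]) < 0
    return float(crossings.sum()) / (len(s) - 1)

t = np.arange(8000) / 8000.0
tone = np.sin(2 * np.pi * 101 * t)   # 101 Hz tone sampled at 8 kHz for one second
print(zero_crossing_rate(tone))      # roughly 2 * 101 / 8000 per sample pair
```

A pure tone crosses zero about twice per period, so the ZCR grows with frequency, which is why it helps separate low-frequency voiced speech from high-frequency unvoiced noise.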
3.2 Prosodic and Phonetic Features Extraction
Energy of the Signal. In a typical speech signal, the amplitude of unvoiced segments is noticeably lower than that of voiced segments; the energy of the speech signal reflects this amplitude variation [16]. The energy of a sampled signal s(i) of length N is defined by:

E = (1/N) · Σᵢ₌₁^N s(i)²   (9)
Fig. 4. Definition of zero-crossings rate.
Fundamental Frequency F0 and Formants (F1, F2, F3). Phonation features are based on the vocal cord vibrations, which measure the stability of the pitch frequency and energy [10]. Linear Predictive Coding (LPC) is one of the most powerful analysis techniques for estimating the characteristics of the vocal tract [17]. The procedure followed for estimating the fundamental frequency F0 and the formants (F1, F2, F3) with LPC is:

– Reading the audio file
– Segmenting the signal into 30 ms analysis frames
– Windowing each frame
– LPC analysis [17] and estimation of F0, F1, F2 and F3
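The LPC step above can be sketched in NumPy; this is an illustrative implementation, not the authors' code. LPC coefficients are obtained with the Levinson-Durbin recursion on the frame's autocorrelation, and formant candidates are read off the angles of the LPC polynomial roots (F0 estimation, e.g. from the autocorrelation of the LPC residual, is omitted for brevity; the order and synthetic test frame are arbitrary choices):

```python
import numpy as np

def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Autocorrelation-method LPC coefficients [1, a1, ..., ap] via Levinson-Durbin."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])   # sum_{j=1..i-1} a_j * r_{i-j}
        k = -acc / err                               # reflection coefficient
        a[1 : i + 1] += k * a[:i][::-1]              # order-i coefficient update
        err *= 1.0 - k * k
    return a

def formant_candidates(frame: np.ndarray, fs: float, order: int = 4) -> np.ndarray:
    """Frequencies of the LPC polynomial roots in the upper half-plane, sorted."""
    roots = np.roots(lpc(frame, order))
    roots = roots[np.imag(roots) > 1e-3]             # one root per conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

fs = 8000.0
n = np.arange(240)                                   # one 30 ms frame at 8 kHz
frame = (0.995 ** n) * np.cos(2 * np.pi * 500 * n / fs)  # single resonance near 500 Hz
frame *= np.hamming(len(frame))                      # the windowing step
print(formant_candidates(frame, fs))                 # one candidate lies near 500 Hz
```

Sorting the pole frequencies and taking the first three gives the F1–F3 estimates of the procedure above.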
3.3 Machine Learning (ML) Classifiers
Support Vector Machine (SVM). First proposed by [18], the SVM is a supervised learning algorithm that aims to separate large quantities of data by using the concepts of hyperplanes and margins. It was first proposed to distinguish between two classes and, after several research efforts, evolved from a binary classifier to multi-class classification tasks [19]. The SVM simultaneously minimizes the empirical classification error and maximizes the geometric margin, by transforming the input vector space into a higher-dimensional space where a maximal separating hyperplane is constructed. This hyperplane is the one that maximizes the margin (the largest separation between the two classes) compared to any other plane [20].

Multilayer Perceptron (MLP). The MLP is the simplest form of a neural network and a supervised learning algorithm, with features such as the ability to learn and generalize, the capacity to perform a wide variety of detection and estimation tasks, smaller training set requirements, fast operation, and ease of implementation [21]. An MLP consists of a set of source neurons forming the input layer, one or more hidden layers of computation neurons, and an output layer of neurons. Each neuron performs a relatively simple task: it receives the external information or the outputs of the previous layer's neurons and uses them to calculate its own output, which propagates to the connected neurons of the next layer.
Random Forest (RF). The RF algorithm was named by [22]; it is a learning technique in which the classifier is a combination of unpruned decision trees. A decision tree is a flow-chart-like structure in which each internal node represents a test on an attribute, each branch represents the result of the test, and each leaf node represents a class label. The paths from root to leaf represent the classification rules.
4 Results and Discussion
The first phase in this study was to establish a dataset containing voice recordings of healthy people and Parkinsonian patients, which was done using the dataset of [8]. We then extracted from each voice recording three types of features, acoustic, prosodic and phonetic: the 39 MFCC, ZCR, F0, F1, F2 and F3. Subsequently, using the WEKA software [23], we applied three machine learning (ML) algorithms, SVM, MLP and RF, and calculated the overall accuracy. Table 1 reports the classification accuracy using the three SVM kernels. As can be seen, a maximum classification accuracy of 99.50% was achieved using the RBF kernel. We then tested our model with the other machine learning algorithms; Table 2 shows the results obtained by the three ML classifiers. It is observed that Random Forest (RF) succeeds in discriminating between healthy people and Parkinsonian patients with an accuracy rate of 99%. In addition, the Multilayer Perceptron (MLP) shows an accuracy rate of 97%, which is lower than RF but still considerable. According to Table 2, it is clear that the three ML classifiers achieve high accuracy, with SVM presenting the highest accuracy of 99.50% using the RBF kernel.

Table 1. Accuracy rate using three SVM kernels.

  SVM kernel              Accuracy (%)
  Polynomial              95
  Normalized polynomial   99
  RBF                     99.50
Table 2. Results of the ML classifiers.

  ML classifier   Accuracy (%)
  SVM             99.50
  RF              99
  MLP             97
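The WEKA evaluation described above can be sketched with scikit-learn as a rough equivalent; the synthetic 44-dimensional feature matrix below merely stands in for the 39 MFCC + ZCR + F0–F3 features of the paper (all sizes and hyperparameters are illustrative assumptions, not the paper's settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Stand-in feature matrix: 300 "recordings" x 44 features, two classes (PD / healthy).
X, y = make_classification(n_samples=300, n_features=44, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM (RBF)": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0),
}
scores = {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
          for name, m in models.items()}
print(scores)
```

Swapping `kernel="rbf"` for `"poly"` reproduces the kernel comparison of Table 1 on real features.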
Regarding the output of our study, we could show that acoustic, prosodic and phonetic features extracted from voice recordings of PD patients can be effective biomarkers of Parkinson's disease and related dementias.
Moreover, compared to [9] and [10], our study discriminates healthy people from people with PD more accurately, as shown by the rates obtained with the different machine learning (ML) algorithms. Also, contrary to other studies [6,8], we used more than one feature type (acoustic, prosodic and phonetic), which improved the accuracy of the ML classifiers.
5 Conclusion
The signs of Parkinson's disease and its voice disorders do not appear abruptly. It is a slow process whose first stages may go unnoticed, and the clinical diagnosis may require much time from caregivers, patients, and medical personnel. In order to help improve people's lives, this paper proposes a new model to enhance the assessment of PD based on the analysis of free speech. The system performs an acoustic, prosodic and phonetic analysis of a short speech sample and uses this information to feed a classification algorithm based on an ML algorithm. Using the three types of features to distinguish between PD patients and healthy people has shown that they are good parameters for the detection of voice disorders in the context of PD. Based on our results, we conclude that the SVM was the most successful classifier, with a best accuracy rate of 99.50%. We believe that the proposed model will help improve the prediction performance in detecting PD and cover the limitations discussed in previous research. In the future, we plan to investigate the effect of Parkinson's disease in its early stages by analyzing voice samples of people who have been suffering from the disease for less than two years. Other features and classifiers may also be used to distinguish between PD patients and healthy people.
References 1. Romulo, F., Per, P., Miguel, A.L.N.: Restoration of locomotive function in Parkinson’s disease by spinal cord stimulation: mechanistic approach. Eur. J. Neurosci. 32, 1100–1108 (2010) 2. Christopher, G.G.: The history of Parkinson’s disease: early clinical descriptions and neurological therapies. Cold Spring Harb. Perspect. Med. 1, a008862 (2011) 3. Pinto, S., Ghio, A., Teston, B., Viallet, F.: La dysarthrie au cours de la maladie de Parkinson. Histoire naturelle de ses composantes: dysphonie, dysprosodie et dysarthrie. Revue Neurologique 166, 800–810 (2010) 4. The Michael J. Fox Foundation for Parkinson’s Research. https://www. michaeljfox.org/understanding-parkinsons/living-with-pd/topic.php?speechswallowing 5. O’Sullivan, S.B., Schmitz, T.J.: Parkinson disease. In: Physical Rehabilitation, pp. 856–894. F.A. Davis Company (2007) 6. Achraf, B., Abdelilah, J., Ahmed, H.: Analysis of multiple types of voice recordings in cepstral domain using MFCC for discriminating between patients with Parkinson’s disease and healthy people. Int. J. Speech Technol. 19, 449–456 (2016) 7. Little, M.A., McSharry, P.E., Hunter, E.J., Spielman, J., Ramig, L.O.: Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans. Biomed. Eng. 56, 1015–1022 (2009)
8. Betul, E.S., Erdem, I.M., Okan, S.C., Ahmet, S., Fikret, G., Sakir, D., Hulya, A., Olcay, K.: Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J. Biomed. Health Inf. 17, 828–834 (2013) 9. Khan, T., Westin, J., Dougherty, M.: Classification of speech intelligibility in Parkinson’s disease. Biocybern. Biomed. Eng. 34, 35–45 (2014) 10. Upadhya, S.S., Cheeran, A.: Discriminating Parkinson and healthy people using phonation and cepstral features of speech. Procedia Comput. Sci. 143, 197–202 (2018) 11. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29, 82–97 (2012) 12. Anggraeni, D., Sanjaya, W.S.M., Solih, M.Y., Munawwaroh, M.: The implementation of speech recognition using Mel-Frequency Cepstrum Coefficients (MFCC) and Support Vector Machine (SVM) method based on python to control robot arm. IOP Conf. Ser.: Mater. Sci. Eng. 288 (2018) 13. Mishra, A.N., Chandra, M., Biswas, A., Sharan, S.N.: Robust features for connected hindi digits recognition. Int. J. Signal Process. Image Process. Pattern Recognit. 4, 79–90 (2011) 14. Young, S., Evermann, G., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P.: The HTK Book, vol. 3. Cambridge University Press, Cambridge (2002) 15. Kumar, S.C., Mallikarjuna, P.R.: Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm. Int. J. Comput. Sci. Eng. 3, 2942–2954 (2011) 16. Shete, D.S., Patil, S.B.: Zero crossing rate and energy of the speech signal of Devanagari script. IOSR J. VLSI Signal Process. 4, 01–05 (2014) 17. 
Fujimoto, K., Hamada, N., Kasprzak, W.: Estimation and tracking of fundamental, 2nd and 3d harmonic frequencies for spectrogram normalization in speech recognition. Bull. Pol. Acad. Sci.: Tech. Sci. 60, 71–81 (2012) 18. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992) 19. Dellaert, F., Polzin, T., Waibel, A.: Recognizing emotion in speech. In: Proceeding of Fourth International Conference on Spoken Language Processing, ICSLP’96, vol. 3, pp. 1970–1973. IEEE (1996) 20. Saloni, R.K., Gupta, A.K.: Detection of Parkinson disease using clinical voice data mining. Int. J. Circuits Syst. Signal Process. 9, 320–326 (2015) 21. Caglar, M.F., Cetisli, B., Toprak, I.B.: Automatic recognition of Parkinson’s disease from sustained phonation tests using ANN and adaptive neuro-fuzzy classifier. J. Eng. Sci. Des. 1, 59–64 (2010) 22. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001) 23. WEKA. https://www.cs.waikato.ac.nz/ml/weka/
A Deep Convolutional Neural Network Model for Multi-class Fruits Classification Laith Alzubaidi1,2(B) , Omran Al-Shamma1 , Mohammed A. Fadhel1,4 , Zinah Mohsin Arkah1 , and Fouad H. Awad3 1 University of Information Technology and Communications, Baghdad, Iraq
[email protected], {o.al_shamma,Mohammed.a.fadhel, zinah2018}@uoitc.edu.iq 2 Faculty of Science and Engineering, Queensland University of Technology, Brisbane, Australia 3 College of Computer Science and Information Technology, University of Anbar, Anbar, Iraq [email protected] 4 University of Sumer, Thi Qar, Iraq
Abstract. Fruit classification is a challenging task due to the many types of fruits. To classify fruits more effectively, we propose a new deep convolutional neural network model to classify 118 fruit classes. The proposed model combines two aspects of convolutional neural networks, which are traditional and parallel convolutional layers. The parallel convolutional layers are employed with different filter sizes for better feature extraction. This also helps with backpropagation, since the error can backpropagate from multiple paths. To avoid the gradient vanishing problem and to have a better feature representation, we have used residual connections. We have trained and tested our model on the Fruits-360 dataset. Our model achieved an accuracy of 100% on an image set split from the training set and 99.6% on the test set, which outperformed previous methods. Keywords: Fruits classification · Convolutional neural network · Deep learning
1 Introduction In the field of academic research, fruit classification is still considered an active issue. For instance, identifying the class of a single fruit helps workers in the supermarket determine its price rapidly [1]. In addition, it is useful for presenting nutritional instructions that assist customers in selecting the appropriate food types that meet their nutrient and well-being needs [2, 3]. For automatic packaging, fruit classification techniques are widely used in all food factories as well. Manual fruit classification remains a challenging topic, since fruit types and subtypes differ from one region to another. This wide difference stems from the availability of fruit being population- and region-dependent, as well as from the necessary ingredients in the fruits [3]. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 90–99, 2021. https://doi.org/10.1007/978-3-030-49342-4_9
The great progress in computer vision and machine learning, mainly in the last decade, has brought the attention of several researchers to employing the developed techniques in automated fruit classification. Researchers commonly used features related to external quality descriptors, like shape, size, texture, and color, in their work [3–15]. In general, most of the proposed classifiers were either restricted to a specific type, or their performance did not reach acceptable accuracy. In previous work, some methods utilized deep learning techniques to classify fruits [16, 17] and showed great performance. This motivated us to employ a deep learning model for the fruit classification task, which is considered very challenging due to the large number of classes.
2 Related Work In recent years, specialists have introduced several automated fruit classification techniques. The first group of scientists to use a clustering technique for classifying fruits and vegetables was Pennington and Fisher in 2009 [3]. Pholpho et al. [4] employed visible spectroscopy to recognize damaged/undamaged fruits, while Yang et al. [5] introduced an estimation system employing multispectral imaging analysis for a blueberry fruit application. In contrast, classifying various types of fruit with 88.2% accuracy was achieved using computer vision and a multi-class Support Vector Machine (SVM) [6]. Later, eight different citrus fruits were recognized using Raman spectroscopy, a fast and non-damaging tool, together with two analysis techniques (hierarchical clustering and principal components) [7]. Fadhel, M., et al. used color segmentation to recognize unripe strawberries [21]. In 2013, Marchal et al. [8] employed computer vision and machine learning to develop an expert system that estimates the impurity content inside an individual sample of olive oil, while Breijo et al. [9] utilized an electronic nose (a so-called odor sampling system) for classifying the Diospyros kaki aroma. The working parameters of that system can take variable configurations, which makes the system adaptable. On the other hand, an artificial neural network with two hidden layers was applied to predict the characteristics of the texture extracted from food-surface images [10]; the backpropagation algorithm was used for training the network. Moreover, Omid et al. [11] introduced an expert system, using machine vision and fuzzy logic, to extract size and defect features. Another automated fruit classification system was proposed based on a fitness-scaled chaotic artificial bee colony algorithm [12]. In addition, a texture-based technique, which includes descriptor computation and interest-point feature extraction, was suggested for detecting green fruits on palms [14]. Lastly, date fruits were classified based on Weber local descriptor and local binary pattern techniques, with an SVM as the classifier and the Fisher discrimination ratio for selecting features [15].
92
L. Alzubaidi et al.
Most of the previous works have the following drawbacks:
a) They need high-priced sensors such as weight, dew, heat, chemical, gas-sensitive, and invisible-light sensors.
b) The classifiers can recognize only a limited number of fruit classes.
c) The performance of the systems is not high enough, mainly with closely similar texture, color, and shape features.
d) The classification accuracy does not meet the requirements of typical applications.
3 Methodology
3.1 Dataset
We have used the Fruits-360 dataset to train and test our model [16]. The dataset was downloaded from the Kaggle website (link to the dataset: https://www.kaggle.com/moltean/fruits). It has 80653 images in total, representing 118 classes of fruits, and is divided into 60318 images for the training set and 20232 for the testing set. Each image of the dataset has a size of 100 × 100 pixels.
3.2 Deep Learning
Artificial neural networks can obtain the most successful results, mainly in the field of image classification and recognition [18, 19, 22–24]. Deep learning models are based on these networks. Machine learning algorithms can be categorized into different classes, and deep learning is one of them. It utilizes multiple layers composed of nonlinear processing units [20]. Each layer learns to convert its input data into a more complex and somewhat more abstract representation. Several machine-learning algorithms fail to stand against well-trained deep neural networks (DNNs): in certain domains, DNNs achieved supreme pattern recognition results. Moreover, DNNs are further boosted, since deep learning represents a significant step towards achieving robust artificial intelligence. Currently, convolutional neural networks (CNNs), a type of DNN, have proved to obtain valuable results, mainly in the image recognition field. A CNN has several types of layers, such as loss, fully connected, ReLU, pooling, and convolutional layers [20]. Generally, its structure consists of a convolutional layer followed by a ReLU layer, a pooling layer, one or more convolutional layers, and one or more fully connected layers, respectively. The key feature that sets the CNN apart from a normal neural network is that the image structure is preserved during processing.
It should be noted that a normal neural network changes the image input into a 1D (one-dimensional) array, which reduces the sensitivity of the trained classifier to positional variations. Studying the structure of the CNN is therefore critical. In this paper, we focus on how to design a model with better feature extraction that deals with the overfitting and gradient vanishing problems.
3.3 Proposed Model
A large number of fruit classes requires good feature extraction to discriminate between the classes. Our proposed model is very effective due to its designed structure. First, the model starts with two traditional convolutional layers of sizes 3 * 3 and 5 * 5 to reduce the input size. Each convolutional layer is followed by batch normalization and rectified linear unit layers to speed up the training process and avoid gradient vanishing problems. Using a small filter size (such as 1 * 1) at the beginning of the model could lead to losing large features; for that reason, we avoided small filter sizes there. After the traditional convolutional part, four blocks of parallel convolutional layers are employed to extract the features. In the first block, four convolutional layers work in parallel; the outputs of the four convolutional layers and of the traditional convolutional layers (through a residual connection) are concatenated in the first concatenation layer. The convolutional layers have different filter sizes (3 * 3, 5 * 5, 7 * 7, 11 * 11) and are followed by batch normalization and rectified linear unit layers; this applies to all four blocks. The second block follows the same structure as block one, except for the residual connection part. The outputs of block two and block three are concatenated in the third concatenation layer. Furthermore, the output of block one is passed through a single convolutional layer and concatenated with the output of block four in the fourth concatenation layer. On top of that, an average pooling layer is employed to perform a large dimensionality reduction, which helps to avoid the overfitting problem. Then, three fully connected layers are utilized, with two dropout layers between them to prevent overfitting. Lastly, the softmax function is used to classify the 118 fruit classes. The total number of layers in our model is 74, as described in Table 1 and Fig. 1.
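The spatial activation sizes of such a model can be sanity-checked with a small helper, assuming 'same' padding for the strided convolutions (so the output size is ceil(input/stride)) and an unpadded 7 * 7 average pooling with stride 2; these padding conventions are an assumption, not stated in the paper:

```python
import math

def conv_out(size: int, stride: int) -> int:
    """Spatial output size of a 'same'-padded convolution."""
    return math.ceil(size / stride)

def pool_out(size: int, kernel: int, stride: int) -> int:
    """Output size of an unpadded pooling layer."""
    return (size - kernel) // stride + 1

s = 100                   # input images are 100 x 100
s = conv_out(s, 2)        # stride-2 convolution         -> 50
s = conv_out(s, 2)        # first parallel block, S = 2  -> 25
s = conv_out(s, 2)        # third parallel block, S = 2  -> 13
s = pool_out(s, 7, 2)     # 7 x 7 average pooling, S = 2 -> 4
print(s)  # 4
```

The sequence 100 → 50 → 25 → 13 → 4 reproduces the activation sizes listed in Table 1, which supports the assumed padding conventions.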
We have trained our model for 3700 iterations, until the learning stopped, as shown in Fig. 2.
4 Experimental Results
We have evaluated our model in terms of accuracy, measured as a key indicator of model effectiveness. Accuracy is defined as the ratio of the number of correct predictions to the total number of predictions. Our model achieved an accuracy of 100% on a set split from the training set and an accuracy of 99.6% on the testing set. It is superior to all methods employed in Ref. [16], which is the same source as the Fruits-360 dataset (Table 2). It is worth mentioning that Ref. [16] classified 101 classes, while our model classifies 118 classes. Table 3 shows some test samples with correct predictions.
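The accuracy metric used above is simply the fraction of correct predictions; the fruit labels below are illustrative:

```python
def accuracy(predictions, labels):
    """Ratio of the number of correct predictions to the total number of predictions."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

print(accuracy(["Kiwi", "Lemon", "Dates"], ["Kiwi", "Lemon", "Cocos"]))  # 2 of 3 correct
```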
Table 1. Our model architecture. C refers to Convolutional Layer, B to Batch Normalization Layer, R to Rectified Linear Unit Layer, CN to Concatenation Layer, A to Average Pooling Layer, D to Dropout Layer, F to Fully Connected Layer.

  Name of layer    Filter Size (FS) and Stride (S)   Activations
  Input Layer      –                                 100 * 100 * 3
  C1, B1, R1       FS = 3 * 3; S = 1                 100 * 100 * 16
  C2, B2, R2       FS = 5 * 5; S = 2                 50 * 50 * 16
  C3, B3, R3       FS = 3 * 3; S = 1                 50 * 50 * 16
  C4, B4, R4       FS = 5 * 5; S = 1                 50 * 50 * 16
  C5, B5, R5       FS = 7 * 7; S = 1                 50 * 50 * 16
  C6, B6, R6       FS = 11 * 11; S = 1               50 * 50 * 16
  CN1, B1x         Input CN1 = 5                     50 * 50 * 80
  C7, B7, R7       FS = 3 * 3; S = 2                 25 * 25 * 32
  C8, B8, R8       FS = 5 * 5; S = 2                 25 * 25 * 32
  C9, B9, R9       FS = 7 * 7; S = 2                 25 * 25 * 32
  C10, B10, R10    FS = 11 * 11; S = 2               25 * 25 * 32
  CN2, B2x         Input CN2 = 4                     25 * 25 * 128
  C11, B11, R11    FS = 3 * 3; S = 1                 25 * 25 * 32
  C12, B12, R12    FS = 5 * 5; S = 1                 25 * 25 * 32
  C13, B13, R13    FS = 7 * 7; S = 1                 25 * 25 * 32
  C14, B14, R14    FS = 11 * 11; S = 1               25 * 25 * 32
  CN3, B3x         Input CN3 = 5                     25 * 25 * 256
  C15, B15, R15    FS = 3 * 3; S = 2                 13 * 13 * 64
  C16, B16, R16    FS = 5 * 5; S = 2                 13 * 13 * 64
  C17, B17, R17    FS = 7 * 7; S = 2                 13 * 13 * 64
  C18, B18, R18    FS = 11 * 11; S = 2               13 * 13 * 64
  C19, B19, R19    FS = 7 * 7; S = 4                 13 * 13 * 64
  CN4, B4x         Input CN4 = 5                     13 * 13 * 320
  A1               FS = 7 * 7; S = 2                 4 * 4 * 320
  F1, D1           –                                 1 * 1 * 300
  F2, D2           –                                 1 * 1 * 200
  F3               –                                 1 * 1 * 118
  O, Softmax       –                                 118 classes
Fig. 1. Our model architecture
Fig. 2. Our model training progress
Table 2. Comparison of previous methods and our method on the Fruits-360 dataset.

  Method           Accuracy on the training set (%)   Accuracy on the testing set (%)
  Method 1 [16]    99.60                              96.13
  Method 2 [16]    99.37                              95.85
  Method 3 [16]    99.61                              95.53
  Method 4 [16]    98.95                              93.13
  Method 5 [16]    99.62                              96.03
  Method 6 [16]    96.03                              92.30
  Method 7 [16]    99.57                              95.95
  Method 8 [16]    99.47                              95.80
  Method 9 [16]    98.70                              93.26
  Method 10 [16]   99.44                              94.16
  Our Model        100                                99.60
Table 3. Test samples with correct predictions.

  Avocado: 99.1%       Apple Red3: 98.5%      Tomato1: 97.7%     Potato White: 96.1%
  Plum 3: 95.1%        Pineapple: 99.2%       Banana: 100%       Dates: 98.1%
  Kiwi: 99.3%          Lemon: 100%            Cocos: 98.9%       AppleGolden2: 98.7%
5 Conclusion
We proposed a deep convolutional neural network model for fruit classification, which is a challenging task due to the many types of fruits. Our model is used to classify 118 types of fruits. The proposed model aggregates two modes of convolutional neural networks: traditional and parallel convolutional layers. This has proved very helpful for the backpropagation process, since the error can backpropagate through multiple paths. To prevent the gradient vanishing problem and to obtain a better feature representation, we utilized residual connections. The Fruits-360 dataset was used to train and test our model. Our model achieved an accuracy of 100% on an image set split from the training set and 99.6% on the test set, which outperformed previous methods.
References 1. Zhang, B.H., Huang, W., Li, J., Zhao, C., Fan, S., Wu, J., Liu, C.: Principles, developments and applications of computer vision for external quality inspection of fruits and vegetables: a review. Food Res. 62, 326–343 (2014)
2. Zhang, Y.D., Wu, L., Wang, S., Ji, G.: Comment on: principles, developments and applications of computer vision for external quality inspection of fruits and vegetables: a review (Food Research International; 2014, 62: 326–343). Food Res. 70, 142 (2015) 3. Pennington, J.A.T., Fisher, R.A.: Classification of fruits and vegetables. J. Food Compos. Anal. 22(Suppl. 1), S23–S31 (2009) 4. Pholpho, T., Pathaveerat, S., Sirisomboon, P.: Classification of longan fruit bruising using visible spectroscopy. J. Food Eng. 104, 169–172 (2011) 5. Yang, C., Lee, W.S., Williamson, J.G.: Classification of blueberry fruit and leaves based on spectral signatures. Biosyst. Eng. 113, 351–362 (2012) 6. Wu, L., Zhang, Y.: Classification of fruits using computer vision and a multiclass support vector machine. Sensors 12, 12489–12505 (2012) 7. Feng, X.W., Zhang, Q.H., Zhu, Z.L.: Rapid classification of citrus fruits based on raman spectroscopy and pattern recognition techniques. Food Sci. Technol. Res. 19, 1077–1084 (2013) 8. Cano Marchal, P., Gila, D.M., García, J.G., Ortega, J.G.: Expert system based on computer vision to estimate the content of impurities in olive oil samples. J. Food Eng. 119, 220–228 (2013) 9. Breijo, E.G., Guarrasi, V., Peris, R.M., Fillol, M.A., Pinatti, C.O.: Odour sampling system with modifiable parameters applied to fruit classification. J. Food Eng. 116, 277–285 (2013) 10. Fan, F.H., Ma, Q., Ge, J., Peng, Q.Y., Riley, W.W., Tang, S.Z.: Prediction of texture characteristics from extrusion food surface images using a computer vision system and artificial neural networks. J. Food Eng. 118, 426–433 (2013) 11. Omid, M., Soltani, M., Dehrouyeh, M.H., Mohtasebi, S.S., Ahmadi, H.: An expert egg grading system based on machine vision and artificial intelligence techniques. J. Food Eng. 118, 70–77 (2013) 12. Zhang, Y., Wang, S., Ji, G., Phillips, P.: Fruit classification using computer vision and feedforward neural network. J. Food Eng. 143, 167–177 (2014) 13. 
Khanmohammadi, M., Karami, F., Mir-Marqués, A., Garmarudi, A.B., Garigues, S., de la Guardia, M.: Classification of persimmon fruit origin by near infrared spectrometry and least squares-support vector machines. J. Food Eng. 17–22 (2014) 14. Chaivivatrakul, S., Dailey, M.N.: Texture-based fruit detection. Precis. Agric. 15(6), 662–683 (2014). https://doi.org/10.1007/s11119-014-9361-x 15. Muhammad, G.: Date fruits classification using texture descriptors and shape size features. Eng. Appl. Artif. Intell. 37, 361–367 (2015) 16. Mure¸san, H., Mihai, O.: Fruit recognition from images using deep learning. Acta Universitatis Sapientiae, Informatica 10(1), 26–42 (2018) 17. Siddiqi, R.: Effectiveness of transfer learning and fine tuning in automated fruit image classification. In: Proceedings of the 2019 3rd International Conference on Deep Learning Technologies. ACM (2019) 18. Alzubaidi, L., et al.: Deep learning models for classification of red blood cells in microscopy images to aid in sickle cell anemia diagnosis. Electronics 9(3), 427 (2020) 19. Alzubaidi, L., Fadhel, M.A., Oleiwi, S.R., et al.: DFU_QUTNet: diabetic foot ulcer classification using novel deep convolutional neural network. Multimed. Tools Appl. 79, 15655–15677 (2020). https://doi.org/10.1007/s11042-019-07820-w 20. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015) 21. Fadhel, M., et al.: Recognition of the unripe strawberry by using color segmentation techniques. Int. J. Eng. Technol. 7(4), 3383–3387 (2018) 22. Al-Shamma, O., et al.: Boosting convolutional neural networks performance based on FPGA accelerator. In: International Conference on Intelligent Systems Design and Applications. Springer, Cham (2018)
23. Alzubaidi, L., et al.: Optimizing the performance of breast cancer classification by employing the same domain transfer learning from hybrid deep convolutional neural network model. Electronics 9(3), 445 (2020) 24. Alzubaidi, L., et al.: Towards a better understanding of transfer learning for medical imaging: a case study. Appl. Sci. 10(13), 4523 (2020)
Distributed Architecture of Snort IDS in Cloud Environment Mondher Essid(B) , Farah Jemili, and Ouajdi Korbaa MARS Research Laboratory, Universite de Sousse, ISITCom, LR17ES05, 4011 Hammam Sousse, Tunisia [email protected], [email protected], [email protected]
Abstract. The Intrusion Detection System (IDS) is the most used mechanism for intrusion detection. Traditional IDS have been used to detect suspicious behavior in network communications and hosts. However, with the growth of intrusion detection dataset sizes, we face a new challenge: storing those large datasets in a cloud infrastructure and analyzing their traffic using Big Data technology. Furthermore, some cloud providers allow the user to deploy and configure an IDS. In this paper, we introduce an architecture based on the Snort IDS in cloud computing with distributed intrusion detection datasets. Keywords: Snort · Cloud computing · Cloud security · Intrusion detection datasets · IDS
1 Introduction The cloud has improved the world of information technology. Today, cloud computing is the preferred choice of everybody, for normal users as well as IT companies, since it provides flexible, pay-per-use services. However, security and privacy are a major obstacle to its success, because its open and distributed architecture is vulnerable to intrusion. IDS have been used to detect suspicious activity and attacks in networks or hosts. Nowadays, however, IDS management is no longer limited to a single network or host; it also matters for distributed IDS solutions, which make it possible to integrate and handle different types of infrastructure, such as the cloud. In this paper, we propose a distributed architecture of the Snort IDS in Microsoft Azure.
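As a concrete illustration (not taken from the paper), a minimal Snort rule of the kind such a deployment would load might look like the following; the SID, port and message text are placeholder values:

```
# Hypothetical local.rules entry: alert on inbound SSH connection attempts.
alert tcp any any -> $HOME_NET 22 (msg:"Inbound SSH connection attempt"; flags:S; sid:1000001; rev:1;)
```

On a Snort 2.x sensor, a rules file like this is referenced from the main configuration and the engine is typically started with a command such as `snort -A console -c /etc/snort/snort.conf -i eth0` (paths and interface name are, again, placeholders).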
2 Cloud Computing
Cloud computing is an internet-based system, built on shared virtual servers, which provides software, platforms, and infrastructure as services. Likewise, cloud computing offers different devices and resources to the user as a service on a pay-as-you-use basis.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 100–111, 2021. https://doi.org/10.1007/978-3-030-49342-4_10
Cloud customers do not own the physical infrastructure; they rent usage from a third-party provider. They consume resources as a service and pay only for the resources they use. All they need is a personal computer and an internet connection. Cloud computing has three layers:
• System layer: based on the virtual-machine abstraction of the server.
• Platform layer: its main role is to virtualize the operating system of the server.
• Application layer: essentially includes web applications [1].
Cloud computing offers convenient, on-demand access to a shared pool of configurable computing resources (such as networks, storage services, applications, and servers) that can be rapidly provisioned and released with minimal effort or service provider interaction [1]. Therefore, there are three basic types of cloud services, which serve users in different ways:
• Infrastructure as a Service (IaaS), where the user has control over complete virtual machines [2], such as Eucalyptus and OpenNebula [3].
• Platform as a Service (PaaS), where the customer can deploy an application in the cloud if the provider supports the languages, APIs, and tools used to create it [4], such as Microsoft Azure and Google App Engine [3].
• Software as a Service (SaaS), which enables users to use the provider's applications [4], such as Amazon apps [3].
These services are provided via the Internet. There are four deployment models for the cloud:
• Public cloud: its infrastructure is provisioned for use by the general public and may be managed by a governmental, academic, or business organization.
• Private cloud: deployed for exclusive use by a single organization comprising multiple users.
• Community cloud: deployed for use by a group of users from different organizations having common goals. It can be managed by any of the organizations within that group or by a third party.
• Hybrid cloud: an infrastructure composed of two or more cloud models (public, private, or community) that ensures the portability of applications and data using standard technology [4].
The architecture of the cloud is open and fully distributed, making it an attractive target for intruders. The security of the cloud environment is therefore at high risk, where traditional network intrusions as well as cloud-specific attacks threaten cloud users (individuals or organizations). According to IDG Enterprise's 2013 Cloud Computing survey, security is the second major concern, after insufficient control, that keeps companies from moving to the cloud computing paradigm [5]. In the next section, we introduce intrusion detection in the cloud platform.
3 Big Data
The main reason to use Big Data technology is to handle and analyze huge datasets, and also to combine those datasets and remove redundant data. In our case, we used our previously developed model, which proved its performance [14], based on the Hadoop MapReduce framework.
Hadoop
Hadoop meets the challenges of Big Data by simplifying the manipulation of huge datasets and distributed applications [6]. Used all over the world by universities and companies, it supports different tasks, such as analytical tasks, which can be divided into fragments of work and distributed over a huge number of computers. Hadoop provides a programming approach that reduces the complexity of distributed implementations. As a consequence, Hadoop offers a strong mechanism for intensive data processing. Hadoop development is driven by the goal of better supporting vast amounts of data. It offers a powerful framework named MapReduce for easily writing applications, while providing highly scalable and parallelizable execution.
MapReduce
MapReduce is a data processing framework that can handle huge amounts of data stored in the Hadoop file system, HDFS. It is distributed and can deal with massive amounts of data, which is not an easy task [7]. MapReduce provides developers with a comprehensive abstraction: automatic parallelization of programs and framework-managed fault tolerance. Hadoop MapReduce is based on programming two main functions, Map and Reduce. The Map function receives every record of the input data (such as lines of a file or rows of a database) as key-value pairs and outputs key-value pairs as its result [8]. Each Map invocation is independent of the others, which allows the framework to duplicate or re-execute Map tasks without affecting the results in case of failures. MapReduce groups the output key-value records of every Map task by key and sends them to the Reduce tasks.
The Reduce function is then invoked for each key and its group of values, in the sorted order of the keys. In a classic MapReduce program, users only have to implement the Map and Reduce functions; Hadoop manages the rest in parallel.
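The Map/Reduce contract described above can be sketched in plain Python. The toy example below is ours, not the authors' implementation: it emits each dataset record as a key, groups identical records (as the framework's shuffle does between the two phases), and counts the occurrences per key, which is also how redundant records can be spotted.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (key, value) pair per input record (here, record -> 1)."""
    for record in records:
        yield record, 1

def shuffle(pairs):
    """Group pairs by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: invoked once per key with all its values, in sorted key order."""
    for key, values in sorted(groups.items()):
        yield key, sum(values)

# Hypothetical flow identifiers standing in for dataset records.
records = ["flow-a", "flow-b", "flow-a", "flow-c", "flow-b"]
counts = dict(reduce_phase(shuffle(map_phase(records))))
print(counts)  # {'flow-a': 2, 'flow-b': 2, 'flow-c': 1}
```

In Hadoop, the `shuffle` step is performed by the framework itself; the developer supplies only the two functions, as the text notes.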
4 Intrusion Detection
An intrusion is defined as an attempt to compromise the confidentiality, integrity, and availability (CIA) of a computer or network, or to bypass its security mechanisms [8]. Intrusions may be launched by attackers trying to access cloud resources through the Internet, by legitimate users trying to gain privileges not formally granted to them, or by privileged users who abuse their rights to access resources [9]. The role of an IDS is to analyze intrusion detection datasets, to monitor different events in the computer or network, and to analyze them for suspected intrusions. An IDS can be hardware, software, or both, automating the process of intrusion detection. It captures data from the network and
notifies the network manager by mailing the intrusion event or sending a text message [5], or it analyzes the traffic from the datasets. However, the alerts generated by an IDS are not always relevant to actual intrusions, due to false negatives and false positives that affect IDS performance. Intrusion detection systems play an important role in security, as the persistent active defense layer against intruder attacks for any business or IT organization.
Intrusion Detection Datasets
The CTU-13 datasets contain huge captures of normal, background, and botnet traffic, with traces of thirteen different scenarios of running bots from 7 different families. The datasets are carefully labeled, although all traffic from infected hosts was marked as hostile. CTU-13 is among the most valuable datasets we are aware of, and we believe it should be included in recent intrusion detection research.
Intrusion Detection in Cloud
Implementing an IDS in cloud computing requires an efficient, scalable, virtualization-based approach. In cloud computing, applications and data are hosted by cloud service providers, and the cloud user has limited control over its data and resources. In this case, the cloud provider becomes responsible for administering the IDS, even though the administrator of a cloud IDS should be the user and not the provider of cloud services. In [1], the authors proposed a solution for IDS management that can integrate and combine IDS sensors and output reports on a single interface. The main attacks affecting cloud security at the network layer are IP spoofing, DNS poisoning, port scanning, Denial of Service (DoS), and Distributed DoS [10]. Traditional network security measures such as firewalls are designed to stop intruder attacks, but attacks from within the network, as well as some sophisticated outsider attacks, cannot be tackled effectively by such mechanisms [3, 10].
The role of an IDS in cloud security is very important, since it acts as an additional preventive layer of security [5]; apart from detecting known attacks, it can detect variants of many known and unknown attacks.
Snort
Snort is an open-source network IDS and prevention system. Snort scans real-time traffic and data flows in the network or in local datasets. It can detect different types of intrusions by checking packets against rules written by the developer. Snort rules can be easily read and modified. If a rule pattern matches an attack, the attack can be easily found and detected; but when the rules cannot identify a new attack, the system fails. To work around this limitation, we use Snort to analyze the real-time traffic: whenever any packet enters the network, Snort checks the network behavior [11]. Snort has some common modes of operation:
• Packet logger: logs data in a text file and logs packets to disk.
• Packet sniffer: captures packets from the network and displays them on the console.
• NIDS (network intrusion detection system): detects anomalies in computer activity by scanning network traffic [11].
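For illustration, a Snort rule has the general shape below: the rule header names the action, protocol, source and destination addresses and ports, and the options between parentheses define the message, the payload pattern to match, and the rule identifier. The addresses, content, and `sid` here are hypothetical, not taken from the rule set used in this work.

```
alert tcp any any -> 192.168.0.0/24 80 (msg:"Suspicious HTTP traffic"; content:"botnet"; sid:1000001; rev:1;)
```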
Components of Snort:
• Packet decoder: the first component; it collects packets from the different network interfaces and prepares them for preprocessing.
• Preprocessors: arrange and modify packets before they are analyzed by the detection engine.
• Detection engine: the main component of Snort. Its responsibility is to analyze all packets passing through it for signs of intrusion, using pre-defined rules.
• Logging and alerting system: upon detection by the detection engine, the activity is either logged for the network administrator or an alert is generated.
• Output modules: this last component controls the type of output produced by the logging and alerting system.
Bro
Bro can perform multi-layer analysis, behavioral monitoring, policy enforcement, policy-based intrusion detection, and logging of network activity. Bro detects intrusions by first parsing network traffic to extract its application-level semantics, and then executing event-oriented analyzers that compare the activity with patterns deemed troublesome. Its analysis includes detection of specific attacks (including attacks defined by signatures, but also those defined in terms of events) and unusual activities. Bro analyzes the traffic in three phases. First, Bro filters the traffic, discarding elements of minimal importance to its analysis. The remaining information is sent to its event engine, where Bro interprets the structure of the network packets and abstracts them into higher-level events describing the activity. Finally, Bro executes policy scripts against the stream of events, looking for activity that the rules indicate should generate alerts or actions, such as possible intrusions.
Components of Bro
Bro consists of the following major components:
• Libpcap
• Event engine
• Policy script interpreter
Comparison of Snort and Bro
Snort and Bro can be compared on parameters such as speed, signatures, flexibility, deployment, interface, and operating system support.
1. Speed: Snort has the ability to run in high-speed environments; it is very effective and able to capture data on Gbps networks, which makes it suitable for large-scale networks. Bro, in contrast, cannot run in high-speed networks without dropping packets or slowing down the traffic.
2. Signatures: When it comes to the signatures used for detecting intrusions, Bro signatures are more sophisticated than those used in Snort.
3. Flexibility: Snort is a flexible intrusion detection system that can be configured and then specialized for its intended computer network. Snort comes with
pre-written policy scripts which can be used right out of the box and can detect the most well-known attacks.
4. Deployment: Compared to Snort, which is more of a "plug and play" system, Bro is more difficult to deploy.
Based on this comparison, our model consists of implementing the Snort IDS in Microsoft Azure, where it analyzes the distributed CTU-13 datasets. The section below describes the whole process.
5 Contribution
Our work consists in creating an intrusion detection model based on the Snort IDS, which detects the intrusions recorded in the CTU-13 datasets. The implementation of our model is divided into four major steps (Fig. 1):
Fig. 1. The main idea of distributed architecture
A- Importing Database in Cloud
B- Combining Datasets
C- Configuration Snort in Cloud
D- Job Executing
Our work is based on the Microsoft Azure platform, which offers a large number of third-party services and is flexible and less expensive compared to other cloud platforms. Microsoft Azure is a cloud platform developed by Microsoft that offers several web services, often called remote computing services or cloud services. Microsoft Azure provides a huge and diverse collection of services; in this work, we use only three major services, which we describe in the corresponding steps.
A. Importing Database in Cloud
Any IDS needs an intrusion detection dataset in order to work properly and analyze different intrusion types. In our case, we use the fifth and sixth scenarios of the CTU-13 malware datasets [13], which contain a large capture of real botnets and are among the most recent intrusion detection datasets. For better performance, we import the datasets into ABS.
• Azure Blob Storage (ABS) is a storage service for Binary Large Objects (BLOBs), which can be stored publicly or privately. Azure provides five types of storage: File Storage, Disk Storage, Queue Storage, Table Storage, and Blob Storage. Blob Storage provides three tiers for storing files: hot, cool, and archive.
– Hot access tier: for files that are accessed frequently. It is expensive for storage but cheap for access.
– Cool access tier: for files that are not accessed frequently. It is inexpensive for storage but expensive for access.
– Archive access tier: for files that are accessed only once or twice in a while. They are very cheap to store, but much more expensive to access.
There are several advantages of using the ABS service in Microsoft Azure [12]:
– Permissions management: grant or deny access to external users who want to store or retrieve data. Authentication mechanisms keep data secure from unauthorized access.
– Standard interfaces: SOAP and REST interfaces, which are designed to work with any Internet-development toolkit.
Then we create a folder that contains both datasets and start the upload. In the case of huge datasets, we can use additional cloud services to improve the transfer rate and reduce the operation time.
B. Combining Datasets
In order to combine the fifth and sixth CTU scenarios, we use Big Data technology, based on the Hadoop/MapReduce framework and on our previous method (Fig. 2).
Fig. 2. Combining datasets architecture
The main advantage of combining the datasets is to improve analysis time, rather than scanning the datasets one by one, and to eliminate data redundancy. In our case we used:
• HDInsight, a managed Hadoop service using the Hortonworks Data Platform (HDP), which can be installed on Windows or Linux virtual machines. HDInsight clusters can be customized using Script Actions to add packages or automation processes. HDInsight clusters store data in Azure Storage Blobs or Azure Data Lake instead of HDFS. Unlike data stored in native HDFS, data in Azure Storage Blobs or Azure Data Lake remains even after the cluster is shut down, but it is limited to 500 TB per account. Users can run Hadoop, Spark, Storm, and other products on HDInsight; in our case we use MapReduce.
In this work, we used the combining method of [14] and adapted it to our datasets by modifying the second step of the horizontal combine.
C. Configuration Snort in Cloud
The implementation of Snort in the cloud environment can be seen in Fig. 3 below [15]. The goal is to deal with the attacks stored in the datasets. Snort is installed and configured on a virtual machine, created through Microsoft Azure as our cloud platform and running Ubuntu Server 16.04 LTS.
Fig. 3. Virtual machine integrated IDS in cloud environment.
Once the virtual machine is running, we must install four essential prerequisites (pcap, pcre, libdnet, daq) before starting the Snort installation.
• A virtual machine is one of the web services that provides secure, stable computing capability in the cloud, designed to simplify developers' access to cloud resources over the Web.
After installing and configuring Snort, we must configure the path to the datasets stored in ABS.
D. Job Executing
After installing and configuring Snort on the virtual machine, the next step is to specify the path of the already-merged intrusion detection datasets located in ABS. Then we load the Snort default rules file and run the job.
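As an illustration, once the prerequisites are installed, a Snort run over a stored capture typically takes the form below. The paths and file names are hypothetical; `-c` points to the configuration (which loads the rules file), `-r` reads a recorded capture instead of a live interface, and `-l` sets the log directory.

```
sudo snort -c /etc/snort/snort.conf -r /mnt/abs/ctu13-merged.pcap -l /var/log/snort
```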
6 Experimentation
In our experimentation we used the CTU-13 datasets, among the most recent intrusion detection datasets.
CTU-13
The CTU-13 datasets consist of a group of 13 different malware captures performed in a real network environment. Each scenario capture includes botnet, normal, and background traffic. The botnet traffic comes from the infected hosts, the normal traffic from verified normal hosts, and the background traffic is the rest of the traffic. The datasets are labeled on a flow-by-flow basis, making CTU-13 one of the largest and best-labeled botnet datasets available. In this experimentation, we used only the fifth and sixth scenarios and combined them to eliminate redundancy. It is then Snort's task to analyze the datasets. Every message displayed by Snort is based on the rules file. A rule header contains the information that defines the who, where, and what of a packet/dataset, as well as what to do when a packet with all the attributes specified in the rule shows up. The first item in a rule is the rule action; it tells Snort what to do when it catches a packet matching the rule criteria. There are three default actions available in Snort: alert, log, and pass.
1. Alert: generate an alert, then log the packet.
2. Log: log the packet.
3. Pass: ignore the packet and drop it.
Table 1 below shows the results of Snort detection on the merged datasets.
Table 1. Snort result detection
Alert: 0.173%
Logged: 0.173%
Pass: 0.051%
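The three default actions can be mimicked in a few lines of Python. This sketch is ours, not the paper's code: it walks a toy packet list, applies the first matching rule, and reports the share of alerted, logged, and passed packets in the spirit of Table 1. The packets and rules are invented purely for illustration.

```python
def apply_rules(packets, rules):
    """Return the fraction of packets assigned to each action: alert, log, or pass."""
    counts = {"alert": 0, "log": 0, "pass": 0}
    for pkt in packets:
        for match, action in rules:
            if match(pkt):          # first matching rule decides the action
                counts[action] += 1
                break
    total = len(packets)
    return {action: n / total for action, n in counts.items()}

# Hypothetical rules: alert on common C&C ports, log plain HTTP, pass the rest.
rules = [
    (lambda p: p["dst_port"] in {6667, 25}, "alert"),
    (lambda p: p["dst_port"] == 80, "log"),
    (lambda p: True, "pass"),
]
packets = [{"dst_port": 6667}, {"dst_port": 80}, {"dst_port": 443}, {"dst_port": 80}]
print(apply_rules(packets, rules))  # {'alert': 0.25, 'log': 0.5, 'pass': 0.25}
```

In real Snort, matching is of course performed by the detection engine over full packet headers and payloads, not over a single port field.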
7 Discussion
In this section, we compare the previous Snort results with the single-dataset results presented in Tables 2 and 3.
Table 2. Snort result in CTU-13 fifth scenario
Alert: 0.273%
Logged: 0.273%
Pass: 0.071%
Table 3. Snort result in CTU-13 sixth scenario
Alert: 0.233%
Logged: 0.233%
Pass: 0.091%
The alert rate using a single dataset is 0.273% for the fifth scenario and 0.233% for the sixth scenario, compared to 0.173% for our proposed system. Similarly, in the case of the sixth scenario, the pass rate is 0.091%, compared to 0.051% when both datasets are combined. There is thus a clear improvement when comparing the combined dataset with a single dataset. The improvement of our system (Table 1) can be explained by the removal of duplicate data from both databases, which accounts for the lower Alert and Logged rates compared to Tables 2 and 3; the Pass rate is also improved, which is another benefit of combining both datasets into one.
8 Conclusion
Snort is a widely used open-source NIDS, supporting multiple operating system environments. In the context of security, there is still a long way to go. In this paper, we worked in a cloud environment and proposed an architecture for Snort on the Microsoft Azure platform. The advantage of our system is that this model can be used as a service to detect malicious attacks stored in distributed datasets, which improves security in the cloud environment. In the next step, we will combine heterogeneous intrusion detection datasets into one dataset in order to maximize the detection rate, obtain more efficient detection performance, and lower the false alert rate. We will also use other IDSs and compare their performance, and we aim to handle real-time intrusion detection. Moreover, we will use our architecture for intrusion prevention, in order to build a complete intrusion detection and prevention system based on the cloud platform.
References
1. Sebastian, R., Feng, C., Meinel, C.: Intrusion detection in the cloud. In: Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing (2009)
2. Lo, C.C., Huang, C.C., Ku, J.: A cooperative intrusion detection system framework for cloud computing networks. In: 39th International Conference on Parallel Processing Workshops, pp. 280–284 (2010)
3. Modi, C., Patel, D., Borisaniya, B., Patel, H., Patel, A., Rajarajan, M.: A survey of intrusion detection techniques in cloud. J. Netw. Comput. Appl. 36, 42–57 (2013)
4. Quick, R.: 5 reasons enterprises are frightened of the cloud (2013). http://thenextweb.com/insider/2013/09/11/5-reasons-enterprises-are-frightened-of-the-cloud
5. Mell, P., Grance, T.: The NIST definition of cloud computing. National Institute of Standards and Technology Special Publication 800-145 (2011)
6. Lublinsky, B., Smith, K.T., Yakubovich, A.: Professional Hadoop Solutions (2013)
7. Thilina, G.: Hadoop MapReduce v2 Cookbook, 2nd edn. Packt Publishing Ltd. (2015)
8. El Ayni, M., Jemili, F.: Using MongoDB databases for training and combining intrusion detection datasets. In: International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, vol. 721, pp. 17–29 (2017)
9. Bace, R., Mell, P.: Intrusion Detection Systems. National Institute of Standards and Technology (NIST), Technical Report 800-31 (2001)
10. Modi, C.N., Patel, D.R., Patel, A., Muttukrishnan, R.: Bayesian classifier and snort based network intrusion detection system in cloud computing. In: Third International Conference on Computing, Communication and Networking Technologies, 26–28 July 2012
11. Dhage, S.N., Meshram, B.B., Rawat, R.: Intrusion detection system in cloud computing environment. In: International Conference and Workshop on Emerging Trends in Technology, TCET, Mumbai, India (2011)
12.
Hafsa, M., Jemili, F.: Comparative study between big data analysis techniques in intrusion detection. Big Data Cogn. Comput. 3, 1 (2019)
13. Garcia, S., Grill, M., Stiborek, H., Zunino, A.: An empirical comparison of botnet detection methods. Comput. Secur. J. 45, 100–123 (2014). http://dx.doi.org/10.1016/j.cose.2014.05.01
14. Essid, M., Jemili, F.: Combining intrusion detection datasets using MapReduce. In: Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016
15. Modia, C.N., Dhiren, R.P., Avi, P., Muttukrishnan, R.: Integrating signature apriori based Network Intrusion Detection System (NIDS) in cloud computing. In: 2nd International Conference on Communication, Computing and Security (ICCCS-2012), pp. 905–912 (2012)
Turing-Style Test Approach for Verification and Validation of Unmanned Aerial Vehicles' Intelligence
Marwa Brichni(B) and Said El Gattoufi
Higher Institute of Management, SMART Laboratory Tunis, University of Tunis, Tunis, Tunisia [email protected], [email protected]
Abstract. The dynamic aspect of unmanned aerial vehicles requires adopting non-conventional methods for V&V, due to their changing context. Special attention has been directed to these systems because their testing methods are not standardized, as they are for regular systems. We believe that applying verification and validation to complex phenomena like intelligence or autonomy, besides systems, might bring a standardization proposition to the software engineering community. Traditional V&V techniques ask whether systems behave correctly, whereas our proposition asks whether self-adaptive systems learn to behave correctly, or whether their behavior evolves correctly as they learn. This paper's purpose is to check the possibility of establishing a V&V process for "intelligence", in order to understand how to evaluate such properties along with systems. The choice of this concept is part of a proposal aiming to make explicit the projection of human intelligence onto machines, in order to reach human-level machine intelligence. Keywords: Machine intelligence · Human intelligence quotient · Human level machine intelligence · Turing test · Verification and validation
1 Introduction
Intelligent systems assessment is increasingly gaining importance, as it reflects the evolution of artificial intelligence. Distinct forms of assessment are illustrated in the literature, either as metrics [1], tests [2], or tests combined with an aggregation value as a metric [3]. Hence, the diversity of propositions for evaluating intelligence has received much attention over the last two decades, especially for unmanned aerial vehicles. Our work calls into question the applicability of verification and validation to intelligence, as an updated assessment process for properties, as long as black-box tests are retained to judge apparent behavior. A growing body of literature has examined verification and validation of systems [4,5], software [6], and simulation models [7], but none has questioned the feasibility of
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 112–121, 2021. https://doi.org/10.1007/978-3-030-49342-4_11
this process for properties. They draw our attention to the reason non-conventional methods cannot be standardized: they entail debatable reliability. The changing context characteristic of adaptive systems, also called intelligent systems, depends on key properties such as autonomy and performance, wrapped together to form intelligence. We believe that, with regard to this clarification, the utility of a verification and validation process for these properties is fully justified. This paper offers a V&V approach for assessing intelligence. For this proposition, we adopted a rapprochement to the human IQ measure, based on our expectations and on the ability of the control system to make corrections, should any uncertainty occur, in order to meet them. In other terms, it is an evaluation of performance joined with an estimation of autonomy. Existing models of intelligence are significantly tied to systems' specifications, which might not be suitable for such a generic concept. Thus, we expressed the need for an overall model, such as the abstraction ladder, that breaks any complex phenomenon into ladders in order to move from abstraction to substantiation. We modified this model by adding a test level that incorporates other dimensions for an accurate measurement. Our paper is organized as follows: the first part presents a general view of the assessment of the pre-human-level machine intelligence concept, and the second part explains how the chosen intelligence model is used to transpose the concept's key dimensions, namely performance and autonomy, into scalar quantities that improve the quantitative measure of intelligence.
2 Background
In an earlier work [8], we explained that flight uncertainties are usually handled by pilots, who are trained for emergencies; nevertheless, pilots can also be the source of an emergency. As a solution, UAV constructors proposed autopilots trained for these situations, similarly to humans, to reduce dependency on the pilot, since pilots may cause errors through stress, lack of knowledge, or simply loss of consciousness. Unfortunately, most flight control systems only perform tasks in known situations. This is the main reason to construct automatic intelligent autopilots that deal with unpredictable circumstances. The intelligent aspect of these autopilots should preferably be measured, in order to increase trust in the controller as much as in the vehicle. We believe that intelligence shows up when an uncertain situation happens and the vehicle nonetheless performs the current task successfully. One of the limitations of intelligent autopilots is that they do not always meet expectations set according to task complexity, since they cannot anticipate all flight malfunctions. To predict these limitations and cope with them, we need to assess the UAV's intelligence by defining a testing process based on what is expected to happen and what really happens.
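The rapprochement to William Stern's IQ can be written as a simple ratio: Stern divided mental age by chronological age and multiplied by 100; analogously, one can divide the performance the controller actually achieves under uncertainty by the performance expected for the task. The sketch below is only an illustration of this idea with invented numbers, not the metric's final form, which aggregates further dimensions later in the paper.

```python
def machine_iq(observed: float, expected: float) -> float:
    """Quotient of observed over expected task performance, scaled like Stern's IQ."""
    if expected <= 0:
        raise ValueError("expected performance must be positive")
    return 100.0 * observed / expected

# A controller that fully meets expectations scores 100;
# one that recovers only part of the expected performance scores below it.
print(machine_iq(1.0, 1.0))  # 100.0
print(machine_iq(0.9, 1.0))
```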
The basic idea is to measure the intelligence of the UAV controller [8]. To that end, the approach detailed in the next sections will be implemented in a component added to a software control system and used as a metric for intelligence within a well-defined testing approach.
3 About Evaluating the Pre-human-level Machine Intelligence
As stated in the introduction, intelligent systems assessment is increasingly gaining importance, as it measures the evolution of artificial intelligence. We are interested in human-level machine intelligence, since it is the current evolution target. It is natural to define human-level machine intelligence as how a system reasons and reacts in an uncertain situation, similarly to a human. The prerequisite for possessing intelligence is the ability of intelligent systems to adjust their own controls based on the information sensed from the source of disturbance, i.e., uncertainty. Thus, corrected controls show how intelligent a system is, compared to other systems given the same situation, but also compared to the action a human would take. As a reference in this section, we take the recent review [9] covering the most interesting work related to machine intelligence measurement. Measuring machine intelligence is considered of great importance in the development of artificial intelligence. Even though machine intelligence is not yet well defined, the authors believe that it can be measured. They also deduced that the lack of a universal definition of machine intelligence is not as important as the assessment itself, whereas several works found it unworkable to define a measure for a concept that is not well defined. The authors' conclusion from the survey is that there is no universal opinion about what machine intelligence is, and no standard measure for it. They believe that evaluating a system's intelligence should be based on metrics or tests. The measure of intelligence is commonly called the machine intelligence quotient. We share most of the claims of Iantovics et al. [9], and we rely on their review, which emphasizes the absence of works that evaluate intelligence as a concept to be compared with.
Although the article raises a very important point, namely the criticism of using human IQ test scores [10], there is great support for deriving measurements from human intelligence, as it is the quintessential example of intelligence [11]. In the same context, we should mention that several properties emerge from machine intelligence when it adopts a human approach or expresses a sort of anthropocentric orientation, such as human-like machine intelligence, human-equivalent machine intelligence, and human-level machine intelligence. Indeed, the idea outlined in [9] states that the difficulty encountered in defining the machine intelligence concept does not preclude its measurement; by analogy, human intelligence is not yet precisely defined, yet there are intelligence tests to measure it. We believe that this hypothesis does
Verification and Validation of Unmanned Aerial Vehicles’ Intelligence
115
not imply that human intelligence tests are applicable to machines, nor deprive human IQ forms of their validity for machines. Based on the literature review [9], there is no universal view of what an intelligence metric should measure when a test exhibits a form of intelligence. Thus, we consider measuring machine intelligence, or human-level machine intelligence, an open research direction in which we can always improve the existing propositions and invent new ones.
4 Verification and Validation Approach Transformation from Products to Properties
The aim is to present a method for checking whether a V&V (verification and validation) process can be established for properties in the same way as for products. In this section we express the need for an overall model, such as the abstraction ladder, that decomposes any complex phenomenon into ladder steps in order to move from abstraction to substantiation. Using a standard model like the abstraction ladder [12] is very useful for setting the key properties contributing to the definition of a complex phenomenon, but it can also be confusing when it offers many dimension axes, which leads to divergent results. In this section, we specify the use of the abstraction ladder for intelligent systems and check the extracted dimensions against the dimensions proposed by more specialized models for such systems.

4.1 Abstraction Ladder Adoption for the “Phenomenon” of Human-Level Machine Intelligence
To characterize the machine intelligence concept, we can refer to [12], where four types of constructs are extracted. These constructs are mainly used to establish a non-conventional model that can be adopted by any system. This idea results from the observation that no single intelligent model can be used for all smart systems, since adaptation is not an easy task. With reference to capturing critical aspects of complex phenomena within the abstraction ladder, we consider intelligence a complex phenomenon and, among many possible constructs, we chose control correction as an intelligence feature that is not programmable because of its infinite outcomes while adjusting the input. Down the ladder in Fig. 1 we define the construct's dimensions, their variables and the indicators corresponding to each of them, to which we can associate a quantifiable measure. Finally, the aggregate of these measures is a new scalar that improves the primary quotient of intelligence based on expectations and facts, inspired by William Stern's human IQ. The steps of the model follow the order below: 1. Define the construct of control correction: We strongly believe that intelligence refers neither to autonomy, since it can be programmed, nor to performance, because it can be assisted. Thus, we adopted intelligent control correction as a construct of intelligence because it relies on autonomy and performance at the same time.
116
M. Brichni and S. E. Gattoufi
Fig. 1. The modified abstraction ladder
2. Define the construct dimensions: A minimal sufficient number of dimensions is required to set the generic model that we should adopt in order to obtain only one MIQ (machine intelligence quotient) per construct. The dimensions extracted from the definition of machine intelligence are autonomy and robust performance. Robust performance: performance is the action or reaction generated by the system to reach the desired goal. Robust performance is confirmed when the current situation involves external elements that may change the desired performance and the result is nevertheless successful. Autonomy: the main dimension for judging a system's intelligence, but not sufficient on its own to confirm it. 3. Define the dimensions' variables: In this step, depending on the system whose intelligence we aim to measure, we define variables for each dimension. These variables are captured through the dimensions' critical aspects, such as environment complexity (the robustness aspect of the performance dimension), mission complexity (the executive side of the performance dimension) and human intervention (the autonomy dimension). 4. Define the variables' indicators: Indicators are specific to the intelligent system under evaluation. They may be either directly measurable, such as the tracking error, which is the difference between the desired result and the current one while applying the control effort, or indirectly measured using assumptions, such as activating the wind force or a randomly selected percentage of human intervention.
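As an illustration of steps 2–4, the aggregation of normalized indicator measures into a single MIQ scalar can be sketched as follows. The indicator names, normalization to [0, 1] and weights are our own assumptions for illustration; the model does not fix a concrete weighting scheme.

```python
# Illustrative sketch of the abstraction-ladder aggregation (hypothetical
# indicator names and weights; not the paper's concrete scheme).

def aggregate_miq(indicators, weights):
    """Combine normalized indicator measures (0..1) into one MIQ scalar (0..100)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return 100.0 * sum(weights[name] * value for name, value in indicators.items())

# Indicators grouped under the two dimensions (robust performance, autonomy):
indicators = {
    "tracking_error": 0.8,            # 1 - normalized error under control effort
    "wind_disturbance_success": 0.7,  # success rate with wind force activated
    "human_intervention": 0.9,        # 1 - fraction of time a human intervened
}
weights = {"tracking_error": 0.4,
           "wind_disturbance_success": 0.3,
           "human_intervention": 0.3}

print(round(aggregate_miq(indicators, weights), 1))  # 80.0, a single scalar MIQ
```

Because the weights are user-chosen, two evaluators can obtain different MIQs from the same indicators, which is exactly the divergence problem discussed below.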
If we consider the abstraction ladder as it is, we may obtain different MIQs for a single phenomenon, since the steps leading to the indicators' measures follow a standard model while the integration of the measurements can be done differently from one user to another. Across the literature we encountered a few works on autonomous intelligent systems adopting other V&V models [6]. These models can give more accurate results if combined with the abstraction ladder, each helping the other to define a general and detailed picture of what could represent intelligence.

4.2 Verification and Validation Model for Testing Intelligent Autonomous Machines
Cooper-Harper Rating Scale (CHRS) Model for Unmanned Systems' Validation. Apart from the abstraction ladder proposed by Bien, we surveyed other specific models used to validate unmanned aerial vehicles. In fact, traditional V&V methods are not suitable for adaptive systems. Thus, the Cooper-Harper rating scale has been adapted to some control systems to validate aircraft handling qualities as a V&V method. The Cooper-Harper Scale (CHS) is the first subjective rating technique taught to student test pilots and test engineers acting as observers to estimate aircraft handling qualities, combining a scaled level with a decision-tree approach. The scale groups four categories describing the workload and the need for change: Uncontrollable, Unacceptable, Unsatisfactory but Tolerable, and Satisfactory. The idea suggested in [6] to ensure software safety and reliability is to use the Cooper-Harper aircraft handling qualities rating to verify and validate control systems. The results of these processes help to improve the controller's performance by using user feedback to tune the controller. The controller is the system's entity responsible for the control correction needed to reach a desired value. The challenge in [6] is to rely on the V&V results as feedback to tune the controller software adaptively, instead of using a fixed value that does not ensure sufficient control. The proposition is to provide the software in charge of adaptive tuning so that it computes the correct amount of control required from Cooper-Harper ratings. The adjusted control is the expected adaptation to changes in the operating environment. Larry Young suggested in his work [3] setting scoring criteria based on quantitative metrics of mission success together with handling quality metrics such as the Cooper-Harper rating scale initially used for manned aircraft.
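The decision-tree logic of the scale can be sketched as below. This is a simplified sketch: the real scale reaches one of ten levels through pilot judgement, and only the four category groups named above are modeled here.

```python
# Simplified sketch of the Cooper-Harper decision-tree logic. The three
# yes/no gates and the rating ranges mirror the four categories in the text;
# actual ratings within a range come from pilot judgement.

def cooper_harper_category(controllable, adequate_performance, satisfactory):
    if not controllable:
        return "Uncontrollable", (10, 10)               # improvement mandatory
    if not adequate_performance:
        return "Unacceptable", (7, 9)                   # deficiencies require improvement
    if not satisfactory:
        return "Unsatisfactory but Tolerable", (4, 6)   # improvement warranted
    return "Satisfactory", (1, 3)                       # no improvement needed

print(cooper_harper_category(True, True, False)[0])
```

The single-rate structure makes the problems noted next concrete: one number must stand in for satisfaction, acceptability and performance at once.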
Autonomous aerial vehicles are considered to pass the UAV Turing test only if the aggregation of success metrics and the handling qualities match or exceed the equivalent metrics obtained by human pilots. The idea behind this proposition is based on the imitation-game principle of the Turing test. One of the detected issues is that the CHRS calls for a single rate covering three dimensions: satisfaction and the need for change, acceptability through workload, and performance. Another problem is how the scaling confounds the interpretation of medians falling between the rates (3, 4), (6, 7) and (9, 10). Thus, the quality of
measuring with the conventional method deteriorates because of these issues. Despite these problems, the Cooper-Harper rating scale is still used as the reference method, since it forces estimation toward a consensus thanks to the monotonic relationship of the handling-quality dimensions, preventing ambiguities in rating. The focus is put on perception, cognition and communication to validate the MCH (Modified Cooper-Harper) scale, since the old dimensions' terminology is not sufficient for the modified scale. Although results confirmed the efficiency of MCH, it still carries the interpretive difficulties of the CHRS and considers high workload as the reason for change. The Bedford Workload Scale (BWS) is based on the same structure and idea as the CHRS, together with the concept of “spare capacity”, which helps to define levels of workload. This new scale is easy to use without referring to the decision tree and permits assigning rates of 3.5 or 8.5 when necessary, an advantage over the CHRS. Testing methodologies developed for manned machines constitute the test misconception carried over to unmanned systems: it is stated that the only difference between driving a manned and an unmanned vehicle is the position of the operator, neglecting the operator's role in the decisions made on board.

4.3 Abstraction Ladder Modification: Test Level Involvement
Adoption of the HLMI Model for Autonomy Assessment: HLMI is the abbreviation of “human-level machine intelligence”, a model that we developed to assess autonomous vehicles [8]. The model's three axes are inspired by the work in [12] and represent mission complexity, environment complexity and human intervention, following the logic of the CHRS model but applied to autonomy instead of handling qualities. Adopting models designed for manned vehicles is not effective for an unmanned vehicle because of uncertainties; we believe exclusive models are needed for autonomous intelligent vehicles. Another reason to object to using models for manned vehicles is that they rely on performance, which does not by itself reflect intelligence. Hence, we look for models that ensure both autonomy and robust performance, as these together characterize intelligence. We adopted the fuzzification of the HLMI model, which transforms subjective rates into objective ones, and compared them to the results given by the fuzzification of the CHRS proposed in [6]. Using the HLMI model can remedy the technical flaw of the calculated CHRS rates. Applicability of a Turing-Style Test for Performance Assessment: From our perspective, the abstraction ladder model can lead to numerous measures for a single phenomenon, which makes the choice of the most accurate measure confusing. This is why we proposed introducing a test level into the ladder to specify how indicators can constitute valid, unique measures of intelligence via other models of intelligent systems, according to the selected dimensions. From a larger perspective, traditional verification technology is not applicable to adaptive systems, given the gap in using conventional tests for the
system with high uncertainty, which creates the need for new techniques to overcome this issue. Since the difficulty of testing decision-making systems arises when uncertainties are unpredictable, we introduce the uncertainty aspect within the proposed model. Thus, even though operating in a dynamic environment without human supervision is always non-deterministic, the system improves its correction thanks to added adaptive components, compared to its ordinary reaction when there are no challenges. The original Turing test is a boolean question, which cannot be applied to measure intelligence. The conceivable solution is to convert it from a discrete to a continuous metric: from observation only, or from checking whether flying criteria are judged as piloted or unmanned [3], to a more sophisticated value such as an aggregation of missions' success rates. The aim of the approach defined in Fig. 1 is to combine the Turing test principle with Searle's Chinese room argument to produce a test that gives an identity to the observed behavior rather than blindly attributing it to a human. The purpose is also to support Searle's argument by introducing uncertainty into the Turing test. An example of this was invoked in [3], where aircraft with successful results under disturbances were judged as piloted whereas those jets were in fact unmanned, which endorses the alleged intelligence. A Human-Based IQ for Machines: Intelligence looms when a vehicle performs the current task successfully in an uncertain situation. Automatic intelligent autopilots use components implementing artificial neural networks to classify the situation (certain or uncertain) that the UAV faces during the flight. The intelligent aspect of these autopilots should preferably be measured, to increase trust in the controller as much as in the vehicle.
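The conversion of the boolean Turing verdict into a continuous metric, as suggested above, can be sketched as an aggregation over missions. The mission list and the split between disturbed and undisturbed flights are invented for illustration.

```python
# Sketch of aggregating per-mission pass/fail verdicts into continuous
# success rates, separating missions flown under disturbance (uncertainty)
# from ordinary ones. Mission data is made up.

missions = [  # (succeeded, disturbance_active)
    (True, True), (True, False), (False, True), (True, True), (True, False),
]

overall = sum(ok for ok, _ in missions) / len(missions)
under_uncertainty = [ok for ok, disturbed in missions if disturbed]
uncertainty_rate = sum(under_uncertainty) / len(under_uncertainty)

print(overall, round(uncertainty_rate, 2))  # 0.8 0.67
```

A vehicle scoring well on `uncertainty_rate`, not only on `overall`, is the one whose behavior the cited example [3] would have misjudged as piloted.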
One of the limitations of intelligent autopilots is that they do not always meet expectations set according to task complexity, since they cannot anticipate all flight malfunctions [8]. We cope with this limitation by assessing the UAV's intelligence through a testing process based on what is expected and what actually happens. The Turing-style test is implemented through intelligence verification levels defined in each incremental intelligence test, to determine whether an adaptive system is a task-oriented machine [13] or an uncertainty-oriented machine when it passes critical situations. Consequently, the intelligence validation phase determines automatically whether the vehicle exhibits simple machine intelligence or aims towards a human-level machine intelligence. At this point, the uncertainty-oriented aspect of intelligence is apparent if the validation value exceeds the verification standard value; if the verification and validation values match, then the vehicle relies solely on machine intelligence. The previous details depict the similarity between the intelligence-verification level and the chronological human age on one side, and the intelligence-validation phase and the mental human age on the other. This analogy is based on the mathematical model designed via the equation of mental and
chronological intelligence for a human, which informs about performance. The measured value for intelligence in the validation phase is calculated as the performance value returned by the test, tuned with the HLMI rate. The idea behind tuning the performance value returned by the HLMI test rests not only on the fact that it tests performance but also on our perception that the human IQ alone is not reliable. It is necessary to redefine the IQ formula to take into consideration the desired value and the specifications of the subjects, which we can call the dimensions of their intelligence. Suggesting a new definition of the human IQ is quite a complex task, since it involves many aspects such as the subject's background, mental state, personality, education, general state, feelings, etc. When conducted for machines, these dimensions are not as numerous as for a human being, and we can select only the most relevant ones depending on the context in which the machine performs; these are introduced by the HLMI model.
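The Stern-style analogy above can be sketched numerically: the verification level plays the role of chronological age, the validation phase plays the mental age, and the ratio is tuned by the HLMI rate. All numeric values here are hypothetical.

```python
# Hedged sketch of the Stern IQ analogy described in the text:
# quotient = 100 * (validation phase / verification level) * HLMI rate.
# A result above 100 suggests uncertainty-oriented intelligence; equality
# suggests plain machine intelligence.

def machine_iq(validation_phase, verification_level, hlmi_rate=1.0):
    return 100.0 * (validation_phase / verification_level) * hlmi_rate

print(machine_iq(6, 5))  # validation exceeds verification -> 120.0
print(machine_iq(5, 5))  # values match -> 100.0
```
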
5 Conclusion
The present work involves a modification of the abstraction ladder known for evaluating complex phenomena. The modification consists of introducing a test level to ensure the reliability of the conducted assessment. The new test level introduces V&V assessment models for intelligent systems, marking the smooth transition from a complex concept to a system feature. Evaluating machine intelligence using a human IQ and a Turing-style test will: match the human-level machine intelligence quotient as long as it sustains the anthropocentric side of the original IQ; validate the use of incremental testing for all types of machines; and build trust in machines' decision-making process.
References
1. Liu, F., Shi, Y., Liu, Y.: Intelligence quotient and intelligence grade of artificial intelligence. Ann. Data Sci. 4(2), 179–191 (2017)
2. Maguire, P., Moser, P., Maguire, R.: A clarification on Turing's test and its implications for machine intelligence. In: Proceedings of the 11th International Conference on Cognitive Science, pp. 318–323 (2015)
3. Young, L.A.: Feasibility of Turing-style tests for autonomous aerial vehicle intelligence. In: AHS International Specialists' Meeting on Unmanned Rotorcraft, Chandler, AZ, 23–25 January 2007 (2007)
4. Mili, A., Cukic, B., Liu, Y., Ayed, R.B.: Towards the verification and validation of online learning adaptive systems. In: Khoshgoftaar, T.M. (ed.) Software Engineering with Computational Intelligence. The Springer International Series in Engineering and Computer Science, vol. 731. Springer, Boston (2003)
5. Mili, A., Jiang, G., Cukic, B., Liu, Y., Ayed, R.B.: Towards the verification and validation of online learning systems: general framework and applications. In: Proceedings of the 37th Annual Hawaii International Conference on System Sciences 2004, p. 10 (2004)
6. Pham, T.-A.: Validation and verification of aircraft control software for control improvement. Master's thesis, Department of Computer Science, San Jose State University (2007)
7. Sargent, R.G.: Verification and validation of simulation models. J. Simul. 7(1), 12–24 (2013)
8. Brichni, M., Gattoufi, S.: HLMIQ of aircraft control software for control intelligence measurement. In: 2018 5th International Conference on Control, Decision and Information Technologies (CoDIT), pp. 228–232. IEEE (2018)
9. Iantovics, L.B., Gligor, A., Niazi, M.A., Biro, A.I., Szilagyi, S.M., Tokody, D.: Review of recent trends in measuring the computing systems intelligence. BRAIN Broad Res. Artif. Intell. Neurosci. 9(2), 77–94 (2018)
10. Dowe, D.L., Hernández-Orallo, J.: IQ tests are not for machines, yet (2012)
11. Besold, T., Hernández-Orallo, J., Schmid, U.: Can machine intelligence be measured in the same way as human intelligence? KI-Künstliche Intelligenz 29(3), 291–297 (2015)
12. Bien, Z., Bang, W.-C., Kim, D.-Y., Han, J.-S.: Machine intelligence quotient: its measurements and applications. Fuzzy Sets Syst. 127(1), 3–16 (2002)
13. Hernández-Orallo, J.: Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement. Artif. Intell. Rev. 48(3), 397–447 (2017)
Big Data Processing for Intrusion Detection System Context: A Review

Marwa Elayni, Farah Jemili, Ouajdi Korbaa, and Basel Solaiman

1 ISITCom, MARS Research Laboratory, LR17ES05, Université de Sousse, Sousse, Tunisia
[email protected], [email protected], [email protected]
2 ITI Laboratory, IMT Atlantique, 29238 Brest, France
[email protected]
Abstract. The rapid growth of data, the increasing number of network-based applications, and the advent of omnipresent internet and connected devices have raised the importance of information security. Hence, a security system such as an Intrusion Detection System (IDS) becomes a fundamental requirement. However, the complexity and huge size of the generated data, together with the variety of cyber-attacks on network traffic, wireless network traffic, worldwide network traffic, connected devices and 5G communication media, hinder the IDS's efficiency. Dealing with this huge amount of traffic is challenging and requires deploying new big data security solutions. This paper proposes an overview of intrusion detection, offering a review of IDSs that deploy big data technologies and providing interesting recommendations for further study. Keywords: Intrusion detection · Big data · Big heterogeneous data · Fusion data · Machine learning · Big processing tools · Real time processing tools
1 Introduction
The security of computer systems is a sensitive problem that requires the implementation of several security elements such as firewalls, honeypots and intrusion detection systems. The latter is one of the most efficient security systems, as it helps to detect any possible threat occurring in the network. Due to the rapid growth of data and the advent of the omnipresent internet, a number of studies have moved towards new paradigm architectures such as big data [1] and cloud computing [2]. Moreover, the existing traffic, characterized by big heterogeneous data such as host logs, application logs, network traffic and wireless network traffic, is generated by the rapid growth of various emerging technologies such as sensors, connected devices, smart home appliances, smart cities and 5G communication media [3]. In this context, a CISCO network forecast report announced that worldwide network traffic was 96 EB/month in 2016 and is expected to reach 278 EB/month in 2021 [4].
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 122–131, 2021. https://doi.org/10.1007/978-3-030-49342-4_12
This statistic
Big Data Processing for Intrusion Detection System Context
123
sends alarming signals. Thus, it is necessary to have a vision of incoming attacks in order to detect, predict and prevent them in real time. Similarly, big heterogeneous data pose even bigger challenges, and detecting any malicious event in real and near-real time is difficult. The current monitoring tools are not able to handle such amounts of data, due to the huge volume, velocity and variety of the data received for analysis. Hence, it is crucial to produce new big data security solutions able to handle and analyze large volumes of data and to monitor all access to sensitive data in real time. Fortunately, the use of big data technologies in IDSs offers the possibility to deal with any data in real time and to produce more robust software that can work with big heterogeneous data without causing errors or crashing. Numerous surveys are oriented towards big data processing for intrusion detection [3, 5–8]. In a recent survey [3], the authors focused on the existing literature deploying real-time big data processing for anomaly detection and classified it by big data technology, anomaly detection, and machine learning technique. The originality of our paper is that we focus on big data processing approaches used for intrusion detection: the major focus is on the whole intrusion detection process and not only on anomaly detection as in [3]. The main idea of our paper is to give a global view of the intrusion detection process, helping new researchers decide at which level they want to intervene. We also focus on the deployment of big data technologies in intrusion detection, which are classified into several kinds: data storage, streams, data analysis and real-time processing. This work is structured as follows. Sect. 2 presents an overview that explains each step of intrusion detection.
It also identifies the most important big data tools used in intrusion detection, covering data storage, streams and data analysis. Sect. 3 introduces the state of the art of big data processing for intrusion detection. Sect. 4 investigates and compares big data approaches based on the proposed taxonomies. Sect. 5 presents interesting implications for further investigation. Finally, Sect. 6 provides the conclusions of our study.
2 Overview
In this section, an overview of intrusion detection systems based on deploying big data technologies is introduced. We focus on the IDS process in a big data context. Furthermore, we present three major families of big data processing tools: data storage tools, stream tools and data analysis tools.
2.1 Intrusion Detection Process
According to [9–11], IDS involves data collection, preprocessing, feature selection, implicit correlation and detection. The implicit correlation approach aims to aggregate and group large sets of alerts into one. However, this method does not improve the semantics of the alerts, as it is deployed during the detection step to group alerts of the same family into one class [10, 12].
124
M. Elayni et al.
In this study, we focus on IDS in the context of big data. We propose a taxonomy that classifies the current IDS into four major steps: data collection, preprocessing, feature selection, and detection methods.
Data Collection and Datasets. Two types of components are used to capture and collect traffic: agents and sensors. The collected traffic is highly variable and heterogeneous. Liao et al. [12] provided a comprehensive semantic review presenting the data types of intrusion detection traffic. According to them, the traffic can contain any data instances and may include host logs, application logs, wireless networks and network traffic (network packets, SNMP info). However, to date, the majority of works in the literature are based on two famous datasets (KDD, DARPA). In the same context, Canali et al. [13] proposed a new dataset whose traffic is collected from several website contents on the internet. In [14] a dataset was created by collecting user behavior for each application during an observed time interval. Similarly, Moustafa et al. [15] used a generator tool to create their dataset, namely UNSW-NB15.
Preprocessing. The main objective of the preprocessing step is to improve the quality of information. In the literature, preprocessing is managed by three methods. The first is the elimination of redundancies, which removes any redundant data or data carrying no information. The second is the elimination of missing values; thanks to big data processing tools, we can easily eliminate uncertain and incomplete information [16]. The last one is fusion, which combines multiple datasets from heterogeneous sources [17]. In this direction, numerous recent studies [17, 18] focused on the information fusion approach for combining big heterogeneous traffic.
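The fusion step above can be illustrated with a minimal sketch: two heterogeneous sources are mapped onto a common schema and merged. Field names and records are invented; real fusion of datasets such as KDD and UNSW-NB15 requires far richer schema alignment.

```python
# Toy sketch of information fusion for heterogeneous traffic sources:
# rename source-specific fields to one shared schema, then concatenate.

netflow = [{"sip": "10.0.0.1", "octets": 1200, "attack": 0}]
hostlog = [{"source_ip": "10.0.0.9", "bytes_sent": 9000, "label": 1}]

def to_common(record, mapping):
    """Map source-specific field names onto the shared schema."""
    return {common: record[local] for local, common in mapping.items()}

fused = (
    [to_common(r, {"sip": "src_ip", "octets": "bytes", "attack": "label"})
     for r in netflow]
    + [to_common(r, {"source_ip": "src_ip", "bytes_sent": "bytes", "label": "label"})
       for r in hostlog]
)
print(len(fused), sorted(fused[0]))  # 2 ['bytes', 'label', 'src_ip']
```
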
Feature Selection. Feature selection is based on wrapper methods and filter methods. The former interact closely with the classifier and can record feature relations to evaluate feature quality (high computational cost) [9, 10, 19]. The latter use the characteristics of the features themselves to filter out irrelevant and redundant ones; filters are independent of the classifier (low computational cost) [9, 19]. To this end, Ravi Kiran Varma et al. [19] presented a taxonomy of feature selection techniques used in intrusion detection, covering techniques such as soft computing, rough sets, fuzzy rough sets and ant colony optimization.
Detection Method. In this part, we start with the strategy of the detection method. We can distinguish three methods of detection: misuse, anomaly, and hybrid, which combines the first two [9–12, 20]. • The misuse method uses specifically known unauthorized behavior patterns, called signatures, to predict and detect subsequent similar attempts. In this case, labeled datasets are required and supervised algorithms must consequently be used.
• The anomaly method is designed to discover abnormal behaviors. The IDS establishes a baseline of normal usage patterns; any incident occurring at a frequency two or more standard deviations away from the statistical norm is flagged and reported as a possible intrusion. For anomaly detection, an unlabeled dataset is used and unsupervised algorithms are consequently required. Several surveys have classified current IDSs based on the strategy of the detection method (anomaly/misuse) and ignored the modeling part. In the following, we outline the models used in IDS. The most used models are machine learning, data mining and deep learning, described below: 1. Machine learning focuses on building a system that can optimize its performance in a loop cycle and modify its execution strategy based on feedback information [10]. Several machine learning libraries are available for learning from a big volume of data, such as the Spark Machine Learning Library and the Mahout machine learning library under Hadoop. 2. Data mining can improve the intrusion detection process by revealing important patterns, associations, anomalies and events in the data. In this direction, Shanbhogue et al. [21] provided a survey of machine learning and data mining algorithms used in IDS, also presenting the difference between the two models through the major steps of each. 3. Deep learning offers good performance for IDS in the big data context, since it is suited to large amounts of complicated data. According to [22], the most used models in IDS are autoencoders, Convolutional Neural Networks (CNN), Restricted Boltzmann Machines (RBM) and Recurrent Neural Networks (RNN). Currently, numerous studies are oriented towards streaming learning classifiers, using machine learning and deep learning to analyze the traffic in real time [23].
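The two-standard-deviation anomaly rule described above can be sketched in a few lines (the baseline event frequencies are made up for illustration):

```python
# Toy illustration of baseline anomaly detection: flag any observed event
# frequency that departs from the baseline mean by >= 2 standard deviations.

from statistics import mean, stdev

baseline = [100, 104, 98, 102, 96, 101, 99, 100]  # normal requests/minute
mu, sigma = mean(baseline), stdev(baseline)

def is_anomalous(observed):
    return abs(observed - mu) >= 2 * sigma

print(is_anomalous(150), is_anomalous(101))  # True False
```

In a real IDS the baseline would be learned per feature (e.g. per service or per host) and updated continuously rather than fixed.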
2.2 Big Data Processing Tools
In this section, we discuss the big data tools used for intrusion detection. These tools have been deployed to offer new distributed architectures. Some tools are used for data analysis while others serve data storage and streams. • Big data storage: big data storage tools are mainly used for storing data in long-term and/or short-term storage. The Hadoop Distributed File System (HDFS) is the basic framework of Hadoop, used for storing files in record time [24]. Similarly, NoSQL databases support unstructured data, and several of them are useful for intrusion detection, such as MongoDB, Cassandra, HBase, Elasticsearch and Neo4j [18]. Cloud computing is also used for intrusion detection, to store and access data via the internet rather than on the local machine [16]. • Stream frameworks: these tools are dedicated to collecting and handling high volumes of data streams in real time, for example Apache Kafka and Apache Flume [25].
• Big data analysis: big data analysis tools can rapidly process large volumes of data (e.g. Hadoop), handle huge amounts of data in real time (e.g. Spark, Storm) and process graph data (e.g. GraphX in Spark). In addition, other analysis tools support both batch processing and data streaming programs (e.g. Apache Flink) [25].
3 State of the Art: Big Data Technologies for Intrusion Detection
In this section, we present recent intrusion detection methods based on deploying big data technologies. The main benefit of deploying big data lies in its capacity to deal with big heterogeneous data in real time. The existing literature is classified into two categories: big data processing approaches and real-time big data processing approaches.
3.1 Big Data Processing
Elayni et al. [18] proposed a new method for the preprocessing step. The authors used a NoSQL database, MongoDB, to handle big unstructured data. Their approach follows several steps: feature selection using Analysis Factorial Multiple Correspondence (AFMC), redundancy elimination, and then a vertical combine that performs the fusion of all similar records. They used MongoDB for storing data and MapReduce under MongoDB for processing it. In the evaluation step, the authors used a Bayesian network, the K2 algorithm implemented in WEKA; the training is offline. The results demonstrate that preprocessing big unstructured data yields better IDS performance than using a single dataset. In [26] Dahiya et al. introduced a big data processing framework for intrusion detection using Spark. They presented two feature selection methods: the first is Canonical Correlation Analysis (CCA) and the second is Linear Discriminant Analysis (LDA). For classification, the authors used numerous algorithms such as Naïve Bayes, REPTree, Random Forest, Random Committee, Bagging and Randomizable. For the experimental results, the authors used the UNSW-NB15 dataset and showed that, for big data flows in their case, the LDA feature selection method and Random Forest are the most suitable. Othman et al. [27] proposed a new method for intrusion detection named Spark-Chi-SVM.
They used ChiSqSelector for feature selection and an SVM algorithm implemented on Spark to learn the KDD Cup 1999 dataset. To evaluate their work, the authors compared their proposed model with two other models: the first is SVM without ChiSqSelector; the second is a logistic regression classifier with ChiSqSelector for feature selection. The experimental results showed that their Chi-SVM model outperforms the other models. In [28], Marchal et al. proposed an architecture based on big data for large-scale security monitoring. The data include DNS replies, HTTP packets, IP flow records, and honeypot data. For data storage, the DNS traffic, IP flow records, and honeypot data have
Big Data Processing for Intrusion Detection System Context
127
been stored in Cassandra, HDFS under Hadoop, and an SQLite database, respectively. The authors implemented a classification algorithm using MapReduce, whose main idea is to calculate anomaly scores against a specific threshold; however, the calculation model remains somewhat ambiguous. In the evaluation part, the authors offer a comparative study of the performance of five big data systems: Hadoop, Pig, Hive, Spark, and Shark. These tools are evaluated in four different scenarios involving the computation of different scores. The experimental results show that Spark and Shark offer the best performance in all scenarios.

3.2 Real-Time Big Data Processing

Hafsa et al. [16] proposed an approach based on Apache Spark Streaming to classify intrusions in real time. The authors used the MAWI dataset, stored in a cloud environment (Microsoft Azure). They focused on HDInsight, which offers the possibility of installing and configuring Spark within the cloud environment; their approach benefits from HDInsight's distributed storage. Then, using Spark Structured Streaming, Hafsa et al. [16] selected the useful features with the Spark select command, deleted the records containing missing values with the na.fill command, and finally removed the duplicated events with the dropDuplicates command; these three steps are all performed in real time. For the classification, the authors used a Decision Tree algorithm implemented in the Spark Machine Learning Library (MLlib). The experimental results showed 99.95% accuracy, and more than 55,175 events per second were processed by the proposed system on a small cluster. Mehta et al. [25] focused their approach on the problem of the operational flexibility of large-scale data management systems.
Their approach is based on both anomaly detection and real-time streaming. First, Mehta et al. [25] presented their architecture for storing audit data. The traffic is stored in different Oracle databases; each database has an Apache Flume agent, used for collecting, clustering, and transporting large volumes of data in a distributed manner. Next, the data are transported in real time via a central Flume collector to Elasticsearch for preprocessing and then to Kibana for visualization. The data used describe the traffic of the European Organization for Nuclear Research (CERN). The machine learning algorithms deployed for the classification are k-nearest neighbors, isolation forests, local outlier factor, and support vector machines, using the Mahout library under Hadoop. They found that the existing data storage system can reduce the monitoring load, handle different kinds of data, and perform nearly real-time change propagation. Finally, a recent work [29] provides a real-time approach able to process evolving traffic at high speed, named BigFlow. This paper proposes two main insights. The former is to determine whether a classification should be accepted or rejected, instead of the traditional classification of an event as normal or attack. The latter is to analyze traffic in near real time using stream learning. In their approach, the authors used Apache Kafka for data storage, taking the MAWIFlow dataset as input. For data analysis, the authors implemented their model on top of the Flink stream processing framework. To analyze the input data in streaming mode, Viegas et al. [29] proposed two main modules. The first is the measurement module. It
128
M. Elayni et al.
is responsible for feature extraction and computes flow features over time-window intervals; to do this, the authors [29] used windowing mechanisms such as tumbling windows in Apache Kafka. The second is the reliable stream classification module, which relies on a confidence level: a classification is accepted only when the confidence level is met by all the employed stream learning algorithms. Furthermore, the rejected instances support incremental model updates that incorporate new knowledge into the model based on corrected instances only. The evaluation compares three models: without BigFlow, with BigFlow, and with BigFlow but without updates when an event is rejected. The authors found that BigFlow outperforms the other models in terms of accuracy, training time …
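The accept/reject idea behind this reliable classification can be sketched independently of Flink: a prediction is kept only if every stream learner in the ensemble agrees on the label and is sufficiently confident, otherwise the instance is set aside for later labeling and incremental model update. The toy classifiers, the unanimity requirement on labels, and the 0.8 threshold below are illustrative assumptions, not BigFlow's actual configuration:

```python
CONFIDENCE_THRESHOLD = 0.8  # illustrative; the actual confidence levels differ

def classify_with_reject(instance, learners, threshold=CONFIDENCE_THRESHOLD):
    """Return (label, accepted). The prediction is accepted only when every
    learner predicts the same label with confidence >= threshold."""
    votes = [learner(instance) for learner in learners]  # each -> (label, confidence)
    labels = {label for label, _ in votes}
    if len(labels) == 1 and all(conf >= threshold for _, conf in votes):
        return labels.pop(), True
    return None, False  # rejected: queue the instance for incremental update

# Toy ensemble of three "stream learners" (hypothetical decision rules)
learners = [
    lambda x: ("attack" if x > 100 else "normal", 0.95),
    lambda x: ("attack" if x > 120 else "normal", 0.90),
    lambda x: ("attack" if x > 90 else "normal", 0.85),
]

print(classify_with_reject(500, learners))  # ('attack', True): unanimous, confident
print(classify_with_reject(110, learners))  # (None, False): learners disagree
```

Rejected instances are exactly the ones worth sending to an expert, since incorporating their corrected labels is what drives the incremental updates described above.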
4 Discussion

In this section, we discuss the reviewed works with respect to the intrusion detection process proposed in Sect. 2.1. For each work, Table 1 presents a comparative study based on: the datasets used, the extent to which the researchers address the preprocessing step (redundancy elimination, missing values, fusion), the feature selection method, and the detection method.

Table 1. Comparison of the reviewed works by the intrusion detection process

Ref   Datasets                             | Preprocessing                         | Feature selection | Detection
                                           | Redundancy | Missing values | Fusion |                   | Misuse | Anomaly
[16]  MAWILab                              | ✓          | ✓              | ✗      | 0                 | ✓      |
[18]  Kdd99, Darpa99                       | ✓          | ✗              | ✓      | AFMC              | ✓      |
[26]  Kdd99                                | ✗          | ✗              | ✗      | HPBBA             | ✓      |
[27]  Kdd99                                | ✗          | ✗              | ✗      | ChiSqSelector     | ✓      |
[28]  DNS traffic, IP flow, honeypot data  | ✗          | ✗              | ✗      | ✗                 |        | ✓
[25]  The CERN traffic                     | ✓          | ✗              | ✗      | ✗                 |        | ✓
[29]  MAWIFlow                             | ✗          | ✗              | ✗      | Tumbling windows  |        | ✓

✓ Used   ✗ Not used   0 Not indicated
Here, we notice that following each step of the IDS process is important for building efficient security systems. However, the discussed works pay little attention to the preprocessing step, and in particular to the fusion step [16, 25–27, 29]. Recently, the authors of [30] described the goals and challenges of data fusion for big security data. Eight articles are also presented
that use several approaches to fuse several types of big data, such as network traffic, sensor networks, and V2X heterogeneous networks. Equally, the works discussed in our study follow two main directions: some focus on using existing big data tools [16, 18, 26], while others implement their own models [28, 29]. A similarly interesting topic is the deployed architecture: each work proposes a distributed architecture in order to obtain an efficient model. In conclusion, deploying a distributed architecture helps experts use traffic coming from different sources, and this traffic can also be visualized thanks to big data storage tools. Likewise, real traffic can be processed within a stream framework. Furthermore, detecting attacks in real time is one of the big issues for intrusion detection. In the reviewed works, we found that the authors address real time by using stream frameworks (e.g., Apache Flume [25] and Kafka [29]) that help read and collect data in real time. The work in [29] also employs an interesting mechanism that can notably strengthen IDSs in the big data context: stream learning algorithms, used for the classification and learning of streaming data. Different tools are used for this purpose, such as Spark Streaming and Flink.
5 Research Challenges

The most interesting lines of current research, which represent the major challenges around today's IDSs, are presented below:
• The visualization of big heterogeneous traffic using a graphical solution in the big data context can help improve cyber security. For example, Cisco [31] uses graph analytics at scale to identify servers controlled by criminals.
• High protection is needed for IoT devices. An IDS must be able to adapt to traffic coming from physical components. In this context, we can cite the most dangerous attack of this kind, the Mirai botnet (2016). Therefore, processes for collection, detection, and prevention must be dedicated to the safety of these devices, whose number is constantly increasing.
• Decentralized architectures with blockchain. In the majority of taxonomies, the structure of IDPSs is classified into two types: individual IDPSs and collaborative IDPSs, the latter covering centralized, hierarchical, and distributed architectures. Nowadays, thanks to robust technologies such as the blockchain, we can invest in developing powerful collaborative IDPSs with a completely decentralized architecture. In fact, each node is autonomous and can exchange useful information with neighboring nodes, as enabled by cryptocurrencies such as Bitcoin and Ethereum [32].
6 Conclusion

This paper introduces a taxonomy of the intrusion detection process. It reviews and compares recent studies that deal with a big data environment in intrusion detection. All the research in this area seeks new systems that are able to handle different kinds of incoming data in near real time or real time. Equally, such systems must run continuously with minimal or no human supervision. The deployment of big data processing technologies
and distributed architectures in IDS can resolve these issues. Many of the reviewed works focus on intrusion detection approaches, and the deployment of several distributed architectures is widely discussed with the goal of providing an efficient IDS able to deal with today's traffic. We then compared and analyzed the reviewed works based on the proposed taxonomy. Finally, we provided some recommendations and research challenges for further study.
References

1. Vani, Y.K., Krishnamurthy: Survey anomaly detection in network using big data analytics. In: 2017 International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS) (2017)
2. Sharma, P., Sengupta, J., Suri, P.: Survey of intrusion detection techniques and architectures in cloud computing. Int. J. High Perform. Comput. Netw. 13, 184 (2019)
3. Ariyaluran Habeeb, R., Nasaruddin, F., Gani, A., Targio Hashem, I., Ahmed, E., Imran, M.: Real-time big data processing for anomaly detection: a survey. Int. J. Inform. Manag. 45, 289–307 (2019)
4. Provider, S., Forecasts, V., Papers, W.: Cisco visual networking index: forecast and trends, White Paper (2017–2022). https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white-paper-c11-741490.html
5. Resende, P.A.A., Drummond, A.C.: A survey of random forest based methods for intrusion detection systems. ACM Comput. Surv. 51, 1–36 (2018)
6. Chaabouni, N., Mosbah, M., Zemmari, A., Sauvignac, C., Faruki, P.: Network intrusion detection for IoT security based on learning techniques. IEEE Commun. Surv. Tutorials 21(3), 2671–2701 (2019)
7. Sheenam, S., Dhiman, S.: Comprehensive review: intrusion detection system and techniques. IOSR J. Comput. Eng. 18, 20–25 (2016)
8. Bostami, B., Ahmed, M.: Intrusion detection for big data. Data Anal. 375–402 (2018)
9. Zuech, R., Khoshgoftaar, T.M., Wald, R.: Intrusion detection and big heterogeneous data: a survey. J. Big Data 2(1), 3 (2015). https://doi.org/10.1186/s40537-015-0013-4
10. Patel, A., Taghavi, M., Bakhtiyari, K., Celestino Júnior, J.: An intrusion detection and prevention system in cloud computing: a systematic review. J. Netw. Comput. Appl. 36, 25–41 (2013)
11. Masarat, S., Sharifian, S., Taheri, H.: Modified parallel random forest for intrusion detection systems. J. Supercomput. 72, 2235–2258 (2016)
12. Liao, H.J., Lin, C.H.R., Lin, Y.C., Tung, K.Y.: Intrusion detection system: a comprehensive review. J. Netw. Comput. Appl. 36, 16–24 (2013)
13. Canali, D., Cova, M., Vigna, G., Kruegel, C.: Prophiler. In: Proceedings of the 20th International Conference on World Wide Web, WWW 2011 (2011)
14. Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput. Secur. 31, 357–374 (2012)
15. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 Military Communications and Information Systems Conference (MilCIS) (2015)
16. Hafsa, M., Jemili, F.: Comparative study between big data analysis techniques in intrusion detection. Big Data Cogn. Comput. 3, 1 (2018)
17. Essid, M., Jemili, F.: Combining intrusion detection datasets using MapReduce. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 4724–4728 (2016)
18. Elayni, M., Jemili, F.: Using MongoDB databases for training and combining intrusion detection datasets. In: Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, pp. 17–29 (2017). https://doi.org/10.1007/978-3-319-62048-0_2
19. Guo, K., Xu, T., Kui, X., Zhang, R., Chi, T.: Towards efficient intelligence fusion for deep learning from real-time and heterogeneous data. Inform. Fusion 51, 215–223 (2019)
20. Lv, K., Chen, Y., Hu, C.: Dynamic defense strategy against advanced persistent threat under heterogeneous networks. Inform. Fusion 49, 216–226 (2019)
21. Shanbhogue, R.D., Beena, B.M.: Survey of data mining (DM) and machine learning (ML) methods on cyber security. J. Sci. Technol. 10, 1–7 (2017)
22. Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P., Gao, R.X.: Deep learning and its applications to machine health monitoring: a survey. arXiv preprint arXiv:1612.07640 (2016)
23. Ahmad, S., Lavin, A., Purdy, S., Agha, Z.: Unsupervised real-time anomaly detection for streaming data. Neurocomputing 262, 134–147 (2017)
24. Natesan, P., Rajalaxmi, R., Gowrison, G., Balasubramanie, P.: Hadoop based parallel binary bat algorithm for network intrusion detection. Int. J. Parallel Program. 45, 1194–1213 (2016)
25. Mehta, S., Kothuri, P., Garcia, D.L.: A big data architecture for log data storage and analysis. Integr. Intell. Comput. Commun. Secur. Stud. Comput. Intell. 201–209 (2018)
26. Dahiya, P., Srivastava, D.: Network intrusion detection in big dataset using Spark. Procedia Comput. Sci. 132, 253–262 (2018)
27. Othman, S., Ba-Alwi, F., Alsohybe, N., Al-Hashida, A.: Intrusion detection model using machine learning algorithm on big data environment. J. Big Data 5(1), 1–12 (2018)
28. Marchal, S., Jiang, X., State, R., Engel, T.: A big data architecture for large scale security monitoring. In: 2014 IEEE International Congress on Big Data (2014)
29. Viegas, E., Santin, A., Bessani, A., Neves, N.: BigFlow: real-time and reliable anomaly-based intrusion detection for high-speed networks. Future Gener. Comput. Syst. 93, 473–485 (2019)
30. Yan, Z., Liu, J., Yang, L.T., Pedrycz, W.: Data fusion in heterogeneous networks. Inform. Fusion 53, 1–3 (2020)
31. Cyber security: how Cisco uses graph analytics to identify threats. https://linkurio.us/blog/cyber-security
32. Meng, W., Tischhauser, E.W., Wang, Q., Wang, Y., Han, J.: When intrusion detection meets blockchain technology: a review. IEEE Access 6, 10179–10188 (2018)
Hardware Accelerator for Real-Time Holographic Projector Mohammed A. Fadhel1,3 , Omran Al-Shamma1 , and Laith Alzubaidi1,2(B) 1 University of Information Technology and Communications, Baghdad, Iraq
{Mohammed.a.fadhel,o.al_shamma}@uoitc.edu.iq, [email protected] 2 Faculty of Science and Engineering, Queensland University of Technology, Brisbane, Australia 3 University of Sumer, Thi Qar, Iraq
Abstract. With the increasing popularity of holographic methods, 3D scenes, and augmented reality, it goes without saying that 3D holography will play a major role in real-time recording and display. This paper demonstrates a setup that shows and records a scene with a real-time 3D appearance. We speed up the holographic processing by using a hardware accelerator to take advantage of its parallel architecture. The results demonstrate the system's ability to view holographic objects with four cameras running at the same time, differing only by fractions of a millisecond, which is attributed to the camera clocks and the VGA update.

Keywords: Hologram · FPGA · Real-time · Three-dimensional (3D) display
1 Introduction

One of the primary techniques that characterizes next-generation 3D TV systems and supports surgical-assistance systems in the field of medicine is the three-dimensional display (the so-called 3D screen). In practice, the availability of real-time 3D images that represent 3D object information in real space makes bidirectional interactive communication possible with next-generation 3D TV systems. In general, 3D displays are categorized into three types: holographic, light-field, and stereoscopic displays. The first type is based on wave optics [1, 2], while the second [3, 4] and third [5, 6] types are based on geometric optics. The first type can store and regenerate both the light phase and the light intensity as a hologram, and can therefore regenerate 3D images with deep depth and high quality. In contrast, the second and third types can store and regenerate only the light intensity. In addition, both types lose the light phase during 3D image reconstruction, which degrades the 3D image quality. Hence, the holographic display (the first type) is currently the main focus [2]. Moreover, with electro-holography [7, 8], moving images can be regenerated by showing holograms on a spatial light modulator (SLM).
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 132–139, 2021. https://doi.org/10.1007/978-3-030-49342-4_13
These holograms
can be easily processed as digital images. Recently, computer-generated holograms, which are determined via a light-numerical simulator on a computer, are widely used in electro-holography. For realizing a 3D display based on electro-holography, several researchers concentrate their investigations on 3D image regeneration [9, 10], CGH (computer-generated holography) calculation [11, 12], and 3D information acquisition in real space [13, 14]. For instance, the studies [11, 14–17] on regenerating 3D images based on 3D information acquisition in real space do not actually execute the regeneration of 3D images continuously in real time. Therefore, a number of processes must be executed continuously in order to realize real-time regeneration in real space based on electro-holography. On the other hand, a light-field technique [13] for capturing 3D real-object information to regenerate real-time electro-holography of actual scenes was reported. In Ref. [18], a light-field camera with a micro-lens array comprising several primary lenses is employed for capturing 3D real-object information as light fields. This technique can simply handle occlusion culling; therefore, 3D image occlusion can be accurately regenerated when the eye position changes. However, if the distance between the camera and the 3D object is large, the image quality obtained with this technique worsens considerably [19]. By employing various ray-sampling planes related to the camera positions [20], this problem can be easily resolved [21, 22]. Instead, it becomes necessary to capture the images several times while changing the depth in order to clearly acquire the 3D object information. More specifically, with the light-field technique, it is impossible to immediately capture clear 3D object information with deep depth.
This process takes a long time to effectively capture 3D information with deep depth, and in turn, it is a challenging problem to capture dynamic scenes such as a person's movement. This problem makes realizing electro-holography for next-generation 3D TV systems extremely difficult. Regarding Refs. [11, 14–17], the processing of 3D object information based on an RGB-D camera for regenerating 3D images was not real-time processing. Conversely, the RGB-D camera can successfully capture 3D object information with deep depth immediately. Moreover, the view angle of the camera depends on its lens; therefore, the camera can capture 3D object images in actual scenes. In contrast, the acquired 3D objects comprise a vast volume of background information, which is redundant when regenerating the person's movement. Thus, a background subtraction technique is used to eliminate the background information when extracting the 3D person information [23]. There are further applications of hardware implementation [24–26], but in this paper, the performance of the hologram projector is boosted by using a hardware accelerator, an FPGA (field-programmable gate array), to provide the system with a parallel architecture.
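The cited background subtraction method [23] is texture-based; the sketch below shows only the simplest thresholded frame-difference variant of the idea, as an illustration (the 8-bit grayscale frames and the threshold value are invented for the example):

```python
def subtract_background(frame, background, threshold=30):
    """Return a foreground mask: 1 where the pixel differs from the static
    background by more than `threshold`, 0 elsewhere. Frames are 2D lists
    of 8-bit grayscale values; the threshold is illustrative."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]

background = [[10, 10, 10],
              [10, 10, 10]]
frame      = [[10, 200, 12],
              [11, 180, 10]]  # a bright object entered the middle column

assert subtract_background(frame, background) == [[0, 1, 0],
                                                  [0, 1, 0]]
```

Pixels flagged 1 form the extracted foreground (e.g., the moving person), while the background pixels are discarded before further processing.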
2 Methodology

The research process, like any process, consists of three stages: input, processing, and output. The input stage includes capturing a video of the object of interest and sending the stream data to the FPGA unit. The processing stage uses the FPGA unit (as a hardware accelerator) for storing the captured video, processing it, and transmitting the result to the VGA display. Lastly,
134
M. A. Fadhel et al.
the third stage uses a transparent pyramid with the VGA display for showing the holographic result. Figure 1 shows the functional block diagram of these stages.
Fig. 1. Block diagram of hologram sequences
Initially, four cameras are utilized for capturing a video of the object under consideration. These cameras are fixed at the four sides of the object (left, right, front, and back), as shown in Fig. 2. Preparing the input video (a stream of frames) for the FPGA board represents the first step in the processing of the hologram. The OV7670 camera interface board is employed for this purpose [27]. It offers the complete functionality of a VGA camera and image processor in a very small package. In addition, it can process up to 30 fps in VGA, with full user control over the transfer of the output data, formatting, and image quality. Furthermore, all functions needed for image processing, such as hue control, color saturation, white balance, gamma, and exposure control, are programmable via the SCCB (Serial Camera Control Bus) interface. More details are available in Ref. [28]. On the FPGA side (Altera DE1-SoC board), there are three different ports for inputting the image data: an NTSC port for an analogue camera, a USB port for a digital camera, and two 40-pin GPIO (general-purpose input/output) ports. Because this study uses four cameras, the NTSC and USB ports, which support only a single camera, are both ignored. The two GPIO ports are employed for inputting the four-camera data, each port serving two cameras (each camera has 8-bit data, and the remaining pins carry power and control signals). The second stage is the processing of the four camera videos in the FPGA unit. The initial step is storing the video (as a stream of frames) in the SDRAM memory.
Note that each frame is stored as a memory array of 640 × 480 pixels, with eight bits per pixel. Next, each frame is compressed by first splitting it into odd and even samples in the YUV 4:2:2 format and then converting the frame to the YUV 4:4:4 format. The resulting frame is sent to the VGA controller, which combines the four camera frames into their positions on the VGA display. Lastly, the VGA controller sends the final frame to the ADV7123, which converts the received frame to analogue form, ready for display on the VGA screen. The last stage is displaying the hologram. The VGA display is placed in a dark room with black walls for a better and clearer hologram. In addition, the display is placed horizontally and is connected to the FPGA unit via the VGA port built into the FPGA board. Note that each camera frame has its own region on the display. These regions are allocated and programmed using the Verilog language on the FPGA unit. Figure 3 shows each camera region on the VGA display. Furthermore, a transparent pyramid (see Fig. 4) is placed at the center of the display upside down (the pyramid base is at the top), so each side of the pyramid makes a 45° angle with the display surface. Therefore, each side of the pyramid reflects one camera stream. The four sides of the pyramid reflect/display the four sides of the object, thus giving a 3D effect.
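The YUV 4:2:2 to 4:4:4 conversion step amounts to duplicating the shared chroma samples so that every pixel gets its own (Y, U, V) triple. A minimal sketch in Python follows; the YUYV byte packing is an assumption for illustration, as the board's actual packing may differ:

```python
def yuv422_to_yuv444(yuyv):
    """Expand packed YUV 4:2:2 (Y0 U0 Y1 V0 per pixel pair) into one
    (Y, U, V) triple per pixel by duplicating the shared chroma samples."""
    assert len(yuyv) % 4 == 0
    pixels = []
    for i in range(0, len(yuyv), 4):
        y0, u, y1, v = yuyv[i:i + 4]
        pixels.append((y0, u, v))  # first pixel of the pair
        pixels.append((y1, u, v))  # second pixel reuses the same U and V
    return pixels

# Two pixel pairs -> four full-resolution pixels
frame = bytes([16, 128, 17, 128, 200, 90, 201, 160])
assert yuv422_to_yuv444(frame) == [(16, 128, 128), (17, 128, 128),
                                   (200, 90, 160), (201, 90, 160)]
```

In hardware, the same duplication is a simple register-and-mux structure, which is why the conversion fits comfortably in the FPGA pipeline.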
Fig. 2. IDE cable connection between the FPGA (DE1-SoC board) and the four OV7670 cameras (cable length limit: 60 cm)
The corresponding side of the 3D holographic pyramid then reflects the four cameras' live-streamed video displayed on the monitor.
Fig. 3. Four-camera display VGA screen
Fig. 4. Transparent pyramid
3 Results

When the FPGA is powered on and the Verilog code is uploaded, the video shows on the VGA display, and the 3D effect can then be seen on the pyramid, as in Fig. 5. The only things that might need adjustment are the lighting and the viewing angle, depending on the environment. The following Verilog sub-code is repeated for each camera with different GPIO pins:

CAMERA_BUFFER CAM0(
  .CLK_24(CLK_CAM),
  .RESET(~KEY[0]),
  .VSYNC(GPIO_0[11]),
  .HREF(GPIO_0[10]),
  .PCLK(GPIO_0[9]),
  .XCLK(GPIO_0[8]),
  .DATA(GPIO_0[7:0]),
  .CAM_RST(GPIO_0[34]),
  .CAM_PWDN(GPIO_0[35]),
  .RD_CLK(CLOCK_VGA),
  .X_ADDR(TEMP_X0),
  .Y_ADDR(TEMP_Y0),
  .VALUE(VAL0),
  .KEY(~KEY[3]),
  .FRAME_DONE(VGA_VS)
);

This Verilog sub-code is written inside the top module to connect the other parts of the hologram program, such as VSYNC (vertical synchronization control), PCLK (pixel clock), XCLK (external clock), and the other control signals that synchronize the processing procedure.
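The per-camera region allocation on the 640 × 480 display can be modeled in software before committing it to Verilog. The sketch below assumes one camera per 320 × 240 quadrant, which is an illustrative layout only; the actual regions in Fig. 3 are arranged around the pyramid and may differ:

```python
WIDTH, HEIGHT = 640, 480  # VGA frame size used by the system

def camera_for_pixel(x, y):
    """Map a display coordinate to (camera index, local x, local y),
    assuming one camera per 320x240 quadrant (illustrative layout)."""
    assert 0 <= x < WIDTH and 0 <= y < HEIGHT
    col, row = x // (WIDTH // 2), y // (HEIGHT // 2)
    cam = row * 2 + col  # 0: top-left, 1: top-right, 2: bottom-left, 3: bottom-right
    return cam, x % (WIDTH // 2), y % (HEIGHT // 2)

assert camera_for_pixel(0, 0) == (0, 0, 0)
assert camera_for_pixel(639, 479) == (3, 319, 239)
assert camera_for_pixel(320, 100) == (1, 0, 100)
```

In the FPGA, this mapping reduces to comparing the high bits of the VGA scan coordinates, which is the kind of logic the VGA controller implements when combining the four camera frames.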
Fig. 5. The generated hologram viewed from four sides
Table 1 shows the execution time for each camera separately. The variation in time is attributed to the camera clocks and the VGA update.
Table 1. Time of processing for each camera

Name of camera   Time of execution (ms)
Camera_00        1.472
Camera_01        1.584
Camera_02        1.784
Camera_03        1.924
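A quick check that the latencies in Table 1 fit the real-time budget: at the camera's 30 fps, a new frame arrives every 1/30 s ≈ 33.3 ms, so per-camera processing times of 1.5 to 1.9 ms leave ample headroom. The sketch below reuses the Table 1 values:

```python
FPS = 30                      # OV7670 VGA frame rate used in the paper
frame_period_ms = 1000 / FPS  # about 33.33 ms between frames

times_ms = {"Camera_00": 1.472, "Camera_01": 1.584,
            "Camera_02": 1.784, "Camera_03": 1.924}  # from Table 1

worst = max(times_ms.values())
spread = worst - min(times_ms.values())

assert worst < frame_period_ms  # every camera meets the frame deadline
assert spread < 0.5             # cameras differ by under half a millisecond
print(f"budget {frame_period_ms:.2f} ms, worst {worst} ms, spread {spread:.3f} ms")
```

This is consistent with the paper's claim that the inter-camera differences (fractions of a millisecond) are acceptable for real-time operation.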
4 Conclusions

In this paper, we presented a real-time holographic projector based on an FPGA. We overcame the limitations of real-time 3D scene display by applying a DE1-SoC board to benefit from its parallel processing features. We conclude the following points:
1. Four cameras can operate synchronously on a single FPGA unit.
2. The processing-time difference between the four cameras is very small (hundreds of microseconds) and is acceptable for real-time applications.
3. The result improves further when the placement of the VGA display and the transparent pyramid is adjusted.
References

1. Li, X., Chen, C.P., Li, Y., Zhou, P., Jiang, X., Rong, N., Liu, S., He, G., Lu, J., Su, Y.: High-efficiency video-rate holographic display using quantum dot doped liquid crystal. J. Disp. Technol. 12(4), 362–367 (2016)
2. Zhang, Z., Chen, C.P., Li, Y., Yu, B., Zhou, L., Wu, Y.: Angular multiplexing of holographic display using tunable multi-stage gratings. Mol. Cryst. Liq. Cryst. (Phila. Pa.) 657(1), 102–106 (2017)
3. Lippmann, G.: Epreuves reversibles photographies integrals. C. R. Acad. Sci. 146, 446–451 (1908)
4. Yang, R., Huang, X., Li, S., Jaynes, C.: Toward the light field display: autostereoscopic rendering via a cluster of projectors. IEEE Trans. Vis. Comput. Graph. 14(1), 84–96 (2008)
5. Johnson, P.V., Parnell, J.A., Kim, J., Saunter, C.D., Love, G.D., Banks, M.S.: Dynamic lens and monovision 3D displays to improve viewer comfort. Opt. Express 24(11), 11808–11827 (2016)
6. Lee, S., Park, J., Heo, J., Kang, B., Kang, D., Hwang, H., Lee, J., Choi, Y., Choi, K., Nam, D.: Autostereoscopic 3D display using directional subpixel rendering. Opt. Express 26(16), 20233 (2018)
7. Hilaire, P.S., Benton, S.A., Lucente, M., Jepsen, M.L., Kollin, J., Yoshikawa, H., Underkoffler, J.: Electronic display system for computational holography. Proc. SPIE 1212, 174–182 (1990)
8. Masuda, N., Ito, T., Tanaka, T., Shiraki, A., Sugie, T.: Computer generated holography using a graphics processing unit. Opt. Express 14(2), 603–608 (2006)
9. Hahn, J., Kim, H., Lim, Y., Park, G., Lee, B.: Wide viewing angle dynamic holographic stereogram with a curved array of spatial light modulators. Opt. Express 16(16), 12372–12386 (2008)
10. Xue, G., Liu, J., Li, X., Jia, J., Zhang, Z., Hu, B., Wang, Y.: Multiplexing encoding method for full-color dynamic 3D holographic display. Opt. Express 22(15), 18473–18482 (2014)
11. Kang, H., Ahn, C., Lee, S., Lee, S.: Computer-generated 3D holograms of depth-annotated images. Proc. SPIE 5742, 234–241 (2005)
12. Kakue, T., Wagatsuma, Y., Yamada, S., Nishitsuji, T., Endo, Y., Nagahama, Y., Hirayama, R., Shimobaba, T., Ito, T.: Review of real-time reconstruction techniques for aerial-projection holographic displays. Opt. Eng. 57(06), 1 (2018)
13. Mishina, T., Okui, M., Okano, F.: Calculation of holograms from elemental images captured by integral photography. Appl. Opt. 45(17), 4026–4036 (2006)
14. Chang, E.Y., Choi, J., Lee, S., Kwon, S., Yoo, J., Park, M., Kim, J.: 360-degree color hologram generation for real 3D objects. Appl. Opt. 57(1), A91–A100 (2018)
15. Zhao, Y., Kwon, K.C., Erdenebat, M.U., Islam, M.S., Jeon, S.H., Kim, N.: Quality enhancement and GPU acceleration for a full-color holographic system using a relocated point cloud gridding method. Appl. Opt. 57(15), 4253–4262 (2018)
16. Zhao, Y., Piao, Y., Park, S., Lee, K., Kim, N.: Fast calculation method for full-color computer-generated hologram of real objects captured by depth camera. Electron. Imaging 2018(4), 250–251 (2018)
17. Zhao, Y., Shi, C., Kwon, K., Piao, Y., Piao, M., Kim, N.: Fast calculation method of computer-generated hologram using a depth camera with point cloud gridding. Opt. Commun. 411, 166–169 (2018)
18. Ichihashi, Y., Oi, R., Senoh, T., Yamamoto, K., Kurita, T.: Real-time capture and reconstruction system. Opt. Express 20(19), 21645–21655 (2012)
19. Yamaguchi, M.: Light-field and holographic three-dimensional displays [Invited]. J. Opt. Soc. Am. A 33(12), 2348–2364 (2016)
20. Wakunami, K., Yamashita, H., Yamaguchi, M.: Occlusion culling for computer generated hologram based on ray-wavefront conversion. Opt. Express 21(19), 21811–21822 (2013)
21. Wakunami, K., Yamaguchi, M.: Calculation for computer generated hologram using ray-sampling plane. Opt. Express 19(10), 9086–9101 (2011)
22. Igarashi, S., Nakamura, T., Matsushima, K., Yamaguchi, M.: Efficient tiled calculation of over-10-gigapixel holograms using ray-wavefront conversion. Opt. Express 26(8), 10773–10786 (2018)
23. Heikkilä, M., Pietikäinen, M.: A texture-based method for modeling the background and detecting moving objects. IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 657–662 (2006)
24. Al-Shamma, O., Fadhel, M., Hameed, R., Alzubaidi, L., Zhang, J.: Boosting convolutional neural networks performance based on FPGA accelerator. In: International Conference on Intelligent Systems Design and Applications, pp. 509–517. Springer, Cham (2018)
25. Fadhel, M., Al-Shamma, O., Oleiwi, S., Taher, B., Alzubaidi, L.: Real-time PCG diagnosis using FPGA. In: International Conference on Intelligent Systems Design and Applications, pp. 518–529. Springer, Cham (2018)
26. Al-Shamma, O., Fadhel, M.A., Hasan, H.S.: Employing FPGA accelerator in real-time speaker identification systems. In: Recent Trends in Signal and Image Processing, pp. 125–134. Springer, Singapore (2019)
27. OV7670, CMOS VGA CAMERACHIP Sensor Datasheet, OmniVision. http://www.cutedigi.com/pub/sensor/Imaging.OV7670-Datasheet.pdf
28. Board, DE1-SoC. Terasic (2017)
Automatic Lung Segmentation in CT Images Using Mask R-CNN for Mapping the Feature Extraction in Supervised Methods of Machine Learning

Luís Fabrício de F. Souza, Gabriel Bandeira Holanda, Shara S. A. Alves, Francisco Hércules dos S. Silva, and Pedro Pedrosa Rebouças Filho(B)

Laboratório de Processamento de Imagens, Sinais e Computação Aplicada, Instituto Federal do Ceará, Fortaleza, Brazil
[email protected]
http://lapisco.ifce.edu.br
Abstract. According to the World Health Organization, the automatic segmentation of lung images is a major challenge in the processing and analysis of medical images, as many lung pathologies are classified as severe: such conditions cause about 250,000 deaths each year and by 2030 will be the third leading cause of death in the world. Mask R-CNN is a recent and excellent convolutional neural network model for detection, localization, and instance segmentation of objects in natural images. In this study, we created a new feature extractor that functions as the Mask R-CNN kernel for lung image segmentation, yielding highly effective and promising results. It brings a new approach to training that significantly reduces the number of images the convolutional network needs to generate good results, thereby also decreasing the number of iterations performed during network learning. The model obtained results that clearly surpass the standard results generated by Mask R-CNN.

Keywords: Lung segmentation · Deep learning · Mask R-CNN · Feature extractor · Digital image processing

1 Introduction
According to the World Health Organization (WHO), several respiratory diseases, such as pneumonia and lung cancer, are considered severe [1]. Regarding asthma, 400 million people might have acquired the pathology by 2025, significantly increasing the number of cases compared to previous data [2,3].
Chronic Obstructive Pulmonary Disease (COPD) is a significant public health problem. It obstructs the passage of air through the lungs, causing severe consequences and resulting in permanent inflammation and destruction of the alveoli [4]. The same holds for interstitial lung diseases, which represent a group of more than 200 chronic pulmonary disorders distinguished by inflammation in the tissues of the lung that causes scarring. This scarring is called fibrosis and, in some cases, it results in stiffness of the lungs, interfering with the ability to capture and transport oxygen to the bloodstream and leading to partial or permanent loss of respiratory capacity [5]. Research held by the WHO showed that such pathologies cause about 250,000 deaths each year and that COPD will become the third leading cause of death worldwide by 2030 [6].
For diagnostic purposes, computed tomography (CT) is one of the imaging modalities most commonly used in pulmonology [7]. The diagnosis of interstitial lung diseases is usually based on the evaluation of high-resolution CT images. The earlier the diagnosis, the better, since it is essential for adequate clinical treatment [3,5]. The specialist draws a region of interest on the exam around regions that characterize pathological changes, segmenting it visually. However, some changes are quite small and almost invisible to the naked eye, which makes such a task more limited [8].
The primary focus of our work is to propose a new deep learning-based approach to segment images efficiently, making it possible to use all the power of Convolutional Neural Networks (CNN) regardless of the small training set available. Our proposal creates a model alongside Mask R-CNN that significantly reduces the large amount of images CNNs usually need to generate good results. In this context, one can also highlight the reduced number of iterations carried out by the network's learning process when it comes to the segmentation of lung images.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 140–149, 2021. https://doi.org/10.1007/978-3-030-49342-4_14
2 Related Works
The problem of segmenting images is a classic challenge in the medical field, and a number of digital image processing methods have been used to segment medical images in the literature: region growing [9], thresholding [10], active contour methods [11,12], watershed [13] and wavelet [14].
Deep learning-based approaches have been widely applied in a variety of tasks, e.g., people detection in beach images [15], speech recognition in noisy environments [16], and wind speed forecasting [17]. They have shown good performance in several applications related to image classification and to the recognition of both regions and objects [18]. State-of-the-art studies highlight the ability of CNNs across a range of challenges in classification, segmentation and object detection [19,20].
Regarding CNNs applied to lung diseases, the authors in [21] conducted a study on the survival prediction of patients with lung cancer through supervised techniques; in [22] a CNN was used to detect lung lesions by pixel scanning on CT; [23] addressed the classification of lung diseases; [24] the automatic pulmonary segmentation of chest CT images; and [25] multi-class segmentation in chest radiographs. In [26] a contour assessment for lung cancer was proposed, focusing on the starting point of segmentation based on automatic models; in [27] 2D and 3D segmentation of chest CT with diffuse lung disease was tackled with CNNs. One can find more works in [28].
The increasing number of works using CNNs is due to their effectiveness in extracting data through their network layers, from the shallowest to the deepest (with combinations of applied filters). Such layers are capable of extracting a broad range of data, generating complete information regarding the domain in which they are employed [19,29]. All these CNN methods, however, commonly require a large amount of data in order to be capable of generating good results.
3 Methods of Literature

3.1 Mask R-CNN
Mask R-CNN is an improvement of R-CNN [30], Fast R-CNN [31] and Faster R-CNN [32]: a deep neural network created to solve the instance segmentation problem in computer vision [33]. The method is composed of two essential segments, to wit: (i) a Region Proposal Network, which creates several region proposals for a single image in its training stage, each proposal, called a Region of Interest (ROI), being passed to the second segment; and (ii) an Object Detection Network and Mask Prediction, which are processed in parallel within the same structure for each defined ROI, after which the network predicts masks belonging to all classes. Region proposals are suppressed into the detection boxes where they are processed by the mask prediction; thus, given the ROIs and the object classes of the image, the mask prediction of the network produces a 4D tensor.
The work of [32] used a feature extractor next to the Region Proposal Network, followed by an ROI-Pooling procedure that produced appropriate outputs for the classifier input. The Mask R-CNN model changed this: the ROI-Pooling procedure was replaced by ROI-Align, used to create the segmentation masks of the desired instances with a network head; the mask and class predictions are then decoupled, resulting in the use of a loss function L = L_cls + L_bbox + L_mask [33].

3.2 Classifiers
The classifiers we use in the lung segmentation process of the proposed method are briefly described below. The output of each of these classifiers delimits the region of interest in the lung image.
Naive Bayes is a probabilistic classifier based on the Bayes theorem [34], in which the features are assumed independent of each other; thus, given the occurrence of Y, we are able to find the probability of an event X occurring.
K-Nearest Neighbors (KNN) [35] is a supervised learning method that labels a sample with a class using the similarity between this sample and others already labeled. Distances between the attribute vectors of these samples are computed for this purpose.
Multilayer Perceptron (MLP) consists of an input layer, a hidden layer, and an output layer, each composed of nodes with a nonlinear activation function [36]. The input signal is propagated through all layers, and the learning process is based on the backpropagation technique. MLP allows tackling non-linearly separable problems.
Support Vector Machine (SVM) is a classifier based on statistical learning theory [37] that is able to find, in N-dimensional space (N is the number of attributes), a hyperplane that separates the classes.
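As a concrete illustration, the four classifiers above can be instantiated with scikit-learn. This is a sketch, not the authors' code: the 1-D "pulmonary density" features and their distributions below are synthetic stand-ins, with class 0 for points external to the lung and class 1 for internal points, as in the proposed method.

```python
# Sketch of the four classifiers used as the final segmentation stage.
# Synthetic 1-D density features: class 0 = external points, class 1 = internal.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_out = rng.normal(0.8, 0.05, (200, 1))   # external points: brighter tissue
X_in = rng.normal(0.2, 0.05, (200, 1))    # internal points: darker lung area
X = np.vstack([X_out, X_in])
y = np.array([0] * 200 + [1] * 200)

classifiers = {
    "Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    "SVM": SVC(kernel="rbf"),             # RBF kernel, as in the experiments
}
for name, clf in classifiers.items():
    clf.fit(X, y)
    print(name, clf.score(X, y))
```

On such well-separated features all four models score near 1.0; the paper's comparison (Table 1) measures them on real lung maps instead.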
4 Proposed Method
The images used in this paper comprise a set of 39 lung CT scans, with 1,265 slices segmented at a resolution of 512 × 512 pixels and a depth of 16 bits, approved and validated by the Research Ethics Committee - COMEPE (Protocol No. 35/06) - Resolution No. 196/96 [12]. This work designs a new automatic lung segmentation method for CT images using Mask R-CNN to map regions inside and outside the lung, automating the feature extraction in machine learning. The configuration parameters we used were Faster R-CNN Inception V2 as feature extractor, a learning rate of 0.0002, a batch size of 1, and 2 classes, trained for 1,000,000 steps. The proposed approach is depicted in Fig. 1.
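For reference, these training parameters could be written as a pipeline fragment in the style of the TensorFlow Object Detection API. The field layout below is illustrative; only the values (feature extractor, learning rate, batch size, number of classes, and steps) come from the text.

```protobuf
model {
  faster_rcnn {
    num_classes: 2
    feature_extractor {
      type: "faster_rcnn_inception_v2"
    }
  }
}
train_config {
  batch_size: 1
  num_steps: 1000000
  optimizer {
    momentum_optimizer {
      learning_rate {
        constant_learning_rate {
          learning_rate: 0.0002
        }
      }
    }
  }
}
```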
Fig. 1. Automatic lung segmentation in CT images, using Mask R-CNN for mapping the feature extraction in supervised methods of machine learning.
Figure 1(a) consists of training a Mask R-CNN model specialized in mapping lung regions, using images already segmented by a specialist from the dataset. The knowledge of the Mask R-CNN Lung Model is then stored after training. The next step, depicted in Fig. 1(b), returns the lung mapping, which is computed using the already trained Mask R-CNN Lung Model and a new lung image
as input. Equation 1 defines the lung map: BinaryMap(x, y) returns 1 when Mask R-CNN has probably found a lung region, and 0 otherwise (background region).

\[
BinaryMap(x, y) =
\begin{cases}
1, & \text{lung} \\
0, & \text{not lung}
\end{cases}
\tag{1}
\]

After obtaining the lung map, the next step (Fig. 1(c,d)) is to calculate the maximum pulmonary map and the minimum pulmonary map in order to obtain local maximum and minimum lung regions. To compute the maximum pulmonary map, a morphological dilation is applied to the original map so that the resulting region extrapolates the pulmonary region, as shown in Fig. 1(c), where the red region represents the result of this morphological operation. Conversely, a morphological erosion is applied to the original map to calculate the minimum pulmonary map, whose region is internal to the whole lung, as shown in green in Fig. 1(d). For these morphological operations it was necessary to define an automatic convolutional kernel, since the lung varies greatly in size from slice to slice of a CT scan. To calculate this kernel, shown in Eq. 2, one first needs to compute MapDensity, which corresponds to the number of points on the lung map with a value of 1.

\[
kernel = ImageHeight \cdot \frac{MapDensity}{ImageSize}
\tag{2}
\]

It is then necessary to find the outermost polygon of the maximum pulmonary map and of the minimum pulmonary map in order to obtain the lung key points. The outermost polygon of the maximum pulmonary map yields points external to the lung, and the outermost polygon of the minimum pulmonary map yields points internal to the lung. The result is presented in Fig. 1(e), where the green points are the key points internal to the lung and the external ones are in red.
Finally, Fig. 1(f) consists of training supervised machine learning methods to achieve the lung segmentation. To do so, we use the pulmonary density at the previously computed key points as input attributes. The points external to the lung are given to the classification method as class 0 samples, and the internal ones as class 1. The training is then carried out using classical classification methods, which return which points of the image belong to lung regions.
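The map, the automatic kernel of Eq. 2, and the labeling of external/internal points can be sketched in pure NumPy. The toy map, its sizes, and the wrap-around edge handling of the hand-rolled morphology are simplifications, not the authors' implementation.

```python
import numpy as np

def auto_kernel(binary_map):
    """Automatic kernel of Eq. 2: image height scaled by the map density."""
    h, w = binary_map.shape
    density = int(binary_map.sum())          # points with value 1 (MapDensity)
    return max(int(round(h * density / (h * w))), 1)

def dilate(m, k):
    """Binary dilation with a (2k+1)x(2k+1) square element (wrap-around edges)."""
    out = np.zeros_like(m)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            out |= np.roll(np.roll(m, dy, axis=0), dx, axis=1)
    return out

def erode(m, k):
    """Erosion as the complement of the dilated complement."""
    return 1 - dilate(1 - m, k)

binary_map = np.zeros((40, 40), dtype=np.uint8)
binary_map[14:26, 14:26] = 1                 # toy "lung" region from Eq. 1
k = auto_kernel(binary_map)
max_map = dilate(binary_map, k)              # maximum pulmonary map (red)
min_map = erode(binary_map, k)               # minimum pulmonary map (green)

# External key points (class 0): inside the dilated map but not in the lung map.
external = np.argwhere((max_map == 1) & (binary_map == 0))
# Internal key points (class 1): inside the eroded map.
internal = np.argwhere(min_map == 1)
print(k, len(external), len(internal))
```

The `external`/`internal` coordinates are where the method would sample pulmonary density to build the class 0/class 1 training sets of Fig. 1(f).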
5 Results and Discussions
In this section, we present the results related to the computational cost and segmentation metrics. All experiments were conducted on a personal laptop with an Intel Core i7 at 2.9 GHz, 8 GB of RAM and an NVIDIA GeForce GTX 1050 Ti GPU, running Ubuntu 16.04. The following metrics were used for evaluation: intensity adjustment, size adjustment, shape adjustment, sensitivity, specificity, and accuracy.
Two experiments were carried out to evaluate and validate our proposal. The first aims to compare our proposal with the original Mask R-CNN and to identify which classifier obtained the best result within our pipeline; the second aims to validate our lung segmentation approach against the work of [38], by applying our method (Mask R-CNN + SVM) to the lung images used in [38]'s experiment. Each experiment is described below.
The first experiment evaluates our methodology on the lung segmentation task (dataset described in Sect. 4) against the standard Mask R-CNN. To do so, we compared the performance of Mask R-CNN with that of our methodology using the following classifiers: Bayes, k-NN, SVM with RBF kernel, and MLP (see Sect. 3). We recorded the accuracy and runtime and, since we are handling segmentation, also computed the position, shape, intensity and size metrics. One can see in Fig. 2 the segmentation result of each approach.
The results are presented in Table 1. As one can see, overall, all other approaches outperformed Mask R-CNN alone, and the final decision stayed between SVM and MLP. One can see the whole picture by analyzing Fig. 3, which presents the aforementioned metrics and their standard deviations as bar plots. In medical tasks, high accuracy values are extremely desirable; however, when handling segmentation tasks we also need to pay attention to the sensitivity results, since high sensitivity (labeling lung regions as lung) is worth more than high specificity (labeling non-lung regions as non-lung). Still, the lower specificity values observed for Bayes, k-NN and MLP mean that some regions were misclassified; eventually, lung nodules in such regions would not be considered, since they were wrongly segmented as non-lung regions. Moreover, high standard deviation values were found with these classifiers as well; for example, regarding accuracy we had values greater than 6, up to 11.414, against the 2.742 obtained using SVM.
Fig. 2. Example results of the comparison among different classifiers using Mask R-CNN to create a lung map: a) input image, b) Mask+kNN, c) Mask+Bayes, d) Mask+MLP and e) Mask+SVM.
Having this discussion in mind, we ended up selecting as our best methodology the one using the SVM classifier, since it successfully segmented lung images and outperformed Mask R-CNN and the other combinations. The SVM approach achieved the
Table 1. Segmentation evaluation metrics using Mask R-CNN alone, and using classifiers as the kernel of Mask R-CNN.

Approach     | Position Adj.  | Intensity Adj. | Size Adj.       | Shape Adj.      | Accuracy        | Sensitivity     | Specificity
Mask R-CNN   | 97.72 ± 4.194  | 96.07 ± 5.969  | 94.93 ± 8.140   | 76.71 ± 17.045  | 89.86 ± 4.524   | 87.62 ± 16.965  | 86.60 ± 6.344
Mask + Bayes | 98.426 ± 2.476 | 96.193 ± 7.473 | 87.656 ± 12.822 | 76.117 ± 16.790 | 86.428 ± 11.414 | 91.073 ± 14.964 | 78.064 ± 27.548
Mask + KNN   | 98.178 ± 2.237 | 93.956 ± 7.419 | 85.933 ± 8.126  | 71.714 ± 12.306 | 83.576 ± 6.360  | 96.868 ± 8.918  | 65.048 ± 16.789
Mask + SVM   | 99.099 ± 2.230 | 98.276 ± 3.717 | 95.858 ± 6.097  | 85.991 ± 11.325 | 95.729 ± 2.742  | 96.639 ± 10.361 | 92.128 ± 4.987
Mask + MLP   | 99.001 ± 2.249 | 97.833 ± 4.428 | 94.452 ± 7.512  | 84.568 ± 12.041 | 93.877 ± 6.763  | 97.366 ± 8.564  | 87.614 ± 15.472
Fig. 3. Automatic lung segmentation in CT images, using Mask R-CNN for mapping the feature extraction in supervised methods of machine learning.
highest values in nearly all metrics: an accuracy of 95.729 ± 2.742 and a specificity of 92.128 ± 4.987, with a sensitivity of 96.639 ± 10.361 against the 97.366 ± 8.564 of the first-placed MLP. Special attention goes to the shape segmentation metric, which was quite difficult for all methods; there, too, our methodology outperformed the compared ones. Therefore, our methodology was successfully evaluated, and we next validate it against a related work.
The work of [38] tackled lung segmentation with a method based on active contours using 36 lung images. In our second experiment we ran our methodology, already trained in the previous experiment, over these 36 lung images in order to segment them, stored the metrics, and compared them with the results of [38]. Table 2 shows this comparison: our approach outperformed [38] regarding the position (99.36 ± 0.80 against 99.08 ± 0.11), intensity (99.63 ± 0.42 against 98.11 ± 0.96) and runtime in seconds (11.67 against 12.87) metrics, and was very similar concerning the size (97.41 ± 0.59 against 98.81 ± 0.41) and shape (95.12 ± 0.92 against 96.71 ± 0.61) metrics. Much lower standard deviation values were observed for both approaches.

1. Setupλ: the generated triple ⟨…⟩ ← Setup[1^k]
2. Extractλ,μ: for any identities IDd1 and IDd2:
   PrvKd1 = Extractλ,μ[IDd1], with IDd1 = (IDUNIQUE1, IDtemp1) = (IDRFID1, DevEUI1, IDtemp1)
   PrvKd2 = Extractλ,μ[IDd2], with IDd2 = (IDUNIQUE2, IDtemp2) = (IDRFID2, DevEUI2, IDtemp2)
3. Signμ: ⟨Sig, E⟩ ← Signμ[PrvKd1, IDd1, mes]
4. Encryptμ: CTXT ← Encryptμ[PrvKd1, IDd2, mes, Sig, E]
5. Decryptμ: ⟨m̂es, Ŝig⟩ ← Decryptμ[PrvKd2, CTXT]
6. Verifyμ: it is checked whether ÎDd1 = IDd1

1. Setupλ: ⟨…⟩ ← Setup[1^k]
2. Extractλ,μ: for any identities IDd1 and IDobject:
   PrvKd1 = Extractλ,μ[IDd1], with IDd1 = (IDUNIQUE1, IDtemp1) = (IDRFID1, DevEUI1, IDtemp1)
   PrvKobject = Extractλ,μ[IDobject], with IDobject = (DevEUIobject, DevAddrobject)
   So we have PrvKd1 = Extractλ,μ[(IDRFID1, DevEUI1, IDtemp1)] and PrvKobject = Extractλ,μ[(DevEUIobject, DevAddrobject)]
3. Signμ: ⟨Sig, E⟩ ← Signμ[PrvKd1, IDd1, mes]
4. Encryptμ: CTXT ← Encryptμ[PrvKd1, IDobject, mes, Sig, E]
5. Decryptμ: ⟨m̂es, Ŝig⟩ ← Decryptμ[PrvKobject, CTXT]

Fig. 3. … (≥50) co-citations (b).
6 Experiments' Results

The performance evaluation of the method proposed in the present work for measuring the Semantic Similarity (SS) between co-cited papers is based on two analytical experiments. The first relies on word-based benchmarks and aims to study the performance of the word-based semantic similarity measures outlined in the previous section; the second analyzes the semantic similarity between the highly co-cited papers using the DBLP Citation Network dataset (http://aminer.org/billboard/DBLP_Citation).
We extract from the DBLP dataset 2,305,156 distinct co-citations. From this dataset, we select a subset (ϒ) of the co-citations having a frequency greater than 3. The resulting subset contains 83,143 different co-citations. The semantic similarity between the titles of the referenced papers in each co-citation belonging to ϒ is then computed. The computing process is described in Fig. 1 and modeled in Eq. 1. The different SS measures cited in the present work are used to visualize the percentage of similar papers related to a target co-citation through the threshold θ (see Fig. 4).
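The subset-selection step above can be sketched in a few lines: count co-citation frequencies as unordered pairs and keep the pairs cited together more than 3 times. The reference lists below are a toy stand-in for the DBLP citation network.

```python
# Count co-citations (unordered pairs of papers cited together) and keep the
# subset with frequency > 3, mirroring the construction of ϒ in the text.
from collections import Counter
from itertools import combinations

# Each entry: the references cited by one citing paper (toy data).
reference_lists = [
    ["p1", "p2", "p3"],
    ["p1", "p2"],
    ["p1", "p2", "p4"],
    ["p1", "p2"],
    ["p3", "p4"],
]

cocitations = Counter()
for refs in reference_lists:
    for a, b in combinations(sorted(set(refs)), 2):
        cocitations[(a, b)] += 1       # canonical order makes the pair unordered

# Subset ϒ: co-citations appearing more than 3 times.
subset = {pair: n for pair, n in cocitations.items() if n > 3}
print(subset)
```

With the toy lists, only the pair (p1, p2), co-cited four times, survives the frequency filter.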
M. A. Hadj Taieb et al.
Fig. 4. Curves representing the distribution of similar co-citations using different semantic similarity measures with a variation of the threshold θ.
Despite the differences in the underlying computational models, Fig. 4 shows that several semantic similarity measures exhibit similar behaviors, such as (Liu1, Liu2) and (Hao, WP). Moreover, we note that the Zhou measure provides the highest similarity values compared to the other measures. We also note the marked congestion of the curves for θ-values ∈ [0.5, 1]. This can presumably be attributed to the fact that highly similar co-citations are limited (Elkiss et al. 2008); for example, for θ = 0.7 and the Zhou measure, the percentage is 2.43%. Furthermore, Fig. 4 shows a set of measures (Li, Liu1, Liu2 and Hadj1) whose curves lie in the center. Each point of a curve gives the percentage of co-citations having a similarity degree higher than or equal to the threshold θ. For example, at θ = 0.1 on the curve of the WP measure, the percentage of co-citations with SemSim(t_P1, t_P2) ≥ 0.1 is 57.29%.
The analysis of the semantic similarity between the paper pairs referenced by the co-citations, through their titles, shows that the majority are not similar (Fig. 4, θ ≥ 0.6). This confirms that most papers are not co-cited because they are similar; they are mainly co-cited so that they can be used in a complementary way to explain an idea in the target paper, as shown in (Small 1986).
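The per-threshold percentages plotted in Fig. 4 can be computed as follows. The similarity values here are synthetic stand-ins for the scores of a measure such as WP or Zhou.

```python
# For each threshold θ, the percentage of co-citations whose title similarity
# is at least θ — one point on one of the curves of Fig. 4.
sims = [0.05, 0.12, 0.3, 0.45, 0.55, 0.62, 0.71, 0.8, 0.9, 0.97]

def pct_at_least(scores, theta):
    """Percentage of scores with SemSim >= theta."""
    return 100.0 * sum(s >= theta for s in scores) / len(scores)

for theta in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(theta, pct_at_least(sims, theta))
```

The resulting (θ, percentage) pairs trace one monotonically decreasing curve, which is why the curves in Fig. 4 converge toward small percentages for large θ.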
7 Conclusion and Future Work

In this paper, we propose a new method for analyzing the semantic similarity between the referenced papers in a co-citation. This method can be considered a continuation of (Elkiss et al. 2008) and (Jeong et al. 2014). Our approach is based on semantic similarity measures applied to paper titles. Each title undergoes a pre-processing step, principally using the Stanford POS tagger, to extract the nouns and reduce them to their lemmas. Among the SS measures used, we exploit the intrinsic IC-based SS measures, combining IC-computing methods with IC-based measures. The evaluation includes the use of the DBLP citation network dataset.
Paper Co-citation Analysis Using Semantic Similarity Measures
Therefore, these SS measures are exploited to calculate the SS between the titles of referenced papers in highly repeated co-citations. In fact, a pair of paper titles is considered similar if the soft value provided by an SS measure is higher than a threshold θ. The analysis of the semantic similarity of the co-citations on the DBLP dataset shows that most of the highly repeated co-citations are not, or only slightly, similar, which supports the complementarity between co-cited papers exploited to explain an idea in the target paper.
Considering the promising results generated by the method proposed in the present study, further research, some of which is currently underway in our laboratory, is needed to apply it to co-citation clustering, to investigate the topical relatedness between co-cited papers, and to investigate whether SS measures over the titles of co-cited papers significantly correlate with co-citation proximity, with co-citation frequency, and with human judgments of the similarity of co-cited papers.
References

Ben Aouicha, M., Hadj Taieb, M.A., Ben Hamadou, A.: LWCR: multi-layered Wikipedia representation for computing word relatedness. Neurocomputing 216, 816–843 (2016)
Braam, R.R., Moed, H.F., Van Raan, A.F.: Mapping of science by combined co-citation and word analysis I. Structural aspects. J. Am. Soc. Inf. Sci. 42(4), 233 (1991a)
Braam, R.R., Moed, H.F., Van Raan, A.F.: Mapping of science by combined co-citation and word analysis II. Dynamical aspects. J. Am. Soc. Inf. Sci. 42(4), 252 (1991b)
Chen, C.: Visualising semantic spaces and author co-citation networks in digital libraries. Inf. Process. Manag. 35(3), 401–420 (1999)
Chen, C., Song, I.Y., Zhu, W.: Trends in conceptual modeling: citation analysis of the ER conference papers (1979–2005). In: Proceedings of the 11th International Conference on the International Society for Scientometrics and Informetrics, pp. 189–200 (2007)
Elkiss, A., Shen, S., Fader, A., Erkan, G., States, D., Radev, D.: Blind men and elephants: what do citation summaries tell us about a research article? J. Assoc. Inf. Sci. Technol. 59(1), 51–62 (2008)
Eto, M.: Evaluations of context-based co-citation searching. Scientometrics 94(2), 651–673 (2013)
Fellbaum, C.: WordNet: An Electronic Lexical Database (Language, Speech, and Communication), illustrated edn. MIT Press, Cambridge (1998)
Gabrilovich, E., Markovitch, S.: Computing semantic relatedness using Wikipedia-based explicit semantic analysis. IJCAI 7, 1606–1611 (2007)
Gao, J.B., Zhang, B.W., Chen, X.H.: A WordNet-based semantic similarity measurement combining edge-counting and information content theory. Eng. Appl. Artif. Intell. 39, 80–88 (2015)
Hadj Taieb, M.A., Ben Aouicha, M., Ben Hamadou, A.: A new semantic relatedness measurement using WordNet features. Knowl. Inf. Syst. 41(2), 467–497 (2014b)
Hadj Taieb, M.A., Ben Aouicha, M., Ben Hamadou, A.: Ontology-based approach for measuring semantic similarity. Eng. Appl. Artif. Intell. 36, 238–261 (2014a)
Haggan, M.: Research paper titles in literature, linguistics and science: dimensions of attraction. J. Pragmat. 36(2), 293–317 (2004)
Hao, D., Zuo, W., Peng, T., He, F.: An approach for calculating semantic similarity between words using WordNet. In: 2011 Second International Conference on Digital Manufacturing and Automation (ICDMA), pp. 177–180. IEEE (2011)
Hou, J., Yang, X., Chen, C.: Emerging trends and new developments in information science: a document co-citation analysis (2009–2016). Scientometrics 115(2), 869–892 (2018)
Jeong, Y.K., Song, M., Ding, Y.: Content-based author co-citation analysis. J. Informetr. 8(1), 197–211 (2014)
Letchford, A., Moat, H.S., Preis, T.: The advantage of short paper titles. Roy. Soc. Open Sci. 2(8), 150266 (2015)
Li, Y., Bandar, Z.A., McLean, D.: An approach for measuring semantic similarity between words using multiple information sources. IEEE Trans. Knowl. Data Eng. 15(4), 871–882 (2003)
Lin, D.: An information-theoretic definition of similarity. In: ICML, pp. 296–304 (1998)
Liu, X.Y., Zhou, Y.M., Zheng, R.S.: Measuring semantic similarity in WordNet. In: 2007 International Conference on Machine Learning and Cybernetics, vol. 6, pp. 3431–3435. IEEE (2007)
Magerman, T., Van Looy, B., Song, X.: Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications. Scientometrics 82(2), 289–306 (2010)
Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)
Meng, L., Gu, J.: A new model for measuring word sense similarity in WordNet. In: Proceedings of the 4th International Conference on Advanced Communication and Networking, pp. 18–23. SERSC, Jeju, Korea (2012)
Meng, L., Gu, J., Zhou, Z.: A new model of information content based on concept's topology for measuring semantic similarity in WordNet. Int. J. Grid Distrib. Comput. 5(3), 81–94 (2012)
Merrill, E., Knipps, A.: What's in a title? J. Wildlife Manag. 78(5), 761–762 (2014)
Robertson, S.E., Sparck Jones, K.: Relevance weighting of search terms. In: Willett, P. (ed.) Document Retrieval Systems, pp. 143–160. Taylor Graham Publishing, London (1988)
Sánchez, D., Batet, M., Isern, D.: Ontology-based information content computation. Knowl.-Based Syst. 24(2), 297–303 (2011)
Small, H.: Co-citation context analysis and the structure of paradigms. J. Document. 36(3), 183–196 (1980)
Small, H.: Co-citation in the scientific literature: a new measure of the relationship between two documents. J. Assoc. Inf. Sci. Technol. 24(4), 265–269 (1973)
Small, H.G.: A co-citation model of a scientific specialty: a longitudinal study of collagen research. Soc. Stud. Sci. 7(2), 139–166 (1977)
Small, H.: Macro-level changes in the structure of co-citation clusters: 1983–1989. Scientometrics 26(1), 5–20 (1993)
Small, H.: The synthesis of specialty narratives from co-citation clusters. J. Am. Soc. Inf. Sci. 37(3), 97–110 (1986)
Small, H., Sweeney, E.: Clustering the science citation index® using co-citations: I. A comparison of methods. Scientometrics 7(3–6), 391–409 (1985)
Small, H., Sweeney, E., Greenlee, E.: Clustering the science citation index using co-citations. II. Mapping science. Scientometrics 8(5–6), 321–340 (1985)
Sternitzke, C., Bergmann, I.: Similarity measures for document mapping: a comparative study on the level of an individual scientist. Scientometrics 78(1), 113–130 (2009)
Sullivan, D., Koester, D., White, D., Kern, R.: Understanding rapid theoretical change in particle physics: a month-by-month co-citation analysis. Scientometrics 2(4), 309–319 (1980)
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: ArnetMiner: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 990–998. ACM (2008)
Thijs, B., Glänzel, W.: The contribution of the lexical component in hybrid clustering, the case of four decades of "Scientometrics". Scientometrics 115(1), 21–33 (2018)
van Eck, N.J., Waltman, L.: Citation-based clustering of publications using CitNetExplorer and VOSviewer. Scientometrics 111(2), 1053–1070 (2017)
Wang, N., Liang, H., Jia, Y., Ge, S., Xue, Y., Wang, Z.: Cloud computing research in the IS discipline: a citation/co-citation analysis. Decis. Supp. Syst. 86, 35–47 (2016)
Wang, T., Hirst, G.: Refining the notions of depth and density in WordNet-based semantic similarity measures. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1003–1011. Association for Computational Linguistics (2011)
Wang, X., Zhao, Y., Liu, R., Zhang, J.: Knowledge-transfer analysis based on co-citation clustering. Scientometrics 97(3), 859–869 (2013). https://doi.org/10.1007/s11192-013-1077-6
Whittaker, J.: Creativity and conformity in science: titles, keywords and co-word analysis. Soc. Stud. Sci. 19(3), 473–496 (1989)
Wu, Z., Palmer, M.: Verbs semantics and lexical selection. In: Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, pp. 133–138. Association for Computational Linguistics (1994)
Zhou, Z., Wang, Y., Gu, J.: New model of semantic similarity measuring in WordNet. In: 3rd International Conference on Intelligent System and Knowledge Engineering, ISKE 2008, vol. 1, pp. 256–261. IEEE (2008)
Assessment of the ISNT Rule on Publicly Available Datasets

J. Afolabi Oluwatobi1(B), Gugulethu Mabuza-Hocquet2, and Fulufhelo V. Nelwamondo2

1 University of Johannesburg, Johannesburg, South Africa
[email protected] 2 Council for Scientific and Industrial Research, Pretoria, South Africa
{gmabuza,fnelwamondo}@csir.co.za
Abstract. The ISNT rule is a technique that has been used to detect glaucoma from fundus images. The rule states that, for a healthy fundus image, the segmented optic disc can be divided into four neuro-retina rim quadrants, namely the Inferior, Superior, Nasal and Temporal neuro-retina rims: the Inferior is the widest, followed by the Superior and then the Nasal, while the Temporal quadrant is the narrowest. However, since the advent of the rule there have been several experiments demonstrating the inefficiency of the rule in diagnosing glaucoma, while other experiments argue that the rule is efficient. These experiments were carried out on datasets sourced privately by the individuals, not on publicly available fundus datasets, which makes them difficult to reproduce. This work assesses the ISNT rule using the RIM-ONE v3 and DRISHTI-GS datasets, which are both publicly available. The performance of the ISNT rule on these datasets is compared with that of a trained Extreme Gradient Boosting (XGB) classifier. The results show that the XGB classifier outperforms the ISNT rule and its variant; the ISNT rule demonstrated random performance on the databases used.

Keywords: Retinal fundus image · Glaucoma · Blood vessel segmentation · ISNT · Image segmentation
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 278–286, 2021. https://doi.org/10.1007/978-3-030-49342-4_27

1 Introduction

Glaucoma is an ocular disease characterized by progressive degeneration of the optic disc and the retinal ganglion cells [1, 2]. It is a leading cause of blindness and usually presents no obvious symptoms in its early stages. Diagnosis of glaucoma is usually carried out by evaluating structural changes in the optic disc [3, 4]. The ISNT rule was first proposed by Jonas et al. in 1988 and has since been used for the diagnosis of glaucoma [5, 6]. The ISNT rule states that, for a healthy fundus image, the segmented optic disc can be divided into four neuro-retinal rim quadrants, namely the Inferior (I), Superior (S), Nasal (N) and Temporal (T) rims. The Inferior is the widest, followed by the Superior and then the Nasal, with the Temporal quadrant being the narrowest, i.e. I > S > N > T [5–9]. Several studies have been carried out to verify the ability of the ISNT rule to effectively detect glaucoma [10–13], but from these studies it cannot be established that the rule does so. Authors like Harizman et al. [10] and Chan et al. [14] concluded that the ISNT rule is effective in detecting glaucoma, while Morgan et al. [12], Pogrebniak et al. [13] and Qiu et al. [2] concluded that the rule has limited potential. Sihota et al. [11] were inconclusive about the use of the ISNT rule to detect glaucoma. Furthermore, most of the studies were conducted on privately sourced datasets and fundus images, which makes their results hard to reproduce and verify. This work uses the publicly available RIM-ONE v3 and DRISHTI-GS databases to assess the extent to which the ISNT rule can be used to detect glaucoma, making the work reproducible and the results verifiable. Moreover, a classifier is trained using the extracted I, S, N and T values, and its performance in detecting glaucoma is tested on both databases using the cross-validation method. Finally, the performance of the ISNT rule is compared with that of the classifier. The contributions of this work are an assessment of the ISNT rule on two publicly available datasets and a proposed alternative use of the I, S, N and T values. The rest of this paper is organized as follows: Sect. 2 discusses related work, Sect. 3 describes the proposed experimental approach, Sect. 4 presents the results, and Sect. 5 discusses the limitations of the study. Section 6 presents the conclusion and the last section outlines future work.
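In code, the rule (and the variant used later in the experiments, which drops the Temporal value) reduces to chained comparisons. A minimal sketch, with helper names of our own choosing rather than the paper's:

```python
# Minimal sketch of the ISNT check; helper names are illustrative, not the paper's.
def follows_isnt(i, s, n, t):
    """True when the rim widths satisfy I >= S >= N >= T (the '>=' form
    matches the notation used in the paper's experiment tables)."""
    return i >= s >= n >= t

def follows_isnt_variant(i, s, n):
    """The variant drops the Temporal quadrant: I >= S >= N."""
    return i >= s >= n
```

For example, the first RIM-ONE v3 row in Table 1 (I=30, S=26, N=30, T=32) fails the rule because S < N.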
2 Related Work

The analysis of the neuro-retinal rim in optic discs has been a subject of great interest. The Inferior (I), Superior (S), Nasal (N) and Temporal (T) features are obtained from such analysis. The ISNT rule, as already stated, holds that these features follow a known pattern in healthy optic discs, and it has been reported to detect glaucoma in some cases. Harizman et al. [10] tested the ISNT rule on 66 non-glaucomatous eyes and 43 eyes with open-angle glaucoma. Their objective was to determine whether the ISNT rule can differentiate non-glaucomatous eyes from glaucomatous eyes. All subjects underwent rigorous eye examination, including perimetry, laser ophthalmoscopy and disc photography. After the examination, one eye was randomly selected from each subject (either the left or the right eye). The ISNT rule was then assessed after extracting the ISNT features from the optic disc photographs. They found that the ISNT rule was consistent in 52 of the 66 non-glaucomatous eyes and 12 of the 43 glaucomatous eyes. They concluded that the ISNT rule can be used to differentiate non-glaucomatous eyes from glaucomatous eyes and is not affected by differences in race. Sihota et al. [11] evaluated the effectiveness of the ISNT rule in discriminating between non-glaucomatous eyes and early glaucoma. In their experiment, 136 subjects with non-glaucomatous eyes and 63 subjects with primary open-angle glaucoma were
subjected to Heidelberg Retina Tomograph (HRT) examination and achromatic automated perimetry. It was found that the ISNT rule was applicable in 71% of the subjects with non-glaucomatous eyes and 68% of the early glaucoma subjects. However, in their report, Sihota et al. were not conclusive about the effectiveness of the ISNT rule in differentiating non-glaucomatous eyes from early glaucoma. Morgan et al. [12] determined how well an optic disc can be classified as glaucomatous or non-glaucomatous using the ISNT rule. A total of 129 subjects were used in the experiment: 78 subjects with open-angle glaucoma and 51 with closed-angle glaucoma. The initial classification of eyes as non-glaucomatous or glaucomatous was done by two experts based on the shape of the optic disc and the subjects' visual fields. The ISNT rule was broken down into three separate Boolean comparisons: I > S, S > N and N > T. The result was reported as the positive likelihood ratio that the ISNT rule was observed in the 129 subjects. The evaluation was carried out by three expert observers, whose positive likelihood ratios were 1.11, 1.07 and 1.06 respectively (each at a 95% confidence interval (CI)). Morgan et al. concluded that the ISNT rule is not a very good technique for detecting open-angle glaucoma. Pogrebniak et al. [13] carried out an experiment to find out whether non-glaucomatous optic discs fail to follow the ISNT rule in children. The experiment was done on a total of 131 children, by obtaining fundus images of children with large non-glaucomatous optic discs. The widths of the neuro-retinal rims were then extracted for further analysis. The results showed that only 16% of non-glaucomatous, non-premature children followed the ISNT rule, as did 21% of children with non-glaucomatous eyes but a history of prematurity. The results also showed that 73% of children with normal optic discs followed the ISNT rule.
Their experiment concluded that the ISNT rule is more applicable in children with normal optic discs, and that the inherent shape of the optic disc greatly affects the applicability of the rule. Chan et al. [14] evaluated the accuracy of the ISNT rule and some of its variants on Asian adult subjects. The subjects went through standard eye examinations, and glaucoma subjects were defined using the International Society of Geographical and Epidemiological Ophthalmology (ISGEO) standards. This extensive experiment involved 6,112 subjects and 11,840 eyes: 249 eyes with glaucoma and 11,591 eyes without glaucoma. The results showed that 232 (93.2%) of the 249 eyes with glaucoma violated the ISNT rule, as expected, but only 1,823 (15.7%) of the 11,591 eyes without glaucoma followed it. Chan et al. concluded that the ISNT rule may be useful when combined with other techniques, such as the HRT algorithms, for glaucoma detection. Qiu et al. [2] evaluated the performance of the ISNT rule on the Retinal Nerve Fiber Layer (RNFL) thickness and further assessed the rule on 138 non-glaucomatous but myopic eyes (myopia is an eye condition that causes light rays to be focused in front of the retina, so that near objects are seen clearly but far objects are not). The results showed that 88.4% and 37% of the eyes did not follow the ISNT rule on the RNFL thickness and the rim area, respectively. Qiu et al. concluded that the ISNT rule and its variants are not very good options for differentiating glaucomatous from non-glaucomatous eyes, especially myopic eyes.
It should be noted that all the studies discussed were carried out on privately sourced subjects and datasets, and that each reports its own distinct result even though the same ISNT rule was evaluated.
3 Proposed Experimental Approach

The RIM-ONE v3 [15, 16] and DRISHTI-GS [17, 18] databases are used for this experiment. The two databases are used because both provide the optic disc and optic cup segmentations as ground-truth, thus eliminating any error that could arise from improper segmentation. The segmentations provided in the databases (shown in Fig. 1 and Fig. 2) were carried out by trained experts. Furthermore, the segmented optic discs and optic cups are properly labelled: the RIM-ONE v3 dataset is labelled as 'glaucoma', 'suspect' and 'normal', while the DRISHTI-GS dataset contains only 'glaucoma' labelled segmentations.
Fig. 1. (a) Fundus image showing the optic disc and cup RIMONE v3 (b) Optic disc segmentation of a left eye from RIMONE v3 (c) Optic cup segmentation of a left eye from RIMONE v3
Fig. 2. (a) Fundus image showing the optic disc and cup from DRISHTI-GS database (b) Optic disc segmentation from DRISHTI-GS database (c) Optic cup segmentation from DRISHTI-GS database
The ground-truths (the segmented optic discs and segmented optic cups) and the dataset labels are extracted from each database. The segmented optic discs and cups are then masked along their sectors to obtain the Inferior, Superior, Nasal and Temporal quadrants, as shown in Fig. 3. This is done using the ogrid function of the numpy package.
The masking is done along the 3, 6, 9 and 12 o'clock positions. The I, S, N and T values are then extracted from the quadrants created in this way. Some of the extracted values are shown in Table 1. The extracted values are then re-arranged based on the eye to which the segmented optic disc belongs (left or right), because the Nasal and Temporal quadrants are mirrored between the two eyes. This is shown in Fig. 3.
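The masking step can be sketched with numpy's ogrid. The wedge geometry below (quadrants centred on the vertical and horizontal axes) and the nasal/temporal orientation for each eye are assumptions for illustration, not the paper's exact code:

```python
import numpy as np

def quadrant_masks(shape, eye="right"):
    """Split an image grid into four wedges centred on top, bottom, left
    and right of the image centre. Illustrative sketch only: the paper's
    exact wedge boundaries and eye convention are not published."""
    h, w = shape
    cy, cx = h / 2.0, w / 2.0
    y, x = np.ogrid[:h, :w]              # open grids that broadcast to (h, w)
    dy, dx = y - cy, x - cx
    superior = (dy <= dx) & (dy <= -dx)  # top wedge (image y grows downward)
    inferior = (dy >= dx) & (dy >= -dx)  # bottom wedge
    left     = (dx <= dy) & (dx <= -dy)  # left wedge
    right    = (dx >= dy) & (dx >= -dy)  # right wedge
    # Nasal/Temporal swap between eyes: we assume that, for a right-eye
    # fundus image, the nasal side lies on the left of the image.
    nasal, temporal = (left, right) if eye == "right" else (right, left)
    return {"I": inferior, "S": superior, "N": nasal, "T": temporal}
```

A quadrant's rim width estimate can then be derived by applying each mask to the disc-minus-cup region of the ground-truth segmentations.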
Fig. 3. (a) ISNT evaluation for a right eye (b) ISNT evaluation for a left eye
Table 1. I, S, N and T values extracted from their quadrants for the RIM-ONE v3 and DRISHTI-GS databases

Database     Serial no.   I    S    N    T    Label
RIM-ONE v3   1            30   26   30   32   normal
RIM-ONE v3   2            38   16   32   24   normal
RIM-ONE v3   3            27   26   16   34   normal
RIM-ONE v3   4            24   25   35   17   normal
RIM-ONE v3   1            17   17   26   12   glaucoma
RIM-ONE v3   2            6    5    7    12   glaucoma
DRISHTI-GS   1            5    14   11   9    glaucoma
DRISHTI-GS   2            5    8    8    3    glaucoma
DRISHTI-GS   3            12   17   10   13   glaucoma
DRISHTI-GS   4            2    7    7    5    glaucoma
DRISHTI-GS   5            6    6    6    4    glaucoma
Subsequently, the analysis of the I, S, N and T values is performed. This step assesses the ISNT rule. An Extreme Gradient Boost (XGB) classifier is then trained with the I, S, N and T values and its performance in discriminating between glaucoma and non-glaucoma is obtained. The proposed approach is further described by the following algorithm.
Step 1: Extraction of ground-truth segmentations and labels
Step 2: Masking of optic cup and optic disc segmentations to obtain the ISNT quadrants
Step 3: Obtaining the I, S, N and T values for each pair of optic disc and optic cup
Step 4: Analysing the obtained values to check for consistency with the ISNT rule
Step 5: Training an XGB classifier with the ISNT values and using the 5-fold cross-validation method to test its accuracy in discriminating between glaucoma and non-glaucoma
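Step 5 above can be sketched as follows. Since the paper trains an XGBoost classifier (a third-party library), scikit-learn's GradientBoostingClassifier serves as a stand-in here, and the ISNT feature matrix is synthetic, not the real extracted data:

```python
# Sketch of Step 5 (5-fold cross-validation on ISNT features).
# Assumptions: GradientBoostingClassifier stands in for XGBoost, and the
# data below is randomly generated, not the paper's extracted values.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 40, size=(120, 4)).astype(float)  # columns: I, S, N, T widths
y = rng.integers(0, 2, size=120)                      # 0 = normal, 1 = glaucoma

clf = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")  # 5-fold CV
print(scores.mean())
```

With the real labelled I, S, N and T values in `X` and `y`, the same call yields the cross-validated accuracy reported in Sect. 4; `scoring` can be switched to "precision", "recall" or "roc_auc" for the other metrics.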
4 Experiment Results and Analysis

The ISNT rule is tested on the RIM-ONE v3 and DRISHTI-GS databases. The RIM-ONE v3 database has 158 segmented optic discs and cups, of which 39 are labelled 'glaucoma', 84 'normal' and 35 'glaucoma-suspect'. The DRISHTI-GS database has 50 segmented optic discs and cups, all labelled 'glaucoma'. In our analysis, we focus only on 'glaucoma' and 'normal' labelled segmentations and leave out those labelled 'suspect', so as to obtain a clearer estimate of the ISNT rule's performance, because an item labelled 'suspect' could be either normal or glaucomatous. The experiment was carried out using Kaggle's 2 CPU cores and 14 GB of RAM. It is expected that all segmented optic discs labelled 'normal' follow both the ISNT rule and its variant, and that none of the optic discs labelled 'glaucoma' follow either the rule or its variant. In the RIM-ONE v3 dataset, five out of 84 segmented optic discs labelled 'normal' follow the ISNT rule and 26 follow a variant of the rule (i.e. I ≥ S ≥ N); the variant does not include the Temporal (T) value in its computation. Also in this dataset, two out of 39 segmented optic discs labelled 'glaucoma' follow the ISNT rule and six follow the variant. In the DRISHTI-GS dataset, three out of 50 segmented optic discs follow the ISNT rule and eleven follow the variant; recall that DRISHTI-GS contains only 'glaucoma' labelled segmented optic discs. Table 2 shows the outcome of the ISNT rule for both databases. Table 1 shows some I, S, N and T values from both databases, together with the serial number and label of each optic disc and cup as given in the databases.
In Table 1 it can be seen that none of the fundus images labelled 'normal' follows the ISNT rule, and neither do the images labelled 'glaucoma'. However, the fundus images labelled 'normal' have higher I, S, N and T values than the 'glaucoma' labelled images. Table 2 shows the percentages of the RIM-ONE v3 and DRISHTI-GS datasets that follow the ISNT rule. The table shows very little conformity to the rule, especially in the RIM-ONE v3 dataset: only about 6% of the 'normal' discs follow the ISNT rule and 31% follow the ISNT variant. It can also be seen that about 5% of the 'glaucoma' labelled optic discs and cups in both databases follow the ISNT rule. For optimum performance, it is expected that all the 'normal' labelled optic discs should follow the ISNT rule (and its variant) and none of the 'glaucoma' labelled optic
Table 2. ISNT rule performance on the RIM-ONE v3 and DRISHTI-GS databases

Rule                             RIM-ONE v3:               RIM-ONE v3:            DRISHTI-GS:
                                 non-glaucomatous discs    glaucomatous discs     glaucomatous discs
                                 that follow it (%)        that follow it (%)     that follow it (%)
ISNT rule (I ≥ S ≥ N ≥ T)        5.95                      5.13                   6
ISNT variant rule (I ≥ S ≥ N)    30.95                     15.38                  22
discs should follow the rule. Hence, we should expect close to 100% conformity (not 5.95% and 30.95%) from the 'normal' labelled optic discs and about 0% conformity (not 5.13%, 15.38%, 6% and 22%) from the 'glaucoma' labelled optic discs. We further trained an Extreme Gradient Boost (XGB) classifier using the I, S, N and T values. An XGB classifier was used because of its flexibility and proven performance in regression and classification tasks [19]. The XGB classifier was first tested only on the RIM-ONE v3 dataset using 5-fold cross-validation, and then on the RIM-ONE v3 and DRISHTI-GS datasets combined. The obtained results are shown in Table 3. The metrics used for testing are precision, recall, ROC-AUC and accuracy.

Table 3. XGB classifier performance on the RIM-ONE v3 and the combined RIM-ONE v3 + DRISHTI-GS data

Metric      RIM-ONE v3    RIM-ONE v3 + DRISHTI-GS
Precision   0.74          0.87
Recall      0.81          0.88
ROC-AUC     0.83          0.90
Accuracy    0.81          0.87
Table 3 shows that the classifier performs well at classifying segmented optic discs as glaucomatous or normal, and that its performance improves when the data from the DRISHTI-GS dataset is added. This is expected, as a classifier performs better when trained with more instances. In order to compare the performance of the ISNT rule with that of the XGB classifier, we express the performance of the ISNT rule using the same metrics used for the XGB classifier, as shown in Table 4. Table 4 shows that the performance of the ISNT rule and its variant is erratic and close to random. Although the ISNT variant performs better, the XGB classifier has the best performance. The ISNT rule may not perform well on the databases used in this experiment; however, the I, S, N and T values have proven to be of great relevance, especially when used to train a classifier.
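Expressing the rule in the classifier's metrics amounts to treating its verdict as a prediction and counting confusion-matrix cells. We assume one convention here ("follows the rule" predicts 'normal', the positive class); the paper does not state which convention produced its comparison table, so this helper and the toy records are illustrative only:

```python
# Sketch: score the ISNT rule with precision/recall/accuracy.
# Assumption: "follows the rule" is taken as a 'normal' prediction;
# the convention actually used in the paper is not stated.
def rule_metrics(records):
    """records: iterable of (I, S, N, T, label) with label 'normal'/'glaucoma'."""
    tp = fp = fn = tn = 0
    for i, s, n, t, label in records:
        predicted_normal = i >= s >= n >= t          # ISNT rule satisfied
        actual_normal = (label == "normal")
        if predicted_normal and actual_normal:       tp += 1
        elif predicted_normal and not actual_normal: fp += 1
        elif not predicted_normal and actual_normal: fn += 1
        else:                                        tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy
```

Running this over the full set of extracted I, S, N and T values (or swapping the chained comparison for the I ≥ S ≥ N variant) yields rule scores directly comparable with the classifier's metrics.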
Table 4. Comparison between the ISNT rule and the XGB classifier

Metric      ISNT rule (I ≥ S ≥ N ≥ T)    ISNT variant rule (I ≥ S ≥ N)    XGB classifier
Precision   0.50                         0.60                             0.87
Recall      0.06                         0.31                             0.88
Accuracy    0.51                         0.57                             0.87
5 Limitations of the Study

The study is carried out using the ground-truths available in the chosen databases; hence, it assumes the provided ground-truths are accurate segmentations of the optic discs and cups. Furthermore, 'normal' labelled optic discs and cups may be affected by ocular diseases other than glaucoma, the effect of which is not quantified in this study.
6 Conclusion

The ISNT rule and its variant are not able to discriminate reliably between normal and glaucomatous segmented optic discs on the RIM-ONE v3 and DRISHTI-GS databases. The ISNT rule could be useful in detecting advanced glaucoma, as reflected in the results from DRISHTI-GS, but may not be suitable for detecting early and moderate glaucoma. Though the ISNT rule may not prove very useful in discriminating normal from glaucomatous optic discs, the I, S, N and T values are very useful and should be used to train a classifier: the XGB classifier outperformed both the ISNT rule and its variant.
7 Future Work The study will be carried out using more publicly available databases. This will give a wider overview of the ISNT rule’s performance. Also, a study that compares the performance of ISNT rule with other methods of glaucoma detection will be conducted.
References 1. Moon, J., Park, K.H., Kim, D.M., Kim, S.H.: Factors affecting ISNT rule satisfaction in normal and glaucomatous eyes. Korean J. Ophthalmol. 32(1), 38–44 (2018) 2. Qiu, K., Wang, G., Lu, X., Zhang, R., Sun, L., Zhang, M.: Application of the ISNT rules on retinal nerve fibre layer thickness and neuroretinal rim area in healthy myopic eyes. Acta Ophthalmol. 96(2), 161–167 (2018) 3. Quigley, H.A., Broman, A.T.: The number of people with glaucoma worldwide in 2010 and 2020. Br. J. Ophthalmol. 90(3), 262–267 (2006) 4. Shen, S.Y., Wong, T.Y., Foster, P.J., Loo, J., Rosman, M., Loon, S., Wong, W.L., Saw, S., Aung, T.: The prevalence and types of glaucoma in malay people: the Singapore malay eye study. Invest. Ophthalmol. Vis. Sci. 49(9), 3846–3851 (2008)
5. Jonas, J., Dichtl, A.: Optic disc morphology in myopic primary open-angle glaucoma. Graefe’s Arch. Clin. Exp. Ophthalmol. 235(10), 627–633 (1997). https://doi.org/10.1007/BF00946938 6. Jonas, J.B., Gusek, G.C., Naumann, G.O.: Optic disc, cup and neuroretinal rim size, configuration and correlations in normal eyes. Invest. Ophthalmol. Vis. Sci. 29(7), 1151–1158 (1988) 7. Bhartiya, S., Gadia, R., Sethi, H.S., Panda, A.: Clinical evaluation of optic nerve head in glaucoma. Curr. J. Glaucoma Pract. DVD 4, 115–132 (2010) 8. Tan, M.H., Sun, Y., Ong, S.H., Liu, J., Baskaran, M., Aung, T., Wong, T.Y.: Automatic notch detection in retinal images, pp. 1440–1443 (2013) 9. Narasimhan, K., Vijayarekha, K., JogiNarayana, K.A., SivaPrasad, P., SatishKumar, V.: Res. J. Appl. Sci. Eng. Technol. 4(24), 5459–5463 (2012). ISSN 2040-7467 10. Harizman, N., Oliveira, C., Chiang, A., Tello, C., Marmor, M., Ritch, R., Liebmann, J.M.: The ISNT rule and differentiation of normal from glaucomatous eyes. Arch. Ophthalmol. 124(11), 1579–1583 (2006) 11. Sihota, R., Srinivasan, G., Dada, T., Gupta, V., Ghate, D., Sharma, A.: Is the ISNT rule violated in early primary open-angle glaucoma-a scanning laser tomography study. Eye 22(6), 819–824 (2008) 12. Morgan, J.E., Bourtsoukli, I., Rajkumar, K.N., Ansari, E., Cunliffe, I.A., North, R.V., Wild, J.M.: The accuracy of the inferior > superior > nasal > temporal neuroretinal rim area rule for diagnosing glaucomatous optic disc damage. Ophthalmology 119(4), 723–730 (2012) 13. Pogrebniak, A.E., Wehrung, B., Pogrebniak, K.L., Shetty, R.K., Crawford, P.: Violation of the ISNT rule in nonglaucomatous pediatric optic disc cupping. Invest. Ophthalmol. Visual Sci. 51(2), 890–895 (2010) 14. Chan, E.W., Liao, J., Foo, R.C.M., Loon, S.C., Aung, T., Wong, T.Y., Cheng, C.: Diagnostic performance of the ISNT rule for glaucoma based on the Heidelberg retinal tomograph. Transl. Vis. Sci. Technol. 2(5), 2 (2013) 15. 
Fumero, F., Alayon, S., Sanchez, J.L., Sigut, J., Gonzalez-Hernandez, M.: RIM-ONE: an open retinal image database for optic nerve evaluation, pp. 1–6 (2011) 16. Maninis, K.K., Pont-Tuset, J., Arbeláez, P., Van Gool, L.: Deep retinal image understanding: medical image computing and computer-assisted intervention (MICCAI), pp. 1–8 (2016) 17. Sivaswamy, J., Krishnadas, S.R., Joshi, G.D., Jain, M., Tabish, A.U.S.: Drishti-GS: retinal image dataset for optic nerve head (ONH) segmentation, pp. 53–56 (2014) 18. Krishnadas, J.S.R., Chakravarty, A., Joshi, G.D., Tabish, A.U.S.: A comprehensive retinal image dataset for the assessment of glaucoma from the optic nerve head analysis. JSM Biomed. Imaging Data Pap. 2(1), 1004 (2015) 19. Matousek, J., Tihelka, D.: Using extreme gradient boosting to detect glottal closure instants in speech signal. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6515–6519 (2019)
An Autonomous Fallers Monitoring Kit: Release 0.0

Enrique de la Cal1(B), Alvaro DaSilva2, Mirko Fáñez2, José Ramón Villar1, Javier Sedano2, and Víctor Suárez3

1 Computer Science Department, University of Oviedo, Oviedo, Spain
{delacal,villarjose}@uniovi.es
2 Instituto Tecnológico de Castilla y León, Pol. Ind. Villalonquejar, 09001 Burgos, Spain
[email protected], [email protected], [email protected]
3 Control and Automática Department, EPI, University of Oviedo, Gijón, Spain
[email protected]
Abstract. Falls are one of the main causes of reduced quality of life in the elderly, since they have a high impact on the mortality rate as well as on the probability of different disabilities such as fractures, head injuries, etc., and above all a loss of confidence. Thus, a monitoring instrument to follow elderly activity is required, and the most common non-invasive and easy-to-use instruments are based on wearable devices. This work presents our first prototype of an autonomous, low-cost and easy-to-use elderly activity monitoring kit suitable for deployment in nursing homes. The prototype is composed of: i) our own data analytics web application deployed on a low-cost server located in our laboratory (SERVER), and ii) a transparent plastic case with a set of 10 commercial smartbands (WDs), their respective charging docks, one NUC (Next Unit of Computing) as data gateway, one Wi-Fi router and one LTE router. Each WD has a built-in 3DACC (3DA), a gyroscope (GY) and a wrist heart rate sensor (HR). Besides, each WD runs our own software instead of the pre-built commercial one, in order to achieve two goals: recording high-frequency data (1 Hz for HR and 10 Hz for 3DA and GY) and optimizing battery life (18 h with 10 Hz continuous sensor recording). Finally, a first trial of the presented kit is being carried out in a nursing home in Burgos (Spain), and the participant enrolment criteria have been designed by a gerontologist from the Diputación de Burgos (Spain).
Keywords: Falls in elderly · Wearable sensors · Fall detection

1 Introduction and Motivation
Healthy aging is one of the main challenges asking industry and researchers for products and services that help people remain independent, productive, active and socially connected for longer. Falls are one of the main causes of reduced quality of life in the elderly, since they have a high impact on the mortality rate as well as on the probability of different disabilities such as fractures, head injuries, etc., and above all a loss of confidence. Moreover, their impact on public and private healthcare systems is high, considering both direct and indirect costs. Thus, non-invasive tools to monitor and analyse elderly activity are required, and the most common non-invasive and easy-to-use tools to measure activity in the elderly are wearable devices. This work presents the first prototype of an autonomous, low-cost and easy-to-use elderly activity monitoring kit suitable for deployment in nursing homes. Even though the data analytics web app is not part of this contribution, we can state that we have preliminary releases of a suite of fall detection algorithms based on different machine-learning paradigms [14, 16]. Finally, a first trial of the presented kit is being carried out in a nursing home in Burgos (Spain), and the participant enrolment criteria have been designed by a gerontologist from the Diputación de Burgos (Spain). This work is structured as follows. The next section reviews the state of the art on fall detection platforms, while the design of the falls monitoring kit is explained in Sect. 3. The experimentation and the discussion of the results are covered in Sect. 4. Finally, conclusions and future work are included in Sect. 5.

Supported by University of Oviedo and ITCL. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 287–297, 2021. https://doi.org/10.1007/978-3-030-49342-4_28
2 State of the Art
In our previous work [22], a specific platform was presented for the monitoring and detection of epilepsy, and it analysed the platforms for this purpose in the literature [11], both those that use WDs and those that use body sensor networks [13]. Traditionally, WDs are used to collect data corresponding to biomedical variables or to obtain feedback from patients, either by doing the computation on the WD [19] or through a cloud service [18]. There are several works in the literature where WDs are used to monitor falls [12]. Also, in a very preliminary release of our proposal we included a comparison between our wearable devices and the Apple Watch (Apple 3), showing that the battery life of our devices is more than double that of the Apple Watch [9].

Sensors for Fall Detection. Solutions for the detection of falls can combine different types of sensors, such as a barometer and an inertial sensor [20], 3DACC and gyroscope [21], 3DACC and smart tiles [8] or 3DACC and a barometer on the neck [3]. However, 3DACC is the most widespread solution in the literature [10]. Different solutions have been proposed for the detection of falls [4], but most share the common characteristic that the device is placed on the hip or the chest, since it is easier to detect falls at these locations [7]. The hip location is valid for patients with severe disability, but forcing the use of a belt in conjunction with dresses (in the case of women) may not be valid for healthy patients.

Public Datasets with Genuine Falls. One of the main challenges in the field of human activity recognition (HAR) consists of
the provision of contrasted datasets for the validation of new detection algorithms. Among the best-known public datasets we can point out the following: UMA Fall [5] (simulated falls with a sensor located on the wrist), UNIOVI-Epilepsy [23] (epileptic seizures and other activities with a sensor located on the wrist), DaLiAc [17] (multiple activities, from running to riding a bike, with sensors located in different places) and the FARSEEING dataset [2] (simulated falls with a sensor located on the thigh). The present work aims to create a public dataset with genuine fall data from older people.
3 The Proposal
This proposal consists of three main elements: 1) a box-kit containing all the WDs and the other devices necessary for storing the collected data; 2) the WD App, responsible for recording sensor data at the desired frequency; and 3) a Web Application for data exploitation.

3.1 The Box
The box-kit consists of the following elements: a) 10 WDs for the participants' biometric data collection; the chosen WD is the Samsung Gear Fit 2 band, with a wearable app developed for Tizen OS; b) 1 Intel NUC for data storage (model BOXNUC6CAYH), a mini-PC powered by an Intel Celeron J3455 CPU with 8 GB of RAM and 480 GB of SSD storage; c) 1 TP-Link Wi-Fi router for the WD-NUC connections (model TP-Link TL-WR840N, transmitting at up to 300 Mbps); d) 1 Huawei 4G router for outside access to the NUC data (model Huawei B525 4G/LTE, with a 5 GB per month SIM card); e) 1 multiple-socket outlet for supplying power to all the devices (ORICO Super Charger with 6 outlets and 5 USB 5V/8A charging ports); and f) 2 3-port USB chargers for charging the remaining WDs (RAVPower Fast USB Chargers, 5V/6A). All these items are assembled in a plastic box for easy transport (10 €). The cost of the entire assembly is 478 € (NUC, routers, chargers and box) + 25 €/month (mobile data plan) + 125 €/participant (each WD). There is no known limit on the number of participants this system supports, apart from the obvious box size and USB charging ports: the more participants, the more physical space is required for the WD charging stations and the more USB chargers are needed. We have developed a system starting with 10 participants for one nursing home, but with plans to add more kits to the system in other nursing homes. Each WD is able to record up to 18 h/day of sensor data with its 200 mAh battery, which amounts to approximately 90 MB/day per participant. So, for 10 participants recording 18 hours each day, the NUC with its 480 GB of SSD storage can keep approximately 17 months of data (leaving space for the NUC's Windows 10 OS). If for any reason the NUC is not available, each WD can keep 22 days of data in its 2 GB of free internal storage.
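The storage figures above can be sanity-checked with simple arithmetic; the 20 GB reserved for the OS below is an assumed value for illustration, not a figure from the text:

```python
# Back-of-the-envelope check of the quoted storage capacities.
# Assumption: ~20 GB of the NUC's SSD is reserved for Windows 10 + software.
participants = 10
mb_per_day_per_wd = 90                    # ~90 MB/day/participant (from the text)
nuc_ssd_gb = 480
os_reserve_gb = 20                        # assumed, not from the paper

free_mb = (nuc_ssd_gb - os_reserve_gb) * 1000
nuc_days = free_mb / (participants * mb_per_day_per_wd)
nuc_months = nuc_days / 30                # ~17 months, matching the text

wd_free_mb = 2 * 1000                     # 2 GB free on each WD
wd_days = wd_free_mb / mb_per_day_per_wd  # ~22 days of local buffering

print(round(nuc_months, 1), int(wd_days))
```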
Through the LTE connection, daily data summaries are sent via FTP, using a scheduled task, to a lab server running a web application for viewing these summaries. The summaries contain aggregated information about the data recorded on each WD, which makes it possible to verify remotely that the system is working properly and that the data recorded for each participant is adequate, without sending all the raw data over the mobile connection, which would require a more expensive mobile Internet plan. The raw data can be retrieved from the NUC in person when necessary (up to once every 17 months) by moving it to a removable device (Fig. 1).
Fig. 1. The fallers monitoring kit 0.0
Fig. 2. States diagram for the app WD and successive data manipulation.
3.2 The Wearable App
The chosen WD is a commercial Samsung device equipped with optical HR, 3DA and GY sensors, as well as Wi-Fi connectivity. The WD configuration has been customized to minimize battery consumption, disabling device features not needed by this platform.
Fig. 3. WD application (left), Data Visualization Web App on both desktop and mobile (right).
For this particular WD, which runs Tizen OS version 2.3.1, a system has been developed for the continuous recording of data that clearly improves on the manufacturer's native solution in both battery consumption and sampling frequency. This system consists of: i) an application (the Launcher) responsible for managing (launching, stopping, pausing, etc.) the data recording and transmission services; this application is password-protected and only available to administrator users; ii) a Watch-Face, responsible for showing informative data to the user at any time (participant ID, battery %, etc.), as well as guaranteeing that the data recording and transmission services are operative (Fig. 3); iii) the service responsible for collecting the data from the sensors at the configured frequencies (1 Hz for HR and 10 Hz for the 3DA and GY sensors) and storing them locally in a database; and iv) the data transmission service, responsible for sending the collected data over the Wi-Fi connection via FTP to the data gateway (the NUC device). Figure 2 shows a state diagram detailing the operation of the system: state 1) the sensors record data continuously while the device is on and worn by a participant; state 2) the WD sends the collected samples to the NUC via FTP over the Wi-Fi connection when it is placed on the charging base; state 3) once all data from the WD has been downloaded to the data gateway, the WD App goes idle; state 4) it waits until it is removed from the charger the next morning, to be placed on a participant again (back to state 1). The WD App does all this autonomously, and even stops recording sensor data when the battery level drops below 15% (state 3). Thanks to this behaviour, the nurses only have to pick up the WDs from the chargers at the beginning of the day and place them on the participants, and at the end of the day take them off and put them back on the chargers.
This allows a fully automated operation, with no interaction with the WD required.
E. de la Cal et al.

3.3 The Data Visualization Web App
To easily exploit all the collected data (each WD registers more than half a million samples per day), a web application (see Fig. 3) has been developed, which allows: i) the visualization and filtering of the summaries of the data registered on the WDs that are sent each day from the NUC, and ii) the generation of graphs and other summarized information from the data previously moved from the NUC to the lab server. In the future, this web application would also allow carrying out a calibration process between the activity of each participant and the sensor measures. This calibration consists of a series of specific activities that would be carried out by the participants while sensor data is being registered with the WD, and it would help when designing and training machine learning models using participants' data. The web application is deployed on the data exploitation server, located in the remote laboratory. It is prepared to receive data from multiple nursery houses at once, simply by having the NUCs from each of the places send their summaries to the same web server. Even though the data analytics process is not part of this contribution, we can state that we have preliminary releases of a suite of fall detection algorithms based on different machine-learning paradigms [15].
4 Preliminary Results
Falls are defined as involuntary events that make one lose balance and leave the body on the ground or another firm surface that stops it (WHO¹, 2012). Falls are an important cause of disability in the elderly and, in turn, one of the adverse outcomes of frailty. Approximately 30% of people over 65 and 50% of those over 80 fall at least once a year [1].
4.1 Inclusion and Exclusion Criteria
Among the 150 patients living in the Nursery House "Fuentes Blancas" (Diputación de Burgos, Spain), 10 participants have been enrolled in the first phase of the presented falls monitoring kit (Table 1 shows the inclusion and exclusion criteria): 8 fallers and 2 healthy participants (participants 4 and 8). The fallers have an average fall rate of one event per 15 days. Table 2 shows the features and comorbidities of each participant. Comorbidity implies an increase in the risk factors involved in falls and an increase in mortality due to various factors, including polypharmacy. The Charlson Comorbidity Index [6] considers, based on the pathologies potentially responsible for death, the prediction capacity based on the score obtained plus a correction factor for the number of years. The taking of more than five medications constitutes a geriatric syndrome that we call polypharmacy. The intake of several drugs increases the risk of the adverse effects of the drugs and the changes
¹ World Health Organization.
Fallers monitoring Kit
Table 1. Inclusion and exclusion criteria

Inclusion:
- Having suffered at least one fall in the last year
- Stable cognitive situation with MMSE greater than 26/30
- Maintenance of sensory-perceptual abilities
- Ability to move autonomously

Exclusion:
- Dementia or cognitive impairment
- Functional or physical instability
- Life expectancy less than six months
Table 2. Features of the enrolled participants: IdParticipant, gender, age and comorbidities

Id  Gender  Age  Comorbidities
1   Male    88   Left amaurosis, diabetes, ulcus, prostatism, hypertension, renal failure, ACXFA
2   Male    78   Diabetes, hypertension, heart disease, aplastic anemia, duodenopancreatectomy
3   Male    86   Diabetes, hyperuricemia, brucellosis, ulcus, osteoarthritis, spastic colon, laminectomy, cholecystectomy
4   Male    76   Hypertension, diabetes, renal failure, diabetic nephropathy
5   Female  89   Piliartrosis, diabetes, hypertension, total knee prosthesis, hyperlipemia, hepatic hydatid cyst, diabetic retinopathy, diabetic nephropathy, ulcus, hypothyroidism
6   Female  91   Glaucoma, ACXFA, knee prostheses, Crohn's disease, shoulder subluxation, chronic renal failure, varicose veins
7   Male    88   Prostatism, hypertension, nephrectomy, COPD, dyslipidemia, arthritis, ACXFA, knee prosthesis, olecranon fracture, chronic renal failure, intermittent claudication
8   Male    83   Diabetes, hypertension, hyperlipidemia, aortic aneurysm, COPD, goiter, polycystosis hepatica, nephrolithiasis
9   Female  85   Hypertension, humerus fracture, syncope, bilateral aphakia, delusional disorder, hearing loss
10  Female  98   Anemia, endometrial carcinoma, mild cognitive impairment, ACXFA, renal failure, PE, bilateral DVT, Colles fracture, ulcus
derived from normal aging and those of pharmacodynamics. The Potential Toxicity Scale (PTS) quantifies, with a correction factor, the taking of drugs and their possible implication in falls.

Table 3 includes the computation of the typical risk indexes for the elderly: CCI, Aid, Treatments, PTS, BARTHEL and MMSE. The two healthy participants (id = 4, 8) score a BARTHEL index of 90 and 95 respectively, which shows a very good physical state. The remaining indexes will allow correlating the state of the participants with their physical evolution (considering falls and other kinds of activities).

Table 3. Risk factors for the enrolled participants: Id (IdParticipant), CCI (Charlson Comorbidity Index), Aid (cane or walker), Treatments, PTS (Potential Toxicity Scale), BARTHEL (assessment of physical disability), MMSE (Mini-Mental State Exam)

Id  CCI  Aid     Treatments                                                                                      PTS  BARTHEL  MMSE
1   6    Cane    Duodart, digoxin, enalapril, sintrom, xelevia                                                   5    75/100   27/30
2   3            Zomarist, sevikar                                                                               3    80/100   26/30
3   3    Cane    Omeprazole, pregabalin, xelevia, mirtazapine, xarelto, bisoprolol, zuranpic, seretide, mixtard, feliben  13   70/100   28/30
4   8            Atorvastatin, lantus, sevikar, velmetia                                                         4    90/100   28/30
5   11   Walker  Omeprazole, diliban, eutirox, inegy, vesicare, mixtard                                          4    65/100   26/30
6   6    Cane    Omeprazole, openvas, sintrom, travatan, tardyferon                                              2    70/100   26/30
7   9            Manidipine, digoxin, esomeprazole, enalapril, januvia, duodart, pravastatin, allopurinol, sintrom  6    80/100   28/30
8   8            Lantus, adiro, furosemide, brimica, profer, omeprazole, procoralan, quetiapine                  9    95/100   30/30
9   4            Mirtazapine, lexatin, irbesartan                                                                6    65/100   26/30
10  13   Walker  Adiro, nexium, bisoprolol, quetiapine, atrovent                                                 5    75/100   26/30

4.2 Numerical Results
This is only a very preliminary summary of the first two weeks of operation. The monitoring kit has been running from 9 am to 9 pm with all 10 participants. The WDs were able to record 12 h of continuous data (one day) as planned. At the end of these 12 h, the battery was at around 25% charge (see Fig. 4), which is very good performance. From the summaries sent from the NUC to the lab server, we can also obtain average information for all the sensors, aggregated by hour (see the HR sensor data in Fig. 5). It can be observed that in the period between 2 pm and 4 pm there is a valley in the HR curve, due to nap time for most of the participants. Two important issues were: i) that all the participants attended the nursery house to have the smartband put on and taken off every day, and ii) that, as this is a prototype, there were sometimes communication problems between the smartbands and the gateway.
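The hourly aggregation behind summaries like Fig. 5 can be sketched in a few lines. This is an illustrative sketch, not the kit's actual summarization code; the sample format (seconds since midnight, value) is an assumption.

```python
# Hypothetical sketch of hourly sensor averaging, as used for the HR summaries.
from collections import defaultdict

def hourly_averages(samples):
    """Average sensor readings by hour of day.

    `samples` is an iterable of (timestamp_seconds_since_midnight, value),
    e.g. 1 Hz heart-rate readings."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, value in samples:
        hour = int(ts // 3600)          # bucket by hour of day
        sums[hour] += value
        counts[hour] += 1
    return {h: sums[h] / counts[h] for h in sums}
```

Running this over a day of 1 Hz HR samples yields one average per hour, which is what the NUC forwards to the lab server instead of the raw half-million samples.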
Fig. 4. Battery discharge curve for 6 participants in one day (information obtained from the summaries sent).

Fig. 5. Heart Rate curve for 6 participants in one day (information obtained from the summaries sent). At the end of the day, the HR drops because the WDs are removed from the participants until they are connected to the charging bases.
5 Conclusion and Future Work
This work presents a low-cost, robust, non-invasive and easy-to-use physical activity monitoring kit based on sensorized wristbands. The preliminary results show good behaviour of the different hardware and software elements. As this is just a prototype, the potential improvements are: i) the use of the kit in other kinds of studies, such as frailty studies; ii) the use of more affordable gateway hardware (the latest Raspberry Pi 3 model can be a good option, since the model 2B gave us stability problems); iii) including a small screen on the gateway with a very simple visual interface showing the status and parameters of the elements in the kit; iv) installing a Data Visualization Web Application on the NUC to exploit the kit locally; and v) statistical rewards (activity level, evolution, ...) to encourage participants to wear the WD.

Acknowledgement. This research has been funded partially by the Spanish Ministry of Economy, Industry and Competitiveness (MINECO) under grant TIN2017-84804-R
and by the Foundation for the Promotion of Applied Scientific Research and Technology in Asturias, under grant FC-GRUPIN-IDI2018000226.
References

1. American Geriatrics Society, British Geriatrics Society: AGS/BGS clinical practice guideline: prevention of falls in older persons. Report, American Geriatrics Society, British Geriatrics Society (2010)
2. Bagalà, F., Becker, C., Cappello, A., Chiari, L., Aminian, K., Hausdorff, J.M., Zijlstra, W., Klenk, J.: Evaluation of accelerometer-based fall detection algorithms on real-world falls. PLoS ONE 7, e37062 (2012)
3. Bianchi, F., Redmond, S.J., Narayanan, M.R., Cerutti, S., Lovell, N.H.: Barometric pressure and triaxial accelerometry-based falls event detection. IEEE Trans. Neural Syst. Rehabil. Eng. 18, 619–627 (2010)
4. Bourke, A.K., O'Brien, J.V., Lyons, G.M.: Evaluation of a threshold-based triaxial accelerometer fall detection algorithm. Gait Posture 26, 194–199 (2007)
5. Casilari, E., Santoyo-Ramón, J.A., Cano-García, J.M.: UMAFall: a multisensor dataset for the research on automatic fall detection. Procedia Comput. Sci. 110, 32–39 (2017)
6. Charlson, M., Pompei, P., Ales, K., MacKenzie, C.R.: A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J. Chron. Dis. 40, 373–383 (1987)
7. Chaudhuri, S., Thompson, H., Demiris, G.: Fall detection devices and their use with older adults: a systematic review. J. Geriatr. Phys. Ther. 37(4), 178–196 (2014)
8. Daher, M., Diab, A., El Najjar, M.E.B., Khalil, M.A., Charpillet, F.: Elder tracking and fall detection system using smart tiles. IEEE Sens. J. 17(1), 469–479 (2017)
9. De La Cal, E., Fañez, M., Villar, J., González, V.: Plataforma para el estudio de caídas y desvanecimientos en grupos de personas mayores. In: CEA BIOINGENIERÍA 2018 (2018)
10. Hakim, A., Huq, M.S., Shanta, S., Ibrahim, B.S.K.K.: Smartphone based data mining for fall detection: analysis and design. Procedia Comput. Sci. 105, 46–51 (2017)
11. Hassan, M.M., Albakr, H.S., Al-Dossari, H.: Internet of things framework for pervasive healthcare. In: 1st International Workshop on Emerging Multimedia Applications and Services for Smart Cities, EMASC 2014 (2014)
12. Jalloul, N.: Wearable sensors for the monitoring of movement disorders. Biomed. J. 41(4), 249–253 (2018)
13. Khelil, A., Shaikh, F.K., Sheikh, A.A., Felemban, E., Bojan, H.: DigiAID: a wearable health platform for automated self-tagging in emergency cases. In: 4th International Conference on Wireless Mobile Communication and Healthcare, MOBIHEALTH 2014 (2014)
14. Khojasteh, S., Villar, J., de la Cal, E., González, V., Sedano, J.: Fall detection analysis using a real fall dataset. Adv. Intell. Syst. Comput. 771, 334–343 (2019)
15. Khojasteh, S., Villar, J., Chira, C., González, V., de la Cal, E.: Improving fall detection using an on-wrist wearable accelerometer. Sensors (Switzerland) 18(5) (2018)
16. Khojasteh, S., Villar, J., De La Cal, E., González, V., Tan, Q., Kiadi, M.: A discussion on fall detection issues and its deployment: when cloud meets battery. In: 2018 3rd IEEE International Conference on Cloud Computing and Big Data Analysis, ICCCBDA 2018, pp. 112–115 (2018)
17. Leutheuser, H., Schuldhaus, D., Eskofier, B.M.: Hierarchical, multi-sensor based classification of daily life activities: comparison with state-of-the-art algorithms using a benchmark dataset. PLoS ONE 8(10), e75196 (2013)
18. Doukas, C., Maglogiannis, I.: Bringing IoT and cloud computing towards pervasive healthcare. In: 6th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, IMIS 2012 (2012)
19. Rahimi, M.R., Ren, J., Liu, C.H., Vasilakos, A.V., Venkatasubramanian, N.: Mobile cloud computing: a survey, state of art and future directions. Mob. Netw. Appl. 19(2), 133–143 (2014)
20. Sabatini, A.M., Ligorio, G., Mannini, A., Genovese, V., Pinna, L.: Prior-to- and post-impact fall detection using inertial and barometric altimeter measurements. IEEE Trans. Neural Syst. Rehabil. Eng. 24(7), 774–783 (2016)
21. Sorvala, A., Alasaarela, E., Sorvoja, H., Myllylä, R.: A two-threshold fall detection algorithm for reducing false alarms. In: Proceedings of the 2012 6th International Symposium on Medical Information and Communication Technology (ISMICT) (2012)
22. Vergara, P.M., Cal, E., Villar, J.R., González, V.M., Sedano, J.: An IoT platform for epilepsy monitoring and supervising. J. Sens. 2017, 18 (2017)
23. Villar, J.R., Vergara, P., Menéndez, M., Cal, E., González, V.M., Sedano, J.: Generalized models for the classification of abnormal movements in daily life and its applicability to epilepsy convulsion recognition. Int. J. Neural Syst. 26(06), 1650037 (2016)
Random Forest Missing Data Imputation Methods: Implications for Predicting At-Risk Students

Bevan I. Smith, Charles Chimedza, and Jacoba H. Bührmann
University of the Witwatersrand, Johannesburg, RSA
{bevan.smith,charles.chimedza,joke.buhrmann}@wits.ac.za
Abstract. In the field of higher education, predicting students At-Risk of failing is crucial, since they can then be recommended for various interventions. These predictions are made on real-world datasets that most likely contain missing data. Addressing this missing data can have substantial effects on the eventual At-Risk predictions. In this study we address the missing data problem with recently developed missing data imputation techniques not currently seen in the relevant literature. These techniques include multivariate imputation by chained equations (MICE) and missForest. We found that MICE does not perform better than simple listwise deletion. However, missForest is shown to substantially improve predictive performance. This is important since it implies that any subsequent machine learning predictions of students At-Risk of failing are potentially much better.
Keywords: Missing data · MICE · missForest · At-Risk · Machine learning

1 Introduction
It is crucial to accurately predict and identify students at-risk of failing a university course, since we could then advise these students on potential steps to reduce their being at-risk. However, the datasets used in these predictions are real-world datasets and generally contain various missing data [9,16] that need to be addressed, since missing data can substantially affect any conclusions drawn from the predictions [9,10]. Although dealing with missing data is an important part of data preparation, it has not been properly addressed in the literature on predicting student performance. Could the results in the literature have been more trustworthy had the authors addressed missing data? In the field of predicting student performance, listwise deletion methods have been used [1,4,28], but no detail was given indicating whether the missing data was less than 5% or missing completely at random (MCAR), a requirement for implementing listwise deletion. The most thorough description found was by Mduma et al. [20], who address missing values but simply impute zeros and medians into missing values. Beaulac et al. [3] only state that missing values are a problem. Perhaps these studies did address the missing value problem but did not report on it. Multivariate imputation by chained equations (MICE) was used in two case studies on predicting student performance [7,15]. However, no thorough analysis of missing data has been found in the literature. We believe that addressing missing data in a detailed way may provide much more reliable results going forward. In this paper we show the following:

1. For a very low percentage of missing data (less than 1%), complicated methods such as multivariate imputation by chained equations (MICE) do not perform better than a simple listwise deletion method.
2. However, even for this low percentage of missing data, the new random forest imputation method, missForest, performs substantially better than the other methods.
3. We argue that these results show that prediction and inference of At-Risk students after missForest imputation will be more accurate.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 298–308, 2021. https://doi.org/10.1007/978-3-030-49342-4_29
2 Dealing with Missing Data

2.1 Types of Missingness
A first step in dealing with missing data is to know the type of missingness, i.e. why the data are missing. In general, there are three types of missingness [23]:

1. Missing completely at random (MCAR). These missing values result from random chance. For example, missing values may simply be due to arbitrary mistakes while capturing data. This type is preferred since ignoring this missing data does not introduce any bias into the data.
2. Missing at random (MAR). This type of missing data is dependent on observed data. That is, once we control for a certain observed feature, the missingness becomes random [10].
3. Missing not at random (MNAR). This missing data depends on itself, i.e. depends on the missing data. For example, a person with low income may not include their income in a survey because of their low income.

Whereas MCAR and MAR do not bias the data, MNAR does [10]. MNAR is also known as "non-ignorable".
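The three mechanisms above can be made concrete with a toy generator that deletes values under each rule. This is a pedagogical sketch with invented names; in MCAR deletion is random, in MAR it is driven by an observed covariate, and in MNAR it is driven by the value itself.

```python
# Illustrative sketch of MCAR / MAR / MNAR missingness mechanisms.
import random

def make_missing(values, covariate, mechanism, p=0.5, seed=0):
    """Return a copy of `values` with entries set to None under a mechanism.

    MCAR: random chance; MAR: depends on an observed covariate;
    MNAR: depends on the (to-be-missing) value itself."""
    rng = random.Random(seed)
    out = []
    for v, c in zip(values, covariate):
        if mechanism == "MCAR":
            drop = rng.random() < p        # pure chance
        elif mechanism == "MAR":
            drop = c > 0                   # driven by an observed feature
        else:                              # MNAR
            drop = v < 0                   # e.g. low values left unreported
        out.append(None if drop else v)
    return out
```

Note that MAR and MNAR masks can look identical from the outside; only knowledge of what drives the deletion distinguishes them, which is why the pattern analysis in Sect. 3 is needed.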
2.2 MCAR Test
Little’s MCAR test is a statistical test to determine whether missing data is MCAR or not [19]. Using a χ2 test, it tests whether there are statistically significant differences between means of different missing-value patterns [17]. The null hypothesis states that there are not significant differences, i.e. that the missing data is MCAR. Therefore at a 95% confidence interval, p-values smaller than 0.05 computed for this test indicate not MCAR. If not MCAR, then the data is either MAR or MNAR and requires further analysis. The following describe various ways of dealing with missing data, either by deletion or imputation.
2.3 Listwise and Pairwise Deletion
If missingness is MCAR then listwise deletion and pairwise deletion are options [23]. Listwise deletion deletes an entire observation if one or more of its values are missing. If the missingness is not MCAR, then deleting these observations might introduce bias into the data [23]. However, if the missing data is less than about 5%, or if the total number of observations deleted listwise is under 5%, then the bias in the data will likely be inconsequential [10]. Listwise deletion is the default approach to dealing with missing data in most statistical software packages [11,23]. Pairwise deletion deletes cases only when a pair of features includes a missing value. For example, if correlation is performed on two features and one of the features has missing data, the correlation will exclude the pairs where one value is missing [21]. A disadvantage of this method is that different subsets of observations are used for different analyses. This will tend to bias parameter estimates and likely result in a non-positive definite covariance matrix, which cannot be used for most multivariate statistical analyses [10,21].
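The two deletion strategies can be sketched as follows (an illustrative sketch with rows as plain lists and None marking a missing value, not the behaviour of any particular statistics package):

```python
# Illustrative sketch of listwise vs pairwise deletion (None = missing).

def listwise_delete(rows):
    """Drop any row containing at least one missing value."""
    return [r for r in rows if None not in r]

def pairwise_complete(rows, i, j):
    """Keep only the (column i, column j) pairs where both are observed,
    as done when computing a pairwise-deleted correlation."""
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]
```

The sketch makes the drawback visible: `pairwise_complete` keeps different row subsets for different (i, j) choices, which is exactly why the resulting covariance matrix can fail to be positive definite.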
2.4 Single Imputation - Mean, Mode, Median and Regression
Single imputation imputes single values into the missing data, for example the mean for continuous features [10,21] and the mode for categorical features. Mean imputation, however, decreases the variance, due to adding a number of identical mean values [21], and is not recommended by some [10]. Regression is another single imputation method for estimating missing values: the variable that is missing is regressed on the other measured variables. The problem with a single imputation procedure is that the filled-in data does not have the variation the data would have had, had it been complete [10,21].
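Mean and mode imputation are simple enough to sketch directly (an illustrative stdlib version, not the imputation code used in the study):

```python
# Illustrative sketch of single imputation (None = missing).
from statistics import mean, mode

def impute_single(column, categorical=False):
    """Fill None entries with the column mean (continuous) or mode (categorical)."""
    observed = [v for v in column if v is not None]
    fill = mode(observed) if categorical else mean(observed)
    return [fill if v is None else v for v in column]
```

The variance problem mentioned above is easy to see here: every missing entry receives the identical fill value, so the imputed column is artificially less spread out than a complete column would have been.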
2.5 Multiple Imputation
Multiple imputation (MI) aims to restore the variance lost in single imputation [10,23] by creating a number of different imputed datasets. The analyses are performed on each imputed dataset, and the results are then averaged over the imputed datasets. For example, the mean squared error (MSE) could be calculated for each imputed dataset, and thereafter the average MSE is calculated. The literature has suggested between 3 and 50 imputed datasets, depending on the amount of missing data [6,10,23]. Higher numbers are said to reduce simulation error and improve power [2,6]. Some suggest that 10 imputations are sufficient [2,23]. Therefore, to limit computation time, the number of imputed datasets in this analysis was 10. One obvious downside of multiple imputation is that we do not end up with a single imputed dataset, but multiple datasets. For any further analysis, such as training and testing of machine learning models, the models need to be trained and tested on all the imputed datasets each time.
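The averaging step can be sketched in a couple of lines; `fit_and_score` stands in for whatever per-dataset analysis is run (here, fitting a model and returning its MSE) and is an invented name, not a function from any MI package:

```python
# Illustrative sketch of pooling one metric over m imputed datasets.

def pooled_mse(imputed_datasets, fit_and_score):
    """Run the analysis on each of the m imputed datasets and average the
    resulting MSEs, as done with m = 10 in this study."""
    scores = [fit_and_score(d) for d in imputed_datasets]
    return sum(scores) / len(scores)
```

This is the "cumbersome" part referred to above: every downstream model has to be trained m times, once per imputed dataset, before a single pooled number is available.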
A recently developed method for multiple imputation is multivariate imputation by chained equations (MICE) [2,6]. In this method a series of regression models are fitted, where each model is dependent on the other variables. MICE operates under the assumption that the missing data are MAR, and implementing MICE when the data are not MAR could result in biased estimates [2]. This method can be applied in RStudio using the MICE package [6]. MICE is used in this study and is therefore briefly described next, based on Azur et al. [2]:

1. Begin by imputing a simple value, such as the mean, for all missing values. This is viewed as a "place-holder".
2. For a feature, let's call it Fm, remove the "place-holder" values and return the values in that feature to missing.
3. The non-missing, observed values in Fm are then considered as dependent variables and are regressed against the other features. Essentially we are fitting a regression function where the observed values in Fm are the predictions and the other features are the input variables.
4. The missing values in Fm are then imputed by predicting with the regression model from Step 3.
5. Then we consider another feature, remove that feature's "place-holder" values and impute using the same method. However, now we include the imputed data in Fm in our regression model. This is repeated for each feature until all the features have had their missing values imputed via these regression equations.
6. The above method is then repeated for multiple cycles, based on the idea that after multiple cycles the parameters in the regression models should have converged and become stable.

Eventually we obtain a single imputed dataset from the above steps. There are, however, some disadvantages of MICE. The first is that in the process we generate multiple imputation sets; that is, the above steps are repeated, often 10 times, to obtain 10 imputed datasets.
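The chained-equations cycle described in the numbered steps above can be sketched for purely numeric features. This is an illustrative deterministic sketch, not the MICE package: plain least squares stands in for the regression models, and the sampling noise that real MICE adds when drawing the m imputations is omitted.

```python
# Minimal chained-equations sketch (np.nan = missing). Illustrative only.
import numpy as np

def mice_impute(X, n_cycles=5):
    """One pass of chained-equations imputation on a numeric matrix X:
    start from mean place-holders, then cycle through features, regressing
    the observed part of each feature on all the others and re-predicting
    its missing entries. Returns ONE imputed dataset."""
    X = X.astype(float).copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])   # step 1: place-holders
    for _ in range(n_cycles):                               # step 6: repeat cycles
        for j in range(X.shape[1]):                         # step 5: each feature
            if not missing[:, j].any():
                continue
            obs = ~missing[:, j]                            # step 2
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(obs.sum()), others[obs]])
            beta, *_ = np.linalg.lstsq(A, X[obs, j], rcond=None)   # step 3: fit
            A_miss = np.column_stack([np.ones((~obs).sum()), others[~obs]])
            X[~obs, j] = A_miss @ beta                      # step 4: re-impute
    return X
```

Real MICE would repeat this with stochastic draws m times to produce m differently imputed datasets; the sketch shows only the deterministic core of one chain.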
This means that as we go forward with training models, we are required to train (or perform any analysis) on all the sets and compute an average, which is cumbersome. Secondly, by default the MICE regression models do not include interaction terms that could fit non-linear data [24]. Also, MICE is a parametric model and is therefore sensitive to multicollinearity [24].
2.6 Random Forest Methods
The missForest method, developed by Stekhoven and Bühlmann [25], aims to circumvent the limitations of MICE discussed above. This method is based on random forest (RF) methods [5] and has been shown to outperform MICE in a number of simulation studies [22,25,27]. The MICE function in RStudio also offers random forests for multiple imputation. Thus far, no literature has been found that applies random forest methods, including MICE-Random Forest and missForest, to studies on predicting student performance. A brief description of the method is presented next. Random forest is an ensemble method that combines multiple decision trees. A decision tree can be used in both regression and classification problems by recursively splitting a dataset into smaller subsets until it obtains the target outcome [8]. Each tree in the random forest is trained on only a subsample (often a bootstrap¹) of the observations and a random subset of the features. This has the effect of reducing overfitting, reducing variance in the predictions, and decorrelating the trees [5,14], which can lead to better predictions. For a detailed explanation of the random forest method, see Breiman [5] or Hastie et al. [12].
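The bootstrap-and-average idea can be illustrated with a toy "forest" of one-split regression stumps. This is a heavily simplified sketch: real random forests grow deep trees and also subsample features at each split, both of which are omitted here, and none of the names below come from any forest library.

```python
# Toy bagged-stump "forest" illustrating bootstrap aggregation (sketch only).
import random

def fit_stump(xs, ys):
    """Best single-threshold regression stump: returns (threshold, left_mean,
    right_mean), or None if the sample cannot be split."""
    best_err, best = float("inf"), None
    for t in sorted(set(xs))[:-1]:                 # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if err < best_err:
            best_err, best = err, (t, ml, mr)
    return best

def forest_predict(xs, ys, x_new, n_trees=25, seed=0):
    """Average the predictions of stumps, each fitted on a bootstrap sample."""
    rng = random.Random(seed)
    n, preds = len(xs), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        xb, yb = [xs[i] for i in idx], [ys[i] for i in idx]
        stump = fit_stump(xb, yb)
        if stump is None:                            # degenerate sample: no split
            preds.append(sum(yb) / len(yb))
        else:
            t, ml, mr = stump
            preds.append(ml if x_new <= t else mr)
    return sum(preds) / len(preds)
```

Averaging many trees fitted on resampled data is what gives the variance reduction described above; missForest applies this regressor feature by feature, iterating until the imputations stabilize.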
3 Analysis of Missing Data

Based on the techniques presented above, we now analyze the missing data.

3.1 Original Data
The original dataset consisted of 901 observations and 38 features. Figure 1, generated by SPSS [13], shows the overall missing values in terms of variables (features), cases (i.e. observations) and values. It shows that 8.2% of cases (observations) contain missing values and that overall there were only 1.28% missing values. If the missingness is MCAR, then it is possible to perform listwise deletion.
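The three percentages in the SPSS overview can be sketched directly (an illustrative stdlib version that mirrors the variables/cases/values breakdown, not SPSS itself):

```python
# Illustrative sketch of the features/cases/values missing-data summary.

def missing_summary(rows):
    """Percent of features, cases (rows) and values containing missing data
    (None marks a missing value)."""
    n_rows, n_cols = len(rows), len(rows[0])
    feats = sum(1 for j in range(n_cols) if any(r[j] is None for r in rows))
    cases = sum(1 for r in rows if None in r)
    values = sum(1 for r in rows for v in r if v is None)
    return {"features_pct": 100 * feats / n_cols,
            "cases_pct": 100 * cases / n_rows,
            "values_pct": 100 * values / (n_rows * n_cols)}
```

The gap between the cases figure (8.2%) and the values figure (1.28%) in this dataset simply reflects that most affected rows are missing only a few of their 38 values.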
Fig. 1. Overall missing data analysis generated via SPSS (Version 25).
¹ Random sampling with replacement.
3.2 MCAR Test
As indicated above, the percentage of missing values is well below 5% and would therefore allow for listwise deletion. However, this requires the missingness to be MCAR. Little's MCAR hypothesis test was carried out in SPSS and a p-value of 0.000 was calculated. This value is significant and suggests that the data is not MCAR but either MAR or MNAR. Based on this, it is therefore recommended not to carry out listwise deletion.
3.3 Determining MAR/MNAR
It is therefore important to determine whether the missingness is MAR or MNAR. We do this by analyzing patterns in the missing data using SPSS. Due to space limitations, only a brief discussion is presented here. Missing data patterns generated in SPSS indicate groups of students that had exactly the same data missing. In order to determine MAR or MNAR missingness, these patterns were examined individually. We first identified an obvious pattern: students that had literally all grades data missing. This indicates that these students most likely dropped out before attending any class. There were 23 such students and they were completely deleted from the dataset, reducing it to 878 observations. After deleting the 23 total dropouts, the percentage of missing values decreased from 1.28% to 0.66% and the percentage of observations with missing data decreased from 8.2% to 5.8%. We next found 3 patterns that were considered to be MNAR. These patterns showed that students obtained grades early in the semester but none later on, which suggests they dropped out. When dropout occurs in a program, the subsequent missing values are known as "non-ignorable" or MNAR [18,23,29]. This is because the missing value is dependent on itself or on some unobservable factor. For example, dropout may have occurred because of poor grades, or due to unobservable factors such as relocating to another city or switching to a preferable degree course. Either way, both are considered MNAR. We next identified 4 patterns that could be considered MAR because their patterns were assessed to be random. We then identified 1 pattern that was difficult to describe, due to the students not having grades at the beginning of the year and then having exam grades; it was not understood how they could have taken the final exam without having taken any prior major tests. Due to this uncertainty, these were considered MNAR. Finally, 3 patterns were identified as MAR since we could explain the missingness based on observable features.
Imputation can be performed on missingness that is either MCAR or MAR, but it will introduce bias if the data are MNAR. Here we identified 4 patterns that were MNAR. We next explain how we dealt with the MNAR patterns.
3.4 MAR Assumption
In this section we give two justifications for assuming that the MNAR data can be considered MAR. The reason for this assumption is to simplify the analysis so that multiple imputation can be performed on all the missing data.
The first justification is described by Schafer and Graham [23]. As discussed above, dropout may be due either to the outcome or to an unobserved factor. If the true cause of the missing values is the value being measured (i.e. the missing grades are due to poor grades), then failure to account for MNAR might significantly bias the estimates. However, if the cause is not the value being measured, but unmeasured and unobservable factors that are not highly correlated with the outcome (less than 0.4), then only minor bias will be introduced. Schafer and Graham state that in many social science studies the second case is much more common. If we can conclude that the dropout of the students in this study is due to unobserved factors, then we can trust analysis based on assuming that the data is now MAR. Therefore, in this study we assume that the students dropped out of the course due either to performing poorly or to the various other unobserved factors discussed above (i.e. relocating, changing degrees, needing to work, etc.). It is therefore assumed that analyzing the MNAR data using MAR techniques will not introduce significant bias. The second justification for assuming the MNAR data to be MAR is based purely on the percentage of MNAR data: all the missing observations from the MNAR patterns added up to at most 1%. Our assumption is that this small amount will not significantly bias the results, so we can assume MAR. However, to validate these assumptions, a simple comparison exercise was performed: we first imputed the missing data under the MAR assumption, then removed all the MNAR cases, imputed again, and compared performance results. If there was no significant difference in results between the two datasets, then we have confidence in our assumption. To validate the MAR assumption, we performed multiple imputation under the assumption that all missing values are MAR. We call this dataset Entire, with 878 observations.
Thereafter we removed the 6 MNAR observations and again carried out multiple imputation on the dataset we call Reduced, with 872 observations. For each dataset, 10 datasets were imputed. As discussed earlier, if the metrics were not significantly different, then we have confidence in our assumption that all missing data is MAR. The datasets were "measured" by performing linear regression using 10-fold cross-validation and computing the mean squared error (MSE) for each imputed dataset (10 in total); both the Entire and Reduced datasets therefore had 10 imputed sets. The average MSE values for the Entire and Reduced datasets are presented in Table 1, together with the t-test p-value of 0.065, showing that there is no significant difference in mean values (α = 5%). The boxplots are presented in Fig. 2. This validates the assumptions discussed above, and we therefore have confidence in assuming that all missing values are MAR. We can now carry out multiple imputation and random forest techniques to impute the data.
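The per-dataset evaluation used here (k-fold cross-validated MSE) can be sketched as follows. This is an illustrative sketch, not the authors' R code; `fit` and `predict` are caller-supplied stand-ins for the linear regression model.

```python
# Illustrative sketch of k-fold cross-validated MSE (k = 10 in the study).
import random

def kfold_mse(X, y, fit, predict, k=10, seed=0):
    """Average MSE over k folds; `fit(X, y)` returns a model and
    `predict(model, x)` returns a prediction for one observation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    mses = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in fold]
        mses.append(sum((p - y[i]) ** 2 for p, i in zip(preds, fold)) / len(fold))
    return sum(mses) / k
```

Running this once per imputed dataset, and averaging the ten resulting numbers, yields the per-configuration MSE values reported in Table 1.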
Missing Data Imputation
We have argued that addressing missing data with thorough analysis and imputation techniques may be valuable. However, we would also like to compare these complicated analyses with the simplest method, which is listwise deletion.
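Listwise deletion itself is a one-liner: keep only fully observed rows (complete-case analysis). A minimal NumPy sketch, assuming the data matrix uses NaN for missing values:

```python
import numpy as np

def listwise_delete(X):
    # Keep only rows with no missing values (complete-case analysis)
    return X[~np.isnan(X).any(axis=1)]
```

Applied to the study's 878-observation dataset, this is the step that removed the 64 incomplete observations to leave the 814-row baseline.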
Random Forest Missing Data
305
Table 1. Average MSE for the Entire and Reduced datasets. Standard deviations are shown in brackets.

Entire          Reduced         p-value
0.558 (0.008)   0.552 (0.005)   0.065

Fig. 2. Boxplot of MSE values for Entire and Reduced datasets.
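The validation exercise above (10-fold cross-validated linear regression MSE on each imputed dataset, then a two-sample t-test between the Entire and Reduced MSE lists) can be sketched in plain NumPy. The data here are synthetic placeholders, not the study's dataset, and the t statistic uses Welch's form, which is an assumption rather than the authors' stated test:

```python
import numpy as np

def cv_mse(X, y, k=10, seed=0):
    # Mean squared error of ordinary least squares under k-fold cross-validation
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        test = np.zeros(len(y), bool)
        test[fold] = True
        # Fit OLS (with intercept) on the training folds
        Xtr = np.c_[np.ones((~test).sum()), X[~test]]
        beta, *_ = np.linalg.lstsq(Xtr, y[~test], rcond=None)
        # Evaluate on the held-out fold
        Xte = np.c_[np.ones(test.sum()), X[test]]
        errs.append(np.mean((Xte @ beta - y[test]) ** 2))
    return float(np.mean(errs))

def welch_t(a, b):
    # Welch's two-sample t statistic for the Entire vs Reduced MSE lists
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return float((a.mean() - b.mean()) / np.sqrt(va + vb))
```

Running `cv_mse` once per imputed dataset yields the two lists of 10 MSE values whose means are compared in Table 1.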
Our dataset before imputation had only 0.68% missing values. Because that is well below 5%, we assumed that deletion would not introduce any significant bias, and we therefore deleted all the observations that had any missing data. Our assumption is that if the complicated imputed datasets do not perform any better than the simple listwise-deleted dataset, then the complicated procedure of imputation was unnecessary. Therefore we compared all the subsequent imputations with a baseline dataset, i.e. the listwise-deleted dataset. After performing listwise deletion to obtain the baseline dataset, 64 observations were deleted, leaving 814 observations. The first type of imputation performed was mean and mode imputation for continuous and categorical features respectively. The second type was MICE using regression as the imputation technique, via the mice package in R [26]. The third was MICE using random forest as the imputation technique. Finally, missForest was applied using the missForest package in R [25]. Furthermore, for the random forest methods, 10, 50 and 100 trees were compared. Imputations were carried out as discussed above, then linear regression models were trained on the datasets via 10-fold cross-validation and MSE values were calculated for each dataset.
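The paper's imputations were done with the R packages mice and missForest. As a rough illustration only, the same three strategies can be approximated with scikit-learn stand-ins: SimpleImputer for mean imputation, and IterativeImputer with a Bayesian-ridge or random-forest estimator standing in for MICE-regression and missForest respectively. The tree count and iteration limit below are illustrative, not the paper's settings:

```python
# Approximate mean, MICE-regression and missForest-style imputation with
# scikit-learn stand-ins (the paper itself used the R packages mice/missForest).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.linear_model import BayesianRidge
from sklearn.ensemble import RandomForestRegressor

def impute_variants(X, n_trees=10):
    imputers = {
        'mean': SimpleImputer(strategy='mean'),
        'mice_reg': IterativeImputer(estimator=BayesianRidge(),
                                     max_iter=5, random_state=0),
        'mice_rf': IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=n_trees, random_state=0),
            max_iter=5, random_state=0),
    }
    return {name: imp.fit_transform(X) for name, imp in imputers.items()}
```

Each returned array is fully observed; comparing downstream regression MSE across the variants mirrors the comparison reported in Table 2.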
306
B. I. Smith et al.
The results of these imputed datasets are presented in Table 2 and Fig. 3. For the multiple imputations there were 10 imputed datasets and therefore a range of MSE values is presented. The MSE results show that the mean/mode imputation performed the worst with the highest MSE of 0.57. What is interesting is that the baseline listwise-deleted dataset performs similarly to all the MICE imputed datasets (both regression and random forest). This suggests that the missing data was MCAR and if we were only comparing listwise-deletion with MICE, then the simpler listwise-deletion was sufficient. Finally, the missForest imputed datasets perform substantially better than the rest of the datasets. Whereas the MSE values for all the other datasets are around 0.55, the missForest datasets all score around 0.51. In addition to performing the best, the missForest method only imputes a single dataset, unlike the MICE technique that imputed 10 datasets. Since the MICE-RF technique is similar to the missForest technique, it is uncertain why the missForest performed better than the MICE-RF. This requires further analysis. Nevertheless, to conclude, the dataset imputed using missForest with 100 trees is now considered the final imputed dataset for the entire population.
Fig. 3. MSE values for different imputed datasets: Baseline, Mean, MI-Def, MI-10, MI-50, MI-100, MF-10, MF-50, MF-100.
Table 2. MSE for various imputed datasets. The random forest imputation method missForest is shown to perform the best.

Dataset                   MSE
Baseline                  0.56
Mean/mode                 0.57
MICE (Reg)                0.56 (0.020)
MICE (RF, ntree = 10)     0.55 (0.008)
MICE (RF, ntree = 50)     0.55 (0.007)
MICE (RF, ntree = 100)    0.56 (0.008)
MF (ntree = 10)           0.52
MF (ntree = 50)           0.51
MF (ntree = 100)          0.51

4 Conclusion
The results show that imputation using the relatively complicated MICE method did not perform any better than the simple listwise-deleted dataset. This is helpful, since for future datasets with a very small amount of missing data (less than 1%) it might be sufficient to use listwise deletion and save considerable analysis time. However, we found that the newer imputation method, missForest, improves predictive performance substantially, even with such a small amount of missing data. Therefore, with a small amount of missing data, missForest can potentially result in substantial improvements in identifying at-risk students using machine learning methods.
References

1. Aguiar, E., et al.: Engagement vs performance: using electronic portfolios to predict first semester engineering student retention. J. Learn. Anal. 1, 103–112 (2014)
2. Azur, M.J., et al.: Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatric Res. 20(1), 40–49 (2011)
3. Beaulac, C., Rosenthal, J.S.: Predicting university students' academic success and major using random forests. Res. High. Educ. 60(7), 1048–1064 (2019)
4. Beemer, J., et al.: Assessing instructional modalities: individualized treatment effects for personalized learning. J. Stat. Educ. 26 (2018). https://doi.org/10.1080/10691898.2018.1426400
5. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
6. van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45 (2011)
7. Crespo-Turrado, C., et al.: Student performance prediction applying missing data imputation in electrical engineering studies degree. In: International Joint Conference SOCO'17-CISIS'17-ICEUTE'17, León, Spain, 6–8 September 2017, Proceedings, pp. 405–411 (2017)
8. Friedl, M.A., Brodley, C.E.: Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ. 61(3), 399–409 (1997). https://doi.org/10.1016/S0034-4257(97)00049-7
9. Frisell, T.: SP0187 Why missing data is a problem, and what you shouldn't do to solve it. Ann. Rheumatic Dis. 75(Suppl. 2), 45 (2016). https://doi.org/10.1136/annrheumdis-2016-eular.6249
10. Graham, J.W.: Missing data analysis: making it work in the real world. Ann. Rev. Psychol. 60, 549–576 (2009)
11. Granberg-Rademacker, J.S.: A comparison of three approaches to handling incomplete state-level data. State Politics Policy Q. 7(3), 325–338 (2007)
12. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, Heidelberg (2009)
13. IBM Corp.: IBM SPSS Statistics for Windows, Version 25.0. IBM Corp., Armonk, NY (2017)
14. James, G., et al.: An Introduction to Statistical Learning. Springer (2013). ISBN 978-1-4614-7137-0. https://doi.org/10.1007/978-1-4614-7138-7
15. Jove, E., et al.: Attempts prediction by missing data imputation in engineering degree. In: Proceedings of the International Joint Conference SOCO 2017-CISIS 2017-ICEUTE 2017, León, Spain, 6–8 September 2017, pp. 405–411 (2017)
16. Kang, H.: The prevention and handling of the missing data. Korean J. Anesthesiol. 64(5), 402–406 (2013)
17. Li, C.: Little's test of missing completely at random. Stata J. 13(4), 798–809 (2013)
18. Li, M., et al.: Comparison of different LGM-based methods with MAR and MNAR dropout data. Front. Psychol. 8 (2017)
19. Little, R.J.A.: A test of missing completely at random for multivariate data with missing values. J. Am. Stat. Assoc. 83(404), 1198–1202 (1988)
20. Mduma, N., Kalegele, K., Machuve, D.: Machine learning approach for reducing students dropout rates. Int. J. Adv. Comput. Res. 9(42) (2019)
21. Peugh, J.L., Enders, C.K.: Missing data in educational research: a review of reporting practices and suggestions for improvement. Rev. Educ. Res. 74(2), 525–556 (2004)
22. Ramosaj, B., Pauly, M.: Who wins the Miss Contest for imputation methods? Our vote for Miss BooPF. arXiv preprint arXiv:1711.11394 (2017)
23. Schafer, J.L., Graham, J.W.: Missing data: our view of the state of the art. Psychol. Methods 7(2), 147–177 (2002)
24. Shah, A.D., et al.: Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. Am. J. Epidemiol. 179(6), 764–774 (2014)
25. Stekhoven, D.J., Bühlmann, P.: MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics 28(1), 112–118 (2012)
26. van Buuren, S., Groothuis-Oudshoorn, K.: mice: multivariate imputation by chained equations in R. J. Stat. Softw. 45(3), 1–67 (2011). https://www.jstatsoft.org/v45/i03/
27. Waljee, A.K., et al.: Comparison of imputation methods for missing laboratory data in medicine. BMJ Open 3 (2013)
28. Xu, J., Moon, K.H., van der Schaar, M.: A machine learning approach for tracking and predicting student performance in degree programs. IEEE J. Sel. Topics Sig. Process. 11(5), 742–753 (2017)
29. Yang, M., Maxwell, S.E.: Treatment effects in randomized longitudinal trials with different types of nonignorable dropout. Psychol. Methods 19 (2014)
Noise Reduction with Detail Preservation in Low-Dose Dental CT Images by Morphological Operators and BM3D

Romulo Marconato Stringhini(1), Daniel Welfer(1), Daniel Fernando Tello Gamarra(2), and Gustavo Nogara Dotto(3)

(1) Department of Applied Computing, Federal University of Santa Maria, Santa Maria, Rio Grande do Sul, Brazil ([email protected], [email protected])
(2) Department of Electrical Energy Processing, Federal University of Santa Maria, Santa Maria, Rio Grande do Sul, Brazil ([email protected])
(3) Universitary Hospital of Santa Maria, Santa Maria, Rio Grande do Sul, Brazil ([email protected])
Abstract. Compared to other traditional imaging exams, computed tomography (CT) is more efficient: digital geometry processing is used to generate a 3D image of an internal structure of an object, or patient, from a series of 2D images obtained during various rotations of the CT scanner around the scanned object. Compared with traditional imaging exams such as MRI or ultrasound, the CT technique uses higher radiation doses, providing high-quality images. However, in order to prevent constant exposure to high radiation doses, low-dose computed tomography (LDCT) scans are often recommended. Nevertheless, the images acquired in LDCT scans are degraded by undesirable artifacts, known as noise, which negatively affect image quality. In this study, a two-stage filter based on morphological operators and Block-Matching 3D (BM3D) is proposed to remove noise in low-dose dental CT images. The quantitative results obtained by our proposed method demonstrated superior performance when compared to several state-of-the-art techniques. Our proposed method also obtained better visual performance, removing the noise and preserving details more efficiently than the compared filters.

Keywords: Low-dose · Computed tomography · Image segmentation · Noise reduction · Mathematical morphology · BM3D · PSNR

1 Introduction
The term computed tomography (CT) refers to an x-ray imaging procedure in which a narrow x-ray beam is directed at the patient and rotated around the body, producing signals that are processed by the CT computer, which generates

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 309–317, 2021. https://doi.org/10.1007/978-3-030-49342-4_30
310
R. M. Stringhini et al.
high quality images, also known as slices [1]. On the other hand, CT exams have a very important negative aspect that has been taken into consideration by the medical community: the doses of radiation used in this type of exam. Compared to traditional imaging exams, computed tomography uses higher radiation doses. Therefore, patients who require frequent CT scans may develop some type of disease resulting from exposure to high radiation doses [2]. Recently, the medical community has been following the ALARA principle, whose main idea is to reduce the radiation dose to an acceptable level while maintaining an accurate medical diagnosis and an acceptable image [3]. As the quality of the image obtained in a CT scan is directly linked to the amount of radiation used, images obtained by low-dose CT (LDCT) scans are negatively affected by noise, leaving the images with poor quality, which may compromise the medical diagnosis, depending on the type of structure to be analyzed [2]. However, the quality of LDCT images can be improved through noise filtering techniques.

The objective of noise filtering algorithms is to reduce the noise in images while keeping important details preserved. Several noise reduction methods have been proposed in the literature, such as BM3D [4], PSMF (Progressive Switching Median Filter) [5], NAFSMF (Noise Adaptive Fuzzy Switching Median Filter) [6], MDBUTMF (Modified Decision Based Unsymmetrical Trimmed Median Filter) [7], the bilateral filter [8], the guided filter [9], Optimized Bayesian Non-Local Means (OBNLM) [10], the Open-Close Sequence filter (OCS) [11] and the SMMF (Switching Median and Morphological Filter) [13], for example.

In this context, a filter based on morphological operators and BM3D filtering, divided into two main stages, is proposed. In the first stage, an image segmentation is applied in order to improve the performance of the second stage, where the noise reduction occurs.
After extensive simulations, the results obtained by our proposed method proved superior when compared to some well-known noise reduction techniques. The validation process is done using the PSNR, SSIM, MSE and EPI quantitative metrics. The remainder of this paper is organized as follows: Sect. 2 presents the proposed method and the images used in this study. Experimental results and validation are presented in Sect. 3. The results obtained by the evaluated filters as well as by our proposed method are discussed in Sect. 4. Lastly, this study is concluded in Sect. 5.

2 Materials and Methodology
The low-dose CT dental images used in this study were taken from an anonymized database from the Universitary Hospital of Santa Maria (HUSM). These images are contaminated with natural noise acquired during the scanning process. Our proposed noise reduction method is based on morphological operators and BM3D and has two main stages: image segmentation and noise filtering. The main goal of our proposed technique is to reduce the noise in these images while preserving important details, structures, contrast and borders.
Noise Reduction with Detail Preservation in Low-Dose Dental CT Images
311

2.1 First Stage: Image Segmentation
Image segmentation is a process in which a given image is divided into regions in order to select which will be processed or analyzed separately [14]. To achieve the best performance in the noise filtering stage, we divide the input images into two different regions: background and foreground. The first step of our proposed method is to convert the input noisy low-dose dental CT image f1 to its gray-scale version f2. In addition to preserving the contrast of the input image, converting to gray scale assists the behaviour of the morphological operators (see Fig. 1(a)). Afterwards, a morphological opening by reconstruction is applied to f2, with a diamond-shaped structuring element of size 2. First, the opening removes bright features whose sizes are smaller than the structuring element; after that, the dilation restores the contours of components that were not removed by the opening operator. The output image after this reconstruction is denoted f3 and is described as

f3 = ⋁_{n≥1} δ^(n)_{f2}(γ^B(f2)),   (1)
where γ^B(f2) is the morphological opening of f2 and δ^(n)_{f2} is the geodesic dilation of size n under f2, both using a diamond-shaped structuring element B of size 2. The image resulting from this step is illustrated in Fig. 1(b). The pixels that make up the bright structure of the image, named the foreground, have high contrast in relation to the background pixels, that is, the dark pixels. Because of that, we applied the Otsu threshold method [15] to detect the foreground pixels in f3. The resulting image f4 from this process is illustrated in Fig. 1(c); the foreground region is the white structure in Fig. 1(c). Subsequently, to separate the foreground region (white structure) from the background region (dark pixels), an element-wise multiplication is applied between f4 and the gray-scale image f2. Once this is done, the noisy foreground and background regions, f5 and f6 respectively, are obtained. The foreground and background images from this step are shown in Fig. 1(d) and Fig. 1(e) respectively. These two images are the input images of the second stage, where the noise present in these regions is removed.

2.2 Second Stage: Noise Filtering

To reduce some of the noisy artifacts present in the foreground region, the first step in the second stage of our proposed filter is to apply a morphological opening operator γ to f5 with the same structuring element B used in the first stage, resulting in f7, illustrated in Fig. 2(a). This operation can be written as

f7 = γ^B(f5) = δ^B(ε^B(f5)).   (2)
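The two operations above can be sketched in plain NumPy (a minimal illustration; a real implementation would use scikit-image's morphology module). Eq. (2) is an ordinary opening with the diamond element B, while Eq. (1) iterates geodesic dilations of the opened image under the mask f2 until stability. The 4-connected elementary cross used in the geodesic step is an assumption, since the paper does not state the connectivity:

```python
import numpy as np

# Diamond structuring element of size 2 (|di| + |dj| <= 2) and the elementary cross
DIAMOND2 = np.add.outer(np.abs(np.arange(-2, 3)), np.abs(np.arange(-2, 3))) <= 2
CROSS = np.add.outer(np.abs(np.arange(-1, 2)), np.abs(np.arange(-1, 2))) <= 1

def gray_dilate(img, fp):
    # Grayscale dilation: maximum over the footprint offsets
    r = fp.shape[0] // 2
    p = np.pad(img.astype(float), r, mode='edge')
    h, w = img.shape
    out = np.full((h, w), -np.inf)
    for di in range(fp.shape[0]):
        for dj in range(fp.shape[1]):
            if fp[di, dj]:
                out = np.maximum(out, p[di:di + h, dj:dj + w])
    return out

def gray_erode(img, fp):
    # Erosion is dilation of the negated image
    return -gray_dilate(-img, fp)

def opening(img, fp=DIAMOND2):
    # Eq. (2): erosion followed by dilation with the same element B
    return gray_dilate(gray_erode(img, fp), fp)

def opening_by_reconstruction(f2, fp=DIAMOND2):
    # Eq. (1): geodesic dilations of gamma^B(f2) under the mask f2, to stability
    marker = opening(f2, fp)
    while True:
        grown = np.minimum(gray_dilate(marker, CROSS), f2)
        if np.array_equal(grown, marker):
            return grown
        marker = grown
```

Bright features smaller than B vanish (their opening is empty, so nothing is reconstructed), while larger components are restored to their full extent within the mask.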
This operation removes objects that are smaller than B through the erosion operator and partially restores the remaining objects through the morphological dilation.
Fig. 1. (a) Gray scale level input image f2 . (b) Resulting image f3 from opening by reconstruction. (c) Image f4 after Otsu threshold. (d) Noisy foreground f5 . (e) Noisy background f6 .
However, as can be seen in Fig. 2(a), some pixels were degraded and the contrast of the image was negatively affected. To restore the image contrast, we applied the CLAHE technique [16] which, according to [17], is very effective for medical images. The algorithm divides the image into regions of nearly the same size, which form three different groups containing the corner, border and inner regions. After this grouping process, the histogram of each region is enhanced and the regions are combined back to their original positions using bilinear interpolation. In our proposed method, the image obtained from this step can be seen in Fig. 2(b) and is denoted f8. To restore the pixels degraded by the morphological opening operator, we used a morphological reconstruction by dilation. In our methodology, the mask image is the foreground image f5 and the marker image is f8. The resulting reconstructed image f9 is illustrated in Fig. 2(c) and can be mathematically expressed as

f9 = ⋁_{n≥1} δ^(n)_{f5}(f8).   (3)
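As a rough illustration of the per-region equalization idea behind CLAHE (deliberately simplified: no clip limit and no bilinear blending between tiles, both of which real CLAHE adds), each tile of a [0, 1] grayscale image can be equalized independently:

```python
import numpy as np

def equalize_tile(tile, bins=256):
    # Map gray levels through the tile's empirical CDF
    hist, edges = np.histogram(tile, bins=bins, range=(0.0, 1.0))
    cdf = hist.cumsum() / tile.size
    return np.interp(tile, edges[:-1], cdf)

def tiled_equalize(img, n=4):
    # Simplified CLAHE-like pass: equalize each of the n x n tiles on its own.
    # (Real CLAHE also clips the histogram and blends tiles bilinearly.)
    out = np.empty(img.shape, dtype=float)
    rows = np.array_split(np.arange(img.shape[0]), n)
    cols = np.array_split(np.arange(img.shape[1]), n)
    for bi in rows:
        for bj in cols:
            out[np.ix_(bi, bj)] = equalize_tile(img[np.ix_(bi, bj)])
    return out
```

Without the blending step, visible seams can appear at tile borders; the bilinear interpolation mentioned above is what removes them in the full algorithm.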
According to Fig. 2(c), some noise is still present in the foreground region. To overcome this, we applied the Block-Matching 3D (BM3D) [4] filtering technique. Proposed by Dabov et al. [4], the BM3D technique can be understood in two steps: basic estimate and final estimate. In the first step, the filter processes the noisy image f9 to find similar blocks, grouping them in a 3D array called
groups, which pass through the noise reduction process using a 3D transform and its inverse. To complete the first step, the processed groups are combined by weighted averaging, which generates the input image of the second step. The second step, the final estimate, is done by grouping similar blocks between the original noisy image and the output image from the previous step (the basic estimate) into two 3D groups. In this case, a 3D transform is also applied to both groups and collaborative Wiener filtering is used to filter the noise, producing estimates of each group. Then, the inverse 3D transform is applied and these estimates are returned to their original positions. The final output image (see Fig. 2(d)), denoted f10, is obtained by aggregating all the estimates using a weighted average. Now that the foreground region is completely processed, the noise present in the background region image f6 must be filtered. For that, we simply apply another morphological opening operator with the same parameters used before. Having done that, the filtered background image f11 is obtained, and it is possible to see in Fig. 2(e) that the noise was successfully removed. The final procedure of the second stage and, consequently, of our proposed method, is to combine the filtered foreground and background regions in order to obtain the final noise-free output image f12. For that, f10 and f11 are added. Figure 2(f) shows the final noise-free output image of our proposed method.
Fig. 2. (a) Resulting image f7 from opening operator. (b) Resulting image f8 from CLAHE enhancement. (c) Resulting image f9 from reconstruction by dilation. (d) Filtered foreground after BM3D. (e) Filtered background. (f) Output image f12 of the proposed method.
3 Experimental Results and Validation
To verify and validate the performance of our proposed method and of the evaluated filters, experiments were conducted on 991 low-dose computed tomography dental images corrupted by natural noise acquired during the scanning process. The experiments were conducted in Matlab R2015a. The performance of the proposed method is compared with several state-of-the-art techniques: the guided filter, bilateral filter, BM3D, bitonic filter, PSMF, NAFSMF, MDBUTMF, OBNLM, OCS, TTV (Truncated Total Variation) [12] and the hybrid filter proposed by [18]. The quantitative image quality metrics used to validate the obtained results were the PSNR (peak signal-to-noise ratio), MSE (mean squared error), SSIM (structural similarity) and EPI (edge preservation index). The average results for each quantitative quality metric obtained by all the evaluated filters are given in Table 1. According to these results, our proposed method obtained superior performance in all quantitative quality metrics among the evaluated filters.

Table 1. Average quantitative results of each evaluated noise reduction filter.

Filters             PSNR   SSIM  MSE     EPI
Guided filter       21.83  0.80  271.27  0.48
Bilateral filter    22.91  0.73  242.76  0.57
BM3D                25.59  0.83  207.11  0.63
Bitonic filter      21.91  0.74  488.18  0.48
PSMF                21.28  0.69  560.93  0.45
NAFSMF              21.81  0.71  598.32  0.59
MDBUTMF             21.43  0.70  571.26  0.57
OBNLM               24.55  0.83  406.12  0.64
OCS                 22.95  0.75  466.48  0.43
TTV                 23.85  0.82  Indef.  0.45
Hybrid filter [18]  23.19  0.78  322.79  0.61
Proposed            28.78  0.91  177.06  0.82
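Of the metrics above, MSE, PSNR and EPI are straightforward to compute; a NumPy sketch follows. SSIM is omitted for brevity, and the Laplacian-correlation form of EPI used here is one common definition, assumed rather than taken from the paper:

```python
import numpy as np

def mse(ref, img):
    # Mean squared error between reference and processed image
    return float(np.mean((ref.astype(float) - img.astype(float)) ** 2))

def psnr(ref, img, peak=255.0):
    # Peak signal-to-noise ratio in dB (assumes 8-bit peak by default)
    m = mse(ref, img)
    return float('inf') if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def laplacian(img):
    # 4-neighbour Laplacian high-pass, used by one common EPI definition
    p = np.pad(img.astype(float), 1, mode='edge')
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            - 4.0 * p[1:-1, 1:-1])

def epi(ref, img):
    # Correlation between the high-pass responses of the two images
    a, b = laplacian(ref), laplacian(img)
    a, b = a - a.mean(), b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))
```

An EPI near 1 means the filtered image's edges correlate strongly with the reference's; heavy smoothing drives it toward 0.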
4 Results Discussion
The noisy low-dose dental CT image taken as input is shown in Fig. 3(a). Figure 3(b) shows the image processed by the guided filter; it can be observed that this technique left the noise present in the foreground region, as well as small structures and details, blurred. In other words, the noise is still present in Fig. 3(b) but with less distinctness.
As illustrated in Fig. 3(c), the image processed by the bilateral filter also presents noise in the foreground region and traces of noise in the background region. However, borders, details and small structures were well preserved. The BM3D technique managed to remove the noise almost completely from the background region and preserved details very well; however, the noise in the foreground region was not removed efficiently, as can be seen in Fig. 3(d).
Fig. 3. Visual performance of each evaluated filter: (a) original, (b) guided filter, (c) bilateral, (d) BM3D, (e) bitonic filter, (f) PSMF, (g) NAFSMF, (h) MDBUTMF, (i) OCS, (j) OBNLM, (k) TTV, (l) hybrid filter [18], (m) proposed.
The visual performance obtained by the bitonic filter is similar to that of the guided filter. The only difference is that there was a slight blur in small structures that
can be observed in Fig. 3(e). The images processed by the PSMF, NAFSMF and MDBUTMF, Fig. 3(f), (g) and (h) respectively, obtained similar quantitative and visual results. Analyzing the visual aspects of these filters, it can be seen that the noise was not fully removed but, on the other hand, details and edges were well preserved. The Optimized Bayesian Non-Local Means (OBNLM) technique demonstrated good performance in removing noise. According to Fig. 3(j), the noise in both the foreground and background regions was well removed. Regarding the preservation of details, the OBNLM technique also preserved important details when compared with the original image in Fig. 3(a). The TTV technique obtained good visual results with respect to noise removal, as can be seen in Fig. 3(k). Borders and parts of structures remained preserved; however, the image processed by this technique does not have the same variations of gray in the foreground region that can be seen in the original image. The hybrid filter proposed by Thakur et al. [18] (see Fig. 3(l)) was able to partially remove the noise from the background region but left the foreground region with the same noise as in the input image. A positive aspect of this filter is that details were well preserved. Our proposed method outperformed all the evaluated filters in all the quantitative quality metrics used in this study. In Fig. 3(m) it can be seen that the noise present in both the foreground and background regions has been removed satisfactorily. In addition, borders, small structures and details have been preserved and remain in line with the original image (Fig. 3(a)).
5 Conclusions
A noise reduction and detail preservation filter is proposed in this paper. The technique relies on morphological operators and 3D collaborative filtering and is divided into two main stages: image segmentation and noise filtering. The performance of the proposed method was validated using an anonymized database from the Universitary Hospital of Santa Maria, consisting of a total of 991 dental images from low-radiation computed tomography exams, degraded by natural noise obtained during the image acquisition process. The quantitative quality metrics used were the PSNR (peak signal-to-noise ratio), SSIM (structural similarity), MSE (mean squared error) and EPI (edge preservation index). Our method outperformed all the evaluated state-of-the-art techniques in all quantitative metrics and demonstrated better visual results. Compared with the best average results presented in Table 1, our proposed method obtained an increase of 12.46% in the PSNR metric, preserved structures 11.11% better, had 14.5% less loss of information and obtained an increase of 9.63% in edge preservation.

Acknowledgements. This study was financed in part by the Coordination of Improvement of Higher Level Personnel - Brazil (CAPES) - Finance Code 001.
References

1. Hounsfield, G.N.: Computerized transverse axial scanning (tomography): Part 1. Description of system. Br. J. Radiol. 46(552), 1016–1022 (1973)
2. Brenner, D.J., Hall, E.J.: Computed tomography: an increasing source of radiation exposure. N. Engl. J. Med. 357(22), 2277–2284 (2007)
3. Li, Z., et al.: Adaptive nonlocal means filtering based on local noise level for CT denoising. Med. Phys. 41(1), 011908 (2014)
4. Dabov, K., et al.: Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process. 16(8), 2080–2095 (2007)
5. Wang, Z., Zhang, D.: Progressive switching median filter for the removal of impulse noise from highly corrupted images. IEEE Trans. Circuits Syst. II: Analog Digit. Sig. Process. 46(1), 78–80 (1999)
6. Toh, K.K.V., Isa, N.A.M.: Noise adaptive fuzzy switching median filter for salt-and-pepper noise reduction. IEEE Sig. Process. Lett. 17(3), 281–284 (2010)
7. Vasanth, K., Manjunath, T.G., Nirmal Raj, S.: A decision based unsymmetrical trimmed modified winsorized mean filter for the removal of high density salt and pepper noise in images and videos. Procedia Comput. Sci. 54, 595–604 (2015)
8. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Sixth International Conference on Computer Vision (ICCV 1998) (1998)
9. He, K., Sun, J., Tang, X.: Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 35(6), 1397–1409 (2013)
10. Coupé, P., et al.: Nonlocal means-based speckle filtering for ultrasound images. IEEE Trans. Image Process. 18(10), 2221–2229 (2009)
11. Ze-Feng, D., Zhou-Ping, Y., You-Lun, X.: High probability impulse noise-removing algorithm based on mathematical morphology. IEEE Sig. Process. Lett. 14(1), 31–34 (2007)
12. Dou, Z., et al.: Image smoothing via truncated total variation. IEEE Access 5, 27337–27344 (2017)
13. Yuan, C., Li, Y.: Switching median and morphological filter for impulse noise removal from digital images. Optik 126(18), 1598–1601 (2015)
14. Masood, S., et al.: A survey on medical image segmentation. Curr. Med. Imaging Rev. 11(1), 3–14 (2015)
15. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
16. Pizer, S.M., et al.: Adaptive histogram equalization and its variations. Comput. Vis. Graph. Image Process. 39(3), 355–368 (1987)
17. Reza, A.M.: Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J. VLSI Sig. Process. Syst. Sig. Image Video Technol. 38(1), 35–44 (2004)
18. Thakur, K., Damodare, O., Sapkal, A.: Hybrid method for medical image denoising using Shearlet transform and bilateral filter. In: 2015 International Conference on Information Processing (ICIP). IEEE (2015)
An Effective Approach to Detect and Prevent Collaborative Grayhole Attack by Malicious Node in MANET

Sanjeev Yadav, Rupesh Kumar, Naveen Tiwari, and Abhishek Bajpai

Department of Computer Science and Engineering, Rajkiya Engineering College, Kannauj, Uttar Pradesh, India
[email protected]

Abstract. A Mobile Ad-hoc Network (MANET) is a wireless network with dynamic topology in which a group of mobile nodes is interconnected. Each node can communicate with the others by cooperating with them, and hence no centralized administrator is required. Due to their dynamic topology, MANETs are infrastructure-less, which makes it easy for a malicious node to enter the network and disrupt its normal functioning; security is thus compromised, and MANETs suffer from a wide range of attacks. One of these attacks is the grayhole attack, in which a malicious node silently enters the network and partially drops the data packets passing through it. Due to this partial dropping of packets, it is difficult to detect a grayhole node. We propose a scheme for detection of and prevention from the grayhole attack.

Keywords: MANET · Grayhole attack · Malicious node · Backtrack · AODV
Abstract. Mobile Ad-hoc Network (MANETs) is a dynamic topological wireless network in which group of mobile nodes are interconnected with other nodes. Each node can communicate with each other by cooperating with other nodes hence do not require a centralized administrator. Due to dynamic topology MANETs are infrastructure-less so which makes easy for any malicious node to enter the network and disrupt normal functioning of network and security is compromised hence MANETs suffer from a wide range of attacks. One of these attacks is Grayhole attack in which a malicious node silently enters the network and partially drops the data packets passing through it. Due to partial dropping of packets it is difficult to detect grayhole node. We have proposed a scheme for detection and prevention from grayhole attack. Keywords: MANET · Grayhole attack · Malicious node · Backtrack AODV
1 Introduction Wireless sensor networks are a group of static or mobile specialized devices or sensors which are used to monitor different environmental conditions to collect and organize data at some central location. In order to communicate with each other mobile devices make use of a network, such a kind of network is MANET. MANET also known as mobile adhoc network. MANET is a network in which many autonomous nodes (composed of mobile devices) are wirelessly connected to each other to form a continuous and flexible network. MANET is a peer-to-peer, self-forming, self-healing network [21, 22]. Manet has many specialties that most of the other networks do not have, it’s self-configuring nature to infrastructure-less system separates it from the legion. Counting on its selfconfiguring nature means each node not only works as host but also acts as a router, if a node is forwarding or receiving data packets then the nodes should cooperate with each other. Due to presence mobile nodes it has a dynamic topology which in turn is responsible for its infrastructure-less nature. This makes the domain of application of MANETs very diverse starting from military crisis operations, emergency preparedness, secure and rescue operations, law enforcement, replacement of fixed infrastructure in © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 318–335, 2021. https://doi.org/10.1007/978-3-030-49342-4_31
An Effective Approach to Detect and Prevent Collaborative Grayhole Attack
319
case of floods or fire. To provide efficient communication between the nodes in a MANET, different routing protocols are used. The main aims of every routing protocol are to minimize routing message overhead as the number of mobile nodes grows, to place the best route in the routing table when more than one route to the destination is available, to add new routes, and to replace a lost or broken route with the best currently accessible one (Fig. 1).
Fig. 1. Mobile adhoc network architecture
Routing protocols are divided into three classes: proactive, reactive and hybrid. A proactive protocol maintains a list of destinations and their routes by periodically distributing routing tables throughout the network, i.e. each node maintains one or more tables representing the entire network topology. These routing tables are updated regularly so that every node keeps fresh routing information to every other node. Maintaining fresh routing information, however, charges a high price in the form of extra overhead, which influences the overall throughput of the network. The Distance Vector (DV), Destination Sequenced Distance Vector (DSDV) and Link State Routing (LSR) protocols fall under the proactive category. A reactive protocol, also called an on-demand routing protocol, discovers or maintains routes only on demand; the issue with reactive protocols is that they incur high latency. Examples of reactive protocols are Ad-hoc On-demand Distance Vector (AODV), Dynamic Source Routing (DSR) and Associativity Based Routing (ABR). A hybrid protocol combines proactive and reactive behavior. In this paper prime attention is paid to detecting grayhole nodes in the AODV routing protocol, because AODV is the most commonly used routing protocol in MANETs. In AODV routing information is broadcast only when needed, i.e. not periodically. If a source node wishes to dispatch data packets, it launches a route discovery process and tries to discover a route before dispatching them. But with such great features of a network come a number of challenges. The primary challenge of
S. Yadav et al.
MANETs is security. Security means protecting privacy, availability, integrity and non-repudiation; it implies identifying the potential attacks, threats and vulnerabilities of a system and protecting it from unauthorized access, use, modification or destruction. MANETs suffer from two types of attacks: passive and active. In a passive attack the normal functioning and resources of the network are not compromised, but information is disclosed to a malicious (unauthenticated) member of the network. In an active attack information is not only disclosed but also modified by the malicious node, which disturbs the normal functioning of the network. Owing to the lack of infrastructure these networks are highly vulnerable to routing attacks, also referred to as network-layer attacks. Attacks such as the Blackhole, Grayhole and Wormhole attacks fall under this category (Fig. 2).
Fig. 2. Selective packet dropping by grayhole node
In a Blackhole attack, a malicious node advertises that it has the shortest path to the destination. If the malicious node manages to send a fake reply to the source before an authentic node does, it can attract all the packets and then drop them all without forwarding them. The Grayhole attack is a variant of the Blackhole attack in which the malicious node shows dual behaviour: it functions as a normal node but turns malicious at certain times, selectively forwarding or dropping the data packets passing through it, either individually or in collusion with a group of nodes. This selective dropping makes the attack much harder to detect, and hence the Grayhole attack is more troubling than the Blackhole attack. It disturbs the normal functionality of the network and compromises its resources. A Grayhole attack has a large impact on the network, decreasing the packet delivery ratio and increasing overhead, and thereby affecting the overall throughput. Owing to the large application domain of MANETs they are prone to a number of threats, among which security is the prime issue, and providing security is a difficult problem. Traditional security-based methods are not suitable for Grayhole attacks. In this paper we propose a scheme for detecting malicious nodes in the network and hence avoiding Grayhole attacks.
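The selective dropping that distinguishes a grayhole from a blackhole can be illustrated with a minimal Python sketch; the GrayholeNode class, the drop_prob value and the packet model are illustrative, not taken from any cited scheme:

```python
import random

class GrayholeNode:
    """Toy model of a grayhole node: it forwards packets like a normal
    node but silently drops a fraction of them."""

    def __init__(self, drop_prob=0.3, seed=None):
        self.drop_prob = drop_prob        # fraction of packets dropped
        self.rng = random.Random(seed)
        self.received = 0
        self.forwarded = 0

    def handle(self, packet):
        self.received += 1
        if self.rng.random() < self.drop_prob:
            return None                   # packet silently dropped
        self.forwarded += 1
        return packet                     # packet forwarded downstream

node = GrayholeNode(drop_prob=0.3, seed=42)
delivered = sum(node.handle(p) is not None for p in range(1000))
drop_ratio = 1 - delivered / 1000
```

Because only a fraction of the traffic disappears, aggregate statistics such as drop_ratio hover near the configured probability rather than near 100%, which is exactly what makes the attacker hard to tell apart from a node on a lossy link.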
2 Related Work A number of techniques have been proposed in the context of grayhole attacks. The major problem with most existing solutions is that they can only detect sequence-number-based grayhole attacks, while solutions for smart grayhole attacks require monitoring and calculations that consume a large amount of energy. Zardari et al. [1] proposed a Dual attack Detection for Black and Gray-hole attacks (DDBG) technique using a Connected Dominating Set (CDS) of the network graph. An Intrusion Detection System (IDS) is constructed from a CDS of nodes that have sufficient energy and are not blacklisted. The trusted node with the highest energy level is selected as the IDS query-issuing node. It periodically sends query packets and detects malicious nodes by checking the replies. The queries ask for the sequence number, the number of packets received and forwarded, and the number of packets dropped together with the reason for dropping. Any fabricated reply, or no reply at all, allows the IDS to detect the malicious node, after which it sends a block message throughout the entire network. Even if the malicious node moves to a different set, it will still be blocked, since the block messages are transmitted to all the nodes selected via the CDS. The problem with this approach is that it requires the IDS nodes to operate in promiscuous mode, which consumes energy; in addition, the IDS nodes have to perform calculations with high energy requirements. Marepalli Radha and M. Nagabhushana Rao [2] proposed an approach for securely detecting, preventing and eliminating Grayhole attacks, studied in the DSDV protocol. The idea is to detect and eliminate infected nodes that participate in route discovery; the shortest path is taken from the latest source routing table. Initially an autonomous network is formed and the accessibility of each node is verified. The next step is to initiate route discovery: the source node checks its route cache for valid routes between source and destination.
If no route is found, the source initiates the route discovery process; after route discovery, each node's key is verified in the application layer. After the RREQ and RREP are obtained, malicious node detection is performed using a novel secure mechanism. Kukreja et al. [3] focused on the energy requirements of an intrusion detection system. The proposed solution saves energy by making the IDS nodes enter promiscuous mode only for small intervals of time, and takes advantage of a table-driven routing protocol. The IDS set is selected, using the CDS technique, so that its nodes are not malicious and have sufficient energy. The source node maintains a table of the various paths towards the destination node, an accusation list and a malicious list; these entries are also stored by the destination node. The path table contains the various routes with the number of packets transmitted in a checkpoint time. The accusation list is supplied by the destination node to the source node. After transmitting the data, the source node sends its stored table to the destination node, which compares it with its own table. If any anomaly (a packet drop greater than a threshold) is detected between the two tables, the destination node divides the paths into two sets, reliable and unreliable, since malicious nodes may exist on the unreliable routes. The destination node calculates the reliability index of the intermediate nodes and puts them into the accusation or malicious list according to the values obtained. After building the accusation list the destination node sends it to the source node using the most reliable path or the path formed by the IDS nodes. On
receiving the accusation list, the source node can transmit further packets by taking into account the reliability values of the various nodes. Paths containing accused nodes are monitored by the IDS nodes; if the packet drop exceeds a threshold, the node is considered malicious and removed from the source table and a new path is chosen, otherwise the same path is kept. The proposed solution centres on the energy usage of the IDS nodes, which is reduced efficiently by putting them into promiscuous mode only at certain times. The authors show the solution to be more energy efficient than a crypto-key encryption technique, but no attention is given to the energy spent by the source and destination nodes on computing the accusation table. Another overhead arises when the number of intermediate nodes is large, since the table stores each intermediate node, growing the accusation and malicious tables. Sarika U. Patil et al. [4] proposed a grayhole detection scheme based on clustering the nodes of the network. The design provides dynamic cluster-head selection, using battery power, to detect grayhole attacks. The main idea is to divide the nodes into subnetworks called clusters, each managed by a leader called the cluster head. Cluster-head selection is fully dynamic; in addition, the MD5 algorithm is used for extra security and to reduce routing overhead. Each cluster head broadcasts a message from the source to the destination node, and each node receiving the message replies with a NACK; a grayhole node replies with a fake NACK and is thereby detected by the source node. The clustering technique makes the system efficient but also introduces a loophole:
if a malicious node participates in the cluster-head selection process and becomes a cluster head, an attack of much greater impact will occur, damaging the normal functioning of the network. Sandhyavenu et al. [5] proposed a modified version of AODV, invincible AODV, to detect grayhole and blackhole attacks in MANETs. Two versions of the solution are proposed: a) a Frame Check Sequence (FCS) mechanism and b) an enhanced FCS mechanism. The FCS mechanism incorporates two phases. First, the transmission of packets from source to destination is monitored. Each packet transfer involves generating a check value, stored both by the node itself and by the previous node, to which it is passed via acknowledgements. The destination, upon receiving packets, replies with acknowledgement packets that help the source node detect any errors that occurred during forwarding. Acknowledgements are used both node-to-node and source-to-destination: end-to-end acknowledgements help the source node identify forged acknowledgements, while node-to-node acknowledgements help isolate the malicious node in the network. If any malicious behaviour is identified, detection of the malicious node starts: the source node enquires about the check values at the node where the packet drop occurred, and if incorrect check values are received the node is blacklisted. The solution is effective for detecting smart grayhole nodes. A major drawback is the large computation required of the source node whenever malicious activity is found; the acknowledgements also consume bandwidth and energy. The enhanced FCS uses digital signatures to validate that the check-value replies come from their valid owners, which further adds computation cost and hence energy consumption.
S. V. Vasantha et al. [6] proposed a scheme to detect grayhole nodes in the AODV protocol. Grayhole nodes are detected by filtering their RREPs; the filtering is done by the intermediate nodes as well as the source node, after which the valid shortest path is used for routing. The scheme has two stages, prevention and detection, and uses the destination sequence number (DSN) for filtering RREPs: since grayhole nodes advertise the highest DSN, comparing the DSNs of the RREPs exposes the malicious nodes. At the source, the RREPs with the minimum DSN are selected and an algorithm is applied to find the shortest route to the destination. The scheme provides prevention and detection against collaborative grayhole attacks but adds overhead to the network as its price. The authors of [7] proposed a scheme based on intermediate nodes comparing the destination sequence numbers (DSNs) of RREPs. Each intermediate node calculates a PEAK value at every time interval, treated as the maximum permissible DSN for that interval. When an intermediate node receives an RREP, it compares the RREP's DSN to the PEAK value; if the DSN is greater, the node is marked as a DO_NOT_CONSIDER node. The algorithm detects malicious nodes effectively, but calculating and comparing the PEAK value at each intermediate node increases the computational overhead. Sharma et al. [8] discussed a two-phase algorithm to detect and remove blackhole and grayhole attacks in MANETs. The first phase, route discovery, detects blackhole nodes; the second, monitoring, detects grayhole nodes. In the route discovery phase the sender sends a trap RREQ packet containing a destination address that does not exist in the network. A blackhole node, upon receiving the RREQ, replies with a fake RREP without checking its routing table, and can in this way be detected and blocked.
Any grayhole node that remains is detected in the monitoring phase, in which all nodes work in promiscuous mode, monitoring their neighbour nodes. Each node calculates the packet drop ratio of its neighbours and checks it against a threshold value set by an algorithm; if the ratio crosses the threshold, the node is put on the malicious node list. The approach is simple, but having every node work in promiscuous mode at all times imposes a huge energy overhead. The authors of [9] proposed a grayhole detection scheme based on the grayhole node's pattern of packet forwarding. The main idea is to use two types of packets, TCP and UDP/FTP. Since TCP packets carry acknowledgements, a grayhole node does not drop them, but it does drop UDP/FTP packets, because UDP is a connectionless protocol with no exchange of acknowledgements; UDP packets can therefore be used to detect the grayhole node. The process forwards UDP packets from source to destination, and each intermediate node as well as the destination node replies with an ACK; any node that does not reply with an ACK is suspected of being malicious. This step is repeated several times, and once the probability of a node being malicious exceeds 60% it is declared malicious, the information about it is broadcast in the network, and the grayhole node is excluded. The scheme detects grayhole nodes, but the probabilistic approach does not guarantee accuracy, so the result may sometimes lead to wrong interpretations.
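The repeated UDP/ACK probe described for [9] can be sketched as follows; the function name and the boolean reply list are our assumptions, while the 60% threshold comes from the description above:

```python
def udp_ack_probe(ack_replies, threshold=0.6):
    """Sketch of the UDP/ACK probing loop: ack_replies is a list of
    booleans (True = the node replied with an ACK) gathered over repeated
    probes of one node. The node is declared malicious when the fraction
    of missing ACKs exceeds the threshold."""
    misses = sum(1 for replied in ack_replies if not replied)
    p_malicious = misses / len(ack_replies)
    return p_malicious > threshold
```

With 7 missing ACKs out of 10 probes the estimate (0.7) crosses the 60% bar; a normal lossy node missing 2 of 10 does not, which is the accuracy risk the paragraph above points out.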
Sachan et al. [10] presented a modified AODV protocol. The proposed solution includes three steps: a) transmitting the RREP packet, b) counting data routing information, and c) checking the reliability of a route. RREP packets are broadcast and monitored by the nodes, each of which stores a broadcast route-reply list; an additional Data Routing Information (DRI) field holds a count value for each route reply in the RREP broadcast list. When a node receives an RREP from a neighbouring node, the count value is incremented by 1, except the first time. Upon receiving the RREP packet, each node looks for its entry in the routing table: if it exists, the route is considered reliable, its count value is incremented by 1, and data transfer can proceed via this route; otherwise a new entry is made with a route-reply count of 0. This approach again detects sequence-number-based grayhole attacks but not smart grayhole attacks; further, a malicious node can easily get its route's count value increased without being caught. Storage and energy overhead also arise, because each node has to store and monitor DRI table entries. The authors of [11] discussed MANETs, their capabilities, advantages and disadvantages, focusing on the security issues and vulnerability aspects that affect the integrity, availability and confidentiality of the network. The paper primarily addresses how grayhole and blackhole attacks reduce network performance and discusses different mechanisms to mitigate and detect grayhole nodes. Gurung et al. [12] proposed MGAM, which uses G-IDS nodes. These specially programmed nodes cover the entire deployment area and, operating in promiscuous mode, listen to all packet transfers carried out by the nodes in their neighbourhood. If the packet delivery ratio decreases below a drop threshold calculated by an algorithm, the G-IDS node alerts all the nodes in the system that the node is malicious.
This approach can detect smart grayhole attacks. The major drawbacks are the selection of the special G-IDS nodes, which also require computation and energy; furthermore, introducing a new set of special nodes is less cost-effective. In [13] Shani Makwana and Krunal Vaghela proposed an approach based on the destination sequence number for detecting grayhole attacks. The work starts by assigning a default credit value to each node; this credit value establishes the authenticity of a node in the network. The credit value of each node is then increased or decreased according to RREQ message forwarding: if an adjacent node receives an RREQ from an intermediate node, the credit score is incremented by 1, otherwise it is decremented by 1, and if a node continuously broadcasts RREPs its credit score is also decremented. Once the credit score reaches zero, the DSN is compared with the SSN (Source Sequence Number); if the difference is very high, i.e. the DSN is much larger than the SSN, the node is declared malicious, otherwise it is legitimate. The scheme performs well at detecting grayhole nodes, but comparing the DSN to the SSN and continuously incrementing and decrementing the credit values adds overhead to the network. The authors of [14] proposed a Cooperation-based detection scheme (CBDS) capable of detecting individual as well as collaborative grayhole attacks. In CBDS the source node uses the address of an adjacent node as the destination in the RREQ, which acts as bait for a malicious grayhole node; after
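The credit bookkeeping of [13] can be sketched in a few lines of Python; the event labels and the gap_threshold value are illustrative assumptions, since the paper only describes the rules qualitatively:

```python
def update_credit(credit, event):
    # Credit rules as summarised above for [13]; the event names are
    # our illustrative labels, not terms from that paper.
    if event == "rreq_forwarded":                       # adjacent node got the RREQ
        return credit + 1
    if event in ("rreq_not_forwarded", "continuous_rrep"):
        return credit - 1
    return credit

def is_malicious(credit, dsn, ssn, gap_threshold=100):
    # Once the credit reaches zero, a DSN far above the SSN marks the
    # node malicious; gap_threshold is an assumed concrete value.
    return credit <= 0 and (dsn - ssn) > gap_threshold

credit = 2
for event in ("rreq_not_forwarded", "continuous_rrep"):
    credit = update_credit(credit, event)
```

After two misbehaviour events the credit hits zero, and a DSN of 500 against an SSN of 10 would then trip the malicious check.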
the malicious node sends a fake RREP, the malicious nodes are detected using a reverse-tracing step and listed in a grayhole list, which warns the other nodes not to communicate with the nodes on that list. The proposed scheme implements both a proactive and a reactive step, which shows its excellence over other proposed works, but CBDS suffers from a high-overhead problem and is also incapable of detecting an adjacent malicious node in the network. For detecting grayhole/blackhole attacks in MANETs, Dhaka et al. [15] suggested a modification to the AODV protocol. Two packets are introduced into AODV: the Code Sequence packet (Cseq) and the Response Sequence packet (Rseq). During route discovery each intermediate node sends a Cseq packet to all its neighbouring nodes, which respond with Rseq packets. If the Rseq packet matches the Cseq, the node is allowed to connect; otherwise it is considered malicious and discarded. This approach is effective only for sequence-number-based grayhole attacks, as it works in the route discovery phase. In [16] N. Dharini proposed a lightweight learning-based prediction algorithm to detect malicious grayhole nodes in networks running the LEACH routing protocol. The main idea is to monitor the energy consumption of each node and compare it to a predicted threshold value: since a grayhole node drops packets selectively, it consumes less energy than a legitimate node. Each cluster head performs intrusion detection, so malicious nodes are detected within each cluster. Whereas existing IDSs use only the sink node for detection, this scheme gains an advantage by using the cluster heads. The work does reduce energy consumption, but relying on a predicted value, which may be inaccurate, can lead to capricious results.
The authors of [17] proposed a technique for detecting a malicious grayhole node in the network and eliminating it. The idea starts with selecting, from among 3–4 candidate nodes, the node with the highest energy, called the BackBone Node (BBN). The BBN maintains a table used for the detection and elimination of grayhole nodes. The source node sends a request to the BBN for a Restricted IP (RIP); after receiving the RIP, the source node broadcasts an RREQ containing the RIP as the destination address. If an RREP comes back, there is a malicious node on the replied route. The source node then sends a dummy data packet along the path and monitors whether its neighbouring node forwards it; if the neighbouring node is malicious it drops the packet, otherwise the same verification is performed by the neighbouring and other intermediate nodes until the malicious node is detected. On detection, the node's ID is sent to the BBN and the table is updated. The process is repeated, and when the same node is found malicious again it is declared a malicious node and the information is broadcast in the network. The method detects grayhole nodes but at the cost of high network overhead, and it can detect only a single malicious node at a time; if the network contains multiple grayhole nodes, the process loops until all of them are detected, which adds overhead to the network.
In [18] Meenakshi Patel and Sanjay Sharma proposed a machine-learning approach based on a Support Vector Machine (SVM) for detecting malicious grayhole nodes in the network. The SVM-based IDS consists of three phases: a data-gathering module, a detection module and a response module, each dependent on the previous one. The SVM takes as inputs the Packet Delivery Ratio (PEDR), the Packet Modification Ratio (PMR) and the Packet Misrouted Rate (PMIR). Evaluation is performed on this set of inputs and the results are compared with previously decided threshold values (the packet delivery ratio must be greater than 70%, the packet modification rate less than 30% and the packet misrouted rate less than 20%). If the values fall within the specified thresholds, the node is considered legitimate; otherwise it is considered malicious, and grayhole nodes are thus detected. The scheme detects malicious nodes efficiently, but the behaviour of a grayhole node is not fixed, so in many cases it can lead to wrong interpretations; moreover, there is no reliable basis for the threshold values, which are chosen simply by observing the behaviour of grayhole nodes. In [19] Onkar V. Chandure et al. proposed an efficient scheme to detect grayhole attacks in MANETs; since AODV is a reactive protocol, it is more vulnerable to this type of attack. The main idea is to select a coordinating node (CN), based on data routing information, from among the neighbours of the initiator node (IN), the node that initiates the malicious-node detection process. After selecting the CN, the IN broadcasts an RREQ to its 1-hop neighbours asking for a path to the CN. RREPs are received from the neighbouring nodes, one of which may be a suspected node (SN). The IN then sends a probe packet to the CN via the SN and, after the probe packet's TTL expires, checks whether the CN has received the packet.
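The threshold rule that [18] compares the SVM inputs against can be written directly; the function name is ours, the three bounds are the ones stated above, and ratios are taken as fractions in [0, 1]:

```python
def threshold_check(pedr, pmr, pmir):
    """Threshold rule from the description of [18]: a node is treated as
    legitimate when delivery > 70%, modification < 30% and
    misrouting < 20%; otherwise it is flagged as malicious."""
    return pedr > 0.70 and pmr < 0.30 and pmir < 0.20
```

A node delivering 90% of packets with 10% modification and 5% misrouting passes, while one delivering only 60% fails, regardless of the other two ratios; as the paragraph notes, the bounds themselves are empirical rather than principled.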
If the CN has received the packet, the SN is a legitimate node; otherwise it is considered malicious, a zero is entered in the probe-status column (check bit) of the DRI table, and a message informing all nodes in the network about the malicious node is sent. The scheme helps identify malicious nodes, but it comes at the cost of routing overhead, and if a malicious node alters its DRI table and becomes the CN an even worse scenario arises, because the CN is never checked for maliciousness; the whole verification is performed only for the suspected node, which may or may not be malicious. The authors of [20] proposed a contradiction-based grayhole detection scheme that operates under OLSR routing. In OLSR two types of messages, HELLO and Topology Control messages, are used to build the network topology, and only Multi-Point Relays (MPRs) are allowed to retransmit, which limits network traffic. The scheme verifies the HELLO messages against a set of contradiction rules; any node failing to satisfy the rules is shortlisted as malicious and this information is circulated in the network. The scheme has the advantage of using only internal information, without leaning on any centralized or external trusted party, but it suffers in cases where a node is the only path to the 2-hop neighbours of the source.
3 Proposed Work The AODV routing protocol executes in two phases, one of which is route discovery: the discovery of a path from source to destination through which the data/message is
going to be transmitted. A grayhole attacker bluffs the route discovery phase by acting as a legitimate node and sending a fake reply to the source, because of which many existing IDSs are unable to find the malicious node during route discovery. Our proposed technique ensures both grayhole avoidance and prevention. In our scheme we modify the way AODV performs route discovery so that the malicious node is detected in the route discovery phase itself: whereas route discovery normally involves sending only the RREQ, in the proposed scheme lightweight dummy data packets (DDPs) are sent along with the RREQ, and these DDPs differentiate our algorithm from other IDSs. As a DDP propagates from one node to the next, each transmitting node has the responsibility of monitoring its successor. Monitoring involves two aspects, on the basis of which further action is taken: 1. Packet Drop Ratio (PDR) 2. Destination Sequence Number (DSN) The PDR is the ratio of packets dropped by each node, measured from the drop ratio of the DDPs, and the DSN of each node is compared with the sequence number in the RREQ packet. Only if the PDR is confined within a specified, pre-decided range and the DSN is within range of the RREQ's sequence number is the node considered legitimate; otherwise it may be a malicious or a suspected node. On the basis of the PDR and the DSN we classify nodes into three categories: legitimate node (1), suspected node (0) and malicious node (−1), where the values in brackets are the flag-variable values. In our algorithm each node has the following attributes: node_id, node_next (the next node with respect to the current node) and node_flag (the flag value of the node). Each node's flag variable is stored in the node monitoring it; thus, by the node categorization algorithm, each node is assigned a flag variable by which the nodes are categorized.
With the help of the Node Categorization Algorithm the malicious nodes existing in the network can be identified and the Grayhole attack stopped. The approach is divided into phases involving flag-variable distribution along with malicious node detection. The main idea behind the proposed algorithm is to reduce the participation of unwanted nodes that can disrupt the network. Initially, in the route discovery phase, DDPs are sent along with the RREQ, so any node that drops more DDPs than the decided threshold is automatically considered malicious. This mechanism allocates each node a flag variable, and the idea is to transmit the data packets only through nodes whose flag variable is one. In the second phase the actual data packets are sent by the source and travel only through nodes with a flag value of one; if data packets reach a node whose flag variable is not one, they quickly backtrack from that node to the previous node with flag value one, and transmission resumes through flag-one nodes.
3.1 Node Categorization Algorithm

Components of the Node Categorization Algorithm:

DSN: Destination Sequence Number
DDP: Dummy Data Packet
PDR: Packet Drop Ratio
RREQ: Route Request
node_id: identification number of the node
flag: flag variable of each node

Node Categorization Algorithm
1. Node_Category(DSN, DDP)
2. if (DSN)node is greater than (Sequence Number)RREQ or PDR is greater than 30%
3.     node is malicious_node()
4.     flag ← −1
5. else if PDR is greater than 10% and PDR is less than 30%
6.     node is suspected_node()
7.     flag ← 0
8. else if PDR is less than 10%
9.     node is legitimate_node()
10.    flag ← 1
11. endif
12. return flag, node_id
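The Node Categorization Algorithm can be expressed compactly in Python; the parameter names are ours, and the PDR is taken as a fraction in [0, 1]:

```python
MALICIOUS, SUSPECTED, LEGITIMATE = -1, 0, 1

def node_category(node_id, node_dsn, rreq_seq, pdr):
    """Flag a node from its DSN and its dummy-data-packet drop ratio,
    following the thresholds of the Node Categorization Algorithm:
    DSN above the RREQ sequence number or PDR > 30% -> malicious,
    10% < PDR <= 30% -> suspected, otherwise legitimate."""
    if node_dsn > rreq_seq or pdr > 0.30:
        flag = MALICIOUS
    elif pdr > 0.10:
        flag = SUSPECTED
    else:
        flag = LEGITIMATE
    return flag, node_id
```

For example, a node advertising DSN 120 against an RREQ sequence number of 100 is flagged malicious even with a low drop ratio, while a node with a 20% drop ratio and a plausible DSN is only marked suspected.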
3.2 Backtrack Algorithm

The Backtrack Algorithm steps back to the previous node when the current node is found to be malicious or suspected, as identified by the node categorization algorithm specified above.

Backtrack Algorithm

1. Backtrack(node)
2. node ← prev_node
3. if (flag)node is equal to 1
4.     node ← node + 1
5. return node
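One way to read the backtrack step, sketched in Python: retreat hop by hop until a flag-1 (legitimate) node is reached. The route list and flag table are illustrative representations, not data structures mandated by the algorithm:

```python
def backtrack(route, flags, idx):
    """Step back from route[idx] to the most recent hop whose flag is 1
    (legitimate). route is a list of node ids; flags maps node id to the
    flag assigned by the Node Categorization Algorithm."""
    while idx > 0 and flags[route[idx]] != 1:
        idx -= 1                # retreat one hop toward the source
    return idx
```

On the route [0, 1, 2, 3] with node 2 malicious (−1) and node 3 suspected (0), backtracking from index 3 lands on node 1, the last legitimate hop.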
To send the route request another algorithm is proposed, which propagates the RREQ to the next node until the destination is reached; regular monitoring of the nodes is also
done to check whether they are malicious. If a node is identified as malicious, Backtrack(node) backtracks to the node having flag value 1, i.e. a legitimate node (Fig. 3).
Fig. 3. Backtracking detection scheme
The DDPs and the RREQ are broadcast in the network; backtracking occurs whenever a malicious or suspected node is encountered; the RREP + DDP_ACK is sent by the destination node via the safe route.
3.3 Secure RREQ Algorithm

Secure RREQ Algorithm

1. Send_RREQ(s, d, DDP)
2. i ← s + 1
3. safe_route[n]
4. while (i is less than d and i is less than n)
5. rep:
6.     Node_Category(i, DSN, DDP)
7.     if (flag is equal to 1)
8.         safe_route[i] += node_id
9.     else
10.        Backtrack(i)
11.        goto rep
12. if route is equal to safe_route
13.    Send_RREP(d, s, DDP_ACK)
14. else
15.    Send_RREQ(s, d, DDP)
In the above Send_RREQ(s, d, DDP) algorithm:
s = source node
d = destination node
safe_route[n] = stores the safe route
DDP_ACK = acknowledgement of the DDP from the destination

Finally, the information about malicious nodes, which is stored in a list, is broadcast in the network. Each legitimate node sends data packets via the safe route to avoid any dropping of data packets. RREP + DDPs continue to be forwarded through intermediate nodes until a node having flag value 0 or −1 is encountered; encountering such a malicious or suspected node means that the route is not safe and may be infected by malicious nodes. So, as soon as such nodes are sniffed, backtracking is initiated to the nodes that are legitimate/normal. Finally, when a safe route to the destination is found, the RREP comes back with a DDP_ACK.
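A compact way to realize Send_RREQ with backtracking is a search that only extends the route through nodes flagged as legitimate. The paper's pseudocode walks hops linearly; the depth-first formulation, graph, and flag assignments below are an illustrative equivalent, not the paper's code:

```python
# Flags follow the categorization scheme: 1 = legitimate, 0 = suspected, -1 = malicious.
def find_safe_route(graph, flags, src, dst):
    """Return a route from src to dst that visits only legitimate nodes, or None."""
    route = [src]

    def dfs(node):
        if node == dst:
            return True
        for nxt in graph.get(node, []):
            # Skip already-visited nodes and any malicious/suspected hop
            if nxt in route or (nxt != dst and flags.get(nxt) != 1):
                continue
            route.append(nxt)
            if dfs(nxt):
                return True
            route.pop()  # Backtrack to the previous (legitimate) node
        return False

    return route if dfs(src) else None
```

For example, with a malicious node on the shortest path, the search retreats to the last legitimate node and returns a detour instead.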
4 Simulation Results

Network Simulator-3 (NS-3), a discrete-event simulator, is used to evaluate the performance of the proposed Backtrack Grayhole detection and avoidance algorithm; the GNU Plot library is used to generate the simulation graphs. In this section the network performance variables, throughput and packet delivery ratio, are calculated. Throughput is defined as the maximum rate at which packets are processed and received. Information about the simulation variables is given below (Table 1):
Table 1. Simulation requirements

Parameters             Values
Simulation tool        NS3/GNU-Plot
Simulation runtime     200 s
Topological range      300 × 1500 m²
Total number of nodes  50
Grayhole attack nodes  10
Mobility model         RandomWayPoint
Routing protocol       AODV
4.1 Packet Delivery Ratio

The Packet Delivery Ratio (PDR) is the fraction of packets successfully transmitted from the sender end to the receiver end; ideally the packet delivery ratio should be 100% (Fig. 4).
Fig. 4. Packet delivery ratio vs malicious node ratio
The above graph compares different routing schemes with the proposed backtrack algorithm. The x-axis represents the number of nodes and the y-axis represents the packet delivery ratio. The simulation resulted in a decreasing packet delivery ratio with an increasing number of malicious nodes, but the observed behaviour differs in each case. For DSR the packet delivery ratio decreases constantly at the maximum rate; this rate is relatively low for CBDS and lowest for the proposed backtrack AODV. Due to the backtracking feature of the proposed algorithm, the packet delivery rate does not drop drastically.
4.2 Throughput

Network throughput is the amount of data successfully moved from one point to another per unit time, measured in kilobits per second (kbps) (Fig. 5).
Fig. 5. Throughput vs number of nodes
The graph above shows the variation of throughput with an increasing number of nodes. AODV without the proposed backtrack mechanism yields lower throughput than AODV with the backtrack mechanism; hence the proposed AODV proves better than the traditional AODV mechanism.

4.3 Control Packet Overhead

The proposed algorithm uses DDPs, which help to identify malicious nodes in the network by monitoring their packet drop ratio. Although DDPs are very lightweight packets, they still add some overhead to the network. In the graph, the x-axis represents the number of data packets transmitted between nodes and the y-axis represents the control packet overhead. It can be observed that a small amount of overhead is added by the DDPs; for a small number of data packets the overhead is significant, but as the number of data packets increases the overhead aligns itself to nearly 100% (Fig. 6).

4.4 Packet Misroute Ratio

In a Grayhole attack, data packets are either dropped by the malicious node or misrouted to a malicious route; these packets increase the packet misroute ratio (Fig. 7). As the simulation results suggest, our proposed algorithm ensures that packets are not misrouted, and it can be observed that there is no significant change in the packet misroute ratio.
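The performance metrics discussed in this section reduce to simple ratios; a sketch with illustrative numbers (not the simulation's values):

```python
def pdr(received, sent):
    """Packet delivery ratio, in percent."""
    return 100.0 * received / sent

def throughput_kbps(bytes_delivered, duration_s):
    """Network throughput in kilobits per second."""
    return bytes_delivered * 8 / 1000.0 / duration_s

def control_overhead(control_pkts, data_pkts):
    """Control packet overhead relative to data packets, in percent."""
    return 100.0 * control_pkts / data_pkts

def misroute_ratio(misrouted, sent):
    """Share of packets diverted to a malicious route, in percent."""
    return 100.0 * misrouted / sent
```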
Fig. 6. Control packet overhead vs number of data packets
Fig. 7. Packet misroute ratio vs number of nodes
5 Conclusion

Due to its infrastructure-less architecture and lack of centralization, the AODV routing protocol is highly vulnerable to network-layer attacks. Blackhole and Grayhole attacks are major threats to the security of the AODV routing protocol. As Grayhole attacks have evolved with time, existing IDSs are not effective for Grayhole detection. In this paper we have proposed a new backtracking detection scheme. This scheme offers Grayhole avoidance as well as its detection. The use of dummy packets and monitoring may raise concerns about the overhead introduced to the system, but the packets used are so lightweight that they add almost negligible overhead, and this cost is small compared to the merits of the scheme. Our proposed scheme makes use of flag variables to ensure that no malicious node participates in the process. This mechanism allows data packets to travel only through legitimate nodes and prevents packets from travelling through malicious nodes. The proposed scheme is highly effective in detecting Grayhole nodes, and in the future further work can be done to improve this scheme and increase its efficiency in the case of cooperative attacks.
Hand-Crafted and Learned Features Fusion for Predicting Freezing of Gait Events in Patients with Parkinson's Disease

Hadeer El-ziaat1(B), Nashwa El-Bendary2, and Ramadan Moawad1

1 Future University in Egypt, New Cairo, Egypt
{hadeer.elziaat,ramdan.mowad}@fue.edu.eg
2 Arab Academy for Science, Technology and Maritime Transport (AASTMT), Smart Village, Egypt
[email protected]
Abstract. Freezing of Gait (FoG) is a common symptom of Parkinson's disease (PD) that causes intermittent absence of forward progression of the patient's feet while walking. Accordingly, FoG momentary episodes are always accompanied by falls. This paper proposes a novel multi-feature model for predicting FoG episodes in patients with PD. The proposed approach considers FoG prediction as a multi-class classification problem with 3 classes, namely normal walking, pre-FoG, and FoG events. In this paper two feature extraction schemes have been applied: time-domain hand-crafted feature engineering and Convolutional Neural Network (CNN) based spectrogram feature learning. Also, after fusing the two extracted feature sets, the Principal Component Analysis (PCA) algorithm has been deployed for dimensionality reduction. Data from three tri-axial accelerometer sensors for patients with PD, in both principle-axes and angular-axes form, has been tested. The performance of the proposed approach has been characterized experimentally with respect to several Machine Learning (ML) algorithms. Experimental results have shown that multi-feature fusion with PCA dimensionality reduction outperforms the other tested typical single feature sets. The significance of this study is to highlight the impact of using feature fusion of multi-feature sets on the performance of FoG episode prediction.

Keywords: Freezing of gait (FoG) · Parkinson's disease (PD) · Machine learning · Convolutional neural network (CNN) · Angular-axes features · Spectrogram · Principal Component Analysis (PCA) · Multi-feature fusion
1 Introduction
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 336–345, 2021. https://doi.org/10.1007/978-3-030-49342-4_32

Parkinson's disease (PD) is a degenerative nervous system disorder causing the inability to control body movements. Additionally, Freezing of Gait (FoG) is
a common symptom of PD that causes intermittent absence of forward progression of the patient's feet while walking. FoG, as a common motor symptom, typically lasts between a few seconds and up to one minute. During FoG episodes, PD patients intermittently feel that their feet are stuck to the floor, as if held by magnets, when trying to walk. Accordingly, FoG momentary episodes are always accompanied by falls, which impair daily-life activities [1]. Authors in many research studies have used various approaches for achieving enhanced detection of FoG episodes in patients with PD. However, few experiments have achieved notable accuracies for enhancing prediction of FoG episodes. Studies on the prediction of FoG episodes have proposed several approaches based on different feature extraction methods and machine learning models. In [2], the authors aimed at identifying and quantifying the gait characteristics during pre-FoG episodes by comparing them with the characteristics of gait that does not precede FoG. This study depends on a threshold-based model of FoG using the CuPiD dataset. When applying a linear discriminant analysis classifier, 83% and 67% were achieved as sensitivity and specificity, respectively. On the other hand, the authors in [3] extracted spatio-temporal gait parameters as well as convolutional statistical features for discriminating between FoG patients and non-FoG patients. An accuracy of 88% was achieved using an SVM classifier. The proposed model was tested on 51 PD patients with tri-axial accelerometers. Also, the authors in [4] used lower-limb accelerometers for FoG event prediction. A window length of 2 s was implemented using three features, namely freeze index (FI), Discrete Wavelet Transform (DWT), and sample entropy, based on the use of ground truth for classification. The proposed model in this study achieved an average of 87.6%, 87.6%, and 87.5% for accuracy, sensitivity, and F-measure, respectively.
This paper proposes a novel multi-feature model for predicting FoG episodes in patients with PD. The proposed approach considers FoG prediction as a multi-class classification problem with 3 classes, namely normal walking, pre-FoG, and FoG events. FoG prediction in the proposed model is achieved through pre-FoG event recognition. The significance of this paper is to assess the impact of using multi-feature fusion on the performance of FoG episode prediction. Accordingly, the main contributions of this work are summarized as follows:
– Editing the tested time-series dataset by adding a new label (pre-FoG) that marks all episodes coming immediately before FoG episodes.
– Deep feature learning using CNN for time-frequency analysis features based on spectrogram images.
– Feature engineering of various hand-crafted time-domain statistical features.
– Multi-feature fusion and PCA-based dimensionality reduction of the resultant fused statistical and time-frequency analysis multi-feature set.
The remainder of this paper is organized as follows: Sect. 2 presents an overview of the proposed model structure. The obtained experimental results are illustrated and discussed in Sect. 3. Conclusions and future work are presented in Sect. 4.
2 The Proposed Model
This section describes the phases of the proposed model; namely Data Preparation, Feature Extraction and Fusion, and Classification, as illustrated in Fig. 1. First of all, a brief description of the dataset used for testing and validation of the proposed approach is presented.
Fig. 1. General structure of the proposed FoG prediction model
2.1 Dataset

The publicly available Daphnet FoG dataset [5] has been used for validating the proposed approach. It contains time-series data collected in the lab, while
performing several walking tasks, from ten participants with PD. The benchmark Daphnet dataset has been recorded using three wearable tri-axial accelerometer sensors attached to the shank (ankle), the thigh (above the knee), and the trunk (lower back) of each subject. Over eight hours of recorded data, 237 FoG events were identified by professional physiotherapists during the study. Data samples were originally labeled as 0, 1, and 2, corresponding to out-of-experiment, no-FoG (normal walking), and FoG events, respectively.

2.2 Data Preparation
According to the description of the original Daphnet validation dataset, it does not contain samples labeled as 3, corresponding to pre-FoG events. So, before starting any calculations in the data preparation phase, all uniformly labeled samples within a window of time before a label-2 episode have been converted into label 3, marking the pre-FoG episodes with a new label. Figure 2 depicts an example signal pattern for walking, pre-FoG, and FoG data. Accordingly, the data preparation phase includes the four steps of angular-axes values calculation, magnitude calculation, windowing, and spectrogram generation.
Fig. 2. An example signal pattern for FoG data categories
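The relabeling step described above can be sketched as follows; the one-second (67-sample) pre-FoG window follows the paper's windowing choice, while the function name and list-based representation are illustrative:

```python
def add_prefog_label(labels, window=67):
    """Mark samples immediately preceding each FoG episode (label 2) as 3."""
    out = list(labels)
    for i in range(1, len(out)):
        if out[i] == 2 and out[i - 1] != 2:      # start of a FoG episode
            for j in range(max(0, i - window), i):
                if out[j] == 1:                  # only relabel normal walking
                    out[j] = 3
    return out
```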
Angular-Axes Values Calculation. The angular-axes [6], also known as axes of rotation, provide a way to represent the 3D orientation of an object using a combination of three rotations about different axes. In this step, the values of the principle x, y, and z axes from the 3 tri-axial accelerometer sensors have been used to calculate new angular-axes values, Roll (r), Pitch (p), and Yaw (y), as shown in Eqs. (1), (2), and (3), respectively, where Π = 3.14 is constant.

Roll(r) = 180 ∗ arctan(y / √(x² + z²)) / Π    (1)

Pitch(p) = 180 ∗ arctan(x / √(y² + z²)) / Π    (2)

Yaw(y) = 180 ∗ arctan(z / √(x² + y²)) / Π    (3)
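Eqs. (1)-(3) translate directly to Python; note that the paper's Π = 3.14 approximation is kept here for fidelity (using math.pi would shift the results slightly):

```python
import math

PI = 3.14  # constant as defined in the paper

def angular_axes(x, y, z):
    """Roll, pitch and yaw (degrees) from one tri-axial accelerometer sample."""
    roll = 180 * math.atan(y / math.sqrt(x**2 + z**2)) / PI
    pitch = 180 * math.atan(x / math.sqrt(y**2 + z**2)) / PI
    yaw = 180 * math.atan(z / math.sqrt(x**2 + y**2)) / PI
    return roll, pitch, yaw
```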
Magnitude Calculation. After having the original accelerometer data converted into angular-axes data, the magnitude has been derived from each record of the three angular-axes values, according to Eq. (4):

Magnitude = √(r_s² + p_s² + y_s²)    (4)

where s refers to the used sensor at the ankle (shank), knee (lower thigh), or trunk (lower back).

Windowing. During the windowing step, the calculated magnitudes of motion data for each sensor have been sliced into partially overlapping fixed-size windows based on the data labels. Firstly, in order to prepare the data for multi-feature fusion of time-domain statistical features and time-frequency spectrogram CNN-learned features, a windowing method with a 1 s window size (67 samples, 15 ms each) has been adopted. For this method, groups of similarly labeled data samples shorter than 1 s, accidentally appearing as FoG-like behavior, have been neglected. Also, partial overlapping has been applied in case a remaining group of similarly labeled samples has fewer than 67 samples. Using this windowing method guarantees generating spectrogram images of the same size in the next step.

Spectrogram Generation. Conventional time-domain or frequency-domain analysis cannot fully describe non-stationary/non-periodic signals, whose frequency content varies with time [7]. So, in order to gain frequency-domain as well as time-domain information about real-life signals, it is recommended to use time-frequency analysis. The spectrogram is one of the basic visual tools for displaying time-frequency analysis information. In this paper, the Short-Time Fourier Transform (STFT) has been used as the time-frequency analysis algorithm on windowed sensor readings for generating the corresponding spectrograms [8].
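The magnitude, windowing and spectrogram steps can be sketched with NumPy/SciPy. The sampling rate of ≈67 Hz follows the paper's 67 samples per second; the STFT parameters (nperseg, noverlap) and the non-overlapping window slicing are illustrative simplifications:

```python
import numpy as np
from scipy import signal

def magnitude(roll, pitch, yaw):
    """Eq. (4): per-sample magnitude of the three angular axes."""
    return np.sqrt(roll**2 + pitch**2 + yaw**2)

def windows(series, size=67):
    """Slice a 1-D series into consecutive fixed-size windows (no overlap here)."""
    n = len(series) // size
    return series[: n * size].reshape(n, size)

def spectrogram_image(window, fs=67):
    """STFT-based spectrogram of one window (a small time-frequency image)."""
    _, _, sxx = signal.spectrogram(window, fs=fs, nperseg=16, noverlap=8)
    return sxx
```

The paper's partial-overlap handling of leftover label groups is omitted here for brevity.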
2.3 Feature Extraction and Fusion
The second phase of the proposed model is divided into a feature extraction step and a feature fusion and dimensionality reduction step.

Feature Extraction. Based on the resultant windowed magnitudes as well as the spectrogram images generated in the data preparation phase, two extraction schemes have been investigated, as follows:
i. Feature engineering for various hand-crafted time-domain features. The generated feature set is a time-domain statistical feature set that involves features manually engineered/calculated based on measures typically used in motion or human activity pattern recognition problems. The extracted
statistical features in this study are Variance, Standard deviation, Median, Mean, Maxima, Minima, and Range.
ii. Deep feature learning for time-frequency analysis of spectrogram images. This feature set is generated by CNN deep learning of features reflecting the time and frequency analysis of spectrogram images. Recently, the Convolutional Neural Network (CNN) has become the most widely used deep learning model, showing outstanding performance in image processing and feature learning [9]. Various CNN models are capable of automatically learning features that capture complicated visual variations. Typically, a CNN consists of several convolutional layers, pooling layers, and fully-connected layers. The main aim of a CNN is the automatic and adaptive learning of spatial hierarchies of useful features, from low-level to high-level patterns. In this paper, a conventional CNN model has been used for learned feature extraction.

Feature Fusion and Dimensionality Reduction. After feature extraction, each pair of the resultant feature vectors from the extracted statistical and time-frequency analysis feature sets has been concatenated to form a higher-dimensionality fused feature set. In order to avoid incompatibility due to the fusion of different features, normalization has been performed to transform the features into a unified range of [0, 1]. On the other hand, high dimensionality leads to the problems of higher computational complexity and time consumption. So, the Principal Component Analysis (PCA) algorithm [10] has been adopted for reducing the dimensionality of the fused feature vectors.

2.4 Classification
For the classification phase, k-fold cross-validation has been adopted for training and validation of various ML classifiers using the resultant multi-feature fused feature set. Then, the trained ML models have been used for testing the performance of the proposed FoG prediction approach. The implemented ML classifiers [11, 12] are: Random Forest, Bagging, Logistic Regression, Adaptive Boosting (Adaboost), K-Nearest Neighbor (KNN), Decision Tree, and Support Vector Machine (SVM) with various kernel functions.
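A condensed sketch of the fusion-plus-classification pipeline using scikit-learn. The synthetic windows and the random stand-in for CNN-learned features are illustrative assumptions; the paper learns the second feature set from spectrograms with a CNN:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_windows = rng.normal(size=(120, 67))       # 120 windows of 67 samples each
y = rng.integers(0, 3, size=120)             # walking / pre-FoG / FoG labels

def hand_crafted(w):
    """The paper's time-domain statistics for one window."""
    return [w.var(), w.std(), np.median(w), w.mean(),
            w.max(), w.min(), w.max() - w.min()]

stats = np.array([hand_crafted(w) for w in X_windows])
learned = rng.normal(size=(120, 32))         # placeholder for CNN features

fused = np.hstack([stats, learned])          # feature-level fusion
fused = MinMaxScaler().fit_transform(fused)  # normalize to [0, 1]
reduced = PCA(n_components=10).fit_transform(fused)

# 5-fold cross validation, as in the experiments
scores = cross_val_score(RandomForestClassifier(random_state=0), reduced, y, cv=5)
```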
3 Results and Discussion
This section presents and discusses the experimental outcomes of using the proposed feature fusion scheme for FoG episode prediction. The experiments have been conducted on Google Colab with a K80 GPU and 15 GB of memory. The proposed approach has been designed and implemented with TensorFlow and Keras in a Python environment on a Linux platform. The metrics used for performance evaluation are Accuracy, Recall, Precision, and F-measure, calculated according to Eqs. (5), (6), (7), and (8), respectively.
Here the terms TP, FP, TN, and FN are True Positive, False Positive, True Negative, and False Negative, respectively.

Accuracy = (TP + TN) / (TP + FN + TN + FP)    (5)

Recall = TP / (TP + FN)    (6)

Precision = TP / (TP + FP)    (7)

F-measure = 2 ∗ (Precision ∗ Recall) / (Precision + Recall)    (8)
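Eqs. (5)-(8) implemented directly; the confusion-matrix counts used in the check are made-up illustrations:

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, recall, precision and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f_measure
```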
For each of the resultant feature sets, the data has been divided into 80% for training and 20% for testing. The k-fold cross-validation method has been applied, with k = 5, for training and validation of the proposed model. Figure 3 shows the classification accuracy obtained by the previously stated ML classifiers for FoG prediction using the three implemented feature extraction schemes. In addition, the Stacked Ensemble of classifiers algorithm has also been applied for combining the SVM kernels as well as for combining all the applied ML classifiers. As depicted in Fig. 3 (ii), (iv), and (vi), using multi-feature fusion outperforms the time-domain feature extraction and spectrogram-based CNN-learned time-frequency feature schemes, both with angular-axes data and PCA, for FoG prediction. In that case, the highest enhancement in prediction accuracy is 17% using the Adaboost classifier and 9.2% using the Sigmoid-SVM classifier, against the time-domain and time-frequency CNN-learning schemes, respectively. Also, as depicted in Fig. 3 (ii) and (iv), for the spectrogram-based time-frequency CNN-learned features using angular-axes data and PCA, the best improvement in FoG prediction performance against

Table 1. Performance measures of ML classifiers using multi-feature fusion

Classifiers                 | Precision       | Recall          | F-measure
                            | PCA   | no PCA  | PCA   | no PCA  | PCA   | no PCA
Random Forest               | 87.3% | 83.5%   | 87.6% | 83.3%   | 87.5% | 83.7%
Bagging                     | 85.6% | 82.3%   | 85.8% | 82.4%   | 85.6% | 82.5%
Logistic Regression         | 84.5% | 81.4%   | 84.6% | 81.4%   | 84.7% | 81.5%
Adaboost                    | 85.3% | 79.6%   | 85.4% | 79.6%   | 85.1% | 79.6%
KNN                         | 83.6% | 80.4%   | 83.5% | 80.6%   | 83.4% | 80.6%
Decision Tree               | 68.1% | 66.6%   | 68.6% | 66.5%   | 68.3% | 66.4%
Linear-SVM                  | 84.5% | 81.3%   | 84.5% | 81.1%   | 84.6% | 81.5%
RBF-SVM                     | 82.3% | 78.2%   | 82.3% | 78.3%   | 82.4% | 78.3%
Polynomial-SVM              | 65.5% | 65.3%   | 65.5% | 65.1%   | 65.5% | 65.1%
Sigmoid-SVM                 | 86.4% | 79.1%   | 86.6% | 79.5%   | 86.5% | 79.6%
Ens-Stacking-SVMs           | 83.4% | 79.7%   | 83.6% | 79.2%   | 83.8% | 79.8%
Ens-Stacking-AllClassifiers | 85.8% | 81.7%   | 85.5% | 81.8%   | 85.3% | 81.1%
Fig. 3. Classification accuracy of FoG prediction using different feature extraction schemes: (i) & (ii) time-domain statistical features, (iii) & (iv) time-frequency spectrogram-based CNN-learned features, (v) & (vi) multi-feature fusion; (i), (iii), (v) without using PCA, (ii), (iv), (vi) using PCA
using time-domain statistical feature engineering is 13.5%, obtained with the Decision Tree classifier. Moreover, Fig. 3 (i), (iii), and (v) present the degrading impact of not using dimensionality reduction on FoG prediction accuracy by depicting the achieved accuracies for the three feature extraction schemes, using both angular-axes and principle-axes data. For the spectrogram-based CNN-learned time-frequency features, FoG prediction achieves an improved accuracy of 7.3%, 4.1%, and 2.0% using the Adaboost, KNN, and Random Forest ML classifiers, respectively, against using statistical feature engineering. On the other hand, an enhancement of 5.1% and 7.1% has been achieved using feature fusion with the Random Forest classifier against using time-frequency and time-domain features, respectively. Also, it is well known that accuracy cannot always act as an accurate indicator of model performance. Therefore, as summarized in Table 1, we used additional performance measures, namely precision, recall, and F-measure, to reflect the performance of the implemented ML classifiers considering the multi-feature fusion approach using the angular-axes data, with and without applying the PCA algorithm. The observations summarized in Table 1 match and confirm the ones reached from Fig. 3. As also shown in Table 1, the positive impact of using PCA for dimensionality reduction is noticeable. Accordingly, based on the observations summarized in Table 1 and depicted in Fig. 3, the best performance of FoG episode prediction has been achieved through adopting the proposed multi-feature fusion approach, using angular-axes data, with PCA dimensionality reduction applied.
4 Conclusions and Future Work
In this paper, several feature extraction approaches have been investigated for FoG prediction in patients with PD. Among them, a multi-feature fusion approach has been adopted for combining time-domain statistical and time-frequency analysis features with PCA dimensionality reduction. This has led to the observation of the positive impact of the proposed feature fusion in providing a meaningful representation of the pre-FoG class. It has also been shown that the features learned in an unsupervised manner, using CNN deep feature learning with spectrograms, are more discriminative than state-of-the-art representations based on time-domain statistical features. Accordingly, it has been concluded that the best enhancement in FoG episode prediction has been achieved through adopting the proposed multi-feature fusion approach, followed by the CNN-based spectrogram time-frequency feature learning approach, with PCA dimensionality reduction applied. For future work, various challenges could be considered in the domain of predicting FoG episodes via pre-FoG behavior detection as well as predicting FoG severity.
References 1. Spildooren, J., Vercruysse, S., Desloovere, K., Vandenberghe, W., Kerckhofs, E., Nieuwboer, A.: Freezing of gait in Parkinson’s disease - the impact of dual-tasking and turning. J. Movement Disorders 25, 2563–2570 (2010) 2. Palmerini, L., Rocchi, L., Mazilu, S., Gazit, E., Hausdorff, J.M., Chiari, L.: Identification of characteristic motor patterns preceding freezing of gait in Parkinson’s disease using wearable sensors. Front. Neurol. 8, 394 (2017)
3. Aich, S., Mohan Pradhan, P., Park, J., Sethi, N., Vathsa, V.S.S., Kim, H.C.: A validation study of freezing of gait (FoG) detection and machine-learning-based FoG prediction using estimated gait characteristics with a wearable accelerometer. Sensors 18(10) (2018)
4. Naghavi, N., Wade, E.: Prediction of freezing of gait in Parkinson's disease using statistical inference and lower-limb acceleration data. IEEE Trans. Neural Syst. Rehabil. Eng. 27(5) (2019)
5. Bächlin, M., Plotnik, M., Roggen, D., Maidan, I., Hausdorff, J.M., Giladi, N., Tröster, G.: Wearable assistant for Parkinson's disease patients with the freezing of gait symptom. IEEE Trans. Inf. Technol. Biomed. 14, 436–446 (2010)
6. Pasciuto, I., Ligorio, G., Bergamini, E., Vannozzi, G., Sabatini, A., Cappozzo, A.: How angular velocity features and different gyroscope noise types interact and determine orientation estimation accuracy. Sensors 15(9), 23983–24001 (2015)
7. Boashash, B.: Time-Frequency Signal Analysis and Processing: A Comprehensive Reference, 2nd edn. Academic Press, London (2016)
8. Wang, L., Wang, C., Chen, Y.A.: Fast three-dimensional display method for time-frequency spectrogram used in embedded fault diagnosis devices. Appl. Sci. 8, 1930 (2018)
9. Ahn, J., Park, J., Park, D., Paek, J., Ko, J.: Convolutional neural network-based classification system design with compressed wireless sensor network images. PLoS ONE 13(5), e0196251 (2018)
10. Mazilu, S., Calatroni, A., Gazit, E., Roggen, D., Hausdorff, J.M., Tröster, G.: Feature learning for detection and prediction of freezing of gait in Parkinson's disease. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition, MLDM 2013, New York, NY, USA, 19–25 July 2013. Lecture Notes in Computer Science, vol. 7988, pp. 144–158. Springer, Heidelberg (2013)
11. Amalina, F., Feizollah, A., Anuar, N., Gani, A.: Evaluation of machine learning classifiers for mobile malware detection. Soft Comput. 20(1), 343–357 (2016)
12. Trivedi, S.: A study of machine learning classifiers for spam detection. In: Proceedings of the 4th International Symposium on Computational and Business Intelligence, ISCBI 2016, Olten, Switzerland, 5–7 September, pp. 176–180. IEEE (2016)
Signature of Electronic Documents Based on Fingerprint Recognition Using Deep Learning

Souhaïl Smaoui1(B), Manel Ben Salah2, and Mustapha Sakka1

1 Higher Institute of Technological Studies Sfax, Sfax, Tunisia
[email protected], [email protected]
2 Higher Institute of Technological Studies Kairouan, Kairouan, Tunisia
[email protected]
Abstract. Our paper presents a new method for securing electronic documents. This method integrates hybrid technologies such as biometrics founded on fingerprint recognition, PDF417 [3] coding, encryption methods, and electronic document signing. It uses these techniques to enhance signature security and therefore the guarantee of the signer's authentication. Authentication is a task required in several domains to certify security and information uniqueness. In our method we have opted for the use of fingerprint recognition techniques to ensure a high level of confidentiality. To do this we have prepared a database holding fingerprints of a large number of people. The identification and classification of fingerprints is done with a deep convolutional neural network [8].

Keywords: Authentication · Fingerprints · Convolutional neural networks
1 Introduction The volume of documents created, exchanged, and managed grows at an incredible pace day by day. The most common security need is the secure transfer of documents, also called document signing or digital signature. The paper is organized as follows: the next section states the problem addressed by this research. The two sections that follow present the state of the art of fingerprint recognition. Section 5 discusses the main process of our approach, its advantages and disadvantages, and details the different steps of our work. Section 6 presents the results of the different experiments. The last section contains conclusions and some thoughts on possible openings and our future work. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 346–354, 2021. https://doi.org/10.1007/978-3-030-49342-4_33
2 Problematics When exchanging information or documents, some applications rely on security standards such as cryptography and information coding. However, these sometimes fall short with respect to authenticating the document issuer and keeping transactions confidential: how can one ascertain the origin of messages and guarantee their diffusion without taking risks? The proposed solution draws on a collection of methods from encryption, coding, authentication, and biometrics. It is in this setting that this work is located, and in which we present our document-signing solution, based on a hybrid crypto-biometric approach built on fingerprint recognition. Fingerprint recognition is a complex mission when it comes to machine-based automatic processing without any human intelligence in the loop. The task may fail in the presence of new situations, i.e. darkness, high brightness, position variation, etc. The performance of such a recognition system depends on the quality of the data used during the learning phase. A variation in the environment and in the fingerprint acquisition conditions causes a large change in the data to be stored, and subsequently in the performance of the classifier [21]. Fingerprint recognition is not straightforward for several reasons: different fingerprint orientations (Fig. 1), positional variations (Fig. 2), physical differences (Fig. 3), and brightness variation (Fig. 4) may all complicate recognition.
Fig. 1. Different fingerprint orientations
Fig. 2. Variation of positions
Fig. 3. Physical variation
To deal with these problems, several techniques are put into effect.
Fig. 4. Brightness variation
3 Digital Fingerprint Recognition Process Fingerprint recognition is a complex process carried out as a series of operations, which can be grouped into three major steps: the preparation of the data, the data mining, which is the central stage of the process, and finally the validation of the elaborated model (Fig. 5).
Fig. 5. General process of fingerprint recognition
The data preparation stage covers the acquisition and digitization of images from a specific environment. In this phase the general structure of the data and the rules used to build it are determined; the exploitable information must be identified and its quality and effectiveness verified. This phase is the focal point of the fingerprint recognition process: it determines the quality of the established model. Data mining is the model-search stage, also called the modeling phase; it consists in extracting useful knowledge from the data prepared in the previous phase and presenting it in a synthetic form. This phase relies on exploratory research, i.e. without any preconceptions about the relationships between the data. The prediction model established in this step will be used to recognize fingerprints. After the construction of the model, the last step is dedicated to evaluating and validating the relevance of the prediction rules. It is preferable to first set aside a base used only for testing; it validates the model and finally leads to the knowledge stage.
4 Techniques of Recognition of Digital Fingerprints In the literature we distinguish two principal families of fingerprint recognition methods: global and geometrical methods [13, 14]. 4.1 Global Methods Global methods are essentially based on pixel values: the image is processed as a whole, without segmentation. These methods generally rely on a learning phase, using techniques such as Neural Networks (NN) or Support Vector Machines (SVM). While these approaches are simple and achieve a high recognition rate, they can be blamed for the slowness of the learning phase [4, 5]. Within global methods, one further distinguishes stochastic, parametric, and nonparametric methods. Stochastic methods are founded on scanning the image from top to bottom. There is a natural order in which the features appear, which can therefore be modeled using a Hidden Markov Model (HMM) [11]. This model, however, is sensitive to how the input images are captured. Parametric methods consist in making a hypothesis about the analytical form of the probability distribution sought, and in estimating the parameters of this distribution from the available data. Nonparametric methods, by contrast, make no assumptions about the distribution of the learning data. 4.2 Geometrical Methods Geometrical methods make it possible to obtain the "signature" of the imprint. From a previously processed fingerprint image, a data structure (or signature) is extracted using different algorithms [1]. The signature chosen to characterize the fingerprint is based on a sufficient and reliable set of minutiae, around 14, which is then enough to identify a fingerprint among many millions of copies. Generally, each minutia occupies about 16 bytes of space (Fig. 6).
Fig. 6. Extraction of the minutiae
During the extraction process, on average 100 minutiae are initially detected, of which about 60% are incorrect minutiae that will be discarded in a later step. We therefore extract about 40 real minutiae from the fingerprint.
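The filtering step described above, keeping only the most reliable minutiae, can be sketched as follows; the (x, y, angle, score) tuple layout and the helper itself are illustrative assumptions, not the paper's code:

```python
def keep_reliable(minutiae, n=40):
    """Keep the n most reliable minutiae.

    Each minutia is assumed to be an (x, y, angle, score) tuple,
    sorted here by descending reliability score.
    """
    return sorted(minutiae, key=lambda m: m[3], reverse=True)[:n]
```

With roughly 100 detected minutiae of which about 60% are spurious, a cut-off near n = 40 matches the counts discussed above.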
This number of minutiae is significantly greater than 10, which enhances reliability. Moreover, it is far below the total number of detected minutiae, which means that by keeping only the most reliable ones, the erroneous minutiae that could have disturbed the solution's behavior have been eliminated [10]. Such a system can identify the owner of a fingerprint with encouraging accuracy, and can reject a fingerprint when it is uncertain of the outcome. The presented system is structured into three stages: – Pretreatment of the fingerprint image. – Extraction of the features that represent the fingerprint. – Classification of the fingerprint, leading to a decision or a rejection. This approach suffers from two handicaps: position variation and physical variation.
5 Adopted Approach Our approach contains three stages. The first is dedicated to the corpus and to preparing the data for the training phase. The digital fingerprint image is processed at a resolution of 500 dpi, so that at least 10 pixels separate the ridge edges. This resolution is essential for the minutiae extraction stage, since the central fingerprint point can change from one person to another. The original image is processed through a Gabor filter bank: a filter image is produced for each of 6 angles (0, π/6, π/3, π/2, 2π/3 and 5π/6) (Fig. 7).
Fig. 7. Gabor filter
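Such a six-orientation filter bank can be sketched in a few lines of NumPy; the kernel size (21 × 21) and the σ, λ, γ parameters below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lambd, gamma):
    """Real part of a Gabor filter oriented at angle theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / lambd)
    return envelope * carrier

# One filter per orientation listed in the text.
ANGLES = [0, np.pi / 6, np.pi / 3, np.pi / 2, 2 * np.pi / 3, 5 * np.pi / 6]
BANK = [gabor_kernel(21, 4.0, t, 10.0, 0.5) for t in ANGLES]
```

Each kernel in the bank is convolved with the fingerprint image; ridges aligned with a kernel's orientation produce a strong response.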
The result of this processing is shown in Fig. 8. The next stage is to find a prediction model; in this second stage we rely on a Convolutional Neural Network (CNN) for such a recognition application [22]. The third stage concerns the validation of the model. To train and evaluate the proposed fingerprint recognition method, a fingerprint image dataset is required. The construction of the training base is a significant
Fig. 8. Extraction of the minutiae
Fig. 9. Examples of used images
component in a data mining process. Figure 9 shows some examples of the images used. Figure 10 shows examples of images used to evaluate the accuracy of our detection method.
Fig. 10. Brightness variation
We used the High-Resolution Fingerprint Database from The Hong Kong Polytechnic University [7], which consists of 1,480 fingerprint images from 148 fingers, with five images captured per finger. The images have a resolution of 1200 dpi, a size of 320 × 240 pixels, and depict the central region of the finger. To optimize the CNN training process, we chose to reduce the mass of data using bi-cubic interpolation. The advantage of this technique is that it acts as a low-pass filter, letting the low frequencies pass and eliminating the high ones, with the aim of decreasing aliasing artifacts. A reduced image size accelerates the training process and improves the classification reliability [23].
In our experiment, we opted to reduce the image size by 40%. This threshold, determined after an intensive series of experiments, ensures good performance both in the training phase and in the test phase. Our aim is to obtain the most suitable model. We proposed to classify the fingerprints by a supervised approach using a CNN. The training process is performed on 80% of the fingerprints; the rest is used to evaluate the classifier. To start the training process, we defined the input vectors, constituted by the vectors relating to the fingerprints to be learned, and the required neurons. Thereafter, we use a stack of independent processing layers. We applied a 2D convolutional layer (convolution2dLayer), which applies sliding convolutional filters to the input, followed by a batch normalization layer (batchNormalizationLayer). To improve the processing efficiency of our system, we used a correction layer (reluLayer). A pooling layer (maxPooling2dLayer), which performs a form of subsampling of the image, is also applied in our algorithm. After several layers of convolution and max-pooling, the high-level reasoning in the neural network is done via fully connected layers (fullyConnectedLayer). Finally, we used a loss layer (softmaxLayer), which specifies how network training penalizes the gap between the expected and actual signal, and a classification layer (classificationLayer), which computes the cross-entropy loss. The experiments carried out show a degradation in classification accuracy when the "InitialLearnRate" option is changed. In our method we used the sigmoid activation function, which in our tests was well suited to building the neural network:

ϕ(v) = 1 / (1 + exp(−v))
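For reference, the activations named in this pipeline — the sigmoid above, plus equivalents of the reluLayer and softmaxLayer — can be written directly; a minimal NumPy sketch, not the MATLAB implementation used by the authors:

```python
import numpy as np

def sigmoid(v):
    """Logistic activation: phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def relu(v):
    """Correction layer (reluLayer): elementwise max(0, v)."""
    return np.maximum(0.0, v)

def softmax(v):
    """Output layer (softmaxLayer): normalized exponentials.
    Subtracting the max keeps the exponentials numerically stable."""
    e = np.exp(v - np.max(v))
    return e / e.sum()
```

The sigmoid saturates toward 0 and 1 for large negative and positive inputs, while the softmax turns the final fully connected layer's scores into class probabilities that sum to one.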
Our training process is based on the gradient backpropagation algorithm. We used the TRAINCGP method, a Polak–Ribière conjugate-gradient variant [9], characterized by its adjusted update rule. The number of hidden layers has a large impact on the structure of the neural network; for this reason, we ran practical sets of tests to define a threshold, equivalent to the square root of the size of the input vector. The experiments performed indicate a degradation of the classification performance when the number of hidden layers is smaller than this threshold, while a number greater than this threshold slows down the training phase without increasing performance. The last step is dedicated to validating and evaluating the model, which is presented in the following section.
6 Experiments This section presents the experiments realized with the CNN algorithm. Our experiments use a training base of 1,480 images and 296 test images.
Table 1. Results of experiments

Time of training:    350 s
Time of test:        10 s
Rate of success RT:  91%
Rate of failure RF:  9%
Table 1 presents the results of our experiments. Our proposed approach reached 91% fingerprint recognition, a significant rate compared to some proposed techniques. Jang et al. [17] proposed a CNN classifier that recognizes fingerprints with a rate of 84.85%; Anand and Kanhangad [18] improved the rate to 85% using minutiae features and an SVM classifier; Gowthami and Mamatha [19] reached 94.32% using Zone-Based Linear Binary Patterns with a neural network classifier; and Thai and colleagues [20] proposed using a standardized fingerprint model to improve the recognition rate. Figure 10 shows the advantage of deep learning, which manifests itself especially as the number of classes increases. Hence, we can conclude that our approach is promising for recognizing fingerprints with a high level of precision.
7 Conclusion and Prospects In this paper we presented our approach for signing documents based on fingerprint recognition using a data mining approach. It is important to mention that the results presented in this work were obtained using large databases, which shows that the proposed method is easily adaptable to the number of people and the number of images, regardless of the biometric measurement used. The results obtained show the contribution of our approach to such a recognition application. In future work, we aim to reduce the complexity of the CNN architectures and to minimize the amount of information and the number of submodules needed to design the CNN networks. We furthermore propose to improve the performance of our approach by integrating other feature extraction techniques, and we intend to study other classification algorithms.
References
1. Isobe, Y., et al.: A proposal for authentication system using a smart card with fingerprints. Inf. Process. Soc. Jpn. SIG Notes 99(24), 55–60 (1999)
2. Wu, J.-K.: Neural Networks and Simulation Methods. CRC Press, December 1993. ISBN 0-8247-9181-9
3. AIM Europe: Uniform Symbology Specification PDF417. AIM Europe (1994)
4. Labati, R.D., Piuri, V., Scotti, F.: A neural-based minutiae pair identification method for touch-less fingerprint images. In: 2011 IEEE Workshop on Computational Intelligence in Biometrics and Identity Management (CIBIM) (2011)
5. Kristensen, T., Borthen, J., Fyllingsnes, K.: Comparison of neural network based fingerprint classification techniques, pp. 1043–1048
6. Djalal, D.: Application of neural networks to the management of a perceptron system for an indoor mobile robot. Thesis prepared at the Advanced Electronics Laboratory (LEA), Batna
7. Zhao, Q., et al.: Adaptive fingerprint pore modeling and extraction. Pattern Recogn. 43(8), 2833–2844 (2010)
8. Dreyfus, G., Personnaz, L.: Perceptrons, past and present. Organisation des systèmes intelligents (1999)
9. Grippo, L., Lucidi, S.: Convergence conditions, line search algorithms and trust region implementations for the Polak–Ribière conjugate gradient method. Optim. Methods Softw. 20(1), 71–98 (2005)
10. Maio, D., Maltoni, D., Cappelli, R., Wayman, J.L., Jain, A.K.: FVC2004: third fingerprint verification competition. In: Proceedings of the International Conference on Biometric Authentication (ICBA), Hong Kong, pp. 1–7, July 2004. https://doi.org/10.1007/978-3-540-25948-0_1
11. Samaria, F.S., Harter, A.C.: Parameterization of a stochastic model for human face identification, pp. 138–180. IEEE Computer Society Press (1994)
12. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015)
13. Ben Salem, M., Ettabaâ, K.S., Bouhlel, M.S.: Anomaly detection in hyperspectral images based spatial spectral classification. In: SETIT 2016, pp. 166–170 (2016)
14. Zouari, J., Hamdi, M.: Enhanced fingerprint fuzzy vault based on distortion invariant minutiae structures. In: SETIT 2016, pp. 491–495 (2016)
15. Smari, K., Bouhlel, M.S.: Gesture recognition system and finger tracking with kinect: steps. In: SETIT 2016, pp. 544–548 (2016)
16. Bana, S., Kaur, D.: Fingerprint recognition using image segmentation. Int. J. Adv. Eng. Sci. Technol. (IJAEST) 5, 12–23 (2011)
17. Jang, H.U., Kim, D., Mun, S.M., Choi, S., Lee, H.K.: DeepPore: fingerprint pore extraction using deep convolutional neural networks. IEEE Sig. Process. Lett. 24(12), 1808–1812 (2017)
18. Anand, V., Kanhangad, V.: Pore detection in high-resolution fingerprint images using deep residual network. J. Electron. Imaging 28(2), 1–4 (2019)
19. Gowthami, A.T., Mamatha, H.R.: Fingerprint recognition using zone based linear binary patterns. Proc. Comput. Sci. 58, 552–557 (2015)
20. Thai, L.H., Tam, H.N.: Fingerprint recognition using standardized fingerprint model. IJCSI Int. J. Comput. Sci. 11–17 (2010)
21. A grey wolf optimizer for modular granular neural networks for human recognition. Comput. Int. Neurosci. 2017, 4180510:1–4180510:26 (2017)
22. Optimization of modular granular neural networks using a firefly algorithm for human recognition. Eng. Appl. AI 64, 172–186 (2017)
23. Optimization of modular granular neural networks using hierarchical genetic algorithms for human recognition using the ear biometric measure. Eng. Appl. AI 27, 41–56 (2014)
Comparison of a Trajectory Controller Based on Fuzzy Logic and Backstepping Using Image Processing for a Mobile Robot Rodrigo Mattos da Silva1 , Thiago Rodrigues Garcia1 , Marco Antonio de Souza Leite Cuadros2 , and Daniel Fernando Tello Gamarra1(B) 1 Universidade Federal de Santa Maria, Santa Maria, RS 97105-900, Brazil
[email protected], [email protected], [email protected] 2 Instituto Federal do Espirito Santo, Serra, ES 29173-087, Brazil [email protected]
Abstract. This work aims to compare the application of two controllers for a mobile robot in a trajectory tracking task. The first method uses a heuristic approach based on the prior knowledge of the designer, while the second method uses a mathematical model based on the robot kinematics. Both systems employ the estimated robot position derived from an image processing algorithm. The paper shows experimental results with a real robot following a predefined path to explore the use of these techniques. Keywords: Fuzzy logic · Backstepping · Image processing · Mobile robot
1 Introduction In the last few years, the number of applications involving mobile robots has been growing, and one of the factors responsible for this is the diversity and relevance of the tasks this category of robot can perform [1]. These tasks require that the robot be able to perform actions such as following a given trajectory and knowing its own localization. The main contribution of this paper is a comparison of two control systems that solve the trajectory-tracking problem using computer vision. We present a simple fuzzy trajectory-tracking controller for differential-drive robots, based on a heuristic approach, alongside a backstepping controller based on the robot's kinematic model. The paper shows experimental results and highlights the advantages and limitations of both implementations. There are several approaches to trajectory control; these methods can be classified into two categories, the classic control approach [2–4] and the heuristic approach [5, 6]. This work presents a backstepping controller based on a kinematics model and a heuristic control method based on fuzzy logic. This approach was previously explored by Omrane et al. [6], who used a fuzzy controller to track a trajectory and avoid obstacles based on the © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 355–364, 2021. https://doi.org/10.1007/978-3-030-49342-4_34
encoder information, using a Matlab simulator called SIMIAN. Xiong et al. [7] applied a method based on machine vision and fuzzy control to intelligent vehicle driving, where the fuzzy controller replaces a traditional PID controller. Saifizi et al. [8] explored a similar approach using a fuzzy logic controller and computer vision, recognizing a circle as a landmark and identifying distance and orientation from the known diameter of the circle and the calibration of the camera. Kanayama et al. [9] present a robot-independent tracking control rule for non-holonomic systems that can be applied to mobile robots with dead-reckoning ability; its stability is proved through the use of a Lyapunov function. Fierro et al. [5] propose a control structure that makes possible the integration of a backstepping controller and a neural network (NN). Precup et al. [10] design a fuzzy logic control system to stabilize the Rössler chaotic dynamical system, a system of three non-linear ordinary differential equations defining a chaotic continuous-time dynamical system, based on a Takagi-Sugeno-Kang model, with stability guaranteed by Lyapunov's theory; the asymptotic complexity of the algorithm is proved to be lower than that of linear-matrix-inequality-based fuzzy logic control algorithms. Many fuzzy controller schemes were developed using a Takagi–Sugeno type state-space fuzzy model based on the plant's physical model. Chatterjee et al. [11] note that in practical cases the plant model may have uncertainties due to variations in plant parameters and sensor imperfections, and may be severely affected by load disturbances. They therefore propose a stable state-feedback fuzzy controller for a robot arm: an approximate Takagi–Sugeno type neuro-fuzzy state-space model of the plant was developed using input–output experimental data collected from the real robotic arm, and the model was then trained using a particle swarm optimization (PSO) technique.
The fuzzy controller was developed on the basis of the trained neuro-fuzzy state-space model, employing parallel distributed compensation. One technique that can be used to estimate the position and orientation of robots is image processing. Santana et al. [12], for example, proposed a system to localize a robot using preexisting lines detected in floor images together with odometry; an extended Kalman filter fuses the image processing information with the odometry. Borenstein et al. [13] state that position and orientation estimation methods may be broadly categorized into two groups according to their measurements, relative and absolute; the image processing method used in this paper can be classified as a relative one. This article tackles the problem of following a trajectory with a mobile robot using two controllers, one heuristic controller based on fuzzy logic and one based on backstepping. In contrast to other controller design approaches, the motivation of this paper is to design a controller that is easy to build, test, and deploy in real-world environments without necessarily knowing the robot's dynamics. On this basis we introduce a simple fuzzy trajectory-tracking controller for differential-drive robots using a heuristic approach, whose output variables are the linear and angular velocities of the robot, as commonly used in commercial robots. The present article implements the relative method and is divided into six sections. After this brief introduction, the second section describes the image processing method; the third section explains the fuzzy control approach; the fourth section explains the backstepping
controller, the fifth section shows the obtained results, followed by the conclusions in the last section.
2 Digital Image Processing Image processing has many applications in robotics [14]. Here its objective is to analyze the images and determine the localization of the mobile robot in the workspace, as well as the localization of the goal the robot should reach. The images are captured with a camera that has a wireless network connection and is mounted on the ceiling of the laboratory, pointing at the laboratory floor. The image processing is written in the Python programming language with the OpenCV computer vision library, running on a notebook carried by the robot. So that the algorithm can identify and localize the robot, green circular markers of different sizes were placed on it (Fig. 1). An important step in the image processing is segmentation. The simplest segmentation method is image thresholding, which establishes a threshold on the grayscale pixel values of the image.
Fig. 1. Mobile robot with differential traction.
The method employed for the markers' segmentation performs a color segmentation. The robot markers are green, so the segmentation process must classify the green pixels within a determined range, which become white, while pixels outside this range become black. Finally, the centroid of each circle is calculated. Using the x and y coordinates of the circles' centroids, it is possible to draw the line passing through those two points. The slope angle of this line is the robot's orientation angle in Cartesian space. The slope angle of the line with respect to the abscissa axis is obtained with the trigonometric relation: θ = arctan((y2 − y1)/(x2 − x1)), where (x1, y1) and (x2, y2) are the markers' centroids. After determining the robot's orientation angle, the last step consists in localizing in the image the point that is the goal the robot should reach. It was established that the goal point has a marker with a smaller area. Figure 2a shows the result of the segmentation with the goal point included.
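The segmentation, centroid, and orientation steps above can be sketched as follows; the RGB bounds for "green" are illustrative assumptions (the paper does not give them), and math.atan2 is used because it resolves the quadrant and tolerates vertical lines, which a plain arctan of the slope does not:

```python
import math
import numpy as np

def green_mask(img, lo=(0, 100, 0), hi=(80, 255, 80)):
    """Binary mask of pixels inside an assumed RGB 'green' range."""
    lo, hi = np.asarray(lo), np.asarray(hi)
    return np.all((img >= lo) & (img <= hi), axis=-1)

def centroid(mask):
    """Centroid (x, y) of the foreground pixels of a binary mask."""
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()

def orientation(c1, c2):
    """Angle of the line through two marker centroids, in radians."""
    (x1, y1), (x2, y2) = c1, c2
    return math.atan2(y2 - y1, x2 - x1)
```

In practice the mask would be split into its connected components (one per marker) before taking centroids; the sketch above shows only the core computations.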
Fig. 2. 2a and 2b: Segmentation results of the three markers (2a). Final result of all the necessary variables for the Fuzzy controller (2b).
It is necessary to have a reference for the robot's front; for our application it was established that the medium-area circle marks the front of the robot, so the circle with the biggest area marks the rear. The distance between the robot and the goal is computed as the distance between the centroid of the biggest area (robot's rear) and the smallest segmented area of the image (goal point). The angle α of the straight line crossing those two points is the orientation of the goal point in Cartesian space.
3 Fuzzy Controller From the information obtained in Sect. 2, the fuzzy controller is designed to move the robot from one point to another. The fuzzy controller aims to minimize the angle γ and the distance between the two straight lines; the fuzzy rules use two inputs and two outputs. The basic controller model, known in the literature, and the fuzzy rules were proposed in [15, 16]. The controller receives a distance D and the angle difference γ between the robot and the destination point, and returns the linear and angular speed necessary to minimize, respectively, the distance and the angle difference. In the figure below, the center of mass of the mobile robot (x, y) and the target point (xr, yr) can be observed, where γ = α − β. The fuzzy variables for the controller are depicted in Fig. 3; the fuzzy variables and fuzzy sets are listed in Table 1, the rules in Table 2, and the range of values used for the fuzzy variables in our application in Table 3.
Fig. 3. Distance between the robot center of mass and the destination point.

Table 1. Fuzzy variables.

Variable | Notation | Description
Dev      | NB       | Negative Big
         | NS       | Negative Small
         | ZE       | Zero
         | PS       | Positive Small
         | PB       | Positive Big
D, V     | VS       | Very Small
         | S        | Small
         | M        | Medium
         | B        | Big
         | VB       | Very Big
ω        | LF       | Left Fast
         | LS       | Left Slow
         | Z        | Zero
         | RS       | Right Slow
         | RF       | Right Fast
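As an illustration of how such a rule base can be evaluated, here is a minimal one-input slice (distance → linear speed) with triangular memberships and centre-of-gravity defuzzification. All membership bounds and output centres below are assumptions chosen only to span the ranges of Table 3, not the paper's tuning:

```python
def tri(x, a, b, c):
    """Triangular membership with peak at b and feet at a and c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Illustrative memberships over the distance range (1..800 pixels)
# and crisp centres for the speed sets over 0..8 m/s.
D_SETS = {'VS': (0, 0, 200), 'S': (0, 200, 400), 'M': (200, 400, 600),
          'B': (400, 600, 800), 'VB': (600, 800, 800)}
V_CENTRES = {'VS': 0.5, 'S': 2.0, 'M': 4.0, 'B': 6.0, 'VB': 8.0}

def linear_speed(d):
    """Speed grows with distance; centre-of-gravity defuzzification."""
    w = {k: tri(d, *abc) for k, abc in D_SETS.items()}
    # Shoulder sets at the ends of the range fire fully.
    if d <= 0:
        w['VS'] = 1.0
    if d >= 800:
        w['VB'] = 1.0
    total = sum(w.values())
    return sum(w[k] * V_CENTRES[k] for k in w) / total if total else 0.0
```

The full controller evaluates the two-input rule base of Table 2 the same way, firing each rule at the minimum of its two input memberships and defuzzifying both the linear and the angular speed.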
4 Backstepping Controller The stable tracking control rule employed in this work is taken from [9], where stability is guaranteed by Lyapunov theory. This method suits the class of mobile robots in which the reference path specification and the current position estimate are given separately. The purpose of the tracking control rule is to drive the error posture to zero, making the robot follow the specified path. The reference path is given by an algorithm that generates the reference trajectory points. Considering a reference posture pr = (xr , yr , θr ) as a
Table 2. Fuzzy rules (rows: distance D; columns: deviation Dev; each cell gives the linear-speed set and, in parentheses, the angular-speed set).

D \ Dev | NB      | NS      | ZE     | PS      | PB
VS      | VS (RF) | VS (RS) | VS (Z) | VS (LS) | VS (LF)
S       | S (RF)  | S (RS)  | S (Z)  | S (LS)  | S (LF)
M       | Me (RF) | Me (RS) | Me (Z) | Me (LS) | Me (LF)
B       | B (RF)  | B (RS)  | B (Z)  | B (LS)  | B (LF)
VB      | VB (RF) | VB (RS) | VB (Z) | VB (LS) | VB (LF)
Table 3. Variable range values

Variable | Inferior limit | Superior limit
D        | 1 pixel        | 800 pixel
γ        | −π             | π
ω        | −16 rad/s      | 16 rad/s
V        | 0 m/s          | 8 m/s
goal posture, and a current posture pc = (xc , yc , θc ) as the instantaneous posture of the robot estimated through the image processing, the error posture pe = (xe , ye , θe ) can be represented as follows:

Pe = [xe, ye, θe]^T = [[cos θ, sin θ, 0], [−sin θ, cos θ, 0], [0, 0, 1]] × (Pr − Pc)   (1)

The error posture is a linear transformation from the reference posture frame to the current posture frame. The architecture of the control system has two inputs: a reference posture pr = (xr , yr , θr ) and reference velocities qr = (vr , ωr ), which are respectively the reference linear and angular velocity. The system has two control action outputs, the target velocities q = (v, ω) necessary to reach the reference. The control law is given in Eq. 2:

q = [v, ω]^T = [vr cos θe + Kx xe,  ωr + vr (Ky ye + Kθ sin θe)]^T   (2)
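Equations (1) and (2) translate directly into code; a sketch using the gains of Table 4, not the authors' implementation:

```python
import numpy as np

# Gains and reference speeds as in Table 4 (determined heuristically).
KX, KY, KT = 0.3, 1.0, 5.0

def posture_error(p_ref, p_cur):
    """Eq. (1): reference pose expressed in the current robot frame.
    Poses are (x, y, theta) tuples."""
    xr, yr, tr = p_ref
    xc, yc, tc = p_cur
    dx, dy = xr - xc, yr - yc
    xe = np.cos(tc) * dx + np.sin(tc) * dy
    ye = -np.sin(tc) * dx + np.cos(tc) * dy
    return xe, ye, tr - tc

def control_law(err, v_ref, w_ref, kx=KX, ky=KY, kt=KT):
    """Eq. (2): target linear and angular velocity (v, w)."""
    xe, ye, te = err
    v = v_ref * np.cos(te) + kx * xe
    w = w_ref + v_ref * (ky * ye + kt * np.sin(te))
    return v, w
```

When the error posture is zero, the law reduces to the feed-forward terms (v, ω) = (vr, ωr), which is exactly the behavior expected on the reference path.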
Where Kx , Ky and Kθ are positive constants tuned heuristically. This control law is useful only for small reference velocities. The heuristically tuned constants used in the experimental results are shown in Table 4. The methodology used in this work is very similar to our previous work [16], but in this paper image processing is used as the sensor instead of odometry.

Table 4. Parameters determined heuristically

Parameter | Value
Kx        | 0.3
Ky        | 1.0
Kθ        | 5.0
ωr        | 0.015 rad/s
vr        | 2.0 m/s
5 Results This section presents the experimental results obtained with the mobile robot shown in Fig. 1. The mobile robot was built in our laboratory; it has two DC motors with encoders on each wheel and a third free wheel. A video camera is mounted on the ceiling, and the images are sent over a wireless network to a notebook located on the robot, which controls it. All the software runs on the Ubuntu distribution of the Linux operating system. The robot tracks a lemniscate trajectory, a curve with abrupt changes in its geometry that is challenging to track for both mobile and manipulator robots. The results shown below concern the application of the fuzzy controller, the backstepping controller, and a comparison of both. 5.1 Fuzzy Experimental Results Both controllers were implemented on the robot shown in Fig. 1 and tested in a real environment. Figure 4 shows the desired lemniscate trajectory (blue) and the trajectory accomplished by the mobile robot using the fuzzy logic controller. It can be observed that the robot successfully follows the predetermined trajectory. 5.2 Backstepping Experimental Results Experiments were also performed with the backstepping control law explained in Sect. 4. Figure 5 plots the x and y results using the backstepping controller: the desired lemniscate trajectory (blue) and the trajectory accomplished by the robot (red). It can be observed that the robot performs well following the specified trajectory.
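A reference generator for such a figure-eight path can be sketched as follows; the lemniscate of Gerono parameterization is an assumption, since the paper does not specify which parameterization was used:

```python
import math

def lemniscate(t, a=1.0):
    """Point on a lemniscate of Gerono (a figure-eight curve) at
    parameter t, scaled by a."""
    return a * math.sin(t), a * math.sin(t) * math.cos(t)

def reference_path(n=200, a=1.0):
    """n points covering one full loop of the curve."""
    return [lemniscate(2 * math.pi * k / n, a) for k in range(n)]
```

Successive points of such a list, together with the reference velocities (vr, ωr), form the reference posture stream fed to the trajectory controllers.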
362
R. Mattos da Silva et al.
Fig. 4. Results of the fuzzy controller for tracking the lemniscate trajectory in x and y.
Fig. 5. Results of the backstepping controller for the trajectory tracking with the variables x and y.
5.3 Comparison of the Controllers Figure 6 compares the performance of the two controllers, plotting the desired position (blue), the position obtained with the fuzzy controller (green), and the position obtained with the backstepping controller (red). At low velocities the backstepping controller performs slightly better than the fuzzy controller. However, when the velocity of the robot is increased, Fig. 7 is obtained by plotting the same three curves. The performance of the backstepping controller decays relative to the fuzzy controller with the same parameters, while the fuzzy controller shows only a slight decay. It thus becomes increasingly difficult to find a satisfactory set of parameters for the backstepping controller at higher speeds, making the fuzzy controller more suitable.
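The comparison in Figs. 6 and 7 is visual. A simple quantitative complement, shown here only as a sketch and not used in the paper, is the root-mean-square tracking error between the reference and executed trajectories:

```python
import math

def rms_tracking_error(reference, executed):
    """RMS Euclidean distance between two (x, y) paths sampled at the
    same instants."""
    if len(reference) != len(executed):
        raise ValueError("trajectories must be sampled at the same instants")
    sq = [(xr - xe) ** 2 + (yr - ye) ** 2
          for (xr, yr), (xe, ye) in zip(reference, executed)]
    return math.sqrt(sum(sq) / len(sq))
```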
Comparison of a Trajectory Controller Based on Fuzzy Logic
363
Fig. 6. Comparison of the two controllers for a reference tracking.
Fig. 7. Comparison of the two controllers with a higher reference velocity.
6 Conclusions In this work, two tracking controllers were presented. Their objective was to make the robot track a predefined trajectory using the information provided by an image processing algorithm. Experimental results on a real robot validate our approach and allow a comparison of the controllers' performance. Based on the results obtained, the backstepping controller performs slightly better than the fuzzy controller at low velocities; at higher velocities the performance of the backstepping controller decreases, while the fuzzy controller's performance does not change. This is mainly because a fuzzy system is not an exact approach: it relies completely on the designer's previous knowledge of the subject. On the other hand, backstepping is a mathematical model based on the robot kinematics, ensuring greater accuracy even though it does not include a dynamic model. Nevertheless, the fuzzy logic option is one of the most convenient, as it directs the focus to solving the problem, facilitating and streamlining the implementation of the controller by removing the need for costly mathematical models.
References
1. Bezerra, C.G.: Localizacao de um robo movel usando odometria e marcos naturais. MS thesis, Universidade Federal do Rio Grande do Norte (2004)
2. Mu, J., Yan, X.G., Spurgeon, S.K., Mao, Z.: Nonlinear sliding mode control of a two-wheeled mobile robot system. Int. J. Model. Ident. Control 27(2), 75–83 (2017)
3. Jiang, Z.P., Nijmeijer, H.: Tracking control of mobile robots: a case study in backstepping. Automatica 33, 1393–1399 (1997)
4. Gamarra, D.F.T., Bastos Filho, T.F., Sarcinelli Filho, M.: Controlling the navigation of a mobile robot in a corridor with redundant controllers. In: Proceedings of the IEEE International Conference on Robotics and Automation, ICRA 2005, Barcelona (2005)
5. Fierro, R., Lewis, F.L.: Control of a nonholonomic mobile robot using neural networks. IEEE Trans. Neural Netw. 9(4), 589–600 (1998)
6. Omrane, H., Masmoudi, S.M., Masmoudi, M.: Fuzzy logic based controller for autonomous mobile robot navigation. Comput. Intell. Neurosci. 2016, 10 (2016)
7. Xiong, B., Qu, S.R.: Intelligent vehicle's path tracking based on fuzzy controller. J. Transp. Syst. Eng. 10(2), 70–75 (2010)
8. Saifize, M., Hazry, D., Ruduan, M.: Vision based mobile robot navigation system. Int. J. Control Sci. Eng. 2, 83–87 (2012)
9. Kanayama, Y.J., Kimura, F., Miyazaki, T., Noguchi, T.: A stable tracking control method for an autonomous mobile robot. In: Proceedings of IEEE International Conference on Robotics and Automation (ICRA), pp. 384–389 (1990)
10. Precup, R.-E., Tomescu, M.-L., Dragos, C.-A.: Stabilization of Rössler chaotic dynamical system using fuzzy logic control algorithm. Int. J. Gen. Syst. 43(5), 413–433 (2014)
11. Chatterjee, A., et al.: Augmented stable fuzzy control for flexible robotic arm using LMI approach and neuro-fuzzy state space modeling. IEEE Trans. Ind. Electron. 55(3), 1256–1270 (2008)
12. 
Santana, A., Souza, A., Alsina, P., Medeiros, A.: Fusion of odometry and visual datas to localization of a mobile robot using extended Kalman filter. In: Sensor Fusion Applications, pp. 407–421. InTech (2018)
13. Borenstein, J., Everett, H., Feng, L., Wehe, D.: Mobile robot positioning sensors and techniques. J. Robot. Syst. 14, 231–249 (1997)
14. Pfitscher, M., Welfer, D., Nascimento, E.J., Cuadros, M.A.S.L., Gamarra, D.F.T.: Users activity gesture recognition on Kinect sensor using convolutional neural networks and FastDTW for controlling movements of a mobile robot. Intel. Artif. 22, 121–134 (2019)
15. Farias, H.G., Pereira, R.P.A., Rezende, C.Z., Almeida, G.M., De Souza, M.A.S.L.C., Gamarra, D.F.T.: Fuzzy trajectory controller for differential drive robots. In: 12th IEEE/IAS International Conference on Industry Applications, INDUSCON 2016, Curitiba, Brazil (2016)
16. Silva, R.M., Cuadros, M.A.S.L., Gamarra, D.F.T.: Comparison of a backstepping and a fuzzy controller for tracking a trajectory with a mobile robot. In: 18th International Conference on Intelligent Systems Design and Applications (ISDA), Vellore (2018)
The Use of Area Covered by Blood Vessels in Fundus Images to Detect Glaucoma J. Afolabi Oluwatobi1(B) , Gugulethu Mabuza-Hocquet2 , and Fulufhelo V. Nelwamondo2 1 University of Johannesburg, Johannesburg, South Africa
[email protected] 2 Council for Scientific and Industrial Research, Pretoria, South Africa
{gmabuza,fnelwamondo}@csir.co.za
Abstract. Several techniques have been employed to detect glaucoma from optic discs. Some involve the optic cup-to-disc ratio (CDR), while others use the neuro-retinal rim width of the optic disc. In this work, we use the area occupied by segmented blood vessels from fundus images to detect glaucoma. Blood vessel segmentation is done using an improved U-net Convolutional Neural Network (CNN). The area occupied by the blood vessels is then extracted and used to diagnose glaucoma. The technique is tested on the DR-HAGIS and HRF databases. We compare our result with a similar method, the ISNT-ratio, which involves the Inferior, Superior, Nasal and Temporal neuro-retinal rims and is expressed as the ratio of the sum of blood vessels in the Inferior and Superior to the sum of blood vessels in the Nasal and Temporal. Our results demonstrate a more reliable, stable and efficient method of detecting glaucoma from segmented blood vessels. They also show that segmented blood vessels from healthy fundus images cover more area than those from glaucomatous and diabetic fundus images. Keywords: Retinal fundus image · Glaucoma · Blood vessel segmentation · Image segmentation
1 Introduction Glaucoma is an eye disease marked by gradual and continuous loss of the optic disc as well as of the retinal ganglion cells [1, 2]. Second only to diabetic retinopathy, glaucoma is a leading cause of blindness; hence its early detection and treatment are of great importance. A common technique for the detection of glaucoma is the determination of the cup-to-disc ratio (CDR) from the fundus image, as seen in Fig. 1. An alternative to the CDR technique is the use of the neuro-retinal rim width of the optic disc, i.e. the measured length, or thickness, between the boundary of the optic cup and the optic disc [3–5]. The width measurement is taken for four different regions of the optic disc, obtained by dividing the optic disc into quadrants as shown
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 365–375, 2021. https://doi.org/10.1007/978-3-030-49342-4_35
366
J. A. Oluwatobi et al.
in Fig. 3. The quadrants are: (i) the lower quadrant region, known as the Inferior; (ii) the upper quadrant region, known as the Superior; (iii) the left quadrant region, known as the Nasal; and (iv) the right quadrant region, known as the Temporal [6–12].
Fig. 1. Determination of cup-to-disc ratio. The inner circle is the segmented cup and the outer circle is the segmented disc [13]
A few methods have been used to detect glaucoma from segmented blood vessels. One such method is the ISNT-ratio, the ratio of the sum of blood vessels in the Superior and Inferior quadrants to the sum of blood vessels in the Nasal and Temporal quadrants, i.e.:

(I + S)/(N + T)    (1)

where I is the area of the inferior blood vessel quadrant, S the area of the superior quadrant, N the area of the nasal quadrant, and T the area of the temporal quadrant. However, this method is unstable: the optimum ISNT-ratio value to use as a discriminant between glaucomatous and non-glaucomatous fundus images varies across databases and across the authors reviewed. Hence, the method is highly subjective. In this work, we propose a method that detects glaucoma from fundus images based on the area covered by segmented blood vessels. The segmentation is done using an improved U-net Convolutional Neural Network (CNN). A CNN [14] is used because of its adaptable and less context-specific nature, i.e., it generalizes well to unseen fundus images [15–17]. The modified U-net architecture used in this work has far fewer parameters than the traditional U-net [18]. This work uses the HRF [19, 20] and DR-HAGIS [21] databases, which consist of labelled fundus images together with their corresponding segmented blood vessels. Both the proposed method and the ISNT-ratio method are evaluated on these databases, and the ability of our proposed method to separate glaucomatous from non-glaucomatous fundus images is measured against that of the ISNT-ratio technique. The contributions of this work include a glaucoma detection method based on the area covered by blood vessels in fundus images, together with an analysis of the results of both the proposed and the ISNT-ratio methods. The rest of this paper is organized as follows: Sect. 2 discusses related work, Sect. 3 the proposed experimental approach; Sect. 4 presents the results of the experiment and Sect. 5 discusses the limitations of the study. Section 6 presents the conclusion and the last section outlines future work.
The Use of Area Covered by Blood Vessels in Fundus Images
367
2 Related Work The detection of glaucoma from fundus images involves two major processes: segmentation of the blood vessels from the fundus images and extraction of the desired features from the segmented vessels. This section discusses the various segmentation methods that have been used and the subsequent use of segmented blood vessels for glaucoma detection. Segmentation is usually carried out with CNNs because of their robust and efficient results. A pre-trained CNN model was used by Sunil et al. [22, 23] for blood vessel segmentation. The model was pre-trained on the Microsoft COCO dataset [24] and trained using patches from the fundus images; its outputs are therefore segmented image patches, which were recombined to form the desired segmented blood vessels. Soomro et al. [25] used a method involving both pre-processing of the fundus images and post-processing of the output of a CNN. The pre-processing includes ensuring uniform illumination of the fundus, conversion of the images to grey scale, and rescaling; the post-processing applies a double threshold to the output. Oliveira et al. [17] combined stationary wavelet transformations with a fully convolutional network; the transformations were applied to the fundus images before they were fed into the CNN in patches. These methods involve a great deal of pre- and post-processing, which makes them cumbersome. After a proper segmentation, some analysis must be carried out to detect which ocular disease is present. One such analysis is based on a variant of the ISNT rule: for segmented fundus images, the ratio of the number of blood vessels in the neuro-retinal rims is often used as a feature to detect a glaucomatous eye. Shyam et al. [26] proposed a glaucoma detection technique that uses blood vessels segmented from the fundus image, analysed with the ISNT-ratio computed as in (1). The analysis was performed on 10 glaucomatous and 10 non-glaucomatous fundus images, and the ISNT-ratio was calculated for every segmented image. Their work reports that the ISNT-ratio of non-glaucomatous fundus images is higher than that of glaucomatous images, because more blood vessels are lost in a glaucomatous fundus as a result of increased pressure in the eye. They concluded that the ISNT-ratio is within the range 2.166 ± 0.19 for a non-glaucomatous fundus image and 1.755 ± 0.08 for a glaucomatous one; this difference in range allows healthy eyes to be separated from glaucomatous eyes. Jeyashree et al. [6] also detected glaucoma from segmented blood vessels using the ISNT-ratio as in (1), employing a 360 × 360 mask image to evaluate the area occupied by blood vessels in each ISNT quadrant. They report that the average number of blood vessel pixels is 29254.3 ± 10775.5 in a non-glaucomatous fundus image and 35746 ± 11443.2 in a glaucomatous one, and that the average ISNT-ratios for non-glaucomatous and glaucomatous images are 1.024 ± 0.02 and 1.037 ± 0.021 respectively. Nevertheless, Jeyashree et al. concluded that the number of blood vessels is higher in non-glaucomatous fundus images than in glaucomatous ones. Deepika et al. [27] also proposed a glaucoma detection
technique based on blood vessel segmentation. However, instead of the ISNT-ratio they classified the segmented vessels as glaucomatous or non-glaucomatous using statistical features extracted from the vessels: entropy, mean, and standard deviation. Deepika et al. reported average entropy, mean, and standard deviation of 7.0591, 87.279, and 77.049 for healthy fundus images, against 5.6832, 54.121, and 58.086 for glaucomatous ones, and proposed that all three features are higher in non-glaucomatous fundus images. It should be noted that the ISNT-ratio ranges quoted by Jeyashree et al. differ greatly from those quoted by Shyam et al. [26] for both non-glaucomatous and glaucomatous fundus images. This suggests that the ISNT-ratio depends on the source of the fundus image and on the image processing technique used. A glaucoma detection method that is independent of the source database and of the image processing technique adopted is therefore needed.
3 Proposed Experimental Approach Blood vessel segmentation from the fundus images is done using a CNN architecture based on the U-net. Compared with the original U-net, the proposed architecture, shown in Fig. 2, has more convolutional layers. Although it has more layers, the filter size (3 × 3) is the same in all layers except the output layer, which has a filter size of 1 × 1, and the architecture has far fewer parameters than the original U-net. Our experiments revealed that networks with many parameters over-fit quickly on the training data and therefore generalize poorly on segmentation tasks. The U-net architecture was set up as described in Afolabi et al. [28]. The fundus images are scaled down to 512 × 512 pixels to reduce computation time and cost; the re-scaling has no negative impact on the training process and in fact increases the training speed. The contrast of the fundus images is then enhanced using histograms calculated over several tile regions of the image, with scikit-image's equalize_adapthist used for this process. Improving the contrast gives the fundus images uniform contrast and hence better model training. The contrast-enhanced fundus images are fed into the CNN model. The segmented, re-sized fundus images are then processed to obtain the area occupied by the blood vessels: the images are binarized and the active pixels extracted. The extracted area is normalized to remove the effect of varying image size across databases. To obtain the ISNT-ratio of the fundus images, we mask the segmented, re-sized images along their sectors to obtain the Inferior (I), Superior (S), Nasal (N) and Temporal (T) quadrants, as shown in Fig. 3; this is done using the ogrid facility of the numpy package. The number of blood vessel pixels in each quadrant is then obtained and used to compute the ISNT-ratio, i.e. the ratio of the total area covered by the Inferior and Superior quadrants to the total area covered by the Nasal and Temporal quadrants.
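The area and ISNT-ratio computations described above can be sketched as follows. This is a minimal sketch: the use of the image centre as a stand-in for the optic-disc centre and the sector orientation are assumptions, since the paper only states that numpy's ogrid is used for the quadrant masking.

```python
import numpy as np

def vessel_area_fraction(mask):
    """Normalized area: fraction of active (vessel) pixels in a binary
    segmentation mask, which removes the effect of image size."""
    mask = np.asarray(mask, dtype=bool)
    return mask.sum() / mask.size

def isnt_ratio(mask, cy=None, cx=None):
    """(I + S) / (N + T) computed from four 90-degree sectors around
    (cy, cx). The image centre is used here as a stand-in for the
    optic-disc centre; the sector orientation (right eye, y increasing
    downwards) is an illustrative convention, not taken from the paper."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    cy = h / 2.0 if cy is None else cy
    cx = w / 2.0 if cx is None else cx
    yy, xx = np.ogrid[:h, :w]              # open grid, as in the paper
    ang = np.arctan2(yy - cy, xx - cx)     # angle of each pixel, in (-pi, pi]
    inferior = mask & (ang >= np.pi / 4) & (ang < 3 * np.pi / 4)
    superior = mask & (ang >= -3 * np.pi / 4) & (ang < -np.pi / 4)
    temporal = mask & (ang >= -np.pi / 4) & (ang < np.pi / 4)
    nasal = mask & ((ang >= 3 * np.pi / 4) | (ang < -3 * np.pi / 4))
    return (inferior.sum() + superior.sum()) / (nasal.sum() + temporal.sum())
```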
Fig. 2. Proposed model architecture for extraction of blood vessels from fundus images
Fig. 3. Optic disc divided into the ISNT quadrants. For the right and left eye respectively [29]
The proposed approach is further described by the following algorithm.
Step 1: Re-size the images to 512 × 512 pixels.
Step 2: Apply histogram equalization to the images.
Step 3: Train the modified U-net CNN with the re-sized fundus images.
The output of Step 3 is the set of blood vessels segmented from the fundus images. Figure 4 explains the further processing applied to the segmented images. Figure 5 shows a fundus image and a segmented fundus image, from which the total blood vessel area is acquired. Figure 6 shows the I, S, N and T quadrants from which the ISNT-ratio values are obtained.
Fig. 4. Proposed flow chart for total area and ISNT ratio acquisition
Fig. 5. Fundus image (left), segmented fundus image (right)
4 Experimental Results The experiment is run on fundus images from the HRF and DR-HAGIS databases. The HRF database has 45 images of size 3504 × 2336 and the DR-HAGIS database has 40 images of varying size, from 4752 × 3168 down to 2896 × 1944. The DR-HAGIS database contains images exhibiting glaucoma, hypertension, diabetic retinopathy and age-related macular diseases. The experiment was carried out on Kaggle, using 2 CPU cores and 14 GB of RAM. We compare our proposed method's results for the glaucomatous with those for the non-glaucomatous segmented fundus images, in order to see the differences in the area occupied in each case.
Fig. 6. Fundus images for the extraction of ISNT-ratio values. (a) I quadrant (b) S quadrant (c) N quadrant (d) T quadrant
These differences become the basis for separating non-glaucomatous from glaucomatous segmented fundus images. The same process was repeated for the ISNT-ratio method. Some results are shown in Table 1 and Table 2 for the HRF and DR-HAGIS databases respectively.

Table 1. Area covered by segmented blood vessels and ISNT-ratio values for the HRF database

Area covered by non-glaucomatous fundus (/pixels) | Area covered by glaucomatous fundus (/pixels) | ISNT-ratio for non-glaucomatous fundus | ISNT-ratio for glaucomatous fundus
0.081 | 0.063 | 1.134 | 1.315
0.093 | 0.069 | 1.263 | 1.069
0.091 | 0.072 | 1.165 | 1.247
0.087 | 0.062 | 1.074 | 1.135
0.082 | 0.066 | 1.169 | 1.219
Table 2. Area covered by fundus images and ISNT-ratio values for the DR-HAGIS database

Area covered by glaucomatous fundus (/pixels) | ISNT-ratio for glaucomatous fundus
0.0523 | 1.2654
0.0567 | 1.760
0.069  | 1.629
0.053  | 1.545
0.048  | 1.917
Table 1 shows the ISNT-ratio and the area covered by blood vessels for some of the 45 segmented fundus images of the HRF database; the displayed samples were picked at random from the whole set. It can be seen that the area (per pixel) occupied by the non-glaucomatous images is higher than that of the glaucomatous images. This is
because non-glaucomatous images have healthier blood vessels, while glaucomatous images are marked by a gradual loss of blood vessels. As Table 1 shows, the ISNT-ratio values for glaucomatous and non-glaucomatous fundus images are intertwined, which makes differentiating between them impossible in most cases. Table 2 shows the ISNT-ratio and the area covered by blood vessels for some of the 40 segmented fundus images of the DR-HAGIS database. The DR-HAGIS database has no non-glaucomatous fundus images; it was selected in order to compare the area covered by glaucomatous fundus images, and by some other ocular diseases, with that of the non-glaucomatous fundus images of another database, verifying that the values are consistent, or at least close, across different databases. The table shows that the area covered by the glaucomatous fundus images is comparable with that of the HRF database and lies below the range of the non-glaucomatous images. Figure 7 shows the scatter plot of the area covered by blood vessels in each fundus image of the HRF database. The fundus images with a higher area per pixel are the non-glaucomatous ones, which helps to set a boundary that guarantees that a fundus image is non-glaucomatous; the boundary is drawn in Fig. 9.
Fig. 7. Area covered by segmented blood vessels in each fundus image of the HRF database.
Figure 8 shows the scatter plot of the ISNT-ratio for each fundus image in the HRF database. There is no clear boundary between the glaucomatous and non-glaucomatous fundus images, which makes the ISNT-ratio inefficient as a means of differentiating between them. Figure 9 shows the scatter plot of the area covered by blood vessels in each fundus image of the DR-HAGIS database overlaid on those of the HRF database, giving a clear view of the method's performance across two different databases. A demarcation between glaucomatous and non-glaucomatous fundus images can be seen, and an area of 0.08 per pixel is a good boundary: a fundus image whose blood vessels occupy an area of 0.08 per pixel or above is non-glaucomatous.
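The boundary read off Fig. 9 translates into a one-line screening rule. The sketch below is illustrative (the function name and the "suspect" label are our own); note that the rule only guarantees the non-glaucomatous side, since glaucoma cannot be confirmed from this single feature.

```python
def screen_fundus(area_fraction, boundary=0.08):
    """Apply the area boundary from the experiments: at or above the
    boundary the fundus is non-glaucomatous; below it the image is only
    flagged as suspect and needs further examination."""
    return "non-glaucomatous" if area_fraction >= boundary else "suspect"
```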
Fig. 8. ISNT-ratio of each fundus image in the HRF database
Fig. 9. Area covered by segmented blood vessels in fundus images from the HRF database overlaid by glaucomatous images from the DR-HAGIS database
5 Limitations of the Study The study is affected by some anomalies in the databases used, since fundus images marked non-glaucomatous could also be affected by other ocular diseases not captured by the database. In addition, the accuracy of the proposed method depends on the quality of the segmentation applied to the fundus images.
6 Conclusion The proposed method performed better on the two databases than the ISNT-ratio method. It is especially useful for detecting non-glaucomatous fundus images, since these have a higher area value. However, for optimum results the proposed method should not be used alone for the detection of glaucoma in fundus images; it should be used in conjunction with other glaucoma detection techniques, such as the CDR.
7 Future Work The same procedure will be carried out using more databases, which will give a more general value to use as the separation boundary. The performance of this method will also be evaluated against other methods of glaucoma detection.
References 1. Qiu, K., Wang, G., Lu, X., Zhang, R., Sun, L., Zhang, M.: Application of the ISNT rules on retinal nerve fibre layer thickness and neuroretinal rim area in healthy myopic eyes. Acta Ophthalmol. (2018). https://doi.org/10.1111/aos.13586 2. Moon, J., Park, K.H., Kim, D.M., Kim, S.H.: Factors affecting ISNT rule satisfaction in normal and glaucomatous eyes. Korean J. Ophthalmol. KJO (2018). https://doi.org/10.3341/ kjo.2017.0031 3. Ahmad, H., Yamin, A., Shakeel, A., Gillani, S.O., Ansari, U.: Detection of glaucoma using retinal fundus images. In: IEEE International Conference on Robotics and Emerging Allied Technologies in Engineering 2014, pp. 321–324 (2014) 4. Bhartiya, S., Gadia, R., Sethi, H.S., Panda, A.: Clinical evaluation of optic nerve head in glaucoma. Curr. J. Glaucoma Pract. DVD 4, 115–132 (2010). https://doi.org/10.5005/jp-jou rnals-10008-1080 5. Das, P., Nirmala, S.R., Medhi, J.P.: Detection of glaucoma using neuroretinal rim information. In: International Conference on Accessibility to Digital World, ICADW 2016, pp. 181–186 (2016) 6. Jeyashree, D., Ramasamy, K.: Combined approach on analysis of retinal blood vessel segmentation for diabetic retinopathy and glaucoma diagnosis, vol. 5 (2014) 7. GR Foundation: Five common glaucoma tests. https://www.glaucoma.org/glaucoma/diagno stic-tests.php.9. Accessed 13 Sept 2019 8. Moon, J., Park, K.H., Kim, D.M., Kim, S.H.: Factors affecting ISNT rule satisfaction in normal and glaucomatous eyes. Korean J. Ophthalmol.: KJO 32(1), 38–44 (2018) 9. Shyam L., Kumar G.S.: Blood vessel segmentation in fundus images and detection of glaucoma. In: International Conference on Communication Systems and Networks, ComNet 2016, pp. 34–38 (2016) 10. Kang, D., Sowka, J.: The ISNT rule is a clinically useful method to aid in the diagnosis of glaucoma. Optom. - J. Am. Optom. Assoc. 82(3), 134 (2011) 11. 
Poon, L.Y., Solá-Del Valle, D., Turalba, A.V., Falkenstein, I.A., Horsley, M., Kim, J.H., Song, B.J., Takusagawa, H.L., Wang, K., Chen, T.C.: The ISNT rule: how often does it apply to disc photographs and retinal nerve fiber layer measurements in the normal population? Am. J. Ophthalmol. 184, 19–27 (2017) 12. Nawaldgi, F., Lalitha, Y.S.: A novel combined color channel and ISNT rule based automatic glaucoma detection from color fundus images. Indian J. Sci. Technol. 10(13), 1–6 (2017) 13. Lim, G., Cheng, Y., Hsu, W., Lee, M.L.: Integrated optic disc and cup segmentation with deep learning, pp. 162–169. IEEE, November 2015. https://doi.org/10.1109/ictai.2015.36. https:// ieeexplore.ieee.org/document/7372132. ISBN 1082–3409 14. Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. TPAMI 39(4), 640–651 (2017) 15. Vengalil, S.K., Sinha, N., Srinivas, S.S.K., Venkatesh Babu, R.: Customizing CNNs for blood vessel segmentation from fundus images. In: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings 2016, p. 1 (2016)
The Use of Area Covered by Blood Vessels in Fundus Images
375
16. Soomro, T.A., Afifi, A.J., Junbin, G., Hellwich, O., Khan, M.A., Paul, M., Zheng, L.: Boosting sensitivity of a retinal vessel segmentation algorithm with convolutional neural network. In: International Conference on Digital Image Computing: Techniques and Applications 2017, DICTA, pp. 1–8 (2017) 17. Oliveira, A., Pereira, S., Silva, C.A.: Retinal vessel segmentation based on fully convolutional neural networks. Expert Syst. Appl. 112, 229–242 (2018). https://doi.org/10.1016/j.eswa. 2018.06.034 18. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation (2015). http://arxiv.org/abs/1505.04597 19. Budai, A., Bock, R., Maier, A., Hornegger, J., Michelson, G.: Robust vessel segmentation in fundus images, Int. J. Biomed. Imaging 2013, 154860-11 (2013). http://dx.doi.org/10.1155/ 2013/154860 20. Odstrcilik, J., Kolar, R., Budai, A., Hornegger, J., Jan, J., Gazarek, J., Kubena, T., Cernosek, P., Svoboda, O., Angelopoulou, E.: Retinal vessel segmentation by improved matched filtering: evaluation on a new high-resolution fundus image database. IET Image Process. 7(4), 373–383 (2013). http://digital-library.theiet.org/content/journals/10.1049/iet-ipr.2012.0455 21. Holm, S., Russell, G., Nourrit, V., McLoughlin, N.: DR HAGIS-a fundus image database for the automatic extraction of retinal surface vessels from diabetic patients. J. Med. Imaging 4(1), 014503 (2017). http://doi.org/10.1117/1.JMI.4.1.014503 22. Vengalil, S.K., Sinha, N., Kruthiventi, S.S.S., Venkatesh Babu, R.: Customizing CNNs for blood vessel segmentation from fundus images, p. 1, 1 January 2016 23. Chen, L., et al.: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs (2014). https://arxiv.org/abs/1412.7062 24. Lin, T., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., Dollár, P.: Microsoft COCO: Common Objects in Context (2014). https:// arxiv.org/abs/1405.0312 25. 
Soomro, T.A., Afifi, A.J., Gao, J., Hellwich, O., Khan, M.A.U., Paul, M., Zheng, L.: Boosting sensitivity of a retinal vessel segmentation algorithm with convolutional neural network, pp. 1–8, November 2017 26. Shyam, L., Kumar, G.S.: Blood vessel segmentation in fundus images and detection of glaucoma, pp. 34–38, July 2016. https://ieeexplore.ieee.org/document/7823982 27. Deepika, E., Maheswari, S.: Earlier glaucoma detection using blood vessel segmentation and classification, pp. 484–490, January 2018. https://ieeexplore.ieee.org/document/8399120 28. Joshua, A.O., Nelwamondo, F.V., Mabuza-Hocquet, G.: Segmentation of optic cup and disc for diagnosis of glaucoma on retinal fundus images, pp. 183–187, January 2019 29. Ruengkitpinyo, W., Kongprawechnon, W., Kondo, T., Bunnun, P., Kaneko, H.: Glaucoma screening using rim width based on ISNT rule, pp. 1–5, 22 March 2015. https://ieeexplore. ieee.org/document/7110827
Complexity of Rule Sets Induced from Data with Many Lost Values and “Do Not Care” Conditions Patrick G. Clark1 , Jerzy W. Grzymala-Busse1,2(B) , Zdzislaw S. Hippe2 , Teresa Mroczek2 , and Rafal Niemiec2 1
Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA [email protected], [email protected] 2 Department of Artificial Intelligence, University of Information Technology and Management, 35-225 Rzeszow, Poland {zhippe,tmroczek,rniemiec}@wsiz.rzeszow.pl
Abstract. Missing or incomplete data sets are a common problem in data mining. When dealing with structured data of this type, the interpretation given to the missing attribute values affects both the accuracy and the complexity of the induced rule sets. In this paper, lost values and "do not care" conditions are studied as representations of missing values. The study further employs global and saturated approximations, two new types of probabilistic approximations. Combining these approaches yields four primary data mining experiments: rule induction with two types of approximations and two interpretations of missing attribute values. The main objective of this work is to compare the complexity of the rule sets induced by the four approaches and to find the one yielding the lowest complexity. This complements previous research, where experimental evidence showed that none of the four approaches induces rules with the lowest error in all scenarios; the best choice depends on the data set being mined. The complexity experiments in this paper show that the "do not care" interpretation of missing attribute values yields simpler rule sets than the lost-value interpretation. Furthermore, there are no statistically significant differences in complexity between using global and saturated probabilistic approximations.

Keywords: Data mining · Rough set theory · Characteristic sets · Probabilistic approximations

1 Introduction
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 376–385, 2021. https://doi.org/10.1007/978-3-030-49342-4_36

Rough set theory is one of the basic approaches used in data mining; the data sets mined may be complete or incomplete. Complete data sets have all attribute values specified and known, while incomplete data sets are missing some
attribute values. This paper studies data sets with two interpretations of missing attribute values: "do not care" conditions and lost values. A "do not care" condition means that the original value is not important, so it may be replaced by any value from the domain of the attribute. A lost value is a value that was erased or never inserted; for data mining, only existing attribute values may be used. In this work, missing attribute values are not handled by substitution. The experiments in this paper instead use rule induction algorithms that take these missing values into account. To accomplish this, the basic elements of the data mining tools used are probabilistic approximations. These approximations are generalizations of the standard approximation, originally introduced for complete data sets in [11] and a fundamental idea of rough set theory. The standard approximation was extended to probabilistic approximations and studied by many research teams [9–17]. For incomplete data sets, probabilistic approximations were introduced in [5]. Four approaches to data mining are combined in the experiments for this paper. Two of the approaches are new types of probabilistic approximations used during rule induction: global and saturated [2]. The other two approaches are interpretations of missing attribute values: lost values and "do not care" conditions. Previous work has studied this combination of approaches with the goal of determining the best approach as measured by the lowest resulting error rate [2]. However, ten-fold cross validation showed that no single approach is best for all situations; the outcome depends on the data set being mined. As a result, if no approach is optimal in terms of error rate, the better approach is the one that produces the least complex rule sets. This problem is discussed in detail in [3], which reports results of experiments on data sets with 35% missing attribute values.
In that study, using global and saturated probabilistic approximations at a 5% significance level, no statistically significant difference was found between the two approximations. However, for the majority of data sets, "do not care" conditions result in less complex rule sets than lost values. Since the results in [3] are not completely conclusive, this paper continues the study on data sets with many more missing attribute values. The main objective of this paper is to determine which approach provides the lowest complexity of rule sets, where rule set complexity is measured by the number of rules and the total number of rule conditions, when inducing rules from data sets with many missing attribute values. The general results of the experiments show that, for all data sets, the number of rules and conditions is always smaller for "do not care" conditions than for lost values. On the other hand, the results for global and saturated probabilistic approximations vary, but the differences are not statistically significant.
2 Incomplete Data
Table 1 is a small example of an incomplete data set. It is presented as a decision table where each row of the decision table represents a case. The set of all cases
is denoted by U. In Table 1, U = {1, 2, 3, 4, 5, 6, 7, 8}. Independent variables are called attributes; the set of all attributes is denoted by A. A dependent variable is called a decision and is denoted by d. In Table 1, A = {Temperature, Wind, Humidity}. In a decision table, the value for a case x and an attribute a is denoted by a(x). For example, Temperature(1) = medium.

Table 1. A decision table

Case | Temperature | Wind   | Humidity | Trip
1    | medium      | low    | *        | yes
2    | high        | ?      | *        | yes
3    | *           | medium | ?        | yes
4    | ?           | *      | medium   | yes
5    | high        | high   | high     | no
6    | ?           | medium | *        | no
7    | medium      | *      | high     | no
8    | high        | *      | ?        | no
A concept is the set X of all cases defined by the same value of the decision d. For example, the concept associated with the value yes of the decision Trip is the set {1, 2, 3, 4}. For complete decision tables, a block of the attribute-value pair (a, v), denoted by [(a, v)], is the set {x ∈ U | a(x) = v} [4]. For incomplete decision tables, the definition of a block of an attribute-value pair is modified as follows:
– for an attribute a and a case x, if a(x) = ?, then the case x is lost and should not be included in any blocks [(a, v)] for all values v of attribute a;
– for an attribute a and a case x, if a(x) = ∗, then the case x is a "do not care" condition and should be included in blocks [(a, v)] for all specified values v of attribute a.

For the data set from Table 1, the blocks of attribute-value pairs are:
[(Temperature, medium)] = {1, 3, 7},
[(Temperature, high)] = {2, 3, 5, 8},
[(Wind, low)] = {1, 4, 7, 8},
[(Wind, medium)] = {3, 4, 6, 7, 8},
[(Wind, high)] = {4, 5, 7, 8},
[(Humidity, medium)] = {1, 2, 4, 6}, and
[(Humidity, high)] = {1, 2, 5, 6, 7}.

The characteristic set KB(x) is defined as the intersection of the sets K(x, a) for all a ∈ B, where x ∈ U, B ⊆ A, and the set K(x, a) is defined as follows:
– if a(x) is specified, then K(x, a) is the block [(a, a(x))] of attribute a and its value a(x);
– if a(x) = ? or a(x) = ∗, then K(x, a) = U.

For Table 1 and B = A,
KA(1) = {1, 7},             KA(5) = {5},
KA(2) = {2, 3, 5, 8},       KA(6) = {3, 4, 6, 7, 8},
KA(3) = {3, 4, 6, 7, 8},    KA(7) = {1, 7}, and
KA(4) = {1, 2, 4, 6},       KA(8) = {2, 3, 5, 8}.
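The block and characteristic-set computations above can be sketched in a few lines of Python. The decision table below is transcribed from Table 1; the function names are illustrative, not from the paper:

```python
# Sketch: attribute-value blocks and characteristic sets for Table 1.
# "?" marks a lost value, "*" a "do not care" condition.
U = [1, 2, 3, 4, 5, 6, 7, 8]
table = {
    1: {"Temperature": "medium", "Wind": "low",    "Humidity": "*"},
    2: {"Temperature": "high",   "Wind": "?",      "Humidity": "*"},
    3: {"Temperature": "*",      "Wind": "medium", "Humidity": "?"},
    4: {"Temperature": "?",      "Wind": "*",      "Humidity": "medium"},
    5: {"Temperature": "high",   "Wind": "high",   "Humidity": "high"},
    6: {"Temperature": "?",      "Wind": "medium", "Humidity": "*"},
    7: {"Temperature": "medium", "Wind": "*",      "Humidity": "high"},
    8: {"Temperature": "high",   "Wind": "*",      "Humidity": "?"},
}

def block(attr, value):
    """[(a, v)]: cases with a(x) = v plus all '*' cases; '?' cases excluded."""
    return {x for x in U if table[x][attr] in (value, "*")}

def characteristic_set(x, attrs=("Temperature", "Wind", "Humidity")):
    """K_A(x): intersection of K(x, a) over all attributes a in A."""
    k = set(U)
    for a in attrs:
        v = table[x][a]
        if v not in ("?", "*"):       # '?' and '*' contribute K(x, a) = U
            k &= block(a, v)
    return k
```

Running `characteristic_set` for each case reproduces the KA(x) sets listed above, e.g. `characteristic_set(5)` yields `{5}`.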
Fig. 1. Number of rules for the breast cancer data set with 44.81% of missing attribute values
Fig. 2. Number of rules for the echocardiogram data set with 40.15% of missing attribute values
Fig. 3. Number of rules for the hepatitis data set with 60.27% of missing attribute values
Fig. 4. Number of rules for the image segmentation data set with 64.86% of missing attribute values
Fig. 5. Number of rules for the lymphography data set with 64.90% of missing attribute values
Fig. 6. Number of rules for the wine recognition data set with 64.65% of missing attribute values
Fig. 7. Total number of conditions for the breast cancer data set with 44.81% of missing attribute values
Fig. 8. Total number of conditions for the echocardiogram data set with 40.15% of missing attribute values
3 Probabilistic Approximations

In this paper, we restrict our attention to two types of probabilistic approximations: global and saturated. Some necessary definitions are quoted from [2,3].

3.1 Global Probabilistic Approximations
A special case of the global probabilistic approximation, limited only to lower and upper approximations, is introduced in [7,8] and then generalized in [1]. A B-global probabilistic approximation of the concept X, with the parameter α and denoted by appr^global_{α,B}(X), is defined as follows:

∪ {KB(x) | ∃ Y ⊆ U ∀ x ∈ Y, Pr(X|KB(x)) ≥ α}.
Fig. 9. Total number of conditions for the hepatitis data set with 60.27% of missing attribute values
Fig. 10. Total number of conditions for the image segmentation data set with 64.86% of missing attribute values
Fig. 11. Total number of conditions for the lymphography data set with 64.90% of missing attribute values
Fig. 12. Total number of conditions for the wine recognition data set with 64.65% of missing attribute values
For given sets B and X and the parameter α, there exist many B-global probabilistic approximations of X. Additionally, an algorithm for computing B-global probabilistic approximations is of exponential computational complexity. As a result, a heuristic version of the definition is used, called the MLEM2 B-global probabilistic approximation of the concept X, associated with a parameter α and denoted by appr^mlem2_{α,B}(X) [1]. This definition is based on the rule induction algorithm MLEM2. The approximation appr^mlem2_{α,B}(X) is a union of the characteristic sets KB(y) most relevant to the concept X, i.e., with |X ∩ KB(y)| as large as possible and Pr(X|KB(y)) ≥ α, where y ∈ U. If more than one characteristic set KB(y) satisfies both conditions, the characteristic set KB(y) with the largest Pr(X|KB(y)) is selected. If this criterion ends with a tie, a characteristic set is picked heuristically, as the first on the list [1].
Special MLEM2 B-global probabilistic approximations, with B = A, are called global probabilistic approximations associated with the parameter α and are denoted by appr_α^mlem2(X). Similarly, for B = A, the characteristic set KB(x) is denoted by K(x). Let Eα(X) be the set of all eligible characteristic sets, defined as follows:

{K(x) | x ∈ U, Pr(X|K(x)) ≥ α}.

A heuristic version of the MLEM2 global probabilistic approximation is computed using the following algorithm.

MLEM2 global probabilistic approximation algorithm
input: a set X (a concept), a set Eα(X)
output: a set T (appr_α^mlem2(X))
begin
  G := X;
  T := ∅;
  Y := Eα(X);
  while G ≠ ∅ and Y ≠ ∅
  begin
    select a characteristic set K(x) ∈ Y such that |K(x) ∩ X| is maximum;
    if a tie occurs, select K(x) ∈ Y with the smallest cardinality;
    if another tie occurs, select the first K(x);
    T := T ∪ K(x);
    G := G − T;
    Y := Y − K(x)
  end
end

For Table 1, all distinct MLEM2 global probabilistic approximations for [(Trip, no)] are

appr_1^mlem2({5, 6, 7, 8}) = {5},
appr_0.6^mlem2({5, 6, 7, 8}) = {3, 4, 5, 6, 7, 8},
appr_0.5^mlem2({5, 6, 7, 8}) = {2, 3, 4, 5, 6, 7, 8}.
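The algorithm above can be sketched in Python, hardwired to the Table 1 characteristic sets; `Fraction` keeps the Pr(X|K(x)) ≥ α test exact, and the function name is illustrative:

```python
from fractions import Fraction

# Characteristic sets K(x) for Table 1 (B = A), as derived in the text.
K = {1: {1, 7}, 2: {2, 3, 5, 8}, 3: {3, 4, 6, 7, 8}, 4: {1, 2, 4, 6},
     5: {5}, 6: {3, 4, 6, 7, 8}, 7: {1, 7}, 8: {2, 3, 5, 8}}

def mlem2_global_approx(X, alpha):
    """Heuristic MLEM2 global probabilistic approximation of concept X."""
    # E_alpha(X): eligible characteristic sets with Pr(X|K(x)) >= alpha,
    # kept as a duplicate-free list.
    Y = []
    for x in sorted(K):
        if Fraction(len(X & K[x]), len(K[x])) >= alpha and K[x] not in Y:
            Y.append(K[x])
    G, T = set(X), set()
    while G and Y:
        # Largest |K(x) ∩ X| first; ties broken by smallest cardinality,
        # then by position on the list (max returns the first maximum).
        best = max(Y, key=lambda k: (len(k & X), -len(k)))
        T |= best
        G -= T
        Y.remove(best)
    return T
```

For the concept [(Trip, no)] = {5, 6, 7, 8}, this sketch reproduces the three approximations listed above for α = 1, 0.6 and 0.5.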
3.2 Saturated Probabilistic Approximations

Saturated probabilistic approximations are unions of characteristic sets, giving higher priority to characteristic sets with a larger conditional probability Pr(X|K(x)). Additionally, once the approximation covers all cases from the concept X, the algorithm stops adding characteristic sets.
Let X be a concept and let x ∈ U. Let us compute all conditional probabilities Pr(X|K(x)) and sort the set {Pr(X|K(x)) | x ∈ U} in descending order. Let us denote the sorted list of such conditional probabilities by α1, α2, ..., αn. For any i = 1, 2, ..., n, the set Ei(x) is defined as follows:

{K(x) | x ∈ U, Pr(X|K(x)) = αi}.

To compute a saturated probabilistic approximation, denoted by appr_α^saturated(X), for some α, 0 < α ≤ 1, it is necessary to identify the index m such that αm ≥ α > αm+1, where m ∈ {1, 2, ..., n} and αn+1 = 0. The saturated probabilistic approximation appr_{αm}^saturated(X) is computed using the following algorithm.

Saturated probabilistic approximation algorithm
input: a set X (a concept), a set Ei(x) for i = 1, 2, ..., n and x ∈ U, index m
output: a set T (appr_{αm}^saturated(X))
begin
  T := ∅;
  Yi(x) := Ei(x) for all i = 1, 2, ..., m and x ∈ U;
  for j = 1, 2, ..., m do
    while Yj(x) ≠ ∅
    begin
      select a characteristic set K(x) ∈ Yj(x) such that |K(x) ∩ X| is maximum;
      if a tie occurs, select the first K(x);
      Yj(x) := Yj(x) − K(x);
      if (K(x) − T) ∩ X ≠ ∅ then T := T ∪ K(x);
      if X ⊆ T then exit
    end
end

For Table 1, all distinct saturated probabilistic approximations for [(Trip, no)] are

appr_1^saturated({5, 6, 7, 8}) = {5},
appr_0.6^saturated({5, 6, 7, 8}) = {3, 4, 5, 6, 7, 8}.

Note that appr_0.5^mlem2({5, 6, 7, 8}) covers the case 2 in spite of the fact that this case is not a member of the concept {5, 6, 7, 8}. The set {2, 3, 4, 5, 6, 7, 8}
is not listed among saturated probabilistic approximations of the concept {5, 6, 7, 8}.

3.3 Rule Induction
For mining the data to produce rule sets, the MLEM2 algorithm [6] is used with a parameter β, a value between 0 and 1 interpreted as a probability. This parameter controls the quality of the induced rules: if a rule covers a subset Y of U and indicates the concept X, the rule is produced by the rule induction system if Pr(X|Y) ≥ β.
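For comparison, the saturated probabilistic approximation algorithm of Sect. 3.2 can be sketched in the same way, again over the Table 1 characteristic sets (naming is illustrative):

```python
from fractions import Fraction

# Characteristic sets K(x) for Table 1 (B = A), as derived in the text.
K = {1: {1, 7}, 2: {2, 3, 5, 8}, 3: {3, 4, 6, 7, 8}, 4: {1, 2, 4, 6},
     5: {5}, 6: {3, 4, 6, 7, 8}, 7: {1, 7}, 8: {2, 3, 5, 8}}

def saturated_approx(X, alpha):
    """Saturated probabilistic approximation of concept X, parameter alpha."""
    prob = {x: Fraction(len(X & K[x]), len(K[x])) for x in K}
    # alpha_1 > alpha_2 > ...: distinct conditional probabilities >= alpha.
    levels = sorted({p for p in prob.values() if p >= alpha}, reverse=True)
    T = set()
    for a_i in levels:                 # higher Pr(X|K(x)) gets priority
        Y = [K[x] for x in sorted(K) if prob[x] == a_i]
        while Y:
            best = max(Y, key=lambda k: len(k & X))   # first on ties
            Y.remove(best)
            if (best - T) & X:         # add only sets covering new X cases
                T |= best
            if X <= T:                 # stop once the concept is covered
                return T
    return T
```

For X = {5, 6, 7, 8} and α = 0.5 this returns {3, 4, 5, 6, 7, 8}, illustrating why {2, 3, 4, 5, 6, 7, 8} is not among the saturated approximations.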
4 Experiments
Six data sets, available from the University of California at Irvine Machine Learning Repository, are used in the experiments. For every data set, a template is created by randomly replacing as many of the specified attribute values as possible with lost values. The maximum percentage of replaced attribute values is restricted by the requirement that no row of the data set should contain only lost values. The same templates are used for constructing data sets with "do not care" conditions, by replacing "?"s with "∗"s. The parameter α is equal to 0.5 in all experiments. Results of our experiments are presented in Figs. 1–12, where "Global" denotes a MLEM2 global probabilistic approximation, "Saturated" denotes a saturated probabilistic approximation, "?" denotes lost values and "∗" denotes "do not care" conditions. As previously described, four approaches to mining incomplete data sets are used in the experiments, combining two options of probabilistic approximations (global and saturated) with two interpretations of missing attribute values (lost values and "do not care" conditions).
5 Conclusions
Experiments are conducted using six data sets that are randomly modified to simulate missing attribute values. Two interpretations of missing attribute values are then used, producing a total of twelve data sets, and with these data sets two types of probabilistic approximation rule induction algorithms are applied. The resulting comparisons show that simpler rule sets are induced using "do not care" conditions than using lost values; however, the difference between the two probabilistic approximations is not statistically significant.
References

1. Clark, P.G., Gao, C., Grzymala-Busse, J.W., Mroczek, T., Niemiec, R.: A comparison of concept and global probabilistic approximations based on mining incomplete data. In: Proceedings of ICIST 2018, the International Conference on Information and Software Technologies, pp. 324–335 (2018)
2. Clark, P.G., Grzymala-Busse, J.W., Mroczek, T., Niemiec, R.: A comparison of global and saturated probabilistic approximations using characteristic sets in mining incomplete data. In: Proceedings of the Eighth International Conference on Intelligent Systems and Applications, pp. 10–15 (2019)
3. Clark, P.G., Grzymala-Busse, J.W., Mroczek, T., Niemiec, R.: Rule set complexity in mining incomplete data using global and saturated probabilistic approximations. In: Proceedings of the 25th International Conference on Information and Software Technologies, pp. 451–462 (2019)
4. Grzymala-Busse, J.W.: LERS—a system for learning from examples based on rough sets. In: Slowinski, R. (ed.) Intelligent Decision Support. Handbook of Applications and Advances of the Rough Set Theory, pp. 3–18. Kluwer Academic Publishers, Dordrecht (1992)
5. Grzymala-Busse, J.W.: Generalized parameterized approximations. In: Proceedings of the 6th International Conference on Rough Sets and Knowledge Technology, pp. 136–145 (2011)
6. Grzymala-Busse, J.W., Clark, P.G., Kuehnhausen, M.: Generalized probabilistic approximations of incomplete data. Int. J. Approximate Reason. 132, 180–196 (2014)
7. Grzymala-Busse, J.W., Rzasa, W.: Local and global approximations for incomplete data. In: Proceedings of the Fifth International Conference on Rough Sets and Current Trends in Computing, pp. 244–253 (2006)
8. Grzymala-Busse, J.W., Rzasa, W.: Local and global approximations for incomplete data. Trans. Rough Sets 8, 21–34 (2008)
9. Grzymala-Busse, J.W., Ziarko, W.: Data mining based on rough sets. In: Wang, J. (ed.) Data Mining: Opportunities and Challenges, pp. 142–173. Idea Group Publ., Hershey (2003)
10. Pawlak, Z., Skowron, A.: Rough sets: some extensions. Inf. Sci. 177, 28–40 (2007)
11. Pawlak, Z., Wong, S.K.M., Ziarko, W.: Rough sets: probabilistic versus deterministic approach. Int. J. Man-Mach. Stud. 29, 81–95 (1988)
12. Ślęzak, D., Ziarko, W.: The investigation of the Bayesian rough set model. Int. J. Approximate Reason. 40, 81–91 (2005)
13. Wong, S.K.M., Ziarko, W.: INFER—an adaptive decision support system based on the probabilistic approximate classification. In: Proceedings of the 6th International Workshop on Expert Systems and their Applications, pp. 713–726 (1986)
14. Yao, Y.Y.: Probabilistic rough set approximations. Int. J. Approximate Reason. 49, 255–271 (2008)
15. Yao, Y.Y., Wong, S.K.M.: A decision theoretic framework for approximate concepts. Int. J. Man-Mach. Stud. 37, 793–809 (1992)
16. Ziarko, W.: Variable precision rough set model. J. Comput. Syst. Sci. 46(1), 39–59 (1993)
17. Ziarko, W.: Probabilistic approach to rough sets. Int. J. Approximate Reason. 49, 272–284 (2008)
ReLU to Enhance MDLSTM for Offline Arabic Handwriting Recognition

Rania Maalej1(B) and Monji Kherallah2

1 National School of Engineers of Sfax, University of Sfax, Sfax, Tunisia
[email protected]
2 Faculty of Sciences, University of Sfax, Sfax, Tunisia
Abstract. Multi-dimensional Long Short-Term Memory networks (MDLSTMs) are a state-of-the-art technology that provides very good performance on different machine learning tasks, due to their ability to model any n-dimensional pattern using n recurrent connections with n forget gates. For this reason, we focus on handwritten Arabic word recognition, where a two-dimensional MDLSTM suffices for the 2D input images, and then try to improve the accuracy of this baseline recognition system. As in several deep neural networks, the vanishing gradient problem can affect the performance of this MDLSTM-based recognition system. To solve this problem, Rectified Linear Units (ReLUs) are added in different modes, to draw out the best MDLSTM topology for the offline Arabic handwriting recognition system. The proposed systems are evaluated on the large IFN/ENIT database. According to the experimental results, and compared to the baseline system, the best tested architecture gives a 5.57% reduction in the label error rate.

Keywords: LSTM · MDLSTM · ReLU · Dropout · Offline Arabic handwriting recognition
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 386–395, 2021. https://doi.org/10.1007/978-3-030-49342-4_37

1 Introduction

Deep learning has a sound reputation for solving a huge number of classification problems, such as handwriting recognition. This field is one of the active and hot research problems in the Optical Character Recognition (OCR) domain; it has various applications, such as personal identification, automatic cheque processing in banks, signature verification, form processing, writer identification, postal/zip code recognition, etc. In this work, we focus on offline Arabic handwriting recognition and, as accurate segmentation is one of the nightmares in Arabic cursive scripts, we adjust a deep network based on MDLSTM to disregard the explicit segmentation stage. In this baseline system, MDLSTM layers are stacked in order to extract meaningful features from the images, which allows an implicit segmentation from raw data. Then, the 2D data is transformed into a 1D sequence to obtain the character-level transcription of the input image by using the Connectionist Temporal Classification (CTC) method. As in many
recurrent neural networks where the BPTT algorithm is adopted for training, the vanishing gradient problem can occur, mainly when the tails of the sigmoidal function get saturated, because the derivative will always be near 0. To overcome this issue, we propose to integrate Rectified Linear Units (ReLUs) in different ways, to draw out the best MDLSTM architecture for the offline Arabic handwriting recognition system, which has previously been tested on the IFN/ENIT corpus [1]. This paper is organized as follows. Section 2 presents the MDLSTM network and its applications in offline Arabic handwriting recognition. Section 3 describes the ReLU and the way in which it is integrated into the MDLSTM baseline system. In Sect. 4, we report the experimental results. Finally, Sect. 5 concludes the paper and gives some perspectives for future research.
2 The MDLSTM Baseline System

The LSTM unit architecture [2] consists of a number of cyclically connected memory blocks, each of which contains a set of internal cells whose activation is regulated by three multiplicative 'gate' units that let the cells store and access information over long time periods. The LSTM model is therefore considered one of the most powerful sequence learners, on which derived architectures such as BLSTM and MDLSTM are based, as its multiplicative gates help store and access relevant information over long intervals. It has been shown that the BLSTM, which gives access to a wide range of bidirectional contexts, provides state-of-the-art results on online handwriting recognition [3, 4], while the MDLSTM, which can be extended to n dimensions using n recurrent connections with n forget gates, offers state-of-the-art results on offline handwriting recognition [5]. Moreover, the MDLSTM network can represent all the spatio-temporal dimensions of the input data, since multiple recurrent connections replace the single recurrent connection. In this work, the MDLSTM layers are designed for two-dimensional inputs, since the input data are 2-D images. The MDLSTM's main advantage is that it can handle dots, diacritics and strokes in the Arabic script by scanning the image data in both directions (horizontal and vertical). The baseline system architecture consists of three stacked MDLSTM levels that help extract useful image features. Each of these three levels, which are separated by two feedforward layers with the tanh activation function (see Fig. 1), involves four hidden layers for our two-dimensional data. The recurrent connections within the hidden layer plane represent the scanning tapes along which previous points are reviewed from each corner of the input image.
The nodes used in these hidden layers are LSTMs, in which the logistic sigmoid is the gate activation function, while the cell input and output functions are both tanh. Two LSTM blocks are used in the first level, ten in the second and fifty in the third. The sizes of the two feedforward layers that separate the hidden levels are 6 and 20, respectively. Next, a Connectionist Temporal Classification (CTC) [6] output layer is added to translate the 2-D data into a 1-D sequence in order to obtain the character-level transcription of the input image. This CTC-based output layer has 121 units, 120 of which represent the 120 labels. Their activation
is interpreted as the likelihood of observing the matching labels at specific moments, while the activation of the last unit is the probability of observing a 'blank', i.e., nothing. Moreover, this output layer can decode every possible label sequence by summing the probabilities of all its possible alignments. Online steepest descent is used for training, with a momentum of 0.9 and a learning rate of 1e−4. The label error rate is the error measure used for early stopping on the validation sets: convergence is achieved when the label error rate does not drop by more than a threshold for a given number of iterations.
Fig. 1. Architecture of the MDLSTM baseline recognition system [7]
As the MDLSTM network can deal with higher-dimensional data, it has recently won several handwriting recognition competitions [8]. Indeed, several research studies have been conducted to improve the MDLSTM performance for a better offline Arabic recognition system, for instance by applying the dropout technique during training [9]. This regularization method prevents the network from overfitting, improves its performance and significantly reduces the error rate when it is applied before, after or between the LSTM layers [7]. In another work [10], two methods of Maxout [11] incorporation are tested and compared with different Maxout group sizes. First, a Maxout unit is added inside the LSTM units instead of the tanh function; the best result is recorded with a group size of 4. Second, Maxout units are added in the feedforward layers instead of the sigmoid functions. The performance of this system outdoes that of the MDLSTM baseline, as the reduction of the label error rate reaches 6.86%, yielding a label error rate of 10.11% (Table 1). Although the MDLSTM is a deep neural network, its training with the Backpropagation Through Time (BPTT) algorithm can suffer from the vanishing gradient problem despite the use of powerful LSTM units [12, 13]. In fact, this vanishing problem arises when the tails of the sigmoidal function (0 or 1) are saturated, as the
Table 1. Comparison of label error rates of recognition systems based on MDLSTM and applied on the IFN/ENIT database

Approach                      | Label Error Rate (LER)
Baseline system: MDLSTM w/CTC | 16.97%
MDLSTM w/CTC w/dropout [9]    | 12.09%
MDLSTM w/CTC w/dropout [7]    | 11.62%
MDLSTM w/Maxout [10]          | 10.11%
derivative gets closer to 0. Then, during BPTT, this near-zero derivative is repeatedly multiplied by the error; multiplying a number smaller than 1 over and over drives the product to 0, which weakens the error signal. To overcome this issue, we propose the ReLU, since it is not much affected by this tail saturation: when the input drops below zero the function outputs zero, otherwise it mimics the identity function. The ReLUs are added at different locations in the MDLSTM network in order to draw out the best topology that improves the baseline system.
3 ReLU on the MDLSTM

3.1 Definition

The Rectified Linear Unit (ReLU) [14] has been found to be a well-suited nonlinear activation function for deep networks. Inspired by the development of nonlinear activation functions for DNNs and by their successful applications to LSTMs for speech recognition [15, 16], we study these activation functions in the MDLSTM for offline Arabic handwriting recognition. A rectified linear function is basically a hinge function: zero for negative input values and the identity function otherwise, as depicted in Fig. 2. The function is extremely fast to compute and has a simple derivative: 0 for negative input values and 1 otherwise.

3.2 Previous Research Studies

In the same context, Glorot et al. [17] showed that ReLU-based networks are as efficient as, or even more efficient than, classic ones based on tanh or sigmoid activation functions. The ReLU improvement was noticed in spite of the hard non-linearity and non-differentiability at zero, creating sparse representations that seem remarkably suitable for sparse data. This finding has been verified on image classification databases such as MNIST, CIFAR10, NISTP and NORB [18]. Another success was the application of the ReLU by Szegedy et al., who won the 2014 ImageNet Large Scale Visual Recognition Challenge with a convolutional neural network (CNN) combined with rectified neurons in nearly 40 layers [19]. In these
Fig. 2. Sigmoid, tanh and ReLU functions
previous research studies, the ReLUs proved an efficient improvement for deep neural networks. According to the state of the art, the ReLUs are also efficient when integrated into the BLSTM, as shown in Table 2. Frinken et al. proposed to combine the BLSTM and ReLUs for the unconstrained handwriting recognition task. The experimental evaluation on the IAM database showed an increase in recognition performance, since the word error rate dropped from 27.11% to 24.99%, i.e., a 2.12% reduction. These enhancements were achieved by adding multiple ReLU layers between the BLSTM layers [20]. Additionally, the ReLUs have been successful in acoustic models for large vocabulary continuous speech recognition [21]. For the Hub5'00-SWB database, the Word Error Rate (WER) decreased by 0.7% when multiple ReLU layers were integrated over two BLSTM layers. In the same work, for the RT03S database, the decrease in word error rate reached 0.8% in a BLSTM with a single hidden layer, and 1.2% in a BLSTM with two hidden layers. In [22], we found that the ReLU reduces the label error rate of an online Arabic handwriting recognition system based on the BLSTM by not more than 1.52%. In the same work, the same architecture was tested with dropout applied during training; this topology achieved a better recognition rate and an improvement of the label error rate by 8.51%.

Table 2. Error recognition rate reduced with ReLU

Authors                   | BLSTM     | Dataset | Reduction
Frinken and Uchida [20]   | 2         | IAM     | 2.12%
Luo et al. [21]           | 2         | Hub5'00 | 0.70%
Luo et al. [21]           | 2         | RT03S   | 1.20%
Maalej and Kherallah [22] | BLSTM (1) | ADAB    | 1.52%
Maalej and Kherallah [22] | BLSTM (2) | ADAB    | 8.51%
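The saturation argument behind these gains can be illustrated numerically. The following minimal NumPy sketch (an illustration, not the authors' implementation) compares the derivative products that BPTT would repeatedly multiply together:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # at most 0.25, near 0 in the tails

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):
    return (np.asarray(x) > 0).astype(float)   # exactly 1 for positive inputs

# Product of derivatives accumulated over 50 "time steps" at input x = 4,
# mimicking the repeated multiplication performed by BPTT:
print(d_sigmoid(4.0) ** 50)   # vanishingly small: the error signal dies out
print(d_relu(4.0) ** 50)      # 1.0: the gradient passes through unchanged
```

For a saturated sigmoid input, the accumulated factor collapses toward zero after a few dozen steps, whereas the ReLU derivative stays exactly 1 on the active side.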
In this work, we tried to check the ReLUs' performance in enhancing the MDLSTM network for offline Arabic handwriting recognition on the one hand, and to find their best architecture on the other.

3.3 Architecture

To improve the offline Arabic handwriting recognition system based on the MDLSTM, we added ReLU layers between the input and output layers. By stacking several ReLU layers with the MDLSTM ones, we could observe the learning behavior and the accuracy of the recognition system. The number of possibilities for extending our deep network with other layers increases exponentially with its depth; therefore, we focused on three types of extensions, as shown in Fig. 3. The first type (position 1) results from adding directional ReLU layers between the input and the LSTM layers. The second type consists in adding directional ReLU layers between the LSTM layers, in the feedforward layers (position 2), whereas the last type is generated by adding stationary ReLU layers before the output one (position 3).
Fig. 3. Different locations of ReLU layers in the MDLSTM-based recognition system.
4 Experimental Results

The IFN/ENIT database [1] contains 32492 images of Arabic words written by more than 1000 writers; it is used to validate the different proposed architectures. Those
Table 3. The IFN/ENIT dataset

Set | Words | Characters
a   | 6537  | 51984
b   | 6710  | 53862
c   | 6477  | 52155
d   | 6735  | 54166
e   | 6033  | 45169
words are the names of 937 Tunisian towns and villages. The IFN/ENIT database is divided into 5 sets (see Table 3). To compare our proposed architectures with other systems, we chose the same conditions: sets a, b and c are used for training, while sets d and e are used for testing. The IFN/ENIT database has been successfully exploited by several research groups, as it was used for the offline Arabic handwriting recognition competition at ICDAR 2009 [8].

4.1 ReLU on MDLSTM-Based System

Combining ReLU and MDLSTM, we obtain the results shown in Table 4. We notice that ReLU reduces the label error rate, and the best location for these ReLU layers is found to be between the hidden LSTM layers.

Table 4. Label error rate with ReLU and MDLSTM on the IFN/ENIT database.

System                   | Label Error Rate | Reduction
MDLSTM baseline system   | 16.97%           | –
ReLU before MDLSTM       | 16.40%           | 0.57%
ReLU after MDLSTM        | 16.12%           | 0.85%
ReLU between LSTM layers | 15.17%           | 1.80%
Based on these results, and despite the improved accuracy of our recognition system based on MDLSTM and ReLU, the results of the MDLSTM network regularized by the dropout technique remain better [7]. Hence, combining dropout, to prevent the network from overfitting, with ReLU, to avoid the vanishing gradient problem, can be seen as a promising option.

ReLU to Enhance MDLSTM for Offline Arabic Handwriting Recognition

4.2 ReLU and Dropout on the MDLSTM-Based System

Dropout [23] is a technique that prevents overfitting in deep and recurrent neural networks and is very effective in the field of machine learning. It consists in temporarily removing some units from the network; the removed units are randomly selected, and only during the training stage. This regularization method improves network performance and reduces the error rate. According to the experiments carried out and reported in [7], the best way to apply dropout is to place it before the hidden MDLSTM layers with a rate of 0.5. Applying this technique to our system with ReLU layers between the MDLSTM layers, we obtain the results displayed in Table 5.

Table 5. Label error rate with dropout, ReLU and MDLSTM on the IFN/ENIT database.

System                       Label Error Rate  Reduction
MDLSTM baseline system       16.97%            –
MDLSTM w/ dropout (0.5) [7]  11.62%            5.35%
ReLU layers between MDLSTM   15.17%            1.80%
MDLSTM w/ dropout w/ ReLU    11.40%            5.57%
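The dropout operation described above admits a compact sketch. This is the standard "inverted dropout" formulation (train-time rescaling), not the authors' exact implementation; the function name and list-based interface are illustrative:

```python
import random

def dropout(activations, rate=0.5, training=True):
    """Inverted-dropout sketch: during training each unit is zeroed with
    probability `rate`; survivors are rescaled so the expected activation
    matches test time. Rate 0.5 is the best setting reported in [7]."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0
            for a in activations]
```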
Table 6. Comparison of offline Arabic handwriting recognition systems trained and tested on the IFN/ENIT database.

Authors              Networks                   Label Error Rate
Jayech et al. [24]   Dynamic Bayesian Network   18.00%
Maalej et al. [7]    MDLSTM w/CTC w/dropout     11.62%
Amrouch et al. [25]  CNN-HMM                    10.77%
Maalej et al. [26]   CNN-BLSTM                  07.79%
Maalej et al. [10]   MDLSTM w/Maxout            10.11%
Present work         MDLSTM w/dropout w/ReLU    11.40%
From the obtained results, we find that combining dropout and ReLU provides a significant improvement for the offline Arabic handwriting recognition system based on MDLSTM: the label error rate decreases to 11.40%, a significant 5.57% reduction. In contrast, for the system that combines only MDLSTM and ReLU, the reduction in the label error rate does not exceed 1.8%.

4.3 Experimental Comparison and Analyses

As illustrated in Table 6, our solution for offline Arabic handwriting recognition, based on a deep learning architecture combining MDLSTM and ReLU units, gives a competitive result against the other systems. The label error rate of the recognition system based on the Dynamic Bayesian Network reached 18% [24], while that of the MDLSTM with CTC and dropout was 11.62%. For the hybrid CNN-HMM recognition system, the label error rate is 10.77%. The hybrid CNN-BLSTM system, trained on an extended database created by applying data augmentation techniques to the IFN/ENIT database, achieves the best label error rate, estimated at 7.79%, and the MDLSTM-based system with Maxout units reaches 10.11%. The label error rate recorded in the present work, whose recognition system is based on ReLU, dropout and MDLSTM, reaches 11.40%.
5 Conclusion

In this paper, we have proposed a powerful offline Arabic handwriting recognizer based on MDLSTM, ReLU and dropout. In order to identify the best ReLU-MDLSTM topology, we combined the MDLSTM and the ReLU in different modes: adding directional ReLU layers first between the input layer and the LSTM layers, second between the LSTM layers, and third adding stationary ReLU layers before the output layer. Based on the experimental results, the best location for these ReLU layers is between the hidden LSTM layers. Despite the improved accuracy of this recognition system, the results of the MDLSTM network regularized by the dropout technique alone remain better; it is therefore preferable to combine both dropout, to prevent the network from overfitting, and ReLU, to avoid the vanishing gradient problem. This architecture showed a significant improvement for the offline Arabic handwriting recognition system: the label error rate dropped to 11.40%, a reduction of 5.57%. To further improve the accuracy of the baseline MDLSTM-based recognition system, we propose, for a future research study, to increase the depth of the MDLSTM network by adding other types of units, such as the leaky ReLU and Soft-Maxout, and then to extend the IFN/ENIT database by applying several data augmentation techniques, which is more suitable for deep learning architectures.
References

1. Pechwitz, M., Maddouri, S.S., Märgner, V., et al.: IFN/ENIT-database of handwritten Arabic words. In: Proceedings of CIFED, pp. 127–136. Citeseer (2002)
2. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)
3. Graves, A., Liwicki, M., Bunke, H., Schmidhuber, J., Fernández, S.: Unconstrained on-line handwriting recognition with recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 577–584 (2008)
4. Maalej, R., Tagougui, N., Kherallah, M.: Online Arabic handwriting recognition with dropout applied in deep recurrent neural networks. In: 2016 12th IAPR Workshop on Document Analysis Systems (DAS), pp. 417–421. IEEE (2016)
5. Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 545–552 (2009)
6. Graves, A., Fernández, S., Gomez, F., et al.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 369–376. ACM (2006)
7. Maalej, R., Kherallah, M.: Improving MDLSTM for offline Arabic handwriting recognition using dropout at different positions. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 431–438. Springer, Cham (2016)
8. Märgner, V., El Abed, H.: ICDAR 2009 Arabic handwriting recognition competition. Int. J. Doc. Anal. Recognit. 14, 3–13 (2011)
9. Maalej, R., Tagougui, N., Kherallah, M.: Recognition of handwritten Arabic words with dropout applied in MDLSTM. In: International Conference Image Analysis and Recognition, pp. 746–752. Springer (2016)
10. Maalej, R., Kherallah, M.: Maxout into MDLSTM for offline Arabic handwriting recognition. In: International Conference on Neural Information Processing, ICONIP 2019, pp. 534–545. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-36718-3_45
11. Goodfellow, I.J., Warde-Farley, D., Mirza, M., et al.: Maxout networks. arXiv preprint arXiv:1302.4389 (2013)
12. Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, pp. 1310–1318 (2013)
13. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is difficult. IEEE Trans. Neural Netw. 5, 157–166 (1994)
14. Dahl, G.E., Sainath, T.N., Hinton, G.E.: Improving deep neural networks for LVCSR using rectified linear units and dropout. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8609–8613 (2013)
15. Cai, M., Liu, J.: Maxout neurons for deep convolutional and LSTM neural networks in speech recognition. Speech Commun. 77, 53–64 (2016)
16. Li, X., Wu, X.: Improving long short-term memory networks using maxout units for large vocabulary speech recognition. In: ICASSP-2015, pp. 4600–4604 (2015)
17. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)
18. Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)
19. Szegedy, C., Liu, W., Jia, Y., et al.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
20. Frinken, V., Uchida, S.: Deep BLSTM neural networks for unconstrained continuous handwritten text recognition. In: 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 911–915. IEEE (2015)
21. Luo, Y., Liu, Y., Zhang, Y., et al.: Maxout neurons based deep bidirectional LSTM for acoustic modeling. In: International Conference on Robotics and Biomimetics (ROBIO), pp. 1599–1604. IEEE (2017)
22. Maalej, R., Kherallah, M.: Improving the DBLSTM for on-line Arabic handwriting recognition. Multimed. Tools Appl. 79, 17969–17990 (2020). https://doi.org/10.1007/s11042-020-08740-w
23. Hinton, G.E., Srivastava, N., Krizhevsky, A., et al.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
24. Jayech, K., Mahjoub, M., Ben Amara, N.: Arabic handwritten word recognition based on dynamic Bayesian network. Int. Arab J. Inf. Technol. 13(6B), 1024–1031 (2016)
25. Amrouch, M., Rabi, M., Es-Saady, Y.: Convolutional feature learning and CNN based HMM for Arabic handwriting recognition, pp. 265–274 (2018)
26. Maalej, R., Kherallah, M.: Convolutional neural network and BLSTM for offline Arabic handwriting recognition. In: International Arab Conference on Information Technology (ACIT), pp. 1–6. IEEE (2018)
Histogram Based Method for Unsupervised Meeting Speech Summarization

Nouha Dammak1,2(B) and Yassine BenAyed1

1 Multimedia InfoRmation Systems and Advanced Computing Laboratory (MIRACL), 3021 Sfax, Tunisia
[email protected], [email protected]
2 Higher Institute of Computer Sciences and Communication Techniques, University of Sousse, Sousse, Tunisia
Abstract. The appearance of various platforms such as YouTube, Dailymotion and Google Video has played a major role in the increase in the number of videos available on the Internet. For example, more than 15,000 video sequences are viewed every day on Dailymotion. Consequently, the huge amount of gathered data constitutes a big scientific challenge for managing the underlying knowledge. In particular, data summarization aims to extract concise abstracts from different types of documents. In the context of this paper, we are interested in summarizing meeting data. As the quality of a video analysis output highly depends on the type of data, we propose to establish our own framework for this purpose. The main goal of our study is to use textual data extracted from Automatic Speech Recognition (ASR) transcriptions of the AMI corpus to produce a fully unsupervised summarized version of meeting sequences. Our contribution, called Weighted Histogram for ASR Transcriptions (WHASRT), adopts an extractive, annotation-free, dictionary-based approach. An exhaustive comparative study demonstrates that our method achieves results competitive with ranking-based methods, and the experimental results show enhanced performance over existing clustering-based methods.

Keywords: Summarization · Unsupervised · Transcription · Automatic Speech Recognition · Meetings · Natural Language Processing
1 Introduction

Most people spend a lot of their time in meetings. Once a meeting is over, it becomes very important to produce reports citing the main issues discussed, such as the problems encountered and the decisions made. It is now possible to record and store a meeting in either audio or video format. Several existing tools, embodied in a context known as «speech-to-text», generate raw text transcriptions listing what was said during the meeting. An important issue is then to automatically extract, from these often very noisy textual transcriptions, topics and summaries leading to the creation of meeting reports. In this paper, we present a fully unsupervised extractive text summarization system and evaluate it on a database of automatic meeting speech transcriptions, bearing in mind that summarizing spontaneous meeting speech is a very hard task in the Natural Language Processing (NLP) domain [1]. The grammatical, well-structured sentences of ordinary documents are replaced by utterances or sentence fragments in speech transcriptions. The input data suffer from additional noise because the utterances are produced by several speakers who customarily hesitate, which can be detected by the appearance of many filler words like "um", "euh", etc.; contributors interrupt each other and produce unrelated chit-chat. Moreover, ASR output introduces further noise, making the summarization task even more complex. Nevertheless, we are compelled to use ASR outputs since human transcriptions are very costly.

This paper proceeds as follows. We begin by describing the most popular and relevant works cited in the literature. Next, we detail a weighted histogram based model for the unsupervised meeting summarization task. Our experiments then present the WHASRT results along with four baseline systems. We conclude with a discussion and some suggestions for future work.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 396–405, 2021. https://doi.org/10.1007/978-3-030-49342-4_38
2 Related Work

Speech summarization based on Automatic Speech Recognition (ASR) is the process of shortening an original meeting document and distilling the salient information from it. Admittedly, humans are better at understanding the meaning of documents and synthesizing summaries; however, automatic methods are more usable, cheaper and even faster. Automated speech summarization is a long-standing research area in Natural Language Processing [2–4]. Speech summarization methods such as [5] and [6] can be mainly categorized into two types: (a) extractive, where important utterances are directly selected from the input meeting transcript to produce the summary, and (b) abstractive, where the main ideas of the input meeting transcript are reformulated, as a human would do, to form a coherent summary. Nowadays, most research focuses on extractive summarization, as abstractive summarization remains costly [4]. However, extractive summarization of multi-party conversations is a heavily challenging problem due to the informal and disfluent nature of meeting transcripts with overlapping speakers [1]. Extractive summarization has been applied to several domains, including news articles [7] and meeting records [8]. Moreover, extractive summarizer approaches tailored to meeting conversations differ widely from techniques used in other domains, such as document summarization on the DUC (Document Understanding Conference) corpora [9, 10]. Later, feature-based approaches were adopted for meeting summarization: in 2008, Xie et al. [11] used several features, such as lexical and structural ones. In this manuscript, we focus on the meeting summarization task on the AMI corpus [12]. Inspired by the greedy extractive algorithm used in unsupervised methods [8], we propose a weighted histogram based approach for scoring sentences and building a dictionary. Our main contribution is the novel coverage reward term of the objective function optimized by our proposed WHASRT algorithm.
3 Proposed System

Our system is composed of four modules, shown in Fig. 1: text pre-processing, histogram building with keyword extraction, sentence scoring, and dictionary creation with extractive summarization, starting from the ASR transcripts.

Fig. 1. Overall system process flow.
3.1 Text Pre-processing

The fully unsupervised nature of our system makes it flexible, as it can be fitted to several languages; this requires only a few changes in the pre-processing step, without any deeper changes to the overarching model. In this paper, we are interested in the English language only. The pre-processing task consists in cleaning the data in order to obtain the surviving words for the next phase. All words are lowercased and then reduced to their stemmed, or base, form. All extra spaces are replaced with a single space, and punctuation is removed. We discarded specific flags incorporated into the AMI dataset by the ASR system, which indicate extra information such as {vocalsound}, {gap}, {disfmarker}, {comment}, {pause}, etc. Moreover, we eliminate from the cleaned data custom stopwords and filler words learned from the development set of the AMI corpus and specific to the speech meetings field.

3.2 Histogram Building

Summarizing a speech meeting set is the process of shortening the text of a document in order to create a brief description of the major points it contains. In abstract terms, the main idea of summarization is to find a subset of the clean data that incorporates the "knowledge" of the entire corpus. For this reason, we developed a system based on building a histogram of weights that focuses on the most relevant subset. The process begins by creating an empty dictionary ready to host the unique terms of the entire set. Then, a tokenization step is required to break down the clean text into single words (unigrams) [13]. In this research, we divide our data into single words only.
If a token is neither a stopword nor a filler word and does not already occur in the dictionary, it is automatically added to the current, updated dictionary. In this way, all tokens (unigrams) are ranked using the Term Frequency-Inverse Sentence Frequency (TF-ISF) metric described by the following equations (Eqs. 1, 2 and 3):

tf(w_i) = TC(w_i, doc) / |W_doc|   (1)

where TC denotes the term count, i.e., the absolute frequency of word w_i in the document doc, and |W_doc| is the total number of words in doc; sentences are thus represented as vectors whose elements are these frequencies.

isf(w_i) = log( # of sentences / # of sentences containing w_i )   (2)

Our Weighted Histogram for ASR Transcriptions (WHASRT) approach is a centroid-based algorithm which computes the scores of the words in the transcript. We define the weight of a word w_i as:

weight(w_i) = tf(w_i) * isf(w_i)   (3)

As detailed in Algorithm 1, the weighted histogram is built by a process that iteratively appends new words, or tokens, w_i to the currently existing dictionary based on a weighting scheme [14]. Unlike greedy algorithms, the proposed algorithm is not time-consuming: it belongs to the polynomial class of problems and requires θ(nW) time.

• Algorithm 1: weighted histogram building
Input: clean text CT
Output: weighted histogram W_Hist
Dictionary ← Ø
for each Word in CT do
    if Word not in StopWords then
        if Word not in Dictionary then
            Dictionary ← Dictionary + Word
        end if
    end if
end for
for each Word in Dictionary do
    W_Hist(Word) ← weight(Word)
end for
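The histogram construction can be sketched in Python as follows. The STOPWORDS set here is an illustrative stand-in for the custom stopword and filler lists learned from the AMI development set, and the function name is ours, not the paper's:

```python
import math
from collections import Counter

# Illustrative stopword/filler list; the paper learns custom lists from the
# AMI development set.
STOPWORDS = {"the", "a", "of", "and", "um", "uh", "euh"}

def build_weighted_histogram(tokenized_sentences):
    """Build the TF-ISF weighted histogram of Eqs. 1-3 over a cleaned,
    tokenized transcript (a list of lists of unigrams)."""
    words = [w for sent in tokenized_sentences for w in sent
             if w not in STOPWORDS]
    n_words, n_sents = len(words), len(tokenized_sentences)

    term_count = Counter(words)
    sent_freq = Counter()                       # sentences containing w
    for sent in tokenized_sentences:
        sent_freq.update(set(sent) - STOPWORDS)

    hist = {}
    for w, tc in term_count.items():
        tf = tc / n_words                       # Eq. 1
        isf = math.log(n_sents / sent_freq[w])  # Eq. 2
        hist[w] = tf * isf                      # Eq. 3
    return hist
```

Note that a word occurring in every sentence receives isf = log(1) = 0, so ubiquitous words contribute nothing, which is exactly the behavior TF-ISF is designed to produce.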
3.3 Sentence Scoring

Given the weighted histogram built previously, we can automatically detect a subset of the best-ranked utterances. The score of each sentence is computed from the word weights; our main goal is to quantify the degree of informativeness of the document's sentences. Our WHASRT method is based on an empirically verified assumption: thirty words were fixed as the maximal number of words per sentence. In this way, we obtain scores for all of the document's sentences. Equation 4 gives the sentence scoring step of our proposed approach:

score(sent_j) = (1 / (# of sentences in doc)) * Σ_{w_i ∈ sent_j} weight(w_i)   (4)
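A minimal sketch of the scoring step, under the assumption (ours) that the thirty-word cap is applied by truncating each sentence:

```python
def score_sentences(tokenized_sentences, hist, max_words=30):
    """Eq. 4 sketch: score each sentence by the sum of the TF-ISF weights of
    its first `max_words` words (the paper caps sentences at thirty words),
    normalized by the number of sentences in the document."""
    n = len(tokenized_sentences)
    return [sum(hist.get(w, 0.0) for w in sent[:max_words]) / n
            for sent in tokenized_sentences]
```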
3.4 Dictionary Creation and Extractive Summarization

We use an objective function F capturing the desirable properties of a summary, which the state of the art formalizes as relevance and non-redundancy. F relies on a coverage metric C of the form presented in Eq. 5 in order to collect candidate summaries selected from the primary utterances:

C(S) = max_{j ∈ S} score(sent_j)   (5)

The coverage metric C satisfies two requirements, submodularity and monotonicity [15]. Intuitively, Eq. 5 states that the coverage of a candidate summary S is defined by the maximum-weighted sentence it includes. Submodular functions have the advantage of being efficiently optimizable; they are ubiquitous and have been applied to many real-world problems in artificial intelligence and machine learning [16], including feature selection [17], sensor placement [18] and automatic summarization [15]. Below, we detail how the concept of submodularity applies to extractive summarization.

• Submodularity: a submodular function is a set function F: 2^S → R, where S = {s1, s2, ..., sn}, that satisfies the diminishing returns property [19]:

∀X ⊆ Y ⊆ S\{s}, F(X ∪ {s}) − F(X) ≥ F(Y ∪ {s}) − F(Y)   (6)

That is, the incremental value gained by adding a single element to an input set decreases as the size of the input set increases [15]. If F measures the quality of a summary, the diminishing returns property says that the gain obtained by adding a new sentence to a small candidate summary must exceed the gain of appending the same sentence to a larger summary that includes the smaller one.

• Monotonicity: a submodular function F is monotone, non-decreasing, if:

∀X ⊆ Y, F(X) ≤ F(Y)   (7)

So, as the summary grows in size, its quality can only increase or remain constant.
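Monotone submodular objectives under a budget are classically maximized by a greedy scheme; the sketch below uses a score-to-cost ratio rule as an illustration, and the exact selection rule of WHASRT may differ:

```python
def greedy_summary(tokenized_sentences, scores, budget):
    """Budgeted greedy selection sketch: repeatedly take the sentence with
    the best score-to-cost ratio that still fits within the word budget.
    This mirrors the classical greedy maximization of a monotone submodular
    objective under a knapsack constraint."""
    remaining = list(range(len(tokenized_sentences)))
    chosen, used = [], 0
    while remaining:
        best = max(remaining,
                   key=lambda j: scores[j] / max(len(tokenized_sentences[j]), 1))
        remaining.remove(best)
        cost = len(tokenized_sentences[best])  # cost = number of words
        if used + cost <= budget:
            chosen.append(best)
            used += cost
    return sorted(chosen)
```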
4 Experimental Setup

4.1 Dataset

We tested our approach on the ASR output of one standard dataset, the AMI corpus [12, 20], which is well known in the field of meeting speech summarization. The AMI meeting corpus contains 100 h of meetings captured using many synchronized recording devices and is designed to support work in speech and video processing, language engineering, corpus linguistics, and organizational psychology. It has been transcribed orthographically, with annotated subsets for everything from named entities, dialogue acts, and summaries to simple gaze and head movement. The corpus contains ASR transcripts for 137 meetings with 4 participants each. The average duration of a meeting is 35 min, comprising over 800 unprocessed utterances, i.e., almost 6,700 words. Each meeting comes with one human-written abstractive summary. We used the test set of 20 meetings adopted by [21] and [8].

4.2 Evaluation

The extractive summaries generated by our system, or by the baselines detailed subsequently, were compared against the human abstractive summaries in order to remain comparable with prior work. For evaluation we use the ROUGE metric, the acronym of Recall-Oriented Understudy for Gisting Evaluation [22]. It is a standard way of evaluating automatic text summarization. It includes several measures, based on n-gram overlap, to automatically assess the quality of a candidate summary by comparing it against the human-created summary, or even several summaries. ROUGE-1 and ROUGE-L are suitable for extractive summarization, especially with fairly verbose system and reference summaries [22]. In particular, ROUGE-1 measures the overlap of unigrams between the candidate summary and the reference one, whereas ROUGE-L refers to the longest matching sequence of words, using the Longest Common Subsequence (LCS).
The major advantage of using LCS is that we do not need to predefine the n-gram length, since it depends only on in-sequence matches and does not require consecutive matches. To compute the ROUGE measures, we calculate the recall and the precision in the ROUGE sense: the recall reflects how much of the human summary is recovered by the candidate summary, whereas the precision indicates how much of the system summary is relevant. Considering unigrams only, the recall and the precision are computed as follows (Eqs. 8 and 9):

R = # overlapping W / # W in ref summary   (8)

P = # overlapping W / # W in candidate summary   (9)

We then report the F-score based on the recall and the precision (Eq. 10):

Fscore = 2 * (R * P) / (R + P)   (10)
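Equations 8–10 can be sketched directly; the clipped-overlap counting below matches standard ROUGE implementations, though the authors likely used an existing ROUGE toolkit rather than code of this form:

```python
from collections import Counter

def rouge1(candidate_tokens, reference_tokens):
    """ROUGE-1 recall, precision and F-score of Eqs. 8-10; unigram overlap
    is clipped, as in standard ROUGE implementations."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())         # clipped unigram matches
    r = overlap / max(len(reference_tokens), 1)  # Eq. 8
    p = overlap / max(len(candidate_tokens), 1)  # Eq. 9
    f = 2 * r * p / (r + p) if (r + p) else 0.0  # Eq. 10
    return r, p, f
```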
In our work, we compute ROUGE scores for every meeting transcript in the test set and then use macro-averaging to produce an overall score for our WHASRT method.

4.3 Baseline Systems

We benchmarked the performance of our proposed WHASRT system against four competing baselines [23].

• Random: this system is included based on the best-practice recommendation of Riedhammer et al. [21]. Utterances/sentences are selected randomly until a fixed budget is satisfied.
• Longest greedy: this baseline greedily picks the longest utterance/sentence until the length constraint, or budget, is satisfied [21].
• TextRank: a graph-based system for sentence extraction [24]. Every node of the graph represents a single utterance/sentence, and every pair of nodes is linked by an undirected, weighted edge computed from lexical similarity (Eq. 11):

Similarity(Si, Sj) = |{wk : wk ∈ Si ∧ wk ∈ Sj}| / (log(|Si|) + log(|Sj|))   (11)

where Si and Sj are two sentences, each represented as a set of words S = {w1, w2, ..., w|S|}. For spontaneous speech, it is useful to treat the full set of utterances/sentences of each speaker as a cluster rather than individual ones. In a second pass, PageRank [25] is applied; the best-ranked nodes are then selected to construct the summary.
• ClusterRank: an extension of TextRank suited to meeting summarization. In this approach, the authors propose to represent the text as a directed graph [26]. Each node of the graph represents one separate cluster. The similarity between clusters is computed using the cosine similarity of the words they contain, taking advantage of the node centroid. Each utterance/sentence is then assigned a score based on the weighted PageRank of the node it belongs to and its cosine similarity with the node centroid. Finally, the utterances/sentences with the highest scores are selected for the output summary, provided they differ sufficiently from it.
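The TextRank edge weight of Eq. 11 is simple enough to state in code; this is an illustration of the formula, not the baseline implementation actually used:

```python
import math

def textrank_similarity(s1, s2):
    """Eq. 11 edge weight between two tokenized sentences: the number of
    shared words, normalized by the sum of the log sentence lengths."""
    shared = len(set(s1) & set(s2))
    denom = math.log(len(s1)) + math.log(len(s2))
    # Guard against one-word sentences, where both log terms are zero.
    return shared / denom if denom > 0 else 0.0
```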
5 Results

We define the cost of a sentence in a candidate summary as the number of words it includes, and the budget as the maximum size permitted for a summary, also measured in number of words. For each meeting, we generate an extractive summary satisfying a budget constraint ranging from 20 to 300 words on the AMI corpus.
Histogram Based Method for Unsupervised Meeting Speech Summarization
403
The results, encompassing the recall, the precision and the F1 score according to the ROUGE-1 metric, are shown in Fig. 2. Moreover, Table 1 provides detailed comparisons with the best performances achieved in the literature, while Table 2 shows that the ROUGE-L scores exceed the ROUGE-1 ones, since ROUGE-L focuses on the longest subsequence.

Fig. 2. ROUGE-1 recall, precision and F1 scores versus summary size (number of words) for various budgets on the AMI dataset.
Table 1. Macro-averaged ROUGE-1 scores on the AMI test set (20 meetings).

System          Recall  Precision  F_1score
WHASRT          31.00   29.52      30.24
TextRank        34.33   28.66      30.82
ClusterRank     33.87   28.18      30.35
Longest greedy  32.61   27.47      29.41
Random          31.06   26.05      27.95
Table 2. A comparison of macro-averaged ROUGE-1 and ROUGE-L scores on the AMI test set (20 meetings) along with our WHASRT model. System
ROUGE-1
WHASRT R-1
P-1
ROUGE-L F-1
R-L
P-L
F-L
31.00 29.52 30.24 35.64 34.22 34.92
Our approach significantly outperforms the Random and Longest greedy baselines on the AMI corpus. Besides, we obtain very competitive results even against ClusterRank and TextRank. It should be noted that the generated extractive candidate summaries are compared against abstractive summaries freely written by human annotators, who incorporate their own words and expressions into them. This makes the summarization task hard and makes it unrealistic for extractive summaries to attain perfect scores, since abstractive summaries integrate expressions that never appear verbatim in the meetings.
6 Conclusion

In the current work, we presented a fully unsupervised extractive system for the spontaneous speech summarization task that proceeds in a greedy way with near-competitive performance guarantees. The main contribution of our WHASRT model is the coverage metric of its objective function, which makes it easily optimizable by the greedy algorithm. The evaluation demonstrates that our WHASRT system reaches the state of the art in extractive meeting summarization. Our model proves its effectiveness especially when dealing with noisy texts derived from ASR outputs. In future experiments, we aim to work on longer-duration and non-scenario meetings.
References

1. McKeown, K., Hirschberg, J., Galley, M., Maskey, S.: From text to speech summarization. In: 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), vol. 5, pp. v/997–v/1000 (2005)
2. Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York (1998)
3. Murray, G., Hsueh, P.-Y., Tucker, S., Kilgour, J., Carletta, J., Moore, J.D., Renals, S.: Automatic segmentation and summarization of meeting speech. In: Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), Rochester, New York (2007)
4. Shang, G., Ding, W., Zhang, Z., Tixier, A., Meladianos, P., Vazirgiannis, M., Lorré, J.-P.: Unsupervised abstractive meeting summarization with multi-sentence compression and budgeted submodular maximization. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne (2018)
5. Yamamura, T., Shimada, K.: Annotation and analysis of extractive summaries for the Kyutech corpus. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018), Miyazaki (2018)
6. Li, M., Zhang, L., Ji, H., Radke, R.J.: Keep meeting summaries on topic: abstractive multi-modal meeting summarization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence (2019)
7. Nallapati, R., Zhou, B., Santos, C., Gülçehre, Ç., Xiang, B.: Abstractive text summarization using sequence-to-sequence RNNs and beyond. In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin (2016)
8. Tixier, A., Meladianos, P., Vazirgiannis, M.: Combining graph degeneracy and submodularity for unsupervised extractive summarization. In: Proceedings of the Workshop on New Frontiers in Summarization, Copenhagen (2017)
Histogram Based Method for Unsupervised Meeting Speech Summarization
Deep Support Vector Machines for Speech Emotion Recognition Hadhami Aouani1,2 and Yassine Ben Ayed2(B) 1 National School of Engineers, ENIS University of Sfax, Sfax, Tunisia
[email protected] 2 Multimedia InfoRmation Systems and Advanced Computing Laboratory, MIRACL University
of Sfax, Sfax, Tunisia [email protected]
Abstract. Speech emotion recognition has become an active research theme in speech processing and in applications based on human-machine interaction. In this work, our system is a two-stage approach, namely feature extraction and a classification engine. First, two feature sets are investigated: the first extracts only 13 Mel-Frequency Cepstral Coefficients (MFCC) from the emotional speech samples, while the second fuses the MFCC features with three additional features: zero crossing rate (ZCR), Teager Energy Operator (TEO), and Harmonic to Noise Ratio (HNR). Second, we use two classification techniques, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN), and compare their performance. Beyond that, we investigate recent advances in machine learning, including deep kernel learning. A large set of experiments is conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset for seven emotions. The results of our experiments show good accuracy compared with previous studies. Keywords: Emotion recognition · MFCC · ZCR · TEO · HNR · KNN · SVM · Deep SVM
1 Introduction Emotions color our language and can make its meaning more complex. The listener interacts with the emotional state of the speaker and adapts his behavior to whatever emotion the speaker transmits. Speech emotion recognition (SER) aims to identify the emotional or physical state of a human being from his or her voice. It is a relatively recent research topic in the field of speech processing. An automatic emotion detection system is a system capable of extracting features from the speech signal and analyzing them to recognize the emotion of an unknown speaker. It can be employed in various applications, including psychiatric diagnosis, smart toys, lie detection, smart call centers, educational software, and so forth [1]. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 406–415, 2021. https://doi.org/10.1007/978-3-030-49342-4_39
SER can be realized using machine learning methods, including the extraction and classification of vocal features [2]. For good generalization, the features must be well defined [3]. This paper deals with the extraction of suitable features, to determine whether good precision can be reached for SER and which features are relevant. Different classification techniques are used, namely Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Deep SVM methods. The rest of this article is organized as follows: Sect. 2 presents the literature survey. Section 3 describes the architecture of our system. Section 4 presents the results of the experimental and comparative studies; the results of the SVM method are compared to those of the KNN method and the Deep SVM methods. Finally, Sect. 5 presents some conclusions.
2 Related Work Acoustic features are principally classified as prosodic, spectral, and voice quality features [4]. Features such as energy, pitch, and Zero Crossing Rate (ZCR) are considered prosodic features, while Linear Predictive Coding (LPC) and Mel-Frequency Cepstral Coefficients (MFCC) are considered spectral features. Many researchers have built speech emotion recognition models using different combinations of features. Ya Li et al. [5] extract two categories of features: audio features obtained with the 'OpenSMILE' toolkit, such as energy, low-level spectral descriptors, the sum of the auditory spectrum, the spectral slope, MFCC, the spectral flux, and low-level descriptors such as the fundamental frequency F0 and the formants (F1, F2, F3); and video features such as shape characteristics and face detection by the Viola-Jones tracking algorithm. They use the Random Forest (RF) classification method to identify the eight emotions of the Chinese Natural Audio-Visual Emotion Database (CHEAVD) for multimodal recognition [11]. Song et al. [6] propose a new transfer non-negative matrix factorization (TNMF) method using two databases: the Berlin database, which contains seven emotions (anger, boredom, disgust, fear, happiness, sadness, and neutrality), and eNTERFACE'05, with six emotions (anger, disgust, fear, happiness, sadness, and surprise). The work presented by Papakostas et al. [7] aimed to analyze the emotions of speakers on the basis of paralinguistic information. They exploit two machine learning approaches: a Support Vector Machine (SVM) fed with a set of 34 extracted features, and a Convolutional Neural Network (CNN). The datasets used were EMOVO, SAVEE, and the EMOtional speech DataBase (EMO-DB). The emotions represented in the EMOVO and SAVEE datasets were the six basic emotions; the seven emotions used in EMO-DB are disgust, anger, joy, fear, sadness, boredom, and neutrality.
Ramdinmawii et al. extracted features such as the energy E0, zero crossing rate (ZCR), the formants (F1, F2, F3), and the fundamental frequency F0, which serve to detect four types of emotions (joy, fear, anger, and neutral) in two databases, German (German Emotion Database) and Telugu (Telugu Emotion Database) [8].
Shi uses the Multimodal Emotion Recognition Competition database (MEC 2017), collected from movies and television, to classify eight emotions (neutral, angry, sad, happy, anxious, worried, surprised, and ashamed) with two classifiers, Support Vector Machines (SVM) and Artificial Neural Networks (ANN), using two categories of features: the original features, which are the Mel-frequency cepstral coefficients, the fundamental frequency, the audio length, the length of silence in the audio sequence, and their mean values; and new features extracted from Deep Belief Networks (DBNs). The classification results with the new features exceed those of the original features by at least 5%. It can also be noted that the result obtained with DBN-SVM was slightly better than the one obtained with DBN-DNN, thanks to a better classification capacity on small samples [9]. Siddique Latif et al. [10] made use of a transfer learning technique to improve the performance of SER systems, using the eGeMAPS feature set containing 88 features. Evaluations on five different datasets in three different languages reveal that the Deep Belief Network (DBN) offers greater precision than previous approaches. The main contribution of this work is to use two conventional classification methods, SVM and KNN, and to compare their results. We also propose a deep learning method, the deep SVM, using the SAVEE database.
3 Proposed System In this work, we propose a system that addresses speech emotion recognition, with the objective of improving results using MFCC together with ZCR, TEO, and Harmonic to Noise Ratio (HNR) features, first with the two classifiers SVM and KNN, and second with deep learning. The proposed architecture is shown in Fig. 1.
Fig. 1. Architecture of our emotion recognition system. For both the train and test data, 12 MFCC + energy and the prosodic features (ZCR, TEO, HNR) are extracted and fused; the fused vectors are fed to the classification methods (SVM, KNN) for emotion recognition.
In this article, feature extraction uses MFCC. After extracting the MFCC features, they are saved as feature vectors. The feature vectors are fed into the two classifiers
SVM and KNN, which are used for the classification of the emotions. Then, the prosodic features ZCR, TEO, and HNR are extracted and saved as feature vectors; merging them with the MFCC coefficients yields 16 features, with which the classification is carried out with both methods. This raises the need for feature selection in emotion recognition. In order to improve on the results of SVM and KNN, and to increase the performance of the system, we propose deep learning, namely the DSVM.

3.1 Feature Extraction
In this work, we use 12 MFCC coefficients + energy [11], the Zero Crossing Rate (ZCR), the Teager Energy Operator (TEO), and the Harmonic to Noise Ratio (HNR).

• Zero Crossing Rate
The Zero Crossing Rate (ZCR) is an interesting parameter that has been used in many speech recognition systems. As the name suggests, it is defined as the number of zero crossings in a defined region of the signal, divided by the number of samples in that region [12]:

ZCR = 1/(N−1) · Σ_{n=1}^{N−1} sign(s(n)s(n−1))    (1)

where sign(s(n)s(n−1)) = 1 if s(n)s(n−1) < 0, and 0 otherwise.
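As an illustrative sketch (not the authors' code), the ZCR of Eq. (1) can be computed with NumPy:

```python
import numpy as np

def zero_crossing_rate(s: np.ndarray) -> float:
    # Eq. (1): count adjacent-sample pairs whose product is negative
    # (i.e. the signal crossed zero), normalized by N - 1.
    s = np.asarray(s, dtype=float)
    crossings = s[1:] * s[:-1] < 0
    return crossings.sum() / (len(s) - 1)

s = np.array([1.0, -1.0, 2.0, -2.0, 1.0])
print(zero_crossing_rate(s))   # 1.0 -- every adjacent pair changes sign
```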
• Teager Energy Operator
Teager Energy Operator (TEO) features capture the characteristics of speech produced under stress. TEO features measure the nonlinearity of the utterance by treating its behavior in the frequency and time domains. For the estimation of the TEO, each of the M band outputs of the signal is segmented into frames of equal length (for example, 25 ms with a frame shift of 10 ms), where M is the number of critical bands and f is the index of the frame for which the TEO is extracted. In our work we extract the TEO from the whole signal:

ψM[xf[t]] = (xf[t])² − xf[t−1] · xf[t+1]    (2)
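A minimal NumPy sketch of the operator in Eq. (2), again illustrative rather than the authors' implementation:

```python
import numpy as np

def teager_energy(x: np.ndarray) -> np.ndarray:
    # Eq. (2): psi[x[t]] = x[t]^2 - x[t-1] * x[t+1],
    # defined for the interior samples t = 1 .. T-2.
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a discrete sinusoid A*cos(w*t), the TEO equals the constant
# A^2 * sin(w)^2, i.e. it tracks both amplitude and frequency.
t = np.arange(200)
x = 2.0 * np.cos(0.3 * t)
psi = teager_energy(x)
```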
• Harmonic to Noise Ratio
The Harmonic to Noise Ratio (HNR) is a measure of the proportion of harmonic sound to noise in the voice, measured in decibels [13]. It describes the distribution of acoustic energy between the harmonic part and the inharmonic part of the radiated vocal spectrum.
3.2 Classification The extracted features were fed into two machine learning models in order to choose the most efficient one. In this work, experiments were conducted with SVM and k-NN, followed by the deep learning method, the Deep SVM [11].
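As an illustrative sketch of this two-classifier comparison (scikit-learn and synthetic stand-in features are assumptions here, since SAVEE itself cannot be bundled; each vector mimics the 16 fused values, 13 MFCC + ZCR + TEO + HNR, with one of 7 emotion labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the fused 16-dimensional descriptors.
X = rng.normal(size=(480, 16))
y = rng.integers(0, 7, size=480)
X[np.arange(480), y] += 3.0           # make the 7 classes separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

accs = {}
for name, clf in [
    ("SVM (RBF kernel)", make_pipeline(StandardScaler(), SVC(kernel="rbf"))),
    ("KNN (k = 17)", make_pipeline(StandardScaler(),
                                   KNeighborsClassifier(n_neighbors=17))),
]:
    accs[name] = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: {accs[name]:.2%}")
```

Standardizing the fused features matters here because ZCR, TEO, and HNR live on very different scales than the MFCC.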
4 Experiments and Results In our work, we employed the Surrey Audio-Visual Expressed Emotion (SAVEE) database, which is widely used in speech emotion recognition [11, 15]. It contains 480 sentences in total. Table 1 presents a summary of the best recognition rates found for the three SVM kernels as a function of the features.

Table 1. The recognition rates on the test corpus obtained with the SVM as a function of features.

| Features                        | Linear kernel | Polynomial kernel | RBF kernel |
|---------------------------------|---------------|-------------------|------------|
| 12 MFCC + energy                | 60.86         | 66.67             | 72.43      |
| 12 MFCC + energy, ZCR, TEO, HNR | 65.22         | 73.91             | 74.29      |
The results obtained show that the recognition rate improves as features are added. The best results for the different emotions using the 16 features (13 MFCC with ZCR, TEO, and HNR) with the RBF-kernel SVM are summarized in Table 2:

Table 2. The recognition rates on the test corpus obtained using SVM with RBF kernels as a function of 13 MFCC and fusion features.

| Emotion  | 13 MFCC | Fusion features (13 MFCC, ZCR, TEO, HNR) |
|----------|---------|------------------------------------------|
| Angry    | 77.77   | 77.78                                    |
| Disgust  | 77.77   | 77.77                                    |
| Fear     | 55.55   | 55.55                                    |
| Happy    | 76.66   | 66.66                                    |
| Neutral  | 79.88   | 86.66                                    |
| Sad      | 55.55   | 88.88                                    |
| Surprise | 44.44   | 66.77                                    |
The two figures below (Fig. 2 and Fig. 3) show the variation of the KNN identification rate for each value of k on the two systems, respectively: the system using the 13 MFCC and the one with the fusion features (13 MFCC, ZCR, TEO, and HNR).
Fig. 2. The identification rates of the KNN method for each value of k (odd k from 1 to 31) on the system with 13 MFCC.
Fig. 3. The identification rates of the KNN method for each value of k (odd k from 1 to 31) on the system with fusion features.
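The k sweep behind these two figures can be sketched as follows (again with scikit-learn and synthetic stand-in features, both assumptions; the real curves come from the SAVEE features):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Synthetic stand-in for the per-utterance feature vectors.
X = rng.normal(size=(480, 16))
y = rng.integers(0, 7, size=480)
X[np.arange(480), y] += 2.0

scores = {}
for k in range(1, 32, 2):            # odd k from 1 to 31, as in the figures
    scores[k] = cross_val_score(
        KNeighborsClassifier(n_neighbors=k), X, y, cv=5
    ).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```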
After a series of experiments varying the value of k, we conclude that for both systems, when k is 17, we obtain the best recognition rates, equal to 72.46% and 73.91% respectively for the system with 13 MFCC and the system using fusion features. Table 3 summarizes the best results found for each emotion with k = 17 for both systems: 13 MFCC and fusion features (13 MFCC, ZCR, TEO, and HNR). Tables 4 and 5 summarize the best recognition rates found for the seven emotions across the features and systems used; they show the effectiveness of the DSVM algorithm for the two systems, the system with 13 MFCC and the system using fusion features, which give overall rates equal to 79.71% and 82.60% respectively.
Table 3. The recognition rates on the test corpus obtained using KNN as a function of 13 MFCC and fusion features.

| Emotion  | 13 MFCC | Fusion features (13 MFCC, ZCR, TEO, HNR) |
|----------|---------|------------------------------------------|
| Angry    | 88.88   | 77.77                                    |
| Disgust  | 55.55   | 89                                       |
| Fear     | 44.44   | 56                                       |
| Happy    | 66.66   | 44.44                                    |
| Neutral  | 86.66   | 80                                       |
| Sad      | 77.77   | 88.88                                    |
| Surprise | 77.77   | 78                                       |
Table 4. Results obtained by the different methods for the system with 13 MFCC.

| Emotion  | SVM   | DSVM  |
|----------|-------|-------|
| Angry    | 77.77 | 100   |
| Disgust  | 77.77 | 88.88 |
| Fear     | 55.55 | 44.44 |
| Happy    | 76.66 | 66.66 |
| Neutral  | 79.88 | 100   |
| Sad      | 76.55 | 77.77 |
| Surprise | 55.55 | 66.66 |
Table 5. Results obtained by the different methods for the system with fusion features.

| Emotion  | SVM   | DSVM  |
|----------|-------|-------|
| Angry    | 77.78 | 88.88 |
| Disgust  | 44.44 | 100   |
| Fear     | 77.77 | 88.88 |
| Happy    | 88.88 | 44.44 |
| Neutral  | 86.66 | 93.33 |
| Sad      | 66.66 | 66.66 |
| Surprise | 66.77 | 88.88 |
Table 6 summarizes the best recognition rates found for our two systems using the different classification methods.
Table 6. The recognition rates obtained using different classifiers for both systems.

| Classifier | 13 MFCC | Fusion features (13 MFCC, ZCR, TEO, HNR) |
|------------|---------|------------------------------------------|
| KNN        | 72.46   | 73.91                                    |
| SVM        | 72.43   | 74.29                                    |
| DSVM       | 79.71   | 82.60                                    |
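The paper does not detail its DSVM architecture. One common reading of "deep SVM" (in the spirit of the authors' ref. [11]) stacks SVMs so that the per-class decision values of a first layer serve as learned features for a second SVM; the sketch below follows that formulation on synthetic stand-in features and is purely illustrative:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Synthetic stand-in for the 16-dimensional fused descriptors, 7 classes.
X = rng.normal(size=(480, 16))
y = rng.integers(0, 7, size=480)
X[np.arange(480), y] += 3.0
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Layer 1: an RBF SVM whose per-class decision values become new features.
layer1 = SVC(kernel="rbf", decision_function_shape="ovr").fit(X_tr, y_tr)
Z_tr = layer1.decision_function(X_tr)     # shape (n_samples, 7)
Z_te = layer1.decision_function(X_te)

# Layer 2: a second SVM trained on that learned representation.
layer2 = SVC(kernel="rbf").fit(Z_tr, y_tr)
print(f"stacked-SVM accuracy: {layer2.score(Z_te, y_te):.2%}")
```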
With the KNN method, we find after a series of experiments that the best recognition rates are 73.91% for the fusion features system and 72.46% for the system using 13 MFCC. Using the RBF-kernel SVM model, the best result is 74.29% for the system that fuses the features, which is superior to the system with 13 MFCC, whose rate equals 72.43%. With the Deep SVM (DSVM), we achieved a recognition rate of 82.60% for the system with fusion features and a rate of 79.71% for the 13 MFCC system, both higher than the standard SVM. Table 7 compares the proposed SER system with some recent published studies that used the SAVEE dataset with different classifiers and features.

Table 7. Comparison of our proposed system with some previous studies that used the SAVEE database.

| Related work             | Methods and no. of features                               | Recognition accuracy (%)  | No. of emotions |
|--------------------------|-----------------------------------------------------------|---------------------------|-----------------|
| Noroozi et al. [15]      | Random Forest (RF), 13 features                           | 66.28                     | 6 emotions      |
| Papakostas et al. [7]    | SVM, CNN, 34 features                                     | 30 with SVM, 25 with CNN  | 4 emotions      |
| Siddique Latif et al. [10] | DBN, 88 features                                        | 56.76                     | 7 emotions      |
| Current study            | 13 MFCC: KNN, SVM, DSVM                                   | 72.46, 72.43, 79.71       | 7 emotions      |
| Current study            | Fusion features (13 MFCC, ZCR, TEO, HNR): KNN, SVM, DSVM  | 73.91, 74.29, 82.60       | 7 emotions      |
5 Conclusion In this paper, we proposed two systems, one that uses the 13 MFCC only and another that merges the features 13 MFCC, Zero Crossing Rate (ZCR), Teager Energy Operator (TEO), and Harmonic to Noise Ratio (HNR), testing them with different classification methods. The first classification method is the Support Vector Machine (SVM), which gives its best result only for the fusion features system. The second method is the k-Nearest Neighbors (KNN); after a series of experiments on the value of k, we obtain a better recognition rate, equal to 73.91%, with the fusion features system, compared to the system using 13 MFCC, which gives a recognition rate equal to 72.46%. In order to improve the results, we proposed the Deep SVM (DSVM), which shows its efficiency on both systems, the system using 13 MFCC and the system with fusion features, with rates respectively equal to 79.71% and 82.60%. Our comparative study of emotion classification systems demonstrates the effectiveness of the fusion features system with the SVM method compared to the system with 13 MFCC, as well as of the Deep SVM (DSVM) for both systems. This work achieves better accuracy with the two systems in comparison with recent previous studies that used the same dataset. Two speech emotion recognition systems based on seven emotions were proposed in this article using different classifiers, and their performance was compared. Future work should consider other descriptors. We can also consider performing emotion recognition using an audiovisual base (image and speech), in which case we would benefit from descriptors from both speech and image. This would allow us to improve the recognition rate of each emotion.
References
1. Emerich, S., Lupu, E.: Improving speech emotion recognition using frequency and time domain acoustic features. In: EURASIP (2011)
2. Park, J.-S., Kim, J.-H., Oh, Y.-H.: Feature vector classification based speech emotion recognition for service robots. IEEE Trans. Consum. Electron. 55(3), 1590–1596 (2009)
3. Law, J., Rennie, R.: A Dictionary of Physics, 7th edn. Oxford University Press, Oxford (2015)
4. Zhibing, X.: Audiovisual Emotion Recognition Using Entropy Estimation-Based Multimodal Information Fusion. Ryerson University, Toronto (2015)
5. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
6. Song, P., Ou, S., Zheng, W., Jin, Y., Zhao, L.: Speech emotion recognition using transfer non-negative matrix factorization. In: Proceedings of IEEE International Conference ICASSP, pp. 5180–5184 (2016)
7. Papakostas, M., Siantikos, G., Giannakopoulos, T., Spyrou, E., Sgouropoulos, D.: Recognizing emotional states using speech information. In: Vlamos, P. (ed.) GeNeDis 2016. AEMB, vol. 989, pp. 155–164. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57348-9_13
8. Ramdinmawii, E., Mohanta, A., Mittal, V.K.: Emotion recognition from speech signal. In: IEEE Region 10 Conference (TENCON), Malaysia, 5–8 November 2017 (2017)
9. Shi, P.: Speech emotion recognition based on deep belief network. IEEE (2018)
10. Latif, S., Rana, R., Younis, S., Qadir, J., Epps, J.: Transfer learning for improving speech emotion classification accuracy (2018). arXiv:1801.06353v3 [cs.CV]
11. Aouani, H., Ben Ayed, Y.: Emotion recognition in speech using MFCC with SVM, DSVM and auto-encoder. In: IEEE 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) (2018)
12. Hùng, L.X.: Détection des émotions dans des énoncés audio multilingues [Emotion detection in multilingual audio utterances]. Institut polytechnique de Grenoble (2009)
13. Ferrand, C.: Speech Science: An Integrated Approach to Theory and Clinical Practice. Pearson, Boston, MA (2007)
14. Noroozi, F., Sapiński, T., Kamińska, D., Anbarjafari, G.: Vocal-based emotion recognition using random forests and decision tree. Int. J. Speech Technol. 20(2), 239–246 (2017). https://doi.org/10.1007/s10772-017-9396-2
15. Swerts, M., Krahmer, E.: Gender-related differences in the production and perception of emotion. In: Proceedings of Interspeech, pp. 334–337 (2008)
Biometric Individual Identification System Based on the ECG Signal Sihem Hamza1,2 and Yassine Ben Ayed2(B) 1 Higher Institute of Computer Science and Multimedia, ISIMS,
University of Sfax, Sfax, Tunisia [email protected] 2 Multimedia InfoRmation Systems and Advanced Computing Laboratory, MIRACL, University of Sfax, Sfax, Tunisia [email protected]
Abstract. Human biometric identification based on the ElectroCardioGram (ECG) is relatively new. It aims to recognize individuals, since the ECG has unique characteristics for each individual, and these characteristics are robust against forgery. In this study, feature extraction from ECG signals was performed using a combination of three types of characteristics: MFCC, ZCR, and entropy. We propose to apply two classification methods, K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), for human biometric identification. For evaluation we used two databases, namely the MIT-BIH Arrhythmia and Normal Sinus Rhythm databases obtained from PhysioNet. From the MIT-BIH database we used 47 individuals, each record containing ECG data recorded for 15 s; from the SNR database we used 18 individuals, with records of 10 s each. The analysis of the results obtained shows that the combination of all the proposed features improves the efficiency of our identification system, reaching a performance rate equal to 100% for both databases. Keywords: Biometric identification · ECG · MFCC · ZCR · Entropy · KNN · SVM
1 Introduction Biometrics involves recognizing a person. It is a very active area of research because it helps to ensure security [1]. There are traditional methods of verifying individuals' identities, such as passwords [2]. Biometrics is a technology that identifies individuals based on their physiological or behavioral characteristics [3]. These characteristics must be unique, differing from one person to another, and permanent, not changing over time [2]. Among the biometric traits are the fingerprint, face, iris, signature, gait, etc. [4]. These traits can be faked, so one needs to find new characteristics that are difficult to spoof, such as physiological signals (e.g., the ECG). The use of the ECG is not yet widespread in biometrics. The ECG has © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 416–425, 2021. https://doi.org/10.1007/978-3-030-49342-4_40
characteristics specific to each person [5]. Several works have proposed biometric identification based on the ECG [6–13]. Biel et al. [6] used a 12-lead database. They applied SIMCA (Soft Independent Modeling of Class Analogy, which finds similarities between test objects and classes and relies on the PCA model) to classify individuals using analytic characteristics [6]. This system obtains an identification rate equal to 98%. Wang et al. [11] used the PTB database (only 13 normal subjects). They tested the following classifiers: LDA, Nearest Center (NC), and KNN, using analytic and appearance characteristics, and found an identification rate of 100%. In 2010, S. Chantaf et al. [9] proposed a wavelet network method and used a neural network (RBF) for classification, which gives a performance rate of 80% (when the number of beats equals 100) and 92% (when the number of beats equals 120). Islam et al. [8] used two databases for identification: the MIT-BIH Arrhythmia database (only 26 subjects) and the PTBDB database (76 subjects). They used the HBS method and the ROC curve for classification, and found rates equal to 99.85% for the MIT-BIH database and 97.93% for the PTB database. In 2011 [12], the wavelet coefficient method was used and a distance measure between the coefficients was applied; a rate equal to 100% was found for the MIT-BIH Arrhythmia database (only 21 subjects). Joao Ribeiro Pinto et al. proposed DCT and Haar transform methods and evaluated 4 classification methods: SVM, KNN, MLP, and Gaussian mixture models (UBM). They obtained a recognition rate equal to 94.9% with the SVM method. In 2013 [10], a combination of two feature extraction techniques, analytic parameters and HPE coefficients, was used; the authors worked on the SNR database (18 normal subjects) and applied an HMM model.
The system gives a rate equal to 99%. Kiran Kumar Patro and P. Rajesh Kumar [13] used analytic parameters with 3 classification methods: ANN, KNN, and SVM. They worked on two databases, the MIT-BIH Arrhythmia database and the ECG IDDB database, and obtained 93.7% with the SVM method. These works have used methods that are well recognized in this field, applied to subsets of the records; we therefore have to search for new methods and apply them to the totality of the databases. In this paper, we propose a new approach that identifies individuals by the ECG. Our system has two phases, namely the extraction of characteristics and the classification of individuals. In the first phase, we used three methods: the MFCC coefficients, the ZCR, and the entropy. In order to improve the identification rate, we combined all of these features in one input vector. We chose the MFCC coefficients, ZCR, and entropy because these methods are widely used in speech recognition, where they give good recognition rates, which is why we adopted them in our study. In the second phase, we used Support Vector Machines (SVM) and K-Nearest Neighbors (KNN). Our system is evaluated on the MIT-BIH Arrhythmia database [14] and the SNR database.
The remainder of this document is organized as follows: Sect. 2 describes the architecture of our system. Section 3 presents the preprocessing method. The extracted features are presented in Sect. 4. In Sect. 5, we present the classifiers used for identification. Section 6 presents the experimental results obtained with the different classification methods, along with some comparative studies. Finally, a conclusion is presented in Sect. 7.
2 Architecture of the Proposed Identification System Our system has three stages, as shown in Fig. 1, which consist of preprocessing, feature extraction, and classification.
Fig. 1. Block diagram of the algorithm adopted
We tested our system with an input vector formed by the combination of three methods, namely the MFCC coefficients, ZCR, and entropy. This combination improves the recognition performance. Subsequently, we used the SVM and KNN methods to perform the classification of individuals.
3 Preprocessing The ECG is acquired from electrodes placed on the limbs. The preprocessing phase is necessary to eliminate noise; it has three steps, namely filtering, scaling, and segmentation. First, a band-pass filter (2–50 Hz) is applied to each ECG record to remove the baseline wander and the power line interference [15, 16].
• Detection of R Peaks
To detect the R peaks, we chose the algorithm proposed by Pan and Tompkins [17]. After baseline correction, the R peaks are detected by adaptive thresholding. The goal of R peak detection is to locate the timing position of all true positive R peaks while eliminating false positives. Locating an R peak requires determining both its amplitude and its instant of appearance.
• Segmentation of the ECG Signal
Segmentation is performed to ensure a good average value of the ECG signals. We cut each ECG signal into multiple segments, each containing two R peaks. The segments obtained are not all the same size, so we normalize them to a common size [18]. Normalization is as follows: the new segment size is the largest segment size [18]. That is, to each segment i we add N_m − N_i samples (except the segment of size N_m itself), so that every segment has the new size N_m, where N_i is the size of segment i of the ECG signal:

N_m = max_i [N_i]   (1)
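The two steps above can be sketched as follows. This is a deliberately simplified stand-in for the Pan-Tompkins detector (plain adaptive amplitude thresholding with a refractory window rather than the full band-pass/derivative/integration chain), and the threshold values are illustrative assumptions, not the paper's:

```python
def detect_r_peaks(ecg, threshold_ratio=0.6, refractory=5):
    """Locate R peaks as local maxima above an adaptive amplitude threshold.

    A crude stand-in for Pan-Tompkins: the threshold adapts to the global
    maximum, and a refractory window suppresses double detections.
    """
    threshold = threshold_ratio * max(ecg)
    peaks, last = [], -refractory
    for n in range(1, len(ecg) - 1):
        if (ecg[n] >= threshold and ecg[n] >= ecg[n - 1]
                and ecg[n] > ecg[n + 1] and n - last >= refractory):
            peaks.append(n)
            last = n
    return peaks


def segment_and_pad(ecg, peaks):
    """Cut segments spanning two consecutive R peaks, then zero-pad each
    segment to the largest segment size N_m (Eq. 1)."""
    segments = [ecg[peaks[i]:peaks[i + 1] + 1] for i in range(len(peaks) - 1)]
    n_m = max(len(s) for s in segments)
    return [s + [0.0] * (n_m - len(s)) for s in segments]
```

Padding with appended samples matches the normalization rule above: segment i receives N_m − N_i extra samples.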
4 Feature Extraction
Feature extraction is an important step before classification. The literature presents various parameters used for classification. In our study, we used a fusion of three feature types: MFCC coefficients, ZCR, and entropy. Integrating these parameters improves the identification rate.
• MFCC
To compute the Mel-Frequency Cepstral Coefficients (MFCCs), the Inverse Fast Fourier Transform (IFFT) is applied to the logarithm of the magnitude of the Fast Fourier Transform (FFT) of the signal, filtered according to the Mel scale [19] (Fig. 2).
• ZCR
The Zero Crossing Rate (ZCR) is an interesting feature that has been used in many speech recognition systems; it measures how many times a signal crosses zero per unit of time [20]. Its most attractive property is that it can be computed in real time, even without analog-to-digital conversion, since it only needs to detect sign changes of the voltage [20].
S. Hamza and Y. B. Ayed
Fig. 2. Steps for calculating MFCC coefficients
It is defined by the following equation [21]:

ZCR = (1 / (N − 1)) · Σ_{n=1}^{N−1} sign(s(n) s(n − 1))   (2)

where N is the number of samples of the signal s.
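Equation (2) measures sign changes between consecutive samples. The sketch below implements the usual counting form, the fraction of consecutive sample pairs with a negative product, which is one common reading of the formula (illustrative only, not the authors' code):

```python
def zcr(s):
    """Zero Crossing Rate: fraction of consecutive sample pairs whose
    product is negative, i.e. where the signal changes sign (cf. Eq. 2)."""
    n = len(s)
    crossings = sum(1 for i in range(1, n) if s[i] * s[i - 1] < 0)
    return crossings / (n - 1)
```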
• Entropy
Entropy was introduced by Claude Shannon to measure the amount of information contained in a random signal [22], and its use is widespread in information theory. It is defined as follows [23]:

H(x) = − Σ_k P(x_k) × log2[P(x_k)]   (3)
where x = {x_k}, 0 ≤ k ≤ N − 1, is a discrete random variable (time, frequency, or other), and P(x_k) is the probability of the state x_k. After the preprocessing phase, we extracted the features: for each ECG segment we computed the 12 MFCC coefficients, then the ZCR value, and finally the entropy value (see Fig. 3). These features are then combined into a single vector, so the descriptor of each segment contains 12 MFCC coefficients, one ZCR value, and one entropy value.
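Equation (3) can be estimated from a histogram of segment amplitudes. The 10-bin histogram and the descriptor-assembly helper below are illustrative assumptions, not the authors' code:

```python
import math
from collections import Counter


def shannon_entropy(samples, bins=10):
    """Estimate H(x) = -sum_k P(x_k) log2 P(x_k) (Eq. 3) from a
    histogram of the sample amplitudes."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0          # avoid zero width for flat signals
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def segment_descriptor(mfcc_12, zcr_value, entropy_value):
    """Fuse the features into the 14-dimensional per-segment vector."""
    return list(mfcc_12) + [zcr_value, entropy_value]
```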
5 Classification
We used two classifiers, SVM and KNN.
• SVM
This is a linear classification method introduced by Vladimir Vapnik [24]. It relies on the existence of a linear classifier in a suitable space. For two classes of given examples, the goal of the large-margin separator (SVM) is to find a classifier that separates the data while maximizing the distance between the two classes. It is based on a kernel function, which allows an optimal separation of the data. Among
Fig. 3. Feature extraction phase
the most commonly used kernel functions are the linear, polynomial, and RBF kernels. Using this classification method therefore essentially consists of selecting a good kernel function and adjusting its parameters to obtain a maximum identification rate. We use the SVM with these three kernel functions:

• Linear: K(x_i, x_j) = x_i^T x_j
• Polynomial: K(x_i, x_j) = (γ x_i^T x_j + r)^d, γ > 0
• RBF: K(x_i, x_j) = exp(−γ ||x_i − x_j||²), γ > 0

where d is the degree of the polynomial, r is a weighting parameter (used to control weights), and γ is the kernel flexibility control parameter. The various SVM parameters are then adjusted empirically: for each kernel type we vary the user-chosen values γ, r, d, and C in order to find the kernel parameters best suited to our task. The data are represented by S = {(x_i, y_i)}, with x_i ∈ R^n, i = 1, …, m, and y_i ∈ {1, …, k}, where k is the number of classes. Support Vector Machines (SVM) constitute a very powerful technique for pattern classification problems, but their efficiency in practice depends highly on the choice of kernel function and parameter values; selecting relevant features is another factor that can impact SVM performance. The principle of SVMs is the construction of the optimal hyperplane that best separates the training data projected into the feature space by a kernel function K. To obtain this hyperplane, we solve the problem of maximizing the margin 2/||ω||, which is equivalent to minimizing ||ω||². The value of the margin thus plays a crucial role in SVMs.
• KNN
The KNN algorithm is one of the simplest classification algorithms [25]. Even with such simplicity, it can give very competitive results. The only difference from the methodology
discussed will be to use the averages of the nearest neighbors rather than voting for the nearest neighbors. K is a user-defined constant: the number of nearest neighbors whose votes we wish to count.
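A minimal version of the voting scheme described above (an illustrative sketch, not the implementation used in the experiments):

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    descriptors (Euclidean distance in feature space)."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```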
6 Experiments and Results
We used two databases, MIT-BIH Arrhythmia and SNR.
• MIT-BIH Arrhythmia Database
This database¹ contains 48 ECG recordings; among them, two recordings (201 and 202) come from the same subject, so the database contains 47 classes (individuals). The duration of each recording is 15 s. The subjects were 25 men aged 32 to 89 years and 22 women aged 23 to 89 years.
• SNR: Normal Sinus Rhythm Database (nsrdb)
This database² includes two types of ECG (ECG1 and ECG2), each with 18 recordings of 10 s. The subjects included in this database showed no significant arrhythmia; they comprise 5 men aged 26 to 45 and 13 women aged 20 to 50.
In this work, we compare our results with those of Zied et al. [26]; for this, we apply our new feature extraction methods to both the MIT-BIH and SNR databases.
– For the MIT-BIH database: each individual has 10 samples, and each sample contains two R peaks.
– For the SNR database: each individual has 9 samples, each also containing two R peaks.
We used the SVM and KNN classifiers for human identification, dividing each individual's samples into two parts: 70% for learning and 30% for testing. After the preprocessing and feature extraction phases, we turn to the learning phase, in which we build the model that predicts the class of an individual. Support Vector Machines (SVM) is a discriminative learning method that has given good performance in several applications. We started with the MFCC parameters, since they are widely used in the speech domain; our system uses the combination of three parameter types: the 12 MFCC coefficients, the ZCR, and the entropy.
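The 70%/30% division described above is done per individual, so that every class appears in both sets. A sketch (the rounding convention below is an assumption for illustration):

```python
def split_per_subject(samples_by_subject, train_frac=0.7):
    """Split each subject's segments into train/test sets so every class
    appears in both (70%/30% as used in the experiments above)."""
    train, test = {}, {}
    for subject, samples in samples_by_subject.items():
        cut = round(train_frac * len(samples))
        train[subject] = samples[:cut]
        test[subject] = samples[cut:]
    return train, test
```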
Then, we apply the KNN method and the SVM technique (with the different kernel functions: linear, polynomial, and RBF) in order to compare with other works that used the same databases (MIT-BIH, SNR).
¹ https://physionet.org/physiobank/database/mitdb/.
² http://www.physionet.org/physiobank/database/nsrdb.
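The three kernel functions just mentioned can be written out directly. The sketch below is illustrative pure Python that makes the roles of γ, r, and d explicit; in practice a library implementation (e.g. scikit-learn's SVC, an assumption on our part) would be used, with the grid over C, γ, r, and d searched empirically as described below:

```python
import math


def linear_kernel(x, z):
    # K(x, z) = x^T z
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, gamma=1.0, r=1.0, d=2):
    # K(x, z) = (gamma * x^T z + r)^d, gamma > 0
    return (gamma * linear_kernel(x, z) + r) ** d


def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma * ||x - z||^2), gamma > 0
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```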
We began by tuning each kernel's parameters, first on the MIT-BIH Arrhythmia database and then on the SNR database. For MIT-BIH, with the linear kernel we varied the regularization parameter C and obtained the best identification rate, 100%, at C = 0.01. For the polynomial kernel, varying C alone gave a best rate of 99.3% at C = 0.001; fixing C = 0.001 and varying r, we reached 100% at the best value r* = 1000; with C and r fixed, varying the parameter γ gave the best rate, 100%, at γ = 0.0001. The best result for the RBF kernel, 100%, was obtained with C = 10 and γ = 0.0005. The same procedure was followed for the SNR database. For the KNN, we varied the parameter K to find the optimal value K* giving the maximum identification rate. Table 1 summarizes the best recognition rates found for the two databases using the combination of the three feature types (MFCC, ZCR, Entropy) and the different systems used.

Table 1. The best recognition rates found for both databases

Feature               MIT-BIH         SNR
                      KNN     SVM     KNN     SVM
[MFCC+ZCR+Entropy]    93%     100%    100%    100%
We thus obtain an identification rate of 100% for the combination of the three methods using the SVM model on the entire MIT-BIH Arrhythmia database (47 individuals). This rate is higher than the 98.8% reported by Zied et al. [26], who used analytical characteristics with SVM modeling on 44 individuals of the MIT-BIH Arrhythmia database; when restricted to those 44 individuals, our KNN also reaches 100%, again higher than the rate of Zied et al. [26]. In addition, applying the KNN algorithm to the SNR database gives a rate of 100%. These rates are better than the roughly 99% found by Zied et al. [10], who used morphological descriptors and HPE coefficients with HMM modeling on the SNR database. Table 2 compares our approach with similar studies.
Table 2. Summary of different works on similar studies

Author                Feature                                     Classification   Database   Identification rate
Zied et al. [10]      Morphological descriptors, HPE              HMM              SNR        99%
Zied et al. [26]      Analytical, morphological characteristics   SVM              MIT-BIH    98.8%
                                                                  SVM              SNR        99.38%
K. Patro et al. [13]  Temporal characteristics                    KNN              MIT-BIH    92.72%
                                                                  SVM              MIT-BIH    93.7%
Proposed approach     MFCC, ZCR, Entropy                          KNN              MIT-BIH    93%
                                                                  KNN              SNR        100%
                                                                  SVM              MIT-BIH    100%
                                                                  SVM              SNR        100%
7 Conclusion
In this paper, we presented the performance of several proposed systems based on the fusion of three feature types (MFCC, ZCR, Entropy) for identifying individuals. We applied the KNN for the classification of individuals, and we applied the SVM model with its different kernel types in order to find the best results. The results show the effectiveness of these systems, and we show that integrating these parameters improves the identification rate. The results obtained on the MIT-BIH database are better than those of Zied et al. [26] and of K. Patro et al. [13], and the results obtained on the SNR database are better than those of Zied et al. [10]. In future work, we plan to use other types of features, to apply our system to larger and more general databases, and finally to act at the level of the classification methods by using deep learning, which has shown its performance in several domains.
References
1. Jain, A.K., Ross, A., Prabhakar, S.: An introduction to biometric recognition. IEEE Trans. Circuits Syst. Video Technol. 14(1), 4–20 (2004)
2. Jolicoeur, P.: Introduction à la biométrie. Décarie (1991)
3. Kim, J.S., Pan, S.B.: A study on EMG-based biometrics. J. Internet Serv. Inf. Secur. (JISIS) 7(2), 19–31 (2017)
4. Toufik, H.: Reconnaissance biométrique multimodale basée sur la fusion en score de deux modalités biométriques: l'empreinte digitale et la signature manuscrite cursive en ligne. Ph.D. thesis, Université Badji Mokhtar-Annaba (2016)
5. Chang, C.: Authentification biométrique par dynamique de frappe pour évaluation à distance utilisant SVM à une classe (2016)
6. Biel, L., et al.: ECG analysis: a new approach in human identification. IEEE Trans. Instrum. Meas. 50(3), 808–812 (2001)
7. Plataniotis, K.N., et al.: ECG biometric recognition without fiducial detection. In: 2006 Biometrics Symposium: Special Session on Research at the Biometric Consortium Conference, pp. 1–6. IEEE (2006)
8. Islam, M.S., et al.: HBS: a novel biometric feature based on heartbeat morphology. IEEE Trans. Inf. Technol. Biomed. 16(3), 445–453 (2012)
9. Chantaf, S., et al.: ECG modelling using wavelet networks: application to biometrics. Int. J. Biom. 2(3), 236–249 (2010)
10. Zied, L., et al.: Biometric personal identification system using the ECG signal. In: Computing in Cardiology Conference (CinC), 2013, pp. 507–510. IEEE (2013)
11. Wang, Y., et al.: Integrating analytic and appearance attributes for human identification from ECG signals. In: 2006 Biometrics Symposium: Special Session on Research at the Biometric Consortium Conference, pp. 1–6. IEEE (2006)
12. Naraghi, M.E., Shamsollahi, M.B.: ECG based human identification using wavelet distance measurement. In: 2011 4th International Conference on Biomedical Engineering and Informatics (BMEI), vol. 2, pp. 717–720. IEEE (2011)
13. Patro, K.K., Kumar, P.R.: Machine learning classification approaches for biometric recognition system using ECG signals. J. Eng. Sci. Technol. Rev. 10(6), 1–8 (2017)
14. Thomas, K.P., Vinod, A.P., et al.: EEG-based biometric authentication using self-referential visual stimuli. In: 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 3048–3053. IEEE (2017)
15. Tan, R., Perkowski, M.: Toward improving electrocardiogram (ECG) biometric verification using mobile sensors: a two-stage classifier approach. Sensors 17(2), 410 (2017)
16. Zhang, Q., Zhou, D., Zeng, X.: HeartID: a multiresolution convolutional neural network for ECG-based biometric human identification in smart health applications. IEEE Access 5, 11805–11816 (2017)
17. Pan, J., Tompkins, W.J.: A real-time QRS detection algorithm. IEEE Trans. Biomed. Eng. 32(3), 230–236 (1985)
18. Noureddine, B., et al.: Person identification system based on electrocardiogram signal using LabVIEW. Int. J. Comput. Sci. Eng. 4(6), 974 (2012)
19. Yassine, B.A.: Détection de mots clés dans un flux de parole. Ph.D. thesis, Télécom ParisTech (2003)
20. Lee, S., Wang, J., Chen, M.: Threshold-based noise detection and reduction for automatic speech recognition system in human-robot interactions. Sensors 18(7), 2068 (2018)
21. Bachu, R.G., Kopparthi, S., Adapa, B., Barkana, B.D.: Separation of voiced and unvoiced using zero crossing rate and energy of the speech signal. In: American Society for Engineering Education (ASEE) Zone Conference Proceedings, pp. 1–7 (2008)
22. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423 (1948)
23. Laleye, F.A.A.: Contributions à l'étude et à la reconnaissance automatique de la parole en fongbe. Université du Littoral Côte d'Opale (2016)
24. Hasan, M., Boris, F.: SVM: Machines à vecteurs de support ou séparateurs à vastes marges. Rapport technique, Versailles St Quentin, France. Cité, p. 64 (2006)
25. Amina, A., Mouhamed, A., Morad, C.: Identification des personnes par système multimodal
26. Dhouha, R., Zied, L.: ECG biometric recognition using SVM-based approach. IEEJ Trans. Electr. Electron. Eng. 11, S94–S100 (2016)
Bayesian Anomaly Detection and Classification for Noisy Data

Ethan Roberts1,2(B), Bruce A. Bassett1,2,3,4, and Michelle Lochner2,3

1 University of Cape Town, Rondebosch, Cape Town 7700, South Africa
[email protected]
2 African Institute of Mathematical Sciences, Muizenberg, Cape Town 7950, South Africa
3 South African Radio Astronomical Observatory, Observatory, Cape Town 7295, South Africa
4 South African Astronomical Observatory, Observatory, Cape Town 7295, South Africa
https://github.com/ethyroberts/BADAC
Abstract. Statistical uncertainties are rarely incorporated into machine learning algorithms, especially for anomaly detection. Here we present the Bayesian Anomaly Detection And Classification (BADAC) formalism, which provides a unified statistical approach to classification and anomaly detection within a hierarchical Bayesian framework. BADAC deals with uncertainties by marginalising over the unknown, true, value of the data. Using simulated data with Gaussian noise as an example, BADAC is shown to be superior to standard algorithms in both classification and anomaly detection performance in the presence of uncertainties. Additionally, BADAC provides well-calibrated classification probabilities, valuable for use in scientific pipelines.

Keywords: Machine learning · Anomalies · Classification · Novelty · Bayesian · Unsupervised class detection

1 Introduction
In any fully rigorous or scientific analysis, uncertainties must be quantified and propagated through the entire analysis pipeline. This is difficult to do with traditional machine learning algorithms that do not explicitly take into account uncertainties on the data or features. As machine learning is increasingly given authority to make important and high-risk decisions (e.g. in self-driving cars), and with the potential for adversarial attacks [1], there is an increasing need for interpretable models and rigorous statistical uncertainties on machine learning predictions.
An algorithm that automatically outputs unbiased, accurate probabilities is particularly desirable in the physical sciences: knowing the probabilities of an object belonging to various classes is typically more useful than the class label alone, because they are often used as inputs to another algorithm in a complex pipeline. This preference is especially telling when the true class labels of the training data are noisy or subjective, or when the training data are not representative of the test set. An example in astronomy is the photometric classification of type Ia supernovae, which are subsequently used for studies of dark energy: hard label classification leads to contamination from non-Ia supernovae and hence to biases in dark energy properties, while fully propagating class probabilities instead allows for unbiased results at the end of the pipeline [2,3]. In this context Bayesian methods are ideal [4], as they have been proven optimal for classification under certain loss metrics, e.g. [5], and allow the option of both supervised and unsupervised classification [6]. A common limitation in the classification of noisy data, however, is that the classes in the training data are typically represented by a single template with zero variability (e.g. [7]). This allows straightforward Bayesian methods to be used, but does not apply if there is significant intraclass variability. Ignoring this intraclass variability also makes principled anomaly detection challenging: how unlikely is an example if one doesn't know the underlying distribution within a class? Examples of recent work in this area include [8] and [9]. Here we address these limitations, constructing what we will argue is a natural, statistically robust supervised Bayesian method that can simultaneously be used for both anomaly detection and classification in the presence of measurement uncertainties on all data. Our method works directly with raw data, requiring no feature extraction, and requires minimal assumptions about the nature of the anomalies or classes. A longer version of this paper is available in preprint (see https://arxiv.org/abs/1902.08627).

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 426–435, 2021. https://doi.org/10.1007/978-3-030-49342-4_41
2 Formalism
Bayesian Anomaly Detection and Classification (BADAC) is a hierarchical formalism for both classification and anomaly detection in the presence of measurement uncertainty on all data. We make use of language common to machine learning by referring to training data (data for which we have a class label) and test data (unlabelled data we wish to classify). Additionally, we refer to features for the data of a specific instance of a class. We start by assuming we have a training dataset consisting of multiple classes, τ, with each class having a subset of training data {y_o}_τ with entries (y_o,j^i)_τ and associated uncertainties on the features. The o subscript denotes an observed variable, an important distinction to note when we later introduce latent variables. Here i indexes the instances/examples in a class, while j indexes the specific features within an instance. We now wish to classify test data {d} given the training data {y_o}_τ. Bayes' theorem gives this posterior probability as:

P(τ | {d}, {y_o}_τ) ∝ P({d}, {y_o}_τ | τ) P(τ)   (1)
where P (τ ) is the prior probability of belonging to class τ . Here the prior term is a constant for each class, and we drop it for notational simplicity. From here
we compute the likelihood term P({d}, {y_o}_τ | τ). We ignore the evidence term for now, but will later show that it is straightforward to calculate. Let us now assume the data of each class can be modelled by F_τ(θ_τ), where θ_τ is in general a set of hyperparameters for class τ. Let us also assume that the measurement uncertainty associated with the observed data y_o is encapsulated by a mean-zero probability distribution function with parameters Σ (e.g. the covariance in the case of a Gaussian). Each instance in our observed data is thus generated by the model {y_o^i}_τ = F_τ(θ_τ^i) + ε^i, where ε^i is a realisation of the noise controlled by Σ. The distribution of the hyperparameters θ_τ^i describes the intraclass variability of class τ, while Σ controls the quality of the measurements. Since we won't know F_τ and θ_τ in general, or they may be impractical to compute, we can model the observed data as:

{y_o^i}_τ = {y_t^i}_τ + ε^i   (2)

where {y_t^i}_τ is the set of latent variables giving the "true" but unknown values of the training data {y_o^i}_τ.¹ In general, the likelihood term P({d}, {y_o}_τ | τ) in Eq. 1 cannot be evaluated since both {d} and {y_o}_τ have associated measurement uncertainty. However, if we assume our data have a known uncertainty distribution, then we can marginalise over the uncertainty on the training data; in some special cases, such as the Gaussian case discussed below, this can be done analytically (as shown in Eq. 6). Using our latent variables {y_t}_τ, the likelihood in Eq. 1 can be written as the multidimensional marginalisation:

P({d}, {y_o}_τ | τ) = ∫ d{y_t}_τ P({d}, {y_o}_τ, {y_t}_τ | τ)   (3)

If we assume {d} and {y_o}_τ are statistically independent of one another, this simplifies to:

P({d}, {y_o}_τ | τ) = ∫ d{y_t}_τ P({d} | {y_t}_τ, τ) P({y_o}_τ | {y_t}_τ, τ) P({y_t}_τ | τ)   (4)

The likelihood for a new test instance {d} belonging to class τ, assuming the instances in the training data are uncorrelated, is then given by:

P({d}, {y_o}_τ | τ) = (1/n) Σ_{i=1}^{n} ∫ d{y_t^i}_τ P({d} | {y_t^i}_τ, τ) P({y_o^i}_τ | {y_t^i}_τ, τ) P({y_t^i}_τ | τ)   (5)

¹ It is worth noting that this method of modelling the uncertainty fails when the measurement errors are zero, since the probability of classifying data {d} into a known class vanishes almost everywhere. In this case one could account for intraclass variability by modelling F_τ and θ_τ^i explicitly.
Here P({d} | {y_t}_τ, τ) is the likelihood of observing the data {d}, conditioned on both the class type τ and the unknown true values of the training data, and P({y_t}_τ | τ) is the prior on the true values {y_t}_τ given the class τ. Because of the uncertainties in the training data, the classification of even a single scalar data point requires an n-dimensional integral over the n instances in the training data of each class τ.² We now focus on the case where this integral can be solved analytically, which fortunately corresponds to many datasets in the physical sciences.
Fig. 1. Schematic representation of BADAC as a classifier. Left: a single test example consisting of just two data points (black triangles with error bars). The training data comes from two classes shown schematically as the blue (τ = 0) and orange (τ = 1) 1-σ error envelopes. Which of these two classes does the test data come from? Middle and Right: panels showing the unnormalised posterior probability for the true value, yti , for the first (middle panel) and second (right panel) data point, marginalised over the true value of the other point and conditioned on belonging to either class (class 0 - blue or class 1 - orange). The relative area of the corresponding Gaussians in the middle and right panels gives the probability for the data to belong to either class. As can be seen, the data is more likely to come from class 1 (the orange class), in this case with a probability of 73%.
The posterior can be analytically evaluated in the special case of uncorrelated Gaussian-distributed test and training data, and for (improper) flat priors on {y_t}_τ. The two terms that then make up the likelihood are:

P({d} | {y_t}_τ, τ) = (1/√(2πσ_d²)) exp[ −(1/2) (({d} − {y_t}_τ)/σ_d)² ]

and

P({y_o}_τ | {y_t}_τ, τ) = (1/√(2πσ_y²)) exp[ −(1/2) (({y_o}_τ − {y_t}_τ)/σ_y)² ],

² Strictly speaking we should write n_τ, since the number of samples in each class will be different, but we suppress this to keep the notation relatively simple.
where σ_d is the measurement uncertainty on d, and σ_y is the uncertainty on y_o. We can solve Eq. 5 analytically, giving:

P({d}, {y_o}_τ | τ) = (1/n) Σ_{i=1}^{n} Π_{j=1}^{m} (2π σ_d σ_y^i)^{−1} [ π / ( (1/2)(Γ_d + Γ_y^i) ) ]^{1/2}
  × exp( −(1/2) [ Γ_d {d}² + Γ_y^i {y_o^i}_τ² − (Γ_d {d} + Γ_y^i {y_o^i}_τ)² / (Γ_d + Γ_y^i) ] )   (6)

where Γ_d ≡ σ_d^{−2} and Γ_y ≡ σ_y^{−2} are the precisions of the data, n is the total number of training instances, and m is the number of data points per instance. Figure 1 demonstrates using BADAC for classification. We use Eq. 6 in our experiments in Sect. 4 to evaluate BADAC for uncorrelated Gaussian noise.
In order to simultaneously perform anomaly detection and to normalise the posterior probabilities in Eq. 6, we compute the Bayesian evidence for K known classes, P({d^i}, {y_o}_{τ∈K}), over the entire training data {y_o}_{τ∈K} and for each test data instance i, giving:

P({d^i}, {y_o}_{τ∈K}) = Σ_{k}^{K} P({d^i}, {y_o}_τ | τ_k) P(τ_k)   (7)

where the likelihood is given by Eq. 6. We use the evidence in Eq. 7 as our anomaly score: lower evidence values imply a data instance is more anomalous than test instances with higher evidence for one of the known classes. If one has some prior knowledge of the anomalies, a better alternative is to create a (K+1)-th class with no training data but with a prior P(τ_{K+1}) that encodes this knowledge. This is, however, more sensitive to model misspecification: for example, using an anomaly prior performs worse when the noise is assumed uncorrelated Gaussian but is actually either correlated or non-Gaussian. We therefore report our anomaly results using Eq. 7 to rank instances.
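For scalar features with uncorrelated Gaussian noise, Eqs. (6) and (7) translate directly into code. The following is an illustrative pure-Python sketch (not the released BADAC package), with uniform class priors assumed by default:

```python
import math


def class_likelihood(d, sigma_d, train, sigma_y):
    """Eq. (6): likelihood of test instance d (list of m data points with
    noise sigma_d) under a class with n training instances (noise sigma_y)."""
    g_d, g_y = sigma_d ** -2, sigma_y ** -2          # precisions
    total = 0.0
    for y in train:                                   # sum over instances i
        prod = 1.0
        for dj, yj in zip(d, y):                      # product over points j
            prefac = (1.0 / (2 * math.pi * sigma_d * sigma_y)) * math.sqrt(
                math.pi / (0.5 * (g_d + g_y)))
            expo = -0.5 * (g_d * dj ** 2 + g_y * yj ** 2
                           - (g_d * dj + g_y * yj) ** 2 / (g_d + g_y))
            prod *= prefac * math.exp(expo)
        total += prod
    return total / len(train)


def class_posteriors(d, sigma_d, classes, sigma_y, priors=None):
    """Eq. (7): normalise per-class likelihoods by the Bayesian evidence.
    A low evidence value flags the instance as an anomaly candidate."""
    priors = priors or {k: 1.0 / len(classes) for k in classes}
    like = {k: class_likelihood(d, sigma_d, v, sigma_y) * priors[k]
            for k, v in classes.items()}
    evidence = sum(like.values())
    return {k: v / evidence for k, v in like.items()}, evidence
```

For a single training instance and data point, Eq. (6) reduces to a Gaussian in d − y_o with variance σ_d² + σ_y², which provides a useful sanity check on the implementation.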
3 Rank-Weighted Score
Here we introduce the Rank-Weighted Score (RWS) as a new anomaly detection performance metric. In addition to being insensitive to class imbalance, this metric is sensitive to the relative ranking of anomalous objects. RWS uses the ranking of the N most anomalous objects, ordered by degree of anomalousness (from high to low) as identified by any algorithm; here N is a user-supplied estimate of the expected number of anomalies. The RWS score is then computed as the weighted sum:

S_RWS = (1/S_0) Σ_{i=1}^{N} w_i I_i   (8)

where the weights are:

w_i = N + 1 − i   (9)
Note that this gives linearly more weight to correctly identifying anomalies at the top of the ranking (low values of i) than lower down the list. In Eq. 8, S_0 = N(N + 1)/2 and I_i is an indicator variable: I_i = 1 if object i is an anomaly, and vanishes otherwise. The RWS score therefore has range [0, 1], where 0 means that no true outliers were found among the N most anomalous objects ranked by the algorithm, and 1 means that all N of the most anomalous objects identified were in fact true anomalies. The value of N must be chosen on a per-problem basis and kept consistent across the algorithms being compared. In Sect. 4 we use this metric along with several other commonly used metrics to gauge algorithm performance.
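Equations (8) and (9) translate directly into code (a sketch under the paper's definitions):

```python
def rank_weighted_score(ranked_is_anomaly):
    """Rank-Weighted Score (Eq. 8). `ranked_is_anomaly` lists, for the N
    most anomalous objects in descending anomaly order, whether each one
    is a true anomaly. Weights w_i = N + 1 - i (Eq. 9), S_0 = N(N+1)/2."""
    n = len(ranked_is_anomaly)
    s0 = n * (n + 1) / 2
    return sum((n + 1 - i) * int(flag)
               for i, flag in enumerate(ranked_is_anomaly, start=1)) / s0
```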
4 Results
To illustrate and test the performance of BADAC, we simulate a one-dimensional dataset and evaluate algorithm performance using a number of metrics. For classification, we use the average accuracy only. For anomaly detection, we consider the Area Under the Curve (AUC), the Matthews Correlation Coefficient (MCC) [10], and the Rank-Weighted Score (RWS) introduced in Sect. 3.

4.1 Simulations
We simulate data for two "normal" classes from a sine and a quadratic function, and build three anomaly classes from a top hat, an exponential, and a sum of sine waves respectively. Each class has hyperparameters (such as the frequency of the sine wave) drawn randomly from various Gaussian distributions. For each experiment, we generate 15000 curves, with roughly equal numbers of objects from classes 0 and 1, as training data. To the test data we add 1% anomalous instances from the three anomaly classes. We add Gaussian noise to the data, the standard deviation of which depends on the class.

4.2 Comparison of Algorithm Performance
We assess the performance of our algorithm on the simulated data discussed in Sect. 4.1, comparing it to a series of benchmark algorithms: IsolationForest [11] and Local Outlier Factor (LOF) [12] for anomaly detection, and random forests [13] for classification. We use the sklearn [14] implementations of all benchmark algorithms compared against BADAC. For anomaly detection, all algorithms are passed the training data and the percentage contamination of outliers. For classification with random forests, we set n_estimators=1000. We use the formalism of Sect. 2 to provide two probabilities, P0 and P1, the un-normalised probabilities of belonging to class 0 and class 1 respectively (shown in Fig. 2).
Plotting the unnormalised probabilities is useful for visualising the decision boundary that separates both the known classes and anomalies. It also does not
Fig. 2. Scatter plot showing the computed log-probabilities for the simulated data (Sect. 4.1). Each point corresponds to a test object, which is shown in the log(P0 )log(P1 ) space. Points that appear high on the y-axis have a high likelihood of being type 1. Points that appear higher (to the right) on the x-axis have a high likelihood of being type 0. The points are coloured by true type, where light blue corresponds to type 0, orange is type 1 and the dark crosses are all outliers.
require us to make any assumptions about the nature of the anomalies we expect to see. However, to make use of these probabilities in an analysis pipeline, they must be normalised, which we do by dividing by the Bayesian evidence (Eq. 7). If we bin the normalised probabilities for a single class, we can measure whether or not they are calibrated. It is a well-known problem that many machine learning algorithms give uncalibrated probabilities that do not correspond to the true probability of an object belonging to a certain class. The reliability of the probabilities can be investigated by plotting a probability calibration curve; Fig. 3 shows this for classification of type 1 objects only, comparing the results of BADAC with those of random forests. We show the ROC curves for BADAC as well as LOF and IsolationForest in Fig. 4 in order to compare anomaly detection performance. In terms of classification accuracy, BADAC (99.02%) also marginally outperformed random forests (98.66%).

4.3 Beyond the Gaussian Assumption
In our formalism we have explicitly assumed a Gaussian noise distribution, which will not always apply. In principle, if the noise distribution is known, the BADAC formalism can be extended to incorporate this knowledge. We have tested BADAC under the Gaussian assumption (Eq. 6) in the presence of non-Gaussian noise as well as correlated Gaussian noise, and find that BADAC still outperforms LOF and IsolationForest at anomaly detection, although
Bayesian Anomaly Detection and Classification
433
Fig. 3. Probability calibration curve for class 1 for BADAC and random forests (for classification only). Perfectly calibrated probabilities would lie on the line y = x. All objects within a particular probability range are binned, and the fraction of correct positive predictions plotted. The errorbars show the Poisson uncertainties given by the number of objects in each bin, and the x-coordinate for each bin is given by the mean calculated probability for that bin. Random forests gives poorly calibrated probabilities while BADAC automatically returns well-calibrated probabilities.
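A reliability diagram of the kind shown in Fig. 3 can be produced with scikit-learn's `calibration_curve`, which bins the predicted probabilities and returns the observed fraction of positives per bin. A sketch on synthetic data that is well calibrated by construction (not the paper's dataset):

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)

# Synthetic stand-in: predicted class-1 probabilities, and labels drawn
# so that an object with probability p really is positive with rate p.
y_prob = rng.uniform(0, 1, 5000)
y_true = (rng.uniform(0, 1, 5000) < y_prob).astype(int)

# Fraction of positives and mean predicted probability in each of 10 bins
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
# Perfect calibration lies on y = x, so frac_pos tracks mean_pred per bin
```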
Fig. 4. ROC curves for BADAC, LOF and IsolationForest on the dataset with uncorrelated Gaussian error, for anomaly detection. BADAC performs best under the AUC, MCC and RWS metrics shown in the legend. The best classification algorithms have a ROC curve that reaches close to the top left hand corner, with perfect performance corresponding to an AUC of one.
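The two baseline detectors compared in Fig. 4 are available in scikit-learn; a sketch on synthetic two-dimensional data (not the paper's dataset) showing how an AUC can be computed for each, with scores negated so that larger means more anomalous:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X_normal = rng.normal(0, 1, size=(500, 2))          # inliers
X_anom = rng.uniform(-6, 6, size=(50, 2))           # scattered anomalies
X = np.vstack([X_normal, X_anom])
y = np.r_[np.zeros(500), np.ones(50)]               # 1 = anomaly

# score_samples gives higher values to inliers, so negate for ROC scoring
iso = IsolationForest(random_state=0).fit(X)
auc_iso = roc_auc_score(y, -iso.score_samples(X))

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
auc_lof = roc_auc_score(y, -lof.negative_outlier_factor_)
```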
434
E. Roberts et al.
performance degrades for classification in the correlated case. In the case of compact anomalies, which could emulate noisy spikes in data, we find that BADAC’s performance is comparable to LOF and superior to IsolationForest. See our online documentation3 for more information on these tests.
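The noise variants used in these robustness tests can be emulated along the following lines; this is an illustrative sketch only (the squared-exponential covariance and Student-t noise are our own choices here, not necessarily those of the actual test setup):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100  # length of each one-dimensional object

# Correlated Gaussian noise: draw from N(0, C) with a squared-exponential
# covariance so that nearby samples are correlated.
t = np.arange(n)
C = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 5.0 ** 2)
correlated = rng.multivariate_normal(np.zeros(n), C)

# A simple non-Gaussian alternative: heavy-tailed Student-t noise,
# which produces occasional spike-like excursions.
heavy_tailed = rng.standard_t(df=3, size=n)
```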
5 Conclusions
We have presented a novel statistically robust joint anomaly detection and classification method, Bayesian Anomaly Detection And Classification (BADAC), that takes advantage of knowledge of the underlying noise distribution in the features of the data. While our formalism is general, we perform tests for the case of Gaussian uncertainties, which is most common in the physical sciences.

Using simulated one-dimensional data, we test the classification and anomaly detection capabilities of BADAC. We make use of several metrics, including our novel Rank-Weighted-Score that rewards algorithms for ranking more anomalous objects above those that have been commonly seen. We find that in the case where the correct noise model is known, BADAC outperforms random forests at classification, and both IsolationForest and local outlier factor (LOF) at anomaly detection, due to its ability to correctly exploit uncertainty information. We demonstrate how BADAC produces calibrated classification probabilities, which is crucial if a machine learning algorithm is to be incorporated into a precise, scientific analysis pipeline.

BADAC allows a number of extensions, including the ability to naturally handle missing data, either by making use of a prior based on the other training data or by interpolating with a technique like Gaussian processes. While BADAC provides excellent performance by exploiting the extra information about the underlying noise distributions, the computational limitations mean that it scales badly to large training datasets. In this case one must either use prototype templates to represent the classes (e.g. through Gaussian processes) or parameterise the data, to speed up classification and anomaly detection with BADAC.

Acknowledgements. We thank Alireza Vafaei Sadr, Martin Kunz and Boris Leistedt for discussions and comments. We acknowledge the financial assistance of the National Research Foundation (NRF).
3 https://github.com/ethyroberts/BADAC.
How to Trust the Middle Artificial Intelligence: Uncertainty Oriented Evaluation Marwa Brichni(B)
and Said El Gattoufi
Higher Institute of Management, SMART Laboratory Tunis, University of Tunis, Tunis, Tunisia [email protected]
Abstract. Recent years have seen a renewed importance attached to measuring machine intelligence and considerable interest in human-level machine intelligence. Despite this interest, to the best of our knowledge few have proposed a classification for these evaluations. Hernandez-Orallo's recent findings regarding the measurement of machine intelligence have led to the need to classify measures into two major categories: task-oriented evaluation and ability-oriented evaluation. Although this approach is interesting as a broad classification inspired by weak/strong AI, it suffers from the same discontinuity pitfall as narrow/general AI. Our work aims to further broaden current knowledge of the efforts applied to evaluating artificial intelligence. Our main contribution in this paper is to shed light on the gap left behind by the existing evaluations and to propose a middle AI evaluation based on uncertainties and expectations. Thus, we are optimistic that within the next few years an agreed-on machine intelligence measurement will become an important component in evaluating the human-level machine intelligence concept.

Keywords: Strong AI · Weak AI · Human level machine intelligence · Uncertainty · Machine intelligence · Evaluation · Middle AI

1 Introduction
Before tackling the subject of classifying machine intelligence evaluations, a few researchers have addressed the question of whether the existing measures of intelligence really quantify intelligence, or instead measure related concepts such as autonomy or performance [1]. Previous works have focused only on identifying observable intelligent behavior and evaluating the performance attributed to intelligence, as the easiest way to evaluate what seems to be intelligent behavior [2]. Some other, more advanced works have seen autonomy as the major characteristic of an intelligent entity, so
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 436–445, 2021. https://doi.org/10.1007/978-3-030-49342-4_42
Middle AI: Uncertainty-Oriented Evaluation
437
that they conducted their measurements on the autonomy detected while performing specific tasks [3]. Thus, current solutions for measuring so-called intelligent machines are unsatisfactory. Experts have used machine intelligence's characteristics separately to establish new measures for intelligence accordingly; nevertheless, these do not amount to a measure of machine intelligence. As a matter of fact, the machine intelligence concept is still not widely understood, which explains why scholars use other concepts, like performance and autonomy, to partly substitute for it. A major defect of considering machine intelligence as the result of a successful task or successful ability is that it does not take into account that the result might be driven by chance, or by dint of redundancy, and that is not intelligence. Thus, in our opinion, one of the main issues is that scientists neglect the most important keyword in the definition of intelligence [4], namely uncertainties. Experts have overlooked machine intelligence evaluation when the machine is enduring uncertainties. Hence, we have noticed that they neglected an area where intelligence stands on autonomy and performance, which refer respectively to abilities and tasks, to overcome disturbances. The most intuitive example of how uncertainty intervenes to confuse intelligence is the information loss game. This game returns the analytical intelligence capacity of the machine given obscured input, due to the uncertainties it undergoes, which corresponds to the entropy-theory process already used as a measurement approach [5]. There is also still considerable controversy surrounding splitting task-oriented from ability-oriented evaluation methods for today's machines, because they draw help from each other to exhibit a more apparent intelligent behavior. For example, Turing tests were attributed mainly to task-oriented evaluation.
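The entropy view of the information loss game [5] can be made concrete with Shannon entropy: the more an input distribution is corrupted by uncertainty, the higher its entropy and the more "obscure" the input. An illustrative sketch (not the measurement procedure of [5]):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A clean, near-deterministic input carries little uncertainty...
low = shannon_entropy([0.97, 0.01, 0.01, 0.01])
# ...while a heavily corrupted input approaches the maximum
# (2 bits for 4 equally likely symbols).
high = shannon_entropy([0.25, 0.25, 0.25, 0.25])
```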
They also took a significant place within ability-oriented evaluation by dint of the anthropocentric side that they hold [6]. We have raised disagreement regarding the use of these evaluation types interchangeably in an ambiguous way. In light of these concerns, there is now considerable interest in designing a new evaluation type for the middle AI of current inventions. We believe that today's inventions belong neither to narrow AI nor to general AI: they are more than simple task executors and less than a human being in terms of abilities. Moreover, contemporary artificial intelligence has for years been assigned to weak AI: "AI nowadays nearly always means narrow AI" [6]. Thus, our main goal is to add a new evaluation that fills the middle AI and that gathers uncertainty-oriented approaches. Concerns call into question the validity of our proposition, but since we feel supported by Iantovics [7], who encourages the presentation of new measurement methods for intelligence, and by Daugherty [8], who raises the possibility of defining an intermediate artificial intelligence type, we took as an assumption the need for a new evaluation type for the middle AI. However, there is still a need for
438
M. Brichni and S. El Gattoufi
scientific support for evaluating machines on the basis of uncertainties, in addition to performing tasks autonomously using task and ability. This paper is an overview of the most widely known evaluations of artificial intelligence. Our purpose is to establish a new evaluation type for AI systems incurring uncertainty. We also aim to highlight the applicability of Turing tests, often rejected because of the incapacity of understanding due to uncertainty [9], and now supported by recent surveys [7,10]. We believe that our proposition of the middle AI evaluation type is innovative in shedding light on uncertainties as the basic component of the new measurements. In the coming sections, we explain how the way to achieve human-level machine intelligence is perceived, discuss the evaluations proposed by Hernandez-Orallo [6], and suggest a new evaluation type to ensure the smooth transition of systems from weak AI to strong AI and establish trust from within.
2 Towards the Human Level Machine Intelligence Concept
Nowadays, introducing the trend of human-level machine intelligence (HLMI) has become the new marketing strategy used to convince us of the credibility of new intelligent machines. Furthermore, it presents the human as the quintessential example for the concrete utilization of intelligence [11]. In accordance with the fact that human-level machine intelligence is a futuristic project of strong AI, the most suitable label to attribute to today's machines is "pre human-level machine intelligence", which holds an optimistic view about the evolution from weak to strong AI and a realistic perception of its application. We should mention that the road towards HLMI was deeply discussed by Zadeh [12]; the rest of the researchers were keen on defining and measuring human-level machine intelligence while ignoring how and what comes before. We have surveyed many papers carrying the word intelligence in the computer science domain, but we noticed only a few of them distinguishing intelligent machines from non-intelligent machines [13–16]. Most existing studies are involved in instilling human intelligence into allegedly intelligent machines without a substantial analysis of what intelligence is [17–19]. Tremendous inventions characterize the evolution of this field. Machines exhibit more and more a sort of intelligent behavior that improves over time, but that has never been sufficient to claim their membership in strong AI. It has been decades since the first Turing machine stemmed the concept of narrow AI, and experts are not enthusiastic about a near-term evolution to reach strong AI. We believe that current machines no longer belong to weak AI, since they possess a sort of exhibited intelligence, but some concepts related to strong AI are pulling them towards human-level machine intelligence. This raises the following question: is nowadays' AI still weak?
It is evident that, in accordance with the strong AI principle, contemporary AI
is not yet based on general abilities; yet, given its improved performance, neither is it weak. According to Hernandez-Orallo in his recent research paper [6], strong AI evaluation will take advantage of weak AI aspects, because machine intelligence is, above all, performing tasks intelligently. Evaluations are improving continuously and there is no split between them. As proposed by [8], instead of sticking to narrow AI we need to look further, to the missing middle. Our proposition of the middle AI is endorsed by the missing middle in AI of Daugherty and Wilson. In their book [8], they insist on the roles of humans and smart machines in providing mutual help to each other as partners. In this phase of the AI lifetime, humans are needed to train, explain and sustain in order to complement machines. As for machines, they provide us with the superpower to amplify, interact and embody, using their capacities for processing and analyzing enormous amounts of data.
Fig. 1. The missing middle of artificial intelligence
We elaborated Fig. 1 to characterize each phase in the artificial intelligence lifetime, including keywords extracted from [8]. The middle AI proposition is a way to bring out the missing value of the "towards human-level machine intelligence" concept. The middle AI uses machine intelligence to draw the way toward human-level machine intelligence. The missing middle seems adequate to fill the gap regarding intelligent systems' membership in strong or weak AI. Hence, it explains the confusion of using other evaluation approaches interchangeably and allows a new type of evaluation to see the light.
3 In the Search of Middle AI Evaluation
McCarthy [20] defined artificial intelligence as the science of making intelligent machines. Tremendous measurement endeavors aimed at proving this declaration, such as [6], give the motivation to assess the trueness of artificial intelligence. In his definition of intelligence [21], Minsky related the artificial intelligence field to making machines perform tasks that would require intelligence if they were performed by a human; thus, evaluating AI in terms of machine performance success means checking whether these tasks are done as well as if they were done by a human. From here stems the concept of human-level machine
intelligence, and we can intuitively detect the utility of the Turing test for evaluating this concept, as it can be confusing to differentiate machine performance from human performance. In practice, artificial intelligence is the domain of injecting knowledge into a system so that it learns how to perform its tasks. This is a confession of the anomaly of featuring the capability of learning instead of intelligence. A legitimate question is: what is the difference between machine intelligence and machine learning in this case? This confusion of concepts requires clarification to determine the difference and the commonality that exist between them. We are digging into the comparison of soft and strong AI as a time-worn debate, as described by Hernandez-Orallo [6]. Resolving this issue would set a coherent understanding of all the concepts invented along the way of AI's existence. Despite its importance, there have been no efforts to define what is meant by helping humans versus replacing them. Overhauling AI measurements, we find no global evaluation to classify them other than Hernandez-Orallo's proposition. Moreover, as described by [6], some evaluations are outdated, not thorough enough to include all AI fields, and result-oriented rather than focused on how and what to measure. The evaluation types suggested by [6] gather performance measurements, human discrimination, problem benchmarks, and peer confrontation under task-oriented evaluation. The second type is ability-oriented evaluation, enclosing universal psychometrics and algorithmic information theory. As stated in [6], it is beneficial to have specialized systems working together with systems possessing human-like abilities to solve new problems and explore new solutions. Since weak AI involves systems that resolve specific tasks and strong AI is characterized by ability-oriented systems, to which evaluation do the compound systems belong?
Hernandez-Orallo has suggested a specific evaluation for each AI phase: a task-oriented evaluation for task-specific systems and an ability-oriented evaluation for general-purpose AI systems. It seems interesting to dig into what type of evaluation to apply to systems that are task- and ability-oriented at the same time. Ability-oriented systems are evaluated using a defined and limited set of tasks, or are even reprogrammed for a specific target, which makes them lose their general-purpose identity and become more like task-oriented systems. This statement points out the absence of evaluation methods for such systems and sheds light on the strong relationship between task- and ability-oriented systems, which complicates the separation of evaluation tools. So, is it feasible to rely only on ability-oriented evaluation, or does it draw help from task-oriented evaluation to resolve this confusion? Or should another evaluation type be considered for actual systems that do not yet implement the general-purpose characteristic, but possess some sort of ability exhibited while performing tasks? The artificial intelligence field has known many inventions, which may tell of the progress of this science, a progress maintained by dint of evaluation and measurement. We still have to dig into these evaluations, since improvement is needed to resolve the concern of using task-oriented evaluations for ability-oriented systems. As
declared by Hernandez-Orallo, this muddle might be related to not considering AI evaluation as a measurement process when it deals with complex phenomena such as intelligence.

3.1 Task-Oriented Evaluation
Given the sophistication of AI systems, testing methods such as white-box assessment have become difficult to apply because of the changing context of complex systems and the uncertainty that may occur meanwhile. Instead, the black-box assessment approach has come into broad use for such systems, since we can at least evaluate a system through its behavior. According to [6], this approach is divided into three categories: human discrimination, problem benchmarks, and peer confrontation. The human discrimination category is the concrete application of the Turing machine: an assessment made by or against a human to judge human-like behavior through simple observation. Problem benchmarks are assessments performed against a set of problems; approaches known in this category include information-driven sampling and difficulty-driven sampling. Peer confrontation is an assessment made over one or several matches to compare machines competing against each other. Overlap may be detected between these categories, as they share one target: the black-box assessment approach. The authors of [22] showed interest in renewing how task-oriented evaluations are envisioned, which can support our assumption of a new evaluation type. They also elaborated new ways of evaluating the real progress of AI, to reflect true belonging to the field instead of evaluating the efficiency of AI through performance.

3.2 Towards Ability-Oriented Evaluation
According to Hernandez-Orallo [6], we need to distinguish between AI applications and AI systems. Applications use task-oriented evaluation, while systems use ability-oriented evaluation, since they are generic but reduced to only a few tasks; hence, task-oriented evaluation is not desirable for these systems. AI systems are supposed to possess human-like abilities such as learning and reasoning, which endorses the idea of using ability-oriented techniques to evaluate them. According to [23], cognitive abilities form what is known as "general intelligence". Rajani [24] has proposed evaluating machines on the basis of human intelligence, placing machine performance on a scale relative to human performance; the added specificity is that it mixes the Turing-style test, as a task-oriented evaluation, with the assessment of artificial abilities, which introduces the possibility of performing better than all humans or most of them. The idea of evaluating this general intelligence was dealt with by Hernandez-Orallo [6]
when the question was raised concerning the feasibility of application-specific approaches to evaluating AI systems. As it was criticized, the stakes were too high for such evaluation, since intelligent systems are more and more complex, and so is their evaluation. Hernandez-Orallo [25] proposed the approach of universal psychometrics to measure the cognitive abilities of any system. There is a clear confession in [6] that evaluation metrics for intelligent systems are defined in terms of performance variables, since it is not easy to assess how intelligence is exhibited without drawing help from its concrete and measurable dimensions. The question, then, is why intelligence dimensions like autonomy and robustness are not considered during assessment. This might be the principal key of the uncertainty-oriented evaluation for assessing the middle AI. This proposition includes intelligence dimensions to improve performance measurement as a task-oriented evaluation, and pushes forward the implementation of uncertainty, emphasizing the role of cognitive abilities in dealing with it. Task-oriented evaluation has its own place within the assessment world of artificial intelligence, whereas ability-oriented evaluation is not well settled yet, since it is linked to general-purpose systems. We should improve task-oriented evaluation to take into consideration artificial systems' progress, which exceeds tasks' limits and embraces some cognitive abilities.

3.3 Uncertainty-Oriented Evaluation
As endorsed by Hernandez-Orallo in [6], "There is a hierarchical continuum between task-oriented evaluation and ability-oriented evaluation." It is thus permitted to define an intermediate evaluation relying basically on tasks and evolving to introduce ability along the way, without mistakenly considering specific-purpose or general-purpose systems as the only indicators of AI progress. The main objective of the present paper is to reveal the gap caused by the artificial intelligence evaluation milestones depicting AI progress within the AI lifetime. Several endeavors aim to organize artificial intelligence evaluations. In [26], Goertzel categorizes measures and tests as either helping to evaluate the achievement of human-level AGI, like the Turing test, or partial progress toward human-level AGI. The study conducted in [6] falls back on the postulate of the human reference to separate systems designed to help humans from those designed to replace them. The questions we raise are: towards what kind of evaluation is the middle AI oriented? Are today's inventions evaluated on the basis of tasks, of abilities, or maybe both? If so, a new evaluation type seems interesting. We depict in Fig. 2 the middle AI trust phase and the proposed uncertainty-oriented evaluation, which rests on the presence of basic tasks along with primordial abilities such as the decisional process, which relies on advanced learning techniques to deal correctly with uncertainties. The path towards human-level machine intelligence involves a bunch of concepts depending on the machine's situation. A task-oriented evaluation can be helpful when the task is defined and the result is guaranteed. When we deal with adaptive
Fig. 2. AI timeline: an uncertainty oriented evaluation for the middle AI
systems, uncertainty is unavoidable due to the changing context, which cannot be handled with the aforementioned evaluation. Neither is ability-oriented evaluation appropriate for today's intelligent machines, let alone the fact that it is not clearly defined, being a "futurist" concept. According to what we can interpret from Searle's Chinese room argument [9], the Turing test, as a task-oriented evaluation, is not considered a valid test under uncertainty. We believe that what characterizes AI the most is intelligent autonomous adaptive systems, which recalls the abridged understanding of intelligence as the appropriate action performed autonomously in an uncertain situation. Thus, we also believe that assessing intelligence has to emphasize the added value of performance and autonomy within uncertainty, and that can fall under an uncertainty-oriented evaluation. This kind of evaluation is specialized in assessing intelligence when uncertainties occur and when expectations and results are mismatched. The uncertainty-oriented evaluation focuses on the role of abilities in achieving robust task performance, taking the disturbance into account in order to correct it to match the desired results. Numerous verification and validation methods, intelligence measures, and tests for adaptive systems have used a similar approach but were not classified into any of the existing AI evaluations, such as [27–29] and many other works. According to [30], the difficulty of certifying decision-making systems arises when uncertainties are unpredictable. Thus, since performance and autonomy are considered the main key concepts defining intelligent systems, it is meaningful to use an uncertainty-oriented method for evaluation.
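As an illustration of the uncertainty-oriented idea, one could score a system by mixing its performance on clean inputs with its performance under injected disturbances, rewarding robustness. Everything in this sketch (the perturbation model, the weight `alpha`, the toy model) is a hypothetical choice of ours, not a measure proposed in the literature cited here:

```python
import numpy as np

def uncertainty_oriented_score(model, X, y, noise_scale=0.5, alpha=0.5, seed=0):
    """Toy score: weighted mix of clean accuracy and accuracy under perturbation.

    alpha controls how much robustness to uncertainty counts; both the
    Gaussian perturbation and the weighting are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    clean_acc = np.mean(model(X) == y)
    X_noisy = X + rng.normal(0, noise_scale, size=X.shape)  # injected uncertainty
    noisy_acc = np.mean(model(X_noisy) == y)
    return (1 - alpha) * clean_acc + alpha * noisy_acc

# Toy model: a threshold classifier on the first feature
model = lambda X: (X[:, 0] > 0).astype(int)
X = np.array([[-2.0, 0.0], [2.0, 0.0], [-3.0, 1.0], [3.0, 1.0]])
y = np.array([0, 1, 0, 1])
score = uncertainty_oriented_score(model, X, y)
```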
4 Conclusion
The continuous use of Turing tests as both a task-oriented and an ability-oriented evaluation opened our eyes to the significant value of the conducted work.
This paper has investigated the evaluations suggested by Hernandez-Orallo. In his most recent research, Hernandez-Orallo presents a categorization of measurements according to their membership in artificial intelligence. Our research underlined the importance of introducing a new type of evaluation to place measurement approaches that are based on a human methodology for overcoming uncertainties. This evaluation type relies on performing tasks using artificial human-like abilities in uncertain situations, which characterizes the middle AI. We found it innovative to suggest and set a separate evaluation for a new phase of artificial intelligence that is on its way to bloom, since narrow AI has lasted so long while general AI is still a future concept. Our work may have some limitations insofar as it is a theoretical suggestion. Nevertheless, we believe it could be a springboard for artificial intelligence, since it focuses on a new component other than tasks and abilities. This study has gone some way towards enhancing our understanding of where intelligent machines stand within the artificial intelligence timeline, which will facilitate the measurement process and better define the components to take into consideration for a robust quotient of intelligence. We are currently investigating a novel measurement approach that will satisfy the new evaluation type and that rests on uncertainties while performing tasks autonomously. Later, additional work needs to be elaborated to establish the validity of our suggestion and give a helping hand to the artificial intelligence community.
References
1. Gunderson, J., Gunderson, L.: Intelligence = autonomy = capability. In: Performance Metrics for Intelligent Systems, PERMIS
2. Robert, F.: A method for evaluating the "IQ" of intelligent system. In: Preliminary Proceedings of the Performance Metrics for Intelligent Systems Workshop (2000)
3. Meystel, A., et al.: Measuring performance of systems with autonomy: metrics for intelligence of constructed systems. In: White Paper for the Workshop on Performance Metrics for Intelligent System, pp. 14–16. NIST, Maryland (2000)
4. Legg, S., Hutter, M., et al.: A collection of definitions of intelligence. Front. Artif. Intell. Appl. 157, 17 (2007)
5. Saridis, G.: Definition and measurement of machine intelligence, pp. 441–452. NIST Special Publication SP (2001)
6. Hernández-Orallo, J.: Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement. Artif. Intell. Rev. 48(3), 397–447 (2017)
7. Iantovics, L.B., Gligor, A., Niazi, M.A., Biro, A.I., Szilagyi, S.M., Tokody, D.: Review of recent trends in measuring the computing systems intelligence. BRAIN Broad Res. Artif. Intell. Neurosci. 9(2), 77–94 (2018)
8. Daugherty, P.R., Wilson, H.J.: Human + Machine: Reimagining Work in the Age of AI. Harvard Business Press, New York (2018)
9. Searle, J.R.: Minds, brains, and programs. Behav. Brain Sci. 3(3), 417–424 (1980)
10. Maguire, P., Moser, P., Maguire, R.: A clarification on Turing's test and its implications for machine intelligence. In: Proceedings of the 11th International Conference on Cognitive Science, pp. 318–323 (2015)
11. Besold, T., Hernández-Orallo, J., Schmid, U.: Can machine intelligence be measured in the same way as human intelligence? KI-Künstliche Intelligenz 29(3), 291–297 (2015)
12. Zadeh, L.A.: Toward human level machine intelligence-is it achievable? The need for a paradigm shift. IEEE Comput. Intell. Mag. 3(3), 11–22 (2008)
13. Hutter, M.: One decade of universal artificial intelligence. In: Theoretical Foundations of Artificial General Intelligence, pp. 67–88. Springer (2012)
14. Jain, L.C., Quteishat, A., Lim, C.P.: Intelligent machines: an introduction. In: Innovations in Intelligent Machines-1, pp. 1–9. Springer (2007)
15. Mikolov, T., Joulin, A., Baroni, M.: A roadmap towards machine intelligence. arXiv preprint arXiv:1511.08130
16. Warwick, K., Nasuto, S.: Rational AI: what does it mean for a machine to be intelligent? IEEE Instrum. Meas. Mag. 9(6), 20–26 (2006)
17. Adams, S., Arel, I., Bach, J., Coop, R., Furlan, R., Goertzel, B., Hall, J.S., Samsonovich, A., Scheutz, M., Schlesinger, M., et al.: Mapping the landscape of human-level artificial general intelligence. AI Mag. 33(1), 25–42 (2012)
18. Powers, D.M.: Characteristics and heuristics of human intelligence. In: 2013 IEEE Symposium on Computational Intelligence for Human-like Intelligence (CIHLI), pp. 100–107. IEEE (2013)
19. Goertzel, B., Yu, G.: From here to AGI: a roadmap to the realization of human-level artificial general intelligence. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 1525–1533. IEEE (2014)
20. McCarthy, J.: From here to human-level AI. Artif. Intell. 171(18), 1174–1182 (2007)
21. Minsky, M.L.: Semantic Information Processing. The MIT Press, Cambridge (1969)
22. Marcus, G., Rossi, F., Veloso, M.: Beyond the Turing test. AI Mag. 37(1), 3–4 (2016)
23. Strickler, R.E.: Change in selected characteristics of students between ninth and twelfth grade as related to high school curriculum (1973)
24. Rajani, S.: Artificial intelligence - man or machine. Int. J. Inf. Technol. 4(1), 173–176 (2011)
25. Hernández-Orallo, J., Dowe, D.L., Hernández-Lloreda, M.V.: Universal psychometrics: measuring cognitive abilities in the machine kingdom. Cogn. Syst. Res. 27, 50–74 (2014)
26. Goertzel, B.: Artificial general intelligence: concept, state of the art, and future prospects. J. Artif. Gen. Intell. 5(1), 1–48 (2014)
27. Liu, Y., Yerramalla, S., Fuller, E., Cukic, B., Gururajan, S.: Adaptive control software: can we guarantee safety? In: Computer Software and Applications Conference, COMPSAC 2004. Proceedings of the 28th Annual International, vol. 2, pp. 100–103. IEEE (2004)
28. Cukic, B., Mladenovski, M., Fuller, E.: Stability monitoring and analysis of learning in an adaptive system. In: Proceedings of the 2005 International Conference on Dependable Systems and Networks, pp. 70–79. IEEE Computer Society (2005)
29. Yerramalla, S., Liu, Y., Fuller, E., Cukic, B., Gururajan, S.: An approach to V&V of embedded adaptive systems. In: International Workshop on Formal Approaches to Agent-Based Systems, pp. 173–188. Springer (2004)
30. Crum, V., Homan, D., Bortner, R.: Certification challenges for autonomous flight control systems. In: AIAA Guidance, Navigation, and Control Conference and Exhibit, p. 5257 (2004)
Design the HCI Interface Through Prototyping for the Telepresence Robot Empowered Smart Lab Ramona Plogmann, Qing Tan(B) , and Frédérique Pivot Faculty of Science and Technology, Athabasca University, Athabasca, Canada [email protected], [email protected], [email protected]
Abstract. Human Computer Interaction (HCI) focuses on the interfaces between human user and computer, which largely impact the usability of any computer-based system involving a human user. The research program Telepresence Robot Empowered Smart Lab (TRESL) aims to enable distance learning students to experience lab work in a remote laboratory as if they were physically present. The HCI module is a critical component, since user experience is the key measure of system usability. This paper proposes a solution for a web interface that enables online learners to employ telepresence robots as their avatars in a remote laboratory via the Internet. In this way, they can do lab work from anywhere as long as they are online. The main focus of this paper is to present the interface design through prototyping, while explaining the related parts of the TRESL solution needed to understand the design and software decisions presented. Keywords: Human Computer Interaction (HCI) · Web interface · Prototyping · Telepresence robot · Remote lab · Smart lab
1 Introduction
The rise of the Internet has made online education easier than ever. However, some subjects like chemistry, physics, and biology require extensive lab work, which is impractical to do remotely. There is a need for an alternative solution that would engage remotely located students and fulfill learning outcomes when doing remote laboratory activities. An in-depth comparison of different approaches to implementing remote lab work in online education today can be found in Tan et al. [1]. They conclude that remote labs – though complicated to set up – can provide "higher learning outcomes and richer learning experience" [1] if implemented in the right pedagogical framework. Telepresence Robot Empowered Smart Lab (TRESL) is an ongoing research program at Athabasca University. It tackles lab work in the online education environment and aims to provide a state-of-the-art remote laboratory solution. The TRESL solution aims at empowering online students to complete hands-on lab work despite the constraints associated with distance education. Within the TRESL system, online students sign on to the telepresence robots in the remote laboratory via the Internet. The associated robots then become their avatars, which they can control and manipulate online to act on their behalf. Through the robots they can conduct their laboratory work either alone or collaboratively, as if the students themselves were present. In the laboratory, the telepresence robots, laboratory sensors, and other laboratory devices and equipment are interconnected to create a mesh network. This Internet of Things (IoT) network forms the smart laboratory environment. The ultimate goal of the TRESL system is to enable online users to "Be There", i.e., experience as if they were present in the remote laboratory, and "Act There", i.e., extend their interactive capabilities, including sensing, communicating, and mechanical capabilities, to do laboratory work and to interact with the laboratory environment [2]. In the TRESL system the HCI interface connects robots and users, as well as the hardware and software on the client side. The hardware could be a simple computer or mobile device, or more advanced Virtual Reality (VR) devices such as a head-mounted display (HMD), goggles, a joystick, or any other human computer interaction device. The software running on the client-side computer could be as simple as a web portal accessed through a web browser, and/or a client-side application that should be developed and integrated as part of the TRESL system. The more sophisticated and immersive the HCI devices used are, the better the telepresence experience will be. The interface between a human user and the computer-based components in the HCI module is therefore the most critical part, which largely defines the success of the TRESL solution.

Ramona Plogmann was a Mitacs Globalink Intern at Athabasca University in summer 2019.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 446–455, 2021. https://doi.org/10.1007/978-3-030-49342-4_43
Designing an HCI interface before the other parts of the system have been developed is a challenge, since it is difficult to know exactly how the interface will work. Prototyping the interface gives a good and intuitive preview of what could become the finished product, which is a great way to improve the design process, especially by enabling users and collaborators to provide their feedback early. This paper presents the HCI interface design for the TRESL system as a prototype. In the following section, the HCI interface design prerequisites are provided to give more information about what is unique to the interface's requirements. Section 3 presents the main content of this paper, the HCI interface prototype. Software design and scalability of the interface are discussed in Sect. 4.
2 The HCI Interface Design Prerequisites
For the TRESL interface to eventually be used in an educational setting, a lot needs to be done beforehand. The proposed HCI interface design is based on a remote laboratory on Athabasca University's main campus in Athabasca, Alberta, Canada. It is equipped with several different sensors that make precise detection of the positions and movements of the telepresence robots possible. These sensors also enable the algorithms being developed for easy and effective handling of the robots to work.
R. Plogmann et al.
2.1 The Robots' Driving Modes
A telepresence robot has three driving modes. The first is the manual mode, in which the user has full control over all movements of the robot by using a controlling device such as a keyboard, mouse, touch controls, or others (more in Sect. 4.2, Scalability). Second, there is an automatic mode in which the user sets an end point for where they want the robot to go by selecting a desired target location. The robot then finds its way with the help of an algorithm that takes route planning and collision avoidance with objects and other robots into consideration, and moves to the target without any further user action. Lastly, a follow mode is currently being developed. It is to be used in group activities, such as teaching or solving a research exercise involving several users and their respective telepresence robots. In this case, one robot is chosen as the lead and the other robots move to maintain a formation around the lead. Eventually, the goal is for a person who is physically in the lab to also be able to take the lead role. As in the automatic mode, the user is not required to take any action connected to driving the robot while in follow mode.

2.2 The Variety of Telepresence Robots
The TRESL solution aims to provide an open platform where various types of telepresence robots can be integrated according to the lab work needs and pedagogical requirements. To get a clearer understanding of what the robots need to be able to do, subject experts must be consulted who can give a detailed description of the labs they plan to run. This will help determine which kinds of robots will be most suitable for these tasks. Hence, the design of the HCI interface has to provide the flexibility for users to interact with the robots that are going to be used and the exact kind of lab work that is going to be conducted in the TRESL lab.
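The three driving modes described in Sect. 2.1 can be sketched as a simple enum. Only the manual mode requires continuous steering input; names are illustrative, not taken from the TRESL codebase:

```python
from enum import Enum, auto

class DrivingMode(Enum):
    MANUAL = auto()     # user steers directly (keyboard, mouse, touch, joystick)
    AUTOMATIC = auto()  # robot routes itself to a user-selected target
    FOLLOW = auto()     # robot keeps formation around a designated lead

def requires_user_steering(mode: DrivingMode) -> bool:
    """Only the manual mode consumes continuous steering input; in the
    automatic and follow modes the robot drives without user action."""
    return mode is DrivingMode.MANUAL
```

A mode switch in the interface would then simply toggle whether the steering widgets forward input to the robot.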
3 The HCI Interface Prototype
With so many unresolved variables, it made sense to develop the interface as a prototype for the time being. For this, the software Justinmind1 was chosen. It allows its users to create functional prototypes including actions, transitions, and variables, which is very practical for proving a concept and makes ideas visible to others. In the professional version, a prototype can also be shared online, and multiple developers can work on the same prototype.

3.1 Inspiration from Related Works
Besides taking preexisting telepresence robot control interfaces [3] into consideration for input on how to best design this interface, it was very helpful to take inspiration from computer game design. First-person computer games come closest to the experience of driving a telepresence robot, because they have the same goal of
1 https://www.justinmind.com/.
completely immersing the user and making them feel like they are actually in the game and not in front of a computer just playing a game. This research led to the decision to make the menu and controls semi-transparent so as not to block the main view [4], much like a head-up display (HUD) that presents information directly in front of a user's usual viewpoint [5]. For the pinned view of the map, the design was also inspired by computer games: it is partially see-through, always centered on the user's own robot, and turns and moves relative to it. An alternative solution was keeping the map excerpt aligned just like the map in the main view and adding an angle to the position circle that shows the robot's orientation, as done in Google Maps [6]. Existing telepresence robot interfaces showed the importance of a downward-facing camera for collision avoidance and precise steering [7, 8]. Another important issue was how to design the on-screen controls, both as an alternative to keyboard control and as a solution for screen-only devices such as tablets or smartphones. After an evaluation of different on-screen controls, such as the slider controls on the PadBot T1 [9] and buttons for each direction as seen on the iPad interface of the Double by Double Robotics [10], it was decided to use a virtual joystick as shown in Cooley [7], because it allows for precise steering and handy access to all directions while taking up minimal screen space.

3.2 Structural Design
Figure 1 shows the main setup of the interface. There are different windows displaying different graphical inputs. One of these windows is set as the main view. The main view can easily be exchanged by clicking on the little arrow placed on the left side of any of the pinned windows. This will cause the chosen pinned window and the main view to switch places. Furthermore, all pinned windows can be opened as external windows and dragged to a second screen for a better overview.
A pinned window can be closed at any time by clicking the x in its upper right corner. Afterwards, the x turns into a plus sign, as shown in Fig. 2. If clicked, the plus sign rotates 45°, displaying a closing x again, and a small menu pops up that displays the windows that can be opened here. This window menu can be seen in Fig. 3. It is arguable whether or not the menu should be restricted to display only non-active windows. Displaying all windows (including open ones) allows each window to always be found in the same place; however, this would allow users to open multiple instances of the same view. Showing only non-open windows has the benefit of a cleaner structure, since there are fewer options to choose from and the maximum number of windows that can be opened is limited.

3.3 The Menu
The menu button is constantly located in the upper left corner of the main screen and, when clicked, the menu slides in from the left. The first version of the menu can be seen in Fig. 2. It is separated into sections. At the very top, there is a home button that resets the view to the user's predetermined preferred window combination. Underneath, the mute buttons for audio and video can be found, which allow the user to control if
Fig. 1. Prototype - Main view
Fig. 2. Prototype - menu
they can be heard and seen. When activated, the red buttons stay on screen even if the menu is closed, as shown in Fig. 3, to remind the user that they cannot be heard or seen in the lab. They also remain clickable, to quickly deactivate the muting without having to open the menu. Below the mute buttons, the different driving modes (as explained in Sect. 2.1) can be chosen. In order from top to bottom, these are: manual, automatic, and follow. The book icon underneath is a button to open and browse the student's current or past tasks, either as a link to a separate platform or incorporated into a new window. The tasks section could also offer an option for the student to take notes.
Fig. 3. Prototype - mutes and Window menu
The mouse icon currently toggles the on-screen controls on a computer (on a mobile device, they are visible by default). However, additional controls like a physical joystick, a game controller, or Virtual Reality gloves will also be incorporated here. Depending on the number of plugged-in devices, they may be displayed either one below the other or as an expandable sub-menu that opens to the right. The question mark opens the information graphic that reminds users how to control the robot with the keyboard, as seen in Fig. 2. This could become important if students are required to switch between different robot models for different lab work, each having its own specific features, for example an extra arm, and therefore varying controls. Finally, there is a button to open the settings, which will include choosing one's own home screen setup and plugging in new devices.

3.4 Map
In the pinned state the map only shows the excerpt of the map where the user is currently located, rotating around the robot's icon to maintain correct orientation. The main view map, however, is supposed to show the complete laboratory and all robots and users that are currently operating in it, distinguishing robots and humans by different colors. The user's own robot is represented by a triangle that can show its orientation on the stationary lab map. The user could also be represented by a circle with an orientation angle instead of a triangle. In this case, there would have to be another way to distinguish one's own robot from the other robots. It could, for instance, be displayed as an enlarged circle or a circle of a different color, as seen in Fig. 4. The map is a special kind of window, since it doesn't just display an input stream but has logic of its own. There can only ever be one map, which is dynamically updated when users move their robots or other objects/people move inside the remote laboratory.
Even though different users see the same map at the same moment, it has to be adapted for each user to display it from their point of view, i.e. displaying their own position in a
Fig. 4. Prototype – map
way that is easily distinguished from everyone else's. The map could also play a bigger role in the automatic mode, where users are supposed to click on a specific point of the map to move there. As the robot moves there on its own, the map will graphically display the route along the way. In this case, the map has to register user input in order to be interacted with.
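The pinned map's behaviour, centered on the user's robot and rotated so the robot's heading always points up, amounts to a standard 2D rigid transform. A minimal sketch (the coordinate conventions here are our assumption, not from the paper):

```python
import math

def to_pinned_map(point, robot_pos, robot_heading):
    """Map a lab-frame point (x, y) into the pinned-map frame: the frame
    is centered on the user's robot and rotated so the robot's heading
    (radians, measured from the +y 'up' axis) always points up."""
    dx = point[0] - robot_pos[0]
    dy = point[1] - robot_pos[1]
    # rotate by -heading so the robot's forward direction maps onto +y
    c, s = math.cos(-robot_heading), math.sin(-robot_heading)
    return (c * dx - s * dy, s * dx + c * dy)
```

Every other robot, person, and obstacle icon would be pushed through this transform before being drawn on the pinned excerpt; the stationary main-view map skips it.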
4 The Interface Software and Scalability
4.1 Software Design
The interface is the central piece in the architecture that brings together the telepresence robots and users. The goal is to make the interface easy to use on both sides: featuring an intuitive, quickly understandable frontend for users to control the telepresence robots and easily grasp what is happening in the remote laboratory, while also making it easy for developers to add a new type of robot or a new kind of controlling hardware and seamlessly integrate it with the existing functionality. Figure 5 shows the first draft of a class diagram describing the proposed basic structure of the web interface. The left side of the diagram shows the functional structure, while the right side represents view components. For clarity, getters and setters have been left out of the diagram. The controller has not been designed yet either, because it highly depends on the implementation that will be chosen. Windows can show several different inputs. In the current design, there are windows for views produced by cameras attached to the robot, cameras in the laboratory itself, and the camera showing the user's representation in the video call, as well as the map window. However, this does not limit what can be represented in a window. For the finished product, other applications are possible, such as displaying stats or other input in a new window or showing other participants' camera views. For every graphical input, there is one window. The Boolean variable "active" describes whether a window is currently visible. One window, the main window, has
Fig. 5. The class diagram
the menu and other controls attached to it. The main window has all pinned windows pinned to it. Pinned windows can be unpinned and dragged to another screen. The method unpin() returns the position that the window had been pinned at, which can be used to pin another window (for example when changing the main window). Depending on how free pin positions are stored and determined, and therefore on how the error is handled when pin(position) receives a position that is already taken, the method could either return nothing or a Boolean indicating success. All windows except the main one can be closed and reopened; this alters the state of the "active" variable. Since the interface is supposed to accommodate a variety of different types of telepresence robots for different kinds of lab work, the Robot class itself is abstract. In Fig. 5, two examples of inheriting classes are visible on the far left side of the diagram. The specific robot classes will each implement the particular movements of the robot's motors. The Robot class as described in the diagram features a basic setup that consists of the driving apparatus and a single arm. More sophisticated robots will require the addition of more functions to control more detailed movements. For driving, the abstract Robot class has methods like forwards, backwards, etc., but only to give the controller an easy way to call simple movements. Internally, these methods call move(dir) with a parameter determining the direction of the move, so that movements that might be processed very differently by different types of robots are all handled in a central place, which prevents unnecessary code duplication. A user can connect to one robot at a time. User data will be stored in a designated database.

4.2 Scalability
The HCI interface is designed to be highly adaptable to both the users' needs and their technical opportunities.
In its most basic setup, the user would connect to the robot through any computer or mobile device and use the keyboard or on-screen controls to
steer the robot. This way the interface doesn't need additional hardware to work and is easily available to everyone. At the same time, this doesn't allow for a very authentic and personal experience. To accomplish that, users could plug in different kinds of hardware. The controls could vary from a game controller or a joystick, which would make steering significantly more intuitive than just a keyboard or a mouse, to VR controls such as sensing gloves, which would greatly benefit the naturalness of movements and help users feel more physically involved. For viewing purposes, a Virtual Reality headset would make users feel more immersed and make for a more realistic experience. Naturally, most students do not have access to expensive hardware, which is why the interface is meant to be scalable and to work at these different levels. This way, all students can have the opportunity to take part in the lab work, even if they can't access a facility that provides additional hardware such as Virtual Reality devices.
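The class-diagram idea from Sect. 4.1, public direction methods funnelling into a single move(dir) implemented per robot model, can be sketched as follows. Class and method names beyond those mentioned in the text (forwards, backwards, move) are hypothetical:

```python
from abc import ABC, abstractmethod

class Robot(ABC):
    """Abstract robot from the class diagram (sketch): the public
    direction methods all funnel into move(), so each robot model
    implements its motor specifics in exactly one place."""

    def forwards(self):
        return self.move("forwards")

    def backwards(self):
        return self.move("backwards")

    def left(self):
        return self.move("left")

    def right(self):
        return self.move("right")

    @abstractmethod
    def move(self, direction: str):
        """Translate a direction into model-specific motor commands."""

class LoggingRobot(Robot):
    """Hypothetical subclass standing in for a concrete robot model."""

    def __init__(self):
        self.commands = []

    def move(self, direction: str):
        # a real subclass would drive its motors here
        self.commands.append(direction)
```

A new robot model then only needs to subclass Robot and implement move(), and the controller keeps calling the same direction methods regardless of model.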
5 Conclusion
This paper introduces the proposed design of a web-based interface, developed through prototyping, for the human computer interaction needed in the distance education environment targeted by the TRESL solution. Different design options have been discussed and, after research and consideration, this design appears to best meet the outlined needs at the current stage of the project. Using the prototyping software Justinmind makes the design process more intuitive and interactive, and prototyping provides a cost- and time-effective way to explore different design options. We can therefore conclude that designing the HCI interface through prototyping is an efficient step in the research and development of the TRESL solution. Furthermore, through the prototyping, missing knowledge regarding the hardware and software of the TRESL system, as well as several other limitations at this stage of the research, has been identified. These issues need to be resolved before the interface can go into the final stage of development. Nevertheless, the research and results presented here hopefully provide a solid basis for further research and development of the TRESL HCI and the overall solution.
6 Image References for Images Used in the Prototype
Mary MacDougall. Room Arrangement Map [Sketch]. Retrieved May 17, 2019, from macdougallteacher.weebly.com/room-arrangement-map.html.
[Untitled photograph from the series "KÚPEĽNE | Marble Lab | Graniti fiandre | Obklady a dlažba, série | SIKO KÚPEĽNE"]. Retrieved May 17, 2019, from https://www.siko.sk/marble-lab/graniti-fiandre/serie.
[Untitled photograph of a lab]. Retrieved May 17, 2019, from http://www.artlabsindia.com/lab-furniture.html.
References
1. Alkhaldi, T., Pranata, I., Athauda, R.I.: A review of contemporary virtual and remote laboratory implementations: observations and findings. J. Comput. Educ. 3(3), 329–351 (2016). https://doi.org/10.1007/s40692-016-0068-z
2. Tan, Q., Denojean-Mairet, M., Wang, H., Zhang, X., Pivot, F.C., Treu, R.: Toward a telepresence robot empowered smart lab. Smart Learn. Environ. 6(1), 1–19 (2019). https://doi.org/10.1186/s40561-019-0084-3
3. Denojean-Mairet, M., Tan, Q., Pivot, F.C., Ally, M.: A ubiquitous computing platform - affordable telepresence robot design and applications. In: 13th IEEE International Conference on Ubiquitous Computing and Communications, Chengdu, China, 19–21 December 2014 (2014)
4. WatchMojo.com: Top 10 Immersive First Person Games [Video file], 13 February 2018. https://www.youtube.com/watch?v=vhMOfIxqhnQ
5. Head-up display. https://en.wikipedia.org/wiki/Head-up_display
6. Google Maps. https://maps.google.com/
7. Cooley, W.: WebRTC controlled Telepresence Robot project [Video file], 25 June 2014. https://www.youtube.com/watch?v=rtgysHYEnNo
8. Double Robotics: Double Robotics – Driving [Video file], 7 January 2014. https://www.youtube.com/watch?v=tgcKsFsLLkw
9. PadBot: PadBot T1 telepresence robot, RC, remote control, tabletop robot [Video file], 7 July 2016. https://www.youtube.com/watch?v=ijnE1RZETdM
10. Double Robotics: How do I drive my Double using the Double iOS app? 25 July 2017. http://support.doublerobotics.com/customer/en/portal/articles/2376976-how-do-i-drive-my-double-using-the-double-ios-app
Ant Colony Optimization on an OBS Network with Link Cost and Impairments
Francois Du Plessis¹(B), M. C. Du Plessis¹, and Tim Gibbon²
¹ Department of Computing Sciences, Nelson Mandela University, Port Elizabeth, South Africa
[email protected]
² Department of Physics, Nelson Mandela University, Port Elizabeth, South Africa
Abstract. The association of a cost with the use of each link within an Optical Burst Switched network routed by an Ant Colony Optimization algorithm is considered. This creates a multi-objective optimization problem in which the ACO has to maximize the success rate of the network while minimizing the cost of link utilization. This paper proposes three Pheromone Functions with the goal of minimizing the total cost on the network whilst maximizing the success rate. Two of the Pheromone Functions showed promise in their ability to address the multi-objective optimization problem. Keywords: Ant Colony Optimization (ACO) · Optical Burst Switched Networks (OBS) · Routing and Wavelength Assignment (RWA) · Multi-objective optimization
1 Introduction
Ant Algorithms are algorithms created by studying the behaviour of ants. It was noted that ants have an ability to communicate with each other in an indirect way called stigmergy [4]. Optimizing network routing is a problem where, through altering their environment, the ant-inspired algorithm Ant Colony Optimization (ACO) can be applied [3]. Optical-to-electrical and electrical-to-optical conversion is slow and does not allow for the maximum utilization of network resources. Optical Burst Switching (OBS) tries to avoid this problem by setting up an all-optical transmission path between the source and destination node, reducing the number of optical-electrical conversions for a transmission. Small control packets are sent ahead of a transmission and are read by all the nodes along the path in order to configure their switches before the transmission arrives, so that no conversion is required once the transmission arrives at a node [11].

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 456–465, 2021. https://doi.org/10.1007/978-3-030-49342-4_44

Routing a transmission in an optical network requires the selection not only of a route but also of the wavelength of the transmission. This is called the
Routing and Wavelength Assignment (RWA) problem. A further problem is that multiple wavelengths travelling on the shared optical medium interact, which causes a reduction in the signal quality of a transmission. The physical factors that cause degraded signal quality are called Physical Layer Impairments (PLIs) [10]. Routing in an OBS network requires solving the RWA problem. Adding the concept of cost to the network adds another dimension to the routing problem that the ACO algorithm must consider. This paper adds a cost for the use of each link in the network. This creates a multi-objective optimization problem where the ACO must try to maximize the success rate of transmissions over the network whilst trying to minimize the cost of these transmissions. The cost for a transmission is the sum of the costs of each link utilized for the transmission. This paper is organized as follows: Sect. 2 consists of a literature review of Ant Colony Optimization and Optical Burst Switching. Section 3 details the implementation of the Ant Colony Optimization algorithm. Section 4 describes the problem of adding costs to links in a network. Section 5 explains the setup of the experiments and the parameters used. Section 6 presents the results found during experimentation. Section 7 summarizes the findings of this paper.
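The per-transmission cost just described, the sum of the per-link costs along the route, can be sketched in a couple of lines. The link representation ((node, node) tuples) and names are illustrative assumptions:

```python
def transmission_cost(route, link_cost):
    """Cost of a transmission: the sum of the costs of each link on its
    route. Links are modelled here as (node, node) tuples and link_cost
    is a hypothetical per-link cost table."""
    return sum(link_cost[link] for link in route)
```

This is the quantity the ACO has to minimize while simultaneously maximizing the network's success rate.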
2 Literature Review
2.1 Ant Colony Optimization
The roots of Ant Colony Optimization (ACO) lie in the experiments of Goss et al. [5], which found that ants have a tendency to find the shortest path to a food source. Ants initially walk in random directions in search of food. Once a food source is found, the ant deposits a pheromone all the way back to the nest. Other ants may then decide whether or not they want to follow this pheromone; if they do, they also deposit a pheromone, strengthening the pheromone on the route to the food source. Routes with strong pheromone deposits are more likely to be followed by ants. This results in a convergence to a short (possibly shortest) route to a food source [4]. Dorigo [2] developed one of the first models of ant foraging techniques, which led to more research effort into Ant Algorithms. Ant Colony Optimization uses virtual ants that imitate this behaviour. They deposit pheromones on the routes they traverse, and the more virtual ants utilize a route, the higher its pheromone value becomes. High-pheromone routes are considered optimal.

2.2 Optical Burst Switching
Optical Burst Switching (OBS) is a method of switching optical traffic to reduce the overhead of conversion between light and electrical signal. The control packet is converted from optical to electrical at intermediate nodes for its data to be read. This packet is sent ahead of the payload and contains information about
F. Du Plessis et al.
the destination and the duration of the transmission [1]. The payload of the transmission, also known as the burst, only undergoes optical-electrical conversion at the source and destination nodes. In the case where an intermediate node is unable to accommodate a burst, it is either buffered or dropped. Other switching methods exist, for example Optical Circuit Switching (OCS) and Optical Packet Switching (OPS). OCS uses wavelength routing in order to determine the correct path for a transmission. OCS's lack of statistical multiplexing and its long light-path configuration times create unnecessary network control overhead when dealing with transmissions of short durations. High-bandwidth transmissions are normally short, creating a burst-like traffic pattern that puts strain on an OCS network. OPS transmits headers with its packets and buffers the packets at intermediate nodes whilst the header gets processed. The buffering of packets in OPS requires optical technology that is currently not widely feasible. OBS tries to use available technology to enable optimal utilization of the speed optical fibre provides, and it suits the bursty nature of Internet traffic. OBS is thus poised to be the chosen method of optical switching [9].
3 ACO Implementation
The authors of Gravett, Du Plessis, and Gibbon [6,7] created an OBS protocol that is aware of Physical Layer Impairments (PLIs). This is used as the basis for this paper, along with the ACO algorithm developed in the same work [7]. The algorithm considers the route and the wavelength as coupled, and both are considered at the same time when a routing decision needs to be made. The protocol and algorithm were implemented in the simulator used in this paper for investigating the addition of link cost to the network. Each simulation is executed for T timesteps. At each timestep t, X transmissions are generated. The source and destination node of each transmission is selected at random. After traffic has been generated, the ACO selects a solution using Algorithm 1, and a control packet is sent along the selected route for each transmission so that the intermediate nodes can be configured correctly; it remains one timestep ahead of the transmission. If an intermediate node is unable to be configured for a transmission, that transmission is considered to have failed. During each timestep, the transmissions in flight traverse the links defined in their routes. The loss of power for a transmission s is calculated as follows:

    loss_s = Σ_{i ∈ I\{s}} (4.78 × L × B_s × 10^(P_i/10)) / ((λ_s − λ_i) × B_i × 10^(P_s/10))

where I is the collection of all concurrent
transmissions on a given fibre link, the length L of the link in kilometres, λ, B and P is the wavelength, the bitrate (in Mbps) and power for the transmission respectively. Once a transmission suffers a power loss equal to or greater than smax defined for each node, the transmission is considered unreadable and is treated as a failed transmission. A route and wavelength pair is known as a solution and defines the path and wavelength a transmission uses. Algorithm 1 details how solutions are selected or
ACO on an OBS Network with Link Cost and Impairments
459
created for a given transmission to destination node n. α and β are user-defined parameters that should be selected such that α + β < 1. Large values of α define an ACO that exploits existing solutions, whereas large values of β define an ACO that explores solutions.

Algorithm 1. ACO selection algorithm
1: procedure Selection(n)            ▷ n is the destination node
2:   r ← uniformly random value ∈ [0, 1]
3:   if r < α then
4:     return Greedy(n)
5:   else if r < α + β then
6:     return Probabilistic(n)
7:   else
8:     return Create(n)
9:   end if
10: end procedure
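As a concrete sketch, Algorithm 1 can be written in Python as below; `ALPHA`, `BETA` and the three helper functions are illustrative stand-ins for the paper's parameters and for Algorithms 2–4.

```python
import random

# Sketch of Algorithm 1; ALPHA and BETA are illustrative values, and the
# three helpers stand in for Algorithms 3, 4 and 2 respectively.
ALPHA, BETA = 0.98, 0.01  # exploitation vs. exploration, ALPHA + BETA < 1

def select(n, greedy, probabilistic, create):
    """Pick a solution for destination node n (Algorithm 1)."""
    r = random.random()          # uniformly random value in [0, 1)
    if r < ALPHA:
        return greedy(n)         # exploit the best stored solution
    elif r < ALPHA + BETA:
        return probabilistic(n)  # explore among stored solutions
    return create(n)             # otherwise create a fresh solution
```

With the defaults used later in the paper (α = 0.98, β = 0.01), roughly 1% of decisions explore and 1% create new solutions.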
Algorithm 2 describes how a solution is constructed. A finite list of saved solutions for destination n is denoted by ϑn. The upper and lower bounds for feasible wavelengths to select from are defined by λmax and λmin respectively. The user-defined parameter c denotes the maximum number of solutions stored for a destination n. Once ϑn is at capacity, the solution with the lowest pheromone value, denoted by s_old, is replaced with the newly created solution.

Algorithm 2. ACO Creation algorithm
1: procedure Create(n)               ▷ n is the destination node
2:   route ← uniformly random route ∈ ςn
3:   λr ← uniformly random value ∈ [λmin, λmax]
4:   if Length(ϑn) ≥ c then
5:     s_old ← argmin_{s ∈ ϑn}(Pheromone(s))
6:     ϑn.Remove(s_old)
7:   end if
8:   s_new ← Solution(route, λr)
9:   ϑn.Add(s_new)
10:   return s_new
11: end procedure
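A Python sketch of this creation step follows; the route set, wavelength bounds and capacity `C` are illustrative values, and the capacity test is written as ≥ C (rather than the pseudocode's strict >) so the stored list never exceeds C entries.

```python
import random

# Sketch of Algorithm 2. `routes[n]` plays the role of the route set ς_n,
# `stored[n]` of ϑ_n, and `pheromone(s)` returns a solution's pheromone
# value; C and the wavelength bounds are illustrative.
C = 10
LAMBDA_MIN, LAMBDA_MAX = 0, 31

def create(n, routes, stored, pheromone):
    """Create and store a new (route, wavelength) solution for n."""
    route = random.choice(routes[n])
    wavelength = random.randint(LAMBDA_MIN, LAMBDA_MAX)
    if len(stored[n]) >= C:                    # list at capacity:
        s_old = min(stored[n], key=pheromone)  # evict lowest pheromone
        stored[n].remove(s_old)
    s_new = (route, wavelength)
    stored[n].append(s_new)
    return s_new
```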
Algorithm 3 is known as the Greedy algorithm because it selects the solution with the largest pheromone value from ϑn. Lastly, Algorithm 4 defines a selection algorithm that allows for more exploration than Algorithm 3. It achieves this by selecting solutions in a probabilistic manner: solutions with high pheromone values have a greater chance of being selected.
460
F. Du Plessis et al.
Algorithm 3. ACO Greedy algorithm
1: procedure Greedy(n)               ▷ n is the destination node
2:   return argmax_{s ∈ ϑn}(Pheromone(s))
3: end procedure
Algorithm 4. ACO Probabilistic algorithm
1: procedure Probabilistic(n)        ▷ n is the destination node
2:   r ← uniformly random value ∈ [0, 1]
3:   Totalsum ← 0
4:   a ← []
5:   for i = 0..Length(ϑn) do
6:     Totalsum ← Totalsum + Pheromone(ϑn,i)
7:     a_i ← Totalsum
8:   end for
9:   for i = 0..Length(a) do
10:     if r < a_i / Totalsum then
11:       return ϑn,i
12:     end if
13:   end for
14: end procedure
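Algorithms 3 and 4 can be sketched together as below. In the probabilistic case the sketch returns the selected solution itself, which appears to be the intent of the pseudocode's "return a_i"; the helper names are illustrative.

```python
import random

# Sketch of Algorithms 3 and 4 over the stored solutions ϑ_n;
# `pheromone(s)` returns a solution's pheromone value.
def greedy(n, stored, pheromone):
    """Algorithm 3: the stored solution with the most pheromone."""
    return max(stored[n], key=pheromone)

def probabilistic(n, stored, pheromone):
    """Algorithm 4: roulette-wheel selection by pheromone value."""
    r = random.random()
    total = sum(pheromone(s) for s in stored[n])
    cumulative = 0.0
    for s in stored[n]:
        cumulative += pheromone(s)
        if r < cumulative / total:
            return s             # higher pheromone, higher chance
    return stored[n][-1]         # guard against rounding when r ≈ 1
```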
4 Addition of Link Cost
The utilization of network resources is usually not free. The cost of using a link might be monetary, where a service provider charges for the use of their infrastructure. It could also have a more abstract meaning, for instance the insecurity of a link, where a higher cost implies a less secure link. Being able to train an ACO to minimize cost whilst still trying to maximize the success rate would prove useful in such situations. Adding cost to the choices that an algorithm makes can vastly change the way it reacts to a problem. This paper explores the effects of adding costs to each link between nodes within the network. Cost is defined as a positive value on each link. Adding cost changes the optimization problem into a multi-objective optimization problem: one tries to maximize the success rate of transmissions whilst at the same time trying to minimize costs. This might mean that some tradeoffs need to be made. In order to control these tradeoffs, weightings are added to allow for control between the two objectives. A mechanism must be used to make the ACO aware of the cost so that it is able to react to it. For this paper, Pheromone Function 1 investigated by Gravett, du Plessis, and Gibbon [7] is altered by considering the total cost of all the links in the route taken. This paper examines the following Pheromone Functions:

(κ(t) + 1) / (χ(t) + 1)    (1)
ACO on an OBS Network with Link Cost and Impairments
ω1 · (κ(t) + 1) / (χ(t) + 1) + ω2 / ς(t)    (2)
ω1 · (κ(t) + 1) / (ω1 · χ(t) + ω2 · ς(t) + 1)    (3)
((ςmax(t) − ς(t)) / ςmax(t)) · (κ(t) + 1) / (χ(t) + 1)    (4)
κ(t) and χ(t) are the total successful and total failed transmissions for the solution at time t, respectively. ω1 and ω2 are the weight values, ς(t) is the sum of the cost of each link on the route of the solution, and ςmax(t) is the maximum cost for any route found at time t. It is worth noting that Eq. 4 considers cost without requiring weight values. Determining the correct weight values is not a trivial task and depends on the function chosen and on the desired outcome. The weight values can either be statically defined or changed dynamically. Finding good static weight values requires multiple tests with different weight value combinations, called weight tuning. A Pareto front analysis can also be used to discard suboptimal weight value combinations. Static values have the disadvantage of being specific to a problem; if the problem scope changes, new tests must be run to determine the new best weight value combinations. Dynamically determined weight values are usually better when the conditions are unknown and multiple tests for weight values are not feasible. It is, however, difficult to find a dynamic weight function, and hard to control the desired outcome. This paper uses a sinusoidal function to oscillate between the two objectives to find a balance between them, as suggested in Engelbrecht [4] for multi-objective optimization problems. The idea is that by continuously changing the focus between the two objectives the algorithm finds a good middle ground, as it never focuses on one objective for long, and the user does not have to specify the weight values. Equation 5 describes how the weight values are calculated at timestep t. The parameter ρ adjusts the period of the oscillation between the two weights.

ω1 = sin(tπ/ρ)    (5)
ω2 = 1 − ω1
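The schedule of Eq. 5 can be sketched as follows, applied to Pheromone Function 2 under the assumption that it has the form ω1·(κ+1)/(χ+1) + ω2/ς; ρ and the statistics passed in are illustrative.

```python
import math

# Sketch of the dynamic weight schedule of Eq. 5; rho is illustrative.
def dynamic_weights(t, rho=100000):
    """w1 = sin(t*pi/rho), w2 = 1 - w1: the focus oscillates over t."""
    w1 = math.sin(t * math.pi / rho)
    return w1, 1 - w1

# Pheromone Function 2 with dynamic weights (form assumed as above).
def pheromone_fn2(t, kappa, chi, cost, rho=100000):
    """Dynamically weighted success-rate/cost pheromone value."""
    w1, w2 = dynamic_weights(t, rho)
    return w1 * (kappa + 1) / (chi + 1) + w2 / cost
```

Since sin ranges over [−1, 1], ω1 periodically becomes negative while ω2 exceeds 1, so the schedule swings the pheromone function between favouring success rate and favouring low cost.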
5 Experiment
The simulations were executed with the following default values: α = 0.98, β = 0.01, X = 20, T = 2400000 and B = 1024. Each test was executed 15 times and the average of the results taken to smooth out anomalies. Each link in the network topology was assigned a cost value between 0 and 1, which remained the same for all tests so that performance could be compared between the different test
runs. A real-world network, the NSFNet topology depicted in Fig. 1, is used as the network topology of the simulator.
Fig. 1. NSFNet network topology [8]
This study performs three different experiments. The first experiment compares the success rates and total costs of the Pheromone Functions defined in Sect. 4 against Pheromone Function 1, which does not consider cost. This is done to determine whether the functions are effective in reducing cost whilst keeping the success rate at an acceptable level. The second experiment is done on Pheromone Functions 2 and 3. Weight tuning is performed for these two functions by running multiple experiments with different values for ω1 and ω2. A Pareto front analysis is done for both functions to determine the optimal weight value combinations. The last experiment evaluates the dynamic weight function defined by Eq. 5, using Pheromone Function 2 and multiple different values for ρ. The results are again compared against Pheromone Function 1 to determine the effectiveness of this method.
6 Results
Firstly, we compare the success rates achieved by the functions using static weights, shown in Fig. 2. Whilst Pheromone Function 1 had the best success rate, the other three functions performed rather well, with the worst performer showing less than a 1% decrease in success rate. Secondly, we compare the total cost for the three functions, as seen in Fig. 3. Pheromone Function 3 performed poorly, having a worse total cost than Pheromone Function 1, which does not consider cost. Pheromone Function 2 reached the lowest cost, with Pheromone Function 4 also achieving a decreased cost compared to Pheromone Function 1. The Pareto front analysis of the weight tuning done for Pheromone Function 2 is shown in Fig. 4. The black line indicates the front of the best weight value combinations. The weight values from left to right on the front line
Fig. 2. Success rate comparison
Fig. 3. Cost comparison
are: ω1 = 0.1, ω2 = 0.8; ω1 = 0.1, ω2 = 0.9; ω1 = 0.1, ω2 = 0.7; ω1 = 0.5, ω2 = 0.9; ω1 = 0.8, ω2 = 0.4; ω1 = 0.9, ω2 = 0.2. It can clearly be seen that as one increases ω1 relative to ω2 one gets a better success rate, but at an increased cost. Conversely, increasing ω2 relative to ω1 reduces the total cost of transmissions but yields lower success rates.
Fig. 4. Pareto front weight value analysis of Function 2
Fig. 5. Pareto front weight value analysis of Function 3
The Pareto front analysis for Pheromone Function 3 is depicted in Fig. 5. No front line could be drawn, as the points form a straight line: as the value of ω1 is decreased and/or the value of ω2 is increased, the algorithm returns worse values for both success rate and cost. The best weight value combination found was ω1 = 1.0, ω2 = 0.0. Pheromone Function 3 is clearly not well suited to this multi-objective optimization problem.
The dynamic weight experiments use Pheromone Function 2, as it showed the most promising results above. Weight values are now calculated using Eq. 5. The value of ρ is indicated in the legends of Figs. 6 and 7.
Fig. 6. Dynamic weight value success rate comparison
Fig. 7. Dynamic weight value cost comparison
Figure 6 depicts the success rates obtained using Eq. 5 with different values for ρ. All of these runs achieved a lower success rate than Pheromone Function 1, with some oscillating between higher and lower success rates, notably when ρ is large. The associated cost results are shown in Fig. 7. All values of ρ produced lower overall costs than Pheromone Function 1 and results similar to those of Pheromone Function 2, with only ρ = 100000 oscillating wildly. The dynamic weight function thus provides a way to minimize cost without having to perform weight tuning, but it does reduce the maximum success rate achieved compared to statically defined weight values.
7 Conclusion
This study sought to determine whether it is feasible for an ACO algorithm to solve the multi-objective problem of maximising the success rate and minimizing the link utilization cost on an OBS network with impairments. It was found that this is indeed feasible given a suitable Pheromone Function. Pheromone Function 2, paired with appropriate weight values, allows the user to tune the desired balance between the two objectives. Pheromone Function 4 performed well and does not require the user to select weight values, but lacks configurability. Lastly, it was found that using Eq. 5 for dynamic weights produced low cost totals but caused a significant reduction in the success rate of the algorithm.
Acknowledgements. Thank you to the National Research Foundation (NRF) for its assistance in funding this research. Opinions and conclusions are those of the authors and not necessarily those of the NRF.
References
1. Chen, Y., Qiao, C., Yu, X.: Optical burst switching: a new area in optical networking research. IEEE Netw. 18(3), 16–23 (2004)
2. Dorigo, M.: Optimization, learning and natural algorithms. Ph.D. thesis, Politecnico di Milano (1992)
3. Dorigo, M., Stützle, T.: Ant colony optimization: overview and recent advances. Technical report, IRIDIA, Université Libre de Bruxelles (2009)
4. Engelbrecht, A.P.: Computational Intelligence: An Introduction. Wiley, Hoboken (2007). ISBN 9780470512500
5. Goss, S., et al.: Self-organized shortcuts in the Argentine ant. Naturwissenschaften 76(12), 579–581 (1989)
6. Gravett, A.S., du Plessis, M.C., Gibbon, T.B.: Hybridising ant colony optimisation with an upper confidence bound algorithm for routing and wavelength assignment in an optical burst switching network. In: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–8. IEEE (2016)
7. Gravett, A.S., du Plessis, M.C., Gibbon, T.B.: A distributed ant-based algorithm for routing and wavelength assignment in an optical burst switching flexible spectrum network with transmission impairments. Photon. Netw. Commun. 34(3), 375–395 (2017)
8. Pavani, G.S., Waldman, H.: Restoration in wavelength-routed optical networks by means of ant colony optimization. Photon. Netw. Commun. 16(1), 83–91 (2008)
9. Qiao, C., Yoo, M.: Optical burst switching (OBS): a new paradigm for an optical Internet. J. High Speed Netw. 8(1), 69–84 (1999)
10. Saradhi, C.V., Subramaniam, S.: Physical layer impairment aware routing (PLIAR) in WDM optical networks: issues and challenges. IEEE Commun. Surv. Tutor. 11(4), 109–130 (2009)
11. Xu, L., Perros, H.G., Rouskas, G.: Techniques for optical packet switching and optical burst switching. IEEE Commun. Mag. 39(1), 136–142 (2001)
The Categorical Integration of Symbolic and Statistical AI: Quantum NLP and Applications to Cognitive and Machine Bias Problems Yoshihiro Maruyama(B) The Hakubi Centre for Advanced Research, Kyoto University, Kyoto, Japan [email protected]
Abstract. Statistical AI is cutting-edge technology in the present landscape of AI research whilst Symbolic AI is generally regarded as good old-fashioned AI. Even so, we contend that induction, i.e., learning from empirical data, cannot constitute a full-fledged form of intelligence on its own, and it is necessary to combine it with deduction, i.e., reasoning on theoretical grounds, in order to achieve the ultimate goal of Strong AI or Artificial General Intelligence. We therefore think of the possibility of integrating Symbolic and Statistical AI, and discuss Quantum Linguistics by Bob Coecke et al., which, arguably, may be seen as the categorical integration of Symbolic and Statistical AI, and as a paradigmatic case of Integrated AI in Natural Language Processing. And we apply it to cognitive bias problems in the Kahneman-Tversky tradition, giving a novel account of them from the standpoints of Symbolic/Statistical/Integrated AI, and thus elucidating the nature of machine biases in them.
1 Introduction
Symbolic AI was dominant at the dawn of AI research and in the following, so-called Golden Age; from a theoretical point of view, the computer per se was born from symbolic logic. Yet Statistical AI has replaced Symbolic AI, and it is practically the most successful paradigm of AI in the present Big Data age. So is there no rôle to be played by Symbolic AI? Is it just a historical legacy of the good old days of AI, as the term GOFAI (Good Old-Fashioned AI) suggests? We contend that each has its own advantage, and that Symbolic and Statistical AI, as the two paradigms of AI, should be combined and integrated in order to achieve the ultimate goal of Strong AI or Artificial General Intelligence (whether conceived as human-level or trans-human). Arguably, induction, i.e., learning from empirical data, cannot constitute a full-fledged form of intelligence on its own, and it is necessary to combine it with deduction, i.e., reasoning on theoretical grounds. We therefore consider the possibility of integrating Symbolic and Statistical AI. In the following we first discuss Quantum Linguistics by Bob
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 466–476, 2021. https://doi.org/10.1007/978-3-030-49342-4_45
Coecke et al. [3,4,7–10] (Sect. 2); it gives the category-theoretic integration of compositional algebraic syntax (a symbolic model of grammar) and distributional statistical semantics (a vector space model of meaning). Arguably it may be seen as the categorical integration of Symbolic and Statistical AI, and as a paradigmatic case of Integrated AI in Natural Language Processing (NLP for short). It may count as one of the most successful paradigms of quantum machine learning, or of quantum NLP in particular. We then apply Quantum Linguistics to cognitive bias problems in the Kahneman-Tversky tradition, giving a novel account of them from the standpoint of Symbolic and Statistical AI (Sect. 3). We finally conclude with an outlook for the further categorical integration of the Symbolic and Statistical AI paradigms (Sect. 4).
2 Quantum Linguistics Qua Categorical Integration of Symbolic and Statistical AI Paradigms
Quantum Linguistics combines distributional statistical semantics with compositional logical syntax. It is called "quantum" because it was born as a spinoff of what is called Categorical Quantum Mechanics by Abramsky-Coecke [1]. We first briefly review distributional statistical semantics, or the vector space model of meaning, arguably the most successful method of Natural Language Processing and Information Retrieval, broadly used in diverse practical applications including Machine Translation and Google Search. The gist of distributional statistical semantics is as follows. Words (w1, w2, ...) are associated with vectors (v1, v2, ...), called meaning vectors; the similarity between words is given by ⟨v1|v2⟩/(‖v1‖‖v2‖), where ⟨v1|v2⟩ denotes the inner product. There is usually a fixed basis of the space, i.e., a set of basic meaning vectors, weighted sums of which give arbitrary meaning vectors; weights are computed according to the distribution of words in different contexts (there are many ways to implement this idea; see any textbook on vector space semantics). Put simply, the relative angle θ between v1 and v2 determines their similarity value as cos θ. If meaning vectors are parallel, for example, the similarity value is 1, which means the words have the same meaning. Distributional statistical semantics thus gives a geometry of meaning; it is an established and widely applied theory in Statistical AI. Conceptually, it is based upon what is called the Distributional Hypothesis: words that occur in similar contexts have similar meanings. A closely related hypothesis is the Bag-of-Words Hypothesis: frequencies of words in a document indicate the relevance of the document to a query (thus, e.g., if documents have similar column vectors in a term-document matrix, they have similar meanings).
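A minimal sketch of the cosine similarity underlying the vector space model; the meaning vectors below are invented toy data.

```python
import math

# Cosine similarity <v1|v2> / (||v1|| ||v2||) between meaning vectors
# given as plain lists of floats.
def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norms = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norms

# Parallel vectors score 1 (same meaning); orthogonal vectors score 0.
print(cosine([1.0, 2.0], [2.0, 4.0]))  # ≈ 1.0
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0
```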
These hypotheses practically regard a document as a mere multiset of words; hence no grammatical structure is considered (and yet this does the job quite well, Statistical AI thus being the dominant paradigm of NLP). As to the VSM (Vector Space Model), Turney and Pantel [19] remark as follows: The success of the VSM for information retrieval has inspired researchers to extend the VSM to other semantic tasks in natural language processing, with impressive results. For instance, Rapp (2003) used a vector-based
468
Y. Maruyama
representation of word meaning to achieve a score of 92.5% on multiple-choice synonym questions from the Test of English as a Foreign Language (TOEFL), whereas the average human score was 64.5%. Caliskan et al. [2] remark in their paper entitled "Semantics derived automatically from language corpora contain human-like biases" as follows: – "[T]here are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions." – "[H]uman-like semantic biases result from the application of standard machine learning to ordinary language". Distributional statistical semantics is purely descriptive, and cannot adequately capture the normative aspect of language, as seen in the case of machine biases. We could then ask: where does normativity come from? Ideally, any existing norm should appear in text, and distributional statistical semantics would then allow AI to learn it. Yet if a norm does not appear in text, it is impossible for statistical semantics to detect it. Norms coming from non-linguistic everyday contexts would thus be difficult to learn through statistical semantics. Now we review the other ingredient of Quantum Linguistics: Lambek's pregroup grammar [12–14], which gives a compositional model of syntax. Lambek [12] says as follows: At first sight, it seems quite unlikely that mathematics can be applied to the study of natural language. However, upon closer examination, it appears that language itself is a kind of mathematics. The basic storyline of Quantum Linguistics may be summarized as follows: – Compositional Logical Syntax: Lambek's Pregroup Grammar. – Distributional Statistical Semantics: Vector Space Model of Meaning. – Quantum Linguistics as Synthesis: Compositional Distributional Semantics. It basically constructs what is called a strongly monoidal functor from a grammar algebra (a pregroup) to a semantic space (a compact closed category).
In Quantum Linguistics, word meaning is given in a distributional way, and sentence meaning is given in a compositional way. That is, the meaning of words is analyzed through contextual distribution, and the meaning of a sentence is composed from the meanings of its words according to the grammatical structure, as we shall see below in more detail. "Quantum", as remarked above, stems from Categorical Quantum Mechanics; basically, the category-theoretic structure of quantum mechanics (i.e., Hilbert spaces) is analogous to that of (the vector space model of) natural language. Now we get into the details of pregroup grammar. A pregroup (P, ≤, ∗, 1, l, r) is a partially ordered monoid (P, ≤, ∗, 1) in which each p ∈ P has a left dual p^l and a right dual p^r satisfying the following conditions: p^l ∗ p ≤ 1 ≤ p ∗ p^l and p ∗ p^r ≤ 1 ≤ p^r ∗ p.
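As a toy illustration of these axioms (anticipating the transitive-verb type n^r ∗ s ∗ n^l used below), grammaticality can be checked by contracting adjacent dual pairs; the integer encoding of adjoints and the greedy left-to-right contraction are simplifications of a real pregroup parser, which would backtrack over contraction choices.

```python
# Toy grammaticality check in a free pregroup. A basic type x is encoded
# as ("x", 0), its right adjoint x^r as ("x", 1) and its left adjoint
# x^l as ("x", -1); a word's type is a list of such pairs.
def reduces_to_sentence(types):
    """Greedily contract adjacent dual pairs; True if only s remains."""
    word = [t for typ in types for t in typ]
    changed = True
    while changed:
        changed = False
        for i in range(len(word) - 1):
            (b1, a1), (b2, a2) = word[i], word[i + 1]
            # x * x^r <= 1 and x^l * x <= 1: the adjacent pair vanishes
            if b1 == b2 and a2 == a1 + 1:
                del word[i:i + 2]
                changed = True
                break
    return word == [("s", 0)]

# "Bob likes Alice": n * (n^r s n^l) * n reduces to s.
noun = [("n", 0)]
trans_verb = [("n", 1), ("s", 0), ("n", -1)]
assert reduces_to_sentence([noun, trans_verb, noun])
```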
The following then hold: both (−)^r and (−)^l are order-reversing; (x^r)^l = (x^l)^r = x; (x ∗ y)^r = y^r ∗ x^r and (x ∗ y)^l = y^l ∗ x^l. Any pregroup P forms a residuated monoid (P, /, \) (aka. Lambek calculus), another model of grammar, by letting x/y = x ∗ y^l and x\y = x^r ∗ y. A compact closed category is defined as follows (for category theory terminology, see, e.g., [15]). A monoidal category (C, ⊗, I) is compact closed if every object C has a dual object C∗ with pairing

ε_C : C ⊗ C∗ → I

and copairing

η_C : I → C∗ ⊗ C
satisfying certain basic properties (aka. unit and counit, evaluation and coevaluation, etc.). Note that a monoidal category is equipped with a parallel composition ⊗ with unit I, as for composite quantum systems, which underpin quantum phenomena such as entanglement and non-locality. The point is that a pregroup is a degenerate compact closed category (with right and left duals kept distinct). Thus, a pregroup naturally maps into a compact closed category (such as Vect_fd, the category of finite-dimensional vector spaces, or Rel, the category of sets and relations). Note that compact closed categories exhibit, in a certain sense, an inconsistent logic; one can, e.g., solve the Liar equation A ↔ ¬A by virtue of dual objects. From a logical point of view, this means that the structure of proofs can be consistent while the structure of propositions is inconsistent. Transitive verbs are understood in pregroup grammar as follows. Transitive verbs are of type n^r ∗ s ∗ n^l. Intuitively, this means that transitive verbs are functions whose inputs are two nouns, a subject and an object, and whose outputs are sentences. Then, for example, the sentence "Bob likes Alice" is grammatical because n(n^r s n^l)n ≤ s, where ∗ is omitted; we shall often omit ∗ in the following as well. Pregroup grammar is applied in this way to decide whether given sentences are grammatical or not. Relative pronouns are treated as follows. Subject relative pronouns are of type n^r ∗ n ∗ s^l ∗ n. "Entities which think exist" is grammatical because n(n^r n s^l n)(n^r s)(n^r s) ≤ n s^l s n^r s ≤ n n^r s ≤ s. The relative pronoun of this sentence is a function whose inputs are a subject and a verb, and whose output is a subject noun. To be precise, every type reduction computation goes on within the free pregroup generated from the basic types,
even though this is made implicit in the above account. Pregroup grammar is equivalent to context-free grammar: the languages of pregroup grammars are precisely the context-free languages (not containing the empty string). There are criticisms of statistical vector space models; Grefenstette et al. [8] say as follows: Search engines such as Google either fall back on bag of words models—ignoring syntax and lexical relations—or exploit superficial models of lexical semantics to retrieve pages with terms related to those in the query. Now the question is: how does one incorporate compositional logical structure into statistical semantics? What is compositional distributional semantics? Quantum Linguistics answers this question by translating the compositional structure of pregroup grammar into the compositional structure of vector spaces in the following way. Types translate into vector spaces, ∗ into ⊗, and reductions into maps on vector spaces (or compact closed categories), which allow us to compose meaning vectors for sentences from those for words. This translation yields strongly monoidal functors from pregroups to compact closed categories. Basic types translate into vector spaces, and pregroup multiplication ∗ into the tensor product ⊗ on vector spaces. Reductions in pregroup grammar translate into morphisms on vector spaces, since pregroups and vector spaces share compact closed structure. Both x ∗ x^r ≤ 1 and x^l ∗ x ≤ 1 correspond to ε_V : V ⊗ V → k taking the inner product, i.e., ε_V(v_i ⊗ w_j) = ⟨v_i|w_j⟩, where k denotes the scalar field (or semiring) of the vector spaces concerned. Both 1 ≤ x ∗ x^l and 1 ≤ x^r ∗ x translate into η_V : k → V ⊗ V defined by η_V(1) = Σ_{i=1}^{n} v_i ⊗ v_i. Now, ε and η give a compact closed structure. A single example will let one see what all this means. For example, the reduction n ∗ (n^r s n^l) ∗ n ≤ s translates into

ε_N ⊗ 1_S ⊗ ε_N : N ⊗ N ⊗ S ⊗ N ⊗ N → S.

Now, the meaning vector of "sbj. verb obj." is

(ε_N ⊗ 1_S ⊗ ε_N)(sbj ⊗ verb ⊗ obj),

where sbj, verb and obj denote the meaning vectors of the respective words.
The sentence meaning vector is built from the word meaning vectors whilst respecting the grammatical structure. This is the central idea of Quantum Linguistics qua Compositional Distributional Semantics. Let us recapitulate the storyline: – Reductions in pregroup grammar translate into morphisms on vector spaces (since pregroups and vector spaces share an abstract categorical structure called compact closure). – Given meaning vectors for words, this translation enables us to compose meaning vectors for sentences according to their grammatical structures. – The semantics of words is distributional, based on cooccurrence in context. The semantics of sentences is compositional, on the basis of the distributional semantics of words.
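The composition of a transitive sentence's meaning vector described above can be sketched with NumPy; the dimensions and random vectors below are invented toy data.

```python
import numpy as np

# Toy composition of a transitive sentence's meaning vector. The verb
# is a rank-3 tensor in N ⊗ S ⊗ N (noun, sentence, noun spaces).
dim_n, dim_s = 4, 2
rng = np.random.default_rng(0)
sbj = rng.random(dim_n)                    # subject meaning vector in N
obj = rng.random(dim_n)                    # object meaning vector in N
verb = rng.random((dim_n, dim_s, dim_n))   # entangled verb tensor

# Apply (eps_N ⊗ 1_S ⊗ eps_N): inner products on both noun factors
# leave a vector in the sentence space S.
sentence = np.einsum("i,isj,j->s", sbj, verb, obj)
assert sentence.shape == (dim_s,)
```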
Entanglement plays a crucial rôle in the compositional process of meaning generation. The η maps are Bell states, as broadly used in quantum computation and in the EPR paradox. A transitive verb, for example, must be taken as an entangled vector; otherwise the meaning of the whole sentence becomes irrelevant to the meanings of its subject and object. Meanings of different sentences live in the same space. If one simply tensors word meaning vectors to compose sentence meaning vectors, different sentences live in different spaces, which makes their direct comparison impossible. Another issue is that the simple tensoring of word meaning vectors gives rise to high-dimensional spaces, which may be computationally intractable, and makes it impossible to compare sentences of different grammatical types.
3 Applications to Cognitive Bias Problems
Here we discuss applications of Quantum Linguistics to cognitive bias problems in the Kahneman-Tversky tradition [11,20], especially what is called the conjunction effect in the cognitive science of the human mind. In mathematical terms, the conjunction effect means that the following fundamental law of probabilities,

Prob(ϕ ∧ ψ) ≤ Prob(ψ),

i.e., the monotonicity of probabilities with respect to conjunction, does not hold in certain cognitive experiments such as the Linda experiment by Tversky-Kahneman [20]. In the Linda problem, subjects are first given a certain description of Linda like the following: Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Note that it is not very strange to think, according to this information, that Linda is, in fact, a feminist. Subjects are then asked to judge which of the following two possibilities 1. Linda is a bank teller; 2. Linda is a feminist bank teller is more probable. A significant number of people robustly tend to think that 2 is more probable than 1, regardless of their backgrounds (the author performed this experiment at Kyoto University, one of the best universities in Japan, and at a top high school, and the results were no different, with a significant number of people choosing the latter). Hence the violation of Prob(ϕ ∧ ψ) ≤ Prob(ψ). It should be remarked that the question per se can be answered correctly without any description whatsoever; it is, as a matter of fact, purely a problem in elementary probability calculus. The Linda problem may also be regarded as a case of contextuality; the description of Linda provides a special context for the
question, thereby making human judgements biased. No one would choose the latter without the description. According to recent research, there is a quantum probabilistic model of the statistics resulting from the Linda problem, and it suggests that choosing the latter, rather than the former, is not necessarily irrational [18]. It may just be that human rationality is different from classical rationality, following the laws of what could be called quantum rationality [16,18]. The question we would like to ask is what judgment AI agents would make when given the Linda problem. Put another way: can AI have cognitive biases? Human agents can surely have cognitive biases, as a vast number of cognitive experiments have shown. How about machine agents? Recent research on machine biases tells us that machine agents can have certain biases, yet this does not necessarily entail that they have cognitive biases in the particular sense of Kahneman-Tversky. In the following we attempt to answer these questions in two different ways, because there are two different kinds of AI, namely Symbolic AI and Statistical AI. We could distinguish even more kinds of AI, but we think this is the right level of generality at the present stage of AI research. Let us first think of Symbolic AI. What would be Symbolic AI's answer to the Linda problem? Symbolic AI builds upon Symbolic Logic, and so the question may be paraphrased as follows: what would be an answer to the Linda problem according to Symbolic Logic? Logical reasoning definitely tells us that ϕ is more probable than ϕ ∧ ψ, because ϕ ∧ ψ ⊢ ϕ is logically valid. Let us now think of Statistical AI. What would be Statistical AI's answer to the Linda problem? The vector space model of meaning is the standard statistical paradigm of Natural Language Processing, and thus we seek an answer to the Linda problem within this paradigm.
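The classical monotonicity law behind the Symbolic AI answer can be checked mechanically on any joint distribution; the probabilities below are invented for illustration.

```python
# The classical law Prob(phi ∧ psi) <= Prob(psi), checked on a joint
# distribution over (bank_teller, feminist) with invented probabilities.
joint = {
    (True, True): 0.05, (True, False): 0.02,
    (False, True): 0.43, (False, False): 0.50,
}
p_teller = sum(p for (t, f), p in joint.items() if t)
p_both = joint[(True, True)]

# The conjunction can never be more probable than either conjunct.
assert p_both <= p_teller
```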
Meaning vectors for words can be constructed in the usual way based upon the distributional hypothesis (i.e., words that cooccur in similar contexts have similar meanings; there are several technical methods to implement this idea, but that does not really matter here). Meaning vectors for compound expressions can be constructed from word vectors in the way developed in the last section, that is, by means of Quantum Linguistics. So what is the answer to the Linda problem from the viewpoint of Quantum Linguistics? Note that Quantum Linguistics is Integrated AI in the sense that the applicability of Statistical AI is enhanced via the methods of Symbolic AI. The question may be paraphrased within Quantum Linguistics as follows. Let us denote by v the meaning vector for "bank teller", by w the meaning vector for "feminist bank teller", and by u the meaning vector for "Linda". Note that u is derived by taking the description of Linda into account. Note also that these vectors can be taken to be normalized. Now we can compare these three meaning vectors to see the similarities between them, as represented by their mutual angles or inner product values (i.e., the cosine measure). If u is closer to v than to w in terms of their mutual angle, then AI based upon Quantum Linguistics would answer that "Linda is a bank teller" is more probable than "Linda is a feminist bank teller". If u is closer to w, the answer would be
The Categorical Integration of Symbolic and Statistical AI
the other way around. So to which of v and w is u closer? The answer can be given via the distributional hypothesis, that is, by comparison in terms of the cooccurrence frequencies used in constructing the meaning vectors. Let us denote by t the meaning vector for "feminist". Characteristics of Linda as specified in the description above often cooccur in text with characteristics of feminists. This means that the description makes the meaning vector for "Linda" closer to the meaning vector for "feminist", that is, u is closer to t. In other words, statistical correlations in text between characteristics of Linda and characteristics of feminists make their meaning vectors closer to each other; this is the fundamental idea of the distributional hypothesis and of the vector space model of meaning based on it. In contrast, the meaning vector for "Linda" is not so close to the meaning vector for "bank teller", that is, u is not so close to v. Since w is composed of t and v, we can conclude that u is closer to w than to v. So the AI would conclude that "Linda is a feminist bank teller" is more probable than "Linda is a bank teller". It may be observed from another angle that the reason why Statistical AI concludes this way is no different from the reason why some human agents answer that "Linda is a feminist bank teller" is more probable than "Linda is a bank teller". In ordinary life, there are significant correlations between characteristics of Linda and characteristics of feminists. This is why some human agents judge that "Linda is a feminist bank teller" is more probable. We may thus say that agents influenced by empirical correlations, whether humans or machines, tend to judge this way. In contrast, agents based on logical reasoning, whether humans or machines, tend to judge the other way around. So it is true that AI can have cognitive biases. Symbolic AI cannot be misled by the description.
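The vector comparison just described can be sketched numerically. The vectors below are made-up toy values rather than trained embeddings, and composing "feminist bank teller" by componentwise addition of t and v is only a crude stand-in for the categorical composition of Quantum Linguistics:

```python
import math

def cosine(a, b):
    """Cosine similarity: the inner product of a and b, normalized."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy meaning vectors (illustrative values, not trained embeddings).
u = [0.9, 0.4, 0.1]   # "Linda" (after reading the description)
t = [1.0, 0.2, 0.0]   # "feminist"
v = [0.0, 0.3, 1.0]   # "bank teller"

# Naive composition of "feminist bank teller": componentwise sum.
w = [ti + vi for ti, vi in zip(t, v)]

# The description pulls u toward t, hence toward the compound w.
print(cosine(u, v) < cosine(u, w))  # True: u is closer to the compound
```

With these toy values, u is indeed closer to the compound w than to v alone, reproducing the biased judgement described above.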
Yet Statistical AI can be, because it relies upon statistical inference, which correlates Linda with feminists when given the description of Linda. This does not imply that Statistical AI is inferior to Symbolic AI because of its cognitive biases; Statistical AI allows us to make machine intelligence closer to human intelligence. There are actually positive aspects of cognitive biases: Tversky and Kahneman considered the Linda problem to stem from the representativeness heuristic, which makes our thinking much faster than strict logical reasoning (which is time-consuming) at the expense of absolute truth. Fast thinking is crucial in real-life situations. For example, we could die if we thought too much in certain dangerous situations; it is much the same in business situations. There are actually few occasions on which we can think for an unlimited amount of time. This is why we do need heuristics, and why Statistical AI does matter. Note that Statistical AI also largely relies upon heuristic algorithms rather than precise algorithms, which tend to be computationally expensive. Ultimately, these considerations would lead to an evolutionary account of cognitive biases or heuristics, which have had certain merits for the survival of human beings. There is a remark to be made here. It is implicitly assumed above that we adopt classical logic, or at least a logic that makes ϕ ∧ ψ → ϕ valid. It is also assumed that the Prob operator preserves logical connectives, especially
conjunction and implication (or the logical consequence relation). If we adopt a nonclassical logic that does not validate ϕ ∧ ψ → ϕ, we may arrive at a different answer to the Linda problem. There is actually a systematic method to control logical properties such as ϕ ∧ ψ → ϕ in terms of what are called structural rules in sequent calculus [6]. And we can indeed invalidate ϕ ∧ ψ → ϕ in substructural logic [6]; that is, ϕ ∧ ψ → ϕ is not valid in logic without the weakening rule. If we adopt such a substructural logic, therefore, we can account for the fact that Prob(ϕ ∧ ψ) ≤ Prob(ψ) does not hold. This is a substructural account of the Linda problem. The substructural account of cognitive biases suggests that the logic of the human mind may not be classical, but rather substructural. The logic of quantum mechanics, as well as the logic of human cognition, is substructural in a certain sense. The No-Cloning and No-Deleting Theorems of Quantum Physics state that quantum information can neither be copied nor deleted. Categorically, this means that there is no diagonal δ : H → H ⊗ H (copying operation) and no projection p : H ⊗ H → H (deleting operation). Classically, δ : H → H × H and p : H × H → H do exist. The logical meaning of No-Cloning and No-Deleting can be explicated: ϕ ≤ ϕ ⊗ ϕ is contraction, and ϕ ⊗ ϕ ≤ ϕ is weakening, in terms of substructural logic. Categorically, the monoidal product ⊗ coincides with the cartesian product × if and only if ⊗ allows both diagonals and projections. The absence of δ and p enables entanglement, non-locality, and Bell's theorem. Interestingly, diagonal arguments do not hold in monoidal categories without δ. All this reveals commonalities between the laws of Nature (i.e., Physics) and the laws of Reason (i.e., Logic), allowing us to go beyond the Cartesian Dualism separating Matter and Mind.
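In sequent-calculus notation, the weakening and contraction rules mentioned above can be displayed explicitly (these are the standard textbook formulations, not taken verbatim from [6]):

```latex
% Weakening (w): an extra assumption may be added to the context;
% it is what validates \varphi \otimes \psi \vdash \varphi.
\frac{\Gamma \vdash \chi}{\Gamma, \varphi \vdash \chi}\;(\mathrm{w})
\qquad
% Contraction (c): a duplicated assumption may be merged;
% it is what validates \varphi \vdash \varphi \otimes \varphi.
\frac{\Gamma, \varphi, \varphi \vdash \chi}{\Gamma, \varphi \vdash \chi}\;(\mathrm{c})
```

Dropping (w) is what invalidates ϕ ⊗ ψ ⊢ ϕ, the substructural reading of the conjunction law whose failure the Linda experiment exhibits.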
4
Concluding Remarks
We have discussed Quantum Linguistics as the categorical integration of Symbolic and Statistical AI, and applied it to cognitive and machine bias problems. There are other approaches to integrating Symbolic and Statistical AI, such as that of Domingos et al. [5]; Quantum Linguistics, compared to them, is supported by the rigorous and profound mathematical foundations of Categorical Quantum Mechanics [1], equipped with the transparent graphical calculus of process diagrams, which can even be automated as a reasoning system for Quantum Mechanics and Natural Language Processing. Statistical AI is known to inherit human biases when learning from actual text containing biased expressions. The same applies to cognitive biases in the Kahneman-Tversky tradition, especially the Linda problem. All this is not necessarily a negative finding, suggesting that Statistical AI is closer to human intelligence. Symbolic AI can apparently avoid biased judgements; yet it is also possible to account for them in terms of substructural logic. Cognitive biases are not random phenomena; there are certain reasons and mechanisms underlying them. While Symbolic AI eliminates cognitive biases, Statistical AI allows us to shed new light on the reasons and mechanisms lurking behind them. In particular, the Linda problem involves
compound expressions in language. In order to analyze it through the vector space model of meaning in Natural Language Processing, we have to compare meaning vectors for different compound expressions in a single setting. Quantum Linguistics, qua Statistical AI enhanced with Symbolic AI, gives the right framework to do this in a mathematically rigorous manner. It may thus be concluded that Symbolic AI can play a pivotal role in improving the applicability and efficiency of Statistical AI, at least in NLP as demonstrated by Quantum Linguistics. Deduction and induction, i.e., reasoning on theoretical grounds and learning from empirical data, are two wheels of the mind and of the sciences, and both of them are arguably necessary for future developments of AI. The integration can in principle be achieved in many ways; yet Category Theory gives a particularly promising framework for it as demonstrated by Quantum Linguistics.
References

1. Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, pp. 415–425 (2004)
2. Caliskan, A., et al.: Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017)
3. Coecke, B., et al.: Mathematical foundations for a compositional distributional model of meaning. Linguist. Anal. 36, 345–384 (2010)
4. Coecke, B., et al.: Lambek vs. Lambek: functorial vector space semantics and string diagrams for Lambek calculus. Ann. Pure Appl. Logic 164, 1079–1100 (2013)
5. Domingos, P., et al.: Unifying logical and statistical AI. In: Proceedings of the 31st Annual ACM/IEEE Symposium on Logic in Computer Science, pp. 1–11 (2016)
6. Galatos, N., et al.: Residuated Lattices: An Algebraic Glimpse at Substructural Logics. Elsevier, Amsterdam (2007)
7. Grefenstette, E., et al.: Concrete sentence spaces for compositional distributional models of meaning. Comput. Mean. 4, 71–86 (2011)
8. Grefenstette, E., et al.: Experimental support for a categorical compositional distributional model of meaning. In: Proceedings of EMNLP 2011, pp. 1394–1404 (2011)
9. Kartsaklis, D., et al.: Reasoning about meaning in natural language with compact closed categories and Frobenius algebras. In: Logic and Algebraic Structures in Quantum Computing, pp. 199–222 (2014)
10. Sadrzadeh, M., et al.: The Frobenius anatomy of word meanings I. J. Logic Comput. 23, 1293–1317 (2013)
11. Kahneman, D., Tversky, A.: Subjective probability: a judgment of representativeness. Cogn. Psychol. 3, 430–454 (1972)
12. Lambek, J.: Type grammars as pregroups. Grammars 4, 21–39 (2001)
13. Lambek, J.: Pregroups and natural language processing. Math. Intell. 28, 41–48 (2006)
14. Lambek, J.: From Word to Sentence: A Computational Algebraic Approach to Grammar. Polimetrica, Monza (2008)
15. Mac Lane, S.: Categories for the Working Mathematician. Springer, Heidelberg (1971)
16. Maruyama, Y.: Contextuality Across the Sciences: Bell-Type Theorems in Physics and Cognitive Science. LNAI. Springer (2019, accepted for publication)
17. Maruyama, Y.: Compositionality and Contextuality: The Symbolic and Statistical Theories of Meaning. LNAI. Springer (2019, accepted for publication)
18. Pothos, E.M., et al.: Can quantum probability provide a new direction for cognitive modeling? Behav. Brain Sci. 36, 255–274 (2013)
19. Turney, P., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)
20. Tversky, A., Kahneman, D.: Judgments of and by representativeness. In: Judgment Under Uncertainty: Heuristics and Biases. Cambridge University Press (1982)
Vehicle Routing Problem with Fuel Station Selection (VRPFSS): Formulation and Greedy Heuristic

Jhonata Soares de Freitas and André Gustavo dos Santos

Federal University of Viçosa, Viçosa, MG 36579-000, Brazil
[email protected], [email protected]
https://www.ufv.br
Abstract. Motivated by last year's truck drivers' strike and the steady increase in fuel prices in Brazil, as well as the growing price disparity between fuel stations, this paper proposes a variation of the vehicle routing problem that includes the selection of the fuel stations at which vehicles refuel. We propose a mathematical model that seeks to reduce transport costs by choosing the routes, the fuel stations, and the quantity to be fueled at each one, so as to minimize the total fuel expense. The new model takes into account the position of the fuel stations as well as the fuel price. Another factor considered is the vehicles' capacity to store and consume fuel. We present a mixed integer linear program for the problem and a greedy heuristic to provide an initial solution.

Keywords: Vehicle routing problem · Fuel price · Mathematical modeling · Combinatorial optimization · Greedy algorithm
1
Introduction
The truck drivers' strike that took place in May 2018 in Brazil brought numerous disruptions caused by the loss of inputs, road blockages, and lack of fuel. Some of the consequences were, for example: fuel shortages and price increases, long queues at fuel stations, lack of food and medicine for supply in shops and pharmacies respectively, suspension of classes and flights, and the death of poultry and pigs for lack of feed, according to FOLHA [3]. This scenario caused some Brazilian cities to declare a state of public calamity, and it generated losses of 40.9 billion Brazilian reais, a rise in the dollar exchange rate, and a 36% drop in exports, according to G1 [5]. The reason for the strike was the increase in fuel prices at stations due to the increase in fuel taxes and Petrobras' price adjustment policy, as reported by G1 [4]. Another of the strikers' claims was the end of the suspended-axle toll, according to G1 [6].

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 477–486, 2021. https://doi.org/10.1007/978-3-030-49342-4_46

Rising fuel prices, specifically of diesel, which directly affects truck drivers, result in increased costs associated with the transportation of cargo and people,
J. S. de Freitas and A. G. dos Santos
causing damage to truck drivers and transportation companies. As a consequence of this increase, there is a ripple effect on the final price of transported products. The halt in freight transport and the blocking of several roads led to an unprecedented lack of fuel at stations, generating a huge demand for gasoline, including from passenger cars. The few stations that received fuel began to limit the supply per vehicle and to charge exorbitant prices. For example, gasoline reached R$ 9.99 per liter at some stations in the Federal District, according to OGLOBO [11], at a time when the common value was just over R$ 4.00. Thus, road travel planning has an additional requirement: a precise definition of where and how much to refuel at each stop, to ensure sufficient fuel for the route at the lowest prices. Given the difficulty of achieving a significant reduction in fuel prices, a good strategy is to refuel vehicles where the fuel is at the lowest price. Aiming to reduce the effect of the instability caused by strike-like situations, this paper proposes an adaptation of the Vehicle Routing Problem (VRP) to calculate product delivery routes that minimize fuel expenditure, taking into consideration the available stations, the fuel consumption along the routes, a minimum quantity that allows the vehicle to reach the nearest fuel station, and the maximum fuel capacity supported. The version proposed here is named the Vehicle Routing Problem with Fuel Station Selection (VRPFSS). In the next section we make a brief literature review presenting some related problems. In Sect. 3 we present a formal definition of the problem and in Sect. 4 a mixed integer linear programming model. Section 5 presents a greedy heuristic to build an initial solution. In Sect. 6 we present the model results and a comparison with simple heuristics, and in Sect. 7 some conclusions and future works.
2
Literature Review
The Vehicle Routing Problem (VRP) is a classic combinatorial optimization problem proposed by Dantzig and Ramser [2] and can be considered an extension of another classic problem, the Traveling Salesman Problem (TSP). In the VRP, one must serve a set of customers using a fleet of vehicles leaving a common depot. The VRP generally takes into account the capacity of each vehicle, so the sum of the demands of the customers served by a vehicle cannot exceed its capacity. The objective of the VRP varies depending on the context in which it is applied. For example, a solution is sought that minimizes total transportation time, total distance traveled, waiting time, driver overtime, vehicle wear, or customer dissatisfaction, or that maximizes customer service, among others. The motivation for the VRP was the need to reduce the costs associated with transportation logistics, as the estimated cost associated with the transportation of a product varies between 10% and 15% of the final product value, according to Hasle [12] and Rodrigue [13]. Therefore, using a method that assists in the construction of routes to reduce costs is justifiable. Even having a simple
Vehicle Routing Problem with Fuel Station Selection
479
definition, the VRP is an NP-hard problem, just like the TSP. There are some variants of the VRP that are used for particular applications. Some variations take into account specific characteristics, such as product type, vehicle type, or maximum transport speed. Minimizing the total distance traveled is generally associated with low freight costs. However, for a variety of reasons, generating the shortest route does not necessarily produce the lowest financial cost, as other factors may add to the cost of the routes. For example, a portion of the route may have slow traffic, tolls, poor road conditions or maintenance, accidents, a higher fuel price on some stretches, or greater susceptibility to weather conditions, among others. Using a technique that takes into account all factors that influence transportation cost is not feasible, because obtaining instances that accurately represent reality is practically impossible: some factors, such as traffic and road conditions, depend on several causes and may vary over time, making an accurate representation impracticable. An interesting factor to consider is reducing fuel costs, as a vehicle fleet consumes a large amount of fuel and there is a certain relationship between fuel consumption and distance traveled. The price of fuel at stations tends to vary evenly due to consumer protection laws, so route planning tends to achieve near-minimum fuel costs in the medium term. The VRP mathematical model proposed by Dantzig and Ramser [2] minimizes the total distance traveled by the vehicles, subject to constraints that every customer except the depot must be visited exactly once, and that a vehicle must arrive at and depart from a location, be it a customer or the depot, the same number of times. The model also ensures that the demand of every customer must be met.
The Vehicle Routing Problem with Simultaneous Delivery and Pickup (VRPSDP) is a variation of the Vehicle Routing Problem, proposed by Min [10] to calculate distribution routes for materials from a library, where the delivered materials must be placed back on the shelves and those to be borrowed must be withdrawn. The concept can be used to deliver products to customers while simultaneously picking up waste for proper recycling or disposal. This variation arose to avoid the expense of having to repeat the same route twice, once to pick up and once to deliver. While it may seem obvious to use the same vehicle for delivery and pickup, finding routes that do not violate vehicle capacity constraints is not a simple task, so most routes generated by the classic VRP would not be feasible. The VRPSDP also has applications in postal services, where it is possible to deliver mail to one customer and pick up mail from another for shipping, and in beverage distribution, where the truck delivers full beverage containers and picks up empty ones. Before proposing the model, the work of Min [10] proposes a technique that consists of grouping the points in clusters, designating a vehicle for each cluster, and applying the TSP to each one of them. However, this approach violated vehicle capacity constraints in most instances. The viable solutions were found using the branch-and-bound technique to solve the model.
The Green Vehicle Routing Problem (GVRP), according to Lin et al. [8], was proposed with the objective of meeting the needs of green logistics, reconciling the reduction of environmental and economic costs. The GVRP's main objective is to reduce carbon dioxide (CO2) emissions by reducing the consumption of fossil fuels, including through alternative fuels. One of the topics addressed by Lin et al. [8] in the GVRP demonstrates that minimizing fuel consumption contributes to reducing carbon dioxide emissions, but does not achieve minimal emissions, because factors such as traffic congestion also have an influence. Maden et al. [9] studied the problem considering road congestion and obtained a 7% reduction in carbon dioxide emissions compared to the problem whose objective is only to minimize distance. The Vehicle Routing Problem considering fuel consumption was proposed based on green logistics to minimize the fuel consumed on the routes, fuel being an important factor both in reducing transportation cost and in reducing carbon dioxide emissions. This new approach takes into account factors such as vehicle speed, terrain slope, load weight, distance traveled, vehicle color, tire inflation pressure, traffic conditions, weather, and the use of deflectors. According to Kuo and Wang [7], the factors that most influence fuel consumption are the total distance, the transport speed, and the weight of the cargo. They conclude that choosing a route on which the travel can be done at a higher speed is more interesting: longer routes with light traffic can be chosen rather than shorter routes with intense traffic, if the transport time is shorter. In order to obtain a result consistent with the real scenario, one must take into account several factors, as in the GVRP work of Lin et al. [8], which demonstrates that the reduction of carbon dioxide emissions does not depend exclusively on distance.
The VRP variations available in the literature take into account several factors, but do not consider the financial costs related to fuel. We therefore add the costs of fuel consumption to the problem proposed here.
3
Description of VRPFSS
Considering that the VRP and its variations known in the literature do not have all the characteristics necessary to minimize the expenses associated with vehicle refueling, and based on the literature and common knowledge on the subject, we describe a mathematical model for a variation of the VRP that minimizes refueling costs. The proposed model must satisfy route integrity constraints, as in other variations of the VRP, such as ensuring that each customer is visited exactly once, meeting customer demand, not exceeding the load capacity of the vehicle, and returning to the depot. These restrictions can be found in Dantzig and Ramser [2] and have few changes across VRP variations. Although it deals with fuel consumption, the proposed problem disregards some characteristics of the real scenario, such as vehicle weight, speed, and road inclination, which directly influence consumption, making consumption depend only on distance. When traversing a stretch, the amount of fuel available
at the beginning of the stretch is decremented by the length of the stretch times a factor that indicates how many units of fuel the vehicle consumes per unit of distance. When passing through a fuel station, the vehicle may be refueled up to the tank capacity. It may also be only partially filled (or not use the station at all) if it is more advantageous to refuel at some other station that it has sufficient fuel to reach. Stations are points that should be included in the routes whenever it is convenient for the vehicle to refuel at that point. The fuel tank of each vehicle obviously has a limited storage capacity. We also consider that this tank can never be completely emptied, i.e., the vehicle should always keep a minimum reserve amount, to ensure that it can refuel even if something unforeseen makes it consume more. Thus, throughout the journey, the amount of fuel stored must stay between a preset minimum and maximum. Routes and refueling should therefore be planned so that the vehicle is not obliged, because it is close to the reserve minimum, to refuel at more expensive stations, and does not miss the chance to refuel at cheaper stations because, for example, it is near the limit of its storage capacity. In the next section we present a mathematical formulation that formally defines the VRPFSS, encompassing all these characteristics.
4
Mathematical Model
In order to formally define the problem, the following mixed integer linear programming model uses the parameters and decision variables below.

Parameters:

– C: set of customers and the depot (the depot has index 0)
– V: set of vehicles
– P: set of fuel stations
– Q: vehicle load capacity
– T: maximum fuel capacity of the vehicle
– R: fuel reserve
– K: fuel consumption per km traveled
– q_h: demand of customer h (we consider q_h = 0 for fuel stations)
– D_{i,j}: distance from point i to point j
– S_j: price per unit of fuel at fuel station j

Decision variables:

– G_{k,i}: fuel level of vehicle k at point i
– X_{k,i,j}: 1 if vehicle k travels from point i to point j, 0 otherwise
– F_{k,i,j}: load of vehicle k when traveling from point i to point j
– O_{k,i,j}: quantity fueled at station j by vehicle k when coming from point i

Min  Σ_{k∈V} Σ_{i∈C∪P} Σ_{j∈P} S_j · O_{k,i,j}    (1)

subject to:

Σ_{i∈C∪P} X_{k,0,i} = 1,  ∀k ∈ V    (2)

Σ_{k∈V} Σ_{j∈C∪P} X_{k,j,i} = 1,  ∀i ∈ C − {0}    (3)

Σ_{h∈C∪P} X_{k,i,h} − Σ_{h∈C∪P} X_{k,h,i} = 0,  ∀i ∈ C ∪ P,  ∀k ∈ V    (4)

Σ_{i∈C∪P} F_{k,i,h} − Σ_{j∈C∪P} F_{k,h,j} = q_h,  ∀h ∈ (C − {0}) ∪ P,  ∀k ∈ V    (5)

F_{k,i,j} ≤ Q · X_{k,i,j},  ∀i, j ∈ C ∪ P,  ∀k ∈ V    (6)

Σ_{j∈C∪P} X_{k,i,j} ≤ 1,  ∀i ∈ P,  ∀k ∈ V    (7)

R · X_{k,i,j} ≤ G_{k,i} ≤ T · X_{k,i,j},  ∀i, j ∈ C ∪ P,  ∀k ∈ V    (8)

R · X_{k,i,j} ≤ G_{k,j} ≤ T · X_{k,i,j},  ∀i, j ∈ C ∪ P,  ∀k ∈ V    (9)

G_{k,0} = T,  ∀k ∈ V    (10)

X_{k,i,j} · G_{k,i} − K · X_{k,i,j} · D_{i,j} + O_{k,i,j} = X_{k,i,j} · G_{k,j},  ∀i ∈ C ∪ P,  ∀j ∈ P,  ∀k ∈ V    (11)

X_{k,i,j} · G_{k,i} − K · X_{k,i,j} · D_{i,j} = X_{k,i,j} · G_{k,j},  ∀i ∈ C ∪ P,  ∀j ∈ C,  ∀k ∈ V    (12)

X_{k,i,j} · G_{k,i} − K · X_{k,i,j} · D_{i,j} ≥ R · X_{k,i,j},  ∀i, j ∈ C ∪ P,  ∀k ∈ V    (13)

X_{k,i,j} · (T − R) − O_{k,i,j} ≥ 0,  ∀i, j ∈ C ∪ P,  ∀k ∈ V    (14)

X_{k,i,j} ∈ {0, 1},  ∀i, j ∈ C ∪ P,  ∀k ∈ V    (15)

F_{k,i,j} ≥ 0,  ∀i, j ∈ C ∪ P,  ∀k ∈ V    (16)

O_{k,i,j} ≥ 0,  ∀i, j ∈ C ∪ P,  ∀k ∈ V    (17)

G_{k,i} ≥ 0,  ∀i ∈ C ∪ P,  ∀k ∈ V    (18)
The goal is to minimize the cost of fuel consumption, represented by expression (1). Constraints (2)–(4) control the flow of the routes: the vehicles should start
at the depot (2), should visit every customer exactly once (3), and must leave a point the same number of times it is reached (4). The next pair of constraints controls the vehicle load: the load is reduced by the customer demand when a customer is visited (5), and the vehicle capacity must be respected everywhere on the route (6). Constraint (7) allows each fuel station to be visited at most once by each vehicle. The following constraints control fuel consumption and supply: (8) and (9) ensure that the vehicle keeps its fuel level between the reserve and the tank capacity upon departure from and arrival at a point; (10) ensures that the vehicle leaves the depot with a full tank; (11) calculates the amount of fuel on arrival at a station, which is the amount available at the previous point, minus the amount spent in transit, plus the amount fueled on the way; (12) calculates the amount of fuel on arrival at a customer or the depot, which is the amount available at the previous point minus the amount spent in transit; (13) ensures that the vehicle will not attempt to go to a point and run out of fuel on the way; and (14) ensures that the vehicle is fueled without violating its capacity constraints. Notice that (13) and (14) are partially covered by (8) and (9), but we keep them for performance. Finally, (15)–(18) represent the domains of the decision variables.
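The fuel dynamics expressed by constraints (10)-(14) can be illustrated with a small route simulator. This is an illustrative sketch, not the authors' implementation; the route, distances, and refuel amounts below are made up:

```python
def simulate_fuel(route, dist, K, T, R, refuel):
    """Track fuel along a route, applying G_j = G_i - K*D_ij + O_ij.

    route  : sequence of point ids, starting at the depot
    dist   : dict mapping (i, j) to the distance D_ij
    refuel : dict mapping station id j to the amount O fueled there
    Returns the fuel level at each point, or raises ValueError if the
    reserve R or tank capacity T would be violated along the way.
    """
    fuel = T  # constraint (10): leave the depot with a full tank
    levels = [fuel]
    for i, j in zip(route, route[1:]):
        fuel -= K * dist[i, j]        # consumption on the leg, as in (11)/(12)
        if fuel < R:                  # reserve constraint (13)
            raise ValueError(f"reserve violated arriving at {j}")
        fuel += refuel.get(j, 0)      # quantity fueled at station j, as in (11)
        if fuel > T:                  # tank capacity, as in (14)
            raise ValueError(f"tank capacity exceeded at {j}")
        levels.append(fuel)
    return levels

# Depot 0 -> customer 1 -> station 's' -> customer 2 -> depot 0
dist = {(0, 1): 10, (1, "s"): 5, ("s", 2): 8, (2, 0): 12}
print(simulate_fuel([0, 1, "s", 2, 0], dist, K=1, T=45, R=1, refuel={"s": 10}))
# [45, 35, 40, 32, 20]
```

Increasing the refuel amount at "s" to 20 would raise an error, since 30 + 20 exceeds the tank capacity T = 45, mirroring constraint (14).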
5
Greedy Algorithm
Due to the difficulty of finding a viable initial solution for the VRPFSS using only the proposed formulation, a greedy algorithm was implemented to find viable initial solutions. The strategy adopted was to exploit one of the characteristics that influence the total fuel expenditure: the total distance. The implementation uses the routes obtained by applying the VRP model proposed by Dantzig and Ramser [2], which minimizes the total distance; this model was solved by CPLEX with a maximum of 1 h of runtime, disregarding the fuel station points. This first solution is then used by a greedy algorithm that adds fuel stations as needed, respecting the fuel storage capacity constraints, not repeating fuel stations, and ensuring that all routes can be completed. The criterion for station selection is to use the smallest value of S_j + D_{i,j}/β, where j is an element of the fuel station set and i is an element of the set of customers. The rationale is to choose a low-cost fuel station while at the same time avoiding long detours for refueling. The parameter β compensates for the different magnitudes of the two measures (distance and price). In our tests we used β = 15. After choosing a station, the vehicle always fills the tank, justified by the difficulty of determining a fueling criterion that yields the best refueling for a given route.
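The station-selection criterion can be sketched as follows. Since dividing a pure product S_j · D_{i,j} by a constant β would not change the ranking, we read the criterion as minimizing the sum S_j + D_{i,j}/β, which matches β's stated role of balancing price against distance; the station names, prices, and distances below are made-up illustrative data:

```python
def pick_station(customer, stations, dist, price, beta=15.0):
    """Greedy criterion: pick the station j minimizing S_j + D_ij / beta,
    where i is the customer from which the detour would start."""
    return min(stations, key=lambda j: price[j] + dist[customer, j] / beta)

# Illustrative prices (monetary units per fuel unit) and distances (km).
price = {"s1": 480, "s2": 430, "s3": 455}
dist = {("c", "s1"): 30, ("c", "s2"): 600, ("c", "s3"): 90}

# Scores: s1 -> 480 + 2 = 482, s2 -> 430 + 40 = 470, s3 -> 455 + 6 = 461.
print(pick_station("c", ["s1", "s2", "s3"], dist, price))  # s3
```

The example shows the trade-off the criterion encodes: the cheapest station (s2) loses because it is far away, and the nearest one (s1) loses because it is expensive.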
6
Results
Instances obtained from Augerat [1] have been adapted to add fuel stations, fuel prices, and the maximum and minimum amounts of fuel supported by the vehicles.
The positions of the fuel stations were randomly generated and the prices were pseudo-randomly generated, with a minimum value of 400 and a maximum of 500 monetary units per fuel unit. Figure 1 shows an example of the adaptation of one of those instances, the VRPFSS P-n9-k3 instance with fuel stations, together with a viable solution.
Fig. 1. Viable solution for instance P-n9-k3
In Table 1, we report the name of the instance adapted from Augerat [1]; the numbers of customers, fuel stations, and vehicles; the maximum fuel capacity supported by the vehicle; and the amount of the reserve. The "Greedy algo." column shows the cost of the viable routes calculated by the algorithm described in Sect. 5, the "Math. model" column shows the improvements obtained by applying the model described in Sect. 4 using the CPLEX 12.8.0 solver for 1 h on each instance, and the % column shows the cost improvement of the mathematical model over the greedy algorithm. The improvement in the fuel consumption cost was 54.4% on average, compared to the greedy solution, which always fills the tank at a greedily chosen fuel station. The strategy of using an initial solution accelerated the solution process, as shown in Table 2, avoiding wasting time finding a viable solution and further exploiting the relationship between distance and consumption. According to Kuo and Wang [7], distance is one of the factors that influence fuel consumption, so reducing the total distance tends to reduce fuel consumption and consequently the total fuel cost. However, due to the complexity of solving the model, it was not possible to measure how far the solutions found are from the optimal solution for any instance.
Table 1. Total costs using greedy algorithm and CPLEX to solve the proposed model

Instance    Customers  Stations  Vehicles  T    R  Greedy algo.  Math. model  %
A-n32-k5    32         76        5         85   1  219855        178845       18.7
A-n33-k5    33         41        5         85   1  206928        171454       17.1
A-n33-k6    33         41        6         95   1  183295        131712       28.2
A-n34-k5    34         78        5         95   1  226226        143718       36.5
A-n36-k5    36         104       5         150  1  170204        84160        50.6
A-n37-k5    37         89        5         135  1  160867        73058        54.6
A-n38-k5    38         78        5         125  1  250123        60891        75.7
A-n44-k7    44         96        7         125  1  307998        117049       62.0
A-n46-k7    46         134       7         137  1  168097        67710        59.7
A-n48-k7    48         92        7         153  1  238620        84974        64.4
P-n9-k3     9          23        3         45   1  68202         40531        30.6
P-n16-k8    16         28        8         65   1  84806         23977        71.7
P-n20-k2    20         48        2         85   1  69372         26579        61.7
P-n21-k2    21         53        2         70   1  58796         31844        45.8
P-n22-k2    22         30        2         87   1  80732         17924        87.8
P-n40-k5    40         68        5         91   1  115523        6228         95.0
P-n45-k5    45         113       5         90   1  153876        30068        80.5
P-n50-k7    50         58        7         85   1  69924         36410        47.9
P-n50-k10   50         58        10        85   1  58908         32622        44.7
Table 2. Results after 1 h of CPLEX for the proposed model

Instances | Using the greedy algorithm | Not using the greedy algorithm
P-n9-k3   | 40531                      | 58662
P-n16-k8  | 23977                      | No viable solution found
P-n20-k2  | 26579                      | No viable solution found
A-n32-k5  | 178845                     | No viable solution found
A-n33-k5  | 171454                     | No viable solution found

7 Conclusions and Future Works
This paper addresses a new perspective on the VRP, based on generating the best routes while taking into consideration fuel prices and vehicle fueling planning. Adapting the VRP to this context also ensures that vehicles carefully choose fuel stations and finish their routes with as little fuel as possible, facilitating strategic spending planning and resource management. Future work will consider the application of constructive heuristics and metaheuristics to the VRP with fuel station selection, seeking better solutions for the larger instances, and will add further factors that influence fuel consumption, such as the total weight of the vehicle, the speed, and the traffic on the road. The solutions obtained by constructive heuristics and metaheuristics will be evaluated using statistical methods and compared with the mathematical model and the greedy method proposed here for the VRPFSS.
Acknowledgment. The authors thank CAPES and FAPEMIG funding agencies for providing the necessary resources for this work.
References

1. Augerat, P.: Vehicle routing problem instances (1995). http://vrp.atd-lab.inf.puc-rio.br/index.php/en/. Accessed 22 Apr 2019
2. Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Manag. Sci. 6(1), 80–91 (1959)
3. Folha de São Paulo: Em 4º dia de greve, cidades ficam sem combustível e sem alimentos (2018). https://www1.folha.uol.com.br/mercado/2018/05/em-4o-dia-de-greve-cidades-ficam-sem-combustivel-e-sem-alimentos.shtml. Accessed 25 Apr 2019
4. G1, Brazil: 6 perguntas para entender a alta nos preços da gasolina e do diesel (2018). https://g1.globo.com/economia/noticia/6-perguntas-para-entender-a-alta-nos-precos-da-gasolina-e-do-diesel.ghtml. Accessed 07 Apr 2019
5. G1, Brazil: Greve dos caminhoneiros (2018). https://g1.globo.com/economia/ao-vivo/greve-de-caminhoneiros-maio-de-2018.ghtml. Accessed 03 Apr 2019
6. G1, Brazil: Representante de caminhoneiros em MT diz que greve não se resume a redução do diesel e cita outras reivindicações (2018). https://g1.globo.com/mt/mato-grosso/noticia/representante-de-caminhoneiros-em-mt-diz-que-greve-nao-se-resume-a-reducao-do-diesel-e-cita-outras-reivindicacoes.ghtml. Accessed 09 Apr 2019
7. Kuo, Y., Wang, C.: Optimizing the VRP by minimizing fuel consumption. Manag. Environ. Qual.: Int. J. 22(4), 440–450 (2011). https://doi.org/10.1108/14777831111136054
8. Lin, C., Choy, K., Ho, G., Chung, S., Lam, H.: Survey of green vehicle routing problem: past and future trends. Expert Syst. Appl. 41(4, Part 1), 1118–1138 (2014). http://www.sciencedirect.com/science/article/pii/S095741741300609X. ISSN 0957-4174
9. Maden, W., Eglese, R., Black, D.: Vehicle routing and scheduling with time-varying data: a case study. J. Oper. Res. Soc. 61(3), 515–522 (2010). https://doi.org/10.1057/jors.2009.116
10. Min, H.: The multiple vehicle routing problem with simultaneous delivery and pick-up points. Transp. Res. Part A 26A(5), 377–386 (1989)
11. O Globo, Brazil: Gasolina chega a R$ 9,99 e Procon faz fiscalização (2018). https://oglobo.globo.com/economia/gasolina-do-df-chega-r-999-procon-faz-fiscalizacao-22711442. Accessed 25 Apr 2019
12. Hasle, G., Kloster, O.: Geometric modelling, numerical simulation, and optimization. In: Operations Research Computer Science Interfaces, pp. 397–436. Springer (2007)
13. Slack, B.: Concept 3.1 - transport costs and rates (chap. 7). In: Rodrigue, J.-P., Comtois, C. (eds.) The Geography of Transport Systems. Routledge, New York (2006)
Requirements Change Requests Classification: An Ontology-Based Approach Zaineb Sakhrawi(B) , Asma Sellami, and Nadia Bouassida Higher Institute of Computer Science and Multimedia, University of Sfax, Sfax, Tunisia [email protected], {asma.sellami,nadia.bouassida}@isims.usf.tn
Abstract. Requirements for software system projects are increasingly exposed to large numbers of change requests. Change requests captured in natural language are difficult to analyze and evaluate, which may lead to major problems such as requirements creep and ambiguity. To provide an appropriate understanding of a change request in a systematic way, this paper develops an ontology for classifying change requests as either functional changes (FC) or technical changes (TC). Technical changes are further classified into nine categories comprising the ISO 25010 quality characteristics and project requirements and constraints. To establish a comprehensive representation of change requests, we collected users' reviews from the PROMISE repository and classified them using the Protégé editor. The feasibility of the proposed approach is illustrated through examples taken from the PROMISE repository.
Keywords: System requirements · Change request · Ontology

1 Introduction
System requirements are the basis for any software project. Identifying system requirements completely and clearly at the beginning of the software life cycle is a hard task; thus, changes are inevitable. Change requests may occur during software development or even after delivery. To evaluate a change request effectively, an efficient measurement method is required [11]. However, the choice of measurement method depends on the type of change request, which is one of the main reasons why change request classification is needed. Change requests are most often expressed in natural language, accounting for up to 90% of all specifications [1]. Change requests expressed in natural language are difficult to analyze and may lead to confusion, an inefficient distinction between types of system requirements, ambiguity, etc. Change requests may be "in scope" or "out of scope" [2]. An "in scope" change request means that the changes can be handled with few modifications to system requirements. In contrast, an "out of scope" change request requires considerable time to implement. Requirements for a software system project are decomposed into three components [5]: functional user requirements (FUR), non-functional requirements (NFR), and project requirements and constraints (PRC). FUR describe what the software shall do [5]. NFR describe not what the software will do but how the software will do it [5]. PRC represent requirements that define how a software system project should be managed and resourced, or constraints that affect its performance [5]. Change requests can be classified into two categories according to the requirements they affect: a change request that affects a FUR is classified as FC, while a change request that affects an NFR or a PRC is classified as TC. This classification allows stakeholders to select the appropriate measurement method and therefore to evaluate a change request accurately, which is useful for improving the understanding of management decisions. This paper aims to develop an ontology-based approach to classify change requests as either FC or TC; TC are further classified into quality requirements according to the ISO 25010 software quality model and into project requirements and constraints. The rest of the paper is structured as follows. Section 2 presents the background, including the change management process and ontology concepts. In Sect. 3, we survey the related work. In Sect. 4, an ontology-based approach is proposed. Section 5 gives conclusions and some suggestions for future work.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 487–496, 2021. https://doi.org/10.1007/978-3-030-49342-4_47
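The FC/TC classification rule stated above (a change affecting a FUR is a functional change; a change affecting an NFR or a PRC is a technical change) can be sketched as a minimal mapping; the function name and component labels are illustrative assumptions.

```python
# Minimal sketch of the paper's classification rule: the requirement component
# affected by a change request determines the change class.

def classify_change(affected_component: str) -> str:
    mapping = {
        "FUR": "FC",  # functional user requirement       -> functional change
        "NFR": "TC",  # non-functional requirement        -> technical change
        "PRC": "TC",  # project requirement or constraint -> technical change
    }
    if affected_component not in mapping:
        raise ValueError(f"unknown requirement component: {affected_component}")
    return mapping[affected_component]
```

For example, a change request touching a performance constraint (an NFR) maps to TC, while one adding a new user-visible behavior (a FUR) maps to FC.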
2 Background

2.1 Changes Management and Requirements Classification
Managing changes is the process of evaluating, deciding on, and implementing change requests [10]. This process includes three steps. The first step is change identification, which comprises a change elicitation activity and a change representation activity. The second step is change analysis. The third step is change cost/effort estimation. Our approach focuses on the first step and, in particular, on the representation and classification of change requests. When a change occurs, its impacts or consequences on the progress of the software project should be analyzed. Change impact analysis is useful for identifying the potential consequences of a change in a system, or for estimating what needs to be modified to accomplish a change. However, the choice of an efficient method for evaluating change requests depends on the type of requirement affected by the change.

2.2 Ontology for Classification
An ontology is an explicit specification of shared concepts and of the relationships among objects [6]. It is also used to represent items of knowledge (ideas, things, facts, etc.) in such a way as to determine the relationships and the classification of concepts within a specified domain of knowledge [8]. There are many motivations for using ontologies in the requirements classification field, such as making a domain explicit and sharing a common understanding of information between stakeholders. More precisely, the use of an ontology facilitates the analysis of domain knowledge [4]. Ontologies help ensure requirements consistency and facilitate communication between requirements engineers [3].
3 Related Work
This section examines the scholarly work done in the area of system requirements and change classification with the use of ontologies. Khatoon et al. [9] noted the importance of ontologies in managing changes within the context of global software development. Wang et al. [12] proposed an approach to obtain solutions for new engineering changes by retrieving and reusing existing cases; their approach is, however, somewhat limited to specific enterprises. Yan et al. [13] proposed a knowledge-based method for managing requirements changes throughout the software development life cycle, in which the ontology is only used to create knowledge concepts and the relationships among them. Jayatilleke et al. [7] proposed a method for specifying and classifying business requirements changes. Their approach addresses only the operational level of the system product, at a lower level; however, it is important to evaluate requirements changes at a higher level in order to align change goals with business goals. From the above studies, two main conclusions can be drawn. First, classification was designed for different steps of a change management process, whereas change classification needs to be part of that process. Second, ontologies are frequently used to avoid inconsistency, ambiguity, and incompleteness among requirements. We note that most of these proposals [7,9,12,13] do not refer to a standardized quality model such as ISO 25010. Moreover, these studies address only the higher level of granularity (classes only), whereas details are very important for evaluating and/or classifying change requests.
4 An Ontology-Based Approach for Change Request Classification
In this section, we present a description of our proposed ontology-based approach, which includes three main steps: (i) ontology specification; (ii) ontology conceptualization; and (iii) ontology implementation (Fig. 1).

4.1 Ontology Specification
This step typically includes an analysis of the concepts to be modeled as classes or entities and of the relationships among classes. In our work, the concepts are related to the system requirements change domain. A set of concepts (i.e., change requests) is derived from the PROMISE repository (a collection of publicly available datasets, http://promise.site.uottawa.ca/SERepository/datasets-page.html). Thus, change requests can be represented by an ontology in order to identify their categories (either functional or technical change requests). The classification of change requests is designed to identify the type of changes to be made, the actions to be taken, and the appropriate service to be implemented. Table 1 lists the main classes of our proposed ontology, and Table 2 depicts the inter-relationships among classes (domain/range).

Fig. 1. Proposed approach

Table 1. Ontology class specifications

Class                               | Description
System requirements change requests | Customers' change requests from the PROMISE repository
Functional change                   | Customers' change requests that affect FUR
Technical change                    | Customers' change requests that affect NFR or PRC
System requirements CR Domain       | Change communicators (internal or external). Examples of internal change communicators: system analysts, developers, designers, etc. Examples of external communicators: users, customers, managers, etc.
Delete                              | Contains requirements to be deleted
Modify                              | Contains requirements to be modified
Create                              | Contains requirements to be created
Table 2. Ontology inter-relationship description

Inter-relationship | Domain                              | Range
Is composed of     | System requirements change request  | Functional change, Technical change
Decomposed in      | Technical change                    | Non-functional requirements, Project requirements and constraints, External change, Internal change
Decomposed in      | Functional change                   | Functional requirements (FUR), Functional external change, Functional internal change
Is a               | Internal change                     | In create, In modify, In delete
Is a               | External change                     | Ex create, Ex modify, Ex delete
4.2 Ontology Conceptualization
Following the concepts enumerated in Table 1 and Table 2, we create an ontology using the Protégé 4 ontology editor. The conceptual model involves a set of concepts in the domain and their relationships. A number of notions must of course be defined, such as classes, attributes, object properties, data properties, and their relationships. Figure 2 presents the different classes of the proposed ontology model. The main class is named "System requirements change request"; the classes named "Functional change" and "Technical change" are its sub-classes. This step also includes the ontology population using the PROMISE repository. Ontology population is performed by creating an instance for each class and providing links according to the inter-relationships among classes (domain/range). We selected one software requirements specification document including a total of 832 non-functional requirements and 93 functional requirements. The manipulation of instances is an important step in our ontology model. There are several approaches used by ontology management systems: the OWL schema (https://www.w3.org/OWL/) and object-oriented development (https://www.w3.org/TR/sw-oosd-primer/). We used the OWL schema and Jena to populate our ontology automatically with change requests (i.e., users' reviews) derived from the PROMISE repository. These requests concern implemented software and previous software development projects. The gathered data (change requests) are used to perform the ontology population task.
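The population step itself (one typed individual per review, linked under the class hierarchy) can be sketched without Jena. The snippet below is a stand-in, not the authors' Java/Jena code: it emits equivalent RDF triples as Turtle text using only the standard library, with a hypothetical `crc:` namespace and instance identifiers.

```python
# Hedged sketch of ontology population: declare the sub-class axioms from
# Fig. 2 and assert one rdf:type triple per gathered change request, serialized
# as Turtle. Namespace and instance IDs are illustrative assumptions.

PREFIXES = (
    "@prefix crc: <http://example.org/crc#> .\n"
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."
)

def populate(reviews):
    """reviews: iterable of (instance_id, class_name) pairs, e.g. users'
    reviews gathered from the PROMISE repository. Returns Turtle text."""
    triples = [
        "crc:FunctionalChange rdfs:subClassOf crc:SystemRequirementsChangeRequest .",
        "crc:TechnicalChange rdfs:subClassOf crc:SystemRequirementsChangeRequest .",
    ]
    for inst, cls in reviews:
        triples.append(f"crc:{inst} rdf:type crc:{cls} .")  # one triple per review
    return PREFIXES + "\n" + "\n".join(triples)

print(populate([("cr1", "FunctionalChange"), ("cr2", "TechnicalChange")]))
```

The resulting Turtle can be loaded into Protégé (or any triple store) alongside the hand-built schema.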
Fig. 2. Ontology-based requirements classification model
4.3 Ontology Implementation
In our contribution, we propose a set of Semantic Web Rule Language (SWRL, https://www.w3.org/Submission/SWRL/) rules and Description Logic queries (DL Query, https://github.com/stardog-union/DLQuery) based on the definitions of the ISO 25010 software quality characteristics and the descriptions of the users' reviews within the PROMISE repository. The first step is to select a set of terms that are relevant to the domain, which can be done either manually or automatically; it involves the identification of subjects, objects, and relationships. The linguistic motivation for this identification is that the meaning of a common term is implicit in its relations with other terms. Using the Protégé 4 editor, these terms are grouped into classes and then translated into a set of rules. These concepts can be used to rapidly detect the required instances. Table 3 lists our proposed classes with their corresponding key concepts (i.e., users' reviews from the PROMISE repository). Our proposed ontology solution requires at least the following steps: (1) implementation of the rules; (2) queries about the knowledge using DL Query; (3) invocation of a Pellet reasoner that builds a knowledge base of the domain ontology; and (4) ontology output and solution discussion.

Table 3. Categorizing the customers' change requests

Classes           | Key concepts
Functional change | Must contain, Play, View, Select, Manage, Operate
Technical change  | Maintain, Produce, Corporate, Load, Upload, Synchronize, Appearance, Transaction
External-change   | Cannot, Please, Doesn't, None, Problems, No access, Bugs, Stopped working
Internal-change   | Product must, Product shall, Administrators must, System must, Application parameters, Change
Create            | Add, Build, Design, Generate, Organize, Set up, Produce
Delete            | Delete, Black out, Destroy, Exclude, Cut out, Eliminate, Cancel
Modify            | Adapt, Revise, Modify, Correct, Rework, Repair

Implementation of Rules. Following the conceptualization step, we propose a set of rules for our ontology; we present only three of them (R1, R2, and R3). To explain the first rule, Table 4 shows four columns: the class, the attribute (data property), the instances (individuals), and the result (output); the rule itself is given below the table. The same layout applies to R2 and R3.

– R1: SystemRequirementsChangeRequest(?F), Change_Value(?F, ?V), contains(?V, "stopped played") → External_Change(?F)

Table 4. R1

Class                              | Data property | Individuals    | Output
System requirements change request | Change_Value  | Stopped played | External change

R1 is used to identify the source of change requests. A change request may come from different stakeholders with different sets of priorities. The users' reviews are classified as either external changes or internal changes. External changes reflect the users' point of view, for example: "I loved the application but ever since I installed iOS 7 and updated the application, it does not start; please fix it". External changes contribute to identifying and defining internal changes (the developers' point of view). Distinguishing internal from external changes is important for a better prioritization of change requests and for establishing the role of stakeholders.

– R2: SystemRequirementsChangeRequest(?F), Change_Value(?F, ?V), contains(?V, "events"), contains(?V, "update") → Functional_Change(?F)

R2 is used to identify FC that affect functional requirements (FUR).

– R3: SystemRequirementsChangeRequest(?F), Change_Value(?F, ?V), contains(?V, "resources"), contains(?V, "update") → Technical_Change(?F)

R3 is used to identify TC that affect quality requirements (NFR) and PRC.
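The keyword matching behind rules like R1–R3 can be sketched outside Protégé; the snippet below uses an abridged subset of the Table 3 key concepts and plain substring matching as a simplification of the SWRL `contains` predicate. Function name and the exact keyword subset are illustrative assumptions.

```python
# Sketch of Table 3 key-concept matching: a review is assigned every class
# whose key concepts occur in its text (lower-cased substring match).

KEY_CONCEPTS = {  # abridged from Table 3
    "Functional change": ["play", "view", "select", "manage", "operate"],
    "Technical change":  ["maintain", "upload", "synchronize", "appearance"],
    "External-change":   ["cannot", "please", "bugs", "no access", "stopped work"],
    "Internal-change":   ["product must", "system must", "administrators must"],
}

def classify_review(text: str):
    """Return the list of classes whose key concepts occur in the review."""
    low = text.lower()
    return [cls for cls, keys in KEY_CONCEPTS.items()
            if any(k in low for k in keys)]

# e.g. a review in the spirit of the one cited for R1:
print(classify_review("It stopped working, please fix the bugs"))
```

A real implementation would tokenize and lemmatize rather than match raw substrings, but the classification logic is the same.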
Queries About the Knowledge Using DL Query. In order to verify and validate the ontology, we used the reasoning features of the proposed rules specified in DL Query. DL Query provides a powerful and easy-to-use feature for searching a classified ontology. The query language (class expressions) supported by the plugin is based on the Manchester OWL syntax, a user-friendly syntax for OWL DL. Because of its expressiveness and powerful representation, we used OWL to represent our ontology-based approach. Figure 3 illustrates a query for FC; as a result, a list of inferred individuals related to FC is obtained (as explained by R2).

Invoking a Pellet Reasoner that Builds a Knowledge Base of the Domain Ontology. To illustrate the use of our ontology, we used the DL Query tab along with the Pellet reasoner to retrieve all the corresponding instances of classes. Reasoning is one of the main motivations for using an ontology. Two types of models exist in Protégé: asserted and inferred (Fig. 4). The results of the tests are shown in the form of inferred individuals. The ontology reasoner may find important connections and implications among the different components (concepts, relations, properties) used to build our ontology. The proposed rules are designed to allow requirements engineers to focus on system requirements changes.
Fig. 3. Functional requirements change DL rule result
Fig. 4. Ontology with reasoner
Ontology Output and Solution Discussion. For a more proactive response to a change, we propose three types of requirements change request that can be applied to FC and TC: Addition, Deletion, and Modification.
– If the change request is an "Addition of a new requirement", it will provide additional output.
– If the change request is a "Deletion of an existing requirement", it will provide deletion output.
– If the change request is a "Modification of an existing requirement", the type of modification should be identified (Refine or Replace). As an example, consider the requirement "the system shall be able to display student's information" and a CR that occurs as "the system shall be able to display student's information: full name and grade level"; a Replace modification, in contrast, changes one requirement into another requirement.

Compared to the research studies presented in Sect. 3, our proposed ontology is useful for analyzing the source of requirements change requests (internal and/or external), the type of requirements change requests (functional or technical), and the type of both functional and technical requirements change requests (Deletion, Modification, or Addition), while using the ISO 25010 standard quality model and ISO 19761 for classification. In our work, we take up the additional challenge of producing a classification of change requests that affect software/system requirements, in order to avoid misunderstanding and, ultimately, to reach successful requirements change management. Our ontology model is based on the ISO 25010 quality model to provide a unified classification of change requests.
5 Conclusion
This paper proposed a new approach for change request classification. The proposed approach is based on an ontology that classifies the different change requests as either Functional Changes or Technical Changes; Technical Changes are further classified into one of the eight ISO 25010 quality characteristics or into Project Requirements and Constraints. Our approach involves three main steps: (i) identification and specification of change requests; (ii) conceptualization of the relationships among system requirements change requests; and (iii) implementation of the rules and generation of results. Further research could address ontology reasoning for checking requirements consistency or for identifying links between requirements (conflicts/cooperation/non-effects). We also intend to translate natural language requirements into formal specifications and to use the classification results for software effort/cost estimation.
References

1. Rashwan, A., Ormandjieva, O., Witte, R.: Ontology-based classification of non-functional requirements in software specifications: a new corpus and SVM-based classifier. In: Computer Software and Applications Conference, vol. 93, pp. 381–386 (2013)
2. Ben-Menachem, M.: Managing and leading software projects. 35 (2010)
3. Castañeda, V., Ballejos, L., Caliusco, M.L., Galli, M.R.: The use of ontologies in requirements engineering. Glob. J. Res. Eng. 10(6) (2010)
4. Chuprina, S., Alexandrov, V., Alexandrov, N.: Using ontology engineering methods to improve computer science and data science skills. Procedia Comput. Sci. 80, 1780–1790 (2016)
5. I.I.F.P.U. Group: COSMIC and IFPUG glossary of terms. Common Software Measurement International Consortium (2015)
6. Gruber, T.: Ontology. The Encyclopedia of Database Systems. Springer, Heidelberg (2008)
7. Jayatilleke, S., Lai, R.: A method of specifying and classifying requirements change. In: 2013 22nd Australian Software Engineering Conference, pp. 175–180 (2013)
8. Jepsen, T.C.: Just what is an ontology, anyway? IT Prof. 11, 22–27 (2009)
9. Khatoon, A., Motla, Y.H., Azeem, M., Naz, H., Nazir, S.: Requirement change management for global software development using ontology. In: 2013 IEEE 9th International Conference on Emerging Technologies (ICET), pp. 1–6 (2013)
10. Ramzan, S., Ikram, N.: Requirement change management process models: activities, artifacts and roles. In: 2006 IEEE International Multitopic Conference, pp. 219–223 (2006)
11. Sellami, A., Haoues, M., Borchani, N., Bouassida, N.: Towards an assessment tool for controlling functional changes in scrum process. In: The 28th International Workshop on Software Measurement and 13th International Conference on Software Process and Product Measurement (IWSM/Mensura), Beijing, China, 18–20 September, pp. 78–95 (2018)
12. Wang, Z., Wan, Y.: Research on engineering change knowledge representation and retrieval technology based on ontology. In: 2013 19th International Conference on Automation and Computing, pp. 1–5 (2013)
13. Yan, Y., Liao, P., Zhang, Z.: An ontology framework of software requirements change management process based on causality, pp. 107–111 (2018)
An Efficient MPLS-Based Approach for QoS Providing in SDN Manel Majdoub(B) , Ali El Kamel, and Habib Youssef PRINCE Lab, University of Sousse, Sousse, Tunisia manelmaj [email protected]
Abstract. SDN (Software Defined Networking) is a new network paradigm that facilitates network management by separating the network control logic from switches and routers. The separation of the network policy from the hardware implementation helps improve the quality of service (QoS) provided by the network. SDN is therefore able to provide flexible QoS control and to perform network resource allocation for multimedia and real-time applications. This paper proposes a new approach for QoS providing in SDN. Based on assets such as flow-level scheduling, QoS-based queueing and bandwidth reservation, the proposed approach aims to deliver fine-grained QoS management for multimedia and real-time applications. It classifies each flow in order to assign it to an aggregation, where an aggregation gathers many flows with similar required QoS. All flows from the same aggregation are forwarded through a specific QoS-aware bearer, and each bearer is established through Multi Protocol Label Switching (MPLS)-based Label Switched Paths (LSP). We evaluate the performance of our approach in SDN networks in terms of several metrics, including throughput, end-to-end delay, packet loss rate, and jitter. According to the experimental results, our approach satisfies the required QoS for each kind of application efficiently and performs better than legacy SDN networks with no QoS support.
Keywords: SDN · QoS · MPLS · Vertex cover

1 Introduction
Since the emergence of advanced smart devices, multimedia applications have become increasingly popular and are more and more frequently deployed in the cloud. Therefore, providers must offer end-to-end services with QoS guarantees to satisfy clients' demands. However, these applications require different levels of QoS depending on their constraints. For example, augmented reality (AR) applications are bandwidth-sensitive and require high bandwidth, while Netflix-like applications are more sensitive to delay.

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 497–508, 2021. https://doi.org/10.1007/978-3-030-49342-4_48
To support QoS providing, the Internet Engineering Task Force (IETF) has defined various technologies such as IntServ [1], DiffServ [2] and MPLS [3]. Despite their advantages, these technologies have not been widely deployed due to the complexity and the ossification of traditional network architectures. In the last decade, researchers have focused on reducing the complexity of classical networks by proposing new paradigms such as Software Defined Networking (SDN) [4]. SDN is an emerging network paradigm designed to fulfill the commitments of networks in terms of security, flexibility, efficiency and programmability [5]. Thanks to the separation of the control plane and the data plane, SDN makes network management more dynamic and simpler compared to legacy networks. In this context, SDN provides a programmable API to a controller to simplify the management and the control of the network. Indeed, the controller centralizes network management by providing global visibility of the whole network through open and standard communication protocols such as OpenFlow [6]. The OpenFlow protocol ensures the communication between controllers (control plane) and switches (data plane), and allows network operators to control traffic at an aggregated or a fine-grained flow-management level [7]. These assets make SDN very suitable for applications with very high QoS requirements. However, one of the most common problems that operators face is resource allocation. Due to the strong QoS requirements of today's applications, operators must move from QoS providing to QoS provisioning, which consists of predicting the resources expected by all applications in advance. Most approaches that deal with this issue propose the allocation of QoS-aware pre-established paths, known as bearers, and the assignment of incoming flows to one of those bearers. In the literature, we recognize bandwidth-sensitive bearers and delay-sensitive bearers.
Each flow that is not assigned to one of the above bearers is treated as best-effort traffic. Long Term Evolution (LTE) is an example of such approaches. LTE [8] classifies flows according to their QoS requirements: flows may be bandwidth-sensitive, delay-sensitive, or best effort. Each flow is forwarded through a specific bearer, a virtual QoS-aware transmission path that takes QoS specifications into account. Thanks to this strategy, LTE has become the most widely used technology in mobile networking. In this paper, we propose an approach for QoS providing that is based on MPLS (Multi Protocol Label Switching) technology and inspired by LTE, in order to guarantee high QoS delivery for multimedia and real-time applications. In our approach, a bearer is defined as an end-to-end path that connects a source to a destination and serves multiple flows having close QoS requirements. For each flow, an LSP is created within the corresponding bearer. The controller pre-establishes paths with available bandwidth and paths with minimum delay to satisfy the QoS requirements of each application. Our approach aims to provide QoS management in SDN networks using MPLS capabilities. Mainly, it includes the following contributions: 1) heuristic algorithms are used to build the matrix of bandwidth utilization; 2) an adaptive routing algorithm and a queue prioritization scheme based on the class of the flow and the already built matrix are defined; and 3) specific MPLS-based bearers that support QoS requirements are finally established. The rest of this paper is structured as follows: in Sect. 2, we discuss some related work. In Sect. 3, we describe the different phases of our approach. In Sect. 4, we formulate the problem and the solution. In Sect. 5, we present experiments and results. Finally, Sect. 6 concludes our paper.
2 Related Work
Several studies on traditional network architectures have addressed the problem of network measurement based on the flow conservation law. In [9], the authors suggested an architecture with a single point of control in the network that is responsible for gathering bandwidth and latency information using SNMP, RMON/NetFlow, and explicitly routed IP probes. Moreover, a greedy rank algorithm based on the flow conservation law is implemented. The approach proposed in [10] focused on avoiding network congestion by adaptively balancing the load based on measurement and traffic analysis. To achieve this goal, the solution designates a specific node, called the Master node, which is dedicated to the measurement task and uses a reduced set of measurement nodes known as the "minimum cover". In another related work, the authors in [11] introduced a solution to obtain traffic measurement statistics from network devices in an SDN environment. This study highlights the importance of OpenFlow Stats Request and Reply messages in network monitoring and management. The authors in [12] suggested an end-to-end bandwidth-guaranteeing model for OpenFlow. They classify flows into two types: QoS flows and best-effort flows. Results show that this model, when it uses hard-state reservation, incurs less signaling overhead than IntServ. Unfortunately, none of the above solutions addresses QoS requirements or tries to provide fine-grained QoS to each flow. For example, the approach in [12] considers all flows as having the same class of service (they are all called QoS flows), whereas flows may be delay-sensitive or bandwidth-sensitive. Obviously, the two classes cannot be treated in the same way, and specific mechanisms must be defined for each.
3 Overall Architecture
The proposed approach is described in Fig. 1. It consists of a set of functions applied to each incoming flow before it is assigned to a bearer. Each incoming flow must be classified into one of the following types: i) bandwidth-sensitive flows, ii) delay-sensitive flows, and iii) best-effort flows. Classification is done at the ingress switch. Upon receiving the first packet of a flow, the ingress switch sends an OpenFlow PacketIn message to the controller. According to the class of the flow, the controller uses its local network-state database to check for a suitable path that supports the flow. This suitable path must be included in the associated bearer. An LSP is therefore defined, and an OpenFlow FlowMod message with the required labels is forwarded to the ingress switch and to all switches belonging to the selected path. At this point, the first packet is admitted to the network. All subsequent packets of the flow are forwarded using the same installed rule, without contacting the controller again.

M. Majdoub et al.

Fig. 1. Overall architecture.
3.1 Classification and Aggregation
Based on QoS requirements in terms of bandwidth and delay, a flow may be bandwidth-sensitive, delay-sensitive or best effort. Bandwidth-sensitive flows correspond to real-time applications such as augmented reality (AR); they are forwarded mainly based on available resources and have medium priority. Delay-sensitive flows are flows that do not tolerate latency, such as video streaming applications (e.g., Netflix), and they are given the highest priority. Finally, best-effort flows are flows with no guaranteed QoS, such as online gaming; they are assigned the lowest priority.
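The classification scheme above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the class names, thresholds and priority numbering (lower number = served first) are assumptions.

```python
# Sketch of the flow classification and queue priority mapping described
# above. Thresholds and class names are illustrative assumptions; in the
# paper's scheme, delay-sensitive flows get the highest priority,
# bandwidth-sensitive flows the medium one, and best effort the lowest.

PRIORITY = {"delay-sensitive": 0,       # highest priority queue
            "bandwidth-sensitive": 1,   # medium priority queue
            "best-effort": 2}           # lowest priority queue

def classify(required_bw_mbps, max_delay_ms):
    """Classify a flow from its QoS requirements (illustrative rules)."""
    if max_delay_ms is not None and max_delay_ms < 50:
        return "delay-sensitive"
    if required_bw_mbps is not None and required_bw_mbps > 5:
        return "bandwidth-sensitive"
    return "best-effort"

def queue_priority(flow_class):
    return PRIORITY[flow_class]
```

A flow requiring, say, a 20 ms delay bound would be classified as delay-sensitive and enqueued in the highest-priority queue.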
3.2 Network Monitoring
The first step in path establishment is network monitoring. It provides a global view of the network and points out bottlenecks. In SDN, the controller collects local statistics from all switches. It builds an N × N square matrix of current bandwidth utilization, where N denotes the number of switches. As N increases, managing the matrix becomes significantly harder. We use the weak vertex cover (WVC) algorithm to reduce the amount of data, which provides a significant advantage in terms of computation time and communication overhead. Based on the flow conservation law, this algorithm selects a subset of switches from the network such that, when probed, they cover all the links of the network. The selected switches are known as the Minimum Vertex Cover (MVC). The controller therefore periodically probes those switches by sending an "OFPPortStatsRequest" message. Each switch then replies with an "OFPPortStatsReply" message providing the required statistics, which are used to compute both the bandwidth availability and the delay of any path in the network. Moreover, the delay between two nodes of the network is also estimated based on statistics collected from the MVC set.
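The vertex cover idea can be sketched with a simple greedy heuristic: select switches so that every link has at least one selected endpoint, so probing only those switches still yields statistics for all links. This is an illustrative greedy sketch, not the paper's exact WVC algorithm.

```python
# Greedy sketch of the (weak) vertex cover selection used for monitoring:
# repeatedly pick the switch incident to the most still-uncovered links,
# until every link has a selected endpoint.

def greedy_vertex_cover(links):
    """links: iterable of (u, v) switch pairs; returns a set of switches."""
    uncovered = set(frozenset(l) for l in links)
    cover = set()
    while uncovered:
        degree = {}
        for link in uncovered:
            for node in link:
                degree[node] = degree.get(node, 0) + 1
        best = max(degree, key=degree.get)   # highest-degree switch
        cover.add(best)
        uncovered = {l for l in uncovered if best not in l}
    return cover

links = [(1, 2), (2, 3), (3, 4), (2, 4)]
mvc = greedy_vertex_cover(links)
```

On this toy topology, switch 2 (incident to three links) is selected first, and one more switch suffices to cover the remaining link.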
3.3 Path Computation
This step is divided into two levels. In the first level, a suitable path is discovered using QoS-specific routing algorithms based on the monitoring results. Multiple routing metrics can be considered in path computation, such as delay, bandwidth and packet loss rate. The Dijkstra algorithm is applied to find the path with minimum delay and the path with maximum available resources between the source and the destination. Those paths are reserved to serve incoming delay-sensitive flows and bandwidth-sensitive flows, respectively. Each path is considered a bearer that can support multiple flows simultaneously. A problem may occur when multiple flows with different classes of service reach a switch simultaneously: delay-sensitive flows must not wait long and must be forwarded immediately. To deal with this issue, a preemptive priority queueing (PQ) model with three priority queues is defined within each switch. Delay-sensitive flows are served by the queue with the highest priority, whereas bandwidth-sensitive flows are placed in a medium-priority queue. Finally, the lowest-priority queue forwards best-effort flows.
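The two path computations can be sketched as follows: Dijkstra for the minimum-delay path, and a "widest path" variant for the maximum bottleneck bandwidth. The graph encoding is an assumption: an adjacency dict mapping node to {neighbor: (delay_ms, bandwidth_mbps)}.

```python
# Sketch of the two path computations: minimum total delay (Dijkstra)
# and maximum bottleneck bandwidth (widest path, Dijkstra variant).
import heapq

def min_delay_path(graph, src, dst):
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, (delay, _bw) in graph[u].items():
            nd = d + delay
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

def widest_path(graph, src, dst):
    width, prev, heap = {src: float("inf")}, {}, [(-float("inf"), src)]
    while heap:
        w, u = heapq.heappop(heap)
        w = -w
        for v, (_delay, bw) in graph[u].items():
            nw = min(w, bw)                # bottleneck along the path
            if nw > width.get(v, 0):
                width[v], prev[v] = nw, u
                heapq.heappush(heap, (-nw, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), width[dst]

g = {
    "s1": {"s2": (2.0, 100), "s3": (10.0, 1000)},
    "s2": {"s4": (2.0, 50)},
    "s3": {"s4": (1.0, 1000)},
    "s4": {},
}
```

On this toy graph, the minimum-delay path is s1-s2-s4 (4 ms) while the widest path is s1-s3-s4 (bottleneck 1000 Mbps), illustrating why the two bearer types generally follow different routes.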
3.4 Admission Control
Admission is achieved based on available resources. If the flow is admitted, the controller reserves resources by defining an LSP, which is created using hard-state reservation. The controller distributes labels to all switches using the Label Distribution Protocol (LDP). A delay-sensitive flow must be admitted as soon as it is received. Therefore, the controller pre-establishes a set of LSPs with a convenient end-to-end delay.

PC = Σ_{k=1}^{NC} |⟨v_k(·)⟩|²   (17)
The equality restraint in Eq. (11) is modified into the inequality restraint

|Σ_{i=1}^{AG_age} NE_{AG_i}(Y_j) − 100| / 100 ≤ ε, ∀j   (18)

where ε is set to 0.01.
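The relaxed restraint says that, after applying the adjustment magnitudes, the total workforce percentage of each year may deviate from 100% by at most ε. A minimal sketch of this check (function and variable names are illustrative):

```python
# Sketch of the relaxed workforce-total restraint: the employee
# percentages of a given year must sum to 100% within a tolerance of
# epsilon = 0.01 (i.e., 1%).

EPSILON = 0.01

def restraint_satisfied(percentages, eps=EPSILON):
    """True if the total deviates from 100 by at most eps (relative)."""
    total = sum(percentages)
    return abs(total - 100.0) / 100.0 <= eps

# yearly totals from Table 1 (SFLA row); each stays within 1% of 100
sfla_totals = [100, 100.57, 100.72, 100.50, 99.82, 99.92]
all_years_ok = all(restraint_satisfied([t]) for t in sfla_totals)
```

All totals reported in Table 1 (for both GA and SFLA) satisfy this restraint, consistent with the statement that the restraints are not violated.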
4 Case Study and Simulated Results

In the present study, the employees' ages range between 25 and 59 years, i.e., the youngest employee is 25 and the oldest is 59 years old. AGyear is 5 years. The present age distribution AP0 and the desired age distribution AP5 are presented in Fig. 2.
Fig. 2. The present age AP0 and the desired age AP5 (x-axis: age in years, 25–60; y-axis: % of number of employees; series: present (current) and desired)
Assuming that the adjustments in the age groups are time-invariant, i.e., δAG1(Yj) = δAG1 and so on, the total number of restraints is taken as 30, and all other adjustments
T. K. Sharma and A. Abraham
are considered null. The age groups AG1 = 25, AG11 = 35, AG16 = 40 and AG26 = 50 are considered for performing the adjustments of the employees. The parameters of the SFL algorithm are tuned as follows: the population size, the number of memeplexes and the number of local iterations are set to 100, 5 and 10, respectively. The maximum step size DSmax is allowed to equal the range of the variable. Each run is repeated 25 times to eliminate the effect of randomness. The best results for the adjustment magnitudes are recorded and plotted in Fig. 3. The evaluated best adjustment magnitudes are then used to compute the age distribution at AGyear = 5 years, which is plotted in Fig. 4. The evolution of the age distribution is presented in Fig. 5, which also shows that the restraints are not violated while computing the adjustment magnitudes. All computed adjustment values are discrete. The total number of employees in each consecutive year is tabulated in Table 1.
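The SFL loop with the quoted parameters (population 100, 5 memeplexes, 10 local iterations) can be sketched as below. The objective function is a stand-in; the paper's discrete adjustment encoding and restraint handling are omitted, so this is only an illustration of the algorithm's structure.

```python
# Minimal sketch of the Shuffled Frog Leaping (SFL) loop: sort frogs,
# partition into memeplexes, locally improve the worst frog of each
# memeplex toward the memeplex best (then the global best, then a random
# frog), and shuffle the memeplexes back together.
import random

def sfl_minimize(objective, dim, bounds, pop=100, memeplexes=5,
                 local_iters=10, generations=30, seed=1):
    random.seed(seed)
    lo, hi = bounds
    ds_max = hi - lo                       # max step = variable range
    frogs = [[random.uniform(lo, hi) for _ in range(dim)]
             for _ in range(pop)]

    def clamp(x):
        return max(lo, min(hi, x))

    for _ in range(generations):
        frogs.sort(key=objective)
        best_global = frogs[0]
        plexes = [frogs[m::memeplexes] for m in range(memeplexes)]
        for plex in plexes:
            for _ in range(local_iters):
                plex.sort(key=objective)
                best, worst = plex[0], plex[-1]
                step = [random.random() * (b - w)
                        for b, w in zip(best, worst)]
                step = [max(-ds_max, min(ds_max, s)) for s in step]
                cand = [clamp(w + s) for w, s in zip(worst, step)]
                if objective(cand) >= objective(worst):
                    # retry toward the global best, else randomize
                    step = [random.random() * (b - w)
                            for b, w in zip(best_global, worst)]
                    cand = [clamp(w + s) for w, s in zip(worst, step)]
                    if objective(cand) >= objective(worst):
                        cand = [random.uniform(lo, hi)
                                for _ in range(dim)]
                plex[-1] = cand
        frogs = [f for plex in plexes for f in plex]  # shuffle back
    return min(frogs, key=objective)

best = sfl_minimize(lambda x: sum(v * v for v in x), dim=4, bounds=(-5, 5))
```

Run on a simple sphere function, the loop drives the best frog toward the origin, which illustrates the memetic local search plus shuffling that the paper relies on.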
Fig. 3. Graph of adjustment magnitudes achieved using the SFL algorithm (x-axis: age in years, 25–60; y-axis: change number in %)
Fig. 4. Age distribution at AGyear = 5 years (x-axis: age in years, 25–60; y-axis: employee number in %; series: desired)
Result Analysis. From Fig. 4, it can be observed that a larger number of age groups would be required to segment and accelerate the adjustment process of the age distribution, since only four
Age Distribution Adjustments in Human Resource Department
Fig. 5. Evolution of the age distribution (x-axis: age in years, 25–60; y-axis: employee number in %; series: present year and 1st–4th years)
Table 1. Total number of employees, Σ_{i=1}^{35} NE_{AG_i}(Y_j), in years 1 to 5

          Present  1st year  2nd year  3rd year  4th year  5th year
GA [2]    100      100.67    100.85    100.52    99.69     99.87
SFLA      100      100.57    100.72    100.50    99.82     99.92
specific age groups are sufficient to modify the existing age distribution. The same discrepancy can be noted from Fig. 4. The total number of restraints (NC) is 185.
5 Conclusions with Future Scope

In the present study, a basic Shuffled Frog Leaping (SFL) algorithm is applied to solve a combinatorial problem of Human Resource Management. The problem is a vital part of Human Resource Planning in the Human Resource Department. Based on the present distribution of age groups, the employees are dynamically redistributed to achieve the desired age-group distribution. The adjustment of the age distribution is evolutionary and is performed by adding an adjustment magnitude to the present number of employees in each selected age group. The adjustment magnitude values are evaluated using the SFL algorithm, and the results are compared with those of a Genetic Algorithm. In the future, more case studies of this problem will be solved using the basic and enhanced variants of the SFL algorithm.
References

1. Drucker, P.F.: The Practice of Management. Harper & Brothers, New York (1954)
2. Harnpornchai, N., Chakpitak, N., Chandarasupsang, T., Chaikijkosi, T.-A., Dahal, K.: Dynamic adjustment of age distribution in Human Resource Management by genetic algorithms. In: IEEE Congress on Evolutionary Computation (CEC 2007), 25–28 September 2007, Singapore, pp. 1234–1239 (2007)
3. Eusuff, M.M., Lansey, K.E.: Optimization of water distribution network design using the shuffled frog leaping algorithm. J. Water Resour. Plan. Manage. 129(3), 210–225 (2003)
4. Salomon, R.: Evolutionary algorithms and gradient search: similarities and differences. IEEE Trans. Evol. Comput. 2(2), 45–55 (1998)
5. Tang, L., Zhao, Y., Liu, J.: An improved differential evolution algorithm for practical dynamic scheduling in steelmaking-continuous casting production. IEEE Trans. Evol. Comput. 18(2), 209–225 (2014)
6. Dash, R., Dash, R., Rautray, R.: An evolutionary framework based microarray gene selection and classification approach using binary shuffled frog leaping algorithm. J. King Saud Univ. Comput. Inf. Sci. https://doi.org/10.1016/j.jksuci.2019.04.002
7. Pérez-Delgado, M.-L.: Color image quantization using the shuffled-frog leaping algorithm. Eng. Appl. Artif. Intell. 79, 142–158 (2019)
8. Sharma, T.K., Prakash, D.: Air pollution emissions control using shuffled frog leaping algorithm. Int. J. Syst. Assur. Eng. Manag. (2019). https://doi.org/10.1007/s13198-019-00860-3
9. Rajpurohit, J., Sharma, T.K., Abraham, A., Vaishali: Glossary of metaheuristic algorithms. Int. J. Comput. Inf. Syst. Ind. Manage. Appl. 9, 181–205 (2017)
10. Eusuff, M.M., Lansey, K.E., Pasha, F.: Shuffled frog-leaping algorithm: a memetic metaheuristic for discrete optimization. Eng. Optim. 38(2), 129–154 (2006)
11. Coello, C.A.C.: Theoretical and numerical constraint-handling techniques used with evolutionary algorithms: a survey of the state of the art. Comput. Meth. Appl. Mech. Eng. 191, 1245–1287 (2002)
Selection of Cloud Service Provider Based on Sampled Non-functional Attribute Set

Mehul Mahrishi1(B), Kamal Kant Hiran2, and Ruchi Doshi3

1 Swami Keshvanand Institute of Technology, Jaipur, India
[email protected]
2 Aalborg University, Aalborg, Denmark
[email protected]
3 BlueCrest University College, Monrovia, Liberia
[email protected]
Abstract. Successful service allocation is the result of successful decisions made by the Cloud Service Provider. The decisions must be effective and timely for survival, for gaining competitive advantages and for increasing the profitability of an organization. In this scenario, the increasing number of Cloud Service Providers poses the challenge of measuring services on the basis of non-functional attributes as well. In the literature, researchers have developed optimization techniques to distribute the workload. Much work has been done on optimizing functional attributes; in this research work, we propose a mathematical model to select a service provider based on a sampled non-functional attribute set.

Keywords: Cloud computing · Cloud service provider · Resource allocation · Rough set theory · Node information system · Virtual machine
1 Introduction

Cloud computing integrates the characteristics and functionalities of the terms "cloud" and "computing" into a technological perspective. The cloud can be defined as a ubiquitous wide-area network community that provides IT capabilities and IT infrastructure accessible over an internet connection [2]. "It is a pay-per-use model which enables very easy (only by button clicking), on-demand, reliable access to computing resources through network access which can be rapidly stipulated and liberated with minimal consumer efforts and interaction of service providers". There are a number of service providers in the market nowadays, but three public cloud companies dominate the entire cloud space: Amazon Web Services (AWS), Microsoft and Google. Our research focuses on Tier-3 Cloud Service Providers (CSPs) that depend on Tier-1 service providers for providing services. Every CSP has its own mechanism for providing services; for example, Adhost provides dedicated web hosting featuring Microsoft servers, including Windows Server 2008 and IIS, whereas Enterhost has expertise in disaster recovery solutions, redundant storage and backup services [1, 3].

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 641–648, 2021. https://doi.org/10.1007/978-3-030-49342-4_62
2 Related Work

Cloud Service Provider selection and allocation is a hot topic and a key factor in cloud computing and grid computing [11–13]. In the cloud, resources are heterogeneous and very loosely coupled, and their capacities are unknown and dynamic [14]. Therefore, a task must be sent to the CSP with a high reliability rating [15]. Dynamic task and resource allocation poses a big challenge in cloud computing [16]. With the cloud, users can not only migrate their data and computation to different virtual machines at run time with minimal impact on system performance, but also access the data and computation at any time and from anywhere [17–19]. The authors of [21, 23] proposed approaches for dynamic resource allocation in the cloud based on various existing techniques such as dynamic programming, metaheuristics and greedy algorithms [15, 21]. In a computational grid or a cloud computing environment, tasks are distributed across distinct computational nodes. In order to obtain optimized results, the allocation must be based on some pre-defined QoS parameters [20]. It is observed that researchers are working on resource allocation in the cloud on the basis of various parameters such as computing power, network bandwidth, line quality, response time and task cost [10, 24]. Apart from these functional parameters, various non-functional parameters, such as energy efficiency, carbon footprint and reliability, have also been explored and deserve a shift in research focus [13, 14, 17–20, 22, 24].
3 Problem Definition

A tenant evaluates cloud service providers on the basis of a number of parameters. Cost is one measure, but it is usually calculated on a per-use utility model. Apart from cost, there are still a number of functional and non-functional attributes that contribute to confidence in a CSP. The physical location of the servers, the reliability of data and the cloud storage provider's service-level agreement (SLA) are very crucial for a customer. Security is another significant parameter. Moreover, CSPs cannot be counted on for non-stop service: for some CSPs, certain services may be down and inaccessible at certain points during the day. This kind of unwanted downtime defeats the purpose of the cloud. The proposed research work uses a Rough Set mathematical model to create upper and lower approximations of the given services and provide an internal rating of the CSPs. The rating denotes the strength of the security architecture of the cloud service and the risk value associated with the existing architecture [5, 6].
4 Problem Formulation

The workload is distributed to the cloud service providers in the form of virtual machines (VMs). Let S = {SP1, SP2, SP3, …, SPn} be the set of n service providers, and let J = {J1, J2, J3, …, Jm} be the set of m jobs generated by the tenants. Two vectors, a Resource Requirement Vector and a Resource Availability Vector, are associated with each job and each service provider, respectively.
Each service provider has its own capabilities, denoted by a matrix SPR of order n × p; each job has certain requirements, denoted by a matrix JPR of order m × p.
Matrix 1. SPR Matrix for CSP
Matrix 2. JPR Matrix for Jobs
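The SPR and JPR matrices, and a basic job-to-provider feasibility check, can be sketched as follows. The resource columns and all numeric values are made-up assumptions for illustration.

```python
# Illustrative encoding of the SPR (n providers x p resource types) and
# JPR (m jobs x p resource types) matrices, plus a feasibility check:
# job j fits provider i if its requirement vector is dominated by the
# provider's availability vector, component by component.

# rows: SP1..SP3; columns (assumed): CPU cores, RAM (GB), storage (GB)
SPR = [[16,  64,  500],
       [ 8,  32, 1000],
       [32, 128,  250]]

# rows: J1..J2; same resource columns
JPR = [[ 4, 16, 100],
       [24, 96, 200]]

def feasible_providers(job_req, spr):
    """Indices of providers whose availability covers the job's needs."""
    return [i for i, avail in enumerate(spr)
            if all(r <= a for r, a in zip(job_req, avail))]
```

Here J1 fits every provider, while J2's CPU and RAM demands rule out all but SP3, which is the kind of pre-filtering done before the non-functional rating is applied.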
5 Methodology

The proposed algorithm enables the tenant to pre-check a cloud service provider's adequacy in terms of service-level satisfaction. The Rough Set mathematical model is used to measure the level of satisfaction of the service. A rough set model synthesizes approximation concepts from the actual data [8, 9]. The data is represented as a table where each row is an object, which in our case is simply a cloud service provider. Every column represents an attribute (a variable, an observation, a property, etc.), which in our case are the non-functional attributes. The table is called an information system [7, 8]. Rough Set Theory (RST) in our research can be approached as an extension of classical set theory. An information system can be represented as a pair

A = (S, A),   (1)

where S = {SP1, SP2, SP3, …, SPn} is a non-empty set of service providers and A is a non-empty finite set of attributes such that a : S → Va for every a ∈ A. The set Va is called the value set of a.
A set is said to be rough if its boundary region is non-empty. In the considered example, we synthesize the outcome in terms of the non-functional attributes [7–9]. Let X = {x | ρ(x) > threshold value}. In the given example, X = {C, E, F, G}. By the definitions of the lower and upper approximations, LX = {A, B, D} and UX = {C, E, F, G}.
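The standard lower/upper approximation computation over equivalence classes can be sketched as below. The partition used here is an assumption for illustration, not the paper's actual data.

```python
# Sketch of rough set lower/upper approximations: a class is in the
# lower approximation if it lies entirely inside the target set X, and
# in the upper approximation if it merely intersects X.

def approximations(partition, X):
    """partition: list of equivalence classes (sets); X: target set.
    Returns (lower, upper) approximation sets."""
    lower, upper = set(), set()
    for cls in partition:
        if cls <= X:        # class entirely inside X
            lower |= cls
        if cls & X:         # class intersects X
            upper |= cls
    return lower, upper

# assumed partition of the CSPs into equivalence classes
partition = [{"A", "B"}, {"C", "E"}, {"D", "F"}, {"G"}]
X = {"C", "E", "F", "G"}
low, up = approximations(partition, X)
```

Under this assumed partition, the boundary region (upper minus lower) is {D, F}, which is non-empty, so X is a rough set.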
6 Illustrative Example and Simulation Results

This research work presents a stepwise algorithm for CSP selection, which solves the problem of choosing the right service provider for the right kind of service. The main selection criteria are listed in Table 1, which contains 11 sampled non-functional attributes of a Cloud Service Provider, namely: Application Security (I), Legal Issues (II), Virtualization (III), Access and Identity Management (IV), Risk Management (V), Interoperability and Portability (VI), Business Continuity and Disaster Recovery (VII), Data Centre Operations (VIII), Incident Response (IX), Key Management (X) and Compliance and Audit (XI). All these attributes participate equally when any tenant (cloud user) wants to select a Cloud Service Provider (CSP) [4, 5]. Initially, each CSP is asked to rate itself out of 10 to fill Table 1. The initial random weights and their fluctuation during run-time rating are automatically stored in the last attribute of the table, i.e., Compliance and Audit. All these non-functional attributes are categorized on the basis of their relevance to tenants and CSPs. The categorization can be done by a general survey of existing CSPs and cloud users. The resulting set is called a 'Reduct', which is a subset of attributes forming the predominant attribute set.
Table 1. A node information system

CSP (Ci)  I  II  III  IV  V  VI  VII  VIII  IX  X  XI
A         6  3   7    4   7  5   6    8     5   5  5.5 AUTOFILL
B         5  6   7    7   4  8   5    9     3   3  5.5 AUTOFILL
C         7  6   4    5   8  5   6    4     6   7  5.8 AUTOFILL
D         5  7   8    3   5  3   7    8     7   5  5.8 AUTOFILL
E         6  5   7    6   6  7   6    4     9   6  6.2 AUTOFILL
F         8  8   6    8   8  5   8    7     4   6  6.8 AUTOFILL
G         9  7   7    5   9  6   5    6     8   8  7   AUTOFILL
This relevance generates a threshold value γi for each attribute on a scale of 10. Each attribute value ρi is then mapped to a binary value:

f(ρi) = 1 (TRUE) if ρi / γi ≥ 1, and 0 otherwise.   (2)

The rough set model represents the CSPs and non-functional attributes in tabular form. The rows of Table 1 represent a sample Node Information System: it contains the list of cloud service providers, and the columns consist of the attributes of the respective cloud service providers. Equation (2) is applied to each row/column combination of the table, and a Node Information System is created, as represented in Table 2. We then apply rough sets to convert the table into lower and upper approximations. The lower approximation set is not considered for job assignment, because those providers are not suitable according to user standards.

6.1 Assessment of Rating

The cloud service providers are divided into equivalence classes, and a membership value is calculated by applying fuzzy logic as

δ : CSP → [0, 1]   (3)

The value of δ indicates the capability of the service provider. This value is considered an important parameter while allocating jobs to service providers.
Table 2. Node information system after approximation

CSP (Ci)  I  II  III  IV  V  VI  VII  VIII  IX  X  XI
C         1  1   0    0   1  0   1    0     1   1  1
E         1  0   1    1   1  1   1    0     1   1  1
F         1  1   0    1   1  0   1    1     0   0  1
G         1  1   1    0   1  0   0    0     1   1  1
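The thresholding step of Eq. (2) can be sketched as follows. The per-attribute thresholds γi are assumptions (the paper derives them from a survey); the ones below were chosen so that CSP C's row reproduces the corresponding row of Table 2.

```python
# Sketch of Eq. (2): each self-rating is mapped to 1 when it reaches the
# attribute's threshold gamma_i (i.e., rho_i / gamma_i >= 1), else 0.

def binarize(ratings, gammas):
    """ratings, gammas: equal-length lists; returns a 0/1 list."""
    return [1 if r / g >= 1 else 0 for r, g in zip(ratings, gammas)]

ratings_C = [7, 6, 4, 5, 8, 5, 6, 4, 6, 7, 5.8]   # CSP C, Table 1
gammas    = [6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5.0]   # assumed thresholds
row_C = binarize(ratings_C, gammas)
```

With these assumed thresholds, row_C comes out as [1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1], matching CSP C's row in Table 2.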
6.2 Service Provider Selection

Table 3 presents the Cloud Service Providers and their respective fuzzy cost values. The fuzzy cost of a service provider is the aggregate of the membership values of all its attributes, as per Eq. (4).

Table 3. Fuzzy cost of the service providers

CSP (Ci)  I     II    III   IV    V     VI    VII   VIII  IX    X     Fuzzy cost
C         1.20  1.09  0.72  0.90  1.45  0.90  1.09  0.72  1.09  1.20  10.33/18.18 = 0.568
E         0.96  0.80  1.12  0.96  0.96  1.12  0.96  0.64  1.45  0.96  9.93/18.18 = 0.546
F         1.17  1.17  0.88  1.17  1.17  0.73  1.17  1.02  0.58  0.88  9.94/18.18 = 0.547
G         1.28  1.00  1.00  0.71  1.28  0.85  0.71  0.85  1.14  1.14  9.96/18.18 = 0.548
μ = Σ_{i=1}^{n} δi   (4)

From Table 3, it can be seen that service provider C is the best choice as per the above evaluation.
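The final selection step can be sketched directly from Table 3's membership rows: sum each provider's per-attribute memberships (Eq. 4) and pick the maximum. The code recomputes the row sums rather than relying on the printed totals.

```python
# Sketch of the selection step: fuzzy cost = sum of membership values
# (Eq. 4); the provider with the highest cost is selected.
# Membership rows are taken from Table 3.

TABLE3 = {
    "C": [1.20, 1.09, 0.72, 0.90, 1.45, 0.90, 1.09, 0.72, 1.09, 1.20],
    "E": [0.96, 0.80, 1.12, 0.96, 0.96, 1.12, 0.96, 0.64, 1.45, 0.96],
    "F": [1.17, 1.17, 0.88, 1.17, 1.17, 0.73, 1.17, 1.02, 0.58, 0.88],
    "G": [1.28, 1.00, 1.00, 0.71, 1.28, 0.85, 0.71, 0.85, 1.14, 1.14],
}

def fuzzy_cost(deltas):
    return sum(deltas)              # Eq. (4)

best_csp = max(TABLE3, key=lambda c: fuzzy_cost(TABLE3[c]))
```

Summing the rows reproduces the ranking of Table 3, with provider C selected as the best choice.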
7 Conclusion and Future Work

This research proposes an internal rating for cloud service providers that can be very useful for tenants while selecting CSPs. The mathematical model can be embedded in existing cloud simulators such as CloudSim, GridSim and CloudAnalyst. To refine the value of δ, factors such as cohesion ρch and coupling ρco can also be used; these values can be assigned on the basis of the cloud's worst-case performance. The higher the values of ρch and ρco, the higher the chance of that CSP being selected by the model.
References

1. Liu, H., Abraham, A., Snasel, V., McLoone, S.: Swarm scheduling approaches for work-flow applications with security constraints in distributed data-intensive computing environments. Inf. Sci. 192, 228–243 (2012)
2. Hiran, K.K., Doshi, R., Fagbola, D.T., Mahrishi, M.: Cloud Computing: Concepts, Architecture and Applications with Real-World Examples and Case Studies. BPB Publications, New Delhi (2019)
3. Lee, Y.C., Wang, C., Zomaya, A.Y., Zhou, B.B.: Profit-driven scheduling for cloud services with data access awareness. J. Parallel Distrib. Comput. 72, 591–602 (2012)
4. Zhang, Y., Lei, F., Zhi, Y.: Optimization of cloud database route scheduling based on combination of genetic algorithm and ant colony algorithm. Procedia Eng. 15, 3341–3345 (2011)
5. Huang, Y., Bessis, N., Norrington, P., Kuonen, P., Hirsbrunner, B.: Exploring decentralized dynamic scheduling for grids and clouds using the community-aware scheduling algorithm. Future Gener. Comput. Syst. (2011). https://doi.org/10.1016/j.future.2011.05.006
6. Mahrishi, M., Shrotriya, A.: Globally recorded binary encoded domain compression algorithm in column oriented databases
7. Zhu, X., He, C., Li, K., Qin, X.: Adaptive energy-efficient scheduling for real-time tasks on DVS-enabled heterogeneous clusters. J. Parallel Distrib. Comput. 72, 751–763 (2012)
8. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982)
9. Pawlak, Z.: Rough Sets – Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, Dordrecht (1991)
10. Sampaio, M., Barbosa, J.G., Prodan, R.: PIASA: a power and interference aware resource management strategy for heterogeneous workloads in cloud data centers. Simul. Model. Pract. Theor. 57, 142–160 (2015)
11. Lee, H.M., Jeong, Y.-S., Jang, H.J.: Performance analysis based resource allocation for green cloud computing. J. Supercomput. 69(3), 1013–1026 (2013). https://doi.org/10.1007/s11227-013-1020-x
12. Ergu, D., Kou, G., Peng, Y., Shi, Y., Shi, Y.: The analytic hierarchy process: task scheduling and resource allocation in cloud computing environment. J. Supercomput. 64(3), 835–848 (2013)
13. Fayyaz, A., Khan, M.U., Khan, S.U.: Energy efficient resource scheduling through VM consolidation in cloud computing. In: IEEE 13th International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan, pp. 65–70 (2015)
14. Hsu, C.-H., Slagter, K.D., Chen, S.-C., Chung, Y.-C.: Optimizing energy consumption with task consolidation in clouds. Inf. Sci. 258, 452–462 (2014)
15. Luo, J.P., Li, X., Chen, M.R.: Hybrid shuffled frog leaping algorithm for energy-efficient dynamic consolidation of virtual machines in cloud data centers. Expert Syst. Appl. 41(13), 5804–5816 (2014)
16. Wang, X., Liu, X., Fan, L., Jia, X.: A decentralized virtual machine migration approach of data centers for cloud computing. Math. Probl. Eng. 2013, 1–11 (2013)
17. Mishra, S.K., Puthal, D., Sahoo, B., Jena, S.K., Obaidat, M.S.: An adaptive task allocation technique for green cloud computing. J. Supercomput. 74(1), 370–385 (2017). https://doi.org/10.1007/s11227-017-2133-4
18. Singh, S., Chana, I., Singh, M., Buyya, R.: SOCCER: self-optimization of energy-efficient cloud resources. Cluster Comput. 19(4), 1787–1800 (2016)
19. Calheiros, R.N., Buyya, R.: Energy-efficient scheduling of urgent bag-of-tasks applications in clouds through DVFS. In: Cloud Computing Technology and Science (CloudCom), Singapore, pp. 342–349 (2014)
20. Lee, Y.C., Zomaya, A.Y.: Energy efficient utilization of resources in cloud computing systems. J. Supercomput. 60(2), 268–280 (2012)
21. Dong, Z., Liu, N., Rojas-Cessa, R.: Greedy scheduling of tasks with time constraints for energy-efficient cloud-computing data centers. J. Cloud Comput. 4(1), 1–14 (2015)
22. Gao, Y., Wang, Y., Gupta, S.K., Pedram, M.: An energy and deadline aware resource provisioning, scheduling and optimization framework for cloud systems. In: Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, Montreal, Quebec, Canada, p. 31. IEEE Press (2013)
23. Cano, F.A., Tchernykh, A., Corts-Mendoza, J.-M., Yahyapour, R., Drozdov, A., Bouvry, P., Kliazovich, D., Avetisyan, A.: Heterogeneous job consolidation for power aware scheduling with quality of service. In: Workshop on Network Computing and Supercomputing, vol. 1482, pp. 687–697 (2015)
24. Bera, S., Misra, S., Rodrigues, J.J.: Cloud computing applications for smart grid: a survey. IEEE Trans. Parallel Distrib. Syst. 26(5), 1477–1494 (2015)
Image Processing Techniques for Breast Cancer Detection: A Review

Mahendra G. Kanojia1(B), Mohd. Abuzar Mohd. Haroon Ansari2, Niketa Gandhi3, and S. K. Yadav1

1 JJT University, Jhunjhunu, Rajasthan, India
[email protected], [email protected]
2 SIES College of Commerce and Economics, Mumbai, Maharashtra, India
[email protected]
3 University of Mumbai, Mumbai, Maharashtra, India
[email protected]
Abstract. The frequency of breast cancer cases in women is increasing worldwide. A histopathologist takes a substantial amount of time to analyse a tissue slide, so there is a need for automated systems that aid the pathologist in the detection of malignancy. Early detection of breast cancer leads to faster treatment and increases the chances of survival. It is crucial for researchers to design systems that can increase the speed and accuracy of breast cancer diagnosis. Histological analysis is a prominent approach to the detection of breast cancer. Histopathology images are complex in nature, with heterogeneous backgrounds and distorted nuclei. With the advancements in image processing techniques, researchers have proposed various solutions for processing histology images. Developers of breast cancer computer-aided diagnosis systems need in-depth knowledge of histology slide preparation and of the manual study of the slides. This helps them to mimic the histopathologist while designing the system and increases the accuracy and reliability of the system. This paper covers an in-depth study of breast biopsy, histological slide description, image processing techniques for automated histopathology analysis, and breast cancer.

Keywords: Breast cancer · Breast biopsy · Histopathology images · Image processing · Nucleus detection
1 Introduction

According to the World Health Organization (WHO), breast cancer is one of the major types of cancer among women all over the globe. In its 2018 report, it is estimated that approximately 627,000 deaths occurred due to breast cancer alone, which amounts to 15% of all cancer deaths among women. Many cases have proved that early and accurate detection of breast cancer helps doctors to decide the line of treatment, which increases the chances of cure. Image processing has enabled multiple-fold advancements in the health care sector, and tremendous research has been carried out worldwide,

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021. A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 649–660, 2021. https://doi.org/10.1007/978-3-030-49342-4_63
using image processing techniques, to create computer-aided diagnosis (CAD) systems [8, 12, 20]. The aim of a CAD system is to aid doctors and health care practitioners by improving the speed and quality of diagnosis. Computational researchers across the globe who specialize in image processing are working towards developing expert systems. Breast cancer detection has been a topic of research for multiple decades; researchers use various image processing tools and techniques to design breast cancer detection expert systems and advance the accuracy and early diagnosis of cancer. Automated breast cancer detection systems mainly use mammography images [1, 3, 6, 8, 15, 16, 18] to identify calcification and abnormality in the breast, or histopathological images [1, 2, 4, 5, 7, 10–14, 19, 21, 24] to identify malignancy, that is, cancer nuclei or the mitotic state. Histopathological slide image analysis for breast cancer detection is less explored than other types of images; the complex architecture and heterogeneity of histopathological images are the two major reasons [10, 20, 22]. In this paper, we present a process to design an expert system to diagnose a histopathological slide and report it as malignant (presence of cancer cells) or benign (absence of cancer cells). To understand the properties of breast histopathological slides, we first cover the steps involved in the preparation of histopathological slides, followed by the visual features analyzed by the histopathologist during the diagnosis of cancer. We also give an overview of the process of digitization of tissue slides. A major part of the paper deals with the various image processing techniques recommended in the design of automated CAD systems for the diagnosis of breast cancer.
Step 1: Collect tissue specimens from the patient through biopsy.
Step 2: Breast cancer histopathological slide preparation.
Step 3: Digital image acquisition.
Step 4: Application of image processing techniques.
Step 5: Identification of nuclei.
Step 6: Histological analysis.
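The computational stages of this workflow (Steps 4–6) can be sketched as a minimal pipeline. This is an illustrative assumption only: the function names, the fixed threshold, and the list-of-lists grayscale image representation are ours, not part of any surveyed system.

```python
# Illustrative sketch of Steps 4-6; all names and the list-of-lists
# grayscale format are assumptions for demonstration, not a method
# from the cited works.

def preprocess(image):
    """Step 4a: clip intensities to [0, 255] to normalize the input."""
    return [[min(255, max(0, p)) for p in row] for row in image]

def segment(image, threshold=128):
    """Step 4b: binarize so candidate nuclei become foreground (1)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]

def identify_nuclei(mask):
    """Step 5: report the foreground pixel count as a crude measure."""
    return sum(sum(row) for row in mask)

def analyze(image):
    """Step 6: end-to-end run returning the mask and a summary count."""
    mask = segment(preprocess(image))
    return mask, identify_nuclei(mask)

image = [[10, 200, 30],
         [220, 240, 25],
         [15, 210, 20]]
mask, count = analyze(image)
print(count)  # 4 pixels at or above the threshold
```

In a real CAD system each stage would be replaced by the techniques surveyed in Sect. 3, but the control flow stays the same: enhance, segment, then measure.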
2 Breast Cancer Histopathology Preliminaries

Before we start the discussion of image analysis techniques, we describe the manual process for the preparation of histopathological slides. Histopathology is the microscopic study of tissue mounted on a glass slide. The histopathological slides are the material for the manual diagnosis process. It is very important to understand the slide-making process in order to validate the use of slides before the slide digitization process.
Image Processing Techniques for Breast Cancer Detection: A Review
2.1 Breast Tissue Biopsy and Slide Preparation

Breast biopsy is a surgical process in which tissue from the suspected area of the breast is extracted for pathological analysis. A breast biopsy can be done using one of several methods, as shown in Fig. 1.
Fig. 1. Types of breast biopsies.
Tissues removed by biopsy undergo various steps before they are mounted on slides for pathological analysis [9]. We acquired knowledge of the histopathological slide preparation process from an expert histopathologist practicing in Mumbai, Maharashtra, India. Figure 2 shows the steps for histopathological slide preparation.
Fig. 2. Steps for histopathological slide preparation.
Grossing: Grossing includes an examination of the specimen and taking appropriate sections.
Fixation: Tissue requires fixation to avoid autolysis and degeneration. Various fixatives can be used, the most common being 10% buffered formalin.
Tissue processing: The extracted tissue needs to be embedded in a solid medium that enables it to be cut into thin sections; this is the aim of tissue processing.
Embedding: Blocks of tissue are prepared after fixation using paraffin wax.
Microtome: Tissue ribbons of 3–5 micron thickness are cut using a microtome machine.
Slides: The tissue ribbons are placed on glass slides coated with a thin layer of egg albumin.
Heating: Heating is required to remove excess wax from the slide.
Staining: This is a very important step in which the tissue is stained. Hematoxylin and eosin (H&E) stain is used to stain the slides.
Mounting: The stained slide is mounted using a mounting medium such as DPX (dibutyl phthalate xylene), which gives good transparency to the stained tissue.
Labelling: The slides are labeled and are ready for microscopic examination.

2.2 Oncological Diagnosis

At a microscopic magnification of 40x, the architecture of cells in a normal tissue slide shows a systematic arrangement: the number of cells is higher towards the base and gradually decreases towards the interior [22], as shown in Fig. 3(a). Necrosis is not seen in normal tissue. Malignant cells are present in clusters [22], as shown in Fig. 3(b), and not in a systematic arrangement. Necrosis may be present in a malignant slide.
Fig. 3. Tissue at 40x magnification: a) normal tissue and b) malignant tissue (Source: BreaKHis – Breast Cancer Histopathological Database)
Features examined by histopathologists for the diagnosis of breast cancer at 100x and higher microscopic magnification are:
1. Number of cells: In normal tissue the cell density is average, whereas in oncological tissue the cell count increases drastically.
2. Size of cell: A malignant cell is 8 to 10 times the size of a normal cell, or even larger; that is, 8 to 10 normal cells can fit in one malignant cell.
3. Presence of necrosis: A necrosis region is never found in normal tissue, but may be found in cancerous tissue. If necrosis is found, the detection of cancer is confirmed. Figure 4 shows tissue with necrosis.
4. Nucleus-to-cytoplasm ratio: a high nucleus-to-cytoplasm ratio affirms cancer.
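As a rough illustration of the nucleus-to-cytoplasm ratio examined above, the ratio can be estimated from a segmented slide as the area of nucleus pixels over the area of cytoplasm pixels. The label values used here (2 = nucleus, 1 = cytoplasm, 0 = background) are our own assumption for demonstration, not a convention from the surveyed works.

```python
# Hypothetical labels: 2 = nucleus pixel, 1 = cytoplasm pixel,
# 0 = background. A high nucleus-to-cytoplasm area ratio is one of
# the indicators histopathologists associate with malignancy.

def nc_ratio(label_mask):
    """Nucleus area divided by cytoplasm area in a labeled mask."""
    nucleus = sum(row.count(2) for row in label_mask)
    cytoplasm = sum(row.count(1) for row in label_mask)
    if cytoplasm == 0:
        raise ValueError("no cytoplasm pixels in mask")
    return nucleus / cytoplasm

cell = [[0, 1, 1, 0],
        [1, 2, 2, 1],
        [1, 2, 2, 1],
        [0, 1, 1, 0]]
print(nc_ratio(cell))  # 4 nucleus pixels / 8 cytoplasm pixels = 0.5
```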
Fig. 4. Malignant tissue with necrosis (Source: BreaKHis – Breast Cancer Histopathological Database)
3 Image Processing Techniques for Breast Cancer Nuclei Identification

Image processing techniques are used for both nucleus identification and feature extraction. Breast cancer detection using image processing involves image enhancement and segmentation of the nuclei from the background. The image parametric features of these nuclei are extracted in a numerical, normalized format. These values can later be used in experiments and simulations with machine intelligence techniques. The principal steps for breast cancer nuclei identification are shown in Fig. 5.

3.1 Image Acquisition

To begin the automation process, digitization of the histopathological slides is necessary. This can be achieved using various image acquisition tools. Attaching a digital camera to the eyepiece of the microscope is one of the simplest and most economical ways to acquire a slide image. Presently, whole-slide image scanners are used to acquire high-resolution, low-noise images. Such scanners automate the scanning, compression and storage of the image [9, 21], and the resulting images respond well to the image processing techniques applied.

3.2 Image Pre-processing

Image pre-processing is applied for image enhancement, to reduce the computational cost, and to remove noise that may be introduced into the tissue images during acquisition. Converting the image from colour to grayscale [3–5, 7, 20, 24] is commonly adopted as a first step, followed by image smoothing and/or sharpening filters [1] to enhance image features. Sharpening filters such as the Gaussian and the Laplacian of Gaussian [19] are promising for enhancing the edges of nuclei. Median [3, 4, 7, 13, 22] and Gaussian [8, 12, 24] smoothing filters are preferred for noise reduction. Researchers working with the intensity properties of the image use techniques such as multilevel wavelet transformation [3], histogram stretching [20], and histogram equalization [3, 8, 13, 20, 22, 23].
The work in [7] used intensity adjustment, and [4, 21] used the red channel intensity, for pre-processing histopathological images. The low-resolution images obtained after pre-processing can first be analyzed to roughly locate the regions of interest, and only these regions are passed to the higher-resolution processing step.
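Two of the pre-processing steps above, luminosity-weighted grayscale conversion and median filtering, can be sketched in pure Python on small list-of-lists images. This is an illustrative sketch, not the implementation of any cited paper, and the 3x3 window size is our assumption.

```python
# Illustrative pre-processing sketches: luminosity grayscale
# conversion and a 3x3 median filter for impulse-noise removal.

def to_gray(rgb):
    """Luminosity-weighted grayscale conversion of an RGB image."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b)
             for (r, g, b) in row] for row in rgb]

def median3x3(img):
    """3x3 median filter; border pixels are left unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out

noisy = [[10, 10, 10],
         [10, 255, 10],   # isolated bright impulse
         [10, 10, 10]]
print(median3x3(noisy)[1][1])  # 10 -- the impulse is suppressed
```

The median filter's robustness to isolated outliers, shown here, is why the surveyed works prefer it for removing acquisition noise without blurring nuclear edges.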
Fig. 5. Principal steps for breast cancer nuclei identification using image processing
3.3 Image Segmentation

After pre-processing, image segmentation is one of the most important steps. Image segmentation extracts the objects or regions of interest from the background; these objects and regions are the focus of further disease identification and classification. Local thresholding [1, 7, 22] approaches use a threshold value, typically based on image intensity, to separate objects from the background. The widely preferred dynamic Otsu's thresholding method [3, 6, 8, 19, 20, 22, 24] finds the global threshold that minimizes the weighted within-class variance, rendering the heterogeneous background pixels as black and the identified nucleus pixels as white. Entropy-based segmentation works on the principle that a non-homogeneous region has more entropy than a homogeneous region; maximization of the relative entropy [4, 21] between the background pixels (homogeneous) and the foreground, nuclei, pixels (non-homogeneous) is also used for nuclei segmentation. Segmentation of overlapping nuclei is achieved with marker-controlled watershed [9, 13, 20] and H-minima transformation based [24] segmentation methods; these methods work well if the locations of the foreground (nuclei) and background pixels are marked. The Gradient Vector Flow (GVF) snake [18] is an advanced snake methodology that can segment irregularly shaped nuclei in the image; snakes are curves that move under the influence of internal and external forces derived from the image data. Image segmentation is expected to prominently highlight the nuclei and subtract unwanted pixels from the image. Histopathological image feature extraction methodologies can be applied directly after image segmentation, provided the segmentation of nuclei is performed accurately and no post-processing techniques are required.

3.4 Image Post-processing

Once the pixels corresponding to the nuclei are segmented, the nucleus pixels are enhanced. Image post-processing methodologies enhance the intensity of nucleus pixels by applying image processing filters to the image. Morphological operators [3, 4, 6, 7, 21, 23] such as opening, closing, thinning, thickening, dilation, top-hat transformation and morphological reconstruction with the rolling-ball algorithm are used to identify nuclei against a brighter background. Rolling-ball is an advanced morphological algorithm that successively erodes the image and reconstructs it at each step, producing an image from which the peaks are missing; this leaves nuclei of radius up to the number of iterations performed.

3.5 Identification of Nuclei

The nuclei in the bright-nuclei-on-dark-background images produced by the post-processing steps now have to be identified. The nuclei become more easily identifiable thanks to the morphological transformation step, which produces a more uniform nucleus morphology, high contrast relative to the background, and good separation between adjacent nuclei [19]. The set of identified nuclei can be used in downstream modules for measurement purposes. Most of the literature merges this step with the image segmentation methodologies, since many techniques used for segmentation, such as morphological rolling-ball reconstruction, serve the purposes of both image segmentation and nucleus identification.
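Among the segmentation techniques discussed in Sect. 3.3, Otsu's thresholding is the most frequently cited. It selects the gray level that maximizes the between-class variance, which is equivalent to minimizing the weighted within-class variance. A minimal pure-Python sketch, operating on a flat list of pixel intensities rather than any particular image format (our simplification for illustration):

```python
def otsu_threshold(pixels, levels=256):
    """Return the Otsu threshold for a flat list of gray levels.

    Maximizes between-class variance, which is equivalent to
    minimizing the weighted within-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * hist[i] for i in range(levels))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0
    for t in range(levels):
        w_b += hist[t]              # background weight
        if w_b == 0:
            continue
        w_f = total - w_b           # foreground weight
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (total_sum - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Two clearly separated intensity populations: dark background
# around 20-30 and bright nuclei around 200-210.
pixels = [20, 25, 30, 22, 28, 200, 205, 210, 202]
t = otsu_threshold(pixels)
print(30 <= t < 200)  # True: the threshold falls between the modes
```

Pixels at or below the returned threshold are treated as background and the rest as foreground, matching the black-background / white-nucleus convention described above.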
The Fourier transform [3], the circular Hough transform [20], the Mexican hat transformation [20], and a Bayesian posterior map followed by local thresholding [23] are a few specialized algorithms implemented for nuclei identification. Once the nuclei are identified in the image, histopathologists can use them for a detailed diagnosis. To extend the work towards soft-computing methodologies, breast cancer image feature extraction methods are applied to generate numerical datasets.
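Once a clean binary mask of bright nuclei on a dark background is available, individual nuclei can be enumerated by connected-component labeling. This flood-fill sketch (4-connectivity, pure Python) is an illustrative stand-in for the measurement step, not a method from the cited works.

```python
from collections import deque

def count_components(mask):
    """Count 4-connected foreground components in a binary mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                count += 1
                queue = deque([(y, x)])   # flood-fill this component
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 1, 0, 1]]
print(count_components(mask))  # 3 separate "nuclei"
```

Each component's pixel set could then feed the per-nucleus measurements (count, size, N:C ratio) described in Sect. 2.2.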
4 Review of Image Processing Techniques

This section gives a comprehensive summary of the image processing techniques used in the various research works. Techniques for image enhancement, segmentation and nucleus detection are presented for the different image types. The work in [2, 11] shows the use of convolutional neural networks (CNN) as an emerging technique (Table 1).
Table 1. Review of image processing techniques for nuclei identification

| Author and year | Image type | Pre-processing | Segmentation | Post-processing | Nuclei/mass identification |
| Dabass, J., 2019 [17] | Ultrasound | Median filter | Region-based and edge-based thresholding, clustering | Not implemented | Boundary extraction |
| Varma, C., 2018 [1] | Mammography | High-pass sharpening filter | Local entropy and thresholding | Morphological area opening, edge smoothing, hole filling | External shape features and internal topographic features such as texture and appearance |
| Baker, Q.B., 2018 [12] | Histopathology | Gaussian smoothing filter | Ground truth manually cropped, then K-means segmentation | Morphological operations to fill holes and merge dispersed cell regions | Watershed segmentation based on separate cells and on the gap between cells |
| Khuriwal, N., 2018 [13] | Histopathology | Median filter and histogram equalization | Marker-controlled watershed segmentation | Not implemented | Convolutional neural network |
| Sadoughi, F., 2018 [15] | Mammography | Median filter, contrast adjustment | Thresholding | Edge detection | Not implemented |
| Lakshmanan, B., 2018 [24] | Histopathology | Grayscale conversion, Gaussian filter | H-minima transformation | Not implemented | Not implemented |
| Sangeetha, R., 2017 [3] | Mammography | Median filter and adaptive histogram equalization | Otsu's thresholding | Thresholding, morphological dilation, morphological top-hat transformation and Sobel edge detection mask | Watershed algorithm and colour quantization |
| Ghongade, R.D., 2017 [8] | Mammography | Gaussian filter and adaptive histogram equalization | Otsu's thresholding | Not implemented | Otsu's thresholding and Gabor feature extraction |
| Chang, J., 2017 [11] | Histopathology | Not applicable | Deep learning | Not applicable | Convolutional neural network (CNN), Google's Inception v3 model |
| Rajyalakshmi, U., 2017 [14] | Histopathology | Median filter | Region growing and merging; epithelial cell nuclei segmentation | Gradient estimation and corner detection | Watershed algorithm |
| Giri, P., 2017 [16] | Mammography | Grayscale conversion; noise removal filtering | Global thresholding, edge detection and region-based segmentation | Classifier and deformable model-based segmentation | Clustering |
| Xu, J., 2016 [2] | Histopathology | Not applicable | Not applicable | Not applicable | Convolutional neural network |
| Johra, F.T., 2016 [5] | Histopathology | Grayscale conversion | Cropping, unmixed image, thresholding | Morphological operations | Cell Profiler tool |
| Kanojia, M.G., 2016 [19] | Histopathology | Grayscale conversion | Otsu's global thresholding | Marker-controlled watershed segmentation | Rolling-ball morphological operation |
| Paul, A., 2016 [21] | Histopathology | Red channel of the histology image used to smooth the image and preserve edges | Iterative entropy method | Morphological closing and opening, region filling | Informative morphological scale space |
| Paul, A., 2015 [4] | Histopathology | Red channel of the histology image used to smooth the image | Thresholding | Morphological opening and closing | Relative-Entropy Maximized Scale Space (REMSS) used to identify mitosis |
| Swetha, T., 2015 [6] | Mammography | Grayscale conversion | Hybrid image segmentation and Otsu's thresholding | Top-hat morphological operation and artifact elimination | Subtraction of the morphological image from the original image |
| Helwan, A., 2015 [7] | Histopathology | Grayscale conversion using the luminosity method, median filter and image intensity adjustment | Thresholding | Morphological erosion and dilation | GLCM algorithm |
| Bhandari, S.H., 2015 [10] | Histopathology | Grayscale conversion and difference-of-Gaussian smoothing filter | Whole-slide image study, hence segmentation is not done | Generation of a feature code-book | Bag-of-features (BoF) and k-means clustering |
| Logambal, G., 2015 [22] | Histopathology | Grayscale conversion, median filter, histogram equalization | Local thresholding and Otsu's thresholding | Posterior map generation | Bayesian modeling |
| Mustafa, M., 2014 [18] | Mammography | Grayscale conversion, Gaussian smoothing filter and contrast enhancement | Gradient Vector Flow (GVF) snake algorithm | Not implemented | Not applicable |
| George, Y.M., 2014 [20] | Histopathology | Histogram stretching, histogram equalization, contrast stretching | Otsu's global thresholding | Not implemented | Circular Hough transform, Mexican hat transformation |
| Lu, C., 2014 [23] | Histopathology | Histogram equalization | Global thresholding | Morphological opening and closing | Bayesian posterior map |
5 Conclusion and Future Work

In this paper, several histopathology, mammography and ultrasound image analysis techniques are discussed. The paper gives an inside view of biopsy and histopathology slide preparation, and brings the attention of researchers to the importance of the manual tissue mounting process that precedes the digitization of tissue slides. After acquiring this histology knowledge, researchers can verify the ground truth of the images themselves, which lessens the dependency on histopathologists. The spectrum of image pre-processing, post-processing, segmentation and nuclei detection techniques covered in this paper gives experimental liberty to computational researchers working on breast cancer detection. It is observed that the CNN [2, 11] is a promising hybrid direction, where researchers can aim to segment nuclei by combining machine learning techniques with image processing. This paper, together with our 2019 study [25] reviewing machine learning techniques for the detection of breast cancer using image features, can be used to design a CAD system for the detection of breast cancer.
References
1. Varma, C., Sawant, O.: An alternative approach to detect breast cancer using digital image processing techniques. In: International Conference on Communication and Signal Processing (ICCSP), Chennai, India, pp. 134–137 (2018)
2. Xu, J., Xiang, L., Liu, Q., Gilmore, H., Wu, J., Tang, J., Madabhushi, A.: Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Trans. Med. Imaging 35(1), 119–130 (2016)
3. Sangeetha, R., Murthy, K.S.: A novel approach for detection of breast cancer at an early stage using digital image processing techniques. In: International Conference on Inventive Systems and Control (ICISC), Coimbatore, India (2017)
4. Paul, A., Mukherjee, D.P.: Mitosis detection for invasive breast cancer grading in histopathological images. IEEE Trans. Image Process. 24(11), 4041–4054 (2015)
5. Johra, F.T., Shuvo, M.M.H.: Detection of breast cancer from histopathology image and classifying benign and malignant state using fuzzy logic. In: 3rd International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), Dhaka, Bangladesh (2016)
6. Swetha, T., Bindu, C.: Detection of breast cancer with hybrid image segmentation and Otsu's thresholding. In: International Conference on Computing and Network Communications (CoCoNet), Trivandrum, India, pp. 565–570 (2015)
7. Helwan, A., Abiyev, R.H.: An intelligent system for identification of breast cancer. In: International Conference on Advances in Biomedical Engineering (ICABME), Beirut, Lebanon, pp. 17–20 (2015)
8. Ghongade, R.D., Wakde, D.G.: Computer-aided diagnosis system for breast cancer using RF classifier. In: International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, India, pp. 1068–1072 (2017)
9. Veta, M., Pluim, J.P.W., van Diest, P.J., Viergever, M.A.: Breast cancer histopathology image analysis: a review. IEEE Trans. Biomed. Eng. 61, 1400–1411 (2014)
10. Bhandari, S.H.: A bag-of-features approach for malignancy detection in breast histopathology images. In: IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, pp. 4932–4936 (2015)
11. Chang, J., Yu, J., Han, T., Chang, H.J., Park, E.: A method for classifying medical images using transfer learning: a pilot study on histopathology of breast cancer. In: IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), Dalian, China (2017)
12. Baker, Q.B., Zaitoun, T.A., Banat, S., Eaydat, E., Alsmirat, M.: Automated detection of benign and malignant in breast histopathology images. In: IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA), Jordan, pp. 1–5 (2018)
13. Khuriwal, N., Mishra, N.: Breast cancer detection from histopathological images using deep learning. In: 3rd International Conference and Workshops on Recent Advances and Innovations in Engineering (ICRAIE) (2018)
14. Rajyalakshmi, U., Rao, S.K., Prasad, K.S.: Supervised classification of breast cancer malignancy using integrated modified marker controlled watershed approach. In: IEEE 7th International Advance Computing Conference (IACC) (2017)
15. Sadoughi, F., Kazemy, Z., Hamedan, F., Owji, L., Rahmanikatigari, M., Azadboni, T.T.: Artificial intelligence methods for the diagnosis of breast cancer by image processing: a review. Breast Cancer Targets Ther. 10, 219–230 (2018)
16. Giri, P., Saravanakumar, K.: Breast cancer detection using image processing techniques. Orient. J. Comput. Sci. Technol. 10, 391–399 (2017)
17. Dabass, J., Arora, S., Vig, R., Hanmandlu, M.: Segmentation techniques for breast cancer imaging modalities: a review. In: 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India (2019)
18. Mustafa, M., Rashid, N.A.O., Samad, R.: Breast cancer segmentation based on GVF snake. In: IEEE Conference on Biomedical Engineering and Sciences (IECBES) (2014)
19. Kanojia, M.G., Abraham, S.: Breast cancer detection using RBF neural network. In: 2nd International Conference on Contemporary Computing and Informatics (IC3I) (2016)
20. George, Y.M., Zayed, H.H., Roushdy, M.I., Elbagoury, B.M.: Remote computer-aided breast cancer detection and diagnosis system based on cytological images. IEEE Syst. J. 8, 949–964 (2014)
21. Paul, A., Mukherjee, D.P.: Gland segmentation from histology images using informative morphological scale space. In: IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA (2016)
22. Logambal, G., Saravanan, V.: Cancer diagnosis using automatic mitotic cell detection and segmentation in histopathological images. In: Global Conference on Communication Technologies (GCCT), Thuckalay, India (2015)
23. Lu, C., Mandal, M.: Toward automatic mitotic cell detection and segmentation in multispectral histopathological images. IEEE J. Biomed. Health Inform. 18, 594–605 (2014)
24. Lakshmanan, B., Saravanakumar, S.: Nucleus segmentation in breast histopathology images. In: International Conference on Current Trends towards Converging Technologies (ICCTCT), Shillong, India (2018)
25. Kunal, P., Mahendra, K., Brian, D., Niketa, G.: Breast cancer detection using WBCD. In: International Interdisciplinary Conference on Recent Trends in Science and Review of Research Journal, UGC Approved Journal no. 48514, Alibag, India (2019)
Author Index
A Abraham, Ajith, 622, 632 Adedeji, Afolabi, 612 Adewumi, Adewole, 582 Ahuja, Ravin, 582, 592, 612 Al-Shamma, Omran, 90, 132, 245 Alves, André Luiz F., 225 Alves, James, 162 Alves, Shara S. A., 140 Al-Yassin, Hassan, 245 Alzubaidi, Laith, 90, 132, 245 Amaral, Gabriela, 550 Ammar, Randa Ben, 172 Ansari, Mohd. Abuzar Mohd. Haroon, 649 Aouani, Hadhami, 406 Arkah, Zinah Mohsin, 90 Atayero, Aderemi A., 601 Awad, Fouad H., 90 Ayed, Yassine Ben, 80, 172 Ayeni, Babajide, 592 B Badejo, Joke A., 601 Bajpai, Abhishek, 318, 530 Barrera-Cámara, Ricardo A., 182 Bassett, Bruce A., 426 Ben Abdessalem Karaa, Wahiba, 561 Ben Aouicha, Mohamed, 264 Ben Ayed, Yassine, 406, 416 Ben Salah, Manel, 346 BenAyed, Yassine, 396 Benzarti, Sabrine, 561 Benzarti, Sana, 205 Bernábe-Loranca, M. B., 182 Bharati, Subrato, 69
Bouassida, Nadia, 487 Bougares, Fethi, 217 Brichni, Marwa, 112, 436 Brito, Jessica, 162 Brunessaux, Stephan, 235 Bührmann, Jacoba H., 298 C Chimedza, Charles, 298 Clark, Patrick G., 376 D Damaševičius, Robertas, 582, 592 Dammak, Nouha, 396 DaSilva, Alvaro, 287 de F. Souza, Luís Fabrício, 140 de Freitas, Jhonata Soares, 477 de la Cal, Enrique, 287 de Oliveira, Aillkeen Bezerra, 225 de Oliveira, Elias, 150 de Souza Baptista, Cláudio, 225 de Souza Leite Cuadros, Marco Antonio, 255, 355 dos S. Silva, Francisco Hércules, 140 dos Santos, André Gustavo, 477 Doshi, Ruchi, 641 Du Plessis, Francois, 456 Du Plessis, M. C., 456 E El Fakharany, Essam, 572 El Gattoufi, Said, 436 Elamine, Maryam, 217 Elayni, Marwa, 122 El-Bendary, Nashwa, 336, 572
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): ISDA 2019, AISC 1181, pp. 661–663, 2021. https://doi.org/10.1007/978-3-030-49342-4
El-ziaat, Hadeer, 336 Essid, Mondher, 100 F Fadhel, Mohammed A., 90, 132, 245 Falcão, António, 550 Fan, Kaipeng, 622 Fáñez, Mirko, 287 Fatnassi, Ezzeddine, 509 Filho, Pedro Pedrosa Rebouças, 140 G Gandhi, Niketa, 69, 649 Ganouni, Nourhen, 193 Gatepaille, Sylvain, 235 Gattoufi, Said El, 112 Ghannouchi, Sonia Ayachi, 193 Ghezala, Henda Hajjami Ben, 193 Gibbon, Tim, 456 Gonçalves, S. V., 14 González-Velázquez, R., 182 Granillo-Martinez, Erika, 182 Grzymala-Busse, Jerzy W., 376 Guo, Jifeng, 622 H Hadj Taieb, Mohamed Ali, 264 Hadrich Belguith, Lamia, 217 Hajjami Ben Ghezala, Henda, 561 Hamza, Sihem, 416 Hayashi, Teruaki, 1 Henke dos Reis, Douglas, 255 Hippe, Zdzislaw S., 376 Hiran, Kamal Kant, 641 Holanda, Gabriel Bandeira, 140 I Ibukun, Afolabi, 612 J Jemili, Farah, 24, 100, 122 K Kachouri, Abdennaceur, 520 Kamel, Ali El, 497 Kanojia, Mahendra G., 649 Kherallah, Monji, 386 Kondo, Sae, 1 Korbaa, Ouajdi, 24, 100, 122, 205 Kumar, Rupesh, 318
Author Index L Lachtar, Abdelfetteh, 520 Lachtar, Marwa, 520 Laleye, Olamide, 582 Li, Baosheng, 622 Lochner, Michelle, 426 M Maalej, Rania, 386 Mabuza-Hocquet, Gugulethu, 278, 365 Machado, Diogo, 550 Mahrishi, Mehul, 641 Majdoub, Manel, 497 Marconato Stringhini, Romulo, 309 Maruyama, Yoshihiro, 466, 540 Maskeliūnas, Rytis, 582, 592 Mattos da Silva, Rodrigo, 355 Mechti, Seifeddine, 217 Meddeb, Rahma, 24 Mishra, Pranchal, 530 Misra, Sanjay, 582, 592, 601, 612 Moawad, Ramadan, 336 Mouaddib, Abdel-Illah, 235 Mroczek, Teresa, 376 Mudali, Deborah, 59 Muyama, Lillian, 59 N Nakatumba-Nabende, Joyce, 59 Nelwamondo, Fulufhelo V., 278, 365 Nicoletti, M. C., 14 Niemiec, Rafal, 376 Nogara Dotto, Gustavo, 309 O Ogbuokiri, Blessing, 35 Ohsawa, Yukio, 1 Oliveira, Elias, 162 Oluwatobi, J. Afolabi, 278, 365 Osman, Heba, 572 P Peng, Lizhi, 622 Pereira, Alfredo F., 550 Pereira, Sofia, 550 Pirovani, Juliana, 162 Pirovani, Juliana P. C., 150 Pivot, Frédérique, 446 Plogmann, Ramona, 446 Podder, Prajoy, 69 Popoola, Segun I., 601
Author Index R Raborife, Mpho, 35 Rahman, Mohammad Atikur, 69 Rapheal, Ojelabi, 612 Rejeb, Lilia, 509 Ribeiro, Rita, 550 Robel, Md. Robiul Alam, 69 Roberts, Ethan, 426 Rodrigues Garcia, Thiago, 355 Romero-Montoya, M., 182 S Sakhrawi, Zaineb, 487 Sakka, Mustapha, 346 Santos, Jorge, 550 Sebai, Mariem, 509 Sedano, Javier, 287 Sellami, Asma, 487 Sharma, Tarun K., 632 Smaoui, Souhaïl, 346 Smith, Bevan I., 298 Solaiman, Basel, 122 Sousa, Emanuel, 550 Sowunmi, Olaperi Yeside, 592 Spalenza, Marcos A., 150 Suárez, Victor, 287 T Tan, Qing, 446 Tello Gamarra, Daniel Fernando, 255, 309, 355
Thabet, Dhafer, 193 Tiwari, Naveen, 318, 530 Triki, Bayrem, 24, 205 Tripathi, Ayush, 530 Turki, Houcemeddine, 264
V Varela, M. L. R., 550 Vasnier, Kilian, 235 Villar, Jose Ramón, 48, 287 Villar, Mario, 48
W Wang, Lin, 622 Welfer, Daniel, 255, 309 Williams, Rotimi, 601
Y Yadav, S. K., 649 Yadav, Sanjeev, 318 Yang, Bo, 622 Youssef, Habib, 49
Z Zhu, Jian, 622