Advances in Intelligent Systems and Computing Volume 1182
Series Editor Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland Advisory Editors Nikhil R. Pal, Indian Statistical Institute, Kolkata, India Rafael Bello Perez, Faculty of Mathematics, Physics and Computing, Universidad Central de Las Villas, Santa Clara, Cuba Emilio S. Corchado, University of Salamanca, Salamanca, Spain Hani Hagras, School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK László T. Kóczy, Department of Automation, Széchenyi István University, Gyor, Hungary Vladik Kreinovich, Department of Computer Science, University of Texas at El Paso, El Paso, TX, USA Chin-Teng Lin, Department of Electrical Engineering, National Chiao Tung University, Hsinchu, Taiwan Jie Lu, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia Patricia Melin, Graduate Program of Computer Science, Tijuana Institute of Technology, Tijuana, Mexico Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro, Rio de Janeiro, Brazil Ngoc Thanh Nguyen , Faculty of Computer Science and Management, Wrocław University of Technology, Wrocław, Poland Jun Wang, Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications on theory, applications, and design methods of Intelligent Systems and Intelligent Computing. Virtually all disciplines such as engineering, natural sciences, computer and information science, ICT, economics, business, e-commerce, environment, healthcare, life science are covered. The list of topics spans all the areas of modern intelligent systems and computing such as: computational intelligence, soft computing including neural networks, fuzzy systems, evolutionary computing and the fusion of these paradigms, social intelligence, ambient intelligence, computational neuroscience, artificial life, virtual worlds and society, cognitive science and systems, Perception and Vision, DNA and immune based systems, self-organizing and adaptive systems, e-Learning and teaching, human-centered and human-centric computing, recommender systems, intelligent control, robotics and mechatronics including human-machine teaming, knowledge-based paradigms, learning paradigms, machine ethics, intelligent data analysis, knowledge management, intelligent agents, intelligent decision making and support, intelligent network security, trust management, interactive entertainment, Web intelligence and multimedia. The publications within “Advances in Intelligent Systems and Computing” are primarily proceedings of important conferences, symposia and congresses. They cover significant recent developments in the field, both of a foundational and applicable character. An important characteristic feature of the series is the short publication time and world-wide distribution. This permits a rapid and broad dissemination of research results. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/11156
Ajith Abraham · M. A. Jabbar · Sanju Tiwari · Isabel M. S. Jesus
Editors
Proceedings of the 11th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2019)
Editors Ajith Abraham Scientific Network for Innovation and Research Excellence Machine Intelligence Research Labs (MIR) Auburn, WA, USA
M. A. Jabbar Department of Computer Science and Engineering Vardhaman College of Engineering Hyderabad, Telangana, India
Sanju Tiwari Departamento de Inteligencia Artificial Universidad Politécnica de Madrid Madrid, Spain
Isabel M. S. Jesus ISEP - Instituto Superior de Engenharia do Porto Porto, Portugal
ISSN 2194-5357 ISSN 2194-5365 (electronic) Advances in Intelligent Systems and Computing ISBN 978-3-030-49344-8 ISBN 978-3-030-49345-5 (eBook) https://doi.org/10.1007/978-3-030-49345-5 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Welcome Message
Welcome to Vardhaman College of Engineering, Hyderabad, India, and to the 11th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2019) and 11th World Congress on Nature and Biologically Inspired Computing (NaBIC 2019). In 2018, SoCPaR was held at Instituto Superior de Engenharia do Porto (ISEP), Portugal, during December 13–15 and NaBIC at VIT University, Vellore, India, during December 06–08. SoCPaR 2019 is organized to bring together worldwide leading researchers and practitioners interested in advancing the state of the art in soft computing and pattern recognition, for exchanging knowledge that encompasses a broad range of disciplines among various distinct communities. It is hoped that researchers and practitioners will bring new prospects for collaboration across disciplines and gain inspiration to facilitate novel breakthroughs. The themes for this conference are thus focused on “Innovating and Inspiring Soft Computing and Intelligent Pattern Recognition.” SoCPaR 2019 received submissions from 16 countries, and each paper was reviewed by at least five reviewers in a standard peer review process. Based on the recommendation by five independent referees, finally 24 papers will be presented during the conference (acceptance rate of 35%). The 11th World Congress on Nature and Biologically Inspired Computing (NaBIC 2019) brings together international researchers, developers, practitioners, and users. The aim of NaBIC is to serve as a forum to present current and future work as well as to exchange research ideas in this field. The conference theme is “Nurturing Intelligent Computing Towards Advancement of Machine Intelligence.” NaBIC 2019 received submissions from 10 countries, and each paper was reviewed by at least five reviewers in a standard peer review process. Based on the recommendation by five independent referees, finally five papers will be presented during the conference (acceptance rate of 30%). Conference proceedings are published by Springer Verlag, Advances in Intelligent Systems and Computing Series, which is now indexed by ISI Proceedings, DBLP, SCOPUS, etc. Many people have collaborated and worked hard to produce this year's successful SoCPaR–NaBIC conferences. First and foremost, we would like to thank all the authors for submitting their papers to the
conference, for their presentations and discussions during the conference. Our thanks to Program Committee members and reviewers, who carried out the most difficult work by carefully evaluating the submitted papers. We are grateful to our five plenary speakers for their wonderful talks:
• Prof. Dr. Arturas Kaklauskas, Vilnius Gediminas Technical University, Lithuania
• Prof. Dr. Pawan Lingras, Saint Mary’s University, Canada
• Prof. Dr. Stephen Huang, University of Houston, USA
• Prof. Dr. Patricia Melin, Tijuana Institute of Technology, Tijuana, Mexico
• Prof. Dr. Oscar Castillo, Tijuana Institute of Technology, Tijuana, Mexico
Thanks to the Springer Publication team for the wonderful support for the publication of these proceedings. We express our sincere thanks to the session chairs and local organizing committee chairs for helping us to formulate a rich technical program. We are thankful to the Management of Vardhaman College of Engineering for hosting the SoCPaR–NaBIC 2019 conferences, and to the Computer Society Chapter of IEEE Hyderabad for the great local organization. Enjoy reading the proceedings.

Ajith Abraham
M. A. Jabbar
General Chairs

Laura Garcia-Hernandez
Isabel M. S. Jesus
Rajanikanth Aluvalu
Program Chairs
Organization
SoCPaR–NaBIC 2019 Organization

Chief Patrons
T. Vijender Reddy (Chairman) - Vardhaman College of Engineering, Hyderabad, India
M. Rajasekhar Reddy (Vice chairman) - Vardhaman College of Engineering, Hyderabad, India
T. Upender Reddy (Secretary) - Vardhaman College of Engineering, Hyderabad, India
E. Prabhakar Reddy (Treasurer) - Vardhaman College of Engineering, Hyderabad, India
Patrons
K. Mallikharjuna Babu (Director) - Vardhaman College of Engineering, Hyderabad, India
Honorary Chairs
B. L. Deekshatulu - Distinguished Fellow, IDRBT, Hyderabad and Ex-Director, NRSA
Ketan Kotecha (Dean) - Faculty of Engineering, Symbiosis International (Deemed University) and Director, Symbiosis Institute of Technology
General Chairs
Ajith Abraham - Machine Intelligence Research Labs (MIR Labs), USA
M. A. Jabbar - Vardhaman College of Engineering, Hyderabad, India
Program Chairs
Laura Garcia-Hernandez - University of Cordoba, Spain
Isabel Jesus - Instituto Superior de Engenharia do Porto, Portugal
Rajanikanth Aluvalu - Vardhaman College of Engineering, Hyderabad, India
SoCPaR Program Committee Janos Abonyi Ajith Abraham Laurence Amaral Babak Amiri José Everardo Bessa Maia János Botzheim Joseph Alexander Brown Alberto Cano Paulo Carrasco Oscar Castillo Turgay Celik Isaac Chairez Lee Chang-Yong Mario Giovanni C. A. Cimino Phan Cong-Vinh Gloria Cerasela Crisan Alfredo Cuzzocrea Haikal El Abed El-Sayed M. El-Alfy Carlos Fernandez-Llatas Amparo Fuster-Sabater Xiao-Zhi Gao Alexander Gelbukh Elizabeth Goldbarg Thomas Hanne Biju Issac Isabel Jesus Kyriakos Kritikos Jerry Chun-Wei Lin Simone Ludwig Kun Ma Ana Madureira Jabbar Meerja
University of Pannonia Machine Intelligence Research Labs (MIR Labs) Federal University of Uberlandia The University of Sydney State University of Ceará-UECE Budapest University of Technology and Economics Innopolis University Virginia Commonwealth University Univ. Algarve Tijuana Institute of Technology University of the Witwatersrand UPIBI-IPN Kongju National University University of Pisa Nguyen Tat Thanh University “Vasile Alecsandri” University of Bacau ICAR-CNR and University of Calabria German International Cooperation (GIZ) GmbH King Fahd University of Petroleum and Minerals Universitat Politècnica de València Institute of Applied Physics (C.S.I.C.), Serrano 144, 28006 Madrid, Spain Aalto University Instituto Politécnico Nacional Federal University of Rio Grande do Norte University of Applied Sciences Northwestern Switzerland Teesside University Institute of Engineering of Porto Institute of Computer Science, FORTH Western Norway University of Applied Sciences North Dakota State University . Departamento de Engenharia Informática jntu
Efrén Mezura-Montes Jolanta Mizera-Pietraszko Paulo Moura Oliveira Diaf Moussa Akila Muthuramalingam Janmenjoy Nayak C. Alberto Ochoa-Zezatti Varun Ojha Konstantinos Parsopoulos Carlos Pereira Eduardo Pires Dilip Pratihar Radu-Emil Precup Héctor Quintián Meera Ramadas Keun Ho Ryu Ozgur Koray Sahingoz Neetu Sardana Hirosato Seki Mansi Sharma Mohammad Shojafar Patrick Siarry Antonio J. Tallón-Ballesteros Shing Chiang Tan Jose Tenreiro Machado Sanju Tiwari Eiji Uchino Leonilde Varela Gai-Ge Wang Lin Wang Frantisek Zboril SoCPaR Additional Reviewers Das Sharma, Kaushik Diniz, Thatiana Menezes, Matheus Márquez Grajales, Aldo Rahul, Mayur Ramos, Octavio Santiago-Valentín, Eric
University of Veracruz Wroclaw University of Technology UTAD University UMMTO KPR Institute of Engineering and Technology Aditya Institute of Technology and Management (AITAM) Universidad Autónoma de Ciudad Juárez University of Reading University of Ioannina ISEC UTAD University Department of Mechanical Engineering Politehnica University of Timisoara University of A Coruña University College of Bahrain Chungbuk National University Istanbul Kultur University Jaypee Institute of Information technology Osaka University Indian Institute of Technology, Delhi University of Surrey Universit de Paris 12 University of Huelva Multimedia University ISEP National Institute of Technology Kurukshetra Yamaguchi University University of Minho School of Computer Science and Technology, Jiangsu Normal University University of Jinan Brno University of Technology
NaBIC Program Committee Janos Abonyi Ajith Abraham Laurence Amaral Babak Amiri José Everardo Bessa Maia János Botzheim Joseph Alexander Brown Alberto Cano Paulo Carrasco Oscar Castillo Turgay Celik Isaac Chairez Lee Chang-Yong Mario Giovanni C.A. Cimino Phan Cong-Vinh Gloria Cerasela Crisan Alfredo Cuzzocrea Haikal El Abed El-Sayed M. El-Alfy Carlos Fernandez-Llatas Amparo Fuster-Sabater Terry Gafron Xiao-Zhi Gao Laura Garcia-Hernandez Alexander Gelbukh Elizabeth Goldbarg Thomas Hanne Biju Issac Isabel Jesus Kyriakos Kritikos Jerry Chun-Wei Lin Simone Ludwig Kun Ma Ana Madureira Jabbar Meerja Efrén Mezura-Montes Jolanta Mizera-Pietraszko Paulo Moura Oliveira Diaf Moussa Akila Muthuramalingam
University of Pannonia Machine Intelligence Research Labs (MIR Labs) Federal University of Uberlandia The University of Sydney State University of Ceará-UECE Budapest University of Technology and Economics Innopolis University Virginia Commonwealth University Univ. Algarve Tijuana Institute of Technology University of the Witwatersrand UPIBI-IPN Kongju National University University of Pisa Nguyen Tat Thanh University “Vasile Alecsandri” University of Bacau ICAR-CNR and University of Calabria German International Cooperation (GIZ) GmbH King Fahd University of Petroleum and Minerals Universitat Politècnica de València Institute of Applied Physics (C.S.I.C.), Serrano 144, 28006 Madrid, Spain Bio Inspired Technologies Aalto University University of Córdoba Instituto Politécnico Nacional Federal University of Rio Grande do Norte University of Applied Sciences Northwestern Switzerland Teesside University Institute of Engineering of Porto Institute of Computer Science, FORTH Western Norway University of Applied Sciences North Dakota State University . Departamento de Engenharia Informática jntu University of Veracruz Wroclaw University of Technology UTAD University UMMTO KPR Institute of Engineering and Technology
Janmenjoy Nayak C. Alberto Ochoa-Zezatti Varun Ojha Konstantinos Parsopoulos Carlos Pereira Eduardo Pires Dilip Pratihar Radu-Emil Precup Héctor Quintián Meera Ramadas José Raúl Romero Keun Ho Ryu Ozgur Koray Sahingoz Neetu Sardana Hirosato Seki Mansi Sharma Mohammad Shojafar Patrick Siarry Shing Chiang Tan Jose Tenreiro Machado Eiji Uchino Leonilde Varela Gai-Ge Wang Lin Wang Frantisek Zboril NaBIC-Additional Reviewers Das Sharma, Kaushik Mejía-de-Dios, Jesús-Adolfo Ramírez, Aurora Santos, André
Aditya Institute of Technology and Management (AITAM) Universidad Autónoma de Ciudad Juárez University of Reading University of Ioannina ISEC UTAD University Department of Mechanical Engineering Politehnica University of Timisoara University of A Coruña University College of Bahrain University of Cordoba Chungbuk National University Istanbul Kultur University Jaypee Institute of Information technology Osaka University Indian Institute of Technology, Delhi University of Surrey Universit de Paris 12 Multimedia University ISEP Yamaguchi University University of Minho School of Computer Science and Technology, Jiangsu Normal University University of Jinan Brno University of Technology
Contents
Generalized Fuzzy Rough Sets Based on New Fuzzy Similarity Relation . . . 1
Avatharam Ganivada
SV-NET: A Deep Learning Approach to Video Based Human Activity Recognition . . . 10
Sukrit Bhattacharya, Vaibhav Shaw, Pawan Kumar Singh, Ram Sarkar, and Debotosh Bhattacharjee
Machine Learning Based Framework for Recognizing Traffic Signs on Road Surfaces . . . 21
Any Gupta and Ayesha Choudhary
Cursor Control Using Face Gestures . . . 31
Arihant Gaur, Akshata Kinage, Nilakshi Rekhawar, Shubhan Rukmangad, Rohit Lal, and Shital Chiddarwar
A Smart Discussion Forum Website . . . 41
Rohit Beniwal, Mohd. Danish, and Arpit Goel
Certificate Management System Using Blockchain . . . 50
Anjaneyulu Endurthi and Akhil Khare
Reality Check in Virtual Space for Privacy Behavior of Indian Users of Social Networking Sites . . . 58
Sandeep Mittal and Priyanka Sharma
Application of Artificial Electric Field Algorithm for Economic Load Dispatch Problem . . . 71
Anita, Anupam Yadav, and Nitin Kumar
Intelligent Data Compression Policy for Hadoop Performance Optimization . . . 80
A. Ashu, Mir Wajahat Hussain, Diptendu Sinha Roy, and Hemant Kumar Reddy
Using 3D Hahn Moments as A Computational Representation of ATS Drugs Molecular Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . Satrya Fajri Pratama, Azah Kamilah Muda, Yun-Huoy Choo, Ramon Carbó-Dorca, and Ajith Abraham
90
Anomaly Detection Using Modified Differential Evolution: An Application to Banking and Insurance . . . . . . . . . . . . . . . . . . . . . . . 102 Gutha Jaya Krishna and Vadlamani Ravi Deep Quantile Regression Based Wind Generation and Demand Forecasts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 N. Kirthika, K. I. Ramachandran, and Sasi K. Kottayil Recommendation System for E-Commerce by Memory Based and Model Based Collaborative Filtering . . . . . . . . . . . . . . . . . . . . . . . . 123 K. RaviKanth, K. ChandraShekar, K. Sreekanth, and P. Santhosh Kumar Malware Behavior Profiling from Unstructured Data . . . . . . . . . . . . . . 130 Yoong Jien Chiam, Mohd Aizaini Maarof, Mohamad Nizam Kassim, and Anazida Zainal Customized Hidden Layered ANN Based Pattern Recognition Technique for Differential Protection of Power Transformer . . . . . . . . . 141 Harish Balaga and Deepthi Marrapu Gaussian Naïve Bayes Based Intrusion Detection System . . . . . . . . . . . . 150 Akhil Jabbar Meerja, A. Ashu, and Aluvalu Rajani Kanth Traveler Behavior Cognitive Reasoning Mechanism . . . . . . . . . . . . . . . 157 Ahmed Tlili, Salim Chikhi, and Ajith Abraham Grading Retinopathy of Prematurity with Feedforward Network . . . . . 168 Shantala Giraddi, Satyadhyan Chickerur, and Nirmala Annigeri Fraudulent e-Commerce Website Detection Model Using HTML, Text and Image Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Eric Khoo, Anazida Zainal, Nurfadilah Ariffin, Mohd Nizam Kassim, Mohd Aizaini Maarof, and Majid Bakhtiari Sleep Disorders Prevalence Studies in Indian Population . . . . . . . . . . . . 187 Vanita Ramrakhiyani, Niketa Gandhi, and Sanjay Deshmukh Prediction Models in Healthcare Using Deep Learning . . . . . . . . . . . . . 195 S. Bhavya and Anitha S. Pillai Comparison of Global Prevalence of Sleep Disorders in Intellectually Normal v/s Intellectually Disabled: A Review . . . . . . . . 205 Nushafreen Irani, Niketa Gandhi, Sanjay Deshmukh, and Abhijit Deshpande
Detecting Learning Affect in E-Learning Platform Using Facial Emotion Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 Benisemeni Esther Zakka and Hima Vadapalli Computer Aided System for Nuclei Localization in Histopathological Images Using CNN . . . . . . . . . . . . . . . . . . . . . . . . . 226 Mahendra G. Kanojia, Mohd. Abuzar Mohd. Haroon Ansari, Niketa Gandhi, and S. K. Yadav Intrusion Detection System for the IoT: A Comprehensive Review . . . . 235 Akhil Jabbar Meera, M. V. V. Prasad Kantipudi, and Rajanikanth Aluvalu Multi-objective Symmetric Fractional Programming Problem and Duality Relations Under ðC; Gf ; a; q; dÞ-Invexity over Cone Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 Ramu Dubey, Teekam Singh, Vrince Vimal, and Bhaskar Nautiyal Wind Power Intra-day Multi-step Predictions Using PDE Sum Models of Polynomial Networks Based on the PDE Conversion and Substitution with the L-Transformation . . . . . . . . . . . . . . . . . . . . . 254 Ladislav Zjavka, Václav Snášel, and Ajith Abraham Optimization of Application-Specific L1 Cache Translation Functions of the LEON3 Processor . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266 Nam Ho, Paul Kaufmann, and Marco Platzner Assessment of Environmental and Occupational Stresses on Physiological and Genetic Profiles of Sample Population . . . . . . . . . 277 Jasbir Kaur Chandani, Niketa Gandhi, and Sanjay Deshmukh Deep Convolution Neural Network-Based Feature Learning Model for EEG Based Driver Alert/Drowsy State Detection . . . . . . . . . . . . . . . 287 Prabhavathi C. Nissimagoudar, Anilkumar V. Nandi, and H. M. Gireesha A Feature Extraction and Selection Method for EEG Based Driver Alert/Drowsy State Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 P. C. Nissimagoudar, Anilkumar V. Nandi, and H. M. Gireesha Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Generalized Fuzzy Rough Sets Based on New Fuzzy Similarity Relation

Avatharam Ganivada
School of Computer and Information Sciences, University of Hyderabad, Telangana 500015, India
[email protected]
Abstract. Fuzzy similarity relations are equivalence class constructs that provide a method of handling uncertainty in the feature space. The fuzzy influence value is an expression for quantifying an equivalence class, describing the similarity/dissimilarity between a pair of patterns belonging to a single class or to different classes. Forming fuzzy similarity relations with the fuzzy influence value, in order to handle uncertainty in real-life data, is the primary task of this work. A new fuzzy similarity relation based on a fuzzy influence value is initially defined. A fuzzy rough set involving the lower and upper approximations of a set, based on the fuzzy similarity relation, is then generalized. Moreover, an entropy for evaluating uncertainty is defined on the basis of the generalized fuzzy rough sets. Several properties of rough set theory that hold for the fuzzy rough set are discussed. Computation of the fuzzy lower and fuzzy upper approximations of a set (the fuzzy rough set) is illustrated using a typical data set as an example. Entropy values for the data are provided and compared with an existing fuzzy rough entropy. These values demonstrate that the proposed entropy is more effective for handling uncertainty arising in overlapping regions.
Keywords: Fuzzy rough set · Entropy · Rough set properties · Uncertainty modeling

1 Introduction
Knowledge representation is based on knowledge extraction and rule generation from data. Knowledge extraction and rule generation are important techniques in rough set theory. Different tools for knowledge extraction and rule generation, based on rough set theory, are developed in [1]. Here, rough set deals with incomplete information. Rough set is characterized by the rough lower and upper approximations. Crisp equivalence classes, induced by crisp information granules, are central to rough set theory. The values of a feature (conditional and decision) over all the objects constitute crisp equivalence classes, where indistinguishability between the objects is considered. In computing the equivalence
classes of objects, feature values are discretized on the basis of thresholds. The discretized features do not generally preserve the actual information with the features, thereby leading to information loss. It is required to introduce the concept of similarity in feature space which gives rise to membership values between a pair of objects, inducing a degree of similarity between the pair. Several investigations on fuzzyfication of crisp equivalence classes into fuzzy similarity or tolerance relations have been made in [2,3] and [4]. The addition of fuzziness in the equivalence classes is called fuzzyfication. In the process, a fuzzy membership function is used. A set is approximated using the fuzzy lower and upper memberships in fuzzy feature space, where the approximate operators, like implication and t-norm, employ the fuzzy equivalence classes in obtaining those memberships. The fuzzy lower and upper approximations of a set represent a fuzzy rough set. Different notions of fuzzy rough set using variable-precision [5], fuzzy discernibility matrix [5], and similarity relations [6] are established. These are aimed to deal with overlapping classes and noisy data in fuzzy approximation space [7]. Moreover, fuzzy rough sets have several advantages of robustness, simplification and efficiency for data analysis. Fuzzy rough sets provide conceptual and powerful methods for handling uncertainty in data. One of the methods of using fuzzy rough sets comes with an entropy measure. The recent past studies [8,9] provide different classes of entropy measures based on the concepts of fuzzy rough set to quantify uncertainty of data. In this investigation, a fuzzy influence value for each class is initially defined using the concepts of fuzzy set. In doing so, a membership function assigns a membership value to a pattern in a class. The membership of a pattern is initially multiplied with a weighting value which is the distance from the pattern to the mean of a class. In effect, a pattern in a class is assigned a weighted membership value. Based on the actual and weighted memberships of a pattern, a fuzzy influence value for a class is defined. Here, the weighted membership is inversely related to the actual membership of a pattern. A membership function involves the fuzzy influence value of a class. A new similarity relation based on the fuzzy influence value and the membership function is defined. The notion of fuzzy rough set involving the lower and upper approximations is generalized using the concept of fuzzy similarity relations. An entropy measure is proposed using the generalized fuzzy rough sets. While the proposed similarity relation takes care of the equivalence of a pair of patterns, the proposed entropy is significant for handling uncertainty. The article is organized as follows: Sect. 2 introduces the fundamental concepts of fuzzy rough set like fuzzy similarity relation, fuzzy decision classes, and fuzzy lower and upper approximations of a set. Section 3 describes the method of the proposed fuzzy similarity relation and a novel fuzzy influence value. This section also provides the process of generalizing the lower and upper approximations of a fuzzy set, denoted by a generalized fuzzy rough set, and the formulation of an entropy measure using the concepts of the proposed fuzzy rough set. Section 4 provides the important theoretical properties of the generalized fuzzy rough set. Based on real-life data, the values of the lower and upper approximations of
a fuzzy set and the entropy are calculated and these are provided in the same section. Section 5 concludes the present investigation.
2 Preliminaries of Fuzzy Rough Sets
In the theory of fuzzy rough sets, the real-life data is presented in a decision table, denoted by S. The table S is expressed in the form of {U, F ∪ {d}}, where U contains the set of all patterns, F represents the set of conditional features and {d} is the set of decision features. Let xi, i = 1, 2, . . . , m, denote the set of patterns belonging to c number of classes. Each pattern has n number of features, represented by aj, j = 1, 2, . . . , n. The decision features are denoted by dk, k = 1, 2, . . . , c. Based on the decision table S, the fuzzy reflexive relational matrix corresponding to a conditional feature aj and the fuzzy decision classes corresponding to a decision feature dk are initially described. Then the lower and upper approximations of a set, characterizing the fuzzy rough set, based on the relational matrix and decision classes, are defined using fuzzy logical operators. These are discussed as follows.
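As an illustration of this setting, a decision table of this kind can be held as simple arrays. The following minimal sketch uses the example data that appears later in Table 1 (Sect. 4.3); the variable names are illustrative only.

```python
import numpy as np

# Decision table S = {U, F ∪ {d}}: six patterns, three conditional features
# a_1, a_2, a_3 and one decision feature d giving the class of each pattern.
X = np.array([[-0.4, -0.3, -0.5],
              [-0.3, -0.4, -0.3],
              [ 0.2,  0.0,  0.0],
              [-0.4,  0.2, -0.1],
              [ 0.3, -0.3,  0.0],
              [ 0.2, -0.3,  0.0]])
d = np.array([1, 1, 1, 2, 2, 2])

m, n = X.shape                 # number of patterns, number of conditional features
classes = np.unique(d)         # decision classes d_k, k = 1, ..., c
print(m, n, classes)           # 6 3 [1 2]
```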
2.1 Fuzzy Reflexive Relation
A fuzzy reflexive relation R_{a_j} between a pair of patterns x_1 and x_2, for i = 1 and 2, corresponding to a conditional feature a_j is defined as [8]

\[
R_{a_j}(x_1, x_2) =
\begin{cases}
\max\left(\min\left(\dfrac{a_j(x_2) - a_j(x_1) + \sigma_{a_{j1}}}{\sigma_{a_{j1}}},\; \dfrac{a_j(x_1) - a_j(x_2) + \sigma_{a_{j1}}}{\sigma_{a_{j1}}}\right),\, 0\right), & \text{if } a_j(x_1)\ \&\ a_j(x_2) \in R_{d_1}, \\[2mm]
\max\left(\min\left(\dfrac{a_j(x_2) - a_j(x_1) + \sigma_{a_{j2}}}{\sigma_{a_{j2}}},\; \dfrac{a_j(x_1) - a_j(x_2) + \sigma_{a_{j2}}}{\sigma_{a_{j2}}}\right),\, 0\right), & \text{if } a_j(x_1) \in R_{d_1},\ a_j(x_2) \in R_{d_2}.
\end{cases}
\tag{1}
\]

For k = 1 and 2, \sigma_{a_{jk}} represents the standard deviation of the kth set, say X_{jk}, corresponding to the jth conditional feature, and R_{d_k} denotes the kth decision class. Equation 1 produces a fuzzy reflexive relational matrix corresponding to a conditional feature. Every row of the matrix is called a fuzzy equivalence granule, which contains the similarities between all the possible pairs of patterns corresponding to the feature. On the other hand, fuzzy decision classes are described as follows.
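A small computational sketch of Eq. (1) is given below. It is only an illustration: the standard deviation used for a pair is taken from the class of the second pattern, which is one reading of the two cases listed above, and all names are illustrative.

```python
import numpy as np

def fuzzy_reflexive_relation(a, d, sigma):
    """Fuzzy reflexive relational matrix for one conditional feature (Eq. 1).

    a: feature values a_j(x_i) for all patterns; d: class label of each pattern;
    sigma: dict mapping class label k -> sigma_{a_j k}.
    """
    m = len(a)
    R = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            s = sigma[d[j]]                     # sigma of the class of the second pattern
            left = (a[j] - a[i] + s) / s
            right = (a[i] - a[j] + s) / s
            R[i, j] = max(min(left, right), 0.0)
    return R

# Feature a_1 of the example data in Sect. 4.3, with per-class standard deviations.
a1 = np.array([-0.4, -0.3, 0.2, -0.4, 0.3, 0.2])
d = np.array([1, 1, 1, 2, 2, 2])
sigma = {k: a1[d == k].std() for k in np.unique(d)}
print(fuzzy_reflexive_relation(a1, d, sigma).round(3))
```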
2.2 Fuzzy Decision Classes
Fuzzy decision classes corresponding to a kth decision feature d_k, k = 1, 2, . . . , c, are based on the membership values of the patterns belonging to the c classes. The membership of the ith pattern to the kth class, denoted by \mu_k(\vec{x}_i), is defined as

\[
\mu_k(\vec{x}_i) = \frac{1}{1 + \left(\frac{Z_{ik}}{f_d}\right)^{f_e}},
\tag{2}
\]

where Z_{ik} is a weighted distance, and f_d and f_e are the denominational and exponential fuzzy generators controlling the amount of fuzziness in the class membership lying in [0, 1]. The values of f_d and f_e are chosen as f_e = 1 and f_d = 5. It may be noted that a vector \vec{x}_i is the ith pattern consisting of n number of conditional features, and \vec{x}_i is expressed as x_{ij}. The weighted distance Z_{ik} is defined as

\[
Z_{ik} = \sum_{j=1}^{n} \left(\frac{x_{ij} - O_{kj}}{V_{kj}}\right)^{2}, \quad \text{for } k = 1, 2, \ldots, c,
\tag{3}
\]

where O_{kj} and V_{kj} denote the mean and standard deviation of the kth class, respectively. The fuzzy decision classes using the membership values of the patterns corresponding to a decision feature d_k are defined as [8]

1. The membership values of patterns in the kth class to its own class are represented as
\[
DD_{kk} = \mu_k(\vec{x}_i), \quad \text{if } k = u, \text{ and}
\tag{4}
\]
2. The membership values of patterns in the kth class to the other classes are denoted as
\[
DD_{ku} = 1, \quad \text{if } k \neq u,
\tag{5}
\]
where k and u = 1, 2, . . . , c. For any two patterns x_1 and x_2 ∈ U, with respect to a feature a_j ∈ {d}, the fuzzy decision classes are defined as

\[
R_a(x_1, x_2) =
\begin{cases}
DD_{kk}, & \text{if } a_j(x_1) = a_j(x_2), \\
DD_{ku}, & \text{otherwise.}
\end{cases}
\tag{6}
\]
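As a small sketch of Eqs. (2)-(6), the class memberships and the fuzzy decision classes can be computed as follows. It uses f_d = 5 and f_e = 1 as chosen above and implements Eq. (3) exactly as written, so the numerical values are illustrative only.

```python
import numpy as np

def class_memberships(X, d, f_d=5.0, f_e=1.0):
    """Membership mu_k(x_i) of every pattern to every class (Eqs. 2-3)."""
    classes = np.unique(d)
    mu = np.zeros((len(X), len(classes)))
    for k, c in enumerate(classes):
        O_k = X[d == c].mean(axis=0)                    # class mean O_kj
        V_k = X[d == c].std(axis=0)                     # class standard deviation V_kj
        Z = (((X - O_k) / V_k) ** 2).sum(axis=1)        # weighted distance Z_ik (Eq. 3)
        mu[:, k] = 1.0 / (1.0 + (Z / f_d) ** f_e)       # Eq. (2)
    return mu, classes

def fuzzy_decision_classes(mu, d, classes):
    """DD_kk = mu_k(x_i) for a pattern's own class, DD_ku = 1 otherwise (Eqs. 4-6)."""
    DD = np.ones((len(d), len(classes)))
    for i, label in enumerate(d):
        k = int(np.where(classes == label)[0][0])
        DD[i, k] = mu[i, k]
    return DD

X = np.array([[-0.4, -0.3, -0.5], [-0.3, -0.4, -0.3], [0.2, 0.0, 0.0],
              [-0.4, 0.2, -0.1], [0.3, -0.3, 0.0], [0.2, -0.3, 0.0]])
d = np.array([1, 1, 1, 2, 2, 2])
mu, classes = class_memberships(X, d)
print(fuzzy_decision_classes(mu, d, classes).round(3))
```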
3 Proposed Fuzzy Similarity Relation and Fuzzy Rough Set
A new notion of fuzzy similarity relation is initially derived using the concepts of fuzzy set. The proposed fuzzy rough set, based on the new notion of fuzzy similarity relation, is then described in this section.
3.1 New Fuzzy Similarity Relation
Let \vec{x}_{ki} denote the ith pattern in the kth class, and let \vec{\mu}_k and \vec{\sigma}_k be the mean and variance of the kth class. The distance from the ith pattern to the mean of the kth class, denoted by D_{ki}, is defined as

\[
D_{ki} = \frac{(\vec{x}_{ki} - \vec{\mu}_k)^2}{\vec{\sigma}_k}.
\tag{7}
\]

The membership value of the ith pattern in the kth class, represented by \vec{M}_{ki}, is calculated as

\[
\vec{M}_{ki} = \frac{1}{1 + \exp(\vec{D}_{ki}, w_1)},
\tag{8}
\]

where w_1 is a weighting factor chosen greater than or equal to 1. It controls the class fuzziness of a pattern. By using the membership value of a pattern i and the mean \vec{\mu}_k of the kth class, the influence value of a class k, denoted by \vec{\lambda}_k, is defined as

\[
\vec{\lambda}_k = \frac{\sum_i (\vec{x}_{ki} - \vec{\mu}_k)^2 \exp(\vec{M}_{ki}, w_1)}{\sum_i \exp(\vec{M}_{ki}, w_1)},
\tag{9}
\]

k = 1, 2, . . . , c. In Eq. 9, the calculation of the influence value for every class c involves the weighted product of the squared distance from a pattern to its class mean and the exponential value of the pattern membership. Here, the weighting value is the sum of the exponential values of the memberships of the patterns. The similarity value between a pair of patterns x_{ki} and y corresponding to the jth feature a_j is defined as

\[
a_j(S_{ki}) = \frac{|a_j(x_{ki}) - a_j(y)| + a_j(\lambda_k)}{a_j(\lambda_k)}, \quad i = 1, 2, \ldots, n,
\tag{10}
\]

where y ∈ U and \lambda_k is the influence value of a class k (= 1, 2, . . . , c). A fuzzy similarity relational matrix based on a_j(S_{ki}) for the jth feature a_j is defined as

\[
R_{a_j}(x_{ki}, y) = \frac{1}{1 + \exp(a_j(S_{ki}), w_1)},
\tag{11}
\]

i = 1, 2, . . . , n; k = 1, 2, . . . , c. Equation 11 generates a fuzzy similarity matrix of size n × n. Every row in the matrix is represented as a fuzzy equivalence class. The new fuzzy similarity relation (Eq. 11) and the fuzzy decision classes (Eq. 6) are used in the generalization of the notion of fuzzy rough set as follows.
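The following sketch computes the influence values and the similarity matrix of Eqs. (7)-(11). The two-argument notation exp(·, w_1) used in Eqs. (8), (9) and (11) is not spelled out in the text; it is read here as exp(w_1 ·), which is an assumption, so the numbers produced are illustrative rather than a reproduction of the paper's values.

```python
import numpy as np

def influence_value(X_k, w1=1.0):
    """Fuzzy influence value lambda_k of one class, feature-wise (Eqs. 7-9)."""
    mean_k = X_k.mean(axis=0)
    var_k = X_k.var(axis=0)
    D = ((X_k - mean_k) ** 2) / var_k                  # Eq. (7)
    M = 1.0 / (1.0 + np.exp(w1 * D))                   # Eq. (8), exp(D, w1) read as exp(w1*D)
    num = (((X_k - mean_k) ** 2) * np.exp(w1 * M)).sum(axis=0)
    den = np.exp(w1 * M).sum(axis=0)
    return num / den                                   # Eq. (9)

def similarity_matrix(a, lam_a, w1=1.0):
    """Fuzzy similarity relational matrix for one conditional feature (Eqs. 10-11)."""
    S = (np.abs(a[:, None] - a[None, :]) + lam_a) / lam_a   # Eq. (10)
    return 1.0 / (1.0 + np.exp(w1 * S))                     # Eq. (11)

# Feature a_1 of the example data in Sect. 4.3; class 1 contains patterns x_1, x_2, x_3.
X = np.array([[-0.4, -0.3, -0.5], [-0.3, -0.4, -0.3], [0.2, 0.0, 0.0],
              [-0.4, 0.2, -0.1], [0.3, -0.3, 0.0], [0.2, -0.3, 0.0]])
lam_1 = influence_value(X[:3])        # influence value of class 1, one entry per feature
print(similarity_matrix(X[:, 0], lam_1[0]).round(3))
```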
3.2 Generalized Fuzzy Rough Set
We use the preliminary concepts of the fuzzy rough set, like a set of patterns x_i ∈ U, conditional features a_j ∈ F and decision features d_k, in this section. For the jth conditional feature a_j ∈ F, the lower and upper approximations of a set are defined as

\[
(R_{a_j} \downarrow R_{d_k})(x_i) = \inf_{y \in U} I\big(R_{a_j}(x_{ki}, y),\, R_{d_k}(x_i)\big),
\tag{12}
\]
\[
(R_{a_j} \uparrow R_{d_k})(x_i) = \sup_{y \in U} T\big(R_{a_j}(x_{ki}, y),\, R_{d_k}(x_i)\big),
\tag{13}
\]
6
3.3
A. Ganivada
Roughness and Entropy
Let Xk , k = 1, 2 . . . c denote kth set of patterns. There exists c is the number of classes. For jth conditional feature aj ∈ F, the notions LSaj and U Saj are defined as m(xi )(Ra ↓ Rd )(xi ), i = 1, 2 . . . m. (14) LSaj = xi ∈Xq
U Saj =
m(xi )(Ra ↑ Rd )(xi ), i = 1, 2, . . . m.
(15)
xi ∈Xq
Here, m(xi ) is the actual membership value of ith pattern xi . Therefore, roughness of a set Xk is calculated as n j=1 LSaj , (16) R(Xk ) = 1 − n j=1 U Saj The R(Xk ) is quantification expression for determining correctness of pattern information in kth set Xk . The entropy of a set Xk is defined as F RE(Xk ) = −R(Xk )loge (R(Xk )).
(17)
The average of values of c sets is called final entropy denoted by E and it is defined as c 1 E= F RE(Xq ) . (18) c q=1 The properties of entropy E are discussed as in [8].
4
Theoretical Properties of Fuzzy Rough Set and an Example
We discuss theoretical properties of fuzzy similarity relation and fuzzy rough set in the following sections. 4.1
Properties of Fuzzy Similarity Relation
A fuzzy binary relation R on U is called a similarity relation R iff R is 1. reflexive: R(x, x) = 1, ∀x ∈ U . 2. symmetric: R(x, y) = R(y, x), ∀x, y ∈ U . 3. transitive: R(x, y) ≥ sup minz∈U {R(x, z), R(z, y)}, ∀x, y ∈ U . Theorem 1. Let U be a nonempty universe. For every t-norm T and x and y ∈ A ⊆ U and R is fuzzy similarity relation, R(x, z) = supz∈A T (R(x, z), R(z, y)).
(19)
Proof of the theorem is found in [3]. Theorem 2. Let I be an R-implicator. R is fuzzy similarity relation on U , R(x, y) = infz∈A I(R(x, z)R(z, y)), ∀x, y ∈ A ⊆ U. Its proof is available in [3].
(20)
Generalized Fuzzy Rough Sets Based on New Fuzzy Similarity Relation
4.2
7
Properties of Generalized Fuzzy Rough Set
The properties of the generalized fuzzy rough set in the fuzzy upper approximation space [3] are discussed as follows. Let X and Y ⊆ U be two fuzzy sets, a ∈ F a conditional feature and d a decision feature.

1. If X = ∅ then (Ra ↓ Rd)(X) = ∅ = (Ra ↑ Rd)(X).
2. If X = U then (Ra ↓ Rd)(X) = U = (Ra ↑ Rd)(X).
3. For all x ∈ X, (Ra ↓ Rd)(x) ⊆ X ⊆ (Ra ↑ Rd)(x).
4. If X ⊆ Y then (Ra ↓ Rd)(X) ⊆ (Ra ↓ Rd)(Y) and (Ra ↑ Rd)(X) ⊆ (Ra ↑ Rd)(Y).
5. (Ra ↓ Rd)(X^c) ⊆ ((Ra ↑ Rd)(X))^c and (Ra ↑ Rd)(X^c) ⊆ ((Ra ↓ Rd)(X))^c, where X^c = U − X.
6. (Ra ↓ Rd)(X ∩ Y) = (Ra ↓ Rd)(X) ∩ (Ra ↓ Rd)(Y) and (Ra ↑ Rd)(X ∪ Y) = (Ra ↑ Rd)(X) ∪ (Ra ↑ Rd)(Y).
7. (Ra ↓ Rd)(X ∪ Y) ⊇ (Ra ↓ Rd)(X) ∪ (Ra ↓ Rd)(Y) and (Ra ↑ Rd)(X ∩ Y) ⊆ (Ra ↑ Rd)(X) ∩ (Ra ↑ Rd)(Y).
8. (Ra ↓ Rd)((Ra ↓ Rd)(X)) = (Ra ↑ Rd)((Ra ↓ Rd)(X)) = (Ra ↓ Rd)(X).
9. (Ra ↓ Rd)((Ra ↑ Rd)(X)) = (Ra ↑ Rd)((Ra ↑ Rd)(X)) = (Ra ↑ Rd)(X).
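Before turning to the worked example, the three conditions of Sect. 4.1 can be verified numerically for any relational matrix. A minimal sketch follows; it uses min as the t-norm for the transitivity test (sup-min transitivity), matching condition 3 of Sect. 4.1, and takes the Table 2 matrix of the next subsection as an illustrative input.

```python
import numpy as np

def is_fuzzy_similarity_relation(R, tol=1e-9):
    """Check reflexivity, symmetry and sup-min transitivity of a fuzzy relation."""
    reflexive = np.allclose(np.diag(R), 1.0, atol=tol)
    symmetric = np.allclose(R, R.T, atol=tol)
    # comp[x, y] = sup_z min(R(x, z), R(z, y))
    comp = np.minimum(R[:, :, None], R[None, :, :]).max(axis=1)
    transitive = bool(np.all(R + tol >= comp))
    return reflexive, symmetric, transitive

# The similarity matrix of Table 2 (Sect. 4.3) as an example input.
R = np.array([[1.000, 0.888, 0.571, 1.000, 0.533, 0.571],
              [0.888, 1.000, 0.615, 0.888, 0.571, 0.615],
              [0.571, 0.615, 1.000, 0.571, 0.888, 1.000],
              [1.000, 0.888, 0.571, 1.000, 0.533, 0.571],
              [0.533, 0.571, 0.888, 0.533, 1.000, 0.888],
              [0.571, 0.615, 1.000, 0.571, 0.888, 1.000]])
print(is_fuzzy_similarity_relation(R))   # prints (reflexive, symmetric, transitive)
```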
4.3 Example of New Fuzzy Similarity Relation and Fuzzy Rough Set
The mathematical simulation of the fuzzy similarity matrix and the fuzzy lower and upper approximations is explained using an example data set [6]. The data is shown in Table 1.

Table 1. Example data

Pattern   a1     a2     a3     c
x1       −0.4   −0.3   −0.5    1
x2       −0.3   −0.4   −0.3    1
x3        0.2    0      0      1
x4       −0.4    0.2   −0.1    2
x5        0.3   −0.3    0      2
x6        0.2   −0.3    0      2
First, fuzzy decision classes for the two classes, denoted by c1 and c2, are defined using Eq. 6. Therefore, c1 = {0.769, 0.817, 0.680, 1.0, 1.0, 1.0}; c2 = {1.0, 1.0, 1.0, 0.672, 0.791, 0.815}. The two sets of values constitute two decision attributes. Then, fuzzy similarity matrices for the conditional features a1, a2 and a3 are calculated using Eq. 11. As an example, the similarity matrix for conditional attribute a1 is provided in Table 2. The size of the matrix is 6 × 6 as there are 6 patterns belonging to the two classes c1 and c2. The matrix satisfies the properties described in Sect. 4.1.

Table 2. Similarity matrix for conditional feature a1

1.000 0.888 0.571 1.000 0.533 0.571
0.888 1.000 0.615 0.888 0.571 0.615
0.571 0.615 1.000 0.571 0.888 1.000
1.000 0.888 0.571 1.000 0.533 0.571
0.533 0.571 0.888 0.533 1.000 0.888
0.571 0.615 1.000 0.571 0.888 1.000

Lower and upper approximations: Let X = {x1, x2, x3} and Y = {x4, x5, x6} be two sets of patterns. The sets X and Y correspond to classes c1 and c2, respectively. For i = 1, 2, . . . , 6; k = 1 and 2, Raj(xki, y) and Rdk(xi) in Eq. 12 and in Eq. 13 are represented by {1.000, 0.888, 0.571, 1.000, 0.533, 0.571} (see Table 2) and {0.769, 0.817, 0.680, 1.0, 1.0, 1.0} (corresponding to c1), respectively. For i = 1 and k = 1, Eq. 12 implies that (Ra1 ↓ Rdk)(x1) = inf_{y∈U} I(Ra(x11, y), Rd(x11)). Hence, (Ra1 ↓ Rdk)(x1) = min{I(1.000, 0.769), I(0.888, 0.817), I(0.571, 0.680), I(1.0, 1.0), I(0.533, 1.0), I(0.571, 1.0)} = 0.7558. Similarly, we obtain (Ra2 ↓ Rdk)(x2) = 0.7558 and (Ra3 ↓ Rdk)(x3) = 0.7558. The lower approximation of the set X is therefore (Raj ↓ Rdk)(X) = {0.7558, 0.7558, 0.7558}. In a similar way, the memberships belonging to the upper approximation of the set X are computed, so (Raj ↑ Rdk)(X) = {0.9629, 0.8962, 0.8624}. The lower and upper approximations of the set Y are as follows: (Raj ↓ Rdk)(Y) = {0.7592, 0.7513, 0.7459} and (Raj ↑ Rdk)(Y) = {0.9629, 0.9333, 0.9629}. It may be noted that, in computing the lower and upper membership values of the set Y, the above similarity matrix and the fuzzy decision classes {1.0, 1.0, 1.0, 0.672, 0.791, 0.815} are presented to Eqs. 12 and 13. The resultant lower and upper memberships of the sets X and Y are used to calculate the roughness and entropy of those sets.

Example for Roughness and Entropy: The values of roughness (Eq. 16) for the sets X and Y are obtained as 0.1535 and 0.1390, respectively, whereas, using the fuzzy rough set of [8], these are 0.1589 and 0.4497, respectively. A lower value of the roughness of a set (closer to 0) signifies that the set is compact in terms of actual pattern information. Hence, the sets X and Y using Eq. 16 become more compact than using [8]. From the roughness values, the entropy is calculated. The values of entropy for the sets X and Y are 0.2876 and 0.2743, respectively, and the entropy E using Eq. 18 for the data is 0.2812. The entropy E is compared
with fuzzy rough entropy [8]. The values of fuzzy rough entropy (FRE) [8] for sets X&Y are 0.2251 and 0.4030, respectively. The FRE for the data is 0.3140 (average of values of FRE). The entropy E for the data outperforms fuzzy rough entropy [8] as the former has low values. Therefore, the entropy E is more effective for handling uncertainty in overlapping regions between classes.
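The reported entropy values can be checked directly from the roughness figures quoted above; a short sketch:

```python
import numpy as np

# Roughness of the sets X and Y under the proposed measure (Eq. 16), as reported.
R_proposed = np.array([0.1535, 0.1390])
E = np.mean(-R_proposed * np.log(R_proposed))   # Eqs. (17)-(18)
print(round(E, 4))                              # approximately 0.281, matching E = 0.2812

# Fuzzy rough entropy values of [8], as reported, and their average (0.3140 in the text).
fre = np.array([0.2251, 0.4030])
print(round(fre.mean(), 4))
```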
5 Conclusion
In this study, a fuzzy similarity relation is newly formulated using a fuzzy influence value of a class and a membership function. The concepts of fuzzy set are used to determine the influence value. The lower and upper approximations of a fuzzy set use the fuzzy similarity relation in their generalization, and the generalized lower and upper approximations characterize the generalized fuzzy rough set. Further, an entropy measure based on the generalized fuzzy rough set is proposed. The theoretical properties of rough set theory which the fuzzy rough set satisfies are discussed. The process of calculating the similarity relations, the lower and upper approximations, and the entropy is explained using real-life data as an example. The entropy values are shown in comparison with the fuzzy rough entropy, and superior performance of the proposed entropy over the fuzzy rough entropy is obtained. In future investigations, fuzzy rough theoretic methods for rule generation and feature selection, based on the proposed similarity relation, will be developed. Further, applications of real-life data, including microarrays, to those methods will be provided.
References

1. Pawlak, Z.: Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic, Dordrecht (1991)
2. Pawlak, Z.: Hard and soft sets. In: Ziarko, W.P. (ed.) Rough Sets, Fuzzy Sets and Knowledge Discovery, pp. 130–135. Springer, Heidelberg (1994)
3. Radzikowska, A.M., Kerre, E.E.: A comparative study of fuzzy rough sets. Fuzzy Sets Syst. 126, 137–155 (2002)
4. Mieszkowicz-Rolka, A., Rolka, L.: Variable precision fuzzy rough sets. In: Slezak, D., Wang, G., Szczuka, M., Duntsch, I., Yao, Y. (eds.) Transactions on Rough Sets I, LNCS, vol. 3100, pp. 144–160. Springer, Germany (2004)
5. Hu, Q.H., Yu, D.R., Xie, Z.X., Liu, J.F.: Fuzzy probabilistic approximation spaces and their information measures. IEEE Trans. Fuzzy Syst. 14(2), 191–201 (2006)
6. Jensen, R., Shen, Q.: New approaches to fuzzy-rough feature selection. IEEE Trans. Fuzzy Syst. 17(4), 824–838 (2009)
7. Hu, Q.H., Yu, D.R., Xie, Z.X.: Information-preserving hybrid data reduction based on fuzzy-rough techniques. Pattern Recogn. Lett. 27(5), 414–423 (2006)
8. Ganivada, A., Ray, S.S., Pal, S.K.: Fuzzy rough granular self-organizing map and fuzzy rough entropy. Theoret. Comput. Sci. 466, 37–63 (2012)
9. Sen, D., Pal, S.K.: Generalized rough sets, entropy, and image ambiguity measures. IEEE Trans. Syst. Man Cybern. Part B 39, 117–128 (2009)
SV-NET: A Deep Learning Approach to Video Based Human Activity Recognition Sukrit Bhattacharya1, Vaibhav Shaw2, Pawan Kumar Singh3(&), Ram Sarkar3, and Debotosh Bhattacharjee3 1
Department of Electronics and Communication Engineering, SRM Institute of Science and Technology, Chennai, India [email protected] 2 Department of Computer Science and Engineering, Jalpaiguri Government Engineering College, Kolkata, India [email protected] 3 Department of Computer Science and Engineering, Jadavpur University, Kolkata, India [email protected], [email protected], [email protected]
Abstract. The automatic identification of physical activities performed by human beings is referred to as Human Activity Recognition (HAR). It aims to infer the actions of one or more persons from a set of observations captured by sensors, videos or still images. Recognizing human activities from video sequences is a much challenging task due to problems such as background clutter, partial occlusion, changes in scale, viewpoint, lighting, and appearance etc. In this paper, we propose a Convolutional Neural Network (CNN) model named as SV-NET, in order to classify human activities obtained directly from RGB videos. The proposed model has been tested on three benchmark video datasets namely, KTH, UCF11 and HMDB51. The results of the proposed model demonstrate improved performance over some existing deep learning based models. Keywords: Human activity recognition SV-NET Convolutional neural network Data augmentation Video datasets KTH HMDB51 UCF11
1 Introduction In recent times, automatic human activity recognition (HAR) has drawn much attention to the researchers in the field of video analysis because of its varied applications in surveillance, entertainment and health-care etc. The field has many applications including video surveillance systems, human-computer interaction, and robotics for human behaviour characterization. Most of the HAR methods assume a figure-centric scene of an uncluttered background, where the actor performs all the activities freely [1]. Generally, it is a challenging task to develop a fully automated HAR system which would be capable of classifying a person’s activity accurately. The challenges mainly seen in HAR methods are background clutter, partial occlusion, changes in scale, viewpoint, lighting and © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): SoCPaR 2019, AISC 1182, pp. 10–20, 2021. https://doi.org/10.1007/978-3-030-49345-5_2
SV-NET: A Deep Learning Approach
11
appearance, and frame resolution [2]. Sometimes, the intra-class and inter-class similarities make the problem of classification even more challenging. For example, during performing the activities like ‘Running’ and ‘Jogging’, the actor seems to perform almost the same task. Recently, deep learning has achieved great success in many challenging research areas, such as image recognition and natural language processing. The key merit of the deep learning-based model is to automatically learn representative features from massive data. This research paradigm is proven to be a good alternative for solving the HAR problem instead of conventional machine learning approaches. In this paper, we propose a new CNN based model called SV-NET for the solution of the HAR problem. The model has been designed such that it turns out to be robust because of its ability to take input directly from the video data unlike many of the existing models and also be able to classify multiple activity classes. We train our SV-NET model by taking a sequence of video frames as the input to the model and we also compare the results in order to prove the robustness of our model. The basic flow diagram of the classification process of the SV-NET model has been shown in Fig. 1.
Fig. 1. Flow diagram of our proposed HAR methodology.
2 Related Study

Based on the categorization of HAR methods as unimodal and multimodal, the authors in [3] proposed a structured temporal approach. They have used a stochastic technique, that is, HMMs [4] to model human actions as action units and then used the rule-based method, that is, grammatical rules to form a sequence of complex actions by combining
different action units. Due to the high complexity of the models, problems arise in treating long video sequences when temporal grammars are used for action classification. Video sequence classification in terms of local features in a spatio-temporal environment has also been given much focus. The authors in [5] used unconstrained videos targeting the generation of generic action proposals, where each action proposal corresponds to a temporal series of a spatio-temporal video tube obtained by extracting bounding box candidates, each of which contains human motion. As every action is executed by humans with meaningful motion, significant action paths were estimated using high action scores of the video tubes, measured by utilizing both appearance and motion cues. These candidates may overlap due to large spatio-temporal redundancy in videos; to address such problems, an estimation of maximum set coverage was applied. In another approach using untrimmed video sequences, the authors in [7] performed isolated gesture recognition based on a per-frame representation of videos and a template representation of actions. A dynamic programming approach was used to recognize sequences, which are then aligned using dynamic time warping [9] to classify different activities. In unimodal space-time methods, a vocabulary-based approach [8] was used by the authors in [10] to construct a tree-structured vocabulary of similar actions after extraction of spatio-temporal segments from video sequences corresponding to the whole part of human motion. A few approaches showed that unsupervised learning can also take part in solving challenges related to HAR. The authors in [6] trained a model such that human actions are arranged in chronological order in an unsupervised manner by exploiting the temporal order in video sequences. Different classification algorithms in the field of machine learning were employed, whereas CNN and RNN models were tested in the deep learning domain. Deep or hierarchical learning techniques outperform traditional machine learning methods, as their performance scales with increasing quantities of data and they learn by creating more abstract representations of the data as the network grows deeper. As a result, the model automatically extracts and optimizes features, which greatly explains their outstanding performance and higher accuracy results. The basic idea behind building a model using a CNN is that it uses relatively little pre-processing compared to other classification algorithms, which helps the network learn the filters in an efficient manner. Basically, this independence from prior knowledge in feature design is a major functionality provided by the network. Additionally, CNN architectures are easy to build, simply by adding multiple layers of convolution and subsampling in an alternating manner. When CNNs are applied to video analysis problems, it is relevant to recognize the motion features encoded in a stack of contiguous frames. To this end, the increasing interest in exploring and analyzing human activities for recognition and the introduction of spatio-temporal relationships in videos motivate us to use a CNN model. Finally, we propose a 3D convolution architecture called SV-NET to compute the necessary features and information from the spatial and temporal dimensions of the dataset.
3 Proposed Architecture

In this section, the proposed approach and its related components are discussed. Our CNN based model takes a video frame and has the ability to learn visual patterns, directly from the pixels of the given data. The proposed model uses a 3D convolutional layer (C3D) architecture named as SV-NET. This architecture comprises the trainable filters and the pooling operations which basically capture all the changes in terms of spatial and temporal information. The model’s architecture consists of 6 layers of 3D convolutional layers and max-pooling layers placed alternately. The architecture of the model has been shown in Fig. 2. Training a 6 layered model is a challenging task as it can be sensitive to the initial random weights and the configuration of the learning algorithm. Thus, to improve the performance and stability of the deep neural network, the technique of batch normalization is used. We set the 3D convolutional layer and max pooling layer kernel size as d × k × k, where d is the kernel temporal depth and k is the kernel spatial size. The 3D convolution is achieved by convolving a 3D kernel with a cube formed by stacking multiple contiguous frames together. By this construction, the feature maps in the convolutional layer are connected to multiple contiguous frames in the previous layer, thereby capturing motion information.
Fig. 2. Complete architecture of our proposed SV-NET model.
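The way a stack of contiguous frames is consumed by a 3D convolution can be illustrated with a single convolution-pooling block; the input clip length and resolution below are assumptions for illustration only, as the paper does not state them.

```python
import numpy as np
import tensorflow as tf

# A stack of contiguous RGB frames forms a 5-D input tensor:
# (batch, frames, height, width, channels). Shapes here are illustrative.
clip = np.random.rand(1, 16, 112, 112, 3).astype("float32")

block = tf.keras.Sequential([
    tf.keras.layers.Conv3D(16, kernel_size=(3, 3, 3), strides=(1, 1, 1),
                           padding="same", activation="relu",
                           input_shape=(16, 112, 112, 3)),
    tf.keras.layers.MaxPooling3D(pool_size=(2, 2, 2), strides=(2, 2, 2)),
    tf.keras.layers.BatchNormalization(),
])
print(block(clip).shape)   # (1, 8, 56, 56, 16): temporal and spatial dims halved
```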
The C3D network has 6 convolution layers and 6 pooling layers (each convolution layer is immediately followed by a pooling layer), 1 flattening layer, 3 fully connected (FC) layers and a softmax layer. After 1 layer of CNN and max pooling layer, batch normalization is employed to standardize the inputs to a layer at each mini batch. The numbers of channels (filters) for the 6 convolution layers from 1 to 6 are 16, 32, 64, 256, 1024 and 1024 respectively. The ratio represents the spatial map size ratio. All convolution layers have 3 × 3 × 3 convolution filters and 1 × 1 × 1 stride size. All pooling layers from pool_1 to pool_6 have 2 × 2 × 2 pooling kernels with a stride size of 2 × 2 × 2, which basically means that the size of the output signal is reduced by a factor of 8. The specifications of the convolutional and the pooling layers have been shown in Table 1. The output coming out of each convolutional layer is a volume which is called feature maps. The pooling layers have the same number of feature maps as the convolutional layers but with a reduced spatial resolution. The pooling
layers also introduce scale-invariant features. There are three FC layers each of which has 256 outputs and finally, a softmax layer is used to predict action labels.

Table 1. Specification of the convolutional and the pooling layers in the SV-NET architecture.

Layer     Conv 1   Conv 2   Conv 3   Conv 4   Conv 5   Conv 6
Size      3x3x3    3x3x3    3x3x3    3x3x3    3x3x3    3x3x3
Stride    1x1x1    1x1x1    1x1x1    1x1x1    1x1x1    1x1x1
Channel   16       32       64       256      1024     1024

Layer     Pool 1   Pool 2   Pool 3   Pool 4   Pool 5   Pool 6
Stride    2x2x2    2x2x2    2x2x2    2x2x2    2x2x2    2x2x2

Layer     FC 7     FC 8     FC 9     Softmax
Outputs   256      256      256
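A sketch of a model with the layer layout of Table 1 is given below. The input clip shape, the ReLU activations and the optimizer are assumptions (the paper does not state them), and batch normalization is placed after the first convolution-pooling pair, as described in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_sv_net(num_classes, clip_shape=(64, 128, 128, 3)):
    """Illustrative SV-NET layout following Table 1; clip_shape is an assumption."""
    model = models.Sequential()
    for i, f in enumerate([16, 32, 64, 256, 1024, 1024]):
        kwargs = {"input_shape": clip_shape} if i == 0 else {}
        model.add(layers.Conv3D(f, (3, 3, 3), strides=(1, 1, 1),
                                padding="same", activation="relu", **kwargs))
        model.add(layers.MaxPooling3D((2, 2, 2), strides=(2, 2, 2)))
        if i == 0:
            model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    for _ in range(3):                      # FC 7, FC 8, FC 9 with 256 outputs each
        model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_sv_net(num_classes=6)         # e.g. the six KTH activity classes
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```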
4 Results and Analysis

In order to validate the effectiveness of our proposed CNN based model, SV-NET, several experiments have been conducted on the three standard benchmark video datasets i.e. KTH [11], UCF11 [12] and HMDB51 [13]. A high-end workstation is employed for the training of the CNN model. The workstation basically operates on the Intel Xeon processor and runs on 32 GB of main memory. It is also powered by the 16 GB NVIDIA Quadro P5000 GPU. Each of the three datasets has been proportionately divided into train and test data. We have used a threefold cross-validation process for evaluating our proposed model. This implies that the videos in the train data comprise 3/4th of the entire dataset whereas the test data comprises 1/4th of the whole dataset. Moreover, to train the model more efficiently, the data must be diverse. Therefore, the process of data augmentation is used by cropping random sequences of the consecutive frames. It is basically a strategy that enables practitioners to significantly increase the diversity of the data frames available for training models, without actually collecting new data frames. The proposed SV-NET model has been evaluated using the following four well-known standard performance measures namely, classification accuracy, Precision, Recall and F-Measure. Tables 2 and 3 show these measures calculated activity-wise for KTH and UCF11 datasets respectively. As evident from Tables 2 and 3, the proposed model classified KTH and UCF11 with mean accuracies 93.6% and 84.59% respectively whereas for HMDB51 dataset, the overall classification accuracy is found to be 65.9%. Figure 3 shows the graphical comparison of the individual classification accuracies calculated for each of the 51 activity classes of HMDB51 dataset. Higher accuracy is mainly achieved by hyper-parameter optimization and data augmentation techniques. Parameters optimization is basically done by trial and error method. The confusion matrices for KTH and UCF11 datasets are also shown in Fig. 4 (a) and (b) respectively.
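The data augmentation step described above (cropping random sequences of consecutive frames) can be sketched as follows; the clip length is an assumed value, since only the cropping strategy is stated in the text.

```python
import numpy as np

def random_temporal_crop(video, clip_len=16):
    """Augmentation by cropping a random sequence of consecutive frames.

    video: array of shape (num_frames, H, W, 3); clip_len is an assumption.
    """
    start = np.random.randint(0, video.shape[0] - clip_len + 1)
    return video[start:start + clip_len]

# Example: draw several different training clips from one (synthetic) video.
video = np.random.rand(120, 112, 112, 3)
clips = [random_temporal_crop(video) for _ in range(4)]
print([c.shape for c in clips])
```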
Table 2. Performance calculation in terms of classification accuracy, precision, recall and F1 values using the SV-NET model for the KTH dataset [11].

Activity         Classification accuracy   Precision   Recall   F-Measure
'Boxing'         0.967                      0.97        0.97     0.97
'Handclapping'   0.947                      0.95        0.95     0.95
'Handwaving'     0.944                      0.94        0.94     0.94
'Jogging'        0.909                      0.83        0.91     0.87
'Running'        0.909                      0.95        0.91     0.93
'Walking'        0.941                      0.97        0.94     0.96
Average          0.936                      0.935       0.937    0.937
Table 3. Performance calculation in terms of classification accuracy, precision, recall and F1 values using the SV-NET model for the UCF11 dataset [12].

Activity               Classification accuracy   Precision   Recall   F-Measure
'Basketball'           0.935                      0.94        0.94     0.94
'Biking'               0.948                      0.84        0.95     0.89
'Diving'               0.891                      0.89        0.89     0.89
'Golf swing'           0.902                      0.88        0.90     0.89
'Horse riding'         0.792                      0.82        0.79     0.81
'Soccer juggling'      0.750                      0.80        0.75     0.77
'Swing'                0.852                      0.88        0.85     0.87
'Tennis swing'         0.812                      0.89        0.81     0.85
'Trampoline jumping'   0.807                      0.81        0.81     0.81
'Volleyball spiking'   0.821                      0.72        0.82     0.77
'Walking'              0.777                      0.81        0.78     0.79
Average                0.846                      0.843       0.844    0.843
For the KTH dataset, as observed from Table 2, the highest accuracy, 96.7%, has been achieved by the class 'Boxing', while the lowest accuracy, 90.9%, has been achieved by both 'Jogging' and 'Running'. This is probably because the two activity classes involve very similar actions performed by the human subjects, so confusion occurs during classification, as is evident from Fig. 4(a). From Table 3, we notice that for the UCF11 dataset the class 'Biking' has attained the highest accuracy of 94.8% while 'Soccer juggling' achieved an accuracy of only 75%. The probable reason for this low accuracy is that the model could not recognize the juggling of the football and misclassified it as 'Walking', as observed in the confusion matrix shown in Fig. 4(b). Finally, for the HMDB51 dataset, as inferred from Fig. 3, the highest accuracy of 94% has been acquired by 'Shoot balls' while the activity classes 'talk' and 'turn' both attained the lowest accuracy of only 40% each.
Fig. 3. Graphical comparison of classification accuracies achieved for 51 activity classes of HMDB51 dataset [13].
Fig. 4. Confusion matrix obtained for: (a) KTH and (b) UCF11 datasets.
The class 'talk' has been wrongly classified as the 'smile' class, as both involve nearly the same pose while the action is performed. Such misclassifications between similar classes are mainly responsible for the overall drop in the accuracy achieved on the HMDB51 dataset. Table 4 shows the average time required for executing the proposed SV-NET model on the three action datasets. It can be seen from Table 4 that the KTH dataset, on average, takes about 3180 s to be processed completely. On the other hand, the UCF11
dataset, being slightly larger, requires about 4320 s, whereas the HMDB51 dataset took the largest amount of time, about 6420 s, as it is the largest dataset (in terms of the number of action classes) among the three.

Table 4. Average time taken for execution of the SV-NET model on the KTH, UCF11 and HMDB51 action datasets.

Dataset        Classification accuracy (in %)   Average time taken for execution (in seconds)
KTH [11]       93.60                            3180
UCF11 [12]     84.59                            4320
HMDB51 [13]    65.60                            6420
We have compared the results of SV-NET obtained on the KTH, UCF11 and HMDB51 datasets with some existing methods, as shown in Table 5. It is evident from Table 5 that the proposed SV-NET model has performed better than several existing works on the three action recognition datasets.

Table 5. Comparison of the proposed model with existing HAR methods for the KTH, UCF11 and HMDB51 datasets.

Dataset        Authors                      Method                                                       Classification accuracy (%)
KTH [11]       Grushin et al. [14]          STIP with HOF + RNN + LSTM                                   90.7
               Naveed et al. [15]           Heterogeneous features and sequential minimal optimization   91.99
               Lang et al. [16]             STIP with HOG + SVM                                          92.13
               Zhang et al. [17]            Slow feature analysis                                        93.33
               Akilandasowmya et al. [18]   KNN classifier                                               93.50
               Proposed model               SV-NET model                                                 93.6
UCF11 [12]     Hasan et al. [19]            STIP + Gist3D + AB                                           54.51
               Liu et al. [12]              Motion queues + static features + PageRank techniques        71.2
               Ikizler-Cinbis et al. [20]   MIL based framework                                          75.2
               Wang et al. [21]             Trajectories + HOG + HOF + MBH                               84.2
               Proposed model               SV-NET model                                                 84.59
HMDB51 [13]    Simonyan et al. [22]         Two-stream CNN                                               59.4
               Wu et al. [23]               HOF + MBH + Event model + BoW                                49.86
               Lan et al. [24]              Multi-skip features                                          63.9
               Proposed model               SV-NET model                                                 65.6
It can be observed that, with an increasing number of activity classes, choosing optimized parameters and augmentation techniques becomes difficult. This is evident from the outcome obtained on the HMDB51 dataset, where the model shows comparatively lower performance. Also, considering video sequences containing human activities across different frames, it is inevitable that some frames of video sequences from classes such as 'ride bike', 'ride horse', 'walk' and 'run' do not contain any activity related to the subject, and similar action classes such as 'climb' and 'climb stairs', 'kick' and 'kickball', or 'sword' and 'sword exercise' may lead to misclassification. Therefore, the proposed model may have extracted unnecessary features from empty frames and faced difficulties in differentiating similar activities, resulting in a drop in the overall accuracy. Overall, from the obtained accuracies, it can be inferred that the proposed model is robust in classifying datasets with fewer classes.
5 Conclusion

In this paper, we have provided a comprehensive overview of HAR research and presented a CNN model for the HAR problem. The proposed model is called SV-NET. The architecture of our model has been designed in such a way that it is able to handle any number of classes fed into it. Our model takes its input directly from the video frames and classifies the actions quite accurately. SV-NET has achieved accuracies of 93.6%, 84.59% and 65.60% on the KTH, UCF11 and HMDB51 datasets respectively. The experimental results indicate the robustness of the model, and it also outperforms many existing models. As a limitation, our model faces difficulties while classifying some of the action classes that involve similar kinds of actions, such as 'Jogging' and 'Running'. In future work, the classes of activities will be diversified, the model will be trained on other datasets, and different aggregations of the activities will be tested. Moreover, videos having multiple view angles will be included for experimentation. In addition, hyper-parameter tuning can be carried out over the same set of activities in order to improve the classification performance and reduce the computational complexity of both the training and inference phases. Conflict of Interests. The authors declare that there is no conflict of interests regarding the publication of this paper.
References 1. Gupta, A., Davis, L.S.: Objects in action: an approach for combining action understanding and object perception (2007) 2. Alahi, A., Ramanathan, V., Fei-Fei, L.: Socially-aware large-scale crowd forecasting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2203–2210 (2014)
3. Kuehne, H., Arslan, A., Serre, T.: The language of actions: recovering the syntax and semantics of goal-directed human activities. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 780–787 (2014) 4. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006) 5. Yu, G., Yuan, J.: Fast action proposals for human action detection and search. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1302–1311 (2015) 6. Fernando, B., Gavves, E., Oramas, J.M., Ghodrati, A., Tuytelaars, T.: Modeling video evolution for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5378–5387 (2015) 7. Kulkarni, K., Evangelidis, G., Cech, J., Horaud, R.: Continuous action recognition based on sequence alignment. Int. J. Comput. Vision 112(1), 90–114 (2015) 8. Kovashka, A., Grauman, K.: Learning a hierarchy of discriminative space-time neighborhood features for human action recognition. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2046–2053. IEEE, June 2010 9. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, 4th edn. Academic Press, Boston (2008) 10. Ma, S., Sigal, L., Sclaroff, S.: Space-time tree ensemble for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5024–5032 (2015) 11. Schuldt, C., Laptev, I., Caputo, B.: Recognizing human actions: a local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 3, pp. 32–36. IEEE, August 2004 12. Liu, J., Luo, J., Shah, M.: Recognizing realistic actions from videos in the wild. In: CVPR, June 2009 13. Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: 2011 International Conference on Computer Vision, pp. 2556–2563. IEEE, November 2011 14. Grushin, A., Monner, D.D., Reggia, J.A., Mishra, A.: Robust human action recognition via long short-term memory. In: The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, August 2013 15. Naveed, H., Khan, G., Khan, A.U., Siddiqi, A., Khan, M.U.G.: Human activity recognition using mixture of heterogeneous features and sequential minimal optimization. Int. J. Mach. Learn. Cybern. 10(9), 2329–2340 (2019) 16. Wang, X., Wang, L., Qiao, Y.: A comparative study of encoding, pooling and normalization methods for action recognition. In: Asian Conference on Computer Vision, pp. 572–585. Springer, Heidelberg, November 2012 17. Akilandasowmya, G., Sathiya, P., AnandhaKumar, P.: Human action analysis using K-NN classifier. In: 2015 Seventh international conference on advanced computing (ICoAC), pp. 1–7. IEEE, December 2015 18. Zhang, Z., Tao, D.: Slow feature analysis for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 3, 436–450 (2012) 19. Hasan, M., Roy-Chowdhury, A.K.: Incremental activity modeling and recognition in streaming videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 796–803 (2014) 20. Ikizler-Cinbis, N., Sclaroff, S.: Object, scene and actions: combining multiple features for human action recognition. In: European Conference on Computer Vision, pp. 494–507. Springer, Heidelberg, September 2010 21. Wang, H., Kläser, A., Schmid, C., Liu, C.-L.: Action recognition by dense trajectories, June 2011
22. Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: Advances in Neural Information Processing Systems, pp. 568–576 (2014) 23. Wu, J., Hu, D.: Learning effective event models to recognize a large number of human actions. IEEE Trans. Multimedia 16(1), 147–158 (2013) 24. Lan, Z., Lin, M., Li, X., Hauptmann, A.G., Raj, B.: Beyond Gaussian pyramid: multi-skip feature stacking for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 204–212 (2015)
Machine Learning Based Framework for Recognizing Traffic Signs on Road Surfaces Any Gupta
and Ayesha Choudhary(B)
Jawaharlal Nehru University, Delhi, India [email protected], [email protected]
Abstract. We propose a novel method for detection and classification of markings on the surface of the road using a camera placed inside the vehicle. Road surface markings are the text and symbols drawn on the road surface, such as stop signs, zebra crossings, pedestrian crossings, direction arrows, etc. These surface markings differ from traffic signs, which are situated on the sides of the road. Surface markings contribute greatly to driving safety in Advanced Driver Assistance Systems (ADAS) by providing guidance and giving the driver the right information about the markings; they also enhance localization and path planning in ADAS. In our framework, we use unsupervised learning with a clustering method for the detection of road surface markings and a Support Vector Machine (SVM) classifier for recognizing them. Our framework performs well for almost all types of surface markings, regardless of their size and orientation. We have done experiments on two road surface markings datasets, dataset [17] and dataset [18], and compared our method with a previously proposed method. Our experiments show that our real-time framework is robust and accurate. Keywords: Road surface markings recognition · Machine learning · Computer vision · Advanced driver-assistance system
1 Introduction

In this paper, we propose a novel computer vision and machine learning based framework for recognizing markings on the surface of the road for an Advanced Driver Assistance System (ADAS). Our system processes the view captured by a dashboard camera to recognize markings on the road such as stop signs, pedestrian crossings, direction arrows, etc. These markings are different from traffic signs, which are situated on the sides of or above roads. Road surface markings may be categorized or specified in different ways by different countries. Road surface marking recognition aims to identify the type of a surface marking by giving it the right label. The process of extracting the region of interest also extracts lane markings and other noise, which is eliminated in the next step using size and orientation. The purpose of doing this is to reduce the computation time of our system by reducing the number of irrelevant markings to be classified by the SVM classifier.
The resulting markings may include some lane markings along with the road surface markings. All these markings are fed into the SVM classifier for correct identification of the type of surface marking. A road surface marking is classified into its matched class, whereas a lane marking or any other irrelevant object is recognized as the 'unknown class'. It is important for the driver to follow the signs on the road surface and therefore detecting and recognizing these markings forms an important part of ADAS. It guides the driver by giving correct information regarding the surface markings and hence reduces accidents caused by driver error during wrong lane departures. The surface markings are also important for the safety of autonomous vehicles as well as for building the road feature map used for vehicle localization and path planning. For building an accurate road feature map, static information about the markings on the surface of the road is necessary. A Global Positioning System (GPS) can help obtain position data in ADAS, but it is not reliable because the signal can be lost during the localization process. Therefore, cameras can be the right choice for extracting surface marking information and enhancing the localization estimate. However, it is very challenging to develop robust and accurate road surface marking recognition due to variations in the size and orientation of markings. Occlusion of surface markings can occur due to vehicles and a high volume of traffic. The recognition process can also be affected by weather conditions and illumination variations due to shadows on the road, sunlight, etc. In spite of all these challenges, we propose a real-time method for recognizing the markings on the surface of the road that is accurate and robust. Section 2 reviews related work on road surface markings, Sect. 3 explains our proposed framework, Sect. 4 gives the details of our experiments and Sect. 5 concludes the paper.
2 Related Work

Haohao et al. [1] applied Histograms of Oriented Gradients (HOG) and Optical Character Recognition (OCR) for road marking detection and an artificial neural network for classification. They used road marking features for calculating local displacement error in the aerial and sensor domains. Jung et al. [2] applied the RANSAC technique and random sampling for extraction of road surface markings. They also applied morphological operations and image segmentation to avoid small irrelevant objects present in the image. Wen et al. [3] used a modified U-net model for surface marking extraction and Convolutional Neural Networks (CNN) for classifying them. They also applied a Generative Adversarial Network (GAN) for completing missing road surface markings. Bahman et al. [4] used road surface marking templates for mapping and localization after converting these templates to three dimensions. Suhr et al. [5] proposed a framework based on directional and stop markings for vehicle localization. They apply Histogram of Oriented Gradients (HOG) and the RANSAC technique for detection of markings and trained a total-error-rate-based classifier for recognizing them. Wu et al. [6] convert the
RGB image into the Hue, Saturation, and Value (HSV) format for detection of markings such as pedestrian crossings, arrows, and numeric markings, and propose a framework for vehicle localization. Ishida et al. [7] proposed a map generation method using road surface markings; they first used multi-frame sparse tensor voting for surface marking feature extraction and then detected contours using the tensor field. Deng et al. [8] integrated the three color spaces RGB, HSV and CIE L*a*b* for road surface marking detection. Bailo et al. [9] applied the maximally stable extremal regions (MSER) technique for feature extraction of road surface markings, which are further clustered using density-based clustering; the surface markings are then recognized using machine learning approaches. Ziqiong et al. [10], Greenhalgh et al. [12] and Philippe et al. [11] used IPM for marking detection and classification. Greenhalgh et al. [12] proposed a method for detection and classification of text-based and symbol-based markings on the surface of the road: the text-based road surface markings are recognised by an optical character recognition (OCR) technique whereas the symbol-based markings are recognized by applying a support vector machine (SVM) classifier. We also apply SVM for road surface marking classification and compare our method with the work in [12] in the experimental section. The MSER technique is also applied by Wu et al. [13] for road surface marking detection, and template matching is performed for recognition of the markings. Hyeon et al. [14] applied a difference-of-Gaussians based method for extracting a set of connected components of road surface markings, which are further grouped for the classification task; a Random Forest classifier is used for classification. Their system works efficiently on images from an around-view monitoring system. Liu et al. [15] presented a road surface marking recognition system using IPM. Prior information about markings is used to find candidate regions; non-marking candidate regions are removed by an AdaBoost classifier and a learning classifier with HOG features is applied to classify the road surface markings. Chen et al. [16] detected road surface markings using the binarized normed gradients (BING) method, and a PCA network (PCANet) is applied for classification.
3 Proposed Work

In our framework, we recognize the markings on the surface of the road using a support vector machine (SVM). The SVM is trained on labeled road surface markings which differ from each other in orientation, perspective distortion, color and size. In the following subsections, we describe the SVM classifier, the pre-processing used to extract road surface markings from the frame, the training of the SVM model and finally the surface marking recognition phase.

3.1 Background: Support Vector Machine (SVM)

SVM is a discriminative two-class classifier which gives an optimal separating hyperplane of the categorized data as output. A multiclass SVM classifier is implemented by combining several two-class SVMs. There are some parameters of the SVM classifier which can be varied to achieve better classification results, such as the regularization parameter,
gamma, margin and kernel. A multiclass SVM basically works through binary classification using the maximum margin. In the case of two separable classes, the training sample set Q is labeled as {x_i, y_i}, where x_i is the feature vector of the i-th of t training samples and y_i is its label. Q can be represented as in Eq. (1):

$$Q = \{x_i, y_i\}_{i=1}^{t}, \quad x_i \in \mathbb{R}^m, \; y_i \in \{-1, 1\} \qquad (1)$$

where y_i is given the value "1" for one class and "−1" for the other class, and m is the dimension of the vector x_i. The goal of the SVM is to find an optimized hyperplane {w, b} which separates the samples into the two class labels; the hyperplane constraints for the two classes are represented in Eq. (2) and Eq. (3):

$$H_1: w^T \cdot x_i + b \geq 1, \quad y_i = +1 \qquad (2)$$

$$H_2: w^T \cdot x_i + b \leq -1, \quad y_i = -1 \qquad (3)$$

where w is the normal to the hyperplane and b is the offset. Hence, we choose the hyperplane in such a way that it maximizes the margin $\frac{2}{\|w\|}$ between the datasets. This margin can be maximized by minimizing $\frac{\|w\|^2}{2}$ subject to Eq. (2) and Eq. (3). If we introduce a positive Lagrange multiplier for each of the above two inequalities, then the problem is converted into minimizing L, which is given by Eq. (4):

$$L = \frac{\|w\|^2}{2} - \sum_{i=1}^{t} \alpha_i \left\{ y_i \left( w^T \cdot x_i + b \right) - 1 \right\} \qquad (4)$$

where the α_i are the undetermined multipliers. After solving this optimization, we determine on which side of the hyperplane a vector x lies. The decision function used to classify the vector is

$$f(x) = \mathrm{sign}\left( \sum_i \alpha_i y_i \left( x_i \cdot x \right) + b \right).$$

We validate our classifier and calculate the accuracy as the fraction of class labels correctly predicted by the classifier.

3.2
Preprocessing
The road surface marking classification system should work in real time with high accuracy. Therefore, we reduce our search space by focusing only on the area in which markings are present, which is the lower part of the image, in order to reduce the computational time of our framework. The RGB image is then converted into a grayscale image to improve the binarization of the markings present on the road. After that, we also reduce the noise present in the frames, which includes irrelevant objects that we do not want to process because they increase the computation time of our system. We obtain a better image, without unnecessary sharp intensity regions, after noise
reduction. We perform binarization on this image using adaptive thresholding. The foreground contains the objects relevant to our system. It may also contain high-intensity noise due to surrounding vehicles, sunlight reflection, fences or street light reflections. These noisy regions are removed in a further step by our framework. The binarization of grayscaled images from dataset [17] is shown in Fig. 1.
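A minimal OpenCV sketch of this preprocessing stage is given below; the ROI fraction, blur kernel and adaptive-threshold parameters are illustrative assumptions rather than the exact values used in our implementation.

```python
import cv2

def preprocess(frame):
    """Crop the lower part of the frame, grayscale, denoise and binarize it."""
    h, _ = frame.shape[:2]
    roi = frame[h // 2:, :]                                    # lower half assumed to contain markings
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)               # grayscale conversion
    blur = cv2.GaussianBlur(gray, (5, 5), 0)                   # noise reduction
    binary = cv2.adaptiveThreshold(blur, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY, 25, -5)  # bright markings become foreground
    return binary
```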
Fig. 1. (a) The ROI from dataset [17], (b) the grayscaling of ROI, and (c) the binarization images after applying adaptive thresholding.
Fig. 2. Contour detection and markings extraction of dataset [17].
Now, we need to find the connected components present in the foreground regions, also known as contours. These can vary in size because of the variation in shape and orientation of road surface markings. Very small and very large contours are considered noise, and we remove all such contours based on their area as they are not significant for our framework. Figure 2 shows the contour detection on dataset [17]. We obtain the connected components of lanes, markings on the surface of the road and some remaining noise, and we try to extract only the markings, excluding lanes and noise. Usually, the length of a lane marking is greater than that of a road surface marking, unless the latter is a zebra crossing or pedestrian crossing. Also, the lane markings lie within a fixed range of orientations, and the road surface markings mostly appear between the lane markings. We therefore eliminate the lane markings and noise based on their length, area and orientation, which reduces the computational time during the SVM classification phase; a fixed threshold is defined on these parameters to filter out the noise and lane markings.
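As an illustration of this filtering, the following sketch finds contours in the binary image and keeps only those whose area and orientation fall inside plausible ranges; the numeric thresholds are placeholders for the fixed thresholds mentioned above, not the values used in our experiments.

```python
import cv2

def candidate_markings(binary, min_area=300, max_area=20000, max_tilt_deg=35):
    """Return bounding boxes of contours that are plausible road surface markings."""
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
    candidates = []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if not (min_area <= area <= max_area):           # very small / very large -> noise
            continue
        (_, _), (_, _), angle = cv2.minAreaRect(cnt)     # rotated box gives the contour orientation
        tilt = min(abs(angle), abs(90 - abs(angle)))     # deviation from an axis-aligned orientation
        if tilt > max_tilt_deg:                          # orientation filter (lane markings, noise)
            continue
        candidates.append(cv2.boundingRect(cnt))         # (x, y, w, h) for later extraction
    return candidates
```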
After this filtering process, we may still get some lane markings along with road surface markings. We give all these markings to SVM classifier as an input for classification by extracting these markings using coordinates of their bounding box (Fig. 2). We use the orientation of the markings to rotate them for overcoming the problem of perspective distortion in such a way that the orientation of such contours is vertical. Now, this vertically oriented contour is sent to SVM classifier for the classification which is the next step of our framework. 3.3
Classification of Markings on the Surface of the Road
Training SVM. The multiclass SVM is trained using labeled data consisting of a large number of stop signs, arrows and other markings on the road surface, captured in various scenarios such as occlusion and illumination variations due to shadows, sunlight, etc. The orientation and size of these surface markings may vary. We convert each road surface marking into a binary image and remove the perspective distortion by rotating it to the vertical axis (as discussed in Sect. 3.2). We bring the surface markings to the same size by zero padding and form sets of unique markings, for example, a set of stop signs, another set of speed limit signs, etc. Each image is converted into a 1-dimensional vector and we create a table of these vectors. We also create another table of the same size in which we record the label of each road surface marking. The multiclass SVM is trained by taking both tables as input. This trained multiclass SVM is capable of classifying multi-view road surface markings, as the orientation of these markings depends on the view of the camera. This does not affect the computation time of our system since training is done offline.

Classification Using SVM. During the classification phase, each extracted orientation-normalized contour is zero-padded to make it the same size as defined in the training data and converted into a 1-dimensional vector. Each contour is then fed into the multiclass SVM classifier to predict its class label. The multiclass SVM gives a classification prediction score and a class label as output. If the test vector satisfies the threshold on the prediction score, it is considered a road surface marking and labeled with the matched class; otherwise it is considered lane marking or noise and treated as the 'unknown class'. We perform this step for all the filtered contours present in an image, and in this way the markings on the surface of the road are classified spatio-temporally in real time. Our framework is able to efficiently recognize distinct types of markings, whether arrow markings or text-based markings, in various challenging scenarios.
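The following is a minimal scikit-learn sketch of the training and classification steps described above, assuming each marking has already been binarized, rotated upright and zero-padded to a fixed size; the image size, the use of an SVC with a probability-based rejection threshold, and the threshold value itself are assumptions for illustration rather than the exact configuration used here.

```python
import numpy as np
from sklearn.svm import SVC

IMG_SIZE = (64, 64)          # assumed fixed size after zero padding
REJECT_THRESHOLD = 0.6       # assumed prediction-score threshold for the 'unknown class'

def to_vector(binary_patch):
    """Flatten a padded binary marking image into a 1-D feature vector."""
    return np.asarray(binary_patch, dtype=np.float32).reshape(-1)

def train_svm(patches, labels):
    """Train the multiclass SVM on the table of flattened vectors and the table of labels."""
    X = np.stack([to_vector(p) for p in patches])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale", probability=True)  # multiclass via one-vs-one
    clf.fit(X, labels)
    return clf

def classify(clf, patch):
    """Label a contour, or reject it as 'unknown' (lane marking / noise)."""
    probs = clf.predict_proba([to_vector(patch)])[0]
    best = int(np.argmax(probs))
    return clf.classes_[best] if probs[best] >= REJECT_THRESHOLD else "unknown"
```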
4 Experimental Results and Discussions

In this section, we describe the performance of our SVM based road surface marking recognition system on two datasets, dataset [17] and dataset [18]. We describe each dataset separately along with the experimental results. We perform our experiments on an Intel Core i5-7200U CPU with a frequency of 2.5 GHz and 8 GB of memory.
Fig. 3. Road surface markings recognition on dataset [17] in low illumination scenario.
Fig. 4. Confusion matrix on (a) Dataset [17], and (b) dataset [18]. It shows the true classification of each class on each dataset.
4.1 Dataset [17]

Dataset [17] has a total of 1443 frames of size 320 × 240. This dataset has both types of markings: text markings such as '35', 'PED', 'STOP', 'SCHOOL', 'XING', etc., as well as directional arrows such as 'left turn', 'right turn', 'ahead', etc.
Fig. 5. Precision-recall curves of (a) dataset [17] and (b) dataset [18] respectively.
The dataset has several frames with considerable environmental variation, such as shadows and sunlight, which makes it very challenging to process. We train our SVM classifier on 60% of the labeled frames and perform validation on 20% of the frames. Then, from the rest of the images, we take 10 random sets of images for classification, calculate precision and recall, and show the precision-recall curve in Fig. 5(a). We obtain 99.78% classification accuracy, as our framework performs well even in challenging scenarios. We show the accuracy of each class in Fig. 4(a). The road surface marking classification results are shown in Fig. 3. Table 1 shows the comparison between [19] and our proposed method.

Table 1. Comparison of the performance of our framework with [19] in terms of recognition rate on dataset [17].

Methods                    Average % recognition rate on test set
Ahmad et al. [19]          99.05%
Our proposed framework     99.78%
Fig. 6. Road surface markings recognition on dataset [18] having slightly faded markings due to rainy season, and night vision.
Table 2. Overall accuracy of our system on the basis of precision and recall.

Methods                    Accuracy   Precision   Recall
Jack et al. [12]           93.54%     0.91        0.92
Our proposed framework     99.78%     0.93        0.94
4.2 Dataset [18]

Dataset [18] is a large dataset containing approximately 92000 frames extracted from 48 videos captured in the USA and Korea. These videos are very challenging in themselves, as they are captured under varying environmental conditions such as fog, snow and rain, and under various illumination conditions such as tunnel lighting, shadows, etc. We take 60% of the frames for training the SVM and 20% of the labeled images for validation, and we test our model on the rest of the images. We obtain 98.86% classification accuracy on this dataset. The classification performance on individual road surface markings is shown in Fig. 4(b), and recognition results on this challenging dataset are shown in Fig. 6. We also calculate the precision and recall values and draw the curve shown in Fig. 5(b). Table 2 shows the overall performance of our framework. We additionally test our model on dataset [18] after training the SVM on some frames of dataset [17], and vice versa. We do this to harden our SVM model for real-time scenarios, as our road surface marking classification system should work efficiently in real time. Our proposed system performs well in real time, taking 29 ms to process one frame.
5 Conclusion

We have proposed a novel computer vision and machine learning based method for recognizing markings on the surface of the road. We extract the foreground regions of an image and try to eliminate noise and lane markings using their size and orientation. After this elimination process, some lane markings may remain along with the road surface markings. We give these markings as input to the SVM classifier for classification. It classifies each extracted road surface marking by giving it the correct label, whereas lanes and other objects such as noise are classified as the 'unknown class'. We keep our system simple by eliminating objects on the basis of their basic properties before classification. Experimental results show that we obtain high accuracy using the SVM classifier in our system. In the future, we will explore deep learning techniques for road surface marking recognition and evaluate the performance of the system in real time.
References 1. Hu, H., Sons, M., Stiller, C.: Accurate global trajectory alignment using poles and road markings. In: The Proceedings of IEEE Intelligent Vehicles Symposium (IV), June 2019 2. Jung, J., Che, E., Olsen, M.J., Parrish, C.: Efficient and robust lane marking extraction from mobile lidar point clouds. ISPRS J. Photogrammetry Remote Sens. 147, 1–18 (2019) 3. Wen, C., Sun, X., Guo, Y.: A deep learning framework for road marking extraction, classification and completion from mobile laser scanning point clouds. ISPRS J. Photogrammetry Remote Sens. 147, 178–192 (2019) 4. Soheilian, B., Qu, X., Bredif, M.: Landmark based localization : LBA refinement using MCMC-optimized projections of RJMCMC-extracted road marks. In: The Proceedings of IEEE Intelligent Vehicles Symposium (IV), pp. 940–947, June 2016
5. Kyu Suhr, J., Gi Jung, H.: Fast symbolic road marking and stop-line detection for vehicle localization. In: The Proceedings of IEEE Intelligent Vehicles Symposium (IV), July 2015 6. Wu, T., Ranganathan, A.: Vehicle localization using road markings. In: The Proceedings of IEEE Intelligent Vehicles Symposium (IV), June 2013 7. Ishida, H., Kidono, K., Kojima, Y., Naito, T.: Road marking recognition for map generation using sparse tensor voting. In: The Proceedings of 21st International Conference on Pattern Recognition (ICPR), pp.1132–1135, November 2012 8. Deng, Z., Zhou, L.: Detection and recognition of traffic planar objects using colorized laser scan and perspective distortion rectification. IEEE Trans. Intell. Transp. Syst. 19(5), 1485– 1495 (2018) 9. Bailo, O., Lee, S., Rameau, F., Yoon, J. S., Kweon, I.S.: Robust road marking detection and recognition using density-based grouping and machine learning techniques. In: The Proceedings of IEEE Winter Conference on Applications of Computer Vision, pp. 760–768 (2017) 10. Liu, Z., Wang, S., Ding, X.: ROI perspective transform based road marking detection and recognition. In: The Proceedings of IEEE International Conference in Audio, Language and Image Processing (ICALIP), pp. 841–846 (2012) 11. Foucher, P., Sebsadji, Y., Tarel, J., Charbonnier, P., Nicolle, P.: Detection and recognition of urban road markings using images. In: The Proceedings of IEEE 14th International Conference on Intelligent Transportation Systems (ITSC), pp. 1747–1752 (2011) 12. Greenhalgh, J., Mirmehdi, M.: Detection and recognition of painted road surface markings. In: ICPRAM (2015) 13. Wu, T., Ranganathan, A.: A practical system for road marking detection and recognition. In: The Proceedings of IEEE Intelligent Vehicles Symposium (IV), pp. 25–30, June 2012 14. Hyeon, D., Lee, S., Jung, S., Kim, S., Seo, S.: Robust road marking detection using convex grouping method in around-view monitoring system. In: The Proceedings of IEEE Intelligent Vehicles Symposium (IV), pp. 1004–1009, June 2016 15. Liu, W., Lv, J.,Yu, B., Shang, W., Yuan, H.: Multi-type road marking recognition using adaboost detection and extreme learning machine classification. In: The Proceedings of IEEE Intelligent Vehicles Symposium (IV), July 2015 16. Chen, R., Chen, Z., Shi, Q., Huang, X.: Road marking detection and classification using machine learning algorithms. In: The Proceedings of IEEE Intelligent Vehicles Symposium (IV), July 2015 17. http://www.ananth.in/RoadMarkingDetection.html 18. DSDLDE v.0.9: Video clips for lane marking detection. https://drive.google.com/file/d/ 1315Ry7isciL-3nRvU5SCXM-4meR2MyI/view?usp=sharin 19. Ahmad, T., Ilstrup, D., Bebis, G.: Symbolic road marking recognition using convolutional neural networks. In: The Proceedings of IEEE IEEE Intelligent Vehicles Symposium (IV), pp. 1428-1433, June 2017
Cursor Control Using Face Gestures Arihant Gaur(B) , Akshata Kinage(B) , Nilakshi Rekhawar(B) , Shubhan Rukmangad(B) , Rohit Lal(B) , and Shital Chiddarwar Visvesvaraya National Institute of Technology, South Ambazari Road, Ambazari, Nagpur 440010, Maharashtra, India [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. This paper presents a software implementation that detects facial landmarks using the dlib library in Python, with a model trained on the iBUG 300-W dataset, and uses them for controlling mouse operations. The proposed method aims at catering to the needs of differently-abled people (for example, people with locked-in syndrome) who are not able to operate a computer. Blinking the left eye results in a left click, blinking the right eye results in a right click, and scroll mode is enabled by opening the mouth; the same gesture disables scroll mode. The concept of the Eye Aspect Ratio for the eyes and the Mouth Aspect Ratio for the mouth is used to check whether a particular mouse operation should be performed. Image enhancement is done primarily by Contrast Limited Adaptive Histogram Equalization (CLAHE) and Gaussian filtering. Keywords: Facial landmarks · Eye Aspect Ratio · Mouth aspect ratio

1

Introduction
Computer vision and image processing algorithms are being widely used across various applications. To cater to the needs of physically disabled people who are not able to use a computer freely, we propose a novel approach to bridge that gap. The proposed approach enables the user to perform mouse actions using just facial gestures: to initiate a left click, the user blinks the left eye; for a right click, the user blinks the right eye; and moving the tip of the nose moves the mouse cursor across the screen in different directions. In case one needs to scroll up and down, opening the mouth for a brief period of 5 s activates the scrolling mode and the page can then be scrolled using just the movement of the tip of the nose. A demonstration of the proposed algorithm can be seen in the video [16]. The proposed approach requires only a functional computer with access to a webcam. An initial calibration step makes the code adapt to the particular environment in terms of the amount of noise and brightness.
1.1
Hardware and Software Specifications
The proposed approach is implemented using Python 3, along with additional libraries such as python3-OpenCV, numpy, scipy, matplotlib, dlib [4] and imutils [12]. All calculations and operations have been performed on a laptop with an Intel Core i5-8250U CPU @ 1.60 GHz × 8. The webcam used has a resolution of 1280 × 720 and a frame rate of 30 fps (sampled down to 15–25 fps to reduce the sensitivity and to track the mouse easily).
2
Related Works
The concept of facial recognition has been researched extensively. It has been used for addressing various problems such as detecting the mood of a person from the state of the eyes using a Convolutional Neural Network (CNN) [7,8], and using facial landmarks for face recognition by applying Zernike Moments on a Hidden Markov Model (HMM) [9]. Most of these publications rely on CNNs, which are computationally demanding, and some of them [7] use wearable computers such as Head Mounted Displays (HMD). In the case of HMMs, accuracy is compromised where a high amount of intricacy is required. Other methods include the use of image processing techniques to extract facial features [17] and to detect drowsiness in a person [18,19]. Gabor filtering [5] is useful for analyzing different frequency bands in different directions and then extracting the pupil of the eye; despite positive results, the algorithm tends to be computationally heavy and therefore the lag increases drastically. Haar cascading [6] is useful for the detection of the eyes and face: the Region of Interest (ROI) for the eyes is extracted and checked to see whether the eye is closed or not. However, as shown in Fig. 1, some false positives are detected for the eyes, which are difficult to remove in some cases. Apart from that, the algorithm is not able to detect the face if it is significantly tilted; in other words, it is not rotation invariant, so a change in the orientation of the camera can prevent the face from being detected.
Fig. 1. Haar Cascading resulting in false positives
Fig. 2. Flowchart explaining the control of mouse actions using face gestures
3
Algorithm
Figure 2 shows the overall flow of the algorithm. 3.1
Determination of Facial Landmarks
The paper revolves around the use of facial landmarks. The dataset used for mapping 68 facial landmark points on the face was annotated with a semiautomatic database annotation tool. The face detector is built from the classic Histogram of Oriented Gradients (HOG) feature combined with an image pyramid, a linear classifier and a sliding-window detection scheme [14]. For the detection of gestures, the dlib library is used, which is based on the implementation of a well-known paper [4]. We used a model pre-trained on the iBUG 300-W face landmark dataset [10,11,15], stored in a file (.dat). The model is used for detecting the frontal face and the landmarks associated with it. As the landmarks are obtained, they are displayed on the image and, at the same time, their coordinates are stored in an array to be used later.
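A minimal sketch of this landmark-detection step with dlib is shown below; the model file name follows dlib's standard pre-trained 68-point predictor, and the webcam index and drawing step are assumptions.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()                                # HOG + sliding-window face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # 68-point model (iBUG 300-W)

cap = cv2.VideoCapture(0)                                                  # webcam
ret, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

for face in detector(gray, 0):                                             # detect frontal faces
    shape = predictor(gray, face)                                          # predict the 68 landmarks
    points = np.array([[shape.part(i).x, shape.part(i).y] for i in range(68)])
    for (x, y) in points:
        cv2.circle(frame, (int(x), int(y)), 1, (0, 255, 0), -1)            # draw each landmark
cap.release()
```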
Fig. 3. Facial landmarks not mapped due to glare. CLAHE has not been applied.
Fig. 4. Application of CLAHE has managed to map the facial landmarks on the eyes and face, though there is some error due to the existing glare in the image.
3.2
Image Enhancement
Before extracting any landmarks, the image histogram should be spread out to avoid fluctuations in the facial landmarks. For that, Contrast Limited Adaptive Histogram Equalization, or CLAHE [13], is used. The image can be viewed as a histogram of intensity values and their occurrences. Abrupt peaks in this histogram are smoothed out and spread across the image. However, plain histogram equalization also amplifies noise, and some details might change in the image due to bright or dark spots. So, the image is divided into smaller grids or tiles of arbitrary size, and histogram equalization is done within these tiles only. Since noise would still get amplified, a contrast limit is used: if a histogram bin exceeds the initialized limit value, those pixels are clipped and their count is distributed uniformly over the tile before applying histogram equalization. Finally, bilinear interpolation is used to remove artifacts at the tile borders. The clip limit was set to 2.0 with a tile grid size of 8 × 8. The difference can be seen in Fig. 3 and Fig. 4.
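A short OpenCV sketch of this enhancement step is given below, using the clip limit and tile size stated above; applying it to the grayscale frame right before landmark detection is how we assume it fits into the pipeline.

```python
import cv2

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))   # contrast-limited adaptive equalization

def enhance(frame_bgr):
    """Convert a BGR frame to grayscale and apply CLAHE before landmark detection."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return clahe.apply(gray)                                   # equalize per 8x8 tile, clipped at 2.0
```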
3.3 Calibration
The program first requires calibration. The entire calibration step lasts about 25 s and includes opening both eyes, closing only the left eye, closing only the right eye and then opening the mouth. In this way, the calibration is used to find the reference value of the Eye Aspect Ratio (EAR) as well as the reference value of the Mouth Aspect Ratio (MAR) [1]. This can be understood visually using Fig. 5.
Fig. 5. Application of CLAHE has managed to map the facial landmarks on the eyes and face, though there is some error due to the existing glare in the image.
EAR and MAR can be represented as Eqs. (1) and (2):

$$EAR = \frac{\|p_2 - p_6\| + \|p_3 - p_5\|}{2\,\|p_1 - p_4\|} \qquad (1)$$

where $p_1, p_2, p_3, p_4, p_5$ and $p_6$ are the coordinates of the landmarks on the eye, and

$$MAR = \frac{\|p_2 - p_8\| + \|p_3 - p_7\| + \|p_4 - p_6\|}{2\,\|p_1 - p_5\|} \qquad (2)$$
where $p_1, p_2, \dots, p_8$ are the coordinates of the landmarks on the mouth. For the first 5 s, the user is required to keep both eyes open. The algorithm calculates the EAR for both the left and right eye and returns the difference between them. The difference between the EARs of the left and right eye is used because, when the user keeps both eyes open or blinks involuntarily, the algorithm should not treat it as initiating both a left and a right click; the difference in EARs (EARdifference) stays roughly constant when both eyes are open or both are closed. As soon as the left eye blinks, EARdifference tends to become negative, and when the right eye blinks, EARdifference tends to become positive. The next allocated 3 s are spent by the user closing only the left eye, and EARdifference is calculated and recorded. The next 3 s are for only the right eye closed, and the final 3 s are for keeping the mouth open. All these operations have a time interval of 2 s between them to accommodate the reaction time of the user. Note that the values are recorded continuously in one array and the times at which they are recorded in another array; for each operation, these two arrays are initialized. Then thresholds are set for the EAR difference and MAR, so that whenever they cross the threshold limits the required operation is performed. In our case, a left eye blink initiates a left click, a right eye blink initiates a right click, and opening the mouth activates/deactivates the scroll mode. For this, the medians of the recorded arrays are taken during calibration and set as the threshold limits. 3.4
Implementation of Facial Landmarks
From the 68 landmarks obtained, 6 landmarks are allocated to each of the left and right eyes, 8 landmarks to the mouth and 1 point to the tip of the nose. The eyes and the mouth are each enclosed in a closed convex polygon. The area of these polygons is also used to determine whether the eyes and mouth are open or closed: an open eye means a larger polygon area and a closed eye means a smaller one. So, the user is able to click if the convex hull has a small area and the difference in EAR between the left and right eye is significant. 3.5
Algorithm Deployment
After finding the facial landmarks again using the same algorithm and locating their coordinates, EAR and MAR are calculated for every frame.
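A small sketch of the per-frame EAR and MAR computation from Eqs. (1) and (2) is given below; the landmark index ranges in the comments follow the usual 68-point layout and are assumptions about the exact indices used.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """EAR from Eq. (1); `eye` is a (6, 2) array of landmark coordinates p1..p6."""
    a = np.linalg.norm(eye[1] - eye[5])     # ||p2 - p6||
    b = np.linalg.norm(eye[2] - eye[4])     # ||p3 - p5||
    c = np.linalg.norm(eye[0] - eye[3])     # ||p1 - p4||
    return (a + b) / (2.0 * c)

def mouth_aspect_ratio(mouth):
    """MAR from Eq. (2); `mouth` is an (8, 2) array of landmark coordinates p1..p8."""
    a = np.linalg.norm(mouth[1] - mouth[7])   # ||p2 - p8||
    b = np.linalg.norm(mouth[2] - mouth[6])   # ||p3 - p7||
    c = np.linalg.norm(mouth[3] - mouth[5])   # ||p4 - p6||
    d = np.linalg.norm(mouth[0] - mouth[4])   # ||p1 - p5||
    return (a + b + c) / (2.0 * d)

# Per frame, with `points` being the (68, 2) landmark array from the detector:
# left_ear  = eye_aspect_ratio(points[42:48])    # left-eye indices (assumed 68-point layout)
# right_ear = eye_aspect_ratio(points[36:42])    # right-eye indices
# ear_difference = left_ear - right_ear
# mar = mouth_aspect_ratio(points[60:68])        # inner-mouth points, an assumed choice
```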
Left and Right Click. As soon as the threshold limit is crossed, the required operation must occur, so mouse operations must be connected with the Python code. For this, the pyautogui library is used, which makes this easy: operations such as scrolling, clicking, and moving across the screen are handled by this library. A sample graph showing left and right clicks can be seen in Fig. 6.
Fig. 6. The graph of the difference between the EAR of the left and right eye over time. The black 'x' marks denote left clicks and the red 'x' marks denote right clicks. The difference is multiplied by 100 so that it can be seen clearly. The left and right thresholds indicate the limits below or above which the left and right clicks are initiated respectively; these limits are set during the calibration done beforehand.
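A short sketch of the click logic built on these thresholds is shown below; the threshold values come from the calibration step, and the comparison directions follow the sign convention described above.

```python
import pyautogui

def handle_clicks(ear_difference, left_thresh, right_thresh):
    """Fire a click when the calibrated EAR-difference thresholds are crossed."""
    if ear_difference < left_thresh:        # left eye closed -> EAR difference goes negative
        pyautogui.click(button="left")
    elif ear_difference > right_thresh:     # right eye closed -> EAR difference goes positive
        pyautogui.click(button="right")
```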
Movement of Mouse Across the Screen. A circular boundary of radius 50 px is defined. Inside the circle, the nose does not move the mouse. As soon as the tip of the nose goes outside this region, the mouse starts moving in the direction pointed. In Fig. 7, one can see the user moving the tip of the nose out of the red circle; when this happens, the mouse cursor moves in the direction set by the vector between the tip of the nose and the center of the circle. Let the angle made by this vector with the x-axis be 'a'. The direction vector can then be represented as in Eq. (3):

$$\vec{v} = \cos(a)\,\hat{i} + \sin(a)\,\hat{j} \qquad (3)$$

The mouse pointer can then be moved from one point to another using the pyautogui library in Python. The result can be seen in Fig. 7.
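The following sketch moves the cursor along this direction vector with pyautogui; the dead-zone radius matches the 50 px boundary above, while the per-frame step size is an assumed tuning parameter.

```python
import math
import pyautogui

def move_cursor(nose_tip, center, radius=50, step=15):
    """Move the cursor in the direction from the boundary center to the nose tip (Eq. (3))."""
    dx, dy = nose_tip[0] - center[0], nose_tip[1] - center[1]
    if math.hypot(dx, dy) <= radius:            # inside the circle: no movement
        return
    a = math.atan2(dy, dx)                      # angle with the x-axis
    pyautogui.moveRel(step * math.cos(a), step * math.sin(a))   # v = cos(a) i + sin(a) j
```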
Fig. 7. Moving the cursor across the screen
4
Results
The calibration concept proposed for the algorithm was found to work in real time on any person. A survey on the performance of various facial landmark detection methods can be found in [20]. The experimentation is divided into two parts: one to check whether the facial landmarks sit well on the face or not, and the other to test the precision and recall of the program. When both eyes are open or both eyes are closed, the difference between the EAR of the left and right eye should be close to zero. So, a dataset from AT&T Laboratories [3] is used, wherein the code runs through a set of 400 images of 40 subjects. The size of each image is 92 × 112 pixels and all the images are in grayscale. The images have varying lighting conditions, some subjects wear spectacles, and the eyes of the subject can be either open or closed. After the test, it was seen that the absolute difference between the EAR of the left eye and the EAR of the right eye was, on average, 0.002, which is quite close to the theoretical value of 0, that is, there is only a deviation of 0.2%. Next, the main code is tested for the detection of blinks. A dataset consisting of 5 videos is made, in which the subject blinks their left and right eye 10 times, in no particular order. The results are shown in Table 1. Precision and recall can be represented using Eqs. (4) and (5):

$$Precision = \frac{TP}{TP + FP} \qquad (4)$$

$$Recall = \frac{TP}{TP + FN} \qquad (5)$$
Table 1. Table showing the precision and recall for each video in the data set. Due to calibration, the threshold limits differ for every user.

User   True positives   False positives   False negatives   Lighting conditions   Wearing spectacles   Precision (%)   Recall (%)
A      18               0                 2                 Good                  Yes                  100             90
B      19               0                 1                 Decent                Yes                  100             95
C      20               2                 0                 Decent                No                   90.9            100
D      15               0                 5                 Poor                  Yes                  100             75
E      17               1                 3                 Poor                  No                   94              85
5
Limitations
As stated in the introduction, the contrast of the image is easily modifiable, but the illumination in the environment is not; it can only be changed by adjusting the lighting of the environment. Even with contrast modification to reduce false positives, in cases where the intensity at certain spots is abnormally high, such as a glowing light source inside the frame itself, normalization not only changes the contrast of the face but may also leave glare on the face, which can result in the loss of some facial characteristics. Different people have different facial characteristics, so in some cases the EAR and MAR differences might not be stark, which can lead to mixed results. Apart from that, the facial landmarks will not be mapped properly if the head orientation changes significantly.
6
Conclusion and Future Works
The paper discusses the use of facial landmarks for controlling mouse actions. Any suggestions for possible further extension of this work are welcome. One possible extension is its use in security systems, where only selected users, identified by facial recognition techniques, have access to the mouse actions. A screen lock mechanism using face gestures could also be built, where a user can unlock the console by imitating the facial expressions stored in its database. The limitations encountered during the experimentation should not be forgotten and can be rectified for smoother operation.
References 1. Soukupova, T., Cech, J.: Real-time eye blink detection using facial landmarks. In: Luka, C., Rok, M., Vitomir, S. (eds.) 21st Computer Vision Winter Workshop Rimske Toplice, Slovenia, 3–5 February, pp. 1–3 (2016) 2. Ren, S., Cao, X., Wei, Y., Sun, J.: Face alignment at 3000 FPS via regressing local binary features. In: Proceedings CVPR, pp. 1685–1692 (2014)
3. https://www.kaggle.com/kasikrit/att-database-of-faces/ 4. Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression trees. In: CVPR, pp. 1–2 (2014) 5. Arai, K., Mardiyanto, R.: Comparative study on blink detection and gaze estimation methods for HCI, in particular, gabor filter utilized blink detection method. In: Proceedings of 8th, International Conference on Information Technology: New Generations, Las Vegas, USA, pp. 442–443 (2011) 6. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, vol. 1, pp. I–511–I–518, p. 3 (2001) 7. Hickson, S., Dufour, N., Sud, A., Kwatra, V., Essa, I.: Eyemotion: classifying facial expressions in VR using eye-tracking cameras. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1626–1627 (2019) 8. Behera, A., Gidney, A., Wharton, Z., Robinson, D., Quinn, K.: A CNN model for head pose recognition using wholes and regions. In: IEEE International Conference on Automatic Face and Gesture Recognition (Accepted/in press) 9. Rahul, M., Shukla, R., Yadav, D.K., Yadav, V.: Zernike moment-based facial expression recognition using two-staged hidden markov model. Adv. Comput. Commun. Comput. Sci. 924, 664 (2019) 10. Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: A semi-automatic methodology for facial landmark annotation. In: Proceedings of IEEE International Conference Computer Vision and Pattern Recognition (CVPR-W), 5th Workshop on Analysis and Modeling of Faces and Gestures (AMFG 2013), Oregon, USA, June 2013 11. Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: the first facial landmark localization challenge. In: Proceedings of IEEE International Conference on Computer Vision (ICCV-W), 300 Faces in-the-Wild Challenge (300-W), Sydney, Australia, December 2013 12. Rosebrock, A.: Imutils. https://github.com/jrosebr1/imutils 13. Pisano, E.D., Zong, S., Jhonston, R. E.: Contrast limited adaptive histogram equalization image processing to improve the detection of simulated speculation in dense mammograms. J. Digit. Imaging 11(4), 193–200 (1998) 14. http://dlib.net/face landmark detection.py.html 15. Sagonas, C., Antonakos, E., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the- wild challenge: database and results, special issue on facial landmark localisation ”in-the-wild”. Image Vis. Comput. (IMAVIS) 47, 3–18 (2016) 16. https://www.youtube.com/watch?v=JthUAjAT1SE 17. Sharma, S., Jain, S., Khushboo.: A static hand gesture and face recognition system for blind people. In: 2019 6th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 536–537 (2019) 18. Manu, B.N.: Facial features monitoring for real time drowsiness detection. In: 2016 12th International Conference on Innovations in Information Technology (IIT), pp. 79-80 (2016) 19. Alshaqaqi, B., Baquhaizel, A.S., Amine Ouis, M.E., Boumehed, M., Ouamri, A., Keche, M.: Driver drowsiness detection system. In: 2013 8th International Workshop on Systems, Signal Processing and Their Applications (WoSSPA), pp. 152-153 (2013) 20. Sandikci, E.N., Erdem, C.E., Ulukaya, S.: A comparison of facial landmark detection methods. In: 2018 26th Signal Processing and Communications Applications Conference (SIU) (2018)
A Smart Discussion Forum Website Rohit Beniwal1(&), Mohd. Danish1, and Arpit Goel2 1 Department of Computer Science and Engineering, Delhi Technological University, New Delhi 110042, India [email protected], [email protected] 2 London Business School, London NW1 4SA, UK [email protected]
Abstract. Sentiment Analysis deals with understanding the context of textual data and forming an opinion based on the piece of text. Sentiment Analysis further classifies the user’s emotions and opinions into various categories such as positive, negative, or neutral. Applications of Sentiment Analysis in various research areas are quite abundant and clearly visible across the literature. In this paper, we also present an application of Sentiment Analysis. To be specific, we developed a discussion forum website that allows a user to post questions, answers, and comments or feedback of their choice, along with liking and disliking answers. This discussion forum then automatically performs Sentiment Analysis on the feedback or comments posted by the users. This Sentiment Analysis categorizes the answers written on various topics on the discussion forum website and then presents the emotional quotient of people, i.e., whether the users are happy, angry, or sad, etc. with the answers. Therefore, in our discussion forum, we rank the answers based on the sentiment score and the no. of likes and dislikes, which makes our discussion forum unique as compared to other discussion forums available in the market. To realize the effectiveness of our work, dummy data entries were made on the discussion forum website to cross-verify the ranking of answers based on the sentiment score and the no. of likes and dislikes.
Keywords: Automatic · Discussion forum · Sentiment analysis · Smart website
1 Introduction
In today’s world, posting a question is really simple as there are many discussion forums; however, the quality of answers is deteriorating day by day. It is very difficult to filter out a quality answer considering the number of answers available to a particular question. It becomes very essential to gather feedback and discover what the users feel about a particular answer. Keeping this in mind, we came up with a solution by developing a discussion forum website to refine the feedback-generating mechanism by analyzing the comments and feedback of users on various topics. This would ensure that the users are happy with the services being offered to them.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): SoCPaR 2019, AISC 1182, pp. 41–49, 2021. https://doi.org/10.1007/978-3-030-49345-5_5
Our discussion forum is a smart discussion forum which allows users to post questions, answers, comments, or feedback, and lets users like or dislike answers like other conventional discussion forum websites. Along with this, we attached a feedback mechanism that automatically performs the Sentiment Analysis based on the user’s comments, social media posts, etc. This feedback is then used for further analysis of the answer. The answers and comments are then adjusted according to their rank in order to provide the best possible result. Hence, the ranking of answers not only depends on the likes and dislikes, but, also on the sentiments of the users on the desired post and the social media emotions and presence/relevance associated with a post, which makes our discussion forum unique as compared to other available discussion forums in the market. Thus, our discussion forum provides a smart website assistance platform which improves itself based on the likes, dislikes, feedback or comments, and sentiments of the users. The rest of the paper is organized as follows: Sect. 2 discusses the related work; Sect. 3 illustrates the architecture of the proposed system; Sect. 4 elaborates the implementation of system architecture; Sect. 5 discourses the results and analysis followed by Sect. 6, which concludes the paper and provides directions for future work.
2 Related Work Gokulakrishnan et al. [1] provided a model for Sentiment Analysis and opinion mining. Based on the classification of data into subjective - objective, or irrelevant, they applied positive and negative sentiment analysis, which delivered higher accuracy as per their results. Wen et al. [2] performed Sentiment Analysis on forum posts in a Massive Open Online Course (MOOC) to monitor students trending opinions towards the course, lecture, and peer-assessment. They also monitored how the opinions of students change over time. Maharani et al. [3] presented a syntactical based aspect and opinion extraction using decision tree and rule learning to produce a sequence labeling pattern set. They provided an exhaustive analysis of aspect extraction using typed–dependency and pattern base technique. The patterns were then utilized to recognize and find aspect term candidates in customer product reviews. Gojali and Khodra [4] underlined the significance of the aspect-based sentiment analysis when contrasted with the common opinion of the entire document. They proposed a method which discovers potential aspect and its sentiment. Moreover, they discover its polarity and categorizes it accordingly. Sahu and Ahuja [5] exhibited a system to determine the movie reviews polarity and used semantic techniques for data pre-processing. Furthermore, they used feature impact analysis to reduce the features and calculate their importance. The proposed system was assessed on an IMBD dataset, and it then accomplished an outcome of 88.951% accuracy. Sun et al. [6] demonstrated a series of opinion mining methods appropriate for various circumstances like sentence-level opinion mining, cross-domain opinion mining, and document-level opinion mining. Afterward, they also recorded some comparative and deep learning approaches of the same.
Though much work based on Sentiment Analysis has been carried out on datasets of various discussion forums, in this work we propose a system that automatically applies Sentiment Analysis to users’ comments or feedback available on the discussion forum website to rank the answers, in addition to the likes or dislikes of the users. Thus, the proposed discussion forum is unique as compared to other discussion forums available in the market.
3 Architecture The architecture of the proposed discussion forum is shown in the following Fig. 1. The figure explains how our discussion forum works.
Fig. 1. System Architecture
First, on the user side, the homepage is mounted on the root server and the user can add a question, comment, or answer, and is also able to provide feedback. All of these actions will be handled on the server through different routes. Second, all the queries will be performed on the model side and the database will be updated based on the input received on the view from the user. Third and last, on the view side, the updated template will be sent to the client side based on the template populated with information from the database. In parallel with the model side, the system simultaneously performs Sentiment Analysis using the IBM Watson Natural Language Understanding algorithm. The results of the Sentiment Analysis are then presented on a local host. The desired ratios, emotions and other aspects can be set beforehand.
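As a rough illustration of how the server-side sentiment step described above could be invoked, the sketch below calls IBM Watson Natural Language Understanding from Python; the API key, service URL, and version date are placeholders rather than values from this work, and the actual system sends the scraped content to Watson as a URL rather than as raw text.

```python
# Hedged sketch: calling IBM Watson NLU to score one forum comment.
# The API key, service URL, and version date are placeholders.
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import (
    Features, SentimentOptions, EmotionOptions)
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("YOUR_IAM_API_KEY")                 # placeholder
nlu = NaturalLanguageUnderstandingV1(version="2019-07-12",
                                     authenticator=authenticator)
nlu.set_service_url("https://api.us-south.natural-language-"
                    "understanding.watson.cloud.ibm.com")            # placeholder

def score_comment(comment_text):
    """Return the document-level sentiment score (-1 to 1) and emotions."""
    response = nlu.analyze(
        text=comment_text,
        features=Features(sentiment=SentimentOptions(),
                          emotion=EmotionOptions())).get_result()
    sentiment = response["sentiment"]["document"]["score"]
    emotions = response["emotion"]["document"]["emotion"]
    return sentiment, emotions
```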
4 Implementation
To implement the system architecture, we used HTML, CSS, JavaScript, jQuery, and Twitter Bootstrap for the front-end implementation. Bootstrap is used as it is a standardized format for creating responsive web pages with clean aesthetics/mobile-compatible websites. Moreover, we used Node.js, the Express framework, Socket.IO, MySQL, and Handlebars.js for the back-end implementation. Node.js is used as it asynchronously handles a large number of requests, thus making the website scalable and reducing the server size requirement. The Express framework is used as it provides a robust set of features to develop web and mobile applications. Socket.IO establishes a pipeline to carry out a live chat system. The following Fig. 2 shows a snapshot of the question–answer web page. This web page shows a question posted by a user. Another user can then post an answer to it by clicking on the ‘Post reply’ button. Moreover, the web page allows any user to like/dislike any answer. Additionally, users can also comment on the quality of answers posted by any user.
4.1 Dataset
In order to realize the effectiveness of our work, dummy data entries were made on the discussion forum website. Therefore, the entries may not be factually correct. Each entry may consist of questions followed by appropriate answers. The answers may or may not be factually relevant and hence, do not hold any significance. All this has been done to realize the effectiveness of our approach. The dummy dataset can be corrected or modified at a later stage, if required.
Fig. 2. Question–Answer Web Page
4.2 Data Pre-processing
The following Fig. 3 explains how Sentiment Analysis is performed on the discussion forum website data. As far as our discussion forum website is concerned, the required data is available as part of the user’s comment or feedback to any question posted over there. Once the data is available, it is scraped and sent to the IBM Watson tool as a URL. The system then automatically performs the data pre-processing using the IBM Watson Natural Language Understanding algorithm. This data pre-processing is a five-step process. As we all know, these days people use a lot of emojis during their conversations, and discussion forums are no different. Therefore, the first step includes converting all the emojis into their equivalent textual data so that our system can easily analyze them for Sentiment Analysis purposes. The second step includes converting all the textual data into its lowercase equivalent. In the third step, the system removes the non-ASCII
Fig. 3. Data pre-processing
characters because they are of no use in finding the sentiment score of a particular comment. The fourth step includes the removal of stop words. “Stop words are common English words such as the, am, their which do not influence the semantic of the review. Removing them can reduce noise” [7, 8]. Lastly, in the fifth step, lemmatization is performed by the system. “Lemmatization is the process of grouping together the inflected forms of a word so they can be analyzed as a single item” [8, 9]. All these steps are implemented in the system. Once all these steps are performed, the system then performs the Sentiment Analysis to determine the sentiment score.
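A minimal sketch of these five pre-processing steps is given below for illustration only; it assumes the third-party emoji and nltk packages as stand-ins for whatever the IBM Watson pipeline performs internally.

```python
# Hedged sketch of the five pre-processing steps; emoji and nltk are assumed
# stand-ins for the processing performed inside IBM Watson NLU.
import re
import emoji                                   # pip install emoji
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords"); nltk.download("wordnet")
STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(comment):
    text = emoji.demojize(comment, delimiters=(" ", " "))    # 1. emojis -> text
    text = text.lower()                                      # 2. lowercase
    text = text.encode("ascii", "ignore").decode()           # 3. drop non-ASCII
    tokens = [t for t in re.findall(r"[a-z]+", text)
              if t not in STOP_WORDS]                        # 4. remove stop words
    return " ".join(LEMMATIZER.lemmatize(t) for t in tokens) # 5. lemmatize

print(preprocess("The answer is great 😀, their explanation helped!"))
```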
4.3 Sentiment Analysis
Once the data pre-processing stage is over, the system determines the sentiment score in the range of −1 to 1 using the IBM Watson Natural Language Understanding (NLU) algorithm. The results of the Sentiment Analysis are presented on a local host.
4.4 Ranking of Answer
After the system determines the sentiment score corresponding to a given answer, we then adjust the rank of the answer based on the sentiment score along with the no. of likes and dislikes to that answer. The rank of a given answer is assigned according to the ranking score, which is calculated using the following formulas.

Resultant Likes/Dislikes Value (RLD) = No. of Likes − No. of Dislikes

Ranking Score of an answer (RS) = Sentiment Score + RLD / 10^(No. of Digits of RLD)
5 Result and Analysis
As we provided dummy entries to our discussion forum to cross-verify its effectiveness, the system automatically performed the Sentiment Analysis on the given comments [10–12]. For example, for the following comment shown in Fig. 4, corresponding to the question of Fig. 2, the system performed the Sentiment Analysis and its corresponding result is shown in Fig. 5.
Fig. 4. Dummy input comment
For this above input comment, the system calculated the sentiment score, which is as shown in following Fig. 5.
Fig. 5. Sentiment Score
Therefore, for the above input comment, the Ranking Score of the answer is 1.120914 as the answer had twenty-five likes and three dislikes.
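The following is a small sketch (not the authors’ code) of the ranking formula defined in Sect. 4.4; it reproduces the worked example above, using the sentiment score implied by that example (about 0.9009).

```python
# Hedged sketch of the ranking formula described above (not the authors' code).
def ranking_score(sentiment_score, likes, dislikes):
    rld = likes - dislikes                          # Resultant Likes/Dislikes Value
    digits = len(str(abs(rld))) if rld != 0 else 1  # number of digits of RLD (assumed 1 if RLD = 0)
    return sentiment_score + rld / (10 ** digits)   # RS = sentiment + RLD / 10^digits

# Worked example: sentiment score ~0.900914 (implied by Fig. 5), 25 likes, 3 dislikes.
print(ranking_score(0.900914, 25, 3))               # -> 1.120914
```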
6 Conclusion and Future Scope Our discussion forum website is working perfectly on localhost and is showing the questions along with their ranked answers based on the sentiment score of the comments along with no. of likes and dislikes to a given answer. The system allows users to post questions, answers, comments or feedback, and lets users like or dislike answers like other conventional discussion forum websites. Along with this, we attached a feedback mechanism that automatically performs the Sentiment Analysis based on the user’s comments. This Sentiment Analysis along with the no. of likes and dislikes to a given answer is then used to rank the answers in our system. Higher the rank means that the given answer is placed on top of other comparatively low ranked answers. Thus, this solution of ranking the answers on our discussion forum website makes them more reliable and authentic. Hence, our discussion forum provides a smart website assistance platform which improves itself based on the likes, dislikes, feedback or comments, and sentiments of the users. A possible direction of future work may include filtering out the spam data from our discussion forum website, which further improves its efficiency and performance.
References 1. Gokulakrishnan, B., Priyanthan, P., Ragavan, T., Prasath, N., Perera, A.: Opinion mining and sentiment analysis on a Twitter data stream. In: International Conference on Advances in ICT for Emerging Regions (ICTer2012) (2012) 2. Wen, M., Yang, D., Rose, C.: Sentiment analysis in MOOC discussion forums: what does it tell us? In: Educational Data Mining (2014) 3. Maharani, W., Widyantoro, D., Khodra, M.: SAE: syntactic-based aspect and opinion extraction from product reviews. In: 2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA) (2015) 4. Gojali, S., Khodra, M.: Aspect based sentiment analysis for review rating prediction. In: 2016 International Conference on Advanced Informatics: Concepts, Theory And Application (ICAICTA) (2016) 5. Sahu, T., Ahuja, S.: Sentiment analysis of movie reviews: a study on feature selection & classification algorithms. In: 2016 International Conference on Microelectronics, Computing and Communications (MicroCom) (2016) 6. Sun, S., Luo, C., Chen, J.: A review of natural language processing techniques for opinion mining systems. Inf. Fus. 36, 10–25 (2017) 7. Maalej, W., Nabil, H.: Bug report, feature request, or simply praise? On automatically classifying app reviews. In: 2015 IEEE 23rd International Requirements Engineering Conference (RE) (2015) 8. Bhatia, M.P.S., Kumar, A., Beniwal, A.: An optimized classification of app reviews for improving requirement engineering. Recent Adv. Comput. Sci. Commun. 13(1), 12 (2020) 9. Lemmatisation. https://en.wikipedia.org/wiki/Lemmatisation 10. Kumar, A., Bhatia, M.P.S., Beniwal, R.: Characterizing relatedness of web and requirements engineering. Webology, 12(1) (2015)
11. Bhatia, M.P.S., Kumar, A., Beniwal, R.: Ontology based framework for detecting ambiguities in software requirements specification. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 3572–3575. IEEE, March 2016 12. Bhatia, M.P.S., Kumar, A., Beniwal, R.: Ontology based framework for reverse engineering of conventional softwares. In: 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), pp. 3645–3648. IEEE March 2016
Certificate Management System Using Blockchain
Anjaneyulu Endurthi1(&) and Akhil Khare2
1 Computer Science and Engineering Department, Rajiv Gandhi University of Knowledge Technologies, Basar, Telangana, India [email protected]
2 Computer Science and Engineering Department, MVSR Engineering College, Hyderabad, Telangana, India [email protected]
Abstract. Overcoming the problems of existing file storage and sharing approaches, this paper gives an efficient way of storing and sharing the files using blockchain and smart contracts. As an example, Managing student’s information and sharing them or giving access to unauthorized people in a much secured way is one of the major responsibilities of any university or organization. Instead of storing the certificates in a normal database, blockchain technology can be used to store the certificates. In our approach, the certificates and other documents will be uploaded to the blockchain and will be shared securely when any third party needs to access the certificates. The university administrator has ultimate rights to upload student’s certificates and managing them. Students can view their documents and certificates; other unauthorized people (third party/organization/employer/others) can access the files after taking permission of the college administrator. These types of approaches are very useful in this era where efficient storage and sharing the files plays a crucial role. The feature which differentiates our approach of certificate management system from existing certificate management system is, our system is decentralized, cryptographically secure, immutable and efficient. Keywords: Blockchain Cryptocurrency Distributed ledger Node application Consensus Ethereum Solidity Smart contracts
1 Introduction
Blockchain [1] is a peer-to-peer, distributed ledger [2] that is cryptographically secure, append-only, immutable and update-able only via consensus or agreement among peers. It was first developed for the Bitcoin cryptocurrency [1]. The concept of blockchain technology was first introduced in 2008. Blockchain technology has many applications across different sectors like banking [3], hedge funds, Internet identity & DNS, health care [4], voting, messaging apps, real estate [5], critical infrastructure security, cryptocurrency, education & academia, cloud computing, stock trading etc. Ethereum [6, 7], launched in 2015, is the first blockchain to introduce a Turing-complete language. This is in contrast to the limited scripting language in Bitcoin and many other cryptocurrencies. With the help of Solidity programming [8], one can © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): SoCPaR 2019, AISC 1182, pp. 50–57, 2021. https://doi.org/10.1007/978-3-030-49345-5_6
develop decentralized applications. Ethereum provides a public blockchain to develop smart contracts [9, 10] – pieces of code permanently stored on the blockchain and capable of responding to users’ requests. In this paper, we introduce a novel approach to store and share students’ certificates in a secure way. The paper is organized as follows: Sect. 2 provides a technical overview of blockchain, Sect. 3 introduces the proposed scheme and its implementation details, the results of the implementation are discussed in Sect. 4, the advantages of the proposed system over the current system are discussed in Sect. 5, and Sect. 6 provides conclusions and future research directions.
2 Technical Overview
2.1 Blockchain
Blockchain is a distributed system that stores and records all the transactions occurring in the blockchain network. It acts as a database that stores all the transactions in the network; the database is replicated and shared among the several entities or participants of the network. Every participant can be visualized as a node in the blockchain. Every node contains a copy of the entire data of the blockchain. Blockchain is a digital ledger that records not only financial transactions but virtually everything of value. Blockchain is basically a chain of blocks in which each block records some transactions. Each block has a hash value. Each block is connected to its previous block by storing its hash value. The transactions in a blockchain are immutable and cannot be undone, which makes blockchain far more secure compared to other existing technologies. There are no centralized servers in blockchain, so every transaction is known by each and every node in the blockchain network. Blockchain uses the concept of Public Key Infrastructure [11]. There are generally two types of blockchain networks: public and private [12, 13]. In a public blockchain network, any anonymous person can join, view the transactions and make a new transaction. Examples of public blockchain networks are Bitcoin, Ethereum etc. In a private blockchain network, only users with permissions can join, view and send transactions. Examples of private blockchain networks are Everledger, Ripple etc.
2.2 Components of Blockchain
The main components of blockchain are as follows: Block. Each block consists of transactions/data along with the meta-data like hash of previous block, timestamp, nonce value and hash of the current block. These blocks use the concept of hashchain to form a chain of blocks or blockchain i.e. each block stores the hash of previous block. The very first block is called as the Genesis block or block 0, which is hardcoded at the time of creating the blockchain (Fig. 1).
Fig. 1. Blockchain
Cryptographic Hash Function. Hash function is any function which can convert an arbitrary sized data into a fixed size data. Hash functions provide confidentiality and integrity to the data. They possess following properties: • Pre-image resistant • Second pre-image resistant • Collision resistant A specific cryptographic hash function used in many blockchain implementations is the Secure Hash Algorithm (SHA) with an output size of 256 bits (SHA-256). Asymmetric-Key Cryptography. Unlike Symmetric key crypto-system which uses only one key to encrypt the data as well as to decrypt the data, Asymmetric key cryptosystem consists of two keys, one of which is used to encrypt the data and other key is used to decrypt the data. These keys are called as private and public keys. This provides confidentiality and authentication to the data. Distributed Ledger. This is a type of database which is shared, replicated and synchronized among the members (or nodes) of a decentralized network. Node Application. Each node (member of network) has to install and run a node application related to that particular blockchain. Consensus Algorithm. Consensus refers to agreement between different people/nodes over the set of rules and regulations to implement a blockchain. Smart Contracts. This is the code/programs written in various high level languages like Golang, JavaScript, Solidity etc. The system of physical contracts can be replaced with the help of smart contracts.
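To make the block structure described above concrete, the following minimal Python sketch shows a hash chain in which each block stores the SHA-256 hash of the previous block; it is only an illustration, not the Ethereum implementation used by the proposed system.

```python
# Minimal illustrative hash chain (not the Ethereum implementation used here).
import hashlib
import json
import time

def block_hash(block):
    # Hash the block's contents (data, previous hash, timestamp, nonce).
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def new_block(data, previous_hash, nonce=0):
    return {"data": data, "previous_hash": previous_hash,
            "timestamp": time.time(), "nonce": nonce}

genesis = new_block("genesis", previous_hash="0" * 64)          # block 0
cert_block = new_block({"student": "S101", "cert_hash": "Qm..."},
                       previous_hash=block_hash(genesis))
chain = [genesis, cert_block]

# Tampering with the genesis data invalidates every later link in the chain.
genesis["data"] = "forged"
print(cert_block["previous_hash"] == block_hash(genesis))       # -> False
```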
3 Proposed System One of the major responsibilities of any university is to organize student’s files and certificates. In the existing system, the certificates are stored digitally in a normal database. Whenever any third party (organization or employer or others) wishes to access or verify the certificates of a particular student, then the university may digitally
send a certificate to the third party or send a link to the third party, using which the third party can access and verify the certificates. The proposed system considers three categories of people: the administrator, students and third party.
3.1 Approach
The administrator adds student details and certificates in the blockchain in a secure way. Whenever the administrator adds the details along with the certificate, a hash value is generated and stored in the blockchain as a transaction. Along with the hash value, a time stamp is also created and stored. The administrator provides a login to the students of the organization so that they can access and verify their details and certificates. If any third party wishes to access and verify the certificates, they have to send a request to the administrator with that particular student’s ID or roll number. The administrator verifies the need, and accordingly a user ID and password are generated by the system and shared with the third party. The proposed system restricts the number of times the certificates can be accessed and also gives a time frame after which the user ID and password are invalidated. In the next subsections, the prerequisite software, the installation of IPFS (InterPlanetary File System) [14], the working of IPFS, certificate management, and the permissions associated with the three categories are discussed.
3.2 Prerequisites
To implement the proposed system, the following packages/software are required (our implementation uses the given versions; based on compatibility and availability, newer versions can be used): Node package manager (version 6.4.1), which can be installed using nvm, Node (version v10.10.0), Solc compiler (version 0.4.25), Web3 (version 1.0.0-beta.37), Next.js and semantic-ui-react, React.js, Metamask Wallet, Rinkeby ethers and ipfs-api.
3.3 Installing IPFS
IPFS (InterPlanetary File System) can be used to store and share hypermedia in a distributed file system. Go to ipfs.io to download the software compatible with the platform. IPFS needs to be initialized before we can use it: open a terminal and type “ipfs init”. The IPFS daemon must be running to use the system; run the following command in a separate terminal to start the IPFS daemon: “ipfs daemon”.
3.4 Working of IPFS
Initially certificates will be stored using IPFS. IPFS gives a unique hash to each file. The hash is totally different even if there is a difference in any single character. Hence, IPFS can use the content of the file to locate its address, instead of using a domain name just like what HTTP does. IPFS removes redundant files in the whole network. Every edit history is recorded and can be easily traced back. When a search query is fired, IPFS searches for the document based on its hash. As the hash is unique, it is easy
to make a query. We also have IPNS to locate an IPFS hash. All of the nodes in IPFS store a hash table to record the corresponding location of the file. These hashes will be stored in the blockchain.
3.5 Adding a Certificate
Certificates will be added by the administrator. He/she can add any certificate image to IPFS by using the following command: “ipfs add imagename.png”. After adding an image to IPFS, a hash of the file or image is generated by the system. Following is a sample hash value for a file: “QmR1VhhtXVvzw5zc12wj1CtohKh6DU7Y1MHpqRdhmrTZ7Y”.
3.6 Accessing the Stored Certificate
The administrator can get a file or image from IPFS by using the hash value of that file. The following command can be used to access or download the file or image: “ipfs get QmR1VhhtXVvzw5zc12wj1CtohKh6DU7Y1MHpqRdhmrTZ7Y”.
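For illustration, the two IPFS commands above can also be driven from application code; the sketch below simply wraps the CLI with Python’s subprocess module as a stand-in for the ipfs-api package listed in the prerequisites.

```python
# Hedged sketch: wrapping the IPFS CLI shown above; the real system uses the
# ipfs-api Node package, so this is only an illustrative stand-in.
import subprocess

def ipfs_add(path):
    # With -Q (quieter), `ipfs add` prints only the final content hash.
    out = subprocess.run(["ipfs", "add", "-Q", path],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def ipfs_get(file_hash, output_dir="."):
    # Download the content identified by the hash into output_dir.
    subprocess.run(["ipfs", "get", file_hash, "-o", output_dir], check=True)

cert_hash = ipfs_add("imagename.png")   # e.g. "QmR1Vhht..." as in the text
ipfs_get(cert_hash)                     # retrieves the certificate back
```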
3.7 Permissions
The proposed system considers three categories of people: the administrator, students and third parties. Every category has some special permissions. The administrator is the main entity; he has permission to upload the certificates, create accounts for students, and can give access to students and others to access and verify the certificates. A student only has permission to view his/her documents that were uploaded by the administrator. Initially, others do not have permission to access students’ documents but can get read-only permission from the administrator upon request.
4 Results
The proposed system is implemented on the Ethereum platform. Following are the results of the same.
4.1 Administrator
As the administrator has all the permissions, he/she manages the other two categories and can change the rules and laws using Solidity programming [8]. The administrator gets requests from third parties to get access for a particular student.
4.2 Student
Each student is provided with their own credentials to login into their accounts and they can view their certificates uploaded by college administrator (Figs. 2 and 3).
Fig. 2. Home page of administrator and the requests sent by third party
Fig. 3. Home page of student
4.3 Third Party
If any third party such as any organization, employer or others want to access or verify any particular student’s certificates, they can do so in their home page (Fig. 4).
Fig. 4. Home page of Guest (Third party)
5 Advantages of the Proposed System
The current system of storing the certificates in a normal database has many shortcomings in terms of security, efficiency and availability. It is also difficult to keep track of people who have accessed the data, as the logs can be modified by the attacker. It is also possible to modify even the certificates, and no one knows whether they are original or not. The proposed system has the following advantages over the current system.
• Immutability. Once the certificates have been stored in the blockchain, it is computationally difficult to modify them: as every block stores the hash of the previous block, the attacker has to change all the blocks/certificates in the blockchain, which is computationally difficult.
• Transparency and trust. Any third party or student is able to see and verify the data which is present in the blockchain, and this establishes trust.
• Highly secure. All the certificates stored in the blockchain are cryptographically secure.
• Audit/Trail. The logs of people who have accessed the certificates can be stored in the blockchain itself. Thus the administrator can verify who has accessed the certificates.
6 Conclusions and Future Research Directions
The current system of certificate management is volatile: if an attacker wishes to hack into the system and forge the documents, he/she can easily do so by attacking the central node running the system, whereas the proposed system runs on blockchain and provides far better security. It also provides transparency to the participants, making it an open and fair alternative to the current system. As blockchain is immutable, similar types of systems can be developed to store data like health care records, land records, DNS, banking transactions etc.
References 1. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System (2008) 2. Walport, M.: Distributed ledger technology: beyond block chain, January 2016 3. Kelly, J., Williams, A.: Forty Big Banks Test Blockchain-Based Bond Trading System (2016) 4. Kar, I.: Estonian Citizens Will Soon Have the World’s Most Hack-Proof Health-Care Records (2016) 5. Oparah, D.: 3 Ways That the Blockchain Will Change the Real Estate Market (2016) 6. Buterin, V., et al.: A next-generation smart contract and decentralized application platform. White Paper (2014) 7. Ethereum: State of knowledge and research perspectives Sergei Tikhomirov, SnT, University of Luxembourg (2017) 8. Dannen, C.: Introducing Ethereum and Solidity. Springer, Berkeley (2017) 9. Szabo, N.: Smart Contracts (1994)
10. Szabo, N.: The Idea of Smart Contracts (1997) 11. Housley, R.: Public Key Infrastructure (PKI). Wiley, Hoboken (2004) 12. Jayachandran, P.: The difference between public and private blockchain. IBM Blockchain Blog, vol. 31, May 2017 13. Swan, M.: Blockchain: Blueprint for a New Economy. O’Reilly Media, Inc., Sebastopol (2015) 14. Benet, J.: IPFS - Content Addressed, Versioned, P2P File System (2014)
Reality Check in Virtual Space for Privacy Behavior of Indian Users of Social Networking Sites
Sandeep Mittal1(&) and Priyanka Sharma2
1 Cyber Security and Privacy Researcher, NICFS (MHA), New Delhi, India [email protected]
2 I.T. and Telecommunication, Raksha Shakti University, Ahmadabad, India
Abstract. The users of social networking sites intentionally or unintentionally reveal a large amount of personal information about themselves. These SNSs’ users have certain clues about the attitude of the persons with whom they interact in the physical world, which are missing during online interaction. Therefore, their attitude in maintaining the privacy of personal information in virtual space needs to be understood. The present study is a maiden attempt to understand the privacy attitude of the SNSs’ users in an online environment. The present study has identified and validated significant trends in privacy attitudes of Indian users of social networking sites and would serve as a starting point for future research.
Keywords: Information privacy · Data privacy attitude · Data privacy · Social networking sites
1 Introduction What could have been the common ground in the stories of Sir John Sayer, the illustrious spy who could not become the chief of MI6 and the Officer Trey Economidy of the Albuquerque Police, who was dismissed from the job? Yes, it was their lack of understanding of the sharpness of the two edges of the ‘social-media-sword’. The behaviour of user of social networking sites is very crucial in virtual social interactions not only for common man but also for a law enforcement officer. The issues involved in handling social media are multiple and complex, but the attitude of user of social networking sites is one that is most generic. The concern about the privacy of the members of the civil society has been pondering over the minds of the citizens, thinkers, intellectuals, governments and lawmakers alike during the historical past and present, and perhaps would continue to be an important social and individual concern in future in any civil society. The general privacy beliefs are results of complex interaction of social norms and moral value beliefs often mediated in space and time by a number of social variables at individual and collective levels. In real-life social interactions, the individuals have a control over the personal information shared amongst each other. The personal information thus shared in physical world has a limited and slow flow to others and generally dissipates © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): SoCPaR 2019, AISC 1182, pp. 58–70, 2021. https://doi.org/10.1007/978-3-030-49345-5_7
with time with no trace after a relatively reasonable time span. Its impact on a person’s reputation is also relatively limited to a relatively close social- circle. The rise of the Internet, Web 2.0 and easy availability of smart devices has resulted in an era of privacy development where the use of social network(ing) sites (SNSs) like Facebook, LinkedIn, Twitter etc. for exchanging information in virtual space has become the norm. These SNSs are used to maintain networks for exchange of information on anything under the sun, be it the innocent exchange of academic ideas or planning a terrorist attack around the globe. The personal information exchanged over such SNSs generically differ from that in real world in that the persons exchanging information are not face to face with each other thus compromising the real world controls on the information, travels fast and far beyond the control of anyone and has perpetual availability on internet in accordance with the adage “God forgives and forgets but internet never does…..”. The user generated content, mostly beyond the knowledge and comprehension of SNSs’ users, and algorithms of the web aggregating services further worsens the privacy scenario today with SNSs sensing the every breath and the every step one takes in real life. The general privacy, initially defined either by value-based approach or cognatebased approach, gradually shifted in present information era to ‘privacy as a right’ concept to “control physical space and information” [1]. The protection of privacy and confidentiality of this personal data at residence and in motion within and across the borders is a cause of concern. In India, until the recent judgment by the ‘Nine Judges Constitutional Bench’ of Hon’ble Supreme Court of India [2], the right to privacy was not even recognized as a fundamental right and a data privacy legal framework is still lacking. This judgment has recognized right to privacy as a fundamental constitutional right in India and has directed Government of India to put in place, a robust data privacy regime expeditiously for which Government of India has constituted a Committee called ‘Justice B. N. Srikrishna Committee’ [3]. As the current process of drafting a data privacy framework in India has commenced, the present study, which is part of a larger study undertaken by authors, is scoped to understand the privacy attitudes of the Indian users of the SNSs.
2 The Literature Review
2.1 The Definition of Privacy
A perusal of the scholarly reviews on privacy reveals mainly two approaches to defining the general privacy, viz., value-based and cognate-based, the former being more prevalent in legal, sociological and political studies while the latter being more explored in psychological studies. In the present study a mix of these two approaches is used to explore the cognitive aspect (attitudes towards privacy) and the right-based aspect (expectations from law to protect privacy). As cognate-state approach, the general privacy is defined as “a state of limited access to a person” which narrowed down to Information systems broadly translates to “a state of limited access to information” [3]. As cognate-control approach the general privacy is defined as “the
selective control of access to the self” [4] and as “control of transactions between person(s) and other(s), the ultimate aim of which is to enhance autonomy or/and to minimize vulnerability” [5]. As a right-based approach, the general privacy is treated differently in different parts of the world, e.g., in the EU, privacy is seen as a fundamental human right; while in the U.S., privacy is seen as a commodity subject to the market and is cast in economic terms.
2.2 The Privacy and the Social Network(ing) Sites
In the course of social interactions in the physical world, while an individual uses his physical senses to perceive and manage threats to his privacy, he has no such social and cultural cues to evaluate the target of self-disclosures in a visually anonymous online space of SNSs. Therefore, while the cognitive management of protection of privacy in the offline world is performed unconsciously and effortlessly, deliberate actions are required for effective self-protection on SNSs [6]. These deliberative actions can be understood in terms of the “Theory of Planned Behavior” (TPB) [7], which stipulates that “an individual’s intention is a key factor in predicting his or her behavior”.
2.3 Attitudes Towards Privacy on SNSs
Only a few empirical studies across disciplines have been conducted to understand the attitudes on privacy and data privacy protection laws in jurisdictions worldwide. A few findings relevant to the present work are enumerated here, (a) Level of concern for privacy of SNSs’ users is associated with their level information disclosure [8]. (b) SNSs’ users change their default settings as per their awareness of privacy setting and need [9, 10]. (c) Greater information disclosure by SNSs improves perceived trust of SNSs’ users [11]. (d) In protecting privacy of SNSs’ users, Privacy Policies of SNSs helps [12]. (e) Disclosure of personal information on SNSs act as a bargaining process where privacy is outweighed by perceived benefits and gratifications of networking [13]. (f) Privacy concern of SNSs’ users improves with enhanced knowledge and experience of using the Internet [14]. (g) SNSs’ user’s privacy behavior is influenced by demographic factors [15]. In India, present authors, as part of larger study, have explored the attitude of Indian users of social networking sites with regard to trends in privacy behavior and thought process on need for a data privacy law in India [15–17]. The present work is part of this larger study.
3 The Research Methodology
The population for the present study is the users of the SNSs in India grouped into five strata, namely, Law Enforcement Officers, Judicial and Legal Professionals, Academicians, Information Assurance and Privacy Experts, and the Internet Users (other than those listed in the strata above) in India, adopting a disproportionate, stratified, purposive, convenience mixed sampling technique, and a statistically adequate sample size of 385 having 95% Confidence Level, 5% Margin of Error (Confidence Interval), 0.5 Standard Deviation and 1.96 Z-score was calculated. A questionnaire was designed for this study by incorporating modified questions based on the Eurobarometer [18], adapted to the Indian context and limited to the objectives of the present study. The variables included in the tool can be categorized as nominal and ordinal variables. A pilot study was conducted and the reliability of the instrument was checked by running a reliability analysis, which returned a Cronbach Alpha value of 0.700; the scale was then adjusted and a Cronbach Alpha value of 0.795 was obtained, which is well within the acceptable norms.

where μ > −1, ν > −1 are adjustable parameters controlling the shape of the polynomials. The discrete Hahn polynomials satisfy the orthogonality condition
\sum_{a=0}^{N-1} \rho_p(a)\, h_p^{(\mu,\nu)}(a,N)\, h_q^{(\mu,\nu)}(a,N) = d_p^2\, \delta_{pq}    (4)
where ρs(a) is the so-called weighting function, which is given by
\rho_s(a) = \frac{1}{a!\,(a+\mu)!\,\Gamma(N+\nu-a)\,\Gamma(N-s-a)}    (5)
and the square norm d_s^2 has the expression
d_s^2 = \frac{\Gamma(2N+\mu+\nu-s)}{s!\,(2N+\mu+\nu-2s-1)\,\Gamma(N+\mu+\nu-s)\,\Gamma(N+\nu-s)\,\Gamma(N-s)}    (6)
To avoid numerical fluctuations in moment computation, the Hahn polynomials are usually scaled by utilizing the square norm and the weighting function, such that
\tilde{h}_s^{(\mu,\nu)}(a,N) = h_s^{(\mu,\nu)}(a,N)\,\sqrt{\frac{\rho_s(a)}{d_s^2}}    (7)
Therefore, the orthogonality of the normalized Hahn polynomials can be described as
\sum_{a=0}^{N-1} \tilde{h}_p^{(\mu,\nu)}(a,N)\, \tilde{h}_q^{(\mu,\nu)}(a,N) = \delta_{pq}    (8)
Given a digitized image f(x, y) of size N × N, the (p + q)th order Hahn moment of the image is
H_{pq} = \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} \tilde{h}_p^{(\mu,\nu)}(x,N)\, \tilde{h}_q^{(\mu,\nu)}(y,N)\, f(x,y)    (9)
This study proposes the extension of Hahn moments to 3D images. The proposed 3D Hahn moments adopt the generalization of n-dimensional moments on a cube [31,32], and are defined as
H_{pqr} = \sum_{x=0}^{N-1}\sum_{y=0}^{N-1}\sum_{z=0}^{N-1} \tilde{h}_p^{(\mu,\nu)}(x,N)\, \tilde{h}_q^{(\mu,\nu)}(y,N)\, \tilde{h}_r^{(\mu,\nu)}(z,N)\, f(x,y,z)    (10)
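A compact sketch of Eq. (10) is shown below. It assumes a routine hahn_tilde(s, N) that returns the normalized Hahn polynomial values of Eq. (7) (the recurrence used to evaluate them is not reproduced here), and accumulates the triple sum as a tensor contraction.

```python
# Hedged sketch of Eq. (10): 3D Hahn moments of a voxel image f of size N^3.
# hahn_tilde(s, N) is an assumed callable returning h~_s(a, N) for a = 0..N-1
# (Eq. (7)); its recurrence relation is not reproduced here.
import numpy as np

def hahn_moments_3d(f, max_order, hahn_tilde):
    N = f.shape[0]                                    # f is an N x N x N array
    H = np.array([hahn_tilde(s, N) for s in range(max_order + 1)])  # (orders, N)
    # Contract the image against the polynomial values along x, y and z:
    # moments[p, q, r] = sum_xyz h~_p(x) h~_q(y) h~_r(z) f(x, y, z).
    return np.einsum("px,qy,rz,xyz->pqr", H, H, H, f)

# Example usage with a placeholder voxel image (not data from this study):
# f = np.random.randint(0, 2, size=(32, 32, 32)).astype(float)
# M = hahn_moments_3d(f, max_order=8, hahn_tilde=my_hahn_tilde)
```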
3 Experimental Setup
With the goal stated in the section above, an empirical comparative study must be designed and conducted extensively and rigorously. A detailed description of the experimental method is provided in this section.
3.1 Dataset Collection
This section describes the process of transforming the molecular structure of ATS drugs into 2D and 3D computational data representations, as outlined in [44]. The ATS dataset used in this study comes from [1], while 60 non-ATS drug molecular structures are randomly collected from [45]. After the voxel data has been generated, 3D geometric, complex, Legendre, Zernike, and Hahn moments are calculated up to the 8th order, which produces 165 features. While the features of 3D geometric, Legendre, and Hahn moments are real numbers, 3D complex and Zernike moments, on the other hand, are complex numbers. Therefore, these complex numbers must be transformed into real numbers, because most pattern recognition tasks are only capable of handling real numbers. Ref. [46] proposed a method consisting of four techniques to represent a complex number as a real number, and found Cartesian bit interleaving to be the best representation technique. The values of the zeroth-order moments of ecstasy for each 3D moments technique, represented using Cartesian bit interleaving, are shown in Table 1.
Table 1. Cartesian bit interleaved values of zeroth-order moments of ecstasy for each 3D moments.
3D moments | Original number | Represented number
Geometric | 306425 | 42545721700200699567041133799352041472
Complex | 16130711836.218561 | 42576847550484374798153183560267891362
Legendre | 0.000285380519926548 | 14175173924443230618113893434503725056
Zernike | 7708.229987404831 | 42538108148786362155157822007266511528
Hahn | 0.12138471769954105 | 14177782865744079550609449631697511082
3.2 Operational Procedure
The traditional framework of pattern recognition tasks, comprising pre-processing, feature extraction, and classification, will be employed in this paper. Therefore, this paper will compare the performance of the existing and proposed 3D moments. All extracted instances were tested using the training and testing datasets discussed earlier for processing time, memory consumption, intra- and inter-class variance, and classification of drug molecular structures using a leave-one-out classification model, all of which were executed 50 times. To justify the quality of the features from each moments technique in terms of intra- and inter-class variance, the quartile coefficient of dispersion (QCD) of the normalized median absolute deviation (NMAD) is employed. The intra- and inter-class variance is a popular choice for measuring the similarity or dissimilarity of a representation technique [47]. The QCD measures dispersion and is used to make comparisons within and between data sets [48]. Meanwhile, the median absolute deviation (MAD) is a robust alternative to the standard deviation as it is not affected by outliers [49]. However, the MAD may differ across different instances; therefore it should be normalized to the original ith feature to achieve consistency for different data, such that
\mathrm{NMAD}_i = \frac{\mathrm{MAD}_i}{x_i}    (11)
In this study, the intra-class variance is defined as the QCD of the NMAD for the ith feature of a molecular structure compared against intra-class molecular structures, and the inter-class variance is defined as the QCD of the NMAD for the ith feature compared against inter-class molecular structures. On the other hand, the features are tested in terms of classification accuracy against a well-known classifier, Random Forest (RF) [50], from the WEKA Machine Learning package [51]. RF is employed in this study because previous studies conducted by [35,52,53] have found that RF is the most suitable classifier for molecular structure data. In this study, the number of trees employed by RF is 165, equal to the number of attributes of all 3D moments.
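The sketch below illustrates, under the definitions above, how the NMAD of Eq. (11) and its quartile coefficient of dispersion might be computed with NumPy; the exact normalization used by the authors is only indicated by Eq. (11), so the division by the feature median here is an assumption.

```python
# Hedged sketch of Eq. (11) and the quartile coefficient of dispersion (QCD);
# dividing MAD by the feature median is an assumed reading of Eq. (11).
import numpy as np

def nmad(feature_values):
    # Median absolute deviation, normalized to the feature (Eq. (11)).
    med = np.median(feature_values)
    mad = np.median(np.abs(feature_values - med))
    return mad / med if med != 0 else mad

def qcd(values):
    # Quartile coefficient of dispersion: (Q3 - Q1) / (Q3 + Q1).
    q1, q3 = np.percentile(values, [25, 75])
    return (q3 - q1) / (q3 + q1)

# Intra-class variance of one molecule: QCD of the per-feature NMADs computed
# against molecules of the same class (X_same has shape [molecules, features]).
# intra = qcd([nmad(X_same[:, i]) for i in range(X_same.shape[1])])
```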
4 Results and Discussion
The existing and proposed 3D moments are evaluated numerically in this section to assess their merit and quality in representing the molecular structure. Table 2 presents the average processing time, memory consumption, the intra-class variance ratio relative to the total number of features, and the average classification accuracy over 50 executions.

Table 2. Average of processing time, memory consumption, and intra-class variance ratio of 3D moments.
3D moments | Processing time (ns/voxel) | Memory consumption (bytes/voxel) | Intra-class variance ratio | Classification accuracy
Geometric | 2 | 425 | 92.37% | 64.52%
Complex | 39 | 841 | 77.58% | 60.43%
Legendre | 16 | 1195 | 67.88% | 73.50%
Zernike | 59 | 4405 | 64.24% | 71.58%
Hahn | 201 | 1433 | 2.42% | 62.90%
The results presented in Table 2 show that 3D Hahn moments perform slowest and have the lowest value of the intra-class variance ratio among the 3D moments, although they require less memory than Zernike moments and their classification accuracy is higher than that of 3D complex moments. The slow performance of 3D Hahn moments is attributed to their time-consuming polynomial computation. Since the classification accuracy is the primary interest of this study, it should also be validated statistically. Prior to performing the statistical validation, the classification accuracy results should be tested for normality. If the results are normally distributed, parametric tests, such as ANOVA [54], can be used to validate the classification accuracy; otherwise, non-parametric tests should be used instead. In this study, the normality of the classification accuracy is tested using the Shapiro–Wilk test of normality [55]. The result of the test of normality is presented in Table 3, which shows that the classification accuracies for all 3D moments are normally distributed, since the p value of the Shapiro–Wilk test is greater than 0.05.

Table 3. Tests of normality results.
3D moments | Statistic | df | Sig. (p)
Geometric | 0.975 | 50 | 0.376
Complex | 0.978 | 50 | 0.480
Legendre | 0.973 | 50 | 0.305
Zernike | 0.975 | 50 | 0.353
Hahn | 0.960 | 50 | 0.092
However, based on the results shown in Table 4, there is no homogeneity of variances between the groups of 3D moments (p ≤ 0.05); therefore the assumption of ANOVA has been violated and the robust tests of equality of means must be used instead. From the results shown in Table 5, there is a statistically significant effect on classification accuracy [F(4, 122.013) = 409.479, p = 0] at the p < 0.05 level. Post-hoc comparisons using the Games–Howell test [56], shown in Table 6, indicated that the mean classification accuracy of 3D Hahn moments (62.90% ± 0.256%) was statistically significantly worse than that of the other candidates (p < 0.05), except for 3D complex moments (60.43% ± 0.289%, p = 0), which 3D Hahn moments outperformed.

Table 4. Test of homogeneity of variances results.
Levene statistic | df 1 | df 2 | Sig.
2.918 | 4 | 245 | 0.022

Table 5. Robust tests of equality of means results.
 | Statistic(a) | df 1 | df 2 | Sig.
Welch | 409.479 | 4 | 122.013 | 0
a. Asymptotically F distributed.
Table 6. Post-hoc test results using Games–Howell tests for 3D Hahn moments vs. other 3D moments.
Opposing 3D moments | Mean difference | Std. error | Sig. | 95% CI lower bound | 95% CI upper bound
Geometric | −0.01617 | 0.0046 | 0.01 | −0.029 | −0.0033
Complex | 0.02467 | 0.0039 | 0 | 0.0139 | 0.0354
Legendre | −0.10600 | 0.0037 | 0 | −0.1162 | −0.0958
Zernike | −0.08683 | 0.0039 | 0 | −0.0978 | −0.0759
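The normality and homogeneity checks summarized in Tables 3 and 4 can be reproduced in outline with SciPy, as sketched below; the accuracy arrays are placeholders rather than data from this study, and the Welch and Games–Howell steps are omitted.

```python
# Hedged sketch of the statistical checks behind Tables 3-4; the acc_* arrays
# are placeholders for the 50 accuracy values recorded per 3D moments technique.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
acc_geometric = rng.normal(0.6452, 0.003, 50)   # placeholder samples
acc_hahn = rng.normal(0.6290, 0.003, 50)        # placeholder samples

# Shapiro-Wilk test of normality (Table 3): p > 0.05 means no evidence
# against normality for that technique's accuracies.
stat, p = stats.shapiro(acc_hahn)
print(f"Shapiro-Wilk: W = {stat:.3f}, p = {p:.3f}")

# Levene's test for homogeneity of variances across techniques (Table 4);
# p <= 0.05 motivates the robust Welch test and Games-Howell post-hoc tests.
stat, p = stats.levene(acc_geometric, acc_hahn)
print(f"Levene: W = {stat:.3f}, p = {p:.3f}")
```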
Although the performance achieved is not particularly high, this study nevertheless proposes a new 3D moments technique and shows that the proposed 3D Hahn moments possess certain potential to be explored in the future, most notably regarding their invariance properties.
5 Conclusion
A new 3D moments technique to represent ATS drug molecular structures, namely 3D Hahn moments, has been proposed, and an extensive comparative study against the existing 3D moments has been presented in this paper. Although the experiments have shown that the proposed technique performs rather unexceptionally compared to the existing 3D moments in terms of processing time, memory consumption, intra- and inter-class variance, and, more importantly, classification accuracy, this study nonetheless serves as a basis towards a better 3D molecular structure representation, especially on using continuous orthogonal moments defined on a cube. Hence, future works to extend the proposed technique so that it has invariance properties, as well as to better represent the molecular structure based on this preliminary study, are required. The proposed feature extraction technique will be further validated in future works using classifiers specifically tailored for drug shape representation. Furthermore, ATS drug molecular structure data from the National Poison Centre, Malaysia, will also be used as an additional dataset in future works.
References 1. United Nations Office of Drugs and Crime: Recommended methods for the identification and analysis of amphetamine. Methamphetamine and their ring-substituted analogues in seized materials. UNODC, New York, USA (2006) 2. United Nations Office on Drugs and Crime: World drug report 2016. UNODC, Vienna (2016) 3. Cary, P.L.: Designer drugs: what drug court practitioners need to know. Drug Court Pract. Fact Sheet IX, 1–13 (2014) 4. Swortwood, M.J.: Comprehensive forensic toxicological analysis of designer drugs. Doctor of Philosophy, Florida International University, Florida, USA (2013) 5. Smith, M.C.F.: But what of designer drugs? Adv. Psychiatr. Treat. 17, 158 (2011) 6. Krasowski, M.D., Ekins, S.: Using cheminformatics to predict cross reactivity of “designer drugs” to their currently available immunoassays. J. Cheminform. 6, 22 (2014) 7. Petrie, M., Lynch, K.L., Ekins, S., Chang, J.S., Goetz, R.J., Wu, A.H., Krasowski, M.D.: Cross-reactivity studies and predictive modeling of “Bath Salts” and other amphetamine-type stimulants with amphetamine screening immunoassays. Clin. Toxicol (Phila) 51, 83–91 (2013) 8. Amine, A., Elberrichi, Z., Simonet, M., Rahmouni, A.: A hybrid approach based on self-organizing neural networks and the k-nearest neighbors method to study molecular similarity. Int. J. Chemoinform. Chem. Eng. 1, 75–95 (2011) 9. Martin, Y.C., Kofron, J.L., Traphagen, L.M.: Do structurally similar molecules have similar biological activity? J. Med. Chem. 45, 4350–4358 (2002) 10. Bender, A., Glen, R.C.: Molecular similarity: a key technique in molecular informatics. Org. Biomol. Chem. 2, 3204–3218 (2004) 11. Bender, A.: Studies on molecular similarity, p. 182. Darwin College, Doctor of Philosophy, University of Cambridge, Cambridge, UK (2005)
12. Consonni, V., Todeschini, R.: Molecular descriptors. In: Puzyn, T., Leszczynski, J., Cronin, T.M. (eds.) Recent Advances in QSAR Studies: Methods and Applications, pp. 29–102. Springer, Dordrecht (2010) 13. Axenopoulos, A., Daras, P., Papadopoulos, G., Houstis, E.N.: A shape descriptor for fast complementarity matching in molecular docking. IEEE/ACM Trans. Comput. Biol. Bioinform. 8, 1441–1457 (2011) 14. Estrada, E.: Generalized graph matrix, graph geometry, quantum chemistry, and optimal description of physicochemical properties. J. Phys. Chem. A 107, 7482– 7489 (2003) 15. Kortagere, S., Krasowski, M.D., Ekins, S.: The importance of discerning shape in molecular pharmacology. Trends Pharmacol. Sci. 30, 138–147 (2009) 16. de Oteyza, D.G., Gorman, P., Chen, Y.C., Wickenburg, S., Riss, A., Mowbray, D.J., Etkin, G., Pedramrazi, Z., Tsai, H.Z., Rubio, A., Crommie, M.F., Fischer, F.R.: Direct imaging of covalent bond structure in single-molecule chemical reactions. Science 340, 1434–1437 (2013) 17. Gross, L., Mohn, F., Moll, N., Schuler, B., Criado, A., Guitian, E., Pena, D., Gourdon, A., Meyer, G.: Bond-order discrimination by atomic force microscopy. Science 337, 1326–1329 (2012) 18. http://www.moleculardescriptors.eu/tutorials/T3 moleculardescriptors requirements.pdf 19. Randi´c, M.: Molecular bonding profiles. J. Math. Chem. 19, 375–392 (1996) 20. Sun, Y., Liu, W., Wang, Y.: United moment invariants for shape discrimination. In: International Conference on Robotics, Intelligent Systems and Signal Processing, pp. 88–93. IEEE (2003) 21. Kihara, D., Sael, L., Chikhi, R., Esquivel-Rodriguez, J.: Molecular surface representation using 3D Zernike descriptors for protein shape comparison and docking. Curr. Protein Pept. Sci. 12, 520–530 (2011) 22. Sael, L., Li, B., La, D., Fang, Y., Ramani, K., Rustamov, R., Kihara, D.: Fast protein tertiary structure retrieval based on global surface shape similarity. Proteins 72, 1259–1273 (2008) 23. Xu, D., Li, H.: Geometric moment invariants. Pattern Recogn. 41, 240–249 (2008) 24. Mezey, P.G.: Shape-similarity measures for molecular bodies: a three-dimensional topological approach to quantitative shape-activity relations. J. Chem. Inf. Comput. Sci. 32, 650–656 (1992) 25. Zhang, D., Lu, G.: Shape-based image retrieval using generic fourier descriptor. Sig. Process. Image Commun. 17, 825–848 (2002) 26. Muda, A.K.: Authorship invarianceness for writer identification using invariant discretization and modified immune classifier. Doctor of Philosophy, Universiti Teknologi Malaysia, Johor, Malaysia (2009) 27. Mezey, P.G.: Theorems on molecular shape-similarity descriptors: external TPlasters and interior T-Aggregates. J. Chem. Inf. Comput. Sci. 36, 1076–1081 (1996) 28. Liao, S.X.: Image analysis by moment. Doctor of Philosophy, University of Manitoba, Manitoba, Canada (1993) 29. Hu, M.-K.: Visual pattern recognition by moment invariants. IRE Trans. Inf. Theory 8, 179–187 (1962) 30. Alt, F.L.: Digital pattern recognition by moments. J. ACM 9, 240–258 (1962) 31. Flusser, J., Suk, T., Zitov´ a, B.: Moments and Moment Invariants in Pattern Recognition. Wiley, West Sussex (2009) 32. Flusser, J., Suk, T., Zitov´ a, B.: 2D and 3D Image Analysis by Moments. Wiley, West Sussex (2016)
33. Liao, S.X., Pawlak, M.: On image analysis by moments. IEEE Trans. Pattern Anal. Mach. Intell. 18, 254–266 (1996) 34. Pawlak, M.: Image analysis by moments: reconstruction and computational aspects. Oficyna Wydawnicza Politechniki Wroclawskiej, Wroclaw, Poland (2006) 35. Pratama, S.F., Muda, A.K., Choo, Y.-H., Abraham, A.: 3D geometric moment invariants for ATS drugs identification: a more precise approximation. In: Abraham, A., Haqiq, A., Alimi, A.M., Mezzour, G., Rokbani, N., Muda, A.K. (eds.) Proceedings of the 16th International Conference on Hybrid Intelligent Systems (HIS 2016), pp. 124–133. Springer, Cham (2017) 36. Abu-Mostafa, Y.S., Psaltis, D.: Recognitive aspects of moment invariants. IEEE Trans. Pattern Anal. Mach. Intell. PAMI 6, 698–706 (1984) 37. Flusser, J.: On the independence of rotation moment invariants. Pattern Recogn. 33, 1405–1410 (2000) 38. Teague, M.R.: Image analysis via the general theory of moments*. J. Opt. Soc. Am. 70, 920 (1980) 39. Teh, C.H., Chin, R.T.: On image analysis by the methods of moments. IEEE Trans. Pattern Anal. Mach. Intell. 10, 496–513 (1988) 40. Zhou, J., Shu, H., Zhu, H., Toumoulin, C., Luo, L.: Image analysis by discrete orthogonal Hahn moments. In: Kamel, M., Campilho, A. (eds.) Image Analysis and Recognition, vol. 3656, pp. 524–531. Springer, Berlin, Heidelberg (2005) 41. Mukundan, R., Ong, S.H., Lee, P.A.: Image analysis by Tchebichef moments. IEEE Trans. Image Process. 10, 1357–1364 (2001) 42. Yap, P.-T., Paramesran, R., Ong, S.-H.: Image analysis by Krawtchouk moment. IEEE Trans. Image Process. 12, 1367–1377 (2003) 43. Yap, P.T., Paramesran, R., Ong, S.H.: Image analysis using Hahn moments. IEEE Trans. Pattern Anal. Mach. Intell. 29, 2057–2062 (2007) 44. Pratama, S.F., Muda, A.K., Choo, Y.-H., Carb´ o-Dorca, R., Abraham, A.: Preparation of translated, scaled, and rotated ATS drugs 3D molecular structure for the validation of 3D moment invariants-based molecular descriptors. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 10, 57–67 (2018) 45. http://www.chemspider.com/ 46. Pratama, S.F., Muda, A.K., Choo, Y.-H.: Arbitrarily substantial number representation for complex number. J. Telecommun. Electron. Comput. Eng. 10, 23–26 (2018) 47. He, Z., Youb, X., Tang, Y.-Y.: Writer identification using global wavelet-based features. Neurocomputing 71, 1831–1841 (2008) 48. Bonett, D.G.: Confidence interval for a coefficient of quartile variation. Comput. Stat. Data Anal. 50, 2953–2957 (2006) 49. Rousseeuw, P.J., Croux, C.: Alternatives to the median absolute deviation. J. Am. Stat. Assoc. 88, 1273–1283 (1993) 50. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001) 51. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11, 10–18 (2009) 52. Pratama, S.F., Muda, A.K., Choo, Y.-H., Abraham, A.: Exact computation of 3D geometric moment invariants for ATS drugs identification. In: Sn´ aˇsel, V., Abraham, A., Kr˜ omer, P., Pant, M., Muda, A.K. (eds.) Innovations in Bio-Inspired Computing and Applications, vol. 424, pp. 347–358. Springer, Cham (2016) 53. Pratama, S.F., Muda, N.A., Salim, F.: Representing ATS drugs molecular structure using 3D orthogonal fourier-mellin moments. Int. J. Comput. Inf. Syst. Ind. Manag. Appl. 9, 135–144 (2017)
Using 3D Hahn Moments as a Computational Representation of ATS
101
54. Fisher, R.A.: Statistical Methods for Research Workers. Hafner Pub. Co., New York (1970) 55. Shapiro, S.S., Wilk, M.B.: An analysis of variance test for normality (complete samples). Biometrika 52, 591–611 (1965) 56. University of Dayton. http://academic.udayton.edu/gregelvers/psy216/spss/ 1wayanova.htm
Anomaly Detection Using Modified Differential Evolution: An Application to Banking and Insurance Gutha Jaya Krishna1,2 and Vadlamani Ravi1(B) 1
Center of Excellence in Analytics, Institute for Development and Research in Banking Technology, Hyderabad 500057, India [email protected] 2 School of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, India [email protected]
Abstract. We propose two Modified Differential Evolution driven subspace-based optimization models for anomaly detection in customer credit card churn detection, automobile insurance fraud detection and customer credit card default detection. The sparsity coefficient is chosen as the objective function for discovering anomalies. In addition, we employed an external performance measure, namely precision multiplied by recall, as a selection constraint at every iteration after a pre-specified iteration count. In terms of precision and Area Under the ROC Curve (AUC), the proposed technique outperformed several baseline anomaly detection algorithms, for example Local Outlier Factor, Angle-based Outlier Detection, K-means and Partition Around Medoids, as well as the proposed model without the external performance measure, indicating that the proposed method is a viable alternative for anomaly detection.
Keywords: Modified Differential Evolution · Anomaly detection · Banking · Insurance
1 Introduction
An anomaly is an atypical sample. Anomalies can be classified into samples drawn from an abnormal distribution or samples caused by an error. The data can be a combination of distributions or can be non-normal, which is the case most of the time, and any anomaly detection method can then find it hard to detect anomalies. When the number of dimensions increases, the data becomes sparse, and each sample seems to be a potential anomaly. Along these lines, the authors in [2] proposed a subspace-based anomaly detection method that employs the sparsity coefficient for discovering anomalies in the data. The sparsity coefficient, as in [12,13] and [2], was examined and chosen as the objective function.
To search the combinatorial number of subspaces, [2] employed a Genetic Algorithm (GA) [7]. However, the literature offers more powerful and stable optimization techniques than GA [6]. Therefore, we propose a population-based evolutionary computing technique, namely a modified version of Differential Evolution (DE) [21] as given in [12], for anomaly detection in banking and insurance data. Apart from the above changes, we employed a new external performance measure, precision multiplied by recall, to evaluate the detected anomalies within the selection operation of Modified Differential Evolution, as the ground truth is available. In Sect. 2, we review the literature. Section 3 presents the anomaly detection procedure as well as the details of the optimization algorithm employed. In Sect. 4, a description of the datasets analyzed is provided. In Sect. 5, the results and discussion are presented. Finally, Sect. 6 concludes the paper while proposing some future directions.
2 Related Work
In [5], GA was employed to create subsets of multi-case anomalies for continuous regression data, using the mean squared error as the objective for multi-case anomaly detection. In [23], GA was employed both for anomaly detection and feature selection in linear regression models by finding various potential groupings of anomalies and non-anomalies. In [17], a particle swarm optimization (PSO) based methodology for anomaly discovery was proposed which dynamically finds the separation measure rather than setting it manually by experimentation; this strategy was contrasted with the Local Outlier Factor (LOF). In [16], a method was proposed to discover least non-reducts based on rough sets by employing PSO. In [15], to recognize anomalies, a hybrid method of self-organizing maps (SOM) and PSO was proposed which performed better than the independent SOM and PSO. In [10], a local search based anomaly detection method was developed for categorical data, employing entropy as the objective function, and its outcomes were contrasted with those of LOF and K-Nearest Neighbor. In [14], a non-parametric kernel density technique was proposed to recognize anomalies, and this was contrasted with LOF and the Local Correlation Integral (LOCI). In [2,8,12], anomaly detection with the sparsity coefficient is specified. Reference [26] reviews different numerical anomaly detection algorithms and describes estimates like hubness, combinatorial explosion, recognizing relevant features, and so forth as challenges confronted while distinguishing anomalies in high dimensions. Reference [4] also reviews different anomaly discovery strategies and their possible applications. Our current work is conspicuously different from our previous works [12,13] in that we now include a selection constraint, namely the product of precision and recall. The datasets analyzed in this work, i.e. churn, insurance fraud and credit card default, are described in [22,25], where they are used in a classification setting. Here, these datasets are used in a clustering setting with the assumption that the
minority class samples are outliers. When we look at the results of the classifiers in [22,25], apart from under-sampling, they are marginal because the features are highly indiscriminative; achieving the same results or an improvement therefore proves the efficiency of the subspace clustering we employed.
3 Proposed Approach
Our proposed methodology for anomaly detection, employing MDE-based subspace detection, is illustrated in Fig. 1. Our assumption is that the minority class samples are anomalies.
Fig. 1. Proposed approach
3.1 Optimization Problem Formulation
Sparsity coefficient-based subspace detection is an extended variant of the Z-score [20] used to distinguish anomalies:

z-score = (x − μ) / σ

The objective function, i.e. the sparsity coefficient, is based on the normal distribution approximated by a binomial distribution with Bernoulli trials. The assumption is that the data is non-normal; therefore, the normal distribution approximated by a binomial distribution with Bernoulli trials is considered [18]:

y = (X − np) / √(np(1 − p))

where 'X' is a binomial random variable, 'n' is the population size and 'p' is the probability of success; 'y' has mean np and standard deviation √(np(1 − p)). To detect anomalies in non-normal data, by the heuristics of Chebyshev's inequality any sample falling outside [μ − 3σ, μ + 3σ] is treated as an anomaly. As indicated by Chebyshev's inequality, about 89% of the data falls inside the window between μ − 3σ and μ + 3σ.

S(D) = (N(D) − N·f^d) / √(N·f^d·(1 − f^d))

where S(D) is the sparsity coefficient, N(D) is the number of samples in the d-dimensional subspace, N is the total number of samples, and f is the expected fraction of observations in each data range, f = 1/∅. S(D) is to be minimised for the M-dimensional dataset (where 'M' is the number of dimensions of the dataset) by producing d-dimensional subspaces over the data ranges defined by the parameter ∅:

Minimize S(D), subject to M > 0, 0 ≤ x_i ≤ ∅, d = 2, 3, ..., d_fixed

When the subspace is sparse, N(D) tends to 0; therefore:

S(0) = −√(N / (∅^d − 1))

When a dataset is non-normal, a sample is an anomaly when it falls outside the limits μ − kσ and μ + kσ, where k = 3 is the number of standard deviations 'σ' from the mean 'μ'. By setting |S(0)| = k, we derive the equation below to fix the upper bound for 'd':

d = log_∅(N/k² + 1)
Here each dimension means a feature of the dataset. When x_i = 0, that feature's data range is not considered in the computation of S(D). Prior to optimization, we keep a record of the indices of the samples and sort the individual features in non-decreasing order. The sorted values of each feature are divided into '∅' equal parts. N(D) is then the intersection of the sample indices of the feature parts selected for the d-dimensional subspace. We begin by setting 'k' as required and then pick '∅' by trial and error, substituting into the above equation for 'd', which should not put 'd' below 2. Further, substituting k into S(D) (for N(D) = 1) should make S(D) exceed 'k'. The general process of optimization is as follows: the optimization technique creates a new vector, which can be repaired to meet the constraints of the problem; this vector is used to calculate the objective function, i.e. the sparsity coefficient; if the objective is better than that of the rest of the population, the vector is updated into the population; this is repeated until the stopping criterion is met.
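The following is a minimal Python sketch of how the sparsity coefficient could be computed for a single candidate subspace using the equi-depth binning just described. It is an illustration only, not the authors' R implementation; the function name, the encoding of a candidate as one bin index per feature (0 meaning the feature is unused), and the handling of subspaces with fewer than two active features are assumptions made for the example.

```python
import numpy as np

def sparsity_coefficient(data, candidate, phi):
    """S(D) for one candidate subspace.

    data      : (N, M) array of samples
    candidate : length-M integer vector; candidate[f] in 1..phi selects a data
                range of feature f, 0 means the feature is not used (x_i = 0)
    phi       : number of equi-depth parts each sorted feature is divided into
    """
    N = data.shape[0]
    selected = [(f, b) for f, b in enumerate(candidate) if b > 0]
    d = len(selected)
    if d < 2:
        return float("inf")            # subspaces with d < 2 are not considered

    # N(D): intersection of the sample indices of the chosen feature parts
    common = set(range(N))
    for f, b in selected:
        order = np.argsort(data[:, f])         # sort feature f (non-decreasing)
        parts = np.array_split(order, phi)     # phi equal (equi-depth) parts
        common &= set(parts[b - 1].tolist())

    n_d = len(common)
    f_d = (1.0 / phi) ** d                     # expected fraction f^d
    return (n_d - N * f_d) / np.sqrt(N * f_d * (1.0 - f_d))
```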
3.2 Modified Differential Evolution
MDE is employed to search the sparse subspaces with the sparsity coefficient as the objective function [12]. The crossover rate and mutation in MDE carry out the exploration and exploitation of sparse subspaces; along these lines, tuning of CR is very significant. MDE has a basic improvement in selection: it replaces the worst member in the population with the newly generated solution, rather than the current member as in DE.

A. Algorithm for MDE-Based Subspace Detection
1. Create the initial population of solutions randomly.
2. Repeat steps 3 to 7 until the standard deviation of the objective function values in the DE population is below a fixed threshold.
3. Pick three solution vectors p1, p2, p3 randomly.
4. Pick a number 'R' randomly from 1 to M.
5. Compute the potential new position y = [y_1, y_2, ..., y_M] by the following procedure:
   (a) For each 'i', generate r_i, a number drawn randomly between 0 and 1.
   (b) If r_i < CR or R = i, then set y_i = p1_i + F × (p2_i − p3_i); else y_i = x_i.
6. Compute fitness(y) using the sparsity coefficient.
7. If fitness(y) < fitness(x_worst), then replace the worst solution in the population with y. Here the fitness function is the sparsity coefficient.
   (a) Employ an external performance measure as a selection constraint, namely precision multiplied by recall, at every iteration after a pre-specified iteration count of 2000. Take a snapshot of the population and update the current value of the external performance measure.
where 'CR' is the DE parameter called the crossover rate, which lies between '0' and '1'. Additionally, 'F' is another DE parameter called the differential weight, which lies between '0' and '2'. Nonetheless, here we fixed F to '1' since this is an integer optimization problem. The standard deviation of the objective function values in the DE population falling below a fixed threshold is taken as the stopping criterion for DE.
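Below is a compact, hedged Python sketch of the MDE loop of Sect. 3.2, written to use the sparsity_coefficient function above as the fitness. Candidate vectors are integers in 0..∅, trial vectors are clipped back into that range as a simple repair step, and the selection-constraint snapshot (step 7a) is omitted; the population size, iteration cap and threshold values are placeholders, not the paper's settings.

```python
import numpy as np

def mde_subspace_search(fitness, M, phi, pop_size=50, CR=0.5, F=1, tol=1e-3, seed=0):
    """Modified DE: a better trial vector replaces the WORST population member."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, phi + 1, size=(pop_size, M))     # random initial population
    fit = np.array([fitness(ind) for ind in pop], dtype=float)

    for _ in range(20000):                                  # cap on iterations
        if np.std(fit) < tol:                               # stopping criterion
            break
        p1, p2, p3 = pop[rng.choice(pop_size, size=3, replace=False)]
        x = pop[rng.integers(pop_size)]                     # current target vector
        R = rng.integers(M)
        y = x.copy()
        for i in range(M):
            if rng.random() < CR or i == R:                 # crossover / mutation
                y[i] = p1[i] + F * (p2[i] - p3[i])
        y = np.clip(y, 0, phi)                              # repair to valid ranges
        fy = fitness(y)
        worst = int(np.argmax(fit))                         # minimisation problem
        if fy < fit[worst]:
            pop[worst], fit[worst] = y, fy                  # replace worst, not parent
    return pop[int(np.argmin(fit))]
```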
4 Description of Datasets
4.1 Churn Dataset
The churn dataset considered in the current work is from [22]. The dataset contains information on credit card churn of a Latin-American bank and comprises two classes, namely churners versus non-churners. The dataset contains 21 explanatory variables and a class (ground truth) variable. It comprises a total of 14,814 observations, of which 13,812 are non-churners and 1002 are churners. Therefore, 93.24% are non-churners and 6.76% are churners.
4.2 Insurance Fraud Dataset
The insurance fraud dataset considered in the current work is from [22]. This dataset fundamentally contains the data of various automobile insurance claims from the years 1994 to 1996. The dataset is preprocessed to contain 24 explanatory variables and a class (ground truth) variable. It comprises a total of 15,420 observations, of which 14,497 are non-fraud and 923 are fraud. Therefore, 94% are non-fraud and 6% are fraud.
4.3 Default Dataset
The default dataset considered in the current work is from [24]. The dataset is compiled from the credit card data of customers in Taiwan who defaulted and who did not default. The dataset contains 23 explanatory variables and a class (ground truth) variable. It comprises a total of 30,000 observations, of which 23,364 are non-defaulters and 6636 are defaulters. Therefore, 77.88% are non-defaulters and 22.12% are defaulters.
5 Results and Discussion
The models were run on an Intel(R) Core(TM) i7-6700 processor with 32 GB RAM in RStudio; R was used for writing the anomaly detection code. As the proposed models are stochastic, the best result of 25 runs is chosen as the final result. To compare our proposed models with clustering techniques, we employed K-Means [9] and Partition Around Medoids (PAM) [19], which are basic clustering techniques, on the datasets. To further strengthen our claim that the
proposed models perform better than some of the clustering-based anomaly detection techniques, we also employed Angle-Based Outlier Detection (ABOD) [11] and Local Outlier Factor (LOF) [3]. LOF is closely related to subspace-based anomaly detection methods. In this paper, to evaluate our models, we selected precision multiplied by recall as the performance metric for the selection constraint. Apart from calculating precision multiplied by recall, we also computed AUC, though not as a selection constraint, to judge the models even more profoundly. Here, anomalies detected as anomalies are TP, non-anomalies detected as non-anomalies are TN, anomalies detected as non-anomalies are FN and non-anomalies detected as anomalies are FP.

Sensitivity or Recall = TP / (TP + FN)

Specificity = TN / (TN + FP)

Precision = TP / (TP + FP)

AUC = (Sensitivity + Specificity) / 2

Tables 1, 2 and 3 present the results of the proposed models compared with some of the chosen state-of-the-art models on the three datasets. The selection constraint indeed gave superior results in the case of the churn dataset when contrasted with the other proposed model without the selection constraint and also with the chosen state-of-the-art models. The model with the selection constraint performed marginally better than the other proposed model and numerically better than the chosen state-of-the-art models on the remaining two datasets, i.e. the insurance fraud and default datasets. The churn dataset yielded a precision of 20.11, a recall of 50.89 and an AUC of 0.681 with the selection constraint, which is the best when compared to the other two datasets. Also, the high precision yielded by the proposed models conveys that the false positives detected are few compared to the other competing models. Although LOF gave better recall on the three datasets, its precision and AUC are low compared to the proposed models. MDE-driven subspace-based anomaly detection models do not use a distance measure for finding outliers, whereas the LOF-based anomaly detection technique uses distance-based density measurements, which according to [1] have some shortcomings in high-dimensional feature spaces. Table 4 presents the parameters of MDE-based subspace anomaly detection, which has two explicit parameters, namely 'CR' and 'F', and one implicit parameter, namely '∅'.
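The metric definitions above translate directly into code; the following small Python helper is an illustrative transcription (Tables 1, 2 and 3 report precision and recall as percentages, and the product precision × recall is the quantity used as the selection constraint inside MDE).

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Precision, recall (sensitivity), AUC and the selection-constraint value."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    auc = (recall + specificity) / 2          # AUC as defined above
    return precision, recall, auc, precision * recall
```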
Table 1. Results of churn dataset

Model | Precision | Recall | AUC
MDE based subspace detection with selection constraint | 20.11 | 50.89 | 0.681
MDE based subspace detection | 24.09 | 17.26 | 0.566
LOF [3] | 6.23 | 77.84 | 0.464
K-Means [9] | 1.18 | 1.89 | 0.452
PAM [19] | 6.41 | 26.44 | 0.492
ABOD [11] | 7.55 | 10.07 | 0.505
Table 2. Results of insurance fraud dataset

Model | Precision | Recall | AUC
MDE based subspace detection with selection constraint | 7.75 | 36.51 | 0.544
MDE based subspace detection | 9.13 | 9.64 | 0.517
LOF [3] | 6.09 | 76.59 | 0.507
K-Means [9] | 3.44 | 0.003 | 0.498
PAM [19] | 2.85 | 19.17 | 0.388
ABOD [11] | 7.03 | 20.15 | 0.504
Table 3. Results of default dataset

Model | Precision | Recall | AUC
MDE based subspace detection with selection constraint | 24.09 | 44 | 0.523
MDE based subspace detection | 23.08 | 31.78 | 0.508
LOF [3] | 21.69 | 78.22 | 0.49
K-Means [9] | 22.25 | 85.08 | 0.503
PAM [19] | 16.81 | 28.82 | 0.441
ABOD [11] | 27.59 | 12.7 | 0.516
Table 4. Parameters of MDE-based subspace anomaly detection

MDE parameters | Dataset | CR | F | ∅
With selection constraint | Churn | 0.5 | 1.0 | 23
With selection constraint | Insurance fraud | 0.1 | 1.0 | 13
With selection constraint | Default | 0.1 | 1.0 | 33
Without selection constraint | Churn | 0.4 | 1.0 | 23
Without selection constraint | Insurance fraud | 0.05 | 1.0 | 13
Without selection constraint | Default | 0.3 | 1.0 | 33

6 Conclusion and Future Directions
In this paper, we proposed two MDE-driven subspace-based clustering models for anomaly detection in the context of churn detection, insurance fraud detection and default detection. The first model employs an external performance measure, namely precision multiplied by recall, as a selection constraint, while the other model is without the selection constraint. In terms of AUC and precision, the proposed model with the selection constraint outperformed the other competing models. In the future, we can perform feature selection and under-sampling to further improve the results. Further, we can employ better optimization algorithms and also improve the operators that these algorithms use, in order to obtain better results than the proposed algorithms.
References 1. Aggarwal, C.C., Hinneburg, A., Keim, D.A.: On the Surprising Behavior of Distance Metrics in High Dimensional Space, pp. 420–434. Springer, Heidelberg (2001) 2. Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. In: Sellis, T., Mehrotra, S. (eds.) Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data - SIGMOD 2001, vol. 30, pp. 37–46. ACM Press, Santa Barbara (2001) 3. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J., Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: LOF: identifying density-based local outliers. In: ACM SIGMOD International Conference on Management of Data, vol. 29, pp. 93–104. ACM Press, Dallas (2000) 4. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009) 5. Crawford, K.D., Wainwright, R.L.: Applying genetic algorithms to outlier detection. In: Eshelman, L.J. (ed.) Proceedings of the 6th International Conference on Genetic Algorithms, pp. 546–550. Morgan Kaufman, San Francisco (1995) 6. Das, S., Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15(1), 4–31 (2011) 7. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning, 1st edn. Addison-Wesley Longman Publishing Co. Inc., Boston (1989) 8. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco (2011)
9. Hartigan, J.A., Wong, M.A.: A K-means clustering algorithm. Appl. Stat. 28(1), 100–108 (1979) 10. He, Z., Deng, S., Xu, X.: An optimization model for outlier detection in categorical data. In: Huang, D., Zhang, X., Huang, G. (eds.) International Conference on Intelligent Computing, pp. 400–409. Springer, Heidelberg (2005) 11. Kriegel, H.P., Schubert, M., Zimek, A.: Angle-based outlier detection in highdimensional data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, pp. 444–452. ACM, New York (2008) 12. Krishna, G.J., Ravi, V.: Outlier detection using evolutionary computing. In: Proceedings of the International Conference on Informatics and Analytics (ICIA), pp. 1–6. ACM Press, Pudicherry (2016) 13. Krishna, G.J., Ravi, V.: Keystroke based user authentication using modified differential evolution. In: Proceedings of the TENCON 2019 (accepted). IEEE, Kochi (2019) 14. Latecki, L.J., Lazarevic, A., Pokrajac, D.: Outlier detection with kernel density functions. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition, pp. 61–75. Springer, Heidelberg (2007) 15. Lotfi-Shahreza, M., Moazzami, D., Moshiri, B., Delavar, M.: Anomaly detection using a self-organizing map and particle swarm optimization. Scientia Iranica 18(6), 1460–1468 (2011) 16. Misinem, A., Bakar, A.A., Hamdan, A.R., Nazri, M.Z.A.: A Rough set outlier detection based on particle swarm optimization. In: 2010 10th International Conference on Intelligent Systems Design and Applications, pp. 1021–1025. IEEE, Cairo (November 2010) 17. Mohemmed, A.W., Zhang, M., Browne, W.N.: Particle swarm optimization for outlier detection. In: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation - GECCO 2010, p. 83. ACM Press, New York (2010) 18. Papoulis, A., Pillai, S.U.: Probability, Random Variables, and Stochastic Processes, 4th edn. McGraw Hill, Boston (2002) 19. Reynolds, A.P., Richards, G., de la Iglesia, B., Rayward-Smith, V.J.: Clustering rules: a comparison of partitioning and hierarchical clustering algorithms. J. Math. Model. Algorithms 5(4), 475–504 (2006) 20. Shiffler, R.E.: Maximum Z scores and outliers. Am. Stat. 42(1), 79–80 (1988) 21. Storn, R., Price, K.: Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11, 341–359 (1997) 22. Sundarkumar, G.G., Ravi, V.: A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance. Eng. Appl. Artif. Intell. 37, 368–377 (2015) 23. Tolvi, J.: Genetic algorithms for outlier detection and variable selection in linear regression models. Soft Comput. 8(8), 527–533 (2004) 24. Yeh, I.C.: UCI machine learning repository: credit card default dataset (2016). https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients 25. Yeh, I.C., Hui Lien, C.: The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Syst. Appl. 36(2), 2473–2480 (2009) 26. Zimek, A., Schubert, E., Kriegel, H.P.: A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Anal. Data Min. 5(5), 363–387 (2012)
Deep Quantile Regression Based Wind Generation and Demand Forecasts N. Kirthika1(&), K. I. Ramachandran2, and Sasi K. Kottayil1 1
2
Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India {n_kirthika,kk_sasi}@cb.amrita.edu Center for Computational Engineering and Networking, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India [email protected]
Abstract. The widespread attention to the growth of clean energy for electricity production necessitates accurate and reliable generation and demand forecasts. However, the decision-making process in the electric power industry involves more uncertainty due to the transition towards distributed energy systems, which is not addressed by conventional point forecasts. This paper proposes a probabilistic method termed Deep Quantile Regression (DQR) for the construction of prediction intervals (PIs) that can quantify the uncertainty in point forecasts of wind power generation and demand. The effectiveness of DQR is examined using low- and high-season wind and demand datasets. PIs with confidence levels of 99%, 95% and 90% are estimated by constructing the appropriate quantiles using the proposed DQR method. A quantitative comparison of the quality of all the estimated PIs shows that the proposed method outperforms other state-of-the-art methods.

Keywords: Wind forecasting · Demand forecasting · Deep Quantile Regression · Uncertainty analysis
1 Introduction

Forecasting electricity generation and demand has played a significant role in recent years in a wide range of planning and operation tasks on the power grid, owing to the substantial growth and penetration of renewable energy sources [1–3]. Numerous research studies have been carried out in search of newer forecast models for wind energy generation and demand. The conventionally used forecast models can be divided into two major categories: physical and statistical [4, 5]. Physical models are numerical weather prediction and mesoscale models that often use meteorological data and the physical laws governing atmospheric conditions for the forecasts [6]. Statistical models (also called data-driven models) use historic datasets to forecast future power generation and are proven to be more appropriate for short-term forecasting [7]. The published literature provides a variety of linear and non-linear techniques applied as statistical models for wind energy generation and demand forecasting applications.
Despite the advances in forecast technology, there are ample cases of large forecast errors that could not be thoroughly mitigated [8, 9]. In a smart-grid scenario with large penetration of wind energy, even minor errors may threaten the reliability of power system operation. Therefore, appropriate quantification of the uncertainty in wind generation and demand forecasts becomes a prerequisite for the viable operation of the power grid in the future [10]. From the system operators' viewpoint, it is vital to specify the forecast accuracy, which serves as a measure of the extent of uncertainty in point forecasts. Of particular interest is the construction of prediction intervals (PIs), which can properly quantify the associated forecast uncertainties. A PI is an interval estimate in which the future observation will fall with a preset probability, referred to as the confidence level, computed using probabilistic methods [11]. The width and sharpness of PIs unveil valuable information on the level of uncertainty and risk in point forecasts. A thorough review of probabilistic forecast methods is conducted in [12]. Some of the other methods used for probabilistic forecasting include the Bootstrap method [13], the Hybrid Intelligent Algorithm (HIA) approach [14], Particle Swarm Optimization (PSO) based lower upper bound estimation (LUBE), Auto-Regressive Integrated Moving Average (ARIMA), exponential smoothing (ES) and Naïve methods [15]. Also, time series regression models [16] and neural network models [17] have gained wide popularity due to their ease of implementation and less intensive computations. With the advent of deep learning methods, neural network models can now perform the learning process with many layers of non-linear processing units [18]. This paper addresses the estimation of PIs through the implementation of the Quantile Regression (QR) method using Deep Neural Networks (DNN), termed Deep Quantile Regression (DQR). This newly proposed method with dense layers is capable of learning all the vital information available in the point forecasts. The implementation of DQR on the wind power generation and demand datasets resulted in reliable and sharp quantiles, which are the prime features necessary for the construction of quality PIs. The rest of the paper is structured as follows. Section 2 explains the formulation of PIs using QR based on DNN. The PI evaluation metrics are provided in Sect. 3. The results and discussion of the case studies conducted using the proposed probabilistic model are presented in Sect. 4. Finally, Sect. 5 concludes the paper.
2 Formulation of Prediction Intervals

2.1 Quantile Regression Method
Quantile regression estimates the parameters by minimizing the sum of absolute deviances through asymmetric weighting. Consider a random variable Y described by its cumulative distribution function F(y) = P(Y ≤ y). The τth quantile of Y is given by
Q(τ) = F⁻¹(τ) = inf{y : F(y) ≥ τ}     (1)

where τ ∈ (0, 1). The loss function with an indicator function I is defined as

L_τ(y) = y (τ − I_{y<0})     (2)

A particular quantile is found by minimizing the expected loss of Y − d with respect to d:

min_d E(L_τ(Y − d)) = min_d { (τ − 1) ∫_{−∞}^{d} (y − d) dF(y) + τ ∫_{d}^{∞} (y − d) dF(y) }     (3)

Setting the derivative of the expected loss to zero and letting q_τ be the solution gives

0 = (1 − τ) ∫_{−∞}^{q_τ} dF(y) − τ ∫_{q_τ}^{∞} dF(y)     (4)

F(q_τ) = τ     (5)

Thus q_τ is the τth quantile of the random variable. The general conditional quantile function for the τth quantile is Q_{Y|X}(τ) = Xβ_τ. On solving,

β_τ = arg min_{β ∈ ℝ} E(L_τ(Y − Xβ))     (6)

The estimate of the parameter is therefore obtained as

β̂_τ = arg min_{β ∈ ℝ} Σ_{i=1}^{N} L_τ(Y_i − X_i β)     (7)

As the estimated power is related to different uncertain factors, the quantile function can be described by

Q(τ) = β_0 + β_1 X_1 + … + β_n X_n + ε     (8)

where the β_i are the parameters, X_i is the input vector, n is the dimension of the input depending on the number of uncertain factors, and ε is the residual.

2.2 PI Estimation Using DQR
The training of QR in a DNN is termed DQR. TensorFlow is used to build the neural network for the QR implementation. A DNN is a multilayer network with multiple fully connected hidden layers and is capable of learning the information in the input data [18]. Non-linearity is introduced in the DNN using the activation function, and feature mapping is performed from the bottom layer to the top connected layer.
The major objective of the DNN model is to reduce the error in the output by adjusting the weights of the neurons in the different layers using the back-propagation algorithm [19]. It uses a stochastic gradient descent optimization approach for updating the weights. Evaluation of the data is carried out in batches (called mini-batches), rather than evaluating the whole training set at once. For each batch of inputs, the gradient descent algorithm determines the outputs and errors, thereby adjusting the weights accordingly. The errors from each batch are back-propagated to update the weights of the connected network. The same process is repeated until convergence is reached. In the final layer of the network, the DNN uses a softmax function to monitor the progress of the learning process with the help of the loss function. The loss function used in the QR method, given in Eq. (2), is computed in the DNN by taking the element-wise maximum of y·τ and y·(τ − 1). The TensorFlow network employed for the estimation of the quantiles, which are represented in the form of prediction intervals around the point forecast data, is shown in Fig. 1.
Fig. 1. DNN architecture
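As an illustration of the element-wise-maximum formulation of the loss described above, the following is a minimal TensorFlow/Keras sketch of a quantile (pinball) loss and a small fully connected network; the layer sizes, input shape and training details are placeholders rather than the exact architecture of Fig. 1.

```python
import tensorflow as tf

def pinball_loss(tau):
    """Quantile loss: element-wise maximum of tau*e and (tau - 1)*e, e = y - y_hat."""
    def loss(y_true, y_pred):
        e = y_true - y_pred
        return tf.reduce_mean(tf.maximum(tau * e, (tau - 1.0) * e))
    return loss

def build_quantile_model(n_inputs, tau):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(n_inputs,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),                 # one output: the tau-th quantile
    ])
    model.compile(optimizer="adam", loss=pinball_loss(tau))
    return model

# one model per quantile, e.g. the 97.5th and 2.5th quantiles bound a 95% PI
upper = build_quantile_model(n_inputs=1, tau=0.975)
lower = build_quantile_model(n_inputs=1, tau=0.025)
```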
3 PI Evaluation Metrics

In this study, three metrics are used for PI evaluation [20, 21]; the first two metrics describe the coverage probability and the width of the PIs, while the third metric combines the aforementioned indices to evaluate the PI. The major reliability and calibration feature of PIs is measured based on the coverage probability. The PI coverage probability (PICP) is the fraction of targets falling within the lower and upper quantile bounds, as given in Eqs. (9) and (10):

p_i = 1 if W_act,i ∈ [l_i, u_i], and 0 otherwise     (9)

PICP = (1/N) Σ_{i=1}^{N} p_i     (10)
where l_i and u_i are the lower and upper quantiles of the ith PI, respectively. Theoretically, PIs are valid and reliable only if PICP ≥ (1 − α)%. The sharpness of PIs is evaluated using the PI normalized averaged width (PINAW); sharper PIs are more informative than wider PIs. For the target range R,

PINAW = (1 / (N·R)) Σ_{i=1}^{N} (u_i − l_i)     (11)

Both the sharpness and calibration of PIs are measured using a coverage width-based criterion (CWC) defined as

CWC = PINAW (1 + γ e^{−η(PICP − μ)})     (12)

γ = 0 if PICP ≥ μ, and 1 if PICP < μ     (13)

where the η and μ parameters denote the severity of the penalty. A detailed description of the CWC evaluation is given in [20, 21].
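A direct Python transcription of Eqs. (9)–(13) is shown below as a hedged sketch; the function and argument names are illustrative.

```python
import numpy as np

def pi_metrics(y_true, lower, upper, mu, eta=50.0):
    """PICP, PINAW and CWC for one set of prediction intervals (Eqs. 9-13)."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    covered = (y_true >= lower) & (y_true <= upper)     # p_i indicator
    picp = covered.mean()
    R = y_true.max() - y_true.min()                     # target range
    pinaw = np.mean(upper - lower) / R
    gamma = 0.0 if picp >= mu else 1.0
    cwc = pinaw * (1.0 + gamma * np.exp(-eta * (picp - mu)))
    return picp, pinaw, cwc
```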
4 Case Studies and Results

4.1 Wind Data and Demand Data
The historic wind speed datasets of two Indian wind sites, namely Karungal and Kalimandayam, obtained from the National Institute of Wind Energy (NIWE), Chennai, India [22], and the demand dataset from the Southern Regional Load Despatch Center (SRLDC), Chennai, India [23], are used to validate the effectiveness of the proposed probabilistic forecast model. The two datasets from each wind site cover high and low winds occurring during July and November 2016, respectively. Similarly, the highest and lowest demand trends captured from the SRLDC data during April and December 2018, respectively, are used as the demand datasets. All datasets are observed at five-minute intervals and simulated using Python 3 software.
4.2 Simulation Parameters
Table 1 summarizes the parameter values used in the simulation studies. PIs are constructed at confidence levels of 99%, 95% and 90% with α = 0.01, 0.05 and 0.1, respectively. While μ = (1 − α)%, which is the same as the confidence level, η is set to 50 in order to heavily penalize PIs with an undesirable PICP. The parameters used in the DNN architecture are also provided.
Table 1. List of parameter values

Parameter | Values
N | 288
α | 0.01, 0.05, 0.1
μ | 0.99, 0.95, 0.9
η | 50
Number of hidden layers | 2
Number of neurons | 32
Batch size | 32
Optimization algorithm | Adam optimizer
Activation function | Rectified Linear Unit (ReLU)
Number of epochs | 5000

4.3 Results and Discussion
The day-ahead deterministic forecast results obtained from the ARIMA model [24, 25], applied to the VMD-DWT [26] de-noised historic wind speed and demand datasets described in Sect. 4.1, are employed in this study. The wind power generation of a Gamesa G114-2.0 MW wind turbine generator (WTG) is forecasted based on its power curve [27]. The resulting day-ahead forecasts of wind power and demand consist of 288 data points sampled at a five-minute interval. Table 1 shows the parameters employed by the DQR method described in Sect. 2.2 for the estimation of PIs. The 99%, 95% and 90% PIs are obtained by estimating the respective quantile pairs, namely the 99.5th–0.5th, 97.5th–2.5th and 95th–5th quantiles. Figures 2, 3, 4 and 5 show the forecasted wind power (red dots), which constitutes the input data points determined from the point forecasts for the proposed DQR probabilistic method. The actual wind power (black line) provides the targets against which the PIs are evaluated. The 99%, 95% and 90% PIs estimated from the quantile pairs for the different wind generation and demand datasets are presented.
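Purely as an illustration of how the quantile pairs above map onto the PIs evaluated next, the snippet below reuses the hypothetical build_quantile_model and pi_metrics helpers sketched earlier; x_forecast and y_actual are synthetic stand-ins for the 288-point day-ahead forecast inputs and the corresponding actual series, and the epochs and batch size follow Table 1.

```python
import numpy as np

# placeholder stand-ins for the 288-point day-ahead forecast and actual series
x_forecast = np.linspace(0.0, 1.0, 288).reshape(-1, 1)
y_actual = np.sin(2 * np.pi * x_forecast[:, 0]) + 0.1 * np.random.randn(288)

# quantile pairs for the 99%, 95% and 90% prediction intervals
levels = {0.99: (0.005, 0.995), 0.95: (0.025, 0.975), 0.90: (0.05, 0.95)}
for conf, (lo_tau, hi_tau) in levels.items():
    lo = build_quantile_model(n_inputs=1, tau=lo_tau)   # helpers sketched earlier
    hi = build_quantile_model(n_inputs=1, tau=hi_tau)
    lo.fit(x_forecast, y_actual, epochs=5000, batch_size=32, verbose=0)
    hi.fit(x_forecast, y_actual, epochs=5000, batch_size=32, verbose=0)
    picp, pinaw, cwc = pi_metrics(y_actual,
                                  lo.predict(x_forecast).ravel(),
                                  hi.predict(x_forecast).ravel(), mu=conf)
```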
Fig. 2. 99%, 95% and 90% PIs of Karungal wind site on July 5, 2016
Fig. 3. 99%, 95% and 90% PIs of Kalimandayam wind site on November 5, 2016
Fig. 4. 99%, 95% and 90% PIs of SRLDC demand on April 5, 2018
Fig. 5. 99%, 95% and 90% PIs of SRLDC demand on December 5, 2018
Tables 2 and 3 summarize the quality of the PIs using the PICP, PINAW and CWC performance indices computed on the test datasets. PIs are reliable when their PICP is greater than the confidence level. Thus, the PICP values obtained using the proposed method prove to be reliable for all the wind power and demand case studies, in spite of the high uncertainty in the data.
Table 2. Evaluation of the performance metrics for wind power datasets

Wind power dataset | PI | PICP% | PINAW% | CWC%
Karungal - July 5, 2016 | 99% | 99.30 | 23.13 | 23.13
Karungal - July 5, 2016 | 95% | 95.13 | 18.35 | 18.35
Karungal - July 5, 2016 | 90% | 93.40 | 17.43 | 17.43
Karungal - November 5, 2016 | 99% | 100 | 10.32 | 10.32
Karungal - November 5, 2016 | 95% | 100 | 7.46 | 7.46
Karungal - November 5, 2016 | 90% | 100 | 6.95 | 6.95
Kalimandayam - July 5, 2016 | 99% | 99.65 | 23.16 | 23.16
Kalimandayam - July 5, 2016 | 95% | 95.83 | 21.51 | 21.51
Kalimandayam - July 5, 2016 | 90% | 92.70 | 19.86 | 19.86
Kalimandayam - November 5, 2016 | 99% | 99.30 | 13.97 | 13.97
Kalimandayam - November 5, 2016 | 95% | 95.13 | 11.38 | 11.38
Kalimandayam - November 5, 2016 | 90% | 90.27 | 10.16 | 10.16
Table 3. Evaluation of the performance metrics for demand datasets

Demand dataset | PI | PICP% | PINAW% | CWC%
SRLDC - April 5, 2018 | 99% | 100 | 10.20 | 10.20
SRLDC - April 5, 2018 | 95% | 100 | 7.66 | 7.66
SRLDC - April 5, 2018 | 90% | 100 | 6.72 | 6.72
SRLDC - December 5, 2018 | 99% | 100 | 17.62 | 17.62
SRLDC - December 5, 2018 | 95% | 100 | 12.25 | 12.25
SRLDC - December 5, 2018 | 90% | 100 | 10.41 | 10.41
Further, the PICP of the Karungal dataset on 5 November 2016 shows the entire set of targets falling within the PIs. Also, the proposed method is evidently more suitable for the demand datasets than for the wind generation datasets. The PINAW values for the 99% PI are higher than the rest, as these intervals are wider compared to the 95% and 90% PIs. The sharpness of a PI is indicated by a lower value of PINAW, which provides more information compared to wider PIs. As the PICP values in all the cases are greater than the confidence level, the value of PINAW is equal to the value of CWC and is hence exempted from the penalty for unreliable PIs.
4.4 Comparison of DQR with the State-of-the-Art Methods
Performance metrics of other state-of-the-art forecasting methods have been collected from the literature and are presented in Table 4 for comparison with the proposed DQR method; the table shows that DQR has a greater mean value of PICP. This implies that the proposed method is more reliable than the other state-of-the-art methods. Also, the mean value of PINAW using the DQR method is less compared to the
other benchmark models. Thus, the estimation of narrower and sharper PIs by the DQR method compared to the other methods is evident. Also, the values of CWC and PINAW are equal and thus exempt from the penalty. Therefore, the proposed method offers an appropriate and appreciable quantification of the uncertainties prevailing in wind generation and demand forecasting applications.
Table 4. Comparison results of the proposed method with the state-of-the-art methods

Methods | PICP% (99% / 95% / 90%) | PINAW% (99% / 95% / 90%) | CWC% (99% / 95% / 90%)
Bootstrap [13] | – / 95.05 / – | – / 28.55 / – | – / 28.55 / –
HIA [14] | 98.48 / 95.50 / 90.80 | – / – / 16.05 | – / – / 16.05
PSO-based LUBE [15] | – / – / 91.20 | – / – / 18.66 | – / – / 18.66
ARIMA [15] | – / – / 91.16 | – / – / 18.73 | – / – / 18.73
ES [15] | – / – / 91.38 | – / – / 22.24 | – / – / 22.24
Naïve [15] | – / – / 90.14 | – / – / 11.92 | – / – / 11.92
Proposed method | 99.70 / 97.68 / 96.06 | 16.40 / 13.10 / – | 16.40 / 13.10 / –
5 Conclusion

Short-term renewable energy forecasting and demand forecasting are of prime importance in the operation of smart grids. The stochastic nature of weather conditions and the penetration of renewable energy into the grid create numerous uncertainties in the power system. These also pave the way for numerous risks to be encountered by service providers and consumers in energy markets. To ameliorate the performance of deterministic forecasts, this paper employs a probabilistic method of forecasting wind generation and demand that can handle such uncertainties. PIs are a prominent means for the estimation of uncertainties connected with deterministic forecasts, but the conventional approaches used for the construction of PIs endure several difficulties. The newly proposed DQR probabilistic method constructs the PIs using the quantile regression method implemented with a DNN. An advantage of this new method is the deployment of a DNN, which can learn highly variant features from the raw input and is potentially useful for forecasting applications with wavering climatic conditions. The quantiles estimated are sharp and close to the original wind power and demand input patterns, which reveals the quality of the learning carried out by the DNNs. Comparative results show that DQR performs better on the demand datasets than on the wind generation datasets. Furthermore, in comparison with the state-of-the-art methods, DQR shows high PICP and narrow PINAW values. Thus, higher quality PIs are obtained using the proposed method, which could aid the decision-making and risk assessment processes in smart grids.
References 1. Hong, T., Fan, S.: Probabilistic electric load forecasting: a tutorial review. Int. J. Forecast. 32 (3), 914–938 (2016) 2. Jinhua, Z., Jie, Y., Wenjing, W., Yongqian, L.: Research on short-term forecasting and uncertainty of wind turbine power based on relevance vector machine. Energy Procedia 158, 229–236 (2019) 3. Xie, J., Hong, T., Laing, T., Kang, C.: On normality assumption in residual simulation for probabilistic load forecasting. IEEE Trans. Smart Grid 8(3), 1046–1053 (2017) 4. Foley, A.M., Leahy, P.G., Marvuglia, A., McKeogh, E.J.: Current methods and advances in forecasting of wind power generation. Renew. Energy 37(1), 1–8 (2012) 5. Kusiak, A., Zhang, Z.: Short-horizon prediction of wind power: a data-driven approach. IEEE Trans. Energy Convers. 25(4), 1112–1122 (2010) 6. Lange, M., Focken, U.: Physical Approach to Short-term Wind Power Prediction. Springer, Heidelberg (2005) 7. Burton, N., Bossanyi, E.: Wind Energy Handbook. Wiley, Hoboken (2001) 8. Bremnes, J.B.: A comparison of a few statistical models for making quantile wind power forecasts. Wind Energy 9(1–2), 3–11 (2006) 9. Pinson, P., Kariniotakis, G.: Conditional prediction intervals of wind power generation. IEEE Trans. Power Syst. 25(4), 1845–1856 (2010) 10. Khosravi, A., Nahavandi, S.: An optimized mean variance estimation method for uncertainty quantification of wind power forecasts. Electr. Power Energy Syst. 61, 446–454 (2014) 11. Pinson, P., Girard, R.: Evaluating the quality of scenarios of short-term wind power generation. Appl. Energy 96, 12–20 (2012) 12. Pinson, P., Nielsen, H.A., Mller, J.K., Madsen, H., Kariniotakis, G.N.: Non-parametric probabilistic forecasts of wind power: required properties and evaluation. Wind Energy 10 (6), 497–516 (2007) 13. Abbas, K., Nahavandi, S., Creighton, D.: Prediction intervals for short-term wind farm power generation forecasts. IEEE Trans. Sustain. Energy 4(3), 602–610 (2013) 14. Abbas, K., Nahavandi, S., Creighton, D., Naghavizadeh, R: Uncertainty quantification for wind farm power generation. In: IEEE World Congress on Computational Intelligence, Australia (2012) 15. Hao, Q., Srinivasan, D., Abbas, K.: Short-term load and wind power forecasting using neural network-based prediction intervals. IEEE Trans. Neural Netw. Learn. Syst. 25(2), 303–315 (2014) 16. Kavasseri, R.G., Seetharaman, K.: Day-ahead wind speed forecasting using fARIMA models. Renew. Energy 34(5), 1388–1393 (2009) 17. Barbounis, T., Theocharis, J., Alexiadis, M., Dokopoulos, P.: Long-term wind speed and power forecasting using local recurrent neural network models. IEEE Trans. Energy Convers. 21(1), 273–284 (2006) 18. LeCun, Y., Bengio, Y., Hinton, G.E.: Deep learning. Nature 521, 436–444 (2015) 19. Rumelhart, D., Hinton, G., Williams, R.: Learning representations by back propagating errors. Nature 323, 533–536 (1986) 20. Khosravi, A., Nahavandi, S., Creighton, D., Atiya, A.F.: A lower upper bound estimation method for construction of neural network-based prediction intervals. IEEE Trans. Neural Netw. 22(3), 337–346 (2011) 21. Khosravi, A., Nahavandi, S., Creighton, D., Atiya, A.: Comprehensive review of neural ne work-based prediction intervals and new advances. IEEE Trans. Neural Netw. 22(9), 1341– 1356 (2011)
22. NIWE Homepage. http://niwe.res.in:8080/NIWE_WRA_DATA/. Accessed 20 Nov 2018 23. SRLDC Homepage. https://srldc.in/DailyReport.aspx. Accessed 20 Nov 2018 24. Vishnupriyadharshini, A., Vanitha, V., Palanisamy, T.: Wind speed forecasting based on statistical auto regressive integrated moving average (ARIMA) method. Int. J. Control Theory Appl. 9(15), 7681–7690 (2016) 25. Nair, K.R., Vanitha, V., Jisma, M.: Forecasting of wind speed using ANN, ARIMA and hybrid models. In: IEEE International Conference on Intelligent Computing, Instrumentation and Control Technologies, pp. 170–175 (2017) 26. Lahmiri, S.: Comparative study of ECG signal denoising by wavelet thresholding in empirical and variational mode decomposition domains. Healthc. Technol. Lett. 1(3), 104– 109 (2014) 27. https://en.wind-turbine-models.com/turbines/428-gamesa-g114-2.0mw. Accessed 24 Dec 2018
Recommendation System for E-Commerce by Memory Based and Model Based Collaborative Filtering K. RaviKanth1(&), K. ChandraShekar2, K. Sreekanth3, and P. Santhosh Kumar4 1
Department of CSE, RGUKT, IIIT, Basar, India [email protected] 2 Department of CSE, GNITC, Hyderabad, India [email protected] 3 Department of CSE, NNRG, Hyderabad, India [email protected] 4 Department of SCIS, HCU, Hyderabad, India [email protected]
Abstract. Usage of the internet is growing rapidly and it has become more and more important in every aspect of life. Everyone uses the internet and enjoys its advantages, and one of its key advantages is E-commerce. E-commerce, being an online market, facilitates users to a great extent. In the past, people used to buy goods by going to shops and markets, but now many people use E-commerce to buy goods. Previously, if people wanted to search for a product, they could directly ask the shop owner, who would provide it if he had it; in E-commerce, however, it is the customer's burden to search for the product, as the catalogue is vast. To avoid this, recommendation systems are used. These recommendation systems recommend products to users, help the users to make correct decisions and also help the growth of E-commerce. There are different types of recommendation systems, such as content based, collaborative and hybrid. A variety of algorithms have been used by various researchers based on the application area and the requirements of the end user. In this paper, we propose a collaborative filtering recommendation system.

Keywords: Recommendation system · Collaborative filtering · Similarity measure · Matrix factorization · Pearson correlation · Gradient descent
1 Introduction

In recent times, the Internet has been growing rapidly and has become a daily need for people [2]. These days everyone uses various types of electronic gadgets, such as laptops, smart phones and computers, for learning, searching and business purposes with the help of the internet, obtaining the required information in a fraction of a second [4]. This demand for the use of the Internet has resulted in the design and development of much software useful for E-commerce websites (like Amazon, Flipkart, eBay, Snapdeal, etc.) [2].
Shopping is a routine job for everyone to buy their daily needs. In the past there was no technology related to E-commerce, so people used to purchase goods (even every small item) only by visiting shops physically. But nowadays technology has grown enormously [1] and reached every corner of the globe, including rural places, and it has become an important part of everyone's daily activity. In this competitive world, people do not want to waste their precious time shopping physically and are hence attracted towards the E-commerce world [2]. E-commerce is essentially an online shop where we can buy whatever goods or items we would like to purchase, from home, from the office or during travel, simply by using the internet on an electronic gadget. But the E-commerce world is very broad [1]: if we want to search for a required product, it may take a lot of the customer's time, which is not convenient for customers who are very busy with their jobs or business, compared with traditional offline shopping [2]. Customers are the first and primary factor in the development and growth of any business, including E-commerce. If they are not happy with the service provided (whether offline or online), they may not prefer to use that website again for shopping. The development of user-friendly search processes in an E-commerce website is therefore highly recommended and has become a serious concern for vendors. There is high competition among E-commerce merchants to provide the best service through customer-friendly website design [2], thereby attracting customers. In order to make the item search process easier and to show the most suitable items to each customer within a short period of time, E-commerce traders require software called a recommendation system. Recommendation is the process of suggesting an appropriate product to each customer by analysing their previously stored search data [3]. We can consider a recommendation system as a "shop counter guy". If we have a customer-friendly, well-spoken, informative and skilful shop counter guy, the customers will be happy to revisit the shop, which subsequently results in popularizing the shop and good business. Similarly, a simple, user-friendly and efficient recommendation system also makes an E-commerce business (website) popular among customers. One such example in the current E-commerce world is Amazon, which is known to get 35% of its revenue from its recommendation system [7]. For this purpose, many E-commerce traders have already been involved in developing better and newer recommendation systems. Still, there are several inherent problems associated with their design, such as cold start, data valid time and limited resources [2, 3]. At this juncture, we designed and developed a system which focuses on solving the issues related to cold start.
2 Literature Survey

a) The research group of Roshni Padate from Mumbai proposed a hybrid recommendation system which is an aggregate of collaborative and clustering methods [1]. As part of the memory based and model based approaches, they implemented item-item collaborative filtering and a cluster (k-means) based system, respectively. For finding the similarity between users they tried the Jaccard similarity, cosine similarity and
the Pearson correlation. Since the Jaccard similarity and the cosine similarity have disadvantages, they used the Pearson correlation. That paper presents a recommendation system which uses both item-item and user-user collaborative filtering.

b) Rachana Ramesh, Priyadarshini N and Yuvaraju BN developed a recommendation system which solves problems such as cold start, data valid time and limited resources [2]. In order to solve these problems they designed a hybrid recommendation system, which is a combination of content based filtering and collaborative filtering. As part of the collaborative approach they used user-user similarity and also gave a solution to the cold start problem by considering the users' interests when they sign up. That paper provides a solution to the cold start problem by using a model based approach.
3 Proposed System

The proposed system uses a collaborative filtering technique to develop the recommendation system and avoids the cold start problem [3, 4] (Fig. 1).
Fig. 1. Pictorial representation of collaborative filtering approaches
The proposed collaborative filtering models are based on the assumptions that a) similar users like similar items, and b) the items liked by a user are almost the same or have some common features. Collaborative filtering has two main branches, i.e., i) the model based approach and ii) the memory based approach [3]. Many currently existing E-commerce traders use one of them to implement their recommendation system, whereas our proposed system uses a combination of both.
3.1 Memory Based Approach
The memory based approach is purely based on arithmetic operations and does not have any learning parameters. Here, the arithmetic operation is finding the similarity between vectors, where a vector may be user related or item related. There are mainly three ways to find the similarity [5]:
i) Jaccard similarity, ii) Cosine similarity, and iii) Pearson correlation. For example, suppose we are finding the similarity between users with the help of the ratings given by them.

i) Jaccard similarity (A, B):

J(A, B) = |A ∩ B| / |A ∪ B|
The disadvantage of Jaccard is that it ignores the rating values; whether the user rating is high, medium or low, it considers every rating as the same (it considers only whether the user gave a rating or not, not the rating value). In other words, even though one user gave a high rating and another user gave a low rating, Jaccard considers both as the same.
ii) Cosine similarity (A, B):

cos θ = (a · b) / (‖a‖ ‖b‖)
The disadvantage of cosine similarity is that it treats the missing values (no response) as negative ratings.
iii) Pearson correlation: It is close to the cosine similarity, but normalizes the ratings such that missing values can be treated as the average rating:

Pearson / centered cosine = cos(r(A), r(B))

where r(A) and r(B) are the mean-centered rating vectors. As there are disadvantages with the Jaccard and cosine similarities, in this paper we use the Pearson correlation as the similarity metric. We have implemented both user-user collaborative filtering [4] and item-item collaborative filtering [4]. In user-user collaborative filtering, as a first step we find the similarity between all users and store it in a user-user matrix. As a second step, for the given user we find the top 'n' similar users. As a third step, for the items which are not rated by the given user, we predict the rating (a weighted average of the similar users' ratings), and as a final step we recommend the items which obtained the top predicted ratings. The same procedure can also be implemented in item-item collaborative filtering [8]. As a first step, we find the similarity between all the items and store it in an item-item matrix. As a second step, we find the items which are not rated by the user; subsequently, for each unrated item, we find the similar items which are rated by the same user and predict the rating (using a weighted average of the similar items' ratings by the same user). Finally, the items which got the top predicted ratings are recommended to the user.
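For concreteness, the three similarity measures can be written as small Python functions operating on rating vectors in which 0 marks a missing rating; this is an illustrative sketch, not the authors' implementation, and the mean-centering step in the Pearson/centered-cosine variant is the assumption stated above.

```python
import numpy as np

def jaccard(a, b):
    """Ignores rating values; only whether an item was rated (non-zero)."""
    ra, rb = set(np.nonzero(a)[0]), set(np.nonzero(b)[0])
    return len(ra & rb) / len(ra | rb)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson(a, b):
    """Centered cosine: subtract each user's mean over rated items; missing stay 0."""
    def center(r):
        r = np.asarray(r, dtype=float)
        rated = r != 0
        out = np.zeros_like(r)
        out[rated] = r[rated] - r[rated].mean()
        return out
    return cosine(center(a), center(b))
```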
Rating prediction formula in user-user collaborative filtering:

r̂_ui = Σ_{u'} sim(u, u') · r_{u'i} / Σ_{u'} |sim(u, u')|
Rating prediction formula in item-item collaborative filtering:

r_xi = Σ_{j ∈ N(i;x)} s_ij · r_xj / Σ_{j ∈ N(i;x)} s_ij
where s_ij is the similarity of items i and j, r_xj is the rating of user x on item j, and N(i; x) is the set of items rated by x that are similar to i.
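A hedged sketch of the user-user prediction formula above is given below; the top-n neighbourhood, the 0-as-missing convention and the function name are illustrative choices, and the item-item variant is analogous with an item-item similarity matrix.

```python
def predict_user_user(R, sim, u, i, n=10):
    """Similarity-weighted average rating of item i for user u.

    R   : (users x items) rating matrix with 0 for 'not rated'
    sim : (users x users) precomputed Pearson similarity matrix
    """
    raters = [v for v in range(R.shape[0]) if v != u and R[v, i] != 0]
    top = sorted(raters, key=lambda v: sim[u, v], reverse=True)[:n]
    num = sum(sim[u, v] * R[v, i] for v in top)
    den = sum(abs(sim[u, v]) for v in top)
    return num / den if den else 0.0
```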
3.2 Model Based Approach
In the model based approach, we use machine learning concepts to build the model. Unlike the memory based approach, the model based approach has learning parameters. As part of the model based approach we have implemented the matrix factorization technique [6]. Matrix factorization is the decomposition of a matrix into two matrices such that their multiplication gives the original matrix back [6]. As a first step of the implementation, we build a data matrix R, which is a user-rating matrix, and our goal is to find two matrices P and Q whose dot product gives R back. In the second step, we build the two matrices P (user-feature) and Q (feature-rating) based on the number of features which we want to extract for the items. In the third step, we apply a dot product between P and Q and calculate the error function. As a fourth step, the gradient descent algorithm is applied to the error function and the values of the P and Q matrices are updated. Steps 3 and 4 are repeated until the minimum error is obtained. At the end of these steps, the two matrices P and Q are obtained in such a way that their multiplication regenerates the R matrix. In the next step, for the given user we find the items which are not rated and predict the rating for each of these unrated items by using the P and Q matrices (we take the user's row from P and the item's column from Q and apply a dot product to get the rating). Finally, we recommend the products which got the top predicted ratings.
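The matrix factorization steps described above can be sketched in a few lines of Python; this is a simplified illustration (no regularization, and a fixed number of gradient descent steps instead of a minimum-error stopping rule), and all names and parameter values are assumptions.

```python
import numpy as np

def matrix_factorization(R, k=2, steps=5000, lr=0.002, seed=0):
    """Factorize the user-rating matrix R (0 = unknown) into P (user x feature)
    and Q (feature x item) by gradient descent on the observed squared error;
    P @ Q then predicts the missing ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.random((n_users, k))
    Q = rng.random((k, n_items))
    users, items = np.nonzero(R)                 # observed entries only
    for _ in range(steps):
        for u, i in zip(users, items):
            e = R[u, i] - P[u, :] @ Q[:, i]      # prediction error
            grad_p, grad_q = 2 * e * Q[:, i], 2 * e * P[u, :]
            P[u, :] += lr * grad_p               # gradient descent updates
            Q[:, i] += lr * grad_q
    return P, Q
```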
3.3 Combining Both the Approaches
In the hybrid approach we use the results of the memory based and model based approaches. To obtain the results of this method, we calculate the average rating for each item from the results of both models [3]. Finally, we recommend the items which obtained the highest average rating (Fig. 2).
Fig. 2. Results of memory based approach
4 Experimental Results

The above values represent the Product IDs obtained from the combination of item-item and user-user collaborative filtering as part of the memory based approach (Fig. 3).
Fig. 3. Results of model based approach
The values in the result of the model based approach are the Product IDs produced by the matrix factorization (Fig. 4).
Fig. 4. Results of hybrid approach
The above values are the final results of the paper, obtained from the unification of the memory based and model based approaches.
5 Conclusion and Future Work

Many E-commerce websites are evolving in our day-to-day life, and recommendation systems play a major role in E-commerce; at the same time, recommendation systems may face problems such as cold start and invalid data, which can affect their efficiency and accuracy. The proposed system helps E-commerce websites to avoid such problems and also helps them to improve their profits. Finally, the proposed system satisfies the customers and helps to maintain a good relationship with them. In the future, the number of E-commerce traders will increase rapidly, so there is a need to develop highly accurate and efficient recommendation systems in order to attract customers and maintain good profits. For this purpose, we plan to implement the recommendation system with various effective algorithms and to build an accurate and efficient hybrid recommendation system.
References 1. Roshni, P., Priyanka, B., Jayesh, K., Adarsh, G.: Hybrid recommendation system using clustering and collaborative filtering. IJRITCC (June 2017). ISSN:2321-8169 2. Rohan, N., Aniket, M., Jeetesh, R., Girish, W.: E-commerce recommendation system problems and solutions. IRJET (April 2018). e-ISSN:2395-0056 3. Tarang, R., Yask, P.: A survey: collaborative filtering, content-based filtering, hybrid recommendation approach. IJIRMF (May 2017). ISSN:2455-0620 4. Prasad, R.V.V.S.V., Kumari, V.V.: A categorical review of recommender system. Int. J. Distrib. Parallel Syst. (IJDPS) 3(5), 73 (2012) 5. Suganeshwari, G., Syed Ibrahim, S.P.: A survey on collaborative filtering based recommendation system (2016) 6. Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42, 30–37 (2009) 7. Daoud, M., Naqvi, S.K.: Recommendation system techniques in e-commerce system. IJSR (2015). ISSN:2319-7064 8. Jakhar, K., Sharma, V.K., Sharma, S.: Collaborative filtering based recommendation system augmented with SVM classifier. Int. J. Sci. Eng. Technol. (2016). ISSN:2348-4098
Malware Behavior Profiling from Unstructured Data Yoong Jien Chiam1, Mohd Aizaini Maarof1, Mohamad Nizam Kassim1,2, and Anazida Zainal1(&) 1
Cyber Threat Intelligence Lab, Information Assurance and Security Research Group, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia [email protected], {aizaini,anazida}@utm.my 2 Cyber Security Responsive Services Division, CyberSecurity Malaysia, Level 7, Tower 1, Menara Cyber Axis, Cyberjaya, Selangor, Malaysia [email protected]
Abstract. Recently, the emergence of new malware has posed a major threat, especially in the finance sector, in which much online banking data has been stolen by adversaries. Malware threat information needs to be collected immediately after an outbreak. Early detection can save others from becoming victims. Unfortunately, there is a time delay before new malware information reaches malware databases such as ExploitDB. A pre-emptive way needs to be taken to gather first-hand information on new malware as a preventive measure. One of the methods is to extract information from open source data such as online news by using Named Entity Recognition (NER). However, the existing NER systems are incapable of extracting domain specific entities from online news accurately. The aim of this paper is to extract malware entities and their behaviour attributes using an extended version of NER with HMM and CRF. A malware annotated corpus is produced in order to conduct supervised learning for the machine learning approach of the name entity tagger. The results show that CRF performs slightly better than HMM. A few experiments are performed in order to optimize the performance of CRF in terms of feature extraction. Finally, the malware behaviour information is visualized on a dashboard by combining a few statistical graphs using matplotlib. The purpose of visualizing the malware behaviour profile extracted from the online news is to help cyber security experts to better understand the malware behaviour.

Keywords: Cyber threat intelligent · Natural language processing · Entity annotation
1 Introduction

Nowadays, many small-medium enterprises and big companies are facing a lot of cyber threats that affect their businesses. Cyber threats also exist on personal gadgets such as laptops and mobile devices. One of the common cyber threats is malware, which is spreading in the cyber world; the majority of malware steals sensitive personal information or performs privilege escalation on the target system. Recently, the emergence and
widespread of banking trojans such as Zeus and Ursnif have caused a large amount of online banking data to be exposed to adversaries, and millions of dollars were stolen from various banking accounts. The lack of cyber threat information gathering and analysis hinders the effort to raise public awareness and delays the dissemination of information on precautionary measures. AO Kaspersky Lab reported a 15.9% increase in users being attacked by banking malware; among the victims of banking malware, only 24.1% were corporate users and the rest were consumers. Zbot and Gozi are still leading as the most widespread banking malware families, with more than 26% and 20% of attacked users respectively [4].
1.1 Problem Background
Most of the existing NER systems were developed for general subjects, and recently more NER research works have been published in the medical domain, such as biomedical text mining in cancer research [10], NER over Electronic Health Records [7] and biological entity recognition with conditional random fields [2]. In 2012, there were problems in integrating textual information on molecules, organisms, individuals and populations to comprehend complex biological systems, and also in removing noise and lowering the number of false positives in natural language text [10]. Another major problem was the lack of a standard corpus and a weak feature set, which often resulted in a lower F1 score in discriminating genome names from other textual data [2]. Similarly, the cyber security domain faces similar challenges in building NER systems to extract cyber security jargon and special names such as malware names, threat actor names, etc. Therefore, the purpose of this paper is to design an NER system that is capable of recognizing cyber security related terms. Besides that, a gold labelled corpus related to cyber security needs to be prepared manually with the help of annotating tools such as WebAnno. As of 2013, there was no standard corpus for NER in the cyber security domain [3]. The corpus needs to be manually annotated with a fair understanding of the cyber security domain, and the amount of available annotated corpus was too little to prove the capability of an NER system in recognizing the relevant name entities. Besides, another major problem faced was in finding the relationships among the security concepts [3]. Bridges et al. [1] highlighted the need to generate robust feature sets in order to obtain a better performance of the corpus labelling system. Therefore, an extended name entity identifier of the existing NER system is needed so that accurate malware entities and malware behavioural attributes can be extracted from unstructured text.
1.2 Problem Statement
There is a lack of malware related corpora in the community that can be used for the malware entity tagging task. It is difficult for the existing NER systems to recognize malware related terms. In order to create a good gold standard corpus, help from experts with malware related knowledge is needed. A standard annotation scheme helps in standardizing all attributes with their respective name
entity tags [6]. From the literature, it is found that both CRF and HMM are competent. [8] did a comparative study on HMM and CRF in an NER task in the healthcare domain, and it was reported that CRF achieved better results than the HMM technique. Besides extending the NER system to cater for the cyber security domain, this paper also investigates the effectiveness of both CRF and HMM techniques in the malware domain. The entities extracted by the proposed NER are then visualized in the form of various statistical graphs and presented in visual analytics form to support the discovery of malware profiles.
2 Methodology
2.1 Attribute Selection
The MAEC vocabulary provides high-fidelity information about malware based on attributes such as behaviours, including artefacts and relationships between malware samples. However, in the experiment, several changes and modifications of the attributes in the MAEC vocabulary were made in order to get the desired attributes of the malware profile. The MAEC 5.0 vocabulary list contains 16 types of malware attributes. In this experiment, five attributes were selected as the name entities in the malware profile, where four of them are from the MAEC vocabulary list. The IOB2 tagging format was used to tag these entities. The five chosen name entity tags defined in this research are shown in Table 1.
Table 1. The five entities used to define malware profile

Entities                             Origin
Malware name (MAL)                   Determined by authors
Malware behaviour (BEH)              MAEC 5.0 vocabulary
Malware capability (CAP)             MAEC 5.0 vocabulary
Delivery Vector of Malware (DVEC)    MAEC 5.0 vocabulary
Targeted Operating System (OS)       MAEC 5.0 vocabulary

2.2 Malware Attribute Enumeration and Characterization
The MAEC vocabulary is a community-developed structured language for encoding and sharing high-fidelity information about malware based on attributes such as behaviours, artefacts, and relationships between malware samples. It was devised by the MITRE Corporation as a standardized language for describing malware [5]. The MAEC vocabulary is used as a source of labels for the annotations [6]. This will facilitate cross-applications in other projects and ensure relevance in the cyber security community.
2.3 Annotation Guideline
[6] described a general guideline for malware attribute annotation which gives a clear guide on how to annotate the related text with its closest name entity. According to the guideline, only the words within the relevant scope should be labelled. These scopes are: i) technical capabilities of the malware and ii) technical activities of the malware. Meanwhile, the words related to the following scopes should not be labelled: i) geopolitical or commercial effects of the malware, ii) investigation into the origins of the malware and finally iii) advertisements for security products. In this research, we strictly follow the above guidelines in labelling the security terms.
2.4 Corpus Building
All the annotated articles need to be combined into a large file (corpus). Later, this corpus is labelled with POS tags that act as a feature for each chunk of words. In the end, a corpus in a Comma Separated Values (CSV) file is developed with these four headers: i) Sentence #, ii) Word, iii) POS and finally iv) Tag. The data are stored sentence by sentence to ensure the continuous flow of the data in a sentence, since both HMM and CRF need to refer to previous or next words within the sentence. Then, the corpus is ready to be fed into the machine learning model for training.
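Assuming the CSV layout described above, the corpus can be loaded and regrouped sentence by sentence as in the following Python sketch (the file name is a placeholder, not from the paper).

import pandas as pd

# Load the annotated corpus (columns: "Sentence #", "Word", "POS", "Tag") and
# regroup it sentence by sentence so that HMM/CRF training keeps word order.
corpus = pd.read_csv("malware_corpus.csv", encoding="latin1")
corpus = corpus.ffill()                     # propagate the sentence number down

sentences = [
    list(zip(group["Word"], group["POS"], group["Tag"]))
    for _, group in corpus.groupby("Sentence #")
]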
2.5 Hidden Markov Model
HMM is a generative model that represents probability distributions over sequences of observations. From those probability distributions, it can calculate and predict the unknown sequence that underlies the datasets provided. HMM utilizes training data to create datapoints and then learns the transition probabilities on its own, since there is no direct control over its output labels. To populate a model that reflects the situation, it is important to know three parameters: the start probabilities, the emission probabilities and the state transition probabilities. When the HMM is applied as a classifier, each word is assigned to one of the named entity types. Every state is organized into regions that represent each type of entity. A language model is then used to score the words within the same region (named entity type). Deleted interpolation is used to compute the transition probabilities.
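A simplified sketch of the supervised estimation of the three HMM parameters from the sentence-level corpus is shown below; smoothing and deleted interpolation are omitted for brevity, so this is only an illustration of the counting step.

from collections import Counter, defaultdict

def estimate_hmm_parameters(sentences):
    """Estimate start, transition and emission counts for an HMM tagger from
    sentences given as lists of (word, pos, tag) tuples."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        tags = [tag for _, _, tag in sent]
        start[tags[0]] += 1
        for prev, curr in zip(tags, tags[1:]):
            trans[prev][curr] += 1              # state-transition counts
        for word, _, tag in sent:
            emit[tag][word.lower()] += 1        # emission counts
    return start, trans, emit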
2.6 Conditional Random Field
CRF is an undirected graphical model that defines a conditional probability distribution over labelled sequences given a particular observation sequence. It can be considered as a generalization of HMM. In the application of NER, CRF is a framework that uses probability calculations to segment and label unstructured data. CRF is often used with a general graphical structure, because such a structure greatly helps in relational learning as the assumptions about each entity can be loosened. The main advantage of CRF is the relaxation of the independence assumption. Under the independence assumption, the variables are independent and do not affect each other in any situation, but this is not always true and
this assumption can cause inaccuracy. In natural language processing, the linear-chain CRF is often used because it considers the context surrounding the word. Two CRF models were developed in this research with different parameter setups, as shown in Table 2.
Table 2. Parameters used for both CRF models

Parameter              CRF 1                                  CRF 2
1. Cross-validation    5-fold                                 5-fold
2. Iteration count     50                                     50
3. c1 (r1)             Random exponential scale from 0.5      10
4. c2 (r2)             Random exponential scale from 0.05     Random exponential scale from 0.05
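The paper does not name the CRF implementation; assuming the widely used sklearn-crfsuite package, the CRF 1 setup of Table 2 could be reproduced roughly as in the sketch below, where X_train and y_train stand for the per-token feature dictionaries and tag sequences built from the corpus.

import scipy.stats
import sklearn_crfsuite
from sklearn.metrics import make_scorer
from sklearn.model_selection import RandomizedSearchCV
from sklearn_crfsuite import metrics

# CRF 1: both c1 and c2 drawn from exponential distributions,
# 5-fold cross-validation, 50 random-search iterations.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100,
                           all_possible_transitions=True)
params = {"c1": scipy.stats.expon(scale=0.5),
          "c2": scipy.stats.expon(scale=0.05)}
f1_scorer = make_scorer(metrics.flat_f1_score, average="weighted")
search = RandomizedSearchCV(crf, params, cv=5, n_iter=50, scoring=f1_scorer)
search.fit(X_train, y_train)

# CRF 2 would fix c1 = 10 and search only over c2.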
The two important parameters here are c1 and c2, and they are hyper-parameters. CRF uses regularization to obtain a spatially consistent labelling, where L1 and L2 are the regularization techniques often used to minimize overfitting during CRF training. L1 regularization is a penalty function for regularizing the model that applies penalties proportional to |w_i|, resulting in the penalized objective function shown in Eq. (1):

max_w l(Y | X; w) − λ Σ_i |w_i|          (1)
λ is a parameter that controls the degree of regularization during training. Regularizing with the L1 penalty complicates training but has the advantage of producing a sparse model. L1 regularization creates a smoothed model where some of the weights are exactly equal to zero. The features with zero weight can be eliminated from the model. In L1 regularization, λ acts as a threshold and prevents w_i from becoming non-zero just to get a small improvement in the objective function. The L2 regularization penalty acts on each weight and is proportional to w_i². By referring to Eq. (2), a high value of λ corresponds to a large amount of smoothing and a λ of zero results in no smoothing:

max_w l(Y | X; w) − λ wᵀw          (2)
The penalized objective function is differentiable and therefore training a CRF with the L2 penalty requires about the same computational effort as training a CRF without regularization [9]. The difficulty lies in selecting the best value for λ, which can be done by using different settings of λ via cross-validation.
3 Results
3.1 Comparison of Results of Each Approach
Four standard evaluation metrics were used to measure the performance of NER with CRF and NER with HMM: accuracy, precision, recall and F1 score. Figure 1 shows that the accuracy, precision, recall and F1 score of the HMM NER model are all 97.23%, while the CRF NER model scores 98.68% on all the metrics. The lower score of the HMM NER model is due to its structure, which supports fewer word features; in this case only unigram and bigram features were used, which are less likely to support feature extraction. The detailed explanation can be found in the discussion section.
Fig. 1. Score comparison between HMM and CRF model
Fig. 2. Score comparison between different CRF models
Meanwhile, Fig. 2 shows that the overall scoring of the CRF model improved after the model was tuned with 5-fold cross validation. The precision, recall and F1-score of the original CRF model improved from 98.68% to 99.29%, an increment of 0.61%. Even though the result improved, the model only becomes more likely to remember the words rather than recognise the pattern of the words, as reflected in the recall score (from 97.2% to 99.3%). Figure 3 shows that the overall score for each metric slightly dropped by 1.49% (from 98.68% to 97.19%). It is obvious that the recall score is lower than for the previously trained CRF model. Even though the performance scores dropped, the model now has more ability to recognize the pattern of the words. This is discussed in detail in the discussion section.
Fig. 3. Score comparison of CRF model between random c1 and c2 where c1 = 10, c2 = random
3.2 Visualization of the Results
The results from the NER task are shown differently in the form of bar chart (Fig. 4) and pie chart (Fig. 5).
Fig. 4. List of malware names
Fig. 5. Top 10 malware targeted object
From Fig. 4, the NER tagger shows that 'trickbot' has the highest frequency (88 times), followed by 'asacub' (56 times) and 'zbot' (48 times). Whereas in Fig. 5, user bank credentials are the most targeted items by the malware users (29.4%), followed by email (15.7%) and message (9.8%).
4 Discussion

From the results shown in Fig. 1, the HMM achieves a lower performance score compared to CRF. This could be due to the distinct methodologies that are used to construct the structure of both NER taggers [7]. The structure of HMM is less suitable for representing a feature set compared to CRF, which is able to represent a rich set of features. The features used in HMM contain N-grams (unigram and bigram) and POS tags. As for CRF, the feature selection is based on N-grams (unigram, bigram and trigram), POS tags, upper case words, lower case words, title words and the suffix of the words. Besides, the nature of CRF allows it to accommodate any context information, and its feature design is flexible when compared to the HMM, which only depends on each state and its corresponding observation (observed object).
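As an illustration of the CRF feature set listed above (word n-grams, POS tag, case information and suffixes), a hedged feature-extraction sketch is shown below; the exact feature templates used by the authors may differ.

def word2features(sentence, i):
    """Feature dictionary for the i-th token of a sentence given as a list of
    (word, pos, tag) tuples, mirroring the feature set described above."""
    word, pos = sentence[i][0], sentence[i][1]
    features = {
        "word.lower()": word.lower(),
        "word.isupper()": word.isupper(),
        "word.istitle()": word.istitle(),
        "word[-3:]": word[-3:],            # suffix
        "postag": pos,
    }
    if i > 0:
        features["-1:word.lower()"] = sentence[i - 1][0].lower()
    else:
        features["BOS"] = True             # beginning of sentence
    if i < len(sentence) - 1:
        features["+1:word.lower()"] = sentence[i + 1][0].lower()
    else:
        features["EOS"] = True             # end of sentence
    return features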
Table 3. Classification results of HMM model

HMM models     Precision  Recall  F1-score  Support
B-BEH          0.701      0.561   0.623     246
B-CAP          0.639      0.645   0.642     121
B-DVEC         0.377      0.588   0.460     102
B-MAL          0.703      0.891   0.786     479
B-OS           0.750      0.812   0.780     85
I-BEH          0.434      0.721   0.541     154
I-CAP          0.583      0.820   0.682     111
I-DVEC         0.360      0.850   0.506     80
I-MAL          1.000      0.806   0.893     31
I-OS           1.000      1.000   1.000     13
O              0.992      0.981   0.987     34642
Micro avg      0.972      0.972   0.972     36064
Macro avg      0.6885     0.789   0.718     36064
Weighted avg   0.978      0.972   0.974     36064
Table 4. Classification results of CRF model

CRF models     Precision  Recall  F1-score  Support
B-BEH          0.850      0.723   0.781     267
B-CAP          0.842      0.726   0.780     117
B-DVEC         0.861      0.586   0.697     169
B-MAL          0.888      0.955   0.920     465
B-OS           0.897      0.933   0.915     75
I-BEH          0.747      0.615   0.675     182
I-CAP          0.886      0.897   0.891     87
I-DVEC         0.833      0.672   0.744     119
I-MAL          1.000      1.000   1.000     28
I-OS           1.000      1.000   1.000     9
O              0.992      0.996   0.994     34164
Micro avg      0.987      0.987   0.987     35682
Macro avg      0.891      0.828   0.854     35682
Weighted avg   0.986      0.987   0.986     35682
Based on the classification results from Tables 3 and 4, there are only two tags for which the HMM model has a higher recall score than the CRF model: I-BEH (HMM: 72.1%, CRF: 61.5%) and I-DVEC (HMM: 85.0%, CRF: 67.2%). This might be because the HMM has higher probability values for the state transitions of I-BEH and I-DVEC, while the CRF model mostly memorizes the next and previous words rather than the current word. Such model features cause the recall score of the CRF model to be lower than that of the HMM model.
There is an improvement in the scores of the optimized model in Fig. 2. The feature selection result for the I-BEH tag in Table 5 clearly shows there is a problem in the model.

Table 5. Feature selection of I-BEH tag for optimized CRF model

I-BEH Weight   Feature
+4.363         +1:word.lower():campaigns
+3.873         +1:word.lower():money
+3.646         +1:word.lower():last
+3.082         −1:word.lower():for
+2.698         −1:word.lower():reply
In order to increase the capability of the model in identifying the pattern of the words for each tag, some tuning is required using the c1 and c2 parameters of the CRF model. Based on the CRF model, the value of the c1 parameter needs to be increased to enable the model to do feature selection. Therefore, a new smart feature recognition CRF model is proposed with 5-fold cross validation. 50 iterations were run with a fixed c1 parameter (which is 10), and the random search for the c2 parameter was set with exponential scale 0.1. Table 6 shows the classification results of the smart feature detection CRF model.

Table 6. Classification results of smart feature detection CRF model

CRF models     Precision  Recall  F1-score  Support
B-BEH          0.788      0.307   0.442     267
B-CAP          0.923      0.205   0.336     117
B-DVEC         0.850      0.101   0.180     169
B-MAL          0.873      0.783   0.825     465
B-OS           0.734      0.627   0.676     75
I-BEH          0.291      0.088   0.135     182
I-CAP          0.844      0.437   0.576     87
I-DVEC         0.824      0.118   0.206     119
I-MAL          0.873      0.783   0.825     465
B-OS           0.734      0.627   0.976     75
I-OS           0.500      0.111   0.182     9
O              0.975      0.997   0.986     34164
Micro avg      0.972      0.972   0.972     35682
Macro avg      0.782      0.395   0.479     35682
Weighted avg   0.967      0.972   0.965     35682
The overall score for each metric has slightly dropped. It is obvious that the recall score is lower than for the previously trained CRF model. This enables the model to recognize the pattern of each named entity tag rather than memorizing specific keywords.
5 Conclusion

From the experiments conducted, it can be concluded that CRF performs better than HMM in the Name Entity Recognition exercise. The F1-score for CRF is 98.68% while the HMM score is 97.23%. The feature analysis of the CRF began by using random search over the hyperparameters with 5-fold cross validation. The performance of the new CRF model slightly increased over the original CRF model. However, the feature selection of both models mostly depends on word memorization. Therefore, a smart feature recognition model with a higher value of c1 (ridge regularization) was developed. This has improved the CRF's ability to recognize the pattern of the entity instead of memorizing it. The F1-score of the smart feature recognition model dropped (from 98.68% to 97.19%) because there is a significant drop in the recall score of the model. This low recall score (Table 6) indicates that the generated model has less ability to memorize words. It is clearly shown in Table 6 that the model has more ability to recognize the I-BEH tag by recognizing patterns of the related words. Pattern recognition is important when dealing with wholly new data containing different words. For future improvement, it is recommended to create good word features of malware attributes by thoroughly analysing each malware related word. This requires humans with malware knowledge to identify these words. Gathering more experts with deep knowledge in the malware domain to annotate the data, to ensure the correctness of the malware name entity of each word, can also produce a better result.

Acknowledgement. This work is a collaboration between Universiti Teknologi Malaysia and CyberSecurity Malaysia. It is partly supported by the Research Management Centre (RMC) at the Universiti Teknologi Malaysia (UTM) under High Impact Research Grant (HIR) (VOT PY/2018/02890).
References 1. Bridges, R.A., Jones, C.L., Iannacone, M.D., Testa, K.M., Goodall, J.R.: Automatic labeling for entity extraction in cyber security, pp. 1–11 (2013) 2. He, Y., Kayaalp, M.: Biological entity recognition with conditional random fields. In: Annual Symposium Proceedings/AMIA Symposium, AMIA, pp. 293–297 (2008) 3. Joshi, A., Lal, R., Finin, T., Joshi, A.: Extracting cybersecurity related linked data from text. In: Proceedings - 2013 IEEE 7th International Conference on Semantic Computing, ICSC 2013, pp. 252–259 (2013) 4. Kaspersky Lab Page. https://www.kaspersky.com/about/press-releases/2019_number-ofusers-attacked-by-banking-trojans-grew. Accessed 15 Mar 2019
5. Knoth, P., Gooch, P.: An introduction to text mining research papers what is text mining? (September 2015) 6. Lim, S.K., Muis, A.O., Lu, W., Ong, C.H.: MalwareTextDB: a database for annotated malware articles, pp. 1557–1567 (2017) 7. Ponomareva, N., Rosso, P., Pla, F., Molina, A.: Conditional random fields vs. hidden markov models in a biomedical named entity recognition task. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing, RANLP, pp. 479–483 (May 2014) 8. Quimbaya, A.P., Múnera, A.S., Rivera, R.A.G., Rodríguez, J.C.D., Velandia, O.M.M., Peña, A.A.G., Labbé, C.: Named entity recognition over electronic health records through a combined dictionary-based approach. Procedia Comput. Sci. 100, 55–61 (2016) 9. Vail, D.L., Lafferty, J.D., Veloso, M.M.: Feature selection in conditional random fields for activity recognition. In: IEEE International Conference on Intelligent Robots and Systems, pp. 3379–3384 (2007) 10. Zhu, F., Patumcharoenpol, P., Zhang, C., Yang, Y., Chan, J., Meechai, A., Shen, B.: Biomedical text mining and its applications in cancer research. J. Biomed. Inform. 46(2), 200–211 (2013)
Customized Hidden Layered ANN Based Pattern Recognition Technique for Differential Protection of Power Transformer Harish Balaga(&) and Deepthi Marrapu Vardhaman College of Engineering, Hyderabad, India [email protected], [email protected]
Abstract. This article presents the use of Customized Multi-Layer ANN based Pattern Recognition Technique for the numerical differential protection of a power transformer. An efficient Resilient Back Propagation trained neural network model with customized parallel hidden layers is proposed for the said purpose. The task of the ANN is to discriminate among various operating conditions of the transformer and issue trip signal, only in the case of internal fault. The data base required for the training of algorithm is obtained by using MATLAB/SIMULINK environment.

Keywords: Differential protection · Pattern recognition · Artificial neural network · Power transformer · Differential relay
1 Introduction

Transformer protection is an important aspect, as the transformer forms an important element in power systems. Several protective relays are available in the market for the reliable protection of transformers. While the earlier method of bypassing the 2nd harmonic may prevent false tripping of the relay under inrush conditions, it increases the fault clearance time in the case of heavy internal faults, along with the saturation of CTs. The non-linear characteristics of CTs cause them to saturate at high currents. In addition to high currents, the presence of DC also leads to saturation. In recent times differential protection has been widely used. This method compares the currents flowing through different terminals (primary and secondary currents) of the transformer on a single base and identifies a fault. The main defect the said method faces is distinguishing between magnetizing inrush and fault conditions. Thus, a relay is to be designed which can distinguish between magnetizing inrush, over excitation and internal fault conditions and act accordingly. Researchers have been working on such a relay in recent times [1–3, 5, 8, 10–13]. In this paper, a relay has been designed which can distinguish between healthy operating conditions and faulty conditions depending on the pattern of the differential current. For the development of this relay we have used a multi-layered resilient back propagation network. The network has a total of 48 inputs and 6 outputs. In addition to these we employ two parallel operated independent hidden
layers. Experimentation with different configurations resulted in a finalised network consisting of 40 hidden neurons in each layer. A separate description of these input and output layers is given in the coming sections. Testing for speed and accuracy of this scheme is done with the help of the ANN-based pattern recognition algorithm. It is observed that the proposed scheme provides an acceptable result in distinguishing between fault and normal operating conditions.
2 Database Generation

For the required database generation we use a simulation network having a 3-phase 220/110 kV power transformer and a transmission line, as shown in Fig. 1. The simulation was carried out in the Simulink (MATLAB) environment. Various internal fault conditions were simulated by adding faults at different sides of the transformer and at different points of the transmission line. Simulations for over excitation and inrush current were done for different values of voltage angles and load conditions.
Fig. 1. Simulated model
3 Neural Network Design and Training

Since the scheme is a differential protection scheme which works on the wave pattern, it is a pattern recognition problem. So, it is usual to start with a simple feedforward backpropagation neural network design as a pattern classifier. However, there are two problems. One is that for the two major classes out of six, the normal and
fault cases, the wave pattern looks similar except for the difference in wave amplitude. Our experimentation showed that the simple pattern classifier is not that effective in classifying those two cases, leading to maloperation of the relay. The other problem is that the Back-Propagation algorithm is relatively slow while training the network, with the additional common disadvantage of settling into local minima of the error. To overcome these two problems, a customized network architecture is adopted, as discussed in Sect. 3.2, with a faster training algorithm. Having the preprocessed database with both inputs and corresponding targets makes the learning process supervised.
3.1 Training Algorithm - RPROP
The Resilient Back Propagation (RPROP) algorithm is a supervised learning approach similar to the backpropagation (BP) algorithm. In other words, RPROP is a faster version of BP. RPROP updates the weights with a variable learning rate, based on the sign of the error gradient. This process performs a local adaptation of the weight updates and hence has lower memory requirements than BP. It also reduces the number of epochs required for training. To some extent, it can also reduce the chances of settling into local minima, the major problem with the BP algorithm.
3.2 Network Architecture and Training
For the development of a neural network, it is necessary first to fix the number of data inputs and the number of classes. Different models and mixes of data sets were tried to arrive at the final setup, with the objective of getting the least mean squared error (MSE). The number of classes is fixed at six; the number of neurons and hidden layers is varied by trial and error until a minimum error is achieved. The differential current waveform is sampled at a rate of 32 samples per cycle, or 1600 Hz. Then, 16 samples from each phase are taken in a set, cumulating to 48 overall. These samples are arranged in a moving window format and used for training and testing the developed NNs. In this way, a database consisting of 2142 sets of samples, generated by sampling the waveforms presented in Fig. 2 (created in a SIMULINK model of the MATLAB environment), has been used to train and test the neural network. Out of this database, 80% of the sets are used for training the network and the remaining are used for validation and testing. For the 6 different operating conditions, the ANN has to classify the inputs into 6 classes, namely, (i) Normal, (ii) Inrush, (iii) Over-excitation, (iv) Fault on Phase-A, (v) Fault on Phase-B, (vi) Fault on Phase-C. The target of the network is to issue the trip signal only when one or more of the last three outputs is raised to 1.
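As an illustration of the moving-window arrangement described above, the following Python sketch builds 48-element input vectors from three phase-current sample arrays; the array-based representation and the function name are assumptions, not the authors' MATLAB code.

import numpy as np

def moving_window_sets(phase_a, phase_b, phase_c, window=16):
    """Build 48-element input vectors (16 consecutive differential-current
    samples from each phase) using a sliding window.  The phase arrays are
    assumed to be sampled at 32 samples per cycle."""
    n = min(len(phase_a), len(phase_b), len(phase_c))
    sets = []
    for start in range(n - window + 1):
        sl = slice(start, start + window)
        sets.append(np.concatenate([phase_a[sl], phase_b[sl], phase_c[sl]]))
    return np.array(sets)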
Fig. 2. Differential current waveforms generated by the 3-phase transformer model for (a) normal (b) magnetising inrush (c) over-excitation (d) LLG fault (e) LLLG fault
Fig. 2. (continued)
Next comes the most important part of the architecture, i.e. the hidden layer. The proposed architecture consists of two hidden layers. But unlike traditional series-connected layers, the two hidden layers are operated in parallel and are independent of each other, i.e., neither the inputs nor the outputs of one hidden layer directly affect the performance of the other. Figure 3 illustrates the interconnections between the different layers of the network. It is worth noting that the ANN classifies the inputs into 6 categories based on the shape or pattern of the differential current waveform, not on its amplitude. In this case, both faulty and normal current waveforms look similar in shape but differ largely in amplitude. Hence, although the ANN discriminates very clearly when trained for normal and fault conditions only, it might falsely consider both conditions as the same case when trained along with other operating conditions such as inrush, as these differ much in wave shape from the normal condition. To rectify this, the parallel hidden layered architecture is proposed.
Fig. 3. ANN architecture
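The authors built and trained the network with MATLAB's nprtool; purely as an illustration of the parallel, independent hidden layers described above, an equivalent topology can be sketched in Python with the Keras functional API. The layer sizes follow the 48_40 * 2_6 design, while the activation functions and optimizer are assumptions (RPROP itself is not available in Keras).

from tensorflow.keras.layers import Concatenate, Dense, Input
from tensorflow.keras.models import Model

inputs = Input(shape=(48,))                       # 16 samples x 3 phases
hidden_1 = Dense(40, activation="tanh")(inputs)   # first parallel hidden layer
hidden_2 = Dense(40, activation="tanh")(inputs)   # second, independent hidden layer
merged = Concatenate()([hidden_1, hidden_2])      # both feed the output layer
outputs = Dense(6, activation="softmax")(merged)  # 6 operating-condition classes

model = Model(inputs, outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])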
3.3 Fault Detection Algorithm
The algorithm for fault detection in the transformer differential relay is presented in Fig. 4. This is a generalized algorithm for any ANN based differential relay.
Fig. 4. Fault detection algorithm
4 Performance of the Proposed ANN Based Relay

As in every ANN based solution, the inputs and outputs are decided based on the available samples and the number of classes to be identified. The number of neurons required for the hidden layer was decided purely by trial and error. In this work, experimentation using various network configurations was done. The best validation error was 0.0083299. It is evident from the results shown in Table 1 and Table 2 that the designed network gives a reliable response by distinguishing all the possible operating conditions. The performance graphs are depicted in Fig. 5.
Fig. 5. Learning errors of the ANN (48_40 * 2_6)

Table 1. Performance of proposed ANN with variable neurons in the hidden layer (maximum training epochs: 1000)

ANN architecture   Best validation error
48_20 * 2_6        0.0191
48_30 * 2_6        0.0153
48_40 * 2_6        0.0083
48_60 * 2_6        0.0109
48_80 * 2_6        0.0126
48_96 * 2_6        0.0141
48_120 * 2_6       0.0173
The neural network is created using MATLAB nprtool, which is designed particularly for pattern recognition method-based applications, rather than the generally used nntool. The training process takes just above 2 min for 1000 epochs. This is far better when compared to using nntool for network creation and training which takes more than an hour for 1000 epochs.
Speed with maximum possible accuracy plays a key role in transformer protection. The proposed ANN could identify the fault at the 10th sample of the fault signal, just over half a cycle, with 100% confidence. However, if 99.9% confidence is considered sufficient to issue the trip signal to the circuit breaker, this is reached when the 12th sample of the faulty wave is detected. These results are shown in Table 2. Though one cannot find much difference in the tested output when the network is trained with different architectures, one can definitely find some difference in the accuracy of the proposed system while discriminating the faulty condition from the other conditions.

Table 2. Test output of the proposed ANN architecture
Operating condition          Output 1 (O / T)       Output 2 (O / T)       Output 3 (O / T)        Output 4/5/6 (O / T)
Normal                       0.9994 / 1             17 × 10⁻⁵ / 0          36 × 10⁻⁵ / 0           0.02 / 0
Magnetising inrush           5 × 10⁻⁵ / 0           1 / 1                  3 × 10⁻⁵ / 0            46 × 10⁻⁵ / 0
Over-excitation              5 × 10⁻³ / 0           11 × 10⁻⁵ / 0          0.99994 / 1             77 × 10⁻⁴ / 0
Internal fault (any phase)   9 × 10⁻³ / 0           18 × 10⁻⁵ / 0          392 × 10⁻⁵ / 0          0.9992 / 1

T = Target; O = actual output.
5 Conclusion

A new protection scheme for the transformer based on a customized parallel hidden layered neural network is proposed. The proposed 48_40 * 2_6 architecture, with 48 inputs, 6 outputs and two parallel hidden layers with 40 neurons each, could effectively differentiate all the healthy and faulty conditions and issues a trip signal only when any one or more of the last three outputs (4, 5, 6) crosses the threshold value. The proposed method is efficient and gives excellent reliability, speed and accuracy.
References 1. Balaga, H., Vishwakarma, D.N.: Artificial neural network based backup differential protection of generator-transformer unit. Int. J. Electron. Electr. Eng. 3(6), 482–487 (2015) 2. Sachdev, M.S., Nagpal, M.: A recursive least error squares algorithm for power system relaying and measurement application. IEEE Trans. Power Deliv. 6(3), 1008–1015 (1991) 3. Balaga, H., Gupta, N., Vishwakarma, D.N.: GA trained parallel hidden layered ANN based differential protection of three phase power transformer. Int. J. Electr. Power Energy Syst. 67, 286–297 (2015) 4. Balaga, H., Vishwakarma, D.N., Sinha, A.: Numerical differential protection of power transformer using ANN as a pattern classifier. In: IEEE International Conference on Power, Control and Embedded Systems, pp. 1–6. IEEE, Allahabad (2000)
5. Balaga, H., Vishwakarma, D.N., Sinha, A.: Application of ANN based pattern recognition technique for the protection of 3-phase power transformer. In: Panigrahi, B.K., Suganthan, P. N., Das, S., Satapathy, S.C. (eds.) Swarm, Evolutionary, and Memetic Computing, vol. 7076, pp. 358–365. Lecture Notes in Computer Science. Springer, Heidelberg (2011) 6. Pihler, J., Grčar, B., Dolinar, D.: Improved operation of power transformer protection using artificial neural network. IEEE Trans. Power Deliv. 12, 1128–1136 (1997) 7. Kasztenny, B., Rosolowski, E.: Multi-objective optimization of a neural network based differential relay for power transformers. IEEE Conf. Transm. Distrib. 2, 476–481 (1999) 8. Zaman, M.R., Rahman, M.A.: Experimental testing of the artificial neural network based protection of power transformers. IEEE Power Eng. Rev. 13, 510–517 (1997) 9. Moravej, Z., Vishwakarma, D.N., Singh, S.P.: ANN-based protection scheme for power transformer. Electr. Power Compon. Syst. 28, 875–884 (2000) 10. Moravej, Z., Vishwakarma, D.N.: ANN based harmonic restraint differential protection of power transformer. IE(I) J.-EL 84, 1–6 (2003) 11. Khorashadi-Zadeh, H.: Power transformer differential protection scheme based on symmetrical component and artificial neural network. In: 2004 Seventh Seminar on Neural Network Applications in Electrical Engineering - Proceedings, NEUREL 2004 (2004) 12. Segatto, E.C., Coury, D.V.: A differential relay for power transformers using intelligent tools. IEEE Trans. Power Syst. 21(3), 1154–1162 (2006) 13. Ram, B., Vishwakarma, D.N.: Power System Protection and Switchgear. Tata Mcgraw-Hill (2011)
Gaussian Naïve Bayes Based Intrusion Detection System Akhil Jabbar Meerja(B), A. Ashu(B), and Aluvalu Rajani Kanth(B) Department of Computer Science and Engineering, Vardhaman College of Engineering, Hyderabad, India [email protected], [email protected], [email protected]
Abstract. An intrusion detection system (IDS) is used to monitor intrusions or suspicious actions over network traffic data or in a computer system. In this paper, we propose an IDS for identifying intrusions over network traffic data. As network traffic data are continuous in nature, we use the Gaussian Naïve Bayes classification approach with the IDS to detect the intrusions. We used the Kyoto dataset to evaluate the performance of the proposed approach. The results show that the proposed approach has better accuracy of intrusion detection, a higher intrusion detection rate, and a lower false alarm rate than the existing approaches.

Keywords: Gaussian Naïve Bayes · Intrusion detection system · Machine learning · Security

1 Introduction
With the increased usage of information and communication technology, there is an increase in network traffic [1]. This increase in network traffic has increased the risk of intrusion by attackers. Due to this risk, researchers proposed the usage of intrusion detection systems (IDS) to identify the abnormal behavior of the system [2–4]. The IDS identifies actions that break the security principles of a system or network traffic [1–3]. The IDS finds and analyzes the anomalous actions and alerts the system administrator about these events. Researchers categorize IDS into two types: host based IDS and network based IDS [5]. The host based IDS observes the operating system files, whereas the network based IDS examines the network traffic of the system. In this paper, we concentrate on network based IDS. Researchers proposed the usage of machine learning and data mining approaches to automatically detect the intrusions from the network traffic data [6,7]. They used the concepts of feature selection, pattern recognition, and rule based learning to identify the intrusions over the network traffic data [6].
Therefore, the existing IDS are limited to identifying intrusions based on some patterns and features. As the data generated by the network traffic are streaming data, the existing IDS approaches do not have the capability to handle these data. Due to these reasons, the existing IDS have lower accuracy of intrusion detection [7], a low intrusion detection rate (DR) [7] and a high false alarm rate (FAR) [7]. To overcome the above mentioned limitations, in this paper we use the Gaussian Naïve Bayes approach [8] with IDS to classify the intrusions. The Gaussian Naïve Bayes approach is built to classify the continuous and streaming network traffic data in the IDS. We organize the rest of the paper as follows. In Sect. 2, we present the related works on existing IDS. In Sect. 3, we present our proposed Gaussian Naïve Bayes approach for IDS. In Sect. 4, we provide our experimental results. Finally, in Sect. 5, we conclude the paper.
2 Related Works
Network based IDS use the network traffic data to recognize suspected cyberattacks on the system [5]. In [9], the authors used a network based IDS for synchrophasor systems to identify abnormal events in the network traffic data. They checked the lists of packets with appropriate source IP addresses, the format of the packets, etc., to detect the abnormal events. Similarly, in [10], the authors analyzed the network traffic data by developing a distributed and parallel IDS. In [11], the authors proposed the usage of a k-means clustering based ensemble with the IDS to cluster the network traffic data. Whereas in [12], the authors used the Naïve Bayes classification approach with the network based IDS to classify the network traffic data. In [13], the authors implemented a specification based IDS by analyzing the sequential actions occurring in the network traffic data. They introduced the advanced metering infrastructure to analyze the sequential events from the network traffic data. Many machine learning algorithms have been used for implementing IDS. A few of them are 1) SVM [14,15], decision tree [16], random forest [17]. These methods achieved reasonable accuracy. An IDS using a Naïve Bayes classifier was proposed in [18]. The authors used the NSL-KDD data set and applied the Naïve Bayes classifier. From the total of 41 features, 21 features were selected for classification based on feature selection. The accuracy obtained by their method is 97.78%. In [19], the authors developed an intelligent IDS model based on a neural network and fuzzy logic. The model combined host based and anomaly based detection. In [20,21] the authors proposed IDS based on PCA and a genetic algorithm. Feature selection methods have been used to improve accuracy. Even though the existing network based IDS mentioned above [5,9–20] can find the abnormal actions over the network traffic data, they are limited in analyzing the abnormal actions that happen over the physical system. For instance, the IDS proposed in [11] is limited to the format of a valid IP address. Similarly, the Naïve Bayes approach proposed in [12] does not produce better results with the streaming and continuous network traffic data. To overcome
these limitations, in this paper we propose the usage of the Gaussian Naïve Bayes classification approach with IDS (GNIDS). We use the data from the Kyoto dataset to evaluate our proposed approach.
3 Proposed Approach
The main phases of our methodology are preprocessing and then applying the supervised Gaussian Naïve Bayes learning algorithm on the preprocessed data set. Data preprocessing is required to remove irrelevant and redundant features. We have incorporated missing-value computation and normalization techniques to preprocess the data. A schematic diagram of our proposed approach is shown in Fig. 1.
Fig. 1. Schematic diagram of GNIDS
During classifier building, the data set is partitioned into training and testing sets. A 10-fold cross validation testing measure is applied to build the classifier. The confusion matrix is derived to record various measures. Our model classifies intrusion and normal traffic data.
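The paper does not name an implementation; as a sketch of the described pipeline (missing-value computation, normalization, Gaussian Naïve Bayes, 10-fold cross validation and a confusion matrix), the steps could look like this in Python with scikit-learn, where X and y stand for the Kyoto feature matrix and labels and are assumptions of this example.

from sklearn.impute import SimpleImputer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# X: the 14 Kyoto features, y: the label column (intrusion vs. normal traffic).
model = make_pipeline(SimpleImputer(strategy="mean"),   # missing-value computation
                      MinMaxScaler(),                   # normalization
                      GaussianNB())

y_pred = cross_val_predict(model, X, y, cv=10)          # 10-fold cross validation
print(confusion_matrix(y, y_pred))                      # basis for the metrics below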
4 Experimental Results
In this section, we present the experimental results of the proposed IDS. We used data from the Kyoto dataset [22] to evaluate the results. In Sect. 4.1, we present the details of the dataset. We evaluated the performance of the proposed approach against the Naïve Bayes [11] and the Bayes net [12] approaches. In Sect. 4.2, we present the evaluation of the proposed approach in terms of three evaluation metrics: the accuracy of intrusion detection, the DR and the FAR.
4.1 Dataset
The Kyoto dataset [22] consists of more than 200,000 samples of network traffic data. The data are represented in terms of 15 attributes (14 features and 1 class). Table 1 presents the different attributes of the Kyoto dataset. The attributes of the dataset represent the real network traffic captured using darknet sensors, honeypots, a web crawler and an email server deployed on five networks inside and outside Kyoto University. It ignores redundant records and does not contain information on particular attacks.
Table 1. Attributes of Kyoto dataset.

Sl. no   Attribute name
1        Duration
2        Service
3        Source bytes
4        Destination bytes
5        Count
6        Same srv rate
7        Serror rate
8        Srv serror rate
9        Dst host count
10       Dst host srv count
11       Dst host same src port rate
12       Dst host serror rate
13       Dst host srv serror rate
14       Flag
15       Label

4.2 Performance Evaluation
We evaluated the performance of the proposed approach in terms of three metrics: the accuracy of intrusion detection (accuracy), the DR and the FAR. The accuracy and detection rate measures obtained by the model are recorded based on the confusion matrix. We define the accuracy as the ratio of correctly classified network traffic data to the total number of network traffic data:

Accuracy = (correctly classified data) / (total number of data in the dataset)          (1)

The DR is the ratio between the total attacks detected by the system and the total attacks present in the dataset. In other words, we define the DR as the ratio of the total positive attacks (TP) identified to the total positive (TP) and total negative (TN) attacks:

DR = TP / (TP + TN)          (2)

Similarly, the FAR is the ratio of the false positive attacks identified to the sum of the false positives and the total negatives:

FAR = FP / (FP + TN)          (3)
We used the Naïve Bayes based IDS (NBIDS) and the Bayes Net based IDS (BNIDS) as the comparison methods for the proposed approach.
Table 2 shows the accuracy of intrusion detection. The proposed approach is 98.6% accurate in identifying intrusions from the network traffic data. From Table 2, we can see that the proposed algorithm outperforms the NBIDS and BNIDS approaches. The Gaussian Naïve Bayes classification approach works well for large data sets and streaming data. We found that Naïve Bayes is biased towards imbalanced data sets.

Table 2. Accuracy comparison.

Algorithm            Accuracy (%)
NBIDS                94.11
BNIDS                97.26
Proposed approach    98.6
In Table 3, we show the performance of the proposed approach in terms of the three evaluation metrics. A good IDS should have a high DR and a low FAR. The detection rate of the proposed approach is recorded as 93.6% and the FAR is 0.001. The results obtained give an indication that the proposed approach is well suited for IDS.

Table 3. Performance of the proposed approach with 3 metrics.

Metric      Value (%)
Accuracy    98.6
DR          93.6
FAR         0.001

5 Conclusion
In this paper, we presented an IDS using the Gaussian Naïve Bayes supervised classifier. Security has become a major issue due to the large number of devices connected to the internet and the threats towards various devices. Various IDS techniques have been implemented using machine learning, and it has been observed that most of the IDS methods record lower accuracy and detection rates and a higher false alarm rate. The proposed Gaussian Naïve Bayes based IDS model recorded remarkable accuracy with a low false alarm rate compared with conventional methods. Deep learning with optimization techniques will further improve the accuracy of the IDS.
References 1. Ashfaq, R.A.R., Wang, X.Z., Huang, J.Z., Abbas, H., He, Y.L.: Fuzziness based semi-supervised learning approach for intrusion detection system. Inf. Sci. 378, 484–497 (2017) 2. Javaid, A., Niyaz, Q., Sun, W., Alam, M.: A deep learning approach for network intrusion detection system. In: Proceedings of 9th EAI International Conference on Bio-Inspired Information and Communications Technologies, pp. 21–26 (May 2016) 3. Ambusaidi, M.A., He, X., Nanda, P., Tan, Z.: Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans. Comput. 65(10), 2986–2998 (2016) 4. Pan, S., Morris, T., Adhikari, U.: Developing a hybrid intrusion detection system using data mining for power systems. IEEE Trans. Smart Grid 6(6), 3104–3113 (2015) 5. Aljawarneh, S., Aldwairi, M., Yassein, M.B.: Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J. Comput. Sci. 25, 152–160 (2018) 6. De la Hoz, E., De La Hoz, E., Ortiz, A., Ortega, J., Prieto, B.: PCA filtering and probabilistic SOM for network intrusion detection. Neurocomputing 164, 71–81 (2015) 7. Pajouh, H.H., Javidan, R., Khayami, R., Ali, D., Choo, K.K.R.: A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks. IEEE Trans. Emerg. Top. Comput. 7(2), 314–323 (2016) 8. Zhang, B., Liu, Z., Jia, Y., Ren, J., Zhao, X.: Network intrusion detection method based on PCA and Bayes algorithm. Secur. Commun. Netw. 2018, 1–11 (2018) 9. Jamei, M., Stewart, E., Peisert, S., Scaglione, A., McParland, C., Roberts, C., McEachern, A.: Micro synchrophasor-based intrusion detection in automated distribution systems: toward critical infrastructure security. IEEE Internet Comput. 20(5), 18–27 (2016) 10. Folino, G., Pisani, F.S., Sabatino, P.: A distributed intrusion detection framework based on evolved specialized ensembles of classifiers. In: European Conference on the Applications of Evolutionary Computation, pp. 315–331 (May 2016) 11. Jabbar, M.A., Aluvalu, R.: RFAODE: a novel ensemble intrusion detection system. Procedia Comput. Sci. 115, 226–234 (2017) 12. Jabbar, M.A., Aluvalu, R., Reddy, S.S.S.: Intrusion detection system using Bayesian network and feature subset selection. In: IEEE International Conference on Computational Intelligence and Computing Research, pp. 1–5 (December 2017) 13. Shakeri, A., Garrich, M., Bravalheri, A., Careglio, D., Sol´e-Pareta, J., Fumagalli, A.: Traffic allocation strategies in WSS-based dynamic optical networks. J. Opt. Commun. Netw. 9(4), B112–B123 (2017) 14. Kuang, F., Xu, W., Zhang, S.: A novel hybrid KPCA and SVM with GA model for intrusion detection. Appl. Soft Comput. 18, 178–184 (2014) 15. Reddy, R.R., Ramadevi, Y., Sunitha, K.V.N.: Effective discriminant function for intrusion detection using SVM. In: Proceedings of International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 11481153 (2016) 16. Quinlan, R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
17. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Burlington (1993) 18. Farnaaz, N., Jabbar, M.A.: Random forest modeling for network intrusion detection system. Procedia Comput. Sci. 89, 213–217 (2016) 19. Mukhrjee, S., et al.: Intrusion detection using Na¨ıve Bayes classifier with feature reduction. Procedia Technol. 4, 119–128 (2012) 20. Bashah, N., et al.: Hybrid intelligent intrusion detection system. WASET 11, 23–26 (2005) 21. Ahmed, I., et al.: Feature subset selection for network intrusion detection mechanism using genetic eigen vectors. In: Proceedings of CSIT, vol. 5 (2011) 22. Song, J., Takakura, H., Okabe, Y., Eto, M., Inoue, D., Nakao, K.: Statistical analysis of honeypot data and building of Kyoto 2006+ dataset for NIDS evaluation. In: Proceedings of First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, pp. 29–36 (April 2011)
Traveler Behavior Cognitive Reasoning Mechanism Ahmed Tlili1(&), Salim Chikhi1, and Ajith Abraham2 1
Complex Systems Modeling and Implementation (MISC Labs), Abdelhamid Mahri University, Constantine, Algeria [email protected], [email protected] 2 Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, Auburn, WA 98071, USA [email protected]
Abstract. In this work, we use Kosko's fuzzy cognitive maps to represent the reasoning mechanism in complex dynamic systems. The proposed approach focuses on two points: the first one is to improve the learning process by providing a connection between Kosko's FCMs and the reinforcement learning paradigm, and the second one is to diversify the states of FCM concepts by using an IF-THEN rule base based on the Mamdani-type fuzzy model. An important result is the creation of the transition maps between system states for helpful knowledge representation. After the transition maps are validated, they are aggregated and merged into a unique map. This work is simulated under Matlab with the Fuzzy Inference System Platform.

Keywords: Fuzzy Cognitive Maps · Reinforcement Learning · Traveling Salesman Problem
1 Introduction

The most intensively studied optimization problem is the Traveling Salesman Problem (TSP). TSP is a combinatorial NP-hard problem that requires a lot of calculation time, because the number of possible circuits is extremely large even for cases where the number of cities is small. For this reason, the use of heuristic techniques is suitable. TSP, as a nonlinear NP-complete problem, is formulated as follows: a salesman visits n cities; he starts by choosing one among the cities, goes to each city and returns to the starting one. So he provides a complete tour that combines all cities, where the TSP objective is to minimize the cost in energy or time. Mathematically, TSP is well characterized and described in the literature, but it cannot be solved with exact methods; therefore heuristic methods are used. In the last decades, many studies have used the FCM formalism [1, 2] to study dynamic systems, and have given hopeful results [3–5]. In this work we assume that the task performed by the traveler to find a best tour with a minimum cost is in nature a cognitive task. Based on this idea, we present in this paper an approach based on the FCM cognitive formalism with Reinforcement Learning (RL).
2 Literature Review

The TSP is one of the most studied problems in the optimization field. Among the methods developed by researchers, we discuss two methods related to our approach, namely Hopfield Neural Networks with a Genetic Algorithm and Fuzzy Self-Organizing Maps. Liu et al. [14] applied a Hopfield Neural Network (HNN) with a Genetic Algorithm (GA) to the TSP reasoning mechanism, establishing GA-HNN. In GA-HNN there is a connection between the properties of the GA and the parallelism mechanism of HNNs; in their work, this connection is between the global stochastic searching ability of the GA and the self-learning ability of the HNN. According to the authors, the proposed method applied to TSP optimization has the advantages of convergence, precision and calculation stability. Kajal and Chaudhuri [15] illustrated how the Fuzzy Self-Organizing Map (FSOM) can be used to improve the TSP reasoning mechanism in the winner-city search by integrating its neighborhood-preserving property and the convex-hull property of the TSP. In order to improve learning at each stage, the FSOM draws all excited neurons towards the input city and in the meantime excites them cooperatively towards the convex hull of cities.
3 Theory Background

3.1 Fuzzy Cognitive Maps
The term Cognitive Map (CM) was introduced in 1948 by Tolman [9] to describe the abstract mental representation of space built by rats trained to navigate a labyrinth. The term Fuzzy Cognitive Map (FCM), as illustrated in Fig. 1, was introduced by Kosko [2] to designate a simple extension of CMs through the connection between fuzzy logic and artificial neural networks. FCMs can describe the dynamic behavior of entities. They are directed graphs with nodes representing concepts categorized into sensory, motor and effector concepts, while arcs represent causal relationships between concepts. Each arc from a concept C_i to a concept C_j is associated with a weight ω_ij reflecting the strength of the causal relationship: inhibition if ω_ij < 0 or excitation if ω_ij > 0. An activation degree is associated with each concept; it represents the concept's state at time t and can be modified over time. For more detail about FCMs refer to [6]. Kosko [2] proposed Eq. (1) to calculate the value of each concept:

$X_i^{k+1} = f\Big(\sum_j X_j^k \,\omega_{ji}\Big)$   (1)

In order to make the most of the history of the concepts, (2) was proposed:

$X_i^{k+1} = f\Big(X_i^k + \sum_j X_j^k \,\omega_{ji}\Big)$   (2)
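A minimal sketch of the concept-update rule in Eq. (2), assuming a sigmoid threshold function f and an illustrative three-concept weight matrix; the concept values and weights below are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    # A common choice of threshold function f for FCMs (assumption)
    return 1.0 / (1.0 + np.exp(-lam * x))

def fcm_step(X, W):
    # Eq. (2): X_i^{k+1} = f( X_i^k + sum_j X_j^k * w_ji )
    return sigmoid(X + X @ W)

# Illustrative 3-concept FCM with hypothetical weights w_ij (row i -> column j)
W = np.array([[0.0, 0.6, -0.3],
              [0.2, 0.0,  0.5],
              [-0.4, 0.1, 0.0]])
X = np.array([1.0, 0.0, 0.0])      # initial activation vector
for k in range(5):
    X = fcm_step(X, W)
    print(k + 1, X.round(3))       # activation degrees converge over iterations
```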
Fig. 1. An FCM as a graph
3.2 Reinforcement Learning
The formal framework of reinforcement learning is defined by Markov Decision Processes (MDP) [12], where an MDP is defined by:

• S, a finite set of states, s ∈ S
• A, a finite set of actions in state s, a ∈ A(s)
• r, a reward function, r(s, a) ∈ ℝ
• P, the probability of transition from one state to another depending on the selected action, P(s′ | s, a) = Pa(s, s′).
The solution is to find the best policy of actions that achieves the aim by maximizing rewards starting from any initial state. At each stage, an action is chosen according to these outputs, and the environment sends either an award or a penalty defined by r_k = h(s_k, a_k, s_{k+1}). In the RL paradigm, costs accumulate at each stage, which allows the total cost $\sum_k h(s_k, a_k, s_{k+1})$ to be computed. In [7] the expected reward is weighted by the parameter γ and becomes $\sum_i \gamma^i\, h(s_i, a_i, s_{i+1})$ with $0 \le \gamma \le 1$. The goal of RL is to find an optimal policy π* among all possible action-selection policies. The optimal policy π* is the one for which the Bellman [10] optimality equation is satisfied:

$V^{\pi^*} = V^*(s_i) = \max_a \Big\{ R(s_i, a) + \delta \sum_{s_{i+1}} P(s_i \rightarrow s_{i+1}, a)\, V(s_{i+1}) \Big\} \quad \forall s \in S$   (3)

Equation (3) sets the value function of the optimal policy that RL will seek to assess:

$V^*(s) = \max_{\pi} V^{\pi}(s)$   (4)

3.3 Q-Learning Algorithm
The Q-learning algorithm, developed by Watkins, is one of the most popular reinforcement learning methods and is based on the temporal difference learning technique TD(0). The Q-learning technique establishes a quality function represented by one value for each state-action couple, where Q^π(s, a) is the reinforcement estimate when starting from state s, taking action a, and then following policy π. In this technique [13], for any policy π and any state s ∈ S, the value of executing action a in state s under policy π, denoted Q^π(s, a), corresponds to the expected future reward starting from state s:

$Q^*(s, a) = \max_{\pi} Q^{\pi}(s, a)$   (5)

where $Q^{\pi}(s, a) = E\big[\sum_i \gamma^i r_i\big]$ and Q*(s, a) refers to the optimal state-action value obtained by following the optimal policy π*, i.e. Q*(s, a) = max_π Q^π(s, a). If we reach Q*(s_i, a_i) for each state-action pair, then the agent can reach the goal starting from any initial state. The value of Q is updated by the following equation:

$Q_{k+1}(s_i, a_i) = Q_k(s_i, a_i) + \alpha \big[ h(s_i, a_i, s_{i+1}) + \gamma \max_{a} Q_k(s_{i+1}, a) - Q_k(s_i, a_i) \big]$   (6)
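A minimal sketch of the tabular update in Eq. (6); the state/action encoding, reward and parameter values below are illustrative assumptions, not the paper's experimental settings.

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9          # learning rate and discount factor (assumed values)
Q = defaultdict(float)           # Q[(state, action)] starts at 0

def q_update(s, a, r, s_next, actions_next):
    # Eq. (6): Q(s,a) <- Q(s,a) + alpha * ( r + gamma * max_a' Q(s',a') - Q(s,a) )
    best_next = max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy usage with hypothetical city states and move actions
q_update(s="A", a="A->B", r=1.0, s_next="B", actions_next=["B->C", "B->D"])
print(dict(Q))
```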
4 Proposed Approach

The framework of the proposed method is shown in Fig. 2. Dynamic systems require a balance between the exploitation and exploration processes in the search for optimal actions. An imbalance between these concepts can produce either a premature convergence to a chaotic state, or a divergence that leads the system towards a deadlock situation. This equilibrium is achieved through reinforcement learning and by performing actions based on a heuristic method.
Fig. 2. Framework of the proposed approach.
The proposed method is summarized by pseudo code 1:

Step 1: Generate the output vector $X^{k+1}$: $X^{k+1} = f\big(X^k + \sum X^k \cdot \omega\big)$
Step 2: In response to the environment:
  IF r = 1  // Award
    $Q_{k+1}(s_i, a_i) = Q_k(s_i, a_i) + \alpha\,[1 - Q_k(s_i, a_i)]$
    $W_{k+1}(C_i, C_j) = W_k(C_i, C_j)$
    $P_{k+1}(a_i) = P_k(a_i) + \beta\,[1 - P_k(a_i)]$
  IF r = 0  // Penalty
    $Q_{k+1}(s_i, a_i) = (1 - \alpha)\, Q_k(s_i, a_i)$
    $W_{k+1}(C_i, C_j) = W_k(C_i, C_j)$
    $P_{k+1}(a_i) = (1 - \beta)\, P_k(a_i)$
Step 3: Stop if the system converges. Otherwise go to Step 1.
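A compact sketch of the award/penalty updates in Step 2 of pseudo code 1, assuming scalar learning rates α and β; the dictionaries used for Q and P and the example labels are illustrative, and the FCM weights W are left unchanged as in the printed pseudo code.

```python
from collections import defaultdict

def reinforce(r, Q, P, s, a, alpha=0.1, beta=0.1):
    # Pseudo code 1, Step 2: adjust the Q-value and action probability
    # according to the environment's response (award r = 1, penalty r = 0).
    if r == 1:                                   # award
        Q[(s, a)] += alpha * (1.0 - Q[(s, a)])
        P[a] += beta * (1.0 - P[a])
    else:                                        # penalty
        Q[(s, a)] = (1.0 - alpha) * Q[(s, a)]
        P[a] = (1.0 - beta) * P[a]

# Toy usage with hypothetical state/action labels
Q = defaultdict(float)                           # Q(s, a) initialised to 0
P = defaultdict(lambda: 0.25)                    # uniform 1/n with, e.g., n = 4 actions per state
reinforce(r=1, Q=Q, P=P, s="D", a="D->E")
print(Q[("D", "D->E")], P["D->E"])
```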
5 Case Study: Symmetric Traveling Salesman Problem

In dynamic systems theory, we can model the TSP as a sequential decision process (SDP) [11], designated by the sextuplet C = {Γ, S, Ap, P, Q, W}. One alternative, among others, is to consider that the set of states S is composed of all the cities of the TSP instance; the dimension of S here is equivalent to the instance size of the problem. To illustrate the power of the proposed approach, an example TSP with 5 cities is shown in Fig. 3. Each action a_ij corresponds to visiting city s_j from city s_i, and the number associated with each arc corresponds to the distance between the cities:

1. Γ: the set of iteration instants, denoted Γ = {1, …, n}, where the number n of cities forming a route for the TSP corresponds to the cardinality of Γ.
2. S: the set of states, S = {s_1, …, s_n}, where each state s_i, i = 1, …, n, corresponds to a city.
3. Ap: the set of possible actions, A_p = A_p(s_1) ∪ … ∪ A_p(s_n) = {a_12, a_13, …, a_{n,n-1}}.
4. P: the transition probability function between states s ∈ S, with elements p_ij(s_j | s_i, a_ij), the probability of reaching state s_j when the system is in state s_i and chooses action a_ij.
5. Q: the quality function Q(s_i, a_i), measuring the value of one (state, action) pair.
6. W: the weight matrix between concepts, a function from S × S to ℝ relating a weight W_ij to the pair (s_i, s_j). The best way to initialize the connection weights is to take W_ij inversely proportional to the distance between cities: W_ij = 1/d_ij.
Fig. 3. Graph of the example TSP with 5 cities.
In summary, the main objective is to find the shortest path visiting the n cities exactly once and returning to the initial city [13]. The mathematical description is:

Minimize
$\sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}\, x_{ij}$   (8)

subject to
$\sum_{i=1}^{n} x_{ij} = 1, \quad j = 1, 2, \ldots, n$   (9)
$\sum_{j=1}^{n} x_{ij} = 1, \quad i = 1, 2, \ldots, n$   (10)
where d_ij represents the distance between cities i and j; in the permutation matrix, the decision variable x_ij = 1 indicates that the path goes from city i to city j, while x_ij = 0 marks a route that is not chosen by the salesman. Equation (8) represents the objective function, and (9) and (10) are the constraints ensuring that each city is visited only once. One solution to the problem, a tour visiting all cities and returning to the starting city, can be encoded as a permutation matrix, i.e., a binary square matrix containing exactly one '1' per column and row. In this matrix, a row represents a city and a column indicates the order in which that city is visited during the tour. For Fig. 3, one possible tour BDAECB is shown in Table 1.

Table 1. One accepted solution for the 5-city TSP.

      A   B   C   D   E
  A   0   1   0   0   0
  B   0   0   0   1   0
  C   1   0   0   0   0
  D   0   0   0   0   1
  E   0   0   1   0   0
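A small sketch that checks a candidate matrix like Table 1 against constraints (9)-(10) and evaluates objective (8) from a distance matrix; the 5-city distances below are made-up placeholders, since the values of Fig. 3 are not reproduced in the text.

```python
import numpy as np

cities = ["A", "B", "C", "D", "E"]
# Hypothetical symmetric distance matrix (placeholder for Fig. 3)
D = np.array([[0, 4, 6, 3, 7],
              [4, 0, 5, 2, 6],
              [6, 5, 0, 4, 3],
              [3, 2, 4, 0, 5],
              [7, 6, 3, 5, 0]], dtype=float)

# Matrix from Table 1: x[i, j] = 1 if the tour goes from city i to city j
X = np.array([[0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 1, 0, 0]])

# Constraints (9) and (10): exactly one '1' per row and per column
assert (X.sum(axis=0) == 1).all() and (X.sum(axis=1) == 1).all()
# Objective (8): sum of d_ij over the chosen arcs
print("tour cost:", (D * X).sum())
```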
The dynamics of the fuzzy cognitive map are guided at each step of the system's evolution by the actions allowed to move from one state s_i to a state s_j; i.e., the construction of the traveling-salesman solution is constrained by behavioral adaptation, given that at a particular step certain actions are not available to go from state s_i to state s_j. The set of possible actions is denoted by A_p = A_p(s_1) ∪ … ∪ A_p(s_n), with A_p(s_i) = {a_ij, a_ik, …, a_in}. For example, if at step k one has the following partial solution sol_p: s_i → s_j → s_k with i > j > k, then the possible actions at this stage to advance to the next stage are A_k(s_k) = {a_kr, r ≠ i and r ≠ j}. In this case the states s_i and s_j, with respectively the actions a_ki and a_kj, are not feasible; this prevents passing through the same state (the same city) more than once and thus respects the constraints.
6 Hybrid Learning Fuzzy Cognitive Maps (HLFCM)

The inference mechanism, by IF-THEN rules, starts after the fuzzification process of the input data is accomplished. In the search for the best solution, at each step the system is in a state represented by the concepts of the FCM constructed at this stage, which we call a transition map. The traveler arriving at this stage always seeks to transit to a future city (state) among the possible cities by optimizing the reward of the environment and respecting the imposed constraint that a city is visited once and only once, adapting his behavior by removing the actions that are not permitted at this stage (Fig. 4).
Fig. 4. Transition map as a sub trip.
Here x is the new state, a_xy1, …, a_xyi are the possible actions at step k, and y1, …, yi are the possible states or cities to visit. The adaptation of behavior is also guided at each step by using the transition parameter between states s_i; this parameter is equal to 0 if the state has not previously been visited and equal to 1 if the state has already been visited:

$\delta = \begin{cases} 1 & \text{if the state is visited} \\ 0 & \text{if the state is not visited} \end{cases}$   (11)
For the TSP the fuzzy rules can be designated as:

Rule_k: IF x1 is s1 and x2 is s2 … and xk is sk THEN yk is Ok

where x1, x2, …, xk are the inputs at step k, s1, s2, …, sk are the membership functions of the fuzzy rules representing states or cities, and yk is the output of rule Rule_k, designated by the membership function Ok. This fuzzy rule form is also known as the Mamdani-type fuzzy model or linguistic fuzzy model. For example, in our 5-city TSP case study, the fuzzy rule associated with the transition map at step k can become (Table 2):

Table 2. Fuzzy rule processes.

  IF                  THEN
  x1   x2   x3   x4   y
  A    C    D    E    B
  A    D    E    B    C
In this example, the traveler’s will take a choice between two actions that lead to two states or two different cities (represented by concepts in LFCM). if in step 3 the salesman person is in the city D knowing that the initial starting state A and was the city he passed is the city C, the next possible cities or states are the city B or the city E, so the traveler must choose the next city to be visited in next step. For this the balance between exploration and exploitation is assured gradually based on the data of the table of the function Q values and the probability of each possible action at each stage (Table 3).
Table 3. Fuzzy rules at step 3.

  IF             THEN
  x1   x2   x3   y
  A    C    D    B
  A    C    D    E
In this step the traveler has visited cities A, C and D and must choose the next city to visit. There are two options: either to go to city B or to city E. Based on the constructed transition map (Fig. 5), the choice is guided by the probabilities of the possible actions at this level and by the value of the Q-function if this path has already been taken (Table 4 and Fig. 5).

Table 4. Output vector as a solution.

  Input vectors   Output vectors   Iteration
  10000           10100            1
  10100           10110            2
  10110           11110            3
  11110           11111            4
Fig. 5. Transition map at step 3.
At this stage the traveler has two possible actions, namely a_DB and a_DE. Their corresponding Q-values and probabilities are initially depicted in Table 5 as follows:

Table 5. Action probabilities and Q-function values.

  a_i          P(a_i)   Q(s_i, a_i) value
  (B, a_DB)    p_DB     Q_DB
  (E, a_DE)    p_DE     Q_DE
The Q-function values for (state, action) pairs initially receive a null value for all items, i.e., Q(s_i, a_ij) = 0, and the table of action probabilities initially receives a value of 1/n for every action at each associated state, where n is the number of actions at that state. At every iteration, the updates of the Q-values and action probabilities are made using the procedure described in pseudo code 1. The Q-value is rounded to 1 for the winner concept, which means this concept is activated, after the environment's response on the action giving the best result.
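A sketch of the initialisation just described, assuming the 5-city example: every Q(s, a) starts at zero and each state's actions share a uniform probability 1/n; the action-naming convention is illustrative.

```python
cities = ["A", "B", "C", "D", "E"]
actions = {s: [f"{s}->{t}" for t in cities if t != s] for s in cities}

Q = {(s, a): 0.0 for s in cities for a in actions[s]}            # Q(s_i, a_ij) = 0 for all pairs
P = {s: {a: 1.0 / len(actions[s]) for a in actions[s]}           # uniform 1/n per state
     for s in cities}

print(P["D"])   # e.g. the actions available from city D start with equal probability
```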
7 Experimental Results

The targeted objective here is behavioral adaptation in decision making during an autonomous entity's reasoning mechanism. Tests were carried out using two instances of the TSPLIB library [8]: Burma14 and Ulysses16 (Table 6).

Table 6. TSP instances information.

  Instance    Number of cities   Optimal solution
  Burma14     14                 3323
  Ulysses16   16                 6859
After 20 runs on each city set, all statistics for HLFCM were generated and are shown in the table below:

Table 7. Statistics comparison.

  Instance (optimal TSPLIB solution)   Classical FCM solution   Deviation classical FCM/optimal   HLFCM solution   Deviation HLFCM/optimal
  Burma14 (optimal solution 3323)      4624                     34.15%                            3334             0.33%
  Ulysses16 (optimal solution 6859)    8726                     27.21%                            6873             0.20%
The comparison between the conventional FCM and the FCM with hybrid learning, described in Table 7 and shown in Fig. 6, shows that FCMs are able to learn from experience and use their history effectively to model and simulate dynamic systems. At each iteration, one concept is active, i.e. its value is equal to 1, and the values of the other concepts of the transition map are initialized to 0. The evolution of the modeled system is performed by the reasoning mechanism implemented using the inference process described in pseudo code 1.
Fig. 6. Classical and hybrid learning FCM evolution of the solution (bar chart titled "Comparaison Classical FCM vs Learning FCM", showing the solution evolution for the TSP instances Burma14 and Ulysses16 for the TSP optimal solution, the classical FCM and the learning FCM).
8 Conclusion

In this paper, the study of traveler behavior focused on the cognitive reasoning mechanism induced by the traveler, and the TSP is taken here just as a representative example. Studies of traveling-salesman behavior in computer science and other related sciences are important for many reasons, for example, to optimize both travel-related cost and time consumption. In the last two decades, many attempts have been made to give the best solution using heuristic techniques. The method discussed in this paper is based on the classical Kosko FCM improved by a connection with RL. A heuristic way of updating the concepts' output values is presented. Based on the fusion of the temporal transition maps, the whole set of FCM parameters was obtained, which led to better results. Naturally, behavioral adaptation is a cognitive task that autonomous entities apply to adapt to their dynamic environment, so in this work we targeted the TSP reasoning mechanism. In future work, we aim to test our approach on several instances of the TSP and, from a mathematical point of view, we plan to improve the approach by formulating a standard model that implements, in a general manner, the reasoning mechanism of autonomous entities.
References

1. Axelrod, R.: Structure of Decision. Princeton Press, Princeton (1976)
2. Kosko, B.: Fuzzy cognitive maps. IJMM Stud. 24, 65–75 (1986)
3. Maikel, L., Ciro, R., Maria, M., Garcia, R.B., Koen, V.: FCMs for Modeling Complex Systems. Springer, Cham (2010)
4. Stylios, C.D., Peter, P.G.: Modeling complex systems using FCM. IEEE (2004)
5. Tarkov, M.S.: Solving the TSP Using a RNNs. Springer, JNAA (2015)
6. Buche, C.A., Parenthoen, M., Tisseau, J.: FCMs for the Simulation of Individual Adaptive Behaviors. Wiley, San Mateo (2010)
7. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. The MIT Press, London (2005)
8. Web TSP. http://comopt.ifi.uni-heidelberg.d/software/tsplib/index.html/
9. Tolman, E.: Cognitive maps in rats and men. Review 55, 189–208 (1948)
10. Thomas, J.: Dynamic macroeconomic theory. Section 1(1-1), 4 (2010)
11. Leon, M., Nápoles, G., Bello, R., Mkrtchyan, I., Depaire, B., Vanhoof, K.: Tackling travel behavior: an approach based on FCMs. In: IJCIS, vol. 6 (2013)
12. Jasmin, E., Imthias, T.P., Jagathy, V.P.R.: Reinforcement learning approaches to economic dispatch problem. Elsevier (2011)
13. Donald, D.: Traveling salesman problem, theory and applications. InTech, Rijeka, Croatia (2010)
14. Liu, J., Qiu, W.: GA-Hopfield network for transportation problem. IEEE (2008)
15. Kajal, D., Chaudhuri, A.: A study of TSP using fuzzy self organizing map. In: Davendra, D. (ed.) TSP Theory and Applications. Intech Books (2010). ISBN 978-953-307-426-9
Grading Retinopathy of Prematurity with Feedforward Network

Shantala Giraddi1, Satyadhyan Chickerur2, and Nirmala Annigeri1

1 School of Computer Science and Engineering, KLE Technological University, Hubli 580031, Karnataka, India
[email protected], [email protected]
2 Center for High Performance Computing, KLE Technological University, Hubli 580031, Karnataka, India
[email protected]
Abstract. Retinopathy of Prematurity is a disease that affects premature infants having low birth weight. The disease may lead to blindness unless timely treatment is provided. Because of the high rate of premature births and expanded neonatal care, the incidence of ROP is worrying in India today. There is an urgent need to create awareness about the disease. The researchers propose a new approach to grading ROP with feed forward networks using second order texture features. Experiments are conducted with six different architectures of feed forward networks. Second order texture features mean, entropy, contrast, correlation, homogeneity and energy from the gray level co-occurrence matrix (GLCM) are considered. The results obtained indicate that the feed forward network offers an easy yet effective paradigm for ROP grading.

Keywords: ROP · Grading · Feed forward · Multilayer · Retinopathy
1 Introduction

Retinopathy of prematurity is a disease occurring in premature infants and children of low birth weight. ROP is a disorder of the retina that can potentially lead to blindness in those infants. In full-term infants, the retina and retinal vasculature are completely developed, and ROP cannot occur. However, the development of the eye is incomplete in premature infants. The risk factors for ROP development include early gestational age, low birth weight, and supplemental oxygen therapy. With the improved survival of very-low-birth-weight premature infants, the frequency of ROP has increased. Developed countries have conducted demographic studies on ROP and set guidelines and screening criteria for ROP based on weight and gestational age. Developing countries are yet to assess and set guidelines for ROP. The symptoms of retinopathy of prematurity are not recognized by visual inspection as they occur deep inside the eye; only a trained ophthalmologist can detect these signs with the assistance of ophthalmic instruments.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): SoCPaR 2019, AISC 1182, pp. 168–176, 2021. https://doi.org/10.1007/978-3-030-49345-5_18
The American Pediatrics Association formulated guidelines for screening all newborn babies; through these tests, more babies with ROP are recognized. In the case of severe ROP, complications such as nystagmus (abnormal eye movements) and leukocoria (white pupils) might occur. However, these are also general signs of trouble with vision, so the general rule is that if a child has any of these, an ophthalmologist should be seen immediately.
2 Background and Related Work

Gelman et al. (2005) developed semi-automated software based on multiscale analysis. RGB images of size 640 × 480 pixels are considered. Segmentation, skeleton construction, vessel root selection, and tracking steps are followed by the software. The geometrical properties curvature, diameter, and tortuosity index (TI) are calculated for each segment; these parameters have higher values for diseased images and are hence used for the detection of plus disease. Katie et al. (2013) performed plus disease detection using vascular tortuosity; a key component in the international plus disease classification system is venous dilation in the posterior pole. There are several limitations to this approach: other factors, such as the rate of vascular change, also have to be considered along with the posterior retinal vessels, and according to the authors, domain knowledge will improve the accuracy. Praveen Sen et al. (2015) discuss the latest scenario of ROP in India and the various treatments available for ROP. The authors conclude that laser treatment is the best treatment option and that awareness should be created so that infants get timely treatment. Walter et al. (2015) discuss various aspects of telemedicine such as imaging techniques, the procedure to be followed, image quality, equipment maintenance, data storage, image transfer protocol, backup, etc. The authors discuss the advantages of telemedicine evaluation, i.e., increasing the number of infants covered and improving parent education about ROP. There are several disadvantages: RDFI-TM collects less information than required for deciding the severity of ROP, and there are current practical knowledge gaps. Jayadev et al. (2015) designed software to pre-process the images using three protocols: Grey Enhanced, Color Enhanced and Vesselness Measure. These images were evaluated by ROP specialists. The results indicate that, compared to standard non-processed imagery, each of the protocols enhanced clinically relevant features and provided clinically significant improved data. Campbell et al. (2016) found that accuracy is highest when the tortuosity of both arteries and veins is considered; when only arterial tortuosity is considered, the accuracy is 90%.
Shah et al. (2016) carried out a detailed survey on the prevalence and causes of ROP; according to them, unfiltered oxygen supplementation in Europe and North America in the late 1940s and 1950s triggered the ROP epidemic. In developed countries like the UK, underweight infants weighing less than 1500 g also survive, whereas in developing countries like India even bigger babies with birth weights of 1750 and 2000 g have a higher risk of ROP. The authors' opinion is that the root cause of this is the lack of adequate neonatal care rather than additional oxygen. Shantala et al. (2016) proposed a novel technique with Haar and first order features from horizontal, vertical and diagonal components. K-NN and Decision Tree are used for classification; the K-NN classifier yielded better performance with 85% accuracy. Stefano Piermarocchi (2017) considered parameters such as GA, BW, weight gain, oxygen and blood transfusion therapy and used three algorithms for ROP: WINROP, ROPScore and CHOP ROP. WINROP is a system of observation; the algorithm is based on postnatal weekly weights and insulin growth factor (IGF) serum levels. ROPScore is another easily accessible algorithm that requires BW, GA, weight at the 6th week of life, and the presence or absence of mechanical ventilation, blood transfusion and oxygen (Eckert et al. 2012). The CHOP (Children's Hospital of Philadelphia) ROP model deals with postnatal weight gain, adapted from the ROP model of PINT (Premature Infants in Need of Transfusion) using SAS technology. All these algorithms calculate a score expressed as a decimal number; 0.010 is the standard alarm cut-off value at which the child is considered at risk. The algorithms' aim is to classify all children at risk for the development of type 1 ROP. Hu et al. (2018) performed grading of ROP images using deep neural networks, filtering out unclassified images. The authors carried out experiments using a transfer learning approach with models including AlexNet and VGG. Wang et al. (2018) developed two different DNN models for the recognition and grading tasks. Zhang et al. (2018) carried out a study on finding the presence and severity of retinopathy of prematurity using a CNN. In this paper, the CNN's novel architecture consists of a feature extraction sub-network applied to a variable number of images in an examination, with an aggregation operator to combine the features. The authors experimented with an optimal ensemble model, and the best model yielded 97.6% accuracy.
3 Feedforward Network

A multilayer feed forward neural network consists of an input layer, one or more hidden layers and an output layer of units. Multilayer networks can represent non-linear functions. In these networks, the output from one layer of neurons flows into the next layer of neurons; there are no backward connections. The application determines the number of input units and the number of output units. Determining the number of units in the hidden layers is an art and requires experimentation: too few units would inhibit the training of the network, while too many layers can result in overfitting the network and also increase the training time, and some minimum number of units is needed to learn the target function accurately. Each connection has a weight associated with it. Initially all these weights are initialized to small random values, and during the training phase these values are gradually adjusted. Figure 1 shows the diagram of a feed forward network.
Fig. 1. Multilayer Feedforward network.
4 Dataset Description

The dataset used for grading ROP was provided by IEEE DataPort (Table 2). The experimental study is conducted on a dataset of 200 retinal images. For experimentation purposes, the training dataset includes 80 normal and 80 diseased ROP images, and testing consists of 20 normal and 20 diseased ROP images. Table 1 shows sample images.
Table 1. ROP images in each category (sample retinal images for Grade 0, Grade 1, Grade 2, Grade 3 and Grade 4).
5 Proposed System

The proposed methodology is shown in Fig. 2. The RGB images are converted to gray scale images before extracting features. Six different architectures are experimented with. For Model 1, Model 2 and Model 3, all 18 features have been used, while Model 4, Model 5 and Model 6 use 6 features. The six features correspond to the GLCM features computed at 0°, 45° and 90°, respectively.
Fig. 2. Schematic block diagram.
GLCM Features: Haralick features are among the most popular texture descriptors. In this experimentation we have considered six Haralick features at three angles (0°, 45° and 90°). The procedure for classification is given in the form of the algorithm below:

a. Read an image.
b. Perform preprocessing.
c. Perform feature extraction.
d. Store the features.
e. Apply a feed forward neural network using Keras.
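An illustrative extraction of the six GLCM/Haralick descriptors at 0°, 45° and 90° using scikit-image (spelled greycomatrix/greycoprops in older versions); entropy and mean are computed directly from the normalised co-occurrence matrix since graycoprops does not provide them. The file handling and grayscale conversion below are assumptions, not the authors' exact pipeline.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from skimage.io import imread
from skimage.color import rgb2gray
from skimage.util import img_as_ubyte

def glcm_features(path):
    gray = img_as_ubyte(rgb2gray(imread(path)))                   # steps a-b: read image, convert to grayscale
    angles = [0, np.pi / 4, np.pi / 2]                            # 0, 45 and 90 degrees
    glcm = graycomatrix(gray, distances=[1], angles=angles,
                        levels=256, symmetric=True, normed=True)
    feats = []
    for k in range(len(angles)):
        p = glcm[:, :, 0, k]
        i = np.arange(256)
        feats += [
            float((i * p.sum(axis=1)).sum()),                     # mean of the GLCM marginal
            float(-(p[p > 0] * np.log2(p[p > 0])).sum()),         # entropy
            float(graycoprops(glcm, "contrast")[0, k]),
            float(graycoprops(glcm, "correlation")[0, k]),
            float(graycoprops(glcm, "homogeneity")[0, k]),
            float(graycoprops(glcm, "energy")[0, k]),
        ]
    return feats   # 6 features x 3 angles = 18 values per image
```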
Table 2. Distribution of images into classes

  Classes        Number of images (training)   Number of images (testing)
  Diseased       80                            20
  Non-diseased   80                            20
Model 1: One input layer with 18 neurons corresponding to the 18 features; five hidden layers with 10 neurons each; one output layer with 2 neurons, one for each class.
Model 2: One input layer with 18 neurons corresponding to the 18 features; two hidden layers with sixteen neurons each, one hidden layer with ten neurons, and one hidden layer with eight neurons; one output layer with 2 neurons, because there are 2 output classes.
Model 3: One input layer with eighteen nodes (18 features extracted from the GLCM code, so eighteen attributes in the input values); three hidden layers with twelve neurons each, one hidden layer with 10 neurons, and one hidden layer with 8 neurons; one output layer with 2 neurons, because there are 2 output classes.
Model 4: Input of 6 features extracted from the GLCM code; two hidden layers with sixteen neurons each, one hidden layer with 10 neurons, and one hidden layer with 8 neurons; one output layer with 2 neurons.
Model 5: Input of 6 features extracted from the GLCM; two hidden layers with sixteen neurons each, one hidden layer with 10 neurons, and one hidden layer with 8 neurons; one output layer with 2 neurons.
Model 6: Input of 6 features extracted from the GLCM; two hidden layers with sixteen neurons each, one hidden layer with 10 neurons, and one hidden layer with 8 neurons; one output layer with 2 neurons.
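A sketch of Model 1 in Keras (18 inputs, five hidden layers of 10 neurons, 2 output neurons); the activation functions, optimiser and training settings are assumptions, since the paper does not report them.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(10, activation="relu", input_shape=(18,)))   # first hidden layer on the 18 GLCM features
for _ in range(4):                                           # remaining four hidden layers of 10 neurons
    model.add(Dense(10, activation="relu"))
model.add(Dense(2, activation="softmax"))                    # one output neuron per class

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.1)  # illustrative call
```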
6 Results

The dataset consists of 200 ROP images. A comparative study of the effectiveness of Haralick features computed from the gray scale images was carried out. A total of 6 features have been identified which contribute to the grading of ROP; the texture features include mean, entropy, contrast, correlation, homogeneity and energy. The feed forward neural network is implemented using Keras. The NN gives good results for classification by adjusting its errors and obtaining a minimum square error to give high classification accuracy. For this work, the NN toolbox in a Jupyter notebook has been used for performing classification (Tables 3 and 4).

Table 3. Results with 18 GLCM features (feed forward neural network)

            Accuracy   Sensitivity   Specificity
  Model 1   70.37%     65.54%        75%
  Model 2   80%        75%           87.50%
  Model 3   80%        69.09%        85.71%
Table 4. Results with six GLCM features (feed forward neural network)

                Model-4 (0°)   Model-5 (45°)   Model-6 (90°)
  Accuracy      68.42%         63.16%          43.37%
  Sensitivity   66%            66%             54.54%
  Specificity   75%            53.84%          55.55%
7 Discussion and Conclusion

Our study showed ROP scoring using a feed forward neural network model. In practice, images are sent to clinicians for grading and are not graded accurately when the patient is in for screening; trained feed forward networks make fast grading feasible. In this experiment we evaluated different models and their accuracy: Model 1 achieved an accuracy of 70.37%, Model 2 an accuracy of 80%, and Model 3 an accuracy of 80%. In this project a comparative study of different models has been carried out. Classification and detection of all five stages of retinopathy of prematurity should constitute the future work.
References

Campbell, J.P., Ataer-Cansizoglu, E., Bolon-Canedo, V., Bozkurt, A., Erdogmus, D., Kalpathy-Cramer, J., Patel, S.N., et al.: Expert diagnosis of plus disease in retinopathy of prematurity from computer-based image analysis. JAMA Ophthalmol. 134(6), 651–657 (2016)
Eckert, G.U., Fortes Filho, J.B., Maia, M., Procianoy, R.S.: A predictive score for retinopathy of prematurity in very low birth weight preterm infants. Eye 26(3), 400–406 (2012)
Fierson, W.M., Capone, A., American Academy of Pediatrics Section on Ophthalmology: Telemedicine for evaluation of retinopathy of prematurity. Pediatrics 135(1), e238–e254 (2015)
Gelman, R., Martinez-Perez, M.E., Vanderveen, D.K., Moskowitz, A., Fulton, A.B.: Diagnosis of plus disease in retinopathy of prematurity using retinal image multiscale analysis. Invest. Ophthalmol. Vis. Sci. 46(12), 4734–4738 (2005)
Giraddi, S., Gadwal, S., Pujari, J.: Abnormality detection in retinal images using Haar wavelet and first order features. In: 2016 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), pp. 657–661. IEEE (2016)
Hu, J., Chen, Y., Zhong, J., Ju, R., Yi, Z.: Automated analysis for retinopathy of prematurity by deep neural networks. IEEE Trans. Med. 38(1), 269–279 (2018)
Jayadev, C., Vinekar, A., Mohanachandra, P., Desai, S., Suveer, A., Mangalesh, S., Bauer, N., Shetty, B.: Enhancing image characteristics of retinal images of aggressive posterior retinopathy of prematurity using a novel software (RetiView). BioMed. Res. Int. 2015 (2015)
Keck, K.M., Kalpathy-Cramer, J., Ataer-Cansizoglu, E., You, S., Erdogmus, D., Chiang, M.F.: Plus disease diagnosis in retinopathy of prematurity: vascular tortuosity as a function of distance from optic disc. Retina (Philadelphia, Pa.) 33(8), 1700 (2013)
Piermarocchi, S., et al.: Predictive algorithms for early detection of retinopathy of prematurity. Acta Ophthalmol. 95(2), 158–164 (2017)
Sen, P., Rao, C., Bansal, N.: Retinopathy of prematurity: an update. Sci. J. Med. Vis. Res. Foun. 33(2), 93–96 (2015)
Shah, P.K., Prabhu, V., Karandikar, S.S., Ranjan, R., Narendran, V., Kalpana, N.: Retinopathy of prematurity: past, present and future. World J. Clin. Pediat. 5(1), 35 (2016)
Wang, J., Ju, R., Chen, Y., Zhang, L., Hu, J., Wu, Y., Dong, W., Zhong, J., Yi, Z.: Automated retinopathy of prematurity screening using deep neural networks. EBioMedicine 35, 361–368 (2018)
Zhang, Y., et al.: Development of an automated screening system for retinopathy of prematurity using a deep neural network for wide-angle retinal images. IEEE Access 7, 10232–10241 (2018)
Fraudulent e-Commerce Website Detection Model Using HTML, Text and Image Features

Eric Khoo1, Anazida Zainal1, Nurfadilah Ariffin1, Mohd Nizam Kassim1,2, Mohd Aizaini Maarof1, and Majid Bakhtiari3

1 Cyber Threat Intelligence Lab, Information Assurance and Security Research Group, School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
[email protected], {anazida,aizaini}@utm.my, [email protected]
2 Cyber Security Responsive Services Division, CyberSecurity Malaysia, Level 7, Tower 1, Menara Cyber Axis, Cyberjaya Selangor, Malaysia
[email protected]
3 Islamic Azad University, Central Branch, Tehran, Iran
[email protected]
Abstract. Many Internet users have been the victims of fraudulent e-commerce websites and the number grows. This paper presents an investigation of three types of features, namely HTML tags, textual content and the image of the website, that could possibly contain patterns indicating that it is fraudulent. Four machine learning algorithms were used to measure the accuracy of fraudulent e-commerce website detection. These techniques are Linear Regression, Decision Tree, Random Forest and XGBoost. 497 e-commerce websites were used as the training and testing dataset. Testing was done in two phases. In phase one, each feature was tested to see its discriminative capability. Meanwhile, in phase two, these features were combined. The result shows that textual content has consistently outperformed the other two features, especially when XGBoost was used as a classifier. With combined features, overall accuracy improved and the best accuracy recorded was 98.7%, achieved when Linear Regression was used as a classifier.

Keywords: Fraudulent website · Textual content · HTML tags and image
1 Introduction

The Internet has become the main target for fraudsters to prey on their victims due to its wide coverage and easy access. It is a great avenue to sell and buy goods online; unfortunately, it can also be used by fraudsters to trick users into buying fake products. Fraudulent e-commerce websites selling counterfeit products not only cause losses to consumers but also affect the digital advertising ecosystem (Wu et al. 2018). These websites look so real that it is difficult for users to differentiate them from legitimate websites. Fraudulent e-commerce not only tricks users, it also tarnishes the reputation of legitimate online shops. It is estimated that 20% of websites are fake (Gyongyi and Garcia-Molina 2005).

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
A. Abraham et al. (Eds.): SoCPaR 2019, AISC 1182, pp. 177–186, 2021. https://doi.org/10.1007/978-3-030-49345-5_19

Due to the undesirable effects of fraudulent websites, many
studies have come up with various methods to detect fraudulent websites, such as (Maktabar et al. 2018; Abbasi et al. 2010). Unfortunately, fraudulent websites still grow and the amount of losses has reached millions of dollars. As described by Maktabar et al. (2018), there are a few major challenges, among them: i) the new generation of web programming technologies has increased the complexity of web scraping and limits the ability of fraudulent-website detection to access the web content; ii) the fast growth of fraudulent websites and their dynamics make the static (blacklist) nature of fraudulent-website detection irrelevant and obsolete. The demand for an efficient fraudulent website detection system is mounting, and an effective and accurate fraudulent website detection model should be able to address these challenges. This study proposes a fraudulent website detection model using three different features: HTML tags, the image of the website and its textual content. We have limited our scope to focus only on the detection of fraudulent e-commerce websites. This paper is organized in five sections. Section 2 describes some related works in fraudulent website detection, including the detection of phishing websites. Meanwhile, Sect. 3 describes our proposed model and experimental setup. Results and their discussion are presented in Sect. 4, and finally Sect. 5 concludes the paper.
2 Existing Works

There are two categories of fake websites: i) those that target search engines, known as web spam, and ii) those that attack web users (Dinev 2006). The focus of this study falls under the second category, where fake websites can seriously impact e-commerce and involve the transfer of money. Based on a literature search, there are not many works reported on the subject of fraudulent e-commerce websites. Instead, we survey existing works in a domain close to fraudulent websites, which is phishing websites. One major difference is that phishing websites normally have a login form, unlike fraudulent e-commerce websites; therefore the discussion of existing works will not include this feature. The subsequent paragraphs discuss existing works on the detection of phishing websites. Generally, there are two detection approaches. The first is lookup, which relies solely on blacklists comprising the uniform resource locators (URLs) of fake/fraudulent/phishing websites. This blacklist is manually updated by humans; usually there are working groups that provide lists of phishing and fake websites, such as the AntiPhishing Working Group, PhishTank.com, Escrow-Fraud.com and a few others. One outstanding drawback is that this approach is reactive: by the time the list is updated, many users have already been tricked and to some extent their credentials stolen. The other approach is proactive classification, where detection does not rely on a blacklist provided by humans; instead, it utilizes fraud cues (fraud indicators) such as content, image and URLs. Existing classifier systems implement simple, static rule-based heuristics and limited fraud cues, making them susceptible to easy exploits (Zhang et al. 2007). Since the focus of this study is classification systems, only existing works on classification systems will be discussed in this section.
CANTINA+, proposed by Xiang et al. (2011), utilizes features comprising URL, HTML, DOM, third party services and search engines to detect phishing websites. In addition, a filtering algorithm was introduced to reduce false positives and the human effort in detecting phishing websites. They also suggested exploring the visual and image elements of the websites and measuring their similarities; there are three types of similarity metrics: block level similarity, layout similarity and overall style similarity. Usually this approach produces high false positives because fraudsters constantly use new strategies and utilize more sophisticated technologies (Dinev 2006). Abbasi et al. (2010) proposed fake website detection using statistical learning theory (SLT), utilizing fraud cues from various categories. They explained in detail the possible cues that originate from three major components of website design: information, navigation and visualization design. i) Web page text (content) usually contains cues obtained from information design elements; this also includes misspelled words and grammatical errors, which rarely occur in legitimate webpages, as well as lexical measures (Selis et al. 2001) and the frequency of certain words (Ntoulas et al. 2006). ii) Linkage information and URLs can provide informative clues relating to navigation design characteristics. Linkage can be measured in terms of the number of relative (e.g., ../../default.htm) and absolute (e.g., http://www.abc.com/default.com) links (Abbasi et al. 2010), and URLs that are lengthier or contain dashes or digits are common in fake websites (Abbasi et al. 2010). iii) Image or visual design of a web page: usually fake or fraudulent websites reuse images of products from older fake websites. In our previous work (Maktabar et al. 2018) we proposed a fraudulent website detection model based on sentiment analysis of the textual content, natural language processing and machine learning techniques. In the current study, we extend our investigation to find the discrimination capability of three major features: the HTML tags, the text (content) and the image of the main webpage. Four machine learning algorithms were used as classifiers. The best performing technique is highlighted in the discussion.
3 The Proposed Model and Experimental Setup

This section discusses the proposed e-commerce website fraud detection model and its process flow. Figure 1 shows the proposed model, which consists of four primary modules: data acquisition, pre-processing, feature extraction and classification. The proposed model utilizes NLP techniques, three types of features (HTML, image and text) and four machine learning techniques. The following sub-sections describe each module in detail.
Fig. 1. Process flow of the proposed e-commerce website fraud detection model
3.1 Data Acquisition
The data used in this study are composed of legitimate and fraudulent e-commerce websites. The data were crawled using a Python script and its related packages, which enable the scraping of metadata (HTML elements and URLs) as well as the content of the website, such as texts and images. For images, we took screenshots of each website's main page manually. Almost 500 websites were crawled, with 258 of them legitimate and 239 fraudulent. Table 1 shows the distribution of websites used in the study.
Table 1. Dataset distribution

  Category of website             Amount
  Legitimate e-commerce website   258
  Fraudulent e-commerce website   239
  Total                           497
3.2 Pre-processing (for Textual Content)
Data scraped from the Internet are noisy and therefore need to be pre-processed before they can be used to train the classifier, since noise can impact the performance of the classifier. As mentioned earlier, the types of data crawled from the websites are the website metadata, textual data and the images of the website; thus, pre-processing is only performed on the textual data (content) of the websites. Figure 2 shows the process flow of this pre-processing.
Fig. 2. Text pre-processing steps: tags removal → tokenization → stop words removal → text transformation → text vectorization
Textual data (unstructured data) cannot be readily used to train a machine learning classifier; some processing is required to put the data into a suitable format for model training. The pre-processing module starts with tags removal for any remaining HTML tags that still reside in the text after the scraping process. Since the BoW technique is adopted in this study, texts are tokenized at the word level by chunking them word by word. Common and insignificant words are then removed by the stop words removal process; this step removes words that carry no meaning and could affect the learning process, such as 'the', 'is', 'are', etc. Next, text transformation is performed by converting all letters to lowercase and removing special characters and punctuation. It is important to normalize the text so that data redundancy is reduced and the data are standardized. Finally, as text cannot be directly used in model training, it has to be vectorized, where a value is assigned to each word without losing the context of the text itself. When this process completes, the vectors are passed to the next process and the features can be extracted.
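A minimal version of the pre-processing chain in Fig. 2, using BeautifulSoup for residual tag removal and NLTK stop words; the exact tokeniser and stop-word list used by the authors are not stated, so these are assumptions.

```python
import re
from bs4 import BeautifulSoup
from nltk.corpus import stopwords          # requires nltk.download("stopwords")

STOP = set(stopwords.words("english"))

def preprocess(raw_html):
    text = BeautifulSoup(raw_html, "html.parser").get_text(" ")   # tags removal
    text = text.lower()                                           # text transformation
    text = re.sub(r"[^a-z\s]", " ", text)                         # drop punctuation and special characters
    tokens = [t for t in text.split() if t not in STOP]           # tokenization + stop words removal
    return " ".join(tokens)
```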
3.3 Feature Extraction and Feature Selection
The features generated in this module are crucial because they can affect the performance of a classifier: if the data are not well represented (e.g. using insignificant features), this could impair the discrimination capability of the learned model. As previously mentioned, this study uses three types of features: metadata features like the HTML tags and CSS elements, BoW features from the textual data, and the image features captured from the main page of the websites.
3.4 Metadata (HTML Tags)
Metadata shapes the look of the websites: website developers summarize the contents and allow the search engines to use it to show the results searched for by users. Metadata can be either the HTML tags or the URLs residing in the code behind a webpage. In this study we used HTML tags. We calculated the number of distinct elements such as input buttons, submit buttons, the number of CSS files included, etc. The counts of these distinct elements are potentially helpful for discriminating fraudulent e-commerce websites from legitimate websites.
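One way to derive such metadata counts, assuming BeautifulSoup for parsing; the particular tag set below is illustrative, not the authors' exact feature list.

```python
from bs4 import BeautifulSoup

def html_tag_features(html):
    soup = BeautifulSoup(html, "html.parser")
    return {
        "n_input":  len(soup.find_all("input")),
        "n_submit": len(soup.find_all("input", {"type": "submit"}))
                    + len(soup.find_all("button", {"type": "submit"})),
        "n_css":    len(soup.find_all("link", {"rel": "stylesheet"})),
        "n_script": len(soup.find_all("script")),
        "n_img":    len(soup.find_all("img")),
        "n_iframe": len(soup.find_all("iframe")),
    }
```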
3.5 Bag of Words (Textual Content)
The Bag-of-Words model, also known as the unigram model, simplifies the representation of text (such as documents or sentences). It is a simple, efficient and popular technique in Natural Language Processing (NLP). It keeps track of the occurrences of words in a given document and creates a corpus of word counts for each document using TF-IDF; TF-IDF helps to reflect the significance of words in a given document within a collection (corpus).
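A sketch of the unigram TF-IDF representation, assuming scikit-learn; the example texts and vocabulary-size limit are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["cheap brand shoes discount price", "gift accessories home delivery"]   # placeholder page texts
vectorizer = TfidfVectorizer(ngram_range=(1, 1), max_features=5000)             # unigram bag of words
X_text = vectorizer.fit_transform(docs)                                         # TF-IDF weighted word counts
print(vectorizer.get_feature_names_out()[:10])
```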
3.6 Images
In this study, the image of the main page of each website represents an image feature. The image features are extracted using the image moments technique, which works by taking weighted averages of the image pixel intensities. The logic behind this technique is to match images and spot differences between legitimate and fraudulent main pages in terms of their main-page screenshots, using the calculated weights.
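A sketch of an image-moment descriptor for a homepage screenshot, assuming OpenCV's Hu moments as the concrete moment formulation; the paper does not specify which moments were used, so this is an illustrative choice.

```python
import cv2
import numpy as np

def image_moment_features(screenshot_path):
    gray = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()          # 7 moment invariants of the screenshot
    # log-scale for numerical stability, preserving sign
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)
```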
3.7 Classification
This study used a supervised learning approach for classification. Four popular classification techniques were used to discriminate fraudulent from legitimate e-commerce websites. The features extracted and produced by the previous module are separately fed into the model for training, and the performance of each model is evaluated. The features were used separately during model training with the aim of finding the best features for segregating fraudulent e-commerce websites from legitimate ones. For the classification model training, the techniques used are Logistic Regression, Random Forest, Decision Tree and XGBoost; according to the literature, all of these algorithms are popularly used for prediction. Logistic Regression is a model used to describe and explain the relationship between a dependent variable and independent variables. Unlike Logistic Regression, Random Forest, Decision Tree and XGBoost are decision-tree-based algorithms which differ through their evolution over time. A Decision Tree is a graphical representation of possible outcomes of a decision based on certain conditions. Random Forest evolved from the Decision Tree by adopting the bagging concept, where random subsets of features are considered to build a forest, or collection, of decision trees. Meanwhile, the XGBoost algorithm is the latest evolution of the Decision Tree with an optimized gradient boosting algorithm: parallel processing, tree pruning, handling of missing values, and regularization of the model to overcome bias or overfitting during training. The experiments were done in two phases: phase one aims at studying each feature's discriminative capability, while in phase two we investigated the discriminative capability of their combination. We split the data 80/20, where 80% (equivalent to 374 documents) was used for training and the rest was used for testing. At the end of this process, the performance of each classifier was measured using the F1-score, accuracy and AUC.
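An outline of the 80/20 experiment with the four classifiers, assuming scikit-learn and the xgboost package; the feature matrix and labels below are random placeholders standing in for the features assembled in the earlier steps, and "Linear Regression" in the result tables is taken to refer to the logistic regression model described above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, accuracy_score, roc_auc_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.random((497, 24))          # placeholder combined feature matrix (HTML + image + text features)
y = rng.integers(0, 2, 497)        # placeholder labels: 0 = legitimate, 1 = fraudulent

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(name, f1_score(y_test, pred), accuracy_score(y_test, pred), auc)
```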
4 Results and Discussion

Table 2 shows the classification results of the Logistic Regression, Decision Tree, Random Forest and XGBoost classifiers using three different features, namely HTML tags, the website image and the textual content of the website. Three different measurement metrics were used to measure their discriminative ability: F1-score, accuracy and AUC.
Table 2. Performance of machine learning techniques using HTML, image and content/text features

  Technique           Features          F1-score   Accuracy   AUC
  Linear Regression   HTML              0.800      0.800      0.800
                      Image             0.737      0.737      0.737
                      Textual content   0.926      0.926      0.926
  Decision Tree       HTML              0.857      0.863      0.865
                      Image             0.736      0.758      0.764
                      Textual content   0.918      0.916      0.918
  Random Forest       HTML              0.831      0.842      0.847
                      Image             0.684      0.737      0.763
                      Textual content   0.894      0.895      0.894
  XGBoost             HTML              0.864      0.880      0.880
                      Image             0.800      0.800      0.800
                      Textual content   0.940      0.937      0.943
It is clear that textual content is the best cue for classifying fraudulent e-commerce websites, compared to the image and HTML features. The best accuracy recorded is 93.7%, achieved by the XGBoost classifier. Textual content consistently shows superior performance when used with different classification techniques; this consistency shows that textual content has distinctive features that enable a classifier to detect whether a website is fraudulent or otherwise. Therefore, the content feature should be considered for incorporation in a fraudulent website detection system. Meanwhile, the HTML features consistently outperform the image feature, and in terms of classifiers, XGBoost outperforms Linear Regression, Decision Tree and Random Forest. We further investigated the capability of these three features when combined. The result of classification using the combined features is shown in Table 3.

Table 3. Performance of machine learning techniques using the combined features (HTML, image and text)

  Technique           F1-score   Accuracy   AUC
  Linear Regression   0.989      0.989      0.989
  Decision Tree       0.898      0.895      0.895
  Random Forest       0.911      0.916      0.915
  XGBoost             0.979      0.979      0.978
Overall, the results show an improvement when these features are combined, and the best performance is achieved using the Linear Regression technique. The exception is a slight drop for the Decision Tree: the combined features achieve lower accuracy than the textual content feature used alone with the Decision Tree classification technique.
Further investigation is needed to determine this peculiar behavior of the Decision Tree when used with the combined features. Figure 3 shows the AUC of each classification technique for the combined features.
Fig. 3. AUC for four different classification techniques
Using term frequency (tf), we also identified the top ten words that frequently appear in fraudulent websites and in legitimate websites. Figure 4(a) shows the top ten words found in fraudulent e-commerce websites. From these 10 most important words, it can be observed that words related to price occur the most, such as 'price' and 'descuento' (discount in Spanish). Meanwhile, we also explored the top 10 words with the highest weightage for the legitimate websites, as shown in Fig. 4(b). It can be seen that the words most frequently appearing in the legitimate websites are related to general items used daily, such as 'gift', 'accessories' and a few other common words.
Fig. 4. (a) Top ten words for fraudulent websites. (b) Top ten words for legitimate websites
5 Conclusion

Despite the efforts put up by various government agencies to curb fraudulent e-commerce activities and the many studies done to combat it, fraud cases are still increasing. This study has investigated the discriminative capability of three features, namely HTML tags, textual content and the image of a website. The findings show that the textual content feature is the most significant for discriminating fraudulent e-commerce websites from legitimate websites; the highest accuracy with a single feature, 93.7%, is achieved using the XGBoost classifier. We further combined these three features, and the combined features outperform the single textual content feature. The best accuracy achieved when all the features are combined is 98.9%, using Linear Regression.

Acknowledgement. This work is supported by the Research Management Centre (RMC) at the Universiti Teknologi Malaysia (UTM) under High Impact Research Grant (HIR) (VOT PY/2018/02890).
References

Abbasi, A., Zhang, Z., Zimbra, D., Chen, H., Nunamaker Jr., J.F.: Detecting fake websites: the contribution of statistical learning theory. MIS Q. 34(3), 435–461 (2010)
Dinev, T.: Why spoofing is serious internet fraud. Commun. ACM 24(4), 76–82 (2006)
Gyongyi, Z., Garcia-Molina, H.: Spam: it's not just for inboxes anymore. IEEE Comput. 38(10), 28–34 (2005)
Maktabar, M., Zainal, A., Maarof, M.A., Kassim, M.N.: Content based fraudulent website detection using supervised machine learning techniques. In: Abraham, A., Muhuri, P., Muda, A., Gandhi, N. (eds.) Hybrid Intelligent Systems. HIS 2017, Advances in Intelligent Systems and Computing, vol. 734. Springer, Cham (2018)
Ntoulas, A., Najork, M., Manasse, M., Fetterly, D.: Detecting spam web pages through content analysis. In: Proceedings of the 15th International World Wide Web Conference, Edinburgh, Scotland, 23–26 May, pp. 83–92 (2006)
Selis, P., Ramasastry, A., Wright, C.S.: Bidder beware: toward a fraud free marketplace – best practices for the online auction industry. Center for Law, Commerce & Technology, School of Law, University of Washington, 17 April (2001)
Xiang, G., Hong, J., Rose, C.P., Cranor, L.: CANTINA+: a feature-rich machine learning framework for detecting phishing web sites. ACM Trans. Inf. Syst. Secur. 14(2), 1–77 (2011). Article no. 21
Wu, K.T., Chou, S.H., Chen, S.W., Tsai, C.T., Yuan, S.M.: Application of machine learning to identify counterfeit website. In: Proceedings of WAS 2018, 19–21 November, Yogyakarta, Indonesia (2018)
Zhang, Y., Egelman, S., Cranor, L., Hong, J.: Phinding phish: evaluating anti-phishing tools. In: Proceedings of 14th Annual Network and Distributed System Security Symposium, San Diego, CA, February 28–March 2 (2007)
Sleep Disorders Prevalence Studies in Indian Population

Vanita Ramrakhiyani, Niketa Gandhi, and Sanjay Deshmukh

Department of Life Sciences, University of Mumbai, Mumbai 400 098, Maharashtra, India
[email protected], [email protected], [email protected]
Abstract. Sleep is a necessary part of human functioning. Sleep disorders impair quality of life and thereby pose many health-related problems. The disease burden of sleep disorders is huge among the Indian population. Commonly found sleep disorders are insomnia, obstructive sleep apnea, hypersomnia, restless leg syndrome, and shift work disorder. Since sleep medicine is a recent field in the Indian sub-continent, the availability of data is sparse. Most of these studies are performed on the urban population and are based on subjective questionnaires. Sleep deprivation has a deteriorating effect on overall well-being, including weight gain, cardiovascular risks, diabetes, and cognition. The current review article highlights prevalence studies with special emphasis on the Indian population, thereby creating a need to spread awareness regarding sleep disorders among physicians as well as the general population.

Keywords: Excessive daytime sleepiness · Hypersomnia · Insomnia · Neuropsychology test battery · Obstructive sleep apnea · Restless leg syndrome · Sleep deprivation · Shift work disorder
1 Introduction
Sleep is an essential part of healthy living. Adequate sleep helps maintain a good quality of life. Today’s changing lifestyle includes frequent travel across time zones, shift work, odd working hours to meet deadlines or targets, and nuclear families, all of which change the sleep pattern and restrict the optimum hours of sleep. While sleeping, the brain prepares itself for the activities of the coming day and forms new pathways for remembering information and processing existing information. Sleep deprivation is defined as not having enough sleep. It can be acute or chronic, and it can be qualitative or quantitative. Table 1 describes the different types of sleep deprivation.
Table 1. Types of sleep deprivation
Acute sleep deprivation: total sleep deprivation resulting from a sudden life event, trauma or work-life stress.
Chronic sleep deprivation: chronically sleeping a few hours less than required.
Qualitative and quantitative sleep deprivation: sleep disturbance due to sleep disorders such as obstructive sleep apnea, insomnia and restless leg syndrome.
1.1 Effect of Sleep Deprivation on Physical Health
Chronic sleep deprivation is associated with an increased risk of heart disease, impaired kidney function, high blood pressure, diabetes and brain stroke. The linkage between obesity and sleep deprivation has also been well established. Sleep loss due to night work and sleep disorders has public health consequences. Sleep deprivation has been shown to contribute to work-related performance failures, including environmental health disasters. Attention and reaction time are altered by induced sleep loss, which results in a cumulative, dose-dependent deterioration of attention and response time. The Philips Sleep Survey, conducted in November 2009, uncovered some surprising facts concerning sleep deprivation among the Indian population. The survey reported a sleep disorder/deprivation prevalence of 93% among the Indian population. The incidence of sleep apnea was observed to be 34%, which can result in obesity and cardiovascular diseases in the future. This high prevalence of sleep apnea may be due to a sedentary way of life. Chronic sleep deprivation affects the overall quality of life, including the development of insulin resistance, obesity and cardiovascular diseases, and reduces productivity. An optimum amount of sleep plays a fundamental role in a child’s proper health. It is important for parents to understand proper sleep hygiene along with other factors such as schooling, food and tuitions.
2 Prevalence Studies Conducted Among Indian Population
The following section describes sleep disorder prevalence studies among the Indian population, reported year-wise. Udwadia et al. (2004) reported the prevalence of sleep disordered breathing and sleep apnea in middle-aged urban Indian men. The prevalence of sleep disordered breathing was reported to be 19.5%, while that of excessive daytime sleepiness was found to be 7.5%. Saxena et al. (2006) estimated the prevalence of sleep disordered breathing in a questionnaire based survey in Mumbai. The frequency of habitual snoring was found to be 6.64% among the adult population. The study estimated the prevalence of sleep disordered breathing in the Indian population to be between 1.64% and 3.42%. The study raised the alarm about an increasing number of apneic individuals, mostly undiagnosed and untreated, leading to disabling symptoms for sufferers.
Gupta et al. (2008) conducted a cross-sectional questionnaire study to investigate the variance in sleep habits of adolescents of various high school grades among the urban Indian population. Variables utilized were bed timings, total sleep time, sleep latency, spontaneous arousals, wake-time after sleep onset, sleep efficiency, quality of sleep, daytime napping and daytime somnolence. Total sleep time was reported to be lower among higher grade students. Chronic sleep deprivation in the form of 1 h of sleep debt was observed across the study population. The results are depicted in Table 2.
Table 2. Prevalence of sleep disorders among adolescents
Grade: Total sleep time, Awakenings, Daytime somnolence
9th: 8 h, 35.9%, 37.2%
10th: 7.7 h, 44.7%, 39.1%
11th: 7.9 h, 40.3%, 39.7%
12th: 7.6 h, 28.3%, 54.2%
Krishna and Shwetha (2008) conducted a questionnaire based study on 67 medical students, aimed at analyzing the quality of sleep among medical students. The parameters included were sleep, blood pressure (BP), body mass index (BMI) and academic performance. A high prevalence of poor sleep quality was reported. Suri et al. (2008) conducted a questionnaire based survey to determine the prevalence of sleep disorders in the adult population of Delhi. This study depicted the lack of awareness among the general population about sleep disorders impacting the social, mental, physical and economic health of society. The prevalence of various sleep disorder symptoms is represented in Table 3.
Table 3. Prevalence of sleep disorder symptoms among adults
Snoring: 39.5%
Sleep disordered breathing: 4.3%
Excessive daytime sleepiness: 48.6%
Sleeping pill usage: 2.3%
Chronic sleep deprivation: 50%
Ghoshal et al. (2008a, b) evaluated 120 asthmatics over two years to study the incidence of excessive daytime sleepiness at a medical college in Kolkata, India. The severity of asthma and the level of control were clearly correlated with excessive daytime sleepiness. However, the mode of diagnosis of asthma was not well correlated with excessive daytime sleepiness. Meshram Sushant et al. (2008) conducted a questionnaire-based study to assess the behavior, attitude and knowledge of sleep medicine among resident doctors and concluded that there was a pressing need to include sleep medicine in their curriculum. Suri et al. (2009) conducted a questionnaire based study to determine the prevalence of common sleep-related disorders in the elderly population of Delhi, India. The results are shown in Table 4.
Table 4. Prevalence of sleep disorders among elderly population
Snoring: 41.4%
Sleep disordered breathing: 10.3%
Excessive daytime sleepiness: 41.5%
Sleeping pill usage: 8%
Ravikiran et al. (2010) evaluated sleep problems in preschool and school aged rural Indian children as depicted in Table 5.
Table 5. Sleep problems among school children
Parameter: Preschool, School
Bedtime problems: 33%, 14.9%
Excessive daytime sleepiness: 32.5%, 1.9%
Awakening during night: 25%, 11.87%
Regularity and duration of sleep: 19.84%, 4.98%
Sleep disordered breathing: 4.8%, 5%
The study concluded that sleep problems are common among rural Indian children. The Philips sleep survey published in June 2010 in Express Healthcare reported that 93% of Indians were not sleeping enough and 34% were at risk of obstructive sleep apnea, which can lead to weight gain and even serious situations such as worsening of heart conditions. The survey also suggested that the incidence of sleep disorders in the urban population ranged from 2.5 to 20% of the general population. The prevalence of OSA was estimated at 4%. The article also stressed the reasons for denial or for not approaching medical services. There is a lack of awareness among the general population as well as doctors. The cost of treatment was also said to play a role in creating this inertia, and a third party payment option is not available. Rao and David (2011) conducted a questionnaire based study to assess the prevalence of diurnal bruxism among information technology professionals and thereby explore parafunctional habits. The prevalence of self-reported bruxism was 59%. The study also highlighted that bruxism was reported less often by professionals with more years of experience than by less experienced professionals. Devnani and Bhalerao (2011) conducted a questionnaire based cross sectional assessment of sleepiness and sleep debt in the adolescent urban Indian population. 24.8% of the sample population was found to obtain approximately 4–6 h of sleep and thereby exhibited a higher sleepiness statement score, while 59.3% received 6 to 8 h of sleep and hence reported a lower sleepiness statement score. The above observations highlight unrecognized sleep debt among the study population. Excessive daytime sleepiness in the form of dozing in the classroom was reported to be 25%. Panda et al. (2012) estimated sleep related disorders in an apparently healthy South Indian population. The study reported insomnia, sleep-related breathing disorders, narcolepsy and restless leg syndrome as 18.6%, 18.4%, 1.04% and 2.9%, respectively. Other sleep disorders such as night terrors, nightmares, somnambulism and sleep
talking were found to be 0.6%, 1.5%, 0.6% and 2.6%, respectively. The study also highlights that the proportion of the population seeking health care for a sleep disorder is 0.3%, which is very low, indicating the need to create awareness among physicians as well as the public. The prevalence of OSA was found to be higher among Indian men, which can have public health consequences. Agrawal et al. (2013) evaluated surgical patients in tertiary care hospitals for obstructive sleep apnea. High risk for OSA was reported in 24.5% of the study population. Shad et al. (2015) conducted a cross-sectional study to evaluate burnout and poor sleep quality issues among 214 undergraduate medical students. The questionnaires utilized were the Pittsburgh Sleep Quality Index and the Oldenburg Burnout Inventory for sleep duration and burnout syndrome, respectively. The results are depicted in Table 6.
Table 6. PSQI results
Poor sleepers: 62.6%
Sleep hours less than 5: 20%
Poor sleep among medical students: 72.9%
Poor sleep among non-medical students: 51.9%
The exhaustion dimension of burnout was reported to be higher among medical students than among non-medical students but was correlated more with the PSQI sleep score in the non-medical group. A cross sectional study was performed by Kaur G and Singh A in 2016 to estimate the prevalence of excessive daytime sleepiness among undergraduate students. The study utilized the Epworth Sleepiness Scale and a socio-demographic survey. 45% of the student population was found to experience excessive daytime sleepiness. Macwana et al. conducted a questionnaire based study among adolescents to correlate sleep hours and obesity. Sleep deprivation, in the form of less than 7 h of sleep, was reported for 45% of the study population. Singh et al. (2019a, b) evaluated the effect of sleep patterns and duration on children’s overall development among the Indian population. The study reported that Indian children have the fewest hours of night-time sleep compared to children in other countries. This may result from the early school hours of most private schools. Children should optimally be made to sleep as early as possible, but due to changing modern lifestyles and excessive internet usage, this nighttime routine has been delayed by 2–3 h. This can affect a child’s physical and mental well-being in the long term. Singh et al., recently in 2019, reported a strong association between overuse of the internet, excessive daytime sleepiness and other sleep problems among the Indian population.
3 Studies Assessing Effect of Sleep Deprivation Among Indian Population
Sharma et al. (2010) studied the effect of severe obstructive sleep apnea on neurocognitive function among Indian adults. Impaired performance on alertness, working memory, response inhibition, problem solving and executive function was observed among patients with severe obstructive sleep apnea. The probable cause of this impaired performance was delayed information processing. Namita et al. (2010) studied the effect of day and night duty among hospital employees with special emphasis on visual and auditory reaction time. The results are tabulated in Table 7.
Table 7. Effect of sleep hours on reaction time
Visual reaction time: day duty 231.60 ± 30.93, night duty 234.98 ± 32.27
Auditory reaction time: day duty 224.69 ± 46.95, night duty 228.74 ± 47.01
The study results indicated that response time was higher during night duty as compared to day duty. The reason can be attributed to the adaptation to shift work and chronic sleep deprivation among this specific group of the population, whose jobs require continuous vigilance and attention. Shaikh et al. (2009) assessed the effect of sleep duration on adiposity in Gujarati Indian adolescents. The body mass index, fat percentage and total body fat mass were significantly lower in the adequate sleep duration group (with more than 7 h of sleep) than in the inadequate sleep duration group (with less than 7 h of sleep). The study concluded that inadequate sleep duration increases adiposity and thereby the predisposition to obesity. Iyer and Iyer (2006a, b) published a review article dealing with the role of sleep and obesity in metabolic syndrome. The review put forth features suggesting that OSA is closely linked to metabolic syndrome. These features were a strong association with obesity, prevalence among the male gender, increased prevalence among postmenopausal women, systemic effects like hypertension and diabetes, and an age of 55 to 65 years, which was found to be correlated with the incidence of sleep apnea. Vartak et al. (2014) conducted a trial to assess the effect of stress in the form of sleep deprivation, leading to cortisol exposure, on the human telomere gene. A significant decrease in telomere length was observed in blood samples collected from participants with higher levels of cortisol due to stressful factors such as sleep deprivation. This study establishes the role of sleep deprivation in aging.
4 Conclusion
The prevalence of sleep deprivation in the form of sleep disorders is significantly high among the Indian population. The true number could be higher, as all the studies discussed above mainly relied on subjective questionnaires, in which people tend to be biased in their answers and deny any underlying sleep problem. Also, there is a lack of awareness among physicians as well as the general population regarding the importance of sleep and its disorders. There is a need to create more sleep clinics and to train skillful sleep technologists as well as physicians to work in coordination to screen and thereby diagnose these sleep disorders. The first line of treatment for moderate to severe sleep apnea is the Continuous Positive Airway Pressure (CPAP) machine, which is economically not feasible for the majority of patients. There is a need to Indianize the manufacture and supply of such costly machines to make them available at cheaper rates. Cognitive behavioral therapy benefits insomnia patients, yet the number of skillful experts is small; if there were more skilled CBT practitioners, the effects of chronic sleep deprivation could be controlled.
References
Devnani, P., Bhalerao, N.: Assessment of sleepiness and sleep debt in adolescent population in urban western India. Indian J. Sleep Med. 6(4), 140–143 (2011)
Express Healthcare: Waking up to sleep therapy, June 2010
Gupta, R., Bhatia, M.S., Chhabbra, V., Sharma, S., Dahiya, D., Semalti, K., Sapra, S., Dua, R.S.: Sleep patterns of urban school-going adolescents. Indian Pediatr. 45, 184–189 (2008)
Iyer, R., Iyer, R.: Sleep and obesity in the causation of metabolic syndrome. Indian J. Diabet. Dev. Count. 26(2), 63–69 (2006a)
Iyer, R.S.: Sleep and type 2 diabetes mellitus - clinical implications. JAPI 60, 42–47 (2012)
Iyer, S.R., Iyer, R.R.: Sleep and obesity in the causation of metabolic syndrome. Indian J. Diabet. Dev. Count. 26(2), 63–69 (2006b)
Kaur, G., Singh, A.: Excessive daytime sleepiness and its pattern among college students. Sleep Med. 29, 23–28 (2017)
Ravikiran, S.R., Jagadesh Kumar, P.M., Latha, K.S.: Sleep problems in preschool and school aged rural Indian children. Indian Pediatr. 48, 221–223 (2011)
Sharma, H., Sharma, S.K., Kadhivran, T., Mehta, M., Shreenevas, V., Gulati, V., Sinha, S.: Pattern & correlates of neurocognitive dysfunction in Asian Indian adults with severe obstructive sleep apnea. Indian J. Med. Res. 132, 409–414 (2010)
Surendra, S.: Wake-up call for sleep disorders in developing countries. Indian J. Med. Res. 131, 115–118 (2010)
Suri, J.C., Sen, M.K., Adhikari, T.: Epidemiology of sleep disorders in the adult population of Delhi: a questionnaire based study. Indian J. Sleep Med. 3(4), 128–137 (2008)
Suri, J.C., Sen, M.K., Ojha, U.C., Adhikari, T.: Epidemiology of sleep disorders in the elderly - a questionnaire survey. Indian J. Sleep Med. 4(1), 12–18 (2009)
Udwadia, Z.F., Doshi, A.V., Lonkar, S.G., Singh, C.I.: Prevalence of sleep-disordered breathing and sleep apnea in middle-aged urban Indian men. Am. J. Respiratory Crit. Care Med. 169, 168–173 (2004)
Vartak, S., Deshpande, A., Barve, S.: Reduction in the telomere length in human T-lymphocytes on exposure to cortisol. Curr. Res. Med. Medi. Sci. 4(2), 20–25 (2014)
Ghoshal, A.G., Sarkar, S., Mondal, P., Bhattacharjee, S.K., Shamim, S., Mundle, M.: A study of excessive daytime sleepiness in asthma. Indian J. Sleep Med. 3, 0973–340 (2008)
Krishna, P., Shwetha, S.: Sleep quality and correlates of sleep among medical students. Indian J. Sleep Med. 3, 0973–340 (2008)
Meshram, S.H., Meshram, C.S., Mishra, G.S., Bharshankar, R.: Behaviour, attitude and knowledge of sleep medicine among resident doctors in university hospitals of Central India: a questionnaire based study. Indian J. Sleep Med. 2 (2008)
Singh, R., Singh, R.K., Singh, S.: Sleep and children’s development in Indian. J. Global Health Reports 3 (2019a)
Shad, R., Thawani, R., Goel, A.: Burnout and sleep quality: a cross-sectional questionnaire-based study of medical and non-medical students in India. Cureus 7(10) (2015)
Agrawal, S., Gupta, R., Lahan, V., Mustafa, G., Kaur, U.: Prevalence of obstructive sleep apnea in surgical patients presenting to a tertiary care teaching hospital in India: a preliminary study. Saudi J. Anaesth. 7(2), 155–159 (2013)
Singh, L.K., Suchandra, K.H., Pattajoshi, A., Mamidipalli, S.S., Kamal, H., Singh, S.: Internet addiction and daytime sleepiness among professionals in India: a web-based survey. Indian J. Psychiatry 61, 265–269 (2019)
Saxena, S., Gothi, D., Joshi, J.M.: Prevalence of symptoms and risk of sleep disordered breathing in Mumbai (India). Indian J. Sleep Med. 1(1), 27–31 (2006)
Rao, S.K., Bhat, M., David, J.: Work, stress, and diurnal bruxism: a pilot study among information technology professionals in Bangalore City, India. Int. J. Dent. (2011)
Panda, S., Taly, A.B., Sinha, S., Gururaj, G., Girish, N., Nagaraja, D.: Sleep-related disorders among a healthy population in South India. Neurol. India 60, 68–74 (2012)
Namita, Ranjan, D.P., Shenvi, D.: Effect of shift working on reaction time in hospital employees. Indian J. Physiol. Pharmacol. 54(3), 289–293 (2010)
Shaikh, W., Patel, M., Singh, S.K.: Sleep deprivation predisposes Gujurati Indian adolescents to obesity. Indian J. Commun. Med. 34(3), 192–194 (2009)
Prediction Models in Healthcare Using Deep Learning
S. Bhavya(&) and Anitha S. Pillai
School of Computing Sciences, Hindustan Institute of Technology and Science, Chennai, Tamil Nadu, India
{rs.bs0918,anithasp}@hindustanuniv.ac.in
Abstract. Predictive models are used to predict unknown future events using a set of relevant predictors or variables by studying both present and historical data. Predictive modeling, also known as predictive analytics, uses techniques from statistics, data mining, and artificial intelligence and can be applied to a wide set of applications. A predictive model in healthcare learns from the historical data of patients to predict their future conditions and determine treatment. In this review, the use of deep learning models such as LSTM/Bi-LSTM (Long Short-Term Memory/Bi-directional LSTM), RNN (Recurrent Neural Network), CNN (Convolutional Neural Network), RBM (Restricted Boltzmann Machine) and GRU (Gated Recurrent Unit) in different healthcare applications is highlighted. The results indicate that the LSTM/Bi-LSTM model is widely used for time-series medical data and CNN for medical image data. A deep learning model can assist healthcare professionals in making decisions regarding medications and hospitalizations quickly, thus saving time and serving the healthcare industry better. This paper analyzes the various predictive models used in healthcare applications using deep learning.
Keywords: Deep learning (DL) · LSTM/Bi-LSTM · RNN · Prediction models · Healthcare · CNN
1 Introduction
A deep learning-based predictive model plays a vital role in healthcare, especially in the early identification of diseases, prediction of treatment, and recommendation of future hospitalizations. Various deep learning algorithms used in healthcare are Auto Encoder (AE), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Long Short-Term Memory (LSTM)/Bi-Directional LSTM, and Gated Recurrent Unit (GRU) [1]. AE: AE is an unsupervised artificial neural network that uses data compression and encoding techniques. It has an encoder and a decoder; the encoder encodes the input data into a code and the decoder reconstructs the encoded data back to its original input [2]. CNN: A CNN receives several inputs, acquires a weighted sum over them, passes it through an activation function and responds with an output. A CNN is composed of an
“input layer, an output layer and a hidden layer” which includes several “convolutional layers, pooling layers, fully connected layers and normalization layers” [3]. CNNs are used in image and video recognition, recommendation systems and natural language processing. RNN: An RNN is a type of neural network where the output from the preceding step becomes the input to the present step. In traditional neural networks all the inputs and outputs are independent of each other, but in some natural language processing applications, information about the previous word is required to predict the next word. RNN solves this issue with the help of an important feature known as the Hidden State, which remembers the information about the previous word in a sequence. RBM: RBM is an undirected graphical model introduced by Paul Smolensky in 1986, and it has gained popularity in recent years, especially in collaborative filtering, dimension reduction, regression and classification, feature learning, and topic modeling. RBMs are simple two-layer neural networks with one input layer and a hidden/output layer. RBM belongs to the class of stochastic and generative models of Artificial Intelligence (AI), where stochastic refers to anything based on probabilities and generative implies that it uses AI to generate the desired output. One of the limitations of RBM is that neurons within the same layer cannot communicate with each other; communication is only possible between neurons of different layers [4]. LSTM/Bi-LSTM: These are artificial recurrent neural networks (RNN) which have feedback connections. LSTM applies to tasks such as speech recognition, handwriting recognition and predicting diseases based on time series data. An LSTM consists of a cell and three gates: an input gate, an output gate, and a forget gate. The cell is used to remember values, while the three gates control the flow of information into and out of the cell. A Bidirectional LSTM (Bi-LSTM) is a type of LSTM that allows the data to flow in two directions, forward and backward. A Bi-LSTM can use data from past (backward) and future (forward) states at the same time. GRU: GRU eliminates the vanishing gradient problem faced by RNN and is a powerful tool for handling sequence data. GRU makes use of an update gate and a reset gate, which permit a GRU to carry information forward over several time periods in order to use it at a future period [5].
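To make the recurrent layer types described above concrete, the following minimal PyTorch sketch (an illustrative assumption, not code taken from any of the reviewed studies) instantiates an LSTM, a bidirectional LSTM and a GRU over a toy batch of multivariate time series; the gates described above are internal to each layer, and all sizes are arbitrary.

import torch
import torch.nn as nn

x = torch.randn(4, 10, 13)   # toy batch: 4 sequences, 10 time steps, 13 features per step

lstm = nn.LSTM(input_size=13, hidden_size=32, batch_first=True)                       # input, forget and output gates are internal
bilstm = nn.LSTM(input_size=13, hidden_size=32, batch_first=True, bidirectional=True) # reads the sequence forward and backward
gru = nn.GRU(input_size=13, hidden_size=32, batch_first=True)                         # update and reset gates are internal

out_lstm, (h, c) = lstm(x)   # out_lstm: (4, 10, 32); h and c are the final hidden and cell states
out_bi, _ = bilstm(x)        # out_bi: (4, 10, 64) -- forward and backward hidden states concatenated
out_gru, _ = gru(x)          # out_gru: (4, 10, 32); a GRU keeps no separate cell state

The bidirectional variant doubles the output feature dimension because the forward and backward hidden states are concatenated, which is why a Bi-LSTM can use past and future context at the same time.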
2 Deep Learning-Based Prediction Models in Healthcare
An LSTM network successfully predicted the phenotyping of clinical time-series data. Electronic Health Records (EHR) of patients who were in a Pediatric Intensive Care Unit (PICU) were used for the experiment, where the LSTM network successfully classified the phenotyping of multivariate PICU time series data of patients with Micro and Macro AUC (Area Under the Curve) of 0.8324 and 0.7717 [6], respectively. LSTM was used as it is a popular model for learning sequence data and can capture long-range dependencies [7]. An LSTM network classified diagnoses from patients’ EHR (Electronic Health Record) collected from Children’s Hospital LA. The patients’
visits, sensor data and lab test results were recorded in the EHR. According to the authors, this was the first empirical study that used LSTM to classify diagnoses from multivariate PICU time series data of patients. An LSTM with two layers of 128 memory cells, a dropout probability of 0.5 between the layers and target replication predicted 128 diagnoses from 13 irregularly sampled time-series measurements. The authors compared the LSTM model with logistic regression and MLP baselines with three hidden layers, and the results proved that the model outperformed the baselines with Micro and Macro AUC of 0.8560 and 0.8075 [6], respectively; the authors also concluded that early diagnosis using the LSTM model predicted patients’ future conditions and treatments. A Bi-LSTM network was used to diagnose ICU (intensive care unit) time-series data [8, 15, 16]. A Bi-directional LSTM network used temporal information to detect the presence of bacteria or fungus in the blood. Time-series data collected from 2177 ICU admissions were used for the study. For each patient, nine parameters were monitored and measured, and the total dataset included fourteen million values. The temporal features of patients were learned by a Bi-LSTM network, which predicted whether the patient had a positive blood culture or not. A precision-recall (PR) curve was used for the validation of the network, and the model had a PR AUC (Area Under the Curve) of 71.95% [8]. Patients’ unplanned ICU (Intensive Care Unit) readmission was predicted by using a Bi-LSTM network. The EHR dataset from the MIMIC-III (“Medical Information Mart for Intensive Care”) critical care database, which has over 40000 patients’ data, was used. The feature categories were: patients’ demographic data, chronic disease information and other physical conditions evaluated by experts. A Bidirectional LSTM network with an additional LSTM layer, with overall 16 hidden units, outperformed the other baseline logistic regression models with an AUC of 0.791 and a sensitivity of 0.742 [15]. A Bi-directional LSTM based on attention and time adjustment factors predicted the next treatment behavior of patients from their past medical insurance data. Real-time datasets from two hospitals were used for the study: a “tumor dataset”, a “coronary heart disease dataset”, a “diabetes dataset”, and a “pneumonia dataset”, and a Bi-LSTM model was used to train the embedded visit information of the patients. The model had an accuracy of 0.8864, 0.8674, 0.8618 and 0.8893 for the tumor dataset, the heart disease dataset, the diabetes dataset and the pneumonia dataset [16], respectively. LSTM networks have shown good accuracy in disease progression prediction [9, 10, 14]. A timeline model with LSTM predicted the primary diagnosis factor of future hospitalization. The timeline mechanism had the records of previous visits of patients, including their disease diagnosis features. Medical claim data from the SEER-Medicare database were used for the study. Each patient’s time of visit and the medical codes recorded during each visit were learned by the LSTM-Timeline model, which predicted the main diagnosis of future hospitalization. The model predicted that chronic conditions have a very long-lasting effect on the upcoming hospital visit and showed better accuracy than state-of-the-art deep neural networks [9].
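One of the studies above describes its network precisely: two stacked LSTM layers of 128 memory cells with a dropout probability of 0.5 between them, taking 13 time-series measurements and predicting 128 diagnoses. The sketch below shows how a multi-label classifier of that shape could be assembled in PyTorch; the sequence pooling, the sigmoid output and the loss choice are assumptions for illustration, and the target-replication trick used in the cited study is not reproduced here.

import torch
import torch.nn as nn

class DiagnosisLSTM(nn.Module):
    # 13 time-series measurements in, 128 independent diagnosis probabilities out (multi-label).
    def __init__(self, n_features=13, hidden=128, n_diagnoses=128):
        super().__init__()
        # num_layers=2 with dropout=0.5 applies dropout between the two stacked LSTM layers.
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, dropout=0.5, batch_first=True)
        self.head = nn.Linear(hidden, n_diagnoses)

    def forward(self, x):                                  # x: (batch, time, 13)
        out, _ = self.lstm(x)
        return torch.sigmoid(self.head(out[:, -1, :]))     # last time step -> one probability per diagnosis

model = DiagnosisLSTM()
probs = model(torch.randn(8, 48, 13))                      # e.g. 8 patients, 48 time steps of 13 measurements
loss = nn.BCELoss()(probs, torch.randint(0, 2, (8, 128)).float())

A sigmoid per output lets each of the 128 diagnoses be predicted independently, which matches the multi-label nature of the task.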
LSTM networks with time interval data performed well in predicting Alzheimer’s Disease (AD) progression, and patients’ next hospital visits were predicted using information on 5432 patients collected from the National Alzheimer’s Coordinating Center (NACC). Patients’ temporal and medical data were extracted from their past medical visit records, and the LSTM could successfully predict the AD progression and next medical visit of patients with over
99% accuracy [10]. Zhang et al. used an LSTM network with attention-based and time-aware mechanisms to identify septic shock disease progression from patients’ EHR data obtained from the Christiana Care Health System (CCHS). The attention mechanism (to identify the critical past moments of patients) and the time interval information increased the accuracy of the LSTM network in disease progression prediction, and the model obtained the best AUC value of 0.811 [14]. Infectious diseases such as malaria, chickenpox, and scarlet fever were predicted by DNN and LSTM models. Four kinds of data, namely “search query data”, “social media data”, “weather data” (temperature and humidity) and “infectious disease data”, were used for the study. There were four comparison models: DNN, LSTM with time-series data, the “autoregressive integrated moving average” (ARIMA) method, and “ordinary least squares” (OLS) with all possible combinations of variables. The DNN and LSTM accomplished better results than the other models, with accuracy improvements of 24% and 19% [11], respectively. The DNN was more stable and the LSTM more accurate in infectious disease prediction. An LSTM network was used to predict heart failure in patients by examining patients’ time-stamped Electronic Health Records (EHR). Medical concept vectors with one-hot encoding and skip-gram [12] models were used to improve the performance of the network. According to the authors, the LSTM outperformed logistic regression, support vector machine (SVM), multilayer perceptron (MLP), and K-nearest neighbor (KNN) with an AUC of 0.894. An LSTM model by Hong et al. predicted Alzheimer’s Disease (AD) at an early stage using MRI images collected from the “AD Neuroimaging Initiative” (ADNI) dataset. The model predicted the AD state of the next sixth month. The model was evaluated in terms of disease stages: NC (“Normal Control”), AD and MCI (“Mild Cognitive Impairment”), and the prediction of NC vs AD vs MCI showed the best AUC (Area Under the Curve) of 0.752 for the Cortical Thickness Average (TA) feature [13]. From the results it is observed that the LSTM model showed better performance than existing models. A CNN was used for brain tumor grading using MRI (Magnetic Resonance Imaging) scan images of patients. The study was conducted on the “Brats 2014” dataset. There was an improvement in performance of 8% [17] when using a 3-layer CNN compared with a baseline neural network. A deep CNN model with 6 layers (four convolutional layers and two fully connected layers) was used for the detection of glaucoma, a chronic eye disease. Dropout and data augmentation techniques were used to enhance the performance of the model. The ORIGA and SCES datasets were used for the experiment. From a set of input images, the deep CNN was able to identify the patterns of glaucoma and non-glaucoma. The results showed that the AUC values on the two datasets, ORIGA and SCES, were 0.831 and 0.887 [18], and the CNN gave better results than the state-of-the-art models. A deep CNN detected the cardiovascular disease (CVD) known as myocardial infarction (MI) using ECG (Electrocardiogram) signals. The dataset used was the PTB (Physikalisch-Technische Bundesanstalt) diagnostic ECG database. The CNN was used to detect normal ECG and MI ECG beats with or without noise. Average prediction accuracy for ECG beats with and without noise was 93.53% and 95.22% [19], respectively.
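The ECG study above applies a convolutional network directly to one-dimensional beat segments rather than to images. The following sketch is an illustrative, hypothetical 1-D CNN for binary beat classification (normal vs. MI); the layer count, kernel sizes and beat length are assumptions and do not reproduce the cited architecture.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool1d(2),
    nn.AdaptiveAvgPool1d(1),   # collapse the time axis so any beat length works
    nn.Flatten(),
    nn.Linear(32, 2),          # two classes: normal beat vs. MI beat
)

beats = torch.randn(16, 1, 651)   # batch of 16 single-lead beats (651 samples is an assumed segment length)
logits = model(beats)             # shape (16, 2); train with a cross-entropy loss

The adaptive pooling layer averages over the time axis, so the same classification head can be reused for beats of different lengths.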
The recovery of patients with postanoxic coma after a cardiac arrest [20, 22] was predicted by a deep CNN on different datasets. Electroencephalography (EEG) data of 287 patients after 12 h of cardiac arrest and 399 patients after 24 h of cardiac arrest were recorded for the first 5 days of admission. The dataset used for this study was EEG data of patients at “Medisch Spectrum Twente and Rijnstate hospital”. The CNN model predicted the neurological outcome of patients. The “Cerebral Performance Category Scale (CPC)” was used as the outcome measure, and the two categories used were good neurological outcome (CPC score 1–2) and poor neurological outcome (CPC score 3–5). The model predicted the poor outcome most correctly after 12 h of cardiac arrest, with a sensitivity of 58% at a specificity of 100%; for the good outcome, the sensitivity was 58% and the specificity was 97% [20]. Tjepkema et al. later repeated the analysis with 895 patients from five hospitals. Patients’ continuous electroencephalogram signals were recorded for the first 3 days. A CNN model used EEG signals recorded 12 h and 24 h after cardiac arrest, and prediction of the poor outcome was more accurate after 12 h of cardiac arrest, with a sensitivity of 58% and a false positive rate (FPR) of 0%. For the good outcome, the sensitivity was 48% at an FPR of 5% after 12 h of heart attack [22]. “Chronic obstructive pulmonary disease (COPD)”, “acute respiratory disease (ARD) events” and “mortality” in smokers were identified and predicted by a three-layer CNN using a combined image of four canonical CT (Computed Tomography) scan images of the chest. The study was on two cohorts, “COPDGene” and “ECLIPSE” (“Evaluations of COPD Longitudinally to Identify Predictive Surrogate End-points”), and the model accurately predicted the existence of COPD in smokers with a C statistic value of 0.856. The “C statistic” values for “ARD” events were 0.64 and 0.55 for the “COPDGene” and “ECLIPSE” datasets, respectively, and for the mortality model they were 0.72 for the COPDGene and 0.60 for the ECLIPSE dataset [21]. A Stacked Restricted Boltzmann Machine (SRBM) and a Stacked Sparse Autoencoder (SSAE) predicted healthcare-associated infections (HAI) on two million patients’ EHR data obtained from the Swedish Health Record Research Bank. A bag-of-words approach and the word2vec tool were used for converting the text records to numerical representations in both models. The SRBM with the bag-of-words approach obtained a precision of 0.79 and a recall of 0.88, and the SSAE with the bag-of-words classifier obtained a precision of 0.78 and a recall of 0.78 [23]. From the results it is clear that the SRBM model performed better than the SSAE. The word2vec representation gave a lower score than the bag-of-words approach on the SRBM model but showed a better score for the SSAE classifier. E.E. Hamke et al. used an RBM to detect breathing events, breathing duration and the number of breathing events using a series of respiratory sound recordings of six firefighters aged between 20 and 30. According to the authors, the RBM showed better performance than other classifiers, with an accuracy of 90% [24]. A type of RNN (Recurrent Neural Network), the GRU (Gated Recurrent Unit), was used for the early prediction of Heart Failure (HF) using Electronic Health Records (EHR) of 3884 HF cases from the “Sutter Palo Alto Medical Foundation” (“Sutter PAMF”). The GRU model achieved a better AUC of 0.777 when compared with logistic regression (0.747), MLP with one hidden layer (0.765), support vector machine
(0.743) and k-nearest neighbor (0.730) [25]; the GRU outperformed all these models. A diabetes disease progression study was conducted on a “Diabetes dataset” using a deep GRU model, where the inputs were the sequence of admission information and the treatment history of patients. These were embedded using vectors, and the study reported a prediction accuracy of 70% [26]. The medical conditions, usage of medicines and visit times of patients were predicted using an RNN model named “Doctor AI” from 260k patients’ EHR data obtained at Sutter PAMF. Different medical codes of patients, such as “diagnosis, medication and procedure codes”, were fed to the RNN. According to the authors, the model achieved better performance than the baselines, with an accuracy of 79.58% recall@30 [27]. Diabetes progression in patients was predicted by an attention-based RNN model. A task-specific layer was used for multiple diagnosis prediction. Two real datasets, the Study of Osteoporotic Fractures (SOF) dataset and the BloodTest dataset, were used for the study. Three attention-based RNN models, location-based RNNi, general RNNg and concatenation-based RNNc, were used to understand the relationship between the past visits and the current visit of patients. The RNNi achieved an accuracy of 0.8454 and 0.8605 on the SOF and BloodTest datasets, respectively; RNNg achieved 0.8440 on SOF and 0.8600 on BloodTest, and RNNc achieved 0.8449 on SOF and 0.8632 [28] on BloodTest. The RNN-based methods achieved good results in comparison with the baselines. The various critical conditions or complications (mortality, renal failure and postoperative bleeding) of patients admitted to the intensive care unit after cardiothoracic surgery were predicted by an RNN [29]. The EHR dataset from the German Heart Center Berlin was used for the study. The results proved that the proposed model achieved better accuracy than the standard clinical reference tool, and the model helped the staff to give more care to critical patients. Sepsis in critical care patients who were not under the category of sepsis at the time of admission was predicted by an RNN using the MIMIC III database. The results proved that an RNN with two hidden layers (using GRU as the hidden layers) successfully predicted the onset of sepsis in critical care patients with an “area under the receiver operating characteristic” (AUROC) of 0.81 [30], and the model outperformed conventional algorithms. A 3-layer RNN predicted splice junctions in DNA sequences using molecular biology data taken from the “Genbank database” with an accuracy of 99.95% [31]. An RNN model predicted the progression of Alzheimer’s Disease using 1677 MRI images from the “Alzheimer’s Disease Neuroimaging Initiative (ADNI) database”. The proposed model addressed the issue of missing data by using an RNN with model filling (RNN-MF) method, whereas, as the authors noted, most previous work did not handle the missing data issue. The results showed that the RNN-MF method performed better than other baselines, with an mAUC (multiclass Area Under the Curve) of 0.944 [32] (Fig. 1).
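Several of the studies above feed a sequence of per-visit medical codes into a GRU or RNN and predict what happens at the next visit. The sketch below is a hypothetical, simplified model in that spirit (not the published Doctor AI implementation); the vocabulary size, embedding size and hidden size are assumptions for illustration.

import torch
import torch.nn as nn

class NextVisitGRU(nn.Module):
    # Multi-hot visit vectors -> dense visit embeddings -> GRU over the visit sequence -> next-visit code scores.
    def __init__(self, n_codes=1000, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Linear(n_codes, emb)    # dense embedding of each multi-hot visit vector
        self.gru = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_codes)   # scores for every medical code at the following visit

    def forward(self, visits):                  # visits: (batch, n_visits, n_codes), multi-hot
        h, _ = self.gru(torch.relu(self.embed(visits)))
        return self.out(h)                      # a prediction after every visit in the sequence

model = NextVisitGRU()
scores = model(torch.zeros(4, 6, 1000))         # 4 patients, 6 past visits each

Because the GRU emits a hidden state after every visit, the same forward pass yields a next-visit prediction at each point in a patient's history, which is how these models are typically trained.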
Fig. 1. DL models used for various Healthcare Applications
3 Conclusion
Various deep learning (DL) models used in healthcare are reviewed in this paper. From the study, it is clear that deep learning has been used in healthcare both for the identification of diseases and for their prediction. Analyzing the voluminous EHR information of patients is very tedious, and DL can assist doctors and healthcare professionals in identifying as well as predicting diseases from scans, MRI reports, EHR and so on. From the literature, it is observed that the most commonly used deep learning method for time-series clinical data is the LSTM/Bi-LSTM (Long Short-Term Memory/Bi-Directional LSTM) model, and for medical image data diagnosis, the most widely used deep learning model is the CNN (Convolutional Neural Network).
References
1. Slava Kurilya: Deep Learning (DL) in Healthcare. https://blog.produvia.com/deep-learningdl-in-healthcare-4d24d102d317
2. https://en.wikipedia.org/wiki/Autoencoder
3. https://medium.com/technologymadeeasy/the-best-explanation-of-convolutional-neuralnetworks-on-the-internet-fbb8b1ad5df8
4. whatis.techtarget.com
5. towardsdatascience.com
6. Lipton, Z.C., Kale, D.C., Wetzel, R.: Phenotyping of clinical time series with LSTM recurrent neural networks. arXiv.org (2015)
7. Lipton, Z.C., Kale, D.C., Elkan, C., Wetzel, R.: Learning to diagnose with LSTM recurrent neural networks – Revised. arXiv.org (2016)
8. Baets, L.D., Ruyssinck, J., Peiffer, T., Turck, F.D., Ongenae, F., Dhaene, T., Decruyenaere, J.: Positive blood culture detection in time-series data using a BiLSTM network. arXiv.org (2016)
9. Bai, T., Egleston, B.L., Zhang, S., Vucetic, S.: Interpretable representation learning for healthcare via capturing disease progression through time. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, pp. 43–51. ACM (2018)
10. Wang, T., Qiu, R.G., Ming, Y.: Predictive modeling of the progression of Alzheimer’s disease with recurrent neural networks. Sci. Rep. 8 (2018). Article number: 9161
11. Chae, S., Kwon, S., Lee, D.: Predicting infectious disease using deep learning and big data. Int. J. Environ. Res. Public Health 15(8), 1596 (2018)
12. Maragatham, G., Devi, S.: LSTM model for prediction of heart failure in big data. J. Med. Syst. 43(5), 111 (2019)
13. Hong, X., Lin, R., Yang, C., Zeng, N., Cai, C., Gou, J., Yang, J.: Predicting Alzheimer’s disease using LSTM. Special section on data-enabled intelligence for digital health. IEEE Access (2019)
14. Zhang, Y., Yang, X., Ivy, J., Chi, M.: ATTAIN: attention-based time-aware LSTM networks for disease progression modeling. In: Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI 2019), Sweden (2019)
15. Lin, Y.-W., Zhou, Y., Faghri, F., Shaw, M.J., Campbell, R.H.: Analysis and prediction of unplanned intensive care unit readmission using recurrent neural networks with long short-term memory. PLoS ONE (2019)
16. Cheng, L., Ren, Y., Zhang, K., Pan, L., Shi, Y.: Hospitalization behavior prediction based on attention and time adjustment factors in bidirectional LSTM. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds.) Database Systems for Advanced Applications. DASFAA 2019 Workshop Proceedings. Springer, Cham (2019)
17. Pan, Y., Huang, W., Lin, Z., Zhu, W., Zhou, J., Wong, J., Ding, Z.: Brain tumor grading based on neural networks and convolutional neural networks. In: Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015, pp. 699–702 (2015)
18. Chen, X., Xu, Y., Wong, D.W.K., Wong, T.Y., Liu, J.: Glaucoma detection based on deep convolutional neural network. In: Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015, pp. 715–718 (2015)
19. Acharya, U.R., Fujita, H., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M.: Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Inf. Sci. 415, 190–198 (2017)
20. van Putten, M.J.A.M., Hofmeijer, J., Ruijter, B.J., Tjepkema-Cloostermans, M.C.: Deep learning for outcome prediction of postanoxic coma. Springer, Cham (2017)
21. Gonzalez, G., Ash, S.Y., Vegas-Sánchez-Ferrero, G., Onieva, J.O., Rahaghi, F.N., Ross, J.C., Díaz, A., Estepar, R.S.J., Washko, G.R.: Disease staging and prognosis in smokers using deep learning in chest computed tomography. Am. J. Respiratory Crit. Care Med. (2018). (www.atsjournal.org)
22. Tjepkema-Cloostermans, M.C., da Silva Lourenço, C., Ruijter, B.J., Tromp, S.C., Drost, G., Kornips, F.H.M., Beishuizen, A., Bosch, F.H., Hofmeijer, J., van Putten, M.J.A.M.: Outcome prediction in postanoxic coma with deep learning. Critical Care Medicine (2019)
23. Jacobson, O., Dalianis, H.: Applying deep learning on electronic health records in Swedish to predict healthcare-associated infections. In: Proceedings of the 15th Workshop on Biomedical Natural Language Processing, Berlin, pp. 191–195. Association for Computational Linguistics (2016)
24. Hamke, E.E., Martinez-Ramon, M., Nafchi, A.R., Jordan, R.: Detecting breathing rates and depth of breath using LPCs and restricted Boltzmann machines. Biomedical Signal Processing and Control, vol. 48. Elsevier (2019)
25. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Using recurrent neural network models for early detection of heart failure onset. J. Am. Med. Inform. Assoc. 24(2), 361–370 (2016)
26. Pavithra, M., Saruladha, K., Sathyabama, K.: GRU based deep learning model for prognosis prediction of disease progression. In: 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC). IEEE (2019)
27. Choi, E., Bahadori, M.T., Schuetz, A., Stewart, W.F., Sun, J.: Doctor AI: predicting clinical events via recurrent neural networks. In: Proceedings of Machine Learning for Healthcare 2016, JMLR W&C Track, vol. 56 (2016)
28. Suo, Q., Ma, F., Canino, G., Gao, J., Zhang, A., Veltri, P., Gnasso, A.: A multi-task framework for monitoring health conditions via attention-based recurrent neural networks. In: AMIA Annual Symposium Proceedings 2017, pp. 1665–1674. PubMed Central (PMC) (2017)
29. Meyer, A., Zverinski, D., Pfahringer, B., Kempfert, J., Kuehne, T., Sündermann, S.H., Stamm, C., Hofmann, T., Falk, V., Eickhoff, C.: Machine learning for real-time prediction of complications in critical care: a retrospective study. Elsevier (2018)
30. Scherpf, M., Graber, F., Malberg, H., Zaunseder, S.: Predicting sepsis with a recurrent neural network using the MIMIC III database. Computers in Biology and Medicine 113. Elsevier (2019)
31. Sarkar, R., Chatterjee, C.C., Das, S., Mondal, D.: Splice junction prediction in DNA sequence using multilayered RNN model. In: Advances in Decision Sciences, Image Processing, Security and Computer Vision, pp. 39–47 (2019)
32. Nguyen, M., He, T., An, L., Alexander, D.C., Feng, J., Thomas Yeo, B.T.: Predicting Alzheimer’s disease progression using deep recurrent neural networks. bioRxiv (2019)
Comparison of Global Prevalence of Sleep Disorders in Intellectually Normal v/s Intellectually Disabled: A Review
Nushafreen Irani1, Niketa Gandhi1, Sanjay Deshmukh1(&), and Abhijit Deshpande2
1 Department of Life Sciences, University of Mumbai, Mumbai 400 098, Maharashtra, India
[email protected], [email protected], [email protected]
2 International Institute of Sleep Sciences, Thane 400602, Maharashtra, India
[email protected]
Abstract. Sleep is not given due importance unless one suffers from some type of sleep disorder. The current paper reviews the prevalence of sleep disorders. There is very low awareness of sleep disorders and their disastrous consequences in the Indian population. These consequences include physical and psychological impairment and a lack of neurodevelopmental progress. Common methods to assess the prevalence of sleep disorders in large population groups are discussed. This review also includes a special mention of sleep disorders in the intellectually disabled population. Our review finds that the prevalence of sleep disorders in this population is significantly high and comparable to that in the normal population. Sleep issues in the intellectually disabled population present dual consequences: not only do sleep issues hamper the growth of the persons suffering, but unfortunately the caregivers are also affected. Sleep disorders in the mentally challenged population are less studied, since the person suffering has difficulty in reporting them. Our literature review reveals that sleep issues are prevalent globally and that they especially affect the intellectually disabled population. The review also discusses ancient history relating to sleep and the intellectually disabled population.
Keywords: Intellectually disabled · Indian population · Mentally challenged · Prevalence of sleep disorders · Sleep disorders · Western population
1 Introduction
Sleep is not just a “time out” from our daily activity; it is an essential and active phase of life (Hachinski et al. 2006). An individual usually spends one-third of his/her day sleeping, but still not much significance is given to it. One of the important discoveries, the structure of the benzene ring, was revealed to August Kekule in his dreams. After a long daytime contemplation on the structure of benzene, he dreamed of the self-devouring snake. This story indicates the active nature and significance of sleep. Every species (from insects to mammals) has been shown to enter the phase of
sleep (Carskadon and Dement 2005). Thus “sleep” is a universal behavioral phenomenon. So far there are 84 types of known sleep disorders (Sheldon et al. 2005), many of which are perfectly treatable. These treatments can bring about dramatic and positive changes in a person’s life. A good night of restful sleep brings positive improvements in neurobehavioral functioning. This is more pronounced in intellectually disabled populations, who already suffer from impaired neurobehavioral functioning. Even though sleep is not much discussed, in the last decade research publications documenting various aspects of sleep have been noted. Sleep disturbance and its consequences have been observed throughout the world. Sleep disorders can have a negative impact on the entire society due to accidents and disasters. The most devastating work-related accidents include, but are not limited to, the Three Mile Island meltdown in 1979; the gas leak in Bhopal, India in 1984; the Challenger shuttle disaster in 1986; the Chernobyl episode (1986); the Exxon Valdez oil spill in 1989; and the Mangalore air crash in 2010, to name a few. The Bhopal gas tragedy on 3rd December 1984 occurred in the early morning hours, when workers were drowsy and inattentive, resulting in the death of 15,000 people; approximately 600,000 people were injured.
2 Tools Used for Measuring Sleep in Prevalence Studies
Questionnaire based screening has emerged as a tool to study sleep patterns in larger populations. Many of these questionnaires are validated against gold standard sleep studies. The Children's Sleep Habits Questionnaire (CSHQ) is a validated and reliable tool to estimate sleep related disturbance in children (Owens et al. 2000). Although the original questionnaire was validated for children, it has also been used in children from 4 to 18 years (Laberge et al. 2001). The CSHQ has also been used in the ID population (Breslin et al. 2011). Many validated sleep questionnaires to screen large populations are also available. In the case of adolescent children, the Cleveland Adolescent Sleepiness Questionnaire and the School Sleep Habits Survey (SSHS) have been utilized. Other examples include the Epworth Sleepiness Scale (ESS) and the Berlin questionnaire, as well as the Stanford Sleepiness Scale and the Pittsburgh Sleep Quality Index (Spriggs 2009). In the case of adult screening tools, questionnaires are self-reported, whereas in the case of children and special populations, the reports are from parents or caregivers.
3 Prevalence of Sleep Disorders in Intellectually Normal Population
The reported prevalence of sleep disorders ranges from 0.5 to 36%. This wide range is perhaps due to differences in materials and methods (Salzarulo and Chevalier 1983; Johns 1991; Roehrs et al. 2005; Iwadare et al. 2013). Individuals with sleep-disordered breathing are known to report daytime somnolence; in the study reported by Krakow et al. 2001, 50% of patients positive for sleep disordered breathing complained of either insomnia or idiopathic hypersomnia (Tanaka and Honda 2010). The prevalence of daytime sleepiness also shows a wide range. A survey done in the USA in 2005 showed that a significant proportion (60%) of the American population had driven while drowsy. In fact, over 13% of
these had fallen asleep while driving (Sheldon 2005). In Europe, a Finnish study found that the prevalence of excessive daytime sleepiness was higher in females (11%) than in males (6.7%) (Hublin et al. 1996). In Japan, the prevalence of excessive daytime sleepiness was found to be as low as 2.5% (Kaneita et al. 2005). In individuals with sleep apnea who experience daytime sleepiness over the long term, the ability to rate the difference decreases, and it may return on therapy (Patzold et al. 1998). In the general population, the prevalence of chronic insomnia appears to be 1–2%, while that of acute insomnia is 15–20% (Sheldon et al. 2005), and women are usually known to be more affected than men. Liljenberg et al. 1988 identified the incidence of insomnia in two geographically different central rural parts of Sweden. The participants included the age group from 30 to 65 years. Different criteria gave different population percentages: those who reported difficulty falling asleep were 7.1% of females and 5.1% of males, while those with trouble falling asleep along with nocturnal awakenings were 8.9% of females and 7.7% of males. When more stringent conditions were applied, the female and male prevalence was found to be 1.1% and 0.5%, respectively (Liljenberg et al. 1988). A similar finding was reported by Morgan et al. 1988; Li et al. 2002 identified females as having a higher likelihood of insomnia than males. Higher disturbance of sleep in females than in males was also reported by Abdel-Khalek 2004. In Germany, the prevalence was found to be 5% in women versus 3% in men (Hajak 2001). A survey by the National Sleep Foundation in 2005 showed 46% of women with sleep problems almost every night; this gender difference was present only in adolescents (Ohayon 2010) and adults, while in children no such difference was observed (Ohayon and Roth 2002). Sleep problems have been observed across varying age groups. Parent report screening carried out by Stein et al. 2001 for children (age: 4 to 12 years) documented 10.8% of the population to have sleep problems. Of the affected population, 58.7% were positive for parasomnias and sleep disturbance described as noisy sleep. Different forms of sleep disorders affecting 25% of children have been reported around the world (Allik et al. 2006; Panossian and Avidan 2009). It has been reported that 45% of adolescents between 11 and 17 years of age sleep less than 8 h on school nights (Sheldon 2005). Sleep disturbance, i.e. sleep disorders and general sleep complaints, is also found to be strongly associated with suicidal tendencies (Bernert and Joiner 2007). Gender-based differences in sleep disorders have also been reported. The incidence of obstructive sleep apnea is found to be greater in males than in premenopausal females, but the trend remains the same in postmenopausal females. In menopausal women, the prevalence of sleep disturbance is estimated to be higher, at 42–50% (Owens et al. 2005). 25% of male as compared to 9% of female middle-aged adults are positive for sleep apnea (Paparrigopoulos et al. 2010).
3.1 Indian Scenario
In the Indian context, the prevalence of SDB was reported to occur in 19.5% of the subject population. In healthy urban males between 35 and 65 years of age, 7.5% of the population were positive for OSAHS. This study, reported by Udwadia et al. 2004, draws attention to the potential consequences observed in middle-aged males in a developing country like India. Yet another finding on the prevalence of sleep apnea in middle-aged men has been reported by Kaul et al. 2001.
4 Sleep Disorders in the Intellectually Disabled Population

Studies in mentally challenged populations have shown severe sleep problems, especially nighttime awakenings followed by bedtime resistance (Brylewski and Wiggs 1998). Other sleep disorders observed in this group were parasomnias and sleep-disordered breathing. Poindexter and Bihm (1994) monitored sleep parameters in adults with profound mental retardation and observed that approximately 38.8% of the participants slept less than 6 h over a period of six months. In profound and severe mental retardation, REM sleep has been shown to be proportionally reduced relative to an increase in non-REM sleep. Richdale et al. (2000) reported a sleep analysis based on sleep diaries kept by parents of mentally challenged individuals. Low levels of intelligence have been associated with decreased REM latency and spindle density as well as the presence of undifferentiated sleep. Such individuals have also been reported to show prolonged initial REM latency and a reduced amount of REM sleep. Piazza et al. (1996) compared total sleep time, sleep onset and nighttime awakenings with normal controls; mentally retarded participants had significantly shorter time in bed and total sleep time than individuals of the same age group. Of this population, 88% complained of delayed sleep onset, some number of wakenings after sleep onset (WASO) or early morning waking (Sadock 2011). Regezi et al. (2016) identified sleep disturbances occurring in 23–44% of people with intellectual disabilities. Sleep-disordered breathing is also more common in this group. Obstructive sleep apnea is seen in Down syndrome due to congenitally narrow airways, lack of muscle tone and enlarged adenoids and tonsils (Rosdi 2012). Persons with Down syndrome show a statistically significantly higher number of central sleep apnea episodes and oxygen desaturations than those with Fragile X syndrome, despite a similar sleep structure (Hatton et al. 1999). Along with OSA, they are also found to be positive for hypoxemia, hypoventilation and sleep fragmentation, which are reported to be correlated with obesity (Sateia et al. 2000). Another questionnaire-based study on Fragile X syndrome showed that 47% of parents reported significant sleep problems in their children (Kronk et al. 2010). As per a parent-report survey, children with Down syndrome showed disturbed sleep, snoring, fear of being in the dark, more than two nighttime awakenings, teeth grinding and early morning awakening (i.e. before 5 am) as compared to normal children (Sawant et al. 2005). There can also be sleep problems of unspecific origin, such as waking up in the middle of the night, unusual sleep-wake hours, inadequate sleep duration, daytime sleepiness and difficulty settling to sleep (Roth 2010). In 69% of 16 patients with Down syndrome who underwent tonsillectomy and adenoidectomy, symptoms were reported to resolve (Van Strien et al. 2005). Patient compliance with respect to the mask and regular use of machines requires behavioral modification techniques. Down syndrome children, when compared with age-matched normal controls, showed bedtime resistance (20%), waking up in the middle of the night (41%) and requiring a parent to be present at the time of sleep (24%) (Hoffmire et al. 2014). As per a questionnaire-based survey in children with Fragile X syndrome, 34% had sleep problems, of which 84% had two or more sleep problems; parents of such children usually complained of their children not falling asleep and of major nighttime awakenings (Kronk et al. 2010). Children with ADHD have also tested positive for PLMD.
One such study in ADHD showed 26% positive, while another
showed 64% of the population to be positive for PLMD (Roth 2010). Another study showed that 44% of PLMD-positive participants had symptoms of ADHD (Rugh 1987). Approximately 66% of children with ASD have sleep-wake problems (Wiggs and Stores 2004). In ASD individuals showing anxiety and obsessive-compulsive behaviors, there is a higher chance of problems in sleep initiation and maintenance. A study by Buckley et al. (2010) showed a statistically significant difference in children with autism, who demonstrated shorter total sleep time, more stage 1 and stage 3 sleep and a lack of REM sleep as compared to children with developmental delays. Disturbed rapid eye movement (REM) sleep was observed in polysomnography studies in ASD (Godbout et al. 2000). Irregular sleep, frequent nighttime awakenings, prolonged awakenings and a low spontaneous arousal threshold with early morning awakenings are usually observed in autism (Steffenburg et al. 1996). Similar results have been demonstrated by Krakowiak et al. (2008). Several studies in ASD have reported sleep problems, both parent-reported (Breslin et al. 2011) and actigraphically recorded (Wiggs and Stores 2004). Breslin et al. (2011) reported one such study in which parents characterized their child's sleep by difficulty falling asleep, decreased sleep efficiency, frequent waking during sleep and increased wake after sleep onset, as well as more variable sleep times. Children with ASD who have sleep problems tend to show more stereotypic behaviors, social difficulties, emotional symptoms and behavior problems than those with no sleep problems (Paavonen et al. 2008). Children with spastic cerebral palsy presented with daytime irritability, frequent nighttime awakenings and desaturation in sleep (Mughal et al. 2012). Children with cerebral palsy, when compared with age-matched controls, showed an increase in obstructive sleep apneas and hypopneas per hour of sleep with fewer body position changes during sleep, and central sleep apnea may also exist. The increased chance of obstructive sleep apnea may be due to macroglossia and aspiration consequent to gastro-esophageal reflux during sleep (Kotagal 2001). In such children, lesions in the anterior visual pathway leading to blindness may delay the onset of melatonin secretion, which plays a very important role in sleep initiation and maintenance, resulting in a delayed sleep phase. This was observed by Palm et al. (1997) through 24-h sleep-wake recordings.
4.1 Indian Scenario
A study of the prevalence of sleep disorders in the intellectually disabled population in India was carried out by Irani et al. (2019a, b). The Level 1 polysomnogram study showed total sleep time to be less than 6 h, increased stage 3 sleep and reduced REM sleep. The micro-arousal index was observed to be 14/h, which is significantly high, and sleep efficiency was 81%. One or more sleep disorders were seen in 84.21% of the population, comprising 32.43% OSA, 40.54% PLMD and 13% parasomnias. Irani et al. (2019a, b), examining the EEG distribution in PLMD participants, found a statistically significant difference in the micro-arousal index (p = 0.04) when compared with participants without PLMD.
5 Ancient History of Sleep Disorders

Observations on sleep were first documented around 360 BCE, when Dionysius was described as being so obese that he died choking on his own fat. Treatment for sleep apnea in those days consisted of sticking needles into the fat; the needle was pushed through the fat to the point where it reached a fat-free region, creating arousal. Hippocrates (400 BCE) described sleep apnea as a person in a state of suffocation, crying, jumping and fleeing out of doors; these episodes would continue until the patient woke up feeling physically weak. Another description of sleep apnea was reported by William Wells (1898) as daytime sleepiness, which he associated with nasal obstruction, headaches, snorting and restlessness at night. John Cheyne in 1818 was the first to report Cheyne-Stokes breathing. The term sleep apnea was first coined by Guilleminault et al. (1975). Burwell et al. described Pickwickian syndrome in 1956, with the major implication being heart failure rather than daytime sleepiness. Karl Ekbom (1945) described the symptoms of restless legs in sleep, a condition that had been noted by Sir Thomas Willis as early as 1672 and that later came to be known as restless legs syndrome. Persistent wakefulness, described as insomnia today, was reported in 1869 by William Hammond, who then perceived the cause of insomnia to be an excess of oxygen to the brain. Karl Westphal (1877) was the first to observe symptoms of cataplexy and narcolepsy; English translations of these early reports have been given by Schenck et al. (2007). Roger Broughton in 1968 described several parasomnias as disorders of arousal rather than of the dreaming state. Human brain electrical activity was first recorded by Hans Berger in 1928. He demonstrated differences in brain waves when subjects were awake or asleep, which he called "electroencephalograms". This pioneering work made it possible to measure sleep without any discomfort to the sleeping person. The first fiber optic-based ear oximeter was introduced by Hewlett-Packard in 1974. NREM (non-rapid eye movement) sleep in the electroencephalogram was first described by Alfred Loomis (1935) and co-workers, and later confirmed by Hallowell and Pauline Davis at Harvard University. They recorded sleep from day to night and categorized it into five stages (then labeled A, B, C, D and E); these labels were based on the order of manifestation and on resistance to change by external disturbance (Loomis et al. 1935). The equipment for analyzing sleep and sleep disorders was still very basic and challenging to use. Instead of gold cup electrodes, small pins (on a few channels) were stuck into the scalps of stoical volunteers. The amplifiers were the size of a room and used ink pens for recording sleep parameters; such a system required careful calibration before starting a study, and data capture consumed a large amount of paper that was tedious to manage. Most of the research in this field was held up by World War II, but after the war, thanks to advances in electronics, studies in this field again developed rapidly. Electrooculography was utilized to detect eye movements in children and adults; Kleitman and Aserinsky in 1951 used this tool to monitor eye movements. This methodology captured bursts of electrical potential changes that were quite different from the slow movements at sleep onset.
In the 1930s recordings for sleep measurements were carried out via paper and ink. With the introduction of the computer, recordings and sleep medicine analysis were
made much easier. EEG recordings were further advanced with the introduction of amplifiers and high-pass filters. Specific findings such as microarousals were studied from the frontal lobe, and alpha waves denoting drowsiness were studied from the occipital lobe; thus specific EEG leads were given importance during sleep recording. Along with EEG, other physiological parameters such as heart rate and respiration were also introduced (Deak and Epstein 2009). Rechtschaffen and Kales in 1968 produced a sleep scoring manual which formed the basis of sleep staging for most research for the next 40 years. Chronic tracheostomy was used as a treatment of severe obstructive sleep apnea until the end of the 1980s. Though this technique remained highly effective, because of its detrimental effects it was slowly replaced by surgical (Fujita et al. 1981) and mechanical (Sullivan et al. 1981) methods. The use of continuous positive airway pressure was described by Colin Sullivan in The Lancet in 1981. This was a revolutionary period for the treatment of sleep apnea, as tracheostomy was the only therapy available at the time. Untreated sleep apnea showed a high mortality rate, as reported by Jiang He and colleagues and by Christian Guilleminault and colleagues (1989), especially in those below 50 years of age. Fred Turek and Michael Menaker altered the circadian clock of a sparrow in 1976 by administering melatonin, which laid the foundation for the chronobiotic drugs used nowadays. Between 1935 and 1936, Frédéric Bremer established the concept that sleep occurs only when there is a reduction in stimulation and activity, as seen on electroencephalography. He supported this concept by carrying out experiments in cats which showed a reduction of activity with sleep: idling, slow, synchronized, "resting" neuronal activity. Emil DuBois-Reymond, known as the father of modern electrophysiology, confirmed the polarized state of nerve and muscle fibers and demonstrated that the peripheral passage of a nerve impulse was accompanied by an electrical discharge. Charles Czeisler (1976) documented 24-h cortisol secretory patterns in humans. In the early twentieth century, Legendre and Piéron showed that sleep could be induced in dogs using blood serum from sleep-deprived animals. Takahashi et al. (1997) discovered and cloned the first mammalian circadian clock gene, called Clock. Winkelman (2007) discovered single nucleotide polymorphisms in the BTBD9, MEIS1 and MAP2K5/LBXCOR1 regions that are associated with RLS. Stefansson et al. (2008) found the BTBD9 polymorphism to be associated with PLMs and also with low ferritin levels. Joe Bass's lab (2010) reported that clock genes in pancreatic islet cells determine insulin secretion; Mitchell A. Lazar's lab reported (2011) the linkage of metabolic and clock genes in hepatic lipid metabolism; and Rachel Edgar et al. (2012) showed co-evolution of cellular circadian and metabolic timekeeping with redox homeostatic mechanisms.
5.1 Indian Scenario
As per the Samhita (2001), sleep problems are caused by headache, body ache, loss of appetite, nausea, etc. These factors, along with old age, a Vata-type constitution and disorders of Vata, can lead to insomnia. In the same text, sleep therapy has been recommended for various conditions, whatever the cause. This includes those who have become thin due to excessive activities; the activities listed in the text are singing, reading and weight carrying, as well as people affected by anger, grief, fear, etc. Here, sleep therapy means that in such a
condition, the individual must sleep during the day as well as the night for the purpose of restoring equilibrium of all dhatus. Vata aggravation in the summer season can be reduced by a daytime nap, but the same should not be practised in the rainy and winter seasons. Daytime sleep is also to be avoided by those who are obese, have a fatty diet and have abundant Kapha. Different therapies for increasing and decreasing sleep have also been mentioned. Therapies for inducing sleep include massage, curd with rice, anointing and fixing a regular bedtime. Methods for reducing sleep include fasting, smoking, exercise, discomfort, etc.
6 Ancient History of Intellectual Disability (Mentally Challenged)

During ancient civilization, Greek and Roman philosophers considered intellectually disabled people as barely human. Similar attitudes towards and treatment of mental retardation were also observed in China and the early Christian world. The Egyptian Papyrus of Thebes (1552 B.C.) made the first mention of intellectual disability (Harris 2006). In the late fifth century B.C., Hippocrates believed mental retardation was due to an imbalance of the four humors in the brain. Thomas Willis in the 17th century was the first to describe intellectual disability as a disease; he believed that it is caused by structural abnormalities in the brain, which could be an inborn condition or acquired later in life. During the 17th and 18th centuries, Europe provided basic needs such as food, shelter and clothing to these populations. Over this period, understanding of brain function and of certain types of mental retardation (e.g. cretinism and hydrocephalus) increased. Individuals with mental retardation were confined to institutions (e.g. foundling homes, hospitals, prisons). In the 18th and 19th centuries, mentally retarded individuals were removed from their families, usually in infancy, and settled in institutions made for the mentally challenged. In the early 20th century, the eugenics movement brought mandatory sterilization and exclusion from marriage. This was observed in the majority of the developed world, and it was used by Hitler as a foundation for the mass killing of such individuals during the Holocaust. However, once their evil was recognized, such acts of the eugenics movement were abandoned by most developed countries by the mid-20th century. In contrast, Roman law, as noted in 1905, declared people with mental retardation to be incapable of the deliberate intent to harm that was necessary for a person to commit a crime. With time, many distinct etiological categories were identified and described (e.g. Down syndrome, cretinism, hydrocephaly and microcephaly), although the causes of many of these conditions were poorly understood. In 1866, John Langdon Down described in his essay a certain group with mental retardation having common features. These children were then referred to as Mongoloids. Because the term was an ethnic insult, objections raised in the 1960s, including from Asian genetic researchers, led the condition to become known as "Down's syndrome", and in the 1970s an American revision of scientific terms renamed it Down syndrome. Jerome Lejeune and Patricia Jacobs, working independently, first determined trisomy (triplication) of the 21st chromosome to be the cause. Francis Galton in the 19th century
proposed that the incidence of mental retardation could be reduced by considering genetic links before marriage. Frederick Hecht in 1970 coined the term fragile site. Fragile X was also called Escalante syndrome after Julio Anibal Escalante. Martin and Bell (1943) described a lineage of X-linked mental disability, without noting the associated macroorchidism. Lubs (1969) was the first to observe a "marker X chromosome" linked with mental disability. Williams syndrome was first identified in 1961 by Dr. J. C. P. Williams. Concern continued to grow until, in 1961, President Kennedy appointed the President's Panel on Mental Retardation. In the following year, the experts on the panel suggested eight programs covering every aspect of mental retardation, from preventive to rehabilitative measures.
6.1 Indian Scenario
Ayurveda conceptualized mental retardation four thousand years ago. Charak and Susruta both hypothesized that mental retardation, or "manasamandyam", was due to faulty genes, poor conditions during pregnancy, and poor child-rearing practices. Charak and Susruta also explained the cause of mental retardation as being under the influence of divine forces and "Grahas". Indian philosophy believes in KARMA, the seeds of which could lie in a past life, carry forward into the present and be seen in future lives as well (Puri and Sen 1989). Ancient Indian literature discloses the existence of the rural day school and the 'Gurukul' (residential learning center). These gave due emphasis to a child-centered approach, identifying the learning channel and pace of each learner and individualizing both teaching and learning. In both types, the mentors planned the program to offer value and robustness to the education over a long-term outlook, but dispensed it according to the proficiency or deficits of the learner. Thus the system of education could meet the educational needs of a broad range of learners, from the highly gifted to the sub-average. Many students with special educational needs were successfully integrated with normal students and participated meaningfully in the community. The first residential home for the intellectually disabled in India was established in Mumbai, then Bombay (Children Aid Society, Mankhurd). This was followed by the establishment of a special school in 1944. By 1947, there were just three schools for the mentally retarded; the number rose to 20 by 1980, and at present there are over 1100 such schools in the country. The first school for one of the associated disabilities, i.e. cerebral palsy, was started in 1973. The first standardized test for determining intelligence in children was developed by Alfred Binet (Venkatesan 2004). The intelligence quotient, "IQ", was first abbreviated by William Stern from the German term Intelligenzquotient.
7 Conclusion

The current paper reviewed aspects of sleep and sleep disorders in Western and Indian contexts, with particular attention to sleep-related issues in the intellectually disabled population. An account of the ancient history of sleep has also been presented. The authors have given due credit to all the researchers for their contributions to the field of sleep sciences.
References Abdel-Khalek, A.M.: Prevalence of reported insomnia and its consequences in a survey of 5,044 adolescents in Kuwait. Sleep 27(4), 726–731 (2004) Allik, H., Larsson, J.O., Smedje, H.: Insomnia in school-age children with Asperger syndrome or high-functioning autism. BMC Psychiatry 6(1), 18 (2006) Bernert, R.A., Joiner, T.E.: Sleep disturbances and suicide risk: a review of the literature. Neuropsychiatric Dis. Treat. 3(6), 735 (2007) Breslin, J.H., Edgin, J.O., Bootzin, R.R., Goodwin, J.L., Nadel, L.: Parental report of sleep problems in Down syndrome. J. Intellect. Disabil. Res. 55(11), 1086–1091 (2011) Brylewski, J.E., Wiggs, L.: A questionnaire survey of sleep and night-time behaviour in a community-based sample of adults with intellectual disability. J. Intellect. Disabil. Res. 42, 154–162 (1998) Carskadon, M.A., Dement, W.C.: Normal human sleep: an overview. Principles Pract. Sleep Med. 4, 13–23 (2005) Deak, M., Epstein, L.J.: The history of polysomnography. Sleep Med. Clin. 4(3), 313–321 (2009) Ekbom, K.A.: Restless legs. Acta Med. Scand. 158, 4–122 (1945) Godbout, R., Bergeron, C., Limoges, E., Stip, E., Mottron, L.: A laboratory study of sleep in Asperger’s syndrome. NeuroReport 11(1), 127–130 (2000) Guilleminault, C., Eldridge, F.L., Simmon, F.B., Dement, W.C.: Sleep apnea syndrome. West. J. Med. 123(1), 7 (1975) Hachinski, V., Iadecola, C., Petersen, R.C., Breteler, M.M., Nyenhuis, D.L., Black, S.E., Vinters, H.V.: National Institute of Neurological Disorders and Stroke-Canadian stroke network vascular cognitive impairment harmonization standards. Stroke 37(9), 2220–2241 (2006) Hajak, G.O., SINE Study Group: Epidemiology of severe insomnia and its consequences in Germany. Eur. Arch. Psychiatry Clin. Neurosci. 251(2), 49–56 (2001) Hatton, D.D., Bailey, D.B., Hargett-Beck, M.Q., Skinner, M., Clark, R.D.: Behavioral style of young boys with fragile X syndrome. Dev. Med. Child Neurol. 41(9), 625–632 (1999) Hoffmire, C.A., Magyar, C.I., Connolly, H.V., Fernandez, I.D., van Wijngaarden, E.: High prevalence of sleep disorders and associated comorbidities in a community sample of children with Down syndrome. J. Clin. Sleep Med.: JCSM: Official Publicat. Am. Acad. Sleep Med. 10(4), 411 (2014) Hublin, C., Kaprio, J., Partinen, M., Heikkilä, K., Koskenvuo, M.: Daytime sleepiness in an adult, Finnish population. J. Intern. Med. 239(5), 417–423 (1996) Irani, N., Deshmukh, S., Deshpande, A.: Prevalence of Sleep Disorders in Intellectually Disabled Population (2019) Irani, N., Deshmukh, S., Deshpande, A.: Prevalence of PLMD in Intellectually Disabled Population (2019) Iwadare, Y., Kamei, Y., Oiji, A., Doi, Y., Usami, M., Kodaira, M., Saito, K.: Study of the sleep patterns, sleep habits and sleep problems in Japanese elementary school children using the CHSQ-J. Kitasata Med J 43, 31–37 (2013) Johns, M.W.: A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep 14(6), 540–545 (1991) Kaneita, Y., Ohida, T., Uchiyama, M., Takemura, S., Kawahara, K., Yokoyama, E., Kaneko, A.: Excessive daytime sleepiness among the Japanese general population. J. Epidemiol. 15(1), 1– 8 (2005) Kaul, S., Meena, A.K., Murthy, J.M.: Sleep apnoea syndromes: clinical and polysomnographic study. Neurol. India 49(1), 47–50 (2001) Kotagal, S.: Sleep abnormalities and cerebral palsy. Clin. Dev. Med. 107–110 (2001)
Krakow, B., Melendrez, D., Ferreira, E., Clark, J., Warner, T.D., Sisley, B., Sklar, D.: Prevalence of insomnia symptoms in patients with sleep-disordered breathing. Chest J. 120(6), 1923– 1929 (2001) Kronk, R., Bishop, E., Raspa, M., Bickel, J.O., Mandel, D.A., Bailey, D.B.: Prevalence, nature, and correlates of sleep problems among children with fragile X syndrome based on a large scale parent survey. Sleep 33(5), 679–687 (2010) Laberge, L., Petit, D., Simard, C., Vitaro, F., Tremblay, R.E., Montplaisir, J.: Development of sleep patterns in early adolescence. J. Sleep Res. 10(1), 59–67 (2001) Li, R.H., Wing, Y.K., Ho, S.C., Fong, S.Y.: Gender differences in insomnia—a study in the Hong Kong Chinese population. J. Psychosom. Res. 53(1), 601–609 (2002) Liljenberg, B., Almqvist, M., Hetta, J., Roos, B.E., Agren, H.: The prevalence of insomnia: the importance of operationally defined criteria. Ann Clin Res. 20(6), 393–398 (1988). (Abstract) Loomis, A.L., Harvey, E.N., Hobart, G.: Further observations on the potential rhythms of the cerebral cortex during sleep. Science 82(2122), 198–200 (1935) Morgan, K., Dallosso, H., Ebrahim, S., Arie, T., Fentem, P.H.: Characteristics of subjective insomnia in the elderly living at home. Age Ageing 17(1), 1–7 (1988) Mughal, S., Usmani, S., Naz, H.: Effects of rehabilitation on mild hypotonic diplegic cerebral palsy child with Down syndrome: an observational case study. Pak. J. Biochem. Mol. Biol. 45 (2), 104–111 (2012) Ohayon, M.M., Roth, T.: Prevalence of restless legs syndrome and periodic limb movement disorder in the general population. J. Psychosom. Res. 53(1), 547–554 (2002) Ohayon, M.M., Bader, G.: Prevalence and correlates of insomnia in the Swedish population aged 19–75 years. Sleep Med. 11(10), 980–986 (2010). (Abstract) Owens, J.A., Babcock, D., Blumer, J., Chervin, R., Ferber, R., Goetting, M., Rosen, C.: The use of pharmacotherapy in the treatment of pediatric insomnia in primary care: rational approaches. A consensus meeting summary. J. Clin. Sleep Med. 1(1), 49–59 (2005) Owens, J.A., Spirito, A., McGuinn, M.: The Children’s Sleep Habits Questionnaire (CSHQ): psychometric properties of a survey instrument for school-aged children. Sleep-New York- 23 (8), 1043–1052 (2000) Paavonen, E.J., Vehkalahti, K., Vanhala, R., von Wendt, L., Nieminen-von Wendt, T., Aronen, E.T.: Sleep in children with Asperger syndrome. J. Autism Dev. Disord. 38(1), 41–51 (2008) Palm, L., PhD, G.B.M.D., PhD, L.W.M.D.: Long‐term melatonin treatment in blind children and young adults with circadian sleep‐wake disturb-ances. Dev. Med. Child Neurol. 39(5), 319– 325 (1997) Panossian, L.A., Avidan, A.Y.: Review of sleep disorders. Med. Clin. North America 93(2), 407–425 (2009) Paparrigopoulos, T., Tzavara, C., Theleritis, C., Psarros, C., Soldatos, C., Tountas, Y.: Insomnia and its correlates in a representative sample of the Greek population. BMC Public Health. 10, 531 (2010) Patzold, L.M., Richdale, A.L., Tonge, B.J.: An investigation into sleep characteristics of children with autism and Asperger’s disorder. J. Paediatr. Child Health 34(6), 528–533 (1998) Piazza, C.C., Miller, W.W., Kahng, S.W.: Sleep patterns in children and young adults with mental retardation and severe behaviour disorders. Dev. Med. Child Neurol. 38, 335–344 (1996) Poindexter, A.R., Bihm, E.M.: Incidence of short-sleep patterns in institutionalized individuals with profound mental retardation. Am. J. Mental Retard. 98, 776–780 (1994). 
(Abstract) Regezi, J.A., Sciubba, J.J., Jordan, R.C.: Oral Pathology: Clinical Pathologic Correlations. Elsevier Health Sciences (2016) Richdale, A., Francis, A., Gavidia-Payne, S., Cotton, S.: Stress, behaviour, and sleep problems in children with an intellectualdisability. J. Intellect. Dev. Disabil. 25(2), 147–161 (2000)
Roehrs, T., Carskaon, M.A., Dement, W.C. (eds.): Principles and practice of Sleep medicine, 4th edn., pp 39–50. Elsevier Saunders, Philadelphia (2005) Roizen, N.J., Patterson, D.: Down’s syndrome. Lancet 361, 1281–1289 (2003) Rosdi, M., AbdKadir, R.S.S., Murat, Z.H., Kamaruzaman, N.: The comparison of human body electromagnetic radiation between down syndrome and non down syndrome person for brain, chakra and energy field stability score analysis. In: 2012 IEEE Control and System Graduate Research Colloquium, pp. 370–375. IEEE (2012) Roth, I., Barson, C., Hoekstra, R., Pasco, G., Whatson, T.: The autism spec-trum in the 21st century: exploring psychology, biology and practice. Jessica Kingsley Publishers (2010) Rugh, J.D., Harlan, J.: Nocturnal bruxism and temporomandibular disorders. Adv. Neurol. 49, 329–341 (1987). (Abstract) Sadock, B.J., Sadock, V.A.: Kaplan and Sadock’s Synopsis of Psychiatry: Behavioral Sciences/clinical Psychiatry. Lippincott Williams & Wilkins (2011) Schenck, C.H., Bassetti, C.L., Arnulf, I., Mignot, E.: English translations of the first clinical reports on narcolepsy and cataplexy by Westphal and Gélineau in the late 19th century, with commentary. J. Clin. Sleep Med. 3(3), 301–311 (2007) Salzarulo, P., Chevalier, A.: Sleep problems in children and their relationship with early disturbances of the waking-sleeping rhythms. Sleep: J. Sleep Res. Sleep Med. (1983) Sateia, M.J., Doghramji, K., Hauri, P.J., Morin, C.M.: Evaluation of chronic insomnia. An American Academy of sleep medicine review. Sleep 23(2), 243–308 (2000) Sawant, N.S., Parkar, S.R., Tambe, R.: Isolated sleep paralysis. Indian J. Psychiatry 47, 238–240 (2005) Sheldon, S.H.: Polysomnography in infants and children. In: Sheldon, S.H., Ferber, R., Kryger, M.H. (eds.) Principles and Practice of Pediatric Sleep Medicine, 4th edn, pp. 49–71. Elsevier Saunders, Philadelphia (2005) Spriggs, W.: Essentials of Polysomnography. Jones & Bartlett Publishers, Boston (2009) Steffenburg, S., Gillberg, C.L., Steffenburg, U., Kyllerman, M.: Autism in Angelman syndrome: a population-based study. Pediat. Neurol. 14(2), 131–136 (1996). (Abstract) Stein, M.A., Mendelsohn, J., Obermeyer, W.H., Amromin, J., Benca, R.: Sleep and behavior problems in school-aged children. Pediatrics 107(4), e60–e60 (2001) Tanaka, S., Honda, M.: IgG abnormality in narcolepsy and idiopathic hypersomnia. PLoS ONE 5 (3), e9555 (2010) Udwadia, Z.F., Doshi, A.V., Lonkar, S.G., Singh, C.I.: Prevalence of sleep-disordered breathing and sleep apnea in middle-aged urban Indian men. Am. J. Respirat. Crit. Care Med. 169(2), 168–173 (2004) Wiggs, L., Stores, G.: Sleep patterns and sleep disorders in children with autistic spectrum disorders: insights using parent report and actigraphy. Dev. Med. Child Neurol. 46(6), 372– 380 (2004) Van Strien, J.W., Lagers-Van Haselen, G.C., Van Hagen, J.M., De Coo, I.F.M., Frens, M.A., Van Der Geest, J.N.: Increased prevalences of left-handedness and left-eye sighting dominance in individuals with Williams-Beuren syndrome. J. Clin. Exp. Neuropsychol. 27 (8), 967–976 (2005)
Detecting Learning Affect in E-Learning Platform Using Facial Emotion Expression

Benisemeni Esther Zakka(B) and Hima Vadapalli

University of the Witwatersrand, Johannesburg, South Africa
[email protected], [email protected]
Abstract. Recent trends in education have shifted from traditional classroom learning to an online learning setting; however, research has indicated a high dropout rate among e-learners. Boredom and a lack of motivation are among the factors that lead to this decline. This study develops a platform that provides feedback to learners in real time while they engage with an online learning video. The platform detects, predicts and analyses the facial emotions of a learner using a Convolutional Neural Network (CNN), and further maps the emotions to learning affects. The feedback generated provides a reasonable understanding of the comprehension level of the learner.
Keywords: Facial emotion recognition · e-learning · Convolutional neural network

1 Introduction
E-learning is gaining prominence in higher institutions of learning as a result of the need to provide quality education to learners irrespective of their geographical location [1,2]. With the large increase in the number of institutions offering e-learning courses, it is therefore essential to evaluate the effectiveness and sustainability of this learning platform, given the recorded high dropout rate among e-learning students [3]. Surveys have pointed out that lack of support from instructors, difficulty in contacting lecturers, and lack of interaction and communication [4,5] are among the significant factors that contribute to this high dropout rate. Interaction and communication are vital tools in any practical learning setting, and they enhance student satisfaction and motivation [6]. In a traditional classroom setting, communication between learners and teachers takes place verbally and through the use of body language and facial expressions. These communications are mostly two-way, where an instructor can recognise the facial emotion expression of a learner and respond appropriately by posing questions and changing the method of course content delivery [3,7–10]. However, in an e-learning setting, communications are mostly one-way. E-learning platforms are
unable to detect non-verbal cues and respond accordingly, as a teacher would in a classroom setting; therefore the e-learner is unable to receive feedback in real time. Facial emotion expression has been identified as the most used non-verbal expression, and teachers may understand the comprehension level of their learners using facial emotion expressions [3]. Various researchers have used different techniques to analyse the effect of facial emotion expressions in learning. However, despite growing research in the field of facial emotion recognition as a means to provide feedback to learners or instructors, most research works have focused on the traditional classroom setting. E-learning platforms utilise chat rooms and video conferencing to enable learners to interact among themselves or with the teacher. It is, therefore, necessary to develop a platform that provides feedback in real time to a learner in an e-learning setting. This work aims at developing a system that detects, predicts and analyses facial emotions of a learner and provides a feedback platform with the use of a convolutional neural network.
2 Related Work
The advent of machine learning has revolutionised the learning environment. Various machine learning algorithms have been used on facial emotion datasets to improve teaching and learning, including detecting student engagement while learning. Klein and Celik [9] developed a system which enables instructors in a traditional classroom setting to receive feedback on learners' learning affect in real time, using a convolutional neural network based on AlexNet. Their technique explored the body gestures of learners in a real-time classroom setting to classify learner engagement. Although their technique achieved a remarkable result of 89.7% using CNN, their approach has not been examined in an e-learning system. Shen et al. [11] used biophysical signals to determine how emotion evolves during teaching and learning and the possibility of using the result to improve learning. Their work applied Russell's circumplex model of affect and machine learning techniques (SVM and KNN) to analyse the emotions of learners engaged in a learning process. Their results showed that SVM performed better, with an accuracy of 86.3%. They concluded that the use of emotion data in e-learning would improve the performance of learners within the system. This research showed that emotional data are vital for an effective e-learning system. However, the system was tested in a laboratory setting, and the devices used restricted the movement of learners and were uncomfortable to use; implementation of such a system in an e-learning setting may prove difficult. Pan et al. [7] developed a system to record the attendance of learners using facial recognition. They further developed a model to detect student engagement in a classroom using facial emotion expression. A stimulus-response mechanism was used to classify the facial expression and attention of the learner. They mapped six emotions (concern, curiosity, thinking, comprehension, disregard
and disgust) into learning affects. An antenna-type learning affect transfer model was designed to evaluate the teaching techniques. This technique made it possible to analyse teaching methods in a classroom quantitatively. The research gave a theoretical framework for analysing learning affect from the facial emotion expression of a learner, but it was not automated. Likewise, Oussama et al. [12] implemented a facial emotion recognition system with application in an e-learning setting using an educational game platform. The data was trained using CNN, and an accuracy of 97.53% was achieved on the KDEF dataset and 97.18% on the JAFFE dataset. The system was integrated into an educational gaming application and used by young learners to test the effectiveness of the system. Facial emotions of the learners were captured and examined in real time. The results obtained from applying the approach indicated detection of emotions during the learning process, except for moments when the learners were not looking into the camera. Learners expressed a sad expression when they encountered a challenge. Their system is a large step towards implementing facial emotion recognition in an e-learning setting; however, full implementation of this system in real time was not achieved. Therefore, there is a need to design a model for e-learners that detects facial emotions, determines the comprehension level of the learner remotely, and provides a feedback mechanism in real time. A convolutional neural network is adopted in this work as the machine learning architecture for predicting facial emotion, and the FER2013 dataset is used for training the network. The architecture can capture and extract relevant and useful attributes from an image and has achieved state-of-the-art performance on image classification problems [12–15]. The FER2013 emotion dataset contains unposed images collected via the Google search API; these images cut across different ethnic, gender and age groups. The dataset is a reflection of the images the system may see when used in the e-learning platform.
3 Methodology
Figure 1 shows this study's proposed model for detecting and analysing facial emotion expressions and providing feedback to e-learners in real time. Achieving the stated goal requires six key processes: video capturing, image processing, detecting facial emotion using the trained CNN model, mapping of emotions to learning affects, generating feedback to the learner, and future processing.
3.1 Video Capture
The e-learning model automatically activates video capture whenever a learner plays a video from an e-learning platform. The OpenCV function cap.read() is utilized to capture video of the learner using the computer webcam. The system splits the acquired video data into frames and converts them to grayscale images.
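A minimal sketch of this capture step, assuming OpenCV (cv2) is installed and the webcam is device 0, could look as follows; the frame limit is an illustrative parameter rather than one specified in the paper.

```python
import cv2

def capture_gray_frames(device_index=0, max_frames=100):
    """Read frames from the webcam and convert each one to grayscale."""
    cap = cv2.VideoCapture(device_index)   # open the default webcam
    frames = []
    while len(frames) < max_frames:
        ret, frame = cap.read()            # grab the next video frame
        if not ret:
            break                          # camera closed or read failed
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(gray)
    cap.release()
    return frames
```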
Fig. 1. System overview
3.2 Image Processing
Due to the enormous amount of data generated from a video sequence, it becomes necessary to extract frames that give a good representation of the video without losing salient features, while eliminating redundant frames. This work employed the sequential comparison method for extracting keyframes: each frame is compared to the last extracted keyframe, and a new frame is marked as a keyframe if the difference between the previous keyframe and the new frame is high. The system splits video data from the web camera into frames and converts them to grayscale using OpenCV libraries. The system detects faces in the images using the HAAR cascade classifier [16], chosen for its ability to detect faces almost in real time at different scales. Detected faces are resized to 48 × 48 and normalised by dividing each image by 255. This enables the resized image to fit the CNN model and ensures that normalised image pixels maintain similar intensity values. Resized images are fed into the trained convolutional neural network, which is used to predict the emotions on the faces.
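The keyframe check, face detection and normalisation described above might be sketched as below; the difference threshold and detectMultiScale parameters are assumed values, while the cascade file is OpenCV's bundled frontal-face model.

```python
import cv2
import numpy as np

# OpenCV ships with pre-trained Haar cascade XML files for frontal faces.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def is_keyframe(prev_gray, gray, threshold=20.0):
    """Sequential comparison: keep a frame only if it differs enough from
    the last keyframe (the threshold is an assumed value)."""
    if prev_gray is None:
        return True
    return float(np.mean(cv2.absdiff(prev_gray, gray))) > threshold

def extract_faces(gray):
    """Detect faces, crop them, resize to 48x48 and scale pixels to [0, 1]."""
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    crops = []
    for (x, y, w, h) in faces:
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        crops.append(roi.astype("float32") / 255.0)
    return crops
```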
3.3 Trained Convolutional Neural Network
A convolutional neural network is a type of deep neural network proposed by Yann LeCun [17]. It has been used in a wide range of problems such as image classification, visual recognition and high-dimensional shallow feature encoding [18]. The network has achieved state-of-the-art accuracy in many image classification challenges, one of which is the ICML 2013 workshop challenge, with an accuracy of 71% on the FER2013 dataset [19]. A trained CNN model is used in this work to predict and classify the facial emotion of the learner. The dataset used to train the model is the Kaggle public dataset [19], introduced at the International Conference on Machine Learning in 2013. Images in the dataset were collected automatically by the Google image search API, which makes it a large and dynamic dataset with population
and environmental variations. The dataset comprise of 35887, 48 × 48 grayscale images each contaning with the following emotions labelled: anger, disgust, fear, happiness, sadness, surprise and neutral. The dataset is split into training, validation and testing sets by the ratio of [80:10:10]. Figure 2 shows samples of images from the dataset.
Fig. 2. Sample images from dataset [19]
The CNN architecture comprises three main layer types: convolutional layers, pooling layers and fully connected layers. The convolutional layers in this work use 3 × 3 kernels; a kernel moves in strides of 2, performing element-wise multiplication between the kernel and the image to extract high-level features such as edges and gradient orientation. The Adam optimiser, with an initial learning rate of 0.001, is used in the network. The Keras learning-rate reducer function is used to reduce the learning rate by a factor of 0.9 whenever there is no improvement in the validation loss after three epochs. Batch normalisation and dropout layers are inserted in the network to prevent overfitting and improve performance. The second layer type in the CNN architecture, max pooling, is inserted after every two consecutive convolutional layers; this helps to reduce the spatial size of the representation, control overfitting and suppress noise in the images. The softmax activation function is used on the fully connected output layer to classify the output, and the ReLU activation function is used on the convolutional layers.
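A hedged Keras sketch of a network in this spirit is given below. The 3 × 3 kernels, ReLU, batch normalisation, max pooling after every two convolutional layers, dropout, softmax output, Adam with learning rate 0.001 and the learning-rate reducer (factor 0.9, patience 3) follow the description above; the filter counts, dropout rates, dense layer size and the use of stride-1 convolutions with pooling for downsampling are assumptions, since the paper does not list them.

```python
from tensorflow.keras import layers, models, optimizers, callbacks

def build_emotion_cnn(input_shape=(48, 48, 1), n_classes=7):
    """CNN in the spirit of the paper: 3x3 convolutions with ReLU, batch
    normalisation, max pooling after every two convolutional layers,
    dropout, and a softmax classifier over the seven FER2013 emotions."""
    model = models.Sequential()
    model.add(layers.InputLayer(input_shape=input_shape))
    for filters in (32, 64, 128):                      # filter counts are assumptions
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.MaxPooling2D((2, 2)))         # downsample and suppress noise
        model.add(layers.Dropout(0.25))                # assumed rate, to limit overfitting
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-3),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Learning-rate reducer described in the text: factor 0.9, patience 3 epochs.
lr_reducer = callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.9, patience=3)
```

The returned model expects 48 × 48 × 1 inputs prepared as in the image-processing step above.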
3.4 Emotion Mapping
A mapping mechanism is used to map emotions to their corresponding learning affects; this process executes after the CNN has predicted the emotions. The mapping proposed by [3] is used to map emotions to positive, negative or neutral affects. Given an estimated emotion and its corresponding mapped affect, the learner receives generated feedback at the end of the video in real time.
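A simple illustration of such a mapping is shown below; the exact grouping of the seven FER2013 emotions into positive, negative and neutral affects is an assumption here and may differ in detail from the mapping of [3].

```python
from collections import Counter

# Assumed grouping of the seven FER2013 emotions into learning affects.
EMOTION_TO_AFFECT = {
    "happy": "positive", "surprise": "positive",
    "neutral": "neutral",
    "angry": "negative", "disgust": "negative",
    "fear": "negative", "sad": "negative",
}

def summarise_affect(predicted_emotions):
    """Aggregate per-frame emotion predictions into feedback for the learner."""
    counts = Counter(EMOTION_TO_AFFECT[e] for e in predicted_emotions)
    dominant, _ = counts.most_common(1)[0]
    return dominant, dict(counts)
```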
4 Results
The Keras save-best-only option is used to save the best model during training. The metric used for evaluation of the model is classification accuracy; the higher it is, the better the model performs. An accuracy of 88.81% was obtained on the training dataset and 65.17% on the validation set, while the test set had an accuracy of 64.77%. Figure 3 shows the loss and accuracy of the model during training.
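A sketch of this training and evaluation step is shown below; it reuses build_emotion_cnn and lr_reducer from the earlier sketch and assumes the FER2013 splits are already loaded as NumPy arrays (x_train, y_train, x_val, y_val, x_test, y_test). The checkpoint file name, epoch count and batch size are illustrative values, not ones reported in the paper.

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Assumes x_train/y_train, x_val/y_val, x_test/y_test hold the 80:10:10
# FER2013 splits, preprocessed as 48x48x1 arrays with one-hot labels.
model = build_emotion_cnn()

# Keep only the weights from the epoch with the best validation accuracy.
checkpoint = ModelCheckpoint("best_emotion_cnn.h5", monitor="val_accuracy",
                             save_best_only=True, verbose=1)

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=100, batch_size=64,            # assumed values
                    callbacks=[checkpoint, lr_reducer])

test_loss, test_acc = model.evaluate(x_test, y_test)      # paper reports ~64.77%
```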
Fig. 3. Training history of model
Additionally, a confusion matrix is used to visualise the performance of the model. Analysis of the confusion matrix (Fig. 4 and Table 1) indicates poor performance on the disgust and fear classes.
Fig. 4. Confusion matrix
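The per-class figures reported in Tables 1 and 2 can be derived from such a confusion matrix; a small sketch using scikit-learn (an assumed tool, not named in the paper) is given below.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def per_class_accuracy(y_true, y_pred):
    """Row-normalised diagonal of the confusion matrix, i.e. recall per class."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(LABELS))))
    support = cm.sum(axis=1)                       # number of samples per class
    acc = np.diag(cm) / np.maximum(support, 1)     # correct predictions per class
    return {label: (int(n), round(float(a) * 100))
            for label, n, a in zip(LABELS, support, acc)}
```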
The CNN model was also trained on augmented data, with all classes having the same number of instances; on this dataset the model showed an increase in accuracy for disgust, fear and sad. The accuracies obtained are 71% on disgust, 42% on the fear class and 49% on the sad class, although there is a drop in accuracy for happy, surprise and neutral. The confusion matrix and the breakdown of each class's performance can be seen in Fig. 5 and Table 2.
Fig. 5. Confusion matrix on augmented data

Table 1. Analysis on FER2013 data from confusion matrix

Class      Number of samples   Accuracy %
Angry      498                 56
Disgust     52                 37
Fear       545                 37
Happy      881                 88
Sad        588                 49
Surprise   414                 67
Neutral    611                 77
Table 2. Analysis on augmented data from confusion matrix

Class      Number of samples   Accuracy %
Angry      491                 51
Disgust     55                 71
Fear       528                 42
Happy      879                 80
Sad        594                 56
Surprise   416                 81
Neutral    626                 65
5 Further Work
Future work includes analysing the outcome from the e-learning platform and evaluating it against the learners' personal experience, and performing qualitative analysis on the video: for instance, if most learners indicate a negative learning affect while watching a video, the course content of that video can be improved using the feedback generated from the qualitative analysis. Proper labelling of the video to show the different sections of the course may help to indicate the particular point in the video at which a learner is experiencing difficulty.
References 1. Ayvaz, U., G¨ ur¨ uler, H., Devrim, M.O.: Use of facial emotion recognition in elearning systems. Inf. Technol. Learn. Tools 60(4), 95–104 (2017) 2. Horton, W.K.: Leading e-learning. American Society for Training and Development (2001) 3. Sathik, M., Jonathan, S.G.: Effect of facial expressions on student’s comprehension recognition in virtual educational environments. SpringerPlus 2(1), 455 (2013) 4. Willging, P.A., Johnson, S.D.: Factors that influence students’ decision to dropout of online courses. J. Asynch. Learn. Networks 13(3), 115–127 (2009) 5. Brown, K.M.: The role of internal and external factors in the discontinuation of off-campus students. Dist. Educ. 17(1), 44–71 (1996) 6. Savenye, W.C.: Improving online courses: what is interaction and why use it? Dist. Learn. 2(6), 22 (2005) 7. Pan, M., Wang, J., Luo, Z.: Modelling study on learning affects for classroom teaching/learning auto-evaluation. Science 6(3), 81–86 (2018) 8. Sandanayake, T., Madurapperuma, A., Dias, D.: Affective E learning model for Recognising learner emotions. Int. J. Inf. Educ. Technol. 1(4), 315 (2011) 9. Klein, R., Celik, T.: The wits intelligent teaching system: detecting student engagement during lectures using convolutional neural networks. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 2856–2860. IEEE (2017) 10. Leone, G.: Observing social signals in scaffolding interactions: how to detect when a helping intention risks falling short. Cognit. Process. 13(2), 477–485 (2012)
11. Shen, L., Wang, M., Shen, R.: Affective e-learning: using “emotional” data to improve learning in pervasive learning environment. J. Educ. Technol. Soc. 12(2), 176–189 (2009) 12. El Hammoumi, O., Benmarrakchi, F., Ouherrou, N., El Kafi, J., El Hore, A.: Emotion Recognition in E-learning Systems. In: 2018 6th International Conference on Multimedia Computing and Systems (ICMCS), pp. 1–6. IEEE (2018) 13. Mollahosseini, A., Chan, D., Mahoor, M.H.: Going deeper in facial expression recognition using deep neural networks. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–10. IEEE (2016) 14. Loshchilov, I., Hutter, F.: Online batch selection for faster training of neural networks. arXiv preprint arXiv:151106343 (2015) 15. Lopes, A.T., De Aguiar, E., Oliveira-Santos, T.: A facial expression recognition system using convolutional networks. In: 2015 28th SIBGRAPI Conference on Graphics, Patterns and Images, pp. 273–280. IEEE (2015) 16. Viola, P., Jones, M., et al.: Rapid object detection using a boosted cascade of simple features. In: CVPR (1), vol. 1, no. 511–518, p. 3 (2001) 17. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998) 18. Agrawal, A., Mittal, N.: Using CNN for facial expression recognition: a study of the effects of kernel size and number of filters on accuracy. Vis. Comput. 1–8 (2019) 19. Goodfellow, I.J., Erhan, D., Carrier, P.L., Courville, A., Mirza, M., Hamner, B., et al.: Challenges in representation learning: a report on three machine learning contests. In: International Conference on Neural Information Processing, pp. 117– 124. Springer (2013)
Computer Aided System for Nuclei Localization in Histopathological Images Using CNN

Mahendra G. Kanojia1(&), Mohd. Abuzar Mohd. Haroon Ansari2, Niketa Gandhi3, and S. K. Yadav1
1 JJT University, Jhunjhunu, Rajasthan, India [email protected], [email protected]
2 SIES College of Commerce and Economics, Mumbai, Maharashtra, India [email protected]
3 University of Mumbai, Mumbai, Maharashtra, India [email protected]
Abstract. Today, the health care industry makes extensive use of computer-aided diagnostic expert systems. An expert system for the diagnosis of breast cancer from histopathological images is the need of the hour. Analysis of histopathological images is challenging due to their complex architecture with irregularly shaped nuclei. The convolutional neural network (CNN) is a promising technology that has emerged in recent years. We have designed a computer-based expert system to identify nuclei in histopathological images. The system is developed using the Python programming language. We have used the BreaKHis breast cancer dataset for experimentation and the Kaggle dataset for convolution mask generation. Nuclei are localized using a custom-designed Keras and U-Net Hybrid CNN (KUH-CNN) model. The system can be used by histopathologists for the diagnosis of malignancy in tissue. It can also aid researchers, who can apply machine learning algorithms to the nuclei-detected images for further analysis.

Keywords: Breast cancer detection · BreaKHis dataset · Convolution neural network · Nuclei detection
1 Introduction

Medical image processing deals considerably with object localization and segmentation of the region of interest (ROI). The general approach for nuclei identification follows the process of image enhancement, nuclei segmentation and image post-processing [1–3]. Morphological analysis plays a vital role in such a process [2, 3]. Although the general image processing approach has proved to be the most widely adopted procedure for nuclei segmentation, it has limitations in handling overlapping nuclei, noisy images and varying intensity levels [2, 3]. The identification of nuclei can be viewed as an object localization problem [3, 17], with nuclei as the objects of interest on a visually heterogeneous background. Studies show that CNNs [4–8, 14, 15] have proved to produce
benchmark results in object localization. This property of CNNs is the motivation for the proposed work. In this paper, we propose a fully connected convolutional neural network based trained model, KUH-CNN, for localization of nuclei in breast histopathological images. To aid easy and efficient use of the proposed model, we have developed a computer-aided graphical user interface system using the Python programming language, Keras libraries [26] and the U-Net model [1, 19].
2 Literature Review

A considerable amount of work has been done on nuclei segmentation in histopathological images using image processing and soft computing techniques [2, 3, 21]. Soft computing techniques such as, but not limited to, artificial neural networks, radial basis functions, fuzzy logic and SVM are implemented for prediction and classification. The goal of image processing techniques in medical analysis is to enhance and segment the image and then extract features such as texture, statistical, shape and intensity features, which are given as input to soft computing models for classification or detection. Recent advances in medical image analysis show the use of deep learning methodologies such as fully connected convolutional neural networks [1, 4–8, 14, 15] to achieve high accuracy and faster performance. Hybrid models of convolutional neural networks, such as R-CNN [12, 17], DeCAF [11], FCNN [16] and TSP-CNN [18], have been implemented to extend the strength of CNNs for nuclei segmentation. A comparative study conducted by [21] summarizes the TensorFlow, Keras, PyTorch and Caffe deep learning libraries available to researchers, enabling them to explore the capabilities of deep learning algorithms without getting into the mathematical hurdles involved. Transfer learning using AlexNet [9] and GoogleNet [9, 10, 13] for identification of breast cancer in pathological images is a faster way to train images on pre-trained networks. Work done by [1, 19] using U-Net, and by [21, 22] using Keras, shows how researchers can implement custom models without getting into the complexity of designing the models from scratch. Histopathological images are highly affected by microscope lighting conditions; in experiments done by [23], objects under 15 different lighting conditions were trained and tested in a Keras-based pipeline for ImageNet-trained VGG16, ResNet and SqueezeNet. Work done by [11] in 2017 and [24] in 2018 shows the use of CNNs for texture-based analysis and classification in images. Due to the heterogeneity of histopathological images, texture analysis is one of the followed methods for the identification of nuclei. Machine learning experiments require high-end computational engines and GPUs, which adds an economic burden on researchers implementing machine learning models. The literature survey reveals that in recent years only limited work has been done using CNNs for nuclei localization in histopathological images for breast cancer diagnosis; hence there is a need to explore this domain. No diagnosis method will produce reliable results if the nuclei are not identified accurately in the histopathological images. CNNs and their variants are promising and emerging techniques to achieve high accuracy in nuclei localization.
3 Dataset

The proposed system uses two datasets: BreaKHis [20] and Kaggle [25]. Kaggle is used to create generalized masks for the localization of nuclei, while BreaKHis is used for training and validation of the proposed model. BreaKHis is a dataset of breast cancer microscopic tissue images. It includes a total of 9109 images at 40×, 100×, 200× and 400× magnification. The dataset includes both malignant (cancerous) and benign (non-cancerous) tissues and their subtypes. The dataset and the ground truth of the images are publicly available. Our preference for the BreaKHis dataset is because of the quality of the images and the labeled subtypes of both benign and malignant tissues. For example, there are architectural similarities between the benign tissue subtype fibroadenoma and malignant tissues, where systems trained without knowing the ground truth of the subtypes may falsely classify the benign tissue as malignant. We have used images at 100× and 400× magnification from the BreaKHis dataset for validation. Kaggle is a dataset of segmented nucleus images of various cell types, at different magnifications and microscopic conditions. We have used a total of 30,131 images and masks from Kaggle to generate a resized and augmented generalized nuclei-detection mask dataset.
4 Research Methodology

This section includes a detailed description of the research methods used in the implementation of the system. The proposed model is based on the CNN architecture and implemented in Python using Keras libraries and the U-Net model.
4.1 Convolutional Neural Network
The architecture of a convolutional neural network consists of three main layer types [6]: convolutional layers, fully connected linear layers and pooling layers. Neurons at every convolutional layer learn features of the images and adjust their weights to match those features for future prediction. A CNN is a backpropagation neural network, where the error calculated at every iteration is backpropagated to reduce the mean error until the model converges. Images are convolved with a mask, and the desired features are extracted using a correspondingly designed mask. In a fully connected network, the output of each layer is given as input to the subsequent layer, and the final output is the set of all identified features. Pooling layers are inserted between convolutional layers to reduce time complexity and to address overfitting by reducing the spatial size of the images.
4.2 Keras
Keras [26] is an open-source neural network API written in Python which runs on top of a computation engine. TensorFlow, CNTK, Theano, MXNet and PlaidML are the supported computation engines. It has a collection of modules to integrate and design
new models. One can use Keras to design one's own machine learning models. Keras includes sequential and functional models; functional models can be used to design complex learning models.
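A minimal sketch of the two model styles mentioned above, under the assumption of a TensorFlow backend; the layer sizes are illustrative only, not the paper's configuration.

```python
# Sequential vs. functional Keras models, as mentioned above.
# Layer sizes are illustrative assumptions.
from tensorflow.keras import layers, models, Model, Input

# sequential model: a simple linear stack of layers
sequential = models.Sequential([
    layers.Dense(32, activation='relu', input_shape=(64,)),
    layers.Dense(1, activation='sigmoid'),
])

# functional model: an explicit graph of layers, suited to complex topologies
inputs = Input(shape=(64,))
hidden = layers.Dense(32, activation='relu')(inputs)
outputs = layers.Dense(1, activation='sigmoid')(hidden)
functional = Model(inputs=inputs, outputs=outputs)
```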
4.3 U-Net
U-Net [19] is a convolutional neural network architecture for image segmentation. It is designed to give precise segmentation with fewer training images. It works in two phases: first a series of contraction layers, followed by an equal number of expansion layers, which gives the U shape to the network architecture shown in Fig. 1. In the contraction phase the feature maps are converted into vectors, and in the expansion phase the image is reconstructed from the feature maps. The architecture consists of repeated blocks of 3 × 3 convolutional layers, each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling operation with stride 2 for downsampling. During the expansion phase, the cropped feature maps from the corresponding contraction phase are concatenated. At the final layer, a 1 × 1 convolution is used to map each class. U-Net uses a loss-weighting function to segment the overlapping features.
Fig. 1. U-Net architecture [19]
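For illustration, a reduced U-Net of the shape described above can be written in Keras as follows; the depth and filter counts are assumptions made to keep the sketch short, not the configuration of [19].

```python
# Reduced U-Net sketch: contraction path, expansion path with concatenated
# feature maps, and a final 1x1 convolution. Depth and filter counts are
# assumptions for brevity.
from tensorflow.keras import layers, Model, Input

def conv_block(x, filters):
    x = layers.Conv2D(filters, (3, 3), activation='relu', padding='same')(x)
    return layers.Conv2D(filters, (3, 3), activation='relu', padding='same')(x)

def tiny_unet(input_shape=(128, 128, 3)):
    inputs = Input(shape=input_shape)
    # contraction phase
    c1 = conv_block(inputs, 16)
    p1 = layers.MaxPooling2D((2, 2))(c1)           # 2x2 max pooling, stride 2
    c2 = conv_block(p1, 32)
    p2 = layers.MaxPooling2D((2, 2))(c2)
    c3 = conv_block(p2, 64)                        # bottom of the "U"
    # expansion phase with concatenated feature maps from the contraction path
    u2 = layers.Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same')(c3)
    c4 = conv_block(layers.concatenate([u2, c2]), 32)
    u1 = layers.Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same')(c4)
    c5 = conv_block(layers.concatenate([u1, c1]), 16)
    # 1x1 convolution maps the features to the output class
    outputs = layers.Conv2D(1, (1, 1), activation='sigmoid')(c5)
    return Model(inputs, outputs)
```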
5 Proposed System A detailed study was carried out on various CNN-based libraries and pre-trained networks. The study led to the use of Keras and U-Net as the best implementation of a CNN for breast cancer nuclei localization. To achieve the desired segmentation and localization of nuclei in histopathological images we have used the handcrafted KUH-CNN model. The proposed model uses fine-tuned parameters for precise localization of nuclei in
histopathological images at 100× and 400× magnification. Figure 2 shows an overview of the architecture of the proposed model.
Fig. 2. KUH-CNN model architecture
A CNN is a supervised artificial neural network where each neuron layer convolves images and masks. In this research, the segmented nucleus images and masks dataset were sourced from Kaggle. We used 30131 images and masks to generate the desired scaled masks for training and testing. We designed a functional model in Keras to use the Kaggle dataset and produce a specialized mask for further processing. The Kaggle dataset has images of various sizes, types and magnifications. The dataset from Kaggle is divided into a training set and a testing set. The training set images and masks are resized to 128 × 128 for faster computation; the testing dataset includes only resized images. Data diversity is achieved by data augmentation, which adds new data to the dataset, and shearing is used to avoid overfitting. Using the above process in Keras we have generated the desired generalized mask dataset for nuclei localization. We can use this mask to segment the images for nucleus localization. The output at this stage is colour-filled segmented nuclei with approximation error. To achieve the desired accuracy in nuclei localization we have built a U-Net model with fine-tuned parameters. To train the U-Net CNN model we have sourced histopathological images from the BreaKHis dataset. The images are preprocessed using a median filter to remove noise, and then the images are partitioned into patches. The patches of histopathological images and the mask dataset generated in the Keras module are used for convolution in the designed U-Net model. The model uses 3 image channels, downscales the images by repeated 3 × 3 convolutions of 16 × 16, 32 × 32, 128 × 128 and 256 × 256, and then upscales in reverse order with a cropped feature map at each layer. At each convolutional layer, the features are mapped and downscaled with 2 × 2 max pooling. A dropout value of 0.1 is used to avoid overfitting. The intersection-over-union method
is used to identify the borders of the overlapping nuclei. The output layer is of size 1 × 1 with a sigmoid activation function. This model localizes all the nuclei in the image, with their centers and detected edges.
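The preprocessing and augmentation steps described above (median filtering, resizing to 128 × 128, shear-based augmentation) could be expressed along the following lines; the file paths and parameter values are assumptions, not the authors' exact settings.

```python
# Illustrative preprocessing and augmentation, following the description above:
# median filtering to remove noise, resizing to 128x128, shear augmentation.
# Paths and parameter values are assumptions for the sketch.
import cv2
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def preprocess(image_path, size=(128, 128)):
    img = cv2.imread(image_path)                 # histopathological image patch
    img = cv2.medianBlur(img, 3)                 # median filter removes noise
    img = cv2.resize(img, size)                  # resize for faster computation
    return img.astype(np.float32) / 255.0

# shear-based augmentation adds diversity and helps avoid overfitting
augmenter = ImageDataGenerator(shear_range=0.2, horizontal_flip=True)
```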
6 Result The proposed system is easy to use and accurate in the identification of nuclei, and the issue of overlapping nuclei is well addressed. The same dataset was used to create an image processing based model for nucleus localization: we implemented a median filter as an image preprocessing technique to remove noise and Otsu's method [1] for image segmentation, followed by various nucleus localization functions in Python. Figure 3(a) shows the output from the image processing module and Fig. 3(b) shows the output from our KUH-CNN module. It is evident that the proposed model surpasses the image processing approach in the accuracy of nuclei localization in histopathological images. The edges are clearly marked and the center of each nucleus is identified.
Fig. 3. Nuclei detected in histopathological images. (a) Using only image processing techniques. (b) Using the KUH-CNN model.
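A sketch of the image processing baseline described above is given below; the input file name and the connected-component step used to count candidate nuclei are illustrative assumptions.

```python
# Image-processing baseline sketch: median filter, Otsu thresholding and a
# connected-component pass to mark candidate nuclei. The file name is
# hypothetical and the threshold polarity is an assumption.
import cv2

img = cv2.imread('patch.png', cv2.IMREAD_GRAYSCALE)
img = cv2.medianBlur(img, 3)
_, mask = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
n_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
print('candidate nuclei:', n_labels - 1)          # label 0 is the background
```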
A hand-coded graphical user interface (GUI) based system is designed in Python to implement the proposed model. The system is simple to use, and the series of figures below shows the steps involved. It was observed that the output for noisy, improperly stained or blurred images was inappropriate; the proposed system allows the user to visually identify such problematic images before analysis. First, the histopathological image is loaded using the load-image module. The image is processed using the process-image module, which includes the KUH-CNN based nuclei localization module, and the output image is displayed. The image can then be saved using the save-image module.
Fig. 4. GUI of the application
Fig. 5. Opening histopathological image
Fig. 6. Histopathological image loaded
Fig. 7. Image with identified nuclei
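A minimal sketch of the load, process and save workflow shown in Figs. 4–7 is given below; the function `localize_nuclei` stands in for the KUH-CNN module and is hypothetical.

```python
# Minimal Tkinter sketch of the GUI workflow shown in Figs. 4-7.
# `localize_nuclei` is a hypothetical placeholder for the KUH-CNN module.
import tkinter as tk
from tkinter import filedialog

def localize_nuclei(path):
    print('running nuclei localization on', path)   # placeholder

def load_image():
    selected_path.set(filedialog.askopenfilename())

def process_image():
    localize_nuclei(selected_path.get())

root = tk.Tk()
root.title('Nuclei localization')
selected_path = tk.StringVar()
tk.Button(root, text='Load image', command=load_image).pack()
tk.Button(root, text='Process image', command=process_image).pack()
root.mainloop()
```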
7 Conclusion The paper describes the complex structure of histopathological images and the necessity of nuclei identification for the detection of breast cancer. The proposed KUH-CNN model is simulated using the Kaggle and BreaKHis datasets. The work also shows the use of the Keras and U-Net libraries for nuclei localization. We claim that the proposed GUI-based expert system can be used by histopathologists for faster and more detailed analysis of tissue images, which will be an aid in the diagnosis of breast cancer. Further, the system can be extended to extract feature sets of nuclei and train the desired machine learning model for breast cancer detection. One of the machine learning algorithms reviewed by us [27] in 2019 can be integrated with the proposed work to design a fully automated CAD system for breast cancer detection. Based on our experimentation and output, we claim that we have successfully implemented the computer aided system for nuclei localization in histopathological images.
8 Future Work In our future work, we aim to add modules for feature extraction from the identified nuclei and to train a machine learning algorithm for the diagnosis of breast cancer. The module can work in batch mode to facilitate nuclei identification for the whole dataset. Eventually, we aim to design a fully automated computer-aided diagnosis system for breast cancer detection using histopathological images.
References 1. Zeng, Z., Xie, W., Zhang, Y., Lu, Y.: RIC-Unet: an improved neural network based on Unet for nuclei segmentation in histology images. IEEE Access 7, 21420–21428 (2019) 2. Yousefi, S., Nie, Y.: Transfer learning from nucleus detection to classification in histopathology images. In: 16th International Symposium on Biomedical Imaging, Venice, Italy, vol. 16, pp. 957–960 (2019) 3. Tofighi, M., Guo, T., Vanamala, J.K.P., Monga, V.: Prior information guided regularized deep learning for cell nucleus detection. IEEE Trans. Med. Imaging 38, 2047–2058 (2019) 4. Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L.: Breast cancer histopathological image classification using convolutional neural networks. In: International Joint Conference on Neural Networks (IJCNN). Vancouver, BC, pp. 2560–2567 (2016) 5. Ye, J., Luo, Y., Zhu, C., Liu, F., Zhang, Y.: Breast cancer image classification on WSI with spatial correlations. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, pp. 1219–1223 (2019) 6. Kumar, K., Rao, A.C.S.: Breast cancer classification of image using convolutional neural networks. In: 4th International Conference on Recent Advances in Information Technology (RAIT), Dhanbad, India (2018) 7. Angara, S., Robinson, M., Guillen-Rondon, P.: Convolutional neural networks for breast cancer histopathological image classification. In: 4th International Conference on Big Data and Information Analytics (BigDIA), Houston, TX, USA (2018) 8. Jafarbiglo, S.K., Danyali, H., Helfroush, M.S.: Nuclear atypia grading in histopathological images of breast cancer using convolutional neural networks. In: 4th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), Tehran, Iran, pp. 89–93 (2018) 9. He, S., Ruan, J., Long, Y., Wang, J., Wu, C., Ye, G., Zhang, Y.: Combining deep learning with traditional features for classification and segmentation of pathological images of breast cancer. In: 11th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, pp. 3–6 (2018) 10. Chang, J., Yu, J., Han, T., Chang, H., Park, E.: A method for classifying medical images using transfer learning: A pilot study on histopathology of breast cancer. In: IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), Dalian, China (2017) 11. Spanhol, F.A., Oliveira, L.S., Cavalin, P.R., Petitjean, C., Heutte, L.: Deep features for breast cancer histopathological image classification. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, pp. 1868–1873 (2017) 12. Yan, R., Ren, F., Wang, Z., Wang, L., Ren, Y., Liu, Y., Zhang, F.: A hybrid convolutional and recurrent deep neural network for breast cancer pathological image classification. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, pp. 957–962 (2018)
13. Garud, H., Karri, S.P.K., Sheet, D., Chatterjee, J., Mahadevappa, M., Ray, A.K., Maity, A. K.: High-magnification multi-views based classification of breast fine needle aspiration cytology cell samples using fusion of decisions from deep convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 828– 833 (2017) 14. Nahid, A.-A., Ali, F.B., Kong, Y.: Histopathological breast-image classification with image enhancement by convolutional neural network. In: 20th International Conference of Computer and Information Technology (ICCIT), Dhaka, Bangladesh (2017) 15. Reza, M.S., Ma, J.: Imbalanced histopathological breast cancer image classification with convolutional neural network. In: 14th IEEE International Conference on Signal Processing (ICSP), Beijing, China, pp. 619–624 (2018) 16. Su, H., Liu, F., Xie, Y., Xing, F., Meyyappan, S., Yang, L.: Region segmentation in histopathological breast cancer images using deep convolutional neural network. In: IEEE 12th International Symposium on Biomedical Imaging (ISBI), New York, USA, pp. 55–58 (2015) 17. Kanojia, M.G., Abraham, S.: Breast cancer detection using RBF neural network. In: 2nd International Conference on Contemporary Computing and Informatics (IC3I), Noida, India, pp. 363–368 (2016) 18. Veta, M., Pluim, J.P.W., Diest, P.J.V., Viergever, M.A.: Breast cancer histopathology image analysis a review. IEEE Trans. Biomed. Eng. 61(5), 1400–1411 (2014) 19. Ronneberger O., Fischer P., Brox T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI. LNCS, vol. 9351. Springer, Cham (2015) 20. Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L.: A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63(7), 1–8 (2016) 21. Stancin, I., Jovic, A.: An overview and comparison of free Python libraries for data mining and big data analysis. In: 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Hrvatska, pp. 1161–1166 (2019) 22. Wu, C., Ruan, J., Ye, G., Zhou, J., He, S., Wang, J., Zhang, Y.: Identifying tumor in wholeslide images of breast cancer using transfer learning and adaptive sampling. In: Eleventh International Conference on Advanced Computational Intelligence (ICACI), Guilin, China, pp. 167–172 (2019) 23. Sivaraman, K., Murthy, A.: Object recognition under lighting variations using pre-trained networks. In: IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA. (2018) 24. Zhu, G., Li, B., Hong, S., Mao, B.: Texture recognition and classification based on deep learning. In: Sixth International Conference on Advanced Cloud and Big Data (CBD), Lanzhou, China (2018) 25. Kaggle. https://www.kaggle.com 26. Keras: The Python deep learning library. https://keras.io 27. Kunal, P., Mahendra, K., Brian, D., Niketa, G.: Breast cancer detection using WBCD. In: International Interdisciplinary Conference on Recent Trends in Science and Review of Research Journal. UGC Approved Journal No. 48514, Alibag, India, (2019)
Intrusion Detection System for the IoT: A Comprehensive Review
Akhil Jabbar Meera1(&), M. V. V. Prasad Kantipudi2, and Rajanikanth Aluvalu1
1 Vardhaman College of Engineering, Hyderabad, India
[email protected], [email protected]
2 Sreyas Institute of Engineering & Technology, Hyderabad, India
[email protected]
Abstract. The IoT has recently been widely used in the design of smart homes and smart cities, with various services and application domains. The Internet of Things (IoT) connects objects to the Internet to make our life easier, which leaves IoT environments vulnerable to different kinds of attacks. Threats to the IoT are increasing because a large number of devices with different standards are connected. An Intrusion Detection System (IDS) is used to protect against various types of attacks; it works in the network layer of an IoT system. This paper highlights the issues related to IoT security, discusses the literature on the implementation of IDS for the IoT using ML algorithms and also makes a few suggestions. An IDS designed for the IoT should operate under stringent conditions, and more IDSs have to be designed to detect major attacks and safeguard the IoT. Keywords: Machine Learning (ML) · Internet of Things (IoT) · Network security · Vulnerability · Attacks · Intrusion Detection System (IDS)
1 Introduction The Internet of Things (IoT) is a collection of various devices connected to the Internet. The IoT can be treated as a giant network of connected things and people, which collect and share data about the environment around them. In 2020, nearly 50 billion things were connected to the Internet [1]. The Internet of Things has the following features: 1) the IoT is a network of physical objects which gather and share information; 2) the IoT also includes smart devices which use the Internet Protocol (IP). As per Gartner, by the end of the year 2020, 5.81 billion devices will be connected. The IoT market by segment for the years 2018–2020 is shown in Table 1. In the IoT, all the connected things can be categorized into three types [2]: 1. things which collect information; 2. things which receive information; 3. things that both receive and collect information.
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): SoCPaR 2019, AISC 1182, pp. 235–243, 2021. https://doi.org/10.1007/978-3-030-49345-5_25
Table 1. IoT market segments (billions of devices)

Segment                             2018   2019   2020
Utilities                           0.98   1.17   1.37
Government                          0.40   0.53   0.70
Building automation                 0.23   0.31   0.44
Physical security                   0.83   0.95   1.09
Manufacturing & natural resources   0.33   0.40   0.49
Automotive                          0.27   0.36   0.47
Healthcare providers                0.21   0.28   0.36
Retail & wholesale trade            0.29   0.36   0.44
Information                         0.37   0.37   0.37
Transportation                      0.06   0.07   0.08
Total                               3.96   4.81   5.81

Source: Gartner (August 2019)
In general, the IoT works as follows: 1. sensors are used to collect the data; 2. the collected data is shared through the cloud; 3. software analyzes the data and delivers it through an app or website [3]. The IoT has a significant impact on many industries like manufacturing, transportation, health care and automotive. As per a McKinsey report, by 2025 the IoT will have a yearly economic impact of $11T. The IoT includes a broad range of technologies and services, including related technologies such as AI, big data, next-gen cyber security, cloud computing, advanced analytics, AR and VR, and blockchain. The IoT bridges digital and physical realities and people's lives. Most IoT elements have common aspects like 1) connectivity, 2) things, 3) data, 4) communication, 5) intelligence and action, 6) automation and ecosystem. IDS have been developed in order to tackle the attacks on the IoT. Safeguarding IoT security is becoming increasingly difficult, and more efficient detection methods are required to protect against attackers. This paper reviews IDS in the IoT and makes recommendations. In Sect. 2, the background study is discussed. In Sect. 3, research on IDS and the IoT is discussed. We conclude in Sect. 4 with some recommendations.
2 Background Study A. Internet of Things (IoT) The IoT is the integration of the cyber world and physical devices. Governments are focusing on smart cities and investing more in the IoT and its related technologies. Some statistics of the IoT are listed below: • It is estimated that by 2020 nearly 90% of cars will be connected to the net. • Microsoft is going to invest USD 5 billion in IoT and related technology over four years [4].
There are three phases of IoT operation [5]: 1) collection, 2) transmission, and 3) processing, management and utilization. The term IoT [6] was introduced in 1998 [7]. Wireless connections are established among the objects with small sensors in an IoT network, and IoT devices can communicate with each other without human interference [8]. IoT devices can be accessed through untrusted networks, and IoT networks are prone to various attacks. Four layers are present in the IoT architecture: 1) perceptual, 2) network, 3) support and 4) application layer, which are shown in Fig. 1.
Fig. 1. Architecture of IoT [9] (the four layers, from application down to perceptual, supported by security management)
Due to the wide variety of applications, the IoT is becoming more prone to various types of attacks such as 1) jamming, 2) black hole, 3) Sybil and 4) vulnerability attacks. These attacks are shown in Fig. 2.
Fig. 2. Various attacks on IoT [10]
B. Intrusion Detection System (IDS) ML techniques are widely used to implement IDS. An IDS can be of two types: 1) host-based (HIDS) and 2) network-based (NIDS). A HIDS verifies malicious activities on a host, whereas a NIDS analyzes network traffic [11, 12]. Various IDS methods are 1) statistical analysis, 2) evolutionary, 3) protocol verification and 4) rule-based [13]. Intrusion detection systems can also be classified based on their behavior [14]: 1. Signature-based IDS: this type of IDS verifies the present profile of the network against existing attack patterns [15]. 2. Anomaly-based IDS: finds intrusions if the behavior differs from the regular one. 3. Specification-based IDS: raises an alarm on any malicious activity. As mentioned earlier, the IoT is a combination of heterogeneous devices, and of ad hoc and wireless sensor networks. These networks are prone to various kinds of attacks, both internal and external; for a detailed description of the attacks, readers may refer to [14, 18]. C. Intrusion Detection System (IDS) in IoT This section discusses intrusion detection systems for the IoT. Various attacks on the IoT are 1) fragmentation attack, 2) authentication attack, 3) confidentiality attack, 4) altering and spoofing attacks, 5) rank attack, 6) local repair attack, 7) neighbor attack, 8) black hole attack, 9) Sybil attack, 10) clone ID and Sybil attack, 11) distributed denial-of-service attack (DoS or DDoS), 12) hello flooding attack [16], 13) homing attack, 14) resource exhausting attack, 15) selective forwarding attack, 16) sinkhole attack, 17) wormhole attack and 18) clone attack [17].
3 Related Work This section reviews the literature on the implementation of IDS using Machine Learning (ML) algorithms. Fu et al. [19] proposed a model using anomaly mining which uses a slice time window to find normal traffic; a dataset of 2.3 million records collected from sensors is used for the experimental analysis. Distributed anomaly detection has been proposed by Rajasegarar et al. [20]. The authors proposed a model which uses hyper-ellipsoidal clusters at every node to detect abnormal behavior in the system; accuracy and detection rate are improved by the method. Ham et al. [21] proposed an SVM to detect attacks in Android-based IoT. The proposed method is used for Android malware detection. Fourteen malicious and fourteen normal nodes are used for the evaluation process, and the dataset contains 10% malicious and 90% normal applications. The proposed method recorded high accuracy and detection rate. A sinkhole attack detection method was proposed by Cervantes et al. [22]. The proposed method is evaluated against SVELTE. Simulations have been carried out in areas of 80 × 80 and 100 × 100 m, with the simulation time fixed at 1500 s. This method detects only one type of attack.
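As a hedged illustration of the SVM-based detectors reviewed above (e.g., Ham et al. [21]), a linear SVM classifier separating normal from malicious samples can be sketched as follows; the feature matrix and labels are synthetic placeholders, not data from the cited works.

```python
# Illustrative linear-SVM anomaly/malware classifier of the kind reviewed above.
# X and y are synthetic placeholders; in practice they would hold features
# extracted from IoT traffic or applications (0 = normal, 1 = malicious).
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((200, 10))                  # placeholder feature vectors
y = rng.integers(0, 2, size=200)           # placeholder labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LinearSVC().fit(X_tr, y_tr)
print('detection accuracy:', accuracy_score(y_te, clf.predict(X_te)))
```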
A neuro-fuzzy based IDS has been proposed by Rahman et al. [23]. The IDS uses an ANFIS model to observe the activity of the network, and the data in the database is updated dynamically. The proposed method recorded high accuracy and high reliability. Detection of DDoS attacks on the IoT has been proposed by Sonar and Upadhyay [24], based on agents which are software-based managers. The proposed strategy is simulated using the COOJA simulator; this method is not preferred for detecting a large number of attacks. Hodo et al. [25] proposed an ANN-based IDS to detect DDoS attacks. 2313 samples were used for training to evaluate the performance of the model, and a high detection rate with a low false positive rate was recorded. A deep learning based IDS was proposed by Diro and Chilamkurti [26]. The proposed method can be extended to distributed systems. Experiments were carried out on the NSL-KDD dataset, and a high detection rate with a low false alarm rate was recorded. A distributed IDS was presented in [27]. Signatures are sent to Bloom filters, and pattern checking is performed to verify the pattern match; this method is restricted to detecting a limited number of attacks. Oh et al. [28] proposed a model which uses a pattern matching engine to detect malicious nodes in the IoT. Early decision and auxiliary shifting strategies are used in the model, and an OmniVision 5647 sensor and a Raspberry Pi are used in the implementation. The proposed model has low memory consumption but detects fewer attacks. A cloud-based framework for IDS was proposed by Sun et al. [29]. The proposed method, CloudEyes, implements a lightweight scanning agent; the idea is to detect malicious attacks in a cloud environment. The method requires little time and provides high data privacy. Detection of attacks in the IoT using service-oriented architecture (SOA) and learning automata concepts was introduced by Misra et al. [30]; SOA, which acts as middleware, provides a good platform for IoT applications. A unique method proposed by Jover et al. [31] is used to recognize intrusions for SMS. The method is based on two analyses: 1) contact-based and 2) volumetric-based; DoS attacks are recognized by combining these two metrics. A privacy-aware routing protocol (PALXA) was proposed by Xia et al. [32]. This protocol uses two processes: 1) maintenance and 2) route step. Experiments were simulated using MATLAB, and low latency and good accuracy are the advantages of their proposed method. A game-theoretic model to detect various types of attacks in a honeypot-enabled IoT network was suggested by La et al. [33]. The proposed system analyzes incoming traffic, and if suspicious traffic is identified it is sent to the honeypot; the simulation is performed using MATLAB. A method to identify black hole attacks was introduced by Ahmed et al. [34]. When a suspicious node is identified, its neighbors inspect it further, and a query is initiated to verify the nodes in the network. Contiki 2.7 OS and the COOJA simulator were used for the simulations. Le et al. [35] proposed a method for checking node behaviors that consists of two steps: 1) simulation of the RPL protocol and 2) translation of knowledge of the RPL algorithms. High scalability and detection rate are recorded for the proposed method.
An IDS for IP-based sensor networks was demonstrated by Amin et al. [27]. The proposed method has a USN packet analyzer and an Internet packet analyzer (IPA). The IPA has two segments, namely 1) a pattern classifier and 2) an anomaly detector, while the USN packet analyzer is used to detect attacks. Low memory consumption is the advantage of the proposed method. A security mechanism for DoS attack detection in the IoT environment is demonstrated by Kasinathan et al. [36]. The proposal uses a hybrid method to detect various attacks, and an alarm is raised when an anomaly is detected in the 6LoWPAN network. The proposed method is evaluated using penetration tests. A method to detect routing attacks was outlined by Raza et al. [37]. The proposed method has three modules, namely 1) a 6LoWPAN mapper, 2) an intrusion detector and 3) a distributed mini firewall; each module consists of two lightweight modules inside each node. Sedjelmaci et al. [38] proposed a game-theoretic model which merges signature-based and anomaly-based IDS. When a new attack is detected, the proposed model (built using game theory and Nash equilibrium) activates the anomaly-based IDS. The TOSSIM simulator is used to evaluate the model, and low energy consumption is an added advantage. The self-adapting IDS Kalis has been introduced by Midi et al. [39]. Kalis detects attacks in real time across IoT systems. TinyOS with a TelosB wireless sensor mote is used to evaluate the proposed technique, and Kalis is implemented in Java; high computational overhead is the disadvantage of the proposed method. Conti et al. [40] presented a survey which addresses the opportunities and challenges in the IoT domain, focusing on highlighting the challenges. Deep learning has attracted many researchers to focus more on developing IDS. Lopez-Martin et al. [41] presented a NIDS for IoT networks. The proposed model is based on a conditional variational autoencoder; an advantage of the model is that it operates with a single training step, saving computational resources. A software-defined networking architecture for the IoT was discussed by Flauzac et al. [42]; their work demonstrates the SDN architecture to achieve network security. Table 2 gives a summary of research on IDS for the IoT.
Table 2. Literature on IDS and IoT

Sl no   Author name            Research method                    Year
1       Fu et al.              Anomaly mining                     2016
2       Rajasegarar et al.     Distributed anomaly detection      2006
3       Ham et al.             SVM based anomaly detection        2014
4       Rahman et al.          Neuro fuzzy based                  2016
5       Cervantes et al.       SVELTE                             2015
6       Upadhyay               Agent based IDS                    2013
7       Hodo et al.            ANN based IDS                      2016
8       Diro and Chilamkurti   Deep learning based IDS            2018
9       Oh et al.              Pattern matching engine for IDS    2014
10      Sun et al.             Lightweight IDS                    2017
4 Conclusion With the widespread use of the IoT in every domain, threats and attacks are also growing commensurately in the IoT and related technologies. To achieve an effective defense against various attacks, Intrusion Detection Systems (IDS) have been developed. However, conventional IDS need to be improved and modified for application to the IoT. Machine learning algorithms are widely used to develop IDS for IoT systems. This paper highlights the issues related to IoT security, discusses the literature on the implementation of IDS for the IoT using ML algorithms and the research challenges, and makes a few suggestions to safeguard the IoT from various attacks.
References
1. Bahga, A., et al.: IoT: A Hands on Approach. University Press, Cambridge (2017)
2. https://www.iotforall.com/what-is-iot-simple-explanation/
3. https://builtin.com/internet-things. Accessed 04 Apr 2018
4. www.moneycontrol.com. Accessed 04 Apr 2018
5. Borgia, E.: The Internet of Things vision: key features, applications and open issues. Comput. Commun. 54, 1–31 (2014)
6. Hamzei, M., Navimipour, N.J.: Toward efficient service composition techniques in the Internet of Things. IEEE IoT J. 5(5), 3774–3787 (2018)
7. Sarma, S., Brock, D.: The Internet of Things, White Paper, Auto-ID center. MIT (1998)
8. Kortuem, et al.: Smart objects as building blocks for the internet of things. IEEE Internet Comput. 14(1), 44–51. https://doi.org/10.1109/mic.2009.143
9. Adat, V., et al.: Security in internet of things: issues, challenges, taxonomy and architecture. Telecommun. Syst. 67, 423–441 (2017)
10. Jan, S.U., et al.: Toward a lightweight intrusion detection system for the IoT. IEEE Access 7, 42450–42471 (2019)
11. Hudo, E., et al.: Threat analysis of IoT network using artificial neural network, pp. 1–5 (2017)
12. Lazarevic, A., Kumar, V., Srivastava, J.: Intrusion detection: a survey, pp. 19–78 (2005)
13. Choudhary, S., et al.: Intrusion detection system for Internet of Things. Int. J. Inf. Secur. Privacy 13(1), 86–105 (2019)
14. Amaral, J.P., Oliveira, L.M., Rodrigues, J.J., Han, G., Shu, L.: Policy and network-based intrusion detection system for IPv6-enabled wireless sensor networks. In: 2014 IEEE International Conference on Communications (ICC), pp. 1796–1801. IEEE, June 2014
15. Sing, V.P.: Hello flood attack and its countermeasures in wireless sensor networks. IJCSI Int. J. Comput. Sci. Issues 7(3), 23 (2010)
16. Anthoniraj, J., Abdul, R.T.: Clone attack detection protocols in wireless sensor networks: a survey. Int. J. Comput. Appl. 98, 43–49 (2014). https://doi.org/10.5120/17183-7281
17. Stergiou, C., Psannis, K.E., Kim, B.G., Gupta, B.: Secure integration of IoT and cloud computing. Future Gener. Comput. Syst. (2016). https://doi.org/10.1016/j.future.2016.11.03
18. Pacheco, J., et al.: IoT security framework for smart cyber infrastructures. In: IEEE 1st International Workshops on Foundations and Applications of Self Systems, Germany (2016)
19. Fu, R., Zheng, K., Zhang, D., Yang, Y.: An intrusion detection scheme based on anomaly mining in Internet of Things (2011)
20. Rajasegarar, S., Leckie, C., Palaniswami, M., Bezdek, J.C.: Distributed anomaly detection in wireless sensor networks. In: 2006 10th IEEE Singapore International Conference on Communication Systems, Singapore, pp. 1–5 (2006) 21. Ham, H.-S., Kim, H.-H., Kim, M.-S., Choi, M.-J.: Linear SVM-based android malware detection for reliable IoT services. J. Appl. Math., 1–10 (2014). https://doi.org/10.1155/ 2014/594501 22. Cervantes, C., Poplade, D., Nogueira, M., Santos, A.: Detection of sinkhole attacks for supporting secure routing on 6LoWPAN for Internet of Things. In: 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), pp. 606–611. IEEE, May 2015 23. Rahman, S., Ahmed, M., Kaiser, M.S.: ANFIS based cyber physical attack detection system. In: 5th International Conference on Informatics, Electronics and Vision (ICIEV), Dhaka, 2016, pp. 944–948 (2016) 24. Chaudhary, V.K., Upadhyay, S.K.: Distributed intrusion detection system using sensor based mobile agent technology. Int. J. Innov. Eng. Technol. (IJIET) 3(1), 220–226 (2013) 25. Hodo, E., Bellekens, X., Hamilton, A., Dubouilh, P.L., Iorkyase, E., Tachtatzis, C., Atkinson, R.: Threat analysis of IoT networks using artificial neural network intrusion detection system. In: 2016 International Symposium on Networks, Computers and Communications (ISNCC), pp. 1–6 (2016) 26. Diro, A., Chilamkurti, N.: Leveraging LSTM networks for attack detection in fog-to-things communications. IEEE Commun. Mag. 56, 124–130 (2018) 27. Amin, S.O., Siddiqui, M.S., Hong, C.S., Choe, J.: A novel coding scheme to implement signature based IDS in IP based Sensor Networks. In: IFIP/IEEE International Symposium on Integrated Network, Management-Workshops IM 2009, pp. 269–274. IEEE, June 2009 28. Oh, D., et al.: A malicious pattern detection engine for embedded security systems in the Internet of Things. Sensors. 14(12), 24188–24211 (2014) 29. Sun, H., Wang, X., Buyya, R., Su, J.: CloudEyes: Cloud-based malware detection with reversible sketch for resource-constrained internet of things (IoT) devices. Softw. Pract. Exp. 47(3), 421–441 (2017). https://doi.org/10.1002/spe.2420 30. Misra, S., Krishna, P.V., Agarwal, H., Saxena, A., Obaidat, M.S.: A learning automata based solution for preventing distributed denial of service in Internet of Things. In: 2011 International Conference on and 4th International Conference on Cyber, Physical and Social Computing Internet of Things (iThings/CPSCom), pp. 114–122 (2011) 31. Piqueras Jover, R.: Security attacks against the availability of LTE mobility networks: overview and research directions. In: 2013 16th International Symposium on Wireless Personal Multimedia Communications (WPMC), Atlantic City, NJ, pp. 1–9 (2013) 32. Xia, Y., Lin, H., Xu, L.: An AGV mechanism based secure routing protocol for Internet of Things. In: IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, Liverpool, pp. 662–666 (2015) 33. Yang, L.: Future Internet 2018, vol. 11, p. 65 (2018). https://doi.org/10.3390/fi11030065 34. Ahmed, Firoz, et al.: Mitigation of black hole attacks in routing protocol for low power and lossy networks. Secur. Commun. Netw. 9, 5143–5154 (2016) 35. Le, A., Loo, J., Chai, K.K., Aiash, M.: Specification-based IDS for detecting attacks on RPL based network topology. Information 7(2), 25 (2016). https://doi.org/10.3390/info7020025 36. 
Kasinathan, P., Costamagna, G., Khaleel, H., Pastrone, C., Spirito, M.A.: DEMO: an IDS framework for internet of things empowered by 6LoWPAN. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, pp. 1337–1340. ACM, November 2013
37. Raza, S., Duquennoy, S., Chung, T., Yazar, D., Voigt, T., Roedig, U.: Securing communication in 6LoWPAN with compressed IPsec. In: 2011 International Conference on Distributed Computing in Sensor Systems and Workshops (DCOSS), pp. 1–8. IEEE, June 2011 38. Sedjelmaci, H., Senouci, S.M., Feham, M.: Intrusion detection framework of cluster-based wireless sensor network. In: IEEE ISCC, Cappadocia, Turkey, pp. 893–897 (2012) 39. Midi, D., Rullo, A., Mudgerikar, A., Bertino, E.: Kalis — a system for knowledge-driven adaptable intrusion detection for the Internet of Things. In: IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, pp. 656–666 (2017) 40. Conti, M., Dehghantanha, A., Franke, K., Watson, S.: Internet of Things security and forensics: challenges and opportunities. Future Gener. Comput. Syst. 78 (2017). https://doi. org/10.1016/j.future.2017.07.060 41. Lopez-Martin, M.: Conditional variational auto encoder for prediction and feature recovery applied to intrusion detection in IoT. Sensors 17, 1967 (2017). https://doi.org/10.3390/ s17091967 42. Flauzac, O., Gonzalez, C., Hachani, A., Nolot, F.: SDN based architecture for IoT and Improvement of the Security. In: 29th International Conference on Advanced Information Networking and Applications Workshops WAINA, pp. 688–693 (2015)
Multi-objective Symmetric Fractional Programming Problem and Duality Relations Under (C, Gf, α, ρ, d)-Invexity over Cone Constraints
Ramu Dubey1, Teekam Singh2(B), Vrince Vimal3, and Bhaskar Nautiyal2
1 J.C. Bose University of Science and Technology, YMCA, Faridabad, Haryana, India
2 Graphic Era (deemed to be) University, Dehradun 248002, Uttarakhand, India
[email protected]
3 Graphic Era Hill University, Dehradun 248002, Uttarakhand, India
Abstract. We propose the idea of (C, Gf , α, ρ, d)-invex function and give a significant numerical example which explains the existence of this type of functions. Moreover, we have established several generalized concepts, namely, (F, Gf , α, ρ, d)/(C, Gf , α, ρ, d)-invexity and formulate a numerical example which satisfies feasible conditions of the given system. We consider Mond-Weir type fractional symmetric dual model over arbitrary cones and discuss duality theorems under (C, Gf , α, ρ, d)-invexity assumptions. Keywords: Symmetric duality · Strong duality · (F, Gf , α, ρ, d)-invex · (C, Gf , α, ρ, d)-invex · Multi-objective fractional programming · Arbitrary cones
1 Introduction
Optimization is a dynamic and rapidly growing research area and has an extensive impact on realistic phenomena. In the majority of real-world problems, conclusions are made taking into account several contradictory criteria, rather than by optimizing a single objective. This type of problem is known as multi-objective programming. Multi-objective programming through mathematical modeling is used to analyze real-world problems. The dual of a primal plays a vital role in nonlinear programming: in reality, when the solution of the original primal problem poses certain hassles, we look at the dual solution. Beside the well known Wolfe dual [3], Mond and Weir [4] formulated various duals for nonlinear problems with positive factors and demonstrated distinct duality theorems under pseudo-convexity and quasi-convexity conditions. © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 A. Abraham et al. (Eds.): SoCPaR 2019, AISC 1182, pp. 244–253, 2021. https://doi.org/10.1007/978-3-030-49345-5_26
Hanson [5] proposed the idea of the invex function, which is an extension of convexity, and proved the Kuhn-Tucker sufficient condition. Antczak [6] proposed the theory of G-invexity and obtained a few conditions for constrained problems under G-invex functions. Antczak in [7] extended the earlier concept to vector-valued G-invexity and proved theorems for the multi-objective problem. Lately, Kang et al. [8] described the G-invex function for locally Lipschitz functions and obtained optimality theorems for multi-objective optimization problems. In the last few years, several results on optimality and duality for multi-objective fractional programming problems have been obtained. Ferrara and Stefanescu [9] employed the (φ, ρ)-invex function to derive optimality theorems for the multi-objective optimization problem. Furthermore, Stefanescu and Ferrara [10] proposed a new category of (φ, ρ)ω-invex functions for a multi-objective problem and derived the corresponding theorems. Further, Gao [2] considered a Mond-Weir type dual model with cone constraints as well as a cone objective and derived duality results under generalized conditions. Recently, Khushboo et al. [1] integrated higher-order duality in a non-differentiable multi-objective problem and established duality results under higher-order invexity conditions. In this article, we propose the definitions of (C, Gf, α, ρ, d)/(F, Gf, α, ρ, d)-invexity and establish a significant numerical instance which illustrates the existence of this kind of functions. We consider a pair of multi-objective Mond-Weir type fractional primal-dual symmetric problems over arbitrary cones. Finally, we prove the weak, strong and converse duality theorems related to a competent solution under (C, Gf, α, ρ, d)-invexity assumptions.
2 Mathematical Preliminaries
Definition 2.1. The non-negative polar cone A∗ of a cone A ⊆ Rs is defined by A∗ = {a ∈ Rs : bT a ≥ 0, ∀ b ∈ A}. Consider the optimization problem:
(MP) Minimize f(b) = (f1(b), f2(b), ..., fk(b))T
subject to B0 = {b ∈ B ⊂ Rn : gj(b) ≤ 0, j = 1, 2, ..., m},
where f = {f1, f2, ..., fk} : B → Rk and g = {g1, g2, ..., gm} : B → Rm are differentiable functions defined on B.
Definition 2.2 [11]. A point b̄ ∈ B0 is said to be a competent solution of (MP) if there exists no other b ∈ B0 such that fr(b) < fr(b̄) for some r = 1, 2, ..., k and fi(b) ≤ fi(b̄) for all i = 1, 2, ..., k.
Let f = (f1, ..., fk) : B → Rk be a differentiable function defined on an open set ∅ ≠ B ⊆ Rn and let Ifi(B), i = 1, 2, ..., k, be the range of fi.
Definition 2.3 [12]. Let S : B × B × Rn → R (B ⊆ Rn) be a function which satisfies Sb,u(0) = 0, ∀ (b, u) ∈ B × B. Then S is said to be convex on Rn with regard to its third argument if and only if, for any fixed (b, u) ∈ B × B, Sb,u(δb1 + (1 − δ)b2) ≤ δ Sb,u(b1) + (1 − δ) Sb,u(b2), ∀ δ ∈ (0, 1), ∀ b1, b2 ∈ Rn.
Definition 2.4. A functional U : B × B × Rn → R is said to be sub-linear with respect to its third argument if, ∀ (b, u) ∈ B × B,
(i) Ub,u(x1 + x2) ≤ Ub,u(x1) + Ub,u(x2), for all x1, x2 ∈ Rn,
(ii) Ub,u(αx) = α Ub,u(x), for all α ∈ R+ and x ∈ Rn.
Now, we propose the definition of a differentiable (U, Gf, α, ρ, d)/(S, Gf, α, ρ, d)-invex function.
Definition 2.5. The function f is said to be a (U, Gf, α, ρ, d)-invex function at u ∈ B if there exist a differentiable function Gf = (Gf1, Gf2, ..., Gfk) : R → Rk such that every component Gfi : Ifi(B) → R is strictly increasing on the range Ifi, ρ ∈ R, a real-valued function α : B × B → R+ \ {0} and d : B × B → R (satisfying d(b, z) = 0 ⇔ b = z) such that, ∀ b ∈ B,
[Gfi(fi(b)) − Gfi(fi(u)) − ρi di^2(b, u)] ≥ Ub,u[α(b, u){G′fi(fi(u)) ∇b fi(u)}], ∀ i = 1, 2, ..., k.
If the above inequality is reversed (≤), then f is called (F, Gf, α, ρ, d)-incave at u ∈ B.
Remark 2.1. If α(b, u) = 1, ρ = 1 and Ub,u(x) = ηT(b, u)(x), then Definition 2.5 reduces to the Gfi-invexity, i = 1, 2, 3, ..., k, given by [7].
Definition 2.6. The function f is said to be a (C, Gf, α, ρ, d)-invex function at u ∈ B if there exist a differentiable function Gf = (Gf1, Gf2, ..., Gfk) : R → Rk such that every component Gfi : Ifi(B) → R is strictly increasing on the range Ifi, ρ ∈ R, a real-valued function α : B × B → R+ \ {0} and d : B × B → R (satisfying d(b, z) = 0 ⇔ b = z) such that, ∀ b ∈ B,
(1/α(b, u)) [Gfi(fi(b)) − Gfi(fi(u)) − ρi di^2(b, u)] ≥ Sb,u[G′fi(fi(u)) ∇b fi(u)], ∀ i = 1, 2, ..., k.
If the above inequality is reversed (≤), then f is called (S, Gf, α, ρ, d)-incave at u ∈ B.
Remark 2.2. If α(b, u) = 1, ρ = 1 and Cb,u(x) = ηT(b, u)(x), then Definition 2.6 reduces to the Gfi-invexity, i = 1, 2, 3, ..., k, given by [7].
Now, we give a nontrivial numerical example which is an (S, Gf, α, ρ, d)-invex function, but not a (U, Gf, α, ρ, d)-invex function with the same η.
Example 2.1. Let f : [0, 1] → R2 be described as f(y) = (f1(y), f2(y)), where f1(y) = y^4, f2(y) = arctan(y), and let Gf = (Gf1, Gf2) : R → R2 be defined as
Gf1(c) = c^9 + c^7 + c^3 + 1 and Gf2(c) = tan c,
with α(y, u) = 1, ρi = 0 and di(y, u) = |y − u|, i = 1, 2. Let S : B × B × R2 → R be given as Sy,u(x) = x^2 (y − u). Now, we will show that f is (S, Gf, α, ρ, d)-invex at u = 0; that is, we have to claim that
τi = (1/α(b, u)) [Gfi(fi(b)) − Gfi(fi(u)) − ρi di^2(b, u)] − Sb,u[G′fi(fi(u)) ∇b fi(u)] ≥ 0, for i = 1, 2.
Substituting the values of f1, f2, Gf1, Gf2, α(b, u), ρ1, ρ2, d1(b, u) and d2(b, u) in the above expressions, we obtain
τ1 = b^36 + b^28 + b^12 + 1 − (u^36 + u^28 + u^12 + 1) − Sb,u[(9u^32 + 7u^24 + 3u^8) × 4u^3],
and
τ2 = b − u − Sb,u[1 × 1/(1 + u^2)],
which at u = 0 yield τ1 = b^36 + b^28 + b^12 and τ2 = 0. Obviously, τ1 ≥ 0 and τ2 ≥ 0, ∀ b ∈ [0, 1]. Hence, f is (S, Gf, α, ρ, d)-invex at u = 0. Now, suppose
β = (1/α(b, u)) [f2(b) − f2(u) − ρ2 d2^2(b, u)] − Cb,u[∇b f2(u)],
or
β = arctan(b) − arctan(u) − Sb,u[1/(1 + u^2)],
which at u = 0 gives β = arctan(b) − 1.
This expression may not be non-negative ∀ b ∈ [0, 1]. For example, at b = 1, β = π/4 − 1 < 0.
Hence, f2 is not (S, α, ρ, d)-convex at u = 0, and therefore f = (f1, f2) is not (S, α, ρ, d)-convex at u = 0. Finally, S(b,u) is not sub-linear in its third argument. Consequently, the function f is not a (U, Gf, α, ρ, d)-invex function.
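The sign claims in Example 2.1 can be spot-checked numerically; the following sympy snippet is an illustrative verification only (the variable names are ours, and the expressions are the u = 0 forms stated above).

```python
# Numerical spot-check of Example 2.1 at u = 0 (illustrative only):
# tau_1 = b^36 + b^28 + b^12 is non-negative on [0, 1], while
# beta = arctan(b) - 1 becomes negative at b = 1 (pi/4 - 1 < 0).
import sympy as sp

b = sp.Symbol('b')
tau1 = b**36 + b**28 + b**12      # value of tau_1 at u = 0
beta = sp.atan(b) - 1             # value of beta at u = 0

print(tau1.subs(b, sp.Rational(1, 2)) >= 0)   # True for any b in [0, 1]
print(sp.N(beta.subs(b, 1)))                  # approximately -0.2146 < 0
```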
3 Mond-Weir Model
Consider the following pair of multi-objective fractional symmetric primal-dual programs over arbitrary cones:

(MFP) Minimize L(b, a) = ( Gf1(f1(b, a))/Gg1(g1(b, a)), Gf2(f2(b, a))/Gg2(g2(b, a)), ..., Gfk(fk(b, a))/Ggk(gk(b, a)) )
subject to
− Σ_{i=1}^{k} βi [ G′fi(fi(b, a)) ∇a fi(b, a) − (Gfi(fi(b, a))/Ggi(gi(b, a))) (G′gi(gi(b, a)) ∇a gi(b, a)) ] ∈ C2∗,
aT Σ_{i=1}^{k} βi [ G′fi(fi(b, a)) ∇a fi(b, a) − (Gfi(fi(b, a))/Ggi(gi(b, a))) (G′gi(gi(b, a)) ∇a gi(b, a)) ] ≥ 0,
β > 0, βT e = 1, b ∈ C1.

(MFD) Maximize M(u, v) = ( Gf1(f1(u, v))/Gg1(g1(u, v)), Gf2(f2(u, v))/Gg2(g2(u, v)), ..., Gfk(fk(u, v))/Ggk(gk(u, v)) )
subject to
Σ_{i=1}^{k} βi [ G′fi(fi(u, v)) ∇b fi(u, v) − (Gfi(fi(u, v))/Ggi(gi(u, v))) (G′gi(gi(u, v)) ∇b gi(u, v)) ] ∈ C1∗,
uT Σ_{i=1}^{k} βi [ G′fi(fi(u, v)) ∇b fi(u, v) − (Gfi(fi(u, v))/Ggi(gi(u, v))) (G′gi(gi(u, v)) ∇b gi(u, v)) ] ≤ 0,
β > 0, βT e = 1, v ∈ C2,
where S1 ⊆ Rn and S2 ⊆ Rm, C1 and C2 are arbitrary cones in Rn and Rm, respectively, such that C1 × C2 ⊆ S1 × S2, fi : S1 × S2 → R and gi : S1 × S2 → R are differentiable functions, and Gfi : Ifi → R and Ggi : Igi → R are differentiable strictly increasing functions on their domains. C1∗ and C2∗ are the positive polar cones of C1 and C2, respectively. It is assumed that in the feasible regions the numerators are nonnegative and the denominators are positive. The following example shows the feasibility of the primal problem (MFP) and the dual problem (MFD) discussed above:

Example 3.1. Let k = 2, n = m = 1 and S1 = R, S2 = R. Let fi : S1 × S2 → R, gi : S1 × S2 → R be defined as f1(b, a) = b^3 + a^2, f2(b, a) = 2b^4 + b a^2 + 2a^2, g1(b, a) = 2b^2 a^2 + 4, g2(b, a) = b a^4 + b^2 + 1. Suppose Gfi(t) = Ggi(t) = t, i = 1, 2. Assume that C1 = C2 = R+; then C1∗ = C2∗ = R+. Clearly, C1 × C2 ⊆ S1 × S2.

(EMFP) Minimize L(b, a) = ( (b^3 + a^2)/(2b^2 a^2 + 4), (2b^4 + b a^2 + 2a^2)/(b a^4 + b^2 + 1) )
subject to
β1 [2a − ((b^3 + a^2)/(2b^2 a^2 + 4)) (4b^2 a)] + β2 [(2ba + 4a) − ((2b^4 + b a^2 + 2a^2)/(b a^4 + b^2 + 1)) (4b a^3)] ≤ 0,
a β1 [2a − ((b^3 + a^2)/(2b^2 a^2 + 4)) (4b^2 a)] + a β2 [(2ba + 4a) − ((2b^4 + b a^2 + 2a^2)/(b a^4 + b^2 + 1)) (4b a^3)] ≥ 0,
β1, β2 > 0, b ≥ 0.

(EMFD) Maximize M(u, v) = ( (u^3 + v^2)/(2u^2 v^2 + 4), (2u^4 + u v^2 + 2v^2)/(u v^4 + u^2 + 1) )
subject to
β1 [3u^2 − ((u^3 + v^2)/(2u^2 v^2 + 4)) (4u v^2)] + β2 [(8u^3 + v^2) − ((2u^4 + u v^2 + 2v^2)/(u v^4 + u^2 + 1)) (v^4 + 2u)] ≥ 0,
u β1 [3u^2 − ((u^3 + v^2)/(2u^2 v^2 + 4)) (4u v^2)] + u β2 [(8u^3 + v^2) − ((2u^4 + u v^2 + 2v^2)/(u v^4 + u^2 + 1)) (v^4 + 2u)] ≤ 0,
β1, β2 > 0, v ≥ 0.
One can easily verify that b = 9, a = 0, β1 = 3, β2 = 3 is feasible for (EMFP) and that u = 0, v = 7, β1 = 3, β2 = 5 is feasible for (EMFD). Now, let U = (U1, U2, ..., Uk) and V = (V1, V2, ..., Vk). Then we can express the programs (MFP) and (MFD) equivalently as:

(MFP)U Minimize U
subject to
Gfi(fi(b, a)) − Ui Ggi(gi(b, a)) = 0, i = 1, 2, ..., k,    (1)
− Σ_{i=1}^{k} βi [G′fi(fi(b, a)) ∇a fi(b, a) − Ui G′gi(gi(b, a)) ∇a gi(b, a)] ∈ C2∗,    (2)
aT Σ_{i=1}^{k} βi [G′fi(fi(b, a)) ∇a fi(b, a) − Ui G′gi(gi(b, a)) ∇a gi(b, a)] ≥ 0,    (3)
β > 0, b ∈ C1, βT e = 1.    (4)

(MFD)V Maximize V
subject to
Gfi(fi(u, v)) − Vi Ggi(gi(u, v)) = 0, i = 1, 2, ..., k,    (5)
Σ_{i=1}^{k} βi [G′fi(fi(u, v)) ∇b fi(u, v) − Vi G′gi(gi(u, v)) ∇b gi(u, v)] ∈ C1∗,    (6)
uT Σ_{i=1}^{k} βi [G′fi(fi(u, v)) ∇b fi(u, v) − Vi G′gi(gi(u, v)) ∇b gi(u, v)] ≤ 0,    (7)
β > 0, v ∈ C2, βT e = 1.    (8)
Next, we prove duality theorems for (MFP)U and (MFD)V, which equally apply to (MFP) and (MFD), respectively.

Theorem 3.1 (Weak duality). Let (b, a, U, β) and (u, v, V, β) be feasible for (MFP)U and (MFD)V, respectively, and let, ∀ i = 1, 2, 3, ..., k,
(i) fi(., v) be (C, Gfi, α, ρi, di)-invex at u for fixed v,
(ii) gi(., v) be (C, Ggi, α, ρi, di)-incave at u for fixed v,
(iii) fi(b, .) be (C̄, Gfi, ᾱ, ρ̄i, d̄i)-incave at a for fixed b,
(iv) gi(b, .) be (C̄, Ggi, ᾱ, ρ̄i, d̄i)-invex at a for fixed b,
(v) Σ_{i=1}^{k} βi [1 − Ui] > 0 and Σ_{i=1}^{k} βi [1 − Vi] > 0,
(vi) Ggi(gi(b, v)) > 0, ∀ i = 1, 2, ..., k,
(vii) either Σ_{i=1}^{k} βi [(1 + Vi) ρi di^2(b, u) + (1 + Ui) ρ̄i d̄i^2(v, a)] ≥ 0, or ρi ≥ 0 and ρ̄i ≥ 0,
(viii) Cb,u(a) + aT u ≥ 0, ∀ a ∈ C2∗, and C̄v,a(b) + bT y ≥ 0, ∀ b ∈ C1∗, where C : Rn × Rn × Rn → R and C̄ : Rm × Rm × Rm → R.
Then U ≮ V.
Proof. From hypotheses (i) and (ii), we have
(1/α(b, u)) [Gfi(fi(b, v)) − Gfi(fi(u, v)) − ρi di^2(b, u)] ≥ Cb,u[G′fi(fi(u, v)) ∇b fi(u, v)]    (9)
and
(1/α(b, u)) [−Ggi(gi(b, v)) + Ggi(gi(u, v)) − ρi di^2(b, u)] ≥ −Cb,u[G′gi(gi(u, v)) ∇b gi(u, v)].    (10)
Using (v), β > 0, and multiplying (9)–(10) by βi/τ and βiVi/τ, respectively, where τ = Σ_{i=1}^{k} βi (1 − Vi), we obtain
(βi/(α(b, u)τ)) [Gfi(fi(b, v)) − Gfi(fi(u, v)) − ρi di^2(b, u)] ≥ (βi/τ) Cb,u[G′fi(fi(u, v)) ∇b fi(u, v)]
and
(βiVi/(α(b, u)τ)) [−Ggi(gi(b, v)) + Ggi(gi(u, v)) − ρi di^2(b, u)] ≥ −(βiVi/τ) Cb,u[G′gi(gi(u, v)) ∇b gi(u, v)].
Now, summing over i, adding the above two inequalities and using the convexity of Cb,u, we have
Σ_{i=1}^{k} (βi/(α(b, u)τ)) [Gfi(fi(b, v)) − Gfi(fi(u, v)) − ρi di^2(b, u)] + Σ_{i=1}^{k} (βiVi/(α(b, u)τ)) [−Ggi(gi(b, v)) + Ggi(gi(u, v)) − ρi di^2(b, u)]
≥ Cb,u[ Σ_{i=1}^{k} (βi/τ) {G′fi(fi(u, v)) ∇b fi(u, v) − Vi (G′gi(gi(u, v)) ∇b gi(u, v))} ].    (11)
Now, from (6), we have
a = (1/(α(b, u)τ)) Σ_{i=1}^{k} βi [G′fi(fi(u, v)) ∇b fi(u, v) − Vi (G′gi(gi(u, v)) ∇b gi(u, v))] ∈ C1∗.
Hence, for this a, Cb,u(a) ≥ −uT a ≥ 0 from (vii)–(viii). Using this in (11), we obtain
Σ_{i=1}^{k} βi [Gfi(fi(b, v)) − Gfi(fi(u, v))] + Σ_{i=1}^{k} βi Vi [−Ggi(gi(b, v)) + Ggi(gi(u, v))] ≥ Σ_{i=1}^{k} βi ρi (1 + Vi) di^2(b, u).
Using (5) and hypothesis (vii) in the above inequality, we get
Σ_{i=1}^{k} βi [Gfi(fi(b, v)) − Vi Ggi(gi(b, v))] ≥ 0.    (12)
Similarly, from hypotheses (iii)–(v) and from the conditions (vii)–(viii), for
b = −(1/τ) Σ_{i=1}^{k} βi [G′fi(fi(b, a)) ∇a fi(b, a) − Ui (G′gi(gi(b, a)) ∇a gi(b, a))] ∈ C2∗,
we get
Σ_{i=1}^{k} βi [−Gfi(fi(b, v)) + Ui Ggi(gi(b, v))] ≥ Σ_{i=1}^{k} βi ρ̄i (1 + Ui) d̄i^2(v, a).    (13)
Adding the inequalities (12)–(13) and using hypothesis (vii), we get
Σ_{i=1}^{k} βi (Ui − Vi) Ggi(gi(b, v)) ≥ 0.    (14)
Since β > 0 and using (vi), it follows that U ≮ V. This completes the theorem.
Theorem 3.2 (Strong duality). Let (b̄, ā, Ū, β̄) be a competent solution of (MFP)U and fix β = β̄ in (MFD)V. If the following assumptions hold:
(i) Σ_{j=1}^{k} β̄j [G″fj(fj(b̄, ā)) ∇a fj(b̄, ā) (∇a fj(b̄, ā))T + G′fj(fj(b̄, ā)) ∇aa fj(b̄, ā) − Ūj (G″gj(gj(b̄, ā)) ∇a gj(b̄, ā) (∇a gj(b̄, ā))T + G′gj(gj(b̄, ā)) ∇aa gj(b̄, ā))] is positive or negative definite,
(ii) the vectors { G′fj(fj(b̄, ā)) ∇a fj(b̄, ā) − Ūj (G′gj(gj(b̄, ā)) ∇a gj(b̄, ā)) }, j = 1, ..., k, are not linearly dependent,
(iii) Ūj > 0, j = 1, 2, ..., k,
then (b̄, ā, Ū, β̄) is a feasible solution for (MFD)V. Moreover, if the assumptions of Theorem 3.1 hold, then (b̄, ā, Ū, β̄) is a competent solution of (MFD)V and the objective functions have the same numerical values.
Proof. The proof can be obtained on the lines of Theorem 3.1.
Theorem 3.3 (Converse duality). Let (ū, v̄, V̄, β̄) be a competent solution of (MFD)V and fix β = β̄ in (MFP)U. If the following assumptions hold:
(i) Σ_{j=1}^{k} β̄j [G″fj(fj(ū, v̄)) ∇b fj(ū, v̄) (∇b fj(ū, v̄))T + G′fj(fj(ū, v̄)) ∇bb fj(ū, v̄) − V̄j (G″gj(gj(ū, v̄)) ∇b gj(ū, v̄) (∇b gj(ū, v̄))T + G′gj(gj(ū, v̄)) ∇bb gj(ū, v̄))] is negative or positive definite,
(ii) the vectors { G′fj(fj(ū, v̄)) ∇b fj(ū, v̄) − V̄j (G′gj(gj(ū, v̄)) ∇b gj(ū, v̄)) }, j = 1, ..., k, are not linearly dependent,
(iii) V̄j > 0, j = 1, 2, ..., k,
then (ū, v̄, V̄, β̄) is a feasible solution of (MFP)U. Moreover, if the assumptions of Theorem 3.1 hold, then (ū, v̄, V̄, β̄) is a competent solution of (MFP)U and the objective functions have the same numerical values.
Proof. The proof can be obtained on the lines of Theorem 3.2.
References 1. Verma, K., Mathur, P., Gulati, T.R.: A new approach on mixed type nondifferentiable higher order symmetric duality. J. Oper. Res. Soc. China (2018). https:// doi.org/10.1007/s40305-018-0213-7 2. Gao, Y.: Higher order symmetric duality in multi-objective programming problems. Acta Mathematicae Applicate Sincia. English Series 32, 485–494 (2016) 3. Mishra, S.K.: Second order symmetric duality in mathematical programming with Fconvexity. Eur. J. Oper. Res. 127, 507–518 (2000) 4. Mond, B., Weir, T.: Generalized convexity and duality. In: Schaible, S., Ziemba, W.T. (eds.) Generalized Convexity in Optimization and Economics, pp. 263–280. Academic Press, New York (1981) 5. Hanson, M.A.: On sufficiency on the Kuhn-Tucker conditions. J. Math. Anal. Appl. 80, 545–550 (1981) 6. Antczak, T.: New optimality conditions and duality results of G-type in differentiable mathematical programming. Nonlinear Anal. 66, 1617–1632 (2007) 7. Antczak, T.: On G-invex multi-objective programming. Part I. Optimality. J. Global Optim. 43, 97–109 (2009) 8. Kang, Y.M., Kim, D.S., Kim, M.H.: Optimality conditions of G-type in locally Lipchitz multi-objective programming. Vietnam J. Math. 40, 275–285 (2012) 9. Ferrara, M., Viorica-Stefaneseu, M.: Optimality conditions and duality in multiobjective programming with (φ, ρ)-invexity. Yugoslav J. Oper. Res. 18, 153–165 (2008) 10. Viorica-Stefaneseu, M., Ferrara, M.: Multi-objective programming with new invexities. Optim. Lett. 7, 855–870 (2013) 11. Egudo, R.: Multi-objective fractional duality. Bull. Aust. Math. Soc. 37, 367–378 (1988) 12. Long, X.J.: Optimality conditions and duality for nondifferentiable multi-objective fractional programming problems with (C, α, ρ, d)-convexity. J. Optim. Appl. 148, 197–208 (2011) 13. Brumelle, S.: Duality for multiple objective convex programs. Math. Oper. Res. 6, 159–172 (1981)
Wind Power Intra-day Multi-step Predictions Using PDE Sum Models of Polynomial Networks Based on the PDE Conversion and Substitution with the L-Transformation
Ladislav Zjavka1(&), Václav Snášel1, and Ajith Abraham2
1 Department of Computer Science, Faculty of Electrical Engineering and Computer Science, VŠB-Technical University of Ostrava, Ostrava, Czech Republic
{ladislav.zjavka,vaclav.snasel}@vsb.cz
2 Machine Intelligence Research Labs (MIR Labs), Auburn, WA 98071, USA
[email protected]
Abstract. Precise forecasts of wind power are required as they allow full integration of wind farms into the electrical grid and their active operation. Their daily basis poses a challenge due to the chaotic nature of global atmospheric dynamical processes resulting in local wind fluctuations and waves. Surface wind forecasts of NWP models are not fully adapted to local anomalies, which can in addition significantly influence its temporal flow. AI methods using historical observations can convert or refine forecasts in consideration of the wind farm location, topography and hub positions. Their independent intra-day wind-power predictions are more precise than those based on NWP data, as these are usually produced every 6 h with a delay. The designed AI method combines structures of polynomial networks with mathematical techniques to decompose and substitute for the n-variable linear Partial Differential Equation, which allows complex representation of unknown dynamic systems. Particular 2-variable PDEs, produced in network nodes, are converted using the Laplace transformed derivatives. The inverse L-transformation is applied to the resulting pure rational terms to obtain the originals of unknown node functions, whose sum is the composite PDE model. Statistical models are developed with data samples from estimated periods of the last days which optimally represent spatial patterns in the current weather. They process the latest available data to predict wind power in the next 1–12 h according to the trained data inputs → output time-shift. Keywords: Polynomial Neural Network · General partial differential equation · Polynomial PDE substitution of operational calculus · External Complement
1 Introduction

Regional NWP systems solve sets of primitive PDEs to simulate each particle behavior for the ideal gas flow in a defined time and grid-resolution. They can solve PDEs to describe additional surface wind factors which can particularize the local wind speed
forecasts [4]. Atmospheric pressure and temperature distribution on global scales are the major factors which largely influence the overall weather character. Local wind waves, gusts, unstable direction or abnormalities in surface temperature affect the induced power on a minor range, but these variations must also be considered as plant operation side-effects whose relevance increases in the short-time horizon. Wind power prediction methods can predict either the wind power directly or the wind speed initially, to convert it in a 2-stage procedure. They usually use 2 main approaches [3]:

• Physical, using numerical solutions of PDEs
• Statistical, using data observation series to form prediction models

Artificial Intelligence (AI) adaptive techniques need not wait for NWP data, which are usually provided with a several hour delay. However, predictions of their autonomous statistical models are usually valuable up to a several-hour horizon. They can post-process local NWP outputs to better account for local specifics and the surface disparity, but this results in a high correlation with the forecast accuracy. However, in the case of incoming frontal zones, events and their disturbances, NWP data allow representation of changed unknown weather patterns which are not included in the training data set [5]. Regression or AI methods do not aim at a representation of weather phenomena using physical considerations. They model statistical relationships between relevant inputs → output quantities [8]. The main objective is to choose the appropriate NWP or statistical approach according to the current situation and data analysis to develop the optimal prediction model with minimal errors or failures. The complexity of the composite sum PDE models is adequate to the patterns in the local weather. Standard AI computing techniques require data pre-processing which usually significantly reduces the number of input variables and results in a simplification of the models. Polynomial Neural Networks (PNN) use regression where the number of parameters grows exponentially along with the number of input variables. PNN decompose the general inputs → outputs connections, expressed by the Kolmogorov-Gabor polynomial (1).

Y = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} a_{ijk} x_i x_j x_k + \ldots   (1)
n - number of input variables x_i
a_i, a_{ij}, a_{ijk}, … - polynomial parameters

The Group Method of Data Handling (GMDH) gradually evolves multi-layer structures of PNN, adding layer by layer to calculate the polynomial parameters of the chosen nodes, which best approximate the target function. PNN decompose the system complexity into a number of simple relationships, each described by a low-order polynomial node function (2) for every pair of input variables x_i, x_j [1].

y = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2   (2)

x_i, x_j - input variables of polynomial neuron nodes
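As an illustration only (not taken from the paper), a single GMDH node polynomial (2) can be fitted by ordinary least squares; the NumPy-based helper below and its names are assumptions of this sketch.

```python
import numpy as np

def fit_gmdh_node(xi, xj, y):
    """Fit the six parameters a0..a5 of the node polynomial (2) by least squares."""
    X = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi ** 2, xj ** 2])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def eval_gmdh_node(a, xi, xj):
    """Evaluate y = a0 + a1*xi + a2*xj + a3*xi*xj + a4*xi^2 + a5*xj^2."""
    return a[0] + a[1]*xi + a[2]*xj + a[3]*xi*xj + a[4]*xi**2 + a[5]*xj**2
```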
Differential Polynomial Neural Network (D-PNN) is a recent neuro-computing technique using adapted 2-stage procedures of Operational Calculus (OC). It decomposes the n-variable linear PDE into particular node sub-PDEs, whose various combinations allow complex representation of unknown dynamic processes. D-PNN combines principles of the self-organizing multi-layer structures with mathematical methods of PDE solutions. It selects the best 2-inputs in PNN nodes to produce applicable PDE components. The 1st-step OC polynomial PDE conversion leads to rational terms which are the Laplace images of unknown node functions. The inverse L-transform is applied to them in the 2nd step to obtain the node originals whose sum is used in the complete PDE model of the searched separable output functions. D-PNN uses the External Complement in its training and testing, which usually allows the optimal representation of a problem [1]. Statistical models are developed for each 1-12-h inputs → output time-shift of spatial data observations to predict wind power at particular hours [6]. The D-PNN intra-day predictions are more accurate than those based on adapted middle-term NWP forecasts or standard statistical approaches using only a few input variables in simple AI or regression models (Sect. 5) [9].
2 Intra-day Multi-step Statistical Wind Power Prediction

The proposed multi-step procedure, based on the statistical approach, first pre-estimates the optimal number of the last days whose data are used to elicit the prediction models. The optimal daily training periods are initially determined using assistant test models according to their best approximation of the desired output over the last 6 h. The model development is analogous to the prediction one, but its output is continually tested with the reserved latest power measurements. The lowest testing errors indicate the optimal training parameters [7].
Fig. 1. D-PNN is trained with spatial data observations from the estimated period of the last few days for each inputs → output time-shift of 1-12 h (left, blue) to develop PDE models which apply the latest data inputs to predict wind power in the trained time-horizon (right, red)
The estimated numbers of daily data samples with an increased inputs → output time-shift are used to elicit the regression or AI statistical models. These are applied to the latest morning input data to predict wind power in the trained horizon 1-12 h ahead. Separate intra-day models are developed to represent the optimal inputs → output data relations for the particular time-shift in each hour prediction (Fig. 1); a sketch of this loop is given below. A similarity between training and testing data patterns, characterized by a sort of settled weather over few-day periods, allows the development of prediction models applicable to unseen data. Incoming frontal breaks or disturbances result in various different conditions, which are difficult to model using only the latest data [6].
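A hedged sketch of this multi-step procedure follows; the callable `fit`, the hourly sample layout and the candidate day counts are illustrative assumptions, not details given in the paper.

```python
import numpy as np

def estimate_training_days(X, y, fit, candidate_days=(2, 3, 4, 5, 6, 7), test_len=6):
    """Pick the number of last days whose assistant test model best approximates
    the reserved last `test_len` hourly power measurements (lowest RMSE)."""
    X_test, y_test = X[-test_len:], y[-test_len:]
    best_days, best_err = candidate_days[0], np.inf
    for d in candidate_days:
        model = fit(X[-d * 24:-test_len], y[-d * 24:-test_len])   # assistant test model
        err = np.sqrt(np.mean((model(X_test) - y_test) ** 2))
        if err < best_err:
            best_days, best_err = d, err
    return best_days

def train_intraday_models(X, y, fit, days):
    """Train one model per 1-12 h inputs -> output time-shift on the estimated period."""
    models = {}
    for shift in range(1, 13):
        X_s, y_s = X[:-shift][-days * 24:], y[shift:][-days * 24:]  # time-shifted pairs
        models[shift] = fit(X_s, y_s)
    return models
```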
3 A Polynomial PDE Substitution Using the L-Transformation

D-PNN defines and substitutes for the general linear PDE (3), which can describe unknown complex dynamic systems. It decomposes the n-variable PDE into specific 2-variable sub-PDEs in PNN nodes. These can be solved using OC to model unknown node functions u_k whose sum is the searched n-variable function u (3).

a + bu + \sum_{i=1}^{n} c_i \frac{\partial u}{\partial x_i} + \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij} \frac{\partial^2 u}{\partial x_i \partial x_j} + \ldots = 0, \qquad u = \sum_{k=1}^{\infty} u_k   (3)

u(x_1, x_2, …, x_n) - unknown separable function of n input variables
a, b, c_i, d_{ij}, … - weights of terms
u_k - partial functions

Particular 2-variable linear 1st or 2nd order PDEs, formed in PNN nodes, can be expressed with 8 equality variables (4).

F\left(x_1, x_2, u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \frac{\partial^2 u}{\partial x_1^2}, \frac{\partial^2 u}{\partial x_1 \partial x_2}, \frac{\partial^2 u}{\partial x_2^2}\right) = 0   (4)
u_k - node partial sum functions of the unknown separable function u

The OC conversion of the specific PDEs (4) is based on the proposition of the L-transforms of function nth derivatives in consideration of the initial and boundary conditions (5).

\mathcal{L}\{f^{(n)}(t)\} = p^n F(p) - \sum_{i=1}^{n} p^{n-i} f^{(i-1)}(0), \qquad \mathcal{L}\{f(t)\} = F(p)   (5)

f(t), f'(t), …, f^{(n)}(t) - originals continuous in ⟨0, ∞)
p, t - complex and real variables
This polynomial substitution for the f(t) function nth derivatives in an Ordinary Differential Equation (ODE) results in algebraic equations from which the L-transform F(p) of the searched function f(t) can be separated as a pure rational function (6). It is expressed in the complex form with the complex number p, so that the inverse L-transformation is necessary to obtain the original function f(t) of a real variable t (6) described by the ODE [2].

F(p) = \frac{P(p)}{Q(p)} = \sum_{k=1}^{n} \frac{P(a_k)}{Q_k(a_k)} \cdot \frac{1}{p - a_k}, \qquad f(t) = \sum_{k=1}^{n} \frac{P(a_k)}{Q_k(a_k)} e^{a_k t}   (6)

a_k - simple real roots of the multinomial Q(p)
F(p) - L-transform image

Pure rational terms (6), whose polynomial degrees correspond to the specific 2-variable sub-PDEs (4), are produced in D-PNN node blocks (Fig. 2) using the OC-based conversion (5). The inverse L-transformation is analogously applied to the corresponding L-images (6) to obtain the originals of the unknown u_k node functions, which are summed in the output model of the separable n-variable function u (3). Each block node calculates its GMDH polynomial (2) output which is applied in the next layer node inputs. Blocks contain 2 vectors of adaptable polynomial parameters a, b to form rational functions - neurons, i.e. specific sub-PDE converts (7). One of its inverse L-transformed neurons can be selected to be included in the model output sum [6].

y_i = w_i \cdot \frac{b_0 + b_1 x_1 + b_2\,\mathrm{sig}(x_1^2) + b_3 x_2 + b_4\,\mathrm{sig}(x_2^2)}{a_0 + a_1 x_1 + a_2 x_2 + a_3 x_1 x_2 + a_4\,\mathrm{sig}(x_1^2) + a_5\,\mathrm{sig}(x_2^2)} \cdot e^{\varphi}   (7)
φ = arctg(x_1/x_2) - phase representation of the 2 input variables x_1, x_2
a_i, b_i - polynomial parameters
w_i - weights
sig - sigmoidal

Euler's notation of complex variables (8) defines the phase, which can replace the inverse L-transformation e^φ (7) of the converted sub-PDEs expressed in the complex form with p (6). The pure rational term corresponds to the radius r (amplitude) in (8).

p = x_1 + i\,x_2 = \sqrt{x_1^2 + x_2^2}\; e^{i\arctan(x_2/x_1)} = r\,e^{i\varphi} = r(\cos\varphi + i\sin\varphi)   (8)
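A possible software form of one sub-PDE neuron (7) with the phase factor of (8) is sketched below; the logistic choice for the sigmoidal transform `sig` and the use of `arctan2` are assumptions of the sketch.

```python
import numpy as np

def dpnn_neuron(x1, x2, a, b, w):
    """One inverse L-transformed neuron (7): a rational term of the two node inputs
    multiplied by e^phi, phi = arctg(x1/x2) being the phase of (8)."""
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))       # assumed sigmoidal transform
    num = b[0] + b[1]*x1 + b[2]*sig(x1**2) + b[3]*x2 + b[4]*sig(x2**2)
    den = a[0] + a[1]*x1 + a[2]*x2 + a[3]*x1*x2 + a[4]*sig(x1**2) + a[5]*sig(x2**2)
    phi = np.arctan2(x1, x2)                       # arctg(x1/x2)
    return w * (num / den) * np.exp(phi)
```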
Fig. 2. Blocks form derivative neurons - PNN-node sub-PDE solutions (block inputs x_1, x_2; 2nd-order sub-PDE solutions; composite terms CT; block output)
4 PDE Decomposition Using Backward Multi-layer Structures

Multi-layer PNN form composite polynomials (9). Blocks in the nodes of the 2nd and next layers can additionally produce Composite Terms (CT), which are equivalent to the simple neurons in the calculation of the D-PNN sum output. CTs substitute for the sub-PDEs with respect to the input variables of back-connected node blocks of the previous layers (Fig. 3), using the product of their Laplace images according to the composite function partial derivation rules (10).

F(x_1, x_2, \ldots, x_n) = f(z_1, z_2, \ldots, z_m) = f(\phi_1(X), \phi_2(X), \ldots, \phi_m(X))   (9)

\frac{\partial F}{\partial x_k} = \sum_{i=1}^{m} \frac{\partial f(z_1, z_2, \ldots, z_m)}{\partial z_i} \cdot \frac{\partial \phi_i(X)}{\partial x_k}, \qquad k = 1, \ldots, n   (10)

The 3rd layer blocks, for example, can select from additional CTs using products of sub-PDE converts, i.e. the neuron L-images, of 2 and 4 back-connected blocks in the previous 2nd and 1st layers (11). The number of possible CT combinations in blocks doubles along with each back-joined preceding layer (Fig. 3).

y_{31} = w_{31} \cdot \frac{b_0 + b_1 x_{21} + b_2 x_{21}^2 + b_3 x_{22} + b_4 x_{22}^2}{a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{21} x_{22} + a_4 x_{21}^2 + a_5 x_{22}^2}\, e^{\varphi_{31}} \cdot \frac{b_0 + b_1 x_{12} + b_2 x_{12}^2}{a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{11} x_{12} + a_4 x_{11}^2 + a_5 x_{12}^2} \cdot \frac{P_{12}(x_1, x_2)}{Q_{12}(x_1, x_2)}   (11)

Q_{ij}, P_{ij} = GMDH output and reduced polynomials of nth and (n-1)th degree
y_{kp} - pth Composite Term (CT) output
φ_{21} = arctg(x_{11}/x_{13}), φ_{31} = arctg(x_{21}/x_{22})
c_{kl} - complex representation of the lth block inputs x_i, x_j in the kth layer
The CTs are the products of sub-PDE solutions of the external function (i.e. the L−1 transformed image) in the starting node block and selected neurons (i.e. the internal function images) of back-connected blocks in the previous layers (11).
Fig. 3. D-PNN selects from possible 2-input combination node blocks to produce applicable sum PDE components - neurons and CTs
The D-PNN output Y is the arithmetic mean of the outputs of the selected active neurons + CTs in node blocks, to simplify and speed up the parameter adaptation (12).

Y = \frac{1}{k} \sum_{i=1}^{k} y_i, \qquad k = \text{the number of active neurons + CTs (node PDE solutions)}   (12)
Algorithms based on multi-objective optimization can perform the formation and “back-production” of single neurons and CTs in the 2-node tree-like PNN structure (Fig. 3). D-PNN selects the best 2-input combinations in each layer node (analogous to GMDH) to produce applicable sum PDE model components. Their polynomial parameters and weights are pre-optimized using the Gradient method [11]. This iteration algorithm skips from the actual to the next node block, one by one, to select and adapt one of its neurons or CTs. D-PNN training error is minimized in consideration of a continual test using the External Complement of GMDH. A convergent combination of selected node sub-PDE solutions can form the optimal sum model [10].
RMSE = \sqrt{\frac{\sum_{i=1}^{M} (Y_i - Y_i^d)^2}{M}} \;\rightarrow\; \min   (13)

Y_i - produced and Y_i^d - desired D-PNN output for the ith training vector of M data samples

The Root Mean Squared Error (RMSE) is calculated in each iteration step of the training and testing to be gradually minimized (13).
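In code, the sum model output (12) and the RMSE criterion (13) could look as follows; treating `active_terms` as a list of callables is an assumption of this sketch.

```python
import numpy as np

def dpnn_output(active_terms, X):
    """Eq. (12): arithmetic mean of the selected active neurons and CTs."""
    return np.mean([term(X) for term in active_terms], axis=0)

def rmse(Y, Y_desired):
    """Eq. (13): root mean squared error, minimized over the training iterations."""
    Y, Y_desired = np.asarray(Y), np.asarray(Y_desired)
    return np.sqrt(np.mean((Y - Y_desired) ** 2))
```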
5 Prediction Experiments Using the Estimated Data Periods

D-PNN applied time-series of 20 data inputs to predict wind power 1-12 h ahead in the central wind farm at Drahany, Czech Republic. The last 6-h data were reserved for the continual test, i.e. these samples were not used to adapt the model parameters but only to calculate the testing error and control the training. Additional spatial historical data (wind speed and direction) of 3 surrounding wind farms and meteorological observations (avg. temperature, relative humidity, sea level pressure, wind speed and azimuth) from 2 nearby airports extended the input vector¹ (Fig. 4).
Fig. 4. Spatial location denotation of the observation data
Standard regression SVM using the dot kernel and the GMDH Shell for Data Science, a professional self-optimizing forecasting software, were used to compare the performance of the models. Their training and testing were analogous to the D-PNN multi-step procedure (Fig. 1) using data from the estimated daily periods. GMDH searches for the most valuable 2-inputs in PNN nodes [1], analogous to D-PNN. This feature selection improves the accuracy of the prediction models (Fig. 5-Fig. 6).

¹ Weather Underground historical data series: www.wunderground.com/history/airport/LKTB/2016/7/22/DailyHistory.html.
Fig. 5. Drahany 13.5.2011 - RMSE: D-PNN=356.1, SVM=587.8, GMDH=403.1
Fig. 6. Drahany 18.5.2011 - RMSE: D-PNN=117.6, SVM=274.2, GMDH=121.4
The performance of the presented D-PNN and standard AI models was compared with regression of 2 conventional methods - Exponential Smoothing (ES) and Linear Regression (LR) in 2-week 12-h intra-day predictions, from May 12 to 25, 2011 (Fig. 7). ES and LR process only the historical time-series of wind power and apply their previous time-step predictions as input data in the next-time steps.
Fig. 7. 2-week intra-day 1-12 h time-shift power prediction errors: D-PNN=162.8, SVM=276.5, GMDH=210.6, Smooth=268.6, Regress=313.3
The AI models can predict the real power course in most cases. Their formation is problematic if the training data patterns in catchy wind do not correspond to the latest changed conditions in the predicted capful days with an intermittent or stable low power output (Fig. 6). The wind speed values vary under or around the power generation limit (about 400 kW), which causes difficulties and failures in the predictions. NWP data can be analyzed to detect these days in order to extend the training periods or to extract exactly the days with similar data patterns. The AI predictions in changeable weather (Fig. 5) usually succeed, as the training data better characterize the wind variations in the predicted hours. The SVM output can alternate in subsequent time-steps (Fig. 5 and Fig. 6), which increases the prediction errors. SVM needs more precise estimations of the optimal training periods than D-PNN or GMDH. These selective methods can apply different numbers of the last data samples (Fig. 8) to produce models with similar predictions.
Fig. 8. The model initialization times – estimated daily training periods used to develop the prediction models in the monitored 2-week interval
The ES and LR predictions mostly represent a simple course or linear trend in the power series progress. ES can rarely predict a round course of the power series. Both statistical methods can succeed in calm days with gentle wind speed alterations (Fig. 6) which follow some catchy periods. This is caused by the simplicity of the models and an uncomplicated calculation resulting in a flat output. ES and LR also require estimations of the optimal periods (Fig. 8) whose data samples they use to calculate the parameters [9].
6 Conclusions

D-PNN is a novel neuro-computing method combining self-organizing PNN structures with adapted mathematical techniques to decompose and solve n-variable PDEs. Its selective sum PDE solutions can model the local weather dynamics. D-PNN can predict real wind power alterations in catchy days. The predictions are less valuable if calm wind days follow a break change in the weather. The compared AI and conventional regression techniques are not able to model the complexity of the local weather patterns in most of the predicted days. D-PNN can analogously predict the intra-day production of photo-voltaic (PV) energy using additional input data of clear sky index, cloud cover or sky conditions [8]. Statistical models need to apply additional NWP data in the middle-term 24-48-h prediction horizon². The presented wind power intra-day predictions are more precise than AI-converted wind speed forecasts of meso-scale NWP systems, which cannot fully consider local specifics.

Acknowledgements. This work was supported from the European Regional Development Fund (ERDF) "A Research Platform focused on Industry 4.0 and Robotics in Ostrava", under Grant No. CZ.02.1.01/0.0/0.0/17 049/0008425.
References 1. Anastasakis, L., Mort, N.: The development of self-organization techniques in modelling: a review of the Group Method of Data Handling (GMDH). The University of Sheffield (2001) 2. Berg, L.: Introduction To The Operational Calculus. North-Holland Series on Applied Mathematics and Mechanics, vol. 2. North-Holland, New York (1967) 3. Monteiro, C., Bessa, R., Miranda, V., Botterud, A., Wang, J., Conzelmann, G.: Wind power forecasting: state of the art 2009. Report No.: ANL/DIS-10-1. Argonne National Laboratory, Argonne, Illinois (2009) 4. Wang, J., Song, Y., Liu, F., Hou, R.: Analysis and application of forecasting models in wind power integration: a review of multi-step-ahead wind speed forecasting models. Renew. Sustain. Energy Rev. 60, 960–981 (2016) 5. Yan, J., Liu, Y., Han, S., Wang, Y., Feng, S.: Reviews on uncertainty analysis of wind power forecasting. Renew. Sustain. Energy Rev. 52, 1322–1330 (2015) 6. Zjavka, L.: Wind speed forecast correction models using polynomial neural networks. Renew. Energy 83, 998–1006 (2015)
² Weather Underground tabular forecasts: www.wunderground.com/cgi-bin/findweather/getForecast?query=LKMT.
7. Zjavka, L.: Multi-site post-processing of numerical forecasts using a polynomial network substitution for the general differential equation based on operational calculus. Appl. Soft Comput. 73, 192–202 (2018) 8. Zjavka, L., Krömer, P., Mišák, S., Snášel, V.: Modeling the photovoltaic output power using the differential polynomial network and evolutional fuzzy rules. Math. Model. Anal. 22, 78– 94 (2017) 9. Zjavka, L., Mišák, S.: Direct wind power forecasting using a polynomial decomposition of the general differential equation. IEEE Trans. Sustain. Energy 9, 1529–1539 (2018) 10. Zjavka, L., Pedrycz, W.: Constructing general partial differential equations using polynomial and neural network. Neural Netw. 73, 58–69 (2016) 11. Zjavka, L., Snášel, V.: Constructing ordinary sum differential equations using polynomial networks. Inf. Sci. 281, 462–477 (2014)
Optimization of Application-Specific L1 Cache Translation Functions of the LEON3 Processor

Nam Ho1, Paul Kaufmann2(✉), and Marco Platzner3

1 Institute for Advanced Simulation, Jülich, Germany
2 Mainz University, Mainz, Germany
[email protected]
3 Paderborn University, Paderborn, Germany
Abstract. Reconfigurable caches offer an intriguing opportunity to tailor cache behavior to applications for better run-times and energy consumptions. While one may adapt structural cache parameters such as cache and block sizes, we adapt the memory-address-to-cache-index mapping function to the needs of an application. Using a LEON3 embedded multi-core processor with reconfigurable cache mappings, a metaheuristic search procedure, and MiBench applications, we show in this work how to accurately compare non-deterministic performances of applications and how to use this information to implement an optimization procedure that evolves application-specific cache mappings.
1 Introduction
One of the major challenges processor architects have to face is the so-called memory bottleneck: the disparity between the frequencies of memory requests of processing units and the latencies of DRAMs. To mask the memory access delays, multi-layered cache hierarchies have been introduced in the 70s [1], mimicking a high-speed and large main memory while using slow and inexpensive DRAM chips. This principle has been used successfully for many decades before running into performance issues in the last years. For instance, processing large data structures fills caches with data lacking temporal locality, deteriorating the performance of other tasks executed on the same processor [2]. Varying memory access pattern types of applications executed by many-core systems interfere with each other, making it much more difficult for the cache to implement a coherent memory model efficiently. The consequence of these effects is that processor manufacturers have introduced reconfigurability to caches to allow adapting the cache behavior to the requirements of applications [3]. Reconfigurable caches have only recently found their way into off-the-shelf processors [2]. Research in this area has started earlier, with the rise of reconfigurable logic in the 90s. The primary motivation for reconfigurable caches is that while well-configured caches with fixed architecture perform well for a broad range of
applications, it is sometimes desirable to change the configuration of the cache to handle applications with atypical memory access patterns more energy-efficiently or to tailor a cache to specific use cases such as numerical simulation. The research on reconfigurable caches subdivides roughly into structurally reconfigurable caches and application-specific memory-to-cache-index mapping functions. The work on structural cache reconfiguration investigates the benefits of dynamically changing the geometry of the cache memories, i.e., the size of the cache, the number of ways, and the number of cache blocks. Work on reconfigurable memory-address-to-cache-index functions has the goal of distributing accesses to cache memories more evenly for a reduced number of conflict misses. Usually, permutation- and single-level XOR-based functions, as well as prime-moduli and multi-stage mappings, are used in related work [4]. While reconfigurability helps caches to improve their performance, trade-offs such as larger logic areas, longer hit times, and a bigger overall number of memory cells may arise. The memory overhead is, to some extent, bearable since today most processor designs are not restricted by silicon area but by performance and performance per energy. The increase in hit time is more critical. For any embedded processor with clock frequencies well below one GHz, the pressure on the timing is moderate. High-performance processors, on the other hand, have several levels of cache where only the first level is optimized for hit time. Here, the reconfigurability can be applied to higher-level caches more easily. In this work, we present an FPGA-based implementation of a processor that is able to reconfigure its memory-to-cache-index mapping functions freely. We show the challenge and the solution of how to reliably compare performances of non-deterministic applications and present an implementation of an optimization procedure that can evolve application-specific cache mappings.
2 Related Work
A conventional cache consists of one or multiple separate memories, called ways, that are usually word addressable. 2^l consecutive words define a cache block and 2^k blocks define a way. Whenever a processor would like to access data, it puts the corresponding memory address

A = [a_{n-1} … a_0] = [t_{m-1} … t_0][i_{k-1} … i_0][b_{l-1} … b_0][a_1 a_0]

onto the memory bus. The first-level cache checks whether it stores the requested data in one of its ways by splitting the memory address into the block offset

B = [b_{l-1} … b_0] = [a_{l+2-1} … a_2],

the block or set index

I = [i_{k-1} … i_0] = [a_{k+l+2-1} … a_{l+2}],

and a tag

T = [t_{m-1} … t_0] = [a_{m+k+l+2-1=n-1} … a_{k+l+2}].
If one of the cache blocks in the set selected by the block index bits I contains valid data and the stored tag is identical to the tag bits T, the requested data is returned to the processor. Otherwise, the cache passes the request to the next memory stage. In a non-conventional cache mapping scheme, the tag and index bits are transformed by two functions, f and g, before relaying their outputs to the cache. A common approach is, for instance, implementing g as a permutation on the bits I' ⊂ T ∪ I and setting f = id((T ∪ I) \ I'). The elegance of this approach, investigated, for instance, by Givargis [5] and Patel et al. [6], is that shuffling the cache index lines does not require changing any of the other components of a cache, preserving the hit-times and implementation simplicity. The idea of permuting index bits can be expanded into using more complex translation functions with a few layers of logic. Vandierendonck et al. introduced a layer of XOR gates for computing the index bits in a cache [7]; a small sketch contrasting modulo and XOR indexing is given below. A heuristic evolved the connection patterns between the memory address and cache index bits. Wang et al. investigated a similar approach for GPUs in [8]. The function g may also be implemented as a multi-stage function, e.g., g1 and g2. In the case of a first-stage miss using g1, the next-stage function g2 is re-evaluated at the same cache level to examine whether the requested memory cell is probably stored at a back-up location. This allows reducing the miss-rate of a two-way set-associative cache to the level of a four-way set-associative cache with negligible hardware overhead [9]. Another way of defining the memory-to-cache-index function is to use a different modulus. Diamond et al. [10] investigated prime and non-prime moduli to minimize bank conflicts of a GPU. Kim et al. [4] compared XOR-based permutation, polynomial modulus, prime modulus, and own indexing schemes for various GPGPU workloads and were able to reduce the computation time and energy consumption significantly. In our approach, we define the cache index translation function g as a multi-level circuit composed of 2-input Look-up Tables (LUTs). Such a function can compute any Boolean function of a certain size [11]. The configuration of the function is evolved by a heuristic algorithm and is application-specific. Once a good configuration has been found, the application can use it during its regular execution time without any additional optimization overhead. Our approach can be seen as an extension of the work of Vandierendonck et al. [7]. The difference is that the index translation functions of Vandierendonck et al. use a single level of XOR gates. The achievable complexity of these translation functions is lower than in our approach. We hope that with more complex translation functions, higher miss rate reductions become possible.
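The following sketch contrasts conventional modulo indexing with a single-level XOR-based index function in the spirit of [7]; the cache geometry constants are illustrative assumptions and do not describe the LEON3 configuration.

```python
BLOCK_BITS = 5      # assumed 32-byte cache blocks
INDEX_BITS = 7      # assumed 128 sets

def modulo_index(addr: int) -> int:
    """Conventional mapping: the index bits are taken directly from the address."""
    return (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)

def xor_index(addr: int) -> int:
    """Single-level XOR indexing: fold a slice of the tag bits onto the index bits."""
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag_slice = (addr >> (BLOCK_BITS + INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
    return index ^ tag_slice
```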
3 Reconfigurable Hardware Architecture
For this work, we are using the embedded open-source LEON3 processor with one level of private instruction and data caches per core [12]. An open-source L2 cache implementation was not available for LEON3 at the starting time of the project.
The reconfigurable cache mapping architecture consists of Cache Mapping Controllers (CMC), the Reconfiguration Controller (RC), and the Reconfigurable Blocks (RCB). The task of the CMCs that are located in each core is to relay reconfiguration requests from the cores to the RC and to manage the RCBs' cache mapping reconfiguration process. The RC gets and serializes reconfiguration requests from the CMCs and uses DMA to fetch reconfiguration bitstreams for the RCBs efficiently. The RCBs implement the reconfigurable cache mapping functions. Each core has a set of three active RCBs (L1:I, L1:D, and the snooping mechanism of L1:D) and at least one set of shadow RCBs for masking the reconfiguration time. LEON3's L1:I caches do not implement a coherent memory model and need, therefore, no synchronization of modified cache blocks among the cores. An RCB implements a 16 × 5 grid of 2-input LUTs embedded into a feedforward butterfly network (cf. Fig. 1(a)). While Xilinx' SRLC32E primitive can reconfigure LUTs at run-time, no such method exists in the public domain for run-time reconfiguration of the FPGA's routing. The butterfly network offers a solution to this situation. It allows a primary output to be a function computed on any of the primary inputs. Identification of an appropriate configuration of the LUTs is subject to the optimization algorithm. The LUTs can be reconfigured in four cycles (cf. Fig. 1(b)).
Fig. 1. (a) The butterfly network of an RCB with 4 × 3 LUTs. (b) The architecture of a reconfigurable LUT and the reconfiguration architecture.
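A simplified software model of an RCB is sketched below: a 16 × 5 grid of 2-input LUTs, each configured by a 4-bit truth table, whose inputs are routed by a butterfly pattern. The exact routing strides are an assumption of the sketch; the real block is an FPGA circuit, not Python.

```python
def eval_lut(truth_table: int, a: int, b: int) -> int:
    """A 2-input LUT: truth_table holds 4 bits, the inputs (a, b) select one of them."""
    return (truth_table >> ((a << 1) | b)) & 1

def rcb_index(bits, config):
    """bits: 16 address bits (0/1); config: 80 integers, one 4-bit truth table per LUT,
    row-major over the 16 x 5 grid. Returns the transformed cache index bits."""
    width, depth = 16, 5
    stage = list(bits)
    for level in range(depth):
        stride = 1 << (level % 4)                  # assumed butterfly partners: 1, 2, 4, 8, 1
        stage = [eval_lut(config[level * width + i], stage[i], stage[i ^ stride])
                 for i in range(width)]
    return stage
```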
The reconfiguration of cache mappings is done through a Linux driver. The driver allows userspace programs to load and read cache mapping configuration bitstreams and test their functionality. During the power-up of the LEON3 cores, all RCBs are initialized with the conventional modulo mapping to mimic the standard behavior of a cache. The benchmarks are metered through the Linux’ perf tool command [13]. The implementation overheads for the presented modules are shown in the left part of Table 1. The baseline configuration is a LEON3 4-core processor synthesized with direct-mapped 4 KB instruction and data caches. The implementation consumes 13 Distributed RAMs (DRAMs), where RAM 32x1D primitives are used. There is an overhead for the implementation of cache memories/controllers, as the comparators for hit/miss detection are wider. In the original
LEON3 cache implementation, not all bits of the Block RAMs (BRAM), which are used to store cache tags and blocks, are employed. This results in steady BRAM usage.

Table 1. Hardware resources used by a LEON3 core and the parameters of the LEON3 platform implementing reconfigurable cache mappings.

Generic System Configuration         | FFs   | LUTs  | DRAMs             | BRAMs
RC                                   | 176   | 557   | 13 (RAM32x1Ds)    | 0
RCB & Controllers                    | 2972  | 1558  | 80 × 6 (SRL16Es)  | 0
Cache Controllers, 4 KB, 1-way       | 969   | 2543  | 0                 | 0
  overhead                           | 39.4% | 23.8% | 0.0%              | 0.0%
Cache Tags & Memories, 4 KB, 1-way   | 46    | 47    | 0                 | 7
  overhead                           | 21.1% | 17.5% | 0.0%              | 0.0%

Parameters                           | Configuration
Clock Frequency                      | 50 MHz
Floating Point                       | Hardware/Software
Memory                               | 1 GB DRAM
I/D-TLB                              | 8 entries
Linux Kernel                         | 2.6.36.4 from Gaisler
Cache Configuration (with/without FPU hardware: 4 cores):
  L1:I & L1:D                        | 4 KB, 1-way, {16, 32} bytes/line
  Coherency                          | Snooping Protocol
The right part of Table 1 summarizes the parameters of the prototype system. The prototype is implemented on a Xilinx ML605 board equipped with a Virtex6 FPGA. The reconfigurable circuits are implemented for L1:I and L1:D caches.
4 The Algorithmic Methodology
A specification of a heuristic optimization procedure requires the definition of the goal function, the encoding model, the encoding model manipulation operators, and the optimization algorithm itself. This section describes these components.

4.1 Accurate Estimation of the Performance of an Application
Accurate estimation of the performance of a computing system depends on various direct and implicit factors. Usually, the performance is derived for an application and its data of a certain size. Factors like the interdependence with concurrently executed applications competing for the same resources and the overhead of the performance measurement system are minimized as far as possible. But even then, experiments repeated under identical conditions still may produce varying performance numbers. Hence, the characteristic performance of a candidate cache mapping has to be derived in multiple experiments, and the better candidate cache mapping has to be identified by a statistical test. The performance of caches is often measured by the miss rate. Our optimization algorithm therefore uses as the objective function the Misses Per Kilo Instructions (MPKI) metric defined as

MPKI = \frac{M}{IC} \times 1000,

where M and IC are the numbers of misses and retired instructions, respectively. As a well-performing cache mapping should excel for a wide range of potential input vectors, the candidate cache mappings are optimized using a set of 4 data vectors that are selected to be as different and as representative as possible. To be able to aggregate MPKI values for a set of input vectors, the MPKI values are normalized to the miss rate of the modulo cache mapping function. That is, for an application app, an input vector ∈ I = {i_1, i_2, …}, a candidate cache mapping function candidate and the reference modulo cache mapping function modulo, the normalized MPKI is defined as:

MPKI1 = \frac{MPKI(app,\ input\ vector,\ candidate)}{MPKI(app,\ input\ vector,\ modulo)}.

DMPKI1 and IMPKI1 represent the MPKI1 metric for the data and instruction caches of a split L1 cache. Sequences of normalized {D/I}MPKI1 values are the basis for mutual comparisons of candidate cache mappings during the optimization. Once a good candidate cache mapping has been found for an application, the mapping's performance is validated using input data vectors that have not been employed during the training.
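A direct transcription of the two metrics (assuming the usual per-1000-instructions scaling) could look like this:

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo retired instructions."""
    return misses / instructions * 1000.0

def mpki1(misses_cand, instr_cand, misses_mod, instr_mod) -> float:
    """MPKI of a candidate mapping normalized to the conventional modulo mapping."""
    return mpki(misses_cand, instr_cand) / mpki(misses_mod, instr_mod)
```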
4.2 Efficient Evaluation of the Functional Quality
To identify the central tendency of the MPKI values, an application needs to be executed together with its candidate cache mapping and an input data vector multiple times. To reduce the number of executions as the most time-consuming operation, we employ the following scheme: Assume an optimization algorithm is trying to find an efficient cache mapping for the CJPEG application and its L1:D cache. The optimization algorithm creates a new candidate cache mapping and would like to compare it to the best candidate cache mapping found so far. In the first step, CJPEG and the new candidate cache mapping are evaluated on four training pictures four times. The 16 measurements are normalized and statistically compared to 48 measurements that have been computed before for the best candidate cache mapping. If the test doesn't confirm the new candidate to be inferior, CJPEG and the new candidate cache mapping are evaluated on the four pictures four times again. Now 32 measurements of the new candidate are statistically compared to the 48 measurements of the best. If not inferior, the new candidate is measured 16 times more. If after 48 measurements still no statistical evidence for different medians of the currently best and the new candidate cache mappings can be derived, the mapping with the best median miss rate wins. If the new candidate can be identified as inferior in one of the tests, it is skipped immediately, and the optimization algorithm proceeds with creating another new candidate cache mapping. The presented performance evaluation scheme allows to significantly reduce the number of application evaluations and the overall optimization time; a sketch of this staged comparison is given below. We are using the Wilcoxon rank-sum test to identify differences in the medians. The selection of this test is based on the observation that the Shapiro-Wilk,
Kolmogorov-Smirnov, and Anderson-Darling tests report at α = 5% for 47.9%, 41.1%, and 46.7% of the cases that the distribution of MPKI1 values is not normal.
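A hedged sketch of this staged comparison using SciPy's rank-sum test is shown below; the significance level and the `measure_new` callable are assumptions of the sketch, not parameters reported in the paper.

```python
import numpy as np
from scipy.stats import ranksums

def new_candidate_wins(measure_new, best_samples, alpha=0.05):
    """measure_new(n) returns n fresh normalized MPKI1 measurements of the new mapping;
    best_samples holds the 48 measurements of the best mapping found so far."""
    new_samples = []
    for _ in range(3):                              # 16, then 32, then 48 measurements
        new_samples.extend(measure_new(16))
        _, p = ranksums(new_samples, best_samples)
        inferior = np.median(new_samples) > np.median(best_samples)  # higher MPKI1 is worse
        if p < alpha and inferior:
            return False                            # statistically inferior: skip immediately
    # no evidence for different medians after 48 runs: the lower median miss rate wins
    return np.median(new_samples) <= np.median(best_samples)
```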
4.3 The Encoding Model and the Encoding Manipulation Operator
The configuration bitstream of an RCB consists of a linear list of 80 integers. Starting with the upper-left LUT of an RCB (cf. Fig. 1), the four bits of the first integer encode the LUT's transfer function. The next integer encodes the transfer function of the second-left LUT at the top of the grid. Line by line, from left to right and from top to bottom, the 80 integers encode all 2-input LUT transfer functions of the grid. Such a list of integers is called a genotype or a candidate solution. When the integers are used to program an RCB, the hash function computed by the RCB is called a phenotype or a candidate cache mapping. The strategy of the optimization algorithm in this work is to iteratively, step by step, modify the genotype by small and random changes with the hope that, by chance, the resulting candidate hash mappings improve their miss-rates steadily. The algorithm is a directed search, proceeding gradually from good to better solutions. The manipulation algorithm for the genotype is called the mutation operator; a sketch is given below. It works directly on the 80 · 4 = 320 bits of the genotype. The mutation operator randomly samples the number of bits that should be flipped and then flips this number of bits in the genotype. The flipped bits are selected randomly. To identify a reasonable mutation rate, i.e., the amount of mutated genetic material, we recorded in initial experiments those mutation rates that created new candidate cache mappings with better miss-rates. With this data, we then configured the mutation operator to flip 1, 2, and 3 bits with probabilities of 48%, 31%, 21%, and 46%, 31%, 23% for the data and instruction caches, respectively, for the best overall convergence.
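A minimal sketch of this mutation operator (using the flip-count probabilities reported for the data cache) could be:

```python
import random

def mutate(genotype, probs=((1, 0.48), (2, 0.31), (3, 0.21))):
    """genotype: list of 80 integers, each a 4-bit LUT truth table (320 bits in total)."""
    r, n_flips, acc = random.random(), probs[-1][0], 0.0
    for n, p in probs:                      # sample how many bits to flip
        acc += p
        if r < acc:
            n_flips = n
            break
    child = list(genotype)
    for bit in random.sample(range(80 * 4), n_flips):
        child[bit // 4] ^= 1 << (bit % 4)   # flip one randomly chosen truth-table bit
    return child
```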
4.4 The Optimization Algorithm
The optimization challenge of this work belongs to the family of combinatorial problems. Such search spaces usually lack the notion of a gradient, making it difficult to define an efficient optimization procedure. We therefore tested different popular approaches, such as Simulated Annealing (SA) and Genetic Algorithms (GA), until we settled on the very basic (1 + 1) Hill Climber (HC) as the most effective optimizer for our case. A (1 + 1) HC operates on a single genotype, also called the parent, and derives an offspring solution through a mutation. The new candidate becomes the new parent if it is on par with or better than the old parent. Then the scheme repeats until the computation budget is exhausted.
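A corresponding (1 + 1) HC loop, with `evaluate` standing in for the staged MPKI1 comparison on the FPGA boards, might look as follows (an illustrative sketch, not the authors' implementation):

```python
import random

def hill_climb(evaluate, mutate, iterations=3000):
    """(1 + 1) Hill Climber: keep the offspring whenever it is on par or better."""
    parent = [random.randrange(16) for _ in range(80)]   # random initial LUT truth tables
    parent_fit = evaluate(parent)                        # lower normalized MPKI1 is better
    for _ in range(iterations):
        child = mutate(parent)
        child_fit = evaluate(child)
        if child_fit <= parent_fit:
            parent, parent_fit = child, child_fit
    return parent, parent_fit
```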
5 Experiments
All experiments are subdivided into a training and a validation phase. In the training phase, the optimization algorithm evolves a good-performing candidate cache mapping
for an application from the MiBench suite in three optimization runs. The optimization algorithm executes for 3000 iterations and uses a training set of four input data vectors. Each application has its own set of training data vectors. The best candidate cache mapping of the three runs is selected and evaluated in the consecutive validation phase on up to 10 data vectors. Each application again has its own set of test data vectors. For each data vector, the application and its best candidate cache mapping are executed 16 times. The up to 160 measurements per application are normalized, and their median behavior is presented in Table 2. Cache mappings for instruction and data caches are evolved in separate experiments. All experiments have been conducted for direct-mapped caches to establish a baseline performance. Set-associative caches show better miss rate numbers. It is, therefore, more difficult to find better custom cache mappings for the set-associative case. Because the experiments are computationally expensive, we have focused in the first step on direct-mapped caches in our experiments. It is essential to mention that once a good-performing cache mapping is evolved, no further optimization is required, and no additional optimization overhead occurs during the regular execution time of an application using an evolved cache mapping.

5.1 The Experimental System
A single optimization run can be parallelized on up to 16 LEON3 cores. Our experimental system consists, therefore, of a host computer carrying out the HC algorithm and distributing candidate cache mapping evaluations to up to four FPGA boards, each emulating a 4-core LEON3 CPU.

5.2 The Training Results
The training results of the best cache mappings are presented in the "training red.[%]" columns for the L1:D and L1:I caches in Table 2. The first observation is that the optimization process is always able to find better cache mappings than the standard mapping for data and instruction caches. The next observation is that the miss rate improvements can be different for data and instruction caches of the same application. CJPEG and DJPEG reach, for instance, 20% and 60% differences in the miss rates of the evolved data and instruction cache mappings, respectively. Finally, fetching instructions can be predicted better than fetching data cells. This is even though the L1:D cache shows higher relative miss rates than the L1:I cache for many applications and therefore, intuitively, L1:D should have a higher potential for miss-rate improvement (column: testing → cache misses → rate[%]). On the other hand, the absolute number of misses is often higher for the L1:I cache (column: testing → cache misses → ×10^6). A reason for the small miss-rate reductions could be the randomization of the virtual-to-physical-page mapping of Linux. This could lead to potentially smaller achievable miss-rate reductions.
5.3 The Generalization Results
Table 2 also shows, for the best evolved cache mappings, their measurements on unseen data. Reported are the absolute cache miss rates (testing → cache misses → ×10^6), relative cache miss rates (testing → cache misses → rate[%]), miss rate reductions compared to the conventional cache (testing → cache misses → red.[%]), and the reductions of execution times (testing → run-time → red.[%]). The first observation is that for the L1:D the miss rates excel for the CJPEG and degrade for the Rijndael-DE, Rijndael-EN, and the CRC32 benchmarks. For other benchmarks, the L1:D miss-rates are around the numbers of the conventional caches. Because in a single experiment only one of the two caches of a LEON3 has been optimized, the overall magnitude of the execution time reduction or degradation is smaller than the magnitude of the miss rate reduction or degradation, i.e., reducing the miss-rate of CJPEG by 35% reduces the execution time only by 16.8%. Degrading the miss rate of Rijndael-DE by 48.9% degrades the execution time only by 12.5%. For the L1:I cache the test miss-rates could be improved much more consistently than for the L1:D cache. However, most execution times are around the performance of the conventional cache, with the notable exceptions of the Rijndael-DE, Rijndael-EN, and the CRC32 benchmarks. There, the execution times could be improved by 26.3%, 18.8%, and 37.1%, respectively.

Table 2. Evolution of custom cache mappings for a 4 KB direct-mapped cache. Normalized median training and test miss-rate reductions (red.[%]) of the best evolved cache mapping of an application. Median absolute and relative numbers of miss-rates during testing (×10^6, rate[%]). Median run-time reductions during testing.
Application  | L1:D training red.[%] | testing misses ×10^6 | testing rate[%] | testing red.[%] | run-time red.[%] | L1:I training red.[%] | testing misses ×10^6 | testing rate[%] | testing red.[%] | run-time red.[%]
CJPEG        | 32.7 | 9.5  | 41.1 | 35.2  | 16.8  | 12.4 | 2.2  | 1.9  | −0.6 | −1.6
DJPEG        | 8.7  | 3.6  | 3.7  | 32.0  | 3.3   | 2.1  | 64.5 | 2.7  | 5.2  | 67.7
FFT          | 10.1 | 0.7  | 10.2 | 5.0   | 0.2   | 4.6  | 6.9  | 16.3 | 4.7  | 2.6
DBLOWFISH    | 6.5  | 0.1  | 13.6 | 4.8   | 2.0   | 30.7 | 0.01 | 0.5  | 19.6 | 0.0
PATRICIA     | 3.1  | 10.7 | 3.9  | 21.4  | 6.4   | 0.2  | 3.0  | 31.0 | 25.7 | 3.3
DIJKSTRA     | 0.6  | 2.3  | 22.4 | 0.4   | 0.9   | 13.6 | 2.2  | 4.5  | 10.6 | 9.7
RIJNDAEL-EN  | 2.7  | 7.8  | 41.2 | −12   | −3.1  | 15.6 | 29.2 | 36.0 | 15.7 | 18.8
RIJNDAEL-DE  | 6.2  | 7.6  | 42.3 | −48.9 | −12.5 | 14.6 | 24.2 | 31.6 | 14.5 | 26.3
CRC32        | 4.5  | 4.7  | 9.3  | −16.6 | −2.1  | 33.3 | 32.2 | 16.0 | 33.3 | 37.1
6 Conclusion
The trend for more cores on a single die challenges the conventional processor design. Applications with fundamentally different memory access behaviors interfere with each other making it difficult for the cache logic to provide a
uniform, coherent, and efficient memory model. Shared resources allow for information leaks among unprivileged tasks. The memory bottleneck and caches are again becoming one of the popular research and development fields of processor design. In this work, we have investigated the idea that a reconfigurable memory-address-to-cache-index mapping that is tailored by a search algorithm to a specific application may outperform the conventional modulo mapping. We have extended the instruction and data caches of the LEON3 with reconfigurable address mapping functions, interfaced them with the Linux OS kernel, adapted the Linux scheduler, and evolved application-specific cache mappings for the L1:I and L1:D caches of nine MiBench applications. For most of the applications, the L1:I miss rates could be improved by more than 10%. Most of the execution times for L1:I and L1:D lie, however, around the performance of systems with a conventional cache. Considerable run-time improvements have been achieved for the Rijndael-DE, Rijndael-EN, and the CRC32 benchmarks using the L1:I cache. More work is required to investigate whether optimizing cache mappings for virtually-addressed L1:D caches would lead to better results. Because of the prolonged optimization times, only direct-mapped caches have been investigated in this work to establish a baseline performance.
References 1. Corporation, A.: Amdahl 470V/6 Machine Reference Manual (1976) 2. Intel: Improving real-time performance by utilizing cache allocation technology. Technical report, Intel (2015) 3. Intel: Intel 64 and IA-32 architectures software developer’s manual volume 3B: system programming guide, Part 2. Technical report, Intel (2015) 4. Kim, K.Y., Baek, W.: Quantifying the performance and energy efficiency of advanced cache indexing for gpgpu computing. Microprocessors and Microsystems (2016). http://www.sciencedirect.com/science/article/pii/S0141933116000053 5. Givargis, T.: Improved indexing for cache miss reduction in embedded systems. In: Proceedings Design Automation Conference (DAC), pp. 875–880. IEEE (2003) 6. Patel, K., Macii, E., Benini, L., Poncino, M.: Reducing cache misses by applicationspecific re-configurable indexing. In: Proceedings of the 2004 IEEE/ACM International Conference on Computer-aided Design (ICCAD), pp. 125–130. IEEE Computer Society (2004) 7. Vandierendonck, H., Manet, P., Legat, J.: Application-specific reconfigurable XORindexing to eliminate cache conflict misses. In: Proceedings Design, Automation and Test in Europe (DATE), pp. 1–6. IEEE (2006) 8. Wang, B., Liu, Z., Wang, X., Yu, W.: Eliminating intra-warp conflict misses in GPU. In: Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 689–694. EDA Consortium (2015) 9. Seznec, A., Bodin, F.: Skewed associative caches. Technical report 1655, INRIA (1992) 10. Diamond, J.R., Fussell, D.S., Keckler, S.W.: Arbitrary modulus indexing. In: Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 140–152. IEEE Computer Society (2014)
11. Kaufmann, P., Plessl, C., Platzner, M.: EvoCaches: Application-specific Adaptation of Cache Mappings, pp. 11–18. IEEE CS (2009) 12. Aeroflex Gaisler: Grlib. http://www.gaisler.com/products/grlib/grlib.pdf 13. Ho, N., Kaufmann, P., Platzner, M.: A hardware/software infrastructure for performance monitoring on LEON3 multicore platforms. In: Proceedings International Conference on Field Programmable Logic and Applications (FPL) (2014)
Assessment of Environmental and Occupational Stresses on Physiological and Genetic Profiles of Sample Population Jasbir Kaur Chandani, Niketa Gandhi, and Sanjay Deshmukh(&) Department of Life Sciences, University of Mumbai, Santacruz (E), Mumbai 400 098, India [email protected], [email protected], [email protected]
Abstract. Environmental and occupational stresses are one of the most prevalent health hazards in today's workplace. Stressful working conditions have been linked from time to time to physical illness, low productivity, absenteeism, and increased rates of accidents. They have also been found associated with telomere length attrition and thus aging. This paper discusses the results achieved from a sample study of size 60 from two cities of India based on different industries such as road construction, sawmill, tire remolding, hotel and bakery. The profiles of samples are discussed in detail. Further, the effects of various parameters like smoking/alcoholism, increasing age, number of working years, working temperature, noise levels, light intensity, humidity levels on telomere length are discussed with the results achieved from the study. The study concludes with some strong correlations between these parameters and the telomere length.

Keywords: Aging · Environmental stress · Occupational stress · Telomere length
1 Introduction

For generations, various studies on occupational and environmental exposures have provided information about the profound importance of strengthening our knowledge of the etiology of many diseases. Occupational stress is the reaction people have when exposed to work stress over and above their handling potential [1]. Occupational and environmental exposures have become a topic of major concern in developing countries like India, where exposure levels are likely to be higher as they have less strict rules and regulations than developed countries. The intensity or duration of exposures to numerous substances in the workplace can lead to higher chances of cancer in the exposed workers. In a recent study conducted by Kurt, Elsa et al. [2] on various species of birds and mammals having varied life spans and body sizes, it was found that the rate of telomere shortening has a specific role in forecasting the life span of species [2].
In a study conducted on a large older cohort of European ancestry to find associations between genetically determined telomere length and age-associated diseases, telomere length was found linked to increased cancer risk and a lower risk of coronary heart disease [3]. Telomeres, the nucleoprotein complexes, are found to shorten with each cell division and with increasing age [4, 5]. Telomerase, a reverse transcriptase enzyme, adds repeats to these shortened telomeres and is found to reverse this process to some extent [6]. Short telomere length has been found linked to increasing age, weight gain, desk-bound habits and smoking from time to time [7, 8]. Telomere length gets shortened with increasing age as a normal cellular process [9]. A human fibroblast cell enters senescence after 50 to 70 cell divisions [10]. Telomere length in humans mostly decreases at the rate of 24.8 to 27.7 bp/year [11]. Telomere length is found to affect the aging process to a certain level [12-14]. This study has made an attempt to understand the effect of various parameters on telomere length from a sample size of 60 from different industries in two cities of India. The results found prove some strong correlations between the parameters and telomere length that can be used as a basis in the future to avoid such situations affecting the health of similar populations working under similar environmental and occupational stresses.
2 Profile of Sample Population

2.1 Sample Population Details

The sample population includes employees from the road construction industry at Khopoli (Raigad District), Maharashtra state, India, and employees from four industries of Indore city of Madhya Pradesh state of India, namely the sawmill industry, tire remoulding industry, hotel industry and bakery industry. The sample size of the population was 60 for this study (Table 1).

2.2 Profile of Sample Population (Occupation-Wise)

In general, the sample population included people from the above-mentioned industries. The sample population included: laborers, supervisors, technicians, managers, owners, helpers, drivers, mechanics, etc.

Table 1. Profile of sample population - occupation-wise

Sr. no. | Industry                   | Sample size | % of sample size
1       | Road construction Industry | 12          | 20%
2       | Sawmill industry           | 22          | 36.67%
3       | Tire remoulding Industry   | 12          | 20%
4       | Hotel Industry             | 06          | 10%
5       | Bakery                     | 08          | 13.33%
        | Total                      | 60          | 100%

The sample population is selected on the basis
of the availability of employees from different industries and in the proportion of workers working in the respective industries.

2.3 Profile of Sample Population - Control/Experimental
Out of the total population surveyed, 85% are experimental subjects (site workers, to be specific) and 15% control subjects (office workers/off-site workers). A comprehensive study was conducted to study the effect of environmental factors on the sample population. 85% of the sample thus selected were experimental subjects and 15% control subjects, as office workers/off-site workers are less exposed to environmental factors. 83% of the sample population belongs to the age group between 25-55 years of age, because 6-7% belong to a lower age group and have less of the experience required for the study; on the contrary, the 55-65-year age group is more affected by other parameters due to age constraints.

2.4 Demographics of Sample Population
The sample population included 87% males and 13% females. In general, all females are laborers - weight (wood) lifters in the sawmill industry. Men are involved in almost all kinds of occupations. The population of males and females is selected on the basis of the availability of the sample population. Gender has no relevant effect on telomere loss [11]. Since there are no minimum qualification criteria for the selected industries, the sample population includes both educated and uneducated people. Almost 65% of the people working are either uneducated or below senior secondary level and 45% are educated. More than 50% of the uneducated people are experimental subjects and involved in on-site work. Monthly income, marital status and family size of the sample population were also found to have an impact on human health, so these points were also considered while collecting the sample data.

2.5 Physiological Parameters of Sample Population
According to the standard BMI scale: BMI below 18.5 indicates underweight, BMI between 18.5 and 24.9 indicates normal weight, BMI between 25 and 29.9 indicates overweight, and BMI between 30 and 39.9 indicates obesity [15]. According to previous studies, an increase in BMI is linked to shortened telomere length [16]. Obesity is also associated with excessive telomere length attrition. Obese women have a significantly shorter telomere length than lean women of the same age [9]. The increase in the rate of telomere attrition in obese individuals was equivalent to almost 9 years of one's life. It was observed that almost 50% of the population was in the normal BMI range. About 6% of the total population was in the obese range and none of them had a telomere length over 50 copies/µL. BMI was found directly related to telomere length, contrary to results in previous studies [17] (Figs. 1 and 2).
Fig. 1. (a) Provides positive correlation between BMI and Telomere length, while (b) shows correlation between Systolic Blood pressure (SBP) and TL. It is observed that with increase in SBP, TL decreases.
Fig. 2. (a) Shows linear correlation between Diastolic Blood Pressure (DBP) and Telomere length. It is observed that with increase in DBP, TL increases. (b) Highlights correlation between Pulse Pressure and TL. It is observed that Pulse pressure is inversely proportional to TL.
Telomere length may be shorter in individuals with high blood pressure than in normal individuals [18]. Telomere length increased with increasing diastolic blood pressure (DBP) but was found to decrease with an increase in systolic blood pressure (SBP) [20]. According to studies, telomere length is found to be hereditary and negatively correlated with pulse pressure [19]. For a normal reading, the blood pressure needs to show a top number (systolic pressure) that is between 90-120 and a bottom number (diastolic pressure) that is between 60-80 [20]. The normal range for the pulse pressure is between 30 and 50 mm Hg [21]. It is observed that with an increase in SBP, TL decreases, and with an increase in DBP, TL increases, which is in line with previous studies [19]. It is also proved that pulse pressure is inversely proportional to TL, as shown in a previous study [19].
3 Results - Telomere Length 3.1
Telomere Length Range (Copies/µL) of Sample Population
It is observed that in road construction no one in the population has a telomere length greater than 40, and even the control subjects were in the lowest telomere length range (0–10). This may be because the workers handle and spread asphalt at temperatures of 100–160 °C; here temperature is the main environmental stressor. Almost 10% of the population
had a telomere length greater than 40 in the sawmill industry, and 35% of the population had a telomere length greater than 40 in the tyre remoulding industry. In the bakery, no one in the population has a telomere length greater than 40, while the controls were in the telomere length range greater than 40. This may be due to the high temperature of about 250 °C used for baking; even outside the oven the workers have to work at 70–80 °C, so here again temperature is the main environmental stressor. In the hotel industry, more than 15% of the population had a telomere length greater than 40, while almost all controls had a telomere length less than 40. Of all five industries, road construction workers had the lowest telomere length range, followed by the bakery industry. 3.2
Effect of Smoking/Alcoholism on Telomere Length
Fig. 3. (a) Shows the smoking status of the sample population. (b) Shows the number of alcoholics in the sample population. (c) Implies that the higher the cigarette dosage, the shorter the telomere length, whereas (d) shows that the higher the alcohol dosage, the shorter the telomere length.
According to previous studies, smoking also leads to accelerated telomere shortening [22]: the higher the cigarette dosage, the shorter the telomere length [22]. One study found that telomere length was lost at a rate of 25.7 to 27.7 base pairs per year, with an additional 5 bp lost per year of daily cigarette smoking [9]. Smoking one pack of cigarettes daily for 40 years is therefore equivalent to more than 7 years of one's life (40 years × ~5 bp ≈ 200 bp, i.e., roughly 200/27 ≈ 7.4 years of normal attrition) [9]. Thus smoking increases oxidative stress, which leads to early shortening of telomere length and may speed up the aging process (Fig. 3). According to studies, alcoholic patients were found to have short telomere lengths, which places them in the higher-risk zone for various diseases like heart disease,
diabetes, etc. [23]. It is observed that there is a very good correlation between telomere length and both smoking status and alcoholism status: the higher the dosage of cigarettes and alcohol, the shorter the telomere length. 3.3
Effect of Increasing Age on Telomere Length
Fig. 4. Reveals that telomere length decreases with the increasing age of the sample subjects.
According to studies, environmental and occupational exposures play an important role in the rate of telomere length attrition [24], and telomere length is found to decrease with increasing age [24]. It is observed that telomere length decreases as the age of the sample subjects increases, in accordance with previous studies [24] (Fig. 4). 3.4
Effect of Number of Working Years on Telomere Length
Fig. 5. Shows the correlation between the number of working years and telomere length.
It is observed that the greater the number of working years, the shorter the telomere length; hence telomere length is inversely proportional to the number of working years, in accordance with previous studies [25] (Fig. 5). According to studies, the more the working years, the greater the cumulative effect of environmental stressors and the shorter the telomere length [25]; for this reason 85% of the selected population have more than 5 years of experience, so that the results are likely to be more prominent.
3.5
Effect of Working Temperature on Telomere Length
Fig. 6. Reveals that telomere length decreases with an increase in working temperature.
More than 80% of the population works at temperatures above 40 °C. The results are in line with previous studies [26]: telomere length is observed to decrease with an increase in temperature (Fig. 6). 3.6
Effect of Noise Levels on Telomere Length
Fig. 7. Shows telomere length decreases with increase in noise level.
• Noise exposure standard: the Occupational Health and Safety Regulations 2017 (OHS Regulations) set a noise exposure standard for workplaces [27], which consists of two parts: an average of 85 dB over an 8-h period and a maximum allowed noise level of 140 dB [27].
• It is found that more than 30% of the population works at noise levels above the 85 dB standard.
• Workers in the road construction, sawmill and tyre remoulding industries mostly work at noise levels higher than the recommended standard.
• As found in previous studies, telomere length decreases with an increase in noise level (Fig. 7). 3.7
Effect of Light Intensity on Telomere Length
It is found that the light intensity in the bakery industry is lower than the recommended illuminance levels. It is observed that more than 15% of the population works at illuminance levels above 300 lux. No specific relation is observed between illuminance level and telomere length. 3.8
Effect of Humidity Levels on Telomere Length
Fig. 8. Shows a partial inverse relation between humidity and telomere length.
It is observed that about 8% of workers work at humidity levels lower than recommended (Fig. 8). A partial inverse relation between humidity and telomere length is observed, though it is not very prominent.
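The relations reported in Figs. 1–8 are correlation analyses between an exposure variable and telomere length. The snippet below is a minimal SciPy sketch of how such a relation can be tested (it is not the authors' code, and the two arrays are purely hypothetical placeholders, not data from this study):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical placeholder values, only to illustrate the call signatures.
humidity = np.array([35, 40, 45, 50, 55, 60, 65, 70])   # relative humidity (%)
telomere = np.array([42, 40, 38, 37, 35, 33, 30, 28])   # telomere length (copies/µL)

r, p = pearsonr(humidity, telomere)          # linear correlation
rho, p_rank = spearmanr(humidity, telomere)  # rank (monotonic) correlation
print(f"Pearson r = {r:.2f} (p = {p:.3f}), Spearman rho = {rho:.2f} (p = {p_rank:.3f})")
```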
4 Conclusion In summary, telomere length was found to both shorten and lengthen under different environmental and occupational exposures in this study. Telomere length decreases with the increasing age of the sample subjects, in accordance with previous studies [24]. The higher the dosage of cigarettes and alcohol, the shorter the telomere length. The greater the number of working years, the shorter the telomere length; hence telomere length is inversely proportional to the number of working years, in accordance with previous studies. Telomere length decreases with an increase in temperature and with increasing noise level, so the effect of environmental stressors is quite prominent. It is observed that TL decreases with an increase in SBP and increases with an increase in DBP, in line with previous studies [19], and pulse pressure is inversely proportional to TL. Contrary to the results of previous studies [17, 26], BMI was directly related to telomere length. In short, telomere length can be used as a biomarker of aging, which in
the future can help prevent the occurrence of various aging-related diseases and will help to create a healthy work environment. Ethical Approval: All procedures performed in this study involving human participants were in accordance with the ethical standards of the University of Mumbai, the institute and the research committee of the Department of Life Sciences where the research was carried out as part of the Ph.D. work. Further, informed consent was obtained from all individual participants included in the study.
References 1. https://www.who.int/occupational_health/topics/stressatwp/en/ 2. Kurt, W., Elsa, V., Eva, M.-N., Sanpera, C., Blasco, M.A.: Telomere shortening rate predicts species life span. PNAS 116(30), 15122–15127 (2019). https://doi.org/10.1073/pnas. 1902452116 3. Chia-Ling, K., Pilling, L.C., Kuchel, G.A., Ferrucci, L., Melzer, D.: Telomere length and aging-related outcomes in humans: a Mendelian randomization study in 261,000 older particpants (2019). https://doi.org/10.1111/acel.13017 4. Frenck Jr., R.W., Blackburn, E.H., Shannon, K.M.: The rate of telomere sequence loss in human leukocytes varies with age. Proc. Nat. Acad. Sci. USA 95(10), 5607–5610 (1998) 5. Svenson, U., Nordfjäll, K., Baird, D., Roger, L., Osterman, P., et al.: Blood cell telomere length is a dynamic feature. PLoS ONE 6(6), e21485 (2011) 6. Chan, S.W., Blackburn, E.H.: New ways not to make ends meet: telomerase, DNA damage proteins and heterochromatin. Oncogene 21(4), 553–563 (2002) 7. Shiels, P.G., Mc Glynn, L.M., Mac Intyre, A., Johnson, P.C., Batty, G.D., Burns, H., Cavanagh, J., Deans, K.A., Ford, I., Mc Connachie, A., Mc Ginty, A., Mc Lean, J.S., Millar, K., Sattar, N., Tannahill, C., Velupillai, Y.N., Packard, C.J.: Accelerated telomere attrition is associated with relative household income, diet and inflammation in the p SoBid cohort. PLoS ONE 6(7), e22521 (2011) 8. Fyhrquist, F., Saijonmaa, O.: Telomere length and cardiovascular aging. Ann. Med. 44 (Suppl. 1), s138–s142 (2012) 9. Valdes, A.M., Andrew, T., Gardner, J.P., Kimura, M., Oelsner, E., Cherkas, L.F., Aviv, A., Spector, T.D.: Obesity, cigarette smoking, and telomere length in women. Lancet 366(9486), 662–664 (2005) 10. von Zglinicki, T., Martin-Ruiz, C.M.: Telomeres as biomarkers for ageing and age-related diseases. Curr. Mol. Med. 5(2), 197–203 (2005) 11. Olovnikov, A.M.: A theory of Marginotomy. The incomplete copying of template margin in enzymic synthesis of polynucleotide’s and biological significance of the phenomenon. J. Theor. Biol. 41(1), 181–190 (1973) 12. Cawthon, R.M., Smith, K.R., O’Brien, E., Sivatchenko, A., Kerber, R.A.: Association between telomere length in blood and mortality in people aged 60 years or older. Lancet 361 (9355), 393–395 (2003) 13. Farzaneh-Far, R., Cawthon, R.M., Na, B., Browner, W.S., Schiller, N.B., Whooley, M.A.: Prognostic value of leukocyte telomere length in patients with stable coronary artery disease: data from the heart and soul study. Arterioscleriosis Thrombosis Vasc. Biol. 28(7), 1379– 1384 (2008)
14. Yang, Z., Huang, X., Jiang, H., Zhang, Y., Liu, H., Qin, C., Eisner, G.M., Jose, P.A., Rudolph, L., Ju, Z.: Short telomeres and prognosis of hypertension in a chinese population. Hypertension 53(4), 639–645 (2009) 15. https://www.nhs.uk/common-health-questions/lifestyle/what-is-the-body-mass-index-bmi/ 16. Gielen, M., Hageman, G.J., Antoniou, E.E., Nordfjall, K., Mangino, M., Balasubramanyam, M., De Meyer, T., Hendricks, A.E., Giltay, E.J., Hunt, S.C., et al.: Body mass index is negatively associated with telomere length: a collaborative cross-sectional meta-analysis of 87 observational studies. Am. J. Clin. Nutr. 108(3), 453–475 (2018) 17. Greider, C.W., Blackburn, E.H.: Identification of a specific telomere terminal transferase activity in Tetrahymena extracts. Cell 43(2 Pt 1), 405–413 (1985) 18. Tellechea, M.L., Pirola, C.J.: The impact of hypertension on leukocyte telomere length: a systematic review and meta-analysis of human studies. J. Hum. Hypertens. 31(2), 99–105 (2017). https://doi.org/10.1038/jhh.2016.45 19. Jeanclos, E., Schork, N.J., Kyvik, K.O., Kimura, M., Skurnick, J.H., Aviv, A.: Telomere length inversely correlates with pulse pressure and is highly familiar. Hypertension 36, 195– 200 (2000). https://doi.org/10.1161/01.HYP.36.2.195 20. https://www.mayoclinic.org/diseases-conditions/high-blood-pressure/in-depth/bloodpressure/art-20050982 21. https://www.healthline.com/health/pulse-pressure 22. Song, Z., von Figura, G., Liu, Y., Kraus, J.M., Torrice, C., Dillon, P., Rudolph-Watabe, M., Ju, Z., Kestler, H.A., Sanoff, H., Lenhard Rudolph, K.: Lifestyle impacts on the agingassociated expression of biomarkers of DNA damage and telomere dysfunction in human blood. Aging Cell 9(4), 607–615 (2010) 23. Presented at the 40th Annual Scientific Meeting of the Research Society on Alcoholism, Denver, CO, USA, 24–28 June 2017. https://www.emedevents.com/c/medical-conferences2017/research-society-on-alcoholism-rsa-40th-annual-scientific-meeting. Accessed 5 May 2018 24. Shammas, M.A.: Telomeres, lifestyle, cancer, and aging. Curr. Opinion Clin. Nutr. Metab. Care 14(1), 28–34 (2011) 25. Njajou, O.T., Hsueh, W.C., Blackburn, E.H., Newman, A.B., Wu, S.H., Li, R., Simonsick, E.M., Harris, T.M., Cummings, S.R., Cawthon, R.M.: Association between telomere length, specific causes of death, and years of healthy life in health, aging, and body composition, a population-based cohort study. J. Gerontol. A Biol. Sci. Med. Sci. 64, 860–864 (2009) 26. Hoxha, M., Dioni, L., Bonzini, M., Pesatori, A.C., Fustinoni, S., Cavallo, D., Carugno, M., Albetti, B., Marinelli, B., Schwartz, J., Bertazzi, P.A., Baccarelli, A.: Association between leukocyte telomere shortening and exposure to traffic pollution: a cross-sectional study on traffic officers and indoor office workers. Environ. Health 8, 41 (2009) 27. https://www.worksafe.vic.gov.au/noise-safety-basics
Deep Convolution Neural Network-Based Feature Learning Model for EEG Based Driver Alert/Drowsy State Detection Prabhavathi C. Nissimagoudar(&), Anilkumar V. Nandi, and H. M. Gireesha B.V.B. College of Engineering and Technology, Hubballi, Karnataka, India [email protected]
Abstract. Driver state detection is an important feature of Advanced Driver Assistance Systems (ADAS) in automotive. Accurate determination of the driver's alert/drowsy condition avoids accidents and offers safety to both driver and vehicle. The electroencephalogram (EEG) based method of determining the driver's alert/drowsy condition is the most accurate proven direct measure of driver state. Researchers have long attempted to extract significant features representing the driver's state; however, extracting and selecting features from the large number available is very difficult for EEG based systems. In this paper, a representation learning model using a deep convolution neural network (DCNN) is proposed that can automatically learn features from labeled data. The model was used to extract and learn features from publicly available EEG data sets and was evaluated on different classification tasks. The results showed that features extracted using the DCNN based feature learning model proved better than conventional Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) techniques in terms of significant feature extraction, data dimension reduction, and classification accuracy. The features also converge quickly and reduce training times for classifiers. The model can therefore be applied very effectively to automotive applications where speed is the criterion.
Keywords: EEG · Driver state · Feature extraction · Convolution neural network · Autoencoders
1 Introduction The American Sleep Foundation has reported that about 50% of U.S. adult drivers admit to consistently getting behind the wheel while drowsy, and the Governors Highway Safety Association of the United States has reported that an estimated 5,000 people died in 2015 in crashes related to drowsy driving. Driver state detection is therefore a very important feature in automotive, serving as an early warning system to avoid accidents. Various methods for determining driver alertness are found in the literature, but the EEG based method, being a direct indication of the driver's cognitive state, is more accurate. EEG is a representation of the neural activity happening inside the brain; the information lies in the frequency and amplitude of the signal. The EEG signal comprises different rhythms,
namely delta (0–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz) and gamma (30–100 Hz). These rhythms represent different states of brain activity, and analyzing them to extract features is crucial in determining the brain state. EEG signal processing and analysis comprises acquiring the EEG signals, preprocessing to remove artifacts, feature extraction to retain significant features, and classification for the decision. EEG data is acquired from physiological sensors (electrodes) placed according to the international 10–20 standard; there are also publicly available data sets such as the Montreal Archive of Sleep Studies (MASS) and Sleep-EDF. The raw EEG acquired from electrodes is usually contaminated with different noises, so the next step is to pre-process the signal by removing unwanted artifacts and power-line noise using suitable filters. Rhythm isolation follows, wherein the significant frequency bands of the EEG signal are separated using filtering techniques or transforms such as wavelet transforms and the FFT. Various statistical approaches are used to extract features in the time domain, frequency domain and time-frequency domain, and nonlinear measures such as the fractal dimension are also used in the literature. These features are then used by machine learning classifiers for making decisions about brain conditions [18–20]. Feature extraction is thus very crucial for decision making. Researchers have explored many approaches to extract features from EEG signals for different purposes, such as diagnosing brain-related diseases, sleep analysis and determining different brain conditions. Traditional methods use hand-engineered statistical feature extraction, feature elimination and reduction techniques. Mohammed Diykh et al. used time-domain statistical features with structural graph similarity and K-means for classifying six sleep stages [3, 15]. Farhan Riaz et al. used empirical mode decomposition (EMD), an approach for time-frequency analysis of non-stationary signals [4, 16]. Ning Wang et al. proposed an elimination-based feature selection method which reduces redundant and noisy points by providing a lower-dimensional and independent feature vector [5, 13]. Robert Jenke et al. showed that feature selection using multivariate methods is more accurate than univariate methods and discussed advanced feature extraction techniques which have an advantage over the commonly used spectral bands [6]. These commonly used methods mainly extract features from the time, frequency and time-frequency domains and then apply feature selection and dimensionality reduction techniques [17]. It is very complex and tedious to select significant features and to apply data reduction techniques. Some methods use non-linear techniques, wavelet transforms [7, 8] and empirical techniques to extract and select features, but these are complex and time-consuming. Recently, deep learning techniques have become a research focus and have proved efficient for image and video analysis applications. Yaguang Jia et al. proposed a deep belief neural network stacked from restricted Boltzmann machines to extract features from steady-state motion visual evoked potential signals [8, 14]. Mehdi Hajinoroozi et al. proposed a deep learning solution for predicting driver state using a channel-wise convolution neural network and its restricted Boltzmann machine variant.
They tested the performance for raw EEG and Independent Component Analysis (ICA) transformed data and proved their model works better for ICA transformed data [9, 12].
Although hand-engineered feature extraction methods have proved efficient in accurately determining driver state using EEG signals, there is a need for an automatic and independent feature learning model. In this paper, we propose an independent feature learning model for determining driver alertness levels based on deep convolution neural networks. The model learns features from high-dimensional data on its own and can be used with classifiers effectively, with greater decision accuracy and speed. The structure of the paper is as follows: Sect. 2 discusses the methodology of the deep convolution neural network-based feature learning model; Sect. 3 details the experiments and results, covering the datasets used, the parameters of the model and the results of various classifiers; Sect. 4 presents observations and discussion on the model parameters and classifier results; Sect. 5 concludes the paper.
2 Methodology The proposed model uses a supervised method of learning features, with the data labeled as wake and drowsy. It obtains significant features describing the driver states from the data sets by applying feature extraction and dimension reduction techniques. The training stage of the model in Fig. 1 uses raw wake and drowsy data to train a deep convolution neural network. In the testing phase, the test data is input to the model to obtain the relevant features, which are then given to a dense neural network for classification. The model was experimented with different numbers of convolution layers with varying kernels and kernel sizes; different max-pooling layers and dropouts were also tried. The results are discussed in Sect. 3. The feature extraction and reduction performed by the CNN follow the autoencoder-style framework of the model.
Fig. 1. Supervised method of feature learning model
2.1
Deep-Neural-Network Based Supervised Feature Learning Model
EEG signal data is high dimensional and its dimensions are not independent. Therefore, to extract significant features, a convolution neural network is used, which applies multiple convolution kernels to obtain a number of local features. A CNN estimates local features using receptive fields and parameter sharing. Further, the feature size can be reduced by pooling (average or max), which is a downsampling process, and by the dropout procedure. This process of convolution and downsampling can be repeated before arriving at the final feature size. Our supervised feature learning model is as follows: the representation learning model extracts features using a CNN, repeating multiple convolution layers with different filters (kernels) and pooling until a pre-set feature size is reached. The model structure is shown in Fig. 2.
Fig. 2. Schematic of feature learning model
The model includes two stages: 1) two convolution layers with different filter sizes, and 2) a max pooling layer for downsampling. Each convolution layer performs three operations: 1D convolution with different filters, batch normalization, and activation with rectified linear units (ReLU). For downsampling, max pooling with a reduced filter size is used, and dropout is further used to reduce the feature size. The feature matrix is then converted to 1D by flattening and reshaping before being given to the dense neural network for classification. Suppose the input to the model is 1D EEG data; for multiple convolution layers with different kernels, the feature maps are calculated as follows:
cm_i = g(w_ci ∗ x + b_ci), where x is the input, cm_i is the i-th feature map, w_ci and b_ci are the filters/weights and biases, and ∗ is the convolution operation. To reduce the feature dimension, a pooling layer is used: it downsamples each feature map into a reduced feature map by sliding a window of length l over it. Here max pooling is used, wherein the maximum value within the window is retained; this is represented as pm_i = maxpooling(cm_i, l). Multiple convolution and pooling operations result in feature maps of reduced dimension. A reshape (flatten) operation then converts the feature matrix into a 1D feature vector v, and a fully connected layer synthesizes the information from the pooled feature maps. The output of the fully connected layer is translated into the coded feature f = g(w_v · v + b_f), where f is the coded feature and w_v and b_f are the weight and bias of the fully connected layer. The activation function used at the output layer is sigmoid, and the ReLU activation function is used for all hidden layers. The loss function used is binary cross-entropy, and it is minimized using the ADAM optimizer.
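A minimal Keras sketch of the layer stack described above is given below. It is an illustrative reconstruction, not the authors' implementation: the filter counts, dropout, weight-decay value and the 12,000-sample input length are taken from Sect. 3, while the dense-layer widths follow architecture 3 of Table 1.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_feature_learning_model(input_len=12000, coding_size=64):
    """Sketch of the supervised DCNN feature-learning model: two Conv1D blocks
    (conv -> batch norm -> ReLU), max pooling, flattening and a dense head."""
    inputs = tf.keras.Input(shape=(input_len, 1))
    # First convolution block: 32 kernels, size 3, stride 2, 10% dropout,
    # L2 weight decay (lambda assumed to be 1e-3, as in Sect. 3.3).
    x = layers.Conv1D(32, kernel_size=3, strides=2,
                      kernel_regularizer=tf.keras.regularizers.l2(1e-3))(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(0.1)(x)
    # Second convolution block: 64 kernels, size 3, then max pooling (pool size 2).
    x = layers.Conv1D(64, kernel_size=3)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Flatten()(x)
    # Dense head: the coded feature layer plus two hidden layers of 128 units,
    # following the "64, 128, 128" setting of architecture 3 in Table 1.
    features = layers.Dense(coding_size, activation="relu", name="features")(x)
    x = layers.Dense(128, activation="relu")(features)
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_feature_learning_model()
# After training, the learned coded features can be read from the named layer:
encoder = tf.keras.Model(model.input, model.get_layer("features").output)
```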
3 Experimentation and Results The paper focuses on the supervised feature learning method using EEG signals for detecting driver alertness. The experimentation is done as follows: 1) supervised learning of features using alert and drowsy EEG data sets; 2) classification with a dense neural network using these features and verification of the accuracy. 3.1
Data Sets
The data set used for experimentation is the sleep EEG data available online from the Physionet database [11]. The sleep data is in the European data format (edf) and consists of various sleep stages recorded by Bob Kemp at the Sleep Centre, MCH-Westeinde Hospital, Den Haag, The Netherlands. The EEG data comprises 61 polysomnograms (PSGs) stored in *PSG.edf files, with the related annotations in *Hypnogram.edf files. The *PSG.edf files are whole-night polysomnographic sleep EEG recordings. The data includes EEG signals from the Fpz-Cz and Pz-Oz electrode locations sampled at 100 Hz, with sleep stages annotated as W, R (REM), 1, 2, 3, 4, M (movement time) and ? (not scored). The annotations follow the 1968 Rechtschaffen and Kales manual [10]. The data for the wake (W) and initial sleep or drowsy (S1) stages, along with the annotations, is extracted using the Polyman tool.
The *PSG.edf and *Hypnogram.edf files for the wake stage W and drowsy stage S1 are shown in Fig. 3. We took 150 segments of 2 min each of sleep-1 and wake stage data; as the sampling frequency is 100 Hz, each segment contains 12,000 samples.
Fig. 3. EEG data for sleep stage S1 and wake stage W
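As an illustration of how the Sleep-EDF recordings and their hypnogram annotations can be read and cut into the 2-min W/S1 epochs described above, a sketch using the MNE-Python library is given below. The file names are placeholders from the Physionet sleep-edf database, and this is not the Polyman-based workflow the authors used.

```python
import mne

# Placeholder file names from the Physionet sleep-edf database [11].
raw = mne.io.read_raw_edf("SC4001E0-PSG.edf", preload=True)
raw.pick_channels(["EEG Fpz-Cz"])                        # 100 Hz EEG channel
raw.set_annotations(mne.read_annotations("SC4001EC-Hypnogram.edf"))

# Keep only wake (W) and drowsy / sleep stage 1 (S1) segments.
event_id = {"Sleep stage W": 0, "Sleep stage 1": 1}
events, _ = mne.events_from_annotations(raw, event_id=event_id, chunk_duration=120.0)

# 2-min epochs at 100 Hz -> 12,000 samples per epoch.
sfreq = raw.info["sfreq"]
epochs = mne.Epochs(raw, events, event_id=event_id, tmin=0.0,
                    tmax=120.0 - 1.0 / sfreq, baseline=None, preload=True)
X = epochs.get_data()          # shape: (n_epochs, 1, 12000)
y = epochs.events[:, -1]       # 0 = wake, 1 = drowsy
```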
3.2
Network Parameters
The CNN based supervised feature learning model has the parameters shown in Fig. 4. The network depth was chosen to obtain enhanced learning ability
Fig. 4. The CNN based supervised feature learning model for drowsy stage and wake stage
without overfitting and with optimal computation time. The number of convolution layers was fixed at two, and we experimented with different feature coding sizes by varying the number of kernels and the kernel size. As shown in Fig. 4, the data used consists of 2000 epochs of 2 min duration from 25 subjects. The labelled wake and sleep-1 (drowsy) 1D data is given to the first convolution layer with 32 kernels, kernel size = 3 and stride = 2, followed by 10% dropout. The second convolution layer uses 64 kernels with kernel size = 3. To further reduce the feature dimension, max pooling with pool size = 2 is used. The data is then reshaped to 1D before being given to the fully connected layer. The fully connected (dense) neural network uses three hidden layers with the ReLU activation function; the output layer uses sigmoid activation, with the loss calculated using binary cross-entropy. The weights were adjusted using the ADAM optimizer. 3.3
Regularization
Regularization techniques are used to avoid overfitting. We used two such techniques, dropout and weight decay. Dropout randomly sets input values to 0, disconnecting them from the following layer; this is done during training with a specified probability. A dropout of 0.1 was used in our training model, only in the first CNN layer, and dropout was removed during testing to provide deterministic outputs. To avoid overfitting due to noise and artifacts, the weight decay technique was used: it adds a penalty term to the loss function, which prevents large, exploding parameter values in the model. Weight decay was applied only to the first CNN layer and helped the model learn only smooth filters. The weight decay parameter lambda used was 10^-3. 3.4
Experimental Results
The experiment uses the data labelled as wake and sleep-1 (drowsy) to train the model to learn features. The test data was then used to extract features from the model and classify the results, and the effectiveness of the model was tested using the dense neural network for the desired accuracy. The architecture variations experimented with are shown in Table 1: architectures 1 and 2 have one extra convolution layer, which increases the computational complexity while reducing the accuracy to 80%, whereas architecture 3, with only two convolution layers and less computational complexity, gives the highest accuracy. The model was cross-validated using k-fold validation with k = 10, giving a validation accuracy of around 90%. For the test data the accuracy obtained was 89.9%.
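The 10-fold validation mentioned above can be reproduced along the following lines. This is only a sketch: `build_feature_learning_model` stands for any function returning a freshly compiled Keras model (such as the one sketched in Sect. 2.1), and `X`, `y` for the epoched data and labels.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, build_model, k=10, epochs=5, batch_size=30):
    """k-fold cross-validation of the wake/drowsy classifier; a new model is
    trained per fold and the mean validation accuracy is returned."""
    scores = []
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    for train_idx, val_idx in skf.split(X, y):
        model = build_model()
        model.fit(X[train_idx], y[train_idx], epochs=epochs,
                  batch_size=batch_size, verbose=0)
        _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
        scores.append(acc)
    return float(np.mean(scores))
```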
Table 1. Classification results for varying architectures

Architecture 1 (epoch size = 5): Layer 1: Conv1, kernel = 16, size = 3; Layer 2: Conv2, kernel = 32, size = 3; Layer 3: Conv3, kernel = 64, size = 3, maxpool size = 2; Flatten = 1D conversion; DNN with hidden layers = 3, filter sizes = 32, 64, 128, optimizer = Adam. Results: validation accuracy = 80%, test accuracy = 80%.
Architecture 2 (epoch size = 5): Layer 1: Conv1, kernel = 16, size = 3; Layer 2: Conv2, kernel = 32, size = 3; Layer 3: Conv3, kernel = 64, size = 3, maxpool size = 2; Flatten = 1D conversion; DNN with hidden layers = 3, filter sizes = 64, 128, 128, optimizer = Adam. Results: validation accuracy = 80%, test accuracy = 80%.
Architecture 3 (epoch size = 5): Layer 1: Conv1, kernel = 32, size = 3, dropout = 10%; Layer 2: Conv2, kernel = 64, size = 3; Layer 3: maxpool size = 2; Flatten = 1D conversion; DNN with hidden layers = 3, filter sizes = 64, 128, 128, optimizer = Adam. Results: validation accuracy = 89.9%, test accuracy = 90%.
The model was experimented with different feature sizes, as shown in Table 2, together with the respective classification accuracy. The number of features was varied over n = 2, 4, 8, 16, 32, 64, 128; the highest accuracy is obtained for a feature size of n = 64.
Table 2. Classification results for varying features
No. of features:          2       4       8       16      32      64      128
Classification accuracy:  69.23%  72.62%  76.92%  78.67%  88.42%  89.93%  89.20%
4 Discussion We propose the CNN based supervised feature learning model for driver alert/drowsy state detection. The model learns the features automatically from the labeled wake and drowsy state single-channel data. Different architecture options with different numbers of CNN layers were experimented with, and a CNN with two layers was found to give the desired accuracy. The model was further tested with different numbers of filters; CNN layer 1 with 32 filters and CNN layer 2 with 64 filters give good classification accuracy. Max pooling provides the desired data reduction to arrive at an optimal feature size. Dropout and weight decay techniques were used to avoid overfitting, so that the final features correspond to smooth CNN filters. The feature learning model was verified against a dense neural network with three layers and validated with the k-fold cross-validation technique with k = 10. The feature learning model was used to test the drowsy/awake state on the sleep-EDF data, and around 90% test accuracy was observed for the various test inputs. The model has limitations in that it requires a large amount of data for training, and the training time is considerably high. The work was implemented using MATLAB R2018a and Python on a computer with an Intel® Core™ i7 7th-generation 2.70 GHz CPU, 16 GB of installed RAM and a 64-bit operating system.
5 Conclusion and Future Work In this paper, we proposed a supervised feature learning model using a CNN for driver state detection. Our model can automatically extract features from labeled drowsy/awake data with feature reduction and good classification accuracy. We experimented with the model on the publicly available single-channel sleep-EDF data and observed a classification accuracy of around 90%. Our method has an advantage over conventional hand-engineered feature learning models. Further, it is planned to train the model on data with varying class proportions to address class imbalance problems, and to test the model on hardware and on live EEG signals acquired using electrodes.
References 1. Rangayyan, R.M.: Biomedical Signal Analysis. Wiley, Hoboken (2002) 2. Tompkins, W.J.: Biomedical Digital Signal Processing. Prentice-Hall, Upper Saddle River (1995) 3. Diykh, M., Li, Y., Wen, P.: EEG sleep stages classification based on time domain features and structural graph similarity. IEEE Trans. Neural Syst. Rehabil. Eng. 24(11), 1159 (2016) 4. Riaz, F., Hassan, A., Rehman, S., Niazi, I.K., Dremstrup, K.: EMD-Based temporal and spectral features for the classification of EEG signals using supervised learning. IEEE Trans. Neural Syst. Rehabil. Eng. 24(1), 28–35 (2016) 5. Wang, N., Lyu, M.R.: Extracting and selecting distinctive EEG features for efficient epileptic seizure prediction. IEEE J. Biomed. Health Inform. 19(5), 1648–1659 (2015) 6. Jenke, R., Peer, A., Buss, M.: Feature extraction and selection for emotion recognition from EEG. IEEE Trans. Affect. Comput. 5(3), 327–339 (2014) 7. Nissimagoudar, P.C., Nandi, A.V., Gireesha, H.M.: EEG feature extraction and analysis for driver drowsiness detection. In: 9th International Conference on Innovations in Bio-Inspired Computing and Applications, IBICA 2018, 17–19 December, 2018, Kochi, India (2018) 8. Garrett, D., Peterson, D.A., Anderson, C.W., Thaut, M.H.: Comparison of linear, nonlinear, and feature selection methods for EEG signal classification. IEEE Trans. Neural Syst. Rehabil. Eng. 11(2), 141–144 (2003) 9. Mirowski, P.W., LeCun, Y., Madhavan, D., Kuzniecky, R.: Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG. In: 2008 IEEE Workshop on Machine Learning for Signal Processing, MLSP 2008, pp. 244–249. IEEE (2008) 10. Rechtschaffen, A., Kales, A.E.: A manual of standardized terminology, techniques and scoring systems for sleep stages of human subjects, vol. 10. UCLA Brain Information Service. Brain Research Institute, Los Angeles (1968) 11. https://physionet.org/physiobank/database/sleep-edfx/ 12. Shoeb, A.H., Guttag, J.V.: Application of machine learning to epileptic seizure detection. In: Proceedings of the 27th International Conference on Machine Learning (ICML-2010), pp. 975–982 (2010) 13. Berthomier, C., Drouot, X., Herman-Stoïca, M., Berthomier, P., Prado, J., Bokar-Thire, D., Benoit, O., Mattout, J., d’Ortho, M.: Automatic analysis of singlechannel sleep EEG: validation in healthy individuals. Sleep 30(11), 1587 (2007)
14. Subasi, A.: Eeg signal classification using wavelet feature extraction and a mixture of expert model. Expert Syst. Appl. 32(4), 1084–1093 (2007) 15. Zoubek, L., Charbonnier, S., Lesecq, S., Buguet, A., Chapotot, F.: Feature selection for sleep/wake stages classification using data driven methods. Biomed. Signal Process. Control 2(3), 171–179 (2007) 16. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012) 17. Wen, T., Zhang, Z.: Deep convolution neural network and autoencoders based unsupervised feature learning of EEG signals. IEEE Access 6, 25399–25410 (2018) 18. Nissimagoudar, P.C., Nandi, A.V.: An EEG based driver consciousness analysis. In: International Conference on Recent Advances & Applications in Computer Engineering (RAACE 2017), 23–25 November, 2017. Springer (2017) 19. Nissimagoudar, P.C., Nandi, A.V., Gireesha, H.M.: Wavelet transform based EEG signal analysis for driver status detection. In: 9th International Conference on Innovations in BioInspired Computing and Applications, IBICA 2018, 17–19 December, 2018, vol. 939, Kochi, India. Part of Advances in Intelligent Systems and Computing (AISC). Springer (2018) 20. Nissimagoudar, P.C., Nandi, A.V., Gireesha, H.M.: EEG feature extraction and analysis for driver drowsiness detection. In: 9th International Conference on Innovations in Bio-Inspired Computing and Applications, IBICA 2018, 17–19 December, 2018, vol. 939, Kochi, India. Part of Advances in Intelligent Systems and Computing (AISC). Springer (2018)
A Feature Extraction and Selection Method for EEG Based Driver Alert/Drowsy State Detection P. C. Nissimagoudar(&), Anilkumar V. Nandi, and H. M. Gireesha(&) B.V.B. College of Engineering and Technology, Hubballi, India [email protected], [email protected]
Abstract. Accurate estimation of the driver's alert/drowsy state is one of the most essential features of driver assistance systems. Physiological signals are a direct indication of a driver's cognitive state, and we propose a method using electroencephalograms (EEG) of the driver to determine the state of consciousness. This paper discusses EEG analysis using features extracted in both the time and frequency domains for the alpha and theta frequency bands. We have investigated feature extraction and elimination techniques combined with classification techniques and evaluated their performance. Publicly available sleep data sets from Physionet were used for the proposed study. The signals from the Fpz–Oz electrode are divided into smaller segments before estimating the features, and twenty features were used for experimentation. Feature extraction done using ICA showed improved performance over PCA in terms of classification accuracy. The recursive feature elimination technique, when used with the neural network, showed an overall improved performance of 92% accuracy. The proposed method can be used to determine the driver status and further to predict the driver's alertness.
Keywords: EEG analysis · Feature extraction · Feature elimination
1 Introduction Detection of the driver's alert/drowsy state is one of the essential driver assistance features used in automotive. The EEG based method, being a direct measure of the driver's cognitive state, is more accurate than indirect methods such as measuring the vehicle's behavior or the driver's behavior, or image processing techniques that measure the driver's facial expressions. EEG is a physiological signal which represents the neural activity of the brain. It is measured using scalp electrodes placed according to the international 10–20 standard. The data acquired from the electrodes is a combination of neural activity, muscle movement, eye movement and other physiological signals [1, 2], so separating the EEG signal from artifacts is the first step of EEG processing. EEG signals are a combination of signals with different frequencies and amplitudes representing various brain activities. The EEG rhythms include delta (0–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz) and gamma (30–100 Hz). These signals contain a lot of information and a correspondingly large number of features [15, 16], and the extraction and selection of significant features from EEG signals is a challenging task. This paper discusses
various feature extraction and selection methods relevant to EEG signal classification for determining driver state. Features can be obtained using time-domain, frequency-domain, time-frequency-domain and non-linear methods. The obtained features are then passed to dimension reduction techniques, i.e. feature extraction and selection methods, to reduce the number of features and thus the computational complexity of the classifiers' decisions. Dimensionality reduction can be done using feature extraction or feature selection: feature extraction (feature projection) mainly deals with the linear or non-linear transformation of high-dimensional data into fewer dimensions, whereas feature selection deals with selecting a smaller subset of features from a large set while retaining their original characteristics. Feature extraction and selection are important for making the system more general and for improving its accuracy, time and storage requirements. Guyon and Elisseeff have provided a comprehensive review of such methods [6]. Kai-Quan Shen et al. discussed two approaches to feature selection for EEG classification applied to mental fatigue detection: one uses a random forest (RF) classifier with a heuristic initial feature ranking technique (INIT) and the other uses RF with a recursive feature elimination scheme (RFE). Their study concluded that RF with RFE outperformed RF with INIT in terms of the lowest test error rate and the number of features selected [3]. Andrzej Majkowski et al. proposed frequency analysis (FFT) for feature extraction to classify ERD/ERS patterns using a genetic algorithm [4]. Raja Majid Mehmood et al. discussed using Hjorth activity, mobility and complexity as features for analyzing emotions, with a balanced one-way ANOVA method to select features, and found their method better than univariate and multivariate features [5, 10]. In this paper, we study various feature extraction and selection techniques applied to vehicle driver alertness detection. Section two discusses feature extraction and selection methods, section three presents the experimental results, section four discusses the results, and section five concludes the paper.
2 Feature Extraction and Selection Method Dimensionality reduction by removing redundant and irrelevant features is a very important stage of decision making with classifiers. We present the feature extraction and selection methods used for choosing a better feature space, applied to classifying the driver's cognitive states. These methods are relevant in terms of saving computational time and resources [7] and in minimizing the problem of overfitting, i.e. the curse of dimensionality. Various feature extraction and selection techniques applied to EEG signal processing are studied in this work and the relevant methods are proposed. 2.1
Feature Extraction Techniques
Feature extraction methods transform the features into a new set of features representing the most relevant information. In this transformation process, the data (feature) size is also reduced.
[x1, x2, x3]^T → Feature extraction → [y1, y2, y3]^T = f([x1, x2, x3]^T)
The techniques which have been experimented with are: 2.2
Principal Component Analysis (PCA)
PCA is an unsupervised method in which the features are transformed linearly into a new feature set of orthogonal, uncorrelated features. The features are also ranked according to the variance. The dimensionality reduction depends on the choice of the number of principal components. The data set has to be normalized before applying PCA, otherwise the features on the largest scale dominate the principal components. 2.3
Independent Component Analysis (ICA)
ICA is a technique used to represent multivariate data as statistically independent components by means of linear transformations. Compared to other linear transformation techniques like PCA and factor analysis, ICA linearly transforms data that is non-Gaussian; this representation extracts the distinct structure of the data while keeping the components independent of each other [7, 11, 14]. ICA is a statistical method in which the observed random data are transformed linearly into independent components that do not depend on each other and that have "interesting" distributions. ICA can be expressed as the approximation of a latent variable model. The intuitive notion of maximum non-Gaussianity can be used to derive different objective functions whose optimization enables the estimation of the ICA model; alternatively, more conventional notions like maximum likelihood estimation or minimization of mutual information can also be used. The FastICA algorithm has proved to be the least computationally intensive among ICA techniques. ICA is used in many areas such as image processing, audio processing, biomedical signal processing, and econometrics. 2.4
Feature Selection Methods
Feature selection eliminates non-relevant features directly, without using any transformation. The main reasons for using such selection techniques are to reduce the feature dimension, reduce training time, improve accuracy and avoid overfitting. Feature selection methods are classified as filter methods, wrapper methods and embedded methods. 2.5
Filter Method
This technique is used at the preprocessing stage: the selection depends on the correlation of each feature with the outcome variable, obtained by applying various statistical tests and their scores. The approach, which is independent of any learning algorithm, is outlined below [12].
Original set of features → Selecting a subset of features → Apply learning algorithm → Performance analysis. Linear Discriminant Analysis (LDA), Analysis of Variance (ANOVA) and Chi-square are the commonly used techniques. The technique we experimented with is the Chi-square test: this statistical test is applied to a group of categorical features to estimate the correlation between the features based on the frequency distribution. 2.6
Wrapper Method
In this method, subsets of features are created using statistical methods and these subsets are used to train the model; based on the inferences derived from the model, the decision to retain or remove features is taken. Compared to filter methods, these methods are computationally more intensive. The commonly used wrapper techniques are forward feature selection, backward feature elimination and recursive feature elimination. The forward feature selection technique starts with no features selected and keeps adding new features at every iteration to improve the model, until no change in model performance is observed. On the contrary, the backward feature elimination technique starts with all features and keeps eliminating features at every iteration, until further elimination does not affect the performance. We have experimented with an optimized recursive feature elimination technique, which works on the principle of retaining only the best-performing features by ranking all the features according to their performance. Filter methods use the correlation with the dependent variable to decide the relevance of features, while wrapper methods select features based on their measured relevance. Wrapper methods involve training models, so they are computationally very expensive and time-consuming compared to filter methods [9].
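A minimal scikit-learn sketch of the recursive feature elimination scheme described above is shown below. The estimator and the feature matrix are placeholders (the study's actual feature set is listed in Table 1 of Sect. 3), and the random data exist only to make the snippet runnable.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))      # placeholder: 120 epochs x 20 statistical features
y = rng.integers(0, 2, size=120)    # placeholder wake/drowsy labels

# Rank features with a linear SVM and keep the best-performing subset.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=6, step=1)
selector.fit(X, y)

print("selected feature indices:", selector.get_support(indices=True))
print("feature ranking (1 = kept):", selector.ranking_)
X_reduced = selector.transform(X)   # reduced matrix passed on to any classifier
```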
3 Experimentation and Results Various feature extraction and feature selection techniques with different classifiers were experimented with, and the performance was evaluated in terms of accuracy and computation time. 3.1
Data Sets
The data set used for experimentation is the sleep EEG data available online from the Physionet database [11]. The sleep data is in the European data format (edf) and consists of various sleep stages recorded by Bob Kemp at the Sleep Centre, MCH-Westeinde Hospital, Den Haag, The Netherlands. The EEG data comprises 61 polysomnograms (PSGs) stored in *PSG.edf files, with the related annotations in *Hypnogram.edf files. The *PSG.edf files are whole-night polysomnographic sleep EEG recordings. The data includes EEG signals from the Fpz-Cz and Pz-Oz electrode locations sampled at 100 Hz, with sleep stages annotated as W, R (REM), 1, 2, 3, 4, M (movement time) and ? (not scored). The annotations follow the 1968 Rechtschaffen and Kales manual [10]. The data for the wake (W) and initial sleep or drowsy (S1) stages, along with the annotations, is extracted
using the Polyman tool. The *PSG.edf and *Hypnogram.edf files for the wake stage W and drowsy stage S1 are shown in Fig. 1. We took 150 segments of 2 min each of sleep-1 and wake stage data; as the sampling frequency is 100 Hz, each segment contains 12,000 samples [8].
Fig. 1. EEG signals for sleep stage S1 and wake stage W
Various statistical features were obtained from the raw signal. The features list is included in the following Table 1.
Table 1. Features for EEG classifications
Sl. no.  Feature                      Sl. no.  Feature
1        Power Spectral Density       12       Hjorth mobility
2        Skewness                     13       RMS
3        Hurst                        14       HFD
4        Spectral entropy             15       Spectral Centroid
5        SVD-entropy                  16       DFA
6        Std deviation                17       ZCR
7        Mean                         18       Bandpower
8        Fisher info                  19       Inter-quartile range
9        Correlation-coefficient      20       Median
10       Kurtosis                     21       Geometric mean
11       Hjorth complexity            22       Hjorth mobility
3.2
Feature Extraction Methods
The Principal Component Analysis (PCA) and Independent Component Analysis (ICA) techniques were experimented with on the above-mentioned data and compared across different classifiers. Table 2 and Table 3 show the results of the feature extraction methods PCA and ICA with varying reduced feature sizes and different classifiers. PCA was experimented with the number of features varying from 2 to 13, out of 20 features. The classifiers used were Support Vector Machines (SVM) [13], logistic regression (iterations = 1,500, regularization parameter alpha = 0.1), K-Nearest Neighbors (k = 2), Random forest (no. of estimators = 500),
and Neural Network (loss function: mean squared error, optimizer: Adagrad, epochs: 100, batch size: 30). The classification accuracy achieved is around 85% for feature counts of 8 to 12.
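The PCA-plus-classifier experiments summarised in Table 2 follow a standard scikit-learn pattern; the sketch below is illustrative only (placeholder data, SVM as the example classifier), with standardisation before PCA as noted in Sect. 2.2.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))     # placeholder: 120 epochs x 20 statistical features
y = rng.integers(0, 2, size=120)   # placeholder alert/drowsy labels

clf = make_pipeline(StandardScaler(),      # normalise so no feature dominates the PCs
                    PCA(n_components=8),   # number of components varied from 2 to 13
                    SVC())
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```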
Table 2. PCA with varying feature size and classification results (accuracy in %)

SVM: 2 features: 85.83; 3: 83.33; 5: 86.67; 6: 90.83; 7: 89.17; 10–13: 88.33
Logistic regression (iterations = 1500, alpha = 0.01): 5 features: 82.5
KNN (neighbors = 2): 2–4 features: 80; 5: 81.77; 6: 82.5; 7: 84.17; 8: 85.83; 9: 86.67; 10: 85.83; 11–13: 85
Random forest (no. of estimators = 500): 2 features: 85.83; 3: 85; 4: 86.67; 6: 87.5; 7–8: 86.67; 9: 85
Neural Network (loss: mean squared error, optimizer: Adagrad, epochs: 100, batch size: 30): 2–4 features: 85; 5: 88.33; 7: 89.17; 8, 10: 90; 9, 11: 88.3; 12–13: 90.83
Independent Component Analysis (ICA) was also experimented with, with the number of components (features) varying from 2 to 13. The classifiers tested were KNN, SVM and Random forest. The accuracy obtained is about 90% when the number of components is 8, and around 85% when the number of components is varied from 8 to 13.
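Correspondingly, the ICA variant can be sketched with scikit-learn's FastICA (the algorithm named in Sect. 2.3); again the data are hypothetical placeholders, with Random forest as the example classifier.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))     # placeholder feature matrix
y = rng.integers(0, 2, size=120)   # placeholder alert/drowsy labels

ica = FastICA(n_components=8, random_state=0, max_iter=1000)
X_ica = ica.fit_transform(X)       # statistically independent components
rf = RandomForestClassifier(n_estimators=500, random_state=0)
print("mean CV accuracy:", cross_val_score(rf, X_ica, y, cv=5).mean())
```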
Table 3. ICA with varying feature size and classification results (accuracy in %)

KNN (neighbors = 2): 2 features: 78.33; 3–5: 81.67; 6–7: 84.16; 8: 86.67; 9–12: 85.83
SVM: 2 features: 72.5; 3–4: 63.33; 5: 52.5; 6, 10: 48.33
Random forest (no. of estimators = 500): 2 features: 86.7; 3: 85; 4: 83.33; 5–6: 85.83; 7: 90; 8: 91.67; 9–13: 85.83
3.3
Feature Elimination Methods
Chi-square and recursive feature elimination techniques were experimented with on the above-mentioned sleep-EDF data and compared across different classifiers. Table 4 below shows the results for the chi-square feature elimination technique.
Table 4. Chi-square with varying feature size and classification results
Chi-square, 10 features, SVM: 78.33%
Chi-square, 11 features, Random Forest: 80%
Chi-square, 12 features, Nearest Neighbor: 80%
The feature elimination techniques were experimented with feature sizes of 10, 11 and 12. The classifiers used were SVM, random forest with 500 estimators, and nearest neighbors with n = 2; the classification accuracy was around 80%. The recursive feature elimination technique was experimented with the feature size varying from 3 to 20, with SVM, KNN (n = 2), random forest and neural networks as classifiers; the accuracy observed is around 90%. The combination of recursive feature elimination with a neural network gives an accuracy of around 93.33%.
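The chi-square filter of Table 4 can likewise be sketched with scikit-learn's SelectKBest; this is only an illustrative assumption-laden snippet (placeholder data, and features are first rescaled to be non-negative, which the chi-square test requires).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))      # placeholder feature matrix
y = rng.integers(0, 2, size=120)    # placeholder alert/drowsy labels

X_pos = MinMaxScaler().fit_transform(X)          # chi2 needs non-negative inputs
selector = SelectKBest(score_func=chi2, k=10)    # keep the 10 best-scoring features
X_sel = selector.fit_transform(X_pos, y)
print("kept feature indices:", selector.get_support(indices=True))
```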
Table 5. Recursive feature elimination technique with varying feature size and classification results

Features selected                               SVM-Acc (%)  KNN-Acc (%)  Random Forest-Acc (%)  Neural network-Acc (%)
3, 17                                           82.5         80           85                     80
3, 14, 17                                       89.17        87.5         87.5                   90.83
3, 14, 17, 20                                   90           85.83        88.3                   90.83
3, 14, 16, 17, 20                               92.5         85.83        88.3                   92.5
3, 14, 16, 17, 18, 20                           90.83        85.83        90                     93.33
3, 8, 14, 16, 17, 18, 20                        91.67        84.17        87.5                   89.17
3, 8, 13, 14, 16, 17, 18, 20                    92.5         82.5         88.3                   90
1, 3, 8, 13, 14, 16, 17, 18, 20                 91.67        81.6         88.3                   88.3
1, 3, 7, 8, 13, 14, 16, 17, 18, 20              90.83        70.83        89.2                   90.83
1, 3, 4, 7, 8, 10, 13, 14, 15, 16, 17, 18, 20   89.17        70.83        88.3                   88.3
4 Discussion Two different approaches of feature learning models i.e. feature extraction and feature selection methods are experimented for classifying driver alert drowsy states using EEG signals. Both the techniques essentially reduce data and retain only significant features for classification. The feature extraction methods PCA and ICA experimented here can be compared on various grounds. PCA is commonly used subspace projection technique. The basis vectors in PCA are obtained by solving the algebraic eigenvalues. PCA mainly focuses on minimizing the re-projection error from compressed data. It's fast and simple to implement, which means you can easily test algorithms with and without PCA to compare performance. In addition, PCA offers several variations and extensions (i.e. kernel PCA, sparse PCA, etc.) to tackle specific roadblocks. The new principal components are not interpretable, and need manual setting of threshold for cumulative variance. ICA is minimizes the statistical dependence between the basis vectors. It is computationally more superior to PCA. The components are extracted “randomly” depending on initial weight. The experiment results show ICA performance better for EEG analysis compared to PCA. Feature elimination techniques experimented show that recursive feature elimination technique performs better for EEG analysis. • From the table 5 it can be seen that RFE, which is a feature selection method with the combination of Neural network has achieved accuracy of 93.33. • Hence the features so selected, namely Spectral entropy, Spectral centroid, ZCR, Bandpower, Inter-Quartile Range and Geometric mean are important in the case of our analysis on EEG-based driver state detection. • In PCA and ICA which are feature reduction techniques, though the accuracy achieved is good, important features that play main role in EEG data analysis cannot be visualised.
A Feature Extraction and Selection Method for EEG
305
• Weightage of each feature cannot be seen in PCA and ICA. Where as in Feature selection the vital features required for EEG data analysis can be seen and can be used for future EEG experiments. • Hence in our analysis of EEG data, Feature selection method along with Neural network achieved higher accuracy than Feature reduction.
5 Conclusion In this paper, we experimented with feature extraction and feature elimination techniques applied to EEG signal analysis for driver state detection. The statistical features in time and frequency domain are used to classify the alert and drowsy state of the driver. Both feature extraction and feature elimination techniques were experimented and tested against machine learning neural network classifiers. The results show that ICA performs better compared to PCA when tested with different classifiers. The recursive feature elimination technique performed better when compared to chi-square. Overall the RFE when used with neural network techniques the classification accuracy was around 90%. This method can help to determine the drowsy driver condition in the automotive. The method also can be applied to analyze different sleep stages.
References 1. Rangayyan, R.M.: Biomedical Signal Analysis. Wiley-Intersciene, Wiley, New York (2002) 2. Tompkins, W.J.: Biomedical Digital Signal Processing. Prentice-Hall, Upper Saddle River (1995) 3. Shen, K.Q., Ong, C.J., Li, X.P., Hui, Z., Wilder-Smith, E.P.V.: A feature selection method for multilevel mental fatigue EEG classification. IEEE Trans. Biomed. Eng. 54(7), 1231 (2007) 4. Majkowski, A., Kołodziej, M., Zapała, D., Tarnowski, P., Francuz, P., Rak, R.J., Oskwarek, Ł.: Selection of EEG signal features for ERD/ERS classification using genetic algorithms. In: 2017 18th International Conference on Computational Problems of Electrical Engineering (CPEE) (2017) 5. Mehmood, R.M., Du, R., Lee, H.J. : Optimal feature selection and deep learning ensembles method for emotion recognition from human brain EEG sensors. In: Advances of Multisensory Services and Technologies for Healthcare in Smart Cities,. IEEE Access 5 6. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003) 7. Hira, Z.M., Gillies, D.F.: A review of feature selection and feature extraction methods applied on microarray data-review article. Adv. Bioinform. 2015. https://doi.org/10.1155/ 2015/198363. Article ID 198363, 13 pages 8. https://physionet.org/physiobank/database/sleep-edfx/ 9. Shoeb, A.H., Guttag, J.V.: Application of machine learning to epileptic seizure detection. In: Proceedings of the 27th International Conference on Machine Learning (ICML 2010), pp. 975–982 (2010)
306
P. C. Nissimagoudar et al.
10. Garrett, D., Peterson, D.A., Anderson, C.W., Thaut, M.H.: Comparison of linear, nonlinear, and feature selection methods for eeg signal classification. IEEE Trans. Neural Syst. Rehabil. Eng. 11(2), 141–144 (2003) 11. Berthomier, C., Drouot, X., Herman-Stoïca, M., Berthomier, P., Prado, J., Bokar-Thire, D., Benoit, O., Mattout, J., d'Ortho, M.P.:. Automatic analysis of singlechannel sleep eeg: validation in healthy individuals. Sleep-N. Y.Then Westchester- 30(11), 1587 (2007) 12. Subasi, A.: Eeg signal classification using wavelet feature extraction and a mixture of expert model. Expert Syst. Appl. 32(4), 1084–1093 (2007) 13. Gireesha, H.M., Nanda, S.: Thyroid nodule segmentation and classification in ultrasound images. Int. J. Eng. Res. Technol. (2014) 14. Nissimagoudar, P.C., Nandi, A.V., Gireesha, H.M: Wavelet transform based EEG signal analysis for driver status detection. In: 9th International Conference on Innovations in BioInspired Computing and Applications. IBICA 2018, Kochi India, 17–19 December 2018. Advances in Intelligent Systems and Computing (AISC). Springer, vol. 939 (2018) 15. Nissimagoudar, P.C., Nandi, A.V.: An EEG based driver consciousness analysis. In: International Conference on Recent Advances and Applications in Computer Engineering (RAACE 2017), 23–25 November 2017. Springer (2017) 16. Nissimagoudar, P.C., Nandi, A.V., Gireesha, H.M.: EEG feature extraction and analysis for driver drowsiness detection. In: 9th International Conference on Innovations in Bio-Inspired Computing and Applications, IBICA 2018, Kochi, India, 17–19 December 2018. Advances in Intelligent Systems and Computing (AISC), vol. 939. Springer (2018)
Author Index
A
Abraham, Ajith, 90, 157, 254
Aluvalu, Rajanikanth, 235
Anita, 71
Annigeri, Nirmala, 168
Ansari, Mohd. Abuzar Mohd. Haroon, 226
Ariffin, Nurfadilah, 177
Ashu, A., 80, 150
B
Bakhtiari, Majid, 177
Balaga, Harish, 141
Beniwal, Rohit, 41
Bhattacharjee, Debotosh, 10
Bhattacharya, Sukrit, 10
Bhavya, S., 195
C
Carbó-Dorca, Ramon, 90
Chandani, Jasbir Kaur, 277
ChandraShekar, K., 123
Chiam, Yoong Jien, 130
Chickerur, Satyadhyan, 168
Chiddarwar, Shital, 31
Chikhi, Salim, 157
Choo, Yun-Huoy, 90
Choudhary, Ayesha, 21
D
Danish, Mohd., 41
Deshmukh, Sanjay, 187, 205, 277
Deshpande, Abhijit, 205
Dubey, Ramu, 244
E
Endurthi, Anjaneyulu, 50
G
Gandhi, Niketa, 187, 205, 226, 277
Ganivada, Avatharam, 1
Gaur, Arihant, 31
Giraddi, Shantala, 168
Gireesha, H. M., 287, 297
Goel, Arpit, 41
Gupta, Any, 21
H
Ho, Nam, 266
Hussain, Mir Wajahat, 80
I
Irani, Nushafreen, 205
J
Jaya Krishna, Gutha, 102
K
Kanojia, Mahendra G., 226
Kantipudi, M. V. V. Prasad, 235
Kassim, Mohamad Nizam, 130
Kassim, Mohd Nizam, 177
Kaufmann, Paul, 266
Khare, Akhil, 50
Khoo, Eric, 177
Kinage, Akshata, 31
Kirthika, N., 112
Kottayil, Sasi K., 112
Kumar, Nitin, 71
Kumar, P. Santhosh, 123
L
Lal, Rohit, 31
M
Maarof, Mohd Aizaini, 130, 177
Marrapu, Deepthi, 141
Meera, Akhil Jabbar, 235
Meerja, Akhil Jabbar, 150
Mittal, Sandeep, 58
Muda, Azah Kamilah, 90
N
Nandi, Anilkumar V., 287, 297
Nautiyal, Bhaskar, 244
Nissimagoudar, P. C., 297
Nissimagoudar, Prabhavathi C., 287
P
Pillai, Anitha S., 195
Platzner, Marco, 266
Pratama, Satrya Fajri, 90
R
Rajani Kanth, Aluvalu, 150
Ramachandran, K. I., 112
Ramrakhiyani, Vanita, 187
Ravi, Vadlamani, 102
RaviKanth, K., 123
Reddy, Hemant Kumar, 80
Rekhawar, Nilakshi, 31
Rukmangad, Shubhan, 31
S
Sarkar, Ram, 10
Sharma, Priyanka, 58
Shaw, Vaibhav, 10
Singh, Pawan Kumar, 10
Singh, Teekam, 244
Sinha Roy, Diptendu, 80
Snášel, Václav, 254
Sreekanth, K., 123
T
Tlili, Ahmed, 157
V
Vadapalli, Hima, 217
Vimal, Vrince, 244
Y
Yadav, Anupam, 71
Yadav, S. K., 226
Z
Zainal, Anazida, 130, 177
Zakka, Benisemeni Esther, 217
Zjavka, Ladislav, 254