Algorithms for Intelligent Systems Series Editors Jagdish Chand Bansal, Department of Mathematics, South Asian University, New Delhi, Delhi, India Kusum Deep, Department of Mathematics, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India Atulya K. Nagar, School of Mathematics, Computer Science and Engineering, Liverpool Hope University, Liverpool, UK
This book series publishes research on the analysis and development of algorithms for intelligent systems with their applications to various real world problems. It covers research related to autonomous agents, multi-agent systems, behavioral modeling, reinforcement learning, game theory, mechanism design, machine learning, meta-heuristic search, optimization, planning and scheduling, artificial neural networks, evolutionary computation, swarm intelligence and other algorithms for intelligent systems. The book series includes recent advancements, modification and applications of the artificial neural networks, evolutionary computation, swarm intelligence, artificial immune systems, fuzzy system, autonomous and multi agent systems, machine learning and other intelligent systems related areas. The material will be beneficial for the graduate students, post-graduate students as well as the researchers who want a broader view of advances in algorithms for intelligent systems. The contents will also be useful to the researchers from other fields who have no knowledge of the power of intelligent systems, e.g. the researchers in the field of bioinformatics, biochemists, mechanical and chemical engineers, economists, musicians and medical practitioners. The series publishes monographs, edited volumes, advanced textbooks and selected proceedings. Indexed by zbMATH. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/16171
Shikha Agrawal · Kamlesh Kumar Gupta · Jonathan H. Chan · Jitendra Agrawal · Manish Gupta Editors
Machine Intelligence and Smart Systems Proceedings of MISS 2021
Editors
Shikha Agrawal, University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India
Kamlesh Kumar Gupta, Rustamji Institute of Technology, Gwalior, India
Jonathan H. Chan, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand
Jitendra Agrawal, School of Information Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, India
Manish Gupta, Institute of Professional Studies, Gwalior, India
ISSN 2524-7565 ISSN 2524-7573 (electronic) Algorithms for Intelligent Systems ISBN 978-981-16-9649-7 ISBN 978-981-16-9650-3 (eBook) https://doi.org/10.1007/978-981-16-9650-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
Nowadays, the world is becoming too ambiguous to be comprehended by a single individual, information is growing at a tremendous rate, and software systems are becoming difficult to control. This has inspired computer scientists to design alternative intelligent systems in which control, pre-programming and centralization are replaced by autonomy, emergence and distributed functioning. The field of research focused on developing such systems and applying them to a wide variety of problems is termed ‘machine intelligence.’ Machine intelligence is a computing methodology that gives a system the ability to learn and/or to deal with new situations, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. Since its origin, the number of successful applications has grown rapidly, and the use of machine intelligence algorithms has increased over the years. Machine intelligence (MI) techniques are ideal tools for ‘knowledge discovery from data’, or in short ‘data to knowledge’, in complex and often apparently intractable systems. There is a need to expose academicians and researchers to MI and its multidisciplinary applications for better utilization of these techniques and their further development towards smart systems. This book not only introduces MI techniques along with several of their applications but also covers novel applications that combine MI techniques and utilize their hybrid forms in practical areas such as engineering systems used in agriculture, military and civilian applications, manufacturing, biomedical and healthcare systems, as well as education. Equally important, this book intends to demonstrate successful case studies, identify challenges and bridge the gap between theory and practice in applying machine intelligence to solving real-world problems. Since machine intelligence is a truly interdisciplinary field, scientists, engineers, academicians, technology developers, researchers, students and government officials will find this text useful in handling complicated real-world issues using machine intelligence methodologies and in furthering their own research efforts in this field. Moreover, by bringing
together representatives of academia and industry, this book is also a means for identifying new research problems and disseminating results of the research and practice. The main goal of this book is to provide scientific researchers and engineers with a vehicle where innovative technologies for developing smart systems through machine intelligence techniques are discussed.
Shikha Agrawal, Bhopal, India
Kamlesh Kumar Gupta, Gwalior, India
Jonathan H. Chan, Bangkok, Thailand
Jitendra Agrawal, Bhopal, India
Manish Gupta, Gwalior, India
Contents
1 Artificial Intelligence Aided Neurodevelopmental Disorders Diagnosis: Techniques Revisited . . . . . . 1
Deborah T. Joy, Sushree Prangyanidhi, Aman Jatain, and Shalini B. Bajaj
2 Deep Learning Implementation for Dark Matter Particle Detection . . . . . . 9
Anukriti and Vandana Niranjan
3 Augmentation of Handwritten Devanagari Character Dataset Using DCGAN . . . . . . 31
Rajasekhar Nannapaneni, Aravind Chakravarti, Shilpa Sangappa, Parinita Bora, and Raghavendra V. Kulkarni
4 Deep Reinforcement Learning for Optimal Traffic Control . . . . . . 45
Rajasekhar Nannapaneni, Raghavendra V. Kulkarni, and Shalabh Bhatnagar
5 Ensemble Semi-supervised Machine Learning Algorithm for Classifying Complaint Tweets . . . . . . 65
Pranali Yenkar and S. D. Sawarkar
6 Underwater Image Enhancement Using Fusion Stretch Method . . . . . . 75
Litty Koshy and Shwetha Mary Jacob
7 A Novel Approach for Semantic Microservices Description and Discovery Toward Smarter Applications . . . . . . 89
Chellammal Surianarayanan, Gopinath Ganapathy, and Pethuru Raj Chelliah
8 Comparative Analysis of Machine Learning Algorithms for Imbalance Data Set Using Principle Component Analysis . . . . . . 103
Swati V. Narwane and Sudhir D. Sawarkar
9 Application of Reinforcement Learning in Control Systems for Designing Controllers . . . . . . 117
Rajendar Singh Shekhawat and Nidhi Singh
10 Classifying Skin Cancer Images Based on Machine Learning Algorithms and a CNN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 S. Aswath and M. Kalaiyarivu Cholan 11 Lossless Data Compression Method Using Deep Learning . . . . . . . . . 145 Rahul Barman, Sayali Badade, Sharvari Deshpande, Shruti Agarwal, and Nilima Kulkarni 12 Comparative Study on Different Classification Models for Customer Churn Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 Anuj Kinge, Yash Oswal, Tejas Khangal, Nilima Kulkarni, and Priyanka Jha 13 An E-healthcare System Using IoT and Edge Computing . . . . . . . . . 165 Nitish Gupta, Ashutosh Soni, Yogesh Kumar Gond, and Dinesh Kumar 14 Autism Detection Using Machine Learning Approach: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179 C. Karpagam and S. Gomathi a Rohini 15 Double-Image Encryption Through Compressive Sensing and Discrete Cosine Stockwell Transform . . . . . . . . . . . . . . . . . . . . . . . 199 Saumya Patel and Ankita Vaish 16 A Hybrid Deep Learning Model for Human Activity Recognition Using Wearable Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Kumar Gaurav, Bholanath Roy, and Jyoti Bharti 17 Detection of Diabetic Retinopathy Using Deep Learning-Based Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Imtiyaz Ahmad, Vibhav Prakash Singh, and Suneeta Agarwal 18 Multiclass Image Classification Using OAA-SVM . . . . . . . . . . . . . . . . 235 J. Sharmila Joseph, Abhay Vidyarthi, and Vibhav Prakash Singh 19 Deep Learning-Based Differential Distinguisher for Lightweight Ciphers GIFT-64 and PRIDE . . . . . . . . . . . . . . . . . . . . 245 Girish Mishra, S. K. Pal, S. V. S. S. N. V. G. Krishna Murthy, Ishan Prakash, and Anshul Kumar 20 COVID-19 Detection Using Chest X-rays: CNN as a Classifier Versus CNN as a Feature Extractor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 N. A. Sriram, J Vishaq, T Dhanwin, V Harshini, A Shahina, and A Nayeemulla Khan
21 Fuzzy Set-Based Frequent Itemset Mining: An Alternative Approach to Study Consumer Behaviour . . . . . . . . . . . . . . . . . . . . . . . . 273 Renji George Amballoor and Shankar B. Naik 22 EEG Seizure Detection Using SVM Classifier and Genetic Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281 Tulasi Pendyala, Anisa Fathima Mohammad, and Anitha Arumalla 23 Colorization of Grayscale Images Using Convolutional Neural Network and Siamese Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297 Archana Kumar, David Solomon George, and L. S. Binu 24 Modelling and Visualisation of Traffic Accidents in Botswana Using Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309 Ofaletse Mphale and V. Lakshmi Narasimhan 25 EPM: Meta-learning Method for Remote Sensing Image Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329 Shiva Pundir and Jerry Allan Akshay 26 Collaborative Network Security for Virtual Machine in Cloud Computing for Multi-tenant Data Center . . . . . . . . . . . . . . . . . . . . . . . . 341 Rajeev Kudari, Dasari Anantha Reddy, and Garigipati Rama Krishna 27 Binary Classification of Toxic Comments on Imbalanced Datasets Using Recurrent Neural Networks . . . . . . . . . . . . . . . . . . . . . . 351 Abhinav Saxena, Ayush Mittal, and Raghav Verma 28 Robust Object Detection and Tracking in Flood Surveillance Videos . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 361 S. Divya, P. Kaviya, R. Mohanaruba, S. Chitrakala, and C. M. Bhatt 29 Anti-drug Response Prediction: A Review of the Different Supervised and Unsupervised Learning Approaches . . . . . . . . . . . . . . 373 Davinder Paul Singh, Abhishek Gupta, and Baijnath Kaushik 30 An Highly Robust Image Forgery Detection Using STPPL-HBCNN and Region Detection Using DBSCAN-ACYOLOv2 Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385 Sagi Harshad Varma and Chagantipati Akarsh 31 Deep Learning-Based Differential Distinguisher for Lightweight Cipher GIFT-COFB . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397 Reshma Rajan, Rupam Kumar Roy, Diptakshi Sen, and Girish Mishra 32 Dementia Detection Using Bi-LSTM and 1D CNN Model . . . . . . . . . 407 Neha Shivhare, Shanti Rathod, and M. R. Khan
33 PairNet: A Deep Learning-Based Object Detection and Segmentation System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423 Ameya Kale, Ishan Jawade, Pratik Kakade, Rushikesh Jadhav, and Nilima Kulkarni 34 Analysis of Electrocardiogram Signal Using Fuzzy Inference Evaluation System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 J. S. Karnewar and V. K. Shandilya 35 Optical Flow Video Frame Interpolation Based MRI Super-Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 Suhail Gulzar and Sakshi Arora 36 An Efficient Hybrid Recommendation Model with Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463 Saurabh Sharma and Harish Kumar Shakya 37 Fixed-MAML for Few-shot Classification in Multilingual Speech Emotion Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473 Anugunj Naman and Chetan Sinha 38 A Machine Learning Approach for Automated Irrigation Management System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485 Pulin Prabhu, Anuj Purandare, Abhishek Nagvekar, Aditi Kandoi, Sunil Ghane, and Mahendra Mehra 39 Deepfake Images, Videos Generation, and Detection Techniques Using Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501 Rishika Singh, Sagar Shrivastava, Aman Jatain, and Shalini Bhaskar Bajaj 40 Retinal Image Enhancement Using Hybrid Approach . . . . . . . . . . . . . 515 Prem Kumari Verma and Nagendra Pratap Singh 41 Prediction of Mental Stress Level Based on Machine Learning . . . . . 525 Akshada Kene and Shubhada Thakare 42 Data Distribution in Reliable and Secure Distributed Cloud Environment Using Hash-Solomon Code . . . . . . . . . . . . . . . . . . . . . . . . 537 Abhishek M. Dhore and Nandita Tiwari 43 A Novel Classification of Cancer Based on Tumor RNA-Sequence (RNA-Seq) Gene Expression . . . . . . . . . . . . . . . . . . . . . 547 Shweta Koparde Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
About the Editors
Dr. Shikha Agrawal is Director, Training and Placement at Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (Madhya Pradesh), India and Associate Professor in the Department of Computer Science and Engineering at University Institute of Technology, Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (Madhya Pradesh), India. She obtained B.E., M.Tech. and Ph.D. in Computer Science and Engineering from Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal. She has more than seventeen years of teaching experience. Her area of interest is swarm intelligence, artificial intelligence, soft computing image processing, data mining and particle swarm optimization. She has published more than 50 research papers in different reputed international journals, 04 patents filed, 12 chapters and two books. For her outstanding research work in Information technology, she has been awarded as “Young Scientist” by Madhya Pradesh Council of Science and Technology, Bhopal (2012). Her other extraordinary achievements include “CII SIS-Tech GURU” Award in 2017 by Confederation of Indian Industry, “Distinguished Assistant Professor” in 2017 by Computer Society of India, “Best Paper Presenter at International Conference” Award under CSI Academic Award (2017) , “Best Faculty–Published Research” by Computer Society of India (2018), “Best Faculty–Funded Research” by Computer Society of India (2019), “ICT Rising Star of the Year Award 2015” in International Conference on Information and Communication Technology for Sustainable Development (ICT4SD—2015), Ahmedabad, India, and Young ICON Award 2015 in Educational category by Dainik National News Paper Patrika, Bhopal, India. She has been elected as IEEE Senior member and Madhya Pradesh Student Chapter Coordinator of Computer Society of India. She is also a member of various academic societies such as IEEE Computational Intelligence Society, Indian Society of Technical Education (ISTE), Computer Society of India (CSI), ACM, Computer Science Teachers Association (CSTA), Machine Intelligence Research Labs (MIR Labs). She has participated in numerous (more than 15) conference presentations (including invited and peer-reviewed oral presentations and panel discussions) and chaired technical sessions in various international conferences and also served as a member of reviewer committee of International Journal of Computer Science and Information Security, USA, and many international conferences of IEEE, Springer etc. xi
Dr. Kamlesh Kumar Gupta is Principal in Rustamji Institute of Technology (RJIT), BSF Academy, Tekanpur, Madhya Pradesh, India. He completed B.E., M.Tech, and Ph.D. in Computer Science and Engineering from RGPV Bhopal, Madhya Pradesh, India. He has published more than 40 research papers in various journals and conferences. He organized an international conference on emerging trends of computer science and its applications in RJIT. Also, he organized various training course on computer literacy in Border Security Force Academy, Tekanpur (MHA). He also guided 07 Ph.D. in various universities like RGPV, Uttrakhand Technical University, Amity University, Gwalior. He is guiding 10 M.Tech. thesis. He is a member of CSI, IETE, and member in BOG of RGPV University, Bhopal. Dr. Jonathan H. Chan is Associate Professor of Computer Science at the School of Technology, King Mongkut’s University of Technology Thonburi (KMUTT), Thailand. He holds a B.A.Sc., M.A.Sc. and Ph.D. degrees from the University of Toronto and is currently a visiting professor there until the end of 2019; he was also a visiting scientist at The Centre for Applied Genomics at Sick Kids Hospital in Toronto in several occasions. He is Section Editor of Heliyon Computer Science (Cell Press), Action Editor of Neural Networks (Elsevier) and a member of the editorial board of International Journal of Machine Intelligence and Sensory Signal Processing (Inderscience), International Journal of Swarm Intelligence (Inderscience), and Proceedings in Adaptation, Learning and Optimization (Springer). Also, he is a reviewer for a number of refereed international journals including Information Sciences, Applied Soft Computing, Neurocomputing, Neural Computation and Applications, BMC Bioinformatics, and Memetic Computing. He has also served on the program, technical and/or advisory committees for numerous major international conferences. He is Past President of the Asia Pacific Neural Network Assembly (APNNA) and current Governing Board Member of the Asia Pacific Neural Network Society (APNNS). In addition, he is a founding member and a current VP of the IEEE-CIS Thailand Chapter. He is a senior member of IEEE, ACM and INNS and a member of the Professional Engineers of Ontario (PEO). His research interests include intelligent systems, cognitive computing, biomedical informatics, and data science and machine learning in general. Dr. Jitendra Agrawal was born in 1974 and is Associate Professor in School of Information Technology at Rajiv Gandhi Proudyogiki Vishwavidyalaya, Madhya Pradesh, India. He is Teacher, Researcher and Consultant in the field of Computer Science and Information Technology. He earned his master degree from Samrat Ashok Technology Institute, Vidisha (Madhya Pradesh) in 1997 and was awarded Doctor of Philosophy in Computer and Information Technology in 2012. His research interests include database, data structure, data mining, soft computing and computational intelligence. He has published more than 50 publications in international journals and conferences. He is the recipient of the Best Professor in Information Technology Award by the World Education Congress in 2013. He has participated in many workshops/seminars/staff development programmes (SDP) organized at national and
international levels as resource person, keynote speaker, session chairman, moderator, panelist, etc. He has delivered over 10 invited talks, keynote addresses at different academic forums on various emerging issues in the field of information technology and innovations in teaching learning systems. He is a senior member of the IEEE (USA), life member of Computer Society of India (CSI), life member of Indian Society of Technical Education (ISTE), member of Youth Hostel Association of India (YHAI) and IAENG. He has been associated with the International Program Committees and Organizing Committees of several regular international conferences held in different countries like USA, India, New Zealand, Korea, Indonesia, Tunisia, Thailand, Morocco, etc. He regularly serves as reviewer for international journals and conferences, including IEEE, Elsevier and Inderscience etc. Manish Gupta is Assistant Professor in Computer Science and Engineering in Vikrant Institute of Technology and Management, Gwalior, Madhya Pradesh, India. He has done B.E. from MITS Gwalior, Madhya Pradesh, India, and M.Tech from ABV-IIITM Gwalior, Madhya Pradesh, India. He is also pursuing Ph.D. from RGPV Bhopal, Madhya Pradesh, India. He is 4-times GATE and 02-times NET qualified. He has published more than 25 research papers in various journals, IEEE conferences, Springer conferences. He has organized 02 IEEE special sessions and 01 Springer special sessions. He is also editorial board member of various organizations and journals. He has published two books. He has been awarded 06 M.Tech. thesis. He is currently working on a Central Government Project “Unnat Bharat Abhiyan” sanctioned by the Government of India. He is also working on a research project on “A new GUI based image cryptography using random key for IoT” sanctioned by RGPV university Bhopal, Madhya Pradesh, India under TEQIP-III. He has completed a project on “Cyber Security and Digital India Awareness Program for Rural Area” sanctioned by Madhya Pradesh Council of Science and Technology Bhopal, Madhya Pradesh, India.
Chapter 1
Artificial Intelligence Aided Neurodevelopmental Disorders Diagnosis: Techniques Revisited Deborah T. Joy, Sushree Prangyanidhi, Aman Jatain, and Shalini B. Bajaj
1 Introduction
Infirmities in the working of the brain that affect a child’s behaviour, memory or ability to learn are called neurodevelopmental disorders; they manifest as impairments in learning, speaking and moving, and even as neuropsychiatric difficulties [1]. Specific NDDs include attention-deficit hyperactivity disorder, autism spectrum disorder, learning disorders, intellectual disability, dyspraxia and different forms of apraxia [2]. Regarding the use of AI in health care and in diagnosis in particular, the essential aspect of most recent DL applications is the capability of machines to categorize and identify input data with the correct label at scale. Existing systems, however, offer narrower but more practical tools: AI is currently part of the recovery rather than the discovery side of the process involving NDDs. Navjot Singh and Amarjot Singh, in their research, concentrate on the challenges in prevailing machine learning techniques and the most feasible solutions to those difficulties. Taking a more visionary view, the authors hope for AI-based expert systems that could augment the precision of diagnosis and prognosis procedures in the future [3].
D. T. Joy (B) · S. Prangyanidhi · A. Jatain · S. B. Bajaj
Department of Computer Science, Amity University Gurgaon, Gurgaon, Haryana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_1
2 A Neurodevelopmental Disorder Review
As mentioned, neurodevelopmental disorders affect the primary functioning of the brain, leading to severe or mild impairment in the overall functioning of the body and reducing the productivity of an individual early in life. Some of the forms of NDDs are intellectual disability, dyslexia, autism, attention-deficit hyperactivity
disorder, learning deficits and dyspraxia. Neurodevelopmental paediatrics is a field that addresses complex aspects of the growth and development of the central nervous system (CNS), particularly during childhood [4]. Research and surveys have found that neurodevelopmental behavioural disorders are often more prevalent in countries that are more industrially advanced [5]. Around 15% of the child population has been reported to face these disorders, including reduced intelligence quotient and cerebral palsy. The occurrence is also comparatively higher in aboriginal children, and in most cases the explicit aetiology is unknown [6]. According to Crocq and Rosendahl, neurodevelopmental behavioural and intellectual disorders comprise several conditions; ADHD and ASD are the most common functional conditions identified, each of which consists of various subcategories based on the major symptomatology of an individual child [7]. In The Lancet Psychiatry, a conceptual approach has been adopted acknowledging that ADHD and ASD, although mostly recognized in children, can persist throughout a person’s life [8]. Developmental disorders were included for the first time in DSM-III, neurodevelopmental disorders were introduced as an overarching disorder category in DSM-5, and NDDs gained even more prominence in ICD-11, the latest revision of the International Classification of Diseases. NDDs are defined as a group of conditions with onset in the developmental period, inducing deficits that produce impairments of functioning [9]. In their research, Doernberg and Hollander have laid out the differences in the diagnosis of NDDs as based on the American standard, DSM-5, and the global standard, ICD-10. They cover both the differences and similarities between the two and then briefly discuss how the disparities affect the diagnosis, screening and treatment of NDDs in general. They also propose the use of the Research Domain Criteria (RDoC) initiative as an alternative diagnostic tool [10]. Moving on to specific neurodevelopmental disorders, ADHD is a psychiatric disorder identified by impulsivity, inattentiveness and physical restlessness. This disorder affects up to 5% of the 5- to 15-year-old population worldwide, and earlier fMRI studies have examined the neurobiology of ADHD [11]. Turning to another specific disorder, autism, the study by Faras, Al Ateeqi and Tidmarsh strongly suggests that a complex mode of inheritance lies at the genetic basis of ASD. The authors believe that more research needs to be done on the environmental factors affecting this NDD. Furthermore, they lay a ground for physicians, giving them the information needed for identification and referrals concerned with autism along with its controversies, aetiology and management [12]. A third NDD, dyspraxia, concerns children with significant impairment in their functioning. Dyspraxia, or developmental coordination disorder, according to Gibbs and Appleton, must be identified early so that appropriate intervention can be found. Early intervention tends to improve children’s motor skills, further enhancing their self-esteem and social participation. The authors stress that psychological support is a key factor in the recovery of kids having DCD or any learning difficulty per se.
And along with the former,
school environment, teacher and parent assistance also play important roles [13]. Initially though, oppositional defiant disorder (ODD) had been counted as a much mild form of conduct disorder, but now it holds an independent position with other NDDs. Although it still stands along with other neural disorders with its aetiology lying in temperament, cognitive skills and deficits and other physical factors [14]. Ghosh and Sinha on the other hand have stated in their research that profiling the symptoms leads to believe that progress from ADHD to ODD and CD has a similar pattern where aggression, hostility and emotionality are concerned, thus giving rise to a possibility that a common psychopathologic range covers the three NDDs [15]. Semrud-Clikeman in his review studies the experimental works for calculations of learning problems in progenies from a neuropsychological viewpoint. The author mentions of an appraisal of kids with learning difficulties that they must deliberate on the measures of working recollection, attention, decision-making role and grasp both verbal which involves the use of ears and written form using motor skills in the upper limbs, in particular for progenies who do not answer to intercession. The review delivers evidence about the prevailing research on neurobiological associates of learning disabilities, the link to response-to-intervention movement and even likely ranges for additional estimation. The response-to-intervention (RTI) standpoint offers great sustenance for the procedures of growth in young children though is still in the development phase when considering students beyond the age of six [16]. Hudson, C. and Chan, J. in their review epitomize the earnestness for a provisional impartation for people with mental illness and intellectual disability which is unified and coordinated. Most evidences try and prove that mental illnesses in many people are in fact intellectual disabilities and not neuro-impairment. The conception that persons identified having intellectual disability can in fact have mental illness has only been recently acknowledged. Additionally, a wide array of psychiatric disorders often seen in the general population can also be found in people having intellectual disability [17].
3 Artificial Intelligence as a Solution The artificial intelligence aspect begins with classifying how to actually determine the existence of an NDD or even on the other hand help NDD patients to recover and tolerate the disorder. Convolutional neural net is the algorithm specifically designed to be deployed on images. The pre-processing required in a CNN is comparatively lower than other NNs. Basically, instead of applying filters and getting to the point where the NN can actually understand the image, CNN potentially learn what the image has to offer and works with it [18]. This turns out to be helpful where brain MRI can be used to detect the NDD. Further, recurrent neural network (RNN) is a much more modern and rather refined form of the neural networks which are essentially a set of algorithms which structure the human brain with close resemblance, chiefly designed to do what the human brain finds in itself pretty naturally which is to recognize patterns and actually have the neural nets learn from not only the past
inputs, that are situational, but also from prior inputs, that it has already previously understood or even mastered [19]. Therefore, the work here lies in finding the damage in an actual human brain. In the respect of concern for the kind of dataset provided and used for the different projects and papers, the choice remains on neural networks, either convolutional or recurrent for a lot of the machine-learning-based solutions. However, most existing remedies are just that remedies. There have not been much backup when it comes to the diagnosis of the NDDs. This is primarily due to the fact that humans tend to rely more on who they see and revere as doctors over trained machine models in the form of ML or DL. Over to the specifications, FMRI-based prognosis has the potential to aid psychiatrists and psychotherapists in providing better-quality diagnosis and even effective treatment for psychiatric patients. This tactic is reliable with the current prominence on personalized medicine in healthcare services. Furthermore, for fMRI-based diagnostics what is of interest is rest-state fMRI [20]. When it comes to the aspect of using AI to work with individuals having ASD, although the present gilded standard in diagnostic measures depends on behavioural observations administered by healthcare experts, AI has ascended as an auspicious substitute. AI is built grounded on the biological nets in and of the human brain and is thus capable of accomplishing cognitive roles by imitating human acumen [21]. Artificial intelligence is progressively altering the functioning of the medical sector. With fresh development in machine and deep learning, digitalized statistics procurement and computing infrastructure, AI applications are intensifying over places that were unchartered territory like medicine of all surprises, the major province of human experts. Yu KH, Beam AL and Kohane IS outline recent innovations in AI technologies and their biomedical claims, thereby identifying the trials for further development in healthcare AI imbued systems; they also give a summary on the implications that AI offers in health care as relevant in the sectors of economy, law and the society [22].
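To make the CNN discussion above concrete, the sketch below shows a minimal image classifier of the kind such studies typically build on brain MRI. It is only an illustration, not the pipeline of any work surveyed here: the input size (a single grayscale 128 × 128 MRI slice), the binary NDD-versus-control label and all layer sizes are assumptions made for the example.

```python
# Minimal CNN sketch for slice-level MRI classification (illustrative only).
# Assumes grayscale 128x128 slices and a binary NDD-vs-control label.
import torch
import torch.nn as nn

class SliceCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 128 -> 64
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 -> 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SliceCNN()
dummy_batch = torch.randn(8, 1, 128, 128)   # random tensors stand in for real MRI slices
logits = model(dummy_batch)                 # shape: (8, 2)
print(logits.shape)
```

In practice, rest-state fMRI or structural MRI volumes would be preprocessed and split subject-wise before training, and a 3D or recurrent variant would be used when the temporal dimension matters, as the paragraph above notes for RNNs.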
3.1 Using AI with Neurodevelopmental Disorders Coming over to the use of AI with neurodevelopmental disorders, it is helpful to note that the drive of meticulous medication is to define and augment the root for medical analysis, therapy and genetic prediction based on genes, regular functioning and the environment. Taking the advantage of computer competences, artificial intelligence algorithms now can reach rational success in predicting diseases from available multidimensional clinical and biological data. Contrary to this success, NDD poses a challenge causing an impede to similar progresses. Researchers are optimistic on quantifying the patterns of supposed NDDs on genetic, phenotypic and aetiologic backgrounds. Further dimensions are likely to be provided by structural and practical brain imaging and neurophysiological/neuropsychological markers, but these often
necessitate added progress to attain sensitivity for diagnosis. Therein lies a precision medicine conundrum: where the diagnosis of neurodevelopmental disorders is concerned, can the so-called AI propose a breakthrough in predicting the risks [23]. Other further studies though have also extended to using valuation tools with AI to segregate common neurodevelopmental disorders. Duda et al. have used diverse machine learning algorithms to find the best cataloguing features using SRS, i.e. the Social Responsiveness Scale. Presently, no screening test design exists that can potentially draw a sure line in the differences between the two major disorders. Furthermore, with in the making times from preliminary misgiving to diagnosis uphill a year, methods to rapidly and correctly assess risk for these and even other developmental disorders are dreadfully the need of the hour [24]. The purpose of the authors’ study was to appraise literature that has applied AI skills and techniques to contemporary assessment instruments for autism. A total of thirteen articles were studied for review with most of the lessons using supervised machine learning methods such as support vector machines to distinguish between persons with and without the autistic disorder. Discoveries establish that the algorithms were primarily used to identify structures that were most illustrative of autistic characteristics and that the algorithms were able to exclude redundant items [25]. In one of the major works done on using AI on NDDs, Attallah, Sharkar and Gadelkarim have researched and found that neurodevelopment disorders can be detected early in the stages of embryonic development in pregnant women. The hope rises from the fact that brain defects can be spotted and to an extent identified in the embryonic stage. Previous works on the same have been bare and almost nothing thus posing severe challenges coming into feature extraction and even in the classification process. Deep learning methods have the ability to deduce an optimum demonstration from the raw images without image enhancement, segmentation and feature extraction processes, leading to an effective classification process. Their article proposes a new framework based on deep learning methods for the detection of END. The framework depends on feature fusion. The results of the research showed that the proposed framework was capable of identifying END from embryonic MRI images of various gestational ages thereby succeeding in the detection of END quite accurately [26]. Shari Trewin from a viewpoint considers how fair handlings for people with disabilities in the modern society might be obstructed or at least severely affected by the rise of use in artificial intelligence, and especially machine learning and deep learning methods. When it comes to fairness between people with protected attributes of age or gender and the fairness of people suffering with one or the other form of disabilities there is clear argument in the differences since the two do not and cannot be intermingled. One of the major differences lying in the extreme diversity of ways that disorders reflect and state themselves and how people adapt to them. Given such differences, the authors explore definitions of fairness along with how well they work where the space of disorders or disabilities are concerned. Thus, giving rise to the talk on the suggested ways of approaching fairness for people with disabilities in AI applications [27].
According to some of the existing systems, the neurodevelopmental assessment comprises more than nine hundred functions when oriented with academic and sensory motor testing. The output of this calculation is a highly customized report which provides parents with a comprehensive understanding of their child’s sociobehavioural and academic skill hemispheres and how they communicate equally at lightning speed which essentially is millions of times per minute in a poorly function bringing the left and the right side of the brain together and only impacting partial information thereby causing frequent miscommunication, called functional disconnection, and is found to be the root of many types of learning disorders along with behavioural and social problems found in growing children. Although, of course, the machine learning algorithm used for the NDD test is not always absolutely accurate which is informed through the feedback of healthcare staff, generally the medical professionals disagree with the system that is potentially envisioned to be as a tool so as to help them support their own diagnosis without undermining their proficient judgement and without forcing them into a decision which they don’t agree with. Therein lies the dilemma since potentially the only limitation of this model is that it is not a hundred percent accurate, basically since the source of the truth is not known for sure arising from the fact that AI is a black box which can neither itself be explained nor can its prediction for most of the model be elucidated [28]. Artificial intelligence is potentially a new solution merging neural network architectures with massive calculating power to qualify the solution itself to learn a pattern from large data sets and thereby make statistical estimates based on solid and somewhat evident test results and responses that already exist for tens of thousands of students. Basically, clinical decision support systems aiding healthcare practitioners are believed to be fast-growing with data sets about children shared with new artificial intelligence replicas such as reasonable AI will help the healthcare staff to improve a child’s initial valuation fortunately improving the overall programs outcomes [29]. On a final note, AI, being a new technological revolution, enables the study of human conduct forms, which happen to hidden in layers of millions of micro-patterns potentially coming from their activities, responses and gestures [30].
4 Discussion and Future Possibilities Over to the end, all that can be done is to summarize a vast ocean of knowledge on neurodevelopmental disorders, their functioning, how they are influenced by external factors and finally how AI can lend a hand in their build and progress. This here though becomes a sensitive point where the confidence of handing over medical diagnosis, which can potentially make or break the life of a growing child, is given into the hands of artificial intelligence. NDDs range wide but affect at higher intensities concentrated on smaller groups. So, the optimism lies in the fact that AI can help detect and classify at level one, the existence of NDDs or potential presence of them. The vision leads one to believe that systems can be developed to study from patterns and foresee the exact class of neurodevelopmental disorders to aid the handlers for
early and effective detection of the disorders along with the specifications. Furthermore, it could also be to hand over such a system into the hands of parents and/or teachers to assist them in the upbringing of children falling into the categories of neurodevelopmental disorders. But those are left to the future potentials.
References 1. Bale TL, Baram TZ, Brown AS, Goldstein JM, Insel TR, McCarthy MM, Nemeroff CB, Reyes TM, Simerly RB, Susser ES, Nestler EJ (2010) Early life programming and neurodevelopmental disorders. Biol Psychiatry 68(4):314–319. ISSN 0006-3223. https://doi.org/10.1016/j. biopsych.2010.05.028 2. Singh N, Singh A (2020) Role of AI in modeling psychometrics to detect NDDs. In: Interdisciplinary approaches to altering neurodevelopmental disorders. www.igi-global.com/cha pter/role-of-artificial-intelligence-in-modeling-psychometrics-to-detect-neurodevelopmentaldisorders/254678. https://doi.org/10.4018/978-1-7998-3069-6.ch013 3. Dorothy B, Michael R (2008) Neurodevelopmental disorders: conceptual issues. Rutter’s Child Adolesc Psychiatry. https://doi.org/10.1002/9781444300895.ch3 4. Institute of Medicine (US) Committee on Nervous System Disorders in Developing Countries (2001) Developmental Disabilities. In: Neurological, psychiatric, and developmental disorders: meeting the challenge in the developing world. National Academies Press (US), Washington (DC). Available from: https://www.ncbi.nlm.nih.gov/books/NBK223473/ 5. Children and NDBID, Children’s Health and the Environment, WHO. https://www.who.int/ ceh/capacity/neurodevelopmental.pdf 6. De Felice A, Ricceri L, Venerosi A, Chiarotti F, Calamandrei G (2015) Multifactorial origin of neurodevelopmental disorders: approaches to understanding complex etiologies. Toxics 3(1):89–129. https://doi.org/10.3390/toxics3010089 7. Morris-Rosendahl DJ, Crocq MA (2020) Neurodevelopmental disorders-the history and future of a diagnostic concept. Dialogues Clin Neurosci 22(1):65–72. https://doi.org/10.31887/DCNS. 2020.22.1/macrocq 8. Thapar A, Cooper M, Rutter M (2017) Neurodevelopmental disorders. The Lancet Psychiatry 4(4):339–346. ISSN 2215-0366. https://doi.org/10.1016/S2215-0366(16)30376-5 9. Volkmar FR (2013) DSM-III. In: Volkmar FR (eds) Encyclopedia of autism spectrum disorders. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1698-3_1442 10. Doernberg E, Hollander E (2016) Neurodevelopmental disorders (ASD and ADHD): DSM5, ICD-10, and ICD-11. CNS Spectr 21(4):295–299. https://doi.org/10.1017/S10928529160 00262 11. Paloyelis Y, Mehta MA, Kuntsi J, Asherson P (2007) Functional MRI in ADHD: a systematic literature review. Expert Rev Neurother 7(10):1337–1356. https://doi.org/10.1586/14737175. 7.10.1337 12. Faras H, Al Ateeqi N, Tidmarsh L (2010) Autism spectrum disorders. Ann Saudi Med 30(4):295–300. https://doi.org/10.4103/0256-4947.65261. PMID: 20622347; PMCID: PMC2931781 13. Gibbs J, Appleton J, Appleton R (2007) Dyspraxia or developmental coordination disorder? Unravelling the enigma. Arch Dis Child 92(6):534–549. https://doi.org/10.1136/adc.2005. 088054. PMID: 17515623; PMCID: PMC2066137 14. Garlie MJ (2000) Oppositional defiant disorder (ODD) in children and adolescents. Graduate Research Papers, 734. https://scholarworks.uni.edu/grp/734 15. Ghosh S, Sinha M (2012) ADHD, ODD, and CD: do they belong to a common psychopathological spectrum? A case series. Case Rep Psychiatry 2012:520689. https://doi.org/10.1155/ 2012/520689. Epub (2012) Oct 11. PMID 23097736; PMCID PMC3477532
16. Semrud-Clikeman M (2005) Neuropsychological aspects for evaluating learning disabilities. J Learn Disabil 38(6):563–568. https://doi.org/10.1177/00222194050380061301 17. Hudson C, Chan J (2002) Individuals with intellectual disability and mental illness: a literature review. Aust J Soc Issues 37:31–49. https://doi.org/10.1002/j.1839-4655.2002.tb01109.x 18. Albawi S, Mohammed TA, Al-Zawi S (2017) Understanding of a convolutional neural network. In: 2017 international conference on engineering and technology (ICET), Antalya, Turkey, pp 1–6. https://doi.org/10.1109/ICEngTechnol.2017.8308186 19. Lipton ZC, Berkowitz J, Elkan C (2015) A critical review of recurrent neural networks for sequence learning, eprint 1506.00019, http://arxiv.org/abs/1506.00019 20. Matthew B, Gagan S, Russell G, Nasimeh A, Meysam B, Peter S, Andrew G, Serdar D (2012) ADHD-200 global competition: diagnosing ADHD using personal characteristic data can outperform resting state fMRI measurements. Front Syst Neurosci 6:69.https://doi.org/10. 3389/fnsys.2012.00069 21. Zeinab S, Mohammadsadegh A, Soorena S, Mariam Z-M, Moloud A, Rajendra AU, Reza K, Vahid S (2020) Automated detection of autism spectrum disorder using a convolutional neural network. Front Neurosci 13. https://doi.org/10.3389/fnins.2019.01325. ISSN 1662-453X 22. Yu KH, Beam AL, Kohane IS (2018) Artificial intelligence in healthcare. Nat Biomed Eng 2(10):719–731. https://doi.org/10.1038/s41551-018-0305-z. Epub (2018) Oct 10. PMID 31015651 23. Uddin M, Wang Y, Woodbury-Smith M (2019) Artificial intelligence for precision medicine in neurodevelopmental disorders. NPJ Digit Med 2:112. https://doi.org/10.1038/s41746-0190191-0 24. Duda M, Haber N, Daniels J et al (2017) Crowdsourced validation of a machine-learning classification system for autism and ADHD. Transl Psychiatry 7:e1133. https://doi.org/10. 1038/tp.2017.86 25. Song D, Kim SY, Bong G, Kim JM, Yoo HJ (2019) The use of artificial intelligence in screening and diagnosis of autism spectrum disorder: a literature review. J Korean Acad Child Adolesc Psychiatry 30:145–152. https://doi.org/10.5765/jkacap.190027 26. Attallah O, Sharkas MA, Gadelkarim H (2020) Deep learning techniques for automatic detection of embryonic neurodevelopmental disorders. Diagnostics 10(1):27. https://doi.org/10. 3390/diagnostics10010027 27. Trewin A (2018) AI fairness for people with disabilities: point of view, arXiv:1811.10670 journal CoRR, vol abs/1811.10670. http://arxiv.org/abs/1811.10670 28. Parenti I, Rabaneda LG, Schoen H, Novarino G (2020) Neurodevelopmental disorders: from genetics to functional pathways. Trends Neurosci 43(8):608–621. ISSN 0166–2236. https:// doi.org/10.1016/j.tins.2020.05.004 29. Magrabi F, Ammenwerth E, McNair JB, De Keizer NF, Hyppönen H, Nykänen P, Rigby M, Scott PJ, Vehko T, Wong ZS, Georgiou A (2019) Artificial intelligence in clinical decision support: challenges for evaluating AI and practical implications. Yearb Med Inform 28(1):128– 134. https://doi.org/10.1055/s-0039-1677903 30. Khanna P (2020) Attitudes of society towards people with neurodevelopmental disorders: problems and solutions. In: Wadhera T, Kakkar D (eds) Interdisciplinary approaches to altering neurodevelopmental disorders. IGI Global, pp. 1–12. https://doi.org/10.4018/978-17998-3069-6.ch001
Chapter 2
Deep Learning Implementation for Dark Matter Particle Detection Anukriti and Vandana Niranjan
1 Introduction
Space exploration and astrophysics are two wonders of modern science that produce new areas of research in abundance. These research areas answer questions that probe how deeply we actually know this universe, and the domain has taught us about the formation as well as the composition of the celestial world. A wide variety of research is already under way, and this is the prime motivation for our work: to explore the hidden secrets of the universe and its entities, in the form of dark matter.
1.1 Know Quarks
Quarks are the elementary, most fundamental particles of existing matter. Quarks have their place in the table of elementary particles of the Standard Model shown in Fig. 1. They exist in six different types (flavours), of which up and down are the most common. A single proton is made of one down and two up quarks, while a neutron is made up of one up and two down quarks, as stated in [2] and shown in Fig. 2. Quarks are spin-1/2 particles, and their electric charge is either −1/3 or +2/3. Quarks combine to form composite particles known as hadrons, of which the neutron and proton are the most stable.
Anukriti (B) · V. Niranjan Department of Electronics and Communication Engineering, Indira Gandhi Delhi Technical University for Women, New Delhi, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_2
Fig. 1 Standard Model [3] of elementary particles
Fig. 2 Proton and neutron quark composition as shown in Strassler [4]
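As a quick sanity check of the compositions in Fig. 2, the fractional charges quoted above can be summed for each hadron. The small sketch below is purely illustrative; the charge values are the standard up/down quark charges, not data taken from this chapter.

```python
# Verify that quark charges reproduce the proton (+1) and neutron (0) charges.
from fractions import Fraction

QUARK_CHARGE = {"u": Fraction(2, 3), "d": Fraction(-1, 3)}  # up: +2/3, down: -1/3

def hadron_charge(quarks: str) -> Fraction:
    """Total electric charge of a hadron given its quark content, e.g. 'uud'."""
    return sum(QUARK_CHARGE[q] for q in quarks)

print("proton  (uud):", hadron_charge("uud"))   # 1
print("neutron (udd):", hadron_charge("udd"))   # 0
```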
1.2 Dark Matter
Dark matter, unlike traditional matter, has unusual characteristics that do not resemble the properties of visible matter. It is termed dark matter because its interaction with electromagnetic radiation is almost null; since light is also an electromagnetic wave, this lack of interaction makes it very hard to spot. Balasubramanian [5] produced a paper hinting at the fact that less than 5% of the universe is composed of the visible matter and energy that our most advanced instruments have ever observed. Dark matter makes up about 27%, and the remainder, around 68%, is dark energy (Fig. 3).
Fig. 3 Pie chart depicting the composition of the universe
Another finding about dark matter is that it could be made up of non-baryonic particles, because it does not interact with radiation the way regular matter particles composed of baryons (particles made of three quarks) do. Dark matter is also a different entity from antimatter, because it does not produce the gamma rays that should appear when antimatter annihilates with matter [6].
1.3 Production of Dark Matter Particles
Although dark matter has sufficient mass, its interactions are weak; hence, its candidates are considered weakly interacting massive particles (WIMPs) in their natural occurrence. Thermal energy is at the core of the production of such particles: they are estimated to have energies of around 10–10,000 GeV and could be produced thermally only in the hot early universe. This thermal production also gives an almost correct approximation for the abundance of dark matter observed from that period [6]. Other than the WIMP mechanism, dark matter could be synthesized through other procedures such as the decay of heavier particles, vacuum misalignment, oscillations and gravitational mechanisms. However, none of these mechanisms produce dark matter with a lifetime long enough for traditional sensors and computers to detect it directly. Hence, advanced ring imaging [7] techniques are used to identify the type of particle and its path inside the detectors.
1.4 The So-Invisible Dark Particles
During the initial period of the big bang, the universe was nothing but a giant cloud of matter and enormous amounts of energy. This period is called the inflation of the universe, when the only material present was quarks. As can be seen from Fig. 4 [8], in that period there was an abundance of everything, whether energy or quarks. But as time went on, the quarks started to combine to form matter (both visible and dark), and the production of protons and hydrogen nuclei occurred. The first basic elements of the periodic table were produced within a few seconds of the big bang. These events did not halt there; as the inflation progressed,
the expanding universe caused the energy to decay as quarks combined to form stable matter. A series of sub-atomic reactions started taking place which not only reduced the energy of the universe but also formed more stable and heavier compounds. This crucial event is termed the freezeout.
Fig. 4 Timeline of the early universe
1.5 Reactions During the Freezeout
The early universe was in thermal equilibrium, where the rate of expansion of the universe (H) is smaller than the rate of interaction (Γ) taking place, i.e. Γ ≥ H or t_int ≤ t_H; we term this duration the coupling stage. However, during expansion, the rate of expansion of the universe (H) became larger than the rate of interaction (Γ), implying that the particles then fell out of thermal equilibrium, i.e. Γ ≤ H or t_int ≥ t_H. This is the period of decoupling. The Γ ≥ H relation is valid only when the temperature is in the range of 100–10^16 GeV, as described in [9]. In thermal equilibrium, the number density of electrons is proportional to e^(−m/T), and during the neutrino decoupling a reversible reaction occurs,

ν + ν* ↔ e+ + e−

where ν and ν* are the neutrino and anti-neutrino, respectively, assumed to be in thermal equilibrium. But as the temperature drops below the electron mass, the energy for the ν + ν* → e+ + e− direction of this reaction no longer exists; hence, only the forward reaction

e+ + e− → ν + ν*

happens, and therefore the neutrinos propagate freely during the freezeout. Now, this raises a new confusion: if electrons are combining to form neutrinos, then there must not be enough electrons left for further reactions. The fact is that there are two prominent groups of reactions: the first group is the weak reactions, where
ν + e → ν + e,

and the second group is the electromagnetic reactions, where

γ + e → γ + e,

with ν and γ being the neutrino and photon. The key point is that the rate of the electromagnetic reactions is very large, so electrons stayed in equilibrium in the universe. But due to this freezeout stage, the abundance of dark matter dropped in response to the creation of lower-energy particles such as photons and neutrinos.
Fig. 5 Relation between number density (relic abundance) and temperature
As seen in Fig. 5 from [9], the relic abundance Y closely follows the equilibrium values as it proceeds towards freezeout; after decoupling, it non-relativistically settles towards the constant value Y_∞. But as the λ value is increased, the decoupling temperature accordingly decreases and the relic abundance is therefore reduced. The black dashed line traces the thermal equilibrium state, which shows a strong exponential drop.
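The behaviour in Fig. 5 can be reproduced qualitatively by integrating the standard dimensionless freezeout (Boltzmann) equation, dY/dx = −(λ/x²)(Y² − Y_eq²) with x = m/T. The sketch below is only an illustration of that textbook equation, not code from this chapter; the equilibrium normalization (0.145) and the λ values are assumptions chosen for demonstration.

```python
# Minimal numerical sketch of the freezeout behaviour in Fig. 5 (illustrative only).
# Dimensionless Boltzmann equation: dY/dx = -(lam / x^2) * (Y^2 - Yeq^2), with x = m/T.
import numpy as np
from scipy.integrate import solve_ivp

def yeq(x):
    # Non-relativistic equilibrium abundance, Yeq ~ x^(3/2) * exp(-x);
    # the 0.145 prefactor is an assumed normalization, not a value from the chapter.
    return 0.145 * x**1.5 * np.exp(-x)

def relic_abundance(lam, x_end=1000.0):
    def rhs(x, y):
        return [-(lam / x**2) * (y[0]**2 - yeq(x)**2)]
    sol = solve_ivp(rhs, (1.0, x_end), [yeq(1.0)], method="Radau", rtol=1e-8, atol=1e-30)
    return sol.y[0, -1]          # Y_infinity, the frozen-out abundance

for lam in (1e5, 1e7, 1e9):
    print(f"lambda = {lam:.0e}  ->  Y_inf ~ {relic_abundance(lam):.3e}")
# Larger lambda (stronger interactions) keeps Y on the equilibrium curve longer,
# so the relic abundance freezes out at a smaller value, matching the trend in Fig. 5.
```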
1.6 Particle Decay
Lawrence Berkeley National Laboratory [10] published an article explaining, in simple terms, that just as most heavy things "decay" into simpler things (for instance, sucrose breaks down into glucose and fructose), so does a particle: its decay refers to the conversion of that fundamental particle into other fundamental particles. The decay is peculiar because the final particles produced are not pieces of the starting particle; each is a new particle in its own right. A particle can decay into different products; for example, the decay of a muon, as shown in Fig. 6, produces a neutrino, an anti-neutrino and an electron.
Fig. 6 Example showing the decay of muon [11]
Three of the most common types of decay are alpha decay, beta decay and gamma decay [12], all of which involve emitting one or more particles or photons. According to quantum theory, it is impossible to predict when a particular atom will decay, regardless of how long the atom has existed. However, for a significant number of identical atoms, the overall decay rate can be expressed as a decay constant or as a half-life. The half-lives of radioactive atoms span a huge range, from nearly instantaneous to far longer than the age of the universe. Johnston reported in 2015, in a Particles and Interactions research update [13], that the electron has a lifespan of at least approximately 66,000 yottayears (66 × 10^27 years).
1.7 Artificial Method to Generate Dark Particles Using Proton Collision
In the Large Hadron Collider experiment at CERN [14], the LHCb Collaboration published a paper on high-energy particle collisions showing experimental results on the production of sub-particles that exist only at high energy. A high-energy proton–proton collision produces a series of events with the production of leptons, bosons and the large amount of energy required to sustain the reactions. Every collision is unique, as the particles generated in each collision are different and have different half-lives. The challenge is to detect and identify the particles produced in the collider, some of which might be constituents of dark matter.
1.8 Particle Identification
The goal of particle identification is to identify the type of particle associated with a track using the responses from different systems. Figure 7, following the LHCb Collaboration, shows a generic model for identifying the particles generated after a collision in the Large Hadron Collider. Here, the tracking system is the first system of the LHCb detectors and is located nearest to the particle collision area of the detector. It is responsible for recognizing particle trajectories and estimating their parameters. In addition, the system has a magnet
Fig. 7 Particle identification in stages
which deflects particles and allows the particle momentum to be measured. The tracking system has several layers of sensors, and when a particle flies through these layers, it produces a response (a.k.a. hits) in the sensors, from which particle tracks are recognized and their parameters estimated. The particle momentum is calculated by the formula

ρ = mcβ / √(1 − β²)
where β is the velocity of the charged particle in units of the speed of light c in vacuum, ρ is the momentum, and m is the mass of the charged particle. The second part of the identification is done by the ring imaging Cherenkov detector (RICH), which is responsible for particle-type identification. The RICH detector is based on the Cherenkov radiation effect:

cos(θ) = 1 / (nβ)
where n is the refractive index of the gases inside the Cherenkov detector and θ is the Cherenkov emission angle, as understood from the meson spectroscopy done in [15]. From the above formula, it follows that at θ = 0°, β = 1/n, which sets the threshold velocity. In this research, the dataset used contains particles of the types muon, kaon, pion, proton and electron. If we know the particle momentum measured in the tracking system and the emission angle measured at the RICH detector, we can estimate the particle type and, as a result, identify its mass. Figure 8 from [15] depicts the RICH detector graph. From the particle momentum formula, β can be rewritten as

β = ρ / √(ρ² + m²c²)
Fig. 8 Graph depicting Cherenkov angle for different particles
Therefore, the Cherenkov emission angle can be rewritten in the form

cos(θ) = √(ρ² + m²c²) / (nρ)

The third stage of particle identification is the two-stage calorimeter, wherein the first calorimeter is the electromagnetic calorimeter and the subsequent one is the hadron calorimeter. The purpose of these calorimeters is to detect the energy of particles: particles interact with the matter of the calorimeter and lose energy, and the calorimeter measures how much energy the particles lose before they stop. The electromagnetic calorimeter is responsible for measuring the energy of only electrons and photons. The electromagnetic shower grows while the energy of the particles is above a critical value. The size of the shower is determined as

E = E₀ e^(−X/X₀)

and

X = X₀ ln(E₀ / E)
The hadron calorimeter is similar to the electromagnetic calorimeter, but here the calorimeter matter interacts with hadrons such as protons, neutrons and other quark-containing particles, and their energy is measured in a similar manner as in the electromagnetic calorimeter (Fig. 9). The last stage of identification is the muon chamber: only the muon is left undetected up to this point, and its energy is measured there. As seen in Fig. 10, only muons are able to cross all the calorimeters. A muon system has several layers of muon chambers with layers of metal between them. The metal layers gradually stop the muons, so the deeper a muon penetrates (the larger X), the higher its energy.
Fig. 9 Inside reaction of electromagnetic calorimeter and development of showers [16]
Fig. 10 Particle showers in different systems [17]
As seen in Fig. 11, when a muon passes through the gas in the detector chambers, it ionizes the gas. The electrostatic field inside the chamber causes the ions to drift towards the cathode and the electrons towards the anode, which generates the signal in the detector that registers the muon.
Fig. 11 Detection inside muon chamber
2 Neural Model and Dataset Learning
2.1 Feature Selection
The dataset [18] consists of more than 100 K entries with 45 different features related to the reconstructed track, energy deposits, particle momentum, Cherenkov light data and flag data across all the identification systems, i.e. the tracker, the RICH detectors, the calorimeters and the muon chamber. Not all features are required for this research, as they would burden the model and might lead to over-fitting, which we intend to avoid. The selected features are grouped into three categories so that different properties of a particle can be analysed separately by our neural networks and a better prediction of the particle type can be produced. The three groups are based on the track vector, the physical detection flags from the sensors, and the energy deposits in the calorimeters. Each of these properties can by itself give a prediction of the type of particle produced in a collision.
Category 1: Track Data Analysis
The particle tracks are reconstructed using imaging, and from this some data is generated: the momentum of the particle, labelled TrackP; the transverse momentum, labelled TrackPt; the number of degrees of freedom for the track fit in Subdetectors 1 and 2, labelled TrackDOFSubdetectors 1 and 2, respectively; and the number of degrees of freedom using hits in all tracking subdetectors, labelled TrackDOF. In addition, we include the chi-square quality data for the subdetectors: the chi-square quality of the track fit in tracking Subdetectors 1 and 2, labelled TrackQualitySubdetectors 1 and 2, respectively; the chi-square quality of the track fit per degree of freedom, labelled TrackQualityPerDOF; and, lastly, the distance between the track and the beam axis, labelled TrackDistanceToBeamZ.
Category 2: Detector Flags
The flag data are Boolean values describing whether or not the particle was detected in a particular detector. If the reconstructed track of the particle passes through the scintillating pad detector, the data is recorded in FlagSpd. If it passes through the pre-shower sensors, the data is recorded in FlagPreshower. If the reconstructed track passes through the first RICH detector, FlagRICH 1 is set, and if it also passes through the second, FlagRICH 2 is set as well. If the reconstructed track passes through the electromagnetic calorimeter, a value is recorded in FlagEcal; similarly, for the hadron calorimeter the data goes to FlagHcal. Lastly, for the muon chamber, FlagMuon is set if the reconstructed track passes through the muon stations. There may also be a scenario where the reconstructed track was deflected by a detector, in which case FlagBrem indicates it.
Category 3: Energy Deposits and RICH Analysis
The energy deposits detected across the various devices are analysed in this category. The energy deposit associated with the track in the scintillating pad detector is labelled SpdE, and in the pre-shower device it is labelled PreshowerE. The energy deposits in the electromagnetic and hadron calorimeters are labelled EcalE and HcalE, respectively. There is no reading for the energy deposit in the muon chamber; hence, we use the RICH detector data instead. This detector analyses the Cherenkov light and gives information in the form of flags that tell whether the momentum is sufficient for the particle to be of a certain type. There are five flags, RICHFlagForElectron, RICHFlagForProton, RICHFlagForPion, RICHFlagForKaon and RICHFlagForMuon, each of which indicates whether or not the momentum crosses the threshold for the respective particle type to produce Cherenkov light.
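To make the grouping concrete, the following is a minimal pandas sketch of how the three feature categories could be pulled out of the dataset; the file name and the exact column spellings are assumptions based on the labels quoted above and may differ in the published dataset [18].

import pandas as pd

# Hypothetical file name; column names follow the labels in Sect. 2.1.
data = pd.read_csv("particle_id_dataset.csv")

category1 = [  # track reconstruction features (nine)
    "TrackP", "TrackPt", "TrackDOFSubdetector1", "TrackDOFSubdetector2",
    "TrackDOF", "TrackQualitySubdetector1", "TrackQualitySubdetector2",
    "TrackQualityPerDOF", "TrackDistanceToBeamZ",
]
category2 = [  # detector flags (eight)
    "FlagSpd", "FlagPreshower", "FlagRICH1", "FlagRICH2",
    "FlagEcal", "FlagHcal", "FlagMuon", "FlagBrem",
]
category3 = [  # energy deposits and RICH threshold flags (nine)
    "SpdE", "PreshowerE", "EcalE", "HcalE",
    "RICHFlagForElectron", "RICHFlagForProton", "RICHFlagForPion",
    "RICHFlagForKaon", "RICHFlagForMuon",
]

X1, X2, X3 = data[category1], data[category2], data[category3]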
2.2 Data Split for Validation and Training
Before developing our model, the dataset needs to be randomly separated into two parts, one reserved for training and the other reserved only for testing, verification of the developed model and accuracy measurement. The data is split into 1,000,000 rows for training and 20,000 rows for testing.
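A hedged scikit-learn sketch of such a split is shown below; the 20,000-row test size follows the text, while the label column name, the stratification and the random seed are assumptions.

from sklearn.model_selection import train_test_split

# Hold out 20,000 rows for testing (Sect. 2.2); the remaining rows are used for training.
features = category1 + category2 + category3
X_train, X_test, y_train, y_test = train_test_split(
    data[features],
    data["Label"],          # hypothetical target column with the particle type
    test_size=20_000,
    stratify=data["Label"],
    random_state=42,
)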
2.3 Neural Model Development
As can be understood from the dataset description above, there is a variety of particle properties that must be learnt for prediction. It would place a heavy burden on classical machine learning techniques to capture such a variety of
properties and still produce accurate predictions on the dataset we are using. Further, we aim to detect particles whose properties do not resemble those of the known particles and which might be dark particles; however, as discussed in [19], classical machine learning models are rigid and tend to over-fit, forcing every sample into one of the existing categories, which is undesirable. Hence, a neural network comes in handy: for each individual category, we can adjust the learning parameters and change the capacity of the model to produce the best prediction for this research.
2.4 Proposed System
Each category has a different number of features. Hence, we create three different models, each of which returns the probability of the track being one of the five known particles or a particle with non-similar characteristics (unknown). Since there are only five known possibilities, i.e. electron, kaon, pion, proton or muon, the output of each model is taken over six categories, where the sixth category represents an unknown particle. The three networks in Figs. 12, 13 and 14 depict the three different feature sets over which the system is trained and validated for loss minimization.
Fig. 12 Deep network for Category 1 dataset with nine input features
Fig. 13 Deep network for Category 2 dataset with eight input features
Fig. 14 Deep network for Category 3 dataset with nine input features
2.5 Network Architecture
The neural network is developed in the following manner:
1. Create a new sequential model
2. Add the first linear input layer with eight or nine neurons, fed from the dataset of each category
3. Apply the rectified linear activation function (ReLU) to the input layer
4. Add the first hidden layer with 36 neurons, feeding each neuron from the input layer
5. Apply the rectified linear activation function to the first hidden layer
6. Add the second hidden layer with 36 neurons, feeding each neuron from the previous hidden layer
7. Apply the rectified linear activation function to the second hidden layer
8. Add the output layer with six neurons, as six categories of output are needed, feeding each neuron from the second hidden layer
9. Apply the softmax activation function to the output layer
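As an illustration only, the nine steps above can be written as the following Keras sketch; the chapter does not name a specific framework, so the library choice and the helper name build_category_model are assumptions.

from tensorflow import keras
from tensorflow.keras import layers

def build_category_model(n_inputs: int) -> keras.Model:
    # Steps 1-9 of Sect. 2.5: sequential model, ReLU input and hidden layers,
    # two hidden layers of 36 neurons, six-way softmax output.
    return keras.Sequential([
        layers.Input(shape=(n_inputs,)),
        layers.Dense(n_inputs, activation="relu"),  # input layer (steps 2-3)
        layers.Dense(36, activation="relu"),        # first hidden layer (steps 4-5)
        layers.Dense(36, activation="relu"),        # second hidden layer (steps 6-7)
        layers.Dense(6, activation="softmax"),      # output layer (steps 8-9)
    ])

model_cat1 = build_category_model(9)  # Category 1: nine track features
model_cat2 = build_category_model(8)  # Category 2: eight detector flags
model_cat3 = build_category_model(9)  # Category 3: nine energy/RICH features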
2.6 Activation Function
In this research, we use two different activation functions: ReLU, as explained in [20], for the input and hidden layers, and softmax, as explained in [21], for the output layer. The ReLU function is similar to a ramp function with half-wave rectification. Figure 15 shows the operation of the ReLU function. The ReLU activation function is given as

f(x) = x if x > 0, and 0 otherwise
Fig. 15 ReLU (left) and softmax (right) function
The output of the ReLU function is zero for inputs less than zero; otherwise, the input is passed directly to the output. The output layer performs a multi-class classification; therefore, the softmax function is used there. The softmax activation outputs one value for each node in the output layer, and these values represent the probabilities of each class, summing to one. The softmax activation function is given as

f(x_j) = e^(x_j) / Σ_{k=1}^{K} e^(x_k)

where x_j is the predicted output of the jth class of the network.
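A small NumPy sketch of the two activation functions, useful for checking that the softmax outputs indeed sum to one; the sample logits are purely illustrative.

import numpy as np

def relu(x):
    # f(x) = x for x > 0 and 0 otherwise
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the maximum for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5, 0.1, -0.3, 1.2])  # illustrative pre-activations
print(relu(logits))            # negative entries are clipped to zero
print(softmax(logits))         # six class probabilities
print(softmax(logits).sum())   # 1.0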
2.7 Loss Calculation and Optimization
In this research, we perform a multi-class classification using a neural network. Hence, cross-entropy is best suited as the loss function. Cross-entropy is also known as log loss. It is built upon the idea of entropy from information theory and calculates the number of bits required to represent or transmit an average event from one distribution compared to another distribution, as described in [22]. The cross-entropy loss is given as

L(y, ŷ) = −Σ_{k=1}^{K} y_k log(ŷ_k)
where y_k is either 0 or 1, depicting whether the class label k is correct or not. For re-calibrating the weights and biases of the neural network during back-propagation so that it can later give the best prediction, an optimizer that works well with the model must be selected. For this research, we use the widely relied-upon stochastic gradient descent (SGD). SGD updates the weights and biases for each training example x_i and label y_i:

θ = θ − η · ∇_θ J(θ; x_i; y_i)

where θ ∈ R^d is the parameter vector of the model, η is the learning rate, kept at 0.002, J(θ) is the objective function to be minimized, and ∇_θ J(θ) is its gradient. The advantages of using SGD are well explained in [22], which provides calculated results indicating that SGD is advantageous over other optimizers because it is a much faster algorithm for updating the weights, performing one parameter update at a time. Since the parameters in SGD are updated very frequently, there
is a high variance that causes the objective function to fluctuate significantly; the updates keep overshooting the gradient and can, hence, land in potentially better local minima compared to standard gradient descent.
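Putting the loss and optimizer together, a hedged Keras-style sketch of compiling and training one category model is shown below; the learning rate of 0.002 and the 100 epochs come from the text, while the batch size and the pre-processed arrays (X1_train, y_train_onehot and so on) are assumptions.

from tensorflow import keras

model_cat1.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.002),  # eta = 0.002 (Sect. 2.7)
    loss="categorical_crossentropy",    # cross-entropy over the six one-hot classes
    metrics=["accuracy"],
)
history = model_cat1.fit(
    X1_train, y_train_onehot,                   # hypothetical pre-processed arrays
    validation_data=(X1_test, y_test_onehot),
    epochs=100,                                 # as reported in Sect. 3 (Result)
    batch_size=256,                             # assumption
)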
2.8 Area Under the Curve
For a trained neural network, it is critical to build a confusion matrix as shown in Fig. 16. From the confusion matrix, we can derive some important metrics for verifying the efficiency of the neural model. The first important metric is the true positive rate (or sensitivity), which determines what proportion of the positive class got correctly classified:

T.P.R. (Sensitivity) = T.P. / (T.P. + F.N.)
Similar to the T.P.R., there is the false negative rate, which determines what proportion of the positive class got incorrectly classified by the classifier:

F.N.R. = F.N. / (T.P. + F.N.)
The other important metric is the true negative rate (or specificity), which tells us what proportion of the negative class got correctly classified:

T.N.R. (Specificity) = T.N. / (T.N. + F.P.)

Fig. 16 Confusion matrix
Similar to the T.N.R., there is the false positive rate, which determines what proportion of the negative class got incorrectly classified by the classifier:

F.P.R. = F.P. / (T.N. + F.P.)
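The four rates can be computed directly from one-vs-rest confusion-matrix counts, as in the following small sketch; the counts are illustrative only.

def rates_from_confusion(tp, fn, tn, fp):
    # One-vs-rest counts: true/false positives and negatives for a single class.
    tpr = tp / (tp + fn)   # sensitivity
    fnr = fn / (tp + fn)
    tnr = tn / (tn + fp)   # specificity
    fpr = fp / (tn + fp)
    return tpr, fnr, tnr, fpr

print(rates_from_confusion(tp=80, fn=20, tn=90, fp=10))  # (0.8, 0.2, 0.9, 0.1)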
The receiver operating characteristic (R.O.C.) curve is an evaluation metric for binary classification problems. It is a probability curve that plots the T.P.R. against the F.P.R. at various threshold values and essentially separates the "signal" from the "noise". The area under the curve (A.U.C.), as explained in [23], measures the ability of a classifier to distinguish between classes and is used as a summary of the R.O.C. curve. The higher the A.U.C., the better the model distinguishes between the positive and negative classes (Fig. 17). A.U.C. = 0.5, as shown on the left of Fig. 17, indicates that the classifier is
Fig. 17 R.O.C. when A.U.C. = 0.5 (left) and 0.5 < A.U.C. < 1 (right)
Fig. 18 R.O.C. when A.U.C. = 1
predicting a random class; hence, the classifier cannot distinguish between the positive and negative class values. 0.5 < A.U.C. < 1, as shown on the right of Fig. 17, indicates that the classifier detects more T.P. and T.N. than F.P. and F.N.; hence, there is a high chance that it will distinguish the positive class values from the negative class values. A.U.C. = 1 (the ideal condition), as shown in Fig. 18, indicates that the classifier perfectly distinguishes all positive and negative class points. So, the higher the A.U.C. value of a classifier, the better its ability to distinguish between the positive and negative classes.
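In practice, the per-class R.O.C. curves and A.U.C. values of Figs. 19-23 can be obtained in a one-vs-rest fashion, for example with scikit-learn; the arrays below are synthetic stand-ins for the model's softmax scores on the test set.

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# One class (say, muon) against the rest: y_true marks the true muons,
# y_score is the softmax probability the model assigned to the muon class.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
y_score = np.clip(0.6 * y_true + rng.normal(0.3, 0.2, size=1000), 0.0, 1.0)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("A.U.C. =", roc_auc_score(y_true, y_score))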
3 Result
The neural network models so developed were tested on the whole dataset. Each model is able to identify six classes of particles, although the three networks differ somewhat in loss and accuracy. For the Category 1 features, after 100 epochs the cross-entropy loss comes out to 1.7818; for Category 2, after 100 epochs, it comes out to 1.6933; and for Category 3, after 100 epochs, it comes out to 1.7349. This indicates that the three models are almost equivalent in performance, with the best result shown by Category 2, in which the detector flags were selected to identify the particle presence. The identification of particles is, therefore, done as shown in the R.O.C. curve plots of Figs. 19, 20, 21 and 22. The A.U.C. for all particles is very close to 1, which indicates that the model works well and identifies all the particles with more than 80% accuracy. The second objective of the research, identifying unknown particles, is also met, as seen in the R.O.C. plot for the unknown particle: Fig. 23, with an A.U.C. of 0.83, indicates that something more than just the five basic particles is generated during the collision.
Fig. 19 R.O.C. curve detecting electron (left) and proton (right) particle
Fig. 20 R.O.C. curve detecting kaon (left) and pion (right) particle
Fig. 21 R.O.C. curve detecting muon particle
Fig. 22 Comparison of all curves
Comparing this research to previous work in the same domain, the thesis of Viljoen on machine learning for particle detection [24] focuses more on different types of adversarial networks and their advantages
Fig. 23 R.O.C. curve for other non-similar particles detected
for particle identification; the model developed in that research is a two-dimensional CNN. A similar paper is Guest et al. [25], where the research objective is again pursued using generative adversarial networks, with a similar approach for particle identification in the ALICE experiment to understand the tracking system, aided by the paper of Paganini et al. [26] on fast simulation. In this paper, by contrast, we have focused on a deep neural network rather than adversarial networks, and on the LHCb experiment and its data. Another related paper, by Newby et al. [27], presents a different convolutional neural network with a more significant use-case model, as it can read more features simultaneously. The approach used in our paper is similar to [27] in that we use feature chunks and treat them as individual entities for training the model. The advantage of our model, however, lies in the feature selection performed to improve the results, as can be seen from the detection results. The research did produce a lower loss value compared to the papers [25–27], and that without even using a CNN model.
4 Conclusion
A neural network model has been developed to classify the particles generated in the collision of two protons in the Large Hadron Collider at CERN. The R.O.C. curves indicate the existence of the various particles in the collision and how well they are classified. The existence of a particle with unknown properties, clearly distinguished from the properties of the other particles, is significant. In future, this research can be extended with more optimized neural networks: faster optimization techniques are useful for handling such bulky data, and a more focused analysis of the unknown particle could be done using another network dedicated to it. Since the data is bulky and costly in terms of both processing and accuracy, near-future quantum computers might come in quite handy for faster processing of such datasets.
Acknowledgements I would like to express my special thanks and gratitude to my mentor and supervisor Prof. Vandana Niranjan for her constant support, for giving me the opportunity to carry out this wonderful research under her guidance, and for providing me with so much knowledge on the topic. Secondly, I would like to thank my parents, without whose motivation the research work could not have been completed in the limited time.
References
1. Alekhin S, Altmannshofer W et al (2016) A facility to search for hidden particles at the CERN SPS: the SHiP physics case. Rep Prog Phys 79(12):124201. https://doi.org/10.1088/0034-4885/79/12/124201
2. Griffiths D (2008) Introduction to elementary particles. ISBN 978-3-527-40601-2
3. Standard Model of Elementary Particles.svg. https://commons.wikimedia.org/w/index.php?title=File:StandardModelofElementaryParticles.svg&oldid=527547936
4. Strassler M (2013) Protons and neutrons: the massive pandemonium in matter. https://profmattstrassler.com/articles-and-posts/particle-physics-basics/the-structure-of-matter/protons-and-neutrons/
5. Balasubramanian A (2017) Basics of the universe. https://doi.org/10.13140/RG.2.2.27891.84007
6. Yashar A (2011) Supersymmetry vis-à-vis observation: dark matter constraints, global fits and statistical issues
7. Nappi E, Seguinot J (2005) Ring imaging Cherenkov detectors: the state of the art and perspectives. Nuovo Cimento Rivista Serie 28:1–130. https://doi.org/10.1393/ncr/i2006-10004-6
8. National Science Foundation, HistoryOfUniverse-BICEP2-20140317.png. https://commons.wikimedia.org/w/index.php?title=File:HistoryOfUniverseBICEP2-20140317.png&oldid=511688254
9. Queiroz FS (2018) New physics landmarks: dark matter and neutrino masses. Adv High Energy Phys. https://doi.org/10.1155/2018/2652536; ISSN 1687-7357
10. Lawrence Berkeley National Laboratory (2016) The particle adventure. https://particleadventure.org/decayintro.html
11. Aaij R, Abellan CB et al (2011) Search for the rare decays Bs0 → μ+μ− and B0 → μ+μ−. Phys Lett B 708:55–67. 24 p. https://doi.org/10.1016/j.physletb.2012.01.038
12. Beiser A (2003) Concepts of modern physics. ISBN 978-3-527-40601-2
13. Johnston H (2015) Electron lifetime is at least 66,000 yottayears. https://physicsworld.com/a/electron-lifetime-is-at-least-66000-yottayears/
14. LHCb Collaboration, Alves AA et al (2008) The LHCb detector at the LHC. J Instrum 3:S08005. https://doi.org/10.1088/1748-0221/3/08/s08005
15. Nerling F (2011) Meson spectroscopy with COMPASS. J Phys Conf Ser 312(3):032017. https://doi.org/10.1088/1742-6596/312/3/032017; ISSN 1742-6596
16. Newman P (2008) Lecture 10. http://epweb2.ph.bham.ac.uk/user/newman/appt08/lecture10.pdf
17. Lippmann C (2012) Particle identification. Nucl Instrum Meth A 666:148–172. https://doi.org/10.1016/j.nima.2011.03.009
18. McCauley T (2014) Datasets derived from the Run2011A SingleElectron, SingleMu, DoubleElectron, and DoubleMu primary datasets. https://opendata.cern.ch/record/301
19. Ying X (2019) An overview of overfitting and its solutions. J Phys Conf Ser 1168:022022. https://doi.org/10.1088/1742-6596/1168/2/022022
20. Sultan HH, Salem NM, Al-Atabany W (2019) Multi-classification of brain tumor images using deep neural network. IEEE Access 7:69215–69225. https://doi.org/10.1109/ACCESS.2019.2919122
21. Chigozie N, Winifred I, Anthony G, Stephen M (2018) Activation functions: comparison of trends in practice and research for deep learning. CoRR, arXiv:1811.03378
22. Murphy KP (2013) Machine learning: a probabilistic perspective. ISBN 9780262018029, 0262018020. http://noiselab.ucsd.edu/ECE228/MurphyMachineLearning.pdf
23. Ling CX, Huang J, Zhang H (2003) AUC: a better measure than accuracy in comparing learning algorithms. Adv Artif Intell, 329–341. ISBN 978-3-540-44886-0
24. Viljoen CG (2019) Machine learning for particle identification and deep generative models towards fast simulations for the Alice Transition Radiation Detector at CERN. OpenUCT. https://open.uct.ac.za
25. Guest D, Cranmer K, Whiteson D (2018) Deep learning and its application to LHC physics. Ann Rev Nucl Particle Sci 68(1):161–181. https://doi.org/10.1146/annurev-nucl-101917-021019; ISSN 1545-4134
26. Paganini M, de Oliveira L, Nachman B (2018) Accelerating science with generative adversarial networks: an application to 3D particle showers in multilayer calorimeters. Phys Rev Lett 120(4). https://doi.org/10.1103/physrevlett.120.042003; ISSN 1079-7114
27. Newby JM, Schaefer AM, Lee PT, Forest MG, Lai SK (2018) Convolutional neural networks automate detection for tracking of submicron-scale particles in 2D and 3D. Proc Natl Acad Sci 115(36):9026–9031. https://doi.org/10.1073/pnas.1804420115; ISSN 0027-8424
Chapter 3
Augmentation of Handwritten Devanagari Character Dataset Using DCGAN Rajasekhar Nannapaneni, Aravind Chakravarti, Shilpa Sangappa, Parinita Bora, and Raghavendra V. Kulkarni
1 Introduction
Discriminative models, such as support vector machines (SVM), decision trees, and linear regression, are highly accurate in predicting the boundaries between different classes of objects without really understanding the underlying distributions. Because of this, discriminative algorithms cannot create anything new by themselves. In this context, generative models stand apart: by employing generative models, one can enable machines to have human-like cognitive abilities. Deep neural networks are gaining huge popularity. In traditional machine learning, each algorithm is very different from the others and makes strong assumptions in estimating real-world models. In contrast, across the vast majority of deep neural networks, the fundamental concepts such as neurons, synaptic weights, and back-propagation do not differ much from each other. Just by using the same neural network architecture and back-propagation algorithm, a generative adversarial network can be trained to understand the world for a specific class. Data is the fuel that is required in abundance for neural networks to achieve state-of-the-art accuracy. A neural network's performance is seen to increase with the amount of training data, but collecting data, preprocessing it and getting it into a usable shape is a tedious and time-consuming task. The motive of this work is to address this problem. A neural network is a very loose realization of the human brain, which consists of decision-making neurons. Deep neural networks are constructed by adding a large number of layers, each of which tries to extract features; adding more layers extracts higher-level features, and the network gains a deeper understanding of what it is trained for.
R. Nannapaneni (B) · A. Chakravarti · S. Sangappa · P. Bora · R. V. Kulkarni Ramaiah University of Applied Sciences, Bangalore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_3
Convolution operations were predominantly used in digital signal processing (DSP) to extract features of signals. With the rise of image processing techniques, convolution operations became popular for extracting edges and patterns from an image, and deep convolutional neural networks use these operations to extract features from images. GANs are deep neural network architectures comprising two neural networks competing against each other (hence "adversarial"). The generative model originates from statistical theory and is based on the joint probability distribution: given a known variable (the observable) and an unknown one (the target), the statistical model is the joint probability distribution of the observable and the target. A generative model can therefore generate new data after learning the features of existing data. Boltzmann machines and variational auto-encoders (VAE) were two popular generative models before the advent of GANs. Devanagari characters are used in the scripts of some of the oldest languages, such as Sanskrit, Hindi, Marathi, and Nepali, mostly used in South Asian countries. Handwritten Devanagari characters show very large variations in how they are written; additionally, a few of the characters differ only very slightly from one another. The Devanagari script is complex because of its cursive nature. For a machine to be capable of recognition and generation, the system has to be very intelligent to handle such situations and produce correct identifications and convincing regenerations. To the best of the authors' knowledge, this is the first effort to generate Devanagari characters using a generative model. Many researchers in the recent past have worked on images of English digits using the Modified National Institute of Standards and Technology (MNIST) dataset. This application can be very useful in preserving ancient scripts and sculptures and thereby preserving cultural heritage. In India, there are many scriptures in the form of palm leaves, rock carvings and ancient literature written in the Devanagari script. There is vast knowledge within these ancient forms of writing which needs to be saved and restored before it is completely destroyed. Challenges involved in extracting data from such sources are listed in [1]. The proposed work has the potential for a strong impact on the restoration of knowledge gathered over the ages, to be passed on to future generations.
1.1 Applications
Data augmentation is a technique to address insufficient training data. Owing to the tremendous growth of computing power and infrastructure, supervised modeling techniques can take advantage of the enormous data available; however, in the case of data insufficiency, data augmentation plays an important role. These synthetic techniques expand the available training samples, which has been shown to increase classification accuracy and to help avoid overfitting. As GANs can generate samples on their own, they are very useful in this context.
Random numbers generated by computers are not completely random in nature; synthetically generated images of digits can be passed to a classifier, and random numbers can be generated from them. The scope of this work can be viewed in the larger context of providing human-like abilities to machines. The remainder of this paper is structured as follows: a very brief survey of recent research related to this study is presented in Sect. 2. The proposed DCGAN is described in Sect. 3, and the details of its implementation in Sect. 4. The results of the evaluation of the DCGAN are briefly presented in Sect. 5. Finally, concluding remarks are made in Sect. 6.
2 Related Work
Digitization is one of the most important processes in the era of computers. It is the mechanism of converting non-digital data into digital data so that one can extract and preserve the information underlying non-digital sources. Various non-digital data sources exist, such as books, clay, wood, paper, copper sheets, papyrus, cloth, palm leaves, pottery, stone carvings, metal, tree bark, animal skin and parchment, and it is essential to preserve the data written on these manuscripts as they pass crucial information from one age to another. With the advent of advanced computing, many researchers have focused on techniques to recognize handwritten characters efficiently, a problem usually referred to as optical character recognition (OCR). Several machine learning methods that leverage OCR are presented in [2, 3]. A machine learning technique can be accurate only when its model is efficient, which in turn depends on the availability of a robust training dataset. Prediction accuracy depends on the model that is leveraged and on the training data: as widely accepted, if the model is the engine, then the data is the fuel, and the more fuel we have, the merrier. One of the most experimented-with datasets for optical character recognition is the MNIST database released by LeCun et al. [4]. Approaches such as K-nearest neighbours, multilayer perceptrons, and convolutional neural networks have achieved 96.8%, 97.3%, and 99.1% accuracy, respectively, in recognizing the handwritten characters of the MNIST database [5]. Devanagari handwritten characters have not been experimented with in the way English handwritten characters and the MNIST database have been. Devanagari has been adapted to over 120 different languages, of which 28 use it as a writing system, and the script is used by over 500 million people around the globe [6]. To protect and preserve Devanagari manuscripts, which carry the heritage of many generations, one must ensure that a proper training dataset exists for leveraging machine learning algorithms to build an efficient model that can recognize Devanagari handwritten characters. Like the MNIST database, there exists the DHCD dataset, which contains Devanagari handwritten characters [7]. The DHCD dataset has 92,000 images covering 46 characters, of which 36 are consonants and 10 are digits.
A classical CNN classifier for the DHCD dataset has been built in [8]. The model introduced in [9] achieved an accuracy of 89.6% for recognizing Devanagari handwritten digits by combining the decisions of multiple connectionist classifiers. The model introduced in [10] achieved an accuracy of 92.8% using histogram features. Primitive features of the characters have been used in [11] to achieve an accuracy of 93% over 21 fonts. The feedforward neural network developed in [12] delivered an accuracy of 94.57%, and the one in [13] achieved an accuracy of 96% in the classification of Devanagari handwritten characters. Further, the model developed in [14] achieved 95% accuracy by following fuzzy-based techniques. Another parameter that affects the accuracy of a network is its depth: experiments on networks of different depths achieved a maximum of 96% accuracy in [15]. The accuracy of Devanagari handwritten character recognition was further improved to 98.47% using improved CNN techniques in [7]. To detect multi-font, multi-style Devanagari characters, an LSTM architecture has been developed by Teja and Pawar [16]. Another scenario is to detect the language in use when multiple languages appear in a document; to overcome this kind of limitation, both temporal and spatial features are extracted and forwarded to LSTMs and CNNs [17]. According to the Kaggle public MNIST leaderboard scores, the prediction accuracy for the MNIST database skyrocketed to 99% after the introduction of CNNs [18], and it improved further with the introduction of data augmentation. Thus, one cannot undermine the need for data augmentation, and it is pivotal to exploit the opportunities that allow data augmentation for various datasets. Data augmentation can further enhance the accuracy of Devanagari handwritten character recognition, and thus it is essential to investigate various approaches to it. Usually, data augmentation techniques manipulate the existing training images, resulting in more training data; however, the model can still overfit if the original training data samples are insufficient. To enhance the training data, generative models are preferable, as they increase the training samples by generating them rather than by transforming existing images. There are quite a few unsupervised learning methods, such as variational autoencoders, restricted Boltzmann machines, and GANs [19], which are generative models. However, the images generated by variational autoencoders are usually blurry for character datasets, and restricted Boltzmann machines use the Monte Carlo method for sampling, which requires the number of samples to be fixed well ahead of generation and is not flexible, as choosing the right number of samples is complex. In this work, GANs were leveraged due to their popularity in the space of data augmentation for generating synthetic images that resemble original Devanagari handwritten characters. GANs learn the distribution of the original data to generate fake images that look natural. Several types of GANs exist, of which DCGANs [20] have been observed to be very suitable for handwritten character datasets, as they improved the classification accuracy of the MNIST database. In this work, a DCGAN has been employed for data augmentation on the DHCD dataset to generate Devanagari character samples that look like natural handwritten characters.
3 The Proposed DCGAN
3.1 Data Collection
The problem statement requires images of Devanagari characters in large numbers for training the GAN network. The Devanagari Handwritten Character Dataset (DHCD) is available on the UCI machine learning repository [7] and consists of images of handwritten Devanagari characters. The dataset consists of 92,000 images in total, with 46 classes and 2000 examples per class; it is large both in terms of samples and classes compared to the MNIST dataset. School children of classes six and seven from the Mount Everest Higher Secondary School, Bhaktapur, Nepal, contributed to this dataset by volunteering to write the characters in 2015. These were scanned manually, and besides the manual scanning, other preprocessing tasks were also performed. The images are grayscale; characters are centered within 28 × 28 pixels, with two-pixel padding on all four sides, so the final resolution of the images is 32 × 32. Images are stored in Portable Network Graphics (PNG) format. The split ratio of training and test samples is 85% and 15%.
3.2 Architecture
In this work, a variant of GAN called the deep convolutional generative adversarial network [20] has been developed. The functional block diagram of the developed DCGAN is depicted in Fig. 1. While training GANs is a tedious task, DCGANs introduced convolutional GANs that provide the stability the other variants of GAN lacked. Supervised learning with CNNs was widely adopted due to the accuracy it was able to achieve; DCGAN was introduced to achieve the same kind of accuracy using unsupervised learning. Convolution is the process of running a kernel over an image, computing the sum of products of the image and kernel values, and kernels are thereby used to extract features such as edges and patterns. In a convolutional neural network, layers of convolutions are stacked, so that the first layer picks up the most elementary features, say edges, the second layer picks up the next most elementary features such as loops and lines, and further layers pick up the predominant features. The size of a kernel also has a huge impact: small kernels pick up minute details, whereas large kernels discard quite a lot of features, so the selection of kernel size depends on the application being developed. A 1 × 1 kernel looks at just one pixel of a feature map; hence, it does not pick up any features in the spatial domain, but it picks up features by merging the different feature maps, reduces the number of feature maps, and helps in passing only the features of objects relevant to us while shunning the background information. A 3 × 3 convolution is
Fig. 1 GAN architecture
convolving the image or feature map with a kernel of size 3 × 3. It is extensively used as it is computationally efficient: two successive 3 × 3 convolutions cover the same receptive field as a single 5 × 5 convolution, but the number of parameters and computations involved in two 3 × 3 convolutions (9 + 9 = 18 multiplications) is smaller than in a single 5 × 5 convolution (25 multiplications). Kernel values determine the kind of features extracted. In traditional image processing, kernels were created with specific values, but in convolutional neural networks, kernel values are randomly initialized and the network learns them during training. The output of each layer in a CNN forms feature maps, which in turn are passed as input to the next layer. The size of the output feature map depends on the size of the input image or input feature map and the size of the kernel; for example, if the input feature map is of size 400 × 400 and the kernel size is 3 × 3, then each feature map in the output of this layer would be of size 398 × 398. The number of feature maps in the output of a layer is equal to the number of kernels in the layer: if there are 32 kernels in a layer, then there will be 32 feature maps in its output. Convolutions have given state-of-the-art results in the field of imaging, which is the primary reason for choosing the DCGAN variant for this work, as the dataset used here consists of images. DCGAN uses transposed convolution and convolutional stride for the upsampling and the downsampling; the network replaces all max-pooling layers with strided convolutions. Batch normalization has been used to boost accuracy. The whole network consists of three sub-networks: the generator, the discriminator and a combined model that joins the generator and discriminator. The generator generates images from noise. The discriminator takes as input both the fake images from the generator and real images from the dataset and learns to distinguish real from fake. The cost/loss calculated at the discriminator is percolated back to the generator through back-propagation. The generator network in Fig. 2 starts with a random noise vector of 100 points. These 100 points are fed into a dense (fully connected) layer of size 8192 with the rectified linear unit (ReLU) activation function. The output of this dense layer is 8192 × 1, which is re-sized to 8 × 8 × 128. A 2D upsampling
Fig. 2 Generator architecture
Fig. 3 Discriminator architecture
is performed on it, increasing the feature map size to 16 × 16, followed by a 2D convolution with kernel size 3 × 3 and 128 feature maps, keeping padding set to "same", i.e., the feature map is padded so that its size after the convolution is the same as before the convolution operation. Batch normalization is used as a regularizer here, with momentum 0.8 and the activation function set to ReLU. This is followed by a second two-dimensional upsampling, increasing the feature map size to 32 × 32. A two-dimensional convolution follows, with the same hyperparameters as the first convolution except for the number of feature maps, which is now set to 64. This is followed by a third two-dimensional convolution with a single feature map, the rest of the parameters being the same as in the previous convolution layer. This feature map is the output of the generator, of size 32 × 32. The discriminator network in Fig. 3 takes an image of size 32 × 32, which could be an image from the training dataset. The first layer of the discriminator is a 2D convolution with 32 feature maps, kernel size 3 × 3, stride 2, padding set to "same", the Leaky ReLU activation function and a dropout of 0.25 acting as the regularizer. The output feature map of this layer is of size 16 × 16. This is followed by a second layer, a 2D convolution with 64 feature maps and the same parameters as above; its output feature map is of size 8 × 8. These 8 × 8 feature maps are zero padded, which
changes the feature map size to 9 × 9. The third layer is a 2D convolution with 128 feature maps and the same parameters as above; its output feature map is of size 5 × 5. The fourth layer is a 2D convolution with 256 feature maps and stride 1, with the rest of the parameters unchanged, which retains the feature map size of 5 × 5. The fifth layer is a fully connected layer which takes as input the flattened 6400 values of the 256 × 5 × 5 output of the previous layer and outputs a single value, which determines whether the image is real or fake. The single neuron in the output layer acts as a binary classifier.
3.3 Cost Function
Because a GAN contains two networks competing against each other, there have to be separate but linked cost functions. The generator tries to generate fake images to fool the discriminator, whereas the discriminator tries to separate real-world samples from the fake ones. The discriminator cost function is represented by Eq. (1). While training, the goal of the discriminative network is to reduce its cost, represented by J(D). This cost is in fact a standard cross-entropy loss; the only difference is that the discriminator is trained on two separate mini-batches, one coming from the dataset (where all labels are 1s) and another from the generator (where all labels are 0s).

J(D) = ∇_θd (1/m) Σ_{i=1}^{m} [ log D(x^(i)) + log(1 − D(G(z^(i)))) ]   (1)
Because the two networks compete against each other, the simplest version of the generator cost is J(G) = −J(D); that is, the generator tries to increase the cost that the discriminator is trying to decrease. The generator cost function is represented by Eq. (2). From this cost function, it can be seen that the generator tries to produce images realistic enough that the discriminator is fooled and classifies those samples as real.

J(G) = ∇_θg (1/m) Σ_{i=1}^{m} log D(G(z^(i)))   (2)
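The mini-batch quantities inside Eqs. (1) and (2) can be written out directly, as in the NumPy sketch below; in practice both are usually realized as binary cross-entropy losses, and the discriminator outputs used here are illustrative only.

import numpy as np

def discriminator_objective(d_real, d_fake):
    # Bracketed term of Eq. (1): mean log D(x) + mean log(1 - D(G(z))).
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def generator_objective(d_fake):
    # Bracketed term of Eq. (2): mean log D(G(z)).
    return np.mean(np.log(d_fake))

d_real = np.array([0.90, 0.80, 0.95])   # discriminator outputs on real images
d_fake = np.array([0.10, 0.20, 0.05])   # discriminator outputs on generated images
print(discriminator_objective(d_real, d_fake))  # the discriminator pushes this up
print(generator_objective(d_fake))              # the generator pushes this up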
4 Implementation of the DCGAN
The algorithm depicts the architecture of the neural network. The implementation of generative adversarial networks includes the implementation of its components, the generator and the discriminator. The individual components can be trained and their efficiency measured; however, the most critical part of GANs is training the components together and making them work in cohesion, which is the difficult part. The
discriminator is trained with both real images from the DHCD dataset and fake images produced by the generator. The real images are labeled with ones, and the fake images with zeros. Random noise is sent as input to the generator, which tries to model the distribution of the Devanagari handwritten characters and generates fake images of them. The overall network architecture is determined by the dataset and its complexity. The Devanagari handwritten character dataset has 92,000 images covering 46 characters, of which 36 are consonants and 10 are digits; approximately 1700 images per character are used for training and 300 per character for testing.
4.1 The Discriminator
The discriminator network is a sequential model with six convolutional layers. The kernel size for each layer is 3 × 3, which has proved comparatively successful in convolutional neural networks. Each 32 × 32 input image is padded with zeros to ensure that features towards the edges of the images are preserved. Batch normalization is applied after each layer so that the features at the output of each convolution layer are captured irrespective of the lighting conditions, occlusions, or other image corruption in the inputs. The Leaky ReLU activation function used in the convolution blocks passes positive values unchanged and suppresses negative values as per the leaky ReLU function. A dropout of 25% is added at each convolution layer to make sure the discriminator network does not overfit: 25% of the network weights are randomly dropped during training, which indirectly forces the network to learn new features of the dataset and strengthens its understanding of it in multiple ways. At the end of the network, the output of the last convolution layer is flattened and passed through a sigmoid activation function, which classifies the given input image as real or fake. All real images are labeled as 1s and all fake images as 0s, so a sigmoid output close to 1 or 0 tells us whether the input is judged real or fake. The discriminator network has around 396,000 parameters, which is very large and underlines the relevance of using GPUs for training.
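A hedged Keras sketch of such a discriminator is given below; the per-layer filter counts and strides are assumptions, since the text fixes only the kernel size, the regularizers and the sigmoid output.

from tensorflow import keras
from tensorflow.keras import layers

def build_discriminator() -> keras.Model:
    # Six 3x3 convolution blocks with batch norm, LeakyReLU and 25% dropout,
    # flattened into a single sigmoid real/fake output (Sect. 4.1).
    model = keras.Sequential([layers.Input(shape=(32, 32, 1))])
    for filters in (32, 64, 64, 128, 128, 256):   # filter counts are assumptions
        model.add(layers.Conv2D(filters, kernel_size=3, strides=2, padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.LeakyReLU(0.2))
        model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation="sigmoid"))  # 1 = real, 0 = fake
    return model

discriminator = build_discriminator()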
4.2 The Generator
The generator network is a sequential model with four de-convolutional layers. The kernel size for each layer is 3 × 3, which has proved comparatively successful in convolutional neural networks. The input to the generator network is random noise of size 128 × 1, which is reshaped to 128 × 8 × 8 by the initial dense layer. The most interesting blocks of the generator network are the de-convolutional layers, also called upsampling layers; they provide fractional-stride convolutions and thus help increase the image size. At the end of the generator
network, one expects to get Devanagari handwritten characters of size 32 × 32, and the de-convolutional layers help achieve that. Batch normalization is applied after each layer so that the features at the output of each convolution layer are captured irrespective of the lighting conditions, occlusions, or other image corruption in the inputs. The ReLU activation function used in the de-convolution blocks passes positive values unchanged and suppresses negative values to zero. The output of each layer is padded before being up-sampled to ensure that the distribution learnt in the previous layers is not dropped. At the end of the generator network, a 32 × 32 image is obtained, which is treated as a fake image and sent to the discriminator network as input. The generator continuously tries to improve its fake images by indirectly improving its understanding of the distribution of the Devanagari handwritten characters. The generator network has close to 1 million parameters, which is very large and underlines the relevance of using GPUs for training.
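A hedged Keras sketch of the generator described above follows; the 128-dimensional noise input, the 8 × 8 reshape, the 3 × 3 kernels and the batch-normalization momentum of 0.8 follow the text, while the exact filter counts and the tanh output activation are assumptions.

from tensorflow import keras
from tensorflow.keras import layers

def build_generator(noise_dim: int = 128) -> keras.Model:
    # Dense projection of the noise vector, reshape to 8x8, then two
    # upsampling + 3x3 convolution stages up to a 32x32 single-channel image.
    return keras.Sequential([
        layers.Input(shape=(noise_dim,)),
        layers.Dense(128 * 8 * 8),
        layers.Reshape((8, 8, 128)),
        layers.UpSampling2D(),                       # 8x8 -> 16x16
        layers.Conv2D(128, kernel_size=3, padding="same"),
        layers.BatchNormalization(momentum=0.8),
        layers.ReLU(),
        layers.UpSampling2D(),                       # 16x16 -> 32x32
        layers.Conv2D(64, kernel_size=3, padding="same"),
        layers.BatchNormalization(momentum=0.8),
        layers.ReLU(),
        layers.Conv2D(1, kernel_size=3, padding="same", activation="tanh"),
    ])

generator = build_generator()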
4.3 Network Hyperparameters
Some of the most important network hyperparameters are the learning rate, the number of epochs, and the batch size. It is very important to tune these parameters in such a way that the generator and discriminator keep competing. A batch size of 128 was observed to work well for the DHCD dataset, and this batch size is also very suitable for the GPU architecture, which can handle 128 images in parallel. For training, the Adam optimizer is used with a learning rate of 0.01 and momentum of 0.9, with a decay of 1e−6. Other hyperparameters, such as the number of convolutional layers and the activation functions, are discussed in Sect. 4.
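A sketch of the alternating training loop with these hyperparameters is shown below; the Adam settings and batch size follow the text (the 1e−6 decay would be applied via a learning-rate schedule), while x_train (the prepared 32 × 32 DHCD images scaled to the generator's output range) and the number of training steps are assumptions. It reuses build_discriminator() and build_generator() from the sketches above.

import numpy as np
from tensorflow import keras

discriminator = build_discriminator()
discriminator.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01, beta_1=0.9),
                      loss="binary_crossentropy")

generator = build_generator()
discriminator.trainable = False                # frozen inside the combined model
noise_in = keras.Input(shape=(128,))
combined = keras.Model(noise_in, discriminator(generator(noise_in)))
combined.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01, beta_1=0.9),
                 loss="binary_crossentropy")

batch_size = 128
real_labels = np.ones((batch_size, 1))         # real images are labelled 1
fake_labels = np.zeros((batch_size, 1))        # generated images are labelled 0

for step in range(30_000):                     # step count is an assumption
    idx = np.random.randint(0, x_train.shape[0], batch_size)
    noise = np.random.normal(0.0, 1.0, (batch_size, 128))
    fake_images = generator.predict(noise, verbose=0)

    d_loss_real = discriminator.train_on_batch(x_train[idx], real_labels)
    d_loss_fake = discriminator.train_on_batch(fake_images, fake_labels)
    g_loss = combined.train_on_batch(noise, real_labels)   # generator tries to look "real"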
5 Results
5.1 The DHCD Numerical GAN
The DCGAN architecture was initially applied to the digit images of the Devanagari handwritten character dataset. The digit Devanagari handwritten characters are relatively easy to generate, as their distribution was learnt relatively quickly by the generator. Figure 4 illustrates the output of the GAN for the numerical Devanagari handwritten characters after 9990, 19,990, and 29,990 epochs, respectively.
Fig. 4 DHCD digit outputs of GAN
Fig. 5 DHCD consonant output of GAN
5.2 The DHCD Consonant GAN
The DCGAN architecture was later applied to the consonant images of the Devanagari handwritten character dataset. The consonant Devanagari handwritten characters are relatively tough to generate, as their distribution took the generator considerably longer to learn. Figure 5 illustrates the output of the GAN for the consonant Devanagari handwritten characters after 0, 10, 100, 1000, 10,000, 20,000, and 30,000 epochs, respectively. The output images depicted in Fig. 5 clearly show the evolution of Devanagari handwritten characters from random noise, confirming that the distribution of the handwritten characters has been learnt. Hence, the existing DHCD dataset can be extended with the additional GAN-generated DHCD characters in Fig. 6, which helps in data augmentation. The proposed GAN architecture was trained and tested on Google Colab's Nvidia Tesla T4 GPU. The overall training duration for the proposed network and hyperparameters was nearly 4 hours, and the GPU energy consumption was estimated
Fig. 6 DHCD consonant outputs of GAN after 30,000th epoch
using the Nvidia Management Library to be about 550 J. Assuming a similar GPU is used for inference, the GAN network is capable of producing two images per millisecond.
6 Conclusion
This paper demonstrates that GANs can generate handwritten characters that appear very similar to human handwritten characters, which shows that the algorithm was able to learn human-like cognitive capabilities and reproduce them. A brief review of the literature on the classification accuracy of Devanagari datasets has been performed, and the behaviour of various generative models has been investigated. The required prerequisites were captured, and the GAN components, the generator and the discriminator, were designed. Each of the GAN components is built as a deep learning model, and they are trained on the Google Colab research platform, as training these models is compute-intensive and requires a GPU. The implementation of the GAN architecture demonstrates the use of GPUs on the Google cloud platform for training deep learning models. The algorithm has been coded in Python with the Keras API, which uses the TensorFlow backend. In this work, the primary objective is to enhance the Devanagari handwritten character dataset samples via data augmentation. The data augmentation was achieved using GANs, in particular DCGANs, which could learn the distribution of the handwritten characters and generate new synthetic samples of Devanagari handwritten characters. The emphasis of the current work was on generating the DHCD characters, a total of 46 characters comprising 36 consonants and 10 digits. As part of future work, this can be extended to the vowels, which are combinations of consonants and special characters and are not covered in the current scope. Another direction could be to utilize the newly generated synthetic Devanagari handwritten characters along with the existing DHCD dataset samples to design a new classifier that could improve the existing accuracy of Devanagari handwritten character recognition. Future work can also include the generation of specific handwritten characters that are complex and would need more samples for training a classifier, rather than randomly generating all the handwritten characters.
The current scope can also be extended to handwritten character datasets of various other languages so that more samples of fake images can be generated to preserve the heritage of those languages. From a technical standpoint, there are several other GAN architectures that can be evaluated for generating DHCD handwritten characters, much as the MNIST database has been experimented with on several other GAN architectures. Another interesting future scope could be to experiment with data augmentation on the IIIT-HW-Dev [21] dataset, which is a Devanagari handwritten word dataset. This is quite different from handwritten character recognition, as it involves handwritten words that are combinations of multiple consonants, vowels, and special characters. Data augmentation via GANs has not been attempted for age-old languages and offers good scope for experimentation.
References 1. Khobragade RN, Koli NA, Lanjewar VT (2020) Challenges in recognition of online and off-line compound handwritten characters: a review. Smart Trends Comput Commun 375–383 2. Sharma AK, Adhyaru DM, Zaveri TH (2020) A survey on Devanagari character recognition. In: Smart systems and IoT: innovations in computing. Springer, Singapore, pp 429–437 3. Memon J et al (2020) Handwritten optical character recognition (OCR): a comprehensive systematic literature review (SLR). IEEE Access 8:142642–142668 4. LeCun Y et al (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324 5. Jain N et al (2017) Hand written digit recognition using convolutional neural network (CNN). Int J Innov Adv Comput Sci (IJIACS) 6(5) 6. Lajish VL, Kopparapu SK (2015) Online handwritten Devanagari stroke recognition using extended directional features. arXiv:1501.02887 7. Acharya S, Pant AK, Gyawali PK (2015) Deep learning based large scale handwritten Devanagari character recognition. In: 2015 9th International conference on software, knowledge, information management and applications (SKIMA). IEEE 8. Mani S (2019) Devanagari character recognition using machine learning. Int J Res Appl Sci Eng Technol 7:2660–2663. https://doi.org/10.22214/ijraset.(2019).3485 9. Bajaj R, Dey L, Chaudhury S (2002) Devanagari numeral recognition by combining decision of multiple connectionist classifiers. Sadhana 27(1):59–72 10. Arora S et al (2008) Combining multiple feature extraction techniques for handwritten Devanagari character recognition. In: 2008 IEEE region 10 and the third international conference on industrial and information systems. IEEE (2008) 11. Sharma R, Mudgal T (2019) Primitive feature-based optical character recognition of the Devanagari script. In: Progress in advanced computing and intelligent engineering. Springer, Singapore, pp 249–259 12. Ansari S, Sutar U (2015) Devanagari handwritten character recognition using hybrid features extraction and feed forward neural network classifier (FFNN). Int J Comput Appl 129(7):22–27 13. Jangid M, Srivastava S (2019) Deep ConvNet with different stochastic optimizations for handwritten Devanagari character. In: Advances in computer communication and computational sciences. Springer, Singapore, pp 51–60 14. Hanmandlu M, Ramana Murthy OV (2007) Fuzzy model based recognition of handwritten numerals. Pattern Recogn 40(6):1840–1854 15. Chakraborty B et al (2018) Does deeper network lead to better accuracy: a case study on handwritten Devanagari characters. In: 2018 13th IAPR international workshop on document analysis systems (DAS). IEEE (2018)
16. Kundaikar T, Pawar JD (2020) Multi-font Devanagari text recognition using LSTM neural networks. In: First international conference on sustainable technologies for computational intelligence. Springer, Singapore 17. Bhunia AK et al (2020) Indic handwritten script identification using offline-online multi-modal deep network. Inf Fusion 57:1–14 18. Kaggle (2019). https://www.kaggle.com/c/digit-recognizer/discussion/61480 19. Goodfellow IJ et al (2014) Generative adversarial networks. arXiv:1406.2661 20. Radford A, Metz L, Chintala S (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434 21. Dutta K, Krishnan P, Mathew M, Jawahar CV (2018) Offline handwriting recognition on Devanagari using a new benchmark dataset. In: 2018 13th IAPR international workshop on document analysis systems (DAS), pp 25–30. https://doi.org/10.1109/DAS.2018.69
Chapter 4
Deep Reinforcement Learning for Optimal Traffic Control Rajasekhar Nannapaneni, Raghavendra V. Kulkarni, and Shalabh Bhatnagar
1 Introduction

The advent of machine learning (ML) is changing the way we live today owing to its immense potential for making machines smart. It makes it possible to embed some human traits such as decision-making, logical reasoning, learning and, sometimes, even acting based on decisions. Machines can be endowed with human-like cognitive abilities through learning mechanisms such as reinforcement learning (RL), at which point one can consider machines to be attaining true artificial intelligence (AI). RL enables an agent, machine or robot with decision-making when interacting within an environment. Deep learning (DL) has revolutionized the field of ML, as it has facilitated efficient solutions for high-dimensional problems, thus creating newer opportunities. The possibilities expand greatly when RL is powered by DL, which brings RL concepts to reality by enabling the application of RL in the high-dimensional real world. This combination is called deep reinforcement learning (DRL). The concepts of DRL are being applied in several domains to achieve newer heights and opportunities. One such application is a traffic control system in the real world that is adaptive and dynamic. Road transportation is one of the primary means of transportation within any country. It forms a backbone to everyday life by facilitating office commutes, distribution of goods, day-to-day travel, etc. With the explosive growth in population, the number of vehicles on the road has increased severalfold, causing fatalities, as per [1]. The increase in vehicles has also resulted in congestion and traffic jams on roads, especially at junctions, as per [2]. The delays caused by traffic jams have in turn resulted in more travel time, thereby affecting day-to-day life. Besides travel time, it increases the fuel consumption of vehicles due to
longer wait times. The air and noise pollution at junctions also increases, causing an unhealthy environment. It is imperative that the congestion and traffic jams at road junctions be reduced or, in fact, resolved. One of the obvious solutions would be to enhance the road infrastructure, which includes adding more lanes, widening the roads, building more elevated corridors, etc. This is a relatively costly solution which can be leveraged when there is no other option. Another solution is to effectively manage the traffic at the junction via an intelligent traffic control system. This is a much more cost-effective solution and can be practically implemented. This solution is the main theme of this work, where a traffic control system is developed for optimal traffic control using DRL.
2 Related Work

Traffic control systems developed through reinforcement learning have been experimented with in various ways in the literature. One of the popular methods is the use of convolutional neural networks (CNN), where the network is trained via a sequence of images from the traffic intersection. The works of [3–5] show experiments on CNN-based RL traffic control systems. The CNN-based approach can be considered a high-information-density approach, as it needs real-time images of the road intersection. Such approaches can achieve high accuracy; however, they require a lot of compute power. One of the alternatives is to use traditional sensors and loop detectors on the arms of the road toward the intersection. Genders and Razavi [6] have experimented with this approach and have achieved reasonable results. This approach can be treated as a low-information-density approach because the input to the network is obtained from sensors and is translated into a binary data vector. Be it sensor-based or otherwise, the method of getting input data from discrete points on the arms of the road has been experimented with by several researchers, e.g., Gao et al. [4], Vidali et al. [7] and van Dijk [8]. Q-learning is one of the popular RL algorithms that has been applied to traffic control system applications. This algorithm was first published by Watkins and Dayan [9]. To address problems involving high-dimensional state spaces, DL from [10] was applied to RL. This suits traffic control system problems, which naturally have a high-dimensional state space. One of the inherent problems in this approach is that the DL network does not converge on the inputs coming from the Q-learning algorithm. This is because the incoming data or state space information is highly correlated, unlike the regular independent and identically distributed data with which DL networks are usually trained. To address this, another concept known as experience replay is leveraged, as used by Vidali et al. [7], Strnad et al. [11], Zhang and Sutton [12], etc.
Most of the literature experimented using the SUMO tool, which was first published by Krajzewicz et al. [13]. The integration of SUMO with an RL setting for traffic control systems was evaluated by various research works, including [4, 14]. In addition, various reward definitions such as cumulative delay, waiting time and queue length have been experimented with by Guo et al. [14], Mousavi et al. [15], Zheng et al. [16], etc. The road intersection can be pulled from online maps, as was done by Casas [17]. Overall, the literature suggests that there is wide scope for improving the existing work by improving the Q-learning algorithm, improving the reward scheme, tuning the neural network, trying a left-hand traffic scenario, working on a shared-road scenario and trying mixed vehicle lengths.
3 Traffic Control System as Deep Reinforcement Learning Agent

This work focuses on effectively managing traffic at a junction via an intelligent traffic control system. That is, given a four-road intersection, the efficiency of the traffic control system is improved through the correct traffic light sequence. The four-road intersection in question is a typical Indian one, where two arms have four lanes each and the other two arms have two lanes each. As it is an Indian four-road junction, vehicles are right-hand drive and follow left-hand traffic. Considering an Indian four-road intersection and the traffic control system, it is pivotal to transform the problem into the key components of RL. The application of RL to a vehicular traffic application requires translating the RL concepts to traffic control system components. The environment in this case is the four-road intersection, and the state space information is derived from it. The agent is the traffic control system, which is designed as a deep Q-learning agent that outputs the required traffic light sequences as actions. The reward is translated to the cumulative delay of vehicles waiting at the junction. The process and working of RL in the vehicle traffic control system are illustrated in Fig. 1, which details the environment, the agent and the actions translated to the vehicle traffic control system.
3.1 Q-Learning

The traffic control system application has a huge environmental state space, and hence developing a model of the environment is not a feasible approach. RL leverages the Markov property as per [18], and RL also has various model-free approaches, among which the Q-learning algorithm is an off-policy temporal-difference technique.
Fig. 1 Reinforcement learning on traffic control system
The Q-value determines how good an action is given the state, and this facilitates the agent in picking the action for the best policy. So, the algorithm maintains a Q-table, and the optimum policy is to pick the best Q-value among the available Q-values in the Q-table for a given state–action pair. The generic Q-learning update per [19–21] is shown in Eq. 1, where Q_n is the new Q-value and Q_c is the current Q-value for a given state.

Q_n(s, a) = Q_c(s, a) + α [r + γ max_{a'} Q(s', a') − Q_c(s, a)]   (1)
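To make the update in Eq. 1 concrete, a minimal tabular sketch is given below; the state/action sizes, learning rate and discount factor are illustrative assumptions, not values taken from this work.

```python
import numpy as np

# Hypothetical sizes and hyperparameters chosen only for illustration
n_states, n_actions = 16, 2
alpha, gamma = 0.1, 0.75          # learning rate and discount factor (assumed)
Q = np.zeros((n_states, n_actions))  # the Q-table

def q_update(s, a, r, s_next):
    """Apply Eq. 1: Q_n = Q_c + alpha * (r + gamma * max_a' Q(s', a') - Q_c)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] = Q[s, a] + alpha * (td_target - Q[s, a])
```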
3.2 Deep Q-Learning

When the state space is very large, maintaining a generic Q-table is not feasible. The ultimate goal is to determine actions given a state; that is, given a state, the algorithm has to determine the best Q-value appropriate for the best action. A DL network can be introduced to approximate the mapping from state to action. This method is ideal for large state spaces and can be easily applied to the vehicle traffic control system, where the state space is large. As the deep neural network is trained, it gains knowledge of the Q-values for a given state. Thus, the best action sequence is predicted given a state, and thus,
the traffic control system that acts as an agent facilitates traffic light sequences accordingly.

Sample = [State, Action, Reward, Next State]   (2)

The data coming back from the environment is in the form of samples, and each sample consists of state and reward information as shown in Eq. 2. Using the state and reward information from the sample, one can predict the Q-values of the current and next state–action pairs as per Eqs. 3 and 4.

Q_c(s, a) = predict(State)   (3)

Q(s', a') = predict(Next State)   (4)

The new Q-value of the current state is computed using Eqs. 3 and 4 with Eq. 1. This way, the DL network-predicted outputs are used in the Q-learning algorithm. The inputs to the DL network, shown in Eq. 5, are the current state and its associated new Q-value obtained from Eq. 1.

x = State;  y = Q_n(s, a)   (5)

Train → Model(x, y)   (6)

Loss = MSE[Q_n(s, a) − Q_Predicted]   (7)
One can observe that the training shown in Eq. 6 for the DL network aligns with the Q-learning algorithm, as one of the inputs is the new Q-value obtained from the Q-learning update in Eq. 1. The network itself is tuned toward this new Q-value by computing the loss as the mean squared error between the new Q-value and the network-predicted Q-value, as shown in Eq. 7.
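A hedged sketch of how the predictions of Eqs. 3 and 4 feed the update of Eq. 1 and the training of Eqs. 5–7 is shown below; `model` stands for any Keras regression network mapping a state vector to one Q-value per action, and the hyperparameter values are assumptions rather than the exact ones used in this work.

```python
import numpy as np

alpha, gamma = 0.1, 0.75  # assumed learning rate and discount factor

def train_step(model, sample):
    """One deep Q-learning step on a sample = [state, action, reward, next_state] (Eq. 2)."""
    state, action, reward, next_state = sample
    q_current = model.predict(state[np.newaxis], verbose=0)[0]    # Eq. 3
    q_next = model.predict(next_state[np.newaxis], verbose=0)[0]  # Eq. 4
    # Eq. 1: update only the Q-value of the action that was actually taken
    q_target = q_current.copy()
    q_target[action] = q_current[action] + alpha * (
        reward + gamma * q_next.max() - q_current[action])
    # Eqs. 5-7: x = state, y = new Q-values; MSE loss is handled by the compiled model
    model.fit(state[np.newaxis], q_target[np.newaxis], verbose=0)
```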
3.3 Literature Deep Q-Learning

The deep Q-learning technique mentioned in the previous section is prominent in the literature and has been modified and adapted to various applications. One such modification was attempted by Vidali et al. [7] when applying DRL to the vehicle control system. Equation 8 shows the deep Q-learning update applied in that work, where the learning rate was taken to be 1. As the vehicle control system application consists of episodic tasks, applying a learning rate of 1 is applicable and, hence, achieved reasonable results.

Q_n(s, a) = r + γ max_{a'} Q(s', a')   (8)
3.4 Modified Deep Q-Learning

A step forward in the deep Q-learning algorithm can be to first compute an intermediary Q-value Q_x, similar to the literature deep Q-learning technique, as shown in Eq. 9.

Q_x(s, a) = r + γ max_{a'} Q(s', a')   (9)

The new Q-value Q_n can be obtained by calculating an expectation of the intermediary Q-value Q_x and the predicted current Q-value Q_c, with more weight given to the intermediary Q-value Q_x, as shown in Eq. 10.

Q_n(s, a) = (2/3) Q_x(s, a) + (1/3) Q_c(s, a)   (10)
This new Q-value Q_n is used as the input to the DL network and to compute the loss function, as shown in Eqs. 18 and 20.
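A minimal sketch of the modified target of Eqs. 9 and 10, under the same assumptions as the previous sketch (arrays of per-action Q-values predicted by the network):

```python
def modified_q_value(q_current, q_next, action, reward, gamma=0.75):
    """Eqs. 9-10: weighted expectation of the intermediary and current Q-values."""
    q_x = reward + gamma * q_next.max()                         # Eq. 9
    return (2.0 / 3.0) * q_x + (1.0 / 3.0) * q_current[action]  # Eq. 10
```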
3.5 State Representation

The representation of the state space varies according to the environment, and there can be many ways to represent the state space for the same environment. The representation can also vary with respect to the density of information embedded in a given state. In this case of a vehicular traffic control system, the four-road intersection is the environment, and the state space can be obtained with either high information density or low information density. One possible state space representation is via images taken at the road intersection, which have high information content about the situation at the intersection. These images can be used to train a CNN, which can in turn be integrated with RL algorithms. Such approaches can result in high accuracies but need high compute power and supporting infrastructure. Another way to obtain a state representation is by dividing the road arms approaching the intersection into discrete cells, as shown in Fig. 2 (L). The presence of a vehicle in any such cell is treated numerically as 1, and the absence of a vehicle is treated as 0, as shown in Fig. 2 (R). The combination of all the arms and their corresponding binary values forms a vector that represents the state of the environment. This work leverages this vector representation as the state representation technique; it can be considered to have the lowest information density, as it merely contains the positions of the vehicles rather than the overall information of the road intersection.
Fig. 2 State of the environment (L) and state representation (R)
One may use traditional sensors or loop detectors to get discrete cell-based information from the road intersection. Though the information content in this vector-based approach is low, there is no compromise on efficiency, and in fact this approach has only basic hardware requirements.
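A minimal sketch of this binary vector representation; the 40-cell layout matches the state space described later in this chapter, while the way occupied cells are detected (sensors, loop detectors or simulator queries) is left to a separate helper and is an assumption here.

```python
import numpy as np

N_CELLS = 40  # 4 arms x 10 cells per arm, as used later in this work

def build_state_vector(occupied_cells):
    """occupied_cells: iterable of cell indices (0..39) currently holding a vehicle."""
    state = np.zeros(N_CELLS, dtype=np.int8)
    for cell in occupied_cells:
        state[cell] = 1   # presence of a vehicle -> 1, absence stays 0
    return state
```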
3.6 Action Definition

An action is taken on the environment based on the decision made by an agent. An agent interacts with an environment via actions and gets a reward for its actions. In a vehicular traffic control system, various possible action scenarios can be considered. One such scenario is that the agent has a predetermined set of traffic light combinations from which actions are defined for a fixed amount of time. Another scenario also has a predetermined set of traffic light combinations, but actions are defined for a variable amount of time dependent on the environment. In this work, the actions considered are the predefined set of traffic light sequences whose timings are fixed. That is, the time incurred between one light sequence and the other is a fixed amount of time.

A = (North South Advance, East West Advance)   (11)
Since the road intersection analyzed in this work has shared roads, it can be considered to have north–south and east–west directions. Hence, there are two actions due to the shared nature of the roads, as shown in Eq. 11. That is, one of the two actions is north–south advance and the second is east–west advance. Ultimately, the objective of the DRL agent is to gain knowledge of this action sequence.
3.7 Reward Framework

The reward is an outcome of the actions applied on an environment by an agent. The reward framework varies from environment to environment, and one has to choose an appropriate reward that makes sense for that environment. For the vehicle traffic control system, there are various possible reward functions. One reward could be the throughput, which is the number of vehicles passing the junction. Another reward function could be the queue length which, as the name suggests, is the number of vehicles queued near the junction on all four arms of the intersection. The stationary waiting time of the cars is not considered when queue length and throughput are computed, and hence queue length and throughput are not ideal for reward computation. The reward function can also be defined in terms of vehicle delay and waiting time. In this work, the cumulative delay of vehicles is used, which is the total wait time of all the vehicles on all four arms of the junction. This is a very reasonable and practical way to relate the performance of the agent to a road intersection environment.

Reward = (old total wait time) − (current total wait time)   (12)
Equation 12 details the computation of the cumulative delay reward, which is the difference between the total wait time of the previous step and the total wait time of the current step.
3.8 Literature Reward Function

In RL, the general approach is to introduce negative rewards, which make the agent slightly more aggressive in reaching for large positive rewards rather than settling for marginally positive ones. Equation 13 expresses the introduction of negative rewards as per the literature by Vidali et al. [7], where a constant of 0.9 is multiplied with the cumulative wait time of the previous step.

Reward = 0.9 × (old total wait time) − (current total wait time)   (13)
Equation 13 can make the reward negative even when the previous cumulative wait time is greater than the current cumulative wait time of the vehicles.
4 Deep Reinforcement Learning for Optimal Traffic Control
53
3.9 Modified Reward Function

As the tasks in the vehicle control system are episodic, a variable (denoted λ here) is computed which depreciates to zero as the episode count approaches the total number of episodes, as shown in Eq. 14.

λ = 1.0 − (current episode / total episodes)   (14)

In this work, the reward function is further modified to improve the performance of the agent. The computed variable λ is multiplied with the cumulative wait time of the previous step, as shown in Eq. 15.

Reward = λ × (old total wait time) − (current total wait time)   (15)

As the value of λ depreciates over the progression of episodes, the negative reward structure is further strengthened and, hence, the agent becomes even more aggressive in reaching optimal performance.
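A small sketch of the decayed reward of Eqs. 14 and 15; λ follows the notation introduced above, and the wait-time totals are assumed to be supplied by the environment.

```python
def decay_factor(current_episode, total_episodes):
    """Eq. 14: depreciates from 1.0 toward 0.0 as training progresses."""
    return 1.0 - current_episode / total_episodes

def modified_reward(old_total_wait, current_total_wait, current_episode, total_episodes):
    """Eq. 15: decayed previous wait time minus the current wait time."""
    lam = decay_factor(current_episode, total_episodes)
    return lam * old_total_wait - current_total_wait
```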
3.10 Agent Framework

An agent interacts with the environment and chooses an action based on the rewards received from the environment. The goal of any RL application is for the agent to fully learn the dynamics of the environment. For smaller environments, agents can learn from a Markov decision process where the complete state space, the state transition probabilities and the rewards of the environment can be modeled. As the environmental state space increases in dimension, one cannot model the entire environment, and hence model-free approaches such as Q-learning are applied. As the environment gets even bigger, Q-learning is integrated with DL so that the problem of dimensionality is addressed. In the vehicle traffic control system, the agent learns via DRL and intelligently takes actions based on the environmental state, as shown in Fig. 3. Here, the agent chooses an action which acts upon the environment and in turn receives the state and the inputs for computing the reward from the environment. The samples retrieved from the environment are highly correlated; hence, the samples are accumulated in a memory-like structure, and batches of samples are randomly picked from the memory to train the DL network. This way, the chance of picking correlated samples is lower, and it also results in reusing some of the older samples, which retains previous state information. This mechanism is called experience replay, and it enhances the stability of the DL network. Toward the end, the agent learns and gains knowledge of the right action sequence and thus acts intelligently without any human intervention for any new state space combination in the environment.
Fig. 3 Agent framework
4 Simulated Framework for Traffic Control System

The formulation of the environment required for the DRL agent to interact with and apply actions upon is pivotal. One of the approaches is to pick a real-time four-road intersection and capture the traffic via traditional sensors or loop detectors. The agent can then initiate actions based on reward outcomes from the environment. Though this approach is the correct one, one can initially experiment with the outcomes in a simulated environment prior to practically implementing it. One such simulated approach is to leverage Simulation of Urban MObility (SUMO), which was first published by Krajzewicz et al. [13]. This simulated SUMO environment can be configured with any road intersection and any junction topology. In this work, a four-road intersection from Bangalore, India, has been considered, which follows the left-hand traffic configuration.
4.1 Traffic Light Configuration

The four-road intersection has a color strip on each stop line, which represents the traffic light at that point in time. One can notice that these color strips exist on all four arms, and hence there are four different traffic lights in this environment, as shown in Eq. 16.
The number of traffic lights would have been more if there were dedicated lanes; however, since all the road arms are shared, the number of traffic lights is limited to four.

TL = (TL_N, TL_E, TL_W, TL_S)   (16)

The standard traffic light can only be in one of the states or colors shown in Eq. 17. Red is the usual starting traffic light color, where the vehicles remain halted; green is when the vehicles start to move; and yellow is when the vehicles get ready to halt.

TL_Colors = (red, green, yellow)   (17)
When the traffic light transitions from red to green, the green traffic light stays for ten seconds until the next action occurs. If the next action is different from the current one, the traffic light changes from green to yellow and stays yellow for four seconds before it changes to red. But if the next action from green is the same as the current one, then the light stays green for another ten seconds until the next action. Another key assumption made in formulating the traffic lights is that at any point in time, at least one of the traffic lights is non-red, so that there is always at least one traffic light allowing vehicles to pass.
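The phase-timing rule just described can be sketched as follows; the phase names and the way an action maps to a green phase are illustrative assumptions rather than the exact SUMO phase programme used in this work.

```python
GREEN_DURATION = 10   # seconds of green per action (from the text)
YELLOW_DURATION = 4   # seconds of yellow before switching (from the text)

def phase_schedule(current_action, next_action):
    """Return the (phase, duration) steps between two consecutive actions."""
    if next_action == current_action:
        return [("green_same_direction", GREEN_DURATION)]       # green is simply extended
    return [("yellow_current_direction", YELLOW_DURATION),      # vehicles get ready to halt
            ("green_other_direction", GREEN_DURATION)]          # the other direction advances
```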
4.2 Framework of State Space

Now that the environment and traffic lights have been formulated, it is important to formulate the state space in a way that the neural network can understand. In this work, each arm of the four-road intersection is divided into 10 discrete cells. This implies that the total number of cells in this four-road intersection is 40, which dictates the dimensionality of the problem as 2^40. The division of the road intersection arms into discrete cells plays an important role in defining the importance of a cell. It is logical to think that a vehicle waiting near the junction has waited longer in its drive than a vehicle waiting far away from the junction. So it is obvious that one has to give priority to vehicles waiting near the junction over vehicles waiting far away from it. Hence, it is not the right approach to divide the road arms into equal discrete cells. Thus, one can divide the road arms into unequal discrete cells, with smaller cells near the junction and bigger cells away from it. The smaller cells near the junction contribute more to the overall state vector, as even a single vehicle in that small length of the arm accounts for a single position change in the state vector, while multiple vehicles in a bigger cell account for just one position change in the state vector. Thus, dividing the road arms into unequal discrete cells reflects the logical vehicle wait time considerations.
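One possible realization of the unequal cell division is sketched below; the concrete cell boundaries are assumed values chosen only to illustrate smaller cells near the junction, as the exact boundaries are not stated here.

```python
# Distances (in metres) from the stop line that delimit the 10 cells of one arm.
# Boundaries grow with distance, so cells near the junction are smaller (assumed values).
CELL_BOUNDARIES = [7, 14, 21, 28, 40, 60, 100, 160, 400, 750]

def cell_index(distance_from_stop_line):
    """Map a vehicle's distance from the stop line to a cell index 0..9 on its arm."""
    for i, boundary in enumerate(CELL_BOUNDARIES):
        if distance_from_stop_line < boundary:
            return i
    return len(CELL_BOUNDARIES) - 1  # beyond the last boundary: the furthest cell
```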
4.3 Deep Neural Network

The DL network is leveraged for integration with Q-learning because the state space of the environment is of the order of 2^40, which is huge, and a network is essential for predicting actions given such a huge state space. The input to the network for the given state space is a state vector of length 40. In this work, a five-layered DL network is used, with each layer containing 400 neurons. The DL network is fully connected with ReLU activation units.

x = State Vector;  y = Q_n(s, a)   (18)

Train → Model(x, y)   (19)

Loss = MSE[Q_n(s, a) − Q_Predicted]   (20)
Equations 18, 19 and 20 correspond to the DL input and loss functions. The DL network uses Adam as the optimizer and a dropout mechanism to mitigate overfitting.
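A hedged Keras sketch of the network described above; the two-output layer follows the two actions of Eq. 11, and the dropout rate is an assumed value since it is not stated here.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_q_network(state_size=40, n_actions=2, dropout_rate=0.2):
    """Fully connected network: 5 hidden layers of 400 ReLU units, MSE loss, Adam optimizer."""
    model = keras.Sequential()
    model.add(layers.Dense(400, activation="relu", input_shape=(state_size,)))
    model.add(layers.Dropout(dropout_rate))
    for _ in range(4):  # four more hidden layers, five in total
        model.add(layers.Dense(400, activation="relu"))
        model.add(layers.Dropout(dropout_rate))
    model.add(layers.Dense(n_actions, activation="linear"))      # one Q-value per action
    model.compile(optimizer=keras.optimizers.Adam(), loss="mse")  # Eqs. 18-20
    return model
```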
4.4 Experience Replay

The samples obtained from the road intersection environment at time instants 't' and 't + 1' are highly correlated with each other. This is because the traffic at the road intersection at time 't + 1' depends on the vehicular positions at time instant 't.' In general, the inputs to DL techniques are samples which adhere to independent and identically distributed properties. Unless the input data to the DL network is independent and identically distributed, the network does not converge to approximate the output function. Thus, one can conclude that the sample inputs from the road intersection environment need to be de-correlated for the DL output to converge and remain stable, while at the same time retaining diversified input samples from various situations at various instants of time so that the network does not lose or forget information from past experience. A method called experience replay is applied, where the incoming samples are queued and a batch of samples is randomly extracted and sent for DL network training. In this work, the experience replay is developed as shown in Fig. 4, where the memory size is 50,000 samples and the batch size is 100 samples. Once the 50,001st sample arrives in memory, the queue pops out the first sample that was in the queue, and in this way the memory size is maintained. The integration of experience replay with DL training has resulted in stable and converged outcomes.
Fig. 4 Experience replay
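A minimal sketch of the replay memory with the stated 50,000-sample capacity and 100-sample batches; the deque-based implementation is an assumption.

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity=50_000, batch_size=100):
        self.samples = deque(maxlen=capacity)  # oldest sample is dropped automatically
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state):
        self.samples.append((state, action, reward, next_state))

    def sample_batch(self):
        """A random batch breaks the temporal correlation between consecutive samples."""
        k = min(self.batch_size, len(self.samples))
        return random.sample(self.samples, k)
```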
4.5 Design of Traffic Control System

Once the environment, the state representation and the reward framework are formulated, it is important to integrate all of these frameworks so that a unified design of the traffic control system is obtained. Figure 5 illustrates the design of the traffic control system, where the DRL agent interacts with the environment via various tools.
Fig. 5 Design of traffic control system (Top) and training through deep Q-learning (Bottom)
The inputs for formulating the state vector and computing rewards are obtained from the environment in the SUMO tool via TraCI, which is a Python module. The TraCI module helps retrieve the inputs from the environment in SUMO, and it also sends the action changes from Q-learning to the environment in the SUMO tool. The DL was developed using Google's TensorFlow framework, as described in Ramsundar and Zadeh [22]. As a whole, the integration shown in Fig. 5 forms the overall design of the DRL agent interaction in a vehicle traffic control system environment.
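A hedged sketch of the TraCI glue code; the configuration file name, edge IDs, traffic-light ID and the helpers `build_state_vector_from_sumo` and `agent.observe` are placeholders, the mapping from the chosen action to a SUMO phase index is assumed, and older SUMO releases name the traffic-light module `trafficlights` rather than `trafficlight`.

```python
import traci  # SUMO's Python API

INCOMING_EDGES = ["N_in", "E_in", "S_in", "W_in"]   # placeholder edge IDs
TLS_ID = "junction_tls"                             # placeholder traffic-light ID

def run_episode(agent, sumo_cfg="intersection.sumocfg", max_steps=5400):
    traci.start(["sumo", "-c", sumo_cfg])           # launch SUMO with the scenario
    old_wait = 0.0
    for step in range(max_steps):
        state = build_state_vector_from_sumo()      # assumed helper built on traci queries
        action = agent.choose_action(state)
        traci.trafficlight.setPhase(TLS_ID, action) # push the chosen light phase
        traci.simulationStep()
        current_wait = sum(traci.edge.getWaitingTime(e) for e in INCOMING_EDGES)
        agent.observe(state, action, old_wait - current_wait)  # reward in the style of Eq. 12
        old_wait = current_wait
    traci.close()
```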
5 Implementation and Training

5.1 Training Through SUMO

The simulation of the vehicular traffic control system shown in Fig. 5 is done in SUMO. The four-road intersection is the environment configured in SUMO using its internal utilities. The NETEDIT tool is used to configure the traffic lights. The data from the environment is retrieved by TraCI and sent to the DRL agent, which formulates the state vector and computes the reward from the retrieved data. The outcome of the DRL agent, which is the action change information, is also sent to the environment via TraCI. The TraCI module comes with a set of commands which query the SUMO tool and get the necessary data. The commands can retrieve in-depth information such as vehicle fuel consumption, noise emission, CO2 emission, queue length and vehicle waiting times.
5.2 Traffic and Route Generation

The routes for the vehicles are generated such that 75% of the vehicles go straight and only 25% of the vehicles take either a right or a left turn. Also, two different vehicle types with different lengths are introduced: one vehicle type of length 5 m and another of length 10 m. The 5 m vehicle can be considered the size of a car, while the 10 m vehicle can be considered a bus or a truck. The routing is configured such that cars are introduced onto the road with probability 0.9, while trucks or buses are introduced with probability 0.1. The generation of traffic is also made close to reality, where general traffic peaks during business hours. Hence, a Weibull distribution was used to generate traffic so that more vehicles are generated during peak times, as represented by its distribution.
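A sketch of the route-generation logic under stated assumptions: NumPy's Weibull sampler with an assumed shape parameter of 2, departure times scaled to the 5400-step episode described below, and the 0.9/0.1 vehicle split and 0.75/0.25 movement split from the text.

```python
import numpy as np

rng = np.random.default_rng()

def generate_departures(n_vehicles=200, max_step=5400, shape=2.0):
    """Weibull-distributed departure steps so that traffic peaks, as in business hours."""
    timings = np.sort(rng.weibull(shape, n_vehicles))
    steps = (timings / timings.max() * max_step).astype(int)
    routes = []
    for step in steps:
        vehicle = "car" if rng.random() < 0.9 else "truck"   # lengths 5 m / 10 m
        movement = "straight" if rng.random() < 0.75 else rng.choice(["left", "right"])
        routes.append((int(step), vehicle, movement))
    return routes
```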
TraCI retrieves the status of the intersection at every timestep and then sets the action chosen by the agent as it interacts with the simulator during runtime. As an episode begins, the SUMO tool runs through 5400 steps during which the traffic is generated in the tool. For visualization purposes, vehicles which are not waiting appear blue, vehicles waiting up to 20 s appear cyan, vehicles waiting up to 40 s appear green, vehicles waiting up to 60 s appear yellow and vehicles waiting 80 s or more appear red. This process continues until all 5400 steps in an episode are complete, and then the next episode begins. The DRL agent keeps learning and eventually takes actions appropriate to the situations in future episodes.
6 Results and Discussion

Several experiments were performed on the vehicular traffic control system that was developed using DRL. As there are several tunable parameters, it is essential to determine the most important parameter first, then hold it constant and vary the other parameters.
6.1 Discount Factor Variations

The discount factor γ is one of the crucial parameters in RL algorithms; it dictates the weight given to future rewards in the current context. It varies between 0 and 1, where 0 indicates that the agent takes actions based on its current rewards only, while a discount factor of 1 indicates that the agent takes actions based on all future rewards. It can also be treated mathematically as a trick to make an infinite sum finite. Figure 6 illustrates the experimentation on vehicle cumulative delay through 500 episodes for varying discount factor γ. Three different discount factors, γ = 0.25, 0.5 and 0.75, were experimented with, and the γ value of 0.25 resulted in the optimal outcome with respect to vehicle cumulative delay.
6.2 Mixed-Length Variations

Another experiment was the addition of vehicles of different lengths; that is, one vehicle type is 5 m in length and the other type is 10 m. The 5 m vehicle can be considered the size of a car, while the 10 m vehicle can be considered a bus or a truck. The routing is configured such that cars are introduced onto the road with probability 0.9, while trucks or buses are introduced with probability
Fig. 6 Top left—discount factor variation. Top right—vehicle length variation. Bottom—vehicle density variation
0.1. The cumulative delay for fixed-length vehicles versus mixed-length vehicles is evaluated over 500 episodes, as shown in Fig. 6. One can notice that the performance of the DRL agent is better with traffic containing vehicles of varying length, which makes sense, as in reality traffic consists of vehicles of varying lengths.
6.3 Vehicular Density Variations

It is also crucial to determine the optimal traffic capacity of the given road intersection. Irrespective of how well the DRL agent can learn and take actions, there is a limit on the number of vehicles that can transit a given road intersection. Thus, traffic generation with varying vehicle densities has been experimented with, which gives a clear indication of the limits the particular road intersection can handle. Figure 6 illustrates the cumulative delay for vehicle counts of 200, 400 and 800, respectively, over up to 500 episodes. This evaluation clearly indicates that at vehicle densities of 200 and 400 there are fewer spikes, which implies the stability of the given road intersection in handling those vehicle densities, while the vehicle density of 800 resulted in huge spikes, indicating instability of the vehicular cumulative delays. One can expect this outcome because the vehicular delays become unpredictable as the number of vehicles through a given road intersection increases.
Table 1 Performance of Q-learning variants

Q-learning variant        Mean cumulative delay (s)
Original Q-learning       941.238
Literature Q-learning     920.768
Modified Q-learning       917.886
Another outcome that can be inferred from these results is that vehicular densities exceeding 800 would become even more unstable; hence, experimentation in that direction was not performed.
6.4 Modified Q-Learning Performance

The performance of the original Q-learning algorithm, the literature Q-learning algorithm and the modified Q-learning algorithm from this work is evaluated over 500 episodes each with respect to vehicular cumulative delay. The modified Q-learning algorithm performs somewhat better and appears more stable as the episodes progress. Table 1 summarizes the mean cumulative delay of the Q-learning variants. This summary clearly shows that the modified Q-learning algorithm has the lowest mean cumulative delay.
6.5 Performance of Deep Reinforcement Learning Agent

The parameters of the DRL-based vehicular traffic control system are set to the previously evaluated values: a discount factor of 0.25, mixed-length vehicles, a vehicular density of 200, the modified reward function and the modified Q-learning algorithm. Once the parameters are set accordingly, several experiments were performed for up to 500 episodes, and the results were captured. These results display the performance of the DRL agent in the vehicular traffic control system (Fig. 7).
7 Conclusion

This work has demonstrated that DRL can work wonders when applied to the vehicular traffic control application. The DRL approach showcased human-like cognitive abilities through learning via RL algorithms such as Q-learning, especially when applied to various traffic scenarios in traffic control system applications. The deep reinforcement learning agent was designed as a fully connected deep neural network which is
Fig. 7 Performance of the DRL agent
integrated with the modified Q-learning algorithm. The interaction of the deep reinforcement learning agent with the real-world-like simulated SUMO environment is implemented, and deep-dive experimentation is carried out. During the course of experimentation, the shared road intersection, the left-hand traffic scenario, mixed-length vehicular traffic, the modified reward structure and the modified Q-learning algorithm were all evaluated. The performance of the DRL agent for the mentioned metrics and parameters was seen to be reasonable and reflects reality closely. Hence, this work highlights not only the capabilities of reinforcement learning but also the true success that can be obtained when it is applied to real-time problems.
7.1 Future Scope

The emphasis of the current work was to experiment on a shared road intersection, a left-hand traffic scenario, mixed-length vehicular traffic, a modified reward structure and a modified Q-learning algorithm, based on the gaps in the literature. As part of future work, this can be extended to various other intersection topologies to evaluate whether the agent's algorithm has any dependency on the topology. The problem can also be evaluated using recurrent neural networks, as experimented in Van der Pol and Oliehoek [23]. Another outlook could be to treat traffic light duration variations as an action of deep reinforcement learning instead of using a fixed traffic light duration and considering the traffic light sequence as the action space. The future work can also be scoped for mixed traffic scenarios where both manual and autonomous vehicles co-exist. That will be an interesting consideration, given a futuristic view of more autonomous vehicles coming onto the road in the future, as discussed in Makantasis et al. [24], Wu [25] and Isele et al. [26].
References 1. Singh SK (2017) Road traffic accidents in India: issues and challenges. Transp Res Procedia 25:4708–4719 2. Mishra P, Mishra P (2017) Vital stats: overview of road accidents in India. eSocialSciences working papers id: 11668 (2017) 3. Liang X et al (2018) Deep reinforcement learning for traffic light control in vehicular networks. arXiv preprint arXiv:1803.11115 4. Gao J et al (2017) Adaptive traffic signal control: deep reinforcement learning algorithm with experience replay and target network. arXiv preprint arXiv:1705.02755 5. Genders W, Razavi S (2016) Using a deep reinforcement learning agent for traffic signal control. arXiv preprint arXiv:1611.01142 6. Genders W, Razavi S (2018) Evaluating reinforcement learning state representations for adaptive traffic signal control. Procedia Comput Sci 130:26–33 7. Vidali A et al (2019) A deep reinforcement learning approach to adaptive traffic lights management. WOA 8. van Dijk J (2017) Recurrent neural networks for reinforcement learning: an investigation of relevant design choices. Diss. Masters thesis, University of Amsterdam. https://esc.fnwi.uva. nl/thesis/centraal/files/f499544468.pdf 9. Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292 10. Goodfellow I et al (2016) Deep learning, vol 1(2). MIT Press, Cambridge 11. Strnad FM et al (2019) Deep reinforcement learning in world-earth system models to discover sustainable management strategies. Chaos Interdisc J Nonlinear Sci 29(12):123122 12. Zhang S, Sutton RS (2017) A deeper look at experience replay. arXiv preprint arXiv:1712.01275 13. Krajzewicz D et al (2002) SUMO (Simulation of Urban MObility)-an open-source traffic simulation. In: Proceedings of the 4th middle East symposium on simulation and modelling (MESM20002) 14. Guo M et al (2019) A reinforcement learning approach for intelligent traffic signal control at urban intersections. In: 2019 IEEE intelligent transportation systems conference (ITSC). IEEE 15. Mousavi SS, Michael S, Enda H (2017) Traffic light control using deep policy-gradient and value-function-based reinforcement learning. IET Intell Transp Syst 11(7):417–423 16. Zheng G et al (2019) Diagnosing reinforcement learning for traffic signal control. arXiv preprint arXiv:1905.04716 17. Casas N (2017) Deep reinforcement learning for urban traffic light control 18. Szepesvari C (2010) Algorithms for reinforcement learning: synthesis lectures on artificial intelligence and machine learning. Morgan and Claypool 19. Bellman R (1958) Dynamic programming and stochastic control processes. Inf Control 1(3):228–239 20. Sutton RS, Barto AG (1998) Introduction to reinforcement learning, vol 135. MIT press, Cambridge 21. Russell S, Norvig P (2002) Artificial intelligence: a modern approach 22. Ramsundar B, Zadeh RB (2018) TensorFlow for deep learning: from linear regression to reinforcement learning. O’Reilly Media, Inc 23. Van der Pol E, Oliehoek FA (2016) Coordinated deep reinforcement learners for traffic light control. In: Proceedings of learning, inference and control of multi-agent systems (at NIPS 2016) 24. Makantasis K, Maria K, Ioannis N (2019) A deep reinforcement learning driving policy for autonomous road vehicles. arXiv preprint arXiv:1905.09046 25. Wu C (2018) Learning and optimization for mixed autonomy systems-a mobility context. Diss, UC Berkeley 26. Isele D et al (2018) Navigating occluded intersections with autonomous vehicles using deep reinforcement learning. In: 2018 IEEE international conference on robotics and automation (ICRA). IEEE
Chapter 5
Ensemble Semi-supervised Machine Learning Algorithm for Classifying Complaint Tweets Pranali Yenkar and S. D. Sawarkar
1 Introduction

As per the UN prediction, around 68% of the world population is estimated to reside in cities by 2050. The exponential growth in the urban population poses many challenges for local governments. City authorities need to plan, manage and govern public services like infrastructure, transportation, safety, etc. with very limited resources [1]. Innovations in digital technology allow traditional local government to transform first into 'e-government' and later into 'smart government'. The smart government aspect of the 'smart city' concept [2, 3] employs emerging information and communication technologies (ICT) to develop a transparent and sustainable environment [4]. It also aims to utilize cost-effective and innovative solutions offered by ICT to deliver better public services and improve the quality of life of citizens. As citizens are the main stakeholders of the city, smart government promotes innovative approaches to engage and collaborate with citizens and understand their point of view. Interactive public engagement [5] with the government over issues, inconveniences, solutions and suggestions would definitely help in solving urban challenges. Various modes are available for such discussions, including surveys, call centers, Web applications and mobile apps. All of these platforms have one limitation or another: they are time consuming, costly and allow information to be shared only in a predefined manner. Easily accessible, quick and highly reachable social media platforms are the most prominent solution to these limitations and can be used to communicate and convey the daily issues faced by citizens [6]. Local government can utilize the data shared on social media, such as the popular platform Twitter [7], for effective urban planning and recognizing civic issues [8]. Being a social networking and engagement platform, it lets citizens discuss many issues they face in daily life quite openly and without any hesitation, such as
encroached footpaths, sound and noise pollution, inactive streetlights, traffic jams and broken bridges, safety and crime [9]. Owing to the popularity of the medium, a huge number of tweets is posted on Twitter every minute, which makes it tedious to identify and extract the complaint tweets. Researchers have experimented with supervised machine learning algorithms [10] to automatically classify tweets into complaint and non-complaint. But this needs a huge amount of accurately annotated data, which is not available in our scenario as it is problem and location specific. So instead of performing the exhausting and tedious task of generating labeled data, we explored semi-supervised machine learning algorithms, which need only a small amount of labeled data initially and use it subsequently to label a huge amount of unlabeled data. In this study, we propose an ensemble of two semi-supervised algorithms, i.e., the co-training [11–13] and self-training approaches, to perform binary classification of Twitter data. The research objectives of this study are as follows:
1. Perform Twitter data analysis using natural language processing and machine learning algorithms to understand urban issues.
2. Develop a framework for automated complaint identification using the novel Co-training Self-Training Ensemble Algorithm and evaluate its performance.
The rest of the paper is organized as follows: The next section outlines a literature survey of the existing work in the areas of social media analytics for urban issues and semi-supervised algorithms, followed by the proposed framework and methodology. Next, we describe the experimental setup along with the results of this study, followed by the conclusion.
2 Literature Survey

The objective of our study is to extract tweets and classify them into complaint vs. non-complaint by applying a semi-supervised approach; hence, relevant existing literature covering social media analytics and semi-supervised machine learning algorithms is evaluated and analyzed thoroughly. As citizens are the core part of the smart city ecosystem, numerous researchers have tried to understand the role of citizens, their opinions and their concerns for better urban development [7, 14, 15]. Along with blogs, Web sites and mobile apps, researchers have analyzed social media platforms as well to take cognizance of the daily problems of the people [16–19]. Authors have studied tweets as a valuable source of complaints city-wise, e.g., Dublin [20] and Indonesia [21], and region-wise [22], using various techniques and parameters like TF-IGM, geo-located tweets and clustering. While extracting the region-wise complaints, the authors of [22] mainly focused on the quantitative aspect, i.e., the count of tweets, rather than the qualitative aspect, i.e., the content of the tweets. The authors of [23, 24] performed sentiment analysis and emotion analysis to understand the issues faced by citizens. The authors of [25, 26] categorized tweets into information sharing, appreciation, promotional and complaint tweets by applying various
text analytics methods. The authors of [27] experimented with geo-located tweets and applied a supervised learning algorithm to extract complaint tweets. However, very few tweets contain geo-locations, which results in ignoring a large amount of complaint data. Aspect-based sentiment analysis of the complaints [28] helped identify different issues mentioned in the same tweet based on sentiment score. Researchers have also concentrated on various external parameters like weather conditions, day of the week, time of day and historical data to predict complaints in the near future so that the government can act proactively before any mishap happens [29–31]. Researchers have also developed platforms like Janayuja, which collects information related to issues from various blogs, forums and social media and combines it in one place for better situational awareness [32]. In order to find useful information in the large amount of extracted tweets while avoiding the issue of unavailability of a large amount of labeled data, semi-supervised machine learning algorithms are the preferred way. Co-training and self-training are two popular types of semi-supervised machine learning algorithms [33]. The co-training algorithm uses different subgroups of the same dataset to train the classifiers. Iosifidis and Ntoutsi [34] implemented co-training and self-learning semi-supervised algorithms to create a sizeable amount of labeled data. After studying the existing research and observing that co-training and self-training algorithms perform well with limited training data, we have proposed a framework using both methods to classify complaint tweets, as discussed in the next section.
3 Methodology

Toward achieving the objective of our study, we have experimented with various natural language processing and machine learning algorithms. The proposed framework to extract urban issues with limited labeled data is illustrated in Fig. 1. Each step of the framework is explained in detail below.
3.1 Data Extraction

In order to get attention and reach the targeted department, people add the relevant user mention or Twitter handle while writing a tweet. So we determined the important user mentions and terms related to urban issues, e.g., @mumbaitraffic, @MumbaiPolice, @RoadsOfMumbai, @mumbaimatterz, pothole, manhole, etc., and used them as keywords to extract tweets from Twitter using the Twitter streaming API. Around 1000 tweets written in the English language by the citizens of Mumbai are used for the experimentation and are saved in comma-separated values (CSV) file format.
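A hedged sketch of keyword-based collection with Tweepy's streaming interface (the v3.x-style `StreamListener` API is assumed); the credential strings are placeholders.

```python
import csv
import tweepy

KEYWORDS = ["@mumbaitraffic", "@MumbaiPolice", "@RoadsOfMumbai",
            "@mumbaimatterz", "pothole", "manhole"]

class ComplaintCollector(tweepy.StreamListener):
    def __init__(self, out_path="tweets.csv", limit=1000):
        super().__init__()
        self.writer = csv.writer(open(out_path, "w", newline="", encoding="utf-8"))
        self.count, self.limit = 0, limit

    def on_status(self, status):
        self.writer.writerow([status.id_str, status.text])
        self.count += 1
        return self.count < self.limit      # returning False stops the stream

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")   # placeholder credentials
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")          # placeholder credentials
stream = tweepy.Stream(auth=auth, listener=ComplaintCollector())
stream.filter(track=KEYWORDS, languages=["en"])
```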
Fig. 1 Proposed methodology (data extraction → data preprocessing → feature extraction → ensemble co-training/self-training algorithm → model building → labeled instances → result evaluation)
3.2 Data Preprocessing

Tweets are written in users' own style of writing and so may contain irrelevant information that decreases classification performance. In order to remove this noise, various preprocessing techniques prominent in natural language processing are used. Preprocessing predominantly consists of removing unwanted data like mentions,
URLs, emails, digits and stop words, along with spell correction, abbreviation expansion and lemmatization. The lemmatization process converts a word to its base form, e.g., playing, plays and played to play. In this study, unlike existing approaches, words with more than two repeated characters, e.g., 'pleeeease', are not replaced with the correct word like 'please' but are rather used as one of the features for further processing, as such repetition intensifies the sentiment of the word.
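A sketch of this preprocessing using NLTK; the regular expressions are illustrative, spell correction and abbreviation expansion are omitted for brevity, and elongated words such as 'pleeeease' are deliberately left untouched, as described above.

```python
import re
from nltk.corpus import stopwords          # requires the NLTK 'stopwords' corpus
from nltk.stem import WordNetLemmatizer    # requires the NLTK 'wordnet' corpus

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def preprocess(tweet):
    text = tweet.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # remove URLs
    text = re.sub(r"\S+@\S+", " ", text)            # remove e-mail addresses
    text = re.sub(r"@\w+", " ", text)               # remove mentions
    text = re.sub(r"\d+", " ", text)                # remove digits
    tokens = re.findall(r"[a-z]+", text)
    # Words with repeated characters (e.g. 'pleeeease') are kept as-is by design.
    return [LEMMATIZER.lemmatize(t) for t in tokens if t not in STOPWORDS]
```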
3.3 Feature Identification

Various features are extracted from the tweets which help in achieving accurate classification. Some of the features are the count of words in the clean tweet after preprocessing (count_words); the counts of hashtags (count_hashtags), URLs (count_urls), mentions (count_mentions) and emojis (count_emojis) present in the tweet before preprocessing; the number of negative emojis (count_neg); the counts of exclamation marks (count_rep_ex) and question marks (count_rep_q); and the number of capitalized words (count_capital_words) present in the tweet.
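A sketch of how the listed counts could be computed from a raw tweet; the emoji detection and the set of negative emojis are placeholder assumptions.

```python
import re

NEGATIVE_EMOJIS = {"😠", "😡", "😢", "👎"}   # placeholder set of negative emojis

def extract_features(raw_tweet, clean_tokens):
    emojis = [ch for ch in raw_tweet if ord(ch) > 0x1F000]   # crude emoji test (assumed)
    return {
        "count_words": len(clean_tokens),
        "count_hashtags": len(re.findall(r"#\w+", raw_tweet)),
        "count_urls": len(re.findall(r"http\S+", raw_tweet)),
        "count_mentions": len(re.findall(r"@\w+", raw_tweet)),
        "count_emojis": len(emojis),
        "count_neg": sum(e in NEGATIVE_EMOJIS for e in emojis),
        "count_rep_ex": raw_tweet.count("!"),
        "count_rep_q": raw_tweet.count("?"),
        "count_capital_words": sum(w.isupper() for w in raw_tweet.split()),
    }
```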
3.4 Ensemble Co-training, Self-training Algorithm-Based Classifier

In a typical co-training semi-supervised learning algorithm, the available labeled data is divided into two groups based on features, and two classifiers C1 and C2 are trained, one on each view. In every cycle, a few unlabeled samples are classified by classifiers C1 and C2, and the samples with confidence greater than some threshold value are appended back into the training data. This process continues for a fixed number of iterations or until all unlabeled instances are exhausted. The self-training algorithm executes the same procedure as the co-training semi-supervised algorithm, except that the input data is not partitioned into two views, and all the features are considered to derive the label of the target instance. Our study explores an ensemble of the co-training and self-training algorithms to improve classification performance. This approach utilizes the pseudo-labels predicted by the co-training and self-training algorithms and, based on them, decides the final label of the unlabeled instance. In this approach, as shown in Fig. 2, instead of one classification algorithm, the self-training approach is experimented with two different classification algorithms, and the pseudo-label predicted by the model with the better accuracy is considered the final prediction of the self-training approach. In our proposed approach, the initially labeled dataset is partitioned into view1 and view2 based on the extracted features (F), which are divided into two sets F1 and F2. The values of the derived features are calculated once for all labeled and unlabeled tweets and remain the same throughout the execution. Once all the tweets are labeled, the complete labeled data is considered as a training set. The
Fig. 2 Ensemble co-training self-training approach
output produced by the co-training and self-training algorithms is compared, and the labels of tweets from the test data are decided using the following steps (a sketch of this decision rule is given after the list):
1. If the labels predicted by co-training and self-training are identical, that label is assigned to the tweet.
2. If the predicted labels mismatch, the accuracies of both algorithms are compared and the label suggested by the algorithm with the higher accuracy is assigned.
3. If the accuracies are equal, the label predicted by the algorithm showing the higher confidence is assigned.
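A minimal sketch of this decision rule; the variable names for the pseudo-labels, validation accuracies and prediction confidences are illustrative.

```python
def final_label(co_label, self_label, co_acc, self_acc, co_conf, self_conf):
    # Step 1: both approaches agree on the pseudo-label
    if co_label == self_label:
        return co_label
    # Step 2: disagreement -> trust the approach with the higher accuracy
    if co_acc != self_acc:
        return co_label if co_acc > self_acc else self_label
    # Step 3: equal accuracy -> trust the approach with the higher confidence
    return co_label if co_conf >= self_conf else self_label
```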
4 Experimental Set Up
In this study, the experiments are carried out on tweets posted by citizens of Mumbai city to various authenticated user accounts. The implementations are done in Python using scikit-learn and the Natural Language Toolkit (NLTK). In the first phase of the implementation, we use the pipeline concept and append the extracted and calculated features to the tweet content to create a single feature set, which is classified using two well-known supervised learning algorithms: logistic regression (LR) and multinomial Naïve Bayes (MNB). In the next phase, the ensemble version of the co-training and self-training algorithms is tested with only 25% labeled data and 50% unlabeled data, and the remaining 25% of the data is used as test data to measure the accuracy of the newly proposed approach. The performance of the proposed complaint classification model is assessed with the traditional evaluation metrics classification accuracy, precision, recall and F1-score.
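A hedged sketch of this first, supervised phase is shown below using scikit-learn: the tweet content and the numeric count features are combined via a ColumnTransformer and classified with LR (MNB can be substituted), with grid search over classifier parameters. The column names, parameter grid, and scoring choice are illustrative assumptions rather than the authors' exact configuration.

```python
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

count_cols = ["count_words", "count_hashtags", "count_urls", "count_mentions",
              "count_emojis", "count_neg", "count_rep_ex", "count_rep_q",
              "count_capital_words"]

# Combine the tweet text (TF-IDF) with the numeric count features into one feature set
features = ColumnTransformer([
    ("text", TfidfVectorizer(), "clean_tweet"),
    ("counts", "passthrough", count_cols),
])

pipeline = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),   # MultinomialNB can be swapped in here
])

# Grid search over classifier parameters, as mentioned for Table 1
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, scoring="f1", cv=5)
# grid.fit(train_df, train_df["label"])  # train_df: pandas DataFrame with the columns above
```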
5 Results and Discussion
This section discusses and compares the experimental results obtained after applying the existing and proposed methodologies to the tweets in order to classify them into 'complaint' and 'non-complaint' classes. In the first set of results, the conventional supervised learning algorithms logistic regression (LR) and multinomial Naïve Bayes (MNB) are trained and tested on all the extracted features after performing traditional preprocessing techniques such as spell correction, abbreviation expansion, stop word removal and lemmatization. Although LR shows better accuracy than MNB, the overall F1 score of MNB is slightly higher than that of LR. In the next set of results, different variations of the Co-training Self-training Ensemble Algorithm (COSEA), tested with different classification algorithms for each of the two classifiers, perform better than the supervised learning algorithms. In the first experiment, in the co-training algorithm, for any particular instance the F1 feature set is applied to classifier algo.1 (MNB) and the F2 feature set is applied to classifier algo.2 (LR), and the predicted label is taken as an intermediate result. At the same time, in the self-training algorithm, for the same instance all the features (F1 + F2) are combined and applied to both classification algorithms (MNB and LR), and the target label is predicted. The target labels predicted by the co-training and self-training algorithms are compared as discussed in Sect. 3.4, and the final label is decided. In this scenario, we obtained an accuracy of 87% and an F1 score of 88%; here, the precision (91%) is better than the recall (87%). In the second set of experiments, the setup is the same as in the first, except that in the co-training algorithm the F1 feature set is applied to classifier algo.1 (LR) and the F2 feature set is applied to classifier algo.2 (MNB).
Table 1 Comparative results of the existing supervised and the proposed ensemble approach

| Test case | Features | Algo | P (%) | R (%) | F1 (%) | Acc (%) |
|---|---|---|---|---|---|---|
| Supervised (existing): processed tweets along with the calculated and extracted features using the pipeline approach; grid search used to select the best classification parameters | count_emojis, count_neg, count_rep_ex, count_hashtags, count_rep_q, count_urls, count_excl_quest_marks, count_capital_words, count_words, count_mentions | MNB | 0.89 | 0.83 | 0.85 | 0.83 |
| | | LR | 0.84 | 0.85 | 0.84 | 0.85 |
| Ensemble (proposed): two different algorithms for the two classifiers | For co-training: F1 = count_emojis, count_neg, count_rep_ex, count_hashtags, count_rep_q; F2 = count_urls, count_excl_quest_marks, count_capital_words, count_words, count_mentions. All features considered for self-training | C1 = MNB, C2 = LR | 0.91 | 0.87 | 0.88 | 0.87 |
| | | C1 = LR, C2 = MNB | 0.87 | 0.89 | 0.88 | 0.89 |
In this scenario, we obtained an improved accuracy (89%) compared with the previous scenario. Hence, the results clearly indicate that our approach of using an ensemble of co-training and self-training algorithms shows a significant improvement in the important evaluation parameters, accuracy and F1 score, compared with the existing supervised learning algorithms (Table 1).
6 Conclusion
Citizens actively participate and share their experiences about everyday issues on social media. Even though a tremendous amount of valuable data is available, it needs to be extracted, preprocessed and classified accurately so that local government can utilize it effectively. Our proposed framework extracts the relevant tweets and automatically identifies the complaint tweets by exploring a novel ensemble semi-supervised algorithm-based approach. Our approach also seems
promising for overcoming the limitation of sparse labeled data. Although the study has experimented only with Twitter data, the same approach can be extended to combined data collected from different information-sharing platforms. Future work will also involve selecting the most relevant unlabeled instances to be labeled in each iteration for a better overall result.
Chapter 6
Underwater Image Enhancement Using Fusion Stretch Method Litty Koshy and Shwetha Mary Jacob
1 Introduction
The underwater environment is a mystic world full of attractions such as marine life, underwater flora, and many others. Underwater imaging has also been an important source of interest in various branches of research [1], such as the inspection of underwater infrastructure and cables [2], among many more. Underwater imaging is difficult because of the illumination properties that occur in the various underwater environments. Because of improper lighting and obstruction of the propagated light, underwater images degrade more than ordinary images with poor visibility. Unlike normal pictures, underwater images suffer from low visibility, primarily due to absorption and scattering effects, resulting from the attenuation of the propagated light. The absorption phenomenon decreases the light energy, while scattering causes drastic changes in the direction of light propagation. As a result, the captured image has a foggy appearance and degraded contrast, making distant objects look distorted and misty. In seawater images in particular, objects become almost imperceptible at distances of more than 10 m, and some colors are degraded because their composition of wavelengths is affected. Because of these issues, the final output turns out to be fuzzy and faded. This paper suggests a single-image strategy to enhance underwater image quality by removing haze and blurriness. The entire strategy is based on a single image captured with a camera: it derives the inputs and the weights from the original degraded image, adopting an approach that does not require multiple images. White balancing, contrast stretching, and multi-scale fusion are the main processes involved in the proposed strategy. In the proposed framework, firstly, in order to eliminate the color casts in the degraded image, the image is white-balanced such that the resulting image has a natural appearance.
L. Koshy (B) · S. M. Jacob (B) Department of CSE, SCMS School of Engineering and Technology, Ernakulam, Kerala, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_6
Then the enhanced version of this restored image is obtained by applying contrast stretching to it, which helps to increase the global contrast of the image. The contrast-stretched image is then subjected to a gamma correction as well as a sharpening process in order to obtain the two fusion inputs, one from each process. Finally, a multi-scale fusion is applied. The three weight maps used in the multi-scale fusion process assess several image qualities that specify spatial pixel relationships; these weights assign higher values to pixels that properly depict the desired image qualities. The structure of this paper is as follows. Section 2 briefly reviews related work on underwater dehazing. Section 3 presents the proposed system. Section 4 discusses the fusion strategy, and Sect. 5 presents qualitative and quantitative assessments of the fusion stretch method against the color balance and fusion method with the help of an online underwater quality evaluation platform, PUIQE [3].
2 Related Works
The image enhancement technique in [4] introduced a new method to improve the appearance of captured underwater images that are degraded due to scattering and absorption in the medium. This technique is a single-image approach that does not need specialized hardware or prior knowledge about the underwater conditions or scene structure. Two images, extracted from the white-balanced and color-compensated version, are blended together; the two fused images, as well as their related weight maps, are defined so as to carry the edges and color contrast over to the output image. The quality of underwater images can also be improved using specialized hardware [1, 5, 6]. For instance, a system developed by NAVAIR to image underwater targets from an underwater vehicle captures underwater images using modulation/demodulation techniques [1]. The major problem with these complex systems is that they are very expensive. A way of estimating the optical transmission of blurred scenes from a single input image, i.e., a single-image dehazing system, was suggested by Fattal [7]. According to this technique, scattered light is removed in order to recover a haze-free image and to enhance visibility. Polarization-based methods capture several images of the same scene while rotating a polarizing filter fixed to the camera, in order to obtain several images with different degrees of polarization. Schechner et al. exploit the polarization associated with backscattered light [8]. Polarization techniques are not applicable to video acquisition [4]. The deep photo model can improve underwater images by employing existing, geo-referenced artificial landscape and urban 3D models [9]. Some other techniques make use of models of light propagation in fog and underwater. Several single-image dehazing techniques are available to recover images of foggy scenes [10, 11]. He et al. [12, 13] proposed a method based on the dark channel prior (DCP), originally proposed for dehazing outdoor scenes, for restoring underwater images.
3 Proposed System The architecture diagram of the proposed system is shown in Fig. 1.
3.1 White Balancing
The image enhancement approach adopts a three-step strategy, combining white balancing, contrast stretching, and multi-scale fusion, to restore underwater images. White balancing is the initial step to improve the quality of the underwater image. It primarily removes the color casts caused by different illumination properties [4]. White balancing produces a noticeable change in deep water, where the absorbed colors are difficult to restore [14]. Underwater, colors are closely linked to depth, and the main problem is that the resulting image has a greenish-bluish color cast. The color loss is proportional to the total distance between the scene and the viewer. White balance characterizes the accuracy of the white color, which can be obtained by combining the three primary colors red (R), green (G) and blue (B); as a result, white is used as the standard for correcting the color offset [15]. Figure 2 shows the original and white-balanced images.
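The paper does not spell out the exact white-balancing formula. As one common realization of the idea of using white/gray as the reference, the gray-world assumption rescales each channel so that the channel means coincide; the sketch below is illustrative only and is not necessarily the method used by the authors.

```python
import numpy as np

def gray_world_white_balance(img: np.ndarray) -> np.ndarray:
    """Gray-world white balance: rescale R, G, B so their means become equal (img in [0, 1])."""
    img = img.astype(np.float64)
    channel_means = img.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / (channel_means + 1e-8)   # per-channel gain
    return np.clip(img * gains, 0.0, 1.0)
```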
3.2 Contrast Stretching
To increase the global contrast of the image, the white-balanced image is subjected to contrast stretching.
Fig. 1 Overview: an input is first derived by taking the white-balanced version of the input image; this white-balanced image is subjected to contrast stretching; then two other images are derived from this contrast-stretched version and merged together using a multi-scale fusion algorithm
Fig. 2 a Original image. b White-balanced image
Contrast stretching is a technique for adjusting image intensities in order to enhance the global contrast [16]. This technique uses a linear normalization that stretches an arbitrary interval of the image intensities and maps it onto another arbitrary interval. Its purpose is to make equal use of the entire range of available values. This often increases the global contrast of an image whose usable information is expressed by closely spaced intensity values. After this correction, the intensities are more evenly distributed over the histogram, so areas of lower local contrast achieve a higher contrast. Contrast stretching does this by increasing the disparity between the image's maximum and minimum intensity values. The approach works well for pictures with either light or dark backgrounds and foregrounds. Figure 3 shows the white-balanced image and the contrast-stretched image; a small sketch of the stretching operation follows the figure.
Fig. 3 a White-balanced image. b Contrast-stretched image
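As referenced above, a minimal sketch of the linear (min-max) stretching is given below; the percentile clipping is an added assumption for robustness to outlier pixels.

```python
import numpy as np

def contrast_stretch(img: np.ndarray, low_pct: float = 1.0, high_pct: float = 99.0) -> np.ndarray:
    """Linearly map the intensity interval [low, high] onto the full [0, 1] range."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    stretched = (img - lo) / max(hi - lo, 1e-8)
    return np.clip(stretched, 0.0, 1.0)
```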
3.3 Multi-scale Fusion
Multi-scale fusion principles are used to merge the inputs into a single dehazed underwater image. Image fusion can be used for a variety of purposes, including image compositing [17], multispectral video enhancement [18], and HDR imaging [19]. In order to achieve fair visibility in the output image, a fusion algorithm is applied which is expressed in terms of inputs and weights. A collection of input and weight maps derived from a single degraded image forms the basis of the fusion algorithm. By retaining the relevant features of the image, the fusion algorithm blends multiple input images. Instead of attempting to extract inputs from a physical model of the scene, the algorithm aims for a quick and simple method. The underwater dehazing technique consists of four main steps:
(i) Derive inputs from the white-balanced underwater image.
(ii) Derive inputs from the contrast-stretched underwater image.
(iii) Define the weight maps.
(iv) Apply the fusion process to the inputs and weight maps.

3.3.1 Fusion Process Inputs
The primary step of this algorithm is to produce the white-balanced version of the original image. The main aim is to enhance the appearance of the image by eliminating unwanted color casts caused by different illumination properties. For images taken deeper underwater, white balancing alone does not suffice to recover the absorbed colors. As a consequence, contrast stretching is applied to obtain the first input. Since white-balanced underwater images tend to appear too vivid, the global contrast needs to be adjusted by applying contrast stretching. The aim of contrast stretching is to improve the clarity of hazy areas, although it can degrade the rest of the image. To compensate for this loss, a gamma correction is applied to the contrast-stretched version of the image. Gamma correction is important because contrast stretching enhances visibility in hazy regions while the contrast in the rest of the image remains low; gamma correction improves the contrast between darker and lighter areas at the expense of details in under- or overexposed areas [4]. Figure 4 shows the contrast-stretched image and the gamma-corrected image. To further counter this loss, the contrast-stretched version of the input is also subjected to a sharpening algorithm, which reduces the degradation caused by the scattering phenomenon and increases the image quality. Image sharpening is an enhancement technique that highlights an image's edges and fine details; sharpening increases the contrast between bright and dark regions to bring out features. Figure 5 shows the gamma-corrected image and the sharpened image.
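The two inputs derived from the contrast-stretched image can be sketched as follows. The gamma value, the Gaussian blur radius, and the plain unsharp-mask formulation are illustrative assumptions rather than the authors' exact (e.g., normalized) variants.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gamma_correct(img: np.ndarray, gamma: float = 2.0) -> np.ndarray:
    """First fusion input: gamma correction of the contrast-stretched image."""
    return np.clip(img, 0.0, 1.0) ** gamma

def sharpen(img: np.ndarray, sigma: float = 2.0, amount: float = 1.0) -> np.ndarray:
    """Second fusion input: unsharp masking to emphasize edges and fine detail."""
    blurred = gaussian_filter(img, sigma=(sigma, sigma, 0))   # blur spatially, not across channels
    return np.clip(img + amount * (img - blurred), 0.0, 1.0)
```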
Fig. 4 a Contrast-stretched image. b Gamma-corrected image
Fig. 5 a Gamma-corrected image. b Sharpened image
3.4 Weights for the Fusion Process During multi-scale fusion, the weight maps are used to make the pixels with the highest weight value stand out more in the final image. As a consequence, a variety of weight metrics are used to describe them. Figure 6 shows (a) gamma-corrected image (b) Laplacian weight (c) saliency weight (d) saturation weight. Figure 7 shows (a) sharpened image, (b) Laplacian weight, (c) saliency weight,
Fig. 6 a Gamma corrected image. b Laplacian weight. c Saliency weight. d Saturation weight
Fig. 7 a Sharpened image. b Laplacian weight. c Saliency weight. d Saturation weight
(d) saturation weight.
3.4.1 Laplacian Contrast Weight (Wt)
The Laplacian filter is used as the edge detector; it computes the second derivatives of an image, measuring the rate at which the first derivatives change. The contrast weight is obtained as the absolute value of the Laplacian filter applied to each luminance channel. For underwater dehazing, this weight alone is insufficient to enhance the contrast, so we introduce an additional weight metric to compensate for this issue.
3.4.2 Saliency Weight (Ws)
The saliency weight gives more significance to the areas that lose their prominence in the underwater scene. The saliency map tends to favor areas with high luminance values. However, this weight map does not give any emphasis to the less highlighted features of the scene; therefore, an additional weight map, the saturation weight map, is introduced.
3.4.3 Saturation Weight (Wa)
The saturation weight map follows from the observation that increasing the saturation of less highlighted regions is beneficial: the multi-scale fusion algorithm takes advantage of highly saturated regions to adapt to the chromatic information. It can be computed using Eq. (1), which uses the red (R_k), green (G_k), blue (B_k), and luminance (L_k) channels [4]:

W_a = \sqrt{\frac{1}{3}\left[(R_k - L_k)^2 + (G_k - L_k)^2 + (B_k - L_k)^2\right]}    (1)
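Equation (1) translates directly into NumPy, as sketched below; taking the luminance as the mean of the three channels is an assumption, since the paper does not state the luminance formula.

```python
import numpy as np

def saturation_weight(img: np.ndarray) -> np.ndarray:
    """Saturation weight map of Eq. (1): deviation of the R, G, B channels from the luminance."""
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    L = img.mean(axis=-1)   # luminance taken as the channel mean (an assumption)
    return np.sqrt(((R - L) ** 2 + (G - L) ** 2 + (B - L) ** 2) / 3.0)
```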
3.5 Normalization
The next step is normalization, which involves merging the three weight maps into a single weight map. In our algorithm we have two inputs, the gamma-corrected and the sharpened images, and for each of the two inputs we calculate an aggregated weight map, W_k, by summing its three weight maps. We then apply the normalization formula to obtain two normalized weight maps. The generalized formula is

\bar{W}_k = \frac{W_k + \delta}{\sum_{k=1}^{K} W_k + K\delta}    (2)

where k indexes the kth input (with K inputs in total) and δ denotes a small regularization term, usually taken as 0.1 [4].
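A minimal sketch of the normalization of Eq. (2) is given below, assuming the K aggregated weight maps (here K = 2) have already been formed by summing the Laplacian, saliency, and saturation weights.

```python
import numpy as np

def normalize_weights(weight_maps, delta: float = 0.1):
    """Pixel-wise normalization of the K aggregated weight maps as in Eq. (2)."""
    K = len(weight_maps)
    denom = np.sum(weight_maps, axis=0) + K * delta
    return [(w + delta) / denom for w in weight_maps]
```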
3.6 Fusion Process
Image fusion preserves the most significant perceived properties by combining the inputs according to their weight maps. The key goal is to integrate the input weight maps into a final high-quality result with all the features restored. Laplacian and Gaussian pyramids are used in the multi-scale decomposition; the Laplacian pyramid representation decomposes an image into a sum of band-pass images. Here, a Gaussian pyramid is constructed for each of three inputs: the gamma-corrected normalized weight map, the gamma-corrected image, and a grayscale image of the sharpened normalized weight map (the mask). The three Gaussian pyramids form the inputs to the Laplacian stage, and a Laplacian pyramid is constructed for each input; the Gaussian and Laplacian pyramids have the same number of levels. The two Laplacian pyramids (of the gamma-corrected normalized weight map and the gamma-corrected image) are then combined with the Gaussian mask, and the image is reconstructed. After sufficient upsampling, the final dehazed output is obtained by summing the contributions of all levels. By applying the fusion mechanism at each scale level separately, the initial image structure is preserved. The human visual system, which is largely sensitive to local contrast variations such as edges and corners, motivates the multi-scale fusion.
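The multi-scale blend itself can be sketched with standard Gaussian and Laplacian pyramids, as below. This is a simplified stand-in for the authors' exact pyramid construction: it fuses the two inputs with their normalized (single-channel) weight maps, uses OpenCV's pyrDown/pyrUp, and fixes an arbitrary number of levels.

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels=5):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels=5):
    gp = gaussian_pyramid(img, levels)
    lp = []
    for i in range(levels - 1):
        up = cv2.pyrUp(gp[i + 1], dstsize=(gp[i].shape[1], gp[i].shape[0]))
        lp.append(gp[i] - up)       # band-pass level
    lp.append(gp[-1])               # coarsest (low-pass) level
    return lp

def multiscale_fusion(inputs, weights, levels=5):
    """Blend each Laplacian level of the inputs with the Gaussian level of its weight map."""
    fused = None
    for img, w in zip(inputs, weights):
        lp = laplacian_pyramid(img.astype(np.float32), levels)
        gw = gaussian_pyramid(w.astype(np.float32), levels)
        blended = [l * g[..., None] for l, g in zip(lp, gw)]   # weight maps are single-channel
        fused = blended if fused is None else [f + b for f, b in zip(fused, blended)]
    # Collapse the fused pyramid from the coarsest level upward
    out = fused[-1]
    for lvl in reversed(fused[:-1]):
        out = cv2.pyrUp(out, dstsize=(lvl.shape[1], lvl.shape[0])) + lvl
    return np.clip(out, 0.0, 1.0)
```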
4 Results and Discussions
The proposed method is evaluated on real-world underwater images. Visual comparisons with the competing method, color balance and fusion, are presented in Fig. 8. The real-world underwater images are collected from the Internet because there is no public underwater image dataset available.
Fig. 8 a Original image. b Final weight
These real-world underwater images are taken under different lighting conditions and depths for a better understanding of our results. The fusion stretch method is evaluated by performing quantitative and qualitative comparisons with a similar enhancement method, the color balance and fusion method, on underwater images. Table 1 shows the assessment based on two metrics used for underwater image assessment: UCIQE and UIQM. The UCIQE metric measures the non-uniform color cast, low contrast, and blurring that characterize underwater images, while UIQM focuses on three main underwater image quality criteria: colorfulness, sharpness, and contrast. Among the specialized underwater dehazing approaches evaluated, color balance and fusion shows high robustness in recovering the visibility of the considered scenes; however, the fusion stretch method outperforms it, with clearly higher UCIQE and UIQM values. The quantitative analysis was done with the help of an online platform for underwater image quality evaluation, PUIQE. It provides standard consistency assessment tools for comparing underwater datasets in a standardized way. It is based on the progress of other computer vision fields that have been aided by assessment platforms.

Table 1 Quantitative evaluation
| Image | Fusion Stretch UCIQE | Fusion Stretch UIQM | Color Balance and Fusion UCIQE | Color Balance and Fusion UIQM |
|---|---|---|---|---|
| Reef1 | 0.792 | 0.913 | 0.705 | 0.713 |
| fish | 0.724 | 0.716 | 0.610 | 0.598 |
| Reef2 | 0.708 | 0.654 | 0.623 | 0.617 |
| Reef3 | 0.655 | 0.693 | 0.556 | 0.628 |
| Shipwreck | 0.678 | 0.613 | 0.547 | 0.451 |
| Galdran1 | 0.649 | 0.623 | 0.541 | 0.540 |
| Average | 0.701 | 0.702 | 0.597 | 0.591 |
PUIQE encourages process comparisons by using common databases and realistic assessment criteria. Overall, it is evident that the proposed system shows good visual consistency, with major improvements in global contrast and color while preserving the image structure. The main drawback is that the quality of scene regions far from the camera is difficult to improve, as their color cannot always be restored. More emphasis is placed on the gamma-corrected version than on the sharpened version during the fusion process, based on the observation that using the contrast-stretched, gamma-corrected image in the multi-scale fusion helps to mitigate the color cast while preserving the structure of the original image. Table 1 shows the quantitative evaluation of the fusion stretch method and the combined color balance and fusion method (Figs. 9, 10, and 11).
Fig. 9 Graph comparing UCIQE values
Fig. 10 Graph comparing UIQM values
Fig. 11 Qualitative evaluation
5 Conclusion
Underwater image enhancement using the fusion stretch method is another approach to enhance underwater images. This technique is based on fusion theory and requires no additional data beyond the original image. The method can improve a wide variety of underwater photographs with good precision, and it is also capable of recovering important faded features and edges. The fusion stretch strategy outperforms the color balance and fusion technique in terms of the UCIQE and UIQM metrics.
References
1. Dalgleish FR, Caimi FM, Kocak DM, Schechner YY (2008) A focus on recent developments and trends in underwater imaging. Marine Technol Soc 42(1):52–67
2. Foresti GL (2001) Visual inspection of sea bottom structures by an autonomous underwater vehicle. IEEE Trans Syst Man Cybern B Cybern 31(5):691–705
3. Cavallaro A, Li CY, Mazzon R (2018) An online platform for underwater image quality evaluation. Centre for Intelligent Sensing, Queen Mary University of London
4. Ancuti CO, Ancuti C, De Vleeschouwer C, Bekaert P (2018) Color balance and fusion for underwater image enhancement. IEEE Trans Image Process 27
5. He D-M, Seet GGL (2004) Divergent-beam lidar imaging in turbid water. Opt Lasers Eng 41:217–231
6. Vaish V, Horowitz M, McDowall I, Levoy M, Chen B, Bolas B (2004) Synthetic aperture confocal imaging. In: Proceedings of ACM SIGGRAPH, pp 825–834
7. Fattal R (2008) Single image dehazing. ACM Trans Graph SIGGRAPH 27(3):72
8. Schechner YY, Averbuch Y (2007) Regularized image recovery in scattering media. IEEE Trans Pattern Anal Mach Intell 29(9):1655–1660
9. Kopf J et al (2008) Deep photo: model-based photograph enhancement and viewing. ACM Trans Graph 27, Art. no. 116
10. Ancuti CO, Ancuti C (2013) Single image dehazing by multi-scale fusion. IEEE Trans Image Process 22(8):3271–3282
11. Fattal R (2008) Single image dehazing. Proc ACM SIGGRAPH, Art. no. 72
12. Sun J, He K, Tang X (2011) Single image haze removal using dark channel prior. IEEE Trans Pattern Anal Mach Intell 33(12):2341–2353
13. Sun J, He K, Tang X (2009) Single image haze removal using dark channel prior. In: Proceedings of IEEE CVPR, pp 1956–1963
14. Haber T, Ancuti C, Ancuti CO, Bekaert P, Enhancing underwater images and videos by fusion. Conference paper
15. Manisrinivas S, Anilkumar D, Hemanth B, Santosh Kumar NT, Poojitha V, Underwater image enhancement by fusion. Int J Mod Eng Res (IJMER)
16. Bhandari YS, Negi SS (2014) A hybrid approach to image enhancement using contrast stretching on image sharpening and the analysis of various cases arising using histogram. In: IEEE international conference on recent advances and innovations in engineering (ICRAIE-2014), September 2014
17. Williams GP, Grundland M, Vohra R, Dodgson NA (2006) Cross dissolve without cross fade: preserving contrast, color and salience in image compositing. Comput Graph Forum 25(3):577–586
18. Mason JL, Bennett EP, McMillan L (2007) Multispectral bilateral video fusion. IEEE Trans Image Process 16(5):1185–1194
19. Kautz J, Mertens T, Van Reeth F (2009) Exposure fusion: a simple and practical alternative to high dynamic range photography. Comput Graph Forum 28(1):161–171
20. Simo M, Ortiz A, Oliver G (2002) A vision system for an underwater cable tracker. Mach Vis Appl 13:129–140
Chapter 7
A Novel Approach for Semantic Microservices Description and Discovery Toward Smarter Applications Chellammal Surianarayanan, Gopinath Ganapathy, and Pethuru Raj Chelliah
1 Introduction
The monolithic style of conventional architectures could not meet the major critical needs of modern web-scale, cloud-hosted, enterprise-class, service-oriented, insights-driven, and customer-facing applications, namely continuous integration (CI), continuous delivery and deployment (CD), continuous improvement, and horizontal scalability. The new MSA style innately fulfills the above-mentioned needs by breaking up the application into many small-sized, functionally independent microservices which can be individually deployed, upgraded, replaced or substituted with advanced versions, and decommissioned. This means that any microservice can be redeployed independently without affecting the remaining portion of the application. Though MSA meets the need for frequent deployment of applications, each microservice is designed around a single purpose and hence has limited functionality. In order to realize a business process, many (abstract) tasks have to be combined in such a way that they fully fulfill the given business process. Microservices which can realize these tasks have to be discovered and combined according to a specific execution pattern. The process of combining individual services is called service composition, and service discovery is the prerequisite for performing the required composition. That is, prior to composition, the microservices which implement the individual tasks have to be discovered according to
C. Surianarayanan (B) Government Arts and Science College, Srirangam, Affiliated to Bharathidasan University, Tiruchirappalli, Tamil Nadu 620027, India G. Ganapathy Bharathidasan University, Tiruchirappalli, Tamil Nadu 620024, India P. R. Chelliah Reliance Site Reliability Engineering (SRE) Division Reliance Jio Platforms Ltd., Bangalore 560103, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_7
functional needs of tasks. Each microservice exposes its functionality via its application programming interface (API). Microservices are commonly described using REST API and Apache Thrift API. Though Thrift API provides a suite of tools for developing service consumer’s and service provider’s code along with the required middleware such as stub and skeleton, frequently microservices are developed within an enterprise to achieve frequent deployment. In such scenario, developers tend to use REST API which does not require any extra infrastructure other than HTTP. Also, REST APIs are documented using Swagger or Open API Specification. REST API and Swagger are basically syntactic APIs. One of the critical needs in regard to microservices composition is to incorporate or annotate explicit semantics about the concepts of microservices so that discovery of required services could take place without any human intervention. Thus, the critical need is to describe microservices semantically so that they become machine-processable. Once services become machine-processable entities, discovery and hence composition can take place automatically. How to describe microservices semantically is an open issue. An approach is proposed to resolve this issue for the first time. Section 2 describes related work. Section 3 describes the proposed approach for semantic description of microservices. Section 4 describes semantic discovery, and Sect. 5 presents a case study to validate the proposed approach. Section 6 presents the conclusion.
2 Literature Survey
Since microservices architecture is a recent application architecture paradigm, only a few research works can be found in the literature. The existing papers can be grouped into different categories, such as literature related to the fundamental design concepts of microservices, papers related to configuring the runtime parameters of microservices, and papers related to the development of enterprise microservices for various purposes. Consider the first category. The target problems that occur while architecting with microservices, such as service composition, resource management, real-time communication, and the provisioning of QoS attributes such as security, performance, maintainability, reliability, usability, and efficiency, are discussed in [1]. The basic microservices architectural design concepts, along with the various advantages and disadvantages of containerized microservices, are discussed in [2]. Consider the second category. In general, runtime parameters, endpoint information, and service registries are highly dynamic, and the research work [3] provides a model-based approach in which microservices are transformed into self-configuring services, so that they contain their runtime behavior within the container itself and this behavior is managed by the concerned container. Consider the third category. High data availability is one of the critical requirements for Internet of Things (IoT)-based applications. The research work [4] describes how an IoT-based Building Energy Efficiency Management Services (BEEMS) application meets the requirement of high data availability by using microservices for collecting, caching, processing, and analyzing various kinds of data such as energy data, indoor and outdoor
environmental data, individual living pattern data, etc. In [5], an evolution framework and a set of evolution rules have been proposed to modernize the RosarioSIS legacy system and the quality attributes such as runtime performance, scalability, maintainability, and testability were analyzed for the adoption of microservices. The work also found that the microservices architecture has significant value in meeting the needs of more complex, very large, enterprise systems. The research work described in [6] is related to our theme of semantic microservices. In [6], the authors have presented an approach for the creation of semantic service descriptions from syntactic service specification, whereas the proposed work tried to directly incorporate the semantics of microservices in the OpenAPI specification itself. Further, an approach is proposed for semantic discovery which considers the similarity requirements of applications. It provides a way to filter the matched services according to the needs of application so that highly matched service will be automatically invoked during composition.
3 Semantic Service Description
In general, the APIs of microservices are described using the REST architectural style. REST APIs can be constructed using a URL template. However, a URL template provides no way to express all the required aspects, such as the data types of arguments. REST has therefore been standardized through several REST specifications, including OpenAPI (formerly called Swagger), the RESTful API Modeling Language (RAML), and API Blueprint. Of the three, Swagger is the most frequently used by developers. Swagger [7] is a set of rules for describing, producing, consuming, and visualizing REST APIs; it was renamed the OpenAPI Specification in 2016 after SmartBear Software acquired Swagger from Reverb Technologies. The OpenAPI Specification is basically a syntax-based description. In this work, an approach is proposed for including the semantics of microservices using the existing tags of the OpenAPI Specification. The main purpose of semantic description is that microservices become processable by machines (i.e., by any software), so that the required business processes can be invoked without human intervention. For example, consider the term 'soap.' The word soap may have different meanings, for example, the SOAP protocol used for communication and the detergent soap used for washing. Unless the intended meaning of soap is expressed explicitly, automation is infeasible. So, there is a critical need to include semantics while describing microservices. With this idea in mind, suitable tags are identified and semantics, in the form of ontology files, are included in the description of microservices. Since ontologies provide formal definitions of concepts, there is no ambiguity and machines can interpret and process service descriptions. Hence, semantics brings maximal automation into discovery and composition.
Swagger, as the OpenAPI Specification, defines a standard, formal interface to RESTful APIs that can be interpreted by both humans and computers. The Swagger specification describes a standard structure with typical components such as:
• OpenAPI specification version: the version of the OpenAPI specification.
• Info object: gives metadata about the API; it contains sub-parts, namely title, description, terms of service, contact, license, and version.
• Server: represents the server on which the resources are available; it contains sub-parts, namely (i) URL (URL of the target host), (ii) description, and (iii) variables.
• Paths: gives the relative paths to the individual endpoints of the operations on resources.
• Operation: the operation object represents a single API operation on a path.
• ExternalDocs: refers to an external resource.

In the proposed approach, the externalDocs tag is used for incorporating ontology files. Typically, an operation takes zero or a few parameters, performs some task, and returns a response (i.e., the output result). Consider a hypothetical microservice bookfinder which finds and returns a book, given the title of the book as input, and whose operation has the signature findBook(Title). Here, Title refers to the parameter (argument) of the operation and Book refers to its response. Now consider that the parameter (input to the operation) is described in an ontology file, say books.owl, and that the response (output parameter) is also defined in the same ontology file books.owl. The problem of including a semantic description of parameters and responses is then handled through the externalDocs tag of the OpenAPI Specification. This tag can be defined for an operation object; an operation object describes a single API operation on a path. The essential tags of the operation object are shown in Fig. 1.
Fig. 1 Portion of path object of Open API specification which includes parameters and responses tags
Fig. 2 Externaldocs tag refers to the URL of the microservice
Fig. 3 Specifying input and output ontologies in the externalDocs tags
In this work, it is proposed to include the semantics of input parameters and output responses as shown in Fig. 3. It is implied that the concepts described in parameters and responses have their semantics defined in the ontology files given through the externalDocs tag. To illustrate the idea fully, the description of the path tag is given in Fig. 4. Note that the response of an operation may be any data conforming to a JSON schema, which is the fundamental schema supported by the OpenAPI Specification. When one wants to include an object whose semantics is given in externalDocs, the entire intended concept can be specified in the example tag, as shown in Fig. 4. There is no explicit name tag for responses, so the intended response object is fully specified in the example tag. (Also, the number of responses of a microservice is likely to be one.)
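To make the proposal concrete, the /findbook path of the hypothetical bookfinder service could carry its ontologies roughly as sketched below. The structure is written as a Python dictionary mirroring an OpenAPI path object; the URL of books.owl and the field values are illustrative assumptions, not a normative extension of the specification.

```python
# Illustrative OpenAPI fragment (as a Python dict) for the hypothetical findBook operation.
find_book_path = {
    "/findbook": {
        "get": {
            "summary": "Find a book by its title",
            "externalDocs": {
                "description": "Input/output ontologies used for semantic matching",
                "url": "http://example.org/ontologies/books.owl",   # assumed location of books.owl
            },
            "parameters": [{
                "name": "Title",
                "in": "query",
                "required": True,
                "schema": {"type": "string"},
                "description": "Concept 'Title' defined in books.owl",
            }],
            "responses": {
                "200": {
                    "description": "Concept 'Book' defined in books.owl",
                    "content": {"application/json": {"example": "Book"}},
                }
            },
        }
    }
}
```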
4 Semantic Service Discovery
Service discovery is the process of finding matched services for a given query from a pool of services that are typically available in a corporate registry. In semantic service discovery, the service client (i.e., any service-consuming application) is expected to express its query along with its intended ontologies. This means that a query typically contains the required inputs and outputs (functional characteristics) and their ontologies. The proposed approach for semantic microservices discovery consists of three steps, as shown in Fig. 5.
Fig. 4 Parameter and response description of the /findbook operation
Fig. 5 Proposed approach for semantic microservice discovery
Step-1: Preliminary matching
While discovering matched services for a given query, the proposed approach first checks whether the input and output ontologies of the query and the available service are the same. If the ontologies are the same, the approach then checks whether the numbers of parameters and responses of the query and the available service match. If they do, the approach proceeds to Step-2; otherwise, it returns with no matches found.

Step-2: Semantic matching
Consider a query Q and an available microservice A. Since the functional characteristics of a microservice are mainly expressed through its input parameters and output responses, the similarity between Q and A, denoted Similarity(Q, A), is computed as the average of the input and output similarities between Q and A, as given in (1):

Similarity(Q, A) = 0.5 (InputSimilarity(Q, A) + OutputSimilarity(Q, A))    (1)

In (1), InputSimilarity(Q, A) refers to the input similarity between Q and A, computed from the individual similarity values of all input parameters of Q with the corresponding input parameters of A. OutputSimilarity(Q, A) refers to the output similarity between Q and A, computed from the similarity values of all responses of Q with the corresponding responses of A. The computation of InputSimilarity(Q, A) is analogous to that of OutputSimilarity(Q, A).

Computation of input similarity
Consider a query Q and an available microservice A, and let m denote the number of input parameters of each. Let $p_1^q, p_2^q, \ldots, p_m^q$ denote the parameters of Q and $p_1^a, p_2^a, \ldots, p_m^a$ the parameters of A. The input similarity between Q and A is computed in a pairwise manner, as shown in Fig. 6.
Fig. 6 Matching input parameters and responses of Q and A
Table 1 Degree of Match between two concepts, adv and req

| Degree of Match | Semantic relation/matching condition | Similarity score |
|---|---|---|
| Equivalent | If req is equivalent to adv | 1 |
| Direct-plugin | If req is a direct subclass of adv | 0.9 |
| Indirect-plugin | If req is an indirect subclass of adv | 0.8 |
| Direct-subsumes | If adv is a direct subclass of req | 0.7 |
| Indirect-subsumes | If adv is an indirect subclass of req | 0.6 |
| Sibling | If adv and req have the same parent concept | 0.5 |
| Fail | No relation | 0 |
Input similarity is computed using the formula

InputSimilarity(Q, A) = \frac{1}{m} \sum_{i=1}^{m} paramSim(p_i^q, p_i^a)    (2)

In (2), paramSim(p_i^q, p_i^a) denotes the similarity between the ith input parameter of query Q and the ith input parameter of service A, according to the semantic relations given in Table 1.

Computation of output similarity
The output similarity between Q and A, denoted OutputSimilarity(Q, A), is computed as

OutputSimilarity(Q, A) = \frac{1}{n} \sum_{j=1}^{n} responseSim(r_j^q, r_j^a)    (3)
In (3), responseSim(r_j^q, r_j^a) denotes the semantic relationship between the jth output parameter of Q and that of A, computed according to the relations given in Table 1. The computation of paramSim(p_i^q, p_i^a) and responseSim(r_j^q, r_j^a) is analogous. The semantic relations between any two ontological concepts (the concepts may be input parameters or output responses) can be found using different levels of Degree of Match (DoM) between them. Conventionally, four levels of DoM, namely exact, plugin, subsumes, and fail, were defined by Paolucci et al. [8] for semantic service matching. The conventional levels of DoM have been extended in one of our previous research works. In this work, toward providing reasonably matched services for automatic composition, six levels of DoM between an available or advertised concept adv and a query concept req are given in Table 1.
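The discovery logic of Eqs. (1)-(3) and the three steps can be summarized as in the sketch below. The degree-of-match lookup is assumed to be backed by an ontology reasoner such as Pellet (stubbed out here), the query/service objects with inputs, outputs, and ontologies attributes are hypothetical data structures, and the threshold argument corresponds to the filtering step (Step-3) described next.

```python
DOM_SCORES = {"equivalent": 1.0, "direct-plugin": 0.9, "indirect-plugin": 0.8,
              "direct-subsumes": 0.7, "indirect-subsumes": 0.6, "sibling": 0.5, "fail": 0.0}

def concept_similarity(req, adv, reasoner):
    """Similarity of two ontology concepts via the Degree of Match of Table 1."""
    return DOM_SCORES[reasoner.degree_of_match(req, adv)]   # reasoner is a stub for Pellet

def set_similarity(query_concepts, service_concepts, reasoner):
    """Average pairwise similarity, Eqs. (2)/(3); assumes equal-length, aligned lists."""
    pairs = zip(query_concepts, service_concepts)
    return sum(concept_similarity(q, a, reasoner) for q, a in pairs) / len(query_concepts)

def total_similarity(query, service, reasoner):
    """Eq. (1): average of input and output similarity."""
    in_sim = set_similarity(query.inputs, service.inputs, reasoner)
    out_sim = set_similarity(query.outputs, service.outputs, reasoner)
    return 0.5 * (in_sim + out_sim)

def discover(query, registry, reasoner, threshold):
    """Steps 1-3: ontology/arity check, semantic matching, threshold filtering and ranking."""
    matches = []
    for service in registry:
        if (query.ontologies == service.ontologies and
                len(query.inputs) == len(service.inputs) and
                len(query.outputs) == len(service.outputs)):
            score = total_similarity(query, service, reasoner)
            if score >= threshold:
                matches.append((service, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)
```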
In practice, ontology reasoners such as Pellet are used to find the DoM. Thus, by computing paramSim(p_i^q, p_i^a) and responseSim(r_j^q, r_j^a), the input and output similarities are computed using (2) and (3), and the total similarity is then computed using (1).

Step-3: Filtering the results
While automating service discovery, it becomes essential to consider the similarity requirements of an application. Further, each application has its own requirement according to the given context: in some contexts, applications have very strict similarity requirements, whereas in other contexts they have flexible similarity requirements. Depending on its nature, an application can specify a threshold value. This similarity threshold is used to filter the services returned by the semantic matching step: only those matched services whose similarity value is at least equal to the given threshold are returned as matched services, and they are ranked according to their similarity score. How applications can have different kinds of similarity requirements can be seen from the following examples.

Example 1 Consider a query 'findCardiologist' which looks for a cardiologist in a given hospital. The input parameter and the response of the query are Hospital and Cardiologist, respectively. Consider a portion of an imaginary ontology which describes the output concept Cardiologist as in Fig. 7. As defined in [9], semantic matching algorithms tend to find all services with various levels of DoM such as equivalent, direct plugin, indirect plugin, direct subsumes, indirect subsumes, and fail. With this kind of semantic matching, many services having at least some DoM other than fail are likely to be returned, but this does not meet the exact requirements. In this example, the query is looking for the concept Cardiologist, which has the superclass General Physician and the siblings Gynecologist and Child Specialist, so services offering concepts other than Cardiologist will also be returned for the query. Since the functionality of the Cardiologist concept cannot be implemented by a sibling, plugin, or subsumes match,
Fig. 7 Fragment of imaginary ontology
Fig. 8 Hotel concepts in travel.owl
finding matched services itself, the applications have to specify that it has a strict similarity requirement and it should state the similarity threshold as 1. Example 2 Let us consider another query which looks for Star-Hotel for the given city. Here city is the input parameter concepts, and hotel is the response. Consider the portion of ontology which describes the hotel concept as in Fig. 8.
5 Case Study
To illustrate the proposed description and discovery mechanisms, a few hypothetical test microservices were created, as listed in Table 2. It is assumed that the microservices are described using the OpenAPI Specification and that they specify their input and output ontologies in the externalDocs tags, as described in Sect. 3. The ontologies can be processed using reasoners such as Pellet. Consider a query Q1 which looks for a 5-Star-Hotel for a given City, as given in Fig. 9, and a matching algorithm that works as per the discovery logic proposed in the previous section. While finding matched services for the query Q1, the matching algorithm first matches the input and output ontologies of the query against those of each microservice available in the registry. In our example, the service S4 has the same input and output ontologies as the query, so S4 is considered in the semantic matching step. In this step, the input and output similarities are computed using Eqs. (2) and (3). When computing the input similarity using (2), the input concept City of the query is matched against the input concept of S4; since an equivalence DoM is found between them, the input similarity is computed as 1 (note that the number of input parameters is m = 1). Similarly, the output concept 5-Star-Hotel is matched with the output concept Hotel of the available microservice; here an indirect-plugin match with similarity score 0.8 is found. The total similarity is then computed as 0.9 using Eq. (1).
Table 2 Test microservices available in a registry

| Service id | Service | Input parameters | Input ontology | Output response | Output ontology |
|---|---|---|---|---|---|
| S1 | Book_price_service | Book | Books.owl | Price | Concept.owl |
| S2 | Book_finder_service | Author | Books.owl | Book | Books.owl |
| S3 | Title_comedy_film_service | Title | Myontology.owl | Comedyfilm | Myontology.owl |
| S4 | Hotel_finder_service | City | Travel.owl | Hotel | Travel.owl |
| S5 | University_researcher_service | University | Portal.owl | Researcher | Portal.owl |
| S6 | Publication_author_service | Publication-No | Portal.owl | Author | Books.owl |
| S7 | Title_film_service | Title | Myontology.owl | Film | Myontology.owl |
| S8 | Book_lecturer_service | University | Portal.owl | Lecturer_in_academia | Portal.owl |
| S9 | University_professor_service | University | Portal.owl | Professor_in_academia | Portal.owl |
| S10 | Physician_finder_service | City | Travel.owl | Physician | Hospital.owl |
Fig. 9 Service query
This kind of query is likely to have a flexible similarity requirement; assume that the application sets a similarity threshold of 0.5. Since the computed similarity satisfies the set threshold, the service S4 is returned as a matched service.
6 Conclusion
Microservices are being positioned as the efficient and effective building block for building next-generation applications across industry verticals. All kinds of device-centric applications (mobile, wearable, implantable, handheld, wireless, fixed, portable, and nomadic devices), technology-centric applications (the Internet of Things (IoT), blockchain, artificial intelligence (AI), etc.), middleware-centric applications (message brokers/gateways/hubs/queues/buses, orchestration engines, etc.), server-centric applications (enterprise, web, and cloud), and other applications (analytical, operational, transactional, etc.) are being developed leveraging API-stuffed microservices. For implementing and sustaining distributed, complicated and sophisticated applications, differently capable microservices have to be dynamically identified and linked together. Thus, automated service description, registration, discovery, decision-making, and leverage are being touted as the most important tasks for the ensuing era of semantic microservices. That is, semantically enabled microservices provide deeper and decisive automation. Precisely speaking, for creating smarter applications we need smart microservices, and the much-anticipated semantics capability enables microservices to be smart in their operations, offerings, and outputs. In this work, an approach is proposed to describe explicit semantics for microservices by using the tags of the existing OpenAPI Specification. With this approach, a service provider can include the intended ontologies so that these ontologies are used during service discovery, and service discovery can be performed automatically without any human intervention.
Chapter 8
Comparative Analysis of Machine Learning Algorithms for Imbalance Data Set Using Principle Component Analysis Swati V. Narwane and Sudhir D. Sawarkar
1 Introduction ML is popular and in demand because it reduces human intervention and trains systems automatically using data sets. ML is used in various domains such as prediction, image recognition, speech recognition, and medical diagnosis. Domains like medical diagnosis and healthcare are based on sensitive and huge data sets [1, 2]. Big data is the source of information for building predictive systems and for pattern extraction [3]. Detection of any disease depends on various features, and healthcare data sets contain many features, some of which are directly or indirectly related to disease detection. ML-based systems require sufficient data to predict accurate results, but in practice healthcare data sets are imbalanced, i.e., the information related to disease detection is insufficient. A data set with insufficient information about disease detection is called class imbalanced [4]. Forbes reported that the global ML market was valued at $1.58B in 2017 and was expected to reach $30B by 2024 [5]. Organizations adopting ML have reduced costs by up to 25% and achieved better retention, customer satisfaction, and revenue generation over products and services [6]. ML algorithms face various challenges such as workflow management [1], data cleaning [7], task scheduling [8], class imbalance [4], and many more. The effects of class imbalance are particularly serious in sensitive domains like healthcare, software defect prediction, and cybersecurity [9]. ML provides various techniques to handle class imbalance issues. These techniques are mainly divided into three categories: data level, algorithm level, and hybrid level. The data-level approach is based on the minority and majority classes [10, 11]; undersampling and oversampling algorithms are used within the data-level approach [12].
S. V. Narwane (B) Department of Computer Engineering, Datta Meghe College of Engineering, Navi Mumbai, India
S. D. Sawarkar Datta Meghe College of Engineering, Navi Mumbai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_8
How to handle class imbalance depends on the nature of the data set; sometimes it can be handled with a single algorithm, and sometimes it requires a combination of algorithms [13, 14]. Decision tree, naïve Bayes, and support vector machines are popularly used ML algorithms. The algorithm-level approach performs better than the data-level approach, but it still has its cons [15], namely incompletely trained systems [16], bagging of bad classifiers [15], and the sensitivity of precision to outliers and noisy data [17]. The combination of the data-level and algorithm-level approaches is known as the hybrid level. The performance of the hybrid level is better if an appropriate combination of data-level and algorithm-level techniques is used [18, 19]. To handle the class imbalance problem, the primary step is to study the structure and nature of the data set. However, healthcare data sets are very large and contain several features and, as discussed above, very little of this information is available for detecting accurate results, which decreases the performance of ML-based systems. It has been observed that with limited features, class imbalance handling is quite simple; the major task is the selection of features from the data set. To select appropriate features there are two techniques: (i) feature selection and (ii) feature extraction. Feature selection is the process of reducing input variables for prediction, but it may cause data loss, whereas feature extraction avoids redundant features from the actual data set [20]. The performance of feature extraction is better than that of feature selection. The remainder of the paper is organized as follows: Sect. 2 gives the literature survey. The proposed methodology is covered in Sect. 3, followed by results and discussion in Sect. 4. Section 5 gives the conclusion of the study.
2 Literature Survey The data set is the building block of an ML-based system. ML-based systems used in the healthcare domain suffer a drastic decrease in performance due to imbalanced data sets. To handle the class imbalance issue in an ML-based system, the basic step is to study the data set, focusing on its size and its features. Healthcare data sets are very large and also have many features describing diseases; feature reduction or feature selection is one measure to handle such data sets. The current section describes various related techniques and methods. Devarriya et al. [21] proposed a method to balance an unbalanced breast cancer data set based on D score and F2 score functions. Khalid et al. [22] discussed feature selection and extraction techniques for the preprocessing of data sets. In this article, the authors considered a subset of robust features from the data; feature reduction was done by calculating the covariance of features, which creates new features. The study stated that the performance of feature extraction was better than feature selection techniques. Random equilibrium sampling was used for feature selection by Huang et al. [23]. The experimental work was done on only
a few, small-sized data sets. Doan et al. [24] focused on computational algorithms to select features in the data set; as discussed in the study, the proposed system does not work on mixed data sets. Gárate-Escamila et al. [25] proposed a system based on chi-square and PCA for feature selection, but the proposed system was tested on only one data set. To study imbalance learning, Farajian and Adibi [26] focused on auto-encoding and one-class learning; the technique adopted for classification was a deep neural network, and results were compared using criteria such as F-measure, G-means, and AUC. Battineni et al. [27] developed a machine learning model to classify patients suffering from a brain disease, dementia. The proposed method used the PCA technique to extract features, and the model was based on logistic regression (LR), support vector machine (SVM), and K-nearest neighbor (KNN) techniques; the study concluded that KNN and LR provided the best solutions. Magesh and Swarnalatha [28] proposed a new approach based on cluster-based decisions for feature selection; the proposed study was evaluated using five levels of distribution, from class label distribution to high distribution. To study issues like biased classifiers and high dimensionality, Pei et al. [29] discussed a method based on Genetic Programming (GP), a fitness function, and a reuse mechanism: Genetic Programming was applied for feature selection, and the fitness functions focused on unbalanced data sets. Masum et al. [30] constructed a system that detects early-stage neurological disorders using PCA; the PCA technique used low-variance components for minor data sets. To tackle unbalanced data in system log files for anomaly detection, Studiawan et al. [31] discussed various machine learning algorithms; the article compares multiple oversampling and undersampling methods and calculates precision values, but real-time streaming of log files was not considered. The literature discussed in this section states that feature reduction plays an important role in handling the class imbalance problem, and feature extraction is more effective than feature selection. The next section discusses the proposed methodology.
3 Proposed Methodology To enhance the performance of ML-based classifiers on unbalanced data sets, the proposed study focuses on the quality of the data set, which depends on its structure and features. To improve the quality of the data set, the proposed study uses a feature extraction technique known as PCA. PCA is an unsupervised feature extraction technique based on variances: it retains the directions of highest variance and thereby reduces reconstruction error. Figure 1 shows the experimentation flow of the proposed study. In the proposed study, data were preprocessed with the traditional method as well as with PCA, and the preprocessed data were then executed with standard ML algorithms. Finally, results were calculated for both preprocessed data sets (with PCA and without PCA) using standard ML classifiers.
Fig. 1 Experiment workflow
Calculated results are reported in terms of accuracy, precision, recall, F1 score, etc. Result comparisons were based on the calculated values of precision rather than accuracy. These terms are expressed using the following terminology:
P (True Positive): number of actual positive events predicted as positive events.
Q (True Negative): number of actual negative events predicted as negative events.
R (False Positive): number of actual negative events predicted as positive events.
S (False Negative): number of actual positive events predicted as negative events.
Accuracy = (P + Q)/(P + Q + R + S)
Precision = P/(P + R)
Recall = P/(P + S)
Normally, result comparisons are based on accuracy, but in healthcare systems the precision value is more important, as it reflects how many of the predicted positive cases are truly positive. Healthcare data sets contain only a few events that indicate the presence of disease, and this information is often insufficient for ML-based systems to detect accurate results. To address this problem, the proposed study evaluates the feature extraction technique PCA.
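As a quick illustration of these metrics, the snippet below computes them with scikit-learn on a tiny made-up label set; the mapping from the confusion-matrix cells to P, Q, R, S is noted in the comments.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative ground truth (1 = disease present)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # illustrative classifier output

q, r, s, p = confusion_matrix(y_true, y_pred).ravel()   # tn, fp, fn, tp -> Q, R, S, P
print("Accuracy :", accuracy_score(y_true, y_pred))     # (P + Q) / (P + Q + R + S)
print("Precision:", precision_score(y_true, y_pred))    # P / (P + R)
print("Recall   :", recall_score(y_true, y_pred))       # P / (P + S)
```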
The proposed study evaluates results on the Heart Disease Data Set (HDDS), which was collected from the Kaggle repository. The main focus of the proposed study is the minor events of the data set. In HDDS, 14 features were selected, of which 13 are independent features and 1 is the dependent (target) feature. The target feature contains two values, 0 or 1, where 0 indicates the absence and 1 the presence of heart disease. Experiments were performed in Python and the results were recorded. The experimental flow of the study is explained in the following steps:
Step 1: Upload the unbalanced-to-balanced CSV files of the HDDS.
Step 2: Preprocess the data with the traditional process.
Step 3: Preprocess the data using the PCA technique.
Step 4: Execute the preprocessed data with standard ML algorithms (NB, LR, DT, SVM, etc.).
Step 5: Calculate precision values for the unbalanced-to-balanced data sets.
Step 6: Compare the precision values of the data set preprocessed with and without PCA.
Table 1 shows a detailed analysis of HDDS from the imbalanced to the balanced data set. As shown in Table 1, HDDS was divided into two imbalance ratios (3:1 and 1:3 for the 0 and 1 events, respectively) and one balanced ratio (1:1). Every set of data was executed with standard ML algorithms and precision values were calculated. In the HDDS analysis, each imbalanced-ratio set was cross-checked against an external data set having the opposite imbalance ratio: the 3:1 imbalanced data set was cross-checked with the 1:3 imbalanced data set and vice versa. The observations recorded during the analysis state that imbalanced data sets preprocessed with the PCA technique improve the performance of ML-based systems. Table 2 describes the precision value analysis of HDDS, and Table 3 describes the precision value analysis of HDDS training data with the external data set. Table 2 shows the precision value analysis of HDDS for the internal and external train-test data sets; precision values were recorded for the train-test data set with and without PCA. The line graph representations of Tables 2 and 3 are shown in Figs. 2 and 3. Figure 2 shows a line graph of the HDDS train-test data set for the internal data. The X-axis shows the standard ML algorithms and the Y-axis shows accuracy and precision values. In the line graph, the square marker represents accuracy without PCA, the triangle represents precision without PCA, the cross represents accuracy with PCA, and the star represents precision with PCA. The line graph analysis of the internal data set states that the precision values of the ML algorithms were better for the data set preprocessed using PCA, even though it was unbalanced; when the data set was balanced and preprocessed with PCA, the results improved drastically and became more accurate. Now let us look at the analysis with and without PCA for the external data set. The same process was applied to the external data set and the results were recorded; the precision values of the data sets preprocessed with PCA were better. The line graph analysis shown in Fig. 3 describes the varying performance of HDDS with the external data set for the various standard ML algorithms. The analysis of the ML algorithms states that data preprocessed with PCA improves the performance of ML-based systems.
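A minimal, hedged sketch of Steps 1–6 is given below. The CSV file name, the target column name, and the number of retained principal components are assumptions made only for illustration; they are not taken from the paper.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("heart_disease.csv")                      # hypothetical HDDS export (Step 1)
X, y = df.drop(columns=["target"]), df["target"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {"NB": GaussianNB(), "LR": LogisticRegression(max_iter=1000),
          "DT": DecisionTreeClassifier(random_state=0), "SVM": SVC()}

def precisions(Xa, Xb):
    # Steps 4-5: fit each classifier and report its precision on the test split.
    return {name: round(precision_score(y_te, m.fit(Xa, y_tr).predict(Xb)), 2)
            for name, m in models.items()}

scaler = StandardScaler().fit(X_tr)                        # Step 2: traditional preprocessing
print("without PCA:", precisions(scaler.transform(X_tr), scaler.transform(X_te)))

pca = PCA(n_components=5).fit(scaler.transform(X_tr))      # Step 3: PCA feature extraction
print("with PCA   :", precisions(pca.transform(scaler.transform(X_tr)),
                                 pca.transform(scaler.transform(X_te))))   # Step 6
```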
Table 1 HDDS analysis with and without PCA

(a) Train and test results without PCA
ML Algo | IR | Class 0 | Class 1 | Train Acc | Train Prec | Train Recall | Test Acc | Test Prec | Test Recall
Naïve Bayes | 3:1 | 75 | 25 | 0.9 | 0.84 | 0.92 | 0.9 | 0.67 | 1
Naïve Bayes | 1:3 | 25 | 75 | 0.9 | 0.95 | 0.74 | 0.85 | 1 | 0.5
Naïve Bayes | 1:1 | 50 | 50 | 0.9 | 0.92 | 0.88 | 0.95 | 0.91 | 1
Random forest | 3:1 | 75 | 25 | 1 | 1 | 1 | 0.9 | 0.67 | 1
Random forest | 1:3 | 25 | 75 | 1 | 1 | 1 | 0.9 | 1 | 0.67
Random forest | 1:1 | 50 | 50 | 1 | 1 | 1 | 0.85 | 0.91 | 0.78
Logistic regression | 3:1 | 75 | 25 | 0.94 | 0.84 | 0.97 | 0.95 | 0.83 | 1
Logistic regression | 1:3 | 25 | 75 | 0.9 | 0.98 | 0.63 | 0.8 | 0.93 | 0.5
Logistic regression | 1:1 | 50 | 50 | 0.9 | 0.9 | 0.9 | 0.95 | 1 | 0.89
Support vector machine | 3:1 | 75 | 25 | 0.94 | 0.84 | 0.97 | 0.95 | 0.83 | 1
Support vector machine | 1:3 | 25 | 75 | 0.91 | 0.98 | 0.68 | 0.85 | 0.93 | 0.67
Support vector machine | 1:1 | 50 | 50 | 0.91 | 0.92 | 0.9 | 0.9 | 0.91 | 0.89
Neural network analysis | 3:1 | 75 | 25 | 1 | 1 | 1 | 0.85 | 0.5 | 1
Neural network analysis | 1:3 | 25 | 75 | 1 | 1 | 1 | 0.75 | 0.93 | 0.33
Neural network analysis | 1:1 | 50 | 50 | 1 | 1 | 1 | 0.7 | 0.73 | 0.67
Decision tree | 3:1 | 75 | 25 | 1 | 1 | 1 | 0.7 | 0.33 | 0.86
Decision tree | 1:3 | 25 | 75 | 1 | 1 | 1 | 0.6 | 0.79 | 0.17
Decision tree | 1:1 | 50 | 50 | 1 | 1 | 1 | 0.75 | 0.64 | 0.89

(b) Test results with PCA and external-data tests
ML Algo | IR | Test Acc with PCA | Prec | Recall | Ext data test Acc | Prec | Recall | Ext data test Acc with PCA | Prec | Recall
Naïve Bayes | 3:1 | 0.85 | 0.5 | 1 | 0.8 | 0.3 | 0.97 | 0.73 | 0.4 | 0.83
Naïve Bayes | 1:3 | 0.85 | 0.5 | 1 | 0.5 | 1 | 0.33 | 0.48 | 1 | 0.3
Naïve Bayes | 1:1 | 0.95 | 0.91 | 1 | 0.85 | 1 | 0.8 | 0.65 | 0.9 | 0.57
Random forest | 3:1 | 0.85 | 0.67 | 0.93 | 0.83 | 0.5 | 0.93 | 0.75 | 0.4 | 0.87
Random forest | 1:3 | 0.75 | 0.93 | 0.33 | 0.8 | 0.97 | 0.3 | 0.73 | 0.83 | 0.4
Random forest | 1:1 | 0.85 | 1 | 0.67 | 0.7 | 0.8 | 0.6 | 0.75 | 0.85 | 0.65
Logistic regression | 3:1 | 0.9 | 0.67 | 1 | 0.41 | 0.21 | 1 | 0.36 | 0.15 | 1
Logistic regression | 1:3 | 0.9 | 0.67 | 1 | 0.49 | 1 | 0.32 | 0.89 | 0.72 | 0.95
Logistic regression | 1:1 | 0.95 | 0.91 | 1 | 0.8 | 0.8 | 0.8 | 0.8 | 0.85 | 0.75
Support vector machine | 3:1 | 0.85 | 0.5 | 1 | 0.38 | 0.17 | 1 | 0.39 | 0.19 | 1
Support vector machine | 1:3 | 0.85 | 1 | 0.5 | 0.52 | 1 | 0.36 | 0.47 | 1 | 0.29
Support vector machine | 1:1 | 0.9 | 0.91 | 0.89 | 0.83 | 0.8 | 0.85 | 0.8 | 0.85 | 0.75
Neural network analysis | 3:1 | 0.85 | 0.5 | 1 | 0.54 | 0.39 | 1 | 0.45 | 0.27 | 1
Neural network analysis | 1:3 | 0.8 | 0.93 | 0.5 | 0.52 | 1 | 0.36 | 0.52 | 1 | 0.36
Neural network analysis | 1:1 | 0.9 | 1 | 0.78 | 0.68 | 0.6 | 0.75 | 0.78 | 0.85 | 0.7
Decision tree | 3:1 | 0.85 | 0.83 | 0.86 | 0.6 | 0.47 | 1 | 0.49 | 0.33 | 0.96
Decision tree | 1:3 | 0.85 | 0.93 | 0.67 | 0.62 | 0.84 | 0.55 | 0.5 | 0.96 | 0.35
Decision tree | 1:1 | 0.7 | 0.82 | 0.56 | 0.75 | 0.85 | 0.65 | 0.68 | 0.7 | 0.65
Table 2 Precision values analysis of HDDS for the internal test data (data set: heart disease detection)
ML algorithm | Acc without PCA | Precision without PCA | Acc with PCA | Precision with PCA
NB1 (3:1) | 0.90 | 0.67 | 0.85 | 0.50
NB2 (1:3) | 0.85 | 1.00 | 0.85 | 0.50
NB3 (1:1) | 0.95 | 0.91 | 0.95 | 0.91
RF1 (3:1) | 0.90 | 0.67 | 0.85 | 0.67
RF2 (1:3) | 0.90 | 1.00 | 0.75 | 0.93
RF3 (1:1) | 0.85 | 0.91 | 0.85 | 1.00
LR1 (3:1) | 0.95 | 0.83 | 0.90 | 0.67
LR2 (1:3) | 0.80 | 0.93 | 0.90 | 0.67
LR3 (1:1) | 0.95 | 1.00 | 0.95 | 0.91
SVM1 (3:1) | 0.95 | 0.83 | 0.85 | 0.50
SVM2 (1:3) | 0.85 | 0.93 | 0.85 | 1.00
SVM3 (1:1) | 0.90 | 0.91 | 0.90 | 0.91
NNA1 (3:1) | 0.85 | 0.50 | 0.85 | 0.50
NNA2 (1:3) | 0.75 | 0.93 | 0.80 | 0.93
NNA3 (1:1) | 0.70 | 0.73 | 0.90 | 1.00
DT1 (3:1) | 0.70 | 0.33 | 0.85 | 0.83
DT2 (1:3) | 0.60 | 0.79 | 0.85 | 0.93
DT3 (1:1) | 0.75 | 0.64 | 0.70 | 0.82
Algorithms like the decision tree and the neural network showed drastically improved performance on both the unbalanced and the balanced data sets. A detailed analysis of the decision tree algorithm is given in the results and discussion section.
Table 3 Precision values analysis of HDDS for the external test data (data set: heart disease detection)
ML algorithm | Accuracy without PCA | Precision without PCA | Accuracy with PCA | Precision with PCA
NB1 (3:1) | 0.80 | 0.30 | 0.73 | 0.40
NB2 (1:3) | 0.50 | 1.00 | 0.48 | 1.00
NB3 (1:1) | 0.85 | 1.00 | 0.65 | 0.90
RF1 (3:1) | 0.83 | 0.50 | 0.75 | 0.40
RF2 (1:3) | 0.80 | 0.97 | 0.73 | 0.83
RF3 (1:1) | 0.70 | 0.80 | 0.75 | 0.85
LR1 (3:1) | 0.41 | 0.21 | 0.36 | 0.15
LR2 (1:3) | 0.49 | 1.00 | 0.89 | 0.72
LR3 (1:1) | 0.80 | 0.80 | 0.80 | 0.85
SVM1 (3:1) | 0.38 | 0.17 | 0.39 | 0.19
SVM2 (1:3) | 0.52 | 1.00 | 0.47 | 1.00
SVM3 (1:1) | 0.83 | 0.80 | 0.80 | 0.85
NNA1 (3:1) | 0.54 | 0.39 | 0.45 | 0.27
NNA2 (1:3) | 0.52 | 1.00 | 0.52 | 1.00
NNA3 (1:1) | 0.68 | 0.60 | 0.78 | 0.85
DT1 (3:1) | 0.60 | 0.33 | 0.49 | 0.47
DT2 (1:3) | 0.62 | 0.84 | 0.50 | 0.96
DT3 (1:1) | 0.75 | 0.70 | 0.68 | 0.85
Fig. 2 Line graph analysis of HDDS for the internal train test data
Fig. 3 Line graph analysis of HDDS for the external train test data
4 Result and Discussion All standard ML algorithms were executed using the preprocessed data sets; among these, the results of the decision tree algorithm showed noticeable changes, which are illustrated using ROC curves. The results show that data preprocessed with PCA performed better than the traditionally preprocessed data set. Figures 4 and 5 show the ROC curves for the unbalanced data set: Fig. 4 shows the ROC curve for the traditionally preprocessed data set with a precision value of 0.3333, whereas Fig. 5 shows the ROC curve for the data preprocessed with PCA, with a precision value of 0.8333. Figures 6 and 7 show the ROC curves for the balanced data set.
Fig. 4 Decision tree ROC curve of HDDS unbalanced data set w/o PCA
Fig. 5 Decision tree ROC curve of HDDS unbalanced data set with PCA
Fig. 6 Decision tree ROC curve of HDDS balanced data set w/o PCA
Figure 6 shows the ROC curve for the traditionally preprocessed balanced data set with a precision value of 0.6364, whereas Fig. 7 shows the ROC curve for the PCA-preprocessed balanced data set with a precision value of 0.8182.
Fig. 7 Decision tree ROC curve of HDDS balanced data set with PCA
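The ROC comparison in Figs. 4–7 can be reproduced in spirit with the self-contained sketch below; it uses synthetic imbalanced data rather than HDDS, so it illustrates only the plotting procedure, not the reported numbers.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3:1 imbalanced data standing in for HDDS (13 predictor features).
X, y = make_classification(n_samples=300, n_features=13, weights=[0.75, 0.25], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

variants = {"without PCA": (X_tr, X_te)}
pca = PCA(n_components=5).fit(X_tr)
variants["with PCA"] = (pca.transform(X_tr), pca.transform(X_te))

for label, (Xa, Xb) in variants.items():
    clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xa, y_tr)
    fpr, tpr, _ = roc_curve(y_te, clf.predict_proba(Xb)[:, 1])
    plt.plot(fpr, tpr, label=f"Decision tree {label} (AUC = {auc(fpr, tpr):.2f})")

plt.plot([0, 1], [0, 1], linestyle="--")   # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```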
5 Conclusion Class imbalance is a very challenging issue in the world of ML, specifically in the healthcare domain, and its effects are dangerous because they relate to human lives. A balanced data set is what an ML-based system needs; in practice, the situation is usually the opposite, and this decreases the performance of ML-based systems. To address the class imbalance problem, the proposed study focused on the structure of the data set. The proposed work used the heart disease detection data set (HDDS). A key issue with healthcare data sets is their dimensionality, so the proposed study reduced the dimensionality of the data set using the feature extraction technique PCA. The HDDS was preprocessed using PCA, the preprocessed data set was evaluated with ML algorithms, and precision values were calculated from the unbalanced to the balanced data set. The results state that data processed with PCA predicted more accurate results than traditionally preprocessed data, which also improved the performance of the ML-based systems. In the future, the study will be extended toward the use of improved PCA techniques for reducing the dimensionality of the data set. We also intend to work on hybrid techniques to classify data for ML-based systems, and the study will further focus on how to handle multi-class imbalance in the healthcare domain.
References 1. Zhou L, Pan S, Wang J, Vasilakos AV (2017) Machine learning on big data: opportunities and challenges. Neurocomputing 237:350–361 2. Qiu J, Wu Q, Ding G, Xu Y, Feng S (2016) A survey of machine learning for big data processing. EURASIP J Adv Signal Process 1:67 3. Moreno MV, Terroso-Sáenz F, González-Vidal A, Valdés-Vela M, Skarmeta AF, Zamora M A, Chang V (2016) Applicability of big data techniques to smart cities deployments. IEEE Trans Indust Inf 13(2): 800–809 4. Holden G (2016) Big Data and R&D Management: A new primer on big data offers insight into the basics of dealing with “uncomfortable data”—data that is too large or too unstructured to be accommodated by a firm’s existing processes. Res Technol Manag 59(5):22–26 5. Moorning KM (2017) Evaluating the impact of the socio- technical digital intelligence factor on customer-business relationships. Bus Manag Rev 8(4):1 6. Lee I, Shin YJ (2020) Machine learning for enterprises: applications, algorithm selection, and challenges. Bus Horiz 63(2):157–170 7. Gudivada V, Apon A, Ding J (2017) Data quality considerations for big data and machine learning: going beyond data cleaning and transformations. Int J Adv Softw 10(1):1–20 8. Ji W, Wang L (2017) Big data analytics based fault prediction for shop floor scheduling. J Manuf Syst 43:187–194 9. Leevy JL, Khoshgoftaar TM, Bauder RA, Seliya N (2018) A survey on addressing high-class imbalance in big data. J Big Data 5(1):42 10. Kalsoom A, Maqsood M, Ghazanfar MA, Aadil F, Rho S (2018) A dimensionality reductionbased efficient software fault prediction using Fisher linear discriminant analysis (FLDA). J Supercomput 74(9):4568–4602 11. Bellinger C, Sharma S, Japkowicz N, Zaïane OR (2019) Framework for extreme imbalance classification: SWIM—sampling with the majority class. Knowl Inf Syst, 1–26 12. Hassib EM, El-Desouky AI, El-Kenawy ESM, El- Gha mrawy SM (2019) An imbalanced big data mining framework for improving optimization algorithms performance. IEEE Access, 7, pp 170774–170795 13. Sitompul OS, Nababan EB (2018) Biased support vector machine and weighted-smote in handling class imbalance problem. Int J Adv Intell Inform 4(1):21–27 14. Hsu CC, Wang KS, Chung HY, Chang SH (2019) Equation of SVM-rebalancing: the pointnormal form of a plane for class imbalance problem. Neural Comput Appl 31(10):6013–6025 15. Feng W, Huang W, Ren J (2018) Class imbalance ensemble learning based on the margin theory. Appl Sci 8(5):815 16. Kumar S, Biswas SK, Devi D (2019) TLUSBoost algorithm: a boosting solution for class imbalance problem. Soft Comput 23(21):10755–10767 17. Cho P, Lee M, Chang W (2019) Instance-based entropy fuzzy support vector machine for imbalanced data. Pattern Anal Appl, pp 1–20 18. Bader-El-Den M, Teitei E, Perry T (2018) Biased random forest for dealing with the class imbalance problem. IEEE Trans Neural Netw Learn Syst 30(7):2163–2172 19. Al Majzoub H, Elgedawy I, Akaydın Ö, Ulukök MK (2020) HCAB-SMOTE: a hybrid clustered affinitive borderline SMOTE approach for imbalanced data binary classification. Arab J Sci Eng, pp 1–18 20. Battineni G, Chintalapudi N, Amenta F (2020) Comparative machine learning approach in dementia patient classification using principal component analysis. Group 500:146 21. Devarriya D, Gulati C, Mansharamani V, Sakalle A, Bhardwaj A (2020) Unbalanced breast cancer data classification using novel fitness functions in genetic programming. Expert Syst Appl 140:112866 22. 
Khalid S, Khalil T, Nasreen S (2014) A survey of feature selection and feature extraction techniques in machine learning. In: 2014 science and information conference. IEEE, pp 372– 378
23. Huang CX, Huang Y, Qu Y, Fang X, Zhai P, Fan L, Yin H, Xu Y, Li J (2020) Sample imbalance disease classification model based on association rule feature selection. Pattern Recogn Lett 133:280–286 24. Doan DM, Jeong DH, Ji SY (2020) Designing a feature selection technique for analyzing mixed data. In: 2020 10th annual computing and communication workshop and conference (CCWC). IEEE, pp 0046–0052 25. Gárate-Escamila AK, El Hassani AH, Andrès E (2020) Classification models for heart disease prediction using feature selection and PCA. Inform Med Unlocked 19:100330s 26. Farajian N, Adibi P (2021) Minority manifold regularization by stacked auto-encoder for imbalanced learning. Expert Syst Appl 169:114317 27. Battineni G, Chintalapudi N, Amenta F (2020) Comparative machine learning approach in dementia patient classification using principal component analysis. Group, vol 500, p 146 28. Magesh G, Swarnalatha P (2020) Optimal feature selection through a cluster-based DT learning (CDTL) in heart disease prediction. Evolutionary intelligence, pp 1–11 29. Pei W, Xue B, Shang L, Zhang M (2020) Genetic programming for high-dimensional imbalanced classification with a new fitness function and program reuse mechanism. Soft Comput 24(23):18021–18038 30. Masum M, Shahriar H, Haddad HM (2020) Epileptic seizure detection for imbalanced datasets using an integrated machine learning approach. In: 2020 42nd annual international conference of the IEEE engineering in medicine & biology society (EMBC). IEEE, pp 5416–5419 31. Studiawan H, Sohel F (2020) Performance evaluation of anomaly detection in imbalanced system log data. In: 2020 fourth world conference on smart trends in systems, security and sustainability (WorldS4). IEEE, pp 239–246
Chapter 9
Application of Reinforcement Learning in Control Systems for Designing Controllers Rajendar Singh Shekhawat and Nidhi Singh
1 Introduction Controlling a system for desired behavior has a lot of industrial relevance. In literature, there are several methods proposed to design a controller for various kinds of systems. The classic proportional–integral–derivative (PID) is the most common and popular controller used in the industry. PID controllers have three parameters, namely K p , K i , and K d . In conventional PID control theory, these parameters are tuned using several classical algorithms. Such algorithms are Ziegler–Nichols method, Chien– Hrones–Reswick method, and Cohen–Coon method [1], etc. Even though the PID controller has three parameters to be tuned, it is very difficult to find the set of parameters that provides the fast, stable, and robust response. Parameters of PID controllers are fixed after the tuning method. These classical controllers are not adaptive. Besides, if the system deviates from the linearized point, the performance of these controllers goes down [2]. A large deviation from the linearized point may lead the controller to go out of the operating region. This will make the controller parameters invalid to govern the system properly. Further, PID controllers are mostly applicable to linear systems because these are based on linear model [3]. Not only that, sometimes no model of the system is available, and hence, PID control theory cannot be applied because of the unavailability of the system model. Data-driven techniques are used to solve the above-mentioned issues. There is a recent trend of applying machine learning in controller design to explore a data-driven approach. Broadly, machine learning techniques can be classified into three main categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning is used for known input and output of the system for mapping the new input data. Unsupervised learning, in contrast, provides unlabeled data that the algorithm tries to make sense of by extracting features and patterns R. S. Shekhawat (B) · N. Singh Gautam Budha University, Greater Noida, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_9
on its own. Reinforcement learning has an agent which needs to make an informed decision in the given environment. Application of supervised learning in controller design is used in [4]. Deep learning is used to train the neural network which learns the input–output of a PID controller for the DC motor. Fuzzy logic has also been extensively used in fine-tuning the controller design process. K p , K i , and K d parameters in the PID controller are tuned using fuzzy logic to select the values which give good results. For instance, in [5], parameters of PID controller are tuned using fuzzy logic. Adaptive PID controller design is also proposed in the literature. Song et al. [6] use deep reinforcement learning to make the PID controller self-adaptive. Deep reinforcement learning is being extensively used in making intelligent controllers. Also, it is a data-driven technique which does not require system model. Following are the key contributions in this paper: 1. Set up the Markov decision process (MDP) model of the closed-loop control system of DC motor and ball–beam setup. 2. After that, training of RL agent for minimizing a cost function using a crafted reward function is done to have a better comparison with standard PID controllers as both will have a common goal, i.e., minimizing the cost function or objective function. 3. Further, analyze the performance of RL agent and standard PID controller for two systems. First, the application of RL is shown in designing a controller for a DC motor system. Second, the application of RL for the nonlinear system is described. For that, a beam and ball system is taken as the basis of the study.
2 Control Techniques 2.1 PID Technique PID tuning is the most popular control technique used in industrial settings. This control works on a feedback mechanism in which the error is calculated from the reference point and the output value of the system; if there is an error, the parameters of the PID are changed and the process is repeated. For some systems only the proportional-integral (PI) part is needed, while for others the proportional-derivative (PD) part works; it depends on the system for which the controller is designed. The whole process of tuning the PID parameters to achieve the desired closed-loop system response is called parameter tuning of the PID. This tuning process is cumbersome and requires a lot of manual effort.
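For reference, a minimal discrete-time PID update is sketched below in Python; the gains and sample time are placeholders, and the sketch illustrates only the error-feedback mechanism described above, not any particular tuning method.

```python
class PID:
    """Minimal discrete-time PID controller (illustrative only)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, reference, measurement):
        error = reference - measurement               # feedback error
        self.integral += error * self.dt              # integral term accumulates the error
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

controller = PID(kp=1.2, ki=0.5, kd=0.05, dt=0.01)        # placeholder, hand-tuned gains
u = controller.update(reference=1.0, measurement=0.2)     # control signal for one step
```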
2.2 Reinforcement Learning Consider an agent in an environment where it needs to make sequential decisions to achieve a target. The end goal of the agent can be anything from winning a ping-pong game to finding the shortest path in a maze. Reinforcement learning [7] provides algorithms by which an agent can be trained to achieve such a goal. The agent takes an action in the environment and receives a reward for it; the reward can be a penalty when the action does not align with the goal. The agent's objective is to maximize the total reward, as shown in Fig. 1. RL algorithms are modeled on Markov decision processes (MDPs): essentially, they try to solve MDPs to achieve a long-term goal. An MDP tuple consists of states, actions, rewards, and transition probabilities.
3 Application 3.1 DC Motor The first system is a linear DC motor system. The DC motor is the most common device used as an actuator in mechanical control; for example, the control of a rotary inverted pendulum requires a DC motor to drive the arm and the pendulum. The circuit diagram of the DC motor is shown in Fig. 2. Traditionally, a PID controller is designed to control the DC motor. Here, an LQR objective function is used to find the parameters of the PID controller so that the standard PID controller and the RL-based controller share a common goal, namely the cost function J based on the LQR cost function. The DC motor is characterized by the following set of equations:
Fig. 1 RL model
V_s(t) = i(t) R + L (di/dt) + V_b(t)    (1)
V_b(t) = K_b ω    (2)
T_M(t) − T_L(t) = b ω + J (dω/dt)    (3)
T_M(t) = K_T i    (4)
Fig. 2 DC motor
T_L(t) is taken to be zero. The parameters of these equations for the DC motor are defined in Table 2.
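A rough forward-Euler sketch of Eqs. (1)–(4) with the Table 2 parameter values and T_L = 0 is shown below; the step size and simulation horizon are arbitrary choices, and the snippet is only a numerical illustration of the coupled electrical and mechanical dynamics, not a reproduction of the Simulink model used in the paper.

```python
R, L = 1.0, 0.5e-3          # electric resistance, inductance (H)
KT, Kb = 0.01, 0.01         # torque constant, back-EMF constant
J, b = 0.01, 0.1            # rotor inertia, viscous friction

def motor_speed(Vs, t_end=3.0, dt=1e-4):
    """Integrate the armature current i and speed omega for a constant voltage Vs."""
    i, omega = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        di = (Vs - R * i - Kb * omega) / L       # electrical dynamics, from (1)-(2)
        domega = (KT * i - b * omega) / J        # mechanical dynamics, from (3)-(4), T_L = 0
        i += di * dt
        omega += domega * dt
    return omega

print(motor_speed(Vs=1.0))   # speed (rad/s) after 3 s for a constant 1 V input
```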
3.2 Ball–Beam System The second system, shown in Fig. 3, is a study of a nonlinear ball-and-beam system [8]. A ball is placed on a beam, as shown in Fig. 3, where it is allowed to roll with one degree of freedom along the length of the beam. A lever arm is attached to the beam at one end and a servo gear at the other. As the servo gear turns by an angle theta, the lever changes the angle of the beam by alpha. When the angle is changed from the horizontal position, gravity causes the ball to roll along the beam. A controller will be designed for this system so that the ball's position can be manipulated. The second derivative of the input angle alpha actually affects the second derivative of r; however, this contribution can be ignored. The Lagrangian equation of motion for the ball is as follows:
0 = (J/R² + m) (d²r/dt²) + m g sin α − m r (dα/dt)²    (5)
The beam angle (alpha) can be expressed in terms of the angle of the gear (theta):
α = (d/L) θ    (6)
The parameters for the ball–beam system are defined in Table 1.
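Similarly, Eqs. (5) and (6) can be integrated numerically; the sketch below uses the Table 1 values, a constant servo angle, and an arbitrary step size purely as an illustration of the dynamics.

```python
import math

m, R, d, L_beam, J = 0.111, 0.015, 0.03, 1.0, 9.99e-6   # Table 1 values
g = 9.81
coeff = J / R**2 + m                                     # (J/R^2 + m) from Eq. (5)

def ball_position(theta, t_end=2.0, dt=1e-3, r0=0.5):
    """Ball position after t_end seconds for a constant servo angle theta (illustrative)."""
    r, r_dot, alpha_prev = r0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        alpha = (d / L_beam) * theta                     # Eq. (6)
        alpha_dot = (alpha - alpha_prev) / dt
        alpha_prev = alpha
        # Eq. (5) rearranged for the ball acceleration d^2 r / dt^2
        r_ddot = (m * r * alpha_dot**2 - m * g * math.sin(alpha)) / coeff
        r_dot += r_ddot * dt
        r += r_dot * dt
    return r

print(ball_position(theta=0.1))   # the ball rolls toward lower r for a positive beam angle
```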
Fig. 3 Ball–beam setup

Table 1 Parameters used in ball and beam system training
Parameter | Value
Mass (m) | 0.111 kg
Radius (R) | 0.015 m
Lever arm (d) | 0.03 m
Length of the beam (L) | 1.0 m
Ball's moment of inertia (J) | 9.99e−6 kg·m²
r | Ball position
Optimizer/learning rate | Adam/0.001
4 Proposed Design and Simulation In this section, the closed-loop control systems of the DC motor and the ball–beam setup are modeled as MDPs. The system acts as the environment for the RL agent; hence, the state of the system is the set of parameters observable by the RL agent, and in our case the error is the observable parameter fed to the agent. The action of the RL agent is the voltage in the case of the DC motor and the angle theta for the ball–beam system. The RL agent is then trained in episodes, each consisting of a particular number of steps. The RL agent's goal is to search for the optimal policy that maximizes the accumulated reward.
4.1 DC Motor MDP A PI controller is designed for the DC motor control system. The observation (state) tuple consists of (e, ∫e dt), where e = ω − ω_ref, ω is the speed of the DC motor, and ω_ref is the reference speed. The action is the output of the NN, which is the voltage provided to the DC
motor system to control its speed. The NN has one input, one hidden, and one output layer. The hidden layer has two neurons whose weights corresponding to the input layer are K_p and K_i. The output layer is one-dimensional and computes [e, ∫e dt] · [K_p, K_i]^T.
4.2 Ball–Beam MDP Similarly, a PD controller is implemented for the ball–beam control system. The observation (state) tuple consists of e and de/dt, where e = r − r_ref. The action is the output of the NN, which is the angle (θ) provided to the ball–beam system to control the ball's position. The NN has one input, one hidden, and one output layer. The hidden layer has two neurons whose weights corresponding to the input layer are K_p and K_d. The output layer is one-dimensional and computes [e, de/dt] · [K_p, K_d]^T.
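In other words, the actor network acts as a PI/PD law whose two hidden weights play the role of the gains. A tiny sketch for the ball–beam PD case is shown below; the gain values are placeholders, not the learned weights reported later.

```python
import numpy as np

K_p, K_d = 4.0, 2.5                        # placeholder "learned" hidden-layer weights

def actor(e, e_dot):
    state = np.array([e, e_dot])           # observation tuple (e, de/dt)
    gains = np.array([K_p, K_d])
    return float(state @ gains)            # theta = [e, de/dt] . [K_p, K_d]^T

theta_cmd = actor(e=0.2, e_dot=-0.05)      # servo angle command for one control step
```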
4.3 Simulation A linear quadratic Gaussian (LQG) regulator is used to control the output of the system while simultaneously minimizing the control effort u. The cost function is given by
J = lim_{T→∞} E[ (1/T) ∫_0^T (α e(t)² + β u(t)²) dt ]    (7)
The conventional PI/PD controller parameters are tuned using this cost function as the objective and compared with the RL agent. Training of the RL agent is done with the same goal of minimizing this cost function; hence, the reward for the RL agent is chosen as −J, and α, β are chosen as 2.5 and 0.009, respectively, for both controllers (RL agent and conventional PI/PD controller). While training, the RL agent maximizes the reward and thus minimizes the cost. Twin-delayed deep deterministic policy gradient (TD3) [9], a variant of DDPG, is used as the RL training algorithm. Training is done in episodes, and each episode is 200 steps long unless mentioned otherwise. The RL agent is trained for 1000 episodes, and the learned weights are used to evaluate the RL controller against the standard PID controller. All simulations are carried out in MATLAB using Simulink; Fig. 4 shows the system setup in MATLAB/Simulink, where the system block represents the DC motor and the ball–beam setup one at a time during the simulations. For the ball–beam problem, it is assumed that the ball rolls without slipping and that friction between the beam and the ball is negligible. The parameters for this example are defined in Table 1, while those for the DC motor are in Table 2. Figure 5 shows the open-loop response of the DC motor: with an input of 1 V, the motor speed grows unbounded, which shows that the open-loop response of the DC motor is not stable.
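The per-step reward implied by this setup is simply the negative of the integrand of Eq. (7) with the stated α and β. The toolchain in this chapter is MATLAB/Simulink; the standalone Python fragment below only illustrates the shape of the reward signal.

```python
ALPHA, BETA = 2.5, 0.009

def step_reward(error, control_effort):
    cost = ALPHA * error**2 + BETA * control_effort**2   # integrand of Eq. (7)
    return -cost                                         # maximizing reward minimizes J

print(step_reward(error=0.3, control_effort=1.2))        # approximately -0.238
```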
Algorithm 1 Training based on the DDPG TD3 method
  M: total episodes; T: episode duration
  env = simulinkEnv(DCMotor or BallBeam)
  BatchSize: mini-batch size; ExpBufferLen: experience buffer length
  agent = DDPGTD3(BatchSize, ExpBufferLen)
  rng(0): random seed generation
  θ: actor parameters; α: critic-1 parameters; β: critic-2 parameters
  θ ← 0
  for i = 0 to M − 1 do
      run train(T, agent): collect trajectory φ_j = {s_1, a_1, r_1, ..., s_T, a_T, r_T}
      compute loss L
      estimate ∇θ from {φ_j} and L
      update θ, α, β using ∇θ from {φ_j} and L
  end for
Fig. 4 Simulink setup

Table 2 Parameters used in DC motor system training
Parameter | Value
Motor viscous friction constant (b) | 0.1 N·m/(rad/s)
Electric resistance (R) | 1
Electric inductance (L) | 0.5 mH
Motor torque constant (K_T) | 0.01 N·m/A
Moment of inertia of the rotor (J) | 0.01 kg·m²
Back EMF constant (K_b) | 0.01 V/(rad/s)
Optimizer/learning rate | Adam/0.001
Fig. 5 DC motor open-loop response
Fig. 6 Ball–beam open-loop response
Similarly, Fig. 6 shows the open-loop response of the ball–beam setup, which is also unstable. Since both open-loop responses are unstable, controllers are required to regulate the outputs of the systems.
5 Results and Discussions The RL agent is trained for both systems one by one, and a conventional PI/PD controller is used to compare the RL agent's performance. There are conventional algorithms in the literature to design PID controllers; for the LQG objective-based optimization, the MATLAB ControlSystemTuner is used. The LQG objective is the same as mentioned
in (7). After running some iterations, ControlSystemTuner gives the PI parameters for comparison with the RL agent. This PI/PD controller will be referred to as the conventional PI/PD controller in the rest of the section.
5.1 DC Motor Setup Training of the RL agent was stopped once the increase in reward became negligible. The trained neural network weights K_p and K_i were then used in the PI controller for the DC motor to compare its performance with the conventional PI controller. Table 3 shows the results of the stability analysis and the step response parameters. The main objective of the RL agent was to reduce the cost function, and the RL agent achieved a lower cost value of 44.384 compared to 46.526 for the conventional PI controller. The phase margin of the RL controller is 65.28° compared to 48.728° for the conventional PI controller; similarly, the gain margin for RL is 7.079 dB versus 5.745 dB for the conventional PI controller. The RL controller is therefore more stable, as it has higher phase margin and gain margin values. The step responses for the RL and conventional PI controllers are presented in Fig. 7. The RL agent has a lower settling time (2.339 s) compared to the conventional PI controller (2.408 s), which shows that the RL controller tracks the input faster. Similarly, RL has a lower overshoot of 0.678%, whereas the conventional PI has 19.8%. Hence, the RL agent achieves better step response performance. Overall, the RL agent achieved a lower LQG cost and a more stable response compared to the conventional PI controller.
5.2 Ball–Beam Setup Training of the RL agent was carried out for 1000 episodes, by which time the reward signal had saturated. Figure 8 shows the reward progress during training; the dark orange line shows the average reward, which increases with episodes.
Table 3 DC motor results
Parameter | RL controller | Conventional PI
LQG cost | 44.384 | 46.526
Gain margin (dB) | 7.079 | 5.745
Phase margin (°) | 65.28 | 48.728
Rise time (s) | 0.309 | 0.338
Settling time (s) | 2.339 | 2.408
Overshoot (%) | 0.678 | 19.8
Fig. 8 RL reward plot
R. S. Shekhawat and N. Singh
Fig. 9 Ball–beam step response

Table 4 Ball–beam system results
Parameter | RL controller | Conventional PD
LQG cost | 35.65 | 270.8
Gain margin (dB) | 9.1 | 6.2
Phase margin (°) | 30.75 | 10.67
Rise time (s) | 1.62 | 3.2
Settling time (s) | 6.03 | 6.08
Overshoot (%) | 12.84 | 0.63
An increase in reward corresponds to a decrease in the cost function; the average reward increased from −240 to −40 by the end of the training. The trained neural network weights K_p and K_d were then used for evaluation: these tuned parameters were applied in the PD controller for the ball–beam [10] system to compare its performance with the conventional PD controller. The step responses of the RL and conventional PD controllers are given in Fig. 9, and Table 4 shows the results of the stability analysis and the step response parameters. The settling time is 6.03 s for RL and 6.08 s for the conventional PD; the rise time for RL is 1.62 s while for PD it is 3.2 s. This shows that the RL agent has lower settling and rise times in the step response compared to the conventional PD controller; however, RL has a higher overshoot than the PD controller. The main objective of the RL agent was to reduce the cost function, and the RL agent achieved a lower cost value of 35.65 compared to 270.8 for the conventional PD controller. The RL controller is also more stable, as it has higher phase margin and gain margin values. Note that the RL agent's main objective was to minimize the cost function; it was not trained to meet specific criteria for particular values of
settling time and overshoot, since this is a nonlinear system, which makes it difficult to obtain the best of both worlds, i.e., low cost and a better step response.
6 Concluding Remarks In this paper, the application of reinforcement learning to designing controllers for control system applications is shown. By applying an MDP model to the closed-loop systems of the DC motor as well as the ball–beam setup, we trained the RL agent to minimize the cost function. A common goal, i.e., minimizing the cost function, was selected for both the RL agent and the standard PI/PD controllers to allow a fair comparison, and RL was applied to both kinds of systems: linear and nonlinear. On evaluating the performance of the RL agent against the standard PI/PD controllers, it is concluded that the RL agent minimized the cost function by a large margin compared to the standard PI/PD controllers. The RL controller is also more stable, showing that a data-driven technique can be used in designing controllers, which is especially beneficial when a working model of the system is not available. In the future, these approaches will be applied to systems whose models are not available.
References 1. Meshram PM, Kanojiya RG (2012) Tuning of pid controller using ziegler-nichols method for speed control of dc motor. In: IEEE-international conference on advances in engineering, science and management (ICAESM-2012), pp 117–122 2. Xia Y, Zhu Z, Fu M, Wang S (2011) Attitude tracking of rigid spacecraft with bounded disturbances. IEEE Trans Industr Electron 58(2):647–659 3. Yurkevich V (2008) Pi and pid controller design for nonlinear systems in the presence of a time delay via singular perturbation technique, vol 01, pp 168 – 174 4. Kangbeom Cheon MH, Kim J, Lee D (2015) On replacing pid controller with deep learning controller for dc motor system. J Autom Control Eng 3(6) 5. Ghosal D, Shukla S, Sim A, Thakur AV, Wu K (2019) A reinforcement learning based network scheduler for deadline-driven data transfers. In: IEEE global communications conference (GLOBECOM). IEEE, 1–6 6. Wang XS, Cheng YH, Sun W (2007) A proposal of adaptive pid controller based on reinforcement learning. J China Univ Mining Technol 17(1):40–44 7. Samsuden MA, Diah NM, Rahman NA (2019) A review paper on implementing reinforcement learning technique in optimising games performance. In: 2019 IEEE 9th international conference on system engineering and technology (ICSET), pp 258–263 8. Meiling Ding BL, Wang L (2019) Position control for ball and beam system based on active disturbance rejection control. Syst Sci Control Eng 7(1):97–108 9. Dankwa S, Zheng W (2019) Twin-delayed ddpg: a deep reinforcement learning technique to model a continuous movement of an intelligent robot agent, pp 1–5 10. Shah M, Ali R, Malik FM (2018) Control of ball and beam with lqr control scheme using flatness based approach. In: 2018 international conference on computing, electronic and electrical engineering (ICE Cube), pp 1–5
Chapter 10
Classifying Skin Cancer Images Based on Machine Learning Algorithms and a CNN Model S. Aswath and M. Kalaiyarivu Cholan
1 Introduction The past few decades elucidate the kind of significance toward the need for taking care of one’s skin. It is almost impossible in general to estimate how many people suffer from skin diseases, because there are so many various skin diseases currently existing in our environment. Despite skin being constantly regenerated, it is highly discernible due to which it is susceptible to various skin diseases. Its innumerable vital consequences encompass inoculating the body from microbes and similar elements, body temperature control and transmitting sensations such as cold, touch and heat. Our skin comprises three layers. The outermost layer that is waterproof in nature is the epidermis, and this layer unveils our skin tone. The middle layer encompassing hair follicles and sweat glands is called the dermis. The third innermost layer is the subcutaneous layer and is also called as the hypodermis. This layer incorporates fat and connective tissue. Any one of these layers is susceptible to skin conditions for infection. Innumerable factors precipitating skin diseases and influencing skin disorder pattern are occupation, genetics, habits, nutrition, etc. Geographical factors like season and climate also tend to be the factors in certain cases. In developing countries, congestion among crowds and poor hygiene are accountable for spreading of skin diseases. Early detection of skin disease would be very productive in such scenarios. Skin diseases can be stimulated by bacterial infections, allergies, viruses or fungal infections, etc. A perceptible change in texture or color is resulted from these infections. Some skin disease when left uncured may turn into more chronic, infectious diseases. This may more likely result in a skin cancer. Wherefore, a diagnosis at an earlier stage may prove quite effective in hampering the development of the disease S. Aswath (B) · M. Kalaiyarivu Cholan Department of Computer Science, PES University, Bengaluru, Karnataka, India Department of Computer Science, Annamalai University, Chidambaram, Tamil Nadu, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_10
and thereby its spread too. The treatment and diagnosis of these diseases are mostly expensive and involve enduring lot of pain too. Human Experts’ Diagnosis predominantly depends on subjective judgements and thereby not likely to be reproducible, whereas a diagnostic system using automatic computer-based technology is more reliable and objective. This technology will cure the skin disease as soon as possible by identifying the pattern of skin diseases that varies from country to country. With the advent of proliferation of clinical innovation, the intention of computer being utilized for the determination of skin diseases has been quite common around as of late. Deployment of computer novelty can make it less difficult to recognize the skin diseases just from the pictures of the trained skin picture and substantially succor the human’s capacity to examine complex data. Man-made reasoning has been replaced by computerization in all fields, and medical service is not an exception. A computer can impressively and effortlessly decipher a great deal of pictures where it is arduous for the human to decipher quite plenty of information and scrutinize the subtleties that are subsided by the picture inside. Classification that has been computer-aided and computer-dependent diagnoses has recently got alluring and is being worked on by many examinations. Computer-based determination has demonstrated to be exceptionally useful in skin disease finding. Automatic computer-based classification technology is very helpful to doctors and patients. In this paper, we are proposing a model which will have the capability to classify into multiclass skin cancer types (melanocytic nevi, melanoma, benign keratosis-like lesions, basal cell carcinoma, actinic keratoses, vascular lesions, dermatofibroma) accurately without taking any aid from an expert. This would lead to getting the disease treated at its inceptive stages thereby preventing inessential pathological tests. We are deploying five different feature extraction algorithms such as GLCM, HOG, color histogram, LBP and CNN. Out of these, CNN outperformed other feature extraction algorithms. We used four machine learning algorithms to classify the images. The algorithms used are random forest classifier, XGBoost classifier, support vector machine and CatBoost.
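As a rough illustration of the handcrafted-feature branch of this pipeline, the sketch below extracts HOG, uniform-LBP, and color-histogram features from one image and stacks them for a classical classifier; the parameter choices and the variable names for the image paths and labels are assumptions made for illustration, not the settings used in this work.

```python
import numpy as np
from skimage import color, io, transform
from skimage.feature import hog, local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

def handcrafted_features(path):
    rgb = transform.resize(io.imread(path), (100, 100), anti_aliasing=True)  # values in [0, 1]
    gray = color.rgb2gray(rgb)
    gray_u8 = (gray * 255).astype(np.uint8)
    # Histogram of oriented gradients over the grayscale image.
    f_hog = hog(gray, orientations=9, pixels_per_cell=(10, 10), cells_per_block=(2, 2))
    # Uniform LBP codes binned into a 10-bin histogram (P + 2 codes for P = 8).
    lbp = local_binary_pattern(gray_u8, P=8, R=1, method="uniform")
    f_lbp, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    # Per-channel color histogram.
    f_hist = np.concatenate([np.histogram(rgb[..., c], bins=16, range=(0, 1),
                                          density=True)[0] for c in range(3)])
    return np.concatenate([f_hog, f_lbp, f_hist])

# X = np.stack([handcrafted_features(p) for p in image_paths])   # 'image_paths' is assumed
# clf = RandomForestClassifier(n_estimators=200).fit(X, labels)  # 'labels' is assumed
```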
2 Literature Survey The early 1990s marked the period when computer-aided diagnosis systems were introduced to tackle the challenges dermatologists faced while performing skin cancer classification [1]. The primary efforts employing dermoscopy images were restricted to classifying melanoma and benign skin cancer lesions [2]. Traditional machine learning algorithms such as decision trees [3], K-nearest neighbors [4], support vector machines [5], logistic regression [6], naive Bayes classifiers [7] and artificial neural networks [8] were all applied in an attempt to obtain a method that is both reliable and accurate. However, it was observed that there were high inter-class and intra-class disparities in melanoma, which led to poor results when diagnostic
performance was based on handcrafted features. In [9], Vinayaka proposed a model in which a locally adaptive regression kernel was used for feature extraction; CatBoost and a multilayer perceptron were then used to classify images from three datasets, Grimace, faces95 and faces96. A tremendous amount of work has been done on classification systems using computer-aided technologies, and smartphone utilization in the field of health care has also gained wide attention, leading to more easily accessible smartphone-based architectures for clinical purposes. Bourouis et al. [10] in 2013 proposed an artificial neural network model aimed at recognizing skin cancer from skin images. The model was mobile-based, and its main goal was to perceive and classify abnormal and normal skin types. In this proposal, a neural network was trained using a multilayer perceptron algorithm and later tested with abnormal and normal skin datasets; an accuracy of 96.50% was achieved in distinguishing abnormal from normal skin using this mobile neural network-based algorithm. In recent times, deep convolutional neural networks (CNNs) have been used extensively to identify skin cancer. This provided a breakthrough for the existing problems and quickly became the preferred choice for skin cancer classification [11], predominantly because, besides producing high classification accuracy, a CNN also reduces the feature engineering burden on the machine learning expert by automatically recognizing deeper, higher-level abstractions in the datasets [12]. Deep convolutional neural networks were employed by Liao [13] in 2016 to identify and classify skin diseases; the dataset used was taken from two different sources and consists of twenty-two classes of skin diseases. CNNs have also been combined with other algorithms to achieve better accuracies. Deeskhith et al. demonstrated this in [14], where CNN was integrated with HOG for better accuracy in order to identify emotions of lecturers in their lecture videos and predict the feedback; this system helped foretell the feedback a professor would receive even before the actual feedback was collected. Features extracted from a CNN pretrained on a dataset of 1300 natural images were demonstrated on a linear classifier by Kawahara et al. [15]. This approach was highly accurate in distinguishing up to ten skin lesion classes and requires no intricate preprocessing or segmentation of lesions; accuracies of 81.9% and 85.8% over 10 classes and 5 classes were attained. However, the number of images utilized for useful feature extraction during training was found to be insufficient. In [8], the authors proposed a novel CNN architecture with diverse tracts for the classification of skin lesions; they transformed a CNN that was already trained on a single resolution to work with multi-resolution inputs and achieved an accuracy of 79.15% when fine-tuning the entire network over a publicly available ten-class lesion dataset. Classification of skin type utilizing a machine learning strategy and a convolutional neural network (CNN) was presented by Alarifi et al. [16].
Skin types were classified into three categories: spotted, normal and wrinkled. He had implemented the CNN architecture incorporated by GoogleNet
and utilized a Caffe framework. Evaluation after training and testing the result was executed utilizing performance metrics such as accuracy, precision, recall, F1-score, sensitivity and false negative rate. Deep learning approaches as well as machine learning approaches have been used for performing this experiment. Aswath et al. in [17] had identified conflicts on Reddit based on soccer and later analyzed and procured some interesting insights. Features based on CNN were contemplated in [18], but it was inadequate to train effectively because only 900 images were utilized for training this neural network model. Singla et al. in [19] have provided some interesting insights in deploying ML solutions to detect diseases and perceived their performances.
3 Dataset Philipp Tschandl created this dataset predominantly to address the lack of variety in the small dermatoscopic image datasets available at the time, which had created difficulties in training neural networks for automating the diagnosis of skin lesions. The dataset embodies 10015 dermatoscopic images, of which 6705 belong to "Melanocytic nevi", 1113 to "Melanoma", 1099 to "Benign keratosis-like lesions", 514 to "Basal cell carcinoma", 327 to "Actinic keratoses", 142 to "Vascular lesions" and 115 to "Dermatofibroma". We have split the dataset into training and testing sets in the ratio 8:2. The dataset includes a CSV file with nine columns listing details such as lesion_id, image_id, age, sex and localization. We added another three columns: path, to hold the path to the particular image; cell_type, to display the particular cell type; and cell_type_idx, to display the cell category as an integer, which was generated using the categorical function in Pandas. Using the path column, we redirect to the actual image and perform classification (Fig. 1).
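A minimal sketch of how such columns could be added and the 8:2 split performed with pandas and scikit-learn is shown below. The file names (HAM10000_metadata.csv, the image folder) and the abbreviation dictionary are assumptions for illustration, not the exact paths or code used in this work.

```python
import os
from glob import glob

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed file layout: the HAM10000 metadata CSV plus a folder of JPEG images.
df = pd.read_csv("HAM10000_metadata.csv")

# Map image_id -> full file path so the 'path' column can point at the actual image.
image_paths = {os.path.splitext(os.path.basename(p))[0]: p
               for p in glob(os.path.join("HAM10000_images", "*.jpg"))}
df["path"] = df["image_id"].map(image_paths)

# Human-readable cell type from the diagnosis code (abbreviations as used in HAM10000).
lesion_type = {"nv": "Melanocytic nevi", "mel": "Melanoma",
               "bkl": "Benign keratosis-like lesions", "bcc": "Basal cell carcinoma",
               "akiec": "Actinic keratoses", "vasc": "Vascular lesions",
               "df": "Dermatofibroma"}
df["cell_type"] = df["dx"].map(lesion_type)

# Integer label produced with pandas' categorical machinery.
df["cell_type_idx"] = pd.Categorical(df["cell_type"]).codes

# 8:2 train/test split, stratified on the label.
train_df, test_df = train_test_split(df, test_size=0.2,
                                     stratify=df["cell_type_idx"], random_state=42)
print(train_df.shape, test_df.shape)
```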
4 Stages of the Pipeline This section elucidates the fragments that have been incorporated to form the entire system and confers information regarding each of the modules and the way they have been brought together to devise a proficient pipeline. Figure 2 accentuates all the phases that form the pipeline.
Fig. 1 Sample from HAM10000 dataset
Fig. 2 Illustration of the proposed system
4.1 Image Preprocessing The goal of preprocessing is to improve the image data by reducing unwanted distortions and enhancing image features that are important for further processing. Image preprocessing involves two main steps: 1) image resizing and 2) grayscale conversion.
4.1.1 Resize Image
Image resizing refers to the scaling of images. Scaling has turned out to be quite advantageous in many image processing techniques as well as in applications of machine learning. It successfully reduces the number of pixels in an image, which proves to be beneficial
in several cases. For example, it can reduce model training time: the higher the number of pixels in an image, the higher the number of input nodes, which in turn increases the complexity of the model. The original skin lesion images have dimensions of 600 × 400 pixels, so we have resized all the images to 100 × 100 pixels. This reduces the complexity of the preprocessing step and increases the speed.
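As a concrete illustration, resizing with OpenCV might look like the following sketch; the file name and the interpolation choice are assumptions, since the chapter does not state which library was used.

```python
import cv2

# Read a dermatoscopic image (original HAM10000 images are 600 x 400 pixels).
img = cv2.imread("sample_lesion.jpg")  # hypothetical file name

# Downscale to 100 x 100 to cut the number of input pixels (and hence model complexity).
small = cv2.resize(img, (100, 100), interpolation=cv2.INTER_AREA)
print(small.shape)  # (100, 100, 3)
```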
4.1.2 Conversion to Grayscale
Grayscale conversion is a preprocessing technique that applies to images with three channels. Each pixel in the image consists of three channel values, in the order red, green and blue. Each pixel must map to exactly one grayscale value. Three methods are commonly utilized to attain this. The first is the average method, in which the three channels are averaged:

G(x) = (Red + Green + Blue) / 3    (1)

The second is the luminosity method, in which a weighted average of the three channels R, G, B is computed. Since the human eye responds more strongly to green than to red and blue, the highest weight is assigned to the green channel:

G(x) = 0.21 · R + 0.72 · G + 0.07 · B    (2)

The last is the lightness method, in which only the most significant and least significant channel values are taken into consideration and averaged:

G(x) = (Max(Red, Green, Blue) + Min(Red, Green, Blue)) / 2    (3)
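A small NumPy sketch of the three conversion methods described by Eqs. (1)-(3) is given below; it assumes the image has already been loaded as an RGB array.

```python
import numpy as np

def to_gray(rgb, method="luminosity"):
    """Convert an H x W x 3 RGB array to grayscale using Eq. (1), (2) or (3)."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    if method == "average":      # Eq. (1)
        return (r + g + b) / 3.0
    if method == "luminosity":   # Eq. (2)
        return 0.21 * r + 0.72 * g + 0.07 * b
    if method == "lightness":    # Eq. (3)
        return (rgb.max(axis=-1) + rgb.min(axis=-1)) / 2.0
    raise ValueError("unknown method")
```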
4.2 Image Segmentation Edge detection has been utilized as the segmentation technique in our project. The primary intention behind using this edge detection operator is to recognize a broad range of edges existing in the images. The following four steps are involved in this technique: • Freeing the images from unnecessary noise as well as clearing undesirable speckles. • The gradient's direction and intensity are acquired utilizing the gradient operator.
• Suppression of pixels that are not local maxima, by comparing each pixel with its corresponding neighbors (non-maximum suppression). • Detection of the beginning and the end of the edges using hysteresis thresholding. In this technique, edges are indicated by changes in pixel intensity. To ascertain them, the preferred method is to accentuate the variation in intensities in both the vertical and the horizontal directions. After smoothing the image, we calculate I_x and I_y by convolving it with the kernels

K_x = [[−1, 0, 1], [−2, 0, 2], [−1, 0, 1]],  K_y = [[1, 2, 1], [0, 0, 0], [−1, −2, −1]]    (4)

The magnitude G and the direction θ of the gradient are then calculated as follows:

G = sqrt((I_x)^2 + (I_y)^2)    (5)

θ(x, y) = arctan(I_y / I_x)    (6)
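The four steps above correspond to a classical edge detection pipeline. A hedged OpenCV sketch is given below, where the Gaussian kernel size and the hysteresis thresholds (100, 200) are illustrative values, not the ones used by the authors.

```python
import cv2
import numpy as np

gray = cv2.imread("lesion_gray.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input

# Step 1: smooth away noise and speckles.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Step 2: gradient intensity and direction via the kernels of Eq. (4).
ix = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
iy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
magnitude = np.sqrt(ix ** 2 + iy ** 2)   # Eq. (5)
direction = np.arctan2(iy, ix)           # Eq. (6)

# Steps 3-4: cv2.Canny performs the full pipeline internally,
# including non-maximum suppression and hysteresis thresholding.
edges = cv2.Canny(blurred, 100, 200)
```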
4.3 Feature Extraction
4.3.1 Gray-level Co-occurrence Matrix
The gray-level co-occurrence matrix represents an estimate of the joint probability density function (PDF) of gray-level pairs in an image. The GLCM is expressed as

P_μ(i, j),  i, j = 0, 1, 2, . . . , N − 1    (7)

where i, j denote the gray levels of the two pixels, N represents the number of gray grades, and μ denotes the positional relation between the two pixels, e.g., μ = (Δx, Δy). Different values of μ determine the direction and the distance between the pixels; generally, 0°, 45°, 90° and 135° are chosen as the four directions. Textural features can be extracted from a grayscale image utilizing the GLCM for a given distance between patterns and a specific directional orientation. Some of the textural features that can be extracted utilizing GLCM are: • Energy: This textural feature measures the degree of repetition of pixel pairs and estimates the uniformity of the texture in an image. Energy values tend to be very high when the pixels are highly correlated. The following equation defines "energy":

k_1 = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} p_μ^2(i, j)    (8)
Fig. 3 Gray-level co-occurrence matrix texture feature analysis
• Contrast: It measures the spatial frequency present in an image and reflects local intensity variations. The difference in brightness between a specific object and its neighboring objects, together with the difference in colors, determines the contrast. The following equation defines "contrast":

k_2 = Σ_{t=0}^{N−1} t^2 { Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} p_μ(i, j) },  with |i − j| = t    (9)
• Correlation: It estimates how a particular pixel is correlated with its neighboring pixels over the entire image. The correlation is undefined for a constant image; otherwise it varies between −1 and 1, where −1 denotes a perfectly negative and 1 a perfectly positive correlation. The equation below defines the "correlation" feature:

k_3 = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} [ p_μ(i, j)(i − μ_i)(j − μ_j) ] / (σ_i σ_j)    (10)
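Using scikit-image, the GLCM features above can be computed roughly as in the following sketch. The distance and angle values are illustrative assumptions, and note that scikit-image reports "energy" as the square root of the sum in Eq. (8); in older releases the functions are spelled greycomatrix/greycoprops.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

gray = np.random.randint(0, 256, (100, 100), dtype=np.uint8)  # stand-in grayscale image

# GLCM for distance 1 and the four directions 0, 45, 90 and 135 degrees.
glcm = graycomatrix(gray, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=256, symmetric=True, normed=True)

energy = graycoprops(glcm, "energy")            # related to Eq. (8)
contrast = graycoprops(glcm, "contrast")        # Eq. (9)
correlation = graycoprops(glcm, "correlation")  # Eq. (10)
features = np.hstack([energy.ravel(), contrast.ravel(), correlation.ravel()])
```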
4.3.2 Color Histogram
This feature procures information related to the color of an image, such as image contrast, distribution of pixel brightness and distribution of color. It represents the color content of an image through the histogram-indexed result of vector quantization (VQ). The image is first segregated into multiple blocks.
Max and min quantizers are the image quantizers that compute the VQ-indexed colors of each block of the image. Let us consider an input image F of dimensions M * N in the RGB color space. The image F is first dissected into numerous non-overlapping blocks, each of size m * n. If f(i, j) = { f_R(x, y), f_G(x, y), f_B(x, y) } is an image block with i and j as its indexes, then the minimum quantizer for this block can be acquired by picking the least pixel values over the red, green and blue channels as follows:

q_min(i, j) = { min_{∀(x,y)} f_R(x, y), min_{∀(x,y)} f_G(x, y), min_{∀(x,y)} f_B(x, y) }    (11)

In a similar manner, we acquire the maximum quantizer by picking the maximum pixel values over the red, green and blue channels:

q_max(i, j) = { max_{∀(x,y)} f_R(x, y), max_{∀(x,y)} f_G(x, y), max_{∀(x,y)} f_B(x, y) }    (12)

where x is in the range [1, 2, 3, . . ., m] and y is in the range [1, 2, 3, . . ., n].
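A compact sketch of the block-wise min/max quantizers of Eqs. (11) and (12) is given below; the 8 × 8 block size is an assumption for illustration.

```python
import numpy as np

def block_min_max_quantizers(image, m=8, n=8):
    """Return per-block minimum and maximum colors of an M x N x 3 RGB image (Eqs. 11-12)."""
    M, N, _ = image.shape
    q_min, q_max = [], []
    for i in range(0, M - m + 1, m):
        for j in range(0, N - n + 1, n):
            block = image[i:i + m, j:j + n, :].reshape(-1, 3)
            q_min.append(block.min(axis=0))  # Eq. (11): channel-wise minima
            q_max.append(block.max(axis=0))  # Eq. (12): channel-wise maxima
    return np.array(q_min), np.array(q_max)
```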
4.3.3 Local Binary Patterns
The local binary pattern and its variants have been extensively utilized in myriad applications in computer vision and image processing. LBP captures the textural information present in an image, and the histogram obtained from the LBP codes can be employed as an image feature descriptor for retrieval and classification. When color images are considered, they are first converted into grayscale before computing LBP. Neighboring pixel values are compared with the central pixel value, which is the currently processed value. First, the input image F of dimensions M * N in the RGB color space is transformed into its inter-band mean depiction as follows (Fig. 3):

g(x, y) = (1/3) [ f_R(x, y) + f_G(x, y) + f_B(x, y) ]    (13)

in which x = 1, 2, 3, . . . , M and y = 1, 2, 3, . . . , N, (x, y) indicates the position of a pixel in the image, and R, G, B refer to the red, green and blue color channels. For a center pixel denoted by g_c whose neighboring pixel values are represented by g_p, we can compute LBP utilizing the formula

LBP_{P,R}(x, y) = Σ_{p=0}^{P−1} s(g_p − g_c) 2^p,  where s(x) = 1 if x ≥ 0 and s(x) = 0 if x < 0    (14)
where P denotes the number of pixels in the neighborhood, R indicates the radius, and p indicates the index of the neighboring pixel.
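With scikit-image, the LBP histogram descriptor can be sketched as follows; the choice P = 8, R = 1 and the "uniform" variant are assumptions rather than the exact settings of this work.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, P=8, R=1):
    """Compute the LBP code image (Eq. 14) and return its normalized histogram."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2  # number of distinct codes for the 'uniform' variant
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist

gray = np.random.randint(0, 256, (100, 100)).astype(np.uint8)  # stand-in grayscale image
print(lbp_histogram(gray))
```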
4.3.4 Histogram of Gradients
HOG is a feature descriptor in which the appearance and shape of a local object, as perceived in an image, are depicted by the distribution of gradient directions. Dissection of the image into cells assists in implementing this descriptor: each pixel lies inside a cell, and the cell assembles the histogram of gradient directions for its pixels. In the first step, we calculate the gradient values by applying 1D derivative masks. To obtain a dense representation and make HOG more resilient to undesired noise, the image is dissected into 8 * 8 cells. For every cell, we determine the corresponding HOG values: for each region, we compute the gradient directions and build a histogram from the corresponding magnitudes and the 64 gradient directions obtained. The histograms are binned by the angles of the gradients, which lie in the range 0–180. Three cases must be taken into account while constructing HOG: • Add the magnitude to the corresponding HOG bin if the angle is less than 160 and does not fall exactly midway between two bins. • Split the contribution proportionally between the two bins if the angle is less than 160 and falls exactly between two bins. • Distribute the contribution proportionally between the 0 and 160 bins if the angle is greater than 160.
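The corresponding scikit-image call might look like the sketch below; the 9 orientation bins and 2 × 2 cells per block are illustrative defaults, not necessarily the authors' settings.

```python
import numpy as np
from skimage.feature import hog

gray = np.random.rand(100, 100)  # stand-in for the preprocessed 100 x 100 lesion image

hog_features = hog(gray,
                   orientations=9,           # histogram bins over 0-180 degrees
                   pixels_per_cell=(8, 8),   # 8 x 8 cells as described above
                   cells_per_block=(2, 2),
                   block_norm="L2-Hys",
                   feature_vector=True)
print(hog_features.shape)
```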
4.3.5 Convolutional Neural Networks
We are employing Xception as the base model, which is built on depthwise separable convolution layers. Xception originally refers to "extreme Inception", because the hypothesis of Xception is a stronger version of the hypothesis intrinsic to the Inception architecture [80]. Thirty-six convolution layers constitute the Xception network, which strives to extract features. The Xception model incorporates multiple flows, namely an entry, a middle and an exit flow. It incorporates a convolutional layer which performs the final activation decision and filters the input in an uncomplicated manner. In our experiment, we have added certain layers at the end of the Xception model so that it suits our project and gives more efficient results. We have added a flattening layer to our base model. This flattening layer assists in transforming the feature maps into a single-dimensional array of values which can be passed into the next layer; this is predominantly done to get one long feature vector. The next layer that
we are connecting here is the dense layer, which can also be referred to as a fully connected layer. This layer obtains its name from the fact that it is densely connected: almost all of the neurons present in a particular layer are connected to the ones in the following layer. A dropout layer is deployed after this in order to circumvent overfitting on the data used in the training stage. It also enacts a masking role, nullifying the contribution of certain neurons to the next layer while leaving the other neurons unaffected. In the absence of this layer, the samples present in the very first batch of training could impact the learning process in a highly disproportionate manner. In order to reduce the number of training epochs necessary to train deep networks, we include a batch normalization layer, which standardizes each mini-batch that is input to a particular layer; by enhancing regularization, it also reduces generalization error. Including a ReLU layer in this model enhances computational efficiency by accelerating the training of the network without any substantial difference in its accuracy, and it mitigates the vanishing gradient predicament, in which training appears to take longer for layers in the lower positions of the network. Finally, in order to obtain a distribution of probabilities, where the input vector is normalized, we use the softmax function.
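A hedged Keras sketch of such an extended Xception model is shown below; the dense-layer width (128), the dropout rate (0.5) and the 100 × 100 × 3 input shape are assumptions chosen to match the preprocessing described earlier, not the authors' exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.Xception(include_top=False, weights="imagenet",
                                      input_shape=(100, 100, 3))

model = models.Sequential([
    base,
    layers.Flatten(),                       # one long single-dimensional feature vector
    layers.Dense(128),                      # fully connected (dense) layer
    layers.Dropout(0.5),                    # guards against overfitting
    layers.BatchNormalization(),            # standardizes each mini-batch
    layers.Activation("relu"),              # speeds up training
    layers.Dense(7, activation="softmax"),  # 7 HAM10000 classes as probabilities
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```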
4.4 Classifiers
4.4.1 Random Forest Classifier
Random forest is a classification technique built on decision trees that apply a divide-and-conquer approach to a randomly split dataset. It is called a forest because it consists of a collection of decision trees. Each tree in the ensemble is grown on a random sample drawn independently of the other trees, and a random feature subset is selected for each tree. For classification, each tree in the ensemble votes, and the final result is the most popular class. Gini importance is used to find the significance of a node by the following equation:

ni_j = w_j C_j − w_{left(j)} C_{left(j)} − w_{right(j)} C_{right(j)}    (15)

where ni_j represents the importance of node j, C_j denotes the impurity value of node j, and w_j indicates the weighted number of samples reaching node j.
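With scikit-learn, this classifier and its Gini-based feature importances can be sketched as follows; the number of trees and the synthetic stand-in data are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in feature matrix and labels; in the pipeline these would be GLCM/LBP/HOG features.
X = np.random.rand(500, 24)
y = np.random.randint(0, 7, 500)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rf = RandomForestClassifier(n_estimators=200, criterion="gini", random_state=42)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
print("Gini importances (Eq. 15 aggregated per feature):", rf.feature_importances_)
```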
4.4.2 XGBoost
Boosting in its extreme form has been designated the name XGBoost. It is an ensemble tree technique that deploys a gradient descent architecture and applies the idea of boosting weak learners. Algorithmic enhancements and system optimizations have been performed on top of the gradient boosting machine (GBM) framework to produce XGBoost. Some of the system optimizations incorporated in XGBoost are:
• Hardware optimization: Enhancements have been included to utilize hardware resources efficiently. Internal buffers are apportioned to accumulate gradient statistics in a cache-aware manner, and available disk memory is used for "out-of-core" computations.
• Parallelization: Interchanging the order of the loops improves run time considerably, and parallel threads are employed for sorting. This switch results in a significant improvement in performance.
• Tree pruning: Unlike the greedy GBM framework, utilization of a depth-first approach with pruning considerably improves the computational performance.
The objective function that we require to minimize at iteration t is

L^{(t)} = Σ_{i=1}^{n} l(y_i, ŷ_i^{(t−1)} + f_t(x_i)) + Ω(f_t)    (16)
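A minimal sketch with the xgboost package follows; the learning rate, tree depth and number of boosting rounds are illustrative assumptions.

```python
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(500, 24)       # stand-in extracted features
y = np.random.randint(0, 7, 500)  # stand-in class labels (0..6)

clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1, n_jobs=-1)
clf.fit(X, y)                     # minimizes the regularized objective of Eq. (16)
print(clf.predict(X[:5]))
```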
4.4.3 CatBoost
CatBoost follows an exclusive version of gradient boosting. It applies an ordered boosting scheme and also supports categorical features. A categorical feature can only take a bounded set of values for a particular instance, and operating with categorical values is a consequential challenge in machine learning. In this regard, CatBoost has proved to be quite competent while examining data categories such as image, text and audio, and also data that are historical in nature (Table 1).
Table 1 Measures of accuracies monitored from experimentation in other algorithms

Feature           | Random forest classifier | XGBoost | CatBoost | SVM
GLCM              | 72.5                     | 75.2    | 74       | 77.5
LBP               | 78.8                     | 78.1    | 78.6     | 79
HOG               | 74.35                    | 76.35   | 77.05    | 75.7
Color histogram   | 77.5                     | 72.35   | 76.72    | 79.6
Categorical columns are encoded into a format like one-hot encoding, and CatBoost allots indices to such columns. A technique like mean (target) encoding is deployed in order to attenuate overfitting when the number of categories surpasses the maximum one-hot size. CatBoost then transmutes the category values into floating point or integer values by utilizing the equation

avg_target = (countInClass + prior) / (totalCount + 1)    (17)

where countInClass denotes how often the label value equals 1 for objects with the current categorical feature value, and totalCount denotes the total count of preceding objects whose categorical feature value is the same. This can also be denoted as

avg_target = ( Σ_{j=1}^{p−1} [x_{σ_j,k} = x_{σ_p,k}] · Y_{σ_j} + α · P ) / ( Σ_{j=1}^{p−1} [x_{σ_j,k} = x_{σ_p,k}] + α )    (18)
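With the catboost package, the categorical handling described above is largely automatic once the categorical columns are declared; the tiny synthetic frame and the parameter values below are assumptions for illustration.

```python
import pandas as pd
from catboost import CatBoostClassifier, Pool

# Tiny stand-in dataset with one categorical and one numeric feature.
df = pd.DataFrame({"color": ["red", "blue", "red", "green", "blue", "green"],
                   "value": [0.1, 0.9, 0.2, 0.8, 0.7, 0.3],
                   "label": [0, 1, 0, 1, 1, 0]})

train_pool = Pool(df[["color", "value"]], label=df["label"], cat_features=["color"])

model = CatBoostClassifier(iterations=50, depth=4, verbose=False)
model.fit(train_pool)  # ordered target statistics (Eqs. 17-18) are applied internally

test_pool = Pool(df[["color", "value"]], cat_features=["color"])
print(model.predict(test_pool))
```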
4.4.4 Support Vector Machine
Support vector machine is a supervised technique predominantly deployed for classification and regression in machine learning. The principal intention behind deploying this algorithm for a classification task is to determine the decision boundary, called a hyperplane, that best separates the classes in the given n-dimensional space. This assists in placing fresh unassigned data in the most accurate category. SVM principally sets a linear decision surface to separate the datasets and optimizes the margin on both sides by using a quadratic programming technique. In scenarios where the data are not linearly separable, the data are transformed into higher-dimensional spaces such as R^3, R^4, . . ., R^n. We have employed the Gaussian kernel, also known as the radial basis function (RBF) kernel, in our project mainly for this reason. In this kind of kernel, the function depends on the distance between two points. The below equation denotes the Gaussian kernel:

K(X_1, X_2) = exp(−γ ‖X_1 − X_2‖^2)    (19)
where ‖X_1 − X_2‖ indicates the Euclidean distance between X_1 and X_2, and γ is the kernel coefficient of the RBF kernel deployed in our project (Table 2).
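The scikit-learn equivalent of an RBF-kernel SVM is sketched below; the C and gamma values are illustrative rather than the tuned values of this work.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.rand(500, 24)       # stand-in feature vectors
y = np.random.randint(0, 7, 500)  # stand-in class labels

# The RBF kernel implements Eq. (19); scaling the features first usually helps the kernel.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X, y)
print(svm.predict(X[:5]))
```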
4.5 Results Precision indicates the percentage of observations predicted as positive that were determined to be actually positive out of all the positive predictions made.
Table 2 Measures of correctness monitored from experimentation in CNN

Classifier                | Precision | Recall | F1-score
Random forest classifier  | 0.82      | 0.83   | 0.82
XGBoost                   | 0.83      | 0.82   | 0.84
CatBoost                  | 0.83      | 0.84   | 0.83
SVM                       | 0.86      | 0.84   | 0.85
Recall denotes the percentage of actually positive observations that have been accurately identified out of the total observations belonging to the given class. The weighted mean of recall and precision, which accounts for both false negatives and false positives, is called the F1-score. We have evaluated the performance of four machine learning algorithms, namely random forest classifier, XGBoost, CatBoost and support vector machine, on features extracted using diverse algorithms such as GLCM, LBP, HOG, color histogram and an extended version of the Xception model. We perceive that the extended Xception model extracts features more effectively than the other feature extraction algorithms. We can also conclude that SVM performed comparatively better than XGBoost and random forest classifier when the features were extracted with GLCM, LBP and color histogram, whereas CatBoost performed better when the features were extracted utilizing the histogram of gradients (HOG). Our extended CNN model performed significantly better than the other feature extraction algorithms such as GLCM, LBP, HOG and color histogram. Overall, SVM performs better when the features are extracted employing the CNN, LBP or color histogram algorithms, and CatBoost performs comparatively better when the features are extracted with the HOG algorithm. When we look into the CNN-based features, we perceive SVM to perform better than the other algorithms.
5 Conclusion and Future Work The model showed satisfactory performance considering the modest hardware that we employed to train and test it. Despite the perceived performance, there are good prospects for enhancement when we consider the feature extraction techniques. As feature extraction consumes a large share of the computation time in the pipeline, this impediment makes it an arduous procedure to train the model on
the available hardware. Therefore, efficient alternatives that can run on a lightweight machine can optimize performance and aid us in coming up with a more lightweight design.
References 1. White R, Rigel DS, Friedman RJ (1991) Computer applications in the diagnosis and prognosis of malignant melanoma. Dermatol Clin 9(4):695–702 2. Ramteke NS, Jain SV (2013) ABCD rule based automatic computer-aided skin cancer detection using matlab. Int J Comput Technol Appl 4(4):691 3. Celebi ME, Iyatomi H, Stoecker WV, Moss RH, Rabinovitz HS, Argenziano G, Soyer HP (2008) Automatic detection of blue-white veil and related structures in dermoscopy images. Comput Med Imaging Graph 32(8):670–677 4. Ballerini L, Fisher RB, Aldridge B, Rees J (2013) A color and texture based hierarchical k-nn approach to the classification of non-melanoma skin lesions. In: Color medical image analysis. Springer, pp 63–86 5. Celebi ME, Kingravi HA, Uddin B, Iyatomi H, Aslandogan YA, Stoecker WV, Moss RH (2007) A methodological approach to the classification of dermoscopy images. Comput Med Imaging Graph 31(6):362–373 6. Blum A, Luedtke H, Ellwanger U, Schwabe R, Rassner G, Garbe C (2004) Digital image analysis for diagnosis of cutaneous melanoma. development of a highly effective computer algorithm based on analysis of 837 melanocytic lesions. Br J Dermatol 151(5):1029–1038 7. Maglogiannis I, Doukas CN (2009) Overview of advanced computer vision systems for skin lesions characterization. IEEE Trans inf Technol Biomed 13(5):721–733 8. Iyatomi H, Oka H, Saito M, Miyake A, Kimoto M, Yamagami J, Kobayashi S, Tanikawa A, Hagiwara M, Ogawa K et al (2006) Quantitative assessment of tumour extraction from dermoscopy images and evaluation of computer-based extraction methods for an automatic melanoma diagnostic system. Melanoma Res 16(2):183–190 9. Kamath VR, Varun M, Aswath S (2021) Facial image indexing using locally extracted sparse vectors. In: Advances in artificial intelligence and data engineering. Springer, pp 1255–1270 10. Bourouis A, Zerdazi A, Feham M, Bouchachia A (2013) M-health: skin disease analysis system using smartphone’s camera. Procedia Comput Sci 19:1116–1120 11. Codella N, Cai J, Abedini M, Garnavi R, Halpern A, Smith JR (2015) Deep learning, sparse coding, and SVM for melanoma recognition in dermoscopy images. In: International workshop on machine learning in medical imaging. Springer, pp 118–126 12. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444 13. Liao H (2016) A deep learning approach to universal skin disease classification. Department of Computer Science, CSC, University of Rochester 14. Godavarthi D, Aswath S, Mishra D, Jayashree R (2020) Analysing emotions on lecture videos using CNN and HOG (workshop paper). In: 2020 IEEE Sixth international conference on multimedia big data (BigMM). IEEE, pp 435–440 15. Kawahara J, BenTaieb A, Hamarneh G (2016) Deep features to classify skin lesions. In: 2016 IEEE 13th international symposium on biomedical imaging (ISBI). IEEE, pp 1397–1400 16. Alarifi JS, Goyal M, Davison AK, Dancey D, Khan R, Yap MH (2017) Facial skin classification using convolutional neural networks. In: International conference image analysis and recognition. Springer, pp 479–485 17. Aswath S, Godavarthi D, Das B (2020) Analysing conflicts in online football communities of reddit. In: 2020 International conference on emerging trends in information technology and engineering (ic-ETITE). IEEE, pp 1–6
18. Yu L, Chen H, Dou Q, Qin J, Heng PA (2016) Automated melanoma recognition in dermoscopy images via very deep residual networks. IEEE Trans Med Imaging 36(4):994–1004 19. Singla S, Veeramalai G, Aswath S, Pal VK, Namjoshi V (2021) 4 machine learning: an ultimate solution for diagnosis and treatment of cancer. In: Artificial intelligence for data-driven medical diagnosis. De Gruyter, pp 85–102
Chapter 11
Lossless Data Compression Method Using Deep Learning Rahul Barman, Sayali Badade, Sharvari Deshpande, Shruti Agarwal, and Nilima Kulkarni
1 Introduction Data is changing the entire world in today's era. An enormous amount of data is generated every day. According to a survey, by the end of 2025, 463 exabytes of data would be generated daily. Also, nine out of every ten people would be digitally active globally. The statistics clearly depict the rise of digitalization and the need for data. The greater the amount of data, the more will be the demand for storing it. This in turn increases the cost for storage. Modern computers can store a large number of files; however, the file size still matters. The smaller the size of files, the more files can be stored. To reduce the size of the files, data compression techniques are used widely. Data compression reduces the number of bits required to express the data in a compact format. It involves re-encoding the data using fewer bits than the actual representation. There are two major forms of data compression, namely "lossy data compression" and "lossless data compression". As the name suggests, lossy compression involves some loss of data during decompression. It can achieve much higher compression ratios; however, the quality of the file is compromised. On the other hand, lossless data compression ensures that there is no loss of data and that no information is actually removed. This type of data compression technique is often used when the quality of the file format needs to remain intact. Most businesses today rely on data compression in some way or the other due to the various benefits that this technique provides. Mainly, the storage capacity concerns have been resolved due to data compression. There are various file formats on which data compression can be applied such as text, images, video, audio, etc. Several predefined algorithms are used to compress different file formats. For instance,
Huffman encoding algorithms [1] have proved to be the best for text format compression. Lempel–Ziv Welch (LZW), [2] delta encoding, discrete cosine transformation (DCT) and discrete wavelet transform (DWT) are the most common techniques which are used for image/video compression. Few other advantages of compression are the reduced data transmission time and communication bandwidth. Also, data compression helps to achieve higher streaming speed even with lower bandwidth.
2 Literature Survey As described by James E. Fowler and Roni Yagel in the paper [3] "Optimal Linear Prediction for the Lossless Compression of Volume Data", a lossless data compression algorithm is presented which makes use of optimal linear prediction to derive benefit from the correlations between neighbouring samples. Volumetric data sets, which are used in biomedicine and other scientific applications, have been used. The algorithm combines two different methods, namely differential pulse-code modulation (DPCM) and Huffman coding. The accuracy obtained for a set of volume data files is 50%. Various types of images such as magnetic resonance imaging (MRI) images, computed tomography (CT) images and electron-density map data are used for compression purposes. Though this paper provides an efficient algorithm for lossless data compression, mostly only images can be compressed using this system. Also, machine learning has not been used in the system. In the paper [4] "A Lossless Data Compression Algorithm for Real-Time Database", 2006, the authors provide an effective and reliable method for lossless data compression of a real-time database. The algorithm is designed using two types of methods known as the LZW algorithm and the run length encoding (RLE) algorithm. Process data is given as input to the compression model, which first classifies it by characteristics. Later, different compression methods are designed for different types of data. Pretreatment approaches are implemented so as to improve the compression ratio of the model. The performance test depicted that the algorithm improved the real-time performance. Also, the efficiency of the model was increased while accessing the database. In spite of using pretreatment approaches to make the model accuracy better, the compression ratio can reach only up to a certain threshold value. Below the threshold value, the data cannot be compressed, thus making the system non-adaptive.
to be exchanged between two parties using this algorithm, then both the parties must share a common dictionary. Also, the system did not make use of the machine learning model. As described by Kodituwakku and Amarasinghe in the paper [6] “Comparison of Lossless Data Compression algorithms for text data”, various lossless data compression algorithms are available like RLE algorithm, Huffman encoding algorithm, arithmetic encoding algorithm, Shannon–Fano algorithm and LZW algorithm for data compression in lossless manners. The authors have made comparisons in terms of compression performances like space efficiency, time efficiency, type and structure of the source of the inputs. They also mention that the compression behaviour of data also depends on the fact that whether the algorithm used is lossy or lossless algorithm. The compression ratios, compression factor, saving percentage, compression time, entropy and code efficiency are discussed in the paper, and their performances are compared with each other. Various charts and tables are provided for the better understanding of the different algorithms for data compression. In the paper [7] “An Efficient Compression Algorithm (ECA) for text data”, 2009, the authors provide a reversible transformation technique to improve the data compression ability of the text data with an efficient level of security to the transmitted data. The proposed algorithm describes two steps involved in the algorithm— creating an intelligent dictionary of commonly used words in expected input files and encoding that input text data. The encoded file received is in ASCII character code form. It provides security from hackers as the data will only be known only when the dictionary is known to the hacker. The compression ratio, compression time and rate are compared with different BWT methods and provide the results along with the visual representation. So overall, the paper describes a text compression algorithm with some added security element into it. The paper [8] “RepoZip: A technique of lossless compression of document collection”, 2015, presents the idea of using RepoZip technique for compressing the already existing compression algorithm over the document collection like collection of OOXML documents or collection of PDF documents. The method is implemented by firstly exploiting the meta-data and content-level redundancies. To detect the redundancies, the Lempel–Ziv-Markov chain (LZMA) approach is used. The method extracts the content from the document and clusters them based on their meta-data type. The clusters are then converted into file type and used as input to the existing compression algorithm. The author compared the LZMA and RepoZip approach where it observed that the RepoZip approach gives much higher compression ratio than the LZMA. In the paper [9] “Neural Network technique for Lossless Image Compression Using X-Ray Images”, 2011, the authors Senthilkumaran and Dr. Suguna proposed a lossless data compression technique by using improved backpropagation neural network. The proposed system makes use of three matrices which are compression performance, compression ratio and transmission time to analyse and compare the existing Huffman coding algorithm. The authors achieve a higher compression ratio by using the proposed improved backpropagation neural network technique than
the existing Huffman coding algorithm. The proposed method is mainly applied on X-ray images to ensure the scope of the method in the medical field.
3 Proposed System The proposed system comprises a method of lossless data compression using machine learning. The model used is a sequence-to-sequence recurrent neural network (RNN) model for both compression and decompression. The sequence-to-sequence model can predict sequence data seen in text and images. It contains several long shortterm memory (LSTM) or gated recurrent unit (GRU) layers which keep a track of the context. This context helps in recreating the sequence needed for the output. The sequence-to-sequence model consists of two main parts, namely encoder and decoder. The encoder takes in the input data and returns a context vector. The context vector contains the hidden information from the input data which helps the decoder to generate the correct outputs. This context vector is fed to the decoder system as input. The decoder returns output sequence, hidden state and the context of the sequence. During the training phase, the model is trained on 50,000 training inputs wherein each training input contains 49 characters. To generate the output labels, each training input is transformed into its corresponding binary form. These binary forms are then grouped into seven binary digits and converted back into their decimal form giving the compressed data needed for output labels. The model can compress both text and image data. The text data is divided into batches of 49 characters which is then used for compression. For image data, the image vector is flattened and normalized by dividing each pixel by 255 and multiplied by 128. This flattened vector is divided into batches of 49 pixels and fed to the model for compression. The original data is converted into one hot vector and fed to the encoder model. The encoder model generates the context vector for the data. This context vector is used as input to the decoder model. The decoder model generates the required compressed data and is checked with the output labels to calculate the loss. The optimizer used for the model is RMSProp. The optimizer backpropagates through the model and minimizes the loss. For the decompression model, it takes the compressed data as the input and generates the decompressed data. The same sequence-to-sequence model is used for decompression. The compressed data is fed to the encoder which gives the context vector. This context vector is used as input to the decoder and outputs the decompressed data (Fig. 1). The model was trained on Google Colab which provides Tesla K80 GPU with 12 GB of GPU memory for 12-h training time. It also provides 16 GB of RAM. The programming language used for the model is Python along with the TensorFlow library. The server is a Python Flask chosen for its lightweight and easy integration of machine learning models.
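The encoder-decoder structure described above can be sketched in Keras as follows. The GRU width, the one-hot vocabulary size and the teacher-forcing input arrangement are assumptions for illustration: the chapter specifies the layer types (LSTM/GRU) and the RMSProp optimizer, but not the exact dimensions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB = 128   # assumed one-hot alphabet size
UNITS = 256   # assumed GRU width

# Encoder: consumes the one-hot original sequence and emits a context (state) vector.
enc_inputs = layers.Input(shape=(None, VOCAB))
_, enc_state = layers.GRU(UNITS, return_state=True)(enc_inputs)

# Decoder: generates the compressed symbol sequence conditioned on that context.
dec_inputs = layers.Input(shape=(None, VOCAB))
dec_seq, _ = layers.GRU(UNITS, return_sequences=True, return_state=True)(
    dec_inputs, initial_state=enc_state)
dec_outputs = layers.Dense(VOCAB, activation="softmax")(dec_seq)

model = models.Model([enc_inputs, dec_inputs], dec_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
model.summary()
```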
Fig. 1 System block diagram
4 Result Analysis and Discussion The proposed system achieves an accuracy score of 74% on the training data containing 50,000 training inputs. To increase the accuracy score, the model was trained on another batch of 50,000 training inputs. This increased the accuracy score to 86%. The training data used for the model is IMDB Movie Review dataset. To test the model on image data, CatsVsDogs dataset from Kaggle is used. The model can compress 13,704 characters into 8451 characters which is 43% less than the original size. Although the model gives high accuracy and high compression ratio, it takes a lot of time to compress and decompress large file sizes. The use of graphics processing unit (GPU) can decrease the time significantly. The below bar graph shows the comparison between different model accuracies. Compression models used are DPCM [3], RNN [10] and the proposed method which is the sequence-to-sequence model (Fig. 2). The model was tested on four review texts for the movies 3 idiots, Titanic, Avatar and Social Dilemma, each containing a variable number of characters, and the below graph shows the compression for each text document (Fig. 3). The paper [11]“Syntactically Informed Text Compression with Recurrent Neural Network” makes use of RNN for text file format compression. This is compared to the proposed system’s performance (Table 1). The paper [12]“Variable Rate Image Compression with Recurrent Neural Network” makes use of RNN for image file format compression. This is compared to the proposed system’s performance (Table 2).
Fig. 2 Line graph comparing different model accuracies
Fig. 3 Graph comparing the total number of characters before and after compression

Table 1 Comparison of model accuracies for text compression
Sr. no | Model used           | Model accuracy (%)
1      | RNN                  | 35.99
2      | Sequence to sequence | 86

Table 2 Comparison of compression ratio for image compression
Sr. no | Model used           | Compression (%)
1      | RNN                  | 10
2      | Sequence to sequence | 43
5 Conclusion On the basis of the review paper [13], it can be observed that various compression algorithms are not able to compress complex data as machine learning is not used in most of the systems. Also the existing systems are unable to compress multiple file formats within the same system. The proposed system present in the paper is a method of lossless data compression using machine learning. The use of machine learning ensures that the model keeps improving its performance over time with more and more data. Also, the model can easily deal with the complex data. The model uses a predefined algorithm which initially compresses the data up to a certain amount. The output of the predefined algorithm is then used as the output labels to the machine learning (sequence-to-sequence) model, and the original data is used as the input to the model. To decompress the compressed data, the compressed data is used as input and the original data is used as output label. The system can compress both text and image data within the same system. In the near future, any type of data (hybrid, audio, video, etc.) which can be represented as a sequence of numbers can be compressed by the proposed system.
References 1. Chi CH, Kan CK, Cheng KS, Wong L (1995) Extending huffman coding for multilingual text compression. In: Proceedings DCC ‘95 data compression conference 1995, pp 437. https://doi. org/10.1109/DCC.1995.515547 2. Sharma K, Gupta K (2017) Lossless data compression techniques and their performance In: International conference on computing, communication and automation (ICCCA), pp 256–261. https://doi.org/10.1109/CCAA.2017.8229810 3. Fowler JE, Yagel R (1995) Optimal linear prediction for the lossless compression of volume data. In: Proceedings DCC ’95 data compression conference, pp 458. https://doi.org/10.1109/ DCC.1995.515568 4. Huang W, Wang W, Xu H (2006) Lossless data compression algorithm for real-time database. In: 6th world congress on intelligent control and automation, pp 6645–6648. https://doi.org/ 10.1109/WCICA.2006.1714368 5. Franceschini RW, Mukherjee A (1996) Data compression using encrypted text. In: Proceedings of data compression conference—DCC ‘96, pp 437. https://doi.org/10.1109/DCC.1996.488369 6. Kodituwakku SR, Amarasinghe S (2010) Comparison of lossless data compression algorithms for text data. Indian J Comput Sci Eng 1 7. Jain A, Patel R (2009) An efficient compression algorithm (ECA) for text data. In: 2009 international conference on signal processing systems, pp 762–765. https://doi.org/10.1109/ ICSPS.2009.96 8. Sumanaweera DN, Doole FF, Pathiraja DP, DeshapriyaGGK, Dias G (2015) RepoZip: a technique for lossless compression of document collections. In: Moratuwa engineering research conference (MERCon) 2015, pp 330–335. https://doi.org/10.1109/MERCon.2015.7112368 9. Senthilkumaran N, Suguna J (2011) Neural network technique for lossless image compression using x-ray images. Int J Comput Electric Eng 3(2):1793–8163 10. Cox D (2016) Syntactically informed text compression with recurrent neural networks. arXiv:1608.02893[cs.LG] 11. Barman R, Deshpande S, Kulkarni N, Agarwal S, Badade S (2021) A review on lossless data compression techniques. Int J Sci Res Eng Trends 7(1):2395–566X ISSN (online)
Chapter 12
Comparative Study on Different Classification Models for Customer Churn Problem Anuj Kinge, Yash Oswal, Tejas Khangal, Nilima Kulkarni, and Priyanka Jha
1 Introduction When a consumer decides to discontinue utilizing your products or services, this is known as customer churn. When a client churns, there are often early warning signs that may have been discovered through churn research. Cost analysis of production and advertisements plays a vital role in a company's growth, and it becomes even more important as the company grows. During the initial growth phase of a company, it is easy to gain customers and the number of customers that are likely to churn is also comparatively low. However, as the company grows, it becomes more difficult to gain new customers, and in the meantime the customer churn rate grows higher. Contractual and non-contractual relationships are the two sorts of relationships that exist in the customer churn problem [1]. Companies can follow the exhaustive approach, that is, improving product quality and coming up with marketing advertisements to attract new customers. But getting new clients into a business is a more time-consuming and expensive process, as the companies have to spend more on marketing their product. Rather, a cost-effective technique would be to reduce the customer churn rate. This would significantly reduce advertising costs and help businesses make changes in their products/services based on users' needs. The pace at which customers switch from one company to another is the rate of loss. As a result, identifying the customer who is likely to leave the service is critical [2]. The main idea is to identify customers who are likely to churn for the given bank dataset. To do the same, it is required that the attributes of the data we have for a specific period are correctly identified. It is always a better approach for the bank (any
company or organization) to retain their existing customers rather than just trying to attract new customers. Once these customers are identified, the bank can offer services according to the reasons, usage or criteria that they value the most. In Fig. 1, we can see that any particular bank has some objective in using a customer churn predictive model. Some significant parameters are decided in order to build a predictive model which, upon further testing, has the ability to predict whether or not a customer is likely to churn. Based on the results, the bank can re-engage proactively with the customers who, as per the prediction, are likely to churn. Customer churn prediction is a classification issue, which means it finds the classification rules that help classify a record, whose label is unknown, into one of the specified classes [3]. This helps the business, company or organization provide better services according to the needs of the user and helps them retain their customers. The focus of this paper is to find out the customers that are likely to churn using various machine learning models and to make a comparative study among the models used.
Fig. 1 Customer churn process
Following are some of the popular hyperparameter tuning algorithms (a small grid search sketch is given after this list).
A. Grid search: Grid search is a method of exhaustively searching a manually defined portion of the hyperparameter space of a given algorithm. The algorithm then compares each model's score and keeps the best one. Cross-validation on several different folds with different hyperparameter combinations, to discover more reliable results, is a typical grid search addition [12].
B. Genetic algorithm: The genetic algorithm is a metaheuristic search method based on natural selection, genetics and evolution mechanisms. It is a search optimization methodology that simulates the role of genetic material in biological organisms [15]. Because individuals contain schemata, beneficial substructures that can be merged to generate fitter individuals, a small population of individual exemplars can successfully search a wide space. Genetic algorithms are also adaptable to their surroundings, since they operate in a changing environment [14].
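As a concrete illustration of grid search, the scikit-learn sketch below tunes a random forest with cross-validation; the parameter grid and fold count are illustrative assumptions, not the exact settings of this work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X = np.random.rand(400, 10)        # stand-in for the bank-churn features
y = np.random.randint(0, 2, 400)   # stand-in churn labels

param_grid = {"n_estimators": [100, 200], "max_depth": [4, 8, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```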
The following is a breakdown of the paper’s structure: The first section introduces the concept of a churn prediction process, and the second section offers a literature review on churn prediction models. In the third section, the working process is explained. The experimental results and analyses are addressed in the fourth section, followed by the conclusion in the fifth section.
2 Literature Survey An in-depth survey has been carried out to understand common methods used for customer churn prediction. Some of the selected literature papers are given in Table 1.
3 Proposed System 3.1 Data Set In this proposed work, we have used a data set named "Churn for Bank Customer" from the Kaggle website [11]. The original data set has 10,000 records. We have split the data set into two parts, the Training batch file and the Prediction batch file. The Training batch file consists of 8000 records and is used for training and validation. The features include Customer Id, Credit Score, Age, Tenure, Balance, the number of products bought by the customer, whether the customer has a credit card or not, whether the customer is an active member or not, the customer's estimated salary and Exit status. Exit status is the dependent feature, whereas all other features are independent features. The correlation between different variables is shown in Fig. 2.
Table 1 Literature survey

Author name | Method | Observation
He et al. [4] | The customer churn modelling is based on the SVM model, which is improved using the random sampling method | The prediction effect of the classification models was improved, and sampling ratios had an impact on the performance of the prediction results
Mishra and Reddy [5] | Various algorithms such as decision tree, naive Bayes classifier, support vector machine and random forest classifier were compared | Random forest classifier was observed to be performing the best among all of the classifiers. This classifier had low specificity, high sensitivity and 91.66% accuracy
Petkovski et al. [6] | This work covered various phases of analysis such as understanding the business, data cleaning and data preprocessing on a telecommunication data set; algorithms like naive Bayes classifier, C4.5, KNN and logistic regression were used | It was observed that logistic regression performed the best among all of the algorithms with 94.35% accuracy, whereas naive Bayes classifier, decision tree (C4.5) and KNN gave 85.23%, 91.57% and 90.59% accuracy, respectively
Tianqi Chen and Carlos Guestrin [7] | In this work, the XGBoost classifier is used, which is a scalable boosting algorithm. A greedy algorithm and an approximate algorithm were used to improve the efficiency | After comparing the XGBoost algorithm with pGBRT, it was observed that the XGBoost algorithm took less time for training as compared to the pGBRT algorithm
Anuj Sharma and Prabin Kumar Panigrahi [8] | In this work, artificial neural networks (ANNs) were used for classification in the customer churn problem | It was observed that medium-sized neural networks performed quite well with an accuracy of 92%
Ismail et al. [9] | The multilayer perceptron (MLP) algorithm was proposed to predict customer churn | After comparing the performance of MLP with logistic regression and multiple regression, it is observed that the MLP neural network was the best algorithm with an accuracy of 91.28%
Qureshi et al. [10] | In this work, regression analysis, decision trees, KNN and ANN were used to predict customer churn | The results reveal that decision trees are the most accurate classifier algorithm for detecting probable churners in the data set examined
Fig. 2 Correlation among different features
In the proposed work, the following classification models are used: A. B.
C.
Logistic regression: Logistic regression is a type of supervised learning that attempts to make predictions using techniques such as least mean square error. KNN: KNN is a classification model that discovers and groups the minimal K Euclidean distances from a given query. After that, a query is assigned to the class that appears the most in the set. SVM: Support vector machine (SVM) is a computer that uses a hyperplane to categorize different classes by maximizing marginal distance (the distance between the hyperplane and marginal plane). Pass-through support vectors and marginal planes are parallel to the hyperplane (support vectors are points of classes nearest to the hyperplane through which the marginal plane passes). The SVM kernel (which increases the dimensionality of data) will be employed in nonlinear classification.
158
D.
E.
F.
G.
H.
I.
J.
A. Kinge et al.
Naive Bayes: The Bayes Theorem is used by the naive Bayes algorithm to determine the conditional probability of a class given the query. For each class, the conditional probability is calculated, and the numbers are then normalized to 1. After that, the question is placed in the class with the highest conditional probability. Decision tree: The internal nodes (also known as decision nodes) of a decision tree classification model operate as test cases, while the leaf nodes represent the classes into which the current query is to be classified. The attributes/features are the decision nodes. The root node is the starting point for constructing a tree. Each attribute’s knowledge acquisition is calculated at each level (which is the difference between the entropy of the entire set and entropy of that particular feature). As a decision node, the attribute/feature with the biggest information gain at that level is chosen. The tree is parsed from top to bottom once the model is ready based on the query. Random forest: The data collection is divided into numerous sample datasets by random forest (in each such sample, a subset of rows and features is selected with replacement). A decision tree is then trained. Multiple decision trees result in trained models with minimal bias and variance, reducing overfitting. Once the model is complete, the given query is sent through each of the decision trees, and the question is classified by a majority vote. Stochastic gradient descent: Stochastic gradient descent is an iterative approach for minimizing the cost function by finding its local minima along the slope. Steps towards minima are taken at each iteration, with the step size determined by the product of learning rate and gradient (which is a partial derivative of the cost function with each feature). Artificial neural network: The artificial neural network (ANN) is a model that was developed to recreate the neuron network seen in the human brain. Here, input will be sent to the node, which will take the total of the input products and their associated weights and pass it on to the next node as output. These weights are determined through training on a set of data. The neural network is made up of several of these nodes stacked in layers. AdaBoost: AdaBoost is a boosting method that creates several base learners and adds sample weight to each record. A base learner is essentially a stump (decision tree of depth 1). For each feature, a decision tree is built, and the one with the lowest entropy is chosen as the base learner. The total error and stump performance will then be determined in order to update the sample weights. The weights of incorrectly predicted tuples will be increased, while the weights of correctly produced tuples will be reduced. A fresh data set will be produced based on modified sample weights, and new stumps will be trained. The operation is repeated once again. This is true for sequential classifiers with a finite number of steps. A query is sent to each of these base learners during categorization, and a majority vote is taken. .XGBoost: XGBoost is a variation of the gradient boosting technique that sequentially builds several decision stumps, each one reducing the mistakes of the one before it. This algorithm is written in such a way that it can operate in
parallel on all cores of the device, maximizing the amount of computing power available. Additionally, it employs a cache for the temporary storage of values during training; its rapid speed is due to these characteristics. Overall performance is improved by regularization (which prevents overfitting) and auto-pruning (which prevents the tree from growing beyond a particular depth). Internally, it is also capable of dealing with missing values.

CatBoost: CatBoost is another gradient boosting technique that has the benefit of being able to cope with heterogeneous data (image, text, audio). Instead of sequential trees, it uses oblivious trees, with the requirement that the same split condition is used at every level of a tree. This makes fitting schemes easier to understand and increases CPU efficiency. Regularization is used to avoid overfitting and to identify the best possible solutions.
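For illustration, the scikit-learn sketch below trains a few of the classifiers described above on a churn dataset and reports test accuracy. The file name, target column and assumption of purely numeric features are placeholders, not the authors' actual setup.

```python
# Hypothetical sketch: train several of the classifiers described above on a churn CSV.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("churn.csv")                 # placeholder file; assumed numeric features
X = df.drop(columns=["Exited"])               # placeholder target column name
y = df["Exited"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Distance/margin-based models benefit from feature scaling; tree ensembles do not need it.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic regression": (LogisticRegression(max_iter=1000), True),
    "KNN": (KNeighborsClassifier(n_neighbors=5), True),
    "SVM": (SVC(kernel="rbf"), True),
    "Random forest": (RandomForestClassifier(n_estimators=100), False),
    "AdaBoost": (AdaBoostClassifier(n_estimators=100), False),
}
for name, (model, needs_scaling) in models.items():
    Xtr, Xte = (X_train_s, X_test_s) if needs_scaling else (X_train, X_test)
    model.fit(Xtr, y_train)
    print(name, accuracy_score(y_test, model.predict(Xte)))
```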
In Fig. 3, we can see that our customer churn data set is inputted to the different machine learning models such as logistic regression, K-nearest neighbour (KNN), support vector machine (SVM), stochastic gradient descent (SGD), decision tree, random forest, artificial neural networks, AdaBoost, XGBoost and CatBoost. Some of these models require feature scaling before training, whereas bagging and boosting algorithms do not require feature scaling. Once the features are scaled, these models are trained. After training, we perform some hyperparameter tuning on these models using grid search and genetic algorithm. Finally, a model with the best testing accuracy is selected.
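A hedged sketch of the scaling plus grid-search step described above, using a scikit-learn pipeline so that scaling and the classifier are tuned together; the parameter values are illustrative, not the grid actually used in the paper.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Chain scaling and the classifier so grid search evaluates the whole pipeline.
pipe = Pipeline([("scale", StandardScaler()), ("clf", SVC())])
param_grid = {                    # illustrative values only
    "clf__C": [0.1, 1, 10],
    "clf__kernel": ["rbf", "linear"],
    "clf__gamma": ["scale", 0.01],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
# search.fit(X_train, y_train)    # reuse a train split such as the one in the previous sketch
# print(search.best_params_, search.best_score_)
```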
4 Experimental Results

4.1 Performance Analysis

In this work, we have used conventional classification algorithms such as logistic regression, K-nearest neighbour (KNN), support vector machine (SVM), stochastic gradient descent (SGD), decision tree and random forest, and the boosting algorithms AdaBoost, XGBoost and CatBoost. Figure 4 shows the accuracy of the conventional classifiers before hyperparameter tuning. Figure 5 shows the accuracy of the conventional classifiers after hyperparameter tuning using grid search. Figure 6 shows the accuracy of the conventional classifiers after hyperparameter tuning using a genetic algorithm. It was observed that the genetic algorithm performs better than grid search when it comes to hyperparameter tuning. Also, random forest was the best model among the conventional classifiers, giving 87% test accuracy. Boosting algorithms such as AdaBoost, XGBoost and CatBoost provide more efficient ways of building a classification model. These types of classification models are generally well tuned by default. So, hyperparameter tuning was not
Fig. 3 Proposed system
done on these algorithms. It was observed that the XGBoost classifier performed the best among the boosting algorithms which gave 95.25% test accuracy. Figure 7 shows the accuracy of the boosting algorithms.
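The genetic-algorithm tuning discussed above can be approximated with a small hand-rolled loop; the sketch below (random initialization, truncation selection, uniform crossover, simple mutation, and generated stand-in data) is only illustrative and is not the encoding or the operators used by the authors.

```python
import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # stand-in data

def random_individual():
    # A candidate solution is a small dictionary of hyperparameters.
    return {"n_estimators": random.choice([50, 100, 200, 400]),
            "max_depth": random.choice([3, 5, 8, None])}

def fitness(ind):
    # Fitness = mean cross-validated accuracy of a forest built with these hyperparameters.
    clf = RandomForestClassifier(**ind, random_state=0)
    return cross_val_score(clf, X, y, cv=3, scoring="accuracy").mean()

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(ind, rate=0.2):
    return random_individual() if random.random() < rate else ind

population = [random_individual() for _ in range(8)]
for generation in range(5):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[:4]                                  # keep the fittest half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print("best hyperparameters:", best, "cv accuracy:", round(fitness(best), 4))
```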
4.2 Evaluation Metrics To analyse the performance of the classification models, a confusion matrix is chosen. A confusion matrix is a classification metric that summarizes the prediction results of the classification model. For binary classification, it contains True Positive (TP),
Fig. 4 Accuracies before hyperparameter tuning
Fig. 5 Accuracies after hyperparameter tuning (grid search)
True Negative (TN), False Positive (FP) and False Negative (FN). True Positive (TP) implies that the customer has churned, and it is predicted as churned. True Negative (TN) implies that the customer has not churned, and it is predicted as not churned. False Positive (FP) implies that the customer has not churned, but it is predicted as churned. False Negative (FN) implies that the customer has churned, but it is
Fig. 6 Accuracies after hyperparameter tuning (genetic algorithm)
Fig. 7 Accuracies of boosting algorithms
predicted as not churned.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
In Table 2, we can see that among all classification models, the XGBoost model has the best metric values. Hence, it was selected as the best model for the customer churn problem. Using True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN), a confusion matrix is created. Figure 8 shows the confusion matrix of the XGBoost model for prediction data.
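The confusion matrix and accuracy described here can be computed directly with scikit-learn; in the sketch below the label vectors are toy placeholders, not the paper's prediction data.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# 1 = churned, 0 = not churned (toy placeholder labels)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels {0, 1}, ravel() returns TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp, accuracy, accuracy_score(y_true, y_pred))
```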
Table 2 Comparison of models based on their accuracies

Classifier          | Accuracy (%)
Logistic regression | 83.19
KNN                 | 84.75
SVM                 | 85.93
Naive Bayes         | 77.25
Decision tree       | 86.25
Random forest       | 87
SGD                 | 82.81
AdaBoost            | 86.06
XGBoost             | 95.25
CatBoost            | 91.06
ANN                 | 85.68

Bold indicates the classifier with the highest accuracy
Fig. 8 Confusion matrix of XGBoost classifier
5 Conclusion

Many banking and finance companies have trouble predicting which customers are likely to abandon them. Churn prediction is a key problem in the banking and finance industry that has recently attracted the attention of numerous research scholars. In this research paper, we have done a comparative study of customer churn prediction using various machine learning classification models. When compared to other models, the experimental findings reveal that XGBoost is the best classifier for the churn prediction problem in terms of all performance measures, with 95.28% train accuracy and 95.25% test accuracy. Every customer expects the service provider to give good service or reward points. Enabling prompt services for loyal clients is a more difficult challenge because identifying genuine customers for the organization is itself extremely challenging. By forecasting customer behaviour, early churn prediction helps prevent a company's loss.
Chapter 13
An E-healthcare System Using IoT and Edge Computing Nitish Gupta, Ashutosh Soni, Yogesh Kumar Gond, and Dinesh Kumar
1 Introduction

People who are infected or suspected of being infected with the corona virus are currently either undergoing care in a hospital or being monitored at home via phone calls with healthcare professionals, or are themselves reporting their clinical data. In the wake of COVID and related diseases, quality doctors cannot be physically present to treat patients, both for their own protection and to prevent the spread of the virus by keeping with self-isolation norms. Therefore, due to the lack of adequate monitoring and reporting facilities, the required treatments are not provided to patients in real time. For patients under quarantine or undergoing any other treatment, doctors may not have a way of automatically documenting and evaluating symptoms and critical health records, or of monitoring health progression. Edge computing, as a supporting structure between end devices and cloud data, aims to move cloud resources and services to the network's edge, allowing for faster service response [1]. In [2], it has been noted that edge computing and IoT capabilities can be combined to develop a new ecosystem that helps advance the information and communication fields. IoT, together with edge computing, provides an excellent opportunity for regionalized smart applications [3]. As a result [4], the strain on data transmission through the network's backhaul is significantly decreased. We managed to create a proof-of-concept (POC) for an e-healthcare system that integrates and visualizes clinical user data using IoT and edge computing. Measurements, such as temperature, are taken using sensors attached to an Arduino (a microcontroller) that help medical practitioners to better understand the symptoms of an infection. To reduce latency, the data are then sent to and filtered on an edge unit (here, a Raspberry Pi).
N. Gupta · A. Soni · Y. K. Gond · D. Kumar (B) Computer Science And Engineering Department, MNNIT Allahabad, Prayagraj, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_13
The processed data are then submitted to a cloud platform for analysis and the generation of graph-based reports that help healthcare professionals optimize their medication process. On the other hand, patients can receive timely and relevant care.
2 Objectives In this work, an edge and IoT-based e-health monitoring system has been developed that aims to assist healthcare professionals by providing them a medium to remotely monitor the patients under their supervision in real time from anywhere in the world. This enables them to provide timely medication as per requirement. This also keeps them protected from getting caught in the pandemic and also allows in-home patient treatment thus lowering the burden on the hospital resources. We have set up our test bed on temperature sensor LM35 coupled with Arduino and Raspberry Pi 3 as our edge device along with Apache Edgent running on top of it. Further, the analysis of data is done on the IoT cloud platform powered by MATLAB, known as ThingSpeak. Our setup has focused and worked on the following: – Configuration and deployment of Raspberry Pi as the edge-IoT system. – Discovering Apache Edgent’s interface groups to communicate with back end of our external systems. – Establish the proof-of-concept test bed for IoT-based edge services. – Minimize data transmission to the IoT cloud analytics server.
3 Material Used This work uses various hardware and software resources as mentioned below:
3.1 Hardware Components

Temperature Sensor LM35: The LM35 is a standard industry temperature measurement chip, used here for body temperature measurement in our proof-of-concept.

Arduino UNO: A microcontroller board based on the ATmega328P integrated circuit. It handles communication and the flow of information from the LM35 to the EH-IoT system.

Raspberry Pi: It supports the development of an Edgent-based edge-IoT system and is used for the deployment of local edge computing in this experiment.
3.2 Software Components

Apache Edgent: A tool for open-source data analysis at the edge in IoT scenarios. In this study, its main purposes are:
1. Reducing the amount of data transferred to the centralized IoT cloud analysis server.
2. Reducing the volume of data stored in the IoT cloud repository.

Raspbian OS: The Edgent instance on top of the Raspberry Pi 3 is hosted on Raspbian OS, which this study uses for the seamless operation of EH-IoT edge analysis.

ThingSpeak: ThingSpeak offers an effective and simple IoT cloud analytics platform that meets the requirements to analyze, view and store the sensed data.
4 Implementation

4.1 Collecting Sensor Data via Arduino

Temperature Data using LM35: The LM35 is a sensor for the temperature measurement range of −55 to 150 °C. It is a three-terminal device that outputs an analog voltage directly proportional to temperature. The schematic diagram is shown in Fig. 1. A microcontroller can convert the output analog voltage into digital form with an ADC. From the sensor's datasheet, the LM35 output is calibrated in Celsius: the output rises by 10 mV for every 1 °C increase in temperature and reaches at most about 1.5 V, well below the Arduino's 5 V ADC reference. We have written the code for the Arduino accordingly.
Fig. 1 Schematic diagram
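The Arduino sketch itself is written in C/C++; the Python snippet below only illustrates the conversion arithmetic described above, assuming a 10-bit ADC with a 5 V reference (both assumptions, matching a stock Arduino UNO) and a placeholder raw reading.

```python
def adc_to_celsius(adc_value, v_ref=5.0, adc_max=1023):
    """Convert a raw 10-bit ADC reading of the LM35 output to degrees Celsius."""
    volts = adc_value * v_ref / adc_max   # ADC count -> voltage
    return volts * 100.0                  # LM35: 10 mV per degree Celsius

def celsius_to_fahrenheit(c):
    return c * 9.0 / 5.0 + 32.0

reading = 75                              # example raw ADC value (placeholder)
c = adc_to_celsius(reading)
print(round(c, 1), "C =", round(celsius_to_fahrenheit(c), 1), "F")
```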
4.2 Setting up the Edge Device

Our edge device, a Raspberry Pi 3, is a low power-consuming device that runs Raspbian OS. Here, we need to perform preprocessing, namely filtering of the data coming from the Arduino. This is done with the help of Apache Edgent running on top of Raspbian OS. We referred to an article [5] for the installation of Raspberry Pi OS and for setting up Apache Edgent on it. Raspberry Pi Imager is an imaging utility from the Raspberry Pi Foundation used to write supported operating systems to the microSD card, which can then be used to boot up the Pi.
4.3 Apache Edgent

Edgent is an open-source programming model and runtime that allows you to analyze streaming data on edge devices. By analyzing at the edge, you can:
– Reduce the amount of data you transmit to your server for analytics
– Decrease the amount of data you store

Edgent speeds up application development [2] for pushing data analytics and machine learning to edge devices. These include edge equipment, sensors, appliances or vehicles that are connected to a network, such as routers, gateways and computers. Edgent applications process data locally, for instance in a car, on an Android phone or on a Raspberry Pi, before sending it over a network. For instance, if your device takes temperature readings 1,000 times per second from a sensor, it is more efficient to process the data locally and send only interesting or unexpected results over the network.

Setting up Apache Edgent on Raspberry Pi: The Edgent binary release can be installed from its GitHub repository [6]. Apart from the binary, it requires Java 8 for execution and Gradle for building the files. We followed an IBM article for installation and build on the Raspberry Pi, a glimpse of which is shown in Fig. 2.

Working of Apache Edgent: A stream is the fundamental building block of an Edgent-based implementation: a continuous sequence of tuples (messages, events, sensor readings, and so on). The Edgent API can process or analyze each tuple as it appears on a stream, producing a derived stream. Source streams are streams that generate data, such as readings from a device's temperature sensor, for analysis. Streams are terminated using sink functions that, for example, monitor local devices or send information to external services, such as central analytics systems, via a message hub. The primary API of Edgent is functional, where streams are sourced, transformed, analyzed or sinked through functions, typically expressed as Java 8 lambda expressions [6].
Fig. 2 Apache Edgent running on Raspberry Pi
The structure of basic Edgent applications is the same: – Find a provider. – Build the topology and the processing graph for it. – Send the topology to be executed. Multiple topologies can be created, started, and shut down by means of external application in more sophisticated applications.
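Edgent itself is a Java library, so its provider and topology API is not reproduced here; the Python generator pipeline below is only a language-neutral sketch of the same source, filter and sink idea, with a simulated sensor and a placeholder threshold.

```python
import random
import time

def temperature_source(samples=10):
    """Source: poll a (simulated) temperature sensor periodically."""
    for _ in range(samples):
        yield 95.0 + random.random() * 6.0    # placeholder reading in Fahrenheit
        time.sleep(0.1)

def is_unexpected(reading, low=96.0, high=99.5):
    """Filter predicate: keep only readings outside the assumed normal range."""
    return reading < low or reading > high

def sink(reading):
    """Sink: here we just print; the real system would forward to the cloud."""
    print("forwarding", round(reading, 2))

for value in temperature_source():
    if is_unexpected(value):
        sink(value)
```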
4.4 ThingSpeak

ThingSpeak is an IoT analytics framework that allows you to aggregate live cloud data streams, view them, and analyze them. You can send data to ThingSpeak from your devices, instantly display live data, and send alerts. The three main thumb rules behind ThingSpeak's work are as follows:
1. Collect—send the sensor data to the cloud privately.
2. Analyze—analyze the data with MATLAB and visualize it.
3. Act—cause a response.
Setting up a ThingSpeak Channel: ThingSpeak is a platform owned by MathWorks, the parent organization of MATLAB. To use ThingSpeak, we created a new account on MathWorks, which is available for free usage. Channels store all data collected by ThingSpeak. Each channel has eight data fields, plus three for location data and one for status data. Once data have been collected in a channel, ThingSpeak applications can be used to analyze and visualize them. We have set up a temperature channel to visualize temperature data, as shown in Fig. 3.

MQTT and ThingSpeak: MQTT is a publish/subscribe architecture designed primarily to connect bandwidth- and power-constrained devices over wireless networks. The protocol is lightweight and simple and runs on TCP/IP or WebSockets. Over WebSockets, MQTT can be secured with SSL. The publish/subscribe architecture allows the broker to push messages continuously to the client device.

The MQTT broker is the main point of contact and routes all messages between senders and recipients. Any device connecting to the broker can publish or subscribe to informative topics; a topic contains the routing data for the broker. Every client that wants to send messages publishes them on a particular topic, and every client that wishes to receive messages subscribes to a specific topic. The broker then delivers all messages to the relevant clients. ThingSpeak has an MQTT broker at the URL mqtt.thingspeak.com and port 1883. The ThingSpeak broker supports both MQTT publish and MQTT subscribe, as shown in Fig. 4.
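A hedged sketch of publishing one field value to the ThingSpeak broker with the Eclipse Paho client is shown below. The channel ID, write API key and topic layout are placeholders that must be replaced with the values from your own channel, and newer ThingSpeak deployments may require device credentials and a different hostname.

```python
import paho.mqtt.publish as publish

CHANNEL_ID = "123456"            # placeholder
WRITE_API_KEY = "XXXXXXXXXXXX"   # placeholder

def publish_temperature(celsius):
    # Assumed topic layout for the ThingSpeak broker described in the text.
    topic = f"channels/{CHANNEL_ID}/publish/{WRITE_API_KEY}"
    payload = f"field1={celsius:.2f}"
    publish.single(topic, payload, hostname="mqtt.thingspeak.com", port=1883)

publish_temperature(37.1)
```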
Fig. 3 Setting up temperature channel in ThingSpeak
The MQTT broker is the main point of contact and sends all messages between senders and recipients. Any device connecting to the broker can publish or subscribe to the informative subjects. This subject contains the routing data of the broker. Every client who wants to send messages publishes them on a particular subject, and every customer who wishes to receive messages registers for a specific topic. All messages will be sent by the broker to the relevant clients. ThingSpeak has an MQTT broker at the URL mqtt.thingspeak.com and port 1883. The ThingSpeak broker supports both MQTT publish and MQTT subscribe as shown in Fig. 4. MATLAB Analysis The MATLAB is MathWorks’ programming language and multichannel computing environment. MATLAB enables the manipulation of the matrix, function tracking, data execution, algorithms, user interface, and other language programs interface. With the ability to use MATLAB code in ThingSpeak, one can analyze the live stream of data as it gets collected in a channel.
(a) MQTT Publish
(b) MQTT Subscribe
Fig. 4 MQTT publish and MQTT subscribe
4.5 Working of Algorithms The workflow of our algorithms for various stages of data flow is described via the flowcharts as shown in Figs. 5 and 6. Data from Arduino to Raspberry PI to Edgent
Fig. 5 Data source and data sink
Fig. 6 Filtering mechanism
Data Filtering in Edgent
5 The Detailed Experimental Set-up

Our demo configuration starts with sensing body temperature and transferring it to the Arduino as the data source. The raw data are transmitted to a Raspberry Pi 3 edge-IoT device based on Edgent, which performs the filtering requirement, i.e., only temperature information in the range of 96 °F to 99.5 °F is considered useful; the filtered data are then transmitted to the IoT cloud server (ThingSpeak) for MATLAB analysis.
Fig. 7 Actual image of our EH-IoT setup
In particular, JSerialComm, a Java library, handles the data transmission and serial port selection on the Raspberry Pi. The USB port is used to connect the Arduino UNO with the Raspberry Pi, and the A1 analog pin of the Arduino is connected to the LM35 sensor output. The Eclipse IDE on the Raspberry Pi 3 runs a Java program that includes the main method, with the JSerialComm library managing serial communication between the Arduino and the Java program. When the main Java program runs, it creates an instance of the DataSource program to collect data from the Arduino and feed it to the main program. Useful data must then be filtered out for cloud transmission. The filter method is applied to TStream objects, where a filter predicate, expressed as a lambda expression, determines which tuples to retain for further processing. Through these libraries and platforms, we have produced the result our work seeks to achieve. The setup is shown in Fig. 7.
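The paper uses the JSerialComm Java library for serial I/O; for comparison, a minimal Python sketch with pyserial is shown below. The port name and baud rate are assumptions that depend on how the Arduino is connected and programmed.

```python
import serial  # pyserial

# Assumed serial port and baud rate for the USB-connected Arduino.
with serial.Serial("/dev/ttyUSB0", 9600, timeout=2) as port:
    for _ in range(5):
        line = port.readline().decode("ascii", errors="ignore").strip()
        if line:
            print("raw reading from Arduino:", line)
```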
6 Results

The results obtained in our test demo setup are displayed in the following figures.
6.1 Raw Temperature Data Sensed via Arduino The temperature data sensed via the LM-35 sensor are outputted on the serial monitor of the Arduino IDE as shown in Figs. 8 and 9.
Fig. 8 Actual image of interfacing LM35 with Arduino
6.2 Data Filtering using Edgent

The outliers sensed via the LM35 sensor are filtered out, and only the relevant and useful data are sent to the cloud for analysis. The filtered output is shown in Figs. 10 and 11.
6.3 MATLAB Analysis The graph in Fig. 12 depicts the aggregate of 20 temperature readings in degree Celsius taken at an interval of 1 minute. This is a sample processing of temperature data using MATLAB in-built mathematical algorithms.
7 Analysis

Our general analysis gives promising results for our test scenarios. Our application works on any device with Java version 8. The normal temperature range for a healthy human body is 96 °F to 99.5 °F, while outside this range the body is regarded as abnormal or in a state of fever. The sensor readings are processed
Fig. 9 Sensed data from LM35 on Arduino monitor
Fig. 10 Filtered temperature data using Apache Edgent
Fig. 11 Channel on ThingSpeak setup for temperature analysis
Fig. 12 Temperature (Celsius) versus time graph generated using MATLAB on ThingSpeak
by the edge device itself when the patient's body temperature is normal, i.e., such data are not uploaded to the cloud. In the other case, when the temperature readings go beyond the range, the data are uploaded to the cloud. This study on edge IoT aims to reduce the workload and bandwidth demands on the cloud, and the way to significantly reduce bandwidth is to send only selected data values to the cloud.
8 Conclusion

In IoT applications where instant data processing is needed, edge computing plays an important role [7]. A new case study was proposed and developed using the Edgent platform. We were able to successfully set up our edge-based e-healthcare system physically with continuous support from our mentor. Our novel EH-IoT concluded with results supporting our claim of latency reduction and reduced network traffic. We explored a plethora of new technologies such as Apache Edgent, Arduino, Raspberry Pi and ThingSpeak (IoT cloud), and we were successful in integrating these technologies to devise a useful and scalable solution for e-healthcare.
References
1. Bilal K et al (2018) Potentials, trends, and prospects in edge technologies: fog, cloudlet, mobile edge, and micro data centers. Comput Netw 130:94–120. https://doi.org/10.1016/j.comnet.2017.10.002
2. Ray PP et al (2019) Edge computing for internet of things: a survey, e-healthcare case study and future direction. J Netw Comput Appl 140:1–22. https://doi.org/10.1016/j.jnca.2019.05.005
3. Joshi SA et al (2018) Home automation system using wireless network. In: Proceedings of the 2nd international conference on communication and electronics systems, ICCES 2017. https://doi.org/10.1109/CESYS.2017.8321195
4. Kumar D et al (2021) IoT services in healthcare industry with fog/edge and cloud computing. In: IoT-based data analytics for the healthcare industry, pp 81–103, Academic Press. https://doi.org/10.1016/b978-0-12-821472-5.00017-x
5. Setting up Apache Edgent on my Raspberry Pi 3—IBM Developer Recipes, https://developer.ibm.com/recipes/tutorials/setting-up-apache-edgent-on-my-raspberry-pi-3/
6. Edgent Incubation Status—Apache Incubator, https://incubator.apache.org/projects/edgent.html
7. Singh M, Baranwal G (2018) Quality of service (QoS) in internet of things. In: Proceedings—2018 3rd international conference on internet of things: smart innovation and usages, IoT-SIU 2018. https://doi.org/10.1109/IoT-SIU.2018.8519862
8. Kumar A et al (2020) NearBy-offload: an android based application for computation offloading. In: 2020 IEEE 15th international conference on industrial and information systems, ICIIS 2020—proceedings, pp 357–362. https://doi.org/10.1109/ICIIS51140.2020.9342724
Chapter 14
Autism Detection Using Machine Learning Approach: A Review C. Karpagam
and S. Gomathi a Rohini
1 Introduction

The future well-being of a society predominantly lies in the health and education of its children, and mental wellness belongs in the same row. Proper support and attention are urgently required for children suffering from mental illness. Here, we focus on autism, one such disorder that affects mental ability and communication at an early stage. The main purpose of this work is to identify the significant improvements made in recent years and to present a detailed review of autism disorder detection using the machine learning approach. Taking this as our primary finding, we will expand our research by concentrating on implementation in the future. The contents of this paper are organized as follows: Sect. 1 presents the introduction. Section 2 is about autism spectrum disorder and the importance of applying machine learning techniques to detect it at the earliest stage. Section 3 presents the theoretical background concluded by researchers on reviewing research articles. Section 4 gives a detailed review of the various methodologies applied in autism detection using the ML approach; the dataset, classifier, outcome and various other particulars of each article are listed in Tables 1, 2 and 3, and a summary of each year's improvements is appended. Section 5 illustrates and discusses the most popular algorithmic techniques applied in the research world and the collection of papers considered for the review. Section 6 ends with the conclusion and future study.
C. Karpagam (B) Department of Computer Science, Dr. N. G. P. Arts and Science College, Coimbatore, India e-mail: [email protected] S. G. Rohini Department of Computer Applications, Sri Ramakrishna College of Arts and Science College, Coimbatore, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_14
Table 1 Particulars of machine learning approaches in ASD research

S. no | Author | Year | Classifiers | Dataset | Remarks/Accuracy | References
1 | Matthew et al. | 2016 | RF | 1162 instances from Georgia ADDM site | 86.50% | [37]
2 | Wenbo et al. | 2016 | SVM | Eye movements of children | 88.51% | [38]
3 | Fuad et al. | 2016 | KNN, SVM, BPNN; combiner methods: bagging, RSM and FSC | Gene data—CGH data obtained from five chromosomes | NN classifier when combined using FSC gives better accuracy | [39]
4 | Jianbo et al. | 2016 | NLP | Semi-structured and unstructured data collected from medical documents | 83.40% | [40]
5 | Sebastien et al. | 2017 | DT, RF, LR, AB and 13 more | Phenotype data from public data repository | Obtained ROC curve is greater than 0.9 with limited features | [16]
6 | Duda et al. | 2017 | ENet, LDA | Anonymous responses obtained using crowdsource recruitment method | ENet and LDA models exhibited the best classification performance | [41]
7 | Baihua et al. | 2017 | SVM, NB | 40 kinematic measures from 8 different imitation conditions were analyzed | 86.7% by applying SVM | [42]
8 | Altay et al. | 2018 | LDA, KNN | 292 samples with 19 different attributes collected using a mobile application | 90.8% accuracy by applying LDA | [43]
9 | Ayse | 2018 | SVM, KNN, RF | 104 adolescent data from UCI machine learning repository | 100% accuracy using RF | [44]
10 | Mofleh | 2018 | FURIA | 509 instances and 24 variables using ASD screening mobile application | 91.40% | [45]
11 | Murat | 2018 | NB, LR, RF, polynomial SVM, linear SVM, KNN | lncRNA gene dataset from BrainSpan Atlass | The proposed Bayes network model can handle the unlabeled data effectively with 78.31% of accuracy | [18]
2 Background Study on ASD and ML

Autistic children face difficulty in communicating and interacting with individuals socially [1]. Children with autistic traits also find it difficult to adapt to new or unfamiliar situations. Interpreting communication and responding fluently is another challenge for them. Other problems are delayed speech, poor eye contact, lack of effort in problem-solving, and restrictive and repetitive behaviour. Autism is a spectrum disorder, which means the severity level differs from one child to another. The levels are commonly classified as follows [2], or, simply stated in layman's terms, as mild, average and severe.
– Requires support
– Requires substantial support
– Requires very substantial support

Autism begins in early childhood and eventually continues by disrupting the whole life span of the individual. Common causes influencing autism are environmental factors, genetics, children born to older parents and a few more under consideration [3]. David [4] examined various risk factors influencing ASD, namely maternal infection, maternal antibodies, drugs used during pregnancy and postnatal factors. As the exact cause of autism is still unknown [5], the ongoing research process aims to identify the prominent reasons.

Systematic screening tools are designed to detect autism, and they provide valuable results during an assessment. Tanja et al. [6] identified 46 diagnostic instruments in ASD and specified the particulars of each in detail. In general, the diagnostic tools vary depending on the age group, i.e., toddler, child, adolescent or adult. The outcome of the assessment guides the psychologist to begin the therapy session. When the child has not exhibited a developmental milestone at a particular age, that is the right time to start diagnosis. Commonly, children are diagnosed at the age of 23–210 months [7], and earlier screening than this age limit might cause inaccuracy in prediction.

Conventional methods like a blood test or physical examination will not help identify autistic traits [8]. Clinical diagnosis using authenticated tools by a trained physician is the only remedy to detect autism. But standard methods and procedures
Table 2 Particulars of machine learning approaches in ASD research

S. no | Author | Year | Classifiers | Dataset | Remarks/Accuracy | References
12 | Vaishali et al. | 2018 | NB, J48 DT, SVM, KNN, MLP | 292 instances from UCI ML repository | 97.95% accuracy with SVM | [46]
13 | Victoria et al. | 2018 | LR classifiers | 71 participants' eye movements were recorded using an eye tracker | 74% | [47]
14 | Erkan et al. | 2019 | KNN, SVM, RF | UCI database | 100% accuracy with RF | [11]
15 | Mariia et al. | 2019 | BN | Dataset was obtained from its respective author | Results displayed in the form of probabilistic measure | [48]
16 | Tania et al. | 2019 | FDA, C5.0, GLMBoost, LDA, MDA, PDA, SVM and CART | 2009 records from Kaggle and UCI ML repository | 98% accuracy with SVM | [49]
17 | Elizabeth et al. | 2019 | GMM, HC | 2400 samples from 7822 clinical records were used from a de-identified archival database | Analysis found a hierarchy of five distinct subgroups representing different degrees of severity across developmental domains | [50]
18 | Abdullah et al. | 2019 | RF, LR, KNN | 704 instances and 20 features | 97.541% accuracy by applying LR with 13 selected features | [51]
19 | Milan et al. | 2019 | KNN, nonlinear SVM, DT, LR, SSAE, RF | 851 samples from Autism Brain Imaging Data Exchange (ABIDE) | Neural network model outperformed others with accuracy of 62% | [52]
20 | Thabtah et al. | 2019 | Rules machine learning | Data collected by a mobile application—ASDTests during Sep 2017 and Jan 2018 | The proposed model exhibited high accuracy compared to other ML techniques used | [14]
Table 3 Particulars of machine learning approaches in ASD research

S. no | Author | Year | Classifiers | Dataset | Remarks/Accuracy | References
21 | Thabtah et al. | 2019 | LR | Using a mobile application—ASDTests | | [53]
22 | Lee et al. | 2019 | SVM, LSA, NB–SVM, MNB, RF and 3 more | Evaluations written by clinical experts during the years 2006, 2008 and 2010 | RF and NB–SVM achieved more than 87% mean accuracy | [54]
23 | Luke et al. | 2019 | FNN | M-CHAT-R data of 14,995 toddlers (age 16–30 months) | Crossing 99% accuracy on the different subgroups divided | [55]
24 | Suman et al. | 2020 | NB, SVM, LR, KNN, NN, CNN | UCI repository | Accuracy of 99.53%, 98.30%, 96.88% on 3 datasets using CNN | [56]
25 | Jaber et al. | 2020 | CBA, CMAR, MCAR, FACA, FCBA, ECBA, WCBA | 21 attributes to cover 704 instances from the adult autism UCI dataset | WCBA outperformed all others with 97% accuracy | [57]
26 | Erina et al. | 2020 | KNN, SVM, RF, backpropagation, DL | Two datasets for children and adolescents—UCI repository | RF algorithm with the highest performance | [58]
27 | Maitha et al. | 2020 | DT, NN, NB, SVM, ADB and more | Data was collected through a mobile application | NN model with 99% accuracy | [59]
28 | Wingfield et al. | 2020 | SVM, XGBoost, AB, CVB, NN, RF, NB, RF–GBM | 228 samples collected using the Pictorial Autism Assessment Schedule (PAAS) checklist | RF was the optimal classifier with 98% accuracy | [12]
29 | Mary et al. | 2020 | SVM, RF, AB | 1054 records of the children dataset from 12 to 36 months | AB gains more accuracy | [60]
30 | Charlotte et al. | 2020 | SVM | Confidential dataset | A reduced subset of five behavioral features showed good specificity (83%) and sensitivity (71%) | [61]

Abbreviations of terms used: support vector machine (SVM), random forest (RF), naïve Bayes (NB), logistic regression (LR), K-nearest neighbor (KNN), neural network (NN), backpropagation neural network (BPNN), random subspace method (RSM), feature selection-based combiner (FSC), comparative genomic hybridization (CGH), natural language processing (NLP), receiver operating characteristic (ROC), decision tree (DT), AdaBoost (AB), ElasticNet (ENet), linear discriminant analysis (LDA), fuzzy unordered rule induction algorithm (FURIA), multi-layer perceptron (MLP), Bayesian network (BN), flexible discriminant analysis (FDA), boosted generalized linear model (GLMBoost), mixture discriminant analysis (MDA), penalized discriminant analysis (PDA), classification and regression trees (CART), Gaussian mixture model (GMM), hierarchical clustering (HC), stacked sparse auto-encoder (SSAE), latent semantic analysis (LSA), naive Bayes–SVM (NB–SVM), multinomial naive Bayes (MNB), feedforward neural network (FNN), convolutional neural network (CNN), classification based on multiple class association rules (CMAR), classification based on association rules (CBA), fast associative classification algorithm (FACA), enhanced CBA (ECBA), weighted classification based on association rules (WCBA), fast classification based on association rules algorithm (FCBA), deep learning (DL), CV boosting (CVB), adaptive boosting (ADB)
are time-consuming [9] because answering a checklist or questionnaire, scoring and evaluation take time to interpret the results.

One in 68 children is affected [10]. According to the WHO's statistical data, 0.63% of children are diagnosed with autism spectrum disorder [11]. Autism affects approximately 3.8 children per 1000 in the UK [12], 1 in 55 in Japan, 1 in 65 in Ireland, 1 in 94 in Canada, 1 in 167 in Belgium and 1 in 196 in Norway [13]. The average waiting time in the USA to get an autism diagnosis is 3 years [14]. This statistical measure shows a discrepancy in low- and middle-income countries like India and Sri Lanka; the low ratio is due to a lack of sufficient knowledge in identifying the risks of autism. Lydia et al. [15] contributed a systematic review analysis of 28 studies for screening ASD in low- and middle-income countries. The authors report that cultural adaptation and community collaboration must strive beyond the translation of procedures into regional languages. Autism is diagnosed in one in 42 boys compared to one in 189 girls [16]; autism affects boys about four times more often than girls. Rachel et al. [17] conducted a systematic review of the male–female ratio in ASD and declare that the value is not 4:1 but closer to 3:1. The most common risk factor in autism is that children who have a sibling with autistic traits are at higher risk of developing ASD; studies report that autism risk in siblings is 25% [18]. In a recent study, the genetic influences in ASD are estimated to be approximately 80% [19].

Considering the above particulars, several kinds of research have been undertaken in recent years to detect autism at an early stage in children [20, 21]. Early detection is essential to reduce the risk factors and provide a healthy environment for the children to sustain themselves in a competitive world. Most autistic kids are not diagnosed till the age of two, and this time-lapse in diagnosis leads to disability that affects the whole life span of the children. Due to lack of knowledge among low-income parents, the diagnosis process is further extended till the child reaches school at the age of five or six. If no proper attention is given, a permanent lack of social skills is often accompanied by a widespread deterioration of interaction.

In the past few years, new technologies and innovations in science have brought a huge breakthrough in diagnosing disorders ahead of time. Emerging research methods in machine learning bear out promising results in solving major disorders before they turn into complex factors [22]. Machine learning makes accurate predictions or classifications on large datasets. ML algorithms are capable of learning a model with an increasing amount of data [23]. So, whenever new data enter the set, the algorithm automates the decision-making task and provides high accuracy in classifying the result. Machine learning is mainly classified into three techniques: supervised, unsupervised and reinforcement learning [24, 25].
– Supervised learning is the basic type of ML algorithm in which the model is trained on labeled data. It is mainly focused on classification types of problems, where the data must be labeled accurately.
– Unsupervised learning is another technique in which we work on unlabeled data. Clustering is the most commonly used technique, which involves detecting groups in the input data. Compared to the previous type of learning, this works without an output variable.
– Reinforcement learning learns by experience. The best example is YouTube recommendations to the user based on their interests. This learning is used when there is no fixed dataset.

Machine learning techniques play a key role in the accurate analysis of data in the health sector. Identifying an appropriate algorithmic technique is a crucial job. Choosing a precise algorithm for a given dataset in ML depends on the application problem [22]. The algorithm's efficiency and performance might vary according to the available data, so a detailed study of the problem and dataset helps in targeting the desired algorithm and output. Some other characteristics associated with this are the number of instances, categorical attributes, null values, etc. Popular supervised machine learning classification algorithms are support vector machine (SVM), random forest (RF), naïve Bayes, logistic regression (LR), K-nearest neighbor (KNN) and neural network. Other commonly used algorithms are depicted in Fig. 1.

Basically, a machine learning technique receives the input data, analyzes it and predicts the output. The different modalities of input used by researchers are video (children's gestures) [26, 27], audio (recorded voice) [28, 29], images (brain and posture images of children) [30–32] and text (questionnaires answered by parents or caretakers) [7, 33]. The main goal of the survey is to identify significant improvements in the field of research, particularly machine learning approaches, to detect ASD in children at an early stage. In this paper, a complete review of recognized ASD machine learning approaches is presented and attested with evaluation metrics. We aimed to focus on articles published during the years 2016–2020.
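To make the supervised workflow described above concrete, the sketch below trains one commonly used classifier with cross-validation on a screening dataset. The file name, label column and simplified preprocessing are placeholders rather than the setup of any reviewed study.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder file, e.g. an export of a screening dataset; preprocessing is simplified
# (no missing-value handling), so treat this only as an illustration of the workflow.
df = pd.read_csv("asd_screening.csv")
features = pd.get_dummies(df.drop(columns=["Class/ASD"]), drop_first=True)  # encode categoricals
X, y = features, df["Class/ASD"]

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")   # tenfold cross-validation
print("mean accuracy:", scores.mean().round(3))
```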
3 Related Work

Numerous advancements and improvements in the medical field emerge day by day. Concurrently, the sheet is balanced by a growing number of new disorders and viruses spreading across the globe. Machine learning techniques contribute a significant part to managing and processing the huge amount of data in the health sector. In recent years, many articles have been published on diagnosing autism using machine learning approaches, but very few articles review the advancements made by researchers in ASD diagnosis. Hyde et al. [34] investigated 45 research articles and present a comprehensive review of applications of supervised machine learning in autism detection. The authors initially highlight the emergence of big data in the modern world, where machine learning techniques provide a new pavement for predicting solutions
Fig. 1 Machine learning techniques
to complex problems. Later, a detailed review of machine learning algorithms is provided, and the information presented is a great resource for analysis and future enhancement. Hosseinzadeh et al. [35] reviewed 28 articles that focused on diagnosing autism disorder and on how to improve the lifestyle of autistic children with the support of Internet of Things (IoT) devices. The authors give the details of the many approaches analyzed, the techniques reviewed, the evaluation metrics, the distribution of articles among different publishers and the sensors used for ASD detection. Thabtah et al. [36] examined and critically analyzed recent machine learning studies on autism detection. The authors also addressed issues such as the unavailability of processed data and imbalanced datasets; due to such concerns, ML algorithms take minimal participation in autism detection. They conclude that SVM might provide promising results due to its high predictive power.
4 Machine Learning Techniques in ASD Research

Matthew et al. [37] experimented with a machine learning approach that extracts the text contained in evaluations to discriminate between autistic and non-autistic children. The initial task is to determine the most important words; a random forest model is then built to perform the classification, which results in good performance. The author concludes that this type of text-based machine learning technique would be useful in various health sectors for processing documents. Wenbo et al. [38] analyzed eye movement data and explored the possibility of using an ML algorithm to identify ASD. Children were trained to memorize faces and later tested on identifying the same faces among more samples. During the examination process, the eyeball movements of the individuals were captured, and feature extraction methods were applied to avoid repetition in parameter values. The author shows promising findings using the SVM technique, which results in a better outcome. Fuad et al. [39] designed an automated system to speed up the autism detection process using gene data. Earlier autism detection was based on behavioural traits; in this paper, the researchers propose a novel approach to determine ASD using gene data that gives better performance in a short duration. Jianbo et al. [40] classified semi-structured and unstructured medical documents to detect ASD. By applying natural language processing techniques, the proposed model yields promising results, and the author suggests that the same approach can be applied to other types of health-related disorders. Sebastien et al. [16] initiated an effort to minimize the time complexity of autism detection through five different classification models. To improve the robustness of the model, nested cross-validation was performed to identify the predictive features. With a reduced 10-feature set, SVM, decision tree and logistic regression give better results, around 93%, whereas with a reduced 5-feature set, SVM, LDA and logistic regression give better results, around 88%. The team of researchers aims to extend the work in future studies to improve the accuracy with a balanced dataset. Duda et al. [41] stated the bottlenecks that arise in differentiating ASD and ADHD. As both disorders relate to behavioural symptoms, it is hard to recognize and distinguish them. In the experiment conducted, the authors demonstrated the variation between the disorders using only 5 items, where the earlier method utilized 65 items. The researchers conclude that, compared to earlier models, ENet and LDA exhibit high performance. Baihua et al. [42] investigated autistic traits using kinematic features as a new attempt at classifying the disorder. The authors believe this is the first attempt in this field of research to detect autism using kinematic parameters. Altay et al. [43] applied the linear discriminant analysis (LDA) classifier and K-nearest neighbor for autism detection in children aged 4–11 years. The performance is measured by considering factors such as accuracy, sensitivity, specificity, precision and F-measure for both algorithms. The author concludes the result with 90.8% accuracy using the LDA classifier.
Ayse [44] has proven a high accuracy rate of 100% using the random forest ML algorithm. The dataset and implementation methods are detailed with precise values. The study applies tenfold cross-validation to remove parameter dependency. The author also states that the lowest performance is delivered by the KNN algorithm. Mofleh [45] adopted a fuzzy data mining technique to successfully distinguish individuals with autism. The proposed model was implemented using the fuzzy unordered rule induction algorithm (FURIA), and its evaluation metrics were compared with JRIP, RIDOR and PRISM. The author concludes that the key strength of the model is that it is easy for family members to interpret, with no clinical assistance required to administer it. Murat [18] proposed a new learning model using lncRNA gene data for autism risk prediction. The proposed approach yields better results compared to standard classifiers. The author aims to extend the study by ensembling more classification models in the future. Vaishali et al. [46] aimed to propose a model with feature reduction to improve performance. The researchers implemented the model using five different machine learning algorithms in two different ways and presented a comparative review. The first method is applied before feature reduction (21 features); the new model is then proposed (using ten features) with the binary firefly feature reduction algorithm. The final results indicate that the latter technique performs better. Victoria et al. [47] aimed to detect autism using the eye gaze of adults when they are searching for information on a webpage. Several logistic regression classifiers were trained, and it was finally concluded that autism can be detected using eye-tracking data with confident accuracy. Erkan et al. [11] focused on three machine learning algorithms for early detection of ASD. The entire workflow of the proposed model is depicted using a flow diagram. The author clearly states that early detection is possible and that accuracy will be higher if the number of data samples used is large. Five sets of experiments were evaluated with different train-test splits, and the results confirm the best accuracy using the random forest technique. Mariia et al. [48] proposed a new static Bayesian network model for predicting autism in children. The authors collaborated with pediatricians and subject experts to quantify the variables used. The final results are predicted in the form of a probabilistic measure, i.e., the presence and absence of autism in percentage terms. This measure helps in identifying kids who need to be examined further for treatment. Tania et al. [49] illustrated ASD prediction using a different set of classifiers for toddler, child, adolescent and adult datasets. The team of researchers applied feature transformation (FT) methods like sine and Z-score, which yield the best results. A collection of 250 classifiers was applied to the dataset with three different FT methods, namely log, scale and sine. The final results of the approaches are exhibited using nine classifier machine learning methods, and the same are presented in a structured format. The outcome shows that when machine learning algorithms are properly optimized, they provide efficient results. Elizabeth et al. [50] made a significant contribution to determining the low-level details in autism detection. The authors have moved one step forward compared to
current research developments. The focus is not limited to detecting autism but also extends to determining the severity within the domains in which the children are affected. The developmental behaviour domains considered are language, social, cognitive, motor skills, etc. The main motive is to assist the clinician in beginning the therapy session. Abdullah et al. [51] examined the performance of three machine learning algorithms: random forest, logistic regression and K-nearest neighbors. The authors concentrated on improving the effectiveness of the proposed model with a minimum number of features in the dataset. The feature selection methods applied are chi-square and the least absolute shrinkage and selection operator (LASSO). As expected, the outcome exceeded earlier findings, with 97.54% accuracy using 13 features and the logistic regression model. Milan et al. [52] concentrated in particular on personal characteristic data (PCD) extracted from the Autism Brain Imaging Data Exchange (ABIDE) database. The team of researchers focused on emphasizing the strong predictive power of PCD in determining neurodevelopmental disorders. Results are shown using nine machine learning models with six features; the best outcome of this comparative analysis shows that the neural network model gives 62% accuracy. Thabtah et al. [14] propose a new method called rules machine learning (RML), which is based on a rarely applied ML classification technique called covering. The authors declare that the new method is much more efficient compared to other existing ML techniques. Experimental results show that the derived RML methods have lower error rates and higher specificity, harmonic mean and sensitivity rates compared to the other ML algorithms considered. Another prominent work by the author is autism classification using logistic regression analysis. An empirical analysis is implemented and yields notably good performance in detecting autism in adults and adolescents. A further extension improves the results by applying in-depth feature analysis using information gain (IG) and chi-square testing (CHI) [53]. Lee et al. [54] compared the performance of eight supervised machine learning algorithms across ten random train-test splits. The team of authors presented a predominant work on classifying autism disorder from clinicians' written documents during the years 2006, 2008 and 2010. The unstructured text classification model was implemented by applying machine learning algorithms, and the research outcome is that random forest, SVM and naïve Bayes achieved nearly 87% accuracy. Luke et al. [55] primarily focused on ASD detection in toddlers by considering their socio-demographic differences as a prime factor. The dataset was subdivided into groups such as race, gender and maternal education, which play a major role in understanding the questionnaire. The authors developed the model by applying the feedforward neural network (FNN) ML method to improve the efficiency of the results. The overall correct classification percentage of each subgroup reaches close to 100%, which exceeds the current evaluation methods. Suman et al. [56] aim to propose an approach using machine learning techniques for autism detection. The authors concentrated on three different datasets consisting of adult, adolescent and children data. Finally, the evaluation of results is noted with 98% accuracy, attempted using various machine learning and deep learning
techniques. A convolutional neural network is strongly suggested by the author as it effectively optimizes the accuracy of the outcome. Jaber et al. [57] presented a comparative study of seven association classification (AC) algorithms and exhibited the performance of each of them. The experimental study displays the evaluation results in the form of accuracy, F-measure, recall and precision, which are precisely illustrated. The author concludes that the WCBA algorithm outperforms the remaining AC algorithm techniques with a high level of accuracy. Erina et al. [58] compared the performance of machine learning methods with the results obtained using DSM-5. The machine learning algorithms applied are K-nearest neighbors (KNN), support vector machine (SVM), random forest, deep learning and backpropagation. The author uses the kappa statistic to increase the accuracy level of the classification results. A different set of parameters is applied in each algorithm, and the results are illustrated using a bar chart. A high level of accuracy is obtained using the random forest algorithm, with the highest kappa statistic value of 1. Maitha et al. [59] focused on providing the best machine learning model for diagnosing ASD in children. The authors used three different datasets that apply to toddlers, children and adolescents with three different sets of questionnaires. This reflects the importance given to the different age groups of kids, as their perception levels differ. The team put effort into choosing the appropriate algorithms for evaluation. The final results conclude that the neural network model gives the best performance on all three databases. Wingfield et al. [12] examined the effects of autism disorder in low- and middle-income countries. The model proposed by the authors displays a significant performance improvement in identifying ASD using the machine learning approach; the random forest classifier demonstrated in the article provides notable accuracy with a true positive rate of 88%. Mary et al. [60] mention that the proposed algorithm yields high accuracy and works efficiently with a large dataset. AdaBoost gains better performance compared to SVM and random forest and is also less affected by overfitting problems. Charlotte et al. [61] extended their previous work by focusing on autism detection in adolescents and adults. The performance compared to the previous study is the same, but they made a significant contribution in reducing the subset to five features by applying SVM, a machine learning technique. The efforts applied resulted in a total feature reduction of 84% compared to the original autism diagnostic observation schedule (ADOS).
4.1 In Summary from 2016 to 2020

A total of seven papers are considered from the years 2016 and 2017. Due to the unavailability of text datasets during the initial stages, researchers mainly focused on other modalities such as audio, image and video. The main components measured
are acoustic features, brain images, gene analysis [62], eye movements, semi-structured/unstructured clinical reports and crowdsourced data. SVM and NLP methods are predominantly employed, and a mean accuracy of 80% is obtained. In 2018, Fadi Thabtah [63] made a significant contribution by designing a mobile application consisting of a ten-question screening questionnaire for validation. The instances collected were shared through a public repository, which paved the way for a large number of articles and implementations. Autism detection is a time-consuming process that can be administered only by a trained professional. Compared to standard diagnosis tools and methods for detecting autism, these new datasets helped researchers to focus on working with reduced features and greater efficiency. In 2019, a large number of articles were published using machine learning approaches due to the availability of the dataset in the UCI repository. Another notable improvement is the accuracy of results using feature extraction methods. Several supervised machine learning classification algorithms were employed that outperform the earlier findings. Artificial intelligence brought a big breakthrough in research during 2019. Innovations in reinforcement learning and neural networks will probably shape the future with more machine learning technologies. In 2020, boosting and bagging methods dominate with high performance. The dataset values are drawn from different sources of information. Due to the rapid development of smartphones and mobile applications, the accumulation of dataset values is no longer a difficult task. A large number of productive implementations were carried out using the ML approach. Now, the accuracy in prediction crosses 90%. Not limited to ML, deep learning, artificial intelligence and the Internet of Things are a few other fields that simultaneously work on assisting autism detection. We collected the peer-reviewed articles using the Google Scholar, PubMed, IEEE Xplore and ScienceDirect databases. The combinations of search keywords are "autism spectrum disorder", "machine learning approach", "diagnosis and detecting ASD", "supervised machine learning", "classification techniques", "literature review of ASD", "early detection of ASD" and "screening tools and methods". Nearly 75 articles were reviewed, and finally, 30 of the most relevant papers that mainly focus on machine learning approaches to detect autism disorder were selected. The papers discussed in this article were published during the years 2016 to 2020. Figure 2 depicts the year-wise publication of the articles reviewed.
5 Discussion

In this paper, we have reviewed the methods and approaches of 30 articles. The percentage of commonly used classifiers in the reviewed articles is depicted in Fig. 3. The SVM technique has the highest usage among them. It is a popular supervised machine learning technique that can be applied to both classification and regression problems. SVM operates on decision boundaries that separate the data points into classes. It is also effective for high-dimensional data. Another major reason for its
Fig. 2 Number of articles reviewed from 2016 to 2020
Fig. 3 Percentage of classifiers in the reviewed articles
popularity is that it also acts as a prime classifier for unstructured and semi-structured documents. Next in rank is random forest, which 40% of the listed articles applied. It is another widely used classifier applicable to both classification and regression problems. It is a robust modeling technique that aggregates a collection of decision trees for prediction and behavior analysis. Other popular classification models applied are KNN, NB and LR. Most recent articles used AdaBoost, an ensemble method, to improve classification accuracy. It is an iterative method that combines multiple classifiers to build a strong classifier model. Similarly, in the future, we might encounter more ensemble techniques like stacking, blending, boosting and bagging. One percent of the global population is affected by autism [52]. If proper support and attention are provided at the right time, then autism need no longer be treated as a disabling illness. Because diagnosing the disorder is time-consuming, the waiting list for appointments exceeds capacity. The increasing number of autism cases prompts researchers to develop a simple and cost-effective tool to diagnose the disorder [48].
6 Conclusion

There is no cure for ASD, and an irreversible loss will occur if proper intervention is not applied. The ultimate aim of early detection is to increase functional independence and improve the quality of life of the children. Our previous study, "Examination on Early Detection of Autism Syndrome in Child Development Disorder", presents an overview of and the underlying facts about ASD. In continuation of that work, we expand the review to recent articles that detect autism using machine learning techniques. In the future, we aim to focus on implementing a classification technique that will effectively optimize the improvements in diagnosing ASD.
References 1. Frith U, Happé F (1994) Language and communication in autistic disorders. Philos Trans Royal Soc London. Series B: Biol Sci 346(1315):97–104 2. Weitlauf AS, Gotham KO, Vehorn AC, Warren ZE (2014) Brief report: dsm-5 “levels of support:” a comment on discrepant conceptualizations of severity in asd. J Autism Dev Disord 44(2):471–476 3. Kundu MR, Das MS (2019) “Predicting autism spectrum disorder in infants using machine learning.” J Phys: Conf Series 1362:012018, IOP Publishing 4. Amaral DG (2017) “Examining the causes of autism.” In: Cerebrum: the Dana forum on brain science, vol 2017, Dana Foundation 5. Ka-luz˙na-Czaplin´ska J, Z˙ urawicz E, J´o´zwik-Pruska J (2018) “Focus on the social aspect of autism.” J Autism Develop Disorders 48(5):1861–1867 6. Sappok T, Heinrich M, Underwood L (2015) “Screening tools for autism spectrum disorders.” Adv Autism 7. Van Hieu N, Hien NLH (2018) Artificial neural network and fuzzy logic approach to diagnose autism spectrum disorder. Int Res J Eng Technol 5(6):1–7 8. Scassellati B (2005) “Quantitative metrics of social response for autism diagnosis.” In: ROMAN2005. IEEE international workshop on robot and human interactive communication 2005, pp 585–590, IEEE 9. Falkmer T, Anderson K, Falkmer M, Horlin C (2013) Diagnostic procedures in autism spectrum disorders: a systematic literature review. Eur Child Adolesc Psychiatry 22(6):329–340 10. Azeem MW, Imran N, Khawaja IS (2016) Autism spectrum disorder: an update. Psychiatr Ann 46(1):58–62 11. Erkan U, Thanh DN (2019) Autism spectrum disorder detection with machine learning methods. Curr Psychiatr Res Rev Formerly: Curr Psychiatr Rev 15(4):297–308 12. Wingfield B, Miller S, Yogarajah P, Kerr D, Gardiner B, Seneviratne S, Samarasinghe P, Coleman S (2020) A predictive model for paediatric autism screening. Health Inf J 26(4):2538–2553 13. Lavanya G “Autism spectrum disorder analysis using artificial intelligence: a survey” 14. Thabtah F, Peebles D (2020) A new machine learning model based on induction of rules for autism detection. Health Inf J 26(1):264–286 15. Stewart LA, Lee L-C (2017) Screening for autism spectrum disorder in low-and middle-income countries: a systematic review. Autism 21(5):527–539 16. Levy S, Duda M, Haber N, Wall DP (2017) Sparsifying machine learning models identify stable subsets of predictive features for behavioral detection of autism. Molecular autism 8(1):1–17
17. Loomes R, Hull L, Mandy WPL (2017) What is the male-to-female ratio in autism spectrum disorder? a systematic review and meta-analysis. J Am Acad Child Adolesc Psychiatry 56(6):466–474 18. Gok M (2019) A novel machine learning model to predict autism spectrum disorders risk gene. Neural Comput Appl 31(10):6711–6717 19. Bai D, Yip BHK, Windham GC, Sourander A, Francis R, Yoffe R, Glasson E, Mahjani B, Suominen A, Leonard H et al (2019) Association of genetic and environmental factors with autism in a 5-country cohort. JAMA Psychiat 76(10):1035–1043 20. Zwaigenbaum L (2010) Advances in the early detection of autism. Curr Opin Neurol 23(2):97– 102 21. Corsello CM (2005) Early intervention in autism. Infants Young Child 18(2):74–85 22. Fatima M, Pasha M et al (2017) Survey of machine learning algorithms for disease diagnostic. J Intell Learn Syst Appl 9(01):1 23. Osisanwo F, Akinsola J, Awodele O, Hinmikaiye J, Olakanmi O, Akinjobi J (2017) Supervised machine learning algorithms: classification and comparison. Int J Comput Trends Technol (IJCTT) 48(3):128–138 24. Ayodele TO (2010) Types of machine learning algorithms. New Adv Mach Learn 3:19–48 25. Ray S (2019) “A quick review of machine learning algorithms.” In: 2019 International conference on machine learning, big data, cloud and parallel computing (COMITCon), pp 35–39, IEEE 26. Zunino A, Morerio P, Cavallo A, Ansuini C, Podda J, Battaglia F, Veneselli E, Becchio C, Murino V (2018) “Video gesture analysis for autism spectrum disorder detection.” In: 2018 24th international conference on pattern recognition (ICPR), pp 3421–3426, IEEE 27. Vyas K, Ma R, Rezaei B, Liu S, Neubauer M, Ploetz T, Oberleitner R, Ostadabbas S (2019) “Recognition of atypical behavior in autism diagnosis from video using pose estimation over time.” In: 2019 IEEE 29th international workshop on machine learning for signal processing (MLSP), pp 1–6, IEEE 28. Deng J, Cummins N, Schmitt M, Qian K, Ringeval F, Schuller B (2017) “Speechbased diagnosis of autism spectrum condition by generative adversarial network representations.” In: Proceedings of the 2017 international conference on digital health, pp 53–57 29. Mohanta A, Mittal VK (2020) “Classifying speech of asd affected and normal children using acoustic features.” In: 2020 national conference on communications (NCC), pp 1–6, IEEE 30. Dekhil O, Ismail M, Shalaby A, Switala A, Elmaghraby A, Keynton R, Gimel’farb G, Barnes G, El-Baz A (2017) “A novel cad system for autism diagnosis using structural and functional mri.” In: 2017 IEEE 14th international symposium on biomedical imaging (ISBI 2017), pp 995–998, IEEE 31. Zhao F, Zhang H, Rekik I, An Z, Shen D (2018) Diagnosis of autism spectrum disorders using multi-level high-order functional networks derived from restingstate functional mri. Frontiers Human Neurosci 12:184 32. Katuwal GJ, Cahill ND, Baum SA, Michael AM (2015) “The predictive power of structural mri in autism diagnosis.” In: 2015 37th annual international conference of the ieee engineering in medicine and biology society (EMBC), pp 4270–4273, IEEE 33. Shahamiri SR, Thabtah F (2020) Autism ai: a new autism screening system based on artificial intelligence. Cogn Comput 12(4):766–777 34. Hyde KK, Novack MN, LaHaye N, Parlett-Pelleriti C, Anden R, Dixon DR, Linstead E (2019) Applications of supervised machine learning in autism spectrum disorder research: a review. Rev J Autism Develop Disorders 6(2):128–146 35. 
Hosseinzadeh M, Koohpayehzadeh J, Bali AO, Rad FA, Souri A, Mazaher-inezhad A, Rezapour A, Bohlouli M (2021) “A review on diagnostic autism spectrum disorder approaches based on the internet of things and machine learning.” J Supercomput 77(3):2590–2608 36. Thabtah F (2019) Machine learning in autistic spectrum disorder behavioral research: a review and ways forward. Inf Health Soc Care 44(3):278–297 37. Maenner MJ, Yeargin-Allsopp M, Van Naarden BK, Christensen DL, Schieve LA (2016) Development of a machine learning algorithm for the surveillance of autism spectrum disorder. PLoS ONE 11(12):e0168224
38. Liu W, Li M, Yi L (2016) Identifying children with autism spectrum disorder based on their face processing abnormality: a machine learning framework. Autism Res 9(8):888–898 39. Alkoot FM, Alqallaf AK (2016) Investigating machine learning techniques for the detection of autism. Int J Data Min Bioinform 16(2):141–169 40. Yuan J, Holtz C, Smith T, Luo J (2016) “Autism spectrum disorder detection from semistructured and unstructured medical data.” EURASIP J Bioinf Syst Biol 2017(1):1–9 41. Duda M, Haber N, Daniels J, Wall D (2017) Crowdsourced validation of a machine-learning classification system for autism and adhd. Trans Psychiatr 7(5):e1133–e1133 42. Li B, Sharma A, Meng J, Purushwalkam S, Gowen E (2017) Applying machine learning to identify autistic adults using imitation: an exploratory study. PLoS ONE 12(8):e0182652 43. Altay O, Ulas M (2018) “Prediction of the autism spectrum disorder diagnosis with linear discriminant analysis classifier and k-nearest neighbor in children.” In: 2018 6th international symposium on digital forensic and security (ISDFS), pp 1–4, IEEE 44. Demirhan A (2018) Performance of machine learning methods in determining the autism spectrum disorder cases. Mugla Journal of Science and Technology 4(1):79–84 45. Al-Diabat M (2018) Fuzzy data mining for autism classification of children. Int J Adv Comput Sci Appl 9(7):11–17 46. Vaishali R, Sasikala R (2018) A machine learning based approach to classify autism with optimum behaviour sets. Int J Eng Technol 7:18 47. Yaneva V, Eraslan S, Yesilada Y, Mitkov R et al (2020) Detecting high-functioning autism in adults using eye tracking and machine learning. IEEE Trans Neural Syst Rehabil Eng 28(6):1254–1261 48. Voronenko M, Lurie I, Boskin O, Zhunissova U, Baranenko R, Lytvynenko V (2019) “Using bayesian methods for predicting the development of children autism.” In: 2019 IEEE international conference on advanced trends in information theory (ATIT), pp 525–529, IEEE 49. Akter T, Satu MS, Khan MI, Ali MH, Uddin S, Lio P, Quinn JM, Moni MA (2019) Machine learning-based models for early stage detection of autism spectrum disorders. IEEE Access 7:166509–166527 50. Stevens E, Dixon DR, Novack MN, Granpeesheh D, Smith T, Linstead E (2019) Identification and analysis of behavioral phenotypes in autism spectrum disorder via unsupervised machine learning. Int J Med Informatics 129:29–36 51. Abdullah AA, Rijal S, Dash SR (2019) “Evaluation on machine learning algorithms for classification of autism spectrum disorder (asd).” J Phys: Conf Series 1372:012052, IOP Publishing 52. Parikh MN, Li H, He L (2019) Enhancing diagnosis of autism with optimized machine learning models and personal characteristic data. Front Comput Neurosci 13:9 53. Thabtah F, Abdelhamid N, Peebles D (2019) A machine learning autism classification based on logistic regression analysis. Health Inf Sci Syst 7(1):1–11 54. Lee SH, Maenner MJ, Heilig CM (2019) A comparison of machine learning algorithms for the surveillance of autism spectrum disorder. PLoS ONE 14(9):e0222907 55. Achenie LE, Scarpa A, Factor RS, Wang T, Robins DL, McCrickard DS (2019) A machine learning strategy for autism screening in toddlers. J Develop Behav Pediatrics: JDBP 40(5):369 56. Raj S, Masood S (2020) Analysis and detection of autism spectrum disorder using machine learning techniques. Procedia Computer Science 167:994–1004 57. Alwidian J, Elhassan A, Ghnemat R “Predicting autism spectrum disorder using machine learning technique,” 58. 
Dewi ES, Imah EM (2020) "Comparison of machine learning algorithms for autism spectrum disorder classification." In: International joint conference on science and engineering (IJCSE 2020), pp 152–159, Atlantis Press 59. Alteneiji MR, Alqaydi LM, Tariq MU (2020) Autism spectrum disorder diagnosis using optimal machine learning methods. Autism 11(9) 60. Kumar MSJDS "Prediction and comparison using AdaBoost and ML algorithms with autistic children dataset"
61. Küpper C, Stroth S, Wolff N, Hauck F, Kliewer N, Schad-Hansjosten T, Kamp-Becker I, Poustka L, Roessner V, Schultebraucks K et al (2020) Identifying predictive features of autism spectrum disorders in a clinical sample of adolescents and adults using machine learning. Sci Rep 10(1):1–11 62. Guan J, Yang E, Yang J, Zeng Y, Ji G, Cai JJ (2016) Exploiting aberrant mRNA expression in autism for gene discovery and diagnosis. Hum Genet 135(7):797–811 63. Thabtah F "A mobile app for ASD screening." http://www.asdtests.com/, 2017. [Online; accessed 20 Dec 2020]
Chapter 15
Double-Image Encryption Through Compressive Sensing and Discrete Cosine Stockwell Transform Saumya Patel and Ankita Vaish
1 Introduction

Currently, information transmission through public networks is increasing day by day. The transmitted information can be in the form of images, videos, etc., which contain a lot of confidential information. Due to this, information security is important before transmission. Many researchers have given several encryption algorithms based on chaos theory [1], double-random phase encoding [2], bit-level permutation [3], Arnold transform [4], and others. These encryption algorithms do not consider compression. To overcome this problem, in 2004, Donoho et al. [5] pioneered a new sampling and reconstruction algorithm, which can compress the signal as well as encrypt it. Compressive sensing (CS) specifies that the signal can be reconstructed from fewer measurements under certain conditions through reconstruction algorithms. CS is a new acquisition model that compresses the signal at the time of sampling through linear projection. Moreover, it has been found in the literature that CS-based simultaneous compression and encryption may increase compression performance [6–8]. Many researchers have given several encryption algorithms based on CS. Xu et al. [9] have given a CS-based encryption algorithm, in which the signal is measured along two axes through a circulant measurement matrix (MM). Further, permutation and diffusion are employed to increase the security level. An encryption algorithm for color images is given by Zhang et al. [10], where the Kronecker product is applied to generate a MM from a small seed matrix obtained by a chaos map. At last, the fractional Fourier transform is employed to re-encrypt the signal. Gong et al. [11] have given an encryption algorithm in which the SHA-256 algorithm is applied to obtain the seed value of the chaos map.
S. Patel (B) · A. Vaish Banaras Hindu University, Varanasi, U.P., India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_15
Simultaneously, multiple-image encryption is an unavoidable topic in the era of image encryption due to its lower memory use and efficient transmission through the network. Many researchers have proposed multiple-image encryption algorithms based on chaos theory [12–14]. Yang et al. [15] have introduced a double-image encryption (DIE) algorithm, in which the discrete cosine transform (DCT) is applied to both images and the transformed coefficients are then sorted by a Z-scan to compress and merge the images. Further, DNA coding is applied to the encrypted signal to provide more security. CS can reconstruct the signal from a few measured values, so DIE is more efficient through CS, and some researchers have given algorithms based on it. Rawat et al. [16] have given a CS-based DIE scheme. Combining CS and the discrete fractional random transform, Zhou et al. [17] have given a DIE scheme in which two images are compressed and encrypted through CS and merged into a single image for further encryption through the Arnold transform and discrete fractional random transform. In this paper, a double-image encryption algorithm based on CS and the discrete cosine Stockwell transform (DCST) [18] is proposed. CS is employed on both images to encrypt and compress the signal. The resulting images are combined sequentially to make a single image. Finally, zig-zag confusion and DCST are applied to enhance the security level of the cryptosystem. The rest of the paper is organized as follows: Sect. 2 presents the mathematical background of CS. In Sect. 3, the proposed algorithm is introduced. Section 4 demonstrates the experimental analysis of the proposed algorithm on some standard images. Finally, Sect. 5 concludes the proposed work.
2 Compressive Sensing

CS is a sampling and reconstruction algorithm that can reconstruct the signal from very few measured values if the signal is sparse. Sparsity means that the signal has few nonzero values, and these contain most of the information. Let X ∈ R^(N×1) be a signal that has a sparse representation on an orthogonal basis ϕ,

α = ϕ X        (1)

where α represents the K-sparse (K ≪ N) signal that contains the principal information and ϕ is an orthogonal basis. In the CS process, a MM φ ∈ R^(M×N) measures the sparse signal in linear form,

Y = φ α = φ ϕ X = δ X        (2)

where Y ∈ R^(M×1) is the compressed signal, M ≪ N, and δ is the sensing matrix. The l0-norm is used to estimate the sparse solution,

min ‖α‖_0  s.t.  Y = φ ϕ α        (3)

The l0-norm minimization is an NP-hard problem, so the l1-norm is employed to estimate the sparsity,

min ‖α‖_1  s.t.  Y = φ ϕ α        (4)

For perfect recovery of the signal X from the measured values, δ should satisfy the restricted isometry property (RIP) [19]. Verifying RIP is itself an NP-hard problem, so coherence is the alternative criterion.
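To make the notation above concrete, the following is a minimal, hedged Python sketch (a generic toy example, not the proposed cryptosystem): a K-sparse signal is measured with a random Gaussian measurement matrix, as in Eq. (2), and recovered with orthogonal matching pursuit from scikit-learn. The sizes N, M and K and the Gaussian choice of φ are illustrative assumptions.

import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
N, M, K = 256, 96, 8                          # signal length, measurements (M << N), sparsity

alpha = np.zeros(N)                           # K-sparse coefficient vector alpha
alpha[rng.choice(N, K, replace=False)] = rng.normal(size=K)

phi = rng.normal(size=(M, N)) / np.sqrt(M)    # random measurement matrix (stands in for the chaotic MM)
y = phi @ alpha                               # compressed measurements, Y = phi * alpha

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=K)   # greedy sparse recovery, cf. Eqs. (3)-(4)
omp.fit(phi, y)
print("recovery error:", np.linalg.norm(alpha - omp.coef_))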
3 Proposed Algorithm

The overall scheme of the proposed work is shown in Fig. 1, and the steps are as follows:

Step 1: Two plain images of size N × N are represented as I1 and I2.
Step 2: Both images I1 and I2 are subdivided into four equal non-overlapping blocks.
Step 3: Each block of the images is sparsified through an orthogonal basis (ϕ) of the discrete wavelet transform (DWT).
Step 4: Each sparse block of images I1 and I2 is measured through the measurement matrices φ1 and φ2. The measurement matrices are generated through a logistic map [20]. The chaotic map is iterated ite = (M/2) · (N/2) · d times with the initial parameters (μ0, x0) and (μ1, x1) to obtain two sequences.
Fig. 1 Proposed scheme
L1 = {L1,1, L1,2, L1,3, …, L1,ite}        (5)

L2 = {L2,1, L2,2, L2,3, …, L2,ite}        (6)

where d is a sampling parameter and the sequences are sampled at interval d:

Ls1 = 1 − 2 L1,1+K·d,  K = 0, 1, 2, …, (M/2)(N/2) − 1        (7)

Ls2 = 1 − 2 L2,1+K·d,  K = 0, 1, 2, …, (M/2)(N/2) − 1        (8)

The obtained sequences are reordered in a column-wise manner to get two matrices φ1 and φ2 of size M/2 × N/2.
φ1 = φ1(1 : m, :)        (9)

φ2 = φ2(1 : m, :)        (10)

where m = M/4, and φ1 and φ2 are the final MM.

Step 5: All sub-blocks are merged successively into a single enlarged image.
Step 6: Zig-zag scrambling is applied.
Step 7: At last, the discrete cosine Stockwell transform (DCST) is applied to enhance security.
At the decoder end, all the steps are performed in the reverse order to decrypt the signal, and OMP [21] algorithm is used for reconstruction.
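As an illustration of Step 4, the following hedged Python sketch generates one chaotic measurement matrix from a logistic map along the lines of Eqs. (5)–(10); the parameter values μ = 3.99, x0 = 0.37 and d = 3 are placeholders, not the secret keys used in the experiments.

import numpy as np

def logistic_measurement_matrix(M, N, mu=3.99, x0=0.37, d=3):
    ite = (M // 2) * (N // 2) * d             # number of iterations, as in Step 4
    seq = np.empty(ite)
    x = x0
    for i in range(ite):                      # logistic map: x <- mu * x * (1 - x)
        x = mu * x * (1.0 - x)
        seq[i] = x
    sampled = 1.0 - 2.0 * seq[::d]            # sampling at interval d, Eqs. (7)-(8)
    phi = sampled.reshape((N // 2, M // 2)).T # column-wise reordering into an M/2 x N/2 matrix
    return phi[: M // 4, :]                   # keep the first m = M/4 rows, Eqs. (9)-(10)

phi1 = logistic_measurement_matrix(256, 256)  # -> shape (64, 128)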
4 Experimental Analysis

In this section, the performance of the proposed algorithm is analyzed using mathematical metrics such as the peak signal-to-noise ratio (PSNR), histogram analysis, correlation coefficient, mean square error (MSE), etc. In the experiment, several traditional test images of size 256 × 256 are used. Figure 2a, b depicts the primary images. Figure 2c shows the result for the encrypted image, and Fig. 2d, e depicts the decoded results of the proposed work. An important measure for evaluating the performance of an encryption scheme is its time complexity. We have used images from the USC-SIPI "Miscellaneous: volume 3" [22] dataset, and MATLAB R2018 is used for implementation. The encoding time of the proposed algorithm for the "Lena–Pepper" image pair is 0.428627 s, which is a relatively fast encryption speed for a DIE scheme. Table 1 shows a comparison of the encryption times of several DIE schemes.
Fig. 2 Encoding–decoding results
Table 1 Encryption time comparison with existing schemes

Scheme | Encryption time (s), 256 × 256 (Lena–Pepper)
Wu [13] | 11.3088
Liu [12] | 1.9313
Zhou [17] | 0.606061
Proposed | 0.428627
4.1 Histogram Analysis

A histogram depicts the intensity distribution. Figure 3 shows the histograms of the encrypted and decrypted images. Figure 3a, b shows the histograms of the primary "Lena" and "Pepper" images, while Fig. 3c demonstrates the intensity distribution of the encrypted image, which is different from the originals. Figure 3c shows that the intensity distribution of the encrypted image follows the normal distribution. Thus, attackers are not able to identify the information. Figure 3d, e shows the histograms of the decoded images, which are almost identical to the primary ones.
4.2 PSNR

PSNR is a quality measurement of the decrypted image. Table 2 compares the PSNR values of existing schemes with those of the proposed one, which demonstrates that the algorithm gives satisfying results.
Fig. 3 Histogram a, b primary image, c encrypted image, d, e decoded image
Table 2 Comparison of proposed work with some existing work

Images | PSNR [23] | PSNR [17] | PSNR (proposed)
Lena | 29.3657 | 30.5689 | 32.8150
Pepper | 25.1275 | 29.7801 | 31.3676
Einstein | 30.8947 | 31.7635 | 33.2324
Man | 23.0438 | 28.2198 | 29.0780
4.3 MSE

The quality of the decrypted image is identified through the MSE between the primary image and the decrypted image. The proposed algorithm is tested on several images, and the results are shown in Table 3. The MSE of the correctly decrypted image is near zero, which demonstrates that the introduced algorithm decrypts the image with negligible information loss.

Table 3 MSE and SSIM values of proposed scheme

Images | MSE | SSIM
Lena | 7.8341e−04 | 0.8735
Pepper | 9.4458e−04 | 0.8783
Einstein | 0.0011 | 0.8616
Man | 0.0017 | 0.8629
Table 4 Comparison of CC with existing scheme

Correlation coefficient | Horizontal | Vertical | Diagonal
Lena | 0.9203 | 0.9712 | 0.9430
Pepper | 0.9363 | 0.9704 | 0.9633
Lena–Pepper encrypted [23] | 0.0120 | 0.0917 | 0.1019
Lena–Pepper encrypted (proposed) | 0.0026 | −0.0061 | −0.0002
Einstein | 0.9570 | 0.9758 | 0.9682
Man | 0.8378 | 0.8874 | 0.9298
Einstein–Man encrypted [23] | 0.0256 | 0.0698 | 0.0367
Einstein–Man encrypted (proposed) | 0.0029 | −0.0010 | 0.0035
4.4 Structure Similarity Index Metric (SSIM)

SSIM calculates the similarity index between the original and decoded images. Table 3 shows the results for the similarity index, and the values are near 1. Thus, the decrypted images are close to the original ones.
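For reference, the three quality metrics reported in Tables 2 and 3 can be computed with scikit-image as in the hedged sketch below; the file names are placeholders for a primary image and its decrypted counterpart, not the authors' data.

import numpy as np
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, mean_squared_error, structural_similarity

original = io.imread("lena.png", as_gray=True).astype(np.float64)         # primary image (placeholder path)
decoded = io.imread("lena_decoded.png", as_gray=True).astype(np.float64)  # decrypted image (placeholder path)

rng_val = original.max() - original.min()                                 # data range used by PSNR and SSIM
print("PSNR:", peak_signal_noise_ratio(original, decoded, data_range=rng_val))
print("MSE :", mean_squared_error(original, decoded))
print("SSIM:", structural_similarity(original, decoded, data_range=rng_val))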
4.5 Correlation Coefficient (CC) CC usually evaluates the correlation between adjacent pixels. In order to obtain CC of adjacent pixels, thousands of adjacent pixels are selected from the plain image and cipher image, and then CC is evaluated in each direction. Table 4 shows the results of CC.
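A hedged sketch of this adjacent-pixel test is given below: a few thousand pixel pairs are sampled in the horizontal, vertical and diagonal directions and the Pearson correlation is computed with NumPy; the number of pairs is an assumption.

import numpy as np

def adjacent_pixel_cc(img, n_pairs=3000, seed=1):
    rng = np.random.default_rng(seed)
    h, w = img.shape
    r = rng.integers(0, h - 1, n_pairs)          # leave room for the +1 neighbours
    c = rng.integers(0, w - 1, n_pairs)
    x = img[r, c].astype(np.float64)
    return {
        "horizontal": np.corrcoef(x, img[r, c + 1])[0, 1],
        "vertical": np.corrcoef(x, img[r + 1, c])[0, 1],
        "diagonal": np.corrcoef(x, img[r + 1, c + 1])[0, 1],
    }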
5 Conclusion

In this paper, a double-image encryption scheme based on CS and DCST is proposed. A logistic map is employed to generate the MM, which has low coherence, and the measured values retain the principal information of the signal. Zig-zag confusion is applied to permute the pixel values. DCST is used to enhance the security of the proposed scheme. The number of bands of the DCST is used as a security key, and if the wrong number of bands is used to decrypt the image, then the decryption gives wrong results. Experimental results depict the superiority of the proposed scheme over existing ones.
References 1. Özkaynak F (2018) Brief review on application of nonlinear dynamics in image encryption. Nonlinear Dyn 92(2):305–313 2. Zhou NR, Hua TX, Gong LH, Pei DJ, Liao QH (2015) Quantum image encryption based on generalized Arnold transform and double random-phase encoding. Quantum Inf Process 14(4):1193–1213 3. Zhu Z, Zhang W, Wong K, Yu H (2011) A chaos-based symmetric image encryption scheme using a bit-level permutation. Inf Sci 181(6):1171–1186 4. Chen W, Quan C, Tay CJ (2009) Optical color image encryption based on Arnold transform and interference method. Opt Commun 282(18):3680–3685 5. Donoho DL (2006) Compressed sensing. IEEE Trans Inf Theory 52(4):1289–1306 6. Chen J, Yu Z, Qi L, Fu C, Xu L (2018) Exploiting chaos-based compressed sensing and cryptographic algorithm for image encryption and compression. Opt Laser Technol 99:238–248 7. Yang Y-G, Guan B-W, Li J, Li D, Zhou Y-H, Shi W-M (2019) Image compression-encryption scheme based on fractional order hyper-chaotic systems combined with 2D compressed sensing and DNA encoding. Opt Laser Technol 119:105661 8. Zhou N, Zhang A, Wu J, Pei D, Yang Y (2014) Novel hybrid image compression–encryption algorithm based on compressive sensing. Optik 125(18):5075–5080 9. Xu Q, Sun K, Cao C, Zhu C (2019) A fast image encryption algorithm based on compressive sensing and hyperchaotic map. Opt Lasers Eng 121:203–214 10. Di Z, Liao X, Yang B, Zhang Y (2018) A fast and efficient approach to color-image encryption based on compressive sensing and fractional Fourier transform. Multim Tools Appl 77(2):2191– 2208 11. Gong L, Qiu K, Deng C, Zhou N (2019) An image compression and encryption algorithm based on chaotic system and compressive sensing. Opt Laser Technol 115:257–267 12. Liu Z, Gong M, Dou Y, Liu F, Lin S, Ahmad MA, Dai J, Liu S (2012) Double image encryption by using Arnold transform and discrete fractional angular transform. Opt Lasers Eng 50(2):248– 255 13. Wu Y, Noonan JP, Yang G, Jin H (2012) Image encryption using the two-dimensional logistic chaotic map. J Electron Imaging 21(1):013014 14. Wang R, Deng G-Q, Duan X-F (2021) An image encryption scheme based on double chaotic cyclic shift and Josephus problem. J Inf Secur Appl 58:102699 15. Yang Y-G, Guan B-W, Zhou Y-H, Shi W-M (2021) Double image compression-encryption algorithm based on fractional order hyper chaotic system and DNA approach. Multim Tools Appl 80(1):691–710 16. Rawat N, Kim B, Muniraj I, Situ G, Lee B-G (2015) Compressive sensing based robust multispectral double-image encryption. Appl Opt 54(7):1782–1793 17. Zhou N, Yang J, Tan C, Pan S, Zhou Z (2015) Double-image encryption scheme combining DWT-based compressive sensing with discrete fractional random transform. Opt Commun 354:112–121 18. Vaish A, Kumar M (2018) Color image encryption using singular value decomposition in discrete cosine Stockwell transform domain. Opt Appl 48(1) 19. Candes EJ (2008) The restricted isometry property and its implications for compressed sensing. CR Math 346(9–10):589–592 20. Phatak SC, Suresh Rao S (1995) Logistic map: a possible random-number generator. Phys Rev E 51(4):3670 21. Liu E, Temlyakov VN (2011) The orthogonal super greedy algorithm and applications in compressed sensing. IEEE Trans Inf Theory 58(4):2040–2047 22. SIPI Image Database–Misc. http://sipi.usc.edu/database/database.php?volume=misc 23. Zhou N, Jiang H, Gong L, Xie X (2018) Double-image compression and encryption algorithm based on co-sparse representation and random pixel exchanging. Opt Lasers Eng 110:72–79
Chapter 16
A Hybrid Deep Learning Model for Human Activity Recognition Using Wearable Sensors Kumar Gaurav, Bholanath Roy, and Jyoti Bharti
K. Gaurav (B) · B. Roy · J. Bharti Department of Computer Science & Engineering, Maulana Azad National Institute of Technology, Bhopal 462003, India J. Bharti e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_16

1 Introduction

Human activity recognition (HAR) is an advanced technique that enables the automated detection of human physical activities. The variety of ways in which different people execute a given activity makes automated detection of physical activity a challenge. In addition, certain actions may be conducted at the same time, and there may be no cause-and-effect link between two successive activities [1]. The existence of sensors and their availability on mobile platforms (i.e., accelerometers, gyroscopes, and magnetometers) make it easy to measure or evaluate various physical activities in terms of movement, position, and direction. Sensor data are common in several areas such as health care [2], security [3], robotics [4], transport [5], sport [6], smart homes, and smart cities, and the data gathered by sensors are commonly utilized in such solutions. In addition, HAR is one of the best technological tools for supporting the everyday lives of older people [7]. The growing number of persons over 60 years of age will also substantially raise the cost of health care. This fact shows that HAR has a significant role to play in intelligent patient monitoring systems [8]. In general, the sensors utilized in HAR studies can be categorized as wearable sensors (e.g., gyroscope, accelerometer, IMU sensor) [9], ambient sensors (e.g., cameras, GPS, and PIRs), and sensors integrated in smartphones. Today, smartphones offer a wide range of movement, position, and direction sensors, on which numerous research projects in the HAR domain have focused [10]. While ambient and cell phone sensors are quicker and easier to use, they are
Fig. 1 Human activity recognition model in real-time scenario
far less flexible and precise than wearable sensing devices. In addition, the secrecy of remote monitoring might be affected. Most of the present HAR methods, however, depend on collected ambient and smartphone sensor data [9, 11–13]. This study examines the UCI-HAR dataset. This dataset contains data from 18 daily activities recorded through wearable sensors. Some examples of activities are walking, running, cycling, and driving. HAR acknowledges the difficulty of classifying the sensor data into a collection of well-known physical activities. This is why machine learning (ML) is often employed in the solutions offered for this problem, such as support vector machines, decision trees (DTs), and Naive Bayes (NB). Figure 1 shows a real-time scenario of the HAR model. The analysis is done by applying a hybrid GRU-CNN model, which results in a higher accuracy rate. The strength of CNNs in feature extraction and the capacity of GRUs to take into account the temporal dependence between activities can both be exploited through the combination of CNN and GRU models. As an extension of this work, we compare various hybrid DL models in this study [14–28]. The variables assessed in this comparison study include accuracy, precision, recall, and F1-score. In summary, the contributions are as follows: • To model and categorize human behavior data with various existing DL techniques and to present the results of their comparison. • Evaluation of well-known performance metrics on the suggested hybrid models (i.e., accuracy, recall, score, sensitivity, and specificity).
• Analysis of the influence of various hyper-parameters (such as the number of layers, pooling size, and kernel size) on the evaluation outcomes, and comparison of the performance of the suggested models with models published on the same dataset. The remainder of the paper is organized as follows. Section 2 reviews the related work done in the past on various DL techniques for building a better HAR system. Section 3 provides a description of the UCI-HAR dataset. Section 4 presents the experimental result analysis. Finally, Sect. 5 explains the conclusions obtained from the proposed model along with some future enhancements.
2 Related Work

In the past decade, deep learning approaches have been achieving outstanding results in the field of feature extraction, so academics have aimed to apply them first of all to human behavior identification. Taylor et al. [7] utilized convolutional Boltzmann machines in early work for identifying and extracting relevant information about actions from video. In one related study, a comparative analysis of three efficient models, LSTM-RNN, GRU-RNN, and SVM, was performed on the well-known 'Human Activity Recognition Using Smartphones Data Set' from the UCI machine learning repository. That study concluded that the support vector machine showed the best performance among the trained models on the basis of accuracy and F1-score. However, that work was limited, since it did not also perform prediction using deep learning models like CNN, RNN, and LSTM and compare them with those machine learning algorithms; the present work is an enhancement of it. Du et al. [4] suggested Bio-LSTM for human activity recognition, evaluated on two video datasets, UCF-50 and HMDB-51, with labels such as clap, climb, walking, cartwheel, eat, catch, push-ups, and wave, by training two models, a convolutional neural network (CNN) and a hidden Markov model (HMM), on 70% of the data and testing on the remaining 30%. The performance of both was compared on the different labels, and it was concluded that the CNN performed more efficiently than the SVM models, a further optimization over the previously implemented models [4]. However, that work only coarsely recognizes the human activities, and the model is complex because it trains two models simultaneously. The next model proposed that each axis of the sensor data be treated as 1D data and then sent to a CNN for identification. Liu et al. [29] suggested combining CNN and CRFs for action segmentation and recognition using the UCI-HAR dataset. The CNN can also automatically
learn space–time features, whereas the CRF can capture dependences between outputs. Other typical deep learning approaches, such as the recurrent neural network and the long short-term memory network, are also extensively utilized. Deep learning has been effective in the identification of human behavior and is frequently utilized in sensor-based human activity recognition. One such approach integrated the accelerometer and gyroscope sequences into an activity image so that the ideal characteristics could be learned automatically from the activity image using a deep convolutional neural network (DCNN).
2.1 Summary of Findings

Table 1 summarizes several findings of the existing work explained in the section above, to give a better understanding of the context of the proposed model. CNNs, for example, are particularly strong at extracting suitable local features from sensor data; however, they are memory-free and ignore the temporal relationships between data records. On the other hand, the GRU is ideally suited to problems in which temporal dependencies play a major part. This paper reports tests on different combinations of CNN with other DL techniques and compares them on key performance criteria, e.g., accuracy and recall. The combinations of algorithms applied in previous work [9, 14] rely on manual feature extraction, so the accuracy rate hardly reaches its maximum of 90.11%. Thus, to increase the accuracy rate, the proposed model stacks a GRU with an LSTM in the neural network, so that the combination can extract the features in an automated way from the UCI-HAR dataset.

Table 1 Summary of findings of the related work

References | Mode of feature extraction | Modeling | Accuracy rate (%) | Type of sensor
[9] | Manual | KNNs, HMMs, SVMs, RFs | 88 | Smartphone
[14] | Manual | SVMs | 89.78 | Wearable
[15] | Manual | SVMs, KNNs | 90.11 | Wearable
[15] | Automated | CNNs | 92.09 | Wearable
[16] | Automated | VRNNs, CNNs | 92.78 | Smartphone
[7] | Automated | CNNs, LSTMs | 92.56 | Ambient
[4] | Automated | CNNs, LSTMs, BiLSTMs | 92.89 | Smartphone
Proposed work | Automated | GRU-LSTM, CNN | 93.43 | Wearable
Table 2 UCI-HAR activities

Activities | Samples | Percentage (%)
Walk | 122,091 | 16.3
Up | 116,707 | 15.6
Down | 107,961 | 14.4
Sit | 126,677 | 16.9
Std | 138,105 | 18.5
Lay | 136,865 | 18.3
3 Dataset Description

Table 2 summarizes the information in the dataset, including activities and sample counts; some variance may be noticed between them. The UCI-HAR dataset (https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones) has the largest volunteer population, which indicates that this dataset was built from the recordings of 30 individuals. Data in HAR studies can be collected from several types of sensors, including accelerometers, gyroscopes, magnetometers, and environmental sensors. The UCI-HAR [27] dataset was compiled from 30 people between the ages of 19 and 48 years. All individuals were told to follow an activity routine during the recording process, and everyone wore a cell phone with built-in inertial sensors. The six daily activities are walking (Walk), walking upstairs (Up), walking downstairs (Down), sitting (Sit), standing (Std), and laying (Lay). However, this data collection also comprises postural transitions such as stand-to-sit, sit-to-stand, sit-to-lie, lie-to-sit, stand-to-lie, and lie-to-stand. From this dataset, the six activities are selected for the input samples, which contain acceleration and angular velocity sampled at a rate of 50 Hz; the percentage of each activity in the dataset is also given in Table 2.
4 Proposed Hybrid Model

The greatest benefit of DL models is their capacity to learn complicated characteristics from raw data. This eliminates the requirement for prior exploration and handcrafted feature extraction. The DL model utilized in our experimental investigation, GRU-LSTM with CNN, is described briefly in this section. The following are the steps followed in the proposed model to preprocess the data in the initial stage.
4.1 Preprocessing

This is the initial and most important stage before applying the proposed model. Preprocessing involves cleaning of unnecessary data, instance selection, normalization, and segmentation. The categories of preprocessing are explained below.

1. Linear Interpolation: The present datasets are realistic, collected using sensors worn on the person's body. Thus, throughout the collection procedure, certain data may be lost, which is normally indicated as NaN/0. To address this issue, linear interpolation is used in this paper to fill the gaps.

2. Scaling and Normalization: In this process, the data are normalized before training the model using the formula below:

X_i = (X_i − X_i^min) / (X_i^max − X_i^min)        (1)

where N is the number of channels and X_i^max, X_i^min are the maximum and minimum values of the ith channel.

3. Segmentation: The recording of each activity lasts only for a brief time. To partition the data into more samples, a small sliding window is required.
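A hedged Python sketch of these three preprocessing steps is given below; the window length of 128 samples and the 50% overlap follow the usual UCI-HAR convention, while the array layout (timesteps × channels) is an assumption.

import numpy as np
import pandas as pd

def preprocess(signal, window=128, step=64):
    # signal: (timesteps, channels) raw sensor readings, possibly containing NaNs
    df = pd.DataFrame(signal).interpolate(method="linear", limit_direction="both")  # 1. linear interpolation
    x = df.to_numpy()

    x_min, x_max = x.min(axis=0), x.max(axis=0)                                     # 2. per-channel min-max
    x = (x - x_min) / (x_max - x_min + 1e-8)                                        #    normalization, Eq. (1)

    starts = range(0, len(x) - window + 1, step)                                    # 3. sliding-window
    return np.stack([x[s:s + window] for s in starts])                              #    segmentation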
4.2 Proposed Model Architecture

The GRU-LSTM-CNN architecture is shown in Fig. 2. It starts with the input, followed by preprocessing of the data (linear interpolation, scaling and normalization, segmentation). Then a GRU layer and an LSTM layer with 64 neurons each are built, whose output is filtered by applying convolution with 64 filters, a kernel size of 5, and a stride of 2. Then, max pooling with a pooling size of 2 is applied. Finally, a Flatten layer converts the pooled feature map into a single column, which is passed to the fully connected layer together with batch normalization; the input itself is arranged with sampling points horizontally and sensor channels vertically. The output is obtained at the output layer, which is a dense layer with a softmax classifier.
Fig. 2 Architecture of proposed model (Input → Pre-processing: 1. linear interpolation, 2. scaling and normalization, 3. segmentation → GRU 64 neurons → LSTM 64 neurons → 2 × Convolution, 64 filters, kernel size 5, stride 2 → Max-pooling, pooling size 2, stride 2 → 2 × Convolution, 64 filters, kernel size 5 → Flatten → BN → Dense with softmax output)
4.3 GRU Layer

This layer, like the LSTM layer, has a large impact on the vanishing gradient problem. The GRU uses gating mechanisms that allow it to manage and control the flow of information between the cells of the NN, so the gates help decide what should be retained/passed or dropped. The gates work on values between 0 and 1, where 0 (zero) marks unimportant information and 1 marks important information. The GRU has two gates in total:

1. Reset gate: This gate is somewhat like the forget gate of the LSTM cell. The resulting reset vector r decides how much of the hidden state from previous time steps is erased.
2. Update gate: This gate decides how much of the past information is carried forward to the current hidden state.
The proposed work is based on supervised learning. The GRU layer embeds the input dimension into an output dimension of 64, so the output of the GRU is a tensor of shape (batch_size, timesteps, 64):

model.add(layers.GRU(64, return_sequences=True))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])  # categorical loss for the 6 activity classes

At the output side, the softmax layer is used for back-propagation down to the LSTM layer. The output of the first hidden layer is a block of feature maps with a 2-dimensional shape of 128 × 64.
4.4 LSTM Layer

An LSTM works on the concept of cells, each a small unit with input, output and forget gates operating simultaneously. It gives the network the power to remember values over arbitrary time intervals, and the gates regulate the flow of information. The concepts of LSTM and GRU units are quite similar in many respects. In addition, the cell can wipe away duplicate information recorded in its memory component over an arbitrary period of time. A distinct vector for every input sequence, starting with a zero element for t = 0, is supplied to the network by the memory. For x_t ∈ R^N, where N is the feature length at every time step, the hidden state h_t of an LSTM cell can be calculated as follows:

i_t = σ(x_t U^i + h_{t−1} W^i)
f_t = σ(x_t U^f + h_{t−1} W^f)
o_t = σ(x_t U^o + h_{t−1} W^o)
g_t = tanh(x_t U^g + h_{t−1} W^g)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_t
h_t = tanh(c_t) ⊙ o_t        (2)

Here, i, f and o are the input, forget and output gates, respectively. It is important to note that they have the same form of equation, only with different parameter matrices (W is the recurrent connection between the previous hidden layer and the current hidden layer; U is the weight matrix connecting the inputs to the current hidden layer).
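The following small NumPy sketch mirrors the cell equations in Eq. (2) for a single time step; the weight shapes and the random initialization are illustrative only, not the trained parameters of the proposed network.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, U, W):
    # U[g]: input weights (N x H), W[g]: recurrent weights (H x H), for g in {i, f, o, g}
    i = sigmoid(x_t @ U["i"] + h_prev @ W["i"])   # input gate
    f = sigmoid(x_t @ U["f"] + h_prev @ W["f"])   # forget gate
    o = sigmoid(x_t @ U["o"] + h_prev @ W["o"])   # output gate
    g = np.tanh(x_t @ U["g"] + h_prev @ W["g"])   # candidate cell state
    c = f * c_prev + i * g                        # new cell state
    h = np.tanh(c) * o                            # new hidden state
    return h, c

N, H = 9, 64                                      # feature length per time step, hidden units
rng = np.random.default_rng(0)
U = {k: rng.normal(scale=0.1, size=(N, H)) for k in "ifog"}
W = {k: rng.normal(scale=0.1, size=(H, H)) for k in "ifog"}
h, c = lstm_step(rng.normal(size=N), np.zeros(H), np.zeros(H), U, W)   # h.shape == (64,)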
4.5 Convolution and Pooling Layers

In the proposed model, four 1D convolutional layers are used, named Conv1D_1 (Conv1D), Conv1D_2 (Conv1D), Conv1D_3 (Conv1D) and Conv1D_4 (Conv1D). It can be seen clearly from Table 3 that the output shape is reduced after each application of Conv1D. Here, the total number of parameters is 139,590, all of which are trainable, and the number of non-trainable parameters is 0. In this work, a CNN model is proposed for classification in which 64 filters of size 3 × 3 are taken and slid over the input from the upper left to the lower right. A value is determined for every point using the convolution operation based on the filter. A feature map is created for every filter as it moves. The ReLU activation function is then applied, replacing each negative value in the feature map with 0.

Table 3 A 1D fully convolutional model

Layer/Stride | Filter size | Output size
Input | – | –
Conv1D_1 (Conv1D) | 3 × 3 | N × 124 × 64
Conv1D_2 (Conv1D) | 3 × 3 | N × 120 × 64
Conv1D_3 (Conv1D) | 3 × 3 | N × 56 × 64
Conv1D_4 (Conv1D) | 3 × 3 | N × 52 × 64
A 2 × 2 window is taken in the pooling layers to pick the largest values in the feature maps, and these are used as input for the corresponding following layers. After that, a Flatten function is used to convert all the pooled data into a continuous vector. The last layer, a fully connected layer, uses the softmax activation function. The outputs of the convolutional and pooling layers reflect high-level features of the input. As feature extractors, the convolutional layers learn the feature representations of their inputs. In the convolutional layers, the neurons are organized into feature maps. Each neuron in a feature map is connected, through a set of weights learned during training (commonly referred to as a filter bank), to a neighborhood of neurons in the preceding layer. The inputs are combined with the learned weights to calculate a new feature map, and the results are transmitted through a non-linear activation function. A value is determined for every point using the convolution operation based on the filter, a feature map is created for every filter as it moves, and the ReLU activation function replaces each negative value in the feature map with 0 in order to improve efficiency. The mathematical equations of the sigmoid, hyperbolic tangent and rectified linear unit are given below:

σ(x) = 1 / (1 + e^(−x))        (3)

tanh(x) = 2 / (1 + e^(−2x)) − 1        (4)

F(x) = max(0, x)        (5)

where x is the weighted sum of inputs (Fig. 3).
4.6 Global Average Pooling Layer

The aim of the pooling layers is to lower the spatial resolution of the feature maps and hence obtain spatial invariance to distortions and translations. Global average pooling may be seen as a structural regularizer that directly enforces the feature maps to act as confidence maps of the categories. The mlpconv layers make this possible, as they map more closely to confidence maps than GLMs do. A number of convolutional and pooling layers are generally stacked to obtain more abstract feature representations as the data travel through the network. These representations are then interpreted for high-level reasoning by the fully connected layers that follow.
Fig. 3 Graphical representation of activation functions
4.7 Batch Normalization Layer

Batch normalization is a training technique that standardizes the inputs of each mini-batch for very deep neural networks. This stabilizes the learning process and substantially reduces the number of training steps necessary to train the neural network. Finally, a dropout layer over the 1664 flattened features is followed by a dense layer with 6 units (param # 9990):

output = layers.Dense(6)(decoder_output)
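Putting the layers of Fig. 2 and Table 4 together, a hedged tf.keras sketch of the whole GRU-LSTM-CNN model is shown below. The input shape (128 time steps, 9 channels) is the usual UCI-HAR window, and the output shapes in the comments follow Table 4 (stride-1 convolutions); the dropout rate, loss and optimizer settings are assumptions, and the batch normalization block of Fig. 2 is omitted so that the parameter counts match Table 4.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_gru_lstm_cnn(n_timesteps=128, n_channels=9, n_classes=6):
    model = models.Sequential([
        layers.Input(shape=(n_timesteps, n_channels)),
        layers.GRU(64, return_sequences=True),           # (128, 64), 14,400 params
        layers.LSTM(64, return_sequences=True),          # (128, 64), 33,024 params
        layers.Conv1D(64, 5, activation="relu"),         # (124, 64), 20,544 params
        layers.Conv1D(64, 5, activation="relu"),         # (120, 64)
        layers.MaxPooling1D(2),                          # (60, 64)
        layers.Conv1D(64, 5, activation="relu"),         # (56, 64)
        layers.Conv1D(64, 5, activation="relu"),         # (52, 64)
        layers.MaxPooling1D(2),                          # (26, 64)
        layers.Flatten(),                                # 1664 features
        layers.Dropout(0.5),                             # assumed dropout rate
        layers.Dense(n_classes, activation="softmax"),   # 9,990 params
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",       # one-hot labels assumed
                  metrics=["accuracy"])
    return model

model = build_gru_lstm_cnn()
model.summary()                                          # about 139,590 parameters in total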
5 Results and Discussions

The UCI-HAR dataset is the most widely used test dataset in the analysis of wearable sensor-based human activity recognition. Thirty people wearing a smartphone (Samsung Galaxy S II) on the waist, equipped with an accelerometer and gyroscope, participated in the data collection. Each person performed 6 activities (LAYING, STANDING, SITTING, WALKING, WALKING-UPSTAIRS, WALKING-DOWNSTAIRS). The dataset is randomly divided into two parts, a training set and a test set. Total params: 139,590; Trainable params: 139,590; Non-trainable params: 0. By utilizing the sensors (gyroscope and accelerometer gauge) in the cell phone, the three-coordinate acceleration (tAcc-XYZ) as well as the angular velocity (tGyro-XYZ) were collected. The sensor signals (accelerometer and gyroscope) were pre-processed with a noise filter and then sampled in sliding windows of 128 readings each.
Table 4 Hyper-parameters list

Layer (type) | Output shape | Param #
GRU | (None, 128, 64) | 14,400
LSTM | (None, 128, 64) | 33,024
Conv1D | (None, 124, 64) | 20,544
Conv1D | (None, 120, 64) | 20,544
Max pooling | (None, 60, 64) | 0
Conv1D | (None, 56, 64) | 20,544
Conv1D | (None, 52, 64) | 20,544
Max pooling | (None, 26, 64) | 0
Flatten | (None, 1664) | 0
Dropout | (None, 1664) | 0
Dense | (None, 6) | 9,990
5.1 Implementation of Model

The proposed model is implemented with a high-level Python API that can simulate neural networks on TensorFlow, which is used as the backend. Classification and training are done on a PC with a Core i3 processor at 4.00 GHz and 64 GB RAM running the Ubuntu OS. The proposed model is trained in a fully supervised manner to analyze the data, and the gradient is back-propagated from the softmax layer down through the LSTM and GRU layers (Table 4). Total params: 139,590; Trainable params: 139,590; Non-trainable params: 0. Four one-dimensional convolutional layers are used, and the output shape decreases at each step. The confusion matrix is printed with:

# Confusion Matrix
print(confusion_matrix(Y_test, y))
5.2 Performance Metrics

The overall performance of the proposed model is evaluated using accuracy, precision, recall and F1-score, which are defined as follows. Classification results are reported with reference to the precision, recall, F1-score and accuracy of each class, as well as the overall accuracy.
• Precision: It indicates how precise the model is, i.e., out of the samples predicted as positive, how many actually are positive. It is calculated as:

Precision = TP / (TP + FP) × 100 (%)

• Recall: It measures how many of the actual positives the model catches by marking them as positive. It is calculated as:

Recall = TP / (TP + FN) × 100 (%)

• F1-score: It is a representation of the accuracy of a test. The F1-score is the harmonic mean (HM) of precision and recall and achieves its highest value at 1. It is calculated as:

F1-score = 2 × (Precision × Recall) / (Precision + Recall)

• Accuracy: It is the ratio of correctly predicted samples (true positives and true negatives) to the total count of true positives, true negatives, false positives and false negatives. It is calculated as:

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100 (%)

Here, TP: true positive, TN: true negative, FN: false negative, FP: false positive (Tables 5 and 6).
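The four measures above can be obtained directly from the test-set predictions with scikit-learn, as in the hedged sketch below; y_true and y_pred are small placeholder arrays rather than the actual 2947 test labels.

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

y_true = np.array([0, 1, 2, 3, 4, 5, 0, 1])   # placeholder ground-truth class indices
y_pred = np.array([0, 1, 2, 3, 4, 5, 0, 2])   # placeholder predicted class indices

precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                           average=None, zero_division=0)
print("per-class precision:", precision)
print("per-class recall   :", recall)
print("per-class F1-score :", f1)
print("overall accuracy   :", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))        # the basis of Tables 5 and 6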
Here, True positive: TP True negative: TN False negative: FN False positive: FP (Tables 5 and 6). For UCI-HAR dataset, there were 7352 trained instances and 2947 test instances which have been correctly classified, and achieved accuracy 93.56% (Fig. 4). Table 5 Confusion matrix for recall Lay
Sit
Stand
Walk
Down
UP
537
0
0
6
416
50
0
71
0 0 0
Recall
0
0
0
100
0
0
19
84.72
457
1
0
3
85.90
0
0
467
26
3
94.15
1
0
6
419
0
99.76
14
0
0
0
457
97.70
Table 6 Confusion matrix for precision

Lay | Sit | Stand | Walk | Down | Up | Precision
537 | 0 | 0 | 0 | 0 | 0 | 98.89
6 | 416 | 50 | 0 | 0 | 19 | 82.86
0 | 71 | 457 | 1 | 0 | 3 | 90.13
0 | 0 | 0 | 467 | 26 | 3 | 99.78
0 | 1 | 0 | 6 | 419 | 0 | 94.15
0 | 14 | 0 | 0 | 0 | 457 | 94.81
Fig. 4 Model loss over 100 epochs (accuracy: 0.9342, loss: 0.067)
6 Conclusion

In this paper, a sensor-based (accelerometer and gyroscope) HAR algorithm using a hybrid model (GRU-LSTM-CNN) is presented: raw data collected by the mobile sensors are fed into a GRU layer, followed by an LSTM layer and a four-convolutional-layer network, which is capable of learning temporal dynamics on various time scales according to the learned parameters of the LSTM and GRU. As per the proposed study, the UCI-HAR dataset,
when processed with the hybrid approach, gives an accuracy rate of 93.43%, an improvement over previous work. Based on these experiments and comparisons, the model suggested in this study is more efficient. The future aim is to maximize accuracy via the application of various models and techniques such as optimization and hyper-parameter tuning in deep learning and machine learning.
Chapter 17
Detection of Diabetic Retinopathy Using Deep Learning-Based Framework Imtiyaz Ahmad, Vibhav Prakash Singh, and Suneeta Agarwal
1 Introduction
Diabetes is a chronic disease that occurs when the pancreas fails to produce the required insulin or when the body fails to process it properly [1]. As it advances, it starts affecting the circulatory system, including the retina, and damages the retinal blood vessels, which leads to diabetic retinopathy (DR) and gradually decreases the vision of the patient. This disease can cause permanent blindness if appropriate treatment is not provided in the early stages. Diabetes mellitus is accompanied by abnormal shifts in blood sugar level. Normally, glucose is converted into the energy needed for normal bodily functions, but in the worst case the blood sugar level becomes abnormal and the excess blood sugar causes hyperglycemia. Non-proliferative diabetic retinopathy (NPDR) and proliferative diabetic retinopathy (PDR) are the two main stages of DR [2], as shown in Fig. 1. NPDR is a condition in which the retina becomes inflamed (a case of macular edema) because the accumulation of glucose leads to leakage from the blood vessels in the eyes. In a severe condition, retinal vessels might get blocked completely, which causes macular ischemia. There are different levels of NPDR, in which the patient may suffer from blurred vision or lose sight partially or completely. PDR occurs in the advanced stage of diabetes, in which extra blood vessels start growing in the retina (a case of neovascularization).
Fig. 1 An NPDR retina and a PDR retina
These new blood vessels are very narrow and brittle, with a greater tendency to hemorrhage, which can lead to partial or complete loss of vision. Automatic screening is necessary because manual examination is labor-intensive and the associated costs are very high; deviations in retina images should therefore be detected automatically through digital photography and imaging techniques. In DR, the arteries that nourish the retina start to leak fluid and blood into the retina, causing abnormalities known as lesions, such as microaneurysms, hemorrhages, hard exudates, and cotton wool spots [3]. Based on the type and number of lesions present, the severity of DR can be represented in five levels: level 0 is a normal eye without any DR, levels 1–3 represent NPDR, and level 4 represents PDR [4]. Many researchers have worked on the detection and classification of DR using deep learning; some of the previous works are listed in Table 1. From this table, it is evident that the detection performance of many of these approaches is quite limited. Alyoubi et al. [5] have presented a detailed review of both binary and multilevel classification methods. In binary classification, DR images are classified into two classes, i.e., 'Yes' and 'No', where 'Yes' represents infected images (images with DR) and 'No' represents images without DR, whereas in multilevel classification DR images are classified into many classes. Some of these classification methods are based on vessel segmentation of the image. In this work, we have adopted binary classification for detection. In the next two sections, we discuss the proposed methods and model, followed by their experimental results. Finally, in Sect. 4, we conclude this work.
Table 1 State-of-the-art works for the detection of DR

| Author's references | DL method used | Database used (no. of images) | AUC | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|---|
| [6] | CNN-ResNet34 | Kaggle (35,000) [7] | – | 85 | 86 | – |
| [8] | CNN-InceptionV3 | Private dataset (30,244) | 0.946 | 88.21 | 85.57 | 90.85 |
| [9, 10] | CNN-ResNet | Private dataset (60,000) and STARE (131) [11] | 0.982 and 0.95 | 94.23 and 90.84 | 90.94 and – | 95.7 and – |
| [12] | CNN-ResNet50 | Messidor (1200) [9] and IDRID (516) [13] | 0.96 and – | 92.6 and 65.1 | 92 and – | – |
| [14] | CNN-VGG | Kaggle (20,000) [7] | – | 78.3 | – | – |
| [15] | CNN-InceptionV3 | Kaggle (166) [7] | – | 63.23 | – | – |
| [16] | CNN | DDR (13,673) [16] | – | 0.824 | – | – |
2 Methods and Model
Various approaches based on supervised and unsupervised learning methods are available in the literature for the detection of diabetic retinopathy. Deep learning is widely used in medical imaging applications such as image classification, image segmentation, image retrieval, image detection, and image registration, and deep neural networks have been widely used for the detection and classification of diabetic retinopathy. Deep neural networks produce outstanding results in extracting relevant features and discriminating between classes. Unlike classical machine learning methods, the performance of deep learning methods increases with the size of the dataset. Figure 2 shows the step-wise working of the proposed method used for the detection and classification of DR. The first step is the selection of a retinal fundus image database; several databases are available publicly, as discussed in the previous section. In our model, we worked on two databases, APTOS and IDRID.
Fig. 2 Basic methods used in our model for detection of DR (input images → pre-processing → training on customized CNN of 12 layers → 2 fully connected layers → softmax and classification → testing)
APTOS [17] (Asia Pacific Tele-Ophthalmology Society): contains 3662 training images and 1928 testing images. The images come with ground truths graded by severity of DR on a scale of 0–4. It is part of a Kaggle competition, and in our model we have used only the training images, i.e., 3662 images. IDRID [13] (Indian Diabetic Retinopathy Image Dataset): contains retinal fundus images captured by a retinal specialist at an eye clinic located in Nanded, Maharashtra, India. The dataset is available in three parts, i.e., segmentation, disease grading, and localization, containing 81, 516, and 516 images with ground truths, respectively. In our model, we have used the disease grading part, in which the ground truths are graded by severity of DR on a scale of 0–4. After obtaining the datasets, we split each dataset into two classes, 'Yes class' and 'No class'. On the basis of the available graded ground truths, images with severity grades 1–4 were kept in the 'Yes class', representing images infected with DR, and images with severity grade 0 were kept in the 'No class', representing normal images without DR. After that, we divided each dataset into a training set, a validation set, and a testing set: 20% of the images were kept randomly for testing, and 10% of the remaining 80% was kept for validation. The detailed division of the datasets is shown in Table 2.
Table 2 Datasets used (number of images)

| Dataset | Training set (Yes class / No class) | Validation set (Yes class / No class) | Testing set (Yes class / No class) | Total |
|---|---|---|---|---|
| APTOS | 1337 / 1300 | 149 / 144 | 371 / 361 | 3662 |
| IDRID | 250 / 121 | 28 / 13 | 70 / 34 | 516 |
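As an illustration of this split, the following is a minimal Python sketch (the authors worked in MATLAB; the file name and column names here are hypothetical) that binarizes the 0–4 severity grades into 'No'/'Yes' classes and carves out the 20% test and 10% validation subsets described above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ground-truth file with one row per image and a 0-4 severity grade.
labels = pd.read_csv("aptos_train.csv")                    # columns assumed: id_code, diagnosis
labels["label"] = (labels["diagnosis"] > 0).astype(int)    # grade 0 -> 'No' (0), grades 1-4 -> 'Yes' (1)

# Hold out 20% of the images for testing, then 10% of the remainder for validation.
train_val, test = train_test_split(labels, test_size=0.20, random_state=0)
train, val = train_test_split(train_val, test_size=0.10, random_state=0)
print(len(train), len(val), len(test))
```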
2.1 Preprocessing
In the APTOS dataset, the image sizes are not uniform; they vary from image to image. To train a network, we needed to resize them to a common size, so in our model we resized all the images to 299 * 299. Similarly, we resized the images of the IDRID dataset. After preprocessing, the images are used as input to train our CNN architecture.
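A minimal sketch of this resizing step in Python with the Pillow library (the authors worked in MATLAB, and the folder names here are hypothetical):

```python
from pathlib import Path
from PIL import Image

src, dst = Path("fundus_raw"), Path("fundus_299")   # hypothetical folder names
dst.mkdir(exist_ok=True)

for img_path in src.glob("*.png"):
    img = Image.open(img_path).convert("RGB")
    img = img.resize((299, 299))                    # uniform 299 x 299 network input size
    img.save(dst / img_path.name)
```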
2.2 CNN Architecture
CNN is a type of deep learning neural network and has achieved extremely impressive success in the field of computer vision. CNN uses the supervised learning method, a problem setting where the prediction target is clearly labeled within the data used for training. In general, the earlier stages of the network detect low-level features such as edges, and later layers recombine those features to form higher-level attributes of the given input, followed by classification. Various variants of CNN have been proposed in the last few years, but the basic components are the same. The basic architecture of a CNN is organized in different layers (processing units) such as the convolutional layer, pooling layer, fully connected layer, dropout layer, activation functions, and batch normalization layers. In a CNN, feature extraction and prediction are integrated in a single model; therefore, the extracted features have strong discriminative power, because every CNN model is trained under the direction of the output labels. We trained the datasets on a customized CNN architecture, varying the number of layers and other parameters; the final architecture we used is shown in Table 3 and consists of 3 convolutional layers, 3 max-pooling layers, 3 batch normalization layers, 3 ReLU layers, a softmax layer as activation function, and 2 fully connected layers. At first, we feed images of size 299 * 299 * 3 (where 3 is the number of channels) as input to a convolution layer consisting of 16 filters of size 3 * 3, stride [1 1], and 'same' padding. As a result, the number of activations in the first layer is 299 * 299 * 16 = 1,430,416 and the total number of learnable parameters is 448 (weights 3 * 3 * 3 * 16 + bias 1 * 1 * 16). 'Convolutional layers apply a convolution operation to the input, passing the result to the next layer. A convolution converts all the pixels in its receptive field into a single value' [18]. The convolution layer is followed by a batch normalization layer and a ReLU layer. Batch normalization is used to increase the training speed of the CNN and to decrease the sensitivity to network initialization; it normalizes the data over all observations in a mini-batch for each channel independently [19]. ReLU, the rectified linear unit, is an activation operation that performs a threshold operation in which input values smaller than zero are set to zero, and it is computationally more efficient than other activation functions [20].
Table 3 CNN architecture used in our model

| Layers | Description | Activations | Learnables |
|---|---|---|---|
| Image Input | Input | 299 * 299 * 3 | – |
| Convolution | 16 3 * 3 convolutions with stride [1 1] and padding 'same' | 299 * 299 * 16 | Weights 3 * 3 * 3 * 16, Bias 1 * 1 * 16, Total 448 |
| Batch Normalization | – | 299 * 299 * 16 | Offset 1 * 1 * 16, Bias 1 * 1 * 16, Total 32 |
| ReLU | – | 299 * 299 * 16 | – |
| Max-pooling | 2 * 2 max-pooling with stride [2 2] | 149 * 149 * 16 | – |
| Convolution | 32 3 * 3 convolutions with stride [1 1] and padding 'same' | 149 * 149 * 32 | Weights 3 * 3 * 16 * 32, Bias 1 * 1 * 32, Total 4640 |
| Batch Normalization | – | 149 * 149 * 32 | Offset 1 * 1 * 32, Bias 1 * 1 * 32, Total 64 |
| ReLU | – | 149 * 149 * 32 | – |
| Max-pooling | 2 * 2 max-pooling with stride [2 2] | 74 * 74 * 32 | – |
| Convolution | 64 3 * 3 convolutions with stride [1 1] and padding 'same' | 74 * 74 * 64 | Weights 3 * 3 * 32 * 64, Bias 1 * 1 * 64, Total 18,496 |
| Batch Normalization | – | 74 * 74 * 64 | Offset 1 * 1 * 64, Bias 1 * 1 * 64, Total 128 |
| ReLU | – | 74 * 74 * 64 | – |
| Max-pooling | 2 * 2 max-pooling with stride [2 2] | 37 * 37 * 64 | – |
| Fully connected | 2 fully connected layers | 1 * 1 * 2 | Weights 2 * 87,616, Bias 2 * 1, Total 175,234 |
| Softmax | – | 1 * 1 * 2 | – |
| Classification | Classification output | 1 * 1 * 2 | – |
After ReLU, there is a pooling layer (max-pooling in particular) with a filter size of 2 * 2 and a stride of 2. The pooling layer is used to downsample the feature map in order to obtain clearer, more compact features of the images. Max-pooling selects the most activated presence of a feature (i.e., computes the maximum of each region) [21]. The details about the activations and total learnable parameters in each layer are given in Table 3. The output from the pooling layer becomes the input to the next convolution layer, and the process repeats twice, with the number of filters in the convolution layer changed to 32 and 64, respectively. To reduce the overfitting problem, we used L2 regularization.
At last, we used two fully connected (FC) layers followed by a softmax layer and then the classification layer. The output from the last max-pooling layer is flattened and fed as input to the FC layer, which 'adds a bias vector and multiplies the input by a weight matrix' [18]. Softmax is an activation function that transforms a vector of numbers into a vector of probabilities. For classification, the classification layer calculates the cross-entropy loss for mutually exclusive classes [22].
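To make the architecture of Table 3 concrete, the following is a minimal Keras sketch of an equivalent network; the authors built their model in MATLAB, and the L2 weight-decay coefficient is an assumed placeholder, since its value is not reported.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def conv_block(x, filters, weight_decay=1e-4):
    # Convolution -> batch normalization -> ReLU -> 2x2 max-pooling, as in Table 3.
    x = layers.Conv2D(filters, 3, strides=1, padding="same",
                      kernel_regularizer=regularizers.l2(weight_decay))(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    return layers.MaxPooling2D(pool_size=2, strides=2)(x)

inputs = tf.keras.Input(shape=(299, 299, 3))
x = conv_block(inputs, 16)
x = conv_block(x, 32)
x = conv_block(x, 64)
x = layers.Flatten()(x)                              # 37 * 37 * 64 = 87,616 features
outputs = layers.Dense(2, activation="softmax")(x)   # 2 * 87,616 weights + 2 biases = 175,234 learnables
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```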
3 Results and Analysis
We trained our model in MATLAB R2021a on a PC with the configuration 'Windows 10 OS, Intel(R) Core(TM) i5-5200U CPU @ 2.20 GHz and 8 GB RAM'. We used two datasets to train our model, and the training progress on both datasets is shown in Fig. 3. In training, to minimize the loss efficiently, different types of optimizers can be used to update parameters such as the weights and the learning rate. In our model, we used the Adam optimizer (adaptive moment estimation), which works with first- and second-order moments, with its default learning rate of 0.001. We trained our model by varying the number of layers, epochs, and batch size.
Fig. 3 a Training progress of IDRID dataset. b Training progress of APTOS dataset
In the APTOS dataset, we used 10 epochs and a mini-batch size of 64 and obtained a validation accuracy of 95.90%; training the network took 147 min and 18 s. In the IDRID dataset, we used 20 epochs and a mini-batch size of 64 and obtained a validation accuracy of 80.40%; training took 64 min and 54 s. After training and classification, we tested our model on the held-out 20% of images, which took less than 2 min for both datasets. We calculated the performance metrics using the MATLAB 'plotconfusion' function, and the plots are shown in Fig. 4. The target class (columns) is the actual expected class, and the output class (rows) is the predicted class. Table 4 lists the elements of the confusion matrix and the metrics derived from them. A true positive (TP) is a case in which the class label of a record in the dataset is positive and the classifier predicts it as positive. Similarly, a true negative (TN) is a case in which the class label of a record is negative and the classifier predicts it as negative. A false positive (FP) is a case in which the class label of a record is negative, but the classifier predicts it as positive.
Fig. 4 a Confusion matrix of APTOS dataset. b Confusion matrix of IDRID dataset
Table 4 Confusion matrix elements and derived metrics

| | Actual value: No | Actual value: Yes | |
|---|---|---|---|
| Predicted value: No | TN | FN | NPV = TN/(TN + FN) |
| Predicted value: Yes | FP | TP | PPV/Precision = TP/(TP + FP) |
| | Specificity (TNR) = TN/(TN + FP) | Sensitivity (TPR) = TP/(TP + FN) | Accuracy = (TP + TN)/(TP + FP + TN + FN) |
Similarly, a false negative (FN) is a case in which the class label of a record is positive, but the classifier predicts it as negative [23]. Using these definitions, accuracy is the percentage of correct predictions made by the classifier compared with the actual labels, i.e., the ratio of correct predictions to the total number of predictions [23]. Further, sensitivity is defined as the percentage of positives that are correctly identified by the classifier during testing, and specificity is the percentage of negatives that are correctly identified during testing. The predictive results of the framework are shown in Fig. 4. In the APTOS dataset, out of 732 test images, 336 images are correctly identified as positives (TP), 339 images are correctly identified as negatives (TN), 35 images are incorrectly identified as negatives (FN), and 12 images are incorrectly identified as positives (FP). On calculating the performances, NPV = 90.9%, PPV or precision = 96.6%, sensitivity or TPR = 90.6%, specificity or TNR = 96.7%, and accuracy = 93.6%. Similarly, in the IDRID dataset, out of 114 test images, 63 images are correctly identified as positives (TP), 17 images are correctly identified as negatives (TN), 7 images are incorrectly identified as negatives (FN), and 17 images are incorrectly identified as positives (FP). On calculating the performances, NPV = 70.8%, PPV or precision = 78.8%, sensitivity or TPR = 90.0%, specificity or TNR = 50.0%, and accuracy = 76.9%. The curve between the true positive rate and the false positive rate, which shows the performance of the classification model at every classification threshold and is known as the ROC (receiver operating characteristic) curve, is shown in Fig. 5. The test results on both datasets are given in Table 5. From the results, we can say that the performance of a CNN depends on the size of the dataset; if we increase the number of images by performing augmentations such as rotation, flipping, and translation, the learning of the model will be more accurate. So, increasing the size of the dataset by applying various augmentations will be part of our future plan.
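The metrics above follow mechanically from the four confusion-matrix counts; a minimal sketch (the example call in the comment is a placeholder, not a re-derivation of the reported figures):

```python
def confusion_metrics(tp, tn, fp, fn):
    """Performance measures of Table 4, returned as percentages."""
    return {
        "accuracy":    100 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100 * tp / (tp + fn),   # TPR / recall
        "specificity": 100 * tn / (tn + fp),   # TNR
        "precision":   100 * tp / (tp + fp),   # PPV
        "npv":         100 * tn / (tn + fn),
    }

# Usage: pass the four counts read off a confusion matrix, e.g.
# confusion_metrics(tp=..., tn=..., fp=..., fn=...)
```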
Fig. 5 a ROC curve of APTOS dataset. b ROC curve of IDRID dataset
Table 5 Performance measure values of our model

| Database | Method | Sensitivity (%) | Specificity (%) | Accuracy (%) | AUC |
|---|---|---|---|---|---|
| APTOS | Customized CNN | 90.6 | 96.7 | 93.6 | 0.9781 |
| IDRID | Customized CNN | 90.0 | 50.0 | 76.9 | 0.8210 |
Also, our model classifies DR into only two classes, and most authors have failed to detect all the stages of DR accurately; doing so needs more computational power (CPUs or GPUs) to train a network with large datasets and multiple classes in as little time as possible.
4 Conclusions
DR is mainly caused by damage to the blood vessels in the tissue of the retina, and it is one of the leading causes of visual impairment globally. To save the human eye from loss of eyesight, detection of DR in the early stages is necessary, and an automated computer-aided diagnostic (CAD) system can assist ophthalmologists in the detection of DR. In this paper, we have introduced an automated screening framework using a customized CNN model. The detection performance of this approach was tested on two benchmark datasets, and encouraging test results were obtained.
References
1. Taylor R, Batey D (eds) (2012) Handbook of retinal screening in diabetes: diagnosis and management. Wiley
2. Neuwirth J (1988) Diabetic retinopathy: what you should know
3. Agurto C, Murray V, Barriga E, Murillo S, Pattichis M, Davis H, …, Soliz P (2010) Multiscale AM-FM methods for diabetic retinopathy lesion detection. IEEE Trans Med Imaging 29(2):502–512
4. Gupta A, Chhikara R (2018) Diabetic retinopathy: present and past. Procedia Comput Sci 132:1432–1440
5. Alyoubi WL, Shalash WM, Abulkhair MF (2020) Diabetic retinopathy detection through deep learning techniques: a review. Inf Med Unlocked 100377
6. Esfahani MT, Ghaderi M, Kafiyeh R (2018) Classification of diabetic and normal fundus images using new deep learning method. Leonardo Electron J Pract Technol 17:233–248
7. Kaggle dataset [Online]. Available: https://www.kaggle.com/c/diabetic-retinopathy-detection/data
8. Jiang H, Yang K, Gao M, Zhang D, Ma H, Qian W (2019) An interpretable ensemble deep learning model for diabetic retinopathy disease classification. In: 2019 41st annual international conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 2045–2048
9. Decencière E, Zhang X, Cazuguel G, Lay B, Cochener B, Trone C, …, Klein JC (2014) Feedback on a publicly distributed image database: the Messidor database. Image Anal Stereol 33(3):231–234
10. Liu YP, Li Z, Xu C, Li J, Liang R (2019) Referable diabetic retinopathy identification from eye fundus images with weighted path for convolutional neural network. Artif Intell Med 99:101694
11. Hoover AD, Kouznetsova V, Goldbaum M (2000) Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging 19(3):203–210
12. Li X, Hu X, Yu L, Zhu L, Fu CW, Heng PA (2019) CANet: cross-disease attention network for joint diabetic retinopathy and diabetic macular edema grading. IEEE Trans Med Imaging 39(5):1483–1493
13. Porwal P, Pachade S, Kamble R, Kokare M, Deshmukh G, Sahasrabuddhe V, Meriaudeau F (2018) Indian diabetic retinopathy image dataset (IDRiD). IEEE Dataport. https://doi.org/10.21227/H25W98
14. Dutta S, Manideep BC, Basha SM, Caytiles RD, Iyengar NCSN (2018) Classification of diabetic retinopathy images by using deep learning models. Int J Grid Distrib Comput 11(1):89–106
15. Wang X, Lu Y, Wang Y, Chen WB (2018) Diabetic retinopathy stage classification using convolutional neural networks. In: 2018 IEEE international conference on information reuse and integration (IRI). IEEE, pp 465–471
16. Li T, Gao Y, Wang K, Guo S, Liu H, Kang H (2019) Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening. Inf Sci 501:511–522
17. APTOS dataset [Online]. Available: https://www.kaggle.com/c/aptos2019-blindness-detection/data
18. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR workshop and conference proceedings, pp 249–256
19. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. PMLR, pp 448–456
20. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: ICML
21. Nagi J, Ducatelle F, Di Caro GA, Cireşan D, Meier U, Giusti A, …, Gambardella LM (2011) Max-pooling convolutional neural networks for vision-based hand gesture recognition. In: 2011 IEEE international conference on signal and image processing applications (ICSIPA). IEEE, pp 342–347
22. Bishop CM (2006) Pattern recognition. Mach Learn 128(9)
23. Gadekallu TR, Khare N, Bhattacharya S, Singh S, Reddy Maddikunta PK, Ra IH, Alazab M (2020) Early detection of diabetic retinopathy using PCA-firefly based deep learning model. Electronics 9(2):274
Chapter 18
Multiclass Image Classification Using OAA-SVM J. Sharmila Joseph, Abhay Vidyarthi, and Vibhav Prakash Singh
1 Introduction
Image classification is a highly successful machine learning domain. This technology enables machines to comprehend and recognize real-world objects and surroundings using digital images as inputs. Several machine learning classifiers have shown prominent results in numerous binary image classification tasks. Among them, the support vector machine is the most versatile binary classifier and has been used for classification, recognition, regression, and other tasks. On the other hand, many real-world problems involve discrimination among more than two categories, and directly extending a binary classifier algorithm to predict more than two classes may increase the possibility of classification errors. The generalized solution for solving a multiclass problem with a binary classifier is to adopt decomposition strategies, which also reduce the complexity. The one against one (OAO) and one against all (OAA) decomposition strategies are the commonly used approaches [1–4]. In several studies [5–8], multiclass SVM has been employed for better performance in the categorization of images. In this paper, a support vector machine using the OAA approach is utilized. The classification accuracy is also affected by the extracted features, so it is important to extract the most discriminative features from the images.
Most datasets today consist of color images with three color channels: Red, Green, and Blue. If these images are taken as input and classification is accomplished without modifying them, the effectiveness of the classifier is affected. In the RGB color space, the luminance information cannot be separated from the color, since the channels are correlated with each other, whereas the HSV color space provides the luminance information of color images. Hence, in this paper the color statistical features are extracted from the HSV color space to achieve better classification accuracy. Texture is one of the primitive low-level features that gives knowledge about the spatial distribution of pixel intensities in an image. In this study, we choose the LBP method to extract texture features; it is robust to monotonic gray-level transformations and to scaling, perspective, brightness, and rotation variations [9]. Due to its computational simplicity, the LBP technique enables image analysis in difficult real-time scenarios [10], and it also exhibits a high capability to discriminate [11]. The main objective of this paper is to utilize multiclass SVM to categorize ten different classes and to analyze the performance of the proposed system with various performance metrics. The rest of the paper is organized as follows: In Sect. 2, we discuss the methodology, giving the details of the feature extraction (the color features and the LBP texture features) followed by the proposed classification framework. In Sect. 3, we discuss our experimental results, and finally, in Sect. 4, we conclude.
2 Methodology
In this paper, an image classification framework is established to solve the problem of multiclass image classification. As shown in Fig. 3, multiclass image classification is performed by a hybrid features-based OAA-SVM. In this section, firstly, the feature extraction methods are given in detail; secondly, the classification method is described; and thirdly, the working of the proposed model is given step by step.
2.1 Feature Extraction
Feature extraction is defined as a method involving one or more measurements that quantify the substantial characteristics of an object.
Color feature. The color distribution in an image is measured with the help of color moments.
Mean. The first color moment can be interpreted as the average color in the image, and it can be calculated by using the following formula:

$$E_i = \frac{1}{N}\sum_{j=1}^{N} P_{ij} \qquad (1)$$
where N is the number of pixels in the image and $P_{ij}$ is the value of the jth pixel of the image at the ith color channel.
Standard Deviation. The second color moment is the standard deviation, which is obtained by taking the square root of the variance of the color distribution:

$$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(P_{ij} - E_i\right)^{2}} \qquad (2)$$
where $E_i$ is the mean value, or first color moment, for the ith color channel of the image. In image analysis, color images are used extensively, and each pixel carries color information. Color features are important because of their invariance to transformation or rotation of the image pixels, but under illumination variation their effectiveness drops drastically. It has been noted that the HSV color space offers optimal color features among different color spaces [12–16]. Hence, the RGB image is converted to HSV color space and fifteen color features, namely the mean, standard deviation, skewness, histogram maximum, and histogram minimum of each HSV channel, are evaluated (a code sketch of this color descriptor is given after the list below).
Texture Feature. Texture analysis is generally carried out by extracting texture elements from grayscale images. In this work, texture features are indispensable for the following two reasons:
1. Images from the same class may have different colors, as shown in Fig. 1. Therefore, we cannot predict the class simply from the color features; it is imperative to extract texture features to identify the class to which an image belongs.
2. In our database, images from different classes may have the same color but vary in texture, as shown in Fig. 2. So, texture features are mandatory here.
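As referenced above, a minimal sketch of the HSV color-moment descriptor of Eqs. (1)–(2), assuming OpenCV for the color conversion; the interpretation of the histogram max/min as the largest and smallest histogram counts is an assumption:

```python
import cv2
import numpy as np
from scipy.stats import skew

def hsv_color_features(bgr_image):
    """15 color features: mean, std, skewness, histogram max and histogram min per H, S, V channel."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    feats = []
    for channel in cv2.split(hsv):
        pixels = channel.ravel().astype(np.float64)
        hist, _ = np.histogram(pixels, bins=256)
        feats += [pixels.mean(),          # Eq. (1): first color moment
                  pixels.std(),           # Eq. (2): second color moment
                  skew(pixels),           # third color moment (skewness)
                  hist.max(), hist.min()]
    return np.array(feats)
```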
Local Binary Pattern. The quality of the retrieved texture features has a significant impact on the success of the categorization. In this work, the local binary pattern (LBP) has been chosen for texture analysis; it is considered a powerful tool for retrieving robust features. The LBP operator was developed by Ojala [17], and many improvements of this descriptor have since emerged for extracting textures. Traditional LBP locates key points within an image and creates a histogram that reflects their distribution.
Fig. 1 Sample images with different colors from the same class
Fig. 2 Sample images with same color from different classes
Fig. 3 Proposed system (database → convert to HSV color space → extract color features; convert RGB to gray → extract texture features by LBP; fused color + texture features → OAA-SVM classification)
It uses a sliding window to scan the pixels and creates a binary code based on the differences between the center pixel and its equally spaced circular neighbors. The radius parameter r denotes the distance, whereas the pixel parameter p denotes the number of neighbors [18]. The LBP operator is defined by

$$\mathrm{LBP}_{p,r} = \sum_{j=0}^{p-1} s\!\left(I(x_j, y_j) - I(x_c, y_c)\right) 2^{j} \qquad (3)$$

where $x_j$ and $y_j$ are the coordinates of the jth neighbor of the central pixel $(x_c, y_c)$.
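A minimal sketch of this texture descriptor, using the scikit-image implementation of the LBP operator of Eq. (3); the neighborhood size p = 8, the radius r = 1, and the histogram binning are assumptions, since the exact settings are not reported:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import local_binary_pattern

def lbp_histogram(rgb_image, p=8, r=1):
    """Grayscale conversion followed by the LBP of Eq. (3) and a normalized code histogram."""
    gray = rgb2gray(rgb_image)
    codes = local_binary_pattern(gray, P=p, R=r, method="default")
    hist, _ = np.histogram(codes, bins=np.arange(2 ** p + 1), density=True)
    return hist
```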
2.2 Classification
Support vector machine (SVM) is a supervised machine learning classifier. SVM is most effective for binary classification, where it finds the best separating hyperplane. It can, however, be adapted to multiclass problems by using decomposition approaches. In this paper, the extracted features are classified using SVM with the one against all (OAA) approach. In the OAA technique, the number of binary predictors equals the number of classes to categorize; hence, one SVM model is created per class. Each SVM model establishes the optimum hyperplane to discriminate one class from all other classes, and the prediction for unknown data is decided by the maximum score obtained over all SVMs. The working of the proposed model is given in Fig. 3 and involves the following steps (a code sketch of the classification step follows this list).
Step 1: The Wang images are given as input.
Step 2: The color features are extracted after converting the image into HSV color space.
Step 3: The texture features are extracted by LBP after converting the input images to grayscale. For the computation of the LBP feature, each position of the sliding window involves three steps: first, calculate the difference between the central pixel and its neighborhood pixels; second, if the value is greater than or equal to the central pixel, s(x) is set to 1, otherwise 0; third, the binary code is read in a clockwise direction and summed, and the result is taken as the new value of the central pixel and is employed as a distinctive local texture.
Step 4: The color and texture features are fused together.
Step 5: Eventually, the SVM classifier is trained with the fused color and texture features for classification. The effectiveness of the model is evaluated using the test images.
3 Experiments and Results
To demonstrate the effectiveness of the OAA multiclass SVM, we conducted experiments on 1000 Wang images belonging to ten different classes, namely Bus, Dinosaur, Tribes, Horse, Fort, Flowers, Elephant, Food, Beach, and Mountain. Among the 1000 images, 800 images were chosen for training and 200 images were used for testing; the testing set comprises 20 images per class.
All the images are resized to 384 × 286 pixels before feature extraction. The MATLAB tool is employed for computation and simulation. The RGB images are converted into HSV color space, and 15 color features, namely the H mean, S mean, V mean, H standard deviation, S standard deviation, V standard deviation, skewness of H, skewness of S, skewness of V, and the maximum and minimum histogram values of each HSV channel, are calculated. The HSV channels and the histogram visualization of the HSV color space are shown in Fig. 4. Texture features are extracted by using the local binary pattern (LBP) after converting from RGB to gray. The extracted features are used to train the OAA multiclass SVM classifier. To explore the performance of the proposed model, we carried out a comparative study of the performance measures of the multiclass SVM when classifying based on color features, texture features, and hybrid features. The effectiveness of the proposed model was investigated by measuring accuracy, specificity, precision, recall, and F1-score; the latter four metrics are calculated from the confusion matrices shown in Tables 1, 2, and 3, from which the correct predictions and misclassifications in each class can be easily recognized.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (4)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (5)$$

$$\text{Recall/Sensitivity} = \frac{TP}{TP + FN} \qquad (6)$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (7)$$

$$\text{F1 Score} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (8)$$

Fig. 4 HSV color space descriptor
Table 1 Confusion matrix for classification of 10 different classes based on color feature

| Actual ↓ / Predicted → | Tribes | Beach | Fort | Mountain | Food | Bus | Dino | Elephant | Flower | Horse |
|---|---|---|---|---|---|---|---|---|---|---|
| Tribes | 19 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Beach | 0 | 15 | 0 | 0 | 0 | 0 | 1 | 1 | 3 | 0 |
| Fort | 0 | 1 | 17 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| Mount | 0 | 4 | 0 | 14 | 0 | 0 | 1 | 1 | 0 | 0 |
| Food | 0 | 0 | 0 | 0 | 17 | 0 | 0 | 0 | 3 | 0 |
| Bus | 0 | 0 | 1 | 0 | 0 | 18 | 1 | 0 | 0 | 0 |
| Dino | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 1 | 0 | 0 |
| Elephant | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 16 | 0 | 0 |
| Flower | 1 | 0 | 0 | 0 | 1 | 0 | 3 | 0 | 14 | 1 |
| Horse | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 19 |
Table 2 Confusion matrix for classification of 10 different classes based on texture feature

| Actual ↓ / Predicted → | Tribes | Beach | Fort | Mountain | Food | Bus | Dino | Elephant | Flower | Horse |
|---|---|---|---|---|---|---|---|---|---|---|
| Tribes | 18 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| Beach | 0 | 12 | 2 | 4 | 0 | 0 | 0 | 1 | 1 | 0 |
| Fort | 1 | 0 | 15 | 1 | 0 | 2 | 1 | 0 | 0 | 0 |
| Mount | 0 | 3 | 0 | 12 | 0 | 0 | 0 | 4 | 0 | 1 |
| Food | 3 | 1 | 0 | 1 | 14 | 0 | 0 | 1 | 0 | 0 |
| Bus | 0 | 2 | 0 | 0 | 0 | 18 | 0 | 0 | 0 | 0 |
| Dino | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 2 | 0 | 0 |
| Elephant | 0 | 2 | 0 | 3 | 0 | 0 | 0 | 12 | 0 | 3 |
| Flower | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 16 | 0 |
| Horse | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 18 |
To explore the performance of the proposed model, we carried out a comparative study of the performance measures of the multiclass SVM when classifying based on color features, texture features, and hybrid features, as shown in Figs. 5, 6 and 7. It is evident that the hybrid features-based OAA multiclass SVM has higher discrimination capability than classification based on the individual color or texture features. This is clearly reflected in Table 4, which reports the average accuracy, specificity, precision, recall, and F1-score over the 10 classes.
Table 3 Confusion matrix for classification of 10 different classes based on hybrid features

| Actual ↓ / Predicted → | Tribes | Beach | Fort | Mountain | Food | Bus | Dino | Elephant | Flower | Horse |
|---|---|---|---|---|---|---|---|---|---|---|
| Tribes | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Beach | 0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 |
| Fort | 0 | 0 | 18 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Mount | 0 | 2 | 0 | 15 | 0 | 0 | 0 | 0 | 3 | 0 |
| Food | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 |
| Bus | 0 | 0 | 0 | 0 | 0 | 19 | 0 | 0 | 1 | 0 |
| Dino | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 |
| Elephant | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 17 | 0 | 1 |
| Flower | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 |
| Horse | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 |
Fig. 5 Performance measures of classification based on color feature (specificity, recall, precision, F1-score, and accuracy, in %, for each class)
Fig. 6 Performance measures of classification based on texture feature (specificity, recall, precision, F1-score, and accuracy, in %, for each class)
Fig. 7 Performance measures of classification based on hybrid features (specificity, recall, precision, F1-score, and accuracy, in %, for each class)
Table 4 Overall performance measures

| Features used | Overall accuracy (%) | Specificity (%) | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|---|---|
| Color features | 84 | 98.1 | 85.2 | 84 | 84.5 |
| Texture features | 76.5 | 97.3 | 77.4 | 76.5 | 76.94 |
| Hybrid features (color + texture) | 92.5 | 99.2 | 93.7 | 93 | 93.34 |
4 Conclusion
This paper has investigated an efficient model for multiclass image classification. The experiment was conducted with Wang images of ten different classes. Color features from the HSV color space and texture features from LBP are extracted to discriminate the images precisely. Eventually, the multiclass SVM classifier categorized ten different categories of images using the one against all decomposition strategy. A comparative study with different performance measures was made to show that our OAA multiclass SVM discriminates more effectively with hybrid features than with individual color or texture features. Future work will focus on extending this model with other effective feature extraction techniques to classify large multiclass datasets.
References
1. Agrawal N, Singhai J, Agarwal DK (2017) Grape leaf disease detection and classification using multi-class support vector machine. In: International conference on recent innovations in signal processing and embedded systems. https://doi.org/10.1109/RISE.2017.8378160
2. Chehade NH, Boureau J-G, Vidal C, Zerubia J (2009) Multi-class SVM for forestry classification. In: 16th IEEE international conference on image processing. https://doi.org/10.1109/ICIP.2009.5413395
3. Gustian DA, Rohmah NL, Shidik GF, Fanani AZ, Pramunendar RA, Pujiono (2019) Classification of Troso fabric using SVM-RBF multi-class method with GLCM and PCA feature extraction. In: International seminar on application for technology of information and communication. https://doi.org/10.1109/ISEMANTIC.2019.8884329
4. Kamani P, Afshar A, Towhidkhah F, Roghani E (2011) Car body paint defect inspection using rotation invariant measure of the local variance and one-against-all support vector machine. In: First international conference on informatics and computational intelligence. https://doi.org/10.1109/ICI.2011.47
5. Janney BJ, G U, Divakaran S, Mary Jo S, Basilica SN (2018) Classification of cervical cancer from MRI images using multiclass SVM classifier. Int J Eng Technol 7(2.25). https://doi.org/10.14419/ijet.v7i2.25.12351
6. Deepak S, Ameer PM (2020) Automated categorization of brain tumor from MRI using CNN features and SVM. J Ambient Intell Humanized Comput. https://doi.org/10.1007/s12652-020-02568-w
7. Jothi CS, Usha V, David SA, Mohammed H (2018) Abnormality classification of brain tumor in MRI images using multiclass SVM. Res J Pharm Technol 11(3). https://doi.org/10.5958/0974-360X.2018.00158.0
8. Jadhav SB, Udup VR, Patil SB (2019) Soybean leaf disease detection and severity measurement using multiclass SVM and KNN classifier. Int J Electr Comput Eng (IJECE) 9(5):4077–4091. https://doi.org/10.11591/ijece.v9i5
9. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7). https://doi.org/10.1109/TPAMI.2002.1017623
10. Happy SL, George A, Routray A (2012) A real time facial expression classification system using local binary patterns. In: 4th international conference on intelligent human computer interaction. https://doi.org/10.1109/IHCI.2012.6481802
11. Lahdenoja O, Poikonen J, Laiho M (2013) Towards understanding the formation of uniform local binary patterns. ISRN Mach Vis 2013. https://doi.org/10.1155/2013/429347
12. Wan X, Kuo C-CJ (1996) Color distribution analysis and quantization for image retrieval. https://doi.org/10.1117/12.234782
13. Ma W-Y, Zhang HJ (1998) Benchmarking of image features for content-based retrieval. In: Conference record of the thirty-second Asilomar conference on signals, systems and computers. https://doi.org/10.1109/ACSSC.1998.750865
14. Zhang Z, Li W, Li B (2009) An improving technique of color histogram in segmentation-based image retrieval. In: Fifth international conference on information assurance and security. https://doi.org/10.1109/IAS.2009.156
15. Mathias E, Conci A (1998) Comparing the influence of color spaces and metrics in content-based image retrieval. In: International symposium on computer graphics, image processing, and vision. https://doi.org/10.1109/SIBGRA.1998.722775
16. Mustikasari M, Madenda S (2014) Performance analysis of color based image retrieval. Int J Comput Technol 12(4). https://doi.org/10.24297/ijct.v12i4.7058
17. Ojala T, Pietikäinen M, Harwood D (1996) A comparative study of texture measures with classification based on feature distributions. Pattern Recogn 29(1):51–59. https://doi.org/10.1016/0031-3203(95)00067-4
18. Hazgui M, Ghazouani H, Barhoumi W (2021) Genetic programming-based fusion of HOG and LBP features for fully automated texture classification. Vis Comput. https://doi.org/10.1007/s00371-020-02028-8
Chapter 19
Deep Learning-Based Differential Distinguisher for Lightweight Ciphers GIFT-64 and PRIDE Girish Mishra, S. K. Pal, S. V. S. S. N. V. G. Krishna Murthy, Ishan Prakash, and Anshul Kumar
1 Introduction
The concept of differential cryptanalysis was proposed by Biham and Shamir [1] in 1990. Differential cryptanalysis is a chosen-plaintext attack mostly applied to block ciphers. In this cryptanalytic attack, a large number of pairs of plaintexts and the associated ciphertexts are used in order to find the secret encryption key in less time than a brute force attack. Apart from block ciphers, this attack is, in several cases, also applicable to stream ciphers [2] and hash functions [3]. A cryptographic distinguisher is an adversary having some advantage in distinguishing a cipher from a random permutation. In a differential distinguisher, a particular input difference δ is chosen and a large number of plaintext pairs are prepared such that each pair has XOR difference δ between its members. Each pair of plaintexts then leads to a pair of corresponding ciphertexts with some XOR difference Δ between them. Now, if the (δ, Δ) correspondence occurs with higher probability for the cipher than it would for a truly random permutation, then the attacker has very good prospects of mounting a differential cryptanalytic attack on the selected cipher. Classical differential cryptanalysis is based on an exhaustive approach to finding differential trails. Matsui proposed a branch-and-bound-based method
[4] to search for high-probability differential characteristics. Several other algorithms, such as the mixed integer linear programming (MILP) based differential attack introduced by Zhu et al. [5], also provide significant results in finding high-probability differential trails. Aron Gohr [6] introduced a novel neural network-based method at Crypto 2019 to develop a cryptographic distinguisher. The objective of this development was to distinguish the outputs of an encryption algorithm (for a chosen input difference) from random data. The author established the method for the round-reduced Speck32/64 cipher. He also developed a highly selective key search policy based on a variant of Bayesian optimization, which in combination with the neural network-based distinguisher further reduces the security of the encryption scheme. His was the first such work claimed to be better than the classical version. Building upon the work done by Aron Gohr, Baksi et al. [7] treated the distinguisher problem as a classification problem in order to manage it efficiently with machine learning techniques. They used several deep learning (DL) architectures, such as multilayer perceptron (MLP), convolutional neural network (CNN), and long short-term memory (LSTM) networks, to address the same problem. Gohr's landmark work was further taken forward by other researchers as well [7–10]. Recently, Chen and Yu [11] followed the same line of research and developed a generic neural-aided statistical cryptanalytic attack. Bellini and Rossi [10] constructed distinguishers against the Tiny Encryption Algorithm (TEA) and RAIDEN block ciphers using the deep learning approach. They chose these ciphers intentionally, as their block size and key size are double those of SPECK32/64, which was the focus cipher of Gohr's deep learning experiments. Following the works of Gohr [6] and Baksi et al. [7], Aayush et al. [8] presented a deep learning-based differential distinguisher for the round-reduced PRESENT cipher. This research was further pursued by Yadav and Kumar [9], who proposed to extend the classical differential distinguisher with a deep learning-based differential distinguisher; they mounted attacks on SPN, Feistel, and ARX structure-based ciphers GIFT-64, SIMON, and SPECK, respectively. In this paper, we use a similar approach, as used by Gohr, on the lightweight ciphers GIFT-64 and PRIDE. GIFT-64 [12] is an SPN design-based lightweight block cipher consisting of 28 rounds with a 64-bit block size and a 128-bit key size. PRIDE is a software-oriented cipher with an SPN architecture, a 64-bit block size, and a 128-bit key size. In SPN design-based block ciphers, the round function incorporates a non-linear layer, comprising small S-boxes operating in parallel over small chunks of the state, and a linear layer that combines the resulting chunks after the S-boxes. The PRIDE cipher consists of a total of 20 rounds; however, in our work, we focus on rounds 5–10. Zhu et al. [5] presented input differentials for the GIFT-64 cipher for 12 and 19 rounds with probabilities of 2^{-59} and 2^{-60}, respectively, using mixed integer linear programming-based differential cryptanalysis. We use these differentials to mount attacks on the mentioned ciphers. We use two architectures in our experiments, namely multilayer perceptron (MLP) and convolutional neural networks (CNN), to address the classification problem. Through this work, we achieve significantly better results compared to previously presented results.
Outline of the paper: Section 2 introduces the lightweight ciphers GIFT-64 and PRIDE. In Sect. 3, we discuss the differential distinguisher. This is followed by Sect. 4, describing the deep learning models used for the distinguishing task. In Sect. 5, we discuss the results produced by the models introduced in Sect. 4. Finally, we conclude with our observations in Sect. 6.
2 Description of Block Ciphers
In this section, we present a brief description of the lightweight block ciphers GIFT-64 [12] and PRIDE [13].
2.1 GIFT Lightweight Block Cipher
The designers of GIFT [12] proposed two variants of the cipher, namely GIFT-64-128 and GIFT-128-128, which are 28-round and 40-round SPN ciphers, respectively. Both versions have a key size of 128 bits, and they are popularly known as GIFT-64 and GIFT-128, respectively. For our experiments, we focus on the GIFT-64 variant of the cipher. Each round of both ciphers implements a round function. As shown by the authors [12], the round function consists of 3 steps: SubCells, PermBits, and AddRoundKey. The structure of the GIFT-64 cipher starts with a 4 × 4 S-box (shown in Table 1), which is applied 16 times in parallel to introduce non-linearity into the cipher. This non-linear SubCells step is followed by the PermBits step and then by AddRoundKey.
1. SubCells: The internal state of the cipher can be represented as S = s0 s1 s2 … s15, where each si denotes the nibble on which an instance of the S-box is applied.
2. PermBits: This operation updates the internal state S by a linear bit permutation. The bit permutation specification is shown in Table 2.
3. AddRoundKey: An n/2-bit round key RK is extracted from the key state. It is partitioned into two j-bit words RK = U||V = u_{j-1}…u_0 || v_{j-1}…v_0, where j = n/4. For GIFT-64, RK is XORed to the state as s_{4i+1} ← s_{4i+1} ⊕ u_i, s_{4i} ← s_{4i} ⊕ v_i, ∀i ∈ {0, 1, …, 15}.
Subcells: The internal state of the cipher can be represented as S = s0 s1 s2… s15 . Each si denotes the nibble on which the instance of S-box is applied. PermBits: This operation updates the internal state S by a linear bit permutation. The bit permutation specification is shown in Table 1. AddRoundKey: A n/2-bit round key RK is extracted from the key state. It is partitioned into 2 j-bit words RK = U||V = u j−1 …u0 ||v j−1 …v0 = n/4. For GIFT-64, RK is XORed to the state as s4i+1 ← s4i+1 ⊕ ui , s4i ← s4i ⊕ vi , ∀i ∈ {0, 1,…, 15}.
Table 1 S-box of GIFT-64 x
0×00×10×20×30×40×50×60×70×80×90×a0×b0×c0×d0 ×e0׃
S (x) 0 × 1 0 × a 0 × 4 0 × c 0 × 6 0 × ƒ 0 × 3 0 × 9 0 × 2 0 × d 0 × b 0 × 7 0 × 5 0 × 0 0 ×80×e
Table 2 Bit permutation for GIFT-64

| i | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P64(i) | 0 | 17 | 34 | 51 | 48 | 1 | 18 | 35 | 32 | 49 | 2 | 19 | 16 | 33 | 50 | 3 |
| i | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
| P64(i) | 4 | 21 | 38 | 55 | 52 | 5 | 22 | 39 | 36 | 53 | 6 | 23 | 20 | 37 | 54 | 7 |
| i | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
| P64(i) | 8 | 25 | 42 | 59 | 56 | 9 | 26 | 43 | 40 | 57 | 10 | 27 | 24 | 41 | 58 | 11 |
| i | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
| P64(i) | 12 | 29 | 46 | 63 | 60 | 13 | 30 | 47 | 44 | 61 | 14 | 31 | 28 | 45 | 62 | 15 |
The algorithm of GIFT-64 cipher is shown in Algorithm 1. The S-box (S) and permutation (P64 ) used in the algorithm are shown in Table 1 and 2, respectively. Algorithm 1 GIFT-64 Encryp on Algorithm 1: Input: Plaintext X0 = (x63, x62, ..., x0) and round key rk i = (ui, vi) ∀i ∈ {1, 2, ..., 28} 2: Output: Ciphertext X28 3: for do do 4: for S x3+4∗ j , x2+4∗ j , x1+4∗ j , x0+4∗ j = y3+4∗ j , y2+4∗ j , y1+4∗ j , y0+4∗ j 5: 6: end for (y63, y62, ..., y0) = P64 (y63, y62, ..., y0) 7: 8: for do 9: y3∗( k+1)+k = ci ⊕ y3∗( k+1)+k 10: end for do 11: for 12: y4i+1 = y4i+1 ⊕ ui 13: yi = yi ⊕ vi 14: end for 15: Xi+1 = (y63, y62, ..., y0) ⊕ (1 21 then ORACLE = CIPHER else ORACLE = RANDOM.
4 Deep Learning Model
We have implemented 4 distinguisher models, denoted MLP1, MLP2, CNN1, and CNN2. MLP1 and MLP2 are artificial neural networks (multilayer perceptron networks) with 7 and 6 layers, respectively. CNN1 and CNN2 are convolutional neural networks consisting of 7 and 8 layers, respectively. The input to the neural networks is a sequence of binary data, which is 64 bits for the GIFT-64 and PRIDE ciphers, along with the respective class label. The output layer of all models consists of a single neuron with a sigmoid activation function. Models CNN1 and CNN2 use a kernel of size 3 and stride 1. For example, in CNN2 we build a sequential model with the filter sizes shown in Table 4. The sequential model uses a 1-D convolution block, which consists of a 1-D convolution operation followed by 1-D batch normalization and a leaky-ReLU activation function. After applying the last convolutional block, which has 512 filters, we flatten the output to a layer of 28,672 neurons, after which we apply another linear layer and finally the prediction layer. The prediction layer consists of a single neuron, and a sigmoid activation function is used to predict the class probability. To avoid overfitting of the models, we have used L2 regularization along with dropout. For training the networks, we use the ADAM optimization algorithm.
Data set collection: We performed a comparative analysis over the 4 models, for which we chose varying sizes of the known input data set, ranging from 2^{11} to 2^{20} samples. For generating random samples, we use the NumPy random number generator [15]. These data sets are generated for rounds 5–10 of the respective ciphers. Each data set comprises examples of real ciphertext differences and data generated at random; the former are labeled as 1, whereas the latter are labeled as 0.
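A minimal sketch of this data-set generation; the encrypt argument stands for a round-reduced GIFT-64 or PRIDE encryption routine (not shown here), delta for the chosen input difference, and the exact sampling details are assumptions rather than the authors' code.

```python
import numpy as np

def to_bits(word):
    # 64-bit integer -> length-64 binary feature vector.
    return np.array([(word >> i) & 1 for i in range(64)], dtype=np.uint8)

def make_dataset(n, delta, rounds, encrypt, seed=0):
    """n label-1 samples (ciphertext differences for input difference delta)
    interleaved with n label-0 samples (uniformly random 64-bit words)."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n):
        key = int.from_bytes(rng.bytes(16), "big")            # random 128-bit key
        p0 = int.from_bytes(rng.bytes(8), "big")              # random 64-bit plaintext
        diff = encrypt(p0, key, rounds) ^ encrypt(p0 ^ delta, key, rounds)
        X.append(to_bits(diff)); y.append(1)                  # real ciphertext difference
        X.append(to_bits(int.from_bytes(rng.bytes(8), "big"))); y.append(0)  # random data
    return np.array(X), np.array(y)
```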
Table 4 Machine learning architecture

| Network | Architecture |
|---|---|
| MLP1 | 64, 1024, 1024, 1024, 1024, 128, 1 |
| MLP2 | 64, 128, 1024, 1024, 1024, 1 |
| CNN1 | 128, 256, 256, 14,848, 100, 1 |
| CNN2 | 128, 256, 256, 512, 28,672, 1024, 1 |
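A minimal Keras sketch of the MLP1 architecture of Table 4; the hidden-layer activation, dropout rate, and L2 coefficient are assumptions, since only their use is reported.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_mlp1(weight_decay=1e-5, dropout=0.1):
    # Layer widths 64 -> 1024 x 4 -> 128 -> 1, as listed for MLP1 in Table 4.
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(64,)))                   # 64-bit ciphertext difference
    for width in (1024, 1024, 1024, 1024, 128):
        model.add(layers.Dense(width, activation="relu",
                               kernel_regularizer=regularizers.l2(weight_decay)))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1, activation="sigmoid"))         # distinguisher score in [0, 1]
    return model
```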
Table 5 Hyper-parameters for machine learning model

| Hyper-parameter | Value |
|---|---|
| Batch size | 512 |
| Epochs | 20 |
| Encryption rounds | 5–10 |
| Sample size | 2^11–2^20 |
| Optimization algorithm | ADAM |
| Loss function | Binary cross entropy |
| Validation split | 0.15 |
| Test split | 0.15 |
| Learning rate | 0.001 |
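The training configuration of Table 5 maps directly onto a standard Keras training call; build_mlp1 and the data set (X_train, y_train) are the illustrative placeholders from the sketches above, not the authors' code.

```python
import tensorflow as tf

# Assumes build_mlp1() and (X_train, y_train) from the earlier sketches.
model = build_mlp1()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),   # Table 5
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, batch_size=512, epochs=20, validation_split=0.15)
```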
5 Results
The entire experiment ran for a few days on an NVIDIA Tesla T4, a powerful GPU equipped with 16 GB of RAM. Memory usage was significantly high during training, and for higher rounds the machine learning models require more computational power to learn to distinguish. A detailed comparative analysis of the performance of the 4 models is shown in Fig. 2. It can be seen in the figure that, for GIFT-64, the accuracy for any round between 5 and 10 with a sample size between 2^{11} and 2^{17} is close to 50%; we were able to achieve significant results with sample sizes greater than 2^{17}. The accuracy we refer to here is the test accuracy, and the size of the corresponding test data can be calculated using the parameters given in Table 5. We observe that the accuracy of the models increases significantly when we increase the sample size.
Fig. 2 Machine learning-based distinguisher comparison for GIFT-64 (a) and PRIDE (b)
increase the sample size. Out of the 4 models shown, MLP1 performs the best, even on higher rounds. It reaches an accuracy of 96% in round 10 for GIFT-64 and 100% for PRIDE. Both convolutional neural network models, CNN1 and CNN2, give similar results but do not surpass the performance of MLP1. For PRIDE (Fig. 2b) we observe that our models give significant results on datasets of size greater than 2^14, and MLP1 achieves 100% accuracy for round 10. The other models also perform equally well. From Fig. 2, it is clear that MLP1 was able to perform the distinguishing task better than the other models, which can also be seen clearly in Figs. 3 and 4. In these figures, we see that although MLP2 performs comparably, with accuracies similar to those of MLP1, MLP1 still performs slightly better for both ciphers, even on small datasets. Moving ahead with the experiments, we mounted a full round attack on both the GIFT-64 and PRIDE ciphers; the results are shown in Fig. 5. We have considered two scenarios: the first scenario is concerned with distinguishing the ciphertext differences (generated by a fixed input difference) from random data. The fixed input differences are taken from [5] and [14] for the GIFT-64 and PRIDE ciphers, respectively. The second scenario is concerned with distinguishing the ciphertext differences (generated by a randomly chosen input difference) from random data. The random data in each scenario is generated using the NumPy random number generator. For GIFT-64
Fig. 3 Test accuracies for 6–9 rounds of GIFT-64 encryption: (a) 6 rounds, (b) 7 rounds, (c) 8 rounds, (d) 9 rounds
Fig. 4 Test accuracies for 6–9 rounds of PRIDE encryption: (a) 6 rounds, (b) 7 rounds, (c) 8 rounds, (d) 9 rounds
cipher, accuracies of 96% and 90% were achieved for scenarios 1 and 2, respectively. Similarly, for the PRIDE cipher, we achieved 100% and 98% accuracy for scenarios 1 and 2, respectively.
6 Conclusion

In this paper, we introduced a simple neural network distinguisher architecture, which was successfully able to distinguish ciphertext differences from random data for the GIFT-64 and PRIDE ciphers. The model was able to mount an attack on rounds 5–10 with an accuracy higher than that of classical differential distinguishers. The accuracy of these models is directly dependent on the intensity of training. We also successfully mounted a full round attack on both ciphers by using the proposed model MLP1. We did not perform a key recovery attack, but this can be developed in the future along with other machine learning algorithms to achieve better results.
Fig. 5 Test accuracy for full round attack on GIFT-64 (left; a, 28 rounds) and PRIDE (right; b, 20 rounds) encryption
References 1. Biham E, Shamir A (1992) Differential cryptanalysis of the full 16-round des. In: Annual international cryptology conference, Springer, pp 487–496 2. Biham E, Dunkelman O (2007) Differential cryptanalysis of stream ciphers, tech rep Computer Science Department, Technion 3. Biham E, Shamir A (1993) Differential cryptanalysis of hash functions. In: Differential cryptanalysis of the data encryption standard, Springer, pp 133–148 4. Matsui M (1994) On correlation between the order of s-boxes and the strength of des. In: Workshop on the theory and application of cryptographic techniques, Springer, pp 366–375 5. Zhu B, Dong X, Yu H (2019) Milp-based differential attack on round-reduced gift. In: Cryptographers’ track at the RSA conference, Springer, pp 372–390 6. Gohr A (2019) Improving attacks on round-reduced speck32/64 using deep learning. In: Annual international cryptology conference, Springer, pp. 150–179 7. Baksi A, Breier J, Dong X, Yi C (2020) Machine learning assisted differential distinguishers for lightweight ciphers. IACR Cryptol ePrint Arch 2020:571
8. Jain A, Kohli V, Mishra G (2020) Deep learning based differential distinguisher for lightweight cipher present. IACR Cryptol ePrint Arch 2020:846 9. Yadav T, Kumar M (2020) Differential-ml distinguisher: machine learning based generic extension for differential cryptanalysis. IACR Cryptol ePrint Arch 2020:913 10. Bellini E, Rossi M Performance comparison between deep learning-based and conventional cryptographic distinguishers 11. Chen Y, Yu H (2020) Neural aided statistical attack for cryptanalysis. IACR Cryptol ePrint Arch 2020:1620 12. Banik S, Pandey SK, Peyrin T, Sasaki Y, Sim SM, Todo Y (2017) Gift: a small present. In: International conference on cryptographic hardware and embedded systems, Springer, pp 321–345 13. Albrecht MR, Driessen B, Kavun EB, Leander G, Paar C, Yalçın T (2014) Block ciphers–focus on the linear layer (feat. pride). In: Annual cryptology conference, Springer, pp. 57–76 14. Yang Q, Hu L, Sun S, Qiao K, Song L, Shan J, Ma X (2015) Improved differential analysis of block cipher pride. In: International conference on information security practice and experience, Springer, pp 209–219 15. Oliphant TE (2006) A guide to NumPy, vol 1. Trelgol Publishing USA
Chapter 20
COVID-19 Detection Using Chest X-rays: CNN as a Classifier Versus CNN as a Feature Extractor N. A. Sriram , J Vishaq , T Dhanwin , V Harshini , A Shahina , and A Nayeemulla Khan
1 Introduction With the sudden surge in pneumonia cases across Wuhan, a small city in China, panic struck when it started spreading in an uncontrollable manner. The disease is researched and found to be severe acute respiratory syndrome coronavirus 2 (SARSCoV-2), also known as COVID-19 by the World Health Organization (WHO). This virus, though initially found in animals such as bats and cats, due to its zoonotic nature spread to humans later started spreading through human-to-human contact [1]. COVID-19 is an extremely infectious virus, strong enough to cause a global pandemic. As of June 2021, the virus has infected more than 180 million people, causing nearly 4 million deaths [2]. Fever, cough, sore throat, breathlessness, sneezing, headache, malaise and throat swelling are reported as the primary symptoms of COVID-19 [3]. The most common diagnosis for COVID-19 is the reverse transcriptase quantitative polymerase chain reaction (RT-qPCR) [4, 5]. Small amounts of viral RNA are extracted from a nasal swab, amplified and quantified with virus detection indicated N. A. Sriram · J. Vishaq (B) · T. Dhanwin · V. Harshini · A. Shahina Department of Information Technology, Sri Sivasubramaniya Nadar College of Engineering, Old Mahabalipuram Road (OMR), Kalavakkam, Chennai, Tamil Nadu 603110, India e-mail: [email protected] N. A. Sriram e-mail: [email protected] T. Dhanwin e-mail: [email protected] V. Harshini e-mail: [email protected] A. Shahina e-mail: [email protected] A. Nayeemulla Khan School of Computer Science and Engineering, Vellore Institute of Technology, Vandalur-Kelambakkam road, Chennai, Tamil Nadu 600127, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_20
Fig. 1 Samples of chest X-ray images used in our dataset: (a) COVID, (b) non-COVID
visually using a fluorescent dye in this test. However, this is an exhaustive test, taking up to 2 days to get results. Other common COVID-19 detection tests include obtaining a CT scan or chest X-ray of the infected person [6]. These results, though comparatively faster than RT-qPCR test, still have to be manually examined by a radiologist for the indication of COVID-19 infection. In our work, we combine modern artificial intelligence techniques for the detection of COVID-19 from these images. Prior research works [7–9] have shown the efficiency of using convolutional neural networks (CNN) models, achieving state-of-the-art results in pneumonia detection from chest X-ray images. For our work, we use chest X-rays for detecting the presence of COVID-19. Chest X-ray images do not show the initial traces of COVID-19 infection. However, as the disease progresses, patchy infiltration can be seen, involving mid zone and upper or lower zone of the lungs, occasionally with evidence of a consolidation [5]. Chest X-ray images of COVID-19 positive and negative samples are shown in Fig. 1. The advantages of chest X-rays over conventional diagnosis are [10] as follows: 1. X-rays are cost effective and more widespread. 2. Diagnosis from an X-ray is extremely quick and usually done within a few hours. 3. X-rays are also portable unlike CT scans, increasing their accessibility. Though various CNN models have been proposed for detecting COVID-19 from CT scan images or chest X-ray images, these models have not supplanted manual techniques. Researchers are constantly developing new and improved models for better classification, which can one day be the stand-alone tool for COVID-19 detection. However, these studies can act as a second confirmation for COVID-19 test centres. In our study, we conduct three different experiments for detecting the presence or absence of COVID-19 from chest X-rays (binary classification). Various CNN models are used as classifiers in the first set of experiments, and the performance is noted. In the second set of experiments, the fully connected (dense layer) output is obtained
from the CNN and passed on to machine learning (ML) classifiers for performance testing. For the third set of experiments, the second experiment is slightly tweaked, and instead of the fully connected output, the flattened output is obtained. Due to the large size of the features that are extracted, principal component analysis (PCA) is done on this flattened output before passing it on to ML classifiers. The classification results for the three experiments are compared.
2 Related Works A number of studies have been conducted regarding COVID-19 detection with different datasets, varying in size and kind. Some studies have conducted binary classification for detecting the presence of COVID-19, while others have performed multi-class classification for categorizing the anomalies present in a human body including COVID-19. Most of the studies involve the usage of convolutional neural networks (CNNs) for classification purposes as they display high effectiveness in computer vision problems [11]. To optimize the visualization of tuberculosis using chest X-rays, Pasa et al. [12] proposed the use of simple CNN architectures, analysing the output using saliency maps and Grad-CAMs. CNNs have also proved to be very good image classifiers in Madaan et al.’s [13] work among others, where the feature vectors which are the outputs of the next convolutional layers and the sources of incoming data are present in various convolutional layers, showing how information is contained in multiple channels. Apostolopoulos and Bessiana [14] in their study proposed that transfer learning can be used to efficiently detect various abnormalities in medical image datasets. They employed transfer learning for the automatic detection of COVID-19-induced pneumonia. Study by Narin et al. [15] proposed the detection of COVID pneumoniainfected patients with five different pre-trained CNN models (ResNet-50, ResNet101, ResNet-152, InceptionV3 and Inception-ResNetV2) based on their chest X-ray radiographs. Similarly, a study by Madaan et al. [13] proposed an early detection system called XCOVNet using CNN, to achieve quicker results when compared to other testing methods. Hussain et al. [1] introduced a CNN model called CoroDet to conduct three different experiments for classifying among two (COVID and normal), three (COVID, normal and non-COVID viral pneumonia) and four classes (COVID, normal, non-COVID viral pneumonia, and non-COVID bacterial pneumonia), respectively. Nayak et al. [16] also equipped a CNN-based approach for the detection of COVID-19 virus by evaluating the effectiveness of eight transfer learning-based models (AlexNet, VGG-16, GoogleNet, MobileNetV2, SqueezeNet, ResNet-34, ResNet50 and InceptionV3) for the classification of COVID-19 from normal cases. The study by Barstugan et al. [17] employed five different feature extraction methods for the detection of COVID-19 virus. Moreover, they used twofold, fivefold and tenfold cross-validation methods during the classification process using support vector machines (SVM). Punn and Agarwal [18] used weighted class loss function and
random oversampling for transfer learning models to perform binary and multi-class classification of chest X-ray images. Oh et al. [19] employed a CNN approach which was local patch based for the COVID-19 detection with a smaller number of trainable parameters. We can see that various models have been proposed for COVID-19 detection with different datasets. A dataset which contains a subset of the dataset used in our work has been experimented by Alqudah et al. [20]. In their work, they used augmented chest X-ray images to prevent the CNN model from overfitting. Their work uses CNN as a normal classifier as well as a feature extractor, passing the highdimensional feature to SVM and random forest (RF) for classification purposes. Their CNN as a classifier performs the best among other approaches, achieving a test accuracy of 95.2%. The same group later extended their work [21] by using CNN in two different scenarios. In the first scenario, CNN is used as a feature extractor, while in the second one, features are extracted from the fully connected layer of three pre-trained CNN architectures (AOCT-Net, MobileNet and ShuffleNet). These features are passed on to three different classifiers (SVM, RF and KNN (K-nearest neighbours)) for COVID-19 detection. In their work, most of the models recorded an accuracy, sensitivity, specificity and precision of more than 98%. Another work by Haque et al. [22] uses normal chest X-ray images and also the Mendeley released augmented COVID-19 X-ray images dataset. Their work achieved a best testing accuracy of 99% for two different CNN models. In our work, we use a dataset obtained from the Kaggle data repository [23]. We perform three experiments in which CNN is used as a classifier, as a feature extractor by extracting features from the flattened layer and by extracting CNN’s first fully connected layer output for classification purposes using the chest X-ray images. The size of the dataset we use (8088 images) nearly exceeds the size of datasets used in all the previous works. The dataset description and proposed methodology are discussed in Sect. 3, and the results are presented in Sect. 4.
3 Proposed Methodology The block diagram of the three different experiments for our study is shown in Fig. 2. This section presents the dataset description and the various experiments considered for our study.
3.1 Dataset Description The dataset we consider for our study is obtained from Kaggle’s dataset [23]. This dataset contains two folders for COVID and non-COVID images. The images in the dataset have been augmented to produce 5500 non-COVID images and 4044 COVID images. Some of the augmentations include geometric image transformations such
Fig. 2 Framework of our proposed methodology
as flipping, rotation, translation and scaling. The dataset does not provide a balanced image distribution for training purposes. So, in order to avoid the problem of class imbalance, we manually split the data to have an equal number of images in both classes, resulting in 4044 images for each class (both COVID and non-COVID). The non-COVID dataset also contains X-ray images of other respiratory diseases such as viral pneumonia, SARS, streptococcus and pneumocystis. This dataset includes anteroposterior, posteroanterior and lateral X-ray images. All the images are converted into 227 × 227 × 3 dimensional images before being passed on to the models.
3.2 Experiment 1 (CNN as a Classifier) As our first experiment, we use various CNN models (including three pre-trained models) to classify the images as COVID positives and negatives. The dataset containing 8088 images is split in such a way that 80% of the input raw images are used for training the model while the remaining 20% are used up for testing. We use customized CNN models as well as pre-trained ResNet-152, AlexNet and VGG-16 as the models for performance evaluation. All the models are run for 15 epochs, with a learning rate of 0.001, using Adam optimizer.
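A minimal sketch of this first experiment is given below, assuming a Keras workflow with an ImageNet-pretrained VGG-16 backbone and the reported settings (227 × 227 × 3 inputs, Adam with learning rate 0.001, 15 epochs). It is an illustration rather than the authors' exact code, and `train_images`/`train_labels` are hypothetical variable names.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Pre-trained VGG-16 backbone without its classification head
base = keras.applications.VGG16(include_top=False, weights="imagenet",
                                input_shape=(227, 227, 3))
model = keras.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(1, activation="sigmoid"),   # COVID vs non-COVID
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=15)   # hypothetical data arrays
```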
3.3 Experiment 2 (Flattened Layer Output) Since CNNs have the exceptional capability of converting lower-dimensional features to higher-dimensional features, we use it as a feature extractor similar to the work by Alqudah et al. [20, 21]. We use the same set of CNN models used in Sect. 3.2. We obtain the higher-dimensional features from the flattened layer of the model.
Since the flattened output is very large, PCA is done on the features to reduce the dimensions. These features are then passed on to five different ML classifiers (SVM, decision tree (DT), XGBoost, multilayer perceptron (MLP) and RF) for the task of classification.
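The flattened-layer pipeline of this experiment can be sketched as below. The layer name `"flatten"`, the PCA dimensionality of 256 and the variable names (`model`, `train_images`, `train_labels`, `test_images`, `test_labels`) are assumptions for illustration; the SVM settings follow the Experiment 2 column of Table 1.

```python
from tensorflow import keras
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# 'model' is a trained CNN; take the output of its flatten layer as features
feature_extractor = keras.Model(inputs=model.input,
                                outputs=model.get_layer("flatten").output)
train_feats = feature_extractor.predict(train_images)
test_feats = feature_extractor.predict(test_images)

# Reduce the very large flattened output before classification (dimension assumed)
pca = PCA(n_components=256).fit(train_feats)
clf = SVC(C=10, gamma=10, kernel="rbf")          # Experiment 2 settings from Table 1
clf.fit(pca.transform(train_feats), train_labels)
print(clf.score(pca.transform(test_feats), test_labels))
```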
3.4 Experiment 3 (Fully Connected Layer Output) Similar to Experiment 2, we now add a fully connected layer of 512 neurons to the flattened layer and obtain the outputs from respective CNN models considered for our study. The output with a dimension of 512 is then used as the input feature vectors for ML classifiers. The same set of ML classifiers mentioned in Sect. 3.3 is used for classification purposes in this experiment as well. The hyperparameters used for the ML classifiers in Experiments 2 and 3 are given in Table 1.
4 Experiments and Results In this section, we discuss the performance of our various experiments and compare various models to identify the best performing one. All results for training and evaluation below are obtained using fivefold cross-validation for all classifiers.
Table 1 Hyperparameters used for ML classifiers

ML classifier | Experiment 2 | Experiment 3
SVM | Regularization parameter: 10, Gamma: 10, Kernel: rbf | Regularization parameter: 10, Gamma: 0.0001, Kernel: rbf
DT | Criterion: gini, Splitter: best, Minimum samples leaf: 1 | Criterion: gini, Splitter: best, Minimum samples leaf: 1
XGBoost | Objective function: binary logistic, Estimators: 100, Loss: logloss | Objective function: binary logistic, Estimators: 100, Loss: logloss
MLP | Activation function: ReLu, Iterations: 3000, Solver: Adam | Activation function: ReLu, Iterations: 3000, Solver: Adam
RF | Estimators: 100, Criterion: gini, Maximum features: auto | Estimators: 100, Criterion: gini, Maximum features: auto
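For reference, the Experiment 3 classifiers of Table 1 can be instantiated with scikit-learn and XGBoost roughly as follows, evaluated with the fivefold cross-validation mentioned above. The feature matrix `X_feat` and labels `y` are hypothetical names, and `max_features` is written as `"sqrt"` because the value "auto" listed in the table maps to it in older scikit-learn releases.

```python
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Experiment 3 settings from Table 1; X_feat, y are extracted features and labels
classifiers = {
    "SVM": SVC(C=10, gamma=0.0001, kernel="rbf"),
    "DT": DecisionTreeClassifier(criterion="gini", splitter="best", min_samples_leaf=1),
    "XGBoost": XGBClassifier(objective="binary:logistic", n_estimators=100,
                             eval_metric="logloss"),
    "MLP": MLPClassifier(activation="relu", max_iter=3000, solver="adam"),
    "RF": RandomForestClassifier(n_estimators=100, criterion="gini",
                                 max_features="sqrt"),  # 'auto' in older scikit-learn
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X_feat, y, cv=5)      # fivefold cross-validation
    print(name, scores.mean())
```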
4.1 Performance Metrics

The evaluation of the classifier is based on how accurately it distinguishes between positive and negative COVID-19 samples. Based on this, four outcomes are considered: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). These values are used to predict the accuracy, precision, recall and F1-score (a short computational sketch follows this list).

• Precision is the ability of a classifier to predict the positive samples as positive out of all the positive predictions made by the classifier.

  $\text{precision} = \frac{TP}{TP + FP}$    (1)

• Recall is the classifier's ability to capture the number of correct positive predictions made out of all predictions that could have been made.

  $\text{recall} = \frac{TP}{TP + FN}$    (2)

• Accuracy is the ability of the classifier to distinguish classes properly.

  $\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$    (3)

• F1-score is the harmonic mean of precision and recall and provides a better measure of the incorrectly classified cases.

  $\text{F1-score} = \frac{2 \times (\text{precision} \times \text{recall})}{\text{precision} + \text{recall}}$    (4)
• Area under the curve (AUC) is usually used as a receiver operating characteristic (ROC) curve summary and portrays a classifier’s ability to differentiate between classes. A confusion matrix is plotted for our best performing model (VGG-16 as a classifier) in Fig. 3, which compares the actual target values with the values predicted by the CNN model. It summarizes the performance of the model by means of an N × N matrix, where N is the number of target classes.
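A small sketch of how the metrics of Eqs. (1)–(4) can be computed from the four confusion-matrix counts; the counts used in the example call are illustrative and are not taken from the paper's results.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics of Eqs. (1)-(4) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}

# Illustrative counts only (not the paper's confusion matrix)
print(classification_metrics(tp=780, tn=777, fp=28, fn=33))
```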
4.2 Performance of CNN Classifier The performance of the trained CNN models is evaluated by testing the models’ ability to accurately predict the target classes of the test dataset. The test set, which contains 1,618 images (20% of the dataset), is provided as the inputs for the CNN
Fig. 3 Confusion matrix for the VGG-16 model, used as a classifier

Table 2 Performance evaluation of CNN as a classifier

CNN model | Precision (%) | Recall (%) | F1-score (%) | AUC (%) | Accuracy (%)
5 layer CNN | 92.04 | 91.93 | 91.98 | 91.90 | 91.90
ResNet-152 | 87.30 | 94.98 | 90.98 | 90.43 | 90.48
VGG-16 | 96.55 | 95.96 | 96.25 | 96.23 | 96.22
AlexNet | 93.72 | 96.82 | 95.24 | 95.09 | 95.11
models (five-layer CNN, VGG-16, ResNet-152 and AlexNet). Based on the learning that the models have obtained from the train set (6,470 images), the CNN models achieve state-of-the-art results as shown in Table 2. VGG-16 outperforms all the other models with a test accuracy score of 96.22%. The ROC curve for our best performing VGG-16 classifier on test data is shown in Fig. 4.
4.3 Performance of Features Extracted from the Flattened Layer From the flattened layer of the CNN models, the corresponding higher-dimensional features for the inputted 1,618 test images and 6,470 train images are extracted. These features that have the ability to distinguish between different target classes are then passed on to ML classifiers for training and testing purposes. Five different classifiers (SVM, MLP, DT, RF and XGBoost) are trained with features of the train set. The performance of classifiers tests the CNN’s ability to capture distinguishable features. Table 3 shows the comparison of performances that are exhibited by the five ML classifiers for every model by means of test accuracy. Among all the classifiers, it could be seen that AlexNet-MLP performs the best achieving accuracy of 95.17%. The ROC curve for the same is shown in Fig. 5.
Fig. 4 ROC curve for VGG-16 as a classifier

Table 3 Performance evaluation of the extracted features from the flattened layer, passed on to ML classifiers

CNN model | ML classifier | Precision (%) | Recall (%) | F1-score (%) | AUC (%) | Accuracy (%)
5 layer CNN | SVM | 50.55 | 100.00 | 67.15 | 50.00 | 50.55
5 layer CNN | DT | 60.28 | 61.24 | 60.76 | 59.99 | 60.01
5 layer CNN | XGBoost | 85.85 | 93.52 | 89.52 | 88.88 | 88.93
5 layer CNN | MLP | 88.04 | 90.95 | 89.47 | 89.16 | 89.18
5 layer CNN | RF | 72.62 | 64.54 | 68.34 | 69.83 | 69.77
ResNet-152 | SVM | 86.09 | 83.25 | 84.64 | 84.75 | 84.73
ResNet-152 | DT | 79.69 | 77.26 | 78.46 | 78.56 | 78.55
ResNet-152 | XGBoost | 88.44 | 87.04 | 87.73 | 87.70 | 87.70
ResNet-152 | MLP | 92.14 | 91.80 | 91.97 | 91.90 | 91.90
ResNet-152 | RF | 75.10 | 83.74 | 79.19 | 77.68 | 77.75
VGG-16 | SVM | 93.45 | 95.96 | 94.69 | 94.54 | 94.56
VGG-16 | DT | 81.10 | 84.47 | 82.75 | 82.17 | 82.20
VGG-16 | XGBoost | 92.74 | 92.17 | 92.45 | 92.40 | 92.39
VGG-16 | MLP | 94.69 | 95.96 | 95.32 | 95.23 | 95.11
VGG-16 | RF | 89.31 | 68.45 | 77.50 | 80.04 | 79.91
AlexNet | SVM | 93.25 | 96.33 | 94.76 | 94.60 | 94.62
AlexNet | DT | 85.14 | 85.45 | 85.29 | 85.10 | 85.10
AlexNet | XGBoost | 93.24 | 92.78 | 93.01 | 92.95 | 92.95
AlexNet | MLP | 93.73 | 96.94 | 95.31 | 95.15 | 95.17
AlexNet | RF | 85.64 | 82.39 | 83.98 | 84.13 | 84.11
Fig. 5 ROC curve for AlexNet feature extractor, passed on to MLP classifier
4.4 Performance of Features Extracted from the Fully Connected Layer Features are extracted from the fully connected layer (512 neurons) of the CNN models for the respective input images (6470 train images and 1618 test images). These features are then handed over to the ML classifiers for classification purposes. The ML classifiers are trained with the train set that were extracted by the CNN models and are then evaluated using the test set. The results in terms of test accuracy obtained by the ML classifiers for the four CNN models are tabulated in Table 4. The table clearly shows that VGG-16 outperforms all the other models achieving an accuracy of 96.10% with both SVM and XGBoost classifier. The ROC curves on test data, for our best performing VGG-16-SVM and VGG-16-XGBoost, are shown in Fig. 6.
4.5 Analysing Results from the Three Experiments From the three experiments, it could be seen that the best performing model for our dataset is when VGG-16 is used as a classifier, with an accuracy rate of 96.22%, as compared to CNN features extracted from the flattened and fully connected layer, achieving 95.17% and 96.10%, respectively. On comparing the average accuracy rates of the three experiments, it could be concluded that CNN features extracted from the flattened layer do not perform as good as the other two experiments.
Fig. 6 ROC curves for the VGG-16 fully connected layer: (a) passed on to SVM classifier, (b) passed on to XGBoost classifier
Table 4 Performance evaluation of the extracted features from the fully connected layer, passed on to ML classifiers

CNN model | ML classifier | Precision (%) | Recall (%) | F1-score (%) | AUC (%) | Accuracy (%)
5 layer CNN | SVM | 93.19 | 93.76 | 93.47 | 93.38 | 93.38
5 layer CNN | DT | 89.42 | 89.97 | 89.70 | 89.55 | 89.55
5 layer CNN | XGBoost | 91.73 | 93.64 | 92.67 | 92.50 | 92.52
5 layer CNN | MLP | 92.29 | 92.29 | 92.29 | 92.21 | 92.21
5 layer CNN | RF | 91.47 | 94.49 | 92.96 | 92.74 | 92.76
ResNet-152 | SVM | 91.04 | 89.48 | 90.25 | 90.24 | 90.23
ResNet-152 | DT | 81.51 | 80.31 | 80.91 | 80.84 | 80.84
ResNet-152 | XGBoost | 90.22 | 86.91 | 88.54 | 88.64 | 88.62
ResNet-152 | MLP | 89.83 | 78.85 | 83.98 | 84.86 | 94.79
ResNet-152 | RF | 90.53 | 95.33 | 97.85 | 88.10 | 88.07
VGG-16 | SVM | 95.42 | 96.94 | 96.17 | 96.09 | 96.10
VGG-16 | DT | 92.15 | 94.74 | 93.42 | 93.24 | 93.26
VGG-16 | XGBoost | 95.98 | 96.45 | 96.21 | 96.16 | 96.10
VGG-16 | MLP | 96.15 | 94.74 | 95.44 | 95.43 | 95.42
VGG-16 | RF | 95.30 | 96.82 | 96.05 | 95.97 | 95.98
AlexNet | SVM | 93.91 | 96.21 | 95.04 | 94.91 | 94.93
AlexNet | DT | 92.00 | 94.25 | 93.11 | 92.93 | 92.95
AlexNet | XGBoost | 93.03 | 96.33 | 94.65 | 94.47 | 94.49
AlexNet | MLP | 95.36 | 93.15 | 94.24 | 94.26 | 94.25
AlexNet | RF | 93.58 | 96.33 | 94.95 | 94.79 | 94.80
Table 5 Performance time for the best performing models, from each experiment

CNN model | Feature extraction (ms) | Classification (ms)
VGG-16 classifier | – | 66.74
AlexNet-MLP flattened layer | 33.86 | 51.29
VGG-16-SVM fully connected layer | 67.36 | 1.04
VGG-16-XGBoost fully connected layer | 67.36 | 7.41
Table 5 shows the time taken to classify one sample test data as COVID-19 positive or negative. All the values are in milliseconds (ms). All of the experiments were run on Intel Core i7-8550U + NVIDIA GeForce MX150 GPU. We also compare VGG-16 and VGG-19 models, in terms of performance time. The compressed VGG-16 model takes lesser time than the bigger VGG-19 model. A similar observation is also noted when ResNet-152 and ResNet-164 pre-trained models were compared, with compressed ResNet-152 model taking lesser performance time.
5 Conclusion With COVID-19 gripping the world with fear, various researches have been conducted on the proper diagnosis and treatment of this deadly disease. ML engineers are constantly coming up with advanced CNN models for COVID-19 detection. In our work, for COVID-19 diagnosis, we combine deep learning’s (various CNN models) exceptional feature extraction capabilities with machine learning’s (classifiers like SVM and RF) extraordinary classification abilities. We run three different experiments and compare our results. We achieve a good classification rate. The model can be further generalized by increasing the size of the dataset, and it can also be trained to diagnose COVID-19 from CT scans. We believe this study could be helpful for further research and eventually aid the overwhelmed healthcare industry. Acknowledgements We would like to thank Kaggle for making the COVID-19 chest X-ray dataset publicly available for our experiment.
References 1. Hussain E, Hasan M, Rahman MA, Lee I, Tamanna T, Parvez MZ (2021) CoroDet: a deep learning based classification for COVID-19 detection using chest X-ray images. Chaos Solitons Fractals 142 (2021). https://doi.org/10.1016/j.chaos.2020.110495 2. https://www.worldometers.info/coronavirus/ 3. Singhal T (2020) A review of coronavirus disease-2019 (COVID-19). Indian J Pediatr 87(4):281–286. https://doi.org/10.1007/s12098-020-03263-6 4. Wang W, Xu Y, Gao R, Lu R, Han K, Wu G, Tan W (2020) Detection of SARS-CoV-2 in different types of clinical specimens. JAMA 323(18):1843–1844. https://doi.org/10.1001/jama. 2020.3786 5. Horry, M.J., Chakraborty, S., Paul, M., Ulhaq, A., Pradhan, B., Saha, M., Shukla, N.: X-ray image based COVID-19 detection using pre-trained deep learning models. engrXiv (2020). https://doi.org/10.31224/osf.io/wx89s 6. Chung M, Bernheim A, Mei X, Zhang N, Huang M, Zeng X, Cui J, Xu W, Yang Y, Fayad ZA, Jacobi A, Li K, Li S, Shan H (2020) CT imaging features of 2019 novel coronavirus. Radiology 295(1):202–207. https://doi.org/10.1148/radiol.2020200230 7. Xianghong G, Liyan P, Huiying L, Ran Y (2018) Classification of bacterial and viral childhood pneumonia using deep learning in chest rcadiography. In: Proceedings of the 3rd international conference on multimedia and image processing (ICMIP 2018), Association for Computing Machinery, New York, pp 88–93. https://doi.org/10.1145/3195588.3195597 8. Chouhan V, Singh SK, Khamparia A, Gupta D, Tiwari P, Moreira C, Damaševiˇcius R, de Albuquerque VHC (2020) A novel transfer learning based approach for pneumonia detection in chest X-ray images. Appl Sci 10(2):559. https://doi.org/10.3390/app10020559 9. Lakhani P, Sundaram B (2017) Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284(2):574–582. https://doi.org/10.1148/radiol.2017162326 10. Arpan M, Surya K, Harish R, Krithika R, Vinay N, Subhashis B, Chetan A (2020) CovidAID: COVID-19 detection using chest X-ray. https://doi.org/10.48550/arXiv.2004.09803 11. Fei G, Yue Z, Wang J, Sun J, Yang E, Zhou H (2017) A novel active semisupervised convolutional neural network algorithm for SAR image recognition. Comput Intell Neurosci 1–8. https://doi.org/10.1155/2017/3105053
12. Pasa F, Golkov V, Pfeiffer F, Cremers D, Pfeiffer D (2019) Efficient deep network architectures for fast chest X-ray tuberculosis screening and visualization. Sci Rep 9. https://doi.org/10. 1038/s41598-019-42557-4 13. Madaan V, Roy A, Gupta C, Anand S, Cristian B, Radu P (2021) XCOVNet: chest X-ray image classification for COVID-19 early detection using convolutional neural networks. New Gener Comput. https://doi.org/10.1007/s00354-021-00121-7 14. Apostolopoulos ID, Mpesiana TA (2020) Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks. Phys Eng Sci Med 43:635–640. https://doi.org/10.1007/s13246-020-00865-4 15. Narin A, Kaya C, Pamuk Z (2021) Automatic detection of coronavirus disease (COVID-19) using X-ray images and deep convolutional neural networks. Pattern Anal Appl. https://doi. org/10.1007/s10044-021-00984-y 16. Nayak SR, Nayak D, Sinha U, Arora V, Pachori R (2021) Application of deep learning techniques for detection of COVID-19 cases using chest X-ray images: a comprehensive study. Biomed Signal Process Control 64:1–12. https://doi.org/10.1016/j.bspc.2020.102365 17. Barstu˘gan M, Özkaya U, Öztürk S¸ (2020) Coronavirus (COVID-19) classification using CT images by machine learning methods. https://doi.org/10.48550/arXiv.2003.09424 18. Punn NS,Agarwal S (2021) Automated diagnosis of COVID-19 with limited posteroanterior chest X-ray images using fine-tuned deep neural networks. Appl Intell 51:2689–2702. https:// doi.org/10.1007/s10489-020-01900-3 19. Oh Y, Park S, Ye JC (2020) Deep learning COVID-19 features on CXR using limited training data sets. IEEE Trans. Med. Imaging 39(8):2688–2700. https://doi.org/10.1109/TMI.2020. 2993291 20. Alqudah A, Qazan S, Alquran H, Abuqasmieh I, Alqudah A (2020) Covid-2019 detection using X-ray images and artificial intelligence hybrid systems. https://doi.org/10.5455/jjee.204158531224. 21. Alqudah A, Qazan S, Alqudah A (2020) Automated systems for detection of COVID-19 using chest X-ray images and lightweight convolutional neural networks. https://doi.org/10.21203/ rs.3.rs-24305/v1. 22. Haque AKMB, Rahman M (2020) Augmented COVID-19 X-ray images dataset (Mendely) analysis using convolutional neural network and transfer learning. https://doi.org/10.13140/ RG.2.2.20474.24003 23. https://www.kaggle.com/ssarkar445/covid-19-xray-and-ct-scan-image-dataset
Chapter 21
Fuzzy Set-Based Frequent Itemset Mining: An Alternative Approach to Study Consumer Behaviour Renji George Amballoor and Shankar B. Naik
1 Introduction Market basket analysis aims to discover interesting patterns in the form of frequent itemsets and associations rules from transactional databases [1]. These patterns help identify items that are collectively in demand. Association rules between itemsets reveal the dependence of the purchase of an item on the purchase of another item [2]. However, in the world dominated by Heisenberg’s Uncertainty Principle [3], there is a trade-off in the precise measurement of consumer behaviour [4, 5]. The inability to simultaneously measure the position and velocity of a photon in quantum physics translates into the helplessness of market research agencies to measure what the consumer intends to buy and what he actually buys. The wave-particle duality of consumer behaviour exhibited by a consumer by being both rational and irrational makes the precise choice prediction complex. The traditional economic theories believed that consumers are rational homo economicus [6] referred to as Econs [7] who tries to maximize their utility based on self-interest [8]. The impartial spectator (decision-making agent) will always make the best choice from the set of available alternatives for utility optimization. The writings of classical and neoclassical schools of economics sketched a rational consumer with infinite cognitive potentials characterized with a choice behaviour for maximizing his/her expected utility [9]. Simon questioned the concept of perfect rationality [10] of consumers in the real world. He was of the opinion that the consumers exhibit bounded rationality [11]
R. G. Amballoor (B) · S. B. Naik Directorate of Higher Education, Government of Goa, Porvorim, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_21
resulting from inborn biological and rational limitations [12, 13]. It is practically impossible for the consumer to scan through all the alternatives and to examine the consequences of each of them while deciding to buy goods and services. Therefore, rationality is bounded due to inadequacy of thinking capacity, time and available information [13]. Tversky and Kahneman, who laid the foundation for the new discipline of behavioural economics, showed how the human brain functions when confronted with complexity or uncertainty. When the consumer is faced with challenging decisions or ambiguous evidence, their judgement will be subject to predictable errors. The errors creep in due to heuristics and biases [14], which are basically mental shortcuts and assumptions for quick decisions. Therefore, consumers are satisficers seeking satisfactory solutions over optimal ones. The traditional theories explain consumer behaviour using binary logic and are inadequate to explain real-world scenarios. Conversely, the behavioural school is of the opinion that it is impossible to trust our preferences to reflect our interests while making decisions [15], even when they are based on personal experience and recent memories [16]. When people use heuristics in decision-making, they are in fact using an accuracy-effort trade-off to make the best choice [17]. Similarly, consumers may use fast, habitual and spontaneous system thinking to avoid wasting mental resources while still making good choices [18]. Hence, in this paper, we propose an algorithm, fuzzy set-based frequent itemset mining (FSFIM), to mine frequent itemsets and association rules by applying the concept of fuzzy sets. The quantities of the items purchased are used to classify each item as low, medium or high based on the membership function of each of the fuzzy sets low, medium and high. Thereafter, frequent itemsets and association rules containing the classified items are generated. These patterns carry added information based on the quantity of the items purchased.
2 Motivation and Related Work

Frequent itemset mining aims at discovering frequent itemsets from transactional databases. It was proposed by Agrawal and Srikant [19]. Algorithms such as Apriori [19], FP-Growth [20] and Eclat [21] have been proposed to discover frequent itemsets from transactional datasets. The patterns generated are in the form of frequent itemsets and association rules. These patterns contain information about the frequency of occurrence and associations only in the form of itemsets. They do not reveal any information about the quantity of the items purchased in a transaction. Information about the frequency of purchase of various quantities of items is useful, especially in the production and packaging of items in different quantities. Association rules between different quantities of items will help understand the probability of a customer buying a particular amount of an item when he/she has already purchased some quantity of another item. Mining frequent itemsets based on their quantities purchased suffers from the problem that the occurrence of a pair of an item and its quantity will be less as compared
to the itemset itself. Hence, it is required that the quantities are classified into ranges and each range is given a label. The concept of fuzzy sets can be applied to convert the quantities of purchase into labels. In this paper, we propose a method to classify each item in a transaction as low, medium or high based upon the quantity of the item purchased in the transaction and then mine frequent itemsets and association rules containing itemsets classified as low, medium or high.

The human sciences, especially the social sciences including economics, deal with people [22]. The conventional models in statistics and mathematics provide crisp estimates which cannot be translated into linguistic terms, concepts and ideas like low preference, medium preference and high preference. Even though linguistic and verbal concepts provide a better representation of real-life situations, they are inadequate for mathematical model building [23]. Fuzzy logic methodology is most desirable when there is ambiguity and vagueness about input values and theoretical relationships. The concept of fuzzy logic was introduced for the first time by Zadeh [24] in his paper titled Fuzzy Sets. A fuzzy set rests on multi-valued logic; it has no crisp boundaries and contains elements with different degrees of membership ranging from 0 (total falsity) to 1 (total truth). For example, the preference of consumers A, B and C can be expressed using fuzzy logic as A having a membership degree of 0.89, followed by B with 0.68 and C with 0.53. The inaccuracies of economic reality and phenomena can be preserved using fuzzy logic [25]. It is used to study consumer behaviour with non-standard preferences. Non-standard preferences result in predictable anomalies like imperfect optimization and lack of self-control [26]. Consider the sorites paradox [27], wherein n grains make a heap and n − 1 grains also make a heap; it is difficult to say what number of grains does not make a heap, although one grain cannot make a heap. The paradox highlights the importance of fuzzy concepts in economics and social sciences. Fuzzy logic is useful to explain human behaviour that is difficult to explain in Boolean logic, where a data point has a crisp membership of 1 (yes) or 0 (no), i.e. a data point is either a member of the set or is not a member. The recently proposed algorithms in Refs. [28] and [29] discover itemsets using fuzzy logic. The generated patterns of interest are in the form of itemsets and do not reveal any information about the quantities purchased, although they are executed on databases which have information about the quantities of each item purchased.
3 Fuzzy Set-Based Frequent Itemset Mining (FSFIM)

3.1 Preliminaries

Let I = {x_1, ..., x_n} be the set of literals called items. Let D = {t_1, ..., t_N} be the database containing N transactions, wherein each transaction t_i = {(x_ij, q_ij) | 1 ≤ i ≤ N, 1 ≤ j ≤ n, x_ij ∈ I and q_ij ≥ 1} consists of pairs of an item x_ij and the quantity q_ij of the item purchased in the transaction.
Table 1 Database of transactions

Transaction id | Items purchased
1 | (A, 2) (B, 8)
2 | (A, 9) (C, 8)
3 | (B, 4) (C, 9) (D, 10)
4 | (A, 6) (E, 2)
5 | (A, 5) (D, 8) (E, 5)
For example, as shown in Table 1, the database consists of 5 transactions. Transaction 1 consists of two items A and B. The quantity of item A purchased is 2, and quantity of item B purchased is 8.
3.2 Fuzzyfication of the Items Based on Quantity Purchased

We define three fuzzy sets low, medium and high with membership functions as shown in Fig. 1. Both sets low and high have sigmoid membership functions given as

$\mu_{\mathrm{Low}}(x_{ij}) = \dfrac{1}{1 + e^{-\alpha_L (x_{ij} - b_L)}}$    (1)

$\mu_{\mathrm{High}}(x_{ij}) = \dfrac{1}{1 + e^{-\alpha_H (x_{ij} - b_H)}}$    (2)

The set medium has a bell-shaped membership function given as

$\mu_{\mathrm{Medium}}(x_{ij}) = \dfrac{1}{1 + \left|\dfrac{x_{ij} - c_M}{\alpha_M}\right|^{2 b_M}}$    (3)

Fig. 1 Membership functions for linguistic terms low, medium and high
Table 2 Database of fuzzy transactions

Transaction id | Items purchased
1 | (Low.A) (High.B)
2 | (High.A) (High.C)
3 | (Medium.B) (High.C) (High.D)
4 | (Medium.A) (Low.E)
5 | (Medium.A) (High.D) (Medium.E)
The values of the parameters α_L, b_L, α_H, b_H, α_M, b_M and c_M are decided by the user. A new database Dμ is derived from the original database D by applying expressions (1), (2) and (3) to the elements of the transactions in database D. An element (x_ij, q_ij) in transaction t_i derives (S.x_ij), where S is the label of the fuzzy set which has the maximum value of μ for the item x_ij. This step removes the quantity associated with the item and classifies the item as low, medium or high based upon the quantity purchased in the transaction. The database Dμ derived from the example database of Table 1 is shown in Table 2.
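A short Python sketch of this fuzzification step is given below. The parameter values (and the negative slope used for the low set so that small quantities receive high membership) are illustrative assumptions, since the paper leaves them to the user; only the max-membership labelling rule is taken from the text.

```python
import math

# Membership functions from Eqs. (1)-(3); parameter values are assumed
def mu_low(q, a_l=-1.0, b_l=3.0):        # sigmoid with assumed negative slope
    return 1.0 / (1.0 + math.exp(-a_l * (q - b_l)))

def mu_high(q, a_h=1.0, b_h=7.0):
    return 1.0 / (1.0 + math.exp(-a_h * (q - b_h)))

def mu_medium(q, a_m=2.0, b_m=2.0, c_m=5.0):
    return 1.0 / (1.0 + abs((q - c_m) / a_m) ** (2 * b_m))

def fuzzify(item, qty):
    # Label the item with the fuzzy set of maximum membership, as in D_mu
    memberships = {"Low": mu_low(qty), "Medium": mu_medium(qty), "High": mu_high(qty)}
    label = max(memberships, key=memberships.get)
    return f"{label}.{item}"

transaction = [("A", 2), ("B", 8)]                 # transaction 1 from Table 1
print([fuzzify(x, q) for x, q in transaction])     # with these parameters: ['Low.A', 'High.B']
```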
3.3 Frequent Itemset Generation and Association Rule Mining

An algorithm such as Apriori or FP-Growth is executed on the database Dμ to generate the frequent itemsets. The frequent itemsets generated here are already classified as low, medium or high, as these labels are already associated with the itemsets in Dμ. Thus the patterns generated contain information about the quantity-wise frequency of purchase of an item. For example, if (Low.A) and (High.B) are frequent, it means that item A is frequently purchased in low quantity, while item B is frequently purchased in high quantity. The association rules generated are now between the quantity classes of the items purchased. For example, the association rule (High.A) → (Low.B) with confidence 90% implies that the probability of a customer purchasing a low amount of item B when he/she has purchased a high amount of item A is 90%. The fuzzy set-based frequent itemset mining (FSFIM) algorithm is presented in Fig. 2, while Fig. 3 shows the flow chart.
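As a small illustration of the mining step on the fuzzified database of Table 2 (not the authors' FSFIM or Apriori implementation), support and confidence can be computed over the labelled items as follows.

```python
from collections import Counter

# D_mu from Table 2: each transaction is a set of fuzzified items
D_mu = [
    {"Low.A", "High.B"},
    {"High.A", "High.C"},
    {"Medium.B", "High.C", "High.D"},
    {"Medium.A", "Low.E"},
    {"Medium.A", "High.D", "Medium.E"},
]

def support(itemset):
    # Fraction of transactions containing the whole itemset
    return sum(itemset <= t for t in D_mu) / len(D_mu)

# Frequent labelled 1-itemsets at minimum support 0.4 (the S0 used in the experiments)
counts = Counter(item for t in D_mu for item in t)
frequent = [i for i, c in counts.items() if c / len(D_mu) >= 0.4]
print(frequent)    # e.g. ['High.C', 'High.D', 'Medium.A'] (order may vary)

# Confidence of a rule such as (High.D) -> (High.C)
conf = support({"High.D", "High.C"}) / support({"High.D"})
```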
4 Experiments The proposed method was implemented using C++ on a system with Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz 1.80 GHz and 4 GB RAM. The datasets
Fig. 2 Algorithm fuzzy set-based frequent itemset mining (FSFIM)
Fig. 3 Flow chart for fuzzy set-based frequent itemset mining (FSFIM)
used are synthetic datasets. The dataset has 200 K transactions. The number of items was 10. The maximum quantity of an item purchased in a single transaction was 100. The proposed algorithm was compared with the Apriori algorithm. The observations are listed in Table 3.

Table 3 Experimental results for S0 = 0.4

Parameter | Proposed algorithm | Apriori
Frequent itemset count | 9 | 14
Association rule count | 6 | 7
Execution time (s) | 0.05 | 0.02
It was observed that the proposed algorithm generated fewer frequent itemsets and association rules than the Apriori algorithm for the same value of the minimum support threshold. However, the patterns generated using the proposed approach contain added information about the quantity of the items purchased in the transaction, which is missing in the patterns generated using the Apriori approach. The proposed algorithm requires more time than the Apriori algorithm for the same database and the same value of S0. This is due to the additional step of fuzzyfication of the transactions.
5 Conclusion

The rational and bounded rationality school approaches to the study of consumer behaviour have not been able to provide a realistic picture of the several characteristics of items/products/services. The study in this paper attempts to apply the concept of fuzzy sets to the study of consumer behaviour (market basket analysis). This paper has focused on the generation of frequent categories of itemsets and association rules between them. The categories of the itemsets are low, medium and high. Itemsets in the source database are classified on the basis of the quantity of their purchase in the transactions. The categories are represented as fuzzy sets. The classification of an item is done using the membership of the item in the fuzzy sets. This approach to the study can generate new inputs for firms, market research agencies and policy makers which are otherwise not captured by the concept of crisp sets. The data considered in this study is static. In future, it is proposed to apply fuzzy set-based frequent itemset mining to real-time data and to identify scenarios which will help further refine the method.
References
1. Naik SB, Khan S (2021) Application of association rule mining-based attribute value generation in music composition. In: Bhateja V, Satapathy SC, Travieso-Gonzalez CM, Aradhya VNM (eds) Data engineering and intelligent computing. Advances in intelligent systems and computing, vol 1407. Springer, Singapore. https://doi.org/10.1007/978-981-16-0171-236
2. Amballoor RG, Naik SB (2021) Utility-based frequent itemsets in data streams using sliding window. In: 2021 International conference on computing, communication, and intelligent systems (ICCCIS), pp 108–112. https://doi.org/10.1109/ICCCIS51004.2021.9397198
3. Busch P, Heinonen T, Lahti P (2006) Heisenberg's uncertainty principle. Phys Rep 452(6):155–176
4. Amballoor RG, Naik SB (2021) Dissemination of firm's market information: application of Kermack-Mckendrick SIR model. In: Singh M, Tyagi V, Gupta PK, Flusser J, Ören T, Sonawane VR (eds) Advances in computing and data sciences. ICACDS 2021. Communications in computer and information science, vol 1441. Springer, Cham. https://doi.org/10.1007/978-3-030-88244-03
5. Amballoor RG, Naik SB (2021) Sustainability issues of women street vegetable & flower entrepreneurs in Goa: need for state interventions. J Entrepreneurship Innov Emerging Econ. https://doi.org/10.1177/23939575211044308
6. Mill J (2007) On the definition and method of political economy. In: Hausman D (ed) The philosophy of economics: an anthology. Cambridge University Press, Cambridge, pp 41–58. https://doi.org/10.1017/CBO9780511819025.003
7. Thaler RH (2016) Misbehaving: the making of behavioral economics. W.W. Norton Company
8. Adam S (2008) An inquiry into the nature and causes of the wealth of nations: a selected edition. Kathryn Sutherland (ed). Oxford Paperbacks, Oxford
9. Walras L (1883) Théorie mathématique de la richesse sociale. Duncker & Humblot, Lausanne
10. Simon HA (1997) Administrative behavior, 4th edn. Free Press, New York
11. Simon HA (1982) Models of bounded rationality. MIT Press, Cambridge, MA
12. Simon HA (1955) A behavioral model of rational choice. Q J Econ 69(1):99–118
13. Simon HA (1956) Rational choice and the structure of the environment. Psychol Rev 63(2):129–138
14. Kalantari B (2010) Herbert A Simon on making decisions: enduring insights & bounded rationality. J Manage History 16(4):509–520
15. Tversky A, Kahneman D (1974) Judgment under uncertainty: heuristics and biases. Science 185(4157):1124–1131
16. Daniel K (2011) Thinking, fast & slow. Penguin Books, New Delhi
17. Gigerenzer G (2008) Why heuristics work. Perspect Psychol Sci 3(1):20–29. https://doi.org/10.1111/j.1745-6916.2008.00058.x
18. Līga P (2019) Criticism of behaviourial economics: attacks towards ideology, evidence & practical application. J WEI Bus Econ 8
19. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of 20th international conference on very large data bases, VLDB, vol 1215, pp 487–499
20. Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. ACM SIGMOD Rec 29(2):1–12
21. Zaki M, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. In: Proceedings of 3rd international conference on knowledge discovery and data mining (KDD'97). AAAI Press, Menlo Park, CA, USA, pp 283–296
22. Smithson M (1988) Fuzzy set theory & the social science: the scope for application. Fuzzy Sets Syst 26:1–21
23. Jelena DSI, Djuris Z (2013) Neural computing in pharmaceutical products & process development. In: Djuris J (ed) Computer aided applications in pharmaceutical technology. Springer
24. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
25. Ferrer-Comalat JC et al (2020) Fuzzy logic in economic models. J Intell Fuzzy Syst 38(5):5333–5342
26. Francesc T (2015) Fuzzy logic in modern economics. In: Seising R et al (eds) Towards the future of fuzzy logic. Springer, Switzerland
27. Basu K (1984) Fuzzy revealed preference theory. J Econ Theory 32:212–227
28. Cui Y, Gan W, Lin H, Zheng W (2021) FRI-Miner: fuzzy rare itemset mining. arXiv preprint arXiv:2103.06866
29. Wu TY, Lin JCW, Yun U, Chen CH, Srivastava G, Lv X (2020) An efficient algorithm for fuzzy frequent itemset mining. J Intell Fuzzy Syst 38(5):5787–5797
Chapter 22
EEG Seizure Detection Using SVM Classifier and Genetic Algorithm Tulasi Pendyala, Anisa Fathima Mohammad, and Anitha Arumalla
1 Introduction Over 50 million people among the world’s population experience epileptic seizures regardless of age. Out of the 50 million, about 10 million people suffer from epilepsy in India. Epilepsy occurs due to uncertain electrical events in the brain neurons causing seizures. If untreated, epileptic seizures may lead to strokes of uncontrollable movements, falls, burns, unconsciousness, and sometimes death. Therefore, proper treatment is required to manage the sudden seizures that may occur to a patient. Epileptic seizures are mostly detected by using electroencephalogram (EEG) which records the electrical patterns in the brain. The abnormal activity in the EEG recording can be categorized as [1]: (i) interictal and (ii) ictal to identify the epileptic patient. The interictal represents the period in between the seizures, and the period during a seizure is called ictal. The EEG recordings and visual interpretations by the neurologist expert are tedious. To provide proper treatment, epileptic seizures must be detected in time for further treatments. In this paper, for detection of seizures in EEG signals efficiently, a method has been proposed. The method is based on four main stages: (a) EEG signal processing, (b) signal decomposition using db4, (c) feature selection using ACO and GA, and (d) SVM and KNN algorithms for classification. The remaining paper is arranged in the following sequence. In the Sect. 2, related work on EEG seizure detection is described. Section 3 describes the methodology. Section 4 represents the analysis and results. Section 5 presents the conclusion of this paper.
T. Pendyala (B) · A. F. Mohammad · A. Arumalla V. R. Siddhartha Engineering College, Vijayawada, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_22
2 Related Work

Neurologists classify seizures into two types based on the symptoms: partial seizures and generalized seizures. A partial seizure affects only a section of a cerebral hemisphere, whereas in a generalized seizure entire brain networks are affected quickly and all regions of the brain suffer. Till now, classification of seizure detection methods has been done in various ways. The latest technologies and requirements in seizure detection are categorized into three domains: the time domain, the frequency domain, and the wavelet domain.

Generally, time-domain analysis is patient specific and is performed on short-time windows called epochs. The authors Runarsson and Sigurdsson [2] proposed a solution using the half-wave and histogram methods. In this, SVM was used as a classifier, and a sensitivity of 90% was achieved with their solution. The time domain does not provide the frequency components of the signal, which is a drawback and is overcome by frequency analysis. In Ref. [3], the authors used an autoregressive (AR) model to automatically classify epileptic seizures. In their work, after using the fast Fourier transform (FFT) as a preprocessing method for the neural network, the classification accuracy achieved was 91%. The combination of the frequency and time domains can give better accuracy, which can be achieved by the wavelet transform. Panda et al. [4] used a feature extraction technique with five-level decomposition in their approach. Statistical features such as standard deviation, entropy, and energy were extracted. Daubechies (db2) was used as the reference wavelet, and an SVM classifier was used. The algorithm was tested on 500 epochs of EEG data for detecting seizure activity. They compared the results of every feature individually and got an accuracy of 91.2% for the energy feature.

Baldominos et al. [5] optimized the EEG seizure detection process using a genetic algorithm and evaluated it on the CHB-MIT database. The results obtained were quite fascinating, as the number of false positives achieved was much lower when compared to other works. Ali et al. [6] proposed a technique of genetic algorithm-based feature selection for EEG data. Their proposed technique achieved better classification with a smaller number of features. They used a genetic algorithm for feature selection and then classified with a KNN classifier, which yielded better results than those with the full feature set. The accuracy of the proposed technique was enhanced by 3% when compared to the accuracy obtained without feature selection. Similarly, Bharat et al. [7] used ant colony optimization-based classification, which yielded good results in their classification approach. In Ref. [5], frequencies less than 5 Hz (delta rhythms) were removed due to the medical statement that seizures rarely occur during sleep, and they only considered seizures longer than 22 s, which could be a drawback as seizures can also occur for shorter durations. SVM with GA as the feature selection method is considered here by taking the reference of Fan et al. [8], where the highest accuracy obtained was 91.3%.
22 EEG Seizure Detection Using SVM Classifier and Genetic Algorithm
283
In this work, the seizure occurrence is detected by using SVM classifier and genetic algorithm as a feature selection method which results good accuracy and efficient detection at every second.
3 Methodology The process flow of proposed method for automated seizure detection is shown in Fig. 1
3.1 EEG Database The CHB-MIT dataset is used, which is publicly made available by the Massachusetts Institute of Technology Boston Children’s Hospital (CHB-MIT). It contains 686 EEG records from the scalp of 22 patients treated at CHB. The dataset includes 23 sets of EEG records from 22 patients; 5 men aged 3 to 22 years old, and 17 women aged 1.5 to 19 years old. Each patient’s record may have numerous seizures and non-seizure log files, which represent the outbreak at the beginning and end of the seizure and can be easily viewed in a browser called EDF browser. The main dataset adopts a 1D format and contains EEG signals received through various channels. All these signals from the dataset are sampled at 256 Hz. Records that hold at least one seizure are classified as seizure files, and those that do not contain seizures are classified as non-seizure files. Of the 686 records, 198 records contained anomalies.
3.2 Preprocessing The frequency range of the EEG signals is limited to 0–64 Hz using an FIR low-pass filter (LPF), so that frequencies above this limit are treated as noise. Then, wavelet decomposition (db4) is used to extract the rhythms (gamma, beta, alpha, theta, and delta) of the EEG signal.
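The authors implemented this pipeline in MATLAB; the following is only a minimal Python sketch of the band-limiting step using SciPy. The 256 Hz sampling rate and the 0–64 Hz band come from the text, while the filter length and the use of zero-phase filtering are illustrative assumptions.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 256          # CHB-MIT sampling rate (Hz), from the dataset description
CUTOFF = 64.0     # band limit used in the paper (0-64 Hz)
NUM_TAPS = 101    # assumed FIR filter length (not stated in the paper)

def band_limit(eeg_channel: np.ndarray) -> np.ndarray:
    """Band-limit one EEG channel to 0-64 Hz with a linear-phase FIR LPF."""
    taps = firwin(NUM_TAPS, CUTOFF, fs=FS)    # low-pass FIR design (Hamming window)
    return filtfilt(taps, 1.0, eeg_channel)   # zero-phase filtering

# Example with synthetic data standing in for 10 s of one EEG channel
signal = np.random.randn(10 * FS)
filtered = band_limit(signal)
```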
Fig. 1 Process flow for the proposed method: input data (EEG signals from a public dataset) → signal decomposition (FIR filter, db4 wavelet decomposition) → feature extraction (statistical features) → feature selection (ACO, GA) → classification (SVM, KNN)
Fig. 2 Wavelet decomposition (db4) of the EEG signal: the band-limited EEG (0–64 Hz) is passed through successive LPF/HPF stages, yielding the detail bands D1 (32–64 Hz, gamma), D2 (16–32 Hz, beta), D3 (8–16 Hz, alpha), D4 (4–8 Hz, theta) and the approximation A4 (0–4 Hz, delta)
The wavelet transform used for EEG signal decomposition is depicted in Fig. 2. Here, the signal is convolved with the high-pass filter (HPF) and low-pass filter (LPF) coefficients and then downsampled. The wavelet decomposition divides the signal into two types of coefficients, approximation and detail: the approximation coefficients are obtained from the LPF, and the detail coefficients from the HPF, as shown in Fig. 2. The LPF and HPF outputs are expressed as:

LP(S) = A_j = \sum_{k} S(k)\, h(2j - k)   (1)

HP(S) = D_j = \sum_{k} S(k)\, g(2j - k)   (2)

where LP(S) = low-pass filter output, HP(S) = high-pass filter output, A_j/D_j = approximation/detail coefficients, S(k) = input signal, and h/g = filter coefficients. The Daubechies wavelet of fourth order (db4) provides smoothing that can detect changes in the signal; hence, the wavelet coefficients are calculated using db4. The frequency bands, with different attenuation rates, of the fourth-order Daubechies wavelet
(db4) with a sampling rate of 256 Hz are as follows: D1, D2, D3, D4, A4 with frequency bands 64–32 Hz, 32–16 Hz, 16–8 Hz, 8–4 Hz, 4–0 Hz, respectively.
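As a hedged illustration of this four-level db4 decomposition, the sketch below uses PyWavelets rather than the authors' MATLAB implementation; the band-to-coefficient mapping is taken from the text, everything else is an assumption for illustration only.

```python
import numpy as np
import pywt

FS = 256  # Hz

def eeg_rhythms(filtered_epoch: np.ndarray) -> dict:
    """Four-level db4 DWT of a band-limited (0-64 Hz) epoch.

    Returns sub-band coefficients named after the EEG rhythms in the text:
    D1 gamma, D2 beta, D3 alpha, D4 theta, A4 delta.
    """
    cA4, cD4, cD3, cD2, cD1 = pywt.wavedec(filtered_epoch, 'db4', level=4)
    return {'gamma (32-64 Hz)': cD1,
            'beta (16-32 Hz)':  cD2,
            'alpha (8-16 Hz)':  cD3,
            'theta (4-8 Hz)':   cD4,
            'delta (0-4 Hz)':   cA4}

epoch = np.random.randn(2 * FS)   # a 2 s epoch, as used in the paper
bands = eeg_rhythms(epoch)
```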
3.3 Feature Extraction Because a large amount of data must be processed and analyzed, the following statistical features are calculated from the wavelet coefficients of each sub-band of the EEG signals:

1. Maximum amplitude.
2. Minimum amplitude.
3. Energies: The energies of the detail and approximation coefficients are calculated using

ED_i = \sum_{j=1}^{N} |D_{ij}|^2, \quad i = 1, 2, \ldots, l   (3)

EA_i = \sum_{j=1}^{N} |A_{ij}|^2, \quad i = 1, 2, \ldots, l   (4)

where ED_i = energy of the wavelet detail coefficients, EA_i = energy of the wavelet approximation coefficients, A_{ij}/D_{ij} = approximation/detail coefficients, l = level of decomposition, and N = number of approximation and detail coefficients.
4. Mean: The mean of the signal is calculated as

\mu_i = \frac{1}{N} \sum_{j=1}^{N} D_{ij}, \quad i = 1, 2, \ldots, l   (5)

5. Standard deviation: The standard deviation is calculated as

\sigma = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} (D_i - \mu)^2}   (6)

6. Skewness: Skewness is a measure of the asymmetry of a distribution. It is calculated as

\text{skewness} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{D_i - \mu}{\sigma}\right)^3   (7)

7. Kurtosis: Kurtosis measures the shape of a random variable's probability distribution. It is calculated using

\text{kurtosis} = \frac{1}{N} \sum_{i=1}^{N} \left(\frac{D_i - \mu}{\sigma}\right)^4   (8)
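A minimal Python sketch of these per-sub-band computations is given below; it is only an illustration of the formulas above (the grouping into 10 features per channel used by the authors is not reproduced here), and the use of SciPy/NumPy instead of MATLAB is an assumption.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def subband_features(c: np.ndarray) -> list:
    """Statistical features (items 1-7 above) of one sub-band's coefficients."""
    return [
        np.max(c),                  # maximum amplitude
        np.min(c),                  # minimum amplitude
        np.sum(np.abs(c) ** 2),     # energy, Eqs. (3)/(4)
        np.mean(c),                 # mean, Eq. (5)
        np.std(c, ddof=1),          # standard deviation, Eq. (6)
        skew(c),                    # skewness, Eq. (7)
        kurtosis(c, fisher=False),  # kurtosis, Eq. (8) (non-excess form)
    ]

def epoch_feature_vector(bands: dict) -> np.ndarray:
    """Concatenate the features over all sub-bands of one channel's epoch."""
    return np.concatenate([subband_features(c) for c in bands.values()])
```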
3.4 Feature Selection Evolutionary methods are a popular way to pick the best features from the extracted feature set. They include differential evolution (DE), GA, particle swarm optimization (PSO), ACO, and artificial bee colony (ABC). Here, ACO and GA are used to select features. ACO uses interacting artificial ants to discover good solutions to discrete optimization problems. The algorithm uses the meta-heuristic behaviour of the ant system: ants build solutions iteratively and deposit pheromone on the paths corresponding to these solutions. Path selection is a random process based on two parameters, (i) the pheromone value and (ii) the heuristic value; the pheromone value reflects the number of ants that have used a path. When an ant reaches a decision point, it is more likely to choose the path with the highest heuristic score and pheromone value. The solution corresponding to the ant's path is evaluated, and the pheromone value of that path is increased accordingly. ACO is used here because it discovers solutions quickly, adapts to changes, and can search large data spaces. In this model, 50 of the 230 features are selected using ACO as the feature selection technique. A genetic algorithm (GA) belongs to the class of evolutionary algorithms (EA) and is motivated by the natural selection process. In a genetic algorithm, a population (a set of possible solutions) of an optimization problem is evolved toward the best solution. Each candidate solution has its own features (its chromosome or genotype) that can be mutated and changed. Solutions are generally represented as binary strings, although other encodings are possible. The process is iterative and usually starts with a random population of possible solutions. In each iteration, the fitness of each solution is evaluated, and the most suitable solutions are randomly selected from the current set. The genome of each selected solution is altered to generate a new generation, and the newly generated candidate solutions are considered in the next iteration. The algorithm terminates when the maximum number of generations is reached or when the fitness is satisfactory. Since the
genetic algorithm is easy to understand and is useful for finding good solutions to relatively large problems, it is used here. In this model, 114 of the 230 features are selected using GA as the feature selection technique.
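The sketch below is a toy Python GA feature selector, using the parameter values reported later in Fig. 8 (population 10, 100 iterations, crossover 0.8, mutation 0.3). The fitness function — cross-validated accuracy of an RBF-SVM on the selected columns — and the selection/crossover/mutation details are assumptions, since the paper does not spell them out.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def ga_feature_selection(X, y, n_pop=10, n_iter=100, cx_rate=0.8, mut_rate=0.3):
    """Toy GA over binary chromosomes; returns indices of selected features."""
    n_feat = X.shape[1]
    pop = rng.integers(0, 2, size=(n_pop, n_feat))         # binary chromosomes

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        return cross_val_score(SVC(kernel='rbf'), X[:, mask == 1], y, cv=3).mean()

    for _ in range(n_iter):
        scores = np.array([fitness(ind) for ind in pop])
        pop = pop[np.argsort(scores)[::-1]]                 # best individuals first
        children = []
        while len(children) < n_pop:
            p1, p2 = pop[rng.integers(0, n_pop // 2, 2)]    # parents from best half
            child = p1.copy()
            if rng.random() < cx_rate:                      # one-point crossover
                cut = rng.integers(1, n_feat)
                child[cut:] = p2[cut:]
            if rng.random() < mut_rate:                     # flip one random gene
                child[rng.integers(0, n_feat)] ^= 1
            children.append(child)
        pop = np.array(children)
    best = pop[np.argmax([fitness(ind) for ind in pop])]
    return np.flatnonzero(best)

# selected = ga_feature_selection(X_features, y_labels)   # hypothetical arrays
```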
3.5 Classification The SVM classifier is a simple, efficient classifier that predicts faster than many other algorithms and uses less memory while providing good accuracy. In this classification model, a total of 810 data points are obtained from feature extraction; 648 of them are used as training data points, and the rest are used as test data points. A 2 s epoch is considered for extracting the 230 features. As the EEG feature set is high dimensional, the SVM classifier is used because it performs well when the number of features is large; it is also widely used in medical and scientific fields. A total of 57 support vectors are obtained in our classification model, and the RBF kernel function is used with a kernel scale of 10,000. The KNN classifier is easy to implement and is robust with respect to the search space. KNN can outperform other classifiers when the training set is much larger: it computes the distance between the query and all samples in the database, selects the specified number (K) of samples closest to the query, and then votes for the most frequent label. The number of neighbours used in our model is 3, and the Euclidean distance is used as the distance metric.
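A minimal scikit-learn sketch of this classification stage is shown below. The 80/20 split, K = 3, the Euclidean metric and the RBF kernel follow the text; the synthetic feature matrix stands in for the real 810 × 230 features, and the paper's MATLAB-style "kernel scale" parameter has no direct equivalent set here (sklearn's default gamma is kept), which is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(810, 230))     # stand-in for the 810 x 230 feature matrix
y = rng.integers(0, 2, size=810)    # stand-in seizure / non-seizure labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=0)   # 648 train / 162 test

svm = SVC(kernel='rbf')                                  # RBF-kernel SVM
svm.fit(X_train, y_train)

knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean')  # K = 3, Euclidean
knn.fit(X_train, y_train)

print('SVM accuracy:', svm.score(X_test, y_test))
print('KNN accuracy:', knn.score(X_test, y_test))
```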
3.6 Evaluation of Classifier Performance After classification is done, it is very important to assess the classifier to test its usefulness. Here, classifier performance is evaluated using measures such as specificity, sensitivity, precision, accuracy, and f_measure to determine the predictive ability of the classifier model. Almost all indicators are based on the concept of true and false predictions generated by the model compared with the actual results. If the model is perfect, then all predictions are either true positive or true negative; predictions that do not match the actual results are marked as false positive or false negative. It is important to minimize false negatives, as otherwise a seizure may go unnoticed.

Accuracy:

\text{Accuracy} = \frac{TP + TN}{TN + FP + TP + FN} \times 100   (9)

Sensitivity:

\text{Sensitivity} = \frac{TP}{TP + FN} \times 100   (10)

Specificity:

\text{Specificity} = \frac{TN}{TN + FP} \times 100   (11)

Precision:

\text{Precision} = \frac{TP}{TP + FP} \times 100   (12)

f_measure:

f\_\text{measure} = 2 \times \frac{\text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}}   (13)
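A short Python helper that evaluates Eqs. (9)–(13) from a confusion matrix is sketched below; the example counts are taken from the SVM + GA column of Table 1.

```python
def classifier_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Evaluation measures of Eqs. (9)-(13), expressed as percentages."""
    accuracy    = 100 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    precision   = 100 * tp / (tp + fp)
    f_measure   = 2 * precision * sensitivity / (precision + sensitivity)
    return {'accuracy': accuracy, 'sensitivity': sensitivity,
            'specificity': specificity, 'precision': precision,
            'f_measure': f_measure}

# SVM with GA feature selection (Table 1): TP=79, FP=2, TN=81, FN=0
print(classifier_metrics(tp=79, tn=81, fp=2, fn=0))   # compare with Table 2
```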
4 Result Analysis In this section, the experimental results of the proposed solution at various stages are presented.
4.1 Preprocessing The EEG signals are passed through an FIR LPF to band-limit them to 0–64 Hz, so that frequencies above the limit are treated as noise. Figure 3 shows the frequency response of the unfiltered and filtered EEG signals.
4.2 Wavelet Decomposition After filtering the EEG signal to the 0–64 Hz band, the signal is divided into 2 s epochs. The db4 wavelet decomposition method is used to divide the signal into five sub-bands, namely the gamma band (32–64 Hz), beta band (16–32 Hz), alpha band (8–16 Hz), theta band (4–8 Hz), and delta band (0.5–4 Hz); Fig. 4a shows the normal EEG rhythms, and Fig. 4b shows the abnormal EEG rhythms. Dividing the filtered EEG signal into five bands provides detailed information that cannot be obtained from the filtered data alone.
Fig. 3 Output of FIR filter
The statistical features, namely maximum, minimum, mean, standard deviation, kurtosis, skewness, and the detail and approximation energies (ED1, ED2, ED3, ED4, EA1), are computed for each 2 s segment and used as features for classification. Each EEG signal has 23 channels, and 10 features are extracted per channel, so 230 features in total are used for classification.
4.3 Classification Without Feature Selection First, all features are used, without feature selection, to train the classification model. Figure 5 shows the ROC curves for the SVM and KNN classifiers without feature selection; KNN has a larger area under the curve than SVM.
4.4 Feature Selection and Classification Without feature selection, unwanted and redundant features from the dataset are also included in the classification, which reduces accuracy.
Fig. 4 a Rhythms of normal data; b rhythms of abnormal data
Classification with ACO as Feature Selection: Here, both the k-nearest neighbour and SVM classification algorithms are used. As the number of extracted features is large, ACO is used to reduce the number of features before training the classification model. Figure 6 gives the details of the ACO parameters. Figure 7 shows the ROC curves for the classifiers with the ACO feature selection method; KNN has a larger area under the curve than SVM, i.e., KNN outperforms SVM when ACO feature selection is used.
Fig. 5 ROC without feature selection

Fig. 6 Ant colony optimization parameters — input: N = 10, Max_Iter = 100, tau = 1, alpha = 1, rho = 0.2, beta = 0.1, eta = 1; output: Nf = 50 selected features
Fig. 7 ROC with ACO feature selection
Classification with GA as Feature Selection: Although ACO reduces the number of selected features to 50, this affects the model accuracy. To obtain higher accuracy, GA is used for feature selection to train the classification model. Figure 8 gives the details of the GA parameters. Figure 9 shows the ROC curves for the classifiers with the GA feature selection method; SVM has a larger area under the curve than KNN, i.e., SVM outperforms KNN when GA feature selection is used.
Fig. 8 Genetic algorithm parameters — input: N = 10, Max_Iter = 100, crossover rate = 0.8, mutation rate = 0.3; output: Nf = 114 selected features
Fig. 9 ROC with GA feature selection
4.5 Comparison Table Among all the classifier models, SVM with the GA algorithm for feature selection gave the highest accuracy, and KNN without any feature selection also gave good results. Table 1 shows the confusion matrix values for both classifiers with the different feature selection methods, and Table 2 shows the comparison between the evaluation parameters of the different classifiers. After extracting the classifier rules, a sample one-hour signal is divided into two-second segments, and the classifier rules are used to detect whether a seizure is present in each two-second segment. By taking all the time periods that contain a seizure, we developed an event information text file that contains the latency (time at which the seizure occurred) and the type (type of classifier).

Table 1 Confusion matrix

| Feature selection | – | – | ACO | ACO | GA | GA |
| Classifier | KNN | SVM | KNN | SVM | KNN | SVM |
| True positive | 76 | 77 | 72 | 74 | 73 | 79 |
| False positive | 5 | 4 | 9 | 7 | 8 | 2 |
| True negative | 81 | 77 | 78 | 75 | 80 | 81 |
| False negative | 0 | 4 | 3 | 6 | 1 | 0 |

Table 2 Comparison between both classifiers without and with feature selection algorithm

| Feature selection | – | – | ACO | ACO | GA | GA |
| Classifier | KNN | SVM | KNN | SVM | KNN | SVM |
| Accuracy | 96.91 | 95.06 | 92.59 | 91.88 | 94.44 | 98.77 |
| Sensitivity | 100 | 95.06 | 96.30 | 92.59 | 98.77 | 100 |
| Specificity | 98.83 | 95.06 | 88.89 | 91.36 | 90.12 | 97.53 |
| Precision | 94.19 | 95.06 | 89.66 | 91.46 | 90.91 | 97.59 |
| f_measure | 97.01 | 95.06 | 92.86 | 92.02 | 94.67 | 98.78 |
4.6 EEG Plots For a more detailed observation of the EEG signals, we used the EEGLAB toolbox in MATLAB. After opening EEGLAB, import the data (File >> Import data >> Using EEGLAB functions and plugins >> From EDF files), upload the respective file, and then upload the event information file. Then select the plot; it displays the EEG data and the seizure times as shown in the figures below. Figure 10 shows the EEG plot when no seizure is present, and Fig. 11 shows the EEG plot when a seizure is present and detected by the classifier.
Fig. 10 EEG plot with no seizure
Fig. 11 EEG plot with seizure
4.7 Observations After testing the ACO and GA feature selection methods with the SVM and KNN classifiers, the following observations were made. When no feature selection method was used, SVM achieved 95.06% and KNN 96.91% accuracy. When ACO was used as the feature selection method, SVM and KNN achieved 91.98% and 92.59%, respectively. When GA was used as the feature selection method, SVM achieved 98.77% accuracy, whereas KNN achieved 94.44%, with sensitivities of 100% and 98.77%, respectively. Therefore, it can be concluded that the system achieved the highest accuracy and sensitivity when the GA feature selection method was used with the SVM classifier, compared with the other combinations.
5 Conclusion Using statistical features, a genetic algorithm as the feature selection method, and SVM as the classifier, an efficient seizure detector was realized in MATLAB. First, the EEG database was pre-processed, and statistical features, such as the energies of the detail and approximation coefficients, were derived using four-level wavelet decomposition. Then, the best features were selected using the ACO and GA feature selection methods and used to train and test the SVM and KNN classifiers. From the experimental results of the seizure detector system, we conclude that the combination of genetic algorithm feature selection with the SVM classifier yielded the highest accuracy and sensitivity compared with ACO. The method was evaluated on a dataset of epileptic and normal data, and it was able to classify seizure and non-seizure segments in MATLAB with high accuracy.
References 1. Adeli H, Ghosh-Dastidar S, Dadmehr N (2007) A wavelet-chaos methodology for analysis of EEGs and EEG subbands to detect seizure and epilepsy. IEEE Trans Biomed Eng 54(2):205–211 2. Runarsson TP, Sigurdsson S (2005) On-line detection of patient specific neonatal seizures using support vector machines and half-wave attribute histograms. In: International conference on computational intelligence for modelling, control and automation and International conference on intelligent agents, web technologies and internet commerce (CIMCA-IAWTIC’06), vol 2. IEEE, pp 673–677 3. Subasi A, Kemal Kiymik M, Alkan A, Koklukaya E (2005) Neural network classification of EEG signals by using AR with MLE preprocessing for epileptic seizure detection. Math Comput Appl 10(1):57–70 4. Panda R, Khobragade PS, Jambhule PD, Jengthe SN, Pal PR, Gandhi TK (2010) Classification of EEG signal using wavelet transform and support vector machine for epileptic seizure diction. In: 2010 International conference on systems in medicine and biology. IEEE, pp 405–408 5. Baldominos A, Ramón-Lozano C (2017) Optimizing EEG energy-based seizure detection using genetic algorithms. In: 2017 IEEE congress on evolutionary computation (CEC). IEEE, pp 2338–2345 6. Ali T, Nawaz A, Ayesha Sadia H (2009) Genetic algorithm based feature selection technique for electroencephalography data. Appl Comput Syst 24(2):119–127 7. Bharat MVSSK, Josyula SB (2015) Rule discovery based classification on biological dataset using ant colony optimization. Int J Res Comput Commun Technol 4 8. Fan J, Shao C, Ouyang Y, Wang J, Li S, Wang Z (2006) Automatic seizure detection based on support vector machines with genetic algorithms. In: Asia-Pacific conference on simulated evolution and learning. Springer, Berlin, Heidelberg, pp 845–852
Chapter 23
Colorization of Grayscale Images Using Convolutional Neural Network and Siamese Network Archana Kumar, David Solomon George, and L. S. Binu
1 Introduction Image colorization is the process of taking an input grayscale image and then producing an output colorized image that represents the semantic colors and tones of the input [1]. This process of adding colors must be in such a way that the colorized image is perceptually meaningful as well as visually appealing. It can be viewed as a process of assigning a three dimensional color vector or RGB values to each pixel of the input grayscale image. Different scenes usually have distinct color styles, and hence, it is difficult to accurately capture the color characteristics. It has been observed that sometimes different colored regions have the same intensity distribution. Since it is the intensity or the luminance of a black and white image that is utilized to predict color, this is a challenge. There is no unique solution to the colorization problem and human intervention often becomes important as a deciding factor in assigning colors [2] (e.g., leaves may be colored in green, yellow and brown). It is more challenging to colorize legacy videos. Independently applying image colorization on each frame often causes flickering and false discontinuities [3]. Basically, the colorization methods are classified as follows [3]: interactive colorization (Color scribble based), exemplar-based colorization (Example/reference based), fully automatic colorization (Learning based), video colorization (Combinations of above methods). In this paper, it is proposed to investigate the possibilities of improvement in colorization by combining the colorization neural network with a Siamese network as a feedback system.
A. Kumar (B) · L. S. Binu College of Engineering, Trivandrum, India D. S. George Government Engineering College, Idukki, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_23
2 Related Works The colorization problem has attracted much attention in the research community, and the way it is tackled by researchers has been evolving over time. The various existing approaches toward solving this problem and how they have evolved are explained in the following sections.
2.1 Interactive Colorization This is the earliest method of colorization, and it requires the user to provide color scribbles or manual markings of color in the different regions of the image to be colorized. Based on these scribbles, the colors are propagated to the nearby pixels, assuming that coherent neighborhoods have similar colors [4]. This method requires neither precise image segmentation, nor accurate region tracking. The basic assumption behind this method is that nearby pixels having similar gray levels have similar colors. The image is processed in the YUV color space. Let the color at pixel ’r’ be denoted by U(r) and V(r). The algorithm tries to minimize the difference between U(r) (and V(r)) and the weighted average of colors in neighboring pixels. This is performed over all the pixels. The major limitation of this method is that it requires a considerable amount of effort from the user in terms of providing the color scribbles. It is thus time-consuming. Another drawback is that color bleeding occurs at the boundaries. This has been solved by researchers, by combining edge detection along with colorization [5]. This method also finds applications in selective recoloring.
2.2 Exemplar-Based Colorization Exemplar or example-based colorization is a semi-automatic technique of transferring color from a reference color image that is semantically similar to the target greyscale image to be colorized. The chromatic information is transferred by matching the luminance values and the texture: the color of a pixel in the reference image is transferred to the pixel in the greyscale image having similar luminance values [6]. This method is most effective on images having distinct luminance clusters or distinct textures. A constraint of this method is that the reference image should be semantically similar to the target image, i.e., it should contain similar object types. The algorithm performs feature extraction at superpixel resolution: a feature vector is computed for each superpixel based on the intensity, standard deviation, Gabor features, SURF descriptors, etc. The extracted features are then used to find correspondences between the reference and the target images and hence assign colors. The Euclidean distances
between corresponding features are calculated to find the superpixels that are most similar to each other. After identifying the most similar superpixel, the color is transferred as micro-scribbles to the center of the superpixel, and this color is propagated based on the principles of the scribble-based algorithm explained in the previous section. However, this method also has some limitations. The colorization is not accurate at object boundaries or thin image structures; color bleeding and color washout are common, and the method fails when suitable color exemplars are not available. Another drawback is that sometimes the target image may contain a number of complex objects, and it may be difficult to find an appropriate reference image. To solve this problem, the method was modified so that images from the Internet are utilized to find a suitable reference image, using filtering schemes [7]. The filtering is based on user-provided semantic text labels.
2.3 Fully Automatic Colorization Fully automatic colorization implies no human interference while assigning color. This is achieved through deep learning techniques trained on large datasets. The network locates the most similar image patch/pixel in a huge reference image database (training data) and then transfers color information from the matched patch/pixel to the target patch/pixel. It can be said that there exists a complex gray to color mapping function that is capable of mapping the extracted features at each pixel to the corresponding chrominance values. Through training, the network learns this mapping. The various deep learning architectures that researchers have tried out include convolutional neural networks (CNNs), generative adversarial networks (GANs) and network ensembles. Fully automatic colorization is faster than all other methods. But, issues of color bleeding and color washout persist. Also, if the target image is outside the scope of the training data, the quality of the output may not be satisfactory. The various deep learning models used for solving this problem are as follows: • Convolutional Neural Networks: The CNNs used for colorization are usually VGG style networks with multiple convolutional blocks. Each block has two or three convolution layers followed by a rectified linear unit and terminating in a batch normalization layer [8]. The features extracted by the multiple layers of the VGG network are used to train a regression model that maps the raw greyscale values to the chrominance values. • Generative Adversarial Networks: A GAN is a combination of two CNNs—a generator and a discriminator. The generator applies color based on what it has been trained on, and the discriminator tries to criticize the color choice [2]. By this interactive mechanism, after a few iterations, the generator output becomes close to real truth such that the discriminator fails to identify whether it is the
generator's output or a real image; that is the point at which we can say that the network is effectively colorizing the images. • Network Ensemble: This is a mixture-learning colorization model. First, a color-style clustering is performed on the training data to give multiple scene-type clusters [1]. Separate neural networks are trained on each of these clusters, and the resulting neural network ensemble is used to colorize the target greyscale image. From an input grey image, features are extracted at each pixel, the nearest cluster and its corresponding neural network are identified, and the features are fed to that network as inputs.
3 Proposed Technique 3.1 Block Diagram The proposed colorization model consists of a CNN that implements the colorization logic and a Siamese network that evaluates the outputs. The block diagram of the proposed network is given in Fig. 1.

Fig. 1 Block diagram of the proposed network

The two blocks are explained in the sections below. Colorizer CNN: The CNNs used for the colorization problem are of VGG style. The key difference compared with other visual neural networks is the significance of pixel location: in colorization networks, the image size and ratio stay constant throughout, whereas in other types of neural networks the image gets distorted as it gets closer to the last layer. Although max-pooling layers increase the information density, they also cause distortions in the image and do not respect its layout; hence, we do not use them in colorizing networks. Instead, we use a stride of 2 to halve the height and width, which increases the information density without distorting the image. Another difference from classifier networks is the use of upsampling layers and the maintenance of the image ratio, because we cannot afford to let the image lose quality and size as it moves
through the layers. The image ratio can be kept constant by using 'same' padding (padding = 'same'), and upsampling doubles the size of the image. Siamese Network: A Siamese network (or twin neural network) is an artificial neural network that contains two or more identical subnetworks [9, 10] having the same configuration with the same parameters and weights. These networks are used to find the similarity of the given inputs by comparing their respective feature vectors; the basic principle behind a Siamese network is called similarity learning. This is a supervised machine learning technique in which the aim is to learn a similarity function that defines the extent to which two objects are related and outputs a similarity score. Suppose the two input images are A and B, and F(A) and F(B) are the corresponding feature vectors; then the similarity is found from the expression

d(A, B) = \|F(A) - F(B)\|^2   (1)

If A and B are the same image, d is small, and if they are different, d is large. The loss function for a Siamese network is developed around this logic. To train the network, the data must be grouped into pairs of images that are either similar or dissimilar. Only one subnetwork needs to be trained; the other uses the same weights and parameters. For negative pairs, the distance between the images should be greater than a threshold m [11], so if a pair already satisfies this condition, we need not separate them further. This is why a hinge loss is also used:

L(A, B) = \max\left(0,\; m^2 - \|F(A) - F(B)\|^2\right)   (2)

The hinge loss value is 0 when A and B are very far apart. Putting together the L2 loss and the hinge loss, we get the contrastive loss:

L(A, B) = y\,\|F(A) - F(B)\|^2 + (1 - y)\max\left(0,\; m^2 - \|F(A) - F(B)\|^2\right)   (3)

Here, y is the label of the image pair. Hence, for a positive pair (y = 1) the L2 norm becomes the loss function, and for a negative pair (y = 0) the hinge loss is the loss function to be minimized. By using this contrastive loss, positive pairs are brought together and negative pairs are moved apart.
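A minimal TensorFlow/Keras sketch of the contrastive loss of Eq. (3) is given below. It assumes the twin branches have already been reduced to the squared distance ||F(A) − F(B)||²; the margin value of 1.0 is an assumption, since the paper does not state it.

```python
import tensorflow as tf

def contrastive_loss(margin: float = 1.0):
    """Contrastive loss of Eq. (3): y = 1 for similar pairs, y = 0 for dissimilar."""
    def loss(y_true, d_squared):
        y_true = tf.cast(y_true, d_squared.dtype)
        positive = y_true * d_squared                                       # pull similar pairs together
        negative = (1.0 - y_true) * tf.maximum(margin ** 2 - d_squared, 0)  # push dissimilar pairs apart
        return tf.reduce_mean(positive + negative)
    return loss

# d_squared would typically be produced by a Lambda layer over the twin embeddings:
# tf.reduce_sum(tf.square(fa - fb), axis=1, keepdims=True)
```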
3.2 Principle The idea is to minimize the difference between the colorized image and its original by using the Siamese network. A Siamese network can measure the similarity between two images in terms of a similarity score. This score can be fed back to the colorization network to improve its performance
by producing outputs that are similar to the ground truth, i.e., outputs that a Siamese network would classify as similar. The colorization model acts as the generator, and the Siamese network is used in place of the discriminator. As training progresses, the similarity score that is fed back acts as a loss function, and when its value is optimized, the model output and the ground truth look almost alike.
3.3 Dataset To solve the problem of colorization, network models trained on colored images are required. Any RGB image dataset can be used for training. The dataset used here, to train the models is CIFAR-10. The CIFAR-10 dataset consists of 60,000 color images of resolution 32 × 32 belonging to 10 classes, with 6000 images in each class. There are 10,000 test images and 50,000 training images. The test batch contains 1000 randomly selected images from each of the classes. The training batches contain the remaining images, i.e., 5000 images from each class distributed unevenly among the batches.
3.4 Method The basic logic behind the deep learning-based techniques for colorization is explained below. These models perceive RGB images in terms of the ‘CIELAB’ color space (also known as CIE L*a*b* or sometimes abbreviated as simply ‘Lab’ color space) which consists of three components, namely the ‘L’ component, which is the luminance/brightness component and the a, b components which are the chrominance components. To be more precise, ‘a’ is the green–red component, and ‘b’ is the blue–yellow component. The lab color space is designed in such a way that it can approximate human vision. It achieves perceptual uniformity, and its L component closely matches human perception of lightness. Deep learning models are trained with color images decomposed into the lab space. The ‘L’ component is the input feature, and a, b are the labels. The network is trained to predict the chrominance components from the ‘L’ channel of the input gray image. The predicted a, b values combined with the ‘L’ value gives the colorized output. This output which is in the lab format is converted back to RGB.
3.5 Training Data Preparation Colorizer: The colorization network was trained alone first with images from the CIFAR-10 dataset. The steps of pre-processing of the training data are as follows.
Since the images are in RGB format, the pixel values are first divided by 255 for normalization. The training input is the extracted L channel of the color training image, and the training labels are the extracted ab values. The ab values are divided by 128, since the Lab space ranges from −128 to +128; this division gives a bounded range between −1 and +1. The class labels of CIFAR-10 are not used here. Similarity Learning: The Siamese network was also trained on CIFAR-10 to give similarity scores between 0 and 1. The training data were prepared by pairing images of the same class to form positive pairs and images of two different classes to form negative pairs; positive pairs are labeled '1' and negative pairs '0'. For this training, a contrastive loss function was used, which is a combination of the L2 norm and the hinge loss. After the Siamese network is trained, it is used to find the difference between the output of the colorizer and the ground truth. This value is fed back and used as the loss function to optimize the colorization process. As the Siamese loss is minimized, the distance between the inputs decreases, which implies that the output gets closer to the ground truth.
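A hedged sketch of this data preparation is shown below, using scikit-image for the RGB-to-Lab conversion; the function and variable names are illustrative only, not the authors' code.

```python
import numpy as np
from skimage.color import rgb2lab
from tensorflow.keras.datasets import cifar10

(x_train, _), _ = cifar10.load_data()        # CIFAR-10 class labels are not used

def to_lab_pairs(rgb_batch: np.ndarray):
    """RGB uint8 images -> (L input, ab/128 labels), as described above."""
    lab = rgb2lab(rgb_batch / 255.0)          # normalise, then convert to CIELAB
    L = lab[..., :1]                          # luminance channel: network input
    ab = lab[..., 1:] / 128.0                 # chrominance scaled to roughly [-1, 1]
    return L, ab

X_L, Y_ab = to_lab_pairs(x_train[:1000].astype('float64'))
```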
3.6 Training Once the Siamese network was trained and ready to give similarity scores for two input images, the model weights were saved. The colorizer model is combined with it to obtain the proposed model. For this, a custom loss function was written that loads the weights of the saved Siamese model and feeds it the colorizer output and the ground truth in each iteration; this value is the loss to be optimized in the next iteration. Instead of an error function as the loss, we use another device (the Siamese network) to calculate the error in terms of a similarity score. This is the novelty introduced into colorization through this work. The model was trained for 1000 epochs with a decaying learning rate using the Adam optimizer.
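The sketch below shows one way such a custom Keras loss could be wired up. The saved-model path, the assumption that the scorer takes a pair of tensors and returns a similarity in [0, 1], and the use of 1 − similarity as the loss are all hypothetical details, not the authors' exact implementation.

```python
import tensorflow as tf

siamese = tf.keras.models.load_model('siamese_scorer.h5')  # hypothetical saved-model path
siamese.trainable = False                                   # frozen: only the colorizer learns

def siamese_loss(y_true_ab, y_pred_ab):
    """Use the frozen Siamese scorer as the training signal for the colorizer."""
    score = siamese([y_true_ab, y_pred_ab])   # similarity in [0, 1] (assumed interface)
    return tf.reduce_mean(1.0 - score)        # dissimilarity to be minimized

# colorizer.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss=siamese_loss)
```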
3.7 Prediction After training the network, we make a prediction with a test grey image. The output is a value between −1 and +1, so it is multiplied by 128 to obtain the color values in the ab spectrum. To convert this into a color image in RGB format, a black canvas with three layers of zeros is first created. Then, the L component is added to it directly from the input test image. Finally, the predicted output, upscaled to the ab spectrum, is added to obtain the complete colorized image in Lab format, which can be converted back to RGB and displayed.
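A minimal sketch of this reconstruction step, assuming scikit-image for the Lab-to-RGB conversion, is given below.

```python
import numpy as np
from skimage.color import lab2rgb

def reconstruct_rgb(L_channel: np.ndarray, predicted_ab: np.ndarray) -> np.ndarray:
    """Combine the input L channel with predicted ab values (scaled to [-1, 1])."""
    canvas = np.zeros(L_channel.shape[:2] + (3,))   # black Lab canvas of zeros
    canvas[..., 0] = L_channel[..., 0]              # copy luminance from the test image
    canvas[..., 1:] = predicted_ab * 128.0          # upscale ab back to the Lab range
    return lab2rgb(canvas)                          # RGB image ready for display
```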
4 Results—Colorized Outputs The logic for achieving colorization was written in Python and run on Google Colab using the TensorFlow framework. Functions available in the Keras library as well as Matplotlib were used to plot and visualize the outputs. A few examples of the colorized output images are given in Fig. 2. Fig. 2 Ground truth and output colorized image using the proposed network, given in pairs; the PSNR with respect to the ground truth is given below each output image.
Per-image PSNR values reported in Fig. 2: (b) 23.21 dB, (d) 23.65 dB, (f) 21.26 dB, (h) 18.21 dB, (j) 21.19 dB, (l) 27.65 dB, (n) 16.94 dB, (p) 26.71 dB, (r) 20.57 dB, (t) 19.96 dB, (v) 28.21 dB, (x) 20.21 dB
Fig. 2 (continued) Per-image PSNR values: (z) 25.1 dB, 22.55 dB, 27.56 dB, 24.03 dB, 20.23 dB, 29.32 dB, 28.12 dB, 23.53 dB
Figure 3 gives the outputs of the colorization network using the MSE loss and the Siamese loss against the ground truth for comparison. In some cases, there is a clear advantage for the Siamese loss, while for some colors MSE seems better; with the Siamese loss the saturation is a bit on the higher side, whereas with MSE there is a tendency to approximate a sepia shade. This is in alignment with what many authors have reported [2, 8]. PSNR values are not a reliable metric for this problem [1, 12]; PSNR, RMSE, and similar metrics give biased results in this case. Hence, for colorization, perceptual comparison is more convincing than quantitative comparison [1]. However, both are given in the results. The results obtained using the proposed model are shown alongside a few previous works using the same dataset; refer to Fig. 4 for the comparison. Visual comparison is more reliable than quantitative evaluation, as already reported in many papers on colorization [1, 2, 8, 12].
Fig. 3 (From left) Ground truth, output colorized image using the proposed method, and output colorized image using MSE. The PSNR values in comparison with the ground truth are given below each of the outputs
Per-image PSNR values reported in Fig. 3: (b) 19.72 dB, (c) 20.34 dB, (e) 26.42 dB, (f) 21.50 dB, (h) 23.53 dB, (i) 21.65 dB, (k) 29.32 dB, (l) 28.34 dB, (n) 17.8 dB, (o) 20.6 dB, (q) 20.23 dB, (r) 23.18 dB
Fig. 4 Colorization results with CIFAR-10: (from left) grayscale, ground truth, colorized with U-Net [2], colorized with GAN [2], colorized using proposed network
5 Conclusion The colorization problem is complex since it has no unique solution and cannot be generalized. Recent learning-based colorization techniques automatically colorize a grayscale image using various neural network architectures; this is faster and more efficient but leaves no control over feature selection. This work is aimed at achieving colorization using deep learning-based networks, with a Siamese neural network providing the loss function. The results obtained by the proposed technique
are on par with, and in some cases better than, those obtained using MSE. The results could probably improve if we tweak the network models used or make use of a dataset of high-resolution images. Transfer learning approaches that involve state-of-the-art models trained on huge datasets could also prove useful, and the possibilities of utilizing the features learned by these trained models for colorization can also be explored. Acknowledgments The authors would like to thank all those who were involved in this work, directly and indirectly.
References 1. Cheng Z, Yang Q, Sheng B (2017) Colorization using neural network ensemble. IEEE Trans Image Process 26(11):5491–5505 2. Nazeri K, Ng E, Ebrahimi M (2018) Image colorization using generative adversarial networks. In: International conference on articulated motion and deformable objects. Springer, Cham, pp 85–94 3. Zhang B, He M, Liao J, Sander PV, Yuan L, Bermak A, Chen D (2019) Deep exemplar-based video colorization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8052–8061 4. Levin A, Lischinski D, Weiss Y (2004) Colorization using optimization. In: ACM SIGGRAPH 2004 papers, pp 689–694 5. Huang Y-C, Tung Y-S, Chen J-C, Wang S-W, Wu J-L (2005) An adaptive edge detection based colorization algorithm and its applications. In: Proceedings of the 13th annual ACM international conference on multimedia, pp 351–354 6. Gupta RK, Chia AY-S, Rajan D, Ng ES, Zhiyong H (2012) Image colorization using similar images. In: Proceedings of the 20th ACM international conference on multimedia, pp 369–378 7. Chia AY-S, Zhuo S, Gupta RK, Tai Y-W, Cho S-Y, Tan P, Lin S (2011) Semantic colorization with internet images. ACM Trans Graph (TOG) 30(6):1–8 8. Zhang R, Isola P, Efros AA (2016) Colorful image colorization. In: European conference on computer vision. Springer, Cham, pp 649–666 9. Pan Z, Bao X, Zhang Y, Wang B, An Q, Lei B (2019) Siamese network based metric learning for SAR target classification. In: IGARSS 2019–2019 IEEE international geoscience and remote sensing symposium. IEEE, pp 1342–1345 10. Vizváry L, Sopiak D, Oravec M, Bukovčiková Z (2019) Image quality detection using the Siamese convolutional neural network. In: 2019 International symposium ELMAR. IEEE, pp 109–112 11. Kertész G (2021) Different triplet sampling techniques for lossless triplet loss on metric similarity learning. In: 2021 IEEE 19th World symposium on applied machine intelligence and informatics (SAMI). IEEE, pp 000449–000454 12. Larsson G, Maire M, Shakhnarovich G (2016) Learning representations for automatic colorization. In: European conference on computer vision. Springer, Cham, pp 577–593
Chapter 24
Modelling and Visualisation of Traffic Accidents in Botswana Using Data Mining Ofaletse Mphale and V. Lakshmi Narasimhan
1 Introduction Traffic accidents are a major cause of deaths and injuries worldwide, with a huge impact on economic development and societal health. Studies estimate that over 1.2 million persons die and around 20–50 million are injured annually worldwide due to RTAs [1]. Even though Africa has recorded the lowest death rates, an increase of 4.1% in traffic accident deaths has been registered in Botswana since 2015 [2]. This has made traffic accident mortality the second-largest contributor to deaths in Botswana, after deaths resulting from HIV/AIDS. Traffic accidents are triggered by various factors that occur collectively or individually in association, resulting in discernible patterns [3]; to improve road safety, there is a need to understand these patterns and their relationships in every accident event. Gaborone City, the capital city of Botswana, contributes a larger share of traffic accident mortalities, while Botswana has been ranked 32nd in mortality due to RTAs in Africa [4]. Insights on traffic accidents in Botswana are gathered from the Botswana Police Service (BPS) and traffic safety agencies. These are heterogeneous datasets which are usually messy and of limited use for strategic decision making. In data science, data mining (DM) and knowledge discovery in databases (KDD) are popular approaches applied in traffic accident analysis to discover hidden links in large amounts of data [5, 6]. This entails the use of algorithms such as K-means clustering, association rules, random forest, k-nearest neighbours and decision trees, which are efficient in extracting latent knowledge. This study uses the frequent pattern growth (FP-Growth) algorithm to mine frequent item sets of traffic accidents. This is followed

O. Mphale · V. Lakshmi Narasimhan (B) Department of Computer Science, University of Botswana, Gaborone, Botswana O. Mphale e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_24
by an association rule mining algorithm to model relationships between co-occurrences of different traffic accident factors in unique accident events. The study results are foreseen to be beneficial to traffic accident monitoring institutions, where they could enhance understanding of the main critical factors triggering RTAs in Botswana. Furthermore, this study supports management decisions to identify and implement accident prevention mechanisms to overcome the burdens incurred due to RTAs. The rest of the paper is organised as follows: Sect. 2 presents the literature review, where several data mining and modelling approaches related to the study are presented. The methodology employed in this paper is presented in Sect. 3. Section 4 details the study results obtained, while the conclusions and future works are discussed in Sect. 5.
2 Literature Review Over the past decades, DM and KDD approaches have been widely used in the literature for traffic accident analysis and knowledge detection. Market basket analysis is a well-known approach in which the association rule mining algorithm is applied to mine repeated patterns in customer shopping behaviour in large retail stores [5]. Retailers use this approach to gain insight into the relations between items that customers tend to purchase together frequently; this enables them to know which sets of goods to put on sale together at appropriate prices, which may not necessarily be minimum prices. Association rule mining has also been adopted to extract hidden links between different traffic accident variables and the location of the accident [6]. The Apriori and predictive Apriori algorithms have been critically compared for effectiveness in mining interesting traffic accident rules; the study found that Apriori outperformed predictive Apriori at mining useful associations between different traffic accident variables [7]. Pego [8] analysed historical traffic accident data from Gaborone, Botswana, and identified several relationships between traffic accident factors and their causes. Moreover, her study showed that temporal and spatial factors, such as the time of the day, could have a significant effect on the occurrence of road accidents. In another study, an econometric model was employed to determine the main causes of RTAs and fatalities in Botswana [9]; its findings showed that travelling during the night-time increases the chances of road accidents, and that the expansion of road infrastructure has no relationship with the occurrence of road accidents in Botswana. The main factors contributing to RTAs have been outlined as speeding, drink driving and increased traffic congestion during rush hours in urban areas [10]; that study also recommends that regular use of public transport, setting targets to reduce RTAs and establishing a Road Traffic Council could reduce the number of RTAs in Botswana. The various factors contributing to road accidents in Botswana have also been categorised into three main classes: vehicles, road users and the road
system [11]. It is established that RTAs are caused by road users through speeding, unlicensed driving, using cell phones whilst driving, and alcohol and drug abuse. Factors relating to the vehicle class include mechanically faulty vehicles, unmaintained vehicles, old vehicles and tyre blowouts, while factors relating to the road system include potholes, stray livestock and road design attributes. Furthermore, the study recommends that RTAs in Botswana can be further minimised if strict penalties are imposed on livestock owners who leave stock straying on highways and if public roads are maintained regularly. A hybrid clustering technique and classification methods were employed to assess prediction accuracy using vehicle crash severity data [12]; the hybrid clustering technique based on a genetic algorithm (GA) produced more accurate predictions than the other classification algorithms, was simple to implement and provided more comprehensive feedback. A multi-criteria analysis model has been tested on various attributes of traffic accident data in Morocco [13]. Decision-tree methods, such as random forest, random forest for big data and gradient boosting classification, have been explored with various measures to evaluate their prediction error on traffic accident factors in the UK [14]; random forest gave a prediction error of 18.24%, followed by random forest for big data with an error of 14.63% and gradient boosted classification with an error of 14.47%. From the literature survey, it is evident that traffic accidents are a global concern and that the methods used by scholars for analysing traffic accidents have become complex. The associations between some traffic accident factors identified by scholars appear contradictory and inconclusive [15–17]. Furthermore, there is currently no work that investigates commonalities between co-occurrences of traffic accident factors in the context of Botswana using the association rule mining algorithm.
3 Methodology 3.1 Traffic Accident Dataset Description The dataset used in this study was acquired from the BPS Gaborone main branch and consists of 15,989 traffic accident cases registered over the period September 2019 to February 2020. The raw data are mostly categorical, with more than 35 attributes. Since not every column in the raw data was necessary for the analysis, some columns were dropped; the selected attributes are presented in Table 1. To ensure that the data were appropriate for the data mining tasks, further data pre-processing steps were employed, and these are discussed in Sect. 3.2.
Table 1 Outline of the major attributes of traffic accidents

| Attribute class | Name of attribute | Description |
| Accident attributes | Accident severity | Fatal, Serious, Minor, Damage only |
| | Time | Morning, Afternoon, Evening, Night |
| | Day of the week | Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday |
| | Traffic density | Dense, Light, Very light, Negligible |
| | Road surface type | Gravel, Tar, Sand, Other |
| | Light conditions | Daylight, Dark, Night street lit, Night bright moon |
| | Weather conditions | Fine, Rain, Hail, Mist, Wind |
| | Junction type | Not junction, Crossroads, T-junction, Y-junction, Roundabout, Level crossing, Staggered junction, Other |
| | Road curvature | Straight, Bend open, Bend blind |
| | Police station | Police station name |
| | Collision type | Animal domestic, Animal wild, Head-on, Nose to tail, Obstacle on road, Obstacle off-road, Pedestrian, Rollover, Side collision, Other |
| | District | District name |
| Driver attributes | Driver age | Young, Teen, Youth, Adult, Senior |
| | Driver gender | Male, Female |
| | Alcohol/drug suspect | Yes, No |
| | Driver profession | Doctor/Nurse, Engineer, Legal, Financial, Government, Police, Army, Other professional, Labourer, Student, Unemployed, Other |
| | Driver licence | Valid, Invalid |
| | Vehicle type | Animal drawn, Bicycle, Bus, Car, Light duty, Lorry, Lorry & trailer, Motorcycle, Mini bus, Taxi, Wheel drive, Pickup, Other |
| | Vehicle ownership | Bus company, Company, Government, Private, Other |
| Victim attributes | Victim age | Young, Teen, Youth, Adult, Senior |
| | Victim gender | Male, Female |
| | Pedestrian manoeuvre | Crossing the road, Playing on road, On edge of road, On footpath, Walking on road, Other |
| | Number of casualties | Percentage number of casualties |
3.2 Data Pre-processing Data pre-processing steps are necessary in data mining to ensure that the data are converted into an appropriate format for knowledge detection. They comprise key processes such as replacing missing values, outlier detection and removal of duplicate values in datasets [18, 19]. Because the raw data were messy and contained many missing values, RapidMiner Studio [20] was used for initial data cleaning and transformation. Columns with empty spaces or missing values were replaced with "0"; however, in some cases records were removed, e.g. those with ages less than 11 or greater than 80. Duplicate example sets were also pruned from the dataset. This led to about 80% of the original dataset being considered for further data mining. Data transformation techniques were applied to some attributes to improve the data analysis and readability. For example, the Time and Age attributes were set as follows: • Time: Morning (04:00 am–11:59 am), Afternoon (12:00 pm–15:59 pm), Evening (16:00 pm–21:59 pm), Night (22:00 pm–03:59 am). • Age: Young (0–12 years), Teen (13–19 years), Youth (20–36 years), Adult (37–65 years), Senior (65 years+). A traffic accident key sheet and a traffic safety professional were consulted during the data pre-processing and transformation processes to ensure the validity of the measures. The major attributes considered for the data analysis process are shown in Table 1.
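The study performed these transformations in RapidMiner Studio; a minimal pandas sketch of the equivalent binning is given below. The column names 'hour' and 'age' are assumptions made purely for illustration.

```python
import pandas as pd

def add_time_and_age_bins(df: pd.DataFrame) -> pd.DataFrame:
    """Equivalent of the RapidMiner transformations: bin hour-of-day and age."""
    df = df[(df['age'] >= 11) & (df['age'] <= 80)].drop_duplicates()   # as in the paper
    time_bins = [0, 4, 12, 16, 22, 24]
    time_labels = ['Night', 'Morning', 'Afternoon', 'Evening', 'Night']
    df['Time'] = pd.cut(df['hour'], bins=time_bins, labels=time_labels,
                        right=False, ordered=False)
    age_bins = [0, 13, 20, 37, 66, 120]
    age_labels = ['Young', 'Teen', 'Youth', 'Adult', 'Senior']
    df['Age'] = pd.cut(df['age'], bins=age_bins, labels=age_labels, right=False)
    return df
```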
3.3 Association Rule Mining Algorithm The association rule mining algorithm was introduced in Ref. [21] for examining transactions in transactional databases. Since then, association rule mining has been applied in disciplines such as road accident analysis to uncover latent relationships in data. An association rule is mathematically defined as [15] A → B, where A, B ⊂ I, A ≠ ∅, B ≠ ∅ and A ∩ B = ∅, in which
• A is the antecedent of the rule,
• B is the consequent, and
• I is the collection of item sets.
The strength of an association rule is given by the interestingness measures of support, confidence and lift. Interestingness is often subjective; a lift value greater than 1 signifies a positive association of the rule. Some of the essential metrics used in measuring the strength of association rules are given in Eqs. 1–3:
\text{Support}(X) = \frac{\text{No. of transactions containing } X}{\text{Total no. of transactions}}   (1)

The confidence of an association rule is defined as

\text{Confidence}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}   (2)

The confidence of the rule is the conditional probability P(Y|X), where X and Y are item sets. The minimum confidence is user supplied, and confidence determines how strongly the antecedent X implies the consequent Y. The lift of a rule (the ratio of the confidence of the rule to its expected confidence, a measure of the importance of the rule) is mathematically defined as

\text{Lift}(X \rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X) \cdot \text{support}(Y)}   (3)
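A small, self-contained Python illustration of Eqs. (1)–(3) on toy accident transactions (the attribute=value items are invented for the example) is shown below.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent (Eqs. 1-3)."""
    n = len(transactions)
    def support(items):
        return sum(items <= t for t in transactions) / n   # fraction of transactions containing items
    s_a, s_c = support(antecedent), support(consequent)
    s_ac = support(antecedent | consequent)
    return s_ac, s_ac / s_a, s_ac / (s_a * s_c)             # support, confidence, lift

transactions = [
    {'Severity=Minor', 'Time=Morning', 'Weather=Fine'},
    {'Severity=Minor', 'Time=Morning', 'Weather=Rain'},
    {'Severity=Fatal', 'Time=Night', 'Weather=Fine'},
    {'Severity=Minor', 'Time=Afternoon', 'Weather=Fine'},
]
print(rule_metrics(transactions, {'Time=Morning'}, {'Severity=Minor'}))
# support 0.5, confidence 1.0, lift ~1.33
```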
FP-Growth Algorithm. The FP-Growth algorithm is an improvement over the Apriori algorithm. It uses FP-tree data structures and follows a recursive divide-and-conquer approach to mine frequent item sets. FP-Growth is fast and scalable compared with Apriori, and it generates association rules based on distinct measures of lift, leverage and conviction by specifying minimum thresholds [16]. The FP-Growth algorithm can be executed in the following steps:
i. Scan the data to determine the support count of each item. Infrequent items are discarded, while the frequent items are sorted in decreasing order of support count.
ii. Make a second pass over the data to construct the FP tree. As the transactions are read, and before they are processed, their items are sorted in the order obtained in step i.
The pseudo-code of the FP-Growth algorithm is given in Fig. 1.
Fig. 1 Pseudo-code of FP-Growth algorithm. Source Reference [17]

4 Results and Discussions 4.1 Visualisation of Traffic Accident Data The significance of data visualisation is that it transforms data into a visual context, such as graphs or maps, for easier interpretation so that strategic management decisions can be made easily [18]. In the data analysis, RapidMiner Studio in conjunction with Tableau Public software was used to visualise and model the different factors triggering traffic accidents in Botswana. Traffic accident factors were modelled based on the attribute classes, i.e. accident attributes, driver attributes and victim attributes, respectively. The total percentage number of casualties involved was used to evaluate the degree of influence of the different factors during the analysis process. Sections 4.1.1–4.1.3 provide analysis and visualisations of the traffic accident factors based on these attribute classes.
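The study produced these visualisations in RapidMiner and Tableau Public; a hedged pandas/Matplotlib sketch of how a "percentage of total casualties per category" chart can be computed is given below. The tiny DataFrame and its column names are invented for illustration and follow the attribute names of Table 1.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_share_of_casualties(df: pd.DataFrame, attribute: str):
    """Percentage of total casualties per category of one attribute (cf. Figs. 2-14)."""
    share = (df.groupby(attribute)['Number of casualties'].sum()
               .div(df['Number of casualties'].sum())
               .mul(100)
               .sort_values(ascending=False))
    share.plot(kind='bar', ylabel='% number of casualties', title=attribute)
    plt.tight_layout()
    plt.show()
    return share

df = pd.DataFrame({
    'Accident severity': ['Minor', 'Minor', 'Serious', 'Fatal'],
    'Number of casualties': [3, 2, 2, 1],
})
plot_share_of_casualties(df, 'Accident severity')
```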
4.1.1 Distribution of Traffic Accidents Based on Accident Attribute Class
Figure 2 shows that most traffic accidents in Botswana occur in the Kweneng, Central and South East districts, while the fewest accidents occur in the Chobe, Kgalagadi and Kgatleng districts. The results further show that the Mogoditshane, Gaborone, Mahalapye and Serowe police stations registered the highest numbers of traffic accidents in Botswana. Figure 3 shows that 69% of the traffic accidents that occur in Botswana are minor accidents, followed by 22% serious and 9% fatal accidents, respectively. Figure 4 shows that most traffic accidents occurring in Botswana are due to collisions with pedestrians and rollovers; few traffic accident incidents involve collision with an obstacle on the road, such as dividers, highway barriers, road signage and road tracers. Figure 5 shows that 51% of traffic accidents in Botswana occur in the morning, 24% in the afternoon, 14% in the evening and 11% at night. Based on these results, it can be concluded that the most accident-prone times are the morning and afternoon, while the safest time is at night, which involves the fewest traffic accidents. Figure 6 shows that 14.7% of traffic accidents occur on Thursday, 17.1% on Friday
Fig. 2 Accident distribution by Police stations (y-axis: % number of casualties)

Fig. 3 Accident distribution by Severity (fatal 9%, minor 69%, serious 22%)

Fig. 4 Accident distribution by Collision type
Fig. 5 Accident distribution by Time (morning 51%, afternoon 24%, evening 14%, night 11%)

Fig. 6 Accident distribution by Day of the week (% of total number of casualties: 22.4%, 17.1%, 14.7%, 14.0%, 13.6%, 10.1%, 8.0%)
and 22.4% on Saturday. Based on these results, Thursday, Friday and Saturday are the most accident-prone days in Botswana. The results further show that 10% of the accidents occur on Tuesday and 8% on Wednesday; these are the safest days. These results are also typical worldwide: for example, in the USA, Britain and Iran, traffic accidents occur mostly on Fridays and Saturdays, while the safest days of the week, with the least number of traffic accidents, are Tuesday and Wednesday [19, 22]. However, these findings contradict findings from the city of Qormi in Malta, which show that Sunday is the safest day of the week and Tuesday is one of the most dangerous days of the week [23]. Figure 7 shows that 69% of traffic accidents occur in rural locations compared with urban locations in Botswana; 39.74% of the accidents in rural locations occur in daylight and 23.12% in dark light conditions, respectively, while about 1.38% of the RTAs in Botswana occur in bright moonlight at night. Figure 8 shows that about 80% of traffic accidents in Botswana occur on tar roads where the road curvature is straight, less than 15% of the RTAs occur on gravel or sandy road surfaces, and around 3% occur on blind-bend road curvature. Figure 9 shows that 96% of RTAs in Botswana occur in fine weather conditions and 39% of the accidents occur where the traffic density is light; about 3.56% of the accidents occur in rainy weather when the traffic density is light.
Fig. 7 Accident distribution by Location type and Light condition
Fig. 8 Accident distribution by Road curvature and Road surface type
4.1.2
Distribution of Traffic Accidents Based on Driver Attribute Class
Figure 10 shows that around 75% of the drivers involved in RTAs are males. Furthermore, about 97% of the drivers involved in traffic accidents were under the influence of alcohol or drugs, yet held valid driving licences. This is very common worldwide; as the police saying goes, “Men are drunken drivers, women are bad drivers”.
Fig. 9 Accident distribution by Weather and Traffic density
Fig. 10 Accident distribution by Driver sex, Alcohol suspect and Licence validity
Figure 11 shows that around 61% of traffic accidents in Botswana involve youths and adults who drive cars, while around 17% of RTAs involve pickup vehicles. On fewer occasions, about 3% of RTAs involve seniors who are motorcyclists. Figure 12 shows that around 55% of drivers involved in traffic accidents belong
Fig. 11 Accident distribution by Vehicle type and Driver age
Fig. 12 Accident distribution by Vehicle ownership and Driver profession
to the “other” profession category, and 37% of them own their private vehicles. The second-highest category is the unemployed at 9%, of whom around 4% own their private vehicles. Police officers are the least involved in traffic accidents in Botswana.
4.1.3
Distribution of Traffic Accidents Based on Victim Attribute Class
Figure 13 shows that around 32% of victims of traffic accidents in Botswana are males classified as youth and around 27% are classified as adults. In most cases, around 30% of victims were crossing the road, while less than 5% were playing on the road or on the footpath. Figure 14 shows that 65% of RTA victims in Botswana are males and are not
Fig. 13 Accident distribution by Victim age, Sex and Pedestrian manoeuvre
Fig. 14 Accident distribution by School and Victim sex
Fig. 15 Modelling of association rules
school pupils. However, in some cases, 4% of female victims are pupils on their journey to or from school.
4.2 Modelling Association Rules

RapidMiner Studio was used to generate the association rules, applying the FP-Growth and Create Association Rules operators sequentially. Figure 15 presents the steps involved in generating the association rules. As shown in Fig. 15, the FP-Growth algorithm was run in multiple cycles to accurately determine the frequent item sets. The parameters of the FP-Growth operator and the Create Association Rules operator were tuned to generate the most interesting rules: the minimum support was set to 0.20, the confidence level was set to 0.80, and the resulting lift values were observed. The algorithm generated a large number of rules, but only useful rules were selected, namely those with a lift value greater than 1. According to [24], association rules with a lift value of less than 1 are negatively associated and are therefore less useful for knowledge discovery. Brief discussions of some of the generated rules are given in Sects. 4.2.1–4.2.3.
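The same mining pipeline can be approximated outside RapidMiner. The sketch below is an illustrative Python example using the open-source mlxtend library rather than the tool used in the paper; the accident-attribute items shown are hypothetical placeholders, while the thresholds mirror the support, confidence and lift settings described above.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

# Each transaction is the set of attribute=value items recorded for one accident
# (hypothetical example rows).
transactions = [
    ["DRIVER SEX=Male", "ALCOHOL/DRUGS=Yes", "ROAD CURVATURE=Straight", "ROAD SURFACE TYPE=Tar"],
    ["DRIVING LICENCE VALID=valid", "WEATHER=Fine", "ROAD CURVATURE=Straight", "ROAD SURFACE TYPE=Tar"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent item sets at the minimum support used in the paper (0.20)
frequent = fpgrowth(onehot, min_support=0.20, use_colnames=True)

# Rules at confidence >= 0.80; keep only positively associated rules (lift > 1)
rules = association_rules(frequent, metric="confidence", min_threshold=0.80)
useful = rules[rules["lift"] > 1].sort_values("lift", ascending=False)
print(useful[["antecedents", "consequents", "support", "confidence", "lift"]])
```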
4.2.1
Generated Association Rules
See Table 2.

Table 2 Generated association rules

Rule 1: {DRIVING LICENCE VALID = "valid" AND ROAD CURVATURE = "Straight" AND JUNCTION TYPE = "Not Junction"} => {ROAD SURFACE TYPE = "Tar"}
Description: In an accident event where the driver's licence is valid, the road is straight and there is no junction, there is a high probability that the accident occurred on a tar road.

Rule 2: {DRIVER SEX = "Male" AND JUNCTION TYPE = "Not Junction"} => {ALCOHOL/DRUGS = "Yes" AND ROAD CURVATURE = "Straight"}
Description: If a male driver is involved in an accident and the accident occurs where there is no junction, then chances are the driver also tested positive for alcohol/drugs and the accident occurred on a straight road.

Rule 3: {DRIVING LICENCE VALID = "valid" AND WEATHER = "Fine" AND ROAD CURVATURE = "Straight" AND DRIVER SEX = "Male"} => {ALCOHOL/DRUGS = "Yes" AND ROAD SURFACE TYPE = "Tar"}
Description: If an accident occurs where the driver's licence was valid, the weather was fine, the road was straight and the driver was male, then there is a high chance that the driver tested positive for alcohol/drugs and that the accident occurred on a tar road.

Rule 4: {ALCOHOL/DRUGS = "Yes" AND VICTIM SEX = "Male"} => {PEDESTRIAN MANOEUVRE = "crossing the road"}
Description: If a traffic accident occurs where the driver tested positive for alcohol or drugs and the victim is male, then there is a good chance that the pedestrian was crossing the road when the accident occurred.

Rule 5: {ACCIDENT SEVERITY = "Minor"} => {WEATHER = "Fine" AND ROAD CURVATURE = "Straight"}
Description: If an accident is of minor severity, then it is likely that it occurred in fine weather on a straight road with no curves.

Rule 6: {DRIVING LICENCE VALID = "valid" AND VEHICLE OWNERSHIP = "Private"} => {WEATHER = "Fine" AND ROAD CURVATURE = "Straight"}
Description: In an accident event where the driver's licence is valid and the driver owns a private car, there is a high chance the accident occurred in fine weather and on a straight road.

Rule 7: {ROAD SURFACE TYPE = "Tar" AND PEDESTRIAN MANOEUVRE = "crossing the road" AND VICTIM AGE = "Youth"} => {ROAD CURVATURE = "Straight"}
Description: In an accident event where the road surface is tar, the pedestrian was crossing the road and the victim is identified as a youth, there is a strong possibility that the accident occurred on a straight road.
4.2.2
Summary of the Rules
The dataset reveals strong links between different traffic accident factors and their co-occurrences across traffic accidents in Botswana. In essence, the major influential factors showing strong, positive correlations with accidents relate to the driver, the victim, accident severity, the weather and the road attributes. Furthermore, the association rules show that the most strongly correlated factors across traffic accident cases in Botswana are driver sex, driver alcohol/drug abuse, driver licence validity, vehicle ownership, victim age, pedestrian manoeuvre, road surface type, road curvature, weather and accident severity.
4.2.3
Ranking of the Most Influential Factors from the Association Rules
The most influential factors identified from the association rules were ranked in descending order of the percentage of casualties involved. Figure 16 presents the resulting ranking of the factors accounting for RTAs in Botswana. From Fig. 16, the most influential factors triggering road accidents in Botswana are, in order: alcohol/drugs, driver licence validity, driver age, weather, road curvature, road surface type, pedestrian manoeuvre, driver sex, junction type, accident severity, vehicle ownership, victim sex and victim age. The top factors, namely alcohol/drugs, driving licence validity, driver age, weather, road curvature and road surface type, involve 14.94%, 12.30%, 9.82%, 7.84%, 7.24% and 6.87% of casualties, respectively. The least
Fig. 16 Ranking of the most influential factors of RTAs
Fig. 17 Ranking of the high influential factors by Attribute Class
influential factors accounting for RTAs in Botswana are vehicle ownership, victim sex and victim age, which involve 5.65%, 5.60% and 3.98% of casualties, respectively. Figure 17 shows that the driver class contributes the largest share of RTAs in Botswana, followed by the accident class and finally the victim class, involving 44%, 43% and 11% of casualties, respectively. From these results, it can be concluded that most of the factors triggering road accidents relate to the driver and the road attributes, whereas factors relating to the victim have the lowest degree of influence on traffic accidents in Botswana.
5 Recommendations

Based on the analysis, we propose the following recommendations for accident prevention and/or minimisation in Botswana:
• Deployment of reflectors at accident-prone areas. Reflectors are very cheap when bought in bulk, and the strategic use of reflective paint would also reduce accidents significantly.
• Regular road maintenance in Botswana is somewhat poor compared to other countries, and the car registration fee is very low. We therefore recommend that the fee be raised and that a substantial share of it be allocated to regular, periodic road maintenance.
• Driver education is a must, particularly for young drivers. We therefore recommend a system of provisional licences, such as P1 and P2 licences, whereby fresh licence holders must remain accident-free for the first two years, besides following stricter road rules and specific restrictions. Such a system has been adopted successfully in Australia [25] to reduce accidents caused by teenage and newly licensed drivers.
• Introducing mandatory rest stops for long-distance drivers.
Fig. 18 Satellite view of Botswana at night. Source [26]
• Introducing segregated road flooring at accident-prone areas.
• Introducing camera monitoring at accident-prone locations.
• Conducting drug and alcohol testing more frequently on Fridays and Saturdays.
• Specially posted “Reduced speed” signs in accident-prone areas.
• Use of solar-powered light-emitting diodes (LEDs) at accident-prone areas, e.g. bends and highway kerb stones. It is noted that Botswana as a whole is not well lit at night-time (see the satellite view of Botswana at night in Fig. 18).
It is interesting to note that China, France and Denmark have lit their roads, including highways, using solar-powered LEDs [27, 28]; parts of their roads are actually solar panels. This has not only reduced the number of RTAs but also significantly improved driver satisfaction, and the project cost–benefit analyses showed that in the medium term it is worth every penny spent.
6 Conclusion and Future Works

The study presented in this paper applied association rule mining to uncover underlying relationships between the different factors causing road accidents in Botswana, using the FP-Growth algorithm for frequent pattern mining of item sets. The findings show that most traffic accidents in Botswana occur in the Kweneng, Central and South East districts. Of these accidents, 69% are of minor severity, 22% are serious and 9% are fatal. Furthermore, the results show that 51% of accidents occur in the morning, followed by 24% in the afternoon, and that the safest time from RTAs in Botswana is at night.
Findings also show that Thursday, Friday and Saturday are the most accident-prone days, whereas Tuesday and Wednesday are the safest; these findings are also typical of other countries such as the USA, UK and Iran. The drivers involved in accidents are mostly young male drivers who are often under the influence of alcohol or drugs. Furthermore, the results show that drivers involved in traffic accidents hold valid driver's licences and often collide with pedestrians crossing the road in daylight. When ranking the most influential traffic accident factors, our findings show that the top factors to observe are alcohol or drug abuse, driver licence validity, weather, road curvature and road surface type; these relate to the driver and the accident classes. Furthermore, there are several strong relationships between traffic accident factors such as the driver, victim, weather, accident severity and road condition, which show positive correlations between their co-occurrences in individual traffic accident events in Botswana. For future work, the study intends to acquire raw data spanning at least five years in order to improve the quality of the findings, and to investigate other data mining methods, such as clustering techniques and predictive models, to discover further insights into the interactions of different traffic accident factors in Botswana. A list of recommendations has also been provided in order to enhance safe driving and road safety in Botswana.
References
1. World Bank Group (2017) The high toll of traffic injuries: unacceptable and preventable. The World Bank, Washington DC
2. Statistics Botswana (2018) Transport and infrastructure statistics report, September 2018. [Online]. Available: https://www.statsbots.org.bw/sites/default/files/publications/Botswana%20Transport%20and%20%20Infrastructure%20Statistics%20Report%202018.pdf. Accessed: 12 December 2019
3. Martín et al. (2014) Using data mining techniques to road safety improvement in Spanish roads. In: XI Congreso de Ingenieria del Transporte, vol 160, pp 607–614
4. Atlas Magazine (2018) World Health Organization report—"Road safety in 2017". Atlas Magazine—Insurance news around the world. [Online]. Available: https://www.atlas-mag.net/en/article/road-safety-in-2017. Accessed: 18 June 2019
5. Susan L (2017) A gentle introduction on market basket analysis—association rules, 25 September 2017. [Online]. Available: https://towardsdatascience.com/a-gentle-introduction-on-market-basket-analysis-association-rules-fa4b986a40ce. Accessed: 18 June 2019
6. Kumar S, Toshniwal D (2016) A data mining approach to characterize road accident locations. J Mod Transp 24:62–72. https://doi.org/10.1007/s40534-016-0095-5
7. Amira A, El Tayeb VP, Abdelaziz A (2015) Applying association rules mining algorithms for traffic accidents in Dubai. Int J Soft Comput Eng (IJSCE) 5(4). ISSN: 2231-2307
8. Pego M (2009) Analysis of traffic accident in Gaborone, Botswana. Master of Arts Dissertation, University of Stellenbosch. http://scholar.sun.ac.za/handle/10019.1/2395
9. Mphela T (2020) Causes of road accidents in Botswana: an econometric model. J Transp Supply Chain Manage 14:a509. https://doi.org/10.4102/jtscm.v14i0.509
10. Mupimpila C (2008) Aspects of road safety in Botswana. Development Southern Africa, Taylor & Francis Online, 25(4), pp 425–435. https://doi.org/10.1080/03768350802318506
11. Munuhwa S, Govere E, Samuel S, Chiwira O (2020) Managing road traffic accidents using a systems approach: case of Botswana—Empirical review. J Econ Sustain Dev 11(10). ISSN: 2222-1700 (Paper), ISSN 2222-2855 (Online)
12. Hasheminejad SHA, Zahedi M, Hasheminejad SMH (2018) A hybrid clustering and classification approach for predicting crash injury severity on rural roads. Int J Injury Control Safety Promotion 25(1):85–101. https://doi.org/10.1080/17457300.2017.1341933
13. Addi A, Tarik A, Fatima G (2016) An approach based on association rules mining to improve road safety in Morocco. In: International conference on information technology for organizations development (IT4OD), Fez, Morocco
14. Babic F, Zuskacova K (2016) Descriptive and predictive mining on road accident data. In: International symposium on applied machine intelligence and informatics
15. Rao S, Gupta R (2012) Implementing improved algorithm over Apriori data mining association rule algorithm. Int J Comput Sci Technol 3(1):489–493
16. Han J, Kamber M (2006) Data mining concepts and techniques. Elsevier, San Francisco
17. Purba JT, Hery H, Putra CP (2018) Usage ICT application for bundling products: strategic digital marketing in facing the 4.0 technology. In: The 1st International conference on computer science and engineering technology, Universitas Muria Kudus
18. Heitzman A (2019) What is data visualization and why is it important? Search Engine Journal, 29 January 2019. [Online]. Available: https://www.searchenginejournal.com/what-is-data-visualization-why-important-seo/288127/
19. MailOnline (2017) MailOnline, 11 August 2017. [Online]. Available: https://www.dailymail.co.uk/news/article-4780444/Fridays-common-day-car-accident.html
20. RapidMiner (2019) Data preparation, August 2019. [Online]. Available: https://rapidminer.com/glossary/data-preparation/. Accessed: 23 June 2021
21. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data (SIGMOD ’93). New York, NY, USA, pp 207–216
22. Carrig D (2018) USA TODAY, 26 May 2018. [Online]. Available: https://www.usatoday.com/story/money/nation-now/2018/05/26/driving-car-crash-deaths-speeding/640781002/
23. Times Malta (2017) Traffic accidents shoot up, 10 March 2017. [Online]. Available: https://timesofmalta.com/articles/view/traffic-accidents-shoot-up.641996
24. Baesens B, vanden Broucke S (2017) What is the lift value in association rule mining? 10 April 2017. [Online]. Available: https://www.dataminingapps.com/2017/04/what-is-the-lift-value-in-association-rule-mining/. Accessed: 14 June 2019
25. myLicence (2021) Government of South Australia—Department for Infrastructure and Transport, 19 January 2021. [Online]. Available: https://mylicence.sa.gov.au/my-car-licence
26. Depositphotos (2021) Depositphotos, 12 February 2021. [Online]. Available: https://depositphotos.com/184829792/stock-photo-satellite-view-of-botswana-at.html
27. Industrytoday (2020) Solar street lights vs traditional street lights, 8 May 2020. [Online]. Available: https://industrytoday.com/solar-street-lights-vs-traditional-street-lights/
28. Phoenix Energy (2018) Phoenix Energy Blog, 19 July 2018. [Online]. Available: https://www.phoenixenergygroup.com/blog/development-of-solar-powered-highways
Chapter 25
EPM: Meta-learning Method for Remote Sensing Image Classification Shiva Pundir and Jerry Allan Akshay
1 Introduction

Remote sensing images are an important data source that can help us observe and measure structures on the surface of the Earth. The volume of remote sensing images is growing at an exponential rate due to recent advancements in satellite technology, which gives an impetus to make full use of this increasing repository for intelligent Earth observation. Understanding collections of remote sensing images is therefore extremely important. Due to an increased interest in understanding and interpreting remote sensing images accurately and effectively, scene classification of remote sensing images has become an active research area: it is required in order to label remote sensing image data accurately with predefined semantic categories. In recent decades, extensive research has been done on remote sensing image scene classification, with various real-world applications such as geospatial object detection [1, 2], natural hazards detection [3, 4], environment monitoring [5], and urban planning [6, 7]. Good feature descriptions and representations are prerequisites for extracting semantic information from remote sensing image data, and feature engineering is an important step in remote sensing image classification. In recent years, due to the exponential increase in publicly available computing power, there has been a shift from crafting features manually to extracting features from the data samples via deep learning techniques, which are discussed in the next section of the paper. However, these networks are usually prone to failure when the available annotated data is limited, because they tend to overfit the input data. This consequently leads to the networks being unable to
generalize well. These networks also tend to learn skewed class priors and, hence, can be biased toward the dominant classes of the distribution; where heavy-tailed class distributions exist, they do not generalize well. In contrast, humans have the ability to quickly learn from a few examples by using previously learnt prior knowledge. Such a capacity for fast adaptation and data efficiency can greatly expand the utilization of machine learning in various applications. Hence, meta-learning techniques have emerged to facilitate learning from rather small amounts of annotated data: they allow the system to rapidly adapt to new environments and tasks with a small number of training examples. In this research paper, we test and explore meta-learning techniques such as prototypical networks [8] (a distance metric-based learner) and MAML [9] (a gradient-based method) to help identify and label remote sensing images. To summarize, the contributions made in this paper are as follows:
• We propose to use an ensemble of meta-learning methods to help label and classify remote sensing image datasets that have long-tailed class distributions with few available annotated data samples. We formulate the classification problem as an n-shot (1-shot, 5-shot, or 10-shot) classification problem and call the proposed network the EPM model (ensemble of prototypical network and MAML).
• We explore MAML, which is gradient-based, prototypical networks, which are metric-learning based, and the ensemble of both to classify remote sensing images in low-data scenarios.
• We also evaluate the EPM model on the publicly available remote sensing datasets AID [10], NWPU-RESISC45 [11], RSI-CB128 [12], PatternNet [13], and UC-Merced [14]. The results demonstrate that the EPM model outperforms prototypical networks and MAML working as stand-alone models.
The rest of this research paper is organized as follows: Sect. 2 presents the related work, Sect. 3 the proposed methodology, Sect. 4 the datasets and evaluation, Sect. 5 the experiments and results, and Sect. 6 the conclusion.
2 Related Work

In recent decades, there has been extensive research on remote sensing image scene classification, driven by its real-world applications. The rudimentary stages of scene classification were mainly based on handcrafted features, as demonstrated by Yang and Newsam [14], Bhagavathy and Manjunath [15], dos Santos and Penatti [16], Aptoula [17], and Penatti et al. [18]. These methods depended on the engineer's skill set and domain knowledge to design a variety of features and their combinations. Such handcrafted features carry vital information that can be used to classify scenes; well-known examples include SIFT [19], HOG [20], and many more.
In recent years, unsupervised feature learning has emerged as a viable alternative to handcrafted features: it can extract features automatically from unlabeled input images. There exist many unsupervised feature learning techniques, such as sparse coding, principal component analysis (PCA), auto-encoders, and K-means clustering. Several of these techniques can be combined, and models can be stacked to form deeper unsupervised models. These techniques have led to considerable progress in unsupervised feature learning for remote sensing image scene classification [21–25]. However, unsupervised feature learning alone is not enough to provide accurate results, since it lacks the semantic information provided by class labels; labeled data is still needed in order to develop better and more efficient supervised feature learning methods. When Salakhutdinov and Hinton [26] made a breakthrough in deep feature learning, researchers shifted to deep multi-layer neural networks to obtain deep feature representations. These methods comprise multiple processing layers which can learn powerful deep feature representations of data with multiple levels of abstraction [27]. At present, a variety of deep learning models exist, such as deep belief nets (DBNs) [28], convolutional neural networks (CNNs) [29, 30], and stacked auto-encoders (SAE) [31]. However, these models require a large volume of data and a large sample size in order to be trained accurately. Therefore, we apply the few shot learning paradigm to the remote sensing classification problem to tackle these limitations.
3 Proposed Methodology

3.1 Preliminaries

Few Shot Learning Paradigm
Few shot learning models are trained on a labeled dataset $D_{tr}$ and tested on $D_{te}$ with only a few labeled examples per class in the test set. The class sets are disjoint between $D_{tr}$ and $D_{te}$; few shot learning therefore operates on the data $D = \{D_{tr}, D_{te}\}$ together. Few shot approaches rely on an episodic training arrangement, simulated by drawing small samples from the large labeled set $D_{tr}$ during training, as shown in Fig. 1. In $K$-shot, $N$-way episodic training, each episode $e$ samples $N$ categories from $D_{tr}$ and then samples two sets of images from these categories: (i) the support set $S_e = \{(s_i, y_i)\}_{i=1}^{N \times K}$ containing $K$ examples for each of the $N$ categories, and (ii) the query set $Q_e = \{(q_j, y_j)\}_{j=1}^{|Q_e|}$ containing different examples from the same $N$ categories. As with any other model training in machine learning, the objective is to minimize the loss on the samples in $Q_e$ given $S_e$. This loss function can be written as the negative log-likelihood of the true class of each query sample under the model with parameters $\theta$:
Fig. 1 Few shot learning episodes (Here with K = 2 and N = 3)
$$\mathcal{L}(\theta) = \mathbb{E}_{(S_e, Q_e)}\left[ -\sum_{t=1}^{|Q_e|} \log p_\theta\big(y_t \mid q_t, S_e\big) \right] \qquad (1)$$
where, for episode $e$, the sampled query and support sets are $(q_t, y_t) \in Q_e$ and $S_e$, respectively.

Prototypical Networks
A prototypical network is a distance metric-based meta-learning technique which calculates the representation of each class as a prototype vector, namely the mean vector of the embedded support instances belonging to that class, as shown in Fig. 2. To prepare one training task, a subset of $N$ classes is selected randomly. For each training task, a support set $S = \{(x_1, y_1), \ldots, (x_u, y_u)\}$ and a query set $Q = \{(x_{u+1}, y_{u+1}), \ldots, (x_{u+v}, y_{u+v})\}$ are created by sampling examples from the selected classes, where $x_j$ are inputs and $y_j$ the corresponding labels; $u$ and $v$ denote the number of examples in the support and query set, respectively. The representation of an input $x$ is calculated in the embedding space with an embedding function $\phi$: $z = \phi(x, \theta)$ with parameters $\theta$. For a class $k$, the prototype vector $c_k$ is calculated as the mean of all the embedded example inputs $S_k$ of that class. Formally, $c_k$ is computed as follows:

$$c_k = \frac{1}{|S_k|} \sum_{(x_j, y_j) \in S_k} \phi(x_j, \theta) \qquad (2)$$
For a query sample $x$, the probability distribution of the predicted labels $y$ is calculated using a softmax over negative distances to the prototypes in the embedding space:

$$p(y = k \mid x) = \frac{\exp\!\big(-d(z, c_k)\big)}{\sum_{k'} \exp\!\big(-d(z, c_{k'})\big)} \qquad (3)$$
Fig. 2 Prototype ck calculated as the mean of embedded examples for each class from support set
where $z = \phi(x, \theta)$. Learning proceeds by minimizing the negative log-probability of the true label $k$, which can be formulated as follows:

$$J = -\log p(y = y_j \mid x_j, c_k) \qquad (4)$$

which is computed using (2) with the estimated prototypes.

Model Agnostic Meta-Learning
Model agnostic meta-learning (MAML) is a gradient-based meta-learning technique for few shot learning. The approach that MAML follows is to learn initial starting parameters of a network in such a way that it can quickly learn the optimized parameters pertaining to any few shot task, which can then yield accurate results, as demonstrated in Fig. 3. We define the base model to be a neural network $f_\theta$ with meta-parameters $\theta$. The aim is to learn initial parameters $\theta_0$ such that they can be adjusted to good task-specific parameters after only a few gradient descent steps on the data from a support set $S_b$. The parameters obtained after $N$ steps are denoted by $\theta_N$, and the steps are called the inner-loop update process. The updated parameters after $i$ steps on data from the support set of task $b$ can be expressed as follows:

$$\theta_i^b = \theta_{i-1}^b - \alpha \nabla_\theta \mathcal{L}_{S_b}\!\big(f_{\theta_{i-1}^b}\big) \qquad (5)$$

where $\alpha$ is the step size, $\theta_i^b$ are the parameters after $i$ steps toward task $b$, and $\mathcal{L}_{S_b}(f_{\theta_{i-1}^b})$ is the loss on the support set of task $b$ after $(i-1)$ update steps. It is worth noting that $\theta_i^b$ is invariant to permutation of the samples. The meta-objective function with task batch size $B$ can be denoted as follows:

$$\mathcal{L}_{\text{meta}}(\theta_0) = \sum_{b=1}^{B} \mathcal{L}_{T_b}\!\big(f_{\theta_N^b(\theta_0)}\big) \qquad (6)$$
Fig. 3 Optimizing initial θ such that MAML can quickly adapt to new tasks
The quality of the initial parameter $\theta_0$ is expressed as the sum of the losses across all tasks. This becomes a meta-objective function which is minimized to optimize the initial parameter value $\theta_0$; this optimization is called the outer-loop update process. It is this initial $\theta_0$ that intrinsically holds the knowledge of the different tasks. The resulting update for the meta-parameters $\theta_0$ can be expressed as follows:

$$\theta_0 \leftarrow \theta_0 - \beta \nabla_\theta \sum_{b=1}^{B} \mathcal{L}_{T_b}\!\big(f_{\theta_N^b(\theta_0)}\big) \qquad (7)$$

where $\beta$ is the step size and $\mathcal{L}_{T_b}$ denotes the loss on the target set for task $b$.
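To make Eqs. (2)–(4) concrete, the following is a minimal PyTorch-style sketch of the prototype computation and the distance-softmax classification; the embedding network `phi`, the tensor shapes, and the use of Euclidean distance are assumptions for illustration rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def prototypes(support_emb: torch.Tensor, support_lbl: torch.Tensor, n_way: int) -> torch.Tensor:
    # c_k: mean of the embedded support examples of class k (Eq. 2)
    return torch.stack([support_emb[support_lbl == k].mean(dim=0) for k in range(n_way)])

def log_p_y(query_emb: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    # Softmax over negative Euclidean distances to the prototypes (Eq. 3)
    dists = torch.cdist(query_emb, protos)        # shape [n_query, n_way]
    return F.log_softmax(-dists, dim=1)

# Episode loss (Eq. 4): negative log-probability of the true query labels, e.g.
# loss = F.nll_loss(log_p_y(phi(query_x), prototypes(phi(support_x), support_y, n_way)), query_y)
```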
3.2 EPM We present the remote sensing image classification problem as a few shot problem. This is mainly due to two reasons. Firstly, the remote sensing image dataset can be skewed, and this causes the model to perform poorly on classes with less number of training examples. Secondly, training a large complex CNN is time consuming, and CNNs in general are very data hungry. It is a well-known fact that ensemble methods tend to reduce the variance of estimators and subsequently improve the quality of prediction as demonstrated in [32]. In order to gain accuracy from averaging, various data augmentation techniques or randomizations are typically used to encourage a high diversity of predictions [33, 34]. While individual classifiers of the ensemble network may perform poorly, the quality of the average prediction surprisingly turns out to be high sometimes. Hence, we develop an ensemble model called EPM model. An overview of the ensemble model architecture is shown in Fig. 4. The model broadly consists of two meta-learning methods, namely distance metric-based prototypical network (PN) and gradient-based MAML. For each episode e, the image batch is input into both the models and the models output the class scores. Then, the weighted average of these class scores is calculated using w1 and w2 to predict the final label of each image in the query set.
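A minimal sketch of the score-combination step described above is given below. It assumes that both models output per-class scores for the query batch and uses the weights w1 and w2 reported later in Sect. 5; the function name and tensor shapes are illustrative.

```python
import torch

def epm_predict(pn_scores: torch.Tensor, maml_scores: torch.Tensor,
                w1: float = 0.58, w2: float = 0.42) -> torch.Tensor:
    """Weighted average of the class scores from the two meta-learners.

    pn_scores, maml_scores: [n_query, n_way] class probabilities from the
    prototypical network and the MAML head for one episode's query set.
    """
    combined = w1 * pn_scores + w2 * maml_scores   # weighted average of class scores
    return combined.argmax(dim=1)                  # predicted label for each query image
```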
Fig. 4 Overview architecture of the EPM
4 Dataset and Evaluation

We use five well-known datasets in the remote sensing image recognition field for a comprehensive result, each of which is briefly described below. Figure 5 shows a few examples from the datasets. AID is a dataset published by Wuhan University that consists of 30 scene classes. Each scene class contains roughly 220–420 images of 600 × 600 pixels; the dataset comprises 10,000 images, and the pixel resolution of each image ranges from 0.5 m to 8 m. NWPU-RESISC45 consists of 45 scene classes with 700 images per class, each of 256 × 256 pixels; the dataset comprises 31,500 images, and the pixel resolution of most images ranges from 0.2 m to 30 m. RSI-CB128 contains six categories with 45 subclasses and more than 36,000 images; the number of image samples per class varies but averages around 800, and the pixel resolution varies within 0.3–3 m. PatternNet is a high-resolution remote sensing image dataset assembled for RSIR, based on the project "TerraPattern", an open-source tool for discovering "patterns of interest" in unlabeled satellite imagery. It consists of thirty-eight classes with 800 images per class, each of 256 × 256 pixels, and a spatial resolution ranging from 0.062 m to 4.693 m. UC-Merced is a dataset published by the University of California Merced. It consists of twenty-one scene classes, each image has a fixed size of 256 × 256 pixels, the dataset contains 2100 images, and the pixel resolution of each image is 0.3 m.
5 Experiment and Results

We use the deep learning framework PyTorch [35] to conduct the experiments. To obtain comprehensive results, we run each experiment for 400 episodes and average over them. The results are tabulated in Table 1.
Fig. 5 Few examples collectively from the five datasets. Each row represents a dataset

Table 1 Average accuracy (%) on 400 test episodes. PN: prototypical networks, M: model agnostic meta-learning, EPM: ensemble of prototypical networks and MAML

Dataset used    | 1-shot: PN   M     EPM  | 5-shot: PN   M     EPM  | 10-shot: PN   M     EPM
AID             |         52.4 58.2  60.4 |         73.4 76.8  77.9 |          80.2 81.7  83.8
NWPU-RESISC45   |         49.0 57.3  62.7 |         75.2 76.3  78.3 |          80.5 80.8  83.5
RSI-CB128       |         50.2 56.9  61.5 |         74.4 77.0  78.2 |          79.9 80.2  83.7
PatternNet      |         51.4 60.2  61.9 |         74.0 76.8  79.3 |          80.7 79.9  84.1
UC-Merced       |         51.9 58.5  60.9 |         75.1 77.1  78.5 |          81.0 81.3  84.3
For prototypical networks, we use four convolution blocks connected sequentially to extract image features. Each block comprises a 64-filter 3 × 3 convolution layer followed by a batch normalization layer [36], a ReLU nonlinearity as the activation function, and finally a 2 × 2 max pooling layer; applied to images, this architecture produces a 64-channel output space. All of our CNN models are trained via stochastic gradient descent with the Adam optimizer. We use an initial learning rate of 10^-3 and then halve the learning rate every 1000 episodes. We trained prototypical networks in the five-way (1-shot, 5-shot, and 10-shot) scenarios, with training episodes containing ten query points per class. For MAML, we use the same neural network architecture with a fully connected linear layer head that predicts the probability distribution over the classes. Due to computational limits, we only use first-order MAML with a meta-batch size of four samples. Following the original paper, we keep the same hyper-parameters: we train the model using the Adam optimizer with cross-entropy loss, use an initial learning rate of 10^-3, halve the learning rate every ten episodes if the loss does not decrease, and keep the inner-loop learning rate at 0.01. Similar to prototypical networks, we provide results for five-way (1-shot, 5-shot, and 10-shot) classification. After experimentation, we found w1 = 0.58 and w2 = 0.42 to be useful values for the weighted average of the class scores. We also evaluated the effects of the distance metric and of the number of training samples used per test episode on the performance of prototypical networks and MAML; the results are shown as a bar graph in Fig. 6.
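For reference, a sketch of the described embedding network, as it might be written in PyTorch, is shown below; the input channel count and the final flattened dimension (which depends on the image resolution fed to the network) are assumptions.

```python
import torch.nn as nn

def conv_block(in_channels: int) -> nn.Sequential:
    # 64-filter 3x3 convolution -> batch normalization -> ReLU -> 2x2 max pooling
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

# Four blocks connected sequentially; the flattened output is the image embedding
# (its dimensionality depends on the input resolution).
embedding_net = nn.Sequential(
    conv_block(3), conv_block(64), conv_block(64), conv_block(64), nn.Flatten()
)
```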
Fig. 6 Few examples collectively from the five datasets. Each bar represents a dataset
6 Conclusion

We propose the EPM model as a more efficient and accurate classification model for remote sensing image classification, specifically for training classes with relatively small sample sizes. Through our tests and analyses on five different datasets, we confirm that few shot learning methods can be applied to the remote sensing image classification problem. We believe that further research in this direction can be propelled by incorporating newer state-of-the-art few shot learning algorithms, used either as stand-alone models or as ensembles of different models.
Acknowledgements We would like to thank our mentor Dr. P Deepa Shenoy for her guidance and Massimiliano Patacchiola for his helpful learning content. This work was supported by the University Visvesvaraya College of Engineering.
References 1. Li Y, Zhang Y, Huang X, Yuille AL (2018) Deep networks under scene-level supervision for multi-class geospatial object detection from remote sensing images. ISPRS J Photogramm Remote Sens 146:182–196 2. Li K, Wan G, Cheng G, Meng L, Han J (2020) Object detection in optical remote sensing images: a survey and a new benchmark. ISPRS J Photogramm Remote Sens 159:296–307 (Online). Available: https://www.sciencedirect.com/science/article/pii/S0924271619302825 3. Martha T, Kerle N, Westen C, Jetten V, Vinod Kumar K (2011) Segment optimization and datadriven thresholding for knowledge-based landslide detection by object-based image analysis. IEEE Trans Geosci Remote Sens 49:4928–4943 4. Cheng G, Guo L, Zhao T, Han J, Li H, Fang J (2013) Automatic landslide detection from remote-sensing imagery using a scene classification method based on bovw and plsa. Int J Remote Sens 34(1):45–59 5. Zhang T, Huang X (2018) Monitoring of urban impervious surfaces using time series of highresolution remote sensing images in rapidly urbanized areas: a case study of shenzhen. IEEE J Sel Top Appl Earth Observations Remote Sens 11(8):2692–2708 6. Longbotham N, Chaapel C, Bleiler L, Padwick C, Emery WJ, Pacifici F (2012) Very high resolution multiangle urban classification analysis. IEEE Trans Geosci Remote Sens 50(4):1155– 1170 7. Tayyebi A, Pijanowski BC, Tayyebi AH (2011) An urban growth boundary model using neural networks, gis and radial parameterization: an application to tehran, iran. Landsc Urban Plan 100(1):35–44 8. Snell J, Swersky K, Zemel RS (2017) Prototypical networks for few-shot learning. CoRR, vol abs/1703.05175 (Online). Available: http://arxiv.org/abs/1703.05175 9. Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. CoRR, vol abs/1703.03400. (Online). Available: http://arxiv.org/abs/1703.03400 10. Xia G, Hu J, Hu F, Shi B, Bai X, Zhong Y, Zhang L, Lu X (2017) Aid: A benchmark data set for performance evaluation of aerial scene classification. IEEE Trans Geosci Remote Sens 55(7):3965–3981 11. Cheng G, Han U, Lu X (2017) Remote sensing image scene classification: benchmark and state of the art. CoRR, vol abs/1703.00121 12. Li H, Tao C, Wu Z, Chen J, Gong J, Deng M (2017) RSI-CB: a large scale remote sensing image classification benchmark via crowdsource data. CoRR, vol abs/1705.10450
13. Zhou W, Newsam SD, Li C, hao Z (2017) Patternnet: a benchmark dataset for performance evaluation of remote sensing image retrieval. CoRR, vol abs/1706.03424 14. Yang Y, Newsam S (2010) Bag-of-visual-words and spatial extensions for land-use classification. In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic information systems, ser. GIS ’10. Association for Computing Machinery, New York, pp 270–279 15. Bhagavathy S, Manjunath BS (2006) Modeling and detection of geospatial objects using texture motifs. IEEE Trans Geosci Remote Sens 44(12):3706–3715 16. dos Santos J, Penatti O (2010) Torres R. Evaluating the potential of texture and color descriptors for remote sensing image retrieval and classification 2:203–208 17. Aptoula E (2014) Remote sensing image retrieval with global morphological texture descriptors. IEEE Trans Geosci Remote Sens 52(5):3023–3034 18. Penatti OAB, Nogueira K, dos Santos JA (2015) “Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? In: IEEE conference on computer vision and pattern recognition workshops (CVPRW), 44–51 19. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110 20. Dalal N (2005) Triggs B. Histograms of oriented gradients for human detection 1:886–893 21. Mekhalfi M, Melgani F, Bazi Y, Alajlan N (2015) Land-use classification with compressive sensing multifeature fusion. IEEE Geosci Remote Sens Lett 12:08 22. Cheriyadat AM (2014) Unsupervised feature learning for aerial scene classification. IEEE Trans Geosci Remote Sens 52(1):439–451 23. Zheng X, Sun X, Fu K, Wang H (2013) Automatic annotation of satellite images via multifeature joint sparse coding with spatial relation constraint. IEEE Geosci Remote Sens Lett 10:652–656 24. Xia G-S, Wang Z, Xiong C, Zhang L (2015) Accurate annotation of remote sensing images via spectral active clustering with little expert knowledge. Remote Sens 7:15014–15045 25. Li Y, Tao C, Tan Y, Shang K, Tian J (2015) Unsupervised multilayer feature learning for satellite image scene classification. IEEE Geosci Remote Sens Lett 13:11 26. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507 27. LeCun Y, Bengio Y (2015) Hinton G. Deep learning 521:436–44 28. Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554 29. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Proceedings of the 25th international conference on neural information processing systems - Volume 1, ser. NIPS’12. Curran Associates Inc., Red Hook, pp 1097–1105 30. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv 1409.1556 31. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408 32. Seni G, Elder J (2010) Ensemble methods in data mining: improving accuracy through combining predictions, vol 2 33. Mikoajczyk A, Grochowski M (2018) Data augmentation for improving deep learning in image classification problem, pp 117–122 34. Hu B, Lei C, Wang D, Zhang S, Chen Z (2019) A preliminary study on data augmentation of deep learning for image classification. CoRR, vol abs/1906.11887 35. 
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Kopf A, Yang E, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S (2019) Pytorch: an imperative style, high-performance deep learning library. In: Wallach H, Larochelle H, Beygelzimer A, Fox E, Garnett R (eds) Advances in neural information processing systems 32. Curran Associates, Inc., pp 8024–8035 36. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. CoRR, vol abs/1502.03167
Chapter 26
Collaborative Network Security for Virtual Machine in Cloud Computing for Multi-tenant Data Center Rajeev Kudari, Dasari Anantha Reddy, and Garigipati Rama Krishna
1 Introduction

In recent years, the computing world has undergone substantial change, moving from single applications to client–server architectures and, in turn, to distributed, service-oriented architectures. In Ref. [1], Pervez et al. note that the main target of all these transformations is to improve business process execution and to make software easily available to clients in an efficient way. In Ref. [2], Joseph et al. explain that cloud computing is an IT delivery model, central to the upcoming generation of network computing, that is capable of delivering both software and hardware resources and services over the Internet at lower IT cost and with reduced complexity. Companies such as Amazon, Google, IBM, Microsoft, Oracle, Salesforce and HP provide cloud solutions to their users in various forms. Cloud data centers consist of the infrastructure on which these Internet services are supported. In Ref. [3], NIST characterises cloud services from several aspects: IaaS, PaaS and SaaS are the most common service categories, and deployments fall into categories such as public cloud, private cloud and hybrid cloud. From the data perspective, such a system covers computing, networking and storage for data in use, archived data (not in use) and data in transmission. Most particularly, the networking characteristics of various cloud services (remote access) are considered, both inside a cloud and
among clouds. Network virtualization is provided by VMware NSX together with software-defined networking (SDN) within a data center. The Google B4 network [4] also uses an OpenFlow-based SDN [5, 6] to interconnect its cloud data centers across different sites. Because of the flexibility introduced by virtualization, the network boundary in SDN becomes dynamic: the static, fixed boundaries of the conventional network are replaced by dynamic, virtual, logical boundaries, which brings its own challenges. In Ref. [7], Whyman describes cloud computing as a technique that lets users store and process their data on the Web via the Internet, for which security is undoubtedly the main concern. The main reason companies hesitate to move to a cloud computing platform is that it is perceived as less safe and secure than a conventional network infrastructure [8]. For users to trust the cloud computing platform, it is therefore crucial to maintain security in every aspect of it.
2 Network Security in VMs

2.1 Virtual Machine Security

Virtualization technology began with IBM's development of the System/360 [10]. The main motivation for virtualization is to improve server utilisation by running virtual machines on top of an operating system or hypervisor. Virtualization has been a foundational technology of cloud computing for many years, and scalable Internet services have enabled cloud platforms such as Amazon EC2/S3 [11] to allocate virtual machines dynamically. Virtual machine technology is now mainstream in the IT industry, and consequently the security of virtual machines has become a major concern.
2.2 Virtual Network

Isolation is one of the major issues involved in virtualization: it ensures that one VM cannot affect other VMs running on the same host. A virtual network creates independent, isolated networks within a shared physical network. Several current hypervisors (e.g., Xen, VMware) offer virtual network mechanisms that give virtual machines access to physical networks. In this paper, the Xen hypervisor is taken as the instance for describing the working model of a virtual network.
2.3 Vulnerabilities in Virtual Network

Virtual networks have a remarkable effect on the interconnectivity of VMs, and securing them is one of the biggest challenges when designing platforms such as cloud computing. Isolation is strongest when each link provided to a hosted VM uses a dedicated physical channel. However, as already discussed for the configuration modes of the virtual network in Xen, most hypervisors (e.g., VMware, VirtualBox) provide a virtual networking platform that links VMs through bridging or routing. In these modes, efficient inter-VM communication is required when VMs run on the same host, and this weakens isolation.
2.4 Network Security Issues of Data Center Networks

Security issues of data center networks require SDN security, which is a leading topic for both academia and industry. FRESCO [12, 13] introduces a framework for developing security applications and addresses several important issues in implementing security service elements on demand. The open standards of OpenFlow allow simple and convenient design of complex network security applications and their integration into large networks, although applications built this way are not always efficient; FRESCO builds on OpenFlow with a particular focus on the security aspects. FRESCO exposes a scripting language through its application programming interface (API), in which the programmer writes the basic processing units of its modular security-monitoring library. A FRESCO module allows customization of stream processing rules to provide an effective response to detected network threats.
2.5 VMware NSX Network Virtualization

Data center networks increasingly rely on network virtualization platforms such as VMware NSX [14]. VMware NSX provides network management and network security for the software-defined data center, integrating SDN and network functions virtualization (NFV). The model is designed to make it easy to establish multi-layer logical networks without the complex configuration of physical devices, improving the efficiency and flexibility of network deployment. It reproduces network functions from layers 2 to 7 in software and provides a software-based security model, decoupling them from the underlying hardware networks so that the existing network infrastructure can be used efficiently.
2.6 Network Security and NSX

NSX integrates security functions offered by third-party vendors. To simplify the combination of third-party security products and services, authorised vendors use a single, specific API to link their security platforms [15] with the network platform. Tools such as NSX's Service Composer support the deployment of third-party firewalls, as well as malware detection, vulnerability management, data loss protection, intrusion detection and intrusion prevention platforms as a whole.
3 A Novel Virtual Network Model vCNSMS

In this section, vCNSMS, a prototype system for collaborative network security management of multi-tenant data center networks, is developed, and its integration with the virtual network environment of the cloud is explained. The vCNSMS prototype is a home-brewed multi-function gateway built on an open-source platform.
3.1 The DCN Collective Network Security Principle 3.1.1
Network Topology at Basic Level
Peer-UTMs and a Security Center are deployed in the data center network. During the bootstrap stage, each peer-UTM registers itself with the Security Center, which receives the registration information and keeps track of the registered UTMs. Information such as the vCNSMS configuration and the registration procedure is exchanged during the bootstrap stage.
3.2 Deep Security Check in vCNSMS 3.2.1
Function Settings
The security rules are maintained in the Security Center, which gathers information such as feedback from rule deployment and stores this data in the security logs.
Security rules of a high standard can be downloaded from the Security Center; outdated rules are excluded from the new rule set, which makes the network
more reliable. The peer-UTMs add extra rules to the package and publish them. New rules are downloaded from the Security Center and inserted into the firewall modules so that the correlated functions are enabled. The Security Center also collects the security events reported back from the network. The filtration module contains the UDP rules, under which particular categories of data packets are blocked. Appendix A gives the format of a UDP rule and the matching content used in this module.
This prototype system consists of Protocol Control module and it mainly enforces UDP protocol rules. (1)
(2)
Rules are updated regularly in the UTM; in particular, the firewall rules are fetched repeatedly, including the system configuration and the content inspection rules for UDP. A blacklist is prepared for content filtering based on source, port and URL, and UDP packets are filtered against these rules. Packets containing the same kind of signature are grouped together; for instance, DNS request packets containing the specified text string "abc" are matched together.
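As a rough illustration of this kind of UDP content filtering, the sketch below matches packets against blacklist entries keyed by source, destination port and payload signature; the rule fields are hypothetical and do not reproduce the actual rule format given in Appendix A.

```python
# Hypothetical blacklist entries: source address, destination port and payload signature.
BLACKLIST_RULES = [
    {"src": "*", "dport": 53, "signature": b"abc"},   # e.g. block DNS requests containing "abc"
]

def udp_verdict(src_ip: str, dport: int, payload: bytes) -> str:
    for rule in BLACKLIST_RULES:
        if (rule["src"] in ("*", src_ip)
                and rule["dport"] in (0, dport)
                and rule["signature"] in payload):
            return "DROP"        # packet matches a blacklist rule
    return "FORWARD"             # no rule matched; forward the packet
```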
4 Results

4.1 vCNSMS Intelligent Flow Processing

An advanced method, intelligent flow processing, is used for detecting intrusions. Intelligent flow processing in vCNSMS is based on the smart packet verdict scheme, and a protection policy for intelligent flow processing based on the security level is suggested in vCNSMS.
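The paper does not spell out the concrete policy table, but the idea of a security-level-driven packet verdict can be sketched as follows; the thresholds and action names are assumptions for illustration only.

```python
def smart_packet_verdict(flow_risk_score: float, security_level: int) -> str:
    """Map a flow's risk score to an action; higher security levels are stricter."""
    threshold = {1: 0.9, 2: 0.7, 3: 0.5}.get(security_level, 0.5)
    if flow_risk_score < threshold / 2:
        return "FAST_FORWARD"    # low-risk flow skips deep inspection
    if flow_risk_score < threshold:
        return "DEEP_INSPECT"    # flow is sent through the UTM deep security check
    return "DROP"                # high-risk flow is blocked
```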
4.2 Shared Network Layer

In the shared network layer, it is not a simple task to stop information from being shared among virtual machines on a shared network. Our presumption is that virtual machines belonging to the same shared virtual network trust one another, so the same network sharing unit should be allocated to virtual machines that work for the same company or industry. To raise security to a higher level, this model defines a specific set of subnets in advance (i.e., 10.232.193.0,
10.232.194.0, 128.128.10.0), and the administrator assigns a unique subnet to each network sharing area. Logical switches, logical routers, NSX APIs, firewalls, load balancers and logical VPNs are the core elements of VMware NSX, which provides network virtualization. This approach enables data center operators to treat the physical network as a pool of capacity that can be consumed on demand. Figure 1 shows the overall NSX structure and how the spoofing process plays out in a virtualized environment: the feedback information received from each virtual machine is recorded in a routing table consisting of the port, MAC address and IP address. In this paper we analyse this approach for multi-tenant data centers; Fig. 1 illustrates how network virtualization is achieved with NSX, how spoofing is carried out in the virtual environment, and the cloud management platform with its core components (logical firewalls, load balancers, logical VPNs, logical switches and routers), whose binding information is recorded.
Fig. 1 Spoofing in virtual network platform of VMware NSX
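The routing-table binding described above suggests a simple anti-spoofing check, sketched below under assumed table contents: a packet whose source MAC or IP does not match the binding recorded for its virtual port is treated as spoofed.

```python
# Assumed binding table: virtual port -> the MAC and IP recorded for the attached VM.
BINDINGS = {
    "vport-12": {"mac": "52:54:00:ab:cd:01", "ip": "10.232.193.15"},
}

def is_spoofed(vport: str, src_mac: str, src_ip: str) -> bool:
    expected = BINDINGS.get(vport)
    if expected is None:
        return True              # unknown port: treat the packet as spoofed
    return src_mac != expected["mac"] or src_ip != expected["ip"]
```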
Fig. 2 Throughput rate of smart packet verdict scheme in vCNSMS
help of following diagram for representation of Spoofing in Virtual Network platform of VMware NSX. Vendors of Network security provide the scalability in NSX. A framework like distributed services is used by NSX platform which allows multiple number of hosts for integrating this network service, and it makes insertion of easy into new modules of service by using NSX API. Variety of interfaces in logical devices are offered by NSX but at the same time it requires cooperation among the vendors for achieving precise mapping from logical to physical connection while dealing huge varieties of hardware devices. Data center for multiple tenants addresses network security in vCNSMS and vCNSMS is represented with a centralized collective scheme. Further vCNSMS also combines a smart packet verdict for inspecting the packet flow for protecting data center network from the threats that can be occurred. The following graph in Fig. 2 depicts the throughput of rate of smart packet verdict scheme in vCNSMS is shown below as follows. Above graph depicts and explains about the pure forwarding of smart packets in vCNSMS throughput rate in effective way and it gives accurate results compared with other process in cloud computing of virtual machine is depicted in the graph. Basing on operational deployment the gained experience is used for evaluation of data center network on vCNSMS and this security center is capable of establishing additional rules of security and events on data network is also collected. There may be possibility of detecting violations made in network policy and artificial intelligence technique identifies intrusion into a network zone basing on learning procedures which are unsupervised. It is an optimized area that has high scope of inventions in future.
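The anti-spoofing role of the recorded (port, MAC address, IP address) table can be illustrated with the following hypothetical sketch; the binding-table contents and the checking function are assumptions for illustration, not the actual NSX mechanism.

# Detect IP/MAC spoofing by checking a packet's source against the
# (port, MAC, IP) bindings recorded from virtual-machine feedback.
bindings = {
    "vnic-7": {"mac": "00:50:56:aa:bb:cc", "ip": "10.232.193.21"},
    "vnic-8": {"mac": "00:50:56:aa:bb:cd", "ip": "10.232.194.30"},
}

def is_spoofed(port: str, src_mac: str, src_ip: str) -> bool:
    entry = bindings.get(port)
    if entry is None:
        return True  # unknown port: treat as suspicious
    return entry["mac"] != src_mac or entry["ip"] != src_ip

print(is_spoofed("vnic-7", "00:50:56:aa:bb:cc", "10.232.194.99"))  # True -> spoofed IP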
5 Conclusion

In this paper we proposed vCNSMS, a collaborative network-security approach for data centers used by multiple tenants, realized as a centralized collaborative scheme. A smart packet verdict scheme for packet inspection is further integrated into vCNSMS to defend against attacks expected inside the data-center network. vCNSMS is deployed over an SDN-based virtualized network to obtain a flexible and scalable protection system for multi-tenant servers under varying network policies and user security requirements. The smart packet verdict scheme can be investigated further and improved with parallelization. Additional focus is placed on virtual-network security, a leading technology in cloud platforms. A novel framework for virtual networks is developed to maintain the security of inter-communication between virtual machines. Based on the analysis of Xen, virtual machines are deployed on physical machines. The key functions of this model are firewalls, specified subnets and a routing table, which work efficiently in preventing attacks on virtual machines such as sniffing and spoofing.
Chapter 27
Binary Classification of Toxic Comments on Imbalanced Datasets Using Recurrent Neural Networks Abhinav Saxena, Ayush Mittal, and Raghav Verma
1 Introduction

Toxicity in online forums and communities is a major issue that needs to be addressed to ensure a friendly and positive environment for the users. Racist, sexist and hateful comments often found in online forums discourage people from being part of the online community, and many stop participating in discussions altogether. Cyberbullying is also a huge problem faced by a large number of children registered on social media websites and it can severely affect their psyche. Flagging comments as toxic and non-toxic is a binary classification problem. Here, the features are the contents of the comment itself and the label is one of two classes—"Toxic" or "Non-toxic". Since context and relative positions of words play an important role in the overall meaning and sentiment of a sentence, trivial methods such as a simple keyword search will not work well. Also, traditional methods such as bag-of-words and TF-IDF representations of text are not sufficient to achieve good accuracy. Apart from feature extraction, the machine learning model to be trained must also incorporate some level of "context" and "memory" from the surrounding words. This project aims to identify toxic comments using word embeddings for feature extraction and a deep neural network for classification. The dataset which we use to train the model is Google/Jigsaw's Wikipedia comments dataset, which contains comments on Wikipedia talk pages labelled into seven classes—"Clean", "Toxic", "Obscene", "Insult", "Identity Hate", "Severe Toxic" and "Threat". This dataset has highly imbalanced classes, so we have used augmentation (backtranslation) on the minority class ("Toxic") and under-sampling on the majority class ("Non-toxic") to balance the classes. Moreover, due to the very large proportion of data in the "Clean" class (89.84%), we decided to create a binary classifier and reduce these
classes into two—“Toxic” and “Non-toxic”. The “Toxic” class includes all classes apart from “Clean”. Finally, we have used a subset of this dataset, and a Twitter hate comments dataset [1] for testing. The F1 score is used as a metric of how well our model performs. We have used the standard text preprocessing practices commonly used in text classification problems. Feature extraction includes representing words in the form of vectors in order to associate meaning to each word and also capture relations between them. We are using word embeddings for this purpose. After feature extraction, we train a deep neural network having a single recurrent layer on these features and perform a comparative study to find out which architecture gives the best results. We have compared four models each having a combination of recurrent and dense layers, along with dropout layers to help prevent overfitting. We have compared the validation accuracy of LSTM, GRU, Bi-LSTM and Bi-GRU to find out which of these is best for this application.
2 Motivation

In this digital age, a large part of an individual's social needs is fulfilled by the Internet. Social media platforms like Facebook, Twitter, Reddit, etc. have a huge number of users and it is only increasing with time. Online platforms and forums are becoming a place where one can share their opinions and discuss various topics with other people. Hence, these websites want to create a welcoming environment for every one of their users. However, some users aim to create a negative environment by posting racist, sexist and hateful comments. As more and more children are getting access to the Internet at a young age, this is an alarming problem. Therefore, there needs to be some form of moderation to what people post online. Countless comments are posted on social media websites every day and some fraction of them will inevitably be toxic and hateful. Human volunteers and users can identify and flag these comments but it is not possible to manually identify every comment in this way. We aim to build a machine learning model for binary classification of comments to automate this task of identifying toxic comments. This will help create a welcoming and positive environment on the Internet.
3 Related Works

The basic components for any kind of text processing are feature extraction and training a model. Word embeddings have been extensively used for feature extraction. For training, logistic regression, SVM, RNN and CNN have been used. RNNs are widely used in the field of text processing since an RNN can process a sequence of inputs while retaining information about the previous input. This is especially helpful when
we are dealing with sentences since each subsequent word is dependent on the words that occur before it. Yin et al. [2] combined TF-IDF with sentiment/contextual features. They compared the performance of this model with a simple TF-IDF model and reported a 6% increase in F1 score. This suggests that just a simple TF-IDF approach is not well suited for feature extraction for this particular problem. Zaheri et al. [3] found that LSTM gave 20% more true positives when compared to a naïve Bayes approach. Nguyen and Nguyen [4] proposed a model for sentiment label distribution that involved using a convolutional neural network combined with a bidirectional-LSTM. This model attained an accuracy of 86.63% on the Stanford Twitter Sentiment Corpus. Beniwal and Maurya [5] proposed a hybrid deep learning approach using convolutional and bidirectional-GRU layers along with conventional dense layers and achieved an F1 score of 0.79 on Wikipedia Comments dataset. This suggests that RNN architectures would give good results for this classification problem. Yu and Wang [6] proposed a word vector refinement model that could be applied to pre-trained word vectors (e.g., Word2vec and Glove) to enhance sentiment information capture. The word embeddings from the refined model improved Word2Vec and GloVe by 1.7 and 1.5% averaged over all classifiers for binary classification, respectively. This suggests that a considerable amount of the model’s accuracy is dependent on feature extraction and word embeddings used. Chu and Jue [7] compared the performance of various deep learning approaches to this problem, specifically using both word and character embeddings. They assessed the performance of recurrent neural networks with LSTM and word embeddings, a CNN with word embeddings, and a CNN with character embeddings. This also reinforces the idea of using recurrent layers in the architecture. Khieu and Narwal [8] compared SVM, multilevel perceptron, CNN and LSTM in an attempt to find which model is best suited for binary and multi-label classification. They also pointed out the class imbalance in Google Jigsaw dataset. Since the imbalance is so severe, it must be handled. They addressed this issue by using random subsampling from the non-toxic class for binary classification. They achieved the best F1 score of 0.88 using a three-layer LSTM architecture for binary classification. Aken et al. [9] also used random subsampling for handling class imbalance and compared the performance of CNN, LSTM, GRU and logistic regression and also created an ensemble to get better accuracy. They also tested their model on Twitter hate comment dataset to find how their model performed on an entirely new dataset that it has not been trained on. The best F1 score they achieved was 0.791 on Wikipedia comment dataset, and 0.793 on Twitter comment dataset. In our approach, we have used resampling and data augmentation using back translation to handle the class imbalance. We have also used a combination of recurrent and dense neural network layers and we have achieved better F1 scores in both the Wikipedia comments dataset and Twitter hate comments dataset than the previous works on this domain.
4 Proposed Approach

We aim to create a model for binary classification of toxic comments which also performs well on datasets it has not been trained on. Apart from this, we compare the performance of LSTM, GRU, Bi-LSTM and Bi-GRU (bidirectional) to find out which of these gives the best results for this classification problem. We train the model using Google/Jigsaw's Wikipedia comments dataset and test its performance on a subset of this dataset and on the Twitter hate comments dataset. We have used the F1 score as the metric of our model's performance on these two datasets.
4.1 Handling Class Imbalance

The first thing we notice about the dataset is that there is a large imbalance among the classes. The majority of comments are "Clean" (more than 80%). Moreover, since we aim to create a model that performs well for binary classification, we need to combine the other six classes—"Toxic", "Obscene", "Insult", "Identity Hate", "Severe Toxic" and "Threat"—into one. Clearly, even after combining the toxic classes together, there is still a large class imbalance. To handle this, we perform data augmentation on the "Toxic" class to increase its size and random under-sampling on the "Non-toxic" class to decrease its size (Table 1). Data augmentation is used whenever there is a class imbalance in the dataset and one needs to increase the size of a class. Conventionally, data augmentation for audio and visual data is done by adding noise, applying transformations, etc. The most important consideration for textual data augmentation is that the overall sentiment of the text remains the same while the words that are used change. One way to augment textual data is to use a thesaurus to replace words with their synonyms. Another way is to translate the text into some intermediate language and then translate it back to the original one. This is called backtranslation, and we have used it for data augmentation, with French as the intermediate language; a sketch of this step is given after Table 1. The next step is to select a random sub-sample from the non-toxic data points. Finally, we have 32,450 data points in each class, for a total of 64,900. We use a 60–20–20 split for training, testing and validating. For testing, we also use non-toxic data points that were not selected during under-sampling.

Table 1 Class distribution
                    Toxic              Non-toxic
Original            16,225 (10.16%)    143,346 (89.84%)
After resampling    32,450 (50.0%)     32,450 (50.0%)
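The balancing step can be sketched as follows. This is an illustrative outline only: translate stands in for whichever machine-translation backend is used for back-translation, and the column and label names are assumptions rather than the authors' code.

import pandas as pd

def back_translate(text: str, translate) -> str:
    """English -> French -> English; the wording changes while the sentiment is preserved."""
    return translate(translate(text, src="en", dest="fr"), src="fr", dest="en")

def balance(df: pd.DataFrame, translate, target_per_class=32450, seed=42):
    toxic = df[df["label"] == 1]
    clean = df[df["label"] == 0]
    # Augment the minority class with back-translated copies until it reaches the target size.
    needed = target_per_class - len(toxic)
    extra = toxic.sample(n=needed, replace=True, random_state=seed).copy()
    extra["comment_text"] = extra["comment_text"].apply(lambda t: back_translate(t, translate))
    toxic = pd.concat([toxic, extra])
    # Randomly under-sample the majority class.
    clean = clean.sample(n=target_per_class, random_state=seed)
    return pd.concat([toxic, clean]).sample(frac=1.0, random_state=seed)  # shuffle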
4.2 Preprocessing and Feature Extraction

The first step in any kind of text classification is preprocessing. When dealing with text data, preprocessing often affects the accuracy the most. First, we clean the text by removing all characters that are not alphabets or spaces. Then we make all the characters lowercase, remove stop words (the, is, on, etc.), as they do not affect the sentiment of the sentence, and finally perform stemming. Stemming involves reducing a word to its root form. For example, "going" and "gone" will both be reduced to "go". The accuracy of the GRU model improved by 1.1% when we used stemming (Fig. 1). After cleaning the text, we need to represent it in the form of a feature vector. Various pre-trained word embedding models like GloVe, Word2Vec, etc. are available for this. These word embeddings capture relationships between words and are hence very useful in tasks such as this one. Simple approaches like bag of words and TF-IDF give worse results in sentiment analysis. We have used the GloVe word embedding model [10] with 300 dimensions for our feature vector. We also tried using a different number of dimensions (50, 100 and 200), but 300-dimensional feature vectors gave the best results. Therefore, for every word in a sentence, we have a feature vector of size 300 which will be fed into the neural network. Finally, we create the embedding layer, which will be the first layer in the model that we build. Note that this particular layer is not made trainable since we are using a pre-trained model and do not want to update the pre-defined weights. Now, each comment is in the form of a vector ready to be fed into the deep neural network.
Fig. 1 Pre-processing and feature extraction process
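A minimal sketch of these cleaning steps is given below, assuming NLTK's English stop-word list and the Porter stemmer; the exact tools are not stated in the text and are our assumptions.

import re
from nltk.corpus import stopwords        # requires nltk.download("stopwords")
from nltk.stem import PorterStemmer

STOP = set(stopwords.words("english"))
STEM = PorterStemmer()

def preprocess(comment: str) -> str:
    comment = re.sub(r"[^A-Za-z ]", " ", comment)       # keep letters and spaces only
    tokens = comment.lower().split()
    tokens = [STEM.stem(t) for t in tokens if t not in STOP]
    return " ".join(tokens)

print(preprocess("He is GOING to be banned, right?"))   # e.g. "go ban right"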
4.3 Model Architecture

After extracting the features, we train a deep neural network. This network has a single RNN layer. We have tried four architectures, each using a single recurrent layer—LSTM, GRU, bidirectional LSTM or bidirectional GRU. The reason for using an RNN is that we can use its internal state, which acts as memory, to process a sequence of inputs. Thus, we can feed the words in a sentence to the RNN and each subsequent word is then affected by the word occurring before it (and by surrounding words in both directions in the case of a bidirectional RNN) [11] (Fig. 2). LSTM and GRU are modified versions of the RNN and are widely used in text classification and natural language processing. Recurrent neural networks suffer from the vanishing gradient problem: during backpropagation, the gradient becomes smaller and smaller with time, until it is so small that it no longer contributes much to learning. LSTM and GRU were created as a solution to this short-term memory problem. They have internal mechanisms called cell states and gates that regulate the flow of information. The gates can learn which words in the comment are important to keep or to throw away. They are composed of a sigmoid neural-net layer and a pointwise multiplication operation. The sigmoid layer outputs a number between zero and one, indicating how much of each component should pass. With this, a gate can pass relevant information through a long chain of words to make more accurate predictions. Nowadays, LSTMs and GRUs can be found in many NLP applications such as speech recognition, speech synthesis and text generation. We have used a single recurrent layer in our architecture along with dense, spatial dropout 1D, dropout and global max pooling layers. The four models each use a different architecture in the recurrent layer, and we aim to identify which is best suited for this application. Not using dropout layers causes the model to overfit rapidly. The spatial dropout 1D layer performs the same function as a dropout layer but drops entire 1D feature maps instead of individual elements. The final dense layer has a sigmoid activation function and returns a value between 0 and 1. For values greater than 0.5, the comment is labelled as toxic.
Fig. 2 Model architecture
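A sketch of the GRU variant of this architecture in Keras is shown below, using the hyperparameters reported later (frozen 300-dimensional GloVe embeddings, Adam with learning rate 0.001, 6 epochs, batch size 64). The layer widths, dropout rates and sequence length are assumptions, since they are not given in the text.

from tensorflow.keras import layers, models, optimizers

def build_model(vocab_size, embedding_matrix, max_len=200):
    model = models.Sequential([
        layers.Embedding(vocab_size, 300, weights=[embedding_matrix],
                         input_length=max_len, trainable=False),   # frozen GloVe layer
        layers.SpatialDropout1D(0.2),                # drops whole 300-d feature maps
        layers.GRU(64, return_sequences=True),       # swap for LSTM / Bidirectional(...) to compare
        layers.GlobalMaxPooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),       # output > 0.5 -> "Toxic"
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=6, batch_size=64)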
4.4 Hyperparameter Tuning

Model accuracy greatly depends on the hyperparameters used. Hyperparameters for neural networks include the number of epochs, batch size, optimizer and learning rate. Epoch Size. The larger the number of epochs, the more susceptible the model is to overfitting. We found that the four models give the best validation accuracy at 6 epochs. Learning Rate and Optimizer. We tested the Stochastic Gradient Descent (SGD) and Adam optimizers with different learning rates for the four models. It was found that a learning rate of 0.001 with the Adam optimizer yielded the best results. Batch Size. Batch size controls how many data points are taken simultaneously by the chosen optimization algorithm. Very large batch sizes take longer to compute and experimentally give worse results compared to small batch sizes [12]. The final hyperparameters used are discussed in detail in the results section.
4.5 Evaluation Metrics

We have compared the four architectures (LSTM, GRU, Bi-LSTM, Bi-GRU) using validation accuracy as the metric. Higher validation accuracy means the model performs well on unseen data and is not overfitted on the training dataset. Once we have established the architecture to be used and the hyperparameters which give the best results, we run our model on a subset of the Google Jigsaw dataset and on the Twitter hate comments dataset. This gives us a good measure of how the model performs on new datasets that it has not been trained on. Since the Twitter dataset has mostly toxic comments (83%), we have used the F1 score to measure how accurate our model is. The reason for using the F1 score is that it is a better metric than accuracy on datasets with class imbalance. We also test our model on a dataset with mostly non-toxic comments (a subset of the Google Jigsaw Wikipedia comments dataset) and report the F1 score for that as well.
5 Results

Using GRU in the recurrent layer of the model achieved the maximum validation accuracy of 94.16%, as shown in Table 2. In Table 3, we list the validation accuracy for the different hyperparameters chosen for the GRU model. The number of epochs for each model was fixed at 6, since training the model any further caused overfitting. The hyperparameters covered in Table 3 are the optimizer, learning rate, batch size and the GloVe embedding dimensions.
Table 2 Validation accuracy for the four models tested

Architecture used in recurrent layer   Validation accuracy (%)
LSTM                                   91.47
GRU                                    94.16
Bi-LSTM                                93.18
Bi-GRU                                 92.61
Table 3 Validation accuracy for the different hyperparameters chosen for the GRU model

          Optimizer   Learning rate   Embedding dimensions   Batch size   Validation accuracy (%)
Model 1   SGD         0.01            200                    64           91.57
Model 2   SGD         0.01            300                    64           92.23
Model 3   Adam        0.001           200                    64           90.12
Model 4   Adam        0.001           300                    128          91.45
Model 5   Adam        0.001           300                    64           94.16
Model 6   Adam        0.01            300                    64           92.74
The Adam optimizer performed better than Stochastic Gradient Descent. The learning rate determines how quickly the model converges to a local minimum. A model with a very small learning rate takes a very long time to converge, while a large learning rate causes divergence in the loss function. A learning rate of 0.01 was too high and caused the loss function to diverge, a learning rate of 0.001 gave the best results, and anything lower caused very slow convergence. Training with a batch size of 128 took longer and gave worse results than with a batch size of 64. The GloVe model has pre-trained word embeddings of dimensions 50, 100, 200 and 300. We found that increasing the dimensions increased the accuracy of the model, and using a 300-dimensional feature vector gave the best results. Resampling the dataset and data augmentation using back translation also significantly improved the validation accuracy. The validation accuracy of the GRU model increased by 2.4% when we used data augmentation to increase the size of the minority class (Toxic). If we use the original class distribution or only perform under-sampling on the majority class, there is not enough information to correctly classify a comment, and we found many false negatives in this case (toxic comments classified as non-toxic). It should be noted that validation accuracy was used as a metric only to compare the four architectures and find which performed best. High validation accuracy does not necessarily mean the model will perform well when testing on data from other datasets if the test data has a heavy class imbalance: accuracy may be high even if there is a large number of false negatives or false positives when the test data has an uneven class distribution. Therefore, we have used the F1 score for further evaluation of how the model performs.
Table 4 Precision, Recall and F1 score for the two test datasets

Dataset                                   Precision   Recall   F1 score
Wikipedia comments (mostly non-toxic)     0.90        0.89     0.89
Twitter hate comments (mostly toxic)      0.86        0.83     0.84
The dataset we trained our model on was balanced and contained a 1:1 ratio of "Toxic" and "Non-toxic" comments. The Twitter hate comments dataset, on the other hand, has a large class imbalance (83.19% toxic); hence, the F1 score is the best metric for it. We also tested our model on a dataset containing mostly non-toxic comments (25% toxic) to compare how our model performs in each of these scenarios. This dataset is a subset of the original Wikipedia comments dataset and includes some of the comments that were discarded during under-sampling. We achieved an F1 score of 0.89 for the Wikipedia comments dataset and 0.84 for the Twitter hate comments dataset (Table 4).
6 Conclusion

We achieved the best validation accuracy of 94.16% using a GRU layer in our deep neural network. Therefore, GRU performed better than LSTM, Bi-LSTM and Bi-GRU. Moreover, our model outperforms the other works in this domain with an F1 score of 0.89 on the Wikipedia test dataset and 0.84 on the Twitter hate comments dataset. This shows that our model performs well even when the dataset has high class imbalance and can handle data from datasets it has not been trained on. Handling class imbalance in the training dataset using back translation for data augmentation is one of the reasons we were able to achieve such good results, since we were able to balance the two classes without compromising much on the size of the dataset. Using GRU in the recurrent layer of the network gave better results than any other architecture because GRUs, in general, require less data to generalize. Since a large part of the model's performance depends on feature extraction, using different pre-trained word embeddings or training custom word embeddings specifically for this application may further increase the accuracy of the model. Other data augmentation techniques, such as using synonyms, can also be used for resampling the dataset.
References 1. Davidson T, Warmsley D, Macy M, Weber I (2017) Automated hate speech detection and the problem of offensive language. In: Proceedings of the International AAAI conference on web and social media, vol 11(1). The AAAI Press, Palo Alto, California, pp 512–515
2. Yin D, Xue Z, Hong L, Davison BD, Kontostathis A, Edwards L (2009) Detection of harassment on web 2.0. Proc Content Anal WEB 2:1–7 3. Zaheri S, Leath J, Stroud D (2020) Toxic comment classification. SMU Data Sci Rev 3(1) 4. Nguyen H, Nguyen ML (2017) A deep neural architecture for sentence-level sentiment classification in Twitter social networking. In: Hasida K, Pa W (eds) International conference of the Pacific association for computational linguistics. Springer, Singapore, pp 15–27 5. Beniwal R, Maurya A (2021) Toxic comment classification using hybrid deep learning model. In: Karuppusamy P, Perikos I, Shi F, Nguyen TN (eds) Sustainable communication networks and application. Springer, Singapore, pp 461–473 6. Yu LC, Wang J, Lai KR, Zhang X (2017) Refining word embeddings for sentiment analysis. In: Proceedings of conference on empirical methods in natural language processing. Copenhagen, Denmark, pp 534–539 7. Chu T, Jue K, Wang M (2017) Comment abuse classification with deep learning. http://web.stanford.edu/class/archive/cs/cs224n/cs224n.1174/reports/2762092.pdf. Accessed 12 Jun 2021 8. Khieu K, Narwal N (2018) Detecting and classifying toxic comments. https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1184/reports/6837517.pdf. Accessed 12 Jun 2021 9. Van Aken B, Risch J, Krestel R, Löser A (2018) Challenges for toxic comment classification: an in-depth error analysis. arXiv preprint arXiv:1809.07572 10. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543 11. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 12. Keskar NS, Mudigere D, Nocedal J, Smelyanskiy M, Tang PT (2016) On large-batch training for deep learning: generalization gap and sharp minima. arXiv preprint arXiv:1609.04836
Chapter 28
Robust Object Detection and Tracking in Flood Surveillance Videos S. Divya, P. Kaviya, R. Mohanaruba, S. Chitrakala, and C. M. Bhatt
1 Introduction

We are in an era where technology advances at a rapid rate and solutions are being found for the majority of the calamities that pose a threat to mankind. Flooding, one such calamity that endangers human life and property, is caused by various factors such as global warming and earthquakes. Nowadays, surveillance cameras are deployed everywhere for monitoring. The applications of convolutional neural networks (CNNs) are extending into a wide range of areas, and one important area is object detection, where considerable research has been done. Hence, using flooded-scene videos from surveillance cameras, object detection is carried out with a deep learning algorithm under robust conditions. The robust conditions are imitated from the collected data set by various augmentation techniques. Once an object is detected and classified, it is tracked; multiple objects are tracked along consecutive frames using a Kalman filter, which altogether constitutes the proposed RODT for flooded environments.
2 Related Work

Object detection algorithms find application in various fields such as defence, security and health care. Various object detection algorithms such as face detection and skin detection have been simulated and implemented using MATLAB 2017b to detect
various types of objects for video surveillance applications with improved accuracy [1]. Pack and detect (PaD) is an approach to reduce the computational requirements of object detection in videos. In PaD, only selected video frames called anchor frames are processed at full size [2]. A new type of correlation filter named minimum output sum of squared error (MOSSE) filter is robust to variations in lighting, scale, pose and non-rigid deformations [3]. YOLO performs better when speed is given preference over accuracy. Deep learning combines SSD and MobileNets that is used to perform efficient implementation of detection and tracking. This algorithm performs efficient object detection while not compromising on the performance [4]. Objective of multiple object tracking (MOT) is to assign a unique track identity for all the objects of interest in a video, across the whole sequence [5]. As the features in image increase, demand for efficient algorithm to excavate hidden features increases. The algorithms give real-time, accurate, precise identifications suitable for real-time traffic applications [6]. A cost-effective fire detection CNN architecture for surveillance videos is inspired from GoogleNet architecture, considering its reasonable computational complexity and suitability for the intended problem compared to other computationally expensive networks such as AlexNet [7]. A method for real time human detection based on video content analysis of feeds from surveillance cameras, which demonstrates that YOLO is effective method and comparatively fast for recognition and localization in COCO Human dataset [8]. Correlation filter-based moving object tracker is used along with scale adaptation and online re-detection. The translation filter is trained using a kernelized correlation filter with the multiple features for identifying the initial target location in each frame [9]. Objects are detected from video sequence by making use of object detection algorithms like Gaussian mixture model, Haar algorithm, histogram of oriented gradients and local binary patterns. As the field of image processing is very application specific, different algorithms work well for different circumstances of video sequencing [10]. Objective of multiple object tracking (MOT) is to assign a unique track identity for all the objects of interest in a video, across the whole sequence. Tracking by detection is the most common approach used in addressing MOT problem [11]. A novel target tracking method called PCANet network is used to extract features. The number of filters in the PCA layer is determined by the cumulative contribution rate, which achieves the adaptive adjustment of the network parameters. Firstly, the region of interest is acquired by particle filtering; secondly, the depth characteristics of the image are extracted via PCANet; finally, the target is determined by the SVM classifier [12]. An approach called N-YOLO, which is instead of resizing image step in YOLO algorithm, divides into fixed size images used in YOLO and merges detection results of each divided sub-image with inference results at different times using correlation-based tracking algorithm, and the amount of computation for object detection and tracking can be significantly reduced [13]. A general framework for detection, recognition and tracking preceding vehicles and pedestrians based on a deep learning approach is used here. 
It combines a novel deep learning approach with the use of multiple sources of local patterns and depth information to yield robust on-road vehicle and pedestrian detection, recognition and tracking [14].
2.1 Proposed Work

In the proposed RODT, objects are detected and tracked in robust scenarios such as at night or in rainy conditions. In order to detect in such scenarios, we have augmented our data set using techniques such as shifting, flipping, rotating, blurring, adding rain effects and converting day to night. The re-detection of objects that have already entered the scene is also handled: if an already detected object re-enters the frame after a few frames, it is recognized as the same object and assigned the same ID.
3 System Architecture

The proposed RODT system detects and tracks objects in robust environments, which is achieved by subjecting the flooded data set to various augmentation techniques such as day-to-night conversion, adding artificial rain effects, blurring and tilting; this not only increases the size of the data set but also makes the system adaptable to harsh environments. The input videos from flooded scenarios are pre-processed into images and fed into the training system. The training system consolidates a neural network in which the data set is trained, with a convolutional network used for feature extraction. Grid division takes place, and the coordinates (x, y), height (h) and width (w) along with the confidence score are calculated. Then the class probability score of the bounding box is calculated. The object is detected using non-max suppression and IoU calculation, the class confidence score is calculated, and the object is classified. After detection, the detected object is tracked by a Kalman filter, with the frame difference between subsequent frames measured using the Mahalanobis distance, and the position of the tracked object is updated. Post-processing is done, and the image frames are converted back into video. If an already tracked object re-enters the frame, it is recognized as the old object instead of being re-detected as a new one. The video after object detection and tracking is given as output; the count of objects per class is reported and the detected objects are snipped out (Fig. 1).
4 Data Augmentation

In order to increase the size of the data set and make the model adaptable to robust environments, various data augmentation techniques were applied to the images (frames obtained from the videos). The augmentation techniques used are as follows.
Fig. 1 System architecture
4.1 Flipping

Flipping mirrors the image along an axis. Here, we have used the fliplr function of NumPy to flip the image from left to right; it reverses the pixel values of each row, and the output confirms this. Similarly, we can flip the images in the right–left direction.
4.2 Shifting

By shifting the images, we can change the position of the object in the image and hence give more variety to the model. The translation hyperparameter defines the number of pixels by which the image should be shifted. The image is shifted by (25, 25) pixels.
4.3 Rotating

The image is rotated by a particular angle (say 45°).
4.4 Blurring

Blurring is used to make the deep learning model more robust. We use a Gaussian filter for blurring the image, where sigma is the standard deviation of the Gaussian kernel; we have taken it as 1. The higher the sigma value, the stronger the blurring effect. A combined sketch of these four augmentations is given below.
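The following is an illustrative NumPy/SciPy implementation of the augmentations in Sects. 4.1–4.4, not the authors' code; the parameter values mirror the text (25-pixel shift, 45° rotation, sigma = 1).

import numpy as np
from scipy import ndimage

def augment(image: np.ndarray):
    """image is an H x W x 3 RGB array."""
    flipped = np.fliplr(image)                                             # 4.1 flipping
    shifted = ndimage.shift(image, shift=(25, 25, 0), mode="nearest")      # 4.2 shifting
    rotated = ndimage.rotate(image, angle=45, reshape=False, mode="nearest")  # 4.3 rotating
    blurred = ndimage.gaussian_filter(image, sigma=(1, 1, 0))              # 4.4 blurring (per channel)
    return flipped, shifted, rotated, blurred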
4.5 Day to Night

The given day images are converted to night images by adjusting the RGB values of the image array. Algorithm 1 shows the changes made to the image array to adjust the RGB values (Fig. 2).

Algorithm 1: Day to night conversion
Begin
  arrImg ← convertirImgMatrixRGB(img)
  for i in range(img.size[1]) do
    for j in range(img.size[0]) do
      adjust pixel (i, j) according to night
    end for
  end for
  output ← Image.fromarray(arrImg)
  return output image array
End
4.6 Rainy Effects

To give the rain effect, greyish transparent lines representing rain drops are drawn on top of the image, and the image is also made shady and blurred to indicate a typical rainy-day scene. As mentioned in Algorithm 2, the image is read first, then rain drops are added and the image is blurred so as to give a rainy-day effect (Fig. 3).
Fig. 2 Day to night conversion
Fig. 3 Adding rain effects
Algorithm 2: Adding rain effects
Begin
  image ← input
  define the slant, drop length and drop width of the rain drops
  generate rain drops on the image
  image ← cv2.blur(image, (7, 7))
  brightness_coefficient ← 0.7
  image_HLS ← cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
  image_HLS[:, :, 1] ← image_HLS[:, :, 1] * brightness_coefficient   // darken the lightness channel
  image_HLS ← cv2.cvtColor(image_HLS, cv2.COLOR_HLS2RGB)
End
5 Object Detection and Classification

For every frame, the objects in the frame are detected. At first, the image is divided into grids as mentioned in Algorithm 3, and the confidence that an object is present and belongs to a particular class is inferred. The predicted boxes undergo non-max suppression, in which, out of the many overlapping boundary boxes, the one with the maximum score is chosen as the final output. The IoU threshold is set to 0.45; boxes are filtered against this threshold, and out of the remaining boxes the one with the maximum value is chosen along with its class confidence (a sketch of standard non-max suppression with this threshold follows). As a result, the detected and classified objects are obtained, as shown in Fig. 4.
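For reference, a standard IoU computation and non-max suppression routine with the 0.45 threshold looks roughly as follows; this is a generic sketch and may differ in detail from the detector's own post-processing.

import numpy as np

def iou(a, b):
    """Boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(boxes, scores, iou_threshold=0.45):
    order = np.argsort(scores)[::-1]       # highest confidence first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # discard boxes that overlap the chosen one too much
        order = np.array([i for i in rest if iou(boxes[best], boxes[i]) < iou_threshold])
    return keep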
5.1 Counting Detected Objects

The detected objects in every frame are counted, both in total and per class: the detections in the frame are iterated over, the class of each is checked, and a running count is kept.
Fig. 4 Object detection and classification
5.2 Snipping Detected Objects

Each detected object is cropped with a margin of about 5 pixels around its boundary box and stored; the objects are snipped based on the coordinates of the boundary box.
6 Multi-object Tracking

For every frame, the objects in that frame are detected, and if an object is found in nn-init consecutive frames (nn-init = 3), it is added to the list of tracked objects (i.e. tracking is begun for that object). There are three sets: matched tracks, the list of matches between the current and previous frame; unmatched tracks, the list of previously tracked objects with no match in the current frame; and unmatched detections, the list of objects seen for the first time. Algorithm 3 explains the corresponding steps.

Algorithm 3: Multi-object tracking
Begin
  trackedObjectsAtPresent ← []
  for all frames do
    for all detections do
      matchExistingTrackedObject(detection, trackedObjectsAtPresent)
      matchTrackedObject.update(detection)
      trackedObjectsAtPresent.update(newTrackedObject(detection))
    end for
    for all trackedObjectsAtPresent do
      if isUnmatched(trackedObject) then
        trackedObject.remove()
      end if
    end for
  end for
End
6.1 Kalman Filter

The Kalman filter works recursively: we take the current readings to predict the current state, then use the measurements to update our predictions. So, it creates a new distribution (the prediction) from the previous state distribution and the measurement distribution. For each detection, we create a "Track" that holds all the necessary state information. It also has a parameter used to track and delete tracks whose last successful detection was long ago, as those objects would have left the scene.
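A generic predict/update cycle of a constant-velocity Kalman filter, of the kind used per track, is sketched below; the state layout and noise values are illustrative assumptions rather than the tracker's exact matrices.

import numpy as np

class Kalman1D:
    def __init__(self):
        self.x = np.zeros(2)                           # state: [position, velocity]
        self.P = np.eye(2)                             # state covariance
        self.F = np.array([[1.0, 1.0], [0.0, 1.0]])    # constant-velocity motion model
        self.H = np.array([[1.0, 0.0]])                # we only measure position
        self.Q = np.eye(2) * 1e-2                      # process noise
        self.R = np.array([[1e-1]])                    # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x

    def update(self, z):
        y = z - self.H @ self.x                        # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + (K @ y).ravel()
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x

# Each track calls predict() every frame and update(measurement) when it is
# matched to a detection; unmatched tracks keep only the prediction.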
6.2 Frame Difference Calculation Using Squared Mahalanobis Distance

To produce a robust tracking system, the squared Mahalanobis distance is used. The Mahalanobis distance is an effective multivariate metric that measures the distance between a point and a distribution, which is what we actually need here, whereas the Euclidean distance only measures the distance between two points (Fig. 5).
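The gating metric itself is short; the sketch below (with illustrative values) computes the squared Mahalanobis distance between a detection and a track's predicted state distribution.

import numpy as np

def squared_mahalanobis(detection, mean, covariance):
    d = np.asarray(detection) - np.asarray(mean)
    return float(d @ np.linalg.inv(covariance) @ d)

# e.g. reject an association if the distance exceeds a chi-square gating threshold
print(squared_mahalanobis([1.2, 0.8], [1.0, 1.0], np.eye(2)))  # 0.08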
7 Re-detection

An object that was being tracked is remembered for max-age frames (set to 100) before it is deleted from the list of tracked objects; if the object re-enters within max-age, it is detected as the old object (i.e. the old object ID is assigned). Algorithm 4 explains the corresponding steps.
Fig. 5 Subsequent tracked frames
Algorithm 4: Re-detection
Begin
  mean ← given mean value
  covariance ← given covariance
  max_age ← 100
  redetection():
    if state is TrackState.Tentative then
      state ← TrackState.Deleted
    else if time_since_update greater than max_age then
      state ← TrackState.Deleted
    end if
End
8 Experimental Results

The IoU threshold we set is 0.45.
True Positive (TP): the count of objects that are correctly identified, i.e. whose IoU ≥ 0.45.
False Positive (FP): the count of objects that are incorrectly identified, i.e. whose IoU < 0.45.
True Negative (TN): every part of the image where we did not predict an object; this metric is not useful for object detection.
False Negative (FN): the count of objects that are incorrectly rejected, i.e. a ground truth is present in the image but the model failed to detect the object.
Figure 6 describes the TP, FP, TN and FN values. The answers produced by the system are analysed using the metrics precision, recall, F1-score and mAP (Fig. 6).
8.1 Precision

Precision is the fraction of predicted detections that are correct.

P = TP/(TP + FP)    (1)

8.2 Recall

Recall is the fraction of samples from a class which are correctly predicted by the model.

R = TP/(TP + FN)    (2)
Fig. 6 Structure of the confusion matrix
8.3 F1-score

The F1-score considers both precision and recall, and hence must be maximized to make the model better.

F1 = 2 × (P × R)/(P + R)    (3)
8.4 Mean Average Precision

The mean average precision (mAP) score is calculated by taking the mean AP over all classes, depending on the different detection challenges that exist.

mAP = (1/N) Σ_{i=1}^{N} AP_i    (4)
8.5 Average IoU

Intersection over union (IoU) is a number from 0 to 1 that specifies the amount of overlap between the predicted and ground-truth bounding box. An IoU of 0 means that there is no overlap between the boxes; an IoU of 1 means that the union of the boxes is the same as their overlap, indicating that they are completely overlapping (Fig. 7).

Avg IoU = (Σ_{i=1}^{n} best IoU for label i) / total number of labels    (5)
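For completeness, the classification-style metrics of Eqs. (1)–(3) can be computed with scikit-learn once the TP/FP/FN decisions have been made by IoU matching; the labels below are toy values for illustration only, not the system's results.

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 1, 0, 1]   # ground-truth presence of an object match
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]   # detector decisions after IoU thresholding

print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))       # TP / (TP + FN)
print("F1-score: ", f1_score(y_true, y_pred))           # 2PR / (P + R)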
Fig. 7 Metric evaluation
Metrics     Values (in percentage)
Precision   88
Recall      76
F1-score    78
mAP         88
9 Conclusion

Object detection and tracking in robust environments such as rainy days, blurry weather and night time was implemented using the proposed RODT system. The RODT system works more accurately than existing object detection algorithms in robust environments. The proposed system, implemented using an augmented data set and a deep learning algorithm, showed better performance in detecting objects in robust environments such as floods. The RODT system can be extended to detect objects of further classes.

Acknowledgements This work has been supported by the RESPOND project funded by the Indian Space Research Organisation (ISRO) under Grant No: RES/4/676/19-20.
References 1. Raghunandan A et al (2018) Object detection algorithms for video surveillance applications. In: International Conference on Communication and Signal Processing (ICCSP) 2. Kumar AR, Ravindran B, Raghunathan A (2019) Pack and detect: fast object detection in videos using region-of-interest packing. CODS-COMAD’19 3. Bolme DS, Beveridge JR, Draper BA, Lui YM (2010) Visual object tracking using adaptive correlation filters. In: IEEE computer society conference on computer vision and pattern recognition 4. Chandan G et al (2018) Real time object detection and tracking using deep learning and OpenCV. In: International conference on inventive research in computing applications (ICIRCA) 5. Karunasekera H, Wang H, Zhang H (2019) Multiple object tracking with attention to appearance structure, motion and size. IEEE Access 6. Redmon J, Farhadi A (2018) YOLOv3: An incremental improvement 7. Muhammad K et al (2018) Convolutional neural networks based fire detection in surveillance videos. IEEE Access 6:18174–18183 8. Keerthana T, Kala L (2019) A real time Yolo human detection in flood affected areas based on video content analysis. Int R J Eng Technol (IRJET) 6(6). e-ISSN: 2395-0056 9. Islam MM, Hu G, Liu Q, Dan W, Lyu C (2018) Correlation filter based moving object tracking with scale adaptation and online re-detection. IEEE Access 10. Mohana et al (2018) Simulation of object detection algorithms for video surveillance applications. In: 2nd international conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) 11. Mohana, Aradhya HVR (2019) Object detection and tracking using deep learning and artificial intelligence for video surveillance applications. International Journal of Advanced Computer Science and Applications (IJACSA) 10(12) 12. Mu Q, Wei Y, Liu Y, Li Z (2018) The research of target tracking algorithm based on an improved PCANet. In: 10th international conference on intelligent human-machine systems and cybernetics (IHMSC), Hangzhou, pp 195–199 13. Jha S, Seo C, Yang E, Joshi GP (2020) Real time object detection and tracking system for video surveillance system. Springer Science+Business Media, LLC, part of Springer Nature 2020 14. Nguyen VD et al (2017) Learning framework for robust obstacle detection, recognition, and tracking. IEEE Trans Intell Transp Syst 18(6):1633–1646
Chapter 29
Anti-drug Response Prediction: A Review of the Different Supervised and Unsupervised Learning Approaches Davinder Paul Singh, Abhishek Gupta, and Baijnath Kaushik
1 Introduction

Molecular identification techniques are advancing rapidly and providing efficient throughput; therefore, precision drug prediction has received a lot of attention. Driven by the need to combine and analyze relevant knowledge in a manner that could expand understanding in this domain and widen the possible therapeutic choices, computational biology is also at the forefront of drug development work. As the amount of information grows, accurate computational prediction of a tumor cell line's treatment response from molecular interconnections, genetic characteristics and structural characteristics becomes increasingly important. Another main reason is that melanoma is a complex condition generated by a variety of chromosomal mutations and phenotypic changes. Genetic information continues to be generated as a result of growing technical advances in advanced research, and despite the abundance of such data it makes sense to use informed decision tools in molecular diagnostics. Among the most difficult computational issues in personalized medicine is gaining a molecular understanding of the cancer cells and recommending personalized therapy, enabling enhanced efficiency across various types of cancers by monitoring medication sensitivities. Owing to tumor variability and heterogeneity between sub-clones, precise medication outcome assessment and the discovery of new anticancer therapies are still difficult challenges. Bioinformatics, a growing area that studies how genetic changes as well as transcriptional programs influence medication responsiveness, offers one possibility. One of the goals of biomedical research toward precision medicine is to identify molecular, cell-line and genetic traits that are suitable for an individual treatment decision. Not only can individual gene expression
and biochemical characteristics be identified as indicators of drug resistance or susceptibility, but composites of such gene expression and biochemical traits can also be utilized to estimate a medication's impact on an individual [1]. Finding accurate biomarkers is difficult not only for the majority of extensively employed cytotoxic medications but also for targeted therapies, because pharmacological targets are often limited treatment indications on their own [2]. The establishment of multidimensional regenerative therapies and the identification of biomarkers diagnostic of medication response necessitate effective computational techniques as well as a large number of instances. In both preclinical and clinical contexts, conventional regression methods and more complex convolutional neural networks have been employed to develop determinants of medication response and susceptibility. As the sophistication of predictive analytics grows, so does the amount of data required to train it. Although individual omic profiles and patient outcomes are the most specific data assets when developing statistically significant determinants, such collections are generally limited by a variety of factors such as high cost, low enrollment rates and complicated regulatory frameworks [3]. Furthermore, because of the nature of the research, fair evaluation of numerous treatment methods for the same individual is nearly impossible. There are several methods for drug response prediction; some are based on machine learning (ML) and some on deep learning (DL). The most challenging task is the training of models with complex drug structures, and the computational time of existing models is also high. Therefore, there is a need for a method that can reduce the computational cost and easily learn complex designs. The features of the drugs such as gene mutations, expression, genomic information and cell lines, and basic information related to the drug response prediction model, are explained in Sect. 1. In Sect. 2, literature on existing methods is surveyed. The various deep learning-based models of drug response prediction are discussed
in Sect. 3. The machine learning-based supervised and unsupervised methods with results are depicted in Sect. 4. Section 5 is the conclusion and future scope.
2 Literature Review

In [4], the authors designed a method for anticancer drug response prediction using stacked generalization and ensemble learning. The gene expression data provided a large number of genes that were difficult to handle; therefore, the high-dimensional gene expression was reduced to low dimensions by gene selection. The performance of the model was evaluated on the benchmark CCLE and GDSC datasets, from which 1274 and 1239 genes were selected, respectively. Cell viability at a large scale involved even more gene expression data. In [5], the authors proposed a model for the estimation of drug sensitivity with two prediction algorithms: a supervised link algorithm and an extended supervised link algorithm. The supervised prediction algorithm checks the quality of the cancer cell lines and selects the better ones, while the second algorithm selects the features of the cell lines. The performance of the model was evaluated on clinical trial data and reported with the AUC, the area under the ROC curve. Turki et al. [6] designed a transfer learning algorithm for the estimation of anticancer drug sensitivity. The algorithm comprises three techniques: in the first, auxiliary data is shifted towards the target dataset; the second aligns the selected data to the target training sets; and in the third, machine learning algorithms are trained using the samples and training sets. Clinical trial datasets were used for training and testing, and performance was measured through the area under the curve (AUC) of the receiver operating characteristic (ROC). The algorithm provided efficient results on large-cell lung cancer, multiple myeloma, breast cancer and triple-negative breast cancer datasets. Dong et al. [7] designed a machine learning model to predict anticancer drug response, using support vector machine recursive feature elimination (SVM-RFE) for feature selection. The model was examined on the Cancer Cell Line Encyclopedia (CCLE) and Cancer Genome Project (CGP) databases, achieving more than 80% accuracy on a 10-drug set and more than 75% accuracy on a 19-drug set for CCLE. In [8, 9], the authors proposed prediction models for anticancer drug response. The authors in [8] used a hybrid approach with weighted collaborative filtering and used cell line information for prediction; the similarities of drugs and cell lines were shrunk into one set, and the K most similar neighbors were used for prediction. That model was evaluated on the GDSC database, whereas the performance of the model in [9] was examined on GDSC (Genomics of Drug Sensitivity in Cancer), GCSI (Genentech Cell Line Screening
Initiative) dataset, the Cancer Cell Line Encyclopedia (CCLE) database, and the enhanced Co-Expression Extrapolation (COXEN) model. Both models provided state-of-the-art results in the prediction of anticancer drug response. Clayton et al. [10] worked on a machine learning method using 5-fluorouracil and gemcitabine, with gene expressions as the input features. The clustering of the sets was done by the Clara method, and a random forest was used for discrimination; the method attained an accuracy of 86% when evaluated on a clinical trial dataset. Huang et al. [11] used a machine learning-based approach for the prediction of anticancer drug response, with a support vector machine (SVM)-based technique used for discrimination. The performance was examined on The Cancer Genome Atlas (TCGA) database, where it achieved more than 80% accuracy. In [12], the authors proposed a model based on an autoencoder feature selection method, which provided effective results in predicting anticancer drug response. Because drugs have multiple features, an autoencoder network was built to reduce the features and select the important ones; the Boruta algorithm was then applied to the small groups of features, and a random forest was used for the prediction. The performance was measured on two datasets, GDSC and CCLE. In [13], the authors provided a cell line-based prediction model for anticancer drug response, in which the cell lines were used in a CDCN (cell line-drug complex network). The performance of the model was evaluated with the RMSE (root mean square error) and the NRMSE (normalized root mean square error), and the CCLE and GDSC datasets were used for training and testing. Table 1 describes the existing methods used in anti-drug response prediction, their datasets, performance metrics such as accuracy and mean square error, and key points.
3 Deep Learning-Based Anti-drug Response Prediction Models

Precision medicine, as defined by The National Research Council, is used to divide individuals into subsets that respond differently to medical treatments. Developing alternative care based on individual qualities may raise the effectiveness of procedures, reduce cost, and reduce adverse reactions [14]. As a result, forecasting a patient's responsiveness to different treatments is critical in molecular diagnostics. Nowadays, enormous data collections are laying the foundation for the creation of predictive analytics and analytical tools such as deep learning (DL) and machine learning (ML). In a deep learning-based model, the first and most important task is data collection. The input and output variables are selected in the preprocessing phase, and the features of the drugs are extracted for the training of the model. Feature extraction is an essential task, since the correct features will yield precise drug prediction results [15]. The hyperparameters are tuned for the training of the model. The model performance
Table 1 Anti-drug prediction response using existing methods, datasets and performance metrics

Authors | Method used | Dataset | Performance metrics | Results
Tan et al. [4] | Ensemble learning method | CCLE and GDSC | MSE | MSE D1 = 1.709, MSE D2 = 3.947
Turki et al. [5] | Supervised learning-based algorithm: SVM and SVR | Clinical trial data | Area under the ROC curve (AUC) | MAUC + SVM = 0.86, MAUC + SVR = 0.868
Turki et al. [6] | Transfer learning algorithm | Clinical trial datasets | SD (standard deviation) and AUC | AUC = 0.648, SD = 0.036
Dong et al. [7] | Machine learning-based algorithm using SVM | CCLE and CGP | Accuracy | Accuracy = 75% for 10 drugs, Accuracy = 67% for 19 drugs
Zhang et al. [8] | Hybrid approach with the weighted collaborative filtering method | GDSC dataset | Pearson correlation coefficient (PCC) and RMSE | PCC = 0.8, RMSE = 0.54
Zhu et al. [9] | Enhanced COXEN (co-expression extrapolation) model | Genentech Cell Line Screening Initiative (gCSI) dataset and Cancer Cell Line Encyclopedia (CCLE) dataset | R2 (overall prediction performance), performance improvement percentage (PIP), adjusted p-value | R2 = 0.725, PIP = 27.3%, adjusted p-value = 4.32 × 10^−25
Clayton et al. [10] | Machine learning-based method | Clinical trial dataset | Accuracy | Accuracy = 78%
Huang et al. [11] | Support vector machine (SVM)-based algorithm | The Cancer Genome Atlas (TCGA) database | Accuracy | Accuracy = 86%
Xu et al. [12] | Autoencoder algorithm | GDSC and CCLE | AUC, accuracy, recall, Matthews correlation coefficient (MCC), specificity, F1-score | AUC = 0.71, ACC = 0.65, Rec = 0.652, Spec = 0.796, F-score = 0.6501, MCC = 0.3109
Wei et al. [13] | CDCN (cell line-drug complex network) model | CCLE and GDSC datasets | Accuracy | Acc = 0.7
Fig. 1 Workflow of the anti-drug prediction model [15]
can be evaluated by distinct parameters such as accuracy and mean square error. The workflow of the deep learning-based drug prediction model is presented in Fig. 1.
Biochemical drugs have a complex structure that increases the modeling complexity. To handle this complexity, various advanced learning-based models such as the hidden Markov model, collaborative filtering-based model, random forest-based model, SVM-based model, and autoencoder model have been established. These predictive models are used for drug response prediction, and different classifiers and feature selection methods are implemented to obtain efficient results. Some of the drug response prediction models are explained below:
• SVM is a type of supervised learning method that is used to divide data into different classes and is also used for pattern recognition. In the drug response prediction model, SVM is used to find the relationship between epigenetic and genomic attributes [11]. SVM separates the values in the feature space with a dividing boundary whose margin between the classes is as large as possible; test instances are mapped into the same space and categorized according to the side of the boundary on which they fall (a minimal sketch of such a classifier is given after this list).
• The hidden Markov model is based on unobservable data states; the underlying Markov process is described by a probability distribution. Since drugs have multiple features, the hidden Markov model is applied to these features to generate the mutation data.
• The collaborative filtering-based model works on the structures of the drugs. It checks the cell-line similarities and then estimates the drug response.
The advantages and disadvantages of the various drug response prediction models are depicted in Table 2. The models of drug response prediction are based on machine learning and deep learning methods. Numerous terms such as overfitting, attention mechanism, omics data, and drug sensitivity are used in ML and DL-based techniques; these terms and their definitions are given in Table 3.
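As a concrete illustration of the SVM-based approach described above, the following is a minimal sketch (not the code of any surveyed work) that trains a maximum-margin classifier on gene-expression-style features to separate sensitive from resistant cell lines; the synthetic data, the standardization step, and the RBF kernel are assumptions made for illustration only.

```python
# Minimal sketch: an SVM response classifier on gene-expression-like features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Placeholder data: rows = cell lines, columns = gene-expression features,
# labels = sensitive (1) / resistant (0) to a given drug.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
y = rng.integers(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Standardize the features, then fit a maximum-margin classifier (RBF kernel assumed).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```

In practice, the feature matrix would come from expression profiles of cell lines such as those in the CCLE or GDSC collections discussed above.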
Table 2 Various anti-drug response prediction models with their advantages and disadvantages

Prediction models | Advantages | Disadvantages
HMM-LMF model based on hidden Markov model [16] | Provided efficient results with random testing | High execution time
Hybrid approach with the weighted collaborative filtering model [8] | Efficient results for predicting similarity exhibited in drugs | Sparse matrix issues and difficulties in the integration of multi-omic data
Random forest-based predictive model [10] | Provided effective results for predicting the response of cancer drugs | Limited dataset
SVM-based machine learning model [11] | Gene expression provided individual responses | Only provided the initial treatment
Autoencoder model [12] | Boruta algorithm used for feature selection and attained effective results | Occurrence of uncertainty in some cases
Cell line-drug complex network model [13] | Helpful in real clinical practice | Works only on current data, with no knowledge of previously existing data
Table 3 Terminology [14]

Terms | Definition
Biomarker drug response | A biological feature that typically predicts the impact of a specific medication on a tumor
Attention mechanism | It is used to find the relevant input that provides effective results
Hyperparameter | A model parameter that is set before training the model
Omics data | Domains of medicine that concentrate on identifying and defining certain biological organisms as well as their interconnections
Molecular graph | A graph that represents the composition of a molecule, with nodes representing atoms and bonds represented by edges
Overfitting | When a model fits the training data too well and cannot generalize to new information, it is called overfitting
Drug sensitivity | A cell line's or tumor's vulnerability to a medication
4 Anti-drug Response Using Supervised and Unsupervised Learning-Based Methods

Machine learning is a sub-part of the artificial intelligence (AI) approach that enables a system to develop and learn a known algorithm by using pre-existing knowledge and data. Giving a machine a collection of information and allowing it to employ a certain technique to predict regularities is the first step toward machine learning [3]. Collected
information with supervision labels recorded by experts or assessed through investigations is used by supervised learning methods to design a framework that estimates labels for unlabeled data. Artificial neural networks are helpful since they can handle operations such as image recognition that are difficult to translate into a traditional algorithm. Unsupervised learning seeks to uncover connections in databases that are not labeled by anyone; investigators face a significant challenge in analyzing such techniques since unsupervised methods work with previously unlabeled samples. The framework of the drug response prediction model is presented in Fig. 2, which shows how the drugs are identified. The ability to determine cancer vulnerability to a certain anticancer therapy is a critical task for personalized medicine. Machine learning techniques can be trained on high-throughput screening knowledge to build methods that analyze how cancerous cells and individuals will respond to new medications or treatment combinations [7]. There are several algorithms used for drug
Fig. 2 The main framework using anti-drug response prediction [17]
Table 4 Comparison analysis of unsupervised and supervised learning-based methods (accuracy %)

Ref. No. | Type of method | Name of method | Accuracy rate (%)
Menon and Rajeswari [18] | Unsupervised method | K-means | 0.89–89
Xu et al. [12] | Supervised method | Neural network | 0.96–96
Huang et al. [11] | Supervised method | SVM | 0.83–83
Clayton et al. [10] | Supervised method | Random forest | 86
Table 5 Comparison analysis of unsupervised and supervised learning-based methods (RMSE)

Ref. No. | Type of method | Name of method | RMSE
Wei et al. [13] | Supervised method | Neural network | 0.89
Liu et al. [19] | Unsupervised method | Collaborative filtering method | 0.119
response prediction based on supervised and unsupervised learning. Due to their excellent capacity to preserve local structure in the data and acquire a hierarchy of attributes, supervised and unsupervised methods have been widely employed for image, video, textual, and auditory information. The various supervised and unsupervised methods such as K-means, SVM, and random forest are depicted in Tables 4 and 5. The existing methods of drug response prediction are compared based on accuracy and root mean square error (RMSE): in Fig. 3, the methods are compared with the accuracy evaluation metric, and the neural network attains the maximum accuracy among the existing methods, while the comparison of the supervised and unsupervised methods based on the RMSE evaluation parameter is shown in Fig. 4. A brief sketch of how these two metrics are computed is given below.
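The sketch below shows how the two evaluation metrics used in this comparison, accuracy and RMSE, are typically computed; the predictions are placeholders rather than outputs of the surveyed models.

```python
# Sketch of the two evaluation metrics used above (placeholder predictions).
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Classification-style evaluation (accuracy)
y_true_cls = np.array([1, 0, 1, 1, 0])
y_pred_cls = np.array([1, 0, 1, 0, 0])
print("Accuracy:", accuracy_score(y_true_cls, y_pred_cls))

# Regression-style evaluation (RMSE) for continuous drug-response values
y_true_reg = np.array([0.52, 0.80, 0.31, 0.65])
y_pred_reg = np.array([0.49, 0.75, 0.40, 0.60])
rmse = np.sqrt(mean_squared_error(y_true_reg, y_pred_reg))
print("RMSE:", rmse)
```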
5 Conclusion and Future Scope

Deep learning and machine learning-based methods of drug response prediction have been discussed in this paper. The most difficult task for a drug response prediction model is to understand the structure of the drugs, and the computational cost of the existing models is very high due to the complex biomedical drug structures. The unsupervised method, k-means, achieved 89 percent accuracy. The supervised methods, namely the neural network, support vector machine, and random forest, are also compared in this work: the neural network achieved 96 percent accuracy, SVM attained 83 percent, and random forest attained 86 percent. The neural network-based method achieved an RMSE of 0.89, whereas the collaborative
Fig. 3 Comparison of existing methods of drug response prediction based on the accuracy rate (%)
Fig. 4 Comparison of existing methods of drug response prediction based on the RMSE
filtering method attained an RMSE of 0.119. Different features such as gene expression, gene mutations, cell lines, and genomic information are extracted from the datasets for drug prediction. A multi-layer design is needed for learning the complex structures of the drugs. The machine learning and deep learning-based drug response prediction methods have attained efficient results for cancer drug prediction. In the future, hybrid algorithms can be used to resolve the existing issues and help to improve the prediction performance.
References

1. Parca L, Pepe G, Pietrosanto M, Galvan G, Galli L, Palmeri A, Sciandrone M, Ferrè F, Ausiello G, Helmer-Citterich M (2019) Modeling cancer drug response through drug-specific informative genes. Sci Rep 9(1):1–11
2. Adam G, Rampášek L, Safikhani Z, Smirnov P, Haibe-Kains B, Goldenberg A (2020) Machine learning approaches to drug response prediction: challenges and recent progress. NPJ Precision Oncol 4(1):1–10
3. Wang Z, Li H, Carpenter C, Guan Y (2020) Challenge-enabled machine learning to drug-response prediction. AAPS J 22(5):1–6
4. Tan M, Özgül OF, Bardak B, Eksioglu I, Sabuncuoglu S (2019) Drug response prediction by ensemble learning and drug-induced gene expression signatures. Genomics 111(5):1078–1088
5. Turki T, Wei Z (2017) A link prediction approach to cancer drug sensitivity prediction. BMC Syst Biol 11(5):1–14
6. Turki T, Wei Z, Wang JT (2018) A transfer learning approach via procrustes analysis and mean shift for cancer drug sensitivity prediction. J Bioinform Comput Biol 16(03):1840014
7. Dong Z, Zhang N, Li C, Wang H, Fang Y, Wang J, Zheng X (2015) Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection. BMC Cancer 15(1):1–12
8. Zhang L, Chen X, Guan NN, Liu H, Li JQ (2018) A hybrid interpolation weighted collaborative filtering method for anti-cancer drug response prediction. Front Pharmacol 9:1017
9. Zhu Y, Brettin T, Evrard YA, Xia F, Partin A, Shukla M, Yoo H, Doroshow JH, Stevens RL (2020) Enhanced co-expression extrapolation (COXEN) gene selection method for building anti-cancer drug response prediction models. Genes 11(9):1070
10. Clayton EA, Pujol TA, McDonald JF, Qiu P (2020) Leveraging TCGA gene expression data to build predictive models for cancer drug response. BMC Bioinform 21(14):1–11
11. Huang C, Clayton EA, Matyunina LV, McDonald LD, Benigno BB, Vannberg F, McDonald JF (2018) Machine learning predicts individual cancer patient responses to therapeutic drugs with high accuracy. Sci Rep 8(1):1–8
12. Xu X, Gu H, Wang Y, Wang J, Qin P (2019) Autoencoder based feature selection method for classification of anticancer drug response. Front Genet 10:233
13. Wei D, Liu C, Zheng X, Li Y (2019) Comprehensive anticancer drug response prediction based on a simple cell line-drug complex network model. BMC Bioinform 20(1):1–15
14. Baptista D, Ferreira PG, Rocha M (2021) Deep learning for drug response prediction in cancer. Brief Bioinform 22(1):360–379
15. Sakellaropoulos T, Vougas K, Narang S, Koinis F, Kotsinas A, Polyzos A, Moss TJ, Piha-Paul S, Zhou H, Kardala E, Gorgoulis VG (2019) A deep learning framework for predicting response to therapy in cancer. Cell Rep 29(11):3367–3373
16. Emdadi A, Eslahchi C (2021) Auto-HMM-LMF: feature selection based method for prediction of drug response via autoencoder and hidden Markov model. BMC Bioinform 22(1):1–22
17. Kuenzi BM, Park J, Fong SH, Sanchez KS, Lee J, Kreisberg JF, Ma J, Ideker T (2020) Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 38(5):672–684
18. Menon MS, Rajeswari PR (2020) A novel approach for predicting drug response similarity using machine learning. Eur J Mol Clin Med 7(8):796–808
19. Liu H, Zhao Y, Zhang L, Chen X (2018) Anti-cancer drug response prediction using neighbor-based collaborative filtering with global effect removal. Mol Therap Nucleic Acids 13:303–311
20. Suphavilai C, Bertrand D, Nagarajan N (2018) Predicting cancer drug response using a recommender system. Bioinformatics 34(22):3907–3914
Chapter 30
An Highly Robust Image Forgery Detection Using STPPL-HBCNN and Region Detection Using DBSCAN-ACYOLOv2 Technique Sagi Harshad Varma and Chagantipati Akarsh
1 Introduction

Image forgery is the act of manipulating an image deliberately to modify the information it conveys [1]. An image can be manipulated by adding, eradicating, or modifying its content without leaving any hint of the induced change. Because of the accessibility of a large number of free image editing tools and software, images are forged easily and are highly complex to detect. Hence, confidence in the authenticity and integrity of an image has been eroded [2]. Consequently, a robust algorithm to spot forgery automatically is necessary, and this is one of the notable research problems in image processing [3]. There are numerous algorithms for digital image forensics, and they are split into passive or active approaches. In the active approaches, special hardware is needed for signature embedding as well as identification; however, they are not appropriate for images that are captured by means of general-purpose cameras [4, 5]. In passive approaches, forgery detection (FD) is done by searching the intrinsic footprints of the tampered image [6]. Relying on the footprints that are utilized in forensic assessment, the previous passive algorithms are separated into disparate categories, namely pixel-, geometric-, camera-, and format-centered approaches [7, 8]. However, these algorithms are not appropriate for FD if a section of an image is copied from an anonymous source [9]. On that account, to assure the authenticity of an image, the majority of researchers have started to give more attention to image FD and region detection, and disparate approaches were developed to oppose tampering and forgery [10].
S. H. Varma (B) School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India; C. Akarsh, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
Certain techniques, such as K-means clustering-centered convolutional neural network (CNN), regional CNN (RCNN), and mask-RCNN (MRCNN), have been developed to overcome image forgery. Although disparate techniques were developed, accurate FD and region localization remain challenging because of non-informative feature extraction (FE), noise, poor model development, etc. [11]. To overcome these prevailing challenges, this work presents the detection of forged images as well as region localization utilizing STPPL-HBCNN and DBSCAN-ACYOLOv2, correspondingly. The paper is organized as follows: Sect. 2 illustrates various surveys on image forgery; Sect. 3 explains the proposed methodology; Sect. 4 discusses the outcomes attained by the proposed approach; lastly, Sect. 5 concludes the paper.
2 Literature Survey

Mahmood et al. [12] presented a robust method that extracted stationary wavelet transform (SWT)-centered features for exposing forgeries in digital images, with the discrete cosine transform used to lessen the dimension of the feature vectors. Considering true and false detection rates, the technique trounced the prevailing techniques. The developed FD technique was implemented for detecting the tampered regions and was employed for image forensic applications. Nevertheless, SWT encompassed spatial resolution issues, which imprecisely extracted useful features. Bilal et al. [13] formed an FD technique that employed a union of speeded-up robust features with binary robust invariant scalable keypoints (BRISK) for FD. The merged features were matched based on the Hamming distance together with the 2nd nearest neighbor, and the density-based spatial clustering of applications with noise (DBSCAN) algorithm was employed for clustering the matched features. Considering true and false detection rates, the suggested technique trounced the top-notch techniques utilized for FD. However, the fused feature scheme made the DBSCAN over-learned, which brought about an overfitting issue and lessened the accuracy. Abdalla et al. [14] examined copy–move FD centered on a fusion processing method that encompassed a deep convolutional model together with an adversarial model. Four datasets were taken, and the outcomes specified a significantly higher DA performance (~95%) shown by the deep learning CNN along with the discriminator forgery detectors. However, the work was computationally intricate. Alipour et al. [15] formed a semantic pixel-wise segmentation of JPEG centered upon a deep neural network. Semantic segmentation allots every pixel of an image to a class label. To segment the boundaries of the JPEG blocks, a deep CNN (DCNN) was trained; the trained DCNN precisely detected the block boundaries associated with disparate JPEG compressions. The CNN-centered algorithm functioned well for non-aligned JPEG-FD together with localization, but it was not dependable for unlabeled data.
3 Proposed Framework for Forgery Detection and Forgery Region Detection These days, digital crimes are augmenting at a faster rate than defensive measures. At times, for a crime or malicious action, the digital media content, say an image or a video might be irrefutable evidence. However, the image proof might be unauthenticated in consequence of the image contamination or forgery. The work has developed STPPL-HBCNN to detect whether the image is forged or not to maintain the image’s authentication. And if it is forged, then DBSCAN-ACYOLOv2 does the region localization. The work flow is exhibited in Fig. 1.
3.1 Image Color Conversion

The input image arrives in the RGB color space. With reference to luminosity, it is then transformed into YCbCr, which has a lower resolution requirement for the color components. The conversion from RGB to YCbCr color space is rendered as:

$$\begin{bmatrix} F_Y \\ F_{Cb} \\ F_{Cr} \end{bmatrix} = \begin{bmatrix} 0.2568 & 0.5041 & 0.0979 \\ -0.1482 & -0.2901 & 0.4392 \\ 0.4392 & -0.3678 & -0.0714 \end{bmatrix} \begin{bmatrix} I_R \\ I_G \\ I_B \end{bmatrix} + \begin{bmatrix} 16 \\ 128 \\ 128 \end{bmatrix} \qquad (1)$$

where $F_Y, F_{Cb}, F_{Cr}$ signify the YCbCr color space pixels and $I_R, I_G, I_B$ imply the RGB color space pixels. Subsequently, to ameliorate the DA, the YCbCr color space image is split into 8 × 8 patches and the overlapping image slices are removed.
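A minimal sketch of this preprocessing step is given below, assuming an H × W × 3 RGB image; it applies the matrix of Eq. (1) pixel-wise and then splits one channel into non-overlapping 8 × 8 patches. It is an illustration of the described step, not the authors' implementation.

```python
# Sketch of Eq. (1) and the 8 x 8 patching step, for an H x W x 3 uint8 RGB image.
import numpy as np

M = np.array([[ 0.2568,  0.5041,  0.0979],
              [-0.1482, -0.2901,  0.4392],
              [ 0.4392, -0.3678, -0.0714]])
offset = np.array([16.0, 128.0, 128.0])

def rgb_to_ycbcr(img_rgb):
    """Apply Eq. (1) pixel-wise: [Y, Cb, Cr]^T = M . [R, G, B]^T + offset."""
    return img_rgb.astype(np.float64) @ M.T + offset

def split_patches(channel, size=8):
    """Split one channel into non-overlapping size x size patches."""
    h, w = channel.shape
    h, w = h - h % size, w - w % size          # drop the overlapping remainder
    patches = channel[:h, :w].reshape(h // size, size, w // size, size)
    return patches.transpose(0, 2, 1, 3).reshape(-1, size, size)

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # placeholder image
ycbcr = rgb_to_ycbcr(img)
patches_y = split_patches(ycbcr[:, :, 0])
print(patches_y.shape)   # (64, 8, 8)
```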
Fig. 1 Proposed framework for FD and forgery region detection
3.2 Feature Extraction

Representing an image in a compact and exclusive form of a matrix, vector, or single values is the idea behind FE. The split 8 × 8 patches are taken for FE, and the work has developed BI-STDWT for this purpose. Centered upon the time–frequency representation, this technique extracts useful data from the image and also evades denoising issues in the DWT. The technique generates new pixels from the close-by pixels for higher-resolution scaling of the image that is split with respect to time–frequency. The patched image is dissected into the sub-bands $[F_{LL}, F_{LH}, F_{HH}, F_{HL}]$, and bilinear interpolation-centered scaling is performed. The image is decomposed as:

$$F(r,c) = \sum_{t,i=0}^{N-1} \eta^{LL}_{s,t,i}\, \Phi^{LL}_{s,t,i}(r,c) + \sum_{g \in G} \sum_{s=1}^{S} \sum_{t,i=0}^{N-1} \omega^{G}_{s,t,i}\, \Psi^{G}_{s,t,i}(r,c) \qquad (2)$$

where

$$\Phi^{LL}_{s,t,i}(r,c) = 2^{-s/2}\, \Phi\!\left(2^{-s}r - t,\ 2^{-s}c - i\right), \quad g \in G \qquad (3)$$

Here $\eta^{LL}_{s,t,i} = \iint \Big(\sum_{s=1}^{2}\sum_{i=1}^{2} a_{si}\, r^{\,i-1} c^{\,s-1}\Big)\, \Phi^{LL}_{s,t,i}(r,c)\, \mathrm{d}r\, \mathrm{d}c$ signifies the bilinear interpolation scaling coefficient. For every sub-band image, the bilinear coefficients render new pixels by considering their close-by pixels, and the new pixel values are multiplied with every sub-band image. Lastly, a histogram-centered analysis of every sub-band level between the pixel values and intensity values is plotted. The forgery image features can be extracted utilizing bilinear interpolation by finding new pixels from their neighboring pixels. The term $\omega^{G}_{s,t,i} = \iint F(r,c)\, \Psi^{G}_{s,t,i}(r,c)\, \mathrm{d}r\, \mathrm{d}c$ exhibits the $(t,i)$th wavelet coefficient at scale $s$ in sub-band $G$. The various sub-bands of the image, $[F_{LL}, F_{LH}, F_{HH}, F_{HL}]$, are obtained from the above equations.
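The following sketch illustrates the general idea of this feature-extraction stage: a one-level 2-D DWT produces the four sub-bands, each sub-band is upscaled by bilinear interpolation, and a histogram of each upscaled band is taken as the feature. PyWavelets and SciPy are used here as stand-ins for the BI-STDWT described above, so the wavelet choice and the bin count are assumptions.

```python
# Sketch: DWT sub-bands + bilinear upscaling + histogram features for one patch.
import numpy as np
import pywt
from scipy.ndimage import zoom

patch = np.random.rand(8, 8)                      # one 8 x 8 patch (placeholder)
LL, (LH, HL, HH) = pywt.dwt2(patch, "haar")       # one-level 2-D wavelet decomposition

features = []
for band in (LL, LH, HL, HH):
    up = zoom(band, 2, order=1)                    # bilinear interpolation (order=1)
    hist, _ = np.histogram(up, bins=16)            # histogram-centered analysis
    features.extend(hist)

feature_vector = np.asarray(features)
print(feature_vector.shape)                        # (64,)
```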
3.3 Forgery Detection

FD interprets the extracted features to detect whether the image is authenticated or not. The extracted features are used to train the STPPL-HBCNN. The developed FD technique enhances the FD accuracy, converges quicker if there is any error loss, and performs an information aggregation of the image at the deeper phases of the neural network. Every sub-band is inputted into the convolutional layer (CL), which carries out a collection of two-dimensional convolutions on the extracted input features $F_n^G$, where $G$ and $n$ imply the sub-band level and the index of the features. The CL is gauged as

$$\omega_n^G = \sum_{i=0}^{n} \sum_{j=0}^{n} \omega_{i,j}^G F_{i,j}^G + \zeta_{i,j}^G \qquad (4)$$

$$\zeta_n^G = Z\!\left(\sum_{i=0}^{N} \sum_{j=0}^{N} \omega_{i,j}^G F_{i,j}^G + \zeta_{i,j}^G\right) \qquad (5)$$
where $i, j$ imply the rows and columns that index the features, $\omega_{i,j}^G$ implies the weight at the respective sub-band interval of $G$, $\zeta_{i,j}^G$ implies the bias, and $Z$ signifies the ReLU activation function. After the CL comes the spatial–temporal pooling (STP) layer, in which a fixed-length depiction of the image is formed regardless of its size or scale and object deformation is handled; additionally, it aids in avoiding loss of image information. The STP layer is integrated into the last CL. It encompasses three pyramid levels (6 × 6, 3 × 3, 1 × 1) with 46 bins and forms a fixed-length output that is inputted into the fully connected layer:

$$F_n^G = P^{STP}\!\left(F_{i,j}^G\right) \qquad (6)$$

where $P^{STP}$ implies the STP pooling layer. Lastly, the CL in tandem with the STP pooling layer creates the $i$th layer of the CNN. It can capture low-level information and lessen the image's intricacy. Therefore, the output of the CL is flattened and further processed:

$$F_n^G(F_{Flatten}) = [F_1, F_2, F_3, F_4, \ldots, F_n] \qquad (7)$$

This forms an N-dimensional vector, where $N$ implies the number of classes that the program has to select from the flattened features. For categorizing the inputted image as belonging to a specific label, a softmax activation function ($act_{sm}$) is utilized:

$$act_m = act_{sm}\!\left(\sum_{i=1}^{n} \omega_{i,j} F_{i,j} + \zeta_{i,j}\right) \qquad (8)$$
The probability-centered detection of the image is rendered by the softmax. The ROC curve is used to evaluate the threshold for this probability; the proposed work requires a higher true positive rate (TPR) together with a lesser false positive rate (FPR), and thus the chosen operating point is a TPR of 0.88 and an FPR of 0.12. The points within the curve signify a forgery image, and the ones exterior to the curve imply an authenticated image. Backpropagation is performed centered upon hierarchical batch stochastic gradient descent (HBSGD) for minimizing the loss in the instance where there is some error in attaining the predicted output.
Fig. 2 HBSGD error minimization
The HBSGD renders a quicker convergence rate: it finds hierarchical-centered batches and the most moderate values of weights for lessening the error loss, as exhibited in Fig. 2, while the smallest and largest weight values are ignored. The loss function is gauged as:

$$E_{MSE} = \frac{\sum_{i=1}^{B} \left(\hat{F}_{i,j} - F_{i,j}\right)^2}{n} \qquad (9)$$

where $E_{MSE}$ signifies the mean square error regression loss function, $\hat{F}_{i,j}$ signifies the predicted output, $B$ implies the batch size, and $n$ signifies the total number of features in the batch. For lessening the loss, HBSGD updates the weight value by finding an optimal weight value that is rendered by:

$$\omega_{new} = \omega_{old} \pm J \sum_{i \in B} \nabla f_i(\omega_{old}) \qquad (10)$$

$$\begin{aligned} &\text{if } (\omega_{new} \le 0.8): \ \omega_{new} = \omega_L \\ &\text{elseif } (0.7 \le \omega_{new} \le 0.8): \ \omega_{new} = \omega_M \\ &\text{else}: \ \omega_{new} = \omega_S \end{aligned} \qquad (11)$$

where $\omega_L$, $\omega_M$, $\omega_S$ imply the large, moderate, and small weight values. The average of the moderate weight values is considered for obtaining an optimal convergent weight value. Lastly, if it is a forgery image, then the image is provided to region detection, or else an authenticated image is attained, as exhibited below:

$$m = \begin{cases} \text{authenticate image} & \text{if } F \ge 0.8 \\ \text{forgery image} & \text{if } F < 0.8 \end{cases} \qquad (12)$$
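A small sketch of the final decision step is given below: an operating threshold is read off the ROC curve near the quoted operating point (TPR 0.88, FPR 0.12) and then applied as in Eq. (12). The scores and labels are synthetic placeholders, and treating the softmax output as an authenticity score is an assumption made only for illustration.

```python
# Sketch of the thresholded decision of Eq. (12) with an ROC-derived threshold.
import numpy as np
from sklearn.metrics import roc_curve

# Placeholder scores: F is the softmax output, with 1 = authentic as the positive class.
y_true = np.random.randint(0, 2, 500)
F = np.clip(y_true * 0.6 + np.random.rand(500) * 0.5, 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, F)
# Choose the threshold closest to the quoted operating point (TPR 0.88, FPR 0.12).
idx = np.argmin((fpr - 0.12) ** 2 + (tpr - 0.88) ** 2)
thr = thresholds[idx]

def classify(f, threshold=thr):
    # Eq. (12): scores at/above the threshold -> authenticate image, below -> forgery image.
    return "authenticate image" if f >= threshold else "forgery image"

print(round(float(thr), 3), classify(0.9), classify(0.3))
```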
3.4 Forgery Region Detection

The next phase is region detection, which aids in locating the forged areas within an original image. The work has generated DBSCAN-ACYOLOv2, which can track forgery in images as well as video frames. It clusters similar areas, spots the forgery with regard to noise, and then creates a bounding box on the forged area. The field of view (FOV) of every filter in every layer of YOLOv2 is augmented by the Atrous convolution, which aids in augmenting the segmentation accuracy. Initially, the image's histogram data is created, which is rendered to the semi-supervised multiple-object tracking learning. The distinctive similarities between the pixel points of an image are identified, and the points are clustered as per their similarities. Minimum points (MinPts) and Epsilon (E) are the two parameters that the technique depends upon: E indicates a circle's radius considered around a single pixel point of the image, while MinPts exhibits the number of points that fulfills the user condition to make a dense cluster or region over the image. The imperative points, namely core points, boundary points, and noise points, are formed centered on E and MinPts. If a pixel point fulfills the MinPts within the E distance, then it is regarded as a core point; if a pixel point is a core point's neighbor, it is regarded as a boundary point; lastly, if no points lie near a core or boundary point, it is regarded as a noise point. Centered upon Euclidean distance, a core point ($\forall_\varepsilon$), boundary point ($B_\varepsilon$), and noise point are gauged. Initially, an arbitrary pixel point $FP_{i,j}$ is chosen. Considering E as the radius, a circle is drawn over the pixels, and the condition for fulfilling the MinPts utilizing Euclidean distance is gauged by:

$$\forall_\varepsilon\!\left(FP_{i,j}\right) = \sqrt{\left(P_l^1 - P_m^1\right)^2 + \left(P_l^2 - P_m^2\right)^2 + \cdots + \left(P_l^n - P_m^n\right)^2} \qquad (13)$$

$$B_\varepsilon\!\left(FP_{i,j}\right) = \sqrt{\left(B_l^1 - \forall_m^1\right)^2 + \left(B_l^2 - \forall_m^2\right)^2 + \cdots + \left(B_l^n - \forall_m^n\right)^2} \qquad (14)$$
Centered upon the core and boundary points, the authenticated image clusters $C_K$ will be generated, and the forgery image regions will be labeled as outliers ($C^*$). The Atrous convolution-centered YOLOv2 is trained on the clusters, and next the forgery region is detected. The cluster points are split into $S \times S$ grids, and bounding boxes are predicted for every cluster utilizing:

$$Prob\!\left(C_K \mid C^*\right) \cdot Prob\!\left(C^*\right) \cdot IoU^{truth}_{pred} = Prob\!\left(C_K\right) \cdot IoU^{truth}_{pred} \qquad (15)$$

where $Prob(C^*) \cdot IoU^{truth}_{pred}$ signifies the prediction bounding box score that illustrates the forged region, and $IoU^{truth}_{pred}$ implies the intersection-over-union betwixt the predictions and the ground truth;
S. H. Varma and C. Akarsh
and the ground truth; Prob C K |C∗ signifies the conditional probabilities of the object belonging to C classes. CNN stands as the basic design that works on the background for the Yolov2; however, the CNN lessens the segmentation property by lessening the spatial resolution of the resulting feature map. Thus, Atrous convolution performs the FE, which provides the relationship betwixt the lth layer and the l − 1th layer, which function as = act(C1 ) = act w ∗ [i + r · K] + β l
(16)
where wl implies the convolution kernel’s weight, β l signifies the bias parameter, * represents convolution, r implies the rate at which the inputted image stride is sampled. If there attains an error loss, then backpropagation is performed to lessen the loss utilizing: δl−1 =
∂L ∂L ∂C l = · = δl−1 ∂C l−1 ∂C l ∂C l−1
∗
rot 180 wl · act wl−2 ∗ l−1 + β l−1 (17)
rot(180(.) signifies the 180° counterclockwise rotation of the weight parameter matrix; • implies the Hadamard product; L(.) signifies the loss function. The distance loss value is estimated grounded on the distance between the truth and predicted bounding box. Lastly, the forgery region is detected.
4 Results and Discussion Centered on disparate performance metrics, the proposed work for FD utilizing STPPL-HBCNN and region localization of forgery image utilizing DBSCANACYOLOv2 is estimated experimentally in comparison with various prevailing methodologies. CIFAR-10 and MNIST datasets were taken. The CIFAR-10 exhibits 32 × 32 color images as well as the MNIST exhibit 28 × 28 hand-written grayscale images. These were executed in MATLAB.
4.1 Performance Analysis of Proposed Forgery Detection Technique The proposed STPPL-HBCNN is contrasted with various prevailing methods, say VGG16CNN, CNN, generative adversarial network (GAN), and DCNN for detecting
Fig. 3 Graphical demonstration of the proposed method for FD
forgery with respect to sensitivity, accuracy, FPR, and false negative rate (FNR). As exhibited in Fig. 3, the proposed method attains better metric performance for detecting forged images: it attains 91.52% accuracy and 92.96% sensitivity, whereas the prevailing works achieve accuracy and sensitivity values ranging between 75.78 and 83.78%, which is relatively lower than the proposed work. The higher metric values of the proposed work are attained on account of the extraction of the most informative features from the image. The proposed method also attains a low false alarm rate, with an FPR of 10.23% and an FNR of 9.25%, and remains effectual when weighed against the prevailing methods, whose FPR and FNR values range between 34.18 and 46.65%.
4.2 Evaluation of the Proposed Method for Detecting the Forgery Region

With reference to the sensitivity, accuracy, precision, FPR, and FNR metrics, the DBSCAN-ACYOLOv2 is graphically analyzed for forgery region detection. The proposed work is contrasted with disparate existing methods, namely K-means clustering-based CNN, fast-CNN (FCNN), improved MRCNN (IMRCNN), and MRCNN. Figure 4 exhibits the graphical demonstration of the proposed work for localizing the image forgery region. The proposed work achieved robust accuracy, sensitivity, and precision values of 91.48%, 92.96%, and 91.89%, correspondingly, whereas the accuracy, sensitivity, and precision of the prevailing methodologies range within 75.68% and 81.24%, which is lower contrasted with the proposed one. Besides, the proposed work avoids the overlapping
Fig. 4 Graphical demonstration of the proposed method for detecting forgery region
as well as misclassification of the forgery region by attaining an FPR of 10.23% and an FNR of 8.25%, and it remains more effectual than the prevailing methods, whose FPR and FNR values lie between 29.65 and 47.56%.
5 Conclusion

The work has presented STPPL-HBCNN-based image forgery detection and DBSCAN-ACYOLOv2-based forgery region segmentation. The proposed work follows a step-by-step process: initially, the image is categorized as authenticated or non-authenticated, and a non-authenticated image is then sent for forged-area segmentation. The proposed work converts the RGB image into YCbCr for higher quality and then extracts the informative features from the image utilizing BI-STDWT. Next, the extracted features are loaded into the STPPL-HBCNN for detecting the image as a normal or forged one. If the image is non-authenticated, that is, a forgery, then DBSCAN-ACYOLOv2 performs the region localization. The proposed work does a supervised detection of forgery and a semi-supervised region detection that detects the forgery for unlabeled image datasets, and it identifies deep pixel variation for uncovering the forged pixels in an image or video frame. Experimentation exhibits that the proposed method achieves a DA of 91.52% and a region DA of 91.48% and shuns false predictions by attaining an FPR and FNR of 10.23 and 8.25%. Therefore, the proposed work remains more robust than the existing top-notch methods.
References

1. Bunk J, Bappy JH, Mohammed TM, Nataraj L, Flenner A, Manjunath BS, Chandrasekaran S, Roy-Chowdhury AK, Peterson L (2017) Detection and localization of image forgeries using resampling features and deep learning. In: 2017 IEEE conference on computer vision and pattern recognition workshops (CVPRW), IEEE, 21–26 July 2017, Honolulu, HI, USA
2. Li H, Luo W, Qiu X, Huang J (2017) Image forgery localization via integrating tampering possibility maps. IEEE Trans Inf Forensics Secur 12(5):1240–1252
3. Raju PM, Nair MS (2018) Copy-move forgery detection using binary discriminant features. J King Saud Univ Comput Inf Sci. https://doi.org/10.1016/j.jksuci.2018.11.004
4. Yarlagadda SK, Güera D, Bestagini P, Maggie Zhu F, Tubaro S, Delp EJ (2018) Satellite image forgery detection and localization using GAN and one-class classifier. In: International symposium on electronic imaging, vol 214, issue no 7, pp 1–9
5. Mursi MFM, Salama MM, Habeb MH (2017) An improved SIFT-PCA-based copy-move image forgery detection method. Int J Adv Res Comput Sci Electron Eng 6(3):23–28
6. Jin G, Wan X (2017) An improved method for SIFT-based copy–move forgery detection using non-maximum value suppression and optimized J-Linkage. Signal Process Image Commun 57:113–125
7. Zhao F, Shi W, Qin B, Liang B (2017) Image forgery detection using segmentation and swarm intelligent algorithm. Wuhan Univ J Nat Sci 22(2):141–148
8. Yao H, Xu M, Qiao T, Wu Y, Zheng N (2020) Image forgery detection and localization via a reliability fusion map. Sensors 20(22):1–18
9. Singh A, Singh G, Singh K (2018) A Markov based image forgery detection approach by analyzing CFA artifacts. Multimedia Tools Appl 77(21):28949–28968
10. Wang X-Y, Jiao L-X, Wang X-B, Yang H-Y, Niu P-P (2018) A new keypoint-based copy-move forgery detection for color image. Appl Intell 48(10):3630–3652
11. Wang X, Wang H, Niu S, Zhang J (2019) Detection and localization of image forgeries using improved mask regional convolutional neural network. Math Biosci Eng 16(5):4581–4593
12. Mahmood T, Mehmood Z, Shah M, Saba T (2018) A robust technique for copy-move forgery detection and localization in digital images via stationary wavelet and discrete cosine transform. J Vis Commun Image Represent 53:202–214
13. Bilal M, Habib HA, Mehmood Z, Saba T, Rashid M (2019) Single and multiple copy–move forgery detection and localization in digital images based on the sparsely encoded distinctive features and DBSCAN clustering. Arab J Sci Eng 45:2975–2992
14. Abdalla Y, Iqbal MT, Shehata M (2019) Copy-move forgery detection and localization using a generative adversarial network and convolutional neural-network. Information 10(9):1–26
15. Alipour N, Behrad A (2020) Semantic segmentation of JPEG blocks using a deep CNN for non-aligned JPEG forgery detection and localization. Multimedia Tools Appl 79:1–17, 8249–8265
Chapter 31
Deep Learning-Based Differential Distinguisher for Lightweight Cipher GIFT-COFB Reshma Rajan , Rupam Kumar Roy , Diptakshi Sen , and Girish Mishra
R. Rajan (B) Department of Computer Science Engineering, Amrita School of Engineering, Amritapuri, India; R. K. Roy · D. Sen, Department of Computer Science Engineering, University of Calcutta, Kolkata, India; G. Mishra (B) Scientific Analysis Group, Defence R&D Organisation, Delhi, India

1 Introduction

The main goal of cryptography is to encrypt messages such that they are hidden from unauthorised people; only those with the decryption key are able to decrypt a message. Cryptanalysis is the process of coming up with ways to decrypt messages without using the decryption key. Differential cryptanalysis is one such cryptanalytic method, significantly applied to block ciphers: it is a chosen-plaintext attack which uses the difference between two input plaintexts and the difference between their corresponding ciphertexts to find the key. The differential attack was first introduced by Eli Biham and Adi Shamir in the late 1980s [1]. They brought up several attacks against block ciphers and hash functions, and it was noted that although the DES algorithm was resistant to differential cryptanalysis, slight alterations could make it more prone to the attack. In 1994, Don Coppersmith, an IBM DES team member, stated the actions taken by the team to defend DES against the differential attack since they found it in 1974 [2]. Although DES was not prone to differential attacks, some other block ciphers were apparently at risk; one such block cipher was FEAL-4 [3]. The conventional technique of differential attack is a tiring process which includes creating the difference distribution table for mounting the attack. Aron Gohr [4] in CRYPTO 2019 proposed an efficient method where a deep learning-based distinguisher model was used to mount an attack on the lightweight block cipher Speck 32/64 [5]. This method was based on the Markov assumption, which takes into account all the ciphertext differences for a particular plaintext difference. It was proved that
this approach gave better outcomes when compared to previous cryptanalytic works on Speck. They introduced a round-reduced differential attack on Speck 32/64 for 11 rounds. This was followed by an improvised deep learning-based approach brought forward by Baksi et al. [6] where differential attacks against 8 rounds of Gimli-Hash and Gimli-Cipher were performed. In this technique, they used various ciphertext differences as features in the input layer and fixed sets of plaintext differences as labels. Neural network model is trained to predict the plaintext difference class when a set of output differences is provided. Another work similar to this was published by Aayush Jain et al. [7], in which they performed a similar attack on PRESENT cipher using four different models. Our research focuses on a similar attack, as mentioned above, on the block cipher GIFT-COFB [8]. This cipher was designed by Subhadeep Banik, Avik Chakraborti, Tetsu Iwata, Kazuhiko Minematsu, Mridul Nandi, Thomas Peyrin, Yu Sasaki, Siang Meng Sim and Yosuke Todo in 2020. GIFT-COFB is an Authenticated Encryption with Associated Data (AEAD) scheme, which combines features of GIFT-128 lightweight block cipher and the COFB lightweight AEAD operating mode. We will be focussing on the encryption part of GIFT-COFB that is almost similar to that of GIFT-128 which uses the substitution permutation network (SP network). It supports 128-bit plaintext as well as 128-bit key and a key scheduling up to 40 rounds. We have used two different sets of differential classes, one being a random differential class and another one inferred from Wang’s differentials [9]. We will be discussing GIFT-COFB and the differential distinguisher in detail in the later sections. Apart from Baksi’s neural network model [6], we also use a multi-layer perceptron, as used in [7]; by reducing hidden layers and compare the performance of two models. In Sect. 2, we give a brief overview on block cipher GIFT-COFB [8] along with its encryption algorithm. We discuss two different sets of input differentials that we used for generating the dataset in Sect. 3. This is followed by discussion on the structure of our deep learning models in Sect. 4 and the corresponding results as well as some graphs which we collected during our experiments in Sect. 5. Lastly, we give a brief conclusion of our study along with some future work areas in Sect. 6.
2 GIFT-COFB GIFT-COFB is a block cipher-based authenticated encryption scheme which uses S/P network [8]. GIFT-COFB can be viewed as the integration of the features of GIFT 128 and COFB. GIFT-128 is considered as one of the lightest design ciphers and is also denoted as “SMALL PRESENT” which overcame several weaknesses existing in PRESENT cipher with regards to linear cryptanalysis. COFB, on the other hand, computes the “COMBINED FEEDBACK” of block cipher output and data block to improve the security level. Block diagram of encryption of GIFT-COFB is given in Fig. 1. GIFT-COFB uses a 128-bit block as plaintext input to the cipher state in the encryption part, and it is expressed as 4 32-bits segments. It uses a secret key of
Fig. 1 Block diagram of GIFT-COFB encryption
length 128 bits and a key scheduling algorithm for a 40-round encryption process. Both the cipher state and the key state are initialised in the following way:

$$\text{Initialize}(P):\quad \begin{bmatrix} S0 \\ S1 \\ S2 \\ S3 \end{bmatrix} \leftarrow \begin{bmatrix} B0 & B1 & B2 & B3 \\ B4 & B5 & B6 & B7 \\ B8 & B9 & B10 & B11 \\ B12 & B13 & B14 & B15 \end{bmatrix}$$

$$\text{Initialize}(K):\quad \begin{bmatrix} W0 & W1 \\ W2 & W3 \\ W4 & W5 \\ W6 & W7 \end{bmatrix} \leftarrow \begin{bmatrix} B0 & B1 & B2 & B3 \\ B4 & B5 & B6 & B7 \\ B8 & B9 & B10 & B11 \\ B12 & B13 & B14 & B15 \end{bmatrix}$$

where $Bi$ is the $i$th 8-bit block and $Si$ are the segments into which the plaintext is divided, in such a way that S0 contains the first bits of all 32 S-boxes and so on. $K$ is the key, which is divided into 8 segments $Wi$ with $i$ varying from 0 to 7. After initialisation, the result is loaded as input into the substitution layer. Unlike the PRESENT cipher, where an S-box table is used in the substitution layer, in GIFT-COFB we use the following instructions to substitute the input text:
Table 1 Bit permutation (Pbi) used in GIFT-COFB

Index: 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16
Pb0:   29 25 21 17 13  9  5  1 30 26 22 18 14 10  6  2
Pb1:   30 26 22 18 14 10  6  2 31 27 23 19 15 11  7  3
Pb2:   31 27 23 19 15 11  7  3 28 24 20 16 12  8  4  0
Pb3:   28 24 20 16 12  8  4  0 29 25 21 17 13  9  5  1

Index: 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Pb0:   31 27 23 19 15 11  7  3 28 24 20 16 12  8  4  0
Pb1:   28 24 20 16 12  8  4  0 29 25 21 17 13  9  5  1
Pb2:   29 25 21 17 13  9  5  1 30 26 22 18 14 10  6  2
Pb3:   30 26 22 18 14 10  6  2 31 27 23 19 15 11  7  3
S1 ← S1 ⊕ (S0 & S2)
S0 ← S0 ⊕ (S1 & S3)
S2 ← S2 ⊕ (S0 | S1)
S3 ← S3 ⊕ S2
S1 ← S1 ⊕ S3
S3 ← ∼S3
S2 ← S2 ⊕ (S0 & S1)
{S0, S1, S2, S3} ← {S3, S1, S2, S0}

where ⊕, |, ∼ and & are the XOR, OR, NOT and AND operations. The substitution layer is followed by the permutation layer, in which we use the following instruction and Table 1 to get the permuted text:

PermBits(S) = {Pb0(S0), Pb1(S1), Pb2(S2), Pb3(S3)}

Next, the round key is added to the permuted text. The round key is generated using the key scheduling algorithm of GIFT. We define the AddRoundKey function as:

AddRoundKey(S, KS, i) = {S0, S1 ⊕ (W6||W7), S2 ⊕ (W2||W3), S3 ⊕ Const_i}

where Const_i = 0x800000XY is the ith round constant and the byte XY = 00c5c4c3c2c1c0 is generated using a 6-bit affine LFSR, whose state is updated as follows:

c5||c4||c3||c2||c1||c0 ← c4||c3||c2||c1||c0||(c5 ⊕ c4 ⊕ 1)
Finally, the KeyUpdate function is used to update the key state after each round. It is defined as:

KeyUpdate(KS) = {W6 >>> 2, W7 >>> 12, W0, W1, W2, W3, W4, W5}

where >>> denotes a right rotation of a 16-bit key word.
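For illustration, the substitution-layer instructions and the KeyUpdate rule listed above translate directly into the following Python sketch operating on 32-bit segments and 16-bit key words; it is only a building block, not a complete GIFT-COFB implementation (PermBits, AddRoundKey, and the COFB mode are omitted).

```python
# Direct transcription of the substitution layer and KeyUpdate described in Sect. 2.
MASK32 = 0xFFFFFFFF

def sub_cells(S0, S1, S2, S3):
    S1 ^= S0 & S2
    S0 ^= S1 & S3
    S2 ^= S0 | S1
    S3 ^= S2
    S1 ^= S3
    S3 = ~S3 & MASK32          # bitwise NOT, kept to 32 bits
    S2 ^= S0 & S1
    return S3, S1, S2, S0      # final reordering {S0,S1,S2,S3} <- {S3,S1,S2,S0}

def key_update(W):
    # KeyUpdate(KS) = {W6 >>> 2, W7 >>> 12, W0, ..., W5} on 16-bit words
    rot = lambda x, r: ((x >> r) | (x << (16 - r))) & 0xFFFF
    return [rot(W[6], 2), rot(W[7], 12)] + W[0:6]

print(sub_cells(0x00000001, 0x00000000, 0xFFFFFFFF, 0x12345678))
```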
3 Differential Distinguisher

We have developed our deep learning model for GIFT-COFB based on a differential distinguisher similar to Baksi's machine learning-based differential distinguisher [6]. We have used two different classes of differential distinguishers: one inferred from Wang's input differentials (Table 2) and another one based on a random input differential class (Table 3). In the first differential class, we followed a pattern similar to Wang's differentials, which he used in his work on PRESENT's differential cryptanalysis [9]. In this section, we also discuss the algorithmic approach used for modelling our deep neural network. In this differential distinguisher, we have used input differential classes with four differentials (i.e. t = 4). The class similar to Wang's differentials, i.e. taken from his study, showed better results compared to the random differential class, which is more likely to have low probabilities at times. The algorithm for the deep learning-based differential distinguisher is shown as a flowchart in Fig. 2. The whole process is divided into two phases: an offline and an online phase. In the offline phase (training phase), the deep learning model is trained to learn the relationship between input and output differentials by passing it the dataset, i.e. the output–input pairs with the ciphertext as the feature and the input differential class as the label.

Table 2 Wang's pattern differentials

Input differential class | Input differential
1 | 0x70000000000000000000000000007000
2 | 0x07000000000000000000000000000700
3 | 0x00700000000000000000000000000070
4 | 0x00070000000000000000000000000007
Table 3 Random differentials

Input differential class | Input differential
1 | 0x12340000004444000000000000007777
2 | 0x37000234500000065436200000009876
3 | 0x11110022220003333000044440000555
4 | 0x77770067980000000000003456780007
Fig. 2 Flowchart for deep learning-based differential distinguisher
Now, we move on to the online phase only if the training accuracy is greater than or equal to 1/t (here, ≥ 1/4). In the online phase (test phase), we generate a random test case and pass it through our model. If the test accuracy comes out to be greater than or equal to 1/t (here, ≥ 1/4), the model predicts whether the given ORACLE output is cipher or random text. A sketch of the offline-phase dataset construction is given below.
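For each sample in the sketch below, a plaintext/key pair is encrypted together with a second plaintext offset by one of the Table 2 differentials, and the ciphertext difference is stored with the class index as its label, following the feature/label description above. encrypt() is a placeholder for a round-reduced GIFT-COFB encryption routine and is not defined here.

```python
# Sketch of the offline-phase dataset construction (encrypt() is a placeholder).
import random

INPUT_DIFFS = [                                   # Table 2 (Wang's pattern)
    0x70000000000000000000000000007000,
    0x07000000000000000000000000000700,
    0x00700000000000000000000000000070,
    0x00070000000000000000000000000007,
]

def build_dataset(encrypt, n_samples=10_000, rounds=4):
    features, labels = [], []
    for _ in range(n_samples):
        label = random.randrange(len(INPUT_DIFFS))
        pt = random.getrandbits(128)
        key = random.getrandbits(128)
        c1 = encrypt(pt, key, rounds)
        c2 = encrypt(pt ^ INPUT_DIFFS[label], key, rounds)
        diff = c1 ^ c2                            # 128-bit ciphertext difference
        features.append([(diff >> i) & 1 for i in range(128)])
        labels.append(label)
    return features, labels
```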
4 Deep Learning Model For developing our deep learning model, we have considered sample size to be 10,000, i.e. 10,000 different key plain text pairs and two different types (Wang’s and random) of input differentials containing four types of classes each. For every key– plaintext pair, the plain text is XORed with the key register and the key is updated for the next round, and the XORed output is put into the SP network in each round.
Table 4 Hyperparameters of our models

Hyperparameter | Value
Batch size | 200
Epoch | 25
Sample size | 10,000
Optimiser | Adam
Loss function | MSE loss
Learning rate | 0.001
Validation split | 0.3
Encryption round | 2–6
This process continues for r rounds to get the desired ciphertext. Considering rounds 2 to 6, it is observed that the cipher is prone to the differential attack till round 4 if we use the input differentials similar to Wang's pattern (M1 and M2), but for the other two models (M3 and M4), the attack is sustainable only till round 3. Both the input and output difference classes are then stored in the training dataset. The other hyperparameters used here (Table 4) are the mean squared error loss function, a batch size of 200, and 25 epochs. Four different types of models are implemented based on Baksi's recommended model of a "deep learning-based differential distinguisher" [6]; the four models are denoted M1, M2, M3 and M4. In models M1 and M3, we have implemented Baksi's differential distinguisher with Wang's input differentials and our random input differentials, respectively. In M2 and M4, we have implemented our own differential distinguisher model with Wang's input differentials and the random input differentials, respectively. Baksi recommended a multilayer perceptron neural network consisting of three hidden layers with 128, 1024, and 1024 neurons. Our model is an improvement over Baksi's model: it consists of two hidden layers of the MLP network, which provides a better chance of avoiding data overfitting and producing better validation accuracy. A sketch of such a model with the hyperparameters of Table 4 is given below.
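The following is a Keras sketch of a two-hidden-layer MLP with the hyperparameters of Table 4. The hidden-layer widths and the random training data are assumptions, since the paper does not list the exact layer sizes.

```python
# Sketch of the two-hidden-layer MLP distinguisher (hyperparameters from Table 4).
import numpy as np
import tensorflow as tf

n_classes = 4
X = np.random.randint(0, 2, (10_000, 128)).astype("float32")   # ciphertext-difference bits
y = tf.keras.utils.to_categorical(np.random.randint(0, n_classes, 10_000), n_classes)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128,)),
    tf.keras.layers.Dense(128, activation="relu"),    # assumed width
    tf.keras.layers.Dense(1024, activation="relu"),   # assumed width
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mse", metrics=["accuracy"])       # MSE loss, as in Table 4
model.fit(X, y, batch_size=200, epochs=25, validation_split=0.3, verbose=0)
```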
5 Result Table 5 consists of the train accuracy as well as the test accuracy of all the 4 models for the rounds 2–6. As said in Sect. 4, M1 is the Baksi’s model implemented with input differential having Wang’s pattern, while M2 is our improvised model with same set of input differential. It is quite clear from the table that M1 and M2 have better test accuracy when compared to M3 and M4, where M3 is the Baksi’s model with random differential and M4 is our improvised model with the same random differential set. This indicates that the differentials that we created using Wang’s pattern gave higher probability outputs in contrast to random differentials.
Table 5 Comparison of models based on train and test accuracy

Rounds | M1 Train | M1 Test | M2 Train | M2 Test | M3 Train | M3 Test | M4 Train | M4 Test
2 | 1 | 0.97 | 0.93 | 0.95 | 1 | 1 | 1 | 1
3 | 0.94 | 0.915 | 0.96 | 0.955 | 0.95 | 0.655 | 0.89 | 0.72
4 | 0.94 | 0.61 | 0.91 | 0.615 | 0.9 | 0.32 | 0.81 | 0.28
5 | 0.88 | 0.255 | 0.69 | 0.295 | 0.85 | 0.255 | 0.76 | 0.275
6 | 0.92 | 0.245 | 0.7 | 0.27 | 0.7 | 0.24 | 0.7 | 0.245
As we can infer from Table 5, M2 is a better model in comparison to M1, and similarly, M4 showed better performance than M3. This proves that our improvised model with a lesser number of hidden layers overcomes the problem of overfitting, thus giving better accuracy when compared to Baksi’s model with three hidden layers. M1 and M2 have moderate test accuracy, i.e. the cipher is prone to attack till round 4, whereas M3 and M4 have higher accuracy till round 3 only. Comparison of test accuracy of M1-M3 and M2-M4 for rounds 3–6 is shown in Figs. 3 and 4, respectively. Out of these four models, M2 shows the best result, i.e. our improvised model with differentials inferred from Wang’s differential. However, it is noted that our proposed
Fig. 3 Comparison of models M1 and M3 (panels for rounds 2–5)
Fig. 4 Comparison of models M2 and M4 (panels for rounds 2–5)
model can mount an attack on GIFT-COFB only till its fourth round encryption. We have implemented our work on online cloud-based Jupyter Notebook environment, i.e. Google Colab using the CPU runtime with a specification of Intel Xeon Processor with two cores @ 2.30 GHz and 13 GB RAM. This work’s implementation can be found in the GitHub repository [10].
6 Conclusion We carried out our research work taking references from Wang’s differentials and Baksi’s model. The Wang’s pattern that we followed in our differential class proved to be more effective than any random differential class. We made some improvements in Baksi’s model and reduced the number of hidden layers. This change gave us some better results within a lower training time. With the combination of Wang’s differential and our improvised model, we were successful in mounting an attack on GIFT-COFB till the fourth round. However, if we encrypt the text using the whole 40 rounds of GIFT-COFB, the accuracy is very low which concludes that 40 round GIFT-COFB is immune to our model. As a future work, we would be working on improvising our model to increase the accuracy for more rounds of the cipher. Also, we could mount an attack on different block ciphers using our model to check whether
they are immune to the attack. Lastly, we have not come up with any key retrieval method in this paper, which can also be considered in the future. Acknowledgements The Authors are grateful to the Scientific Analysis Group, DRDO, India for giving the opportunity to work in the field and for the guidance throughout the work. The authors would also like to thank the University of Calcutta, India and Amrita School of Engineering, Amritapuri, India for their constant support and encouragement.
References

1. Biham E, Shamir A (1993) Differential cryptanalysis of the data encryption standard. Springer, Berlin
2. Coppersmith D (1994) The data encryption standard (DES) and its strength against attacks. IBM J Res Dev 38(3):243–250
3. Aoki K, Ohta K (1996) Differential-linear cryptanalysis of FEAL-8. IEICE Trans Fundam Electron Commun Comput Sci 79(1):20–27
4. Gohr A (2019) Improving attacks on round-reduced Speck32/64 using deep learning. In: Annual international cryptology conference. Springer, Berlin, pp 150–179
5. Dinur I (2014) Improved differential cryptanalysis of round-reduced Speck. In: International conference on selected areas in cryptography. Springer, Berlin, pp 147–164
6. Baksi A, Breier J, Dong X, Yi C (2020) Machine learning assisted differential distinguishers for lightweight ciphers. IACR Cryptol ePrint Arch 2020:571
7. Jain A, Kohli V, Mishra G (2020) Deep learning based differential distinguisher for lightweight cipher PRESENT. IACR Cryptol ePrint Arch 2020:846
8. Banik S, Chakraborti A, Iwata T, Minematsu K, Nandi M, Peyrin T, Sasaki Y, Sim SM, Todo Y (2020) GIFT-COFB. IACR Cryptol ePrint Arch 2020:738
9. Wang M (2008) Differential cryptanalysis of reduced-round PRESENT. In: International conference on cryptology in Africa. Springer, Berlin, pp 40–49
10. https://github.com/DRDOINTERNSHIP/DeepLearning_Cryptograhy
Chapter 32
Dementia Detection Using Bi-LSTM and 1D CNN Model Neha Shivhare, Shanti Rathod, and M. R. Khan
1 Introduction Dementia is a general term for a decline in mental ability severe enough to interfere with daily life. Alzheimer's disease is the most common form of dementia. Symptoms cover a wide range, mainly related to a progressive decline in memory and other cognitive abilities that reduces a person's capacity to perform everyday activities, including orientation and way-finding tasks. Individuals with dementia may even have difficulty finding their way in familiar environments, and dementia patients begin to have problems with many tasks of daily life. Initial symptoms are mild, such as forgetfulness, confusion, mood changes, problems finding the right words, or needing some help with planning or work. Over time, they become unable to carry out normal tasks of daily living. They may have problems with walking, talking, and swallowing food. At the final stage, they become completely dependent and may develop complications such as pneumonia, infections, bedsores, and multi-organ failure. Typically, the family may notice the initial changes but assume they are part of normal aging or stress. Sometimes, particularly when the person with dementia is younger and shows behavioral changes, the symptoms may be mistaken for a psychiatric problem. Dementia may affect about 135 million individuals worldwide by the year 2050. Healthcare technologies and assistive
N. Shivhare (B) Dr. C V Raman University, Kota, Bilaspur, Chhattisgarh, India S. Rathod Electronics and Telecommunication Department, Dr. C V Raman University, Kota, Bilaspur, Chhattisgarh, India M. R. Khan Electronics and Communication Department, GEC, Jagdalpur, Chhattisgarh, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_32
systems are designed to support the standard of living of individuals with dementia, improving their autonomy, safety, and quality of life. Dementia is a psychological illness that impacts memory and is brought about by a variety of pathological processes, including AD. Dementia can affect a person's speech, ability to interact, and language. Since the number of people experiencing dementia is growing quickly, early recognition of dementia is critical in clinical practice, and there is a need for automated, easy-to-use, low-cost, and accurate discrimination methods. Current research using the qualitative methodology of Conversation Analysis (CA) has shown that communication problems can be identified during conversations between patients and neurologists, and that this can be used to separate patients with neurodegenerative disease (ND) from those with functional memory disorder (FMD), who exhibit memory problems not caused by dementia [1, 2]. Manual CA, on the other hand, is expensive and difficult to scale for routine clinical use. As a result, we built a fully automated scheme based on analysis of a person's speech as they communicate with an Intelligent Virtual Agent (IVA). The IVA asks a sequence of memory-testing questions that have been designed to be cognitively demanding to answer. These questions are similar to the kind of questions frequently asked during the history-taking segment of an ordinary face-to-face appointment. When evaluating the scheme in a real memory clinic on patients with ND and FMD, a number of features used in conversation analysis were extracted and high accuracy levels were obtained [3, 4]. The use of IVAs in healthcare applications has recently proliferated. An IVA is a conversational head animation displayed on a screen that may be accompanied by other speech/video components such as text-to-speech (TTS), pre-captured audio/video, and automatic speech recognition (ASR), embedded in a spoken dialogue scheme that interacts with users or offers other facilities to them (e.g., encouraging them to take a walk). Applications include use by people with mental health issues [5–8], mild cognitive impairment (MCI) [9], Alzheimer's disease (AD) [10, 11], and healthy older adults [12]. Nakatani et al. built a 3D virtual agent based on an image of a familiar face, including a family member, to interact with dementia patients and provide "person-centered care" [13]. Dementia has also been detected using IVAs. Tanaka et al. built an IVA using spoken dialogue to detect dementia symptoms early on [14]. A basic block diagram is shown in Fig. 1. Speech signals, the recorded input, are fed to an automatic speech recognition system. The ASR output is fed to a feature extraction unit, which extracts the required features. Those features are given to a classifier, which compares the signals and divides the output into two classes, ND and FMD (Fig. 1). With a binary machine learning classifier, we need to separate people with ND from those in the FMD group. Additionally, no single classifier performs well in all cases, so depending on the data we may, using a conventional validation strategy, try different classifiers to find the one with the highest accuracy. The remainder of this article is organized as follows: Section 2 reviews related work. The proposed bidirectional LSTM network with 1D CNN model
Fig. 1 Block diagram for a dementia recognition framework
is introduced in Section 3, followed by the problem statement. Section 4 presents and discusses the results of the model simulation. Finally, Section 5 gives the conclusion and suggestions for future work.
2 Literature Review Lauraitis et al. [16] studied 339 voice samples collected from 15 individuals: 7 individuals with early-stage CNSD (3 Huntington's, 1 Parkinson's, 1 cerebral palsy, 1 post-stroke, 1 early dementia) and 8 healthy individuals. The voice recorder of the Neural Impairment Test Suite (NITS) mobile application was used to collect the speech data. Pitch contours, Mel-frequency cepstral coefficients (MFCC), Gammatone cepstral coefficients (GTCC), Gabor (analytic Morlet) wavelets, and auditory spectrograms were used to extract features. The accuracy of the healthy-versus-impaired classification problem is 94.50% (Bi-LSTM) and 96.3% (WST-SVM). The developed technology could be used in automated CNSD patient health-state monitoring and clinical decision support systems, and also as a component of an Internet of Medical Things (IoMT). Rochford et al. [17] studied the effect of using pause and phrase timing information to distinguish between cognitively healthy and impaired older people. Three sets of features were extracted from 187 speech recordings: temporal features using a static 250 ms threshold, temporal features using a dynamic threshold, and pause and phrase time distribution parameters. Using Linear Discriminant Analysis (LDA) classifiers, the ability of each of these feature sets to distinguish between cognitively healthy and
cognitively impaired participants was tested. Compared to the static temporal features, the performance of the classifier using pause and phrase length distribution parameters improved by 0.22% (to 64.20% sensitivity), 6.33% (73.12% specificity), and 3.27% (68.66% overall accuracy). Lopez-de-Ipina et al. [18] explore the possibility of applying intelligent algorithms to results obtained from non-invasive methods on suspected patients in order to improve both the early recognition of Alzheimer's disease and the assessment of its severity. The article evaluates automatic analysis of emotional response (ERAA), which relies on both conventional and novel speech features: the Higuchi fractal dimension (FD) and emotional temperature (ET). The approach has the particular advantage of being, in addition to non-invasive, low cost and free from side effects. This is a pre-clinical study for validating prospective diagnostic tests and biomarkers. For characterizing features suited to the early diagnosis of Alzheimer's disease, ERAA produced very promising results. Mirheidari et al. [19] add healthy older controls (HCs) and people with MCI to the list of diagnostic classes in their research. They also investigate whether an IVA could be used to administer more conventional cognitive assessments, such as verbal fluency tests. A four-way classifier trained on a large feature set achieved 48% accuracy, which increased to 62% when only the 22 most significant features were used (ROC-AUC: 82%). Luz [20], on a dataset of spontaneous speech recordings of Alzheimer's patients (n = 214) and aged controls (n = 184), showed that a Bayesian classifier operating on features obtained through simple algorithms for speech act recognition and speech rate extraction could reach an accuracy of 68%. Liu et al. [21] investigate dementia detection by analyzing spontaneous speech produced by Mandarin speakers during a picture description task. First, a Mandarin speech dataset is created that includes speech from healthy people as well as patients with mild cognitive impairment (MCI) or dementia. Then, three groups of features are extracted from the voice recordings, including duration features, acoustic features, and linguistic features, and compared by building logistic regression classifiers for dementia detection. Combining all features yields the best performance for distinguishing dementia from healthy controls, with an accuracy of 81.9% in 10-fold cross-validation. Further experiments examine the importance of the different features and show that the perplexity differences produced by language models are the most beneficial. Haider et al. [22] investigate, from a computational paralinguistics perspective, the predictive value of purely acoustic features automatically extracted from spontaneous speech for Alzheimer's dementia identification. On a balanced sample of DementiaBank's Pitt spontaneous speech dataset, with patients matched by gender and age, the performance of several state-of-the-art paralinguistic feature sets for Alzheimer's detection was assessed. The extended Geneva minimalistic acoustic parameter set (eGeMAPS), the emobase feature set, the ComParE 2013
feature set, and the recent multiresolution cochleagram (MRCG) features were among the feature sets assessed. In addition, they propose a new active data representation (ADR) feature extraction method for Alzheimer's dementia detection. The findings show that a classification system based on acoustic speech features extracted with the ADR method can reach accuracy levels equivalent to models using higher-level language features. The results also indicate that every feature set contributes information that is not captured by the other feature sets. They show that while the eGeMAPS feature set offers slightly better accuracy (71.34%) than the other individual feature sets, a "hard fusion" of the feature sets raises the accuracy to 78.70%.
3 Proposed Work
(A) Problem Identification: In the existing work, a support vector machine is used, but the required accuracy is not achieved. To increase the accuracy, efficiency, and prediction rate, we use deep learning methods.
(B) Proposed Methodology: The dementia dataset is taken, and its audio files are considered for speech recognition analysis. The data is collected from DementiaBank. The audio files obtained from the dataset are converted to text files based on speech analysis. Tokenization is then performed on the patient dataset. In this process, the dataset is cleaned by removing and replacing null strings and removing underscores, after which word tokenization is performed. Age and sex records are created separately and a full data frame is produced. Both pickle files are then merged into a new data frame that keeps only the features whose ids are present in both pickle files and rejects the others. Finally, we build a hybrid model consisting of 1D-Conv and Bi-LSTM layers and pass the data into the model for training. The neural network weights are saved and, with an automatic speech recognition tool based on NLP, it can be predicted from a recognized voice whether the person has dementia or not.
Data Pre-processing: Load the dataset and convert it into a pickle file. Then perform tokenization on the patient dataset. Here, the dataset is cleaned by removing and replacing null strings and removing underscores. Finally, word tokenization is performed. Age and sex records are created separately and a full data frame is produced. The data has two labels, dementia and control. The dementia label comprises four classes: cookie, fluency, recall, and sentence.
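A minimal sketch of this cleaning and word-tokenization step is shown below, assuming the transcripts are already available as plain-text strings; the column names and the example row are illustrative, not taken from the paper.

```python
import re
import pandas as pd
from nltk.tokenize import word_tokenize  # assumes NLTK and its 'punkt' data are installed

def clean_text(text):
    """Remove/replace null strings and strip underscores, as described above."""
    if text is None:
        return ""
    text = text.replace("_", " ")             # eliminate underscores
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace left by removals
    return text

def tokenize_transcripts(df, text_col="text"):
    """Clean every transcript and perform word tokenization."""
    df = df.copy()
    df[text_col] = df[text_col].fillna("").apply(clean_text)
    df["tokens"] = df[text_col].apply(word_tokenize)
    return df

# Illustrative usage: a data frame with id, label ('dementia'/'control') and text columns
frame = pd.DataFrame({"id": [1], "label": ["dementia"], "text": ["the boy is_ stealing  cookies"]})
frame = tokenize_transcripts(frame)
print(frame["tokens"].iloc[0])  # ['the', 'boy', 'is', 'stealing', 'cookies']
```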
Now encode the tokenized data using UTF-8, which yields a tokenized list and tokenized ids, and create a data frame using the text, label, and id in the dementia list. This data frame is saved in CSV format. Next, process the anagraphic file, which consists of id, entry age, and initial date. Then create a patient dictionary considering id, age, sex, race, and education, and build the final data frame including all attributes from the anagraphic file and the patient dictionary. The intensity of sentiments is then analyzed by merging the pickle file and the final data frame (stored as CSV). We also define POS tags in preprocessing and use a vocabulary size of 30,000, a sequence length of 73, and an embedding size of 300.
Training: Pass the pickle file for training, process POS tags for all session records, and split the dataset into 90% training and 10% testing data with random seeds 4, 10, and 95. Word embedding is then performed using pre-trained GloVe (6B) vectors. An Adagrad optimizer with a learning rate of 0.0001 is used and, finally, the network is trained.
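The vocabulary size (30,000), sequence length (73), embedding size (300), 90/10 split, random seed 4, and GloVe embeddings mentioned above can be wired together roughly as follows. This is a sketch under stated assumptions: the example texts, the label encoding (1 = dementia), and the GloVe file path are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE, SEQ_LEN, EMB_DIM = 30000, 73, 300

# Illustrative transcripts and labels (0 = control, 1 = dementia, assumed encoding)
texts = ["the boy is stealing cookies", "mother is washing the dishes"]
labels = [1, 0]

tokenizer = Tokenizer(num_words=VOCAB_SIZE, oov_token="<unk>")
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=SEQ_LEN)
y = np.array(labels)

# 90% training / 10% testing split (seed 4 is one of the three seeds mentioned)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=4)

# Build an embedding matrix from pre-trained GloVe vectors (file path is illustrative)
embeddings = {}
with open("glove.6B.300d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings[values[0]] = np.asarray(values[1:], dtype="float32")

emb_matrix = np.zeros((VOCAB_SIZE, EMB_DIM))
for word, idx in tokenizer.word_index.items():
    if idx < VOCAB_SIZE and word in embeddings:
        emb_matrix[idx] = embeddings[word]
```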
(C) Hybrid CNN + Bidirectional LSTM Model: This framework is a combination of the two networks above. We pass the embeddings through a sequence of 1D convolutional layers followed by a max-pooling layer, with two bidirectional LSTM layers stacked over the max-pool layer. A dense network follows this. Activations are applied to the CNN and bidirectional LSTM layers, while ReLU activation is used for the dense layers, followed by a SoftMax function for classification.
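A minimal Keras sketch of the hybrid stack just described (embedding → 1D convolutions → max pooling → two stacked bidirectional LSTMs → dense layers → SoftMax) is given below. Only the vocabulary size, sequence length, embedding size, and the Adagrad optimizer with learning rate 0.0001 come from the text; the filter counts, kernel sizes, and LSTM widths are illustrative assumptions.

```python
from tensorflow.keras import layers, models, optimizers

def build_hybrid_model(vocab_size=30000, seq_len=73, emb_dim=300, num_classes=2):
    model = models.Sequential([
        layers.Embedding(vocab_size, emb_dim, input_length=seq_len),
        # sequence of 1D convolutional layers (filter sizes are illustrative)
        layers.Conv1D(128, 5, padding="same", activation="relu"),
        layers.Conv1D(64, 5, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        # two bidirectional LSTM layers stacked over the max-pool layer
        layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
        layers.Bidirectional(layers.LSTM(32)),
        # dense network with ReLU, dropout after the dense layer, SoftMax output
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=optimizers.Adagrad(learning_rate=0.0001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_hybrid_model()
model.summary()
```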
1D CNN: In this part, a one-dimensional convolutional neural network (1D CNN) model is introduced for the human speech activity detection dataset. A neural network is a hardware and/or software scheme modeled on the way neurons function in the human brain. Conventional neural networks are not well suited to image processing and must be fed images in low-resolution portions. A CNN's "neurons" are arranged more like those of the frontal lobe, the area in humans and other animals responsible for processing visual inputs. The piecemeal image-processing difficulty of traditional neural networks is avoided by arranging layers of neurons so that they span the whole visual field. A CNN employs technology similar to a multilayer perceptron, optimized for low processing requirements. An input layer, an output layer, and hidden layers with several convolution layers, pooling layers, fully connected layers, and normalization layers make up a CNN. The elimination of constraints and the improvement in image-processing performance result in a scheme that is significantly more effective and easier to train for image processing and natural language processing. A similar approach can be used to identify human activity from one-dimensional series data, as in the case of acceleration and dementia data. The architecture learns how to extract characteristics from observation sequences and how to map internal characteristics to different activity types.
The benefit of using CNNs for sequence classification is that they can learn directly from raw time series data, removing the need for domain knowledge to manually engineer input features. The model should be able to learn an internal representation of the time series data and ideally perform similarly to architectures trained on a version of the dataset with engineered features. This part is split into four components: load data; fit and evaluate the model; complete example; summarize results.
Bidirectional LSTM: Bidirectional LSTMs are an extension of conventional LSTMs that can be used to increase model performance on sequence classification problems. In problems where all time steps of the input sequence are available, bidirectional LSTMs train two LSTMs on the input sequence instead of one: the first on the input sequence as-is and the second on a reversed copy of it. This can give the network more context and lead to faster and even fuller learning of the problem.
Add Model Layer Bidirectional LSTM: Now that we know how to create an LSTM for sequence classification problems, we can extend the example to showcase a bidirectional LSTM. This is done by wrapping an LSTM hidden layer in a Bidirectional layer, as shown in the sketch at the end of this subsection. This constructs two copies of the hidden layer, one that fits the input sequence as-is and one that works on a reversed copy of the input sequence. The output values of these two LSTMs are concatenated by default.
Input layer: the input sentence is fed to this framework. Embedding layer: maps every word into a lower-dimensional vector. Conv1D: the TensorFlow/Keras Conv1D layer, whose most important parameters need to be tuned when training deep convolutional neural networks (CNNs). Attention layer: produces a weight vector and merges word-level features from every time step into a sentence-level feature vector. Dropout: dropout is a regularization technique used to prevent a model from overfitting. Dropout switches off a percentage of the network's neurons at random; the incoming and outgoing links of those neurons are also switched off. This is done to help the model learn better. Dropout should not be used after convolution layers; instead, it should be used after the network's dense layers. It is usually a good idea to switch off only up to 50% of the neurons. If we switch off more than half of the network, there is a chance
that the network may learn poorly and the predictions will be inaccurate. Let us look at how to use dropout and where to place it when creating a bidirectional LSTM model. Dense: the term refers to a layer whose neurons are fully connected (dense). Every neuron in a dense layer receives input from all neurons of the previous layer, making the layers highly interconnected. In other words, a dense layer is a fully connected layer, meaning that all neurons in one layer are connected to those in the next.
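As referenced above, wrapping a hidden LSTM layer in a Bidirectional layer is a one-line change in Keras. The sketch below is illustrative; the layer width (64 units) is an assumption, and merge_mode="concat" is the Keras default, matching the statement that the outputs of the two LSTM copies are concatenated.

```python
from tensorflow.keras import layers

# A plain LSTM hidden layer ...
lstm = layers.LSTM(64, return_sequences=True)

# ... becomes bidirectional by wrapping it: two copies of the layer are built,
# one reading the input as-is and one reading a reversed copy, and their
# outputs are concatenated by default (merge_mode="concat").
bi_lstm = layers.Bidirectional(lstm, merge_mode="concat")

# Dropout, as recommended above, is placed after dense layers rather than
# after convolution layers, switching off up to 50% of the neurons.
dense = layers.Dense(64, activation="relu")
dropout = layers.Dropout(0.5)
```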
(D) Proposed Algorithm
Step 1: Collect the dementia dataset.
Step 2: An audio file is considered for speech recognition.
Step 3: The dataset is generated as predefined in DementiaBank.
Step 4: The audio file is converted to text on the basis of speech analysis using NLP.
Step 5: Perform tokenization on the patient dataset.
Step 6: Clean the dataset by removing and replacing null strings and punctuation and removing underscores. Finally, perform word tokenization. Create age and sex files individually and generate a full data frame.
Step 7: Encode the tokenized data using UTF-8, producing a tokenized list and tokenized ids, and create a data frame using text, label, and id from the dementia list.
Step 8: Create a hybrid model including 1D-Conv and Bi-LSTM, then pass the data for training.
Step 9: After training, measure the model performance on the test data in terms of accuracy, precision, recall, and F-score.
Step 10: The neural network weights are saved and, with the automatic speech recognition tool based on NLP, it can be predicted from a recognized voice whether the person has dementia or not (Fig. 2).
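Step 9 evaluates the trained model on the held-out test data in terms of accuracy, precision, recall, and F-score. A minimal scikit-learn sketch of that evaluation is given below; it assumes the model, X_test, and y_test objects from the earlier sketches, with class 1 standing for dementia (an assumed encoding).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, roc_auc_score)

# Class probabilities on the test set, then hard predictions via argmax
probs = model.predict(X_test)
y_pred = np.argmax(probs, axis=1)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
# ROC-AUC uses the probability of the positive (dementia) class
print("ROC AUC  :", roc_auc_score(y_test, probs[:, 1]))
```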
4 Results and Discussion This work has been implemented using the Python programming language, and the platform used is Jupyter Notebook (version 6.3.1). Here, we have used the dementia dataset to perform the experiments. The description of the dataset and the results achieved by the proposed model are given below. Dataset Description: DementiaBank (Boller and Becker 2005) is the biggest publicly available dataset of transcripts. DementiaBank is a shared database of multimedia interactions for the study of communication in dementia (Fig. 3). It contains voice recordings of interviews with AD (and control) patients. Patients were required to complete several activities, such as the "Boston Cookie Theft" description activity,
Fig. 2 Proposed work flowchart
where patients were shown an image and asked to explain what they saw (see Fig. 3). Other activities included the "Recall Test," in which patients were asked to recall details from a previously told story. Automatic morphosyntactic analysis, such as standard part-of-speech tagging, annotation of tense, and repetition markers, is included with each transcript in DementiaBank. Note that these are not AD-specific
Fig. 3 Boston cookie stealing task description. All activities in the picture were to be explained by participants
characteristics, but rather generic, automatically extracted language qualities. To use as datasets, we separated every transcript into individual utterances. We also deleted any utterances that were not accompanied by POS tags. This balancing lowered the amount of data but ensured that models with tagged and untagged settings were matched fairly. The Alzheimer and related dementias research group at the University of Pittsburgh School of Medicine obtained these transcripts and audio files as part of a wider protocol. The University of Pittsburgh was awarded NIH grants AG005133 and AG003705 to help acquire the DementiaBank data. Individuals with probable and suspected Alzheimer's disease, as well as elderly controls, took part in the study. Data was collected on a yearly basis over a period of time. https://dementia.talkbank.org/access/English/Pitt.html.
Performance Metrics
1. Accuracy: The number of correct predictions the model makes over the entire test dataset. It is calculated with the following formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)   (1)
2. Recall: The true positive rate, also called recall or sensitivity, measures how many of the actual positives in the dataset are correctly predicted. It is calculated as:
Recall = TP / (TP + FN)   (2)
3. Precision: Precision is a criterion for evaluating how reliable a positive prediction is, i.e., how confident one can be that a sample predicted as positive is indeed positive. It is evaluated with the formula:
Precision = TP / (TP + FP)   (3)
4. F-score: The F1-score is by far the most popular F-score. It is the harmonic mean of precision and recall:
F1 = 2 · (Precision · Recall) / (Precision + Recall)   (4)
Fig. 4 ROC graph (base)
5. ROC: A receiver operating characteristic (ROC) curve shows how well a classification model performs across all classification thresholds. The graph plots two variables: the true positive rate and the false positive rate.
A. Outcome of the Model
The performance parameters of the existing method and the proposed 1D-Conv and Bi-LSTM method are compared in Figs. 4 and 5 and Table 1.
Fig. 5 ROC curve proposed
Table 1 Comparison of performance parameters between the existing method and the proposed 1D-Conv and Bi-LSTM method
Parameter   Existing method   1D-Conv and Bi-LSTM (proposed)
Accuracy    0.77              0.83
Precision   0.76              0.98
Recall      0.83              0.80
F-Score     0.79              0.88
Fig. 6 Comparison bar graph of performance parameters for the existing method and the proposed 1D-Conv and Bi-LSTM
The comparison of the classification methods is presented in Table 1. It shows the overall performance of the proposed 1D-Conv and Bi-LSTM method in contrast to the existing method in terms of accuracy, precision, recall, and F-score (Fig. 6).
5 Conclusion Dementia affects speech in the form of syntactic, semantic, information, and auditory problems, as indicated by results from past studies. Without using expert-defined linguistic features, we used a transfer learning approach to improve automatic AD prediction using a comparatively small, focused speech dataset. On the Cookie-Theft picture description test of the Pitt corpus, we evaluated a newly built pre-trained transformer based on speech, which we improved with enhancement procedures. An accuracy of 83%, precision of 98%, recall of 80%, and F1-score of 88% were obtained using the sentence-level 1D-CNN and Bi-LSTM, which improves on state-of-the-art results. Pre-trained language models are available in several languages, so the procedure introduced in this study could be tried in languages other than English. Moreover, with multilingual versions of these schemes, knowledge of AD prediction in one language could be transferred to another language if a sufficiently large dataset is not available. In future, we may collect a bigger dataset that could help in forming a more generalized embedding. Further, we can also extend the dataset to people speaking different languages.
References 1. Elsey C, Drew P, Jones D, Blackburn D, Wakefield S, Harkness K, Venneri A, Reuber M (2015) Towards diagnostic conversational profiles of patients presenting with dementia or functional memory disorders to memory clinics. Patient Educ Counsel 98:1071–1077 2. Jones D, Drew P, Elsey C, Blackburn D, Wakefield S, Harkness K, Reuber M (2015) Conversational assessment in memory clinic encounters: interactional profiling for differentiating dementia from functional memory disorders. Aging Ment Health 7863:1–10 3. Mirheidari B, Blackburn D, Harkness K, Walker T, Venneri A, Reuber M, Christensen H (2017) An avatar-based system for identifying individuals likely to develop dementia. In: Proceedings of Interspeech, pp 3147–3151 4. Mirheidari B, Blackburn D, Venneri A, Reuber M, Walker T, Christensen H (2018) Detecting signs of dementia using word vector representations. In: Proceedings of Interspeech. ISCA 5. Rus-Calafell M, Gutierrez-Maldonado J, Ribas-Sabaté J (2014) A virtual reality-integrated program for improving social skills in patients with schizophrenia: a pilot study. J Behav Ther Exp Psychiatry 45(1):81–89 6. Leff J, Williams G, Huckvale M, Arbuthnot M, Leff A (2014) Avatar therapy for persecutory auditory hallucinations: what is it and how does it work? Psychosis 6(2):166–176 7. Huckvale M, Leff J, Williams G (2013) Avatar therapy: an audio-visual dialogue system for treating auditory hallucinations. In: INTERSPEECH, pp 392–396 8. Hayward M, Jones A, Bogen-Johnston L, Thomas N, Strauss C (2017) Relating therapy for distressing auditory hallucinations: a pilot randomized controlled trial. Schizophr Res 183:137– 142 9. Morandell M, Hochgatterer A, Fagel S, Wassertheurer S (2008) Avatars in assistive homes for the elderly. In: Symposium of the Austrian HCI and Usability Engineering Group. Springer, Berlin, pp 391–402 10. Carrasco E, Epelde G, Moreno A, Ortiz A, Garcia I, Buiza C, Urdaneta E, Etxaniz A, Gonzalez M, Arruti A (2008) Natural interaction between avatars and persons with Alzheimers disease. In: International conference on computers for handicapped persons. Springer, Berlin, pp 38–45 11. Tran M, Robert P, Bremond F (2016) A virtual agent for enhancing performance and engagement of older people with dementia in serious games. In: Workshop artificial compagnonaffect-interaction 2016 12. Cyarto E, Batchelor F, Baker S, Dow B (2016) Active ageing with avatars: a virtual exercise class for older adults. In: Proceedings of the 28th Australian conference on computer human interaction. ACM, pp 302–309 13. Nakatani S, Saiki S, Nakamura M (2018) Integrating 3d facial model with person-centered care support system for people with dementia. In: International conference on intelligent human systems integration. Springer, Berlin, pp 216–222 14. Tanaka H, Adachi H, Ukita N, Ikeda M, Kazui H, Kudo T, Nakamura S (2017) Detecting dementia through interactive computer avatars. IEEE J Transl Eng Health Med 5:1–11 15. Rizzo A, Lucas G, Gratch J, Stratou G, Morency L, Chavez K, Shilling R, Scherer S (2016) Automatic behavior analysis during a clinical interview with a virtual human. In: MMVR, 2016, pp 316–322 16. Lauraitis A, Maskeliunas R, Damasevicius R, Krilavicius T (2020) Detection of speech impairments using cepstrum, auditory spectrogram and wavelet time scattering domain features. IEEE Access 8:96162–96172. https://doi.org/10.1109/access.2020.2995737 17. 
Rochford I, Rapcan V, D’Arcy S, Reilly RB (2012) Dynamic minimum pause threshold estimation for speech analysis in studies of cognitive function in ageing. In: 2012 annual international conference of the IEEE engineering in medicine and biology society. https://doi.org/10.1109/ embc.2012.6346770 18. Lopez-de-Ipina K, Alonso JB, Travieso CM, Egiraun H, Ecay M, Ezeiza A, Barroso N, Martinez-Lage P (2013) Automatic analysis of emotional response based on non-linear speech modeling oriented to Alzheimer disease diagnosis. In: 2013 IEEE 17th international conference on intelligent engineering systems (INES). https://doi.org/10.1109/ines.2013.6632783
19. Mirheidari B, Blackburn D, OrMalley R, Walker T, Venneri A, Reuber M, Christensen H (2019) Computational cognitive assessment: investigating the use of an intelligent virtual agent for the detection of early signs of dementia. In: ICASSP 2019—2019 IEEE international conference on acoustics, speech and signal processing (ICASSP). https://doi.org/10.1109/icassp.2019.868 2423 20. of Alzheimer’s Dementia in Spontaneous Speech. IEEE J Sel Top Signal Process 1–1. https:// doi.org/10.1109/jstsp.2019.2955022 21. Liu Z, Guo Z, Ling Z, Wang S, Jin L, Li Y (2019) Dementia detection by analyzing spontaneous mandarin speech. In: 2019 Asia-Pacific signal and information processing association annual summit and conference (APSIPA ASC). https://doi.org/10.1109/apsipaasc47483.2019. 9023041 22. Haider F, De La Fuente S, Luz S (2019) An assessment of paralinguistic acoustic features for detection of Alzheimer’s dementia in spontaneous speech. IEEE J Sel Top Signal Process 1–1. https://doi.org/10.1109/jstsp.2019.2955022
Chapter 33
PairNet: A Deep Learning-Based Object Detection and Segmentation System Ameya Kale, Ishan Jawade, Pratik Kakade, Rushikesh Jadhav, and Nilima Kulkarni
1 Introduction It is observed by the WHO that many countries lack doctors in proportion to their population. These shortages suggest that many countries lack skilled people in their healthcare departments; this worked as a motivation to create an automated system. Detection of medical anomalies in a medical report is one of the laborious and time-consuming tasks performed by physicians. To reduce this workload, there is a need for a generalized model that can detect and segment various anomalies with high accuracy. Globally, irreversible blindness is caused by glaucoma, a disease that damages the eye's optic nerve. An estimated 60.5 million people were affected by primary angle-closure glaucoma (PACG); the blindness caused by glaucoma can be prevented by early treatment. Detecting the optic disk in the fundus image is a fundamental step in the detection of this disease. Currently, ophthalmologists inspect fundus images without any automated system. The process is arduous and time-consuming, and, depending on the experience of the specialist, the quality of the work can be compromised. Moreover, some patients do not get treatment due to the lack of specialists. Image processing and deep learning are the two main approaches for the detection and segmentation of the optic disk. As deep learning is robust to noise and illumination variation when detecting the optic disk, it is the preferred approach for obtaining a robust, generalized medical object detection and segmentation model. There are various approaches using deep learning techniques to detect A. Kale (B) · I. Jawade · P. Kakade · R. Jadhav · N. Kulkarni Department of Computer Science, MIT Art Design Technology University, Loni Kalbhor, India N. Kulkarni e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_33
the optic disk. Most detection systems in the modern era can be divided into two groups: one-stage detectors and two-stage detectors. Among one-stage detectors, the recent work on RetinaNet [1] has demonstrated positive results, whereas among two-stage detectors the R-CNN family [2] stands out, followed by Faster R-CNN [3] and Mask R-CNN [4]. A brain tumor, also known as an intracranial tumor, is an abnormal mass of tissue in which cells grow and multiply uncontrollably, unchecked by the mechanisms that control normal cells. Brain tumor segmentation is largely carried out by neurologists and can be compromised in the hands of an inexperienced person, as it is a crucial, complex, and important task in the field of medical imagery; errors result in inaccurate prediction and diagnosis. The similarity between healthy tissue and tumor makes it a demanding job to extract the tumor region from images. According to the statistics presented in [5], brain and nervous system cancer is the 10th leading cause of death. The survival rate for patients varies from 32% to 36%; gender-based statistics show a 34% survival rate for males and 36% for females. As per the WHO, 400,000 people are suffering from brain tumors, and a whopping 120,000 have lost their lives to this illness in previous years [6]. For intracranial segmentation, Devkota et al. [7] proposed a method based on mathematical morphological operations and the spatial FCM algorithm for whole-brain segmentation, which optimizes computational power and reduces computation time, but the result is not fully satisfactory, as the accuracy of cancer detection is 92%, whereas 86.6% accuracy is noted for the classifier. Yantao et al. [8] reconfigured the histogram-based segmentation technique; abnormal regions were detected using a region-based active contour model, resulting in a Dice coefficient of 73.6% and a sensitivity of 90.3%. Badran et al. [9] focused on edge detection combined with adaptive thresholding in order to detect and segment the target feature from medical imagery. To enhance texture-based tumor segmentation in MRI, Pei et al. [10] proposed a method that uses tumor growth patterns as novel features. Another model was proposed by Dina et al. [11] with a modification of the probabilistic neural network model, which is similar to learning vector quantization; the authors obtained a 79% reduction in processing time due to the modified PNN method. Othman et al. [12] applied a probabilistic neural network-based segmentation technique, showing key improvements in feature extraction and dimensionality reduction; principal component analysis played a vital role in determining the results. The training and testing datasets included 20 and 15 subjects, respectively, and the accuracy ranged from 73% to 100% depending on the spread value. Based on region fuzzy clustering and a deformable model, Rajendran et al. [13] obtained 95.3% and 82.1% for the ASM and Jaccard index. A LinkNet network was proposed by Zahra et al. [14]; it was used for tumor segmentation and achieved Dice scores of 0.73 for a single network and 0.79 for multiple networks. Therefore, early detection of intracranial tumors would play a significant role in decreasing the fatality rate and would offer better treatment to patients. But
manual segmentation is an arduous, complex, and demanding task, as a huge number of MRI images are generated in a hospital. An automated anomaly detection system would therefore be a boon for many people. It was observed that object detection models only generate bounding boxes over the target object, whereas segmentation models generate the bounding box as well as a mask of the target object, but often with limited accuracy. There was thus a need for a highly accurate segmentation system specific to the medical domain. Although deep learning shows positive results in terms of accuracy, these algorithms depend heavily on high-end machines, whereas traditional image processing and machine learning algorithms are optimized enough to work on low-power machines. The reason is the large number of matrix multiplication operations inherent in deep learning algorithms. Optimized models that can run on low-end devices are a need of the modern deep learning era.
2 Proposed System The paper proposes a custom pipeline called PairNet: a deep learning-based object detection and segmentation pipeline especially designed for medical images. PairNet comprises various steps for detecting the target feature in a given image. It mainly utilizes two deep learning-based approaches, one for object detection and one for object segmentation. The process of finding the region of interest (ROI) within an image is called object detection. The result is an H×W bounding box around the desired object inside the image, where H and W are the height and width of the bounding box, respectively, which can be used for further analysis of the target feature. However, a bounding box is not sufficient to describe the shape of the object, as bounding boxes are rectangular or square. Here, object segmentation comes into play. Object segmentation categorizes each pixel of an image into a particular class and creates a pixel mask covering the area of the object. As a result, this technique is of immense use in medical studies, where the granular details of the object under study are of the utmost importance. PairNet comprises four units, namely:
(1) Preprocessing unit
(2) Object detection unit
(3) Cropping unit
(4) Object segmentation unit
Figure 1 depicts the steps involved in the PairNet pipeline for processing of the input image.
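Read as a whole, the four units form a simple sequential flow. The sketch below only illustrates that flow under stated assumptions: the function names are hypothetical, the detector is replaced by a fixed placeholder box, and the segmentation step is replaced by a simple threshold; none of this is PairNet's actual implementation.

```python
import cv2

def preprocess(path):
    """Preprocessing unit: read any common format and convert to grayscale."""
    return cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)

def detect_roi(gray):
    """Object detection unit (Reinforced YOLOv4-tiny): return a bounding box (x, y, w, h).
    A fixed central box is returned here only as a placeholder for the detector output."""
    h, w = gray.shape
    return (w // 4, h // 4, w // 2, h // 2)

def crop_roi(gray, box):
    """Cropping unit: cut out the detected region of interest."""
    x, y, w, h = box
    return gray[y:y + h, x:x + w]

def segment(roi):
    """Object segmentation unit (Mask R-CNN-lite): return a pixel mask of the target.
    A trivial Otsu threshold stands in for the real segmentation model."""
    _, mask = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask

def pairnet_pipeline(path):
    gray = preprocess(path)
    box = detect_roi(gray)
    roi = crop_roi(gray, box)
    return box, segment(roi)
```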
Fig. 1 General system diagram of PairNet
3 Methodology 3.1 Datasets The following publicly available standard datasets are used for the research study:
• Fundus image datasets:
(1) MESSIDOR-DB [15]: It contains three individual datasets, each consisting of 400 images, with resolutions of 1440 × 960, 2240 × 1488, and 2304 × 1536 pixels, respectively.
(2) DIARET-DB0 [16]: It has a total of 130 images.
(3) DIARET-DB1 [17]: It consists of 89 images.
(4) DRISHTI-DB [18, 19]: It consists of 101 high-resolution images.
(5) DRIONS-DB [20]: It has 110 images with a resolution of 600 × 400.
(6) DRIVE-DB [21]: It consists of 40 images with a resolution of 584 × 565.
The MESSIDOR-DB consisting of 1200 images is utilized for training the object detection unit as it consists of high-resolution images as well as a substantially greater number of images compared to other datasets. As the MESSIDOR-DB comprises of three individual datasets each consisting of 400 images, one of these individual three datasets has been used to train the object segmentation unit. • Brain Tumor dataset: Dataset consists of the 155 brain MRI images which are shared publicly by Mr. Navoneel Chakrabarty on the Kaggle platform for the research and evaluation purposes under the original author’s license [22]. Out of these 155 images,
five images are ambiguous; hence, 150 images were considered for the experimental dataset. 80% of the images (i.e., 120 images) are used for training PairNet, and 20% of the images (i.e., 30 images) are used for testing.
3.2 PairNet Pipeline The proposed PairNet pipeline is designed for the detection and segmentation of target features in medical images. The pipeline processes the input through four different units, namely the preprocessing unit, object detection unit, cropping unit, and object segmentation unit. The object detection unit and the object segmentation unit utilize the state-of-the-art deep learning techniques Reinforced YOLOv4-tiny and Mask R-CNN-lite. The step-by-step process in the pipeline architecture helps to localize the target feature with ease and thus improves the accuracy of segmentation of the object. The functioning of the various modules in PairNet can be understood as follows. Preprocessing Unit: The system can accept images in any format such as jpg, png, etc. The image is converted into grayscale to remove the red, green, and blue (RGB) channels. This helps to eliminate color variance from the image, so the model concentrates on other aspects of the image rather than color, and it makes further processing less resource hungry. A Python program is developed that converts the image into grayscale and converts all file extensions to '.jpg'. Object Detection Unit: A preprocessed optic-disk image is provided by the preceding preprocessing unit in the pipeline to the ROI module. This preprocessed grayscale image helps to overcome the color variance of retinal fundus images across the various datasets, which in turn helps to create and train a robust and generalized object detection model without a color bias. The ROI calculation module utilizes the proposed object detection model, which detects and draws a bounding box on the region of interest. This localizes the target feature within the whole image and thus makes the process of target feature segmentation simpler and more accurate. Cropping of ROI: The proposed object detection model detects the area of interest, i.e. where exactly the optic disk is situated in the image, and saves it in a JSON file. For cropping, we utilize this detection information: the coordinates stored in the JSON file define the ROI to be cropped. The cropped image is stored on disk and propagated to the next module. Object Segmentation Unit: The cropped ROI of the optic disk is then fed into Mask R-CNN-lite. The FPN identifies the circular shape as the feature of the optic disk by applying various filters over the image. These feature maps are then passed to the ROIAlign layer, which is responsible for normalizing the differently sized ROIs of the optic disk into the same size. These ROIs and features are wrapped into feature vectors and further passed to the FCN and masking layers. The FCNs generate
the bounding boxes over the optic disks. Simultaneously, the masking layers generate the pixel-wise masks, covering the circular shapes of the optic disks.
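A small sketch of the grayscale/.jpg normalization and the JSON-driven cropping described above is shown next. The JSON field names (x, y, w, h) are assumptions; the paper only states that the detector's coordinates are stored in a JSON file.

```python
import json
import os
import cv2

def to_grayscale_jpg(src_path, dst_dir):
    """Preprocessing unit: convert any input image to grayscale and save it as .jpg."""
    gray = cv2.cvtColor(cv2.imread(src_path), cv2.COLOR_BGR2GRAY)
    name = os.path.splitext(os.path.basename(src_path))[0] + ".jpg"
    out_path = os.path.join(dst_dir, name)
    cv2.imwrite(out_path, gray)
    return out_path

def crop_from_json(image_path, json_path):
    """Cropping unit: crop the ROI using the bounding box stored by the detector."""
    with open(json_path) as f:
        box = json.load(f)            # e.g. {"x": 120, "y": 80, "w": 96, "h": 96} (assumed fields)
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    x, y, w, h = box["x"], box["y"], box["w"], box["h"]
    roi = img[y:y + h, x:x + w]
    cv2.imwrite(image_path.replace(".jpg", "_roi.jpg"), roi)
    return roi
```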
3.3 Proposed architectures optimized for low computational resource usage The PairNet pipeline is specifically built and optimized for low-powered devices such as CPU-only devices, IoT devices, mobile phones, etc., with significantly lower computational resource usage and faster processing speed. It consists of newly proposed custom architectures for object detection as well as for object segmentation: YOLOv4-tiny [23] and Mask R-CNN [24] have been altered and optimized to work together to detect and segment the target object in medical images. The proposed architectures for object detection and object segmentation are: • Reinforced YOLOv4-tiny: The object detection module utilizes the proposed Reinforced YOLOv4-tiny model, based on the YOLOv4-tiny model, for target localization. YOLOv4-tiny is an architecture based on the YOLOv4 [23] architecture and focused on faster computation and processing speed. YOLOv4 utilizes the CSPDarknet53 backbone for feature extraction. This makes it a highly accurate state-of-the-art object detection algorithm for detecting both large and small objects, but it requires significantly more computational resources and has a slower prediction speed. YOLOv4-tiny, depicted in Fig. 2, utilizes the CSPDarknet53-tiny architecture as the backbone for feature extraction. Here, the CSP block separates the feature map into two parts and merges them again through a cross-stage residual edge. To minimize computational resource usage, the YOLOv4-tiny architecture utilizes the leaky ReLU activation function rather than Mish as in YOLOv4 [25]. Subsequently, the FPN helps to extract feature maps at two different scales and enhances the object detection speed. The fully connected layers are then fed to the final YOLO layers for bounding box prediction. The current architecture is fast but takes a hit on prediction accuracy. To improve upon this, the proposed Reinforced YOLOv4-tiny architecture depicted in Fig. 3 comprises three more convolutional neural network (CNN) layers. These help to extract more information and features from the image and thus provide a better feature map to the next layers. This improves the accuracy of the architecture with minimal overhead while remaining extremely fast and significantly faster than the YOLOv4 architecture, making it optimized for low-powered devices. Among the three CNN layers, a single layer has been added before the FPN, which helps to add more information: as the FPN is connected to both feature scales, this placement propagates the extra dense information to both feature scales and thus contributes to the predictions of both YOLO layers.
Fig. 2 YOLOv4-tiny architecture [25]
Fig. 3 Proposed Reinforced YOLOv4-tiny architecture
This added layer has a filter size of 128 and a stride value of 1. Subsequently, two more layers have been added to the upper 13×13 feature scale (considering an input image size of 416×416), which further allows more information to be extracted from the image before the prediction from the upper 13×13 YOLO layer. One of the CNN
Fig. 4 Proposed Mask R-CNN-lite architecture [27]
layers has a filter size of 128 and a stride value of 1, and the other layer has a filter size of 512 and a stride value of 1. • Mask R-CNN-lite: Mask R-CNN supports ResNet 50 and ResNet 101 (Fig. 4). ResNet stands for Residual Network and forms the backbone of the Mask R-CNN-lite model; it is responsible for extracting the target object from the image. ResNet 50 denotes a ResNet consisting of 50 convolutional layers, whereas ResNet 101 denotes a ResNet with 101 convolutional layers [26]. PairNet's object segmentation module was first developed with ResNet 101. However, the repeated region-of-interest calculations by the object detection module and the object segmentation module were compounding the processing time. Moreover, 101 convolutional layers are not necessarily needed to compute the ROI from an already computed ROI of reduced size. On the other hand, 50 convolutional layers are not enough to identify ROIs of complex target objects in the cropped image. Hence, the paper proposes a ResNet 90 architecture, which is used as the backbone of the proposed Mask R-CNN-lite. The reduction in the number of CNN layers in ResNet results in an architecture that uses fewer computational resources and thus has faster processing speed. This characteristic of the proposed Mask R-CNN-lite model makes it optimized for low-powered devices. Once the ROIs are computed by ResNet 90, they are passed to the feature pyramid network (FPN). The FPN identifies distinctive features of the target object and generates feature maps. These feature maps are further passed to the region proposal network (RPN). The RPN utilizes ROI Pooling and ROIAlign as in Mask R-CNN, where all the differently sized ROIs are normalized to the same size; converting the ROIs to the same size makes further computations easier for the model. These new ROIs are wrapped into feature vectors and are then passed to
Fig. 5 Proposed Mask R-CNN-lite architecture
fully connected network (FCN) and masking layers. The FCNs generate bounding boxes over the target object based on the different classes of the object. The masking layers have two convolutional neural networks (CNNs) which are responsible for the generation of pixel-wise masks over the object [28, 29]. Figure 5 depicts the proposed architecture of Mask R-CNN-lite.
4 Experiment Results Testing of the models was carried out on the following hardware and software: CPU: Intel Xeon @ 2.30 GHz; GPU: NVIDIA Tesla T4; CUDA version: 11.0.221; Pillow version: 8.0.1. The proposed pipeline has been tested on two different problem statements, which are discussed as follows: • Optic disk detection and segmentation: The publicly available MESSIDOR-DB dataset has been used for training the deep learning-based modules in the pipeline. Furthermore, DRIVE-DB, DIARET-DB0, DIARET-DB1, DRIONS-DB, and DRISHTI-GS have been used for validation of the pipeline. Figure 6 contains some sample images from the above-mentioned datasets. Table 1 presents the mean average precision (mAP) comparison of the proposed Reinforced YOLOv4-tiny model with the existing YOLO models on the different datasets. It can be inferred from the comparison that the proposed model performs
Fig. 6 PairNet pipeline in action
Table 1 Comparison of proposed detection model with existing models
Models                   DRIVE   DIARET-DB1   DIARET-DB0   DRISHTI   DRIONS   BFLOPS
Reinforced YOLOv4-tiny   99.94   100          99.19        100       100      7.59
YOLOv4                   97.5    100          98.46        100       100      59.36
YOLOv4-tiny              99.82   99.91        97.59        98.81     100      6.73
The bold data in the table corresponds to the proposed model whereas the rest correspond to existing models.
better than or similar to the existing models on the various datasets. This depicts the generalization and robustness of the model on unseen data. Also, comparing the binary floating-point operations (BFLOPS) utilized by the respective models, it is seen that YOLOv4-tiny consumes only about 10% of the computational resources of YOLOv4 while having a better mAP, which makes YOLOv4-tiny favorable for low-powered devices. The proposed Reinforced YOLOv4-tiny consumes 0.86 BFLOPS more than YOLOv4-tiny, a negligible increase in computational resource usage, but has a significantly better mAP score on the given datasets. When comparing the proposed detection model with the existing systems in Table 2, the model performs better than or on par with the existing systems. It can be inferred that other systems tend to perform well on some datasets but fall short on others, whereas the proposed detection model performs excellently across the various datasets. This
Table 2 Comparison of proposed detection model with existing models
Systems                   DRIVE   DIARET-DB1   DIARET-DB0   DRISHTI-GS   DRIONS-DB
Reinforced YOLOv4-tiny    99.94   100          99.19        100          100
Mahfouz and Fahmy [30]    100     97.75        98.46        -            -
Ramakanth and Babu [31]   100     98.88        98.46        -            -
Sinha and Babu [32]       95      100          96.9         -            -
Lu et al. [33]            97.5    98.9         99.2         -            -
Xiong and Li [34]         100     97.8         99.2         -            -
Qureshi et al. [35]       100     94.02        96.79        -            -
Unver et al. [36]         100     98.88        96.92        -            -
Johannes Dietter [37]     100     98.8         97.8         -            -
The bold data in the table corresponds to the proposed model whereas the rest correspond to existing models
again indicates the generalization and robustness of the proposed model compared to the existing systems. Comparing the proposed segmentation model with the existing model in Table 3, the proposed Mask R-CNN-lite performs exceptionally well, with higher speed and lower computational resource usage despite having considerably fewer layers than the archetype Mask R-CNN, and it gives comparable accuracy on three of the four datasets. This indicates the generalization, optimization, and robustness of the proposed model compared to the archetype Mask R-CNN, which uses ResNet 50 or ResNet 101. Note that Mask R-CNN and Mask R-CNN-lite, used for obtaining the object segmentation results of PairNet, compute their results on cropped images, whereas other researchers have computed their results on full retinal fundus images. Cropped images have fewer features but denser information about the target feature compared to the original images from the datasets. This means the model computes results over a smaller number of pixels and features, yielding better results and processing time. This makes direct comparison of the Mask R-CNN and Mask R-CNN-lite results with the results of other researchers unfair.
Table 3 Comparison of proposed segmentation model with the existing model
Models            DRIVE   DRISHTI   DIARET-DB1   DRIONS
Mask R-CNN-lite   100     100       97.7         100
Mask R-CNN        100     100       100          100
The bold data in the table corresponds to the proposed model whereas the rest correspond to existing models
Table 4 Respective mAPs of the modules in the pipeline
Unit                  Model                    Number of images   mAP
Object detection      Reinforced YOLOv4-tiny   30                 75.89
                      YOLOv4-tiny                                 76.44
Object segmentation   Mask R-CNN-lite          30                 86.66
                      Mask R-CNN                                  33.33
The bold data in the table corresponds to the proposed model whereas the rest correspond to existing models
• Brain tumor detection and segmentation: Mr. Navoneel Chakraborty shared the brain MRI image dataset publicly on the Kaggle platform for research and evaluation purposes. The dataset consists of 155 images; of these, five images are ambiguous, so 150 images were chosen as the total dataset. 80% of the images (i.e., 120 images) are used for training PairNet and 20% of the images (i.e., 30 images) are used for testing. Table 4 gives the calculated mAP for both the object detection and object segmentation modules of PairNet. The table also compares the mAP of PairNet's Reinforced YOLOv4-tiny with the original YOLOv4-tiny, and of Mask R-CNN-lite with the archetype Mask R-CNN. Reinforced YOLOv4-tiny has accuracy similar to YOLOv4-tiny, whereas Mask R-CNN-lite performs exceptionally well compared to Mask R-CNN, considering that Mask R-CNN-lite consumes less computational resources and has faster processing speed.
5 Conclusion The paper presents a deep learning-based pipeline system for detection and segmentation optimized for low-powered devices. The pipeline utilizes deep learning-based object detection and object segmentation modules for finding the region of interest and segmenting the target feature. The paper proposes two architectures, namely Reinforced YOLOv4-tiny and Mask R-CNN-lite, based on YOLOv4-tiny and Mask R-CNN, respectively. The results discussed in the paper show that the proposed models have better or on-par mAP scores compared to the archetype architectures and other previous systems. The proposed models also show excellent generalization and robust performance on unseen data. They have significantly lower computational resource usage, making them suitable for low-powered and IoT devices and therefore improving accessibility.
6 Future Scope Model accuracy can be increased with the help of more datasets and advanced training resources. In the future, many more anomalies can be detected by utilizing the pipeline. Depending upon the specific use cases, more preprocessing and postprocessing modules can be added to the pipeline. The computational resource usage can be further reduced to make the pipeline more accessible, and its processing speed can also be improved.
References 1. Lin T, Goyal P, Girshick R, He K, Dollar P (2017) Focal loss for dense object detection. In: ICCV, pp 2980–2988 2. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: CVPR, pp 580–587 3. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real- time object detection with region proposal networks. In NIPS, pp 91–99 4. He K, Gkioxari G, Dollar P, Girshick R (2017) Mask R-CNN. In: ICCV. IEEE, pp 2980–2988 5. Brain tumor: statistics, Cancer.Net Editorial Board, 11/2017. Accessed on 17th Jan 2019 6. Rajasekaran KA, Chinna Gounder C (2018) Advanced Brain Tumour Segmentation from MRI Images. In (Ed.), High-Resolution Neuroimaging - Basic Physical Principles and Clinical Applications. IntechOpen.https://doi.org/10.5772/intechopen.71416 7. Devkota B, Alsadoon A, Prasad PWC, Singh AK, Elchouemi A (2017) Image segmentation for early-stage brain tumor detection using mathematical morphological reconstruction. In: 6th international conference on smart computing and communications, ICSCC 2017, 7–8 December 2017, Kurukshetra, India 8. Song Y, Ji Z, Sun Q, Zheng Y (2016) A novel brain tumor segmentation from multi-modality MRI via a level-set-based model. J Signal Process Syst 87. https://doi.org/10.1007/s11265016-1188-4 9. Badran EF, Mahmoud EG, Hamdy N (2010) An algorithm for detecting brain tumors in MRI images. The 2010 International Conference on Computer Engineering & Systems, 368-373 10. Pei L, Reza S, Li W, Davatzikos C, Iftekharuddin KM (2017) Improved brain tumor segmentation by utilizing tumor growth model in longitudinal brain MRI. Proceedings of SPIE-the International Society for Optical Engineering, 10134, 101342L. https://doi.org/10.1117/12.225 4034 11. Dahab DA, Ghoniemy SSA, Selim GM (2012) Automated brain tumor detection and identification using image processing and probabilistic neural network technique. Int J Image Process Vis Commun 1(2):1–8 12. Othman MF, Ariffanan M, Basri M (2011) Probabilistic neural network for brain tumor classification. In: 2nd international conference on intelligent systems, modelling and simulation 13. Rajendran, A. & Raghavan, Dhanasekaran (2012) Fuzzy Clustering and Deformable Model for Tumor Segmentation on MRI Brain Image: A Combined Approach. Procedia Engineering. 30. 327-333. https://doi.org/10.1016/j.proeng.2012.01.868 14. Sobhaninia Z, Rezaei S, Noroozi A, Ahmadi M, Zarrabi H, Karimi N, Emami A, Samavi S (2018) Brain Tumor Segmentation Using Deep Learning by Type Specific Sorting of Images. ArXiv, abs/1809.07786 15. Decencière E et al (2014) Feedback on a publicly distributed database: the Messidor database. Image Analysis Stereology 33(3):231–234. ISSN 1854-5165 (MESSIDOR) 16. Kauppi T, Kalesnykiene V, Kamarainen J-K, Lensu L, Sorri I, Uusitalo H, Kälviäinen H, Pietilä J, DIARETDB0: evaluation database and methodology for diabetic retinopathy algorithms. Technical Report (PDF) 17. Kauppi T, Kalesnykiene V, Kamarainen J et al (2007) DIARETDB1. In: Proceedings of the British machine vision conference, UK
18. Sivaswamy J, Chakravarty A, Joshi GD, Syed TA (2015) A comprehensive retinal image dataset for the assessment of Glaucoma from the optic nerve head analysis. JSM Biomed Imaging Data Papers 2(1):1004 19. Sivaswamy J, Krishnadas SR, Joshi GD, Jain M, Tabish AU (2014) Drishti-GS: Retinal image dataset for optic nerve head(ONH) segmentation. 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), 53-56 20. Carmona EJ, Rincón M, García-Feijoo J, Martínez-de-la- Casa JM (2008) Identification of the optic nerve head with genetic algorithms. Artif Intell Med 43(3):243–259 21. Staal JJ, Abramoff MD, Niemeijer M, Viergever MA, van Ginneken B (2004) Ridge based vessel segmentation in color images of the retina. IEEE Trans Med Imaging 23(4):501–509 (DRIVE) 22. Chakrabarty N (2019, April) Brain MRI images for brain tumor detection, Version 1. March 2021 from https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection 23. Bochkovskiy A, Wang C, Liao HM (2020) YOLOv4: Optimal Speed and Accuracy of Object Detection. ArXiv, abs/2004.10934 24. He K, Gkioxari G, Dollar P, Girshick R (2020) Mask R-CNN. IEEE Trans Pattern Anal Mach Intell 42(2):386–397. https://doi.org/10.1109/tpami.2018.2844175 PMID: 29994331 25. Jiang Z, Zhao L, Li S, Jia Y (2020) Real-time object detection method based on improved YOLOv4-tiny. ArXiv, abs/2011.04244 26. Sharma P (2019) Computer vision tutorial: implementing mask R-CNN for image segmentation (with python code). https://www.analyticsvidhya.com/blog/2019/07/computer-vision-implem enting-mask-r-cnn-image-segmentation/ 27. Hui J (2018) Image segmentation with mask RCNN. https://jonathan-hui.medium.com/imagesegmentation-with-mask-r-cnn-ebe6d793272 28. Khandelwal R (2019) Computer vision: instance segmentation with mask R-CNN. https:// towardsdatascience.com/computer-vision-instance-segmentation-with-mask-r-cnn-798350 2fcad1 29. Zhang X (2018) Simple understanding of mask RCNN. https://medium.com/@alittlepain833/ simple-understanding-of-mask-rcnn-134b5b330e95 30. Mahfouz AE, Fahmy AS (2009) Ultrafast localization of the optic disc using dimensionality reduction of the search space. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp 119. https:// doi.org/10.1007/978-3-642-04271-3 31. Ramakanth SA, Babu RV (2014) Approximate nearest neighbor field based optic disk detection. Comput Med Imaging Graph 38:49–56. https://doi.org/10.1016/j.compmedimag.2013.10.007 32. Sinha N, Babu RV (2012) Optic disk localization using L1 minimization. In: 2012 19th IEEE Interntional conference on image process, IEEE 2829–2832. https://doi.org/10.1109/ ICIP.2012.6467488 33. Lu S, Lim JH (2011) Automatic optic disc detection from retinal images by a line operator. IEEE Trans Biomed Eng 58:88–94. https://doi.org/10.1109/TBME.2010.2086455 34. Xiong L, Li H (2016) An approach to locate optic disc in retinal images with pathological changes. Comput Med Imaging Graph 47:40–50. https://doi.org/10.1016/j.compmedimag. 2015.10.003 35. Qureshi RJ, Kovacs L, Harangi B, Nagy B, Peto T, Hajdu A (2012) Combining algorithms for automatic detection of optic disc and macula in fundus images. Comput Vis Image Underst 116(1):138–145. https://doi.org/10.1016/j.cviu.2011.09.001 36. Ünver H, Kökver Y, Duman E, Erdem O (2019) Statistical edge detection and circular hough transform for optic disk localization. Appl Sci 9:350. https://doi.org/10.3390/app9020350 37. 
Dietter J, Haq W, Ivanov IV, Norrenberg LA, Völker M, Dynowski M, Röck D, Ziemssen F, Leitritz MA, Ueffing M (2019) Optic disc detection in the presence of strong technical artifacts. Biomed Signal Process Control 53:101535. ISSN17468094. https://doi.org/10.1016/ j.bspc.2019.04.012 https://www.sciencedirect.com/science/article/pii/S1746809419301090
Chapter 34
Analysis of Electrocardiogram Signal Using Fuzzy Inference Evaluation System J. S. Karnewar and V. K. Shandilya
1 Introduction Electrocardiogram (ECG) signals are crucial components of cardiovascular disease (CVD) detection frameworks. In ECG signal analysis and classification, performance evaluation is a scientific and systematic research problem. In machine learning models for ECG signals, performance metrics based on error estimators such as the root mean squared error (RMSE) and/or the mean absolute error (MAE) are treated as the standard way of characterizing how well observed data are reproduced on test data [1]. RMSE, in particular, is most often chosen as the calibre metric for assessing the performance of ECG signal error-removal models. The World Health Organization (WHO) states that cardiovascular diseases make a substantial contribution to global mortality, with approximately 17.9 million people dying from CVDs each year, an estimated 31% of all deaths worldwide [2–5]. The key motivation behind this work is to improve the automated minimization of RMSE in ECG signal processing using a fuzzy inference evaluation model, thereby achieving a robust and reliable system. Manual interpretation and diagnosis of ECG signals affected by threatening cardiovascular diseases may not allow a cardiologist to visualize and analyse them comprehensively, so there is a need for a robust and reliable computer-aided model for fast and highly accurate detection of CVDs. Hence, we propose a fuzzy inference evaluation approach for ECG signal analysis that can be beneficial for forming adaptive clusters and removing root mean squared error for accurate prediction of abnormalities in ECG signals.
J. S. Karnewar (B) · V. K. Shandilya Sipna College of Engineering & Technology, Amravati, Amravati (MS), India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_34
Fig. 1 Waveform of a normal ECG [6]
Electrocardiogram (ECG) signals can be obtained by a non-invasive technique, and they are a vital therapeutic agent that can be deployed for intelligent healthcare prediction of various CVDs [6] (Fig. 1). An ECG signal is made up of monotonous repetitions of the pattern "PQRST." First, a crest rises from the baseline to create the "P" wave. The subsequent downward deflection of the descending wave is characterized as the "Q" wave. Just beyond the Q wave, a quick vertical deflection forms a high cone, which is the "R" wave. The minor downward deviation that follows forms the "S" wave. The "T" wave is a conspicuous hump that occurs after the S wave and marks the end of one segment of the ECG signal.
2 Literature Review Kyriakidis et al. [7] discussed the advantages and limitations of 24 error measures from the previous literature. Using the lower and upper bounds of the confidence interval (CI), they studied these 24 metrics in the context of air quality forecasting. On the basis of existing studies, they proposed new forecasting performance indices that can be more confident and reliable in estimating forecasting performance than any single measure. Shcherbakov et al. [8] surveyed 20 forecast error measures. They grouped the error measures into seven groups, including absolute errors, percentage errors, symmetric errors, relative errors, scaled errors and other errors. They presented the formulas for evaluating the measures, elaborated on the drawbacks of every accuracy measure, and proposed an integral normalized mean square error to reduce the impact of outliers. They recommended selecting multiple error measures, since relying on a single error measure has drawbacks that can mislead the evaluation of results.
Prasath et al. [9] discussed 54 error measures and their effects in a machine learning environment using the K-nearest neighbour (KNN) classifier. Deza et al. [10] compiled various distance metrics from varied knowledge-based domains. Acharya et al. [11] developed an automated diagnostic system for the detection of myocardial infarction. They used three methods, namely the discrete wavelet transform (DWT), empirical mode decomposition (EMD) and discrete cosine transform (DCT), to acquire the respective detail coefficients, and applied the locality preserving projection (LPP) data reduction method to reduce these coefficients. The limitation of this work is that it relied on only a single error measure for detection of the disease. Adam et al. [12] extracted relative wavelet nonlinear features, such as fuzzy entropy, sample entropy, fractal dimension and signal energy, from the DWT coefficients. Zhang et al. [13] suggested a combination of statistical metrics consisting of the standard deviation, skewness, kurtosis, distribution of forecast errors, entropy and RMSE to evaluate the correctness of a system; earlier, they had searched for a comprehensive set of metrics for evaluating the performance of solar power forecasting systems. Tian et al. [14] proposed developing hybrid sets of error metrics as a continuing research strategy, focusing on the informational relationships between metrics so as to avoid redundancy among the metrics compiled into a hybrid set.
3 Database The ECG signals used in this study are from PhysioNet [15], retrieved from the MIT-BIH Arrhythmia database. PhysioNet, funded by the National Institutes of Health (NIH), is a research resource on complex physiologic signals designed to promote and ease research into cardiovascular and other complex biological signals [16]. The database contains 48 records, each holding data recorded from two leads, MLII and V1. The sampling frequency was kept at 360 Hz per channel, and 11-bit resolution was maintained over a 10 mV range. The most frequent reference in this work is MLII, where the ECG morphology is clearly visible; the MLII lead and normal sinus rhythm are used for evaluation. Each file includes two-channel ECG signals of 30 min duration, chosen from 24-h recordings of 48 distinct people.
4 Preprocessing Preprocessing of ECG signals is crucial, as ECG signals generally contain various types of drift, called noise, as well as various types of artifacts [17]. During the preprocessing step, our main objective is to reduce or remove this noise so that we obtain a properly de-noised signal, which helps in locating the fiducial points (P, Q, R, S, T) and their event and non-event phenomena such as the P wave, QRS complex, T wave, PQ segment and ST segment.
Typical noise categories include powerline interference, baseline wander, noisy electrode contact, electrode motion artifacts, muscle contraction and instrumentation noise.
4.1 Removal of Powerline Interference Defining the filter through a suitably placed pair of zeros is a simple approach to powerline interference reduction. The second-order notch filter has the transfer function

H(z) = (1 - z_1 z^{-1})(1 - z_2 z^{-1}) = 1 - 2\cos(\omega_0)\, z^{-1} + z^{-2}

Here the notch filter has a polynomial transfer function in the z-plane, so the z-transform of the samples is taken. Because this filter has a notch with a fairly large bandwidth, it attenuates not only the powerline component but also ECG components whose frequencies lie close to ω_0; the filter therefore has to be adjusted so that the notch becomes narrower. The width of the notch is determined by the pole radius r and decreases as r approaches 1. A practical consequence of this design is that a transient in the input produces a ringing artifact in the output signal; such ringing can mimic the small-amplitude cardiac activity that often exists in the terminal portion of the QRS complex, i.e., the late potentials.
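As a concrete illustration of this step, the sketch below removes powerline interference with SciPy's second-order IIR notch filter. It is only an assumption of how the filter could be implemented (the chapter gives no code); the 50 Hz powerline frequency and the quality factor Q are likewise assumptions, while the 360 Hz sampling rate matches the MIT-BIH records described earlier.

```python
# Hedged sketch: notch filtering of an ECG trace sampled at 360 Hz.
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 360.0   # sampling frequency of the MIT-BIH Arrhythmia records (Hz)
f0 = 50.0    # assumed powerline frequency (use 60.0 where applicable)
Q = 30.0     # assumed quality factor: larger Q -> narrower notch

b, a = iirnotch(f0, Q, fs)  # second-order notch with zeros at +/- f0 on the unit circle

def remove_powerline(ecg: np.ndarray) -> np.ndarray:
    """Filter forward and backward to avoid phase distortion."""
    return filtfilt(b, a, ecg)

# toy demonstration: a slow wave contaminated with 50 Hz interference
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.0 * t) + 0.2 * np.sin(2 * np.pi * f0 * t)
clean = remove_powerline(ecg)
```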
4.2 Removal of Baseline Wander An FIR filter has an impulse response of finite duration because its response settles to zero in finite time. The FIR filter equation, as stated by Trimale and Chilveri [18], is

Y(n) = \sum_{k=0}^{N-1} h(k)\, X(n-k)

where X(n) represents the input sample, Y(n) represents the output sample, h(k) represents the filter coefficients, and N indicates the filter order. Each output value is the weighted sum of the most recent input values.
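A minimal sketch of such an FIR filter applied to baseline wander is shown below, using a high-pass design from SciPy. The 0.5 Hz cutoff and the 501-tap length are assumptions made for illustration; the chapter does not specify its design parameters.

```python
# Hedged sketch: FIR high-pass filtering to suppress baseline wander.
import numpy as np
from scipy.signal import firwin, lfilter

fs = 360.0        # sampling frequency (Hz)
numtaps = 501     # assumed filter length (odd, so a type-I high-pass is realizable)
h = firwin(numtaps, cutoff=0.5, fs=fs, pass_zero=False)  # coefficients h(k)

def remove_baseline_wander(ecg: np.ndarray) -> np.ndarray:
    # Direct-form FIR: y(n) = sum_{k=0}^{N-1} h(k) * x(n - k)
    return lfilter(h, [1.0], ecg)
```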
4.3 Minimizing EMG Noise EMG disturbance introduces higher-frequency distortion; thus, an n-point moving-average (MA) filter can be employed to eliminate, or at least lessen, the EMG interference in the ECG recordings. The MA filter has the form

p(t) = \sum_{b=0}^{N} U_b\, q(t-b)

where q and p denote the filter input and output, respectively. The quantities U_b, b = 0, 1, ..., N, with N the filter order, are the filter parameters (tap values). Choosing every filter parameter equal to 1/(N+1) makes the output the average over the sample size used, i.e., over the most recent N + 1 input samples.
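The sketch below is a minimal NumPy version of this equal-weight moving-average smoother; the window length is an assumption, not a value given in the chapter.

```python
# Hedged sketch: (N+1)-point moving-average filter with coefficients 1/(N+1).
import numpy as np

def moving_average(x: np.ndarray, order: int = 7) -> np.ndarray:
    """Equal-weight MA filter of the given order (order + 1 taps)."""
    taps = np.full(order + 1, 1.0 / (order + 1))  # U_b = 1/(N+1) for b = 0..N
    return np.convolve(x, taps, mode="same")
```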
4.4 Segmenting ECG Signals The signal is decomposed with a filter bank in which the detail coefficients are taken from the high-pass filter and the approximation coefficients from the low-pass filter [19]. We used the Daubechies-6 wavelet function and decomposed the ECG signals to seven levels in this study [20]; the block diagram of the filter analysis is given in Fig. 2, and Fig. 3 depicts the decomposition of a typical normal ECG waveform to seven levels using the DWT. As a result, analysis can be done using distinct features extracted from these levels (Fig. 3). Using Nyquist's rule, we can discard half of the samples at each stage because half of the signal's frequency band has been removed [11]:

y_{low}[n] = \sum_{k=-\infty}^{\infty} x[k]\, g[2n-k]

y_{high}[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[2n-k]

Here, the subsampling is written with the subsampling operator (y \downarrow k)[n] = y[kn].

Fig. 2 Block diagram of filter analysis
Fig. 3 DWT decomposition of normal ECG signals up to seven levels
The above summation equations can be written more concisely as

y_{low} = (x * g) \downarrow 2, \quad y_{high} = (x * h) \downarrow 2

We use the percentage cross-correlation formula [11] to assess the correlation coefficient between each decomposed (reconstructed) signal and the unprocessed ECG signal in order to select the best coefficients:

c = 100 \cdot \frac{\sum_{i=1}^{N} x(i)\, y(i)}{\sqrt{\sum_{i=1}^{N} x^{2}(i) \cdot \sum_{i=1}^{N} y^{2}(i)}}

where x denotes the original ECG signal and y denotes the ECG signal reconstructed from the detail coefficients.
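The following sketch shows one way this decomposition and selection could be carried out with the PyWavelets library; the library choice is an assumption, since the chapter does not provide an implementation, and the selection criterion follows the cross-correlation formula above.

```python
# Hedged sketch: 7-level 'db6' DWT of an ECG and selection of the detail
# level whose reconstruction has the highest percent cross-correlation c.
import numpy as np
import pywt

def best_detail_level(ecg: np.ndarray, wavelet: str = "db6", level: int = 7):
    coeffs = pywt.wavedec(ecg, wavelet, level=level)  # [cA7, cD7, cD6, ..., cD1]
    best_lvl, best_c = None, -np.inf
    for lvl in range(1, level + 1):
        kept = [np.zeros_like(c) for c in coeffs]     # zero all coefficients ...
        kept[-lvl] = coeffs[-lvl]                     # ... except detail level `lvl`
        y = pywt.waverec(kept, wavelet)[: len(ecg)]   # reconstruct from that level only
        c = 100.0 * np.sum(ecg * y) / np.sqrt(np.sum(ecg**2) * np.sum(y**2))
        if c > best_c:
            best_lvl, best_c = lvl, c
    return best_lvl, best_c
```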
5 Features Extraction In this work, the morphological features of the ECG signal are extracted with respect to duration (in ms) and amplitude (in mV) on a two-dimensional axis plane. The extracted features correspond to ECG event and non-event phenomena such as the P wave, T wave, R-peak value, QRS duration, ST duration, ST amplitude, PQ duration and PQ amplitude.
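As an illustration of extracting one of these features, the sketch below locates R-peaks and their amplitudes with SciPy's peak finder. The height and spacing thresholds are assumptions made for the example, not values used by the authors.

```python
# Hedged sketch: R-peak detection on a de-noised ECG sampled at 360 Hz.
import numpy as np
from scipy.signal import find_peaks

def r_peaks(ecg: np.ndarray, fs: float = 360.0):
    """Return R-peak sample indices and amplitudes of a de-noised ECG."""
    # assume peaks are at least 0.4 s apart and at least 60% of the maximum amplitude
    idx, props = find_peaks(ecg, distance=int(0.4 * fs), height=0.6 * float(np.max(ecg)))
    return idx, props["peak_heights"]
```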
6 Methodology ECG signal analysis requires forming a historical rule base by learning the rules of the supervised system. In this work, the historical rule base is formed by learning the "If–Then" rules of a fuzzy inference system (FIS) belonging to the Takagi–Sugeno fuzzy model. Here, the historical rule base comprises the classification of all ECG signals under assigned labels [21] (Fig. 4).
6.1 Fuzzifier The fuzzifier uses the membership functions stored in the fuzzy knowledge base to transform a crisp input into a linguistic variable.
Fig. 4 Architecture of fuzzy inference systems [21]
The crisp input consists of a set of numerical values, such as ECG signals from various disorders. A linguistic variable is one that is made up of linguistic terms rather than numerical values. A membership function is a curve that describes the relationship between input values and membership degrees ranging from 0 to 1.
6.2 Inference Engine The inference engine converts fuzzy input to fuzzy output by applying the "If–Then" fuzzy rules. The fuzzy output produced by the inference engine after applying the fuzzy rules consists of fuzzy sets (Fig. 5).
Fig. 5 Inference engine fuzzy sets
6.3 Defuzzifier The defuzzifier uses membership functions similar to those employed by the fuzzifier to transform the inference engine's fuzzy output into a crisp output.
6.4 Fuzzy Evaluation System The fuzzy inference calculation is performed using the evalfis function in MATLAB. The syntax is as follows:

Output = evalfis(Input, Fismat, NumPts)
[Output, IPP, OPP, APP] = evalfis(Input, Fismat, NumPts)

The function computes the output vector of the fuzzy inference system defined by the Fismat structure for the given input values when it is run with only one range variable. We have considered and evaluated the following arguments of evalfis.
Fismat: the FIS structure containing the matrix of ECG signal values to be assessed.
NumPts: the number of sample points at which the membership functions are evaluated over the input or output range; the default value of 101 points is used for this argument.
Output: the M-by-N output matrix, where M is the number of provided input values and N is the number of output variables belonging to the FIS.
IPP: the result of evaluating the input through the membership functions; this matrix is numRules-by-N in size, where numRules denotes the number of rules and N denotes the total number of input variables.
OPP: this matrix is numPts-by-(numRules*M) in size, where numRules denotes the number of rules and M denotes the total number of outputs.
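To make the evaluation step concrete, the sketch below implements a generic zero-order Takagi–Sugeno inference in plain Python: Gaussian membership functions fire each rule, and the crisp output is the firing-strength-weighted average of the rule consequents. This is only a rough stand-in for what MATLAB's evalfis computes, not its actual implementation, and every rule parameter in the example is a made-up placeholder.

```python
# Hedged sketch: generic zero-order Takagi-Sugeno fuzzy inference.
import numpy as np

def gauss_mf(x, mean, sigma):
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def sugeno_eval(x, rules):
    """x: crisp input vector; rules: list of (means, sigmas, consequent)."""
    weights, outputs = [], []
    for means, sigmas, consequent in rules:
        # rule firing strength = product of per-input memberships (fuzzy AND)
        w = np.prod([gauss_mf(xi, m, s) for xi, m, s in zip(x, means, sigmas)])
        weights.append(w)
        outputs.append(consequent)
    weights = np.asarray(weights)
    return float(np.dot(weights, outputs) / (np.sum(weights) + 1e-12))

# two made-up rules over two hypothetical ECG features
rules = [((-0.2, 0.8), (0.3, 0.2), 0.0),   # rule 1 -> output value 0.0
         (( 0.5, 1.2), (0.3, 0.2), 1.0)]   # rule 2 -> output value 1.0
print(sugeno_eval(np.array([0.1, 1.0]), rules))
```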
7 Result As described in the preceding sections, we have employed the fuzzy inference evaluation system on ECG signals, which are evaluated to reduce errors as shown in Figs. 6, 7 and 8. RMSE is used as the special-purpose error metric for numerical value prediction [22]. The formula is

\mathrm{RMSE} = \sqrt{\frac{1}{k} \sum_{m=1}^{k} (P_m - E_m)^2}
Fig. 6 Adaptive clusters of ECG signals
Fig. 7 ECG signal error ratios
Fig. 8 RMSE curves of ECG signals
where P_m are the predicted values of the variables, E_m are the observations, and k is the number of observations available for analysis. In this work, we have focused on removing RMSE from ECG signals because RMSE is a good performance measure when, as here, we want to estimate the standard deviation (concept drift) of continuous and abruptly changing ECG signals (Table 1). It is observed from Table 1 that RMSE is maximum for an initial step size of 0.02 with 32 iterations and minimum for an initial step size of 0.001 with 74 iterations, where the user-defined radius (range of influence) for subtractive clustering is 0.60 and 0.45, respectively.

Table 1 Training result for FIS

FIS | Initial step size | Training iterations | Training RMSE (%) | Range of radius r
1   | 0.001 | 74 | 1.83 | 0.45
2   | 0.01  | 15 | 6.23 | 0.45
3   | 0.02  | 32 | 7.11 | 0.60
4   | 0.11  | 25 | 2.65 | 0.40
5   | 0.12  | 15 | 4.45 | 0.65
6   | 0.002 | 37 | 4.43 | 0.45
7   | 0.13  | 43 | 3.24 | 0.50
8   | 0.17  | 67 | 2.65 | 0.55
9   | 0.19  | 64 | 2.75 | 0.60
10  | 0.20  | 31 | 4.50 | 0.50
Hence, among all estimators, FIS 1 has the highest accuracy due to the lowest obtained RMSE. It is clear from the evaluation of ECG signals using the fuzzy inference system that high performance in RMSE estimation can be achieved with a minimum initial step size in combination with a maximum number of training iterations; we recommend increasing the training iterations for more accurate results. The RMSE metric is suggested for ECG signals when the signals are represented on a time-amplitude scale with a clear appearance of their event and non-event morphological characteristics and have been systematically passed through the preprocessing procedures.
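For reference, the RMSE defined above can be computed in a few lines of NumPy; the arrays in the example are made-up values.

```python
# Small sketch of the RMSE metric, with P the predicted values and E the observations.
import numpy as np

def rmse(P: np.ndarray, E: np.ndarray) -> float:
    return float(np.sqrt(np.mean((P - E) ** 2)))

print(rmse(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])))  # ~0.1414
```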
8 Conclusion The electrocardiogram (ECG) is a vital, non-invasive therapeutic tool that can be deployed for intelligent healthcare prediction of various cardiovascular diseases. However, because ECG signals are continuous and non-stationary in nature and change rapidly due to noise, motion artifacts, concept drift and interference, manual analysis and interpretation of ECG signals can be a time-consuming and inaccurate effort for medical practitioners. Hence, in this paper, we have focused on dynamically changing streaming ECG signal data and proposed a fuzzy inference evaluation system (FIS) approach for its analysis. The experiments were carried out with adaptive clustering methods on the vital ECG signal of the human body, to which the fuzzy inference evaluation system was applied. The results show that the fuzzy inference evaluation technique is more accurate for clustering and for removing errors in terms of the RMSE metric for ECG signals.
References 1. Neill SP, Reza Hashemi M (2018) Root mean square error (RMSE) in fundamentals of ocean renewable energy, science direct 2. World Health Organization (2021) Cardiovascular diseases (CVD’s). Information year 2021, Retrieved from https://www.who.int/en/news-room/fact- sheets/detail/cardiovasculardiseases-(cvds) 3. Mendis S, Puska P, Norrving B (2011) Global atlas on cardiovascular disease prevention and control, World Health Organization (WHO) in collaboration with the World Heart Federation and the World Stroke Organization, 155p 4. Mathers CD, Loncar D (2006) Projections of global mortality and burden of disease from 2002 to 2030. PLoS Med 3(11):e442. https://doi.org/10.1371/journal.pmed.0030442 5. Bloom DE, Cafiero ET, Jane-Lopis E (2013) The global economic burden of non-communicable diseases. World Economic Forum, Geneva, Switzerland 6. Patil DD, Mudkanna JG, Rokade D, Wadhai VM (2012) Concept adapting real-time data stream mining for health care applications. Journal of Springer, vol 166, pp 341–351. ISSN: 1867-5662 7. Kyriakidis I, Kukkonen J, Karatzas K, Papadourakis G, Ware A (2015) New statistical indices for evaluating model forecasting performance. 2015 Skiathos Island, Greece
8. Shcherbakov MV, Brebels ANL, Tyukov AP, Janovsky TA, Kamaev VAE (2013) A survey of forecast error measures. World Appl Sci J (Inform Technol Mod Indus Educ Soc) 24:171–176. https://doi.org/10.5829/idosi.wasj.2013.24.itmies.80032 9. Prasath VB, Alfeilat HAA, Lasassmeh O, Hassanat A (2017) Distance and similarity measures effect on the performance of K-nearest neighbor classifier-a review 10. Deza MM, Deza E (2016) Encyclopaedia of distances, 4th edn. Springer, Heidelberg 11. Acharya UR, Fujita H et al (2019) Deep convolutional neural network for the automated diagnosis of congestive heart failure using ECG signals. Appl Intell 49:16–27 12. Muhammad A, Oh SL, Sudarshan VK, Koh JEW, Hagiwara Y, Hong TJ, San TR, Rajendra Acharya U (2018) Automated characterization of cardiovascular diseases using relative wavelet nonlinear features extracted from ECG signals. Comput Methods Progr Biomed 161:133–143 13. Zhang J, Florita A, Hodge BM, Lu S, Hamann HF, Banunarayanan V, Brockway AM (2015) A suite of metrics for assessing the performance of solar power forecasting. Sol Energy 111(157– 175):2015 14. Tian Y, Nearing GS, Peters-Lidard CD, Harrison KW, Tang L (2016) Performance metrics, error modeling, and uncertainty quantification. Mon Weather Rev 144(2):607–613 15. Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PCH, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE (2000) Physio bank, physio toolkit, and physio net: components of a new research resource for complex physiologic signals. Circulation 101(23):215–220 16. Huang S, Chuang B, Lin Y, Hung C, Ma H (2019)A congestive heart failure detection system via multi-input deep learning networks. In: 2019 IEEE global communications conference (GLOBECOM), Waikoloa, pp 1–6 17. Minku LL, Yao X (2012) DDD: a new ensemble approach for dealing with concept drift. IEEE Trans Knowl Data Eng 24(4):619–633 18. Trimale M, Chilveri (2017) A review: FIR filter implementation. In: 2nd IEEE international conference on recent trends in electronics information & communication technology, 19–20 May 2017 19. Goswami JC, Chan AK, Fundamentals of wavelets theory, algorithms, and applications. Texas A&M University. Wiley 20. Mahmoodabbadi SZ, Ahmadian A, Abolhasani MD (2005) ECG feature extraction using Daubeches wavelts. In: Proceedings of the fifth IASTED international conference on visualization, imaging, and image proceedings, 7–9 Sept 2005 21. Abadeh MF, Saniee M (2011) A fuzzy classification system based on Ant Colony Optimization for diabetes disease diagnosis. Expert Syst Appl 38(12):4650–4659. ISSN: 0957-4174 22. Porumba M, Iadanzab E, Massaroc S, Pecchiaa L (2020) A convolutional neural network approach to detect congestive heart failure. Sci Direct Biomed Sig Process Control 55:101597 23. Sharma R, Ashish Kumar, Ram Bilas Pachori, Rajendra Acharya U (2019) Accurate automated detection of congestive heart failure using eigenvalue decomposition based features extracted from HRV signals. Bio- Cybernet Bio-Med Eng 39:312–327 24. Zhao Y, Xiong J, Hou Y et al (2020) Early detection of ST-segment elevated myocardial infarction by artificial intelligence with 12-lead electrocardiogram. Int J Cardiol. Accepted 30 Apr 2020 (Article in press), No. of pages 8
Chapter 35
Optical Flow Video Frame Interpolation Based MRI Super-Resolution Suhail Gulzar and Sakshi Arora
1 Introduction Three-dimensional images are vital for certain diagnoses, such as cranial cancer detection and lung cancer detection; thus, higher spatial resolution can offer a better diagnosis. Higher resolution can also help other domains of medical imaging with better and more confident categorization and segmentation. Magnetic resonance imaging (MRI) scans various parts of the human body using the magnetic moment of protons caused by their spin. In the presence of a strong magnetic field, the protons align in the direction of the magnetic field. When protons in this state are exposed to a strong radio-frequency pulse, they are realigned to either 90° or 180° relative to the magnetic field. When the pulse is turned off, the protons align themselves back in the direction of the magnetic field while releasing electromagnetic energy [1]. The MRI machine can detect this electromagnetic energy and, based on the intensity values, can tell tissues apart. MRI divides the area under study into three-dimensional units called voxels, which are analogous to pixels in two-dimensional images. To capture the image in three dimensions, the voxel values are recorded by moving the electromagnetic sensor along all three dimensions [2]. Usually, these MRI scans are not captured in HR due to constraints that include hardware limitations, the scans covering smaller areas of interest when captured in HR, HR scans having lower SSIM [3] values, HR scans taking longer to complete, and the need for patients to stay perfectly still for much longer durations.
S. Gulzar (B) · S. Arora School of Computer Science and Engineering, Shri Mata Vaishno Devi University, Katra, India e-mail: [email protected] S. Arora e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_35
In a different domain concerning videos, VFI methods are capable of generating new frames in between the original frames of a video to increase the overall number of frames. This increase in the number of frames has various applications, such as slow-motion video and video compression [4]. The process of generating new frames from the original frames has gained a lot of traction in the last two decades due to its diverse applications. This has given rise to many new methods which, thanks to advances in computational power, can be trained to approximate the generated frames to a high degree of accuracy. One such method of performing video frame interpolation uses optical flow. Optical flow is the motion of pixels between two video frames. Such methods assume that the intensity of pixels stays consistent between frames, which enables them to track the motion of objects in a video by observing the flow between successive frames [5]. Using the flow information between two or more frames, new frames can be generated in between the original frames; these new frames represent the intermediate positions of the pixels in the flow. In recent years, models [6, 7] have been proposed which use neural networks to approximate the flow and generate new intermediate frames with high accuracy. This paper checks the viability of the proposed method, named 3D flow-based image interpolation (3DFBII), for super-scaling MRI images by combining state-of-the-art flow-based video frame interpolation methods with 3D MRIs. Using the flow information obtained from flow-based VFI on all three axes of a 3D image, new frames are approximated and generated, thus increasing the spatial resolution of the overall image.
2 Related Work 2.1 Super-Resolution of 3D Medical Images Substantial research has been done to overcome the spatial resolution constraints in 3D medical images. It has been shown that residual convolutional neural networks using 2D image stacks can reconstruct HR 3D volumes [8]. Using a 3D deep neural network, HR brain MRIs have been generated from LR variants utilizing patches of other HR brain images; the results indicated that the CNN approach can improve over spline interpolation and that training the network with domain-specific data gives much more refined results than using models trained on natural images [9]. 3D generative adversarial networks (GANs) were used to generate HR brain MRI from LR images [10]. The basic framework included a discriminator that used a least-squares adversarial loss to stabilize training, and a generator whose loss function combined the least-squares loss with a content term based on the mean square error and the image gradient. This model was named GAN Subpixel NN.
A 3D densely connected CNN architecture named the densely connected super-resolution network (DCSRN) [11] used a single-image super-resolution (SISR) process to generate super-resolution 3D brain MRI, outperforming the other methods at the time. A 3D CNN implementation called the multi-level densely connected super-resolution network (mDCSRN) [12], combined with a generative adversarial network, observed that CNNs with deep structures have a high number of parameters and are resource intensive; GAN-guided training was therefore done with mDCSRN, where mDCSRN provides fast inference while the GAN acts as a promoter of realistic-looking generated HR frames. Similarly, a 3D convolutional neural network, 3DDSRCNN [13], achieved voxel super-resolution for computed tomography (CT) images. This method utilized several training strategies, including an adjustable learning rate, residual learning, gradient clipping, momentum SGD and the stacking of a moderate number of network layers, which increased the accuracy and reduced reconstruction time; the approach was beneficial for enhancing HR CT images. DeepResolve [14] was used to learn residual-based transformations between HR thin-slice and LR thick-slice knee MRI images at the same centre location; DeepResolve outdid other methods in terms of PSNR, SSIM and RMSE when compared with tri-cubic or Fourier interpolation. 3D MRIs are usually acquired in the form of orthogonal thick slices, creating highly anisotropic voxels, which results in decreased image quality. To restore such data to an HR isotropic image, a CNN utilizing patches from three orthogonal thick-slice images was used [15]. Hardware limitations also cause difficulties in MRI acquisition; to reconstruct anisotropic images to HR variants, a CNN-based reconstruction method based on residual learning with long and short skip connections was defined, and a significant improvement in the spatial resolution of the anisotropic image was observed with high computational efficiency [16]. A residual network with fixed skips was explored to overcome the difficulty of network training and the degraded transmission capability of deep CNN structures [17]; this fixed-skip algorithm reconstructs MR images by combining global residual learning with shallow-network-based local residual learning. A context-sensitive upsampling method based on a residual CNN was trained to learn organ-wise appearances and adapt semantically to the input data [18]; using contextual information about the shape and appearance of distinct parts of the image resulted in super-scaled fetal scans with sharp edges and fine detail. To enhance the output HR MRIs, another method combined two methodologies: LR images are processed by a CNN performing image restoration to obtain the HR image, and the quality of the restored image is further improved by a regularly spaced shifting mechanism over the input images. The outcomes show that the proposed method improves both the restored and the residual image without additional computing time [19].
Existing SR methods necessitate the presence of a training atlas in order to learn to transform LR images to HR images, but these atlases are not always readily available. Hence, a self-super-resolution algorithm was employed that uses patches in the blurred axial slices of an image to create paired training data and finally uses Fourier burst accumulation to reconstruct the HR image [20]. DDSR is a 3D deep CNN that learns the nonlinear mapping between LR and HR MRI images; the model learns deconvolutions on multi-level features for upscaling MRI images to super-resolution and takes LR image patches as input to reduce complexity and accelerate super-resolution reconstruction [21]. Another distinction when recovering HR images in MRI is between single-contrast super-resolution, which has no reference information, and multi-contrast super-resolution, which uses an HR image of another modality as a reference. The model proposed in [22] applies single- and multi-contrast super-resolution simultaneously.
2.2 Flow-Based Video Frame Interpolation Extensive research has been done on video frame interpolation methods based on the flow of pixels between frames. The first such successful model was SuperSloMo, which used linear combinations of two bidirectional flow estimates while refining them to encode occlusion information into visibility maps. This was done to address the problem of objects suddenly appearing between frames, where the model would otherwise not be able to approximate the interpolation accurately [23]. A later model built on SuperSloMo and used four successive frames to compute intermediate frames, with an emphasis on better flow estimation and fusion of the warped frames [24]. To improve the flow estimation, depth-aware video frame interpolation (DAIN) [6] combined the optical flow with depth information about the objects, along with hierarchical features extracted from neighbouring pixels. DAIN needs only two sequential frames to generate new intermediate frames. The depth information is used to detect the occlusion of objects, which proved to be a good way of accurately predicting the flow of pixels when a new object appears from behind the foreground object in the scene. DAIN warps the input frames, the depth information and the contextual features computed using optical flow to generate new intermediate frames. Real-time intermediate flow estimation (RIFE) [7] uses a custom neural network called IFNet to better estimate the flow. RIFE first estimates the flow between frames and then computes the warped frames, which are combined using a fusion map and residuals computed between the frames to generate an intermediate frame. In comparison to DAIN, this method is much faster and can deal with video of much higher resolution while maintaining better quantitative results.
3 Proposed Model To use state-of-the-art flow-based video interpolation technologies (DAIN [6] and RIFE [7]) on 3D MRI images for SR, the 3D MRI scans need to be represented as 2D frames varying along a time-like dimension. To get this representation, the 3D MRI is visualized at the origin of a three-dimensional plane such that the edges of the 3D image align with the axes of the planes (Fig. 1). Using this representation, individual slices of the 3D scan can be selected and represented as 2D frames by keeping one axis value constant and traversing the other two axes. For example, to get a 2D frame from the 3D MRI along the x-axis, the selection can be made using (Figs. 2 and 3)

frame_x(t) = \begin{bmatrix} a_{t11} & a_{t12} & \cdots & a_{t1z} \\ a_{t21} & a_{t22} & \cdots & a_{t2z} \\ \vdots & \vdots & \ddots & \vdots \\ a_{ty1} & a_{ty2} & \cdots & a_{tyz} \end{bmatrix}   (1)

Fig. 1 Representation of a 3D image along three axes
Fig. 2 Taking a frame along x-axis
Fig. 3 Resultant frame at x =t
Similarly, a slice of the 3D image can be selected along the y-axis using

frame_y(t) = \begin{bmatrix} a_{1t1} & a_{1t2} & \cdots & a_{1tz} \\ a_{2t1} & a_{2t2} & \cdots & a_{2tz} \\ \vdots & \vdots & \ddots & \vdots \\ a_{xt1} & a_{xt2} & \cdots & a_{xtz} \end{bmatrix}   (2)

And for z-axis frame selection,

frame_z(t) = \begin{bmatrix} a_{11t} & a_{12t} & \cdots & a_{1yt} \\ a_{21t} & a_{22t} & \cdots & a_{2yt} \\ \vdots & \vdots & \ddots & \vdots \\ a_{x1t} & a_{x2t} & \cdots & a_{xyt} \end{bmatrix}   (3)
Using the above equations, any two subsequent frames can be selected from a given 3D MRI, with their pixel-value changes reflecting the movement between the frames. On selection of two subsequent frames along the same axis, VFI techniques can be applied to generate new frames (Figs. 4 and 5).
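In NumPy terms, Eqs. (1)-(3) amount to fixing one index of the volume; a small sketch is shown below. The volume shape is an assumption matching the BraTS images used later in the experiments.

```python
# Hedged sketch: taking 2D frames from a 3D volume indexed as vol[x, y, z].
import numpy as np

vol = np.random.rand(240, 240, 155)  # placeholder volume (BraTS-sized, assumption)

def frame_x(vol, t): return vol[t, :, :]  # Eq. (1): fix x = t, traverse y and z
def frame_y(vol, t): return vol[:, t, :]  # Eq. (2): fix y = t, traverse x and z
def frame_z(vol, t): return vol[:, :, t]  # Eq. (3): fix z = t, traverse x and y

prev_frame, next_frame = frame_x(vol, 10), frame_x(vol, 11)  # two subsequent frames along x
```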
Fig. 4 Result of selecting images along different axes (panels: along x-axis, along y-axis, along z-axis)
Fig. 5 Interpolation between two frames if done first along z-axis (panels: original frame at t−1, interpolated frame at t, original frame at t+1)
slices, essentially increasing the spatial resolution of the image along the x-axis. In this research, two state-of-the-art flow-based interpolation methods—DAIN [6] and RIFE [7]—were explored for SR. On completion of interpolation along x-axis, the same process of selecting 2D frames and performing VFI is done along the y-axis using Eq. (2) on the resultant image (Figs. 6 and 7).
Fig. 6 First interpolation along x-axis: 2D slices along the x-axis are taken from the original 3D image and new frames are generated along the x-axis (blue is source data and red is interpolated data)
Fig. 7 Second interpolation along y-axis: 2-D slices from the 3D image already interpolated along the x-axis, with new frames generated along the y-axis (blue is source data and red is interpolated data)
Fig. 8 Third interpolation along z-axis: 2-D slices from the 3D image interpolated along the x-axis and y-axis, with new frames generated along the z-axis (blue is source data and red is interpolated data)
On completion of the interpolation along the y-axis as well, the same two-step process of selecting 2D frames and performing VFI is carried out along the z-axis using Eq. (3) (Fig. 8). On completion of the two-step process along all three axes, the overall resolution of the 3D MRI is increased, resulting in an SR 3D image. In this experiment, flow-based VFI was used to obtain SR 3D images with scaling factors of 2, 3 and 4 (Fig. 9).
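Schematically, the three-pass procedure can be written as the loop below. Here interpolate_pair stands for any flow-based VFI model (for instance a DAIN or RIFE wrapper) returning n_new frames between two slices; it is a hypothetical placeholder, not an API of either project.

```python
# Hedged sketch of the axis-by-axis 3DFBII procedure.
import numpy as np

def upscale_axis(vol: np.ndarray, axis: int, n_new: int, interpolate_pair) -> np.ndarray:
    vol = np.moveaxis(vol, axis, 0)                # bring the slicing axis to the front
    out = [vol[0]]
    for a, b in zip(vol[:-1], vol[1:]):            # every pair of subsequent slices
        out.extend(interpolate_pair(a, b, n_new))  # n_new synthesized slices in between
        out.append(b)
    return np.moveaxis(np.stack(out), 0, axis)

def three_dfbii(vol: np.ndarray, n_new: int, interpolate_pair) -> np.ndarray:
    for axis in (0, 1, 2):                         # x-pass, then y-pass, then z-pass
        vol = upscale_axis(vol, axis, n_new, interpolate_pair)
    return vol
```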
4 Experiment To check the functionality of 3DFBII, T2-weighted images from the Brain Tumor Segmentation Challenge 2020 (BraTS 2020) [25–27] dataset were used. The images in this dataset are resampled to a 1 mm³ isotropic resolution, and the size of every image is 240 × 240 × 155. We randomly sampled 50 images from the dataset. To test the method, interpolation was done with a scaling factor of 2 (2x) by generating a single frame between every two sequential frames, a scaling factor of 3 (3x) by generating two frames between every two sequential frames, and a scaling factor of 4 (4x) by generating three frames between every two sequential frames. To compute the quantitative metrics before applying 3DFBII, for 2x super-scale every alternate frame of the image was removed using Eqs. (1), (2) and (3); similarly, for 3x and 4x super-scale, two and three frames, respectively, were removed for each retained frame. For 3x and 4x super-scale, it was noticed on visual inspection that RIFE did not produce acceptable results, with images being too distorted. It is hypothesized that this may be caused by the very small 3D image sizes available for the interpolation (81 × 81 × 53 for 3x and 61 × 61 × 40 for 4x). Thus, the quantitative metrics for RIFE for 3x and 4x have not been considered.
Fig. 9 Process flow depicting the complete 3D frame-based image interpolation process
5 Result For our quantitative measures, we used the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) [3] to compare the SR MRI with the original 3D image. The results have been compared with other 3D super-scale methods and models (Table 1).

Table 1 3DFBII versus previous models for super-scale

Factor | Bicubic [13]  | GAN Subpixel NN [10] | 3DDSRCNN [13] | 3DFBII (ours) RIFE [7] | 3DFBII (ours) DAIN [6]
       | PSNR / SSIM   | PSNR / SSIM          | PSNR / SSIM   | PSNR / SSIM            | PSNR / SSIM
2x     | 35.28 / 0.950 | 39.28 / 0.984        | 40.00 / 0.987 | 40.65 / 0.976          | 40.45 / 0.968
3x     | 32.54 / 0.879 | – / –                | 35.02 / 0.929 | – / –                  | 39.02 / 0.950
4x     | 29.55 / 0.738 | 33.58 / 0.958        | 32.47 / 0.849 | – / –                  | 38.20 / 0.931
In the results, it is apparent that 3DFBII outperforms the previously proposed methods for almost all scaling factors. For 2x, the highest PSNR is achieved by 3DFBII with RIFE, closely followed by the DAIN variant, although the highest SSIM for 2x SR belongs to 3DDSRCNN. For 3x and 4x, 3DFBII with DAIN scores the highest PSNR and SSIM, thus outperforming the previous methods across both metrics (Figs. 10 and 11).
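For completeness, the two metrics can be computed between the original and super-resolved volumes with scikit-image as sketched below; using the full intensity range of the original volume as data_range is an assumption, since the paper does not state its convention.

```python
# Hedged sketch: PSNR and SSIM between an original and a super-resolved volume.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def volume_psnr_ssim(original: np.ndarray, restored: np.ndarray):
    rng = float(original.max() - original.min())
    psnr = peak_signal_noise_ratio(original, restored, data_range=rng)
    ssim = structural_similarity(original, restored, data_range=rng)
    return psnr, ssim
```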
Fig. 10 PSNR values boxplot 3DFBII
Fig. 11 SSIM values boxplot 3DFBII
6 Conclusion This paper proposes a novel method for 3D MRI SR named 3DFBII and explores its viability using state-of-the-art flow-based VFI. The results show this method outperforming voxel-based and other case-specific methods and models. These results demonstrate that the model can have a real-world implementation for upscaling 3D MRIs. During the process, 3DFBII was not trained to be case specific; thus, it is hypothesized that this method can also be used for SR outside the medical domain.
References 1. Weishaupt B, Köchli D, Victor D, Marincek (2006) How does MRI work? An introduction to the physics and function of magnetic resonance imaging 2. Huppertz HJ, Wellmer J, Staack AM, Altenmüller DM, Urbach H, Kröll J (2008) Voxelbased 3D MRI analysis helps to detect subtle forms of subcortical band heterotopia. Epilepsia 49(5):772–785. https://doi.org/10.1111/j.1528-1167.2007.01436.x 3. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612. https://doi.org/10. 1109/TIP.2003.819861 4. Wu CY, Singhal N, Krähenbühl P (2018) Video compression through image interpolation. Lecture notes computer science (including Subseries lecture notes artificial intelligence lecture notes bioinformatics), vol 11212. LNCS, pp 425–440. https://doi.org/10.1007/978-3-03001237-3_26 5. Turaga P, Chellappa R, Veeraraghavan A (2010) Advances in video-based human activity analysis: challenges and approaches, 1st ed, vol 80, no. C. Elsevier Inc. 6. Bao W, Lai WS, Ma C, Zhang X, Gao Z, Yang MH (2019) Depth-aware video frame interpolation. In: Proceedings of IEEE computer social conference computer vision pattern recognition, vol 2019, pp 3698–3707. https://doi.org/10.1109/CVPR.2019.00382 7. Huang Z, Zhang T, Heng W, Shi B, Zhou S (2020) RIFE: real-time intermediate flow estimation for video frame interpolation 2020 (Online). Available: http://arxiv.org/abs/2011.06294 8. Oktay O et al (2016) Multi-input cardiac image super-resolution using convolutional neural networks. Lecture notes computer science (including subseries lecture notes artificial intelligence lecture notes bioinformatics), vol 9902. LNCS, pp 246–254. https://doi.org/10.1007/ 978-3-319-46726-9_29 9. Pham CH, Ducournau A, Fablet R, Rousseau F (2017) Brain MRI super-resolution using deep 3D convolutional networks. In: Proceedings of international symposium biomedical imaging, pp 197–200. https://doi.org/10.1109/ISBI.2017.7950500 10. Sánchez I, Vilaplana V (2018) Brain MRI super-resolution using 3D generative adversarial networks, arXiv, no. Midl, pp 1–8 11. Chen Y, Xie Y, Zhou Z, Shi F, Christodoulou AG, Li D (2018) Brain MRI super resolution using 3D deep densely connected neural networks. In: 2018 IEEE 15th international symposium on biomedical imaging (ISBI 2018), pp 739–742. https://doi.org/10.1109/ISBI.2018.8363679 12. Chen Y, Christodoulou AG, Zhou Z, Shi F, Xie Y, Li D (2020) MRI super-resolution with GAN and 3D multi-level densenet: smaller, faster, and better, arXiv, no. Debiao Li 13. Wang Y, Teng Q, He X, Feng J, Zhang T (2019) CT-image of rock samples super resolution using 3D convolutional neural network. Comput Geosci 133(24):104314. https://doi.org/10. 1016/j.cageo.2019.104314
14. Chaudhari AS et al (2018) Super-resolution musculoskeletal MRI using deep learning. Magn Reson Med 80(5):2139–2154. https://doi.org/10.1002/mrm.27178 15. Jurek J, Koci´nski M, Materka A, Elgalal M, Majos A (2020) CNN-based superresolution reconstruction of 3D MR images using thick-slice scans. Biocybern Biomed Eng 40(1):111– 125. https://doi.org/10.1016/j.bbe.2019.10.003 16. Du J et al (2020) Super-resolution reconstruction of single anisotropic 3D MR images using residual convolutional neural network. Neurocomputing 392:209–220. https://doi.org/10.1016/ j.neucom.2018.10.102 17. Shi J et al (2019) MR image super-resolution via wide residual networks with fixed skip connection. IEEE J. Biomed Heal Inform 23(3):1129–1140. https://doi.org/10.1109/JBHI.2018.284 3819 18. McDonagh S et al (2017) Context-sensitive super-resolution for fast fetal magnetic resonance imaging. Lecture notes computer science (including Subseries lecture notes artificial intelligence lecture notes bioinformatics), vol 10555 LNCS, pp 116–126. https://doi.org/10.1007/ 978-3-319-67564-0_12 19. Thurnhofer-Hemsi K, López-Rubio E, Domínguez E, Luque-Baena RM, Roé-Vellvé N (2020) Deep learning-based super-resolution of 3D magnetic resonance images by regularly spaced shifting. Neurocomputing 398:314–327. https://doi.org/10.1016/j.neucom.2019.05.107 20. Zhao C, Carass A, Dewey BE, Prince JL (2018) Self super-resolution for magnetic resonance images using deep networks. Department of Electrical and Computer Engineering , The Johns Hopkins University, Baltimore , MD 21218 USA Department of Computer Science , The Johns Hopkins University , Baltimore , MD 2121,” Electrical Engineering System Science, no. Isbi, pp 365–368 21. Du J, Wang L, Gholipour A, He Z, Jia Y (2019) Accelerated super-resolution MR image reconstruction via a 3D densely connected deep convolutional neural network. In: Proceedings 2018 IEEE international conference bioinformatics biomedical BIBM, pp 349–355. https:// doi.org/10.1109/BIBM.2018.8621073 22. Zeng K, Zheng H, Cai C, Yang Y, Zhang K, Chen Z (2018) Simultaneous single- and multicontrast super-resolution for brain MRI images based on a convolutional neural network. Comput Biol Med 99(January):133–141. https://doi.org/10.1016/j.compbiomed.2018.06.010 23. Jiang H, Sun D, Jampani V, Yang MH, Learned-Miller E, Kautz J (2018) Super SloMo: high quality estimation of multiple intermediate frames for video interpolation. In: Proceedings IEEE computer social conference computing vision pattern recognition, pp 9000–9008. https://doi. org/10.1109/CVPR.2018.00938 24. Xu X, Siyao L, Sun W, Yin Q, Yang MH (2019) Quadratic video interpolation. arXiv, no. NeurIPS 25. Menze BH et al (2015) The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging 34(10):1993–2024. https://doi.org/10.1109/TMI.2014.2377694 26. Bakas S et al (2017) Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Sci Data 4 170117. https://doi.org/10.1038/sdata. 2017.117 27. Bakas S et al (2018) Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS Challenge, (Online). Available: http://arxiv.org/abs/1811.02
Chapter 36
An Efficient Hybrid Recommendation Model with Deep Neural Networks Saurabh Sharma
and Harish Kumar Shakya
1 Introduction Recommendation systems analyse consumer behaviour in order to predict which products will be of interest to potential customers. E-commerce giants like Amazon, eBay, Netflix and YouTube, among others, act as personalized consultants, influencing consumer purchasing behaviour significantly. Collaborative filtering (CF) can be used to recommend complex items without the need for explicit profiles of users and items; as a result, CF has gained popularity as a recommendation technique [1, 2]. Netflix, for example, uses CF to predict movie viewer ratings [3]. Neighbourhood approaches and latent factor models are two of the most common collaborative filtering strategies [4, 5]. The neighbourhood approach analyses user/item similarities and is simple and intuitive. Latent factor models use matrix factorization to uncover the underlying characteristics of items and users from raw customer rating data; when there is a high correlation between the latent features of an item and a user, a recommendation is made. Matrix factorization-based latent factor modelling has recently received a lot of attention because of its benefits in accuracy and scalability, its low computational cost, and its ability to cope with the problem's high sparsity levels [6–8]. Most recent matrix factorization techniques, however, are run in batch mode on the data of the full database; such systems are trained in a single pass and thus leave the problem of updating with new information unaddressed. In practice, new users and items frequently enter the recommendation system, and the initial batch model may become stale and unsuitable if the fresh input data are not used to update it.
S. Sharma (B) · H. K. Shakya Amity University, Gwalior, Madhya Pradesh, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_36
Different forms of incoming data are another major design challenge for a recommendation system. The majority of current matrix factorization methods are built for explicit, easily observable feedback data. However, explicit feedback requires user interaction: service providers must construct feedback forms to solicit consumer ratings, which takes time and makes the data difficult to collect. Implicit feedback data such as browsing history and search habits, on the other hand, can be acquired more naturally. However, a recommendation system incorporating implicit feedback must cope with several issues in translating implicit feedback into user preferences [9]. First, implicit feedback provides no direct negative signal. Second, implicit feedback is inherently noisy, making it difficult to convert into a meaningful preference value. Third, it is difficult to accurately assess a recommender trained on implicit feedback. As a result, matrix factorization should be adapted to incorporate implicit feedback data. The literature looks at two aspects of matrix factorization recommendation methodologies. To handle the characteristics of implicit feedback, the original cost function has been modified [9]. Multiple kernel functions have been proposed in [10] to improve the accuracy of recommendations when analysing implicit feedback. Local weighted matrix factorization (LWMF) for top-N recommendation with implicit feedback was provided by Wang et al. [11]; LWMF used a kernel function to exploit the local properties of submatrices and a weight function for user preferences. Although [9–11] can achieve high precision, they do not solve the problem of updating the model in real time. In contrast, [12] proposed a real-time updating strategy that used stochastic gradient descent (SGD), but only for explicit feedback recommendation. The authors of [13] used implicit feedback to solve a binary-value update problem, but instead of using the real numerical implicit feedback values, they directly translated implicit feedback to binary values; as a result, the accuracy of the algorithm in [13] is low [14]. A learning technique based on element-wise alternating least squares (ALS) with variably weighted missing data has been suggested to optimize an MF model effectively [15], and one-sided LS has been presented as a way to combine ALS with faster learning speeds. When it comes to updating a single element, however, ALS performs worse than SGD [17], leading to longer delays in real-time applications. As a result, for MF recommendation systems with implicit feedback, we combine the benefits of ALS and SGD into a hybrid real-time incremental stochastic gradient descent (RI-SGD) update technique. To evaluate the performance of RI-SGD and other update algorithms, we conduct our experiments on IBM Streams, a real-time data stream analytics platform developed by IBM [18–20]. Our findings show that RI-SGD can provide precision comparable to all other update methods while using only 0.02% of the time needed to retrain the entire model. A lightweight, adaptive updating approach was proposed in a prior paper [21]. Based on these previous studies, we present the RI-SGD recommendation system architecture, time-dependent cost functions, and an incrementally updated model. In addition,
in Sect. 6, we discuss the effects of the number of recommendations N and the attenuation parameter. We also use the normalized discounted cumulative gain (NDCG) to demonstrate the recommendation quality of RI-SGD in contrast to other update methodologies. The rest of this paper is organized as follows. The background and related work on matrix factorization with implicit feedback are described in Sect. 2. The problem of updating recommendation systems is formulated in Sect. 3. The RI-SGD updating technique for MF recommendation systems with implicit feedback is presented in Sect. 4. The RI-SGD system is depicted in Sect. 5. The experiments and numerical results are shown in Sect. 6. Finally, we provide our concluding remarks in Sect. 7.
2 Background Work In this section, we introduce matrix factorization and matrix factorization-based recommendation with implicit feedback.
2.1 Approach for Matrix Factorization Compared with a similarity-based recommendation approach, which relies on user-to-user similarities, the matrix factorization method is more memory-efficient and more precise [22]. The well-known Netflix competition [23] employed the matrix factorization method to deal with the issue of rating prediction [5]. To recommend N items with f latent factors to M clients, the memory cost of a matrix factorization system is f × M + f × N, whereas the memory cost of a similarity-based recommendation algorithm is on the order of M²/2. Similar to a singular value decomposition (SVD), matrix factorization breaks the user-item record matrix down into two low-rank matrices in which the latent user and item factors are kept. Matrix factorization can therefore locate items with comparable content and implicit traits. As shown in Fig. 1, the M by N rating matrix is decomposed into the latent user factor matrix X and the item factor matrix Y. As demonstrated in Fig. 2, the factor vectors comprise item attributes such as music genre, movie genre, tags in a Twitter message, etc. A high score on a factor suggests that a client favours that particular attribute. For user u and item i, the matrix factorization recommendation system produces a user vector x_u ∈ R^f and an item vector y_i ∈ R^f, respectively. The interaction between user u and item i is captured by x_u^T y_i, as shown in Fig. 2. In the figure, Bob has a user vector [0.4, 0.6] with two
Fig. 1 Concept of the matrix factorization approach with M customers, N items, f latent features
Fig. 2 Illustration of the latent feature of users and items
different factors (dim1, dim2). The two artists are the Beatles and Utada. They also have different dim1 and dim2 values. The resulting dot products x_u^T y_i indicate how strongly Bob prefers each of the two artists. One may note that if a user favours a particular artist, the user has a nearby location in the latent space. For matrix factorization, the main challenge is finding a mapping of users and items to factor vectors. The value r_ui in the rating matrix is used as training data to learn the factor vectors x_u and y_i, into whose product x_u^T y_i it is decomposed.
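As a small illustration of this dot-product scoring, the sketch below uses Bob's user vector quoted above; the two artist vectors are made-up placeholders, not values from the figure.

```python
import numpy as np

# User vector for Bob taken from the example above; the item vectors for the
# two artists are hypothetical placeholders, not values from the figure.
x_bob = np.array([0.4, 0.6])          # (dim1, dim2)
y_beatles = np.array([0.7, 0.2])      # hypothetical latent factors
y_utada = np.array([0.1, 0.9])        # hypothetical latent factors

# Predicted preference scores are plain dot products x_u^T y_i.
for name, y in [("Beatles", y_beatles), ("Utada", y_utada)]:
    print(name, float(x_bob @ y))
```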
2.2 Implicit Feedback Matrix Recommendation System A recommendation system collects implicit feedback, such as transactions and playlist records, without the need for customer interaction. Consider the case of a music recommendation. If a customer hears a song frequently, he may enjoy that kind of music or the vocalist. The number of times
a song has been played may then be considered implicit feedback data by the recommendation system. The play count of a particular song, on the other hand, is non-negative and has a very large dynamic range, making it difficult to map to a customer's preferences. Explicit feedback, such as questionnaires and rankings, in contrast, has fixed ratings that can easily express a customer's preferences. Due to the complicated collection methodologies, however, the amount of explicit feedback is far less than the amount of implicit feedback. Because there is so much implicit feedback training data available, more and more real-time recommendation systems are using implicit feedback instead of explicit input. Because the value of implicit feedback cannot be directly correlated with user preferences, an explicit feedback matrix recommendation system must be modified, which is a difficult task. The cost function for matrix factorization was revised by the authors of [9] for implicit feedback data. The binary value p_ui was substituted for the original rating r_ui in (1), and a confidence weight c_ui was introduced as follows:

minimize_{x*, y*}  Σ_{u,i} c_ui (p_ui − x_u^T y_i)^2 + λ ( Σ_u ||x_u||^2 + Σ_i ||y_i||^2 ),   (1)

where c_ui = 1 + α r_ui; α is an implicit feedback attenuation parameter; r_ui is the rating value for explicit feedback; and p_ui is produced from a binarization of r_ui indicating the user's preference for an item. The p_ui values are

p_ui = 1 if r_ui > 0, and p_ui = 0 if r_ui = 0.   (2)
The M × N terms in the cost function can quickly approach a few billion. To put it another way, the training data is not sparse, and it would be impractical to iterate over every single training case with SGD. ALS, in contrast, is capable of efficiently handling implicit feedback data. As a result, [9] proposed an alternating least squares with weighted regularization (ALSWR) to solve the revised cost function (1) with implicit feedback. In order to deal with implicit feedback data, the authors of [10] suggested regularized kernel matrix factorization models that used nonlinear mappings in the prediction equation and bounded the range of predicted values with different kernel functions. The authors of [11] used implicit feedback to increase the matrix factorization computation performance and lessen the problem of sparsity by decomposing submatrices rather than the original matrix. The authors used a kernel function to characterize user preferences and introduced local weighted matrix factorization (LWMF), which combined local low-rank matrix approximation (LLORMA) with weighted matrix factorization (WMF).
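For concreteness, the following is a minimal NumPy sketch of the kind of closed-form ALS user-factor update used for the weighted objective in (1). It follows the standard implicit-feedback ALS derivation rather than any specific implementation in the cited papers, and all data and hyperparameter values are toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, f = 50, 40, 8            # users, items, latent factors (toy sizes)
alpha, lam = 40.0, 0.1         # attenuation and regularization (placeholders)

R = rng.poisson(0.3, size=(M, N)).astype(float)   # toy implicit counts r_ui
P = (R > 0).astype(float)                          # binarized preferences p_ui
C = 1.0 + alpha * R                                # confidence weights c_ui

X = 0.01 * rng.standard_normal((M, f))             # user factors
Y = 0.01 * rng.standard_normal((N, f))             # item factors

def update_user_factors(X, Y, C, P, lam):
    """One ALS half-step: solve each x_u in closed form with the items fixed."""
    f = Y.shape[1]
    YtY = Y.T @ Y
    for u in range(X.shape[0]):
        Cu = np.diag(C[u])                          # per-item confidences for user u
        A = YtY + Y.T @ (Cu - np.eye(Y.shape[0])) @ Y + lam * np.eye(f)
        b = Y.T @ Cu @ P[u]
        X[u] = np.linalg.solve(A, b)                # x_u = (Y^T C^u Y + lam I)^-1 Y^T C^u p_u
    return X

X = update_user_factors(X, Y, C, P, lam)            # the item half-step is symmetric
```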
3 Problem Associated with Recommendation Updating User feedback is generated continuously in real-world applications. To preserve the accuracy of the recommendations, an online updating model should be implemented. Most of the existing matrix factorization methods, however, are built in a batch mode that needs the new input data to retrain the entire model [9, 10], and retraining the whole recommendation model is time-consuming. Therefore, it is important to build a computationally efficient updating mechanism for a real-time matrix factorization recommendation system. Regarding the literature on real-time updating techniques, the authors suggested explicit rating-oriented matrix factorization (PMF) and ranking-oriented matrix factorization (RMF) in order to incrementally handle new rating values without retraining the models. The authors showed that the PMF and RMF algorithms scaled linearly with the number of observed ratings and were comparable to the batch-trained method. The authors of [25, 26] developed the selection and forgetting of outdated information in order to make recommendations more precise. However, these systems cannot address small changes in feedback data, such as the preference of particular users. Huang et al. [27] conceived a linear transformation of user and item latent vectors in time to generalize the incremental MF framework for explicit feedback. The authors of [13] presented an incremental SGD algorithm (ISGD) which converts implicit feedback into a binary rating matrix, as shown in Fig. 3.
Fig. 3 Replace the rating matrix element with binary value
4 Proposed Hybrid RI-SGD Algorithm Implicit feedback data are generated continuously, but each data point is associated with only one user and one item. Although ALSWR is suitable for matrix factorization with implicit feedback, updating the prediction for a single user and item with ALSWR is significantly more complex and performs worse than with SGD. ALSWR and SGD are therefore combined into a hybrid real-time incremental stochastic gradient descent (RI-SGD) update system for implicit feedback. Three main features are included in RI-SGD: (1) a training model, (2) a time-based cost function, and (3) an incrementally updated model (Fig. 4).
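The chapter gives the RI-SGD update details only via the figures, so the sketch below is not the authors' exact algorithm; it is a hedged illustration of the kind of per-event incremental SGD step such a hybrid could use, touching only the latent vectors of the affected user and item under the weighted objective in (1). The learning rate and attenuation values are hypothetical.

```python
import numpy as np

def incremental_sgd_step(X, Y, u, i, r_ui, alpha=40.0, lam=0.1, lr=0.01):
    """Update only the latent vectors of user u and item i when a new implicit
    feedback event (u, i, r_ui) arrives; gradient of the weighted loss in (1),
    with the constant factor 2 absorbed into the learning rate."""
    p_ui = 1.0 if r_ui > 0 else 0.0
    c_ui = 1.0 + alpha * r_ui
    err = p_ui - X[u] @ Y[i]
    x_old = X[u].copy()
    X[u] += lr * (c_ui * err * Y[i] - lam * X[u])
    Y[i] += lr * (c_ui * err * x_old - lam * Y[i])
    return X, Y

# Toy usage: factors from a previously trained ALS model, one new play-count event.
rng = np.random.default_rng(1)
X = 0.01 * rng.standard_normal((50, 8))
Y = 0.01 * rng.standard_normal((40, 8))
X, Y = incremental_sgd_step(X, Y, u=3, i=7, r_ui=2.0)
```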
5 Proposed Hybrid RI-SGD Algorithm with Time Implicit feedback data are generated on a continual basis, yet each data point is associated with only one user and one item. ALSWR is suitable for matrix factorization with implicit feedback, but updating the prediction for a single user and item is substantially more complex and slower than with SGD. For the hybrid real-time incremental stochastic gradient descent (RI-SGD), ALSWR and SGD are thus merged into an implicit feedback system. RI-SGD has three primary features: (1) a training model, (2) a time-based cost function, and (3) incremental model updates (Fig. 5).
6 Conclusion In this study, we looked into the specific issues of streaming implicit feedback data in recommendation systems. We created a hybrid RI-SGD recommendation system by combining ALSWR and SGD for rapid matrix factorization model updates with implicit feedback. We developed a time-dependent cost function for matrix factorization and used SGD for incremental updates. Seven different updating mechanisms are compared in our experiments. We show that RI-SGD can achieve almost the same precision as ALSWR while spending only 0.02% of the ALSWR retraining time. An accurate matrix factorization model for streaming implicit feedback can be maintained by identifying and updating the associated latent vectors in RI-SGD. In the future, the suggested incremental updating concept and technique should be extended to other collaborative filtering algorithms.
Fig. 4 Flow chart
Fig. 5 Updating time in case I
References
1. Shapira BPBK, Ricci F, Rokach L (2011) Recommender systems handbook. In: Recommender systems handbook. Springer
2. Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: ACM international conference on world wide web
3. Netflix prize, https://en.wikipedia.org/wiki/Netflix_Prize
4. Koren Y (2008) Factorization meets the neighborhood: a multifaceted collaborative filtering model. In: ACM SIGKDD conference on knowledge discovery and data mining
5. Koren Y, Bell R, Volinsky C (2009) Matrix factorization techniques for recommender systems. Computer 42(8):30–37
6. Chen C, Li D, Lv Q, Yan J, Chu SM, Shang L (2016) MPMA: mixture probabilistic matrix approximation for collaborative filtering. In: International joint conference on artificial intelligence
7. Zeng G, Zhu H, Liu Q, Luo P, Chen E, Zhang T (2015) Matrix factorization with scale-invariant parameters. In: International joint conference on artificial intelligence
8. Wu L, Ge Y, Liu Q, Chen E, Hong R, Du J, Wang M (2017) Modeling the evolution of users preferences and social links in social networking services. IEEE Trans Knowl Data Eng 29(6):1240–1253
9. Hu Y, Koren Y, Volinsky C (2008) Collaborative filtering for implicit feedback datasets. In: IEEE international conference on data mining
10. Rendle S, Schmidt-Thieme L (2008) Online-updating regularized kernel matrix factorization models for large-scale recommender systems. In: ACM conference on recommender systems
11. Wang K, Peng H, Jin Y, Sha C, Wang X (2016) Local weighted matrix factorization for top-n recommendation with implicit feedback. Data Sci Eng 1(4):252–264
12. Ling G, Yang H, King I, Lyu M (2012) Online learning for collaborative filtering. In: International joint conference on neural networks
13. Vinagre J, Jorge AM, Gama J (2014) Fast incremental matrix factorization for recommendation with positive-only feedback. In: User modeling, adaptation, and personalization. Springer
14. He X, Zhang H, Kan M-Y, Chua T-S (2016) Fast matrix factorization for online recommendation with implicit feedback. In: International ACM SIGIR conference on research and development in information retrieval
15. Yu T, Mengshoel OJ, Jude A, Feller E, Forgeat J, Radia N (2016) Incremental learning for matrix factorization in recommender systems. In: IEEE international conference on big data
16. Funk S (2006) Netflix update: Try this at home
17. Paterek A (2007) Improving regularized singular value decomposition for collaborative filtering. In: Proceedings of KDD cup and workshop
18. Ballard C, Farrell DM, Lee M, Stone PD, Thibault S, Tucker S et al (2010) IBM InfoSphere streams harnessing data in motion. IBM Redbooks
19. Zikopoulos P, Eaton C et al (2011) Understanding big data: analytics for enterprise class Hadoop and streaming data. McGraw-Hill Osborne Media
20. IBM Streams, http://www-03.ibm.com/software/products/en/ibm-streams
21. Tsai K-H, Lin C-Y, Wang L-C, Chen J-R (2015) Reconstruct dynamic systems from large-scale open data. In: IEEE global communications conference (GLOBECOM)
22. Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Trans Knowl Data Eng 17(6):734–749
23. Bennett J, Elkan C, Liu B, Smyth P, Tikk D (2007) KDD cup and workshop 2007. ACM SIGKDD Explorations Newsletter
24. Bell RM, Koren Y (2007) Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In: IEEE international conference on data mining
25. Matuszyk P, Spiliopoulou M (2014) Selective forgetting for incremental matrix factorization in recommender systems. In: Discovery science. Springer
26. Matuszyk P, Vinagre J, Spiliopoulou M, Jorge AM, Gama J (2015) Forgetting methods for incremental matrix factorization in recommender systems. In: ACM symposium on applied computing
27. Huang X, Wu L, Chen E, Zhu H, Liu Q, Wang Y (2017) Incremental matrix factorization: a linear feature transformation perspective. In: International joint conference on artificial intelligence
28. Deshpande M, Karypis G (2004) Item-based top-n recommendation algorithms. ACM Trans Inf Syst (TOIS) 22(1):143–177
29. Apache Mahout: scalable machine-learning and data-mining library. http://mahout.apache.org
30. Celma Herrada O (2008) Music recommendation and discovery in the long tail. Ph.D. dissertation
31. Last.fm, http://cn.last.fm/api
32. Karypis G (2001) Evaluation of item-based top-n recommendation algorithms. In: ACM international conference on information and knowledge management
33. Balakrishnan S, Chopra S (2012) Collaborative ranking. In: ACM international conference on web search and data mining
Chapter 37
Fixed-MAML for Few-shot Classification in Multilingual Speech Emotion Recognition Anugunj Naman and Chetan Sinha
1 Introduction Emotion recognition plays a significant role in many intelligent interfaces [1]. Even with the recent advances in machine learning, this is still a challenging task. The main reason behind this is that most publicly available annotated datasets in this domain are small in scale, which makes deep learning (DL) models prone to overfitting. Another essential feature of emotion recognition is the inherent multimodality in expressing emotions [2]. Emotional information can be captured by studying many modalities, including facial expressions, body postures, and EEG [3]. Of these, arguably, speech is the most accessible. In addition to accessibility, speech signals contain many other emotional cues [4]. We, therefore, use speech signals as a base to predict the emotion. Generally, in the speech emotion recognition (SER) task, conventional supervised learning solves the problem efficiently given sufficient training data. Several studies on SER for different single corpora have been conducted using language-dependent optimal acoustic sets over several decades. Such systems can be analyzed in monolingual scenarios; changing the source corpus requires re-selecting the optimal acoustic features and re-training the system. Human-emotion perception, however, has proved to be cross-lingual, even without an understanding of the language used [5]. An SER system is expected to recognize emotions as such. However, for an automatic SER system to recognize emotion, there are two significant problems. First, the training corpus available for many different languages is very limited. Second, it is not clear which standard features are efficient in detecting emotions across different cultures. Commonalities and differences in human-emotion A. Naman (B) · C. Sinha Indian Institute of Information Technology, Guwahati, India e-mail: [email protected] C. Sinha e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_37
perception across languages in the valence-activation (V-A) space have recently been studied [5]. It was revealed that the direction and distance from neutral to other emotions are similar across languages, while the position of neutral itself is language dependent. In this paper, motivated by the above challenges, we want to simulate a scenario where one can provide a few labeled speech samples in any language and train a model on that language for a few iterations to get a robust SER system. This proposed scenario removes the requirement for a large amount of data and identifies the standard features that are efficient in detecting emotions for that culture, fine-tuning the model to it accordingly. Supervised learning has been extremely successful in computer vision, speech, and machine translation tasks, thanks to improvements in optimization technology, larger datasets, and streamlined designs of deep convolutional or recurrent architectures. Despite these successes, this learning setup does not cover many aspects where learning is possible and desirable. One such instance is learning from very few examples in so-called few-shot learning tasks [6]. Rather than depending on regularization to compensate for the lack of training data, researchers have explored ways to leverage the distribution of similar tasks, inspired by human learning [7]. A lot of useful solutions have been developed, and the most popular solution right now uses meta-learning. Meanwhile, most of the studies on few-shot learning are conducted on image tasks. We here attempt to apply those meta-learning solutions to SER systems. We formulate the problem mentioned above as a few-shot learning problem and analyze the performance of state-of-the-art model-level few-shot learning algorithms. Meta-learning, also known as 'learning to learn,' aims to make quick adaptation to new tasks with only a few examples. Recently, many different meta-learning solutions have been proposed to solve few-shot learning problems. These solutions differ in the form of learning a shared metric [8–11], a generic inference network [12, 13], a shared optimization algorithm [14, 15], or a shared initialization for the model parameters [16–18]. In this paper, we use the Model-Agnostic Meta-Learning (MAML) approach [16] for the following reasons:
1. It is a model-agnostic general framework that can be easily used on a new task.
2. It achieves state-of-the-art performance in existing few-shot learning tasks.
Few-shot learning is often defined as an N-way, K-shot problem where N is the number of class labels in the target task and K is the number of examples of each class. In most previous studies, it is assumed that all the N classes or labels are new. However, in real-life applications, these classes or labels are not necessarily all new. Thus, we further define an N+F-way, K-shot problem where N and F are the numbers of new classes or labels and fixed classes, respectively. In this newly devised task, the model has to classify among both new classes and fixed classes. We propose a modification to the original MAML algorithm to solve this problem and call the new model F-MAML. We conduct our experiments on the EmoFilm dataset [19] to simulate such a scenario in SER. We compare our approach with two baseline approaches: the conventional supervised learning approach [20] and the MAML [21] approach. Experimental results show that F-MAML leads to an obvious improvement over the supervised
learning approach and even performs better than MAML. Our contributions in this paper are summarized here:
1. We analyze the feasibility of few-shot learning for training SER models.
2. We propose a more efficient way than MAML (F-MAML) to train future SER models for any language with few training examples.
The rest of the paper is presented in the following manner: In Sect. 2, we discuss the background of our work. In Sect. 3, we discuss our proposed method. In Sect. 4, we describe the experiments in detail and present the results. In Sect. 5, we finally give a conclusion.
2 Background In this section, we first briefly introduce MAML, which is the base of and motivation for our solution. Model-Agnostic Meta-Learning (MAML) is one of the most popular meta-learning algorithms that aims to solve the few-shot learning problem. The main goal of MAML is to train a model initializer that can adapt to any new task using very few labeled examples and training iterations [16]. The model is trained across several tasks to reach this goal, and it treats each entire task as a training example. The model is required to face different tasks to get used to adapting to new tasks. In the following, we describe the MAML training framework. As shown in Fig. 1, the optimization procedure consists of two stages: a meta-learning stage on the training data and a fine-tuning stage on the testing tasks.
2.1 The Meta-learning Stage Given that the target evaluation task is an N-way, K-shot task, the model is trained across a set of tasks T where each task T_i is also an N-way, K-shot task. In each iteration, a learning task, i.e., the meta-task T_i, is sampled according to a distribution over tasks p(T). Each T_i consists of a support set S_i and a query set Q_i. Fig. 1 MAML algorithm learns a good parameter initializer θ* by training across various meta-tasks such that it can adapt quickly to new tasks. Adapted from [16]
Consider a model represented by a parametrized function f_θ with parameters θ. The adapted parameters θ_i are computed from θ through adaptation to task T_i. A loss function L_{S_i}(f_θ), the cross-entropy loss over the support-set examples, is defined to guide the computation of θ_i:

L_{S_i}(f_θ) = − Σ_{(x_j, y_j) ∈ S_i} y_j log f_θ(x_j).   (1)

A one-step gradient update is as below:

θ_i = θ − α ∇_θ L_{S_i}(f_θ).   (2)

Here, α is the learning rate, which can be a fixed hyperparameter or learned as in Meta-SGD [17]. The gradient here may be updated for multiple steps. After this, the model parameters are optimized with respect to θ based on the performance of f_{θ_i} evaluated on the query set Q_i. L_{Q_i}(f_{θ_i}) is another cross-entropy loss, over the query-set examples:

L_{Q_i}(f_{θ_i}) = − Σ_{(x_u, y_u) ∈ Q_i} y_u log f_{θ_i}(x_u).   (3)

Broadly speaking, MAML aims to optimize the model parameters such that a few gradient steps on a new task will lead to maximally effective behavior on that new task. At the end of each training iteration, the parameters θ are updated as below:

θ ← θ − β ∇_θ L_{Q_i}(f_{θ_i}).   (4)

Here, β is the learning rate of the meta-learner. To increase the stability of training, instead of only one task, a batch of tasks is sampled in each iteration. The optimization is performed by averaging the loss across the tasks. Thus, Equation (4) can be generalized to:

θ ← θ − β ∇_θ Σ_i L_{Q_i}(f_{θ_i}).   (5)
2.2 The Fine-tuning Stage A fine-tuning is performed before the evaluation. In an N-way, K-shot task, K examples from each of the N class labels are available at this stage in the target task’s
support set. The model trained above in the meta-learning stage will now be fine-tuned according to Equation (2) for a few iterations. The updated model will then be evaluated on the remaining unlabeled examples (the target task's query set).
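To make the two stages concrete, below is a hedged, minimal sketch of MAML for a tiny fully connected classifier written against plain PyTorch. The synthetic task sampler, layer sizes, and learning rates are placeholders and not the chapter's actual setup.

```python
import torch

def init_params(in_dim=40, hidden=64, n_way=5):
    # Explicit parameter tensors so adapted copies stay inside the autograd graph.
    def t(*shape):
        return (0.1 * torch.randn(*shape)).requires_grad_()
    return [t(in_dim, hidden), t(hidden), t(hidden, n_way), t(n_way)]

def forward(params, x):
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1 + b1) @ w2 + b2

def loss_fn(params, x, y):
    return torch.nn.functional.cross_entropy(forward(params, x), y)

def adapt(params, x_s, y_s, alpha=0.1, steps=1):
    # Inner-loop update, eq. (2); create_graph=True keeps second-order terms for eq. (5).
    for _ in range(steps):
        grads = torch.autograd.grad(loss_fn(params, x_s, y_s), params, create_graph=True)
        params = [p - alpha * g for p, g in zip(params, grads)]
    return params

def sample_task(n_way=5, k_shot=5, q=15, in_dim=40):
    # Placeholder random task; a real task would draw MFCC clips per emotion label.
    xs, ys = torch.randn(n_way * k_shot, in_dim), torch.arange(n_way).repeat(k_shot)
    xq, yq = torch.randn(n_way * q, in_dim), torch.arange(n_way).repeat(q)
    return xs, ys, xq, yq

params = init_params()
meta_opt = torch.optim.Adam(params, lr=0.001)       # beta, the meta learning rate
for it in range(100):                               # meta-training iterations
    meta_opt.zero_grad()
    meta_loss = 0.0
    for _ in range(4):                              # batch of meta-tasks
        xs, ys, xq, yq = sample_task()
        adapted = adapt(params, xs, ys)
        meta_loss = meta_loss + loss_fn(adapted, xq, yq)
    meta_loss.backward()                            # gradient w.r.t. theta, eq. (5)
    meta_opt.step()
# Fine-tuning on a target task reuses adapt() on its support set, as in Sect. 2.2.
```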
3 Proposed Method In the original MAML, it is assumed that all class labels in the target task are new class labels. However, these class labels do not necessarily need to be all new. In real-life applications, some of the class labels are known, so that more examples of these class labels can be used in the meta-learning stage. This paper will call them fixed classes, as we later fix their output positions in the neural network classifier. We call this task, which has to classify among new classes and fixed classes, an N+F-way, K-shot problem, where N, F, and K are the numbers of emotion class labels, fixed class labels, and examples of each class for fine-tuning, respectively. This problem of simultaneously classifying unseen and seen class labels has not been investigated in the original MAML. In our solution, we tackle the problem by proposing modifications to the MAML training framework. We believe that the N+F-way, K-shot problem is more realistic, and our modification to MAML applies to various tasks. We now describe our methodology for the few-shot SER task.
3.1 Methodology: F-MAML Although the N+F-way, K-shot problem can be regarded as a specific form of the normal N-way, K-shot problem, solving it with the original MAML framework leads to a performance degradation. Using the prior information of the F fixed classes, we modify the MAML framework in the following aspects:
1. We fix the output positions, i.e., the outputs of the neural network classifier at which the fixed classes are predicted.
2. These fixed classes occur in every meta-task T_i in the meta-learning stage.
3. The adaptation of fixed classes is not needed in the fine-tuning stage as they have already been learned in the meta-learning stage.
The above three modifications to the original MAML make the proposed framework more effective in real applications.
3.2 Speech Emotion Recognition We formulate a scenario for SER as an N+F-way, K-shot classification task. N is the number of emotions that one wishes to recognize, and one should provide K speech audio samples for each such emotion. The fixed labels here are silence and neutral. Figure 2 illustrates the framework of the F-MAML approach. The target data contain audio samples from one language that is not in the source data, while the source data contain audio examples from all other languages. The fixed classes are the same in the target and source data. In the meta-learning stage, several N+2-way, K-shot meta-tasks are sampled from the source data for each language. Each meta-task is similar to the target task. We expect to learn a model initializer that can adapt to the target task using the provided speech samples and emotion labels. We exclude the fixed class labels from the support set in both the meta-learning and the fine-tuning stages. As we can assume the availability of more training examples of the fixed classes, we can keep them in the meta-tasks' query sets in the meta-learning stage. Moreover, it can be seen that the positions of the silence and neutral classes are fixed to the last of the network output (the
Fig. 2 Framework of our F-MAML approach for few-shot SER
orange area). Thus, we force our model to ‘recall’ the fixed classes without the need for adaptation. Algorithm 1 MAML approach for few-shot ASR
Algorithm 1 summarizes the details of our approach. The algorithm described here is based on the work of [16] but is different in terms of how sampling is done for the support set and the query set during the meta-learning stage, which is introduced in Sect. 3.1.
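Since the body of Algorithm 1 appears only as a figure in the original, the following is a hedged sketch of the meta-task construction implied by the three modifications in Sect. 3.1 and the description in Sect. 3.2. The helper names, data structures, and sample counts are illustrative assumptions, not the authors' exact procedure.

```python
import random
import torch

FIXED_CLASSES = ["silence", "neutral"]   # F = 2, mapped to the last output positions

def build_meta_task(lang_data, fixed_data, n_way=5, k_shot=5, q=15):
    """lang_data: dict emotion -> list of feature tensors for one source language.
    fixed_data: dict fixed label -> list of feature tensors (silence / neutral)."""
    emotions = random.sample(list(lang_data), n_way)
    # New emotion classes get indices 0..n_way-1; fixed classes always occupy
    # the last F output positions (modification 1).
    label_of = {e: i for i, e in enumerate(emotions)}
    label_of.update({c: n_way + j for j, c in enumerate(FIXED_CLASSES)})

    support, query = [], []
    for e in emotions:
        clips = random.sample(lang_data[e], k_shot + q)
        support += [(x, label_of[e]) for x in clips[:k_shot]]
        query += [(x, label_of[e]) for x in clips[k_shot:]]
    # Fixed classes appear in every meta-task (modification 2) but only in the
    # query set; they are excluded from the support set, so they receive no
    # inner-loop adaptation (modification 3 / Sect. 3.2).
    for c in FIXED_CLASSES:
        query += [(x, label_of[c]) for x in random.sample(fixed_data[c], q)]

    stack = lambda pairs: (torch.stack([x for x, _ in pairs]),
                           torch.tensor([y for _, y in pairs]))
    return stack(support), stack(query)
```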
4 Experimentation 4.1 Dataset We conduct our experiments on EmoFilm dataset [19]. It consists of 1115 clips with a mean length of 3.5 s, resulting in 341 English audio clips, with an average of utterances per emotion; 410 Italian audio clips with an average of 41.3 utterances per emotion; and 356 Spanish clips with an average of 35.9 utterances per emotion (std 9). The higher number of Italian clips might be due to Italian being a more ‘emotionally expressive’ language; this could also relate to the pre-test made by Italian listeners, who may be better at perceiving emotions in their language [19]. The dataset is categorized into five emotion labels: happiness, sadness, anger, fear, and disgust. We formulate three 5-way, K-shot tasks using the same setup as the audio recognition
Table 1 Dataset details
Language   Total samples   Samples per emotion
English    341             72—fear, 50—disgust, 69—happiness, 76—anger, 74—sadness
Italian    410             83—fear, 68—disgust, 93—happiness, 73—anger, 93—sadness
Spanish    356             63—fear, 50—disgust, 76—happiness, 82—anger, 85—sadness
tutorial in the official PyTorch documentation. Table 1 gives information about the total samples for each emotion in each language. We perform three experiments here:
1. The first experiment is SER in the English language, where we use the English language as a testing set, while Spanish and Italian are used in training.
2. The second experiment is SER in the Italian language, where we use the Italian language as a testing set, while English and Spanish are used in training.
3. The third experiment is SER in the Spanish language, where we use the Spanish language as a testing set, while English and Italian are used in training.
The testing language is unseen in the meta-learning stage, and only K labeled examples of each label are available in the fine-tuning stage. The initialized model is fine-tuned on the labeled examples and evaluated on the unlabeled examples. The samples for the silence class and neutral class were self-generated with a mean length of 3.5 s.
4.2 Model Setting The 3–4 s clips are sampled at 16 kHz. We use mel-frequency cepstral coefficient (MFCC) features. For each clip, we extract 40-dimensional MFCCs with a frame length of 30 ms and a frame step of 10 ms. Convolutional neural networks are adopted as the base model, which contains four convolutional blocks. Each block comprises a 3 × 3 convolution with 64 filters, followed by ReLU and batch normalization [22]. The flattened layer after the convolutional blocks contains 576 neurons and is fully connected to the output layer with a linear function. We avoided using a ResNet architecture because it overfitted very quickly. The model is trained with a mini-batch size of 16 for 5, 10, and 20-shot classifications. We set the learning rate α to 0.1 and β to 0.001. The learning rates were found using a grid search.
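A hedged PyTorch sketch of a base model matching this description is given below. The stride and padding choices and the adaptive pooling used to arrive at exactly 576 flattened features are assumptions, since the chapter does not state them.

```python
import torch
import torch.nn as nn

class SERConvNet(nn.Module):
    """Four 3x3 conv blocks (64 filters, ReLU, BatchNorm), pooled and flattened
    to 576 features, then linearly mapped to n_way + n_fixed output classes."""
    def __init__(self, n_classes=7):              # e.g. 5 emotions + silence + neutral
        super().__init__()
        blocks, in_ch = [], 1
        for _ in range(4):
            blocks += [nn.Conv2d(in_ch, 64, kernel_size=3, padding=1),
                       nn.ReLU(),
                       nn.BatchNorm2d(64)]
            in_ch = 64
        self.features = nn.Sequential(*blocks)
        self.pool = nn.AdaptiveAvgPool2d((3, 3))   # 64 * 3 * 3 = 576 flattened features
        self.classifier = nn.Linear(576, n_classes)

    def forward(self, x):                          # x: (batch, 1, n_mfcc, n_frames)
        h = self.pool(self.features(x))
        return self.classifier(h.flatten(1))

model = SERConvNet()
mfcc_batch = torch.randn(16, 1, 40, 300)           # mini-batch of 16 MFCC "images"
print(model(mfcc_batch).shape)                     # torch.Size([16, 7])
```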
4.3 Baselines We compare our proposed approach with two baseline approaches: the state-of-the-art conventional supervised learning approach [20], which trains the model on the support set of the target task only, and the state-of-the-art meta-learning approach
MetaSER [21], which treats the 5+2-way problem as a 7-way problem. In the evaluation, we sample K examples from each class for fine-tuning the model and 25 examples per label for evaluation. We run 100 random tests and evaluate the different approaches on accuracy.
4.4 Results We compare our approach with the two baselines. Tables 2, 3, and 4 list the performance of 5, 10, and 20-shot tasks on SER in the English, Spanish, and Italian languages, respectively. Not surprisingly, MetaSER, i.e., the MAML-based approach, performs much better than conventional supervised learning in a few-shot learning situation. This improvement is because it provides a good initialization of the model's parameters to achieve fast learning on a new task with few gradient steps while avoiding the overfitting that may happen when using a small dataset. Finally, our proposed approach F-MAML outperforms MetaSER. We attribute the improvement to the prior information of the fixed classes acting as an anchor, which helps in efficient fine-tuning to new tasks compared to MetaSER. Figure 3 shows the loss of MetaSER compared to F-MAML on 5-shot learning. It can easily be seen that F-MAML converges quickly and in fewer steps than the original MAML, making the proposed approach more robust than MetaSER.

Table 2 Accuracy in 5-shot learning
Model        English (%)   Italian (%)   Spanish (%)
Supervised   24.33         16.19         20.11
MetaSER      65.21         64.13         64.95
F-MAML       69.71         69.13         68.85
The bold signifies which model outperforms others in that column

Table 3 Accuracy in 10-shot learning
Model        English (%)   Italian (%)   Spanish (%)
Supervised   32.11         32.54         16.57
MetaSER      69.11         70.13         71.15
F-MAML       73.71         74.22         74.55
The bold signifies which model outperforms others in that column

Table 4 Accuracy in 20-shot learning
Model        English (%)   Italian (%)   Spanish (%)
Supervised   36.53         28.64         24.87
MAML         76.21         77.13         77.95
F-MAML       81.69         80.13         80.15
The bold signifies which model outperforms others in that column

Fig. 3 Convergence comparison MAML (MetaSER) versus F-MAML
5 Conclusion In this paper, we simulated a scenario of SER as a few-shot learning problem. We define it as an N+F-way, K-shot problem and propose a modification to the Model-Agnostic Meta-Learning (MAML) algorithm in which we keep F classes fixed to solve the problem. Experiments conducted on the EmoFilm dataset show that our approach performs the best compared to the baselines. In the future, we will attempt to test the feasibility of the approach on Indic languages and Mandarin-derived languages, since these languages differ vastly from each other. We also plan to use image and text as well, to build a multimodal system.
References
1. Picard RW (2020) Affective computing. MIT Press
2. Li J, Lee C (2019) Attentive to individual: a multimodal emotion recognition network with personalized attention profile. In: Kubin G, Kacic Z (eds) Interspeech 2019, 20th annual conference of the International Speech Communication Association, Graz, Austria, 15–19 Sept 2019. ISCA, pp 211–215
3. Sebe N, Cohen I, Gevers T, Huang TS (2005) Multimodal approaches for emotion recognition: a survey. In: Santini S, Schettini R, Gevers T (eds) Internet imaging VI. International society for optics and photonics, vol 5670. SPIE, pp 56–67
4. Kim J, Saurous RA (2018) Emotion recognition from human speech using temporal information and deep learning. In: Proceedings of Interspeech 2018, pp 937–940
5. Li X, Akagi M (2016) Multilingual speech emotion recognition system based on a three-layer model. In: Interspeech 2016, pp 3608–3612
6. Satorras VG, Estrach JB (2018) Few-shot learning with graph neural networks. In: International conference on learning representations. Available https://openreview.net/forum?id=BJj6qGbRW
7. Lake BM, Salakhutdinov R, Tenenbaum JB (2015) Human-level concept learning through probabilistic program induction. Science 350(6266):1332–1338
8. Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D (2016) Matching networks for one shot learning. In: Proceedings of the 30th international conference on neural information processing systems, ser. NIPS'16. Curran Associates Inc., Red Hook, pp 3637–3645
9. Snell J, Swersky K, Zemel R (2017) Prototypical networks for few-shot learning. In: Proceedings of the 31st international conference on neural information processing systems, ser. NIPS'17. Curran Associates Inc., Red Hook, pp 4080–4090
10. Sung F, Yang Y, Zhang L, Xiang T, Torr PHS, Hospedales TM (2018) Learning to compare: relation network for few-shot learning. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 1199–1208
11. Ko T, Chen Y, Li Q (2020) Prototypical networks for small footprint text-independent speaker verification. In: ICASSP 2020—2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6804–6808
12. Santoro A, Bartunov S, Botvinick M, Wierstra D, Lillicrap T (2016) Meta-learning with memory-augmented neural networks. In: Proceedings of the 33rd international conference on machine learning, ser. ICML'16, vol 48. JMLR.org, pp 1842–1850
13. Mishra N, Rohaninejad M, Chen X, Abbeel P (2018) A simple neural attentive meta-learner. In: International conference on learning representations. https://openreview.net/forum?id=B1DmUzWAW
14. Munkhdalai T, Yu H (2017) Meta networks. In: Proceedings of the 34th international conference on machine learning, ser. ICML'17, vol 70, pp 2554–2563. JMLR.org
15. Ravi S, Larochelle H (2017) Optimization as a model for few-shot learning. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings
16. Finn C, Abbeel P, Levine S (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In: Proceedings of the 34th international conference on machine learning, ser. ICML'17, vol 70, pp 1126–1135. JMLR.org
17. Li Z, Zhou F, Chen F, Li H (2017) Meta-SGD: learning to learn quickly for few-shot learning
18. Nichol A, Achiam J, Schulman J (2018) On first-order meta-learning algorithms
19. Parada-Cabaleiro E, Costantini G, Batliner A, Baird A, Schuller B (2018) Categorical vs dimensional perception of Italian emotional speech. In: Proceedings of Interspeech 2018, pp 3638–3642
20. Lim W, Jang D, Lee T (2016) Speech emotion recognition using convolutional and recurrent neural networks. In: 2016 Asia-Pacific signal and information processing association annual summit and conference (APSIPA), pp 1–4
21. Chopra S, Mathur P, Sawhney R, Shah RR (2021) Meta-learning for low-resource speech emotion recognition. In: ICASSP 2021—2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6259–6263
22. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd international conference on machine learning, ser. ICML'15, vol 37, pp 448–456. JMLR.org
Chapter 38
A Machine Learning Approach for Automated Irrigation Management System Pulin Prabhu, Anuj Purandare, Abhishek Nagvekar, Aditi Kandoi, Sunil Ghane, and Mahendra Mehra
1 Introduction In India, due to uneven distribution and overuse of water in irrigation, the efficiency of irrigation systems is only about 25%. Moreover, India accounts for about 15% of the world's population; however, it has only 4% of the world's freshwater resources. This leads to dryness of soil and wastage of water on a huge scale. The existing irrigation systems use manual irrigation, drip irrigation, sprinkler irrigation, etc. However, due to the irregularity of the amount of water distributed in farms, these techniques result in a lot of problems like land degradation, poor plant health, and water wastage. Hence, to avoid these issues, there is an urgent need for the improvement of existing systems. Therefore, the paper proposes an automated irrigation and nutrient management system along with a prediction model for irrigation schedules using an artificial neural network that can also help other farmers. Mobile-based irrigation solutions have been designed with various components to satisfy constraints like energy and economic saving. In this implementation, the paper suggests an automated system that helps one save not only time and money but also valuable resources like water and land. Also, having a water level sensor constantly take readings or having an irrigation schedule P. Prabhu (B) · A. Kandoi · S. Ghane Department of Computer Engineering, Sardar Patel Institute of Technology, Mumbai, India e-mail: [email protected] A. Kandoi e-mail: [email protected] S. Ghane e-mail: [email protected] A. Purandare · A. Nagvekar · M. Mehra Department of Computer Engineering, Fr. CRCE, Mumbai, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 S. Agrawal et al. (eds.), Machine Intelligence and Smart Systems, Algorithms for Intelligent Systems, https://doi.org/10.1007/978-981-16-9650-3_38
will help in the prevention of dry running of the pump. This proposed system helps the consumers and farmers who cannot afford existing expensive systems and suggests irrigation and nutrition schedules by its prediction model.
2 Literature Review In the described method, the authors initially check the moisture level of the soil. This gives an idea of whether the soil requires water or not. If water is required, the system, with the help of its ESP-01 (Wi-Fi module) sensor and a relay, supplies the required voltage for the pump to deliver water to the soil [1]. In the proposed system, the authors calculate the percentage of basic nutrients of the soil and obtain the amount for the particular type of soil; the system determines these quantities in real time [2]. In this paper, the authors state that the system will determine the moisture content in the soil and provide an appropriate amount of water to the plant. Through this implementation, the system makes the plant more self-sufficient and maintains its health [3]. The systems suggested in [4–9] work on various microcontrollers. They are set up such that they detect the moisture level of the soil with the additional sensors provided. The authors suggest that if the soil has enough moisture, then the system will pause itself for a given time and start again when required, hence saving valuable resources like manpower [4–9]. In this system, the authors suggest a mechanism that will find the moisture content in the soil. Another suggestion made here is to constantly capture images of the plant to study the health of the plant and to determine whether it has any diseases [10]. The author suggests having a fully automated system where one has access to multiple features: temperature, humidity, moisture, and fertilizer levels can be monitored and triggered to get rid of abnormalities and maintain an evergreen farm/garden with good plant health [11]. To reduce manpower, the authors suggest using various approaches, such as measuring the pH level of the soil to determine its nutrient levels. This way, the fertilizer quantity can be determined according to the requirement of the crop. With this, the cost will decrease and fertilizer efficiency will improve, ensuring better returns for human health and the environment [12]. These papers talk about how computer technologies like artificial intelligence and deep learning can be implemented in the field of agriculture to eradicate various problems like diseases of plants, pesticides, and the conservation of water and other natural resources [13, 14]. The author suggests a system that can be pocket-friendly and at the same time could increase the efficiency of crop production. For the said system, there is an Arduino-based camera that transmits images and soil moisture levels to another microcontroller for computation, which in turn handles the water supply [15].
The author suggests similar measures as others but also highlights the benefit that elderly agriculturists can make use of the said system and have full control over their harvest instead of handing it over to someone else to take care of [16]. In this system, a Dual Outlet Tap is used and the rest of the system is designed around it. A moisture sensor is connected to the circuit; as each plant has different traits, the system takes a reading of each plant's soil, and the water is supplied by considering each plant's need [17].
3 Methodology The dataset used to check the proper functioning of the system was collected in an experiment conducted by Wazihub with 4 IoT sensors over 4 months in 4 fields in Senegal. The raw data comprise 7 columns and 28,049 rows. The setup used to obtain the readings for the development of the system consisted of four separate pieces of ground planted with either maize or peanuts. An IoT sensor was placed in each such that the quantity of the yield was approximately the same. The plots, separated by a one meter perimeter, were next to each other. The following are the steps for the prediction of the irrigation schedule.
3.1 Adjusting Data Types and Removing Missing Values According to the algorithm, the data is preprocessed before making the necessary calculations. The data is also checked for missing values to make it consistent.
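A hedged pandas sketch of this preprocessing step is shown below; the file name and column names are assumptions based on the fields described in the next subsection, not the actual Wazihub export.

```python
import pandas as pd

# Hypothetical file/column names; the real dataset layout may differ.
df = pd.read_csv("wazihub_soil_weather.csv")

# Adjust data types: parse timestamps and force sensor readings to numeric.
df["timestamp"] = pd.to_datetime(df["timestamp"])
numeric_cols = ["soil_humidity", "air_temperature", "air_humidity",
                "pressure", "wind_speed", "wind_gust", "wind_direction"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Remove rows with missing values so downstream calculations stay consistent.
df = df.dropna().reset_index(drop=True)
print(df.dtypes, df.shape)
```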
3.2 Analyzing the Data Pattern Frequency Histograms The frequency of each data item was analyzed and plotted as a histogram. This helped in understanding how the various factors are associated with the topography of the particular region that the dataset focuses upon. The IoT soil moisture and weather setup in and around the field communicated the following data at 5 min intervals; given below are our findings for each of the data values:
• Soil Humidity: In Fig. 1, the values mostly lie between 60 and 70, with a mean value of 63.025. The minimum value is 36 and the maximum value is 88.
• Air Temperature: In Fig. 2, the values mostly lie between 15 and 20, with a mean value of 24.26. The minimum value is 11.22 and the maximum value is 45.56.
• Air Humidity: In Fig. 3, the values mostly lie between 80 and 100, with a mean
Fig. 1 Soil humidity analysis
Fig. 2 Air temperature analysis
Fig. 3 Air humidity analysis
Fig. 4 Pressure analysis
value of 58.52. The minimum value is 0.59 and the maximum value is 96.
• Pressure: In Fig. 4, the values mostly lie between 101.0 and 101.4, with a mean value of 101.13. The minimum value is 100.5 and the maximum value is 101.86.
• Wind Speed: In Fig. 5, the values mostly lie between 5 and 15, with a mean value of 9.89. The minimum value is 0 and the maximum value is 31.36.
• Wind Gust: In Fig. 6, the values mostly lie between 0 and 40, with a mean value of 41.74. The minimum value is 0 and the maximum value is 133.33.
• Wind Direction: In Fig. 7, the values mostly lie between 0 and 350, with a mean value of 93.98. The minimum value is 0 and the maximum value is 337.
Heat Maps As seen in Fig. 8, a heat map is a graphical representation of data that uses a system of color-coding to represent different values. A heat map is plotted to understand the correlation between soil humidity and the other features. A correlation matrix is a table showing correlation coefficients between variables. Air temperature
Fig. 5 Wind speed analysis
Fig. 6 Wind gust analysis
Fig. 7 Wind direction analysis
and air humidity show the maximum correlation. On the contrary, pressure, wind gust, and wind direction show minimum correlation. For training and testing purposes, the dataset is divided into a 70-30 split, where 70% is used for training and the remaining 30% for testing and validation purposes.
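A hedged sketch of the correlation heat map and the 70-30 split, using pandas, seaborn, and scikit-learn, is shown below; it continues the column-name assumptions from the earlier preprocessing sketch.

```python
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

# Correlation matrix over the sensor/weather features, rendered as a heat map.
corr = df[numeric_cols].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.title("Correlation between soil humidity and other features")
plt.show()

# 70-30 split: 70% for training, 30% for testing/validation.
features = df[numeric_cols].drop(columns=["soil_humidity"])
target = df["soil_humidity"]
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, random_state=42)
```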
3.3 Feature Engineering In order to analyze and utilize the given timestamps feature for an overall better prediction, two new features (columns), namely “days” and “time of day,” are created to record and understand how the time of the day and days together affect the overall soil moisture reading. It turned out in the further analysis that these new features are important in predicting the moisture readings.
Fig. 8 Correlation heat map of the features
Table 1 Feature engineering: splitting timestamps into time of day
     Original timestamp values    Assigned time of day
1.   between 00 and 06 h          Midnight
2.   between 06 and 12 h          Morning
3.   between 12 and 18 h          Afternoon
4.   between 18 and 00 h          Night
The time of day has been split into 4 buckets as given in Table 1.
• New Feature, Days: This particular feature is calculated by considering only the day part of the timestamp column. We consider the earliest day as day 1 and further calculate the day number for each row.
• New Feature, Time of Day: The 24 h of a day have been split into the 4 brackets of Table 1, as also seen in the histogram. A pandas sketch of this bucketing is given below.
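The sketch below derives the two new features from the timestamp column, continuing the column-name assumptions used earlier. The day-1 convention is ambiguous in the text, so here day 1 is taken as the earliest day present in the data.

```python
# "days": day number counted from the earliest day in the data (day 1).
df["days"] = (df["timestamp"].dt.normalize()
              - df["timestamp"].dt.normalize().min()).dt.days + 1

# "time_of_day": bucket the hour into the four brackets of Table 1.
def time_of_day(hour):
    if hour < 6:
        return "midnight"
    elif hour < 12:
        return "morning"
    elif hour < 18:
        return "afternoon"
    return "night"

df["time_of_day"] = df["timestamp"].dt.hour.map(time_of_day)
```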
3.4 Hyperparameter Tuning A hyperparameter is a parameter whose value is used to control the learning process, and hyperparameters tend to define the architecture of the model. Here, as different learning algorithms are to be used to build the model, there is a need for a process of searching for the ideal model architecture. Hyperparameter tuning is the step that gets this job done.
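The chapter does not name a specific tuning tool; as one possible realization, a hedged scikit-learn grid search over a couple of hypothetical parameter grids could look like this (reusing the X_train/y_train split from above).

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Hypothetical parameter grids; the actual search space is not given in the paper.
searches = {
    "knn": GridSearchCV(KNeighborsRegressor(),
                        {"n_neighbors": [3, 5, 7, 11]},
                        scoring="neg_root_mean_squared_error", cv=5),
    "svr": GridSearchCV(SVR(),
                        {"C": [0.1, 1, 10], "kernel": ["rbf", "linear"]},
                        scoring="neg_root_mean_squared_error", cv=5),
}
for name, search in searches.items():
    search.fit(X_train, y_train)
    print(name, search.best_params_, -search.best_score_)   # RMSE of best setting
```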
Fig. 9 Days
3.5 ML Algorithms The approach to the solution is finding the machine learning algorithm that gives the most accurate result for the prediction of soil humidity.
Support Vector Machine A support vector machine uses a kernel. A kernel helps find a hyperplane in a higher dimensional space without increasing the computational cost; usually, the computational cost increases if the dimension of the data increases. We obtain an RMSE of 6.8 (Fig. 9).
K-Nearest Neighbor K-nearest neighbor is a simple algorithm that stores all available cases and predicts the numerical target based on a similarity measure (e.g., distance functions). Using KNN, we got an RMSE of 7.4.
XGBoost XGBoost is a decision tree-based ensemble machine learning algorithm that uses a gradient boosting framework. Using XGBoost, we got an RMSE of 7.4.
LGB Classifier LightGBM is a gradient boosting framework that uses tree-based learning algorithms. LightGBM grows trees vertically while other tree-based learning algorithms grow trees horizontally. LightGBM is a fast, distributed, high-performance gradient boosting framework.
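A hedged sketch of fitting these four regressors and computing RMSE on the held-out split is given below, reusing the X_train/X_test variables assumed above; xgboost and lightgbm are third-party packages.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor

models = {
    "SVM": SVR(),
    "KNN": KNeighborsRegressor(),
    "XGBoost": XGBRegressor(),
    "LightGBM": LGBMRegressor(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print(f"{name}: RMSE = {rmse:.3f}")
```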
3.6 Artificial Neural Network ANN stands for artificial neural network. It is essentially a computational model based on biological neural network architectures and functions. Since the flow of data affects the structure of the ANN, changes rely on the input and output characteristics of the neural network. The ANN should be considered a nonlinear data model, which implies a dynamic relationship between input and output. The ANN takes its structure from how the human brain makes decisions. In Fig. 10,
Fig. 10 Time of day
we can see how the structure of the ANN mirrors the neurons of the human brain, which is composed of about 86 billion nerve cells, each connected to thousands of other cells with the help of axons. The dendrites help in receiving inputs from various organs, as a result of which electric impulses are created. These impulses travel from one neuron to another in order to handle different tasks. The ANN is correspondingly composed of numerous nodes, which represent the neurons of the human brain. These nodes are connected via links. Just like neurons, the nodes in the ANN accept data and perform operations, the results of which are passed on to other nodes. Furthermore, there are weights attached to each node which, when altered, make the network capable of learning. The output at the end of a node is called the node value. We will now see the two kinds of ANN:
• Feedforward ANN: The information flow in a feedforward ANN is unidirectional. While containing fixed inputs and outputs, the feedforward ANN does not have any kind of feedback loop. A feedforward artificial neural network model is used in the proposed system.
• Feedback ANN: The feedback ANN allows feedback loops to exist.
Working of ANN In Fig. 11, every arrow represents a connection. The flow of information throughout the network is represented by these arrows. We also saw that there are weights attached to these connections that help make the network capable of learning, but they are also used to control the signal between two nodes. If the results are not accurate, the system may require the weights to be altered; if the output generated is accurate, no weight alteration is required (Figs. 12 and 13).
Fig. 11 Basic structure of artificial neural network
Fig. 12 Artificial neural network structure
3.7 Experimental Setup Network Architecture We shall now look at the various hyperparameters considered, as shown in Table 2, and the rationale behind choosing them.
• The input layer is chosen to be at 1/30 neurons, and the output layer is chosen to be at 1/1 neuron, for getting greater accuracy at a higher number of data points.
Fig. 13 Types of artificial neural network—Feedforward ANN
Table 2 Model architecture for ANN
     Hyperparameters      Values
1.   Input layer          1/30 neurons
2.   Output layer         1/1 neurons
3.   Training epochs      10
4.   Optimizer            Adam
5.   Error propagation    Gradient descent
• Unlike iterations, epochs use one forward and one reverse pass over all training data, which optimizes the time and gives accurate results; hence, 10 epochs are chosen (Fig. 14).
• A few of the reasons behind choosing stochastic gradient descent as an optimizer and gradient descent for error propagation in our case are:
• Stochastic gradient descent works better than batch training since it performs updates more frequently.
• The whole dataset is not required for approximation of the gradient; taking advantage of vectorized operations to process an entire mini batch at once can make training faster than on a single data point.
• In the multivariate case, the formulae are a little more complex on paper when it comes to several variables and require even more calculation when implementing them in software; the closed-form solution is given by β = (X^T X)^{-1} X^T Y.
With the above-mentioned experimental setup for training and validation, the model performs with great accuracy. RMSE is used as an error metric and is found
Fig. 14 Types of artificial neural networks—Feedback ANN
out to be 0.536 units. This is further discussed in the results section as this model performs best among the ones discussed during the course of this research.
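A hedged Keras sketch of this setup, a feedforward network with no hidden layers (a single output neuron on 30 input features), trained with Adam for 10 epochs and reporting RMSE, is shown below. The synthetic arrays stand in for the engineered feature matrix, and the 30-feature input width is an assumption based on the "1/30 neurons" entry in Table 2.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 30 engineered inputs per sample and a soil-humidity target.
X_train = np.random.rand(1000, 30).astype("float32")
y_train = np.random.rand(1000).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30,)),
    tf.keras.layers.Dense(1)          # single output neuron, no hidden layers
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.3)
```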
4 Result In this section, we present the experimental results. As mentioned earlier, our goal was predicting the soil humidity using machine learning/artificial neural networks. The root mean square error (RMSE) is the standard deviation of the prediction errors. The said errors are a measure of how far the data points are from the actual regression line; strictly speaking, RMSE shows how closely the data are clustered around the line of best fit. As seen in Table 3, the support vector machine performed worst among all the models with an RMSE of 6.221. KNN and XGBoost performed moderately, with RMSE values of 2.06 and 2.346, respectively. LGBM was almost comparable to the ANN in terms of performance, with an RMSE of 0.684, while the ANN performed the best with an RMSE of 0.536. Table 3 Results table
     Models      RMSE values
1    SVM         6.221
2    KNN         2.06
3    XG Boost    2.346
4    LGBM        0.684
5    ANN         0.536
Fig. 15 Training set ground truth data versus corresponding model prediction
4.1 Observations As seen in the training (Fig. 15) and validation (Fig. 16) graphs, the model performs with great accuracy and produces predictions that are very accurate. To prove the system accuracy, RMSE is used as an error metric and is found out to be 0.536 units. Furthermore, there has not been any use of hidden layers in the artificial neural network to avoid the overfitting of data (Figs. 15 and 16).
5 Conclusion The present irrigation methods, or for that matter the automated irrigation systems, either require high-cost hardware or waste valuable natural resources; hence, we have designed a system which works with or without hardware. The proposed system is programmed to read the recorded data to calculate the schedule, that is, to predict the irrigation schedule based on the historic data values collected by the sensors and the weather data present in the database, with the help of machine learning and an artificial neural network. By adopting this system, there are all-round savings because of the judicious amount of water usage and the saving of power supply without the
498
P. Prabhu et al.
Fig. 16 Validation set ground truth data versus corresponding model prediction
use of hardware. With this when the water table and larger quantity of healthy land increases, the agricultural production increases which in turn will make the farmer happy, the interest toward farming/agriculture would also increase, and accordingly, the problem of food scarcity would also be addressed.
6 Future Scope

The project demonstrates a system that combines various features in a unique approach to irrigation schedule prediction. Having achieved a significant improvement in accuracy, the future scope of this study is to deploy hardware alongside the machine learning or intelligent computing system in order to obtain results that are specific and accurate for the local topography.
Chapter 39
Deepfake Images, Videos Generation, and Detection Techniques Using Deep Learning

Rishika Singh, Sagar Shrivastava, Aman Jatain, and Shalini Bhaskar Bajaj
1 Introduction

Recent advancements in computer graphics, computer vision, and ML have made it even easier to synthesize compelling fake music, photographs, and video. Highly realistic audio synthesis is now feasible: a neural network can learn to synthesize speech in a person's voice from suitable sample recordings [1]. Public concern about fake images and videos, particularly facial content produced by digital manipulation using deepfake methods, has grown considerably of late. Deepfake is a deep learning methodology capable of making fake videos by replacing one person's face with another. The name comes from a Reddit user called "deepfakes," who reported in late 2017 that a machine learning algorithm had been developed to help him turn celebrity faces into porn videos [2]. Fake pornography, fake documents, hoaxes, and financial manipulation are some of the negative uses of this kind of fake content. This has renewed interest in the conventional field of general media forensics [3] and driven rising efforts to identify audio/video facial manipulation. These revived attempts at fake face detection build on previous studies in biometric anti-spoofing [4] and on current data-driven deep learning. Very accurate representations of people can now be synthesized by GANs in the static image domain [5], and realistic videos in which a person appears to say whatever the creator wishes can be produced in the video domain. These so-called deepfake videos can be entertaining, but they are not easily guarded against. For example, nonconsensual pornography was the first use of deepfakes and continues to present a specific threat to women, from celebrities to journalists to anyone subjected to unwanted attention. Forensic methods are therefore needed to identify deepfake videos that substitute one person's facial identity with another's.

R. Singh · S. Shrivastava · A. Jatain (B) · S. B. Bajaj
Department of Computer Science, Amity University, Gurgaon 122413, India
In media forensics, traditional fake detection approaches were predicated on: (1) in-camera fingerprints, i.e., evaluation of the fingerprints inherent to the camera system's software and hardware, such as the optical lens, color filter array and interpolation [6], as well as compression [7], among others; and (2) out-of-camera fingerprints, i.e., analysis of external fingerprints left by editing tools, such as copy–paste or copy–move of image components, frame-rate reduction in a video, and so on. Nevertheless, the features used by most conventional fake detectors depend heavily on the specific training scenario and do not stand up to unseen situations. This is especially important because fake media is commonly posted on social networks, whose platforms alter the original image/video automatically, for instance by compression and resizing [8]. Recently, many incidents of deepfake video manipulation have been observed, including fake pornographic videos and images of celebrities and ordinary people [9, 10]. Deepfake misinformation efforts on social media sites are often used to influence politics, trigger hate and violence against minorities, and generate social unrest [11, 12]. Over the Internet, deepfakes may also be a threat in other domains: recent analysis has shown that GANs can construct fake medical images capable of deceiving both physicians and ML-based diagnostic support systems.
2 Facial Manipulations

Facial manipulation can be divided into four distinct classes according to the level of manipulation. Every facial manipulation group is summarized graphically in Fig. 1. Below, from high to low degrees of manipulation, each group is summarized:

1. Entire face synthesis: This manipulation produces entire non-existent facial images, for example using the recent StyleGAN approach. These strategies yield impressive results, producing high-quality, accurate facial images.
2. Identity swap: This manipulation involves swapping a person's face with some other person's face in a video. There are typically two distinct approaches: (1) traditional computer graphics techniques, like FaceSwap, and (2) modern deep learning methods identified as deepfakes, e.g., the current ZAO mobile application.
3. Attribute manipulation: This manipulation, also recognized as facial editing or face retouching, involves changing certain facial features like hair or skin color, gender, age, glasses, and so on. The manipulation is typically achieved via a GAN, as suggested by StarGAN [13]. The popular mobile application FaceApp is one example of this kind of manipulation.
4. Expression swap: This manipulation, also known as facial reenactment, involves changing the person's facial expression. The group focuses on Face2Face and neural textures, the most common strategies, which substitute one person's facial expression in a video for another's. Such deception may have extreme implications, such as the famous video of Mark Zuckerberg saying things he never said.
Fig. 1 Actual and fake samples of each facial manipulation group
3 Face Manipulation Methods for Fake Image and Video Generation

Recent developments in the production of fake images and videos raise societal concerns, as politicians, actors, and actresses are exploited. Thus, if the trend persists, digital content will become hard to trust, as the spread of misleading facts and fake news may become ubiquitous. As we know, faces play a significant role in transmitting a message; therefore, deepfakes are becoming an important issue today. Different face manipulation methods such as Face2Face, Deepfakes, Faceswap, and NeuralTextures are used in FaceForensics++. Deepfakes and Faceswap belong to the identity swap category, while Face2Face and NeuralTextures belong to the facial expression manipulation category [14] (Fig. 2).
Fig. 2 An example of two categories of face manipulation methods
3.1 Types of Methods in Creating the FaceForensics++ Dataset

Faceswap: Faceswap is a graphics-based method that transfers a face from a source image to a target video. This is achieved by extracting the source image's facial landmarks, fitting a 3D model using blendshapes, and projecting it onto the target image. Image blending and color correction are used to merge the morphed face onto the frame and produce a realistic-looking fake image. The method is repeated for each frame to create a video.

Deepfakes: The first deepfake video was created by a Reddit user using an encoder–decoder network. The faceswap GitHub project and FakeApp provide implementations. The mechanism depends on two encoder–decoder networks; the two networks share a common encoder but use different decoders during the training process.

Face2Face: This method belongs to facial reenactment, which aims to transfer the facial expressions of a source image to a target image. The method utilizes a 3D model with blendshape coefficients for moving facial expressions from one image to another. The first frames and other key points are used to construct a temporary face identity on which fake expressions are rendered.

NeuralTextures: This procedure also belongs to the facial reenactment category and builds on the neural textures approach of the original paper. Fakes are created using GAN-style training that combines a reconstruction loss with an adversarial loss.
3.2 Post-Processing of Videos

The videos are post-processed by compressing them with the H.264 codec, the method most commonly used by social networks. This study uses a quantization parameter of 23 for generating high-quality videos and 40 for creating low-quality videos.
Fig. 3 Example images from FaceForesnsics++ dataset
Table 1 Number of images per technique

Technique | Train data | Validation data | Test data
Real | 3,66,847 | 68,511 | 73,770
Deepfakes | 3,66,835 | 68,506 | 73,768
Face2Face | 3,66,843 | 68,511 | 73,770
Faceswap | 2,91,434 | 54,618 | 59,640
NeuralTextures | 2,91,834 | 54,630 | 59,672
3.3 Statistics of the FaceForensics++ Dataset

The FaceForensics++ dataset comprises 1000 original videos and their manipulated versions created using various deep learning face manipulation techniques such as Faceswap, Deepfakes, and Face2Face. Binary face masks are also offered to support image segmentation or classification. Figure 3 illustrates example images from the FaceForensics++ dataset. The total number of images per face manipulation technique is shown in Table 1.
4 Deepfake Video Generation and Detection

4.1 Deepfake

Deepfake is the combination of fake content and DL technology. DL is the artificial intelligence capability through which deepfakes are both produced and detected. Deepfakes are created by generative adversarial networks, which involve two machine learning models: one model is trained on a dataset to generate video forgeries, while the other model attempts to recognize the forgery. The forger keeps producing fakes until the other network can no longer detect the falsification. The larger the training dataset, the more convincing the deepfakes a forger can develop.
4.2 Deepfakes Challenges

Deepfakes are used for numerous purposes worldwide, including face swapping, pornographic image reconstruction, and fake news exploiting people's faces and bodies. They continuously affect democracy, security, safety, religion, and people's cultures. Deepfakes are increasing rapidly, yet detection methods cannot keep pace; since 2018 the number of deepfake images and videos online has roughly doubled. The Massachusetts Institute of Technology (MIT) analyzed 126,000 news stories shared by 3,000,000 users over ten years and found that fake news reaches 1500 people about six times faster than genuine news. Deepfakes create fictional news, images, and videos, and can be exploited for terror. They have eroded trust in media and caused social and financial fraud. Deepfakes influence religions, alliances, politicians, musicians, and voters, and as deepfake videos and images on social media increase, the facts are forgotten. The authors in [15] discuss the deepfake challenges affecting culture and people. Other uses of deepfakes include jokes intended to embarrass a colleague, unauthorized access or even violence, pornographic videos made to benefit others, and so on. Deepfakes are used to fake terrorist acts, defame people, and cause political distress. Although no one is immune, some people are more resilient than others. With little data and computing infrastructure, someone can produce a video of a world leader appearing to speak about a civil conflict. Deepfakes harm a target, ignite false news and hate speech, cause political tension, frustrate viewers, or contribute to conflict. A person can modify video content and the people in it to disseminate fake news, leading to tension among nations, within a country of many communities, and among populations in general. The spread of deepfake photographs and videos on social media is growing rapidly (Table 2).
4.3 Deepfake Datasets

Datasets are needed for designing deepfake detection models, and different detection systems are required for deepfake image and video material. Current detection approaches still require experimentation and continue to be evaluated. Therefore, there is a growing need for large image and video datasets for evaluating the efficiency of detection methods.
Table 2 List of DF video and image datasets

Dataset | Total videos | Explanation
UADFV | 98 | Includes 49 fake and 49 real videos, totaling 32,752 frames. The fake videos are made with the FakeAPP DNN system
DF-TIMIT | 320 | Includes low-quality videos, with a total of 10,537 real images and 34,023 fake images extracted from the 320 videos. Generated with faceswap-GAN
VTD | 1000 | Video tampering dataset, a broad dataset of manipulated videos
FaceForensics (FFs) | 2008 | Made up of 1004 YouTube videos, altered with the Face2Face method
DFD | 3068 | Deepfake detection data from Google and Jigsaw, based on 3068 videos featuring 28 persons of varying ages, genders, and communities
DFD | 3363 | Consists of 363 original videos and over 3000 videos manipulated using DF, including their respective binary masks
FFs++ | 5000 | An expanded FaceForensics dataset that includes real and fake videos produced by faceswap
DFDC | 5214 | 4113 deepfake videos created from 1131 real videos of 66 consented persons of various ages, genders, and ethnic groups, for Facebook's deepfake detection challenge
Celeb-DF | 6229 | 590 real videos and 5639 deepfake videos with more than two million video frames
4.4 Deepfake Detection

Following the emergence of this threat, different approaches have been suggested to detect deepfakes. Deepfake detection is typically framed as a binary classification in which classifiers distinguish manipulated videos from real ones. Such detection techniques require an extensive dataset of real and fake videos. There are many deepfake videos available on the Internet; however, a standard benchmark for testing the various detection techniques has yet to be developed. In [16], researchers developed a deepfake dataset containing 620 videos created with faceswap-GAN from the readily available VidTIMIT database, including both higher- and lower-quality videos that can realistically reproduce lip movements, facial expressions, and eye blinking. These videos have been used to test different detection techniques. The results demonstrate that deepfakes cannot be identified effectively by the FaceNet and VGG face recognition systems. Image quality measures and lip-syncing approaches with a support vector machine (SVM) also show shortcomings in detecting deepfake videos. This raises the vital need for detection algorithms that are more powerful and efficient. Previous detection studies have explored two classes of detection schemes:
• Supervised detection schemes
• Blind detection schemes.
In the supervised scheme, the defender uses labeled data to train a classification algorithm on both genuine and fake content. In blind detection, the defender has no prior access to fake content (or to the generative methods) and only has access to the genuine content.
4.5 Deepfake Video Generation

Increasingly advanced mobile camera technology and the rising reach of social media and networking portals have made it easier than ever to create and distribute digital videos. Until recently, the lack of sophisticated editing tools, the high demand for domain knowledge, and the complicated, time-consuming process limited the number of fake videos and their realism. The time needed to manufacture and manipulate video has declined considerably over the years due to access to high-volume training data and computing power, but more specifically due to advances in ML and CV methods that minimize essential manual editing steps. A recent vein of AI-based fake video systems, generally called deepfakes, has received tremendous interest lately. It uses a video of a specific person as input (the 'target') and creates another video where the target's face is replaced by another person's face (the 'source'). Deepfake's backbone is a DNN trained to map face images onto the target automatically. With proper post-processing, the resulting videos achieve a high degree of realism.
4.6 Deepfake Video Detection

Deepfake generation does not enforce the video's temporal consistency efficiently, which motivates detecting deepfakes using the spatiotemporal content of the video. Convolutional and recurrent networks can be combined to exploit the temporal inconsistencies between frames. With both a CNN and an LSTM, temporal video characteristics indicated by the frame sequence can be extracted. The detection network takes a sequence of frames and predicts whether the video is original or deepfake: frame features are extracted with a CNN and then fed into the LSTM as a time series. Deepfakes can also be detected through eye blinking, when a person blinks less or more than in real videos. A normal eye blinks roughly every 2–10 s, and each blink takes 0.1–0.4 s. Deepfake generation tools can, in most cases, not produce eyes that blink like a normal person's; most of the time, manipulated videos show a blinking rate lower than in real videos. Computer vision can extract the color of each eye, and eye color differences can also be applied to detect deepfakes. Audio–visual detection methods look for inconsistencies among lip movements, speech, and subtle image differences. Detection through lip synchronization distinguishes original clips, in which lip movement and speech match, from modified videos [17].
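A minimal sketch of the CNN + LSTM pipeline described above is given below, assuming a Keras implementation; the clip length, image size, and choice of a MobileNetV2 backbone are illustrative assumptions, not details taken from the surveyed work.

```python
# Minimal sketch of a CNN + LSTM deepfake video classifier (assumed Keras
# implementation); sequence length, image size, and backbone are illustrative.
from tensorflow import keras

seq_len, h, w, c = 20, 224, 224, 3            # 20 frames per clip (assumption)

# Per-frame CNN feature extractor (frozen MobileNetV2 backbone).
backbone = keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                          input_shape=(h, w, c))
backbone.trainable = False

frames = keras.Input(shape=(seq_len, h, w, c))
features = keras.layers.TimeDistributed(backbone)(frames)   # CNN features per frame
x = keras.layers.LSTM(64)(features)                         # temporal modelling
out = keras.layers.Dense(1, activation="sigmoid")(x)        # 0 = real, 1 = deepfake

model = keras.Model(frames, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```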
5 Deep Learning Techniques for Deepfakes Detection

Deep learning covers a wide variety of complex task domains. At the same time, recent developments in DL technology place people's privacy, and even national security, at risk. Deepfakes, among them, generate fake photographs and videos that people cannot recognize as fakes. Fake speeches by world leaders can also threaten global security and peace. Aside from malicious usage, deepfakes may also be used for constructive purposes, including language translation or post-dubbing in films.
5.1 Convolutional Neural Networks

CNNs are deep learning models that take image inputs and learn features through a series of operations suited to the learning task. A typical CNN consists of the following layers: (1) pooling layer, (2) convolution layer, (3) fully connected layer, and (4) activation layer.
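A toy model showing the four layer types listed above is sketched below, assuming Keras; the input size and layer widths are illustrative only.

```python
# Toy CNN containing the four layer types listed above (assumed Keras);
# the input shape and layer sizes are illustrative only.
from tensorflow import keras
from tensorflow.keras import layers

cnn = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, 3),                     # convolution layer
    layers.Activation("relu"),                # activation layer
    layers.MaxPooling2D(2),                   # pooling layer
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),    # fully connected layer
])
cnn.summary()
```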
5.2 Generative Adversarial Networks (GANs)

GANs are a class of generative networks introduced by Goodfellow. The framework deals with two subnetworks: a generator that produces new examples and a discriminator that classifies input data as real or fake. GANs belong to the class of unsupervised learning algorithms, since humans do not need to label the training images manually. GANs are generative models that produce new data instances similar to the input data. The generator takes random noise as input and creates an image, which is then fed to the discriminator. The discriminator takes real images and the fake images (generated by the generator) as inputs and outputs a likelihood between 0 and 1, where 0 is fake and 1 is real. The generative model attempts to capture the data distribution, while the discriminative model learns the boundary between real-looking and fake images.
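The generator/discriminator pairing described above can be sketched as follows, assuming Keras; the latent size, image size, and layer widths are illustrative assumptions.

```python
# Minimal generator/discriminator pair illustrating the GAN setup described
# above (assumed Keras); latent size, image size, and widths are illustrative.
from tensorflow import keras
from tensorflow.keras import layers

latent_dim = 100

# Generator: random noise -> 28x28 grayscale image.
generator = keras.Sequential([
    keras.Input(shape=(latent_dim,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(28 * 28, activation="sigmoid"),
    layers.Reshape((28, 28, 1)),
])

# Discriminator: image -> probability in [0, 1], where 0 = fake and 1 = real.
discriminator = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")
```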
5.3 Recurrent Neural Networks (RNNs)

RNNs are a class of ANNs that differ from feed-forward neural networks. RNNs can take one or more inputs and generate outputs that are affected not only by the current inputs but also by previous inputs/outputs. Thus, even when the inputs are similar, the results can vary because of differences in previous inputs.
5.4 Long Short-Term Memory (LSTM)

LSTM is a kind of RNN that is able to recall long sequences. Standard RNNs fail to recall information when time lags exceed 5–10 steps, while LSTM remains effective for sequences longer than 100 steps. Two major limitations of RNNs are vanishing gradients and exploding gradients; LSTM's specially designed architecture overcomes these disadvantages. An LSTM layer consists of recurrently connected memory blocks. Each memory block has a memory cell and three connected gates, the input gate, output gate, and forget gate, which control writing to, reading from, and resetting the cell. The network communicates with the cells through these gates. Applications of LSTM include speech recognition, video analysis, and machine translation.
5.5 Optical Flow and FlowNet

Optical flow is a method for measuring the motion of image intensities. It refers to the pattern of movement of image objects between consecutive frames. Videos have both temporal and spatial structure, while images have only spatial structure; optical flow thus contributes to temporal understanding. CNNs have been used successfully in several applications, including optical flow estimation. Optical flow increases performance in video classification tasks and helps in video generation via GANs, where it provides smoothing to produce realistic images and reliable results (Table 3).
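A small sketch of dense optical flow between consecutive frames using OpenCV's Farneback method is shown below; the video path is a placeholder and the parameter values are common defaults rather than values from the surveyed work.

```python
# Dense optical flow between consecutive frames with OpenCV's Farneback method;
# "video.mp4" is a placeholder path and the parameters are common defaults.
import cv2

cap = cv2.VideoCapture("video.mp4")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # flow[..., 0] / flow[..., 1] hold per-pixel horizontal/vertical motion.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    prev_gray = gray
cap.release()
```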
6 Related Works

6.1 AI-Based Video Synthesis Algorithms (AI-VSA)

Newer AI-VSA models depend on recent advances in DL, in particular GANs. A GAN model comprises two DNNs trained in tandem. The generator network attempts to produce images indistinguishable from real images, while the discriminator network aims to separate them. When training is complete, the generator is used to create realistic images. Liu et al. [18] suggested a method based on coupled GANs for unsupervised image-to-image translation that learns a joint image representation across domains; the deepfake algorithm is based on this approach. Zhu et al. [19] proposed CycleGAN's cycle-consistency loss to improve GAN performance. Bansal et al. [20] went further and proposed Recycle-GAN, which incorporates temporal and spatial information in the generative adversarial framework.
Table 3 Entire face synthesis: comparison of various state-of-the-art detection methods

Author | Method | Classifiers | Databases (Generation) | Best performance
McCloskey and Albright (2018) [79] | GAN-Pipeline Features (GAN-PF) | SVM | NIST MFC2018 | AUC = 70.0%
Wang et al. [80] | GAN-PF | SVM | Own (InterFaceGAN, StyleGAN) | Acc. = 84.7%
Guarnera et al. [81] | GAN-PF | SVM, k-NN, LDA | Own (StarGAN, AttGAN, GDWCT, StyleGAN2, StyleGAN) | Acc. = 99.81%
Nataraj et al. [82] | Steganalysis Features | CNN | 100K-Faces (StyleGAN) | EER = 12.3%
Yu et al. [83] | Deep Learning Features (DLF) | CNN | Own (SNGAN, ProGAN, MMDGAN, CramerGAN) | Acc. = 99.5%
Marra et al. [84] | DLF | CNN + Incremental Learning | Own (ProGAN, CycleGAN, StyleGAN, Glow, StarGAN) | Acc. = 99.3%
Dang et al. [85] | DLF | CNN + Attention Mechanism | DFFD (ProGAN, StyleGAN) | AUC = 100%, EER = 0.1%
Neves et al. [86] | DLF | CNN | 100K-Faces (StyleGAN), iFakeFaceDB | EER = 0.3%, EER = 4.5%
Hulzebosch et al. [87] | DLF | CNN, AE | Own (StarGAN, Glow, ProGAN, StyleGAN) | Acc. = 99.8%

Note: AUC = Area Under Curve, Acc. = Accuracy, EER = Equal Error Rate
6.2 Resampling Detection

The artifacts introduced by the deepfake production pipeline are primarily due to the affine warping of the facial transformations. Such transformations and the underlying resampling algorithms have been widely researched in the digital media forensics literature [21]. However, post-processing steps, including image/video encoding, are not subject to easy modeling and affect these methods' efficacy. Moreover, these approaches are usually intended to analyze the exact resampling of whole images; still, for our purpose, a simpler approach is possible by comparing the regions of potentially synthesized faces with the rest of the image. The rest of the image should be free of such artifacts, so their presence in the face region is a clear indication that the video is a deepfake.
6.3 Face-Based Video Manipulation Methods

Different methods for targeted manipulation of video sequences have been suggested since the 1990s. The first real-time expression transfer for faces was demonstrated by Thies et al. [22], who then proposed Face2Face, a facial reenactment system that can alter facial expressions in various kinds of video streams. Different image synthesis methodologies using deep learning, surveyed by Lu et al. [23], have also been researched. GANs are utilized to modify facial attributes such as aging or skin color. Deep feature interpolation shows successful outcomes in altering facial characteristics such as age, facial hair, or mouth expression. Lample et al. [24] achieve comparable results with attribute interpolation. Most of these DL image synthesis techniques suffer from low image resolution. Karras et al. [25] demonstrate high-quality face synthesis with progressively growing GANs, improving image quality. Researchers have demonstrated the proposed methods' effectiveness on a number of video datasets and deepfakes.
6.4 GAN Generated Image/Video Detection

Various techniques can be utilized to detect traditional forgeries. Zhou et al. suggested a CNN for face manipulation detection. NoisePrint uses a CNN to extract device fingerprints for forgery detection. Detection of GAN-generated images and videos has also been addressed lately. Li et al. noted that deepfake faces lack realistic eye blinking, as training images collected from the Internet do not normally include pictures with closed eyes; a CNN/RNN model therefore detects deepfake videos from the lack of eye blinking. Even so, deliberately incorporating images with closed eyes into training can circumvent this detection. Yang et al. used head-pose incoherence to identify fake videos. Afshar et al. trained neural networks to categorize real faces and fake faces produced by Deepfake and Face2Face.
7 Conclusion and Future Scope

The computer graphics and computer vision communities can now produce convincing synthetic audio and video, the so-called deepfakes. At the same time, the democratization of access to technology that can produce sophisticated manipulated videos of anyone saying anything is concerning, given its capability to disrupt democratic elections, fuel disinformation campaigns, commit fraud, and generate non-consensual pornography. In general, the more recent facial manipulations appear to be easily detectable under controlled scenarios, when fake detectors are evaluated under similar conditions. This was shown in most of this survey's benchmarks, where very low detection error rates were accomplished. It is not a very realistic scenario, however, because fake videos and photographs are often shared across social networks with substantial changes such as compression, resizing, and noise. Evolving facial manipulation methods are therefore an important motivating factor. New features can also be explored to build more accurate methods than current fake detectors, which focus solely on image and video data. Future work should examine the main aspects to improve and the current prospects for each group of face manipulation: face synthesis, identity swap, attribute manipulation, and expression swap. Many factors will shape the next generation of realistic fake images and videos, along with advanced approaches for face manipulation detection, improvements in deep learning, and the latest DeepFake Detection Challenge (DFDC). We would also like to investigate a dedicated network structure for deepfake video detection.
References

1. Oord A, Dieleman S, Zen H, Simonyan K, Vinyals O, Graves A, Kalchbrenner N, Senior A, Kavukcuoglu K (2016) WaveNet: a generative model for raw audio
2. BBC Bitesize (2019) Deepfakes: What are they and why would I make one? Available: https://www.bbc.co.uk/bitesize/articles/zfkwcqt
3. Stamm MC, Liu KJR (2010) Forensic detection of image manipulation using statistical intrinsic fingerprints. IEEE Trans Inf Forensics Secur 5(3):492–506. https://doi.org/10.1109/TIFS.2010.2053202
4. Galbally J, Marcel S, Fierrez J (2014) Biometric anti-spoofing methods: a survey in face recognition. IEEE Access 2:1530–1552
5. Karras T et al (2019) A style-based generator architecture for generative adversarial networks. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 4401–4410
6. Popescu A, Farid H (2005) Exposing digital forgeries in color filter array interpolated images. IEEE Trans Signal Process 53(10):3948–3959
7. Lin Z, He J, Tang X, Tang C-K (2009) Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis. Pattern Recogn 42(11):2492–2501
8. Rossler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) FaceForensics++: learning to detect manipulated facial images. In: Proceedings of IEEE/CVF international conference on computer vision
9. Partadiredja R, Entrena Serrano C, Ljubenkov D (2020) AI or human: the socio-ethical implications of AI-generated media content, pp 1–6
10. Wong CRQ (2016) Facebook removes bogus accounts that used AI to create fake profile pictures. https://www.cnet.com/news/facebookremoved-fake-accounts-that-used-ai-to-createfake-profile-pictures/
11. Mak T (2018) Can you believe your ears? With new 'Fake News' tech, not necessarily. https://www.npr.org/2018/04/04/599126774/can-you-believeyour-own-ears-with-newfake-news-tech-not-necessarily
12. BBC News (2019) Deepfake videos could 'Spark' violent social unrest. https://www.bbc.com/news/technology-48621452sw
13. Choi Y, Choi M, Kim M, Ha J, Kim S, Choo J (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In: Proceedings of IEEE/CVF conference on computer vision and pattern recognition
14. Santha A (2020) Deepfakes generation using LSTM based generative adversarial networks. Thesis, Rochester Institute of Technology. Accessed from https://scholarworks.rit.edu/theses/10447
15. Marwan Albahar JA (2019) Deepfakes threats and countermeasures systematic review. JTAIT 97(22):3242–3250
16. Muluye W (2020) Deep learning algorithms and frameworks for deepfake image and video detection: a review. Int J Eng Comput Sci 9(10):25199–25207
17. Zhang Y, Zheng L, Thing V (2017) Automated face swapping and its detection. In: IEEE 2nd international conference on signal and image processing (ICSIP), pp 15–19
18. Liu M-Y, Breuel T, Kautz J (2017) Unsupervised image-to-image translation networks. In: Proceedings of the 31st international conference on neural information processing systems (NIPS'17). Curran Associates Inc., Red Hook, NY, USA, pp 700–708
19. Zhu J, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In: 2017 IEEE international conference on computer vision (ICCV), pp 2242–2251. https://doi.org/10.1109/ICCV
20. Popescu A, Farid H (2005) Exposing digital forgeries by detecting traces of re-sampling. IEEE Trans Signal Process 53:758–767. https://doi.org/10.1109/TSP.2004.839932
21. Prasad S, Ramakrishnan KR (2006) On resampling detection and its application to detect image tampering. In: IEEE international conference on multimedia and expo, pp 1325–1328. https://doi.org/10.1109/ICME.2006.262783
22. Thies J, Zollhöfer M, Stamminger M, Theobalt C, Nießner M (2016) Face2Face: real-time face capture and reenactment of RGB videos. Commun ACM 62:96–104
23. Antipov G, Baccouche M, Dugelay J-L (2017) Face aging with conditional generative adversarial networks. arXiv:1702.01983
24. Lample G et al (2017) Fader networks: manipulating images by sliding attributes. In: Advances in neural information processing systems, pp 5967–5976
25. Karras T, Aila T, Laine S, Lehtinen J (2017) Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196
Chapter 40
Retinal Image Enhancement Using Hybrid Approach

Prem Kumari Verma and Nagendra Pratap Singh
1 Introduction

The retina is a very important part of the eye. It is the sensory membrane that lines the inner surface of the back of the eyeball [1]. It is composed of several layers, including one that contains specialized cells called photoreceptors. Ophthalmologists examine patients' retinal problems using a high-resolution fundus camera. Retinal fundus images provide rich information about diseases such as diabetes, hypertension, glaucoma, etc. [2].
P. K. Verma (B)
Madan Mohan Malaviya University of Technology, Gorakhpur, Uttar Pradesh, India
N. P. Singh
National Institute of Technology Hamirpur, Hamirpur, Himachal Pradesh, India
e-mail: [email protected]
Image enhancement techniques are used to improve the contrast and highlight the retinal blood vessels. In this paper, we enhance the retinal image; first, we consider retinal features such as the cross-sectional profiles and the uniform intensity of edges, and we change the smoothness of the retinal image by changing its intensity [3]. We follow an enhancement technique for retinal blood vessels. Image enhancement algorithms can be classified into noise removal and restoration methods or contrast enhancement techniques, which include both gray-level and spatial filtering modification methods [4]. The purpose of this paper is to define more clearly and narrowly the processing of diseased retinal images [5]; we examine the enhancement process critically, as investigated on fundus images. Principal component analysis (PCA) is an unsupervised learning algorithm used for dimensionality reduction in machine learning. It is a statistical procedure that converts observations of correlated features into a set of linearly uncorrelated components with the help of an orthogonal transformation. These new transformed features are known as the principal components. PCA is one of the popular tools used for exploratory data analysis and predictive modeling, and it is a procedure for drawing strong patterns from a given dataset by reducing variance. In this paper, the PCA algorithm is chosen because the ratio of the RGB channels is sometimes imbalanced; PCA reduces the dimensionality and improves the accuracy of the output.
2 Literature Review

• A. M. R. R. Bandara et al. proposed an enhancement technique using SAUCE, evaluated for blood vessel segmentation accuracy. SAUCE also enhances the retinal blood vessels, with a performance accuracy of 94%, which is better than other integrated approaches.
• Christina Leitner et al. applied PCA algorithms to speech enhancement and recognition of synthetic data. PCA works on the complex STFT coefficients of speech; the STFT is calculated over speech frames of 256 samples [6].
• Ridhi A. Vyas et al. (2017) used the PCA technique for feature-based face recognition with improved accuracy. The features were examined on the Yale database, which contains 165 images in GIF format. The face recognition accuracy was 98.18% [7].
• S. Y. Elhabian et al. (2008) implemented a derivative-based technique for background differentiation of retinal images [8]. The authors used the second derivative of the 2D input image to highlight the blood vessels. The drawback of this system is low accuracy: the retinal blood vessels are not clearly differentiated from the background.
• Gopal Dutt Joshi et al. (2008) proposed a color retinal image enhancement method based on domain knowledge [9]. The authors use a sampling grid and determine the local mean and variance of the retinal image. Images captured by a fundus camera with a 50° field of view were used.
• W. Setiawan, Tati R. Mengko et al. (June 2013) proposed color retinal image enhancement using the CLAHE method, applied to the R and G channels of retinal blood vessel images [10].
• Singh et al. [11] proposed a retinal blood vessel segmentation method using a Gumbel probability distribution function based matched filter. The authors used PCA-based gray-scale conversion and entropy-based optimal thresholding on retinal fundus images, achieving an accuracy of 92.07%.
• B. M. Ege et al. proposed a feature extraction method for diabetic retinopathy [12]. The authors performed preprocessing on retinal images and ranked features by quantitative analysis. The mean value and standard deviation of normal retinal images were found to be 332.245 and 20.120, and of diabetic retinal images 315.09 and 12.34, respectively.
3 Method and Model

Here, we discuss the retinal image enhancement method. First, we take RGB retinal images from the DRIVE and STARE datasets [13]. The block diagram of the implemented work is shown in the figure.
The method combines two preprocessing pipelines: in the first, normal CLAHE is applied to the gray-scale image; in the second, CLAHE is applied after the PCA algorithm has been applied to the gray-scale retinal image.

DRIVE Dataset: The DRIVE dataset was established for research studies on the segmentation and enhancement of retinal blood vessels. The photographs in the DRIVE dataset were acquired from a diabetic retinopathy screening program involving subjects between 25 and 90 years of age. The images were captured by a Canon CR5 non-mydriatic 3CCD camera with a 45° field of view (FOV), and each image is 768 * 584 pixels [14]. The set of 40 images is divided into a training and a testing category, each containing 20 images. All of the images in the dataset were actually used for making clinical diagnoses, and patient privacy is protected.

STARE Dataset: The STARE dataset consists of a group of 2 * 10 retinal images taken with a TopCon TRV-50 fundus camera at a 35° FOV; each image has 8 bits per color channel at 650 * 500 pixels, and about 5% of the images are healthy while the rest are unhealthy retinal images (Figs. 1 and 2).
3.1 Preprocessing (Method 1)

In the preprocessing stage we improve the visibility of the retinal blood vessels; contrast enhancement of the retinal image is the basic preprocessing task (Fig. 3). Preprocessing includes two steps. In the first step, the RGB retinal image is converted into a gray-scale image, the gray-scale image is enhanced using Contrast Limited Adaptive Histogram Equalization (CLAHE) [15], and the PSNR value of step one is calculated. In the second step, the RGB retinal image is again converted into a gray-scale image, principal component analysis (PCA) is applied to the gray-scale image, and the PCA-filtered image is then enhanced using CLAHE. The PSNR value of step two is then calculated, as explained below (Tables 1, 2 and 3; Fig. 4).
3.2 Method 2

We then compare the PSNR value of the step 1 enhanced image with that of the step 2 enhanced image; the better result is obtained using the PCA algorithm. The PSNR measure used here is

psnr = (mean of image) / (standard deviation of image)
Fig. 1 RGB image of retina
3.3 PCA Algorithm

• Vectorize the gray-scale image.
• Subtract the mean to obtain zero-mean image data.
• Compute the eigenvalues and eigenvectors of the image data using principal component analysis.
• Apply the eigenvalue-weighted linear sum of projections to the gray-scale retinal image [11].

A sketch of these steps is given below.
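The following is only an illustrative interpretation of the listed steps, assuming an OpenCV/NumPy implementation; the file path is a placeholder, and the exact weighting scheme used in [11] may differ.

```python
# Illustrative interpretation of the PCA steps above (assumed OpenCV/NumPy
# implementation); "retina.png" is a placeholder path and the weighting scheme
# is an assumption, not necessarily the exact formulation of [11].
import cv2
import numpy as np

img = cv2.imread("retina.png").astype(np.float64)   # H x W x 3 image
pixels = img.reshape(-1, 3)                         # vectorize the image

mean = pixels.mean(axis=0)
centered = pixels - mean                            # zero-mean data

cov = np.cov(centered, rowvar=False)                # 3 x 3 channel covariance
eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues / eigenvectors

proj = centered @ eigvecs                           # projections on each axis
weights = eigvals / eigvals.sum()                   # eigenvalue-based weights
gray = (proj * weights).sum(axis=1).reshape(img.shape[:2])  # weighted linear sum

gray = cv2.normalize(gray, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("retina_pca_gray.png", gray)
```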
Fig. 2 Gray scale image of retina

Fig. 3 a Gray scale image from DRIVE dataset. b Using PCA with gray scale image
Table 1 Comparison PSNR value between gray scale, PCA gray scale and PCA CLAHE gray scale image on STARE dataset

S. No. | Gray scale image PSNR value | PCA gray image PSNR value | CLAHE-PCA image PSNR value
1 | 1.5654 | 2.1015 | 3.0245
2 | 1.5492 | 2.1210 | 3.0430
3 | 1.6276 | 2.3245 | 3.0326
4 | 1.4961 | 2.1365 | 3.0245
5 | 1.5778 | 2.2204 | 3.1440
6 | 1.5750 | 2.1758 | 3.0145
7 | 1.5115 | 2.1493 | 3.0111
8 | 1.5394 | 2.1582 | 3.0715
9 | 1.5631 | 2.1430 | 3.1192
10 | 1.5563 | 2.2308 | 3.2030
11 | 1.5127 | 2.1325 | 3.0432
12 | 1.5665 | 2.1711 | 3.1222
13 | 1.5127 | 2.1217 | 3.0526
14 | 1.5665 | 2.2345 | 3.1721
15 | 1.5409 | 2.2034 | 3.1574

Table 2 Comparison between PSNR value of gray scale and PCA gray image of retina

S. No. | Gray scale image (PSNR value) | PCA gray image (PSNR value)
1 | 12.51701 | 15.45549
2 | 11.4726 | 14.81692
3 | 12.04655 | 15.33756
4 | 12.83124 | 15.18359
5 | 12.78181 | 16.21005
6 | 12.38133 | 15.23484
7 | 11.88643 | 14.86375
8 | 13.00399 | 15.59929
9 | 11.89831 | 15.49586
10 | 13.55869 | 16.18939
11 | 11.87693 | 14.81858
12 | 12.48098 | 15.47753
13 | 12.40604 | 15.07701
14 | 11.90996 | 15.0189
15 | 13.80266 | 15.8268
Table 3 Comparison PSNR value between gray scale, PCA gray scale and PCA CLAHE gray scale image

S. No. | Gray scale image PSNR value | PCA gray image PSNR value | CLAHE-PCA image PSNR value
1 | 12.51701 | 15.45549 | 15.70229
2 | 11.4726 | 14.81692 | 15.39649
3 | 12.04655 | 15.33756 | 15.29923
4 | 12.83124 | 15.18359 | 15.9043
5 | 12.78181 | 16.21005 | 15.77462
6 | 12.38133 | 15.23484 | 15.33055
7 | 11.88643 | 14.86375 | 15.50263
8 | 13.00399 | 15.59929 | 15.83053
9 | 11.89831 | 15.49586 | 15.50937
10 | 13.55869 | 16.18939 | 16.047
11 | 11.87693 | 14.81858 | 15.66551
12 | 12.48098 | 15.47753 | 15.69076
13 | 12.40604 | 15.07701 | 15.60902
14 | 11.90996 | 15.0189 | 15.6912
15 | 13.80266 | 15.8268 | 16.24881
Fig. 4 a Gray scale image from DRIVE dataset. b Using PCA with gray scale image. c CLAHE-PCA image
3.4 Contrast Limited Adaptive Histogram Equalization (CLAHE)

CLAHE is used to improve the contrast of an image or video. In this paper, we use Contrast Limited Adaptive Histogram Equalization (CLAHE) to improve the quality of the retinal image [16]. CLAHE differs from normal histogram equalization because it applies several methods corresponding to different parts of the image and redistributes the histogram, which gives a better quality result than the Adaptive Histogram Equalization (AHE) technique. CLAHE is an advanced version of AHE that can be applied to both homogeneous and heterogeneous foggy images [17]. CLAHE can be used on both color and gray-level images.
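A small sketch of CLAHE applied to a gray-scale retinal image with OpenCV is shown below, together with the PSNR measure defined in Sect. 3.2; the input path is a placeholder, and the clip limit and tile size are common defaults rather than values taken from this paper.

```python
# CLAHE enhancement of a gray-scale retinal image with OpenCV, followed by the
# PSNR measure defined in Sect. 3.2; the path, clip limit, and tile size are
# placeholders/defaults rather than values from the paper.
import cv2

gray = cv2.imread("retina_pca_gray.png", cv2.IMREAD_GRAYSCALE)

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(gray)

psnr = enhanced.mean() / enhanced.std()   # psnr = mean / standard deviation
print(psnr)
```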
4 Result and Discussion

The proposed method has been implemented on the DRIVE and STARE datasets, which are publicly available. The given tables show the results and performance analysis of principal component analysis used in the RGB-to-gray-scale conversion. Images 2, 3, 4, and 10 from the DRIVE dataset show decreased performance after using PCA and CLAHE. The reason is that, during equalization of a digital image, impulsive noise arises whenever pixel values change randomly; this can be caused by the image capture process and a fluctuating photon stream, as well as by defective pixels, noise generated by electromagnetic errors of the analog converter, aging of the storage device, and transmission errors. Table 1 shows the STARE dataset retinal image enhancement results using the PCA and CLAHE techniques, in terms of PSNR values: the first column shows the PSNR value of the original gray-scale retinal image, the second column the PSNR value after applying PCA, and the third column the PSNR value after applying PCA and CLAHE. The table shows the accuracy of retinal image enhancement in terms of PSNR values.
5 Conclusion and Future Work

In this paper, retinal images from the DRIVE and STARE datasets are enhanced using the PCA algorithm with CLAHE filtering. Retinal blood vessels are central to the detection of retinal diseases, so the enhancement of retinal blood vessels is an important part of the task. The given process gives better accuracy in retinal image enhancement. Retinal images are captured by a fundus camera because it has high resolution and gives a high-quality input image [10]. Retinal image enhancement techniques are used to improve the contrast and brightness of an image. Here, we found better results with the principal component analysis (PCA) algorithm in comparison with normal CLAHE after converting RGB to gray scale. In the future, we will try to apply a Robust Sharp Vector Median filter for even better enhancement results.
References

1. Yadav P, Singh NP (2019) Classification of normal and abnormal retinal images by using feature-based machine learning approach. Recent trends in communication, computing, and electronics. Springer, Singapore, pp 387–396
2. Bandara AMRR, Giragama PWGRMPB (2017) A retinal image enhancement technique for blood vessel segmentation algorithm. In: 2017 IEEE international conference on industrial and information systems (ICIIS). IEEE, pp 1–5
3. Nirmala D (2015) Medical image contrast enhancement techniques. Res J Pharm Biol Chem Sci 6(3):321–329
4. Tebini S, Seddik H, Braiek EB (2017) Medical image enhancement based on new anisotropic diffusion function. In: 2017 14th international multi-conference on systems, signals & devices (SSD). IEEE, pp 456–460
5. Singh NP, Srivastava R (2018) Extraction of retinal blood vessels by using an extended matched filter based on second derivative of Gaussian. In: Proceedings of the National Academy of Sciences, India Section A: Physical Sciences, pp 1–9
6. Leitner C, Pernkopf F (2011) The pre-image problem and kernel PCA for speech enhancement. In: International conference on nonlinear speech processing. Springer, Berlin, pp 199–206
7. Vyas RA, Shah SM (2017) Comparison of PCA and LDA techniques for face recognition feature based extraction with accuracy enhancement. Int Res J Eng Technol (IRJET) 4(6):3332–3336
8. Elhabian SY, El-Sayed KM, Ahmed SH (2008) Moving object detection in spatial domain using background removal techniques: state-of-art. Recent Pat Comput Sci 1(1):32–54
9. Joshi GD, Sivaswamy J (2008) Colour retinal image enhancement based on domain knowledge. In: 2008 6th Indian conference on computer vision, graphics and image processing. IEEE, pp 591–598
10. Setiawan AW, Mengko TR, Santoso OS, Suksmono AB (2013) Color retinal image enhancement using CLAHE. In: International conference on ICT for smart society. IEEE, pp 1–3
11. Singh NP, Srivastava R (2016) Retinal blood vessels segmentation by using Gumbel probability distribution function based matched filter. Comput Methods Progr Biomed 129:40–50
12. Ege BM, Hejlesen OK, Larsen OV, Møller K, Jennings B, Kerr D, Cavan DA (2000) Screening for diabetic retinopathy using computer based image analysis and statistical classification. Comput Methods Progr Biomed 62(3):165–175
13. Singla N, Singh N (2017) Blood vessel contrast enhancement techniques for retinal images. Int J Adv Res Comput Sci 8(5)
14. Ricci E, Perfetti R (2007) Retinal blood vessel segmentation using line operators and support vector classification. IEEE Trans Med Imaging 26(10):1357–1365
15. Verma PM, Singh NP, Yadav D (2020) Image enhancement: a review. In: Hu YC, Tiwari S, Trivedi MC, Mishra KK (eds) Ambient communications and computer systems. Springer, Singapore, pp 347–355
16. Singh LSS, Ahlawat AK, Singh KM, Singh TR (2017) A review on image enhancement methods on different domains
17. Singh NP, Nagahma T, Yadav P, Yadav D (2018) Feature based leaf identification. In: 2018 5th IEEE Uttar Pradesh section international conference on electrical, electronics and computer engineering (UPCON). IEEE, pp 1–7
Chapter 41
Prediction of Mental Stress Level Based on Machine Learning

Akshada Kene and Shubhada Thakare
1 Introduction

In present-day society, stress is one of the serious issues in numerous countries around the globe. A modest amount of stress at work or study can act as a positive. However, high-level stress over the long term can become chronic and cause many chronic illnesses. Chronic stress can result in serious health problems including anxiety, insomnia, muscle pain, hypertension, and a weakened immune system. Mental problems threaten people's well-being; they are viewed as a main factor in changing an individual's state of mind, and the individual goes into depression. Nowadays, users can also be stressed because of the social interactions of online social networks. The rapid increase of mental issues and stress has become a great challenge to human well-being and quality of life. It should be noted that the terms stress and anxiety are regularly used interchangeably. Their fundamental difference is that anxiety is typically a feeling not directly and obviously connected to external cues or objective threats, whereas stress is an immediate response to daily demands and is viewed as more adaptive than anxiety. Anxiety and stress often involve similar physical sensations, for example, a higher pulse, sweaty palms and a churning stomach, triggered by largely overlapping neuronal circuits when the brain fails to recognize the difference between a perceived and a genuine threat. These similarities extend to the facial expressions associated with each state. According to one study, 49% of 18–24-year-olds report a significant degree of stress related to how they see themselves (Fig. 1).
A. Kene (B) · S. Thakare
Department of Electronics and Engineering, Sant Gadge Baba Amravati University, Amravati, Maharashtra, India
Fig. 1 Impact of stress on human brain Farhad and Rizwan [1]
Likewise, women are more likely than men to report physical and emotional symptoms of stress, owing to increasing working hours. However, most individuals hesitate about whether or not to seek proper treatment for it. According to an American organization, 80% of workers experience work stress, almost half say they need counseling on how to manage stress, and 42% say their colleagues need such counseling. According to the Health and Safety Executive (HSE), stress, ill health, or related disability accounted for 44% of work-related cases across all occupations in 2018–19 and 54% of all working days lost due to chronic illness. That is why stress identification is significant. Stress has three-pronged impacts. Subjective impacts of stress include feelings of guilt, shame, anxiety, aggression, or frustration; people also feel drained, tense, nervous, irritable, moody, or lonely. Behavioral impacts of stress are visible changes in an individual's conduct, such as increased accidents, use of drugs or alcohol, inappropriate laughter, extreme or argumentative behavior, highly sensitive moods, and eating or drinking to excess. Diminished mental capacity, impaired judgment, rash decisions, carelessness, and oversensitivity to criticism are some of the cognitive impacts of stress.
2 Related Work

The stress we experience is a physical reaction to mental stimulation or physical challenges. Small amounts of stress may be desirable, useful, and even healthy. However, excessive stress may cause real harm and increase the risk of illness; stress can be a risk factor for high blood pressure, asthma, ulcers, and other failures, as well as for considerable mental instability and pain. Several papers are available on stress identification; they are described as follows.
2.1 Implementation of Stress Detection using Signal Processing Jatupaiboon et al. [2] proposed to utilize constant EEG sign to arrange glad and miserable feelings inspired by pictures and old-style music. Thinking about each pair of channels and distinctive recurrence groups, worldly pair of channels gives a preferred outcome over the other territory does, and high recurrence groups give a preferable outcome over low recurrence groups do. These are useful to the improvement of feeling grouping system using insignificant EEG diverts continuously. From the outcomes, creators actualize ongoing bliss recognition framework utilizing just one sets of channels. Munla et al. [3] introduce automobile driver data set (drivedb). A few grouping methods were examined incorporating SVM-RBF portion, KNN, and RBF classifier. Stress location anticipated with a precision of 83% utilizing SVM-RBF classifier. Ghaderi et al. [4] presented the stress level of user using the biological signals based on the sensor like heart rate, respiratory, and hand and foot, etc. The various features were extracted at different time interval. For the classification KNN and SVM approach were used and obtained the exact stress level. Jung et al. [5] evaluate the mental stress and anxiety based on the multimodal sensors and measure the physical changes in the different parts in the human body. For evaluate the mental anxiety level utilized the various sensors like EEG, ECG, SPO2 , BP, and RR. For the classification of anxiety level used the SVM classifier. Crescentini et al. [6] presented study looked at the effects of the mental and physical response to an immersive virtual program environment conducted over 8 weeks of mindfulness-oriented meditation. Salai et al. [7] presents the ideas and after effects of two investigations focused at pressure location with a minimal effort of sensor like chest belt. In the gadget approval study (n = 5), they thought about pulse information and different highlights from the belt to those deliberate by a best quality level gadget to evaluate the dependability of the sensor. Zenonos et al. [8] investigate the chance of utilizing such gadgets for temperament acknowledgment, zeroing in on workplaces. They propose a novel mind-set acknowledgment structure that can distinguish five force levels for eight distinct kinds of mind-sets like clockwork. Further present a cell phone application (‘Healthy Office’), intended to encourage self-detailing in an organized way and furnish model with the ground truth. Nakashima et al. [9] stress classifies into three states relaxed, concentrated what’s more, focused and exhibit choice level combination. In Zaghouani [10], analyze the stress level of working people using the various sensors and classify the stress level of individual based on the SVM, KNN algorithms. For this research extract the SWELL-KW dataset and achieve the maximum
accuracy. In Zaghouani [11], a large-scale dataset of users with self-reported depression messages was created; various natural language processing (NLP) tools and techniques were used to uncover the linguistic patterns and sentiments expressed in these tweets, and machine learning (ML) methods were then applied to build behavior prediction tools using the annotated corpus. In Banga and Ahuja [12], the stress levels of college students were analyzed by examining various parameters of 206 students, such as exam pressure and recruitment stress; the dataset was collected from the Jaypee Institute of Information Technology, and LR, NB, RF, and SVM classifiers were used. In Farhad and Rizwan [1], a stress detection system was designed to acquire various human body signals using machine learning; MATLAB was used for simulation and implementation, the Physionet drivedb dataset was used for training and testing, and the results were reported in terms of stress-level classification accuracy using an SVM algorithm. In Chen et al. [13], a model was presented that automatically recognizes the user's stress level in daily life based on the XGBoost machine learning technique; the wearable stress and affect dataset was used for stress analysis, electrodermal activity was used for feature selection, and evaluation parameters such as accuracy and F1-score were examined. In Chaware and Makashir [14], designs were introduced to estimate user stress levels from social media data; Facebook posts were used as data, a CNN was used to process the posts, and KNN and transductive SVM were used to categorize users' posts and estimate stress levels as positive or negative.
2.2 Implementation of Stress Detection using Image and Video Processing
Chen et al. [15] introduced a hyperspectral imaging (HSI)-based technique for detecting mental stress. The strong material-discriminating capability of HSI was used in this investigation to separate and quantify the amount of blood chromophores (Hb and HbO2) using the Beer-Lambert law; because the HSI signals are acquired from captured images, this is a contact-free stress detection method. Adams et al. [16] investigated the impact of head motion, apart from facial expression, on the perception of complex emotions in videos. They show that head movements convey emotional information that is complementary rather than redundant to the emotional content of facial expressions, and that emotional expressivity in head motion is not restricted to nods and shakes: additional gestures (such as head tilts, raises, and the overall amount of motion) could be useful to automated recognition systems. Giannakakis et al. [17] built a system for the detection and analysis of stress/anxiety emotional states through video-recorded facial cues. A thorough experimental protocol was established to induce systematic variability in affective states (neutral, relaxed, and stressed/anxious) through a variety of external and internal stressors. The analysis was focused mainly on non-voluntary and
Table 1 Summary table of stress identification
Authors | Dataset used | Classifier | Classification | Accuracy
Ghaderi et al. [4] | Physionet | KNN, SVM | Low, medium, high | 98.41% (100 and 200 s), 90% (300 s)
Munla et al. [3] | Drivedb | SVM-RBF, KNN, RBF | Normal, stressed | Highest accuracy SVM-RBF 83%
Sriramprakash et al. [10] | SWELL-KW | SVM | Stressed, normal | 92.75%
Ahuja et al. [12] | 206 students of JIIT Noida | Random forest, Naïve Bayes, SVM, KNN | High stressed, stressed, normal | Highest accuracy SVM 85.71%
Fahim Rizwan et al. [1] | Physionet | SVM | Stressed, non-stressed | 98.6%
Hsieh [13] | WESAD | XGBoost | Stressed, non-stressed | 92.38%
Chaware et al. [14] | Information extraction from Facebook attributes | TSVM | Positive, negative | 84.2%
semi-voluntary facial cues in order to estimate the emotional state more objectively. Features under investigation included eye-related events, mouth activity, head motion parameters, and heart rate estimated through camera-based photoplethysmography. A feature selection procedure was used to select the most robust features, followed by classification schemes discriminating between stress/anxiety and neutral states with reference to a relaxed state in each experimental phase. In addition, a ranking-based transformation was proposed using self-reports to examine the correlation of facial parameters with the anxiety/stress/discomfort experienced by the participants. The results showed that specific cues from eye activity, facial activity, and head movements, as well as camera-based cardiac activity, achieved good accuracy and reliably indicated the difference between stress/anxiety and neutral states (Table 1).
3 Proposed Methodology
Figure 2 shows the general architecture of the stress detection system, which consists of several phases: the stress dataset, preprocessing, splitting into training and testing data, the stress detection model, and the predicted result. In this paper, SVM and RF machine learning algorithms are used for predicting the stress of users.
Stress Dataset. In this phase, data is collected through an online stress-scale questionnaire. The data is collected using a structured questionnaire for users; attributes and class labels are created in this dataset and used for stress estimation. Each tuple in the dataset contains integer values for the attributes.
Fig. 2 General architecture of machine learning-based stress level detection
Class-label values are provided by psychiatrists based on the input values given in the user survey data. In this study, a total of 270 male and female users participated, and the questionnaire contained 25 questions for stress analysis. All participants belong to the 18–50 age group. The database consists of the questions as attributes and a class label used for predicting the stress of the participants (Table 2).
Table 2 Number of participants
Male | 150
Female | 120
Table 3 Stress level
Parameter | Score (range)
No stress | 0–5
Stress | 11–15
Questionnaire for Stress
• Over-react to situations
• A lot of nervous energy
• Get agitated
• Problem in relaxing
• Oversensitive
• Narrow-minded.
The above questionnaire responses were encoded with values between 0 and 15. Each question score was estimated by adding the values associated with the question [24]: Score = Sum of ratings of each class × 2. After the score has been calculated, the label is assigned as Stress or No stress (Table 3).
Preprocessing. This phase applies various operations on the input data to remove data that is not required for stress detection. While conducting the survey, attributes such as user name, address, age, education, nature of work, and working environment were collected, and some fields have missing values. Data preprocessing removes such unnecessary data from the dataset.
Splitting data. After preprocessing, the dataset is divided into two parts: 70% is used for training and 30% for testing.
Stress Detection Model. The machine learning algorithms were implemented in Python. This model predicts the stress level of a user according to the score. In this phase, the support vector machine and random forest algorithms are implemented to detect and classify the user's mental stress; the predicted stress level is labeled Stress or No stress. The data collected from the survey participants is used for training the SVM and RF classifiers, the stress features are obtained, and the stress level is classified as Stress or No stress.
Predicted Result. In this phase, model performance is evaluated in terms of parameters such as accuracy, precision, recall, and the confusion matrix (a sketch of this pipeline is given after the algorithm below).
Algorithm: Learning the Classifiers
Input: Dataset D, learning rate, network
Output: Trained classifiers.
Step 1: Input the dataset.
Step 2: Assign random values in the range −1 to +1 as the initial weights; each input is sent to the neural network.
Step 3: Compute the weighted sum of all inputs.
Step 4: Generate the output.
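As a minimal sketch of how such a scoring and classification pipeline could look in Python with scikit-learn (the chapter does not publish its exact code), the following snippet is illustrative only: the CSV file name, column naming, and the score threshold are assumptions.

```python
# Hypothetical sketch of the questionnaire-scoring and SVM/RF training pipeline
# described above. File name, column names and threshold are illustrative only.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the questionnaire responses (one row per participant, one column per question).
data = pd.read_csv("stress_survey.csv")                         # assumed file
question_cols = [c for c in data.columns if c.startswith("Q")]  # assumed naming

# Score = sum of ratings of each class * 2; label Stress if the score falls in 11-15.
data["score"] = data[question_cols].sum(axis=1) * 2
data["label"] = (data["score"] >= 11).astype(int)   # 1 = Stress, 0 = No stress

X_train, X_test, y_train, y_test = train_test_split(
    data[question_cols], data["label"], test_size=0.30, random_state=42)

for name, model in [("SVM", SVC(kernel="rbf")),
                    ("RF", RandomForestClassifier(n_estimators=100))]:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))
```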
3.1 Stress Level
The main aim of this research is to predict the stress level of users. The target attribute is a class label in the range 0–15: a score of 0–10 is labeled No stress, and 11–15 is labeled Stress.
4 Result Evaluation
Figure 3 shows the performance of the proposed model for both machine learning algorithms. We implemented the two machine learning models, SVM and random forest, and evaluated their accuracy using the confusion matrix (Fig. 4).
Fig. 3 Comparative analysis of both classifiers in terms of accuracy
(Figure 3 is a bar chart of accuracy in % for the two classifiers: SVM 80.2%, RF 65%.)
Fig. 4 Classification accuracy in terms of precision and recall: comparative analysis
(Figure 4 is a bar chart of precision and recall in % for the RF and SVM classifiers; the plotted values are 82% and 68% for precision and 81% and 69% for recall.)
Precision = TP / (TP + FP)    (1)
Recall = TP / (TP + FN)    (2)
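The following small Python snippet illustrates Eqs. (1) and (2) computed from the entries of a binary confusion matrix; the toy labels are placeholders for illustration.

```python
# Illustration of Eqs. (1) and (2) from the entries of a binary confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # toy labels: 1 = Stress, 0 = No stress
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)   # Eq. (1)
recall = tp / (tp + fn)      # Eq. (2)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```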
Table 4 shows a comparative analysis of various existing machine learning algorithms against the proposed machine learning algorithms. The proposed approach gives better accuracy than the existing ones.
Table 4 Performance comparison for mental stress detection of existing systems with the proposed system
Authors | Techniques | Accuracy (%)
Pingle [23] | CNN | 60
Reddy et al. [18] | KNN | 73
Banga and Ahuja [12] | Naïve Bayes | 71
Proposed method | SVM / RF | 80.2 / 65
5 Discussion
To the best of our knowledge, this study uses machine learning to better understand stress in users. This paper compares four common machine learning (ML) models, all of which provided accurate models for estimating stress due to long-term stressors. A combination of various machine learning techniques, as well as deep learning techniques, can be used to find the exact stress level and also to predict disease. A convolutional neural network is a deep learning algorithm used to predict users' stress levels [14]. Various machine learning techniques such as K-nearest neighbors (KNN) [6–17] and naïve Bayes [1] are also used to predict stress levels, user anxiety, and class labels. The final class label is decided by a majority vote.
6 Conclusion
Mental stress is a big problem in day-to-day life. It is concluded that different kinds of techniques are used to identify stress, and several datasets are available that are helpful for identifying stress. The different classifiers used are shown in the summary table with their accuracies, so stress identification is feasible. In existing approaches, stress is assessed through face reading, interviews, and other activities in which people analyze each other. In this study, a system is designed for predicting a user's mental stress using a machine learning approach: the obtained data is classified and the mental stress predicted by SVM and RF classifiers. It can be observed that SVM gives better accuracy than the RF algorithm. The proposed design is helpful for predicting the mental stress of users from conversation and social media data.
References 1. Farhad R, Rizwan MF et al (2019) Design of a bio signal based stress detection system using machine learning techniques. In: International conference on robotics, electrical and signal processing techniques (ICREST), p 364 2. Pan-ngum S, Israsena P, Jatupaiboon N (2013) Real-time EEG-based happiness detection. The Sci World J:12 3. Khalil M, Shahin A, Mourad A, Munla N (2015) Driver stress level detection using HRV analysis. In: International conference on advances in biomedical engineering (ICABME) 4. Frounchi J, Farnam A, Ghaderi A (2015) Machine learning-based signal processing using physiological signals for stress detection. In: 22nd Iranian conference on biomedical engineering (ICBME 2015), Iranian research organization for science and technology, Tehran, Iran, 25–27 Nov 2015 5. Jung Y, Yoon YI (2016) Multi-level assessment model for wellness service based on human mental stress level 6. Chittaro L, Capurso V, Sioni R, Fabbro F, Crescentini C (2016) Psychological and physiological responses to stressful situations in immersive virtual reality: differences between users who practice mindfulness meditation and controls. Comput Hum Behav 7. Vassányi I, Kósa I, Salai M (2016) Stress detection using low-cost heart rate sensors. Corp J Healthc Eng 8. Khan A, Kalogridis G, Vatsikas S, Zenonos Z (2016) Healthy office: mood recognition at work using smartphones and wearable sensors. In: 2nd IEEE international workshop on sensing systems and applications using wrist worn smart devices 9. Kim J, Flutura S, Nakashima Y (2016) Stress recognition in daily work. In: International symposium on pervasive computing paradigms for mental health 10. Prasanna VD, Ramana Murthy OV, Sriramprakash S (2017) Stress detection in working people. Procedia Comput Sci 11. Zaghouani W (2018) A large-scale social media corpus for the detection of youth depression. In: 4th international conference on Arabic computational linguistics, Dubai, United Arab Emirates, 17–19 Nov 2018 12. Banga A, Ahuja R (2019) Mental stress detection in university students using machine learning algorithms. Procedia Comput Sci 13. Chen Y, Beh WK, Wu AYA, Hsieh CP (2019) Feature selection framework for xgboost based on electrodermal activity in stress detection. In: IEEE international workshop on signal processing systems (SiPS), Nanjing, China 14. Chaware SM, Makashir C (2020) Stress detection methodology based on social media network: a proposed de-sign. Int J Innov Technol Exploring Eng (IJITEE) 9(3) 15. Yuen P, Richardson M, Liu G, She Z, Chen T (2014) Detection of psychological stress using a hyperspectral imaging technique. IEEE Trans Affect Comput 5(4) 16. Mahmoud M, Baltrusaitis T, Robinson P, Adams A (2015) Decoupling facial expressions and head motions in complex emotions. Affect Comput Intell Interact (ACII) 17. Pediaditis M, Manousos D, Kazantzaki E, Chiarugi F, Simos PG, Marias K, Tsiknakis M, Giannakakis G (2017) Stress and anxiety detection using facial cues from videos. Biomed Sign Process Control:89–101 18. Reddy US, Thota AV, Dharun A (2018) Machine learning techniques for stress prediction in working employees. In: 2018 IEEE International conference on computational intelligence and computing research (ICCIC). IEEE, pp 1–4. https://doi.org/10.1109/ICCIC.2018.8782395 19. Priya A, Garg S, Tigga NP (2020) Predicting anxiety, depression and stress in modern life using machine learning algorithms. Procedia Comput Sci 167:1258–1267 20. 
Murad O, Malkawi M (2013) Artificial neuro fuzzy logic system for detecting human emotions. Hum-Centric Comput Inf Sci 21. Neerincx MA, Kraaij W, Koldijk S (2016) Detecting work stress in offices by combining unobtrusive sensors. IEEE Trans Affect Comput
22. Huang G, Weinberger KQ, Zheng AX, Xu Z (2019) Gradient boosted feature selection. Int J Eng Technol:10 23. Pingle Y (2021) Evaluation of mental stress using predictive analysis. Int J Eng Res Technol (IJERT), NTASU-2020 conference proceedings, special issue, 2021
Chapter 42
Data Distribution in Reliable and Secure Distributed Cloud Environment Using Hash-Solomon Code Abhishek M. Dhore and Nandita Tiwari
1 Introduction
Computer technology has progressed rapidly since the beginning of the twenty-first century. Cloud computing is an on-demand technology; it was first introduced in San Jose and has been widely accepted by various segments of society. Cloud storage is a major component of cloud computing technology. Cloud computing does not require customers to make an early financial commitment, so they can quickly set up small companies and build up resources only when required. While cloud computing and its underlying virtualization technology offer many advantages to consumers, such as elasticity, scalability, flexibility, near-zero maintenance overhead, and reduced costs, major worries remain regarding data stored in the cloud, such as dependability, security, and privacy [1]. Because consumers expect services whenever they want them, dependability has become one of the most pressing cloud data storage issues. Cloud service providers (CSPs) need to provide reliable cloud services, operate according to expectations, manage problems without creating downtime, and recover from errors without hurting a significant number of customers or end-users. On the other hand, cloud downtime studies have shown an average annual unavailability of 7.5 h, which corresponds to 99.9% availability [2]. This is far below the expected level of availability for key businesses, which is 99.99% on average (i.e., about 1 h of unavailability per year). To address the availability problem, leading cloud service providers have incorporated data redundancy across many physical computers at their server locations. However, recent instances of cloud outages have shown that extra redundancy is not adequate if cloud services collapse completely.
Businesses in data-sensitive sectors such as health care, banking, and finance may be severely affected by long-term cloud data service failures and security problems. Almost every leading cloud service provider (CSP) has incorporated fault-tolerance and security procedures at its server-side locations to recover the original information in the event of service failure or data corruption and to prevent cybercriminals from compromising data. Although such mechanisms are suitable for a small number of hard disc failures and attacks from external hackers, they are ineffective for end-users seeking reliability and security when major cloud utilities are down or when cloud services are attacked from the inside, for example by the CSP's own employees. To ensure high reliability and security of critical data, clients should thus not rely exclusively on a single CSP. Fog computing is an extended cloud-based computing architecture consisting of a variety of fog nodes with storage and processing capabilities. Traditional secure cloud storage solutions often depend on data encryption; although these methods relieve most of the issues, none of them can address internal attacks. In this regard, we propose a three-layer storage technique based on the fog computing paradigm and a Reed-Solomon-based Hash-Solomon code. We split the user's data into three distinct components, which are stored on the cloud server, the fog server, and the local workstation. Owing to the properties of the Hash-Solomon code, the original data cannot be retrieved from partial data, and a portion of redundant data blocks is created for decoding. In this article, we give more detail on how Hash-Solomon [3] coding can be used to ensure high dependability and security of cloud data.
2 Literature Review
Much research has been carried out on the server side using erasure codes to enhance the dependability of cloud storage systems. Huang et al. argued that erasure codes should be used in Windows Azure [4]. They suggested a new family of erasure codes, the local reconstruction code (LRC), which may minimize the number of coded fragments required for data reconstruction. This technique separates redundant data into local and global parity sets, which are subsequently kept on separate servers. Since local parities minimize I/O and network overhead during data recovery, total rebuilding costs may be significantly reduced. Gomez et al. [5] suggested a novel persistence method using erasure codes to store IaaS cloud data securely. They developed a scalable erasure-coding process that can make local storage more reliable while requiring minimal overhead and communication; the results of their tests suggest that their technology can improve the overall performance of real-world HPC applications. Khan et al. [6] offered suggestions for the use of erasure-coding technology in cloud file systems to aid load balancing and gradual data centre expansion. Their proposed approach may minimize
data loss from correlated failures and decrease the effect of a single failure on data collection or applications. Though the above-mentioned methods may enhance the reliability of cloud data in data centres, they do not support end-users in the case of a cloud outage at a service provider. In contrast to prior methods, our application applies erasure coding at the user level across a combination of multiple CSPs: the user's encoded redundant data is distributed over several cloud storage services, so stored cloud data remains recoverable even when one of the cloud services fails. There is also considerable work on securing cloud data, which is directly linked to this issue. Santos et al. [7] proposed a trusted cloud computing platform (TCCP) for providers such as Amazon EC2; the platform provides a closed-box execution environment which guarantees the confidentiality of guest virtual machines operating on a cloud infrastructure. Hwang and Li [8] suggested that data colouring and watermarking software methods may secure shared cloud data; their technique can effectively protect data objects against destruction, theft, alteration, or erasure, while users can access exactly the cloud data they need. Wang et al. [9] built an efficient and flexible distributed architecture that includes storage correctness assurance and server misbehaviour detection; this architecture protects cloud data storage servers against Byzantine failures and malicious attempts to change data using erasure-coding methods. Existing cloud data security solutions often assume that CSPs are trustworthy and able to prevent physical attacks on their systems. This may not be the case in practice, since service providers often collect users' cloud data for commercial purposes, such as targeted advertising; moreover, there have been many occasions on which cloud services endangered the vital data of customers. Consequently, a suitable option for both individuals and businesses storing sensitive data in the cloud is not to depend only on the security measures of the service providers: users should be able to apply security measures to their own data. In contrast to the above methods for data protection in the cloud, our solution relies on no server-side security measures. Instead, the cloud data storage application on the customer's side splits user data into pieces, encodes it with erasure codes, and spreads it across many CSPs. If no collusion exists amongst CSPs, customers can keep their data in the cloud securely. Further cloud storage research focuses on improving the performance of cloud storage. To guarantee fairness in shared storage systems, Shue et al. [10] developed a cloud-based approach to balancing workloads in multi-tenant systems; they also distributed workloads across virtual machines to achieve high utilization and improve server system performance. Zia and Khan [11] identified many significant cloud performance problems and described potential performance improvements in several areas, including storage services, scalability, network services, scheduling, optimal data centre placement, and fast SQL query processing. In contrast to the previous methods, our solution focuses on the performance of the network service and uses multithreading for uploading and downloading information with multiple CSPs. Experimental results show that system performance can be significantly improved with an adequate number of data parts.
3 Reliable and Secure Cloud Data Storage Framework
To address the main issues in cloud storage services, we investigated the reliable and secure distributed cloud data storage system using multiple CSPs proposed by Haiping Xu and Deepti Bhalerao [12]. Figure 1 shows the framework of such a distributed storage system. The primary part of the system is the cloud data storage application, which uses erasure coding to encode and decode sections of client files and to concurrently upload and download encoded file chunks to and from a variety of CSPs. Note that each CSP usually holds multiple portions of a file, and thus simultaneous transfers occur at two levels: across several CSPs and within a single CSP. The application first divides the file into multiple data parts, say n parts, and then encodes them using the erasure-coding method into a suitable number m of checksum (parity) parts. The parts are concurrently uploaded to multiple CSPs, marked CSP 1, CSP 2, ..., CSP N in Fig. 1. As no single CSP holds the complete user data, this method effectively protects against a data breach at any single CSP. On the other hand, if a user wants to download a file, the software tries to download the n data parts from multiple cloud providers concurrently. If all data parts are available, they can be efficiently merged into the original file without additional decoding cost. If one or more service providers fail, the software instead downloads all available data parts (say n1 of them) and checksum parts (m1). Thanks to the erasure-coding method, the application can always correctly reconstruct the missing data parts by using
Fig. 1 A framework for reliable and secure distributed cloud data storage systems
the available data and checksum parts and recover the original file. Note that the checksum parts serve as redundancy for file fault tolerance.
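As a rough, hedged illustration of splitting a file into data parts plus parity, the sketch below uses a single XOR parity block, which can recover any one missing part; the actual framework relies on Reed-Solomon/Hash-Solomon coding, which tolerates up to m missing parts. All names and the example data are illustrative, not the authors' implementation.

```python
# Conceptual sketch: split data into n parts plus one XOR parity part, then
# recover a single lost part. Real erasure codes (Reed-Solomon) generalize this
# idea to m parity parts tolerating m losses.
from functools import reduce

def split_with_parity(data: bytes, n: int):
    size = -(-len(data) // n)                      # ceiling division for chunk size
    parts = [data[i*size:(i+1)*size].ljust(size, b"\0") for i in range(n)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), parts)
    return parts, parity

def recover(parts, parity, lost_index):
    known = [p for i, p in enumerate(parts) if i != lost_index and p is not None]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), known, parity)

data = b"example user file content to be distributed across CSPs"
parts, parity = split_with_parity(data, n=4)
parts[2] = None                                    # simulate one CSP being unavailable
restored = recover(parts, parity, lost_index=2)
print(restored == split_with_parity(data, 4)[0][2])   # True: the lost part is rebuilt
```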
4 Hash-Solomon Code Based on Reed-Solomon Code
The Hash-Solomon code is based on Reed-Solomon codes, a family of block-based error-correcting codes used in a variety of digital communication and storage applications; they also correct faults in systems such as storage devices, wireless and mobile communication, satellite links, and digital television. A typical system implementation using a Reed-Solomon code is shown in Fig. 2. The Reed-Solomon encoder takes a block of digital data and adds extra redundant bits. Errors arise during transmission or storage for numerous reasons (noise, scratches on a CD, and so on). The Reed-Solomon decoder processes each block, attempts to correct errors, and recovers the original data. The number and types of errors that can be corrected depend on the characteristics of the Reed-Solomon code.
4.1 Properties of Reed-Solomon Code
Reed-Solomon codes are linear block codes and a subset of BCH codes. A Reed-Solomon code is specified as RS(n, k) with s-bit symbols. The encoder takes k data symbols of s bits each and appends parity symbols to produce an n-symbol codeword; there are n − k parity symbols of s bits each. A Reed-Solomon decoder can correct up to t symbol errors in a codeword, where 2t = n − k.
Fig. 2 System implementation using Reed-Solomon code
Because the data symbols are kept unaltered and the parity symbols are appended, the code is systematic. The Hash-Solomon code has the following properties. Together with the encoding matrix, at least k data blocks are required to recover the original data; once the number of available blocks is less than k, the data cannot be retrieved. Based on these properties, after each data block has been encoded, some blocks are stored on the higher-level server, while the lower-level server stores fewer than k of the data blocks plus the remainder of the data. The scheme thus splits the data and automatically stores certain percentages of the data blocks on the cloud server, the fog server, and the local machine. A hacker cannot recover the data from any individual server; even if an attacker is clever enough to steal data from two of the servers, it must obtain more than k blocks, and without the encoding matrix the user's information is still not revealed. If the attacker instead tries to break the encoding matrix, the values of m and k are so high that the encoding matrix cannot be cracked in theory. However, encoding alone cannot ensure the privacy of the data blocks, in particular for document files: when a document is encoded, fragments of the document's information remain in each data block. Therefore, before encoding the original data, we perform a hash transformation and store the hash information on the user's local system. Figure 3 shows the difference between the original-data transformation and the Hash-Solomon transformation: the original code is split into parts following the original sequence, whereas the hash code is split into fragments in a different sequence. The Hash-Solomon method is used here to improve privacy and prevent hackers from obtaining personal information.
Fig. 3 Original transform vs hash transform
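The following sketch illustrates only the distribution idea described above: hash-transform the data, split it into blocks, and spread the blocks over the cloud, fog, and local tiers so that no single tier holds the k blocks needed to decode. The round-robin policy, block size, and the choice of SHA-256 are assumptions for illustration; the actual Hash-Solomon encoding step is omitted.

```python
# Illustrative sketch only: hash the data (digest kept locally for integrity),
# split it into blocks, and distribute blocks so no single tier can decode alone.
import hashlib

def hash_transform(data: bytes) -> bytes:
    # Store a digest locally so integrity can be checked after reconstruction.
    return hashlib.sha256(data).digest()

def split_blocks(data: bytes, block_size: int):
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def distribute(blocks, k):
    """Give each tier fewer than k blocks (assumed policy for illustration)."""
    tiers = {"cloud": [], "fog": [], "local": []}
    order = ["cloud", "fog", "local"]
    for i, block in enumerate(blocks):
        tiers[order[i % 3]].append(block)
    for name, held in tiers.items():
        assert len(held) < k, f"{name} tier holds enough blocks to decode"
    return tiers

data = b"sensitive user document" * 10
digest = hash_transform(data)                 # kept on the local machine
blocks = split_blocks(data, block_size=32)
tiers = distribute(blocks, k=len(blocks))     # here k is taken as the full block count
print({name: len(held) for name, held in tiers.items()}, digest.hex()[:16])
```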
4.2 Efficiency Analysis
Storage efficiency is a major element of a storage algorithm's quality: systems with high storage efficiency can save more storage capacity. In the storage industry, storage efficiency is defined by the following equation:

Storage Efficiency = Data Space / (Data Space + Check Space)    (1)

Here the storage efficiency can be expressed as

E_s = k / (k + m) = (k/m) / (k/m + 1)    (2)

As the ratio of k to m rises, the coding (storage) efficiency also increases. The relation between k and m must satisfy 2^w > k + m. As w increases, RAM consumption also increases, and the efficiency term related to w can be written as

E_c = ln(k + m) / ln 2    (3)

The comprehensive efficiency of the scheme can then be expressed as

E_w = C1 · ln(k + m)/ln 2 + C2 · k/(k + m)    (4)
The parameters C1 and C2 are weights related to the storage ratio. The value of k corresponding to the peak of the comprehensive efficiency of the scheme is the most appropriate choice.
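The following small numerical sketch simply evaluates the efficiency expressions (2)–(4) for a few example (k, m) settings; the weights C1 and C2 and the chosen parameter values are illustrative assumptions, not values from the chapter.

```python
# Evaluate the efficiency expressions (2)-(4) for a few example (k, m) settings.
# The weights C1, C2 and the chosen parameter values are illustrative assumptions.
import math

def storage_efficiency(k, m):
    return k / (k + m)                       # Eq. (2)

def w_term(k, m):
    return math.log(k + m) / math.log(2)     # Eq. (3), i.e. log2(k + m)

def comprehensive_efficiency(k, m, c1=0.5, c2=0.5):
    return c1 * w_term(k, m) + c2 * storage_efficiency(k, m)   # Eq. (4)

for k, m in [(4, 2), (8, 2), (16, 4), (32, 4)]:
    print(f"k={k:2d} m={m} Es={storage_efficiency(k, m):.3f} "
          f"Ec={w_term(k, m):.3f} Ew={comprehensive_efficiency(k, m):.3f}")
```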
5 Experimental Results
The performance and practicality of the three-layer storage framework based on the fog computing model were evaluated using different operations, including encoding, decoding, and testing of various data sizes. Table 1 shows the data storage size required for different types of data when the number of blocks is changed; it illustrates the correlation between the number of blocks and the amount of data stored on the local system for various data types. Parameter m is the number of redundant data blocks, and parameter k is the number of data blocks into which the original data is divided. The experiments indicate that the quantity of information stored on the local user system decreases as the number of
Table 1 Data volume stored for different types of data
Number of data blocks | Video (MB) | Audio (MB) | Images (MB)
0 | 610 | 140 | 92
5 | 450 | 110 | 78
10 | 200 | 90 | 60
15 | 150 | 80 | 55
20 | 138 | 72 | 40
Fig. 4 Relation between data storage and number of blocks
data blocks k rises: the more blocks, the less local storage is needed. Figure 4 represents the relationship between data storage and the number of blocks. The effect varies with the kind of data used, and the benefit observed in the experiment is largest for large data volumes. Therefore, in a real system it is important to increase the value of k to reduce the user's local storage; smaller files should be merged before uploading.
6 Conclusion
Cloud computing provides many benefits, and cloud storage allows consumers to save more data. However, users cannot rely on cloud storage alone for the physical protection of their documents, data, or information. In the proposed scheme, the data is stored in three distinct layers: cloud, fog, and the local computer or main memory. We analysed a three-layer storage framework to minimise such attacks and prevent theft. We used a Hash-Solomon algorithm that splits data into three parts and performs the encoding and decoding of the data, chosen for the reliability and compatibility of our storage technology. We thus encourage customers to use this three-layer storage to enhance performance, leverage cloud and fog computing, and keep data protected both on and off the server.
References 1. Fitch D, Xu H (2013) A raid-based secure and fault-tolerant model for cloud information storage 23(5):627–654. http://dx.doi.org/https://doi.org/10.1142/S0218194013400111 2. Gagnaire M et al (2021) Downtime statistics of current cloud solutions. http://iwgcr.org/? p=404. Accessed 20 Jul 2021 3. Wicker SB, Bhargava VK (1994) Reed-Solomon codes and their applications. In: IEEE communications society and IEEE information theory society, p 322 4. Huang C et al (2012) Erasure coding in windows azure storage 5. Gomez LB, Gomez LB, Nicolae B, Maruyama N, Matsuoka S (2012) Scalable Reed-Solomonbased reliable local storage for HPC applications Iaas clouds. In: 18th international European parallel conference process, pp 313–324. http://citeseerx.ist.psu.edu/viewdoc/summary?doi= 10.1.1.397.7241. Accessed Jul 20 2021 6. Khan O, Burns R, Plank J, Pierce W, Huang C, Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads 7. Santos N, Gummadi KP, Rodrigues R, Towards trusted cloud computing 8. Hwang K, Li D (2010) Trusted cloud computing with secure resources and data colouring. IEEE Internet Comput 14(5):14–22. https://doi.org/10.1109/MIC.2010.86 9. Wang C, Wang Q, Ren K, Lou W, Ensuring data storage security in cloud computing 10. Shue D, Freedman M, Shaikh A, Performance isolation and fairness for multi-tenant cloud storage 11. Zia A, Khan MNA (2012) Identifying key challenges in performance issues in cloud computing. Int J Mod Educ Comput Sci 4(10):59–68. https://doi.org/10.5815/IJMECS.2012.10.08 12. Xu H, Bhalerao D (2015) Reliable and secure distributed cloud data storage using ReedSolomon codes. Int J Softw Eng Knowl Eng 25(9–10):1611–1632. https://doi.org/10.1142/ S0218194015400355
Chapter 43
A Novel Classification of Cancer Based on Tumor RNA-Sequence (RNA-Seq) Gene Expression Shweta Koparde
1 Introduction
Cancer is a common term for tumors with invasive, metastatic properties associated with abnormal cell growth in the body [1]. In 2018, more than 9 million cancer cases were recorded; approximately 17% of women and 20% of men develop some kind of cancer, and about 10% of women and 13% of men die from cancer [2]. According to the WHO, more than 8 million people die of cancer every year, about 13% of all deaths worldwide, making it one of the worst diseases [1, 2]. A tumor is the result of abnormal cell growth and may be benign or malignant. A benign tumor initially develops locally and does not spread far from the normal state of the tissues and organs of the body, whereas malignant cells can spread through the blood and lymphatic system to other parts of the body (so-called metastases). Most malignancies are divided into three main groups: carcinoma, sarcoma, and leukemia or lymphoma. A carcinoma is a malignant tumor whose grade ranges from 1 to 4, reflecting how far the transformed cells and tissues have departed from the appearance of the normal parent epithelial tissue [1]. Recently, research on gene expression data has become an area of interest in precision medicine, laying the foundation for making medical processes more preventive and predictive. Genomics, proteomics, transcriptomics, epigenetics, metabolomics, and other fields have produced an unprecedented amount of data [3]. In particular, the RNA-Seq sequencing method [4], which is highly effective in the clinical field and provides information about gene expression, makes diagnosis
more effective from the molecular and clinical point of view. Since cancer is a disease that causes changes in tumor tissues and produces samples showing tumor gene expression, it is possible to study the molecular factors that over time affect disease progression or patient survival. Gene expression data provides important information on the differential activation of genes and is used to study cancer progression; if this data is exploited effectively, accurate diagnostic methods can be developed, leading to better treatment and prognosis. The Cancer Genome Atlas (TCGA) is a landmark cancer genome database that includes gene expression, DNA methylation, somatic mutations, copy number alterations, microRNA expression, and protein expression. TCGA began in 2006 as a pilot project covering three cancer types, including glioblastoma. Over the following decade it grew into a large database covering more than 33 tumor types and over 11,000 cases describing the molecular changes of tumors [5–7]. These large data sets have provided great opportunities for characterizing the global aberration landscape at the RNA, DNA, and protein levels [7]. In recent decades, many machine learning (ML) approaches have been adapted to the problems of diagnosing and detecting cancer from gene expression data [8, 9]. On the one hand, prior knowledge is used to select specific genes that correspond to specific clinical outcomes, such as relapse, treatment benefit, and metastasis [10, 11]. On the other hand, various automated feature search and selection methods have been used to reduce dimensionality as a preliminary step before applying different ML models [12]. Several methods, including the elastic net, have been used with Cox models to reduce the input features [13]. Survival prediction systems have also considered decision tree techniques, which proved robust to overfitting in high-dimensional scenarios [14]. Finally, neural network (NN) approaches have been widely used to predict survival from lower-dimensional data [15], but their reliability does not appear to exceed the effectiveness of the Cox regression model [16]. Also, the black-box nature of NNs complicates the interpretation and mining of biological or clinical knowledge from the model [17, 18]. Deep learning is discussed in detail in the next section.
2 Deep Learning (DL)
DL has revolutionized the field of machine vision [19, 20], especially image recognition and object classification and detection, and it is considered a promising means of improving automatic diagnostic systems, achieving better results, covering more diseases, and making real-time medical imaging applicable [21, 22] to disease classification systems. DL is a subtype of AI based on algorithms that process data, model processes, and build abstractions. DL uses layers of algorithms for processing, analysis, and detection of patterns and hidden structure in data, understanding of human
speech, and visual object recognition [23, 24]. In DL, the output of one layer is the input of the next layer; the layers between the input and output layers are known as hidden layers. Each layer applies a relatively simple rule, differing slightly in its computation and including the activation of some function [25, 26]. DL, also known as deep neural networks (DNN), is a subfield of ML that has produced significant breakthroughs recently thanks to increased computing capacity, improved architectural models [27], and a surge in the amount of data obtained from devices. There are three main machine learning paradigms: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning algorithms require a training data set with features (inputs) and labels (outputs); common supervised learning algorithms include linear and logistic regression [28], SVM [29], naive Bayes [30], gradient boosting [31, 32], decision trees, and random forests [33, 34], and they are usually used for classification and regression. Unsupervised learning does not require known outputs and is designed to find patterns in the distribution of the data; clustering methods such as hierarchical clustering [35, 36] and k-means [37, 38] are unsupervised, and latent Dirichlet allocation (LDA) [39], PCA [40], and word2vec [41] are popular recent unsupervised approaches. Reinforcement learning [42] describes programs that search for the best policy by maximizing a reward [27]. DL networks contain many artificial neurons that mimic the cells of the human brain; each neuron has a weight that is updated by a gradient descent algorithm to reduce a global loss function [43]. By applying nonlinearity in multiple layers through activation functions such as sigmoid, tanh, and ReLU, more abstract relationships are extracted from the input data and mapped to the output, and a well-trained model can then make predictions on new, unlabeled data [44]. Since DL is a branch of ML, it builds on the general foundations of ML, such as probability, statistics, and loss functions, but it offers more flexibility by allowing multi-layer networks to be built for better results [45–50]. Recently, DL has been applied in the medical area to genetic variants [51, 52], where it shows state-of-the-art results in variant calling [53] and in protein structure prediction [54, 55]. In comparison with other methods, DL is reliable, handles both discrete and continuous data [56], does not require hand-crafted feature engineering compared to general machine learning [27], and outperforms many advanced methods [53].
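As a minimal, hypothetical illustration of such a multilayer network with ReLU and sigmoid activations trained by gradient descent (not code from any of the cited studies), a gene-expression classifier could be sketched in Keras as follows; the input dimension, layer sizes, placeholder data, and training settings are all assumptions.

```python
# Minimal sketch of a feedforward DNN for binary classification of gene
# expression vectors; layer sizes and training settings are illustrative only.
import numpy as np
import tensorflow as tf

n_genes = 2000                                        # assumed input dimension
X = np.random.rand(200, n_genes).astype("float32")    # placeholder expression data
y = np.random.randint(0, 2, size=(200,))              # placeholder tumor/normal labels

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_genes,)),
    tf.keras.layers.Dense(256, activation="relu"),    # hidden layer 1
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 2
    tf.keras.layers.Dense(1, activation="sigmoid"),   # output probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
print(model.evaluate(X, y, verbose=0))
```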
3 Related Work
This section reviews recent research on the application of deep learning (DL) and machine learning (ML) to tumor gene expression data. Scientists around the world have begun to apply machine learning and deep learning tools and have obtained remarkable results on various medical imaging problems. DL includes a variety of techniques, including multilayer NNs (MLNN), CNNs, deep autoencoders (AE), and RNNs, which are commonly used in machine vision, pattern recognition, and other
fields, and it achieves state-of-the-art results in natural language processing [57]. In high-throughput genomics, DL can learn internal representations of expression and three-dimensional structure information or decode high-level features of biological sequences [58], improving on the performance of traditional, interpretable ML methods. Thus, common data sets from recent studies have been used to study gene expression variation [17, 18, 59, 60], and some works show that AEs can be used for more complex analyses of multigene breast cancer data and for biologically meaningful problems related to the functional characterization and classification of cancer gene expression profiles [17, 18, 59, 60]. AEs have also been successfully applied to gene expression and multi-omics data [61–63]. Feedforward MLNNs have recently been used to predict clinical outcomes from multidimensional genomic data [64] or to find cancer subtypes by unsupervised learning [65]. In [66], RNA-Seq data was used to classify 33 different cancer types; five ML algorithms were compared (DT, KNN, linear SVM, polynomial SVM, and ANN), with the best result, 95.8% accuracy, achieved by the linear SVM. In [67], a model was developed to detect each type of tumor, providing a wealth of information on 33 common cancer types; a CNN was used to classify the tumor types and to discover the top genes of each tumor class in the input data, with the high-dimensional RNA-Seq data embedded into a 2D image and fed to the CNN to classify the 33 tumor types. The authors of [7] worked with 9096 tumor samples representing 31 tumor types; they used genetic algorithms with KNN to iteratively generate subsets of genes (attributes) and then verified accuracy using KNN, achieving 90% accuracy for the 31 tumor types and producing a collection of discriminative genes for all tumor types. DL methods have also been used to classify tumor gene expression and find certain types of cancer. In [68], a stacked denoising autoencoder was used to extract features from high-dimensional gene expression data; these features were then fed into a single-layer ANN to determine whether a sample is cancerous, and the results show that the extracted features are useful in detecting breast cancer. In [69], a deep learning strategy was presented for predicting cancerous tumors from RNA-Seq data. In [70], a new strategy was presented that combines several different machine learning models into a comprehensive deep learning approach; the proposed multi-model deep learning ensemble was evaluated on publicly available RNA-Seq data sets.
4 Applying DL to Cancer Prediction
To better understand and compare developments in the field, we review studies built on simple NN models consisting of 3–4 layers as well as studies built on DNNs consisting of 4 layers or more; these studies and models are reviewed and summarized below.
4.1 Extraction of Features
Medical data is characterized by high dimensionality, small sample sizes, and complex nonlinear interactions between biological components [71, 72]. Dimensionality reduction supports a comprehensive analysis of multi-omics data [73]. The studies reviewed below test various algorithms to reduce the dimensionality of sequence data, select a small number of features, and train NNs effectively. Sun et al. reduced the dimensionality of gene expression and copy number alteration data [74, 75] and then built three models from the gene expression data (Table 1); to demonstrate the effectiveness of the model, an area under the ROC curve of 0.845 was reported.
Table 1 An overview of NN models using feature extraction
Publication | Data set | Cancer type | Feature extraction methods | Architecture | Output
Hao et al. [71] | TCGA, pathway (MSigDB) | Glioblastoma multiforme | Pathway-based analysis (mRNA data reduced from 12,024 genes to 574 pathways and 4359 genes) | 4-layer NN: gene, pathway, hidden, output | Survival time
Sun et al. [75] | Gene expression profile and clinical data | Breast cancer | 400 features extracted from mRNA by mRMR and 200 from CNA | 4-layer NN | Survival time
Huang et al. [76] | mRNA, miRNA, TMB, CNB, medical data | Breast cancer | Eigengene matrix with 57 measurements of mRNA data and 12 measurements of miRNA data | Hybrid network with dimension reduction for the mRNA and miRNA inputs and 1 hidden layer; medical data, TMB and CNB inputs have no hidden layers | Survival time
Chaudhary et al. [79] | TCGA medical data, miRNA, methylation data | Liver cancer | Autoencoder extracting 100 features from mRNA, miRNA, and methylation data | 3-layer NN | Feature reduction
Shimizu and Nakayama [80] | METABRIC | Breast cancer | Statistical methods selecting 23 genes | 3-layer NN | Survival time
However, there are various ways of reducing the data dimensionality. Huang et al. collected multi-omics data, including gene expression (mRNA), miRNA, copy number burden (CNB), tumor mutation burden (TMB), and clinical data, learned features from them, and implemented a DL model to forecast survival in patients with breast cancer [76]. They used the Cox proportional hazards model for the survival analysis [76]. The input features were extracted from the mRNA and miRNA data using the local maximal quasi-clique merger (lmQCM) algorithm [77]; the lmQCM algorithm produced an eigengene matrix, giving 57 mRNA and 12 miRNA features (Table 1). The corresponding hidden layers contain 8 and 4 neurons, respectively (Table 1). The sigmoid function provides the nonlinearity at each stage and was used as the activation function, and Cox proportional hazards regression was applied to predict survival time [78]; the model was reported to outperform other models (Table 1). To reduce the dimensionality of the data, algorithms can also perform feature extraction by applying domain knowledge as a selection criterion: a prognostic model was constructed using gene expression data (about 12 k genes) from 475 patients with glioblastoma multiforme, including survival information [71] (Table 1). NNs can also learn informative features themselves and can be used to process multi-omics objects. The most common cancer of the liver is hepatocellular carcinoma (HCC), whose high heterogeneity makes the disease difficult to predict. Chaudhary et al. [79] built a NN model using multi-omics data of 360 HCC patients from the TCGA database; the multi-omics data include mRNA, CpG methylation, and medical data. An unsupervised autoencoder NN was used to transform the features, reduce the dimensionality [79], and obtain the top 100 features (Table 1). They then identified 37 key features using the Cox-PH model, used k-means clustering to determine survival risk groups, and obtained a feature ranking using ANOVA; finally, they built a predictive model using an SVM. In [80], to find statistically significant predictors of survival in breast cancer, 23 genes were selected from 184 genes associated with breast cancer [80] (Table 1).
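The following is a minimal sketch of how an autoencoder can compress gene expression profiles into a small number of features for downstream survival models, in the spirit of the studies above; the dimensions, placeholder data, and hyperparameters are illustrative assumptions, not the cited authors' configurations.

```python
# Minimal sketch of an autoencoder compressing gene expression profiles to a
# low-dimensional feature vector for downstream models; all sizes and
# hyperparameters are illustrative, not taken from the cited studies.
import numpy as np
import tensorflow as tf

n_genes, n_features = 5000, 100
X = np.random.rand(300, n_genes).astype("float32")   # placeholder expression matrix

inputs = tf.keras.Input(shape=(n_genes,))
encoded = tf.keras.layers.Dense(500, activation="relu")(inputs)
bottleneck = tf.keras.layers.Dense(n_features, activation="relu")(encoded)
decoded = tf.keras.layers.Dense(500, activation="relu")(bottleneck)
outputs = tf.keras.layers.Dense(n_genes, activation="sigmoid")(decoded)

autoencoder = tf.keras.Model(inputs, outputs)
encoder = tf.keras.Model(inputs, bottleneck)          # later used as feature extractor
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

features = encoder.predict(X, verbose=0)              # 100 features per sample
print(features.shape)                                 # (300, 100)
```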
4.2 Models Built Using CNN
The DL approach has recently achieved remarkable success since modern networks were built using CNNs [45–48] and RNNs [50, 81]: many advances have been demonstrated in image classification with CNNs and in NLP and sequence-data research with RNNs. Tumor-related applications include identification of skin cancer [82, 83], classification of histopathological slides [84], identification of patients with Alzheimer's disease from brain imaging, morphological classification of tumor nuclei versus normal cells [85], prediction of hospital re-admission from electronic health record (EHR) data [86, 87], mortality [88], and clinical outcomes [89]. However, verifying MGMT gene promoter status in the brain is complex and invasive. A pre-trained 50-layer CNN (ResNet50) model [90] was used with transfer learning on MRI images to predict MGMT promoter methylation, and a maximum accuracy of 95% was achieved [91] (Table 2). Another team of authors used brain MRI images from
Table 2 Overview of CNN models
Publication | Data set | Cancer type | Architecture | Output
Kather et al. [84] | H&E tissue slides | Colorectal cancer | Base models: VGG19, ResNet50, AlexNet, SqueezeNet | Classify the 9 tissue types
Korfiatis et al. [91] | MRI images | Glioblastoma multiforme | Base models: ResNet18, 34 and 50 | 3 classes: methylated, unmethylated, or no tumor
Mobadersany et al. [93] | H&E images, medical data | Low-grade glioma and glioblastoma | VGG19 base model with Cox information as output; 2 models constructed using genomics data | Survival
Courtiol et al. [94] | H&E slides | Mesothelioma | Each slide divided into up to 10,000 tiles, each tile in 3 classes (epithelioid, sarcomatoid, or biphasic); ResNet50 for feature extraction | Survival
Wang et al. [95] | CT scan images | High-grade serous ovarian cancer | 5 CNN layers | A 16-dimensional feature vector
many age groups of patients to implement a combined convolutional-recurrent NN model to predict MGMT promoter methylation status [92]. An RNN layer was added so that the model could capture sequential information from the MRI slices [92], but it proved unnecessary, since the model performed similarly with or without the RNN layer. In this study, different methods for reducing overfitting, such as L2 regularization, dropout, and data augmentation, were demonstrated (Table 2). Mobadersany et al. [93] trained a survival CNN using medical data, genomic markers, and histological images of gliomas and glioblastomas, and reported that the prognostic power of the NN exceeded conventional prognostic accuracy [93]. H&E-stained tissue sections from 1061 patients (769 samples) were annotated with regions of interest (ROI) containing viable tumor cells using a web-based tissue imaging platform, and CNNs with a Cox proportional hazards output layer were trained to predict patient outcomes (Table 2). In [84], 100,000 image patches from 86 CRC H&E slides were labeled and classified into 9 tissue classes. Using these images as training data, together with a test set of 7180 images from 25 patients, they built CNN models based on VGG19 and ResNet50 and achieved 94–99% classification accuracy (Table 2). In [94], the MesoNet model was constructed using histological tissue from 2300 H&E slides.
After transfer learning with ResNet50, they extracted a 2048-dimensional feature vector for each tile to train MesoNet, which gave better results than models based on histological classification (Table 2) [94]. The same kind of model can be used to extract features and build ML models to predict cancer prognosis. In [95], image features were extracted from CT images and CNN models were trained to build a Cox-PH survival prediction model; the study involved 102 patients with HGSOC in a feature-learning cohort with two-year follow-up (Table 2), and 8917 tumor images were used by the CNN models to extract 16-dimensional feature vectors. These examples of NNs, and of CNNs in particular, suggest that their use in research will continue to benefit from further study.
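As a minimal sketch of the transfer-learning feature extraction used by several of the models above, a pretrained ResNet50 without its classification head turns each image tile into a 2048-dimensional vector. The tile source, input size, and batch of random placeholder images are assumptions for illustration.

```python
# Sketch of extracting 2048-dimensional features from image tiles with a
# pretrained ResNet50 (ImageNet weights); tile source and size are illustrative.
import numpy as np
import tensorflow as tf

extractor = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg")   # outputs 2048-d vectors

tiles = np.random.randint(0, 255, size=(8, 224, 224, 3)).astype("float32")  # placeholder tiles
tiles = tf.keras.applications.resnet50.preprocess_input(tiles)

features = extractor.predict(tiles, verbose=0)
print(features.shape)   # (8, 2048): one feature vector per tile
```

These vectors can then be fed to a downstream classifier or survival model, as in the MesoNet and Cox-PH examples above.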
5 Result and Discussion
In this section, the performance comparison (accuracy and training time) and the validation accuracy obtained during the training phase are presented. Figure 1 shows the progress of validation accuracy during training: after each epoch the validation accuracy is calculated and shown as black circles, while the blue lines show the training accuracy. Increasing validation accuracy indicates that the training process is improving. Figure 1 thus shows the progress of validation accuracy during training, while Fig. 2 shows the classification performance comparison. The performance of a linear support vector machine (LSVM), random forests (RF), the DCNN, and the DCNN combined with an SVM (Vapnik 1995) was evaluated. To measure classification performance, we used Python with TensorFlow and the scikit-learn library; the random forest of C4.5-style decision trees uses scikit-learn, and the SVM uses a one-vs-one scheme, which is effective for multiple classes. Figure 2 shows the results for different combinations of the DCNN with SVM, LSVM, and RF, as well as LSVM, RF, and DCNN alone. The graph shows that DCNN-SVM has
Fig. 1 Accuracy verification during the training process
Fig. 2 Classification performance comparison
the highest accuracy, followed by DCNN-LSVM. On this data set, the combination of DCNN and SVM makes DCNN-SVM more accurate than the other models, giving the highest classifier performance (76.33%) thanks to the effectiveness of the RBF kernel. The combination of DCNN and SVM was thus found to be the best.
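The sketch below is an illustrative outline of such a DCNN + SVM (RBF) hybrid: a small CNN is trained on gene expression vectors reshaped into 2D "images", and its penultimate layer is then used as a feature extractor for an RBF-kernel SVM. All shapes, placeholder data, and hyperparameters are assumptions, not the chapter's actual configuration.

```python
# Illustrative sketch of the DCNN + SVM (RBF) hybrid: a small CNN is trained on
# gene expression vectors reshaped into 2D "images", then its penultimate layer
# is used as a feature extractor for an RBF-kernel SVM.
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

side = 40                                             # 40x40 = 1600 assumed genes
X = np.random.rand(300, side, side, 1).astype("float32")
y = np.random.randint(0, 5, size=(300,))              # assumed 5 tumor classes

cnn = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(side, side, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(name="features"),
    tf.keras.layers.Dense(5, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
cnn.fit(X_tr, y_tr, epochs=3, verbose=0)

extractor = tf.keras.Model(cnn.input, cnn.get_layer("features").output)
svm = SVC(kernel="rbf").fit(extractor.predict(X_tr, verbose=0), y_tr)
pred = svm.predict(extractor.predict(X_te, verbose=0))
print("DCNN-SVM accuracy:", accuracy_score(y_te, pred))
```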
6 Conclusion
Deep learning (DL) has generated great successes that are changing our daily lives, and many researchers apply large-scale and deep learning methods to achieve breakthroughs in the field of medicine. In this paper, we reviewed recent deep learning approaches to the classification of cancer based on tumor RNA-Sequence (RNA-Seq) gene expression data. One of the advantages of deep learning models is the ability to keep improving as larger amounts of data become available. Also, because health data comes in different formats, such as gene expression data, clinical (structured) data, text, and image (unstructured) data, it is becoming more popular and useful to use a variety of architectures to solve different problems with such data. This paper summarizes recent studies that have applied DL to cancer prognosis. We also study a hybrid model of a DCNN and a nonlinear SVM to classify RNA-Seq gene expression data. In these studies, many DL models showed similar or better performance than other ML models.
References
1. Zhang YH (2017) Identifying and analyzing different cancer subtypes using RNA-seq data of blood platelets. Oncotarget 8(50):87494–87511 2. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A (2018) Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68(6):394–424 3. Schuster SC (2007) Next-generation sequencing transforms today’s biology. Nat Methods 5(1):16. https://doi.org/10.1038/nmeth1156 PMID: 18165802
4. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10(1):57. https://doi.org/10.1038/nrg2484 PMID: 19015660 5. Hutter C, Zenklusen JC (2018) The cancer genome atlas: creating lasting value beyond its data. Cell 173(2):283–285 6. Sanchez-Vega F (2018) Oncogenic signaling pathways in the cancer genome atlas. Cell 173(2):321.e10–337.e10 7. Li Y (2017) A comprehensive genomic pan-cancer classification using the cancer genome atlas gene expression data. BMC Genomics 18(1):508 8. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI (2015) Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 13:8–17. https://doi.org/10.1016/j.csbj.2014.11.005 PMID: 25750696 9. Bashiri A, Ghazisaeedi M, Safdari R, Shahmoradi L, Ehtesham H (2017) Improving the prediction of survival in cancer patients by using machine learning techniques: experience of gene expression data: a narrative review. Iran J Public Health 46(2):165–172 PMID: 28451550 10. Gao S, Tibiche C, Zou J, Zaman N, Trifiro M, O’Connor-McCourt M et al (2016) Identification and construction of combinatory cancer hallmark-based gene signature sets to predict recurrence and chemotherapy benefit in Stage II colorectal cancer. JAMA Oncol 2(1):37–45. https://doi.org/10.1001/jamaoncol.2015.3413 PMID: 26502222 11. Li J, Lenferink AEG, Deng Y, Collins C, Cui Q, Purisima EO et al (2010) Identification of high-quality cancer prognostic markers and metastasis network modules. Nat Commun 1(34) 12. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection techniques in bioinformatics. Bioinformatics 23(19):2507–2517. https://doi.org/10.1093/bioinformatics/btm344 PMID: 17720704 13. Simon N, Friedman J, Hastie T, Tibshirani R (2011) Regularization paths for Cox’s proportional hazards model via coordinate descent. J Stat Softw 39(5):1–13. https://doi.org/10.18637/jss.v039.i05 PMID: 27065756 14. Ishwaran H, Gerds TA, Kogalur UB, Moore RD, Gange SJ, Lau BM (2014) Random survival forests for competing risks. Biostatistics 15(4):757–773. https://doi.org/10.1093/biostatistics/kxu010 PMID: 24728979 15. Baesens B, Van Gestel T, Stepanova M, Van den Poel D, Vanthienen J (2005) Neural network survival analysis for personal loan data. J Oper Res Soc 56(9):1089–1098. https://doi.org/10.1057/palgrave.jors.2601990 16. Xiang A, Lapuerta P, Ryutov A, Buckley J, Azen S (2000) Comparison of the performance of neural network methods and Cox regression for censored survival data. Comput Stat Data Anal 34(2):243–257. https://doi.org/10.1016/S0167-9473(99)00098-5 17. Xiao Y, Wu J, Lin Z, Zhao X (2018) A semi-supervised deep learning method based on stacked sparse autoencoder for cancer prediction using RNA-seq data. Comput Meth Programs Biomed 166:99–105. https://doi.org/10.1016/j.cmpb.2018.10.004 18. Danaee P, Ghaeini R, Hendrix DA (2017) A deep learning approach for cancer detection and relevant gene identification. Pac Symp Biocomput 22:219–229. https://doi.org/10.1142/9789813207813_0022 PMID: 27896977 19. Rong D, Xie L, Ying Y (2019) Computer vision detection of foreign objects in walnuts using deep learning. Comput Electron Agricult 162:1001–1010 20. Maitre J, Bouchard K, Badard LP (2019) Mineral grains recognition using computer vision and machine learning. Comput Geosci 130:84–93 21. Lundervold AS, Lundervold A (2019) An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik 29(2):102–127 22.
Liu S, Wang Y, Yang X, Lei B, Liu L, Li SX, Ni D, Wang T (2019) Deep learning in medical ultrasound analysis: a review. Engineering 5(2):261–275 23. Riordon J, Sovilj D, Sanner S, Sinton D, Young EWK (2019) Deep learning with microfluidics for biotechnology. Trends Biotechnol 37(3):310–324 24. Jaganathan K, Kyriazopoulou Panagiotopoulou S, Mcrae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB, Chow ED, Kanterakis E, Gao H, Kia A, Batzoglou S, Sanders SJ, Farh KK-H (2019) Predicting splicing from primary sequence with deep learning. Cell 176(3):535.e24–548.e24
25. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444 26. Cao C, Liu F, Tan H, Song D, Shu W, Li W, Zhou Y, Bo X, Xie Z (2018) Deep learning and its applications in biomedicine. Genomics, Proteomics Bioinf 16(1):17–32 27. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J (2019) A guide to deep learning in healthcare. Nat Med 25:24–29 28. Cramer JS (2003) The origins of logistic regression. Soc Sci Electron Publ. https://doi.org/10.2139/ssrn.360300 29. Boser BE, Guyon IM, Vapnik VN (2008) A training algorithm for optimal margin classifiers. Proc Fifth Annu Workshop Comput Learn Theory 5:144–152 30. Maron ME (1961) Automatic indexing: an experimental inquiry. J ACM 8:404–417 31. Breiman L, Friedman JH, Olshen RA (2017) Classification and regression trees. Routledge, New York, NY, USA 32. Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38:367–378 33. Breiman L (1997) Arcing the edge. Technical Report; Statistics Department, University of California: Berkeley, CA, USA 34. Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106 35. Sibson R (1973) SLINK: An optimally efficient algorithm for the single-link cluster method. Comput J 16:30–34 36. Defays D (1977) An efficient algorithm for a complete link method. Comput J 20:364–366 37. Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28:129–137 38. MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Los Angeles, CA, USA, pp 281–297 39. Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022 40. Pearson K (1901) Principal components analysis. Lond Edinb Dublin Philos Mag J Sci 6:559 41. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. Adv Neural Inf Process Syst 26:3111–3119 42. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. MIT Press: Cambridge, MA, USA 43. Hinton GE (1991) Learning distributed representations of concepts. In Proceedings of the eighth annual conference of the cognitive science society; Hillsdale, NJ, USA, 1991. p 12 44. Bengio Y (2009) Learning deep architectures for AI. Found Trends® Mach Learn 2:1–127 45. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In Proceedings of the 2015 IEEE conference on computer vision and pattern recognition (CVPR), Boston, MA, USA, 7–12 June 2015 46. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 47. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the 2016 IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, pp 770–778 48. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE conference on computer vision and pattern recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017, pp 1251–1258 49. Jordan M (1986) Serial order: a parallel distributed processing approach. Technical Report; California University: San Diego, CA, USA 50. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9:1735–1780 51.
Quang D, Chen Y, Xie X (2015) DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31:761–763 52. Farahbakhsh-Farsi P, Djalali M, Koohdani F, Saboor-Yaraghi AA, Eshraghian MR, Javanbakht MH, Chamari M, Djazayery A (2014) Effect of omega-3 supplementation versus placebo on acylation stimulating protein receptor gene expression in type 2 diabetics. J Diabetes Metab Disord 13:1. https://doi.org/10.1186/2251-6581-13-1 53. Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, Peng L, Webster DR (2018) Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng 2:158. https://doi.org/10.1038/s41551-018-0195-0
54. Wang Y, Yao H, Zhao S (2016) Auto-encoder based dimensionality reduction. Neurocomputing 184:232–242 55. AlQuraishi M (2019) AlphaFold at CASP13. Bioinformatics 35:4862–4865 56. Biganzoli E, Boracchi P, Mariani L, Marubini E (1998) Feed forward neural networks for the analysis of censored survival data: a partial logistic regression approach. Stat Med 17:1169–1186 57. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539 PMID: 26017442 58. Miotto R, Wang F, Wang S, Jiang X, Dudley JT (2018) Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 19(6):1236–1246. https://doi.org/10.1093/bib/bbx044 PMID: 28481991 59. Way GP, Greene CS (2018) Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. Pac Symp Biocomput 23:80–91 PMID: 29218871 60. Sevakula RK, Singh V, Verma NK, Kumar C, Cui Y (2018) Transfer learning for molecular cancer classification using deep neural networks. IEEE/ACM Trans Comput Biol Bioinform. https://doi.org/10.1109/TCBB.2018.2822803. PMID: 29993662 61. Chen HIH, Chiu YC, Zhang T, Zhang S, Huang Y, Chen Y (2018) GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization. BMC Syst Biol 12(142) 62. Zhang L, Lv C, Jin Y, Cheng G, Fu Y, Yuan D et al (2018) Deep learning-based multi-omics data integration reveals two prognostic subtypes in high-risk neuroblastoma. Front Genet 9:477. https://doi.org/10.3389/fgene.2018.00477. PMID: 30405689 63. López-García G, Jerez JM, Franco L, Veredas FJ (2019) A transfer-learning approach to feature extraction from cancer transcriptomes with deep autoencoders. In: Rojas I, Joya G, Catala A (eds) Advances in computational intelligence. Springer International Publishing, Cham, pp 912–924 64. Yousefi S, Amrollahi F, Amgad M, Dong C, Lewis JE, Song C et al (2017) Predicting clinical outcomes from large scale cancer genomic profiles with deep survival models. Sci Rep 7(1):11707. https://doi.org/10.1038/s41598-017-11817-6 PMID: 28916782 65. Chen R, Yang L, Goodison S, Sun Y (2019) Deep learning approach to identifying cancer subtypes using high-dimensional genomic data. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz769 66. Hsu Y-H, Si D (2018) Cancer type prediction and classification based on RNA-sequencing data. In: Proceedings 40th annual international conference of the IEEE engineering in medicine and biology society (EMBC), Jul. 2018, pp 5374–5377 67. Lyu B, Haque A (2018) Deep learning based tumor type classification using gene expression data. In: Proceedings ACM international conference on bioinformatics, computational biology, and health informatics (BCB), 2018, pp 89–96 68. Danaee P, Ghaeini R, Hendrix DA (2016) A deep learning approach for cancer detection and relevant gene identification. In: Proc Pacific Symp Biocomputing 22:219–229 69. Xiao Y, Wu J, Lin Z, Zhao X (2018) A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data. Comput Methods Programs Biomed 166:99–105 70. Xiao Y, Wu J, Lin Z, Zhao X (2018) A deep learning-based multi-model ensemble method for cancer prediction. Comput Methods Programs Biomed 153:1–9 71. Hao J, Kim Y, Kim T-K, Kang M (2018) PASNet: Pathway-associated sparse deep neural network for prognosis prediction from high-throughput data. BMC Bioinform 19:510. https://doi.org/10.1186/s12859-018-2500-z 72.
Ma T, Zhang A (2018) Multi-view factorization AutoEncoder with network constraints for multi-omic integrative analysis. In Proceedings of the 2018 IEEE international conference on bioinformatics and biomedicine (BIBM), Madrid, Spain, 3–6 Dec 2018 73. Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC (2016) Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform 17:628– 641
74. Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27:1226–1238 75. Sun D, Wang M, Li A (2019) A multimodal deep neural network for human breast cancer prognosis prediction by integrating multi-dimensional data. IEEE/ACM Trans Comput Biol Bioinform 16:841–850 76. Huang Z, Zhan X, Xiang S, Johnson TS, Helm B, Yu CY, Zhang J, Salama P, Rizkalla M, Han Z (2019) SALMON: survival analysis learning with multi-omics neural networks on breast cancer. Front Genet 10:166. https://doi.org/10.3389/fgene.2019.00166 77. Zhang J, Huang K (2014) Normalized lmQCM: an algorithm for detecting weak quasi-cliques in weighted graph with applications in gene co-expression module discovery in cancers. Cancer Inform 13, CIN.S14021 78. Steck H, Krishnapuram B, Dehing-oberije C, Lambin P, Raykar VC (2008) On ranking in survival analysis: bounds on the concordance index. In: Proceedings of the advances in neural information processing systems; Malvern, PA, USA, pp 1209–1216 79. Chaudhary K, Poirion OB, Lu L, Garmire LX (2017) Deep learning-based multi-omics integration robustly predicts survival in liver cancer. Clin Cancer Res 24. https://doi.org/10.4137/CIN.S14021 80. Shimizu H, Nakayama KI (2019) A 23-gene-based molecular prognostic score precisely predicts overall survival of breast cancer patients. EBioMedicine 46:150–159 81. Jordan M (1986) Serial order: a parallel distributed processing approach. Technical Report; California University, San Diego, CA, USA 82. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118 83. Levine AB, Schlosser C, Grewal J, Coope R, Jones SJM, Yip S (2019) Rise of the machines: advances in deep learning for cancer diagnosis. Trends Cancer 5:157–169 84. Kather JN, Krisam J, Charoentong P, Luedde T, Herpel E, Weis C-A, Gaiser T, Marx A, Valous NA, Ferber D (2019) Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med 16. https://doi.org/10.1371/journal.pmed.1002730 85. Radhakrishnan A, Damodaran K, Soylemezoglu AC, Uhler C, Shivashankar GV (2017) Machine learning for nuclear mechano-morphometric biomarkers in cancer diagnosis. Sci Rep 7. https://doi.org/10.1038/s41598-017-17858-1 86. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, Liu PJ, Liu X, Marcus J, Sun M (2018) Scalable and accurate deep learning with electronic health records. NPJ Digit Med 1:18. https://doi.org/10.1038/s41746-018-0029-1 87. Shameer K, Johnson KW, Yahi A, Miotto R, Li L, Ricks D, Jebakaran J, Kovatch P, Sengupta PP, Gelijns S (2017) Predictive modeling of hospital readmission rates using electronic medical record-wide machine learning: a case-study using Mount Sinai heart failure cohort. Pac Symp Biocomput 22:276–287 88. Elfiky AA, Pany MJ, Parikh RB, Obermeyer Z (2018) Development and application of a machine learning approach to assess short-term mortality risk among patients with cancer starting chemotherapy. JAMA Netw Open 1. https://doi.org/10.1001/jamanetworkopen.2018.0926 89. Mathotaarachchi S, Pascoal TA, Shin M, Benedet AL, Rosa-Neto P (2017) Identifying incipient dementia individuals using machine learning and amyloid imaging. Neurobiol Aging 59:80. https://doi.org/10.1016/j.neurobiolaging.2017.06.027 90.
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proceedings of the 2015 IEEE international conference on computer vision (ICCV), Santiago, Chile, 7–13 Dec 2015, pp 1026–1034 91. Korfiatis P, Kline TL, Lachance DH, Parney IF, Buckner JC, Erickson BJ (2017) Residual deep convolutional neural network predicts MGMT methylation status. J Digit Imaging 30:622–628 92. Han L, Kamdar M (2018) MRI to MGMT: predicting drug efficacy for glioblastoma patients. Pac Symp Biocomput 23:331–338
93. Mobadersany P, Yousefi S, Amgad M, Gutman DA, Barnholtz-Sloan JS, Velázquez Vega JE, Brat DJ, Cooper LAD (2018) Predicting cancer outcomes from histology and genomics using convolutional networks. Proc Natl Acad Sci USA. https://doi.org/10.1073/pnas.1717139115 94. Courtiol P, Maussion C, Moarii M, Pronier E, Pilcer S, Sefta M, Manceron P, Toldo S, Zaslavskiy M, Le Stang N (2019) Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat Med 25:1519–1525 95. Wang S, Liu Z, Rong Y, Zhou B, Bai Y, Wei W, Wang M, Guo Y, Tian J (2019) Deep learning provides a new computed tomography-based prognostic biomarker for recurrence prediction in high-grade serous ovarian cancer. Radiother Oncol 132:171–177
Author Index
A Abhay Vidyarthi, 235 Abhinav Saxena, 351 Abhishek Gupta, 373 Abhishek M. Dhore, 537 Abhishek Nagvekar, 485 Aditi Kandoi, 485 Akshada Kene, 525 Aman Jatain, 1, 501 Ameya Kale, 423 Anisa Fathima Mohammad, 281 Anitha Arumalla, 281 Ankita Vaish, 199 Anshul Kumar, 245 Anugunj Naman, 473 Anuj Kinge, 153 Anuj Purandare, 485 Anukriti, 9 Aravind Chakravarti, 31 Archana Kumar, 297 Ashutosh Soni, 165 Aswath, S., 129 Ayush Mittal, 351
B Baijnath Kaushik, 373 Bhatt, C. M., 361 Bholanath Roy, 207 Binu, L. S., 297
C Chagantipati Akarsh, 385 Chellammal Surianarayanan, 89 Chetan Sinha, 473 Chitrakala, S., 361
D Dasari Anantha Reddy, 341 David Solomon George, 297 Davinder Paul Singh, 373 Deborah T. Joy, 1 Dhanwin, T., 259 Dinesh Kumar, 165 Diptakshi Sen, 397 Divya, S., 361
G Garigipati Rama Krishna, 341 Girish Mishra, 245, 397 Gopinath Ganapathy, 89
H Harish Kumar Shakya, 463 Harshini, V., 259
I Imtiyaz Ahmad, 223 Ishan Jawade, 423 Ishan Prakash, 245
J Jerry Allan Akshay, 329 Jyoti Bharti, 207
K Kalaiyarivu Cholan, M., 129 Karnewar, J. S., 437 Karpagam, C., 179 Kaviya, P., 361 Khan, M. R., 407 Krishna Murthy, S. V. S. S. N. V. G., 245 Kumar Gaurav, 207
L Lakshmi Narasimhan, V., 309 Litty Koshy, 75
M Mahendra Mehra, 485 Mohanaruba, R., 361 Mphale, Ofaletse, 309
N Nagendra Pratap Singh, 515 Nandita Tiwari, 537 Nayeemulla Khan, A., 259 Neha Shivhare, 407 Nidhi Singh, 117 Nilima Kulkarni, 145, 153, 423 Nitish Gupta, 165
P Pal, S. K., 245 Parinita Bora, 31 Pethuru Raj Chelliah, 89 Pranali Yenkar, 65 Pratik Kakade, 423 Prem Kumari Verma, 515 Priyanka Jha, 153 Pulin Prabhu, 485
R Raghavendra V. Kulkarni, 31, 45 Raghav Verma, 351 Rahul Barman, 145 Rajasekhar Nannapaneni, 31, 45 Rajeev Kudari, 341 Rajendar Singh Shekhawat, 117 Renji George Amballoor, 273 Reshma Rajan, 397
Rishika Singh, 501 Rupam Kumar Roy, 397 Rushikesh Jadhav, 423
S Sagar Shrivastava, 501 Sagi Harshad Varma, 385 Sakshi Arora, 451 Saumya Patel, 199 Saurabh Sharma, 463 Sawarkar, S. D., 65, 103 Sayali Badade, 145 S. Gomathi a Rohini, 179 Shahina, A., 259 Shalabh Bhatnagar, 45 Shalini B. Bajaj, 1 Shalini Bhaskar Bajaj, 501 Shandilya, V. K., 437 Shankar B. Naik, 273 Shanti Rathod, 407 Sharmila Joseph, J., 235 Sharvari Deshpande, 145 Shilpa Sangappa, 31 Shiva Pundir, 329 Shruti Agarwal, 145 Shubhada Thakare, 525 Shweta Koparde, 547 Shwetha Mary Jacob, 75 Sriram, N. A., 259 Suhail Gulzar, 451 Suneeta Agarwal, 223 Sunil Ghane, 485 Sushree Prangyanidhi, 1 Swati V. Narwane, 103
T Tejas Khangal, 153 Tulasi Pendyala, 281
V Vandana Niranjan, 9 Vibhav Prakash Singh, 223, 235 Vishaq, J., 259
Y Yash Oswal, 153 Yogesh Kumar Gond, 165