English Pages 653 [633] Year 2023
Smart Innovation, Systems and Technologies 371
Vikrant Bhateja · Fiona Carroll · João Manuel R. S. Tavares · Sandeep Singh Sengar · Peter Peer Editors
Intelligent Data Engineering and Analytics Proceedings of the 11th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA 2023)
Smart Innovation, Systems and Technologies Volume 371
Series Editors Robert J. Howlett, KES International Research, Shoreham-by-Sea, UK Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought.
The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions.
High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles.
Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.
Editors
Vikrant Bhateja, Department of Electronics Engineering, Faculty of Engineering and Technology (UNSIET), Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, India
Fiona Carroll, Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, Warwickshire, UK
João Manuel R. S. Tavares, Faculty of Engineering, University of Porto, Porto, Portugal
Sandeep Singh Sengar, Cardiff Metropolitan University, Cardiff, Warwickshire, UK
Peter Peer, Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-99-6705-6 ISBN 978-981-99-6706-3 (eBook) https://doi.org/10.1007/978-981-99-6706-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore Paper in this product is recyclable.
Preface
This book is a collection of high-quality, peer-reviewed research papers selected from those presented at the 11th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA-2023), held at Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff (Wales), UK, during April 11–12, 2023. The idea of this conference series was conceived by a few eminent professors and researchers from premier institutions of India. The first three editions of this conference, FICTA-2012, 2013 and 2014, were organized by Bhubaneswar Engineering College (BEC), Bhubaneswar, Odisha, India. The fourth edition, FICTA-2015, was held at NIT, Durgapur, West Bengal, India. The fifth and sixth editions, FICTA-2016 and FICTA-2017, were consecutively organized by KIIT University, Bhubaneswar, Odisha, India. FICTA-2018 was hosted by Duy Tan University, Da Nang City, Vietnam. The eighth edition, FICTA-2020, was held at NIT, Karnataka, Surathkal, India. The ninth and tenth editions, FICTA-2021 and FICTA-2022, were held at NIT, Mizoram, Aizawl, India. All past editions of the FICTA conference proceedings are published by Springer.
The FICTA conference series aims to bring together researchers, scientists, engineers and practitioners to exchange and share their theories, methodologies, new ideas, experiences and applications in all areas of intelligent computing theories and their applications to various engineering disciplines such as computer science, electronics, electrical, mechanical and biomedical engineering. FICTA-2023 received a good number of submissions from different areas relating to computational intelligence, intelligent data engineering, data analytics, decision sciences and associated applications in the arena of intelligent computing. These papers have undergone a rigorous peer-review process with the help of our technical program committee members (from the country as well as abroad).
The review process was rigorous, with a minimum of two reviews per paper and, in many cases, three to five reviews, along with due checks on similarity and content overlap. This conference witnessed a huge number of submissions, including the main track as well as special sessions. The conference featured many special sessions on various cutting-edge technologies of specialized focus, which were organized and chaired by eminent professors. The total pool of papers included submissions received from across the country along with many overseas countries. Out of this pool, only 109 papers were accepted and divided into two volumes for publication in the proceedings. This volume consists of 54 papers from diverse areas of intelligent data engineering and analytics.
The conference featured many distinguished keynote addresses in different spheres of intelligent computing by eminent speakers. Dr. Frank Langbein, Cardiff University, Cathays, Cardiff, Wales, UK, spoke on “Control and Machine Learning for Magnetic Resonance Spectroscopy”; Mr. Aninda Bose, Executive Editor, Springer Nature, London, UK, discussed “Nuances in Scientific Publishing”. Prof. Yu-Dong Zhang, University of Leicester, UK, spoke on “Intelligent Computing Theories and Applications for Infectious Disease Diagnosis”, whereas Dr. Chaminda Hewage, Cardiff Metropolitan University, UK, discussed “Data protection in the era of ChatGPT”. Dr. Imtiaz Khan, Cardiff Metropolitan University, UK, delivered a talk on “Artificial Intelligence and Blockchain in Health Care 4.0”, while Dr. Siba K. Udgata, University of Hyderabad, India, spoke on “WiSE-Tech: Wi-Fi Sensing Environment for Various technological and Societal Applications”; Dr. R. Chinnaiyan, Presidency University, Bengaluru, Karnataka, India, delivered an invited address on “AI, Digital Twin and Blockchain for Health Care and Agriculture”. Lastly, the keynote sessions were concluded with the felicitation of Dr. Xin-She Yang (Reader at Middlesex University London, UK, and also Steering Chair of FICTA-2023). These sessions received ample applause from the vast audience of delegates, budding researchers, faculty and students.
We thank the advisory chairs and steering committees for their mentoring support to the conference. We extend our deepest gratitude to our Organizing Chair, Publicity Chairs and TPC Chairs for playing a lead role in the entire process of organizing this conference.
We take this opportunity to thank the authors of all submitted papers for their hard work, adherence to the deadlines and patience with the review process. The quality of a refereed volume depends mainly on the expertise and dedication of the reviewers. We are indebted to the technical program committee members, who not only produced excellent reviews but also did so within short time frames. We would also like to thank the participants of this conference, who took part in it despite all hardships.
Dr. Vikrant Bhateja, Jaunpur, Uttar Pradesh, India
Dr. Fiona Carroll, Cardiff (Wales), UK
Dr. João Manuel R. S. Tavares, Porto, Portugal
Dr. Sandeep Singh Sengar, Cardiff (Wales), UK
Dr. Peter Peer, Ljubljana, Slovenia
Organization
Chief Patron
President and Vice-Chancellor, Cardiff Metropolitan University, UK.
Patron
Prof. Jon Platts, Dean CST, Cardiff Metropolitan University, UK.
Honorary Chair
Prof. Rajkumar Buyya, University of Melbourne, Australia.
Steering Committee Chairs
Prof. Suresh Chandra Satapathy, KIIT Deemed to be University, Bhubaneswar, Odisha, India. Prof. Siba K. Udgata, University of Hyderabad, Telangana, India. Dr. Xin-She Yang, Middlesex University London, UK.
General Chairs
Dr. Jinshan Tang, George Mason University, Virginia, USA. Dr. João Manuel R. S. Tavares, Faculdade de Engenharia da Universidade do Porto, Portugal.
Organizing Chair
Dr. Sandeep Singh Sengar, Cardiff Metropolitan University, UK.
Publication Chairs
Dr. Fiona Carroll, Cardiff Metropolitan University, UK. Dr. Peter Peer, Faculty of Computer and Information Science, University of Ljubljana, Slovenia. Dr. Vikrant Bhateja, Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, India. Dr. Jerry Chun-Wei Lin, Western Norway University of Applied Sciences, Bergen, Norway.
Publicity Chairs
Dr. Rajkumar Singh Rathore, Cardiff Metropolitan University, UK. Dr. Elochukwu Ukwandu, Cardiff Metropolitan University, UK. Dr. Hewage Chaminda, Cardiff Metropolitan University, UK. Dr. Catherine Tryfona, Cardiff Metropolitan University, UK. Dr. Priyatharshini Rajaram, Cardiff Metropolitan University, UK.
Advisory Committee
Aimé Lay-Ekuakille, University of Salento, Lecce, Italy. Amira Ashour, Tanta University, Egypt. Aynur Unal, Stanford University, USA. Bansidhar Majhi, IIIT Kancheepuram, Tamil Nadu, India. Dariusz Jacek Jakobczak, Koszalin University of Technology, Koszalin, Poland. Edmond C. Prakash, University for the Creative Arts, UK.
Ganpati Panda, IIT Bhubaneswar, Odisha, India. Isah Lawal, Noroff University College, Norway. Jagdish Chand Bansal, South Asian University, New Delhi, India. João Manuel R. S. Tavares, Universidade do Porto (FEUP), Porto, Portugal. Jyotsana Kumar Mandal, University of Kalyani, West Bengal, India. K. C. Santosh, University of South Dakota, USA. Le Hoang Son, Vietnam National University, Hanoi, Vietnam. Milan Tuba, Singidunum University, Belgrade, Serbia. Naeem Hanoon, Multimedia University, Cyberjaya, Malaysia. Nilanjan Dey, TIET, Kolkata, India. Noor Zaman, Universiti Tecknologi, PETRONAS, Malaysia. Pradip Kumar Das, IIT Guwahati, India. Rahul Paul, Harvard Medical School and Massachusetts General Hospital, USA. Roman Senkerik, Tomas Bata University in Zlin, Czech Republic. Sachin Sharma, Technological University Dublin, Ireland. Sriparna Saha, IIT Patna, India. Swagatam Das, Indian Statistical Institute, Kolkata, India. Siba K. Udgata, University of Hyderabad, Telangana, India. Tai Kang, Nanyang Technological University, Singapore. Valentina Balas, Aurel Vlaicu University of Arad, Romania. Vishal Sharma, Nanyang Technological University, Singapore. Yu-Dong Zhang, University of Leicester, UK.
Technical Program Committee Chairs
Dr. Mufti Mahmud, Nottingham Trent University, Nottingham, UK. Dr. Paul Angel, Cardiff Metropolitan University, UK. Dr. Steven L. Fernandes, Creighton University, USA. Ioannis Kypraios, De Montfort University, Leicester, UK. Jasim Uddin, Cardiff Metropolitan University, Cardiff, UK.
Technical Program Committee
A. K. Chaturvedi, IIT Kanpur, India. Abdul Wahid, Telecom Paris, Institute Polytechnique de Paris, Paris, France. Ahit Mishra, Manipal University, Dubai Campus, Dubai. Ahmad Al-Khasawneh, The Hashemite University, Jordan. Alexander Christea, University of Warwick, London, UK. Anand Paul, The School of Computer Science and Engineering, South Korea. Anish Saha, NIT Silchar, India. Bhavesh Joshi, Advent College, Udaipur, India.
Brent Waters, University of Texas, Austin, Texas, USA. Catherine Tryfona, Cardiff Metropolitan University, UK. Chhavi Dhiman, Delhi Technological University, India. Dan Boneh, Stanford University, California, USA. Debanjan Konar, Helmholtz-Zentrum Dresden-Rossendorf, Germany. Dipankar Das, Jadavpur University, India. Feng Jiang, Harbin Institute of Technology, China. Gayadhar Panda, NIT Meghalaya, India. Ginu Rajan, Cardiff Metropolitan University, UK. Gengshen Zhong, Jinan, Shandong, China. Hewage Chaminda, Cardiff Metropolitan University, UK. Imtiaz Ali Khan, Cardiff Metropolitan University, UK. Issam Damaj, Cardiff Metropolitan University, UK. Jean Michel Bruel, Department Informatique IUT de Blagnac, Blagnac, France. Jeny Rajan, National Institute of Technology Surathkal, India. Krishnamachar Prasad, Auckland University, New Zealand. Korhan Cengiz, University of Fujairah, Turkey. Lorne Olfman, Claremont, California, USA. Martin Everett, University of Manchester, UK. Massimo Tistarelli, Dipartimento Di Scienze Biomediche, Viale San Pietro. Milan Sihic, RHIT University, Australia. M. Ramakrishna, ANITS, Vizag, India. Ngai-Man Cheung, University of Technology and Design, Singapore. Philip Yang, Price Water House Coopers, Beijing, China. Praveen Kumar Donta, Institut für Information Systems Engineering, Austria. Prasun Sinha, Ohio State University Columbus, Columbus, OH, USA. Priyatharshini Rajaram, Cardiff Metropolitan University, UK. Sami Mnasri, IRIT Laboratory Toulouse, France. Shadan Khan Khattak, Cardiff Metropolitan University, UK. Ting-Peng Liang, National Chengchi University Taipei, Taiwan. Uchenna Diala, University of Derby, UK. V. Rajnikanth, St. Joseph’s College of Engineering, Chennai, India. Wai-Keung Fung, Cardiff Metropolitan University, UK. Xiaoyi Yu, Institute of Automation, Chinese Academy of Sciences, Beijing, China. Yun-Bae Kim, SungKyunKwan University, South Korea. Yang Zhang, University of Liverpool, UK.
Student Ambassadors
Kandala Abhigna, KIIT Deemed to be University, India. Rasagna T., KIIT Deemed to be University, India.
Contents
1 Artificial Bee Colony for Automated Black-Box Testing of RESTful API
   Seif Ahmed and Abeer Hamdy
2 Classifying Human Activities Using Machine Learning and Deep Learning Techniques
   Satya Uday Sanku, Thanuja Pavani Satti, T. Jaya Lakshmi, and Y. V. Nandini
3 Explainable Artificial Intelligence and Mobile Health for Treating Eating Disorders in Young Adults with Autism Spectrum Disorder Based on the Theory of Change: A Mixed Method Protocol
   Omobolanle Omisade, Alex Gegov, Shang-Ming Zhou, Alice Good, Catherine Tryfona, Sandeep Singh Sengar, Amie-Louise Prior, Bangli Liu, Taiwo Adedeji, and Carrie Toptan
4 Novel Deep Learning Models for Optimizing Human Activity Recognition Using Wearable Sensors: An Analysis of Photoplethysmography and Accelerometer Signals
   Rohit Kumar Bondugula and Siba Kumar Udgata
5 Link Prediction in Complex Networks: An Empirical Review
   Y. V. Nandini, T. Jaya Lakshmi, and Murali Krishna Enduri
6 High Utility Itemset Mining and Inventory Management: Theory and Use Cases
   Gutha Jaya Krishna
7 Using Clustering Approach to Enhance Prioritization of Regression Test Cases
   Umakanta Dash, Arup Abhinna Acharya, and Satya Ranjan Dash
8 Nucleus Segmentation Using Adaptive Thresholding for Analysis of Blood and Bone Marrow Smear Images
   Vikrant Bhateja, Sparshi Gupta, Siddharth Verma, Sourabh Singh, Ahmad Taher Azar, Aimé Lay-Ekuakille, and Jerry Chun-Wei Lin
9 A Systematic Review on Automatic Speech Recognition for Odia Language
   Namita Mishra, Satya Ranjan Dash, Shantipriya Parida, and Ravi Shankar Prasad
10 A Study on Influence Maximization in Complex Networks
   Chennapragada V. S. S. Mani Saketh, Kakarla Pranay, Akhila Susarla, Dukka Ravi Ram Karthik, T. Jaya Lakshmi, and Y. V. Nandini
11 A Survey on Smart Hydroponics Farming: An Integration of IoT and AI-Based Efficient Alternative to Land Farming
   Snehal V. Laddha, Pratik P. Shastrakar, and Sanskruti A. Zade
12 Angiosperm Genus Classification by RBF-SVM
   Shuwen Chen, Jiaji Wang, Yiyang Ni, Jiaqi Shao, Hui Qu, and Ziyi Wang
13 Signage Detection Based on Adaptive SIFT
   Jiaji Wang, Shuwen Chen, Jiaqi Shao, Hui Qu, and Ziyi Wang
14 Surface Electromyography Assisted Hand Gesture Recognition Using Bidirectional LSTM and Unidirectional LSTM for the Hearing Impaired
   Neel Gandhi and Shakti Mishra
15 The Potential of Using Corpora and Concordance Tools for Language Learning: A Case Study of ‘Interested in (Doing)’ and ‘Interested to (Do)’
   Lily Lim and Vincent Xian Wang
16 Analysis of Various Video-Based Human Action Recognition Techniques Using Deep Learning Techniques
   Lakshmi Alekhya Jandhyam, Ragupathy Rengaswamy, and Narayana Satyala
17 Penetration Testing of Web Server Using Metasploit Framework and DVWA
   Tamanna Jena Singhdeo, S. R. Reeja, Arpan Bhavsar, and Suresh Satapathy
18 Periodic Rampart Line Inspired Circular Microstrip Patch Antenna
   Chirag Arora
19 A Deep Learning-Based Prediction Model for Wellness of Male Sea Bass Fish
   Velaga Sai Sreeja, Kotha Sita Kumari, Duddugunta Bharath Reddy, and Paladugu Ujjwala
20 Depression Detection Using Deep Learning
   G. Gopichand, Anirudh Ramesh, Vasant Tholappa, and G. Sridara Pandian
21 Fusion of Variational Autoencoder-Generative Adversarial Networks and Siamese Neural Networks for Face Matching
   Garvit Luhadia, Aditya Deepak Joshi, V. Vijayarajan, and V. Vinoth Kumar
22 An Invasion Detection System in the Cloud That Use Secure Hashing Techniques
   Sridevi Sakhamuri, Gopi Krishna Yanala, Varagani Durga Shyam Prasad, Ch. Bala Subrmanyam, and Aika Pavan Kumar
23 Detection of Suspicious Human Activities from Surveillance Camera Using Neural Networks
   A. Kousar Nikhath, N. Sandhya, Sayeeda Khanum Pathan, and B. Venkatesh
24 High Resolution Remote Sensing Image Classification Using Convolutional Neural Networks
   K. Giridhar Sai, B. Sujatha, R. Tamilkodi, and N. Leelavathy
25 FSVM: Time Series Forecasting Using Fuzzy Support Vector Machine
   N. Sravani, B. Sujatha, R. Tamilkodi, and N. Leelavathy
26 Deep Learning Framework for the Detection of Invasive Ductal Carcinoma
   K. V. Aditya, N. Leelavathy, B. Sujatha, R. Tamilkodi, and D. Sattibabu
27 Iris-Based Human Identity Recognition Using Transfer Learning Approach
   Chinthapalli Karthik, B. Sujatha, T. K. Charan Babu, D. PhaniKumar, and S. Mohan Krishna
28 Ensemble Model for Lidar Data Analysis and Nocturnal Boundary Layer Height Estimation
   Gurram Sunitha, K. Reddy Madhavi, J. Avanija, K. Srujan Raju, Adepu Kirankumar, and Avala Raji Reddy
29 Classical and Parameterized Complexity of Line Segment Covering Problems in Arrangement
   M. Rema, R. Subashini, Subhasree Methirumangalath, and Varun Rajan
30 User Story-Based Automatic Keyword Extraction Using Algorithms and Analysis
   Arantla Jaagruthi, Mallu Varshitha, Karumuru Sai Vinaya, Vayigandla Neelesh Gupta, C. Arunkumar, and B. A. Sabarish
31 Prediction of Sepsis Disease Using Random Search to Optimize Hyperparameter Tuning Based on Lazy Predict Model
   E. Laxmi Lydia, Sara A. Althubiti, C. S. S. Anupama, and Kollati Vijaya Kumar
32 Surveillance Video-Based Object Detection by Feature Extraction and Classification Using Deep Learning Architecture
   Elvir Akhmetshin, Sevara Sultanova, C. S. S. Anupama, Kollati Vijaya Kumar, and E. Laxmi Lydia
33 Deep Learning-Based Recommender Systems—A Systematic Review and Future Perspective
   S. Krishnamoorthi and Gopal K. Shyam
34 Hybrid Security Against Black Hole and Sybil Attacks in Drone-Assisted Vehicular Ad Hoc Networks
   Aryan Abdlwhab Qader, Mohammed Hasan Mutar, Sameer Alani, Waleed Khalid Al-Azzawi, Sarmad Nozad Mahmood, Hussein Muhi Hariz, and Mustafa Asaad Rasol
35 Optimization of Metal Removal Rate, Surface Roughness, and Hardness Using the Taguchi Method in CNC Turning Machine
   Zahraa N. Abdul Hussain and Mohammed Jameel Alsalhy
36 Qualitative Indicator of Growth of Segments of the Network with Various Service Disciplines
   Zamen Latef Naser, Ban Kadhim Murih, and M. W. Alhamd
37 Hidden Attractor in a Asymmetrical Novel Hyperchaotic System Involved in a Bounded Function of Exponential Form with Image Encryption Application
   Ali A. Shukur and Mohanad A. AlFallooji
38 AI-Based Secure Software-Defined Controller to Assist Alzheimer’s Patients in Their Daily Routines
   S. Nithya, Satheesh Kumar Palanisamy, Ahmed J. Obaid, K. N. Apinaya Prethi, and Mohammed Ayad Alkhafaji
39 An Optimized Deep Learning Algorithm for Cyber-Attack Detection
   M. Eugine Prince, P. Josephin Shermila, S. Sajithra Varun, E. Anna Devi, P. Sujatha Therese, A. Ahilan, and A. Jasmine Gnana Malar
40 Estimation of Wind Energy Reliability Using Modeling and Simulation Method
   A. Jasmine Gnana Malar, M. Ganga, V. Parimala, and S. Chellam
41 Authentication Protocol for Secure Data Access in Fog Computing-Based Internet of Things
   Priyanka Surendran, Bindhya Thomas, Densy John, Anupama Prasanth, Joy Winston, and AbdulKhadar Jilani
42 Efficient Data Security Using Hybrid RSA-TWOFISH Encryption Technique on Cloud Computing
   A. Jenice Prabhu, S. Vallisree, S. N. Kumar, R. Sitharthan, M. Rajesh, A. Ahilan, and M. Usha
43 Fuzzy-Based Cluster Head Selection for Wireless Sensor Networks
   R. Surendiran, D. Nageswari, R. Jothin, A. Jegatheesh, A. Ahilan, and A. Bhuvanesh
44 An Optimized Cyber Security Framework for Network Applications
   B. Veerasamy, D. Nageswari, S. N. Kumar, Anil Shirgire, R. Sitharthan, and A. Jasmine Gnana Malar
45 Modified Elephant Herd Optimization-Based Advanced Encryption Standard
   R. Surendiran, S. Chellam, R. Jothin, A. Ahilan, S. Vallisree, A. Jasmine Gnana Malar, and J. Sathiamoorthy
46 Phonocardiographic Signal Analysis for the Detection of Cardiovascular Diseases
   Deena Nath Gupta, Rohit Anand, Shahanawaj Ahamad, Trupti Patil, Dharmesh Dhabliya, and Ankur Gupta
47 Object Localization in Emoji-Based Social Networks Using Deep Learning Techniques
   Galiveeti Poornima, Y. Sudha, B. C. Manujakshi, R. Pallavi, Deepak S. Sakkari, and P. Karthikeyan
48 Mathematical Gann Square Model and Elliott Wave Principle with Bi-LSTM for Stock Price Prediction
   K. V. Manjunath and M. Chandra Sekhar
49 Crowd Monitoring System Using Facial Recognition
   Sunanda Das, R. Chinnaiyan, G. Sabarmathi, A. Maskey, M. Swarnamugi, S. Balachandar, and R. Divya
50 Performance Augmentation of DIMOS Transistor
   S. Jafar Ali Ibrahim, V. Jeya Kumar, N. S. Kalyan Chakravarthy, Alhaf Malik Kaja Mohideen, M. Mani Deepika, and M. Sathya
51 A Hyperparameter Tuned Ensemble Learning Classification of Transactions over Ethereum Blockchain
   Rohit Saxena, Deepak Arora, Vishal Nagar, Satyasundara Mahapatra, and Malay Tripathi
52 Data Integrity Protection Using Multi-level Reconstructive Error Data and Auditing for Cloud Storage
   Kaushik Sekaran, B. Seetharamulu, J. Kalaivani, Vijayalaxmi C. Handaragall, and B. Venkatesh
53 Emotion-Based Song Recommendation System
   A. R. Sathya and Alluri Raghav Varma
54 A Hybrid Model for Forecasting Stock Prices Using Bayesian and LSTM
   Rohini Pinapatruni, Faizan Mohammed, Syed Anas Mohiuddin, and Dheeraj Patel
Author Index
About the Editors
Vikrant Bhateja is associate professor in the Department of Electronics Engineering, Faculty of Engineering and Technology (UNSIET), Veer Bahadur Singh Purvanchal University, Jaunpur, Uttar Pradesh, India. He holds a doctorate in ECE (Bio-Medical Imaging) and has 20 years of academic teaching experience, with around 190 publications in reputed international conferences, journals and online book chapter contributions, of which 39 papers are published in SCIE-indexed, high-impact-factor journals. One of his papers published in Review of Scientific Instruments (RSI) Journal (under American International Publishers) was selected as “Editor Choice Paper of the Issue” in 2016. Among the international conference publications, four papers have received the “Best Paper Award”. He has been instrumental in chairing or co-chairing around 30 international conferences in India and abroad as Publication/TPC chair and has edited 52 book volumes from Springer-Nature as corresponding editor, co-editor or author to date. He has delivered nearly 22 keynotes and invited talks in international conferences, ATAL, TEQIP and other AICTE-sponsored FDPs and STTPs. He was Editor-in-Chief of the IGI Global International Journal of Natural Computing and Research (IJNCR), an ACM and DBLP indexed journal, from 2017 to 2022. He has guest edited Special Issues in reputed SCIE-indexed journals under Springer-Nature and Elsevier. He is Senior Member of IEEE and Life Member of CSI.
Fiona Carroll is Reader (equivalent to Associate Professor) with the Cardiff School of Technologies, Cardiff Metropolitan University (CMet), Wales, UK. She obtained her Ph.D. in Human Computer Interaction (HCI) from Edinburgh Napier University in 2008. Her research over the past twenty years has focused on the fast-changing relations between humans and digital technologies. She has been particularly proactive in applying aesthetics to digital graphics, visualization, and HCI. At CMet, she co-leads the Creative Computing Research Centre and is also passionate about exploring new multidisciplinary approaches to engage more girls in STEM. As an accomplished academic, she has numerous peer-reviewed publications and has taken part in cutting-edge research projects at both national and international levels.
João Manuel R. S. Tavares graduated in Mechanical Engineering at the Universidade do Porto, Portugal, in 1992. He also earned his M.Sc. degree and Ph.D. degree in Electrical and Computer Engineering from the Universidade do Porto in 1995 and 2001 and attained his Habilitation in Mechanical Engineering in 2015. He is Senior Researcher at the Instituto de Ciência e Inovação em Engenharia Mecânica e Engenharia Industrial (INEGI) and Full Professor at the Department of Mechanical Engineering (DEMec) of the Faculdade de Engenharia da Universidade do Porto (FEUP). He is Co-editor of more than 75 books, Co-author of more than 50 chapters, 650 articles in international and national journals and conferences, and 3 international and 3 national patents. His main research areas include computational vision, medical imaging, computational mechanics, scientific visualization, human–computer interaction, and new product development.
Sandeep Singh Sengar is Lecturer in Computer Science at Cardiff Metropolitan University, UK. Before joining this position, he worked as Postdoctoral Research Fellow at the Machine Learning Section of the Computer Science Department, University of Copenhagen, Denmark. He holds a Ph.D. degree in Computer Science and Engineering from Indian Institute of Technology (ISM), Dhanbad, India, and an M.Tech. degree in Information Security from Motilal Nehru National Institute of Technology, Allahabad, India. His current research interests include medical image segmentation, motion segmentation, visual object tracking, object recognition, and video compression. His broader research interests include machine/deep learning, computer vision, image/video processing, and their applications. He has published several research articles in reputed international journals and conferences in the field of computer vision and image processing. He is Reviewer for several reputed international transactions, journals, and conferences, including IEEE Transactions on Systems, Man and Cybernetics: Systems, Pattern Recognition, Neural Computing and Applications, and Neurocomputing.
Peter Peer is Full Professor of computer science at the University of Ljubljana, Slovenia, where he heads the Computer Vision Laboratory, coordinates the double degree study program with the Kyungpook National University, South Korea, and serves as Vice-dean for economic affairs. He received his doctoral degree in computer science from the University of Ljubljana in 2003. During his postdoctoral work, he was Invited Researcher at CEIT, San Sebastian, Spain. His research interests focus on biometrics and computer vision. He has participated in several national and EU-funded R&D projects and published more than 100 research papers in leading international peer-reviewed journals and conferences. He is Co-organizer of the Unconstrained Ear Recognition Challenge and Sclera Segmentation Benchmarking Competition. He serves as Associate Editor of IEEE Access and IET Biometrics. He is Member of the EAB, IAPR, and IEEE.
Chapter 1
Artificial Bee Colony for Automated Black-Box Testing of RESTful API
Seif Ahmed and Abeer Hamdy
Abstract RESTful APIs are now widely used in web applications; developers often consume them as black-box components in microservices. Black-box testing of RESTful APIs is essential because neither the API’s source code nor its compiled binary is always publicly available. A handful of research studies have addressed the automatic generation of test suites for RESTful APIs based on black-box testing. However, to our knowledge, none of them considered test coverage criteria or test suite optimization. This paper proposes adapting the Artificial Bee Colony (ABC) swarm intelligence algorithm to generate test suites for RESTful APIs automatically from the OpenAPI Specification (OAS), while maximizing the API test coverage (path, operation, parameter, input value, and status code). Experiments were conducted on six APIs that differ in the number of routes, operation types, input values, and how well the API is documented. The experiments showed that the ABC algorithm can generate test suites that achieve high coverage.
S. Ahmed · A. Hamdy (B) Faculty of Informatics and Computer Science, The British University in Egypt, Cairo, Egypt e-mail: [email protected] S. Ahmed e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_1
1.1 Introduction
Web services now account for a large share of the web and have attracted the attention of many web developers and companies. One attractive feature of web services is that applications developed in different programming languages can still communicate with each other over the web. Typically (but not necessarily), a web service is associated with a service-oriented architecture (SOA) or a micro-services architecture. There are two popular types of web services: SOAP and RESTful web services [3]. Both serve the purpose of any web service, which is application-to-application (A2A) communication, but in different ways.
SOAP web services have been available for a long time. SOAP stands for simple object access protocol and is an XML-based protocol for accessing web services over HTTP. A SOAP web service typically uses the SOAP protocol to send XML messages between two applications over HTTP, since many applications can understand the XML markup language. A key feature of SOAP web services is that they are well described and documented using the Web Services Description Language (WSDL), a standard, universal XML document that describes the functionalities of a SOAP web service.
Recently, RESTful web services have become more popular and dominant in web service applications. Representational state transfer (REST) is an architectural style that uses the HTTP protocol for exchanging data over the web. REST is distinguished by the fact that communication is resource based: messages are transmitted as representations of resources (self-contained chunks of information) using HTTP methods. A resource is identified by a Uniform Resource Locator (URL) and can be described and represented using JSON. Unlike SOAP, RESTful web services lack descriptive documentation of their resources and functionalities. Fortunately, most RESTful web services provide RESTful APIs. An Application Programming Interface (API) is a set of functionalities exposed by a web service that can be accessed and consumed to communicate with a specific part of the web service. RESTful APIs are documented using the OpenAPI Specification (OAS). An OAS document usually comes in JSON or YAML format and establishes a standard, language-independent interface to RESTful APIs that simplifies exploring and comprehending the service’s capabilities without access to the source code [17].
Over the last couple of years, the use of RESTful APIs as components of micro-service architectures for building web applications has grown dramatically [15]. Manually testing applications that include numerous RESTful APIs, provided by different REST web services, is therefore a burden for any developer. Automating black-box testing of RESTful APIs is a necessity, as neither the source code nor the compiled binary is always publicly available. Black-box testing is specification-based testing that depends on the input and output of the software; it does not require access to the source code. A handful of research studies in the literature have proposed automating black-box testing of RESTful APIs [2, 4, 12, 20–22]. However, to our knowledge, all the previous studies focused on the automatic generation of test cases, without targeting high coverage criteria or minimizing the size of the test suite.
1.2 Aims and Contributions
Evolutionary algorithms are powerful optimization techniques that have been widely used in software engineering [9, 10] to tackle several optimization problems, including software testing [5–7]. This paper proposes adapting a swarm intelligence algorithm called “Artificial Bee Colony (ABC)” to automate the black-box testing of RESTful APIs based on the OpenAPI Specification (OAS), while maximizing the API test coverage criteria of the generated test suite by combining nominal and mutation testing. Nominal test cases test the RESTful API using input data as stated in the OAS, while mutated test cases use input data that deviates from the OAS, exposing implementation weaknesses and unhandled exceptional flows in the RESTful API. Both nominal and mutated test cases are included in a single test suite to facilitate achieving high testing coverage criteria. The ABC algorithm was selected in this work because Sahin and Akay [18] experimentally compared the ABC, PSO, differential evolution, and firefly algorithms in the context of automatic test data generation. They found that the ABC is the most effective in challenging situations with complex search spaces and is efficient over simple search spaces.
This paper is organized as follows: Sect. 1.3 summarizes the previous work related to the automatic generation of test suites for RESTful APIs based on black- and white-box testing. Section 1.4 outlines the ABC algorithm. Section 1.5 discusses the proposed approach, while Sect. 1.6 discusses the experiments and results. Finally, Sect. 1.7 concludes the paper.
1.3 Literature Survey
The pressing need for automatic generation of test suites for RESTful APIs, and the difficulty of achieving high coverage criteria, drew the attention of the testing community. Recently, a handful of approaches have been proposed in the literature. Some of these approaches automate white-box testing of RESTful APIs, which requires the availability of the API source code, and others automate black-box testing. However, the existing solutions for black-box testing did not consider either achieving high API testing coverage criteria or reducing the test suite size.
Arcuri [1] introduced EvoMaster, which uses the many independent objective (MIO) evolutionary algorithm for automating white-box testing of RESTful APIs. EvoMaster requires the OpenAPI Specification and complete access to the source code; i.e., it requires access to the Java binary code of the RESTful API. Zhang et al. [22] extended EvoMaster into a more comprehensive solution for resource-based test case generation by incorporating templates with organized test scenarios and then using a search-based algorithm to produce test cases. However, this approach still requires access to the source code; thus, it works only with RESTful APIs developed using the Java programming language and is not practical for testing RESTful APIs within micro-service architectures.
Sahin and Akay [18] concluded that the ABC algorithm is more effective in challenging situations with more complex search spaces, whereas the random search algorithm is efficient for simpler tasks. The outcomes that the ABC algorithm provided for generating test cases were very promising. However, the new algorithm that Sahin and Akay [18] derived from the ABC, called dynamic artificial bee colony with hyper-scout (DABC-HS), targets white-box testing and only achieves non-functional coverage criteria. It therefore requires access to the source code and works only on REST APIs developed using the Java programming language. Sahin and Akay [18, 19] also improved the original ABC algorithm by using an archive to overcome its drawbacks.
Segura et al. [20] introduced an approach for black-box testing that is based on the metamorphic relations between requests and responses. For example, if two test cases are executed on the same REST API and the second has a stricter condition than the first, the result of the second test case must be a subset of the first test case’s outcome; otherwise, a defect is exposed. The approach of Segura et al. [20] is not fully automated because it requires input from the user to identify the metamorphic relations, and it only tests search-oriented APIs.
In contrast, Ed-douibi [4] introduced a model-based method for automatic black-box testing. This approach relies on two meta-models: the OpenAPI meta-model and the TestSuite meta-model. The OpenAPI meta-model is used to extract a model from the OpenAPI specification in order to identify and work on the API resources; the next step is to infer the parameter values extracted from the OpenAPI specification based on some rules in the OpenAPI model. Then, the TestSuite meta-model is used to create test case definitions for REST APIs, generating both nominal test cases and faulty (mutated) test cases. However, this approach does not explicitly reflect the interdependence between resources, since OpenAPI v2.0 does not provide such information about dependencies: if a resource depends on the response of another resource, OpenAPI v2.0 does not reflect this information. Another restriction of this approach is that it excludes the read-only operations that have an impact on the API state.
To overcome the restrictions of Ed-douibi’s approach, Viglianisi et al. [21] proposed using an operation dependency graph (ODG) that is computed dynamically during the testing process to decide automatically the order in which the test cases are invoked. The ODG is initially extracted from the OpenAPI specification and represents the data dependencies between the API operations. Karlsson et al. [12] proposed QuickREST for automated black-box testing of RESTful APIs, but this approach focuses only on property-based testing, which creates input data and tests the API with that data to see whether the defined properties still hold. Martin-Lopez [14] introduced RESTest, a model-based approach for automating black-box testing; rather than requiring only the OAS, it also requires a test model containing a configuration file in YAML notation. Finally, Banias et al. [2] proposed specification-based testing of REST APIs, which requires OAS v3.0 and includes different test case configuration levels that can be chosen by the user. All the previously introduced approaches for automating black-box testing focus on the automatic generation of test cases without targeting high coverage criteria or minimizing the size of the test suite.
1.4 ABC Algorithm
The ABC algorithm is a swarm intelligence evolutionary algorithm that models the foraging behavior of honeybees [11]. It includes three main phases that simulate the bees’ roles in foraging: (1) employed bees, (2) onlooker bees, and (3) scout bees. Algorithm 1 lists the main steps of the ABC. A solution in the ABC algorithm represents the location of a food source, while the fitness function value of a food source represents its nectar quality. The ABC starts with a population of randomly generated food sources. The employed bee phase searches the vicinity of the current food sources to find out whether better food sources exist. If a new food source with a higher fitness is found, it replaces the current food source in the population; otherwise, the current one remains. To generate a new food source in the vicinity of a current one, a random neighbor is selected, then crossover and mutation operators are applied. The employed bees share the information about the food sources with the onlooker bees. The onlooker bees then select high-quality food sources using stochastic selection and search around them for a new solution. If a new food source has a higher fitness value, it replaces the previous one. The scout bee phase identifies a weak solution (exhausted food source) and replaces it with a new one; consequently, diversity is introduced into the population to support exploration of the search space, while both employed and onlooker bees are responsible for exploitation.
Algorithm 1: The basic ABC algorithm
1: Initialization of a population of food source locations
2: Fitness function evaluation
3: repeat
4: Employed Bee Phase
5: Onlooker Bee Phase
6: Keep the best solution achieved so far
7: Scout Bee Phase
8: until a predetermined termination criterion is met
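The steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `abc_search`, its parameters, the toy objective in the usage note, and the onlooker selection rule (a commonly used fitness-proportional form that assumes non-negative fitness values) are all assumptions made for illustration. The neighborhood move is left to a caller-supplied function rather than the crossover and mutation operators described above.

```python
import random


def abc_search(init, neighbor, fitness, pop_size=10, limit=20, max_iters=100, seed=0):
    """Generic ABC loop (illustrative). Returns (best solution, best fitness)."""
    rng = random.Random(seed)
    # Steps 1-2: random initial food sources and their fitness values.
    foods = [init(rng) for _ in range(pop_size)]
    fits = [fitness(s) for s in foods]
    trials = [0] * pop_size  # failed improvement attempts per food source
    start = max(range(pop_size), key=fits.__getitem__)
    best, best_fit = foods[start], fits[start]

    def exploit(i):
        # Greedy replacement: keep the neighbor only if it is strictly better.
        cand = neighbor(foods[i], rng)
        cand_fit = fitness(cand)
        if cand_fit > fits[i]:
            foods[i], fits[i], trials[i] = cand, cand_fit, 0
        else:
            trials[i] += 1

    for _ in range(max_iters):  # Steps 3-8
        # Employed bee phase: every food source is exploited once.
        for i in range(pop_size):
            exploit(i)
        # Onlooker bee phase: fitter sources are more likely to be exploited.
        top = max(fits)
        for i in range(pop_size):
            prob = 0.9 * fits[i] / top + 0.1 if top > 0 else 1.0
            if rng.random() < prob:
                exploit(i)
        # Keep the best solution achieved so far.
        i = max(range(pop_size), key=fits.__getitem__)
        if fits[i] > best_fit:
            best, best_fit = foods[i], fits[i]
        # Scout bee phase: replace the most exhausted source, if any.
        worn = max(range(pop_size), key=trials.__getitem__)
        if trials[worn] > limit:
            foods[worn] = init(rng)
            fits[worn] = fitness(foods[worn])
            trials[worn] = 0
    return best, best_fit
```

As a usage example, calling `abc_search` with `init=lambda rng: rng.uniform(-10, 10)`, `neighbor=lambda x, rng: x + rng.uniform(-1, 1)`, and `fitness=lambda x: 1 / (1 + (x - 3) ** 2)` searches for the maximum of a one-dimensional toy function located at x = 3.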
1.5 Proposed Approach The proposed approach includes three main modules which are (1) OAS analysis, (2) test suite generation using ABC, and (3) test suite execution, as depicted by Fig. 1.1. The OAS analysis module is responsible for extracting the required data fields from the OAS. The test suite generation and optimization module adapts the ABC algorithm to generate a test suite that achieves the highest testing coverage criteria. Finally, the test suite execution module executes the test suite. The following subsections explain these modules in detail.
Fig. 1.1 Proposed approach overview
OpenAPI Specification (OAS) analysis
OAS provides an interface for the RESTful API that allows both people and machines to identify and comprehend the service’s capabilities in the absence of source code, documentation, or network traffic examination. HTTP, the main protocol used in RESTful APIs, is a client–server protocol that allows retrieving resources from a specific backend server. It is the cornerstone of all data interchange between different web applications. HTTP communication involves two types of messages [16]:
(1) HTTP request:
• GET: to retrieve data from a resource.
• POST: to create a data record or initiate an action.
• PUT and PATCH: to update an existing resource or add a new one.
• DELETE: to delete a data record.
(2) HTTP response (status code):
• 2xx status code signifies success.
• 4xx status code signifies a client error (error from the client request).
• 5xx status code signifies a server error (error from the internal server implementation).
OAS analysis identifies the endpoints of the operations in a REST API from the specification, which is in JSON format. Extracting the API endpoints is straightforward because the OpenAPI provides this information in a “paths” object that holds all the resources provided by the API. Thus, the OAS of any API must adhere to the universal specification and definition stated by the Swagger Team for OAS 3.x (OpenAPI Specification, n.d.). The Swagger fields of any OAS are as follows:
(1) “servers”: This field must contain the URL of the working servers.
(2) “components”: This field can contain several additional fields: “securitySchemes”, “schemas”, “parameters”, “requestBodies”, and “responses”. In general, it holds all the reusable components that are referenced from other fields.
(3) “securitySchemes”: This field is used only if the API requires some type of authentication, and it contains the different required authentication scheme fields. Each authentication scheme field must contain a type (“http”, “apiKey”).
(4) “schemas”: This field can hold multiple schema objects that define input and output data types. These types can be objects, but also primitives and arrays.
(5) “paths”: This field must contain the different path items (routes) that the API exposes, and each route field contains the different HTTP methods that can be used on this route.
(6) “path item” (route): Each path item can have parameters and request body fields, except for the get method, which can have only a parameters field. These fields contain the schemas of the different inputs (parameters and properties) that each HTTP method takes. The parameters input schema must have two required fields: (1) the “name” field (parameter name) and (2) the “in” field (“query”, “header”, “path”, or “cookie”). The request body input schema must be of type “object”. The path item may also include the “security” field that contains the required security key names, if needed; each name must correspond to a security scheme that is declared in the security schemes.
(7) “security”: This field holds the required global security names; each name must correspond to a security scheme that is declared in the security schemes under the components object. Note that this general security field can be overridden by the security field inside the path item (route) field.
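To make the role of the “servers”, “paths”, and parameter fields concrete, the following Python sketch walks a small OAS 3.x document, given as a dictionary, and lists one record per operation. The function name `extract_operations`, the record layout, and the sample specification (including its URL) are assumptions made for illustration; the sketch ignores `$ref` resolution and security handling, which a real analysis module would need.

```python
from typing import Any, Dict, List

# HTTP methods that may appear as keys of an OAS 3.x path item.
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def extract_operations(spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one record per (route, HTTP method) found under "paths"."""
    servers = spec.get("servers") or [{}]
    base_url = servers[0].get("url", "")
    operations = []
    for route, item in spec.get("paths", {}).items():
        shared = item.get("parameters", [])  # parameters declared for the whole route
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            params = shared + op.get("parameters", [])
            operations.append({
                "method": method.upper(),
                "url": base_url + route,
                # "name" and "in" are the two required parameter fields.
                "parameters": [(p["name"], p["in"], p.get("required", False)) for p in params],
                "has_body": "requestBody" in op,
                "responses": sorted(op.get("responses", {})),
            })
    return operations


# Hypothetical specification used only to exercise the sketch.
SAMPLE_SPEC = {
    "openapi": "3.0.0",
    "servers": [{"url": "https://api.example.com"}],
    "paths": {
        "/products/{id}": {
            "parameters": [
                {"name": "id", "in": "path", "required": True, "schema": {"type": "integer"}}
            ],
            "get": {"responses": {"200": {"description": "ok"}, "404": {"description": "not found"}}},
            "delete": {"responses": {"204": {"description": "deleted"}}},
        }
    },
}

operations = extract_operations(SAMPLE_SPEC)
```

For the sample specification this yields two records, a GET and a DELETE on the same URL, each carrying the shared path parameter `id`.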
An API resource can be tested if and only if the values of all its required inputs (parameters, properties) can be retrieved or generated. It is therefore preferable to provide an “example” field in each input schema that holds values for the input. The OAS analysis module first goes through the “securitySchemes” field (if it exists) inside the “components” field to collect all the required security schemes and to check whether the user has supplied a security value for each required security name. If at least one value is missing, the module raises an error requesting this value and does not proceed, because each API request needs the security keys in order to execute correctly and return the expected response. If all the values are found, or no security is required, the module continues by iterating over the “paths” field to obtain all the routes and their specified HTTP methods. While iterating over each route’s HTTP methods and their inputs, the module maps each input to its information and schema as stated in the OAS. The module also generates the URL for each HTTP method so that the HTTP requests can be built and performed, together with the required security if needed. Finally, the module maps each HTTP method, which now contains all the information required to perform an HTTP request (input information, schemas, required security, responses, and URL), to its route as stated in the OAS.
ABC for RESTful API test suite generation
The ABC iterates until the maximum number of iterations mfe is reached or until a population that has the maximum possible fitness value mfv is discovered. In each iteration, it performs the employed bee, onlooker bee, and scout bee phases. The employed bee phase and the onlooker bee phase perform exploitation, trying to find better food sources for the population. Both phases use a set of mutation functions, namely value mutation, specific to this context. All the food sources enter the employed bee phase, while food sources enter the onlooker bee phase based on a calculated probability. After these two phases, the scout bee phase checks whether there is an exhausted food source in the population; if so, that food source enters the scout bee phase. The output of the algorithm is the test suite (population) that has the highest fitness value. The following subsections explain the main phases of the ABC in detail.
A. Solution representation
The population represents a test suite, which includes one test case for each path in the API to guarantee 100% API path coverage; so the N_p value is API dependent. A test case (food source) contains one or more HTTP requests. Each HTTP request is either nominal or mutated and is represented using a single chromosome. Each chromosome consists of a group of genes (inputs). A gene is either immutable (e.g., IP address, HTTP headers) or mutable. A single mutable gene represents a single HTTP request input (parameter/property).
A nominal HTTP request (chromosome) must consist only of nominal genes (parameters/properties) that adhere to the OAS schemas, while a mutated HTTP request (chromosome) contains genes that violate the OAS schemas. A nominal HTTP request is expected to result in a correct output and is referred to as nominal testing, while a mutated HTTP request is expected to produce an incorrect output and is referred to as mutation testing. Thus, a test suite includes two types of testing chromosomes, nominal and mutation testing chromosomes; Fig. 1.2 shows an example food source.
B. Fitness function formulation
The fitness function measures the nectar quality of each food source. As given by Eq. 1.1, it is formulated as the sum of the normalized values of a set of coverage criteria minus the normalized value of the test case size, which guides the algorithm to select the smallest test case that achieves the highest coverage. The coverage criteria included in the fitness function are (1) operation coverage, (2) parameter coverage, (3) input value coverage, and (4) status code coverage [13]. The fitness function does not include a term for path coverage, because path coverage is achieved by setting N_p equal to the number of paths in the API, which guarantees 100% path coverage. The value of each coverage term is in the range [0:1], where 0 means no coverage and 1 means full coverage.
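The representation and the fitness of Eq. 1.1 can be sketched in Python as follows. This is an illustrative reading of the chapter's definitions, not the authors' code: the class names `Request`, `FoodSource`, and `PathTotals`, the field names, and the example values are assumptions. Two further simplifications are labeled in the comments: parameters are counted per path rather than per operation, and an empty denominator is treated as full coverage.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_REQUESTS = 5 * 2  # five operations x two status-code classes, as in the chapter


@dataclass
class Request:
    """One chromosome: a single nominal or mutated HTTP request."""
    method: str
    inputs: Dict[str, object] = field(default_factory=dict)  # mutable genes
    mutated: bool = False
    status: int = 0  # status code observed after execution (0 = not executed)


@dataclass
class FoodSource:
    """One test case: all requests generated for a single API path."""
    path: str
    requests: List[Request]


@dataclass
class PathTotals:
    """Denominators of the coverage ratios for one path (assumed to be known)."""
    operations: int
    parameters: int
    input_values: int
    status_codes: int


def _ratio(covered: int, total: int) -> float:
    # Assumption: nothing to cover counts as full coverage.
    return min(1.0, covered / total) if total else 1.0


def fitness(fs: FoodSource, totals: PathTotals) -> float:
    ops = {r.method for r in fs.requests}
    # Simplification: parameters are pooled over all operations of the path.
    params = {name for r in fs.requests for name in r.inputs}
    values = {(name, repr(v)) for r in fs.requests for name, v in r.inputs.items()}
    # At most two status classes per operation: success and error.
    classes = set()
    for r in fs.requests:
        if 200 <= r.status < 300:
            classes.add((r.method, "2xx"))
        elif 400 <= r.status < 600:
            classes.add((r.method, "4xx/5xx"))
    size = len(fs.requests) / MAX_REQUESTS
    coverage = (
        _ratio(len(ops), totals.operations)
        + _ratio(len(params), totals.parameters)
        + _ratio(len(values), totals.input_values)
        + _ratio(len(classes), totals.status_codes)
    )
    return coverage - size


# Hypothetical test case: one nominal and one mutated GET on the same path.
example = FoodSource("/products", [
    Request("GET", {"id": 1}, mutated=False, status=200),
    Request("GET", {"id": "abc"}, mutated=True, status=400),
])
example_fitness = fitness(example, PathTotals(2, 2, 4, 4))
```

In this example each coverage ratio is 0.5 and the size term is 2/10, so the fitness is 4 × 0.5 − 0.2 = 1.8.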
Fig. 1.2 Example of food source (a test case includes four HTTP requests)
$$\text{fitness function} = (\text{operationCov} + \text{parameterCov} + \text{inputvalueCov} + \text{statuscodeCov}) - \text{testcasesize}, \tag{1.1}$$
where
(1) operationCov = number of executed operations/total number of operations
(2) parameterCov = number of parameters used/total number of parameters
(3) inputvalueCov = number of different values given to the inputs/total number of possible values that all inputs can take
(4) statuscodeCov = number of status codes obtained (maximum two per operation)/total number of status codes
(5) testcasesize = number of chromosomes/maximum number of chromosomes (5 × 2), because each test case may include at most five different operations, and each operation can yield two different status code classes, “2xx” and “(4xx, 5xx)”.
C. Initialization phase
The initialization phase starts by setting the values of the parameters (N_p, mfe, mfv, limit). N_p is the number of test cases in the test suite, which is equal to the number of paths (routesMap length) in the API. mfe is the maximum number of fitness evaluations allowed for finding the test suite with the best fitness value. mfv is the maximum possible fitness value a population can achieve and is used as a termination criterion. It is calculated as N_p × 3.5: the fitness value is based on four coverage criteria, each in the range [0:1], and the algorithm attempts to maximize these four criteria while minimizing the test case size. Thus, 3.5 is obtained by taking the maximum sum of the four criteria, which is 4, and subtracting 0.5 as an acceptable margin for the size of the test case. The limit parameter is used to determine the exhausted food sources. Finally, the initial population is constructed as the current population. The initial population is created by randomly generating food sources, and each food source contains only nominal chromosomes. The food sources are generated randomly by iterating over the output of the “OAS analysis” module. For every path, the algorithm generates a random number of chromosomes, and in each chromosome it generates a random number of nominal genes only, extracting the value of each nominal gene according to the following two rules, applied in order of priority.
Rule 1: A simple input value can be extracted from the following: (1) examples object (i.e., p.example and p.schema.example), (2) default values object (i.e., p.default, and p.items.default if p is of type array), and (3) enums (i.e., the first value of p.enum, or of p.items.enum if p is of type array).
Rule 2: A dummy input value p is generated arbitrarily with respect to the type of p (i.e., a random integer/decimal number or a random string), with a high probability of generating a zero value for numeric types and an empty string for string types. Otherwise, an arbitrary numeric value is drawn from the permitted range, or a random string is built from arbitrary alphanumeric characters, respecting the minLength and maxLength schema constraints.
Lastly, each food source in the population is associated with a trial counter equal to zero.
D. Employed bee phase
The number of employed bees is equal to the number of food sources in the population. In this phase, all the food sources get an opportunity to generate a new food source; each employed bee exploits a single food source. Exploitation simulates the behavior of a real bee that exploits a food source (flower) and tries to identify a better food source with better nectar quality than the one associated with it. In the employed bee phase, exploitation is performed by generating a new food source from the food source currently being exploited by applying a set of value mutation operators, and then calculating the fitness values of the current and the newly generated food sources.
Finally, the current food source is replaced if the newly generated food source has a better fitness value, and the trial counter of the newly added food source is set to zero; otherwise, the current food source is kept, and its trial counter is incremented by 1.
Value mutation operators: These are applied to every chromosome (HTTP request) in a single food source, with a probability of 50%. They fall into two categories, both operating on the genes: nominal mutation operators and error mutation operators, as introduced by different black-box testing papers [4, 8, 21].
Nominal Mutation Operators (Nominal Testing):
(1) Change finite value: This operator changes the value of finite inputs (enums or Booleans).
(2) Add new input: This operator selects and adds a new input (gene) to the generated chromosome.
(3) Remove non-required input: This operator selects and removes a non-required input (gene) from a chromosome.
Error Mutation Operators (Mutation Testing):
(1) Remove required input: This operator selects and removes a required input (gene) from a chromosome.
(2) Mutate input type: This operator changes the type of the value of the input (gene). For example, a gene of type string is given a random number as its value, and a gene of type number is given a random string.
(3) Constraint violation: This operator violates a constraint given in the OAS schema for the value of the input (gene). If the gene is of type string, it generates a random string that exceeds the given maximum length, and if the gene is of type number, it generates a random number that exceeds the given maximum.
E. Onlooker bee phase
The number of onlooker bees is equal to the number of food sources in the population. In this phase, only some of the food sources get an opportunity to generate a new food source, based on the probability calculated for each food source. The probability value for each food source i is calculated using Eq. 1.2:
$$\text{prob}_i = 0.9 \cdot \frac{f_i}{\text{mfv}} + 0.1, \tag{1.2}$$
where f_i is the fitness value of food source i and mfv is the maximum possible fitness value of a single food source. If an onlooker bee exploits a food source, it generates a new food source from it by applying the proposed structure mutation operators. Then the fitness values of the current and newly generated food sources are calculated. As in the employed bee phase, if the newly generated food source has a better fitness value, it replaces the current food source, and its trial counter is reset to zero; otherwise, the current food source is kept and its trial counter is incremented by 1.
F. Scout bee phase
Not all the food sources enter the scout bee phase. As previously mentioned, every food source is associated with a trial counter; if the value of the trial counter is greater than a pre-specified value (limit), the food source enters the scout bee phase. In this case, the food source is identified as an exhausted food source, so it is replaced with a new food source based on the information extracted from the OAS, and its trial counter is reset to zero. The limit is determined according to Eq. 1.3:
$$\text{limit} = N_p \times D, \tag{1.3}$$
where N_p is the number of food sources (population size) and D is the dimension of the problem, which is five because each test case can include five different types of HTTP request (GET, POST, PUT, PATCH, and DELETE). The scout bee phase handles three cases. First, if there is only one food source whose trial counter exceeds the limit, this food source enters the scout phase. Second, if there are several food sources whose trial counters exceed the limit, only the food source with the highest trial counter enters the scout phase. Third, if several food sources exceed the limit and all their trial counters are equal, a randomly selected one of them enters the scout phase. This ensures that only one food source enters the scout bee phase at a time.
Test suite execution
Each generated test case is compiled and executed in order according to the CRUD templates proposed by Zhang et al. [22]. The order in which the operations are tested is critical for a successful test case (e.g., a resource should be created using the PUT or POST method before a successful DELETE operation on this resource can be tested). The usually preferred order for operations is as follows:
(1) The HEAD operation is used to ensure the validity of an API operation and to retrieve its header. Thus, it is first in the order list.
(2) The POST operation is used to add new resources, and many other operations can depend on these resources, so it comes early in the order.
(3) The PUT and PATCH operations are used to update an available resource or to add a new one if it does not exist.
(4) The DELETE operation is used to delete an existing resource, and it is at the bottom of the order list because the resource must be available before it is deleted.
The results are then tested and compared to generate a complete report of which operations were tested with which values and whether each operation passed its test cases. Two methods are defined to evaluate whether an operation is tested successfully. The first is the status code method: for any HTTP request sent to the API, there is an HTTP response that includes a status code, a three-digit integer value that describes the status of the operation. A 2xx code stands for correct execution, a 4xx code stands for a bad request, and a 5xx code represents an internal server error; a 5xx code usually indicates a programming error in the source code of the REST API itself.
Thus, if a nominal test case returns a 2xx status code, the operation passed this test case; if it returns 4xx or 5xx, the operation failed the test case and the failure is documented in the report. Conversely, if a mutated test case returns a 2xx status code, the operation failed this test case (live mutant); if it returns 4xx or 5xx, the operation passed the test case (killed mutant). Secondly, the schema validation method uses the swagger-schema-validator library. The OAS provides the schema of each operation's responses. For remote programs that interact with a RESTful API, agreement between the actual responses and their schema is critical. For nominal test cases, if there is a mismatch between the OAS schema of an operation and the response of the test case, the operation is considered failed and is documented in the report, and vice versa for mutated test cases.
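The status code rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `verdict` and the "undetermined" outcome for 1xx and 3xx codes are hypothetical, since the text only covers 2xx, 4xx, and 5xx responses.

```python
def verdict(status_code: int, mutated: bool) -> str:
    """Classify one executed test case from its HTTP status code.

    Nominal case: 2xx -> pass, 4xx/5xx -> fail.
    Mutated case: 2xx -> fail (live mutant), 4xx/5xx -> pass (killed mutant).
    """
    if 200 <= status_code <= 299:
        return "fail" if mutated else "pass"
    if 400 <= status_code <= 599:
        return "pass" if mutated else "fail"
    # Assumption: the text does not cover 1xx/3xx, so they are left undecided.
    return "undetermined"


for code, mutated in [(200, False), (404, False), (201, True), (500, True)]:
    kind = "mutated" if mutated else "nominal"
    print(f"{kind} test, status {code}: {verdict(code, mutated)}")
```

The same pass/fail inversion would apply to the schema validation method, with a schema match taking the place of a 2xx response.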
1.6 Experiments and Results
A. Datasets Six RESTful APIs were used in the experiments: OpenWeatherMap, JSONPlaceholder, FakeRestAPI, DummyAPI, FakeStore, and productManagement. The APIs are of different complexities, i.e., they differ in the number of paths, route operations, and input values, as listed in Table 1.1. The value zero in the column "Number of finite parameter values" indicates that the API includes only infinite parameters. All APIs under test except OpenWeatherMap are purpose-built dummy APIs, specifically designed for testing and prototyping. Dummy APIs provide fake but validated data that serves as testing data, whereas the OpenWeatherMap API is used in a real environment and provides real data.
B. Experimental setup For every API under test, the ABC algorithm was run 10 times. At the end of every run, the fitness function value of the best test suite was calculated, and the mean and standard deviation were then computed over the 10 runs. The $N_p$ value is API dependent; it is set equal to the number of paths in the API. The value of mfe was set to 100, and the limit was set to $N_p \times 5$.

Table 1.1 Specifications of RESTful APIs under test
| API | Number of paths | Routes operations | Number of operations | Number of parameters | Number of finite parameter values |
| OpenWeatherMap | 1 | GET | 1 | 8 | 39 |
| JSONPlaceholder | 3 | GET, POST, PUT, PATCH, DELETE | 7 | 8 | 0 |
| DummyAPI | 14 | GET, POST, PUT, DELETE | 18 | 31 | 0 |
| FakeRestAPI | 4 | GET, POST, PUT, PATCH, DELETE | 9 | 24 | 0 |
| FakeStore | 8 | GET, POST, PUT, PATCH, DELETE | 19 | 60 | 0 |
| productManagement | 8 | GET, POST, PUT, DELETE | 16 | 34 | 14 |
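The experimental setup can be summarised in a few lines of code. The following is a minimal sketch, not the authors' implementation: the class name `AbcSettings`, the helper `summarize`, and the choice of the sample standard deviation are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class AbcSettings:
    """Hypothetical container for the run settings listed in the setup."""
    n_paths: int      # number of paths in the API under test
    mfe: int = 100    # value of mfe used in the experiments
    runs: int = 10    # independent runs per API

    @property
    def population_size(self) -> int:
        # N_p is set equal to the number of paths.
        return self.n_paths

    @property
    def limit(self) -> int:
        # limit = N_p x 5
        return self.population_size * 5


def summarize(best_fitness_per_run):
    """Return (mean, standard deviation) of the best fitness over all runs.

    Assumption: sample standard deviation; the text does not say which
    estimator was used.
    """
    return mean(best_fitness_per_run), stdev(best_fitness_per_run)


settings = AbcSettings(n_paths=8)  # e.g. an API with 8 paths
print(settings.population_size, settings.limit)
```

For an API with 8 paths this gives a population of 8 food sources and a limit of 40.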
C. Results and discussion Table 1.2 lists the mean and standard deviation of the best fitness function values (m, sd) attained by the ABC algorithm, in comparison with the maximum possible fitness function value, over the six APIs under test. The mean fitness value indicates the overall performance of the ABC, where a higher value signifies better achievement of the testing coverage criteria. The standard deviation reflects the stability and robustness of the algorithm, with lower values indicating greater consistency. The percentages of coverage achieved per criterion are also listed in Table 1.2, together with the test suite sizes in terms of the number of HTTP requests. The ABC generated test suites that achieve high coverage percentages across the six APIs, except for the input value coverage of OpenWeatherMap, where it achieved only 30% coverage; this could be attributed to the nature of the OpenWeatherMap API, which has 39 different input parameter values to cover. Also, the parameter coverage of FakeStore is the lowest across the six APIs, as FakeStore has the largest number of parameters (60).
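The "Mean: max fitness (%)" column of Table 1.2 is the ratio of the mean best fitness to the maximum possible fitness. The short sketch below recomputes it from the (mean, max) pairs reported in the table; the function name `mean_to_max_percent` is hypothetical. The recomputed values agree with the published column up to rounding or truncation of the last digit.

```python
# (mean best fitness, maximum fitness) per API, taken from Table 1.2.
table_1_2 = {
    "OpenWeatherMap": (2.754, 3.5),
    "JSONPlaceholder": (8.875, 10.5),
    "DummyAPI": (42.4, 49),
    "FakeRestAPI": (12.525, 14),
    "FakeStore": (22.4, 28),
    "productManagement": (24.1, 28),
}


def mean_to_max_percent(mean_fitness: float, max_fitness: float) -> float:
    """Express the mean best fitness as a percentage of the maximum."""
    return 100.0 * mean_fitness / max_fitness


for api, (mean_fitness, max_fitness) in table_1_2.items():
    print(f"{api}: {mean_to_max_percent(mean_fitness, max_fitness):.2f}%")
```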
1.7 Conclusion RESTful APIs are now used in a wide range of web applications. The automatic generation of a test suite for black-box testing of RESTful APIs is a necessity for micro-services developers, as the source code and the produced binary code of the API under test are not always available to the public. This paper adapted the ABC to automate RESTful API black-box testing based on the OpenAPI specification (OAS) standards, and to improve the API test coverage criteria of the automatically generated test suite by incorporating nominal and mutation testing. The results showed that the adapted ABC is promising in this context; it achieved high coverage across the testing criteria. Currently, we are working on enhancing the ABC to improve its performance in this context.
Table 1.2 Test suite best fitness values (mean, standard deviation) achieved by the ABC, coverage percentages, and test suite sizes
| API | Best fitness (m, sd) | Max fitness | Mean: max fitness (%) | Operation coverage (%) | Parameter coverage (%) | Input value coverage (%) | Status code coverage (%) | Test suite size |
| OpenWeatherMap | 2.754, 0.299 | 3.5 | 78.6 | 100 | 100 | 30 | 100 | 10 |
| JSONPlaceholder | 8.875, 0.606 | 10.5 | 84.5 | 91.6 | 75 | – | 66.61 | 13 |
| DummyAPI | 42.4, 0.868 | 49 | 86.5 | 100 | 74.3 | – | 86.87 | 76 |
| FakeRestAPI | 12.525, 0.609 | 14 | 89 | 100 | 87.5 | – | 90.625 | 26 |
| FakeStore | 22.4, 0.679 | 28 | 80 | 96.875 | 60.8 | – | 82.125 | 48 |
| productManagement | 24.1, 1.068 | 28 | 86 | 93.75 | 86.875 | 92.125 | 84.375 | 49 |
References
1. Arcuri, A.: RESTful API automated test case generation with EvoMaster. ACM Trans. Softw. Eng. Methodol. 28(January 2019), 37 (2019). https://doi.org/10.1145/3293455
2. Banias, O., Florea, D., Gyalai, R., Curiac, D.-I.: Automated specification-based testing of REST APIs. In: Selected Papers from the International Symposium on Electronics and Telecommunications ISETC 2020 (2021). https://doi.org/10.3390/s21165375
3. Cardoso, J., Miller, J.A., Vasquez, V.: Introduction to Web Services (2007). https://doi.org/10.4018/978-1-59904-045-5.ch007
4. Ed-douibi, H., Izquierdo, J.L., Cabot, J.: Automatic generation of test cases for REST APIs: a specification-based approach. In: 2018 IEEE 22nd International Enterprise Distributed Object Computing Conference (EDOC), pp. 181–190 (2018). https://doi.org/10.1109/EDOC.2018.00031
5. Fisal, N., Hamdy, A., Rashed, E.: Search-based regression testing optimization. Int. J. Open Source Softw. Process. 12, 1–20 (2021)
6. Fisal, N., Hamdy, A., Rashed, E.: Adaptive weighted sum bi-objective bat for regression testing optimization. In: Artificial Intelligence and Online Engineering: Proceedings of the 19th International Conference on Remote Engineering and Virtual Instrumentation, pp. 486–495. Springer International Publishing (2022)
7. Fisal, N., Hamdy, A., Rashed, E.: Multi-objective adapted binary bat for test suite reduction. Intell. Autom. Soft Comput. (2022)
8. Fraser, G., Arcuri, A.: Whole test suite generation. IEEE Trans. Softw. Eng. 39, 276–291 (2013). https://doi.org/10.1109/TSE.2012.14
9. Hamdy, A.: Genetic fuzzy system for enhancing software estimation models. Int. J. Model. Optim. 4, 227–232 (2014)
10. Hamdy, A., Mohamed, A.: Greedy binary particle swarm optimization for multi-objective constrained next release problem. Int. J. Mach. Learn. Comput. 9, 561–568 (2019)
11. Karaboga, D.: An Idea Based on Honey Bee Swarm for Numerical Optimization (2005)
12. Karlsson, S., Čaušević, A., Sundmark, D.: QuickREST: Property-Based Test Generation of OpenAPI-Described RESTful APIs, pp. 131–141 (2020). https://doi.org/10.1109/ICST46399.2020.00023
13. Martin-Lopez, A., Segura, S., Ruiz-Cortés, A.: Test Coverage Criteria for RESTful Web APIs, pp. 15–21 (2019). https://doi.org/10.1145/3340433.3342822
14. Martin-Lopez, A., Segura, S., Ruiz-Cortés, A.: RESTest: Automated Black-Box Testing of RESTful Web APIs. In: Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 682–685 (2021). https://doi.org/10.1145/3460319.3469082
15. Pahl, C., Jamshidi, P.: Microservices: a systematic mapping study. In: Proceedings of the 6th International Conference on Cloud Computing and Services Science, vols. 1 and 2, pp. 137–146. SCITEPRESS - Science and Technology Publications, Lda, Rome (2016). https://doi.org/10.5220/0005785501370146
16. RESTful Web Services - Introduction: Tutorialspoint (2021). https://www.tutorialspoint.com/restful/restful_introduction.htm
17. Richardson, L., Ruby, S.: Restful Web Services, vol. 316. O’Reilly Media, 1st edn (2007)
18. Sahin, O., Akay, B.: Comparisons of metaheuristic algorithms and fitness functions on software test data generation. Appl. Soft Comput. 49, 1202–1214 (2016). https://doi.org/10.1016/j.asoc.2016.09.045
19. Sahin, O., Akay, B.: A discrete dynamic artificial bee colony with hyper-scout for RESTful. Appl. Soft Comput. 104 (2021). https://doi.org/10.1016/j.asoc.2021.107246
20. Segura, S., Parejo, J.A., Troya, J., Ruiz-Cortes, A.: Metamorphic testing of RESTful web APIs. IEEE Trans. Softw. Eng. 44, 1083–1099 (2018). https://doi.org/10.1109/TSE.2017.2764464
21. Viglianisi, E., Dallago, M., Ceccato, M.: RESTTESTGEN: automated black-box testing of RESTful APIs. In: 2020 IEEE 13th International Conference on Software Testing, Validation and Verification (ICST), pp. 142–152 (2020). https://doi.org/10.1109/ICST46399.2020.00024
22. Zhang, M., Marculescu, B., Arcuri, A.: Resource-based test case generation for RESTful web services. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1426–1434. Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3321707.3321815
Chapter 2
Classifying Human Activities Using Machine Learning and Deep Learning Techniques
Satya Uday Sanku, Thanuja Pavani Satti, T. Jaya Lakshmi, and Y. V. Nandini
Abstract Human activity recognition (HAR) is the ability of machines to recognize and categorize human activities. Many individuals today are health aware and use smartphones or smartwatches to track their daily activities to stay healthy. Kaggle held a challenge to classify six human activities using smartphone inertial signals from 30 participants. The key difficulty in HAR is distinguishing human activities from the data so that they do not overlap. Expert-generated features are visualized using t-SNE; logistic regression, linear SVM, kernel SVM, and decision trees are then used to categorize the six human activities. The deep learning models LSTM, bidirectional LSTM, RNN, and GRU are also trained on raw time series data. These models are assessed using accuracy, confusion matrix, precision, and recall. Empirical findings show that the linear support vector machine (SVM) among the machine learning models, and the gated recurrent unit (GRU) among the deep learning models, obtained the highest accuracy for human activity recognition.
S. U. Sanku (B) · T. P. Satti · T. J. Lakshmi · Y. V. Nandini, School of Engineering and Sciences, SRM University, Amaravati, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_2

2.1 Introduction Smartphones have become an integral part of human life. They carry numerous sensors that assist in playing games, taking pictures, and gathering other information. In addition to camera sensors, the accelerometer and gyroscope sensors measure acceleration and angular motion. Using these two sensors, the dataset provided by the University of California Irvine (UCI) was created by experimenting with a group of 30 participants aged 19–48 years. Participants performed six activities (walking, walking upstairs, walking downstairs, sitting, standing, and laying) with a Samsung Galaxy S II smartphone worn at the waist. The built-in accelerometer and gyroscope provided readings of tri-axial linear acceleration and angular velocity at 50 Hz. The experiments were video-recorded so that the data could be labelled manually. After collecting the sensor signals, the researchers preprocessed them with noise filters and then segmented them into vectors of 128 readings, one per fixed-width sliding window of 2.56 s [1]. Here, the task is to build multiclass classification models that use these 128-dimensional vectors from the accelerometer and gyroscope signals to predict one of the six types of daily human activities.
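The relation between the window length and the sampling rate (128 readings at 50 Hz give 2.56 s) can be checked with a short sketch. The function names are hypothetical, and the 50% overlap between consecutive windows (a step of 64 readings) is an assumption that is not stated in the text above.

```python
SAMPLING_RATE_HZ = 50   # sampling rate of the sensors
WINDOW_SIZE = 128       # readings per window


def window_duration_seconds(window_size: int = WINDOW_SIZE,
                            rate_hz: int = SAMPLING_RATE_HZ) -> float:
    """Duration covered by one window of readings."""
    return window_size / rate_hz


def sliding_windows(signal, window_size: int = WINDOW_SIZE,
                    step: int = WINDOW_SIZE // 2):
    """Cut a 1-D signal into fixed-width windows.

    Assumption: step = window_size // 2, i.e. 50% overlap.
    """
    last_start = len(signal) - window_size
    return [signal[s:s + window_size] for s in range(0, last_start + 1, step)]


signal = list(range(256))           # hypothetical stand-in for one sensor axis
windows = sliding_windows(signal)
print(window_duration_seconds(), len(windows), len(windows[0]))
```

Each of the nine raw signals would be segmented in the same way, so that one activity instance consists of nine aligned windows of 128 readings.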
2.2 Problem Description Given tri-axial accelerometer and gyroscope raw time series data of human activities, in the form of vectors labelled with one of the six activity class labels:
• Walking
• Walking-upstairs
• Walking-downstairs
• Sitting
• Standing
• Laying
the problem of human activity classification is to predict one of the six class labels for an activity instance whose label is missing. This is a multiclass classification problem.
2.2.1 Dataset Description UCI provided a dataset named UCI_HAR_Dataset on the Kaggle platform, consisting of raw time series data as well as expert-generated attributes. The raw data consists of tri-axial acceleration and tri-axial gyroscope data. Acceleration is divided into tri-axial body acceleration and tri-axial total acceleration [2]. There are 10,299 data points in total, each of 128 dimensions, of which 70% are assigned to the training phase and the remaining 30% to the testing phase. The expert-generated representation has 561 features, such as tBodyAcc-mean(), tBodyAcc-std(), and so on. Here, 7352 data points are used for training and 2947 for testing.
Fig. 2.1 Distribution of instances over six classes
2.2.2 Exploratory Data Analysis Exploratory data analysis is the first step for any problem. It is called exploratory because nothing is known about the dataset at the outset, so it is useful for gaining a deeper knowledge of the dataset. Figure 2.1 displays the distribution of instances across the six classes. Univariate analysis and t-SNE were performed on the dataset, as detailed below.
• Univariate Analysis: Univariate analysis is a simple form of analysis that focuses on a single feature in order to extract useful insights from the data and identify meaningful patterns. Figure 2.2 shows how univariate analysis on the feature “tBodyAccMagmean” separates stationary activities from moving activities.
Fig. 2.2 Univariate analysis on feature “tBodyAccMagmean”
Fig. 2.3 T-Sne on 561 expert generated features
• t-SNE: t-distributed stochastic neighbour embedding (t-SNE) is a dimensionality reduction method that improves data visualization by reducing the tendency of map points to crowd together in the centre of the map [3]. It often gives better-separated visualizations than principal component analysis when data points are reduced from a higher dimension to a lower dimension. t-SNE preserves local neighbourhoods, so points that are close in the original space stay in the same cluster, which helps to distinguish the clusters in the lower dimension. In this experimentation, t-SNE is used to map the 561-dimensional expert-generated features to a lower-dimensional space so that the data points of the six activities form distinct clusters and can be visualized more clearly. It is observed that the sitting and standing data points overlap, whereas the remaining four classes are clustered without any overlap, as shown in Fig. 2.3.
2.3 Related Literature Recognizing human actions is an area of active study, with applications in domains such as fitness, medicine, gyms, eldercare, and surveillance. Chernbumroong et al. [4] indicate that the location of the sensor is crucial to the success of activity recognition for everyday living in a free space setting, based on an experiment in which a single sensor worn on the wrist detected five different activities. Wearing a sensor on the wrist has the potential to reduce mobility restrictions, discomfort, and social stigma [4]. Chetty et al. [5] argue that effective machine learning and data mining techniques are needed to interpret the numerous sensor signals from smartphones to enable automatic and intelligent activity recognition. Despite the availability of numerous machine learning algorithms, it is not known which algorithm performs best for activity recognition on smartphones. Remote activity monitoring and recognition in the elderly and disabled care sectors will benefit greatly from automatic activity recognition systems based on advanced processing of the many sensors included in smartphones [5]. Gulzar et al. find that threshold-based algorithms are simpler and faster and are often used to detect human activity, but that machine learning algorithms produce a reliable outcome [6]. Imran et al. argue that, for problems like activity recognition, feature engineering should be prioritized over sophisticated deep learning architectures and that basic machine learning classifiers should be used [7]. Zebin et al. focused their research on deep learning approaches for human activity recognition. They used a CNN to automate feature learning from multichannel time series data, which gave the best results in classifying human activities [8]. Oh et al. note that deep learning techniques must first be trained on high-quality datasets. Quality data requires adequate labelling, which costs human time and effort. Active learning is mostly used to reduce labelling time [9].
2.4 Proposed Approach The proposed approach is shown in Fig. 2.4. The researchers generated 561 features from the raw time series data provided by UCI. Exploratory data analysis on these features showed that “tBodyAccMagmean” separated the classes best. These expert-generated features are used for training and testing the machine learning models, while the raw time series data is used for the deep learning models. Finally, performance measures such as the confusion matrix, accuracy, precision, and recall are computed to check which model performed better.
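The performance measures named above can all be derived from a confusion matrix. The following sketch shows one way to do so; the function name `per_class_metrics` is hypothetical and the three-class matrix contains made-up counts for illustration only, not results from this study.

```python
def per_class_metrics(confusion, labels):
    """Compute overall accuracy plus per-class precision and recall.

    confusion[i][j] = number of instances of true class i predicted as class j.
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = {}
    for i, label in enumerate(labels):
        true_positive = confusion[i][i]
        actual = sum(confusion[i])                          # row sum
        predicted = sum(confusion[r][i] for r in range(n))  # column sum
        metrics[label] = {
            "precision": true_positive / predicted if predicted else 0.0,
            "recall": true_positive / actual if actual else 0.0,
        }
    return correct / total, metrics


# Hypothetical counts, chosen only to illustrate sitting/standing confusion.
labels = ["sitting", "standing", "laying"]
confusion = [
    [90, 10, 0],
    [2, 98, 0],
    [0, 0, 100],
]
accuracy, metrics = per_class_metrics(confusion, labels)
print(f"accuracy = {accuracy:.2f}")
for label in labels:
    print(label, metrics[label])
```

Per-class recall is the share of instances of an activity that were recognized correctly, which is one natural reading of a per-activity accuracy figure.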
Fig. 2.4 Proposed approach

2.4.1 Machine Learning Models
• Logistic regression: Logistic regression estimates the relationship between one or more independent variables and a categorical outcome. Here, its objective is to find the hyperplanes that best separate the six activities from each other in a linear fashion using a one-vs-all approach [10]. Geometrically, the data points that lie above the hyperplane, in the same direction as the weight vector, belong to the positive class, while the data points that lie in the opposite direction belong to the negative class. Finding the hyperplane that best classifies the data points requires the weights ($w$) and the bias ($b$). Optimal weights for logistic regression are computed using Eq. 2.1.

$$W^{*} = \arg\min_{w} \left[ \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i\, w^{T} x_i\right)\right) + \lambda\, w^{T} w \right] \tag{2.1}$$

After the best hyperplane with optimal weights and bias has been found, the signed distance of every data point to the hyperplane is used to check whether the model predicts accurately. As a measure of overall model performance, however, the sum of signed distances fails when there are outliers. To overcome this issue, a squashing technique is used: the sigmoid function maps large signed distance values to a value between 0 and 1, which also gives a good probabilistic interpretation. Logistic regression may still overfit because of outliers, which drive the weight values towards $+\infty$ or $-\infty$. To overcome this issue, regularization is used, with the hyperparameter $\lambda$ tuned by grid search to control the weight values so that the model does not overfit. Here, the logistic loss is used as an approximation of the 0–1 loss so that the loss function is continuous and therefore differentiable [11].
• Linear SVM: Support vector machines (SVMs) are widely employed for classifying linear and nonlinear data. They are applied across various domains, particularly for high-dimensional data [12]. In this problem, the primary objective of the SVM is to identify hyperplanes that maximize the margin and effectively classify the six distinct human activities. The larger the margin, the lower the chance that points are misclassified. To construct a support vector classifier (SVC), first construct a convex hull for each group of similar data points, then find the shortest lines connecting these hulls, and finally bisect these connecting lines with a plane. These bisecting planes are called margin maximizers, and they improve the classification of human activities. Here, the hinge loss is used as an approximation of the 0–1 loss [13]. The weights (and bias) are computed using Eq. 2.2, where $\xi_i$ are the slack variables and $c$ is the hyperparameter that weights them.

$$(w^{*}, b^{*}) = \arg\min_{w,b} \; \frac{\lVert w \rVert}{2} + c \cdot \frac{1}{n} \sum_{i=1}^{n} \xi_i \tag{2.2}$$
• RBF Kernel SVM: The kernel SVM is an extension of the soft margin SVM in which the kernel trick is additionally applied. In this research, the radial basis function (RBF) is used as the kernel, which helps in handling nonlinear data. Kernelization takes data in $d$-dimensional space and internally performs a feature transformation such that nonlinear data becomes linearly separable in the transformed space $d'$ [14]. This transformation mainly requires two parameters, $c$ in the soft margin SVM and sigma in the RBF, whose optimal values can be found by grid search. The margin-maximizing hyperplanes are computed with the kernel trick using Eq. 2.3.

$$\max_{\alpha_i} \; \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i\, \alpha_j\, y_i\, y_j\, \mathrm{rbf\_kernel}(x_i, x_j) \tag{2.3}$$

Equation 2.3 shows only the kernel term. In the standard soft margin dual problem, this double sum enters with a factor of $-\frac{1}{2}$ next to the linear term $\sum_{i=1}^{n} \alpha_i$, subject to $0 \le \alpha_i \le c$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$.
• Decision Trees: Decision trees are used for classification and regression problems. This classifier is very similar to if ... else logical conditions in programming. Class labels are decided at the leaf nodes of the decision tree, while all non-leaf nodes are involved in the decision-making process. Every decision corresponds to an axis-parallel hyperplane. The decision tree therefore separates the distinct human activities using axis-parallel hyperplanes [15].
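The objectives in Eqs. 2.1 and 2.2 and the RBF kernel used in Eq. 2.3 can be evaluated directly for a given set of weights. The following is a minimal pure-Python sketch under stated assumptions: labels are taken to be in {-1, +1}, the slack of each point is computed as the hinge loss max(0, 1 - y_i (w·x_i + b)), and the kernel is written as exp(-||x - z||² / (2σ²)). The function names are hypothetical and no optimization is performed.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def logistic_objective(w, X, y, lam):
    """Objective of Eq. 2.1: logistic loss plus lam * w^T w."""
    data_term = sum(math.log1p(math.exp(-yi * dot(w, xi)))
                    for xi, yi in zip(X, y))
    return data_term + lam * dot(w, w)


def soft_margin_objective(w, b, X, y, c):
    """Objective of Eq. 2.2, with each slack set to the hinge loss (assumption)."""
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return math.sqrt(dot(w, w)) / 2 + c * sum(slacks) / len(slacks)


def rbf_kernel(x, z, sigma):
    """RBF kernel value for two points; sigma controls the kernel width."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2 * sigma ** 2))


# Hypothetical toy data: two 2-D points, both in the positive class.
X = [[2.0, 0.0], [0.5, 0.0]]
y = [1, 1]
print(logistic_objective([0.0, 0.0], X, y, lam=0.5))        # 2 * log(2)
print(soft_margin_objective([1.0, 0.0], 0.0, X, y, c=1.0))  # 0.5 + 0.25
print(rbf_kernel(X[0], X[0], sigma=1.0))                    # identical points
```

In practice the hyperparameters (lam, c, and sigma) would be chosen by grid search, as described above.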
2.4.2 Deep Learning Models
• RNN: Recurrent neural networks are mainly used to process sequential data. They are termed recurrent because the information cycles through a loop over time. In an RNN, three types of weights need to be initialized, and the initialization depends on the activation unit: if the activation unit is tanh or sigmoid, Xavier (Glorot) initialization is used, and if the activation unit is ReLU, “He” initialization is used. An RNN can handle short-term dependencies but not long-term dependencies. This causes problems in both forward and backward propagation, because real-world tasks may require long-term dependencies in which an input learned long ago must be retained in the current state [16].
• LSTM: Long short-term memory is a special purpose recurrent unit, an extension of the recurrent neural network that is used to remember long-term dependencies. LSTMs are designed to process long sequences of input data, ranging from 200 to 400 time steps, and are a good fit for this problem. The model supports many parallel sequences of data, such as the axes of the gyroscope and accelerometer data. LSTMs extract features by observing the input sequences and also learn how to map the internal features to one of the six distinct activity types [16]. The advantage is that they learn directly from raw data, reducing the need for domain knowledge to construct input features. The model is able to learn an internal representation of the time series data and to retain the cell state. The main feature of the LSTM is the short circuit from $C_{t-1}$ (the previous cell state output) to $C_t$ (the current cell state output). The output of the previous cell state is not disturbed if it is short circuited ($C_t = C_{t-1}$). For the LSTM to be short circuited, the forget gate output must be a vector of ones and the contribution of the input gate to the point-wise addition (+) must be a vector of zeros. The output of the forget gate decides how much of the previous cell state is remembered and how much is forgotten. The output of the input gate determines how much new information is added. Finally, the output of the cell state is released through the output gate.
• Bidirectional LSTM: Bidirectional long short-term memory is an extension of the LSTM that learns both the forward and the backward sequence patterns of the sequential input data. In this model, later inputs can also affect earlier outputs. Here, two LSTMs instead of one are trained on the input sequence [17].
• GRU: Gated recurrent units are a gating mechanism in recurrent neural networks. The GRU is similar to the LSTM but has only two gates, named the reset gate and the update gate. It needs no cell state because everything is captured in the output itself. Because its architecture is less complicated than that of the LSTM, the GRU computes the partial derivatives faster, which reduces training time [18]. The GRU has fewer parameters (4230) than the LSTM (5574). The nine raw time series, namely body_acc_x, body_acc_y, body_acc_z, body_gyro_x, body_gyro_y, body_gyro_z, total_acc_x, total_acc_y, and total_acc_z, are given as input to 32 gated recurrent units (GRUs) in the form of 128-dimensional vectors. A dropout of 0.5 is added after the GRUs because the number of parameters (4230) is close to the number of data points (7352) in the training dataset, so the model can easily overfit. Every output of the GRUs is then connected to the neurons in the dense layer, which ensures that all the GRU outputs are fully connected. Finally, the softmax function is applied to the dense layer to predict one of the distinct human activities.
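The parameter counts quoted in this chapter (4230 for the GRU, 5574 for the LSTM, and 1542 for the RNN) can be reproduced with a short calculation. The sketch below assumes one recurrent layer of 32 units fed with 9 input signals per time step, followed by a dense softmax layer with 6 outputs, and one bias vector per weight block (the classic formulation); the function names are hypothetical. Under these assumptions the totals match the quoted figures.

```python
def recurrent_params(units: int, input_dim: int, weight_blocks: int) -> int:
    """Parameters of one recurrent layer.

    Each weight block has an input kernel (input_dim x units), a recurrent
    kernel (units x units), and one bias vector (units).
    """
    return weight_blocks * (units * (input_dim + units) + units)


def dense_params(inputs: int, outputs: int) -> int:
    """Parameters of a fully connected layer with bias."""
    return inputs * outputs + outputs


UNITS, INPUT_DIM, CLASSES = 32, 9, 6

# Weight blocks: RNN has 1; GRU has 3 (update, reset, candidate);
# LSTM has 4 (forget, input, output, candidate).
WEIGHT_BLOCKS = {"RNN": 1, "GRU": 3, "LSTM": 4}


def total_params(model: str) -> int:
    """Recurrent layer plus the dense softmax layer."""
    recurrent = recurrent_params(UNITS, INPUT_DIM, WEIGHT_BLOCKS[model])
    return recurrent + dense_params(UNITS, CLASSES)


for model in WEIGHT_BLOCKS:
    print(model, total_params(model))
```

The count does not depend on the window length of 128, because the same weights are reused at every time step.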
2.5 Results In this experimentation, as machine learning models are unable to derive features directly from raw time series data, the features generated by experts from the raw data are given as input to train the machine learning models. The deep learning algorithms, however, use the raw time series data. The accuracy of the machine learning algorithms on each human activity is given in Table 2.1. Three of the machine learning models classified all six basic human activity labels well, with slight confusion between sitting and standing data points. Overall, the linear support vector classifier performed exceptionally well on the expert-generated features with 96.7% accuracy, while the decision tree model performed comparatively worse with an accuracy of 87%.

Table 2.1 Accuracy of machine learning algorithms using expert generated features
| Algorithm | Laying (%) | Sitting (%) | Standing (%) | Walking (%) | Walking downstairs (%) | Walking upstairs (%) |
| Logistic regression | 100 | 88 | 97 | 99 | 96 | 95 |
| Linear SVM | 100 | 88 | 98 | 100 | 98 | 96 |
| RBF Kernel SVM | 100 | 90 | 98 | 99 | 95 | 96 |
| Decision trees | 100 | 75 | 89 | 95 | 84 | 77 |

The deep learning models used in this research, on the other hand, are able to generate features on their own from the raw time series data. Their accuracy is tabulated in Table 2.2. Here, the recurrent neural network model has 1542 parameters and achieved an accuracy of 77.64%, while the gated recurrent unit model has 4230 parameters and achieved an accuracy of 92.60%, which is better than all other deep learning models.

Table 2.2 Accuracy of deep learning algorithms using raw time series data
| Algorithm | Laying (%) | Sitting (%) | Standing (%) | Walking (%) | Walking downstairs (%) | Walking upstairs (%) |
| RNN | 99 | 88 | 55 | 45 | 94 | 89 |
| LSTM | 95 | 77 | 89 | 95 | 98 | 97 |
| Bidirectional LSTM | 100 | 74 | 82 | 86 | 98 | 99 |
| GRU | 97 | 78 | 90 | 99 | 94 | 97 |
2.6 Conclusion and Future Scope In classifying the six distinct human activities, the machine learning models performed better than the deep learning models because they were provided with expert-generated features, which give the models a better representation of the data. The deep learning models nevertheless performed comparably, even though they generated their own features from the raw time series data. Among the machine learning models, the linear support vector classifier achieved the highest accuracy of 96.7%, whereas among the deep learning models, the gated recurrent unit achieved the highest accuracy of 92.60%. Although these models performed well, both the deep learning and the machine learning models showed the same confusion between the sitting and standing activities, because both activities are stationary. In future, our goal is to train the models so that they avoid confusion between these stationary activities, so that these findings can be used to create smartwatches and other devices that track a user's activities and report the daily activity record to the user.
References
1. Anguita, D., Ghio, A., Oneto, L., Parra Perez, X., Reyes Ortiz, J.L.: A public domain dataset for human activity recognition using smartphones. In: Proceedings of the 21th International European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pp. 437–442 (2013)
2. UCI Machine Learning: Human activity recognition with smartphones. Kaggle (13 Nov 2019). https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones
3. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11) (2008)
4. Chernbumroong, S., Atkins, A.S., Yu, H.: Activity classification using a single wrist-worn accelerometer. In: 2011 5th International Conference on Software, Knowledge Information, Industrial Management and Applications (SKIMA) Proceedings, pp. 1–6. IEEE (Sept 2011)
5. Chetty, G., White, M., Akther, F.: Smart phone based data mining for human activity recognition. Procedia Comput. Sci. 46, 1181–1187 (2015)
6. Gulzar, Z., Leema, A.A., Malaserene, I.: Human Activity Analysis Using Machine Learning Classification Techniques. Int. J. Innov. Technol. Explor. Eng. (IJITEE) (2019)
7. Imran, H.A., Wazir, S., Iftikhar, U., Latif, U.: Classifying Human Activities with Inertial Sensors: A Machine Learning Approach (2021). arXiv preprint arXiv:2111.05333
8. Zebin, T., Scully, P.J., Ozanyan, K.B.: Human activity recognition with inertial sensors using a deep learning approach. In: 2016 IEEE SENSORS, pp. 1–3. IEEE (Oct 2016)
9. Oh, S., Ashiquzzaman, A., Lee, D., Kim, Y., Kim, J.: Study on human activity recognition using semi-supervised active transfer learning. Sensors 21(8), 2760 (2021)
10. Schober, P., Vetter, T.R.: Logistic regression in medical research. Anesthesia Analgesia 132(2), 365 (2021)
11. Chapter logistic regression. Stanford University (29 Dec 2021). Retrieved January 3, 2022, from https://www.web.stanford.edu/~jurafsky/slp3/5.pdf
12. Liu, L., Shen, B., Wang, X.: Research on kernel function of support vector machine. In: Advanced Technologies, Embedded and Multimedia for Human-Centric Computing, pp. 827–834. Springer, Dordrecht (2014)
13. Gaye, B., Zhang, D., Wulamu, A.: Improvement of support vector machine algorithm in big data background. Math. Probl. Eng. 2021 (2021)
14. Achirul Nanda, M., Boro Seminar, K., Nandika, D., Maddu, A.: A comparison study of kernel functions in the support vector machine and its application for termite detection. Information 9(1), 5 (2018)
15. Izza, Y., Ignatiev, A., Marques-Silva, J.: On explaining decision trees (2020). arXiv preprint arXiv:2010.11034
16. Sherstinsky, A.: Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D Nonlin. Phenomena 404, 132306 (2020)
17. Zeyer, A., Doetsch, P., Voigtlaender, P., Schlüter, R., Ney, H.: A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2462–2466. IEEE (March 2017)
18. Yang, S., Yu, X., Zhou, Y.: LSTM and GRU neural network performance comparison study: taking yelp review dataset as an example. In: 2020 International Workshop on Electronic Communication and Artificial Intelligence (IWECAI), pp. 98–101. IEEE (June 2020)
Chapter 3
Explainable Artificial Intelligence and Mobile Health for Treating Eating Disorders in Young Adults with Autism Spectrum Disorder Based on the Theory of Change: A Mixed Method Protocol
Omobolanle Omisade, Alex Gegov, Shang-Ming Zhou, Alice Good, Catherine Tryfona, Sandeep Singh Sengar, Amie-Louise Prior, Bangli Liu, Taiwo Adedeji, and Carrie Toptan
Abstract Autistic children often face difficulties eating well into their early adolescence, putting them at a greater risk of developing disordered eating habits during this developmental stage. Research suggests that mobile devices are easily accessible to young adults, and their widespread use can be leveraged to provide support and intervention for autistic young adults in preventing and self-managing eating disorders (ED). By utilising Explainable Artificial Intelligence (XAI) and Machine Learning (ML) powered mobile devices, a progressive learning system can be developed that provides essential life skills for independent living and improved quality of life. In addition, XAI can enhance healthcare professionals’ decision-making abilities by utilising trained algorithms that can learn, providing a therapeutic benefit for preventing and mitigating the risk of eating disorders. This study will utilise the theory of change (ToC) approach to guide the investigation and analysis of the complex integration of autism, ED, XAI, ML, and mobile health. This approach will be complemented by user-centred design, Patient and Public Involvement and Engagement (PPIE) tasks, activities, and a mixed method approach to make the integration more rigorous, timely, and valuable. Ultimately, this study aims to provide essential life skills to autistic young adults to prevent and self-manage eating disorders using XAI-powered mobile devices.
O. Omisade (B) · C. Tryfona · S. S. Sengar · A.-L. Prior
Cardiff Metropolitan University, Llandaff, Cardiff, UK
e-mail: [email protected]
A. Gegov · A. Good · T. Adedeji · C. Toptan
University of Portsmouth, Portsmouth, UK
S.-M. Zhou
University of Plymouth, Plymouth, UK
B. Liu
De Montfort University, Leicester, UK
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_3
3.1 Background
According to the Centers for Disease Control and Prevention, eating disorders (ED) in young adults (YA) have become a primary public health concern [1]. Avoidable deaths from ED are still reported [2]. Patient Support Groups, the National Centre of Excellence for Eating Disorder, and the National Eating Disorder Association have continued to emphasise the seriousness of ED in YA. In 2019, the UK government indicated that improving ED services is a key priority and a fundamental part of its commitment to improving mental health services. An estimated 1.25 million people in the UK have an ED [3, 4], with a prevalence of 1.2–2.4% for anorexia nervosa (AN) and 1.2–2.3% for bulimia nervosa (BN), depending on diagnostic criteria [5]. An individual with AN has a distorted sense of their body and a profound fear of gaining weight, even when they are already underweight. The frequent food restriction of those who have AN can result in severe malnutrition and other health issues. BN is characterised by recurring episodes of binge eating followed by purging activities, including self-induced vomiting, abusing laxatives, or excessive exercise. During binge eating episodes, people with BN frequently feel out of control and may quickly consume a large amount of food. Although limited research specifically examines the prevalence of ED in YA, ED is reported to be twice as common in people aged 16–20 [6, 7]. Greater autistic traits in childhood could represent a risk factor for developing disordered eating in adolescence [8]. Autism spectrum disorder (hereafter referred to as autism) and ED can co-exist, as 20–30% of YA with eating disorders have autism [9]. ED in autistic people significantly impacts their physical, psychological, and social well-being, and ED has some of the highest mortality rates of any mental illness [4]. Autistic people with ED are more likely to struggle because they are already disadvantaged and are a subgroup known to have relatively poor outcomes [10].
Even with the coordinated effort of multiple organisations, including Child and Adolescent Mental Health Services (CAMHS) and programmes by the Maudsley Centre for Child and Adolescent Eating Disorders (MCCAED), the understanding and national strategy to improve sustained intervention outcomes of ED in YA with autism remain inconsistent at best. For example, Family Therapy for Anorexia Nervosa (FT-AN) is the first-line treatment for adolescents with ED in the UK and has proven effective [10]. However, it is noted that between 10 and 40% of young people have poor treatment, engagement, and attendance outcomes [11]. In addition, these interventions (all existing treatments for YA with ED will hereafter be referred to as interventions) can be resource intensive and require the physical attendance of people with ED. In another case, Cognitive Behavioural Therapy requires approximately twice the number of sessions as FT-AN [12]. As a result, many young people receive costly inpatient treatment, with a high risk of relapse and readmission [11].
When young people are discharged from inpatient care, they may struggle to transition back to their daily lives and maintain the progress they made during treatment. This can be due to a lack of support and resources or underlying factors contributing to their mental health. Furthermore, it is unclear how the effects of the intensive treatment programme by MCCAED would be replicated in another sample [11], such as YA with autism who develop ED. In addition, disparities in patient engagement with online interventions have previously been reported concerning socioeconomic status, race, and literacy [13]. As a result, young people with ED need additional intervention, adaptation, and personalisation [3, 10, 11] that are not explicitly available. Besides, there are still concerns regarding the privacy and accessibility of intervention [13]. Much attention has been given to the quality of ED interventions. However, all the interventions mentioned so far suffer from the fact that YA frequently do not respond or have easy access to these interventions in formats that engage them [11, 14]. Notably, for an intervention to have sustained outcomes, it needs to be accessible and engaging, and the user should be able to self-manage it [15]. Nevertheless, autistic people enjoy interacting and engaging with computers and electronic devices [16]. When intelligent, these computer systems can engage users, encourage help-seeking behaviours, foster recovery motivation, contribute to positive self-concept development, and strengthen self-esteem, which could improve sustained intervention outcomes [3]. Furthermore, the ubiquitous nature of mobile devices can make intervention delivery more accessible. Artificial Intelligence (AI) can be used in various ways to enhance mHealth for people with mental health concerns. 
Intelligent mobile applications underpinned by Explainable Artificial Intelligence (XAI) and Machine Learning (ML) to prevent and self-manage ED could provide essential life skills necessary for independent living, well-being, and improved quality of life through progressive learning, finding structure and regularities in data. In addition, XAI can improve health monitoring and has the potential to advance healthcare professionals’ knowledge and decision-making by using trained algorithms that can acquire skills [17, 18]. The therapeutic benefit is assisting practitioners in preventing and mitigating the risk of ED and facilitating sustained treatment outcomes in a context that allows easy interaction, accessibility, and engagement [18]. Also, with the potential of XAI and Deep Learning (DL) algorithms, it is possible to detect ED early so that proper support can be provided. XAI algorithms can analyse data from mobile devices to identify patterns that may be used to recommend engagement and self-care strategies. By analysing patterns in data from mobile devices, DL algorithms can identify warning signs of disordered eating and alert individuals and healthcare professionals. This would include an analysis of individuals’ eating habits, physical activity, and mood to identify potential risks for disordered eating behaviours. Furthermore, XAI can help explain the reasoning behind the algorithms’ predictions and provide individuals with a better understanding of the factors that may contribute to their risk of developing ED. This can empower individuals to make informed decisions about their health and take action to prevent the development of
disordered eating behaviours. With an effective XAI and mobile health (MH) adjunct intervention, autistic YA will have a rich source of improved health management at their fingertips without the need for resource-intensive interventions. While current autistic ED sufferers will likely not have in-depth knowledge of using AI and MH to manage their well-being, the next generation is more likely to consider them familiar interventions. Despite the evident upsurge of perceived benefit and interest in XAI and MH, the science, learning, policies, and outcomes of AI and MH interventions for ED in autistic YA are not yet sufficiently recognised, fully understood, or engaged with, even by practitioners. This is shown in different studies tracking the quality of intervention to facilitate improved quality of life for people with ED and how they are cited. We searched appropriate databases (Medline, PsycINFO, PubMed, ScienceDirect, and Web of Science) for existing evidence using keywords such as “XAI support for YA with ED” and “mobile app support for autistic YA with ED”, which identified no existing theory-based research. To the best of our knowledge, there are no person-centred XAI-driven mobile health interventions to improve treatment engagement, accessibility, self-management, sustained treatment outcome, and overall well-being of autistic YA with ED. There is an opportunity to explore how to maximise the benefits of XAI, ML, and MH adjunct interventions for this group towards an inclusive and sustainable outcome.
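To make the idea of explaining a prediction more concrete, the following minimal sketch assumes a simple additive (logistic) risk model over the three kinds of signal mentioned above: eating habits, physical activity, and mood. The feature names, weights, and bias are hypothetical placeholders chosen for illustration. They are not taken from this protocol or from any validated clinical model. The sketch only shows that, with an additive model, each feature's contribution to the score can be reported alongside the prediction.

```python
import math

# Hypothetical, illustrative weights only; not clinically validated.
WEIGHTS = {
    "skipped_meals_per_week": 0.40,
    "exercise_hours_per_week": 0.10,
    "low_mood_days_per_week": 0.30,
}
BIAS = -3.0  # hypothetical intercept


def explain_risk(features):
    """Return (probability, ranked contributions) for an additive logistic model."""
    # Each feature contributes weight * value to the log-odds.
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    # Rank features so the one pushing the score up the most comes first.
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return probability, ranked


# Hypothetical weekly summary for one user.
example = {
    "skipped_meals_per_week": 5,
    "exercise_hours_per_week": 4,
    "low_mood_days_per_week": 3,
}
probability, ranked = explain_risk(example)
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
print(f"estimated probability: {probability:.2f}")
```

An additive model is used here only because its explanation is easy to read. Deep learning models of the kind mentioned above would need a separate explanation technique.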
3.2 Aims and Objectives
This project aims to bring together people in the broad spectrum of ED, autism, XAI, Machine Learning, and MH in a novel multidisciplinary approach to discover and analyse ED treatment engagement, accessibility, self-management, and sustainable outcome gaps. The objectives of this project are to:
1. Identify factors that influence intervention outcomes for autistic YA with ED.
2. Investigate therapeutic interventions to promote engagement, interaction, accessibility, recovery, self-management, and sustained outcomes in YA with comorbidity of autism and ED.
3. Explore how Explainable Artificial Intelligence and mobile health implementation can effectively benefit autistic YA with ED interventions.
4. Establish best practices, barriers, and ethical implications for integrating MH technology for ED intervention.
3.3 Method
This research project will be carried out in accordance with ethical principles. Ethical issues associated with this project will be assessed by the National Research Ethics Service (NRES) for England and the Integrated Research Application System (IRAS) Wales board, which are committed to enabling and supporting ethical research that is of potential benefit to science and society while protecting the rights and safety of research participants.
3.3.1 Study Design
To rigorously achieve the aim and objectives and manage the project process, this research will adopt the theory of change (ToC). ToC is an outcomes-based approach that starts with long-term goals and maps backwards to the inputs and preconditions required to achieve these goals for complex programmes [19]. ToC is deemed appropriate because it will make the novel integration of autism, ED, XAI, ML, and MH more rigorous, timely, and valuable. Furthermore, it will serve as a tool that guides the enquiry and analysis of the complex integration. ToC makes plans for technology integration more grounded and helps assure the quality of the change it brings, which is validated against three criteria: how plausible, doable, and testable it is [19]. Importantly, it provides a structure for different techniques to plug in and guides the overall process of the project in achieving its goals. Figure 3.1 shows the use of the theory of change to inform this research, highlighting each component and its involvement. The ToC map illustrates components such as the problems the research is trying to solve, the key stakeholders, assumptions, inputs, interventions, outputs, measurable effects, and wider benefits of the implementation to realise the long-term change. In the context of this study, the ToC components use these definitions:
• Long-term change: the outcome desired by stakeholders
• Problems: the difficulties identified by stakeholders
• Stakeholders: the people directly or indirectly involved or affected by the success or failure of the AI-driven mobile health application implementation
• Assumptions: stakeholders are challenged to make explicit assumptions and beliefs that specify the underlying reasons for the logical connections among the ToC elements. This may be based on the empirical knowledge of expert practitioners or research evidence
• Inputs: the activities or tasks carried out around the intervention
• Interventions: the activities carried out within an initiative or programme that result in the desired outcomes
• Outputs: the tangibles resulting from the inputs and the intervention
• Measurable effects: the immediate indicators that can be traced back to the implementation process and are easily usable for evaluation
• Wider benefits: generalisable pointers that can guide the stakeholders about the chances of implementing long-term change.
ToC will be complemented by the unique combination of user-centred design, Patient and Public Involvement and Engagement (PPIE) tasks, activities, and a mixed method approach. The iterative nature of user-centred design (UCD) and its multistage problem-solving process will facilitate analysing the likelihood of usage and
ensure that assumptions are tested for validity regarding user behaviour in real-world tests with actual users. By adopting a UCD approach, this project places users at the centre of decision-making. PPIE activities will facilitate a comprehensive understanding of user needs, contexts, and environments.
Fig. 3.1 Adoption of the theory of change for this research
The research design (Fig. 3.2) will include using evidence obtained from the literature review and secondary data analysis of the Clinical Practice Research Datalink (CPRD) to facilitate the initial findings and assumptions. The CPRD is a population-based, longitudinal primary care database that covers a representative sample of the UK population in terms of age, sex, and ethnicity [20]. The review will also explore various neuroimage processing and classification algorithms, focusing on interventions for autistic YA with ED. Through an analysis of the similarities between these algorithms, a new classification method will be developed to categorise interventions effectively based on various constraints and factors. This will enable the identification and modelling of interventions more accurately. Moreover, existing artificial neural network approaches and modelling methods will be identified and tested based on current therapeutic and academic best practices. Furthermore, organising various PPIE activities for elicitation and knowledge transfer will foster collaboration and facilitate the exchange of new ideas and insights with the scientific community, potential user groups, and the wider public. These activities will include talk shows (including exhibitions) and workshops, as well as conferences and training sessions that bring together thinkers, makers, investors, and
researchers across the area of ED to explore the relationship between innovative research and clinical health interventions, share best practices, and bridge practical needs, academic knowledge, and experience. The PPIE activities will be driven and refined by user-centred evaluation that addresses the entire user experience. The process will involve users throughout the project and will be iterative.
Fig. 3.2 Research design
In addition, people with lived experiences (PWLE) will participate in PPIE activities, including qualitative research such as focus groups and interviews and quantitative research that will be collected through an online survey. The research team will consider the requirements of PWLE as determined by the PPIE activities, objectives, and feedback. The needs of PWLE will be iteratively aligned with the perception of experts, practitioners, and secondary data. The team will constantly improve the user experience through feedback on PPIE activities and
will gradually introduce changes as it gains a better understanding of the target audience. The satisfaction of users’ needs and wants will be a priority, and every design decision will be evaluated in terms of its value to users. Adopting a user-centred design approach will add an emotional impact to the development process, making people with lived experiences feel assured and confident that they were involved in critical decisions to support their well-being. It is anticipated that this will facilitate dissemination and overall impact. Moreover, a pragmatic mixed methodology will be used to allow for holistic exploration and evaluation, producing a complete picture of the phenomena being studied. This will lead to a convergent design using primarily qualitative methods. The qualitative research will involve focus groups and interviews during PPIE activities to collect purposively sampled data. Based on this understanding, a classification method will be developed to categorise interventions considering constraint factors for identification and modelling purposes. This algorithm will be modelled, trained, and tested to ensure its effectiveness and reliability. Once validated, it will serve as the foundation for an adaptable conceptual framework tailored to the specific needs of YA with autism and ED. This conceptual framework will be designed to be implemented in a sustainable and user-centred mobile application that supports the target population in managing their well-being.
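The backwards mapping that the ToC relies on, described at the start of this section, can be sketched as a small dependency graph that is walked from the long-term change back to the problems. The component names follow the definitions given earlier, but the dependency links between them are an assumed ordering for illustration only. They are not a transcription of Fig. 3.1.

```python
from collections import deque

# Assumed links: each component maps to the components that must precede it.
# This ordering is illustrative and is not taken from Fig. 3.1.
PRECONDITIONS = {
    "long_term_change": ["wider_benefits"],
    "wider_benefits": ["measurable_effects"],
    "measurable_effects": ["outputs"],
    "outputs": ["interventions", "inputs"],
    "interventions": ["assumptions"],
    "inputs": ["assumptions"],
    "assumptions": ["stakeholders"],
    "stakeholders": ["problems"],
    "problems": [],
}


def map_backwards(goal, preconditions):
    """Breadth-first walk from the goal back through its preconditions."""
    order = []
    seen = {goal}
    queue = deque([goal])
    while queue:
        component = queue.popleft()
        order.append(component)
        for earlier in preconditions.get(component, []):
            if earlier not in seen:
                seen.add(earlier)
                queue.append(earlier)
    return order


backward_plan = map_backwards("long_term_change", PRECONDITIONS)
print(" <- ".join(backward_plan))
```

Reading the printed chain from left to right follows the planning direction of the ToC, from the desired outcome back to the problems identified by stakeholders.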
3.3.2 Participants
Participants in this research are researchers, health practitioners, autistic YA with ED, their carers, and family. All participants must be willing to participate in the research and will not be obligated to join. For autistic YA with ED to participate in the study, their carers, family, or guardian will be consulted to ensure that inclusion/exclusion criteria are met. Interested participants will first access information describing the aims, objectives, and eligibility criteria of the study and will then provide informed consent before participating. All participants will be informed about the study topics to minimise the risks and possibility of reliving the adverse effects of ED on YA. We will also provide resources to support participants if they become distressed. The subject will be changed if the autistic YA with ED starts to get upset, and in severe circumstances, the participant may be removed from the research activity and the entire project. We will ensure that there will always be someone on hand to comfort any upset or distressed participants. The carer and a representative from the support group, Autism and ED Society, will be present at all activities. Furthermore, for face-to-face sessions, since this might be a new environment for the participants, they will be offered a quiet space to decompress. We will ensure that high, medium, and low support needs will be met throughout the project. If the researcher believes the participant’s safety is at risk at any point, the ED or autism representative will be notified and consulted. This person will be bound by the confidentiality terms set out in the consent form. Participants will be provided with detailed information about the project and can withdraw from the study at any point. Once the research has been completed, all the participants will have the opportunity to
have a copy of the final report. If they decide they would like a copy, it will be made available at the earliest possible opportunity, free of charge.
3.3.3 Recruitment
Following ethics approval, an invitation will be extended to the appropriate stakeholders (researchers, health practitioners, autistic YA with ED, their carers, and family) through the Eating Disorder Society, Autism Society, schools, and support organisations that will act as gatekeepers. In addition, clinicians and psychologists will be invited to join the PPIE activities to share their perceptions and expectations and provide necessary medical direction and guidance. However, our initial inquiry on participant engagement revealed that participants are hard to recruit and retain. This is consistent with our further findings that up to 60% of health studies are extended or cancelled due to a shortage of participants or lack of engagement, slowing the progress of valuable research and occasionally exposing study participants to burdens and inconvenience for no benefit [21, 22]. In addition, several ED support groups and societies confirmed that this population is hard to reach. To avoid these issues of lack of participation and engagement in this project, we will also use the Internet and social media to recruit and engage participants. Research shows that this can be the best recruitment and engagement method for hard-to-reach populations [23]. This approach will offer this research promising ways to improve recruitment efforts, reach, efficiency, and effectiveness at a reasonable cost [23, 24]. In particular, social networking sites will facilitate considerable reach and provide access to populations dealing with sensitive, stigmatising health conditions [23]. Features of Facebook, LinkedIn, Twitter, and Instagram such as “liking”, “favouriting”, “replying to”, or “retweeting” will provide robust sharing tools that we can use to encourage the public to spread the word about this research project and recruitment.
However, we are aware that Internet and social media recruitment raises unique ethical dilemmas regarding the principles of Respect for Persons and Concern for Welfare, particularly around privacy, even before the consent to enrol in a study. These threats include the collection of personal and sensitive information from individuals without their knowledge or consent. This might be because individuals either are unaware of the privacy risks of online activity or consciously accept a trade-off to their privacy. Therefore, the Privacy by Design framework for online health research recruitment [23] will be used as a guiding framework to identify issues relevant to social media recruitment of hard-to-reach populations (YA, carers, and their family). The SCOFF questionnaire, a screening tool for ED, will be completed at baseline and before participating in the evaluation of the proposed app [25].
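The protocol does not state how SCOFF responses will be scored. As a minimal sketch, the code below assumes the scoring rule commonly described for SCOFF: five yes/no items, one point per "yes", and a total of two or more taken as a signal that further assessment is warranted. The item keys are hypothetical shorthand derived from the SCOFF acronym, and the result is a screening flag rather than a diagnosis.

```python
# Hypothetical shorthand keys for the five SCOFF items (from the acronym).
SCOFF_KEYS = ("sick", "control", "one_stone", "fat", "food")
# Assumed cut-off: two or more "yes" answers suggest further assessment.
SCOFF_THRESHOLD = 2


def scoff_score(answers):
    """Count the 'yes' answers across the five SCOFF items."""
    missing = [key for key in SCOFF_KEYS if key not in answers]
    if missing:
        raise ValueError(f"unanswered SCOFF items: {missing}")
    return sum(1 for key in SCOFF_KEYS if answers[key])


def needs_further_assessment(answers):
    """Return True when the score reaches the assumed screening threshold."""
    return scoff_score(answers) >= SCOFF_THRESHOLD


# Hypothetical baseline response from one participant.
baseline = {"sick": False, "control": True, "one_stone": False, "fat": True, "food": False}
print(scoff_score(baseline), needs_further_assessment(baseline))
```

Raising an error on unanswered items, rather than treating them as "no", avoids silently under-scoring an incomplete questionnaire.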
3.3.4 Sample Size
The sample size is usually subjective and dependent on available resources, feasibility, time, and expected goal. The “gold standard” for a purposive sample is to achieve saturation, the point of which is impossible to predict. We anticipate about 190 participants for talk shows, 40 for workshops, 50 for training, and 300 for the conference, and we expect this sample to be sufficient to evaluate the utility of the AI-driven mobile health tool.
3.4 Data Analysis
Quantitative and qualitative data will be collected from stakeholders via PPIE activities. In addition, we plan to assess data on autistic YA with ED, particularly individual health records captured with the CPRD service. CPRD collects fully coded patient electronic health records from GP practices, and access will follow approval via CPRD’s Research Data Governance Process [26]. Statistical techniques will be used to summarise and describe the characteristics of the quantitative dataset, and Statistical Package for the Social Sciences (SPSS) will be used to perform these descriptive statistical analyses. The quantitative analysis will examine frequencies, assess the strength and direction of the relationship between pairs of continuous variables, and compare the means of different groups to determine whether significant differences exist. NVivo V12 will be used to organise, interpret, and make sense of the qualitative data. This will include assigning codes to segments of the data and then organising and grouping the codes in different ways to reveal patterns and relationships. In addition to the data collected from the qualitative and quantitative studies, we will review various algorithms of neuroimage processing and classification of autism interventions with characteristics similar to those of the ED interventions.
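The protocol names SPSS for the quantitative analysis. Purely to illustrate what the two analyses described above compute, the following Python sketch calculates a Pearson correlation coefficient (strength and direction of a linear relationship) and a Welch t statistic (difference between two group means). The sample values are hypothetical, and the sketch stops at the test statistic; it does not compute p-values.

```python
import math
from statistics import mean, variance


def pearson_r(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(xs), mean(ys)
    covariance_sum = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariance_sum / math.sqrt(spread_x * spread_y)


def welch_t(group_a, group_b):
    """Welch t statistic for the difference between two group means."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


# Hypothetical survey values, for illustration only.
app_sessions = [1, 2, 3, 4]
engagement = [2, 4, 6, 8]
r = pearson_r(app_sessions, engagement)

group_a = [3.0, 4.0, 5.0, 6.0]
group_b = [5.0, 6.0, 7.0, 8.0]
t = welch_t(group_a, group_b)
print(f"r = {r:.2f}, t = {t:.2f}")
```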
3.5 Outcome
A classification method will be developed to categorise the interventions considering constraint factors for identification and modelling purposes. Also, existing artificial neural network approaches and modelling methods will be identified and tested based on current therapeutic and academic best practices. Once the classification algorithm is developed, modelled, and tested, it will be presented as a conceptual framework that can be implemented in a mobile application, with constraint management for autistic YA with ED outlined. The mobile application will be developed to offer personalised interventions based on the user’s profile, behaviour, and preferences. The app will also provide a range of features and functions that address the key challenges the target population faces. For instance, it may include modules for tracking and managing symptoms, offering
personalised recommendations for coping strategies, and connecting users with peer support networks. The app will also incorporate gamification elements to increase user engagement and motivation, making it more likely that they will adhere to the treatment plan and achieve better outcomes. The app’s design will be informed by user-centred principles and feedback from PWLE to ensure it is intuitive, easy to use, and appealing to the target audience. The research team will also conduct ongoing evaluation and improvement of the app’s functionality and user experience through PPIE activities and feedback from the target population. Overall, the aim is to create a sustainable and effective solution that can improve the quality of life and health outcomes for YA with autism and ED. It is expected that the AI-driven conceptual framework and mobile application will be able to address this group of individuals’ multifaceted and complex needs in a practical and sustainable manner. The framework and application can significantly impact their quality of life and promote better health outcomes by providing a comprehensive approach tailored to the user’s specific needs.
3.6 Dissemination
The Model for Dissemination of Research [27] informs the dissemination strategy, which will begin early in the research with the publication of the research protocol. One of the benefits of using the ToC to guide this project is that it will accommodate the use of processes and variables that determine and influence the dissemination and adoption of knowledge and interventions by various stakeholders. In this context, the dissemination and adoption process begins with the stakeholder becoming aware of the innovation and being interested in understanding how it functions. The PPIE activities engage stakeholders and also serve as dissemination strategies for research utilisation. The PPIE activities will include quizzes, games, and exhibitions that will raise awareness and facilitate the trial of innovations by the target audience, which in turn is associated with an increased rate of innovation adoption. We will use cost-effective methods such as tweets, blogs, YouTube vlog posts, Instagram, and Facebook to disseminate and leverage relevant existing networks to help amplify messages. We will create an online presence and immediate dissemination via the project website, which will act as an online database, communication, and support tool among the stakeholders, allowing efficient and quick dissemination of research results and facilitating the research’s visibility. Figure 3.3 provides more details on the dissemination for this project.
Fig. 3.3 Dissemination of research (adapted from [27])
3.7 Limitation
This study may be associated with the risk of breach of confidentiality. Although most data of interest will be coded, anonymous, and non-identifiable, this study requires the YA’s date of birth, which is necessary to validate the inclusion criteria. Data will also be stored in electronic copies on computers provided by Cardiff Metropolitan University, and data loss or corruption may occur during data transfer or analysis. To reduce the possibility of re-identifying any person whose data will be used in this study, we plan to use the month and year of birth only and not the full date of birth. In addition, we will not link stakeholder birth dates to any geographical data, including postcodes.
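The date-of-birth handling described above can be sketched as follows. This is a minimal illustration, assuming that the full date is reduced to a year and month at the point of collection and that age for the inclusion check is then derived from those two values alone. The function names are hypothetical, and the choice to treat the birthday as falling on the first day of the month is an assumption of this sketch.

```python
from datetime import date


def generalise_birth_date(date_of_birth):
    """Keep only the year and month of birth, for example '2004-07'."""
    return f"{date_of_birth.year:04d}-{date_of_birth.month:02d}"


def age_in_years(birth_year, birth_month, on_date):
    """Age from month and year only, assuming a birthday on the first of the month."""
    years = on_date.year - birth_year
    if on_date.month < birth_month:
        years -= 1  # birthday month not yet reached this year
    return years


# Hypothetical participant; the day of birth is discarded before storage.
stored_value = generalise_birth_date(date(2004, 7, 23))
age_at_screening = age_in_years(2004, 7, date(2023, 6, 30))
print(stored_value, age_at_screening)
```

Because the day is dropped, the derived age can differ from the true age by up to one year during the birth month, which matters only for participants close to an age boundary in the inclusion criteria.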
3.8 Impact In line with the expected research outcomes described in the ToC framework, we will track the project’s successful impact through the number of participants recorded to attend the PPIE activities. In addition, participants, facilitators, and involvement of other relevant stakeholders in the qualitative and quantitative activities will be used to gauge the impact, level of interest, and engagement. The impact will also be measured through standard online metrics, such as the number of shares/likes, page-views,
research citations from the production of dissemination, online media such as videos, press and social media posts, and live social media challenges. Through the project’s development of a network of public participants and stakeholders, including the policymakers in NHS, the impact will be maintained and assessed through longer-term engagement with these communities to encourage research participation, leading to ongoing data collection. Furthermore, one fundamental impact of this work is the increased awareness and facilitation of research on the use of XAI and MH for YA with ASD who have ED. Once these outputs have been disseminated, interest from policymakers, researchers, and NHS practitioners in the further clinical trial and proof-of-concept projects will also help measure the impact.
Chapter 4
Novel Deep Learning Models for Optimizing Human Activity Recognition Using Wearable Sensors: An Analysis of Photoplethysmography and Accelerometer Signals

Rohit Kumar Bondugula and Siba Kumar Udgata

Abstract Human activity recognition identifies the particular activity of an individual by analyzing sensor data. Wearable sensors are often utilized in this method to gather and categorize data. Many wearable devices include sensors for monitoring heart rate and detecting body posture. In this research, we experimented with photoplethysmography (PPG) sensor signals for determining heart rate and accelerometer signals for recognizing body position to classify human activities such as squats, stepper, and resting. We have developed two novel deep learning models, ResTime and Minception, that can effectively recognize human activities from sensor data. These models identified the appropriate time intervals for activity recognition, which led to a decrease in false positives and false negatives. Our experiments on the PPG dataset yielded high accuracy, with ResTime and Minception achieving 98.73% and 98.79%, respectively, surpassing other existing models. We also found that by adjusting the window size and selecting the appropriate model, we were able to optimize accuracy and minimize false positives and false negatives. This allows for a more sophisticated decision-making system for recognizing human activities using wearable sensors.
4.1 Introduction
Human activity recognition (HAR) is a popular research domain with manifold applications in contexts such as industrial settings [1], security surveillance [2], smart homes [3], and health care [4]. One common method for tracking human activity is the use of smart and wearable sensors that are placed directly on the human body.
R. K. Bondugula · S. K. Udgata (B) WiSECom Lab, School of Computer and Information Sciences, University of Hyderabad, Hyderabad 500046, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_4
Photoplethysmography (PPG) is a physiological measurement method that is commonly used in wearable devices to detect human activity. PPG works by measuring the change in the volume of blood in the microvascular tissue. The device emits light, and a photosensor detects the reflected light, which makes it possible to detect changes in the blood flow. This method can provide accurate heart rate estimation and is noninvasive and intraoperative, similar to electrocardiography (ECG) and surface electromyography.

Despite the benefits of using PPG signals to track human activity, it can be difficult to obtain accurate readings while the subject performs diverse activities. One of the challenges is the relative movement between the PPG light detector and the wrist skin, which can corrupt the data. To address this issue, researchers have proposed various signal processing techniques that use PPG and accelerometer data to eliminate motion artifacts and improve the accuracy of the data [5, 6]. Alessandrini et al. [7] proposed a solution for reducing motion artifacts by deploying a recurrent neural network on a low-power, low-cost microcontroller with good prediction accuracy. The network was found to be more accurate than other state-of-the-art techniques, and the proposed architecture is well suited to HAR on wearable devices.

Many researchers have explored the use of deep learning techniques for PPG classification. Heart rate variability (HRV), which is the variation in the time between consecutive heartbeats measured in milliseconds, is useful for detecting cardiovascular diseases such as severe myocardial infarction and uneven heartbeat. The biLSTM algorithm is an improvement over the standard LSTM algorithm, as it allows for the collection of information both before and after specific sample points. To accurately estimate HRV using PPG, an RNN-based biLSTM algorithm was introduced in [8] for cardiac period segmentation and the calculation of three HRV indexes. To evaluate blood pressure, a combination of PPG and ECG signals was used with an RNN structure in [9]. The input hidden layer employed the biLSTM technique to record contextual information in both the backward and forward directions. Additionally, several CNN-based models were studied for heart rate estimation based on PPG data [10]. Bondugula et al. [11–13] proposed a novel MiniRocket framework, DCLS, and MiniVGG models for HAR using PPG, EEG, and inertial sensors. Brophy et al. [14] focused on smart wrist-worn devices that can provide insights into human behavior and well-being through advanced analytics. Aydemir et al. [15] proposed a robust method for PPG segmentation using the Hilbert transform and classification by decision tree, nearest neighbors, and Naive Bayes algorithms. They found that this technique is useful for monitoring heart rates and for the early detection of several atherosclerotic pathologies.

Recent developments in machine learning and portable device technology have enabled the use of deep learning models on embedded microcontrollers, eliminating the need for wearables and the transfer of data to low-power systems. Edge computing has emerged as a solution to minimize latency, cost of communication, privacy issues, and network traffic. Edge devices, on the other hand, are incapable of supporting heavy processing demands. As a result, the deep neural network (DNN) [16] models developed for human activity recognition, which have proven to perform well but at a high computational cost, are inadequate for deployment on edge devices.

The rest of the work is structured as follows: Sect. 4.2 describes the dataset. Sect. 4.3 gives a detailed explanation of the proposed models. Sect. 4.4 presents a thorough analysis of the experimental findings, followed by comparison and remarks in Sect. 4.4.3. Finally, Sect. 4.5 concludes and discusses future work.
4.1.1 Contributions of the Research Work
1. Two novel deep learning models, ResTime and Minception, that can recognize the activities effectively are proposed.
2. The models aim to reduce the false positives and false negatives of each activity for the corresponding window intervals.
3. Based on the window size interval, the appropriate model can be recommended for a particular activity.
4.2 Description of the Dataset
This work uses the publicly accessible dataset [5] of PPG signals recorded through a wrist-worn MAXREFDES100 sensor, as shown in Fig. 4.1, for training and testing. Data collection was conducted with the participation of seven adult volunteers, and the recorded duration of each activity per subject is given in Table 4.1. A specialized weightlifting band, as depicted in Fig. 4.1, is used to securely attach the PPG module to the wrist. The cuff has a tear-off closure, making it flexible and able to adapt to the skin surface, which helps ensure close adhesion of the sensing device to the skin. The dataset's activities are categorized into three classes: (i) squats, (ii) stepper, and (iii) resting.

Fig. 4.1 PPG sensor placement (MAXREFDES100) for data acquisition

Table 4.1 Data acquisition of each subject with time
Subjects   Squats [s]   Stepper [s]   Resting [s]
1          311.597      442.990       3271.7
2          216.797      397.615       2962.8
3          231.495      271.040       1323.8
4          212.575      269.680       1361.9
5          246.295      241.975       1440.9
6          237.370      325.902       1402.0
7          266.860      254.930       1510.7
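The durations in Table 4.1 also indicate how unevenly the three classes are represented. The following is a minimal sketch that sums the recorded seconds per activity and reports each activity's share of the total; the names `durations`, `class_totals`, and `class_shares` are illustrative and not part of the dataset or of the authors' code.

```python
# Durations in seconds per subject, copied from Table 4.1.
durations = {
    "squats":  [311.597, 216.797, 231.495, 212.575, 246.295, 237.370, 266.860],
    "stepper": [442.990, 397.615, 271.040, 269.680, 241.975, 325.902, 254.930],
    "resting": [3271.7, 2962.8, 1323.8, 1361.9, 1440.9, 1402.0, 1510.7],
}


def class_totals(table):
    """Return the total recorded seconds per activity."""
    return {activity: sum(values) for activity, values in table.items()}


def class_shares(table):
    """Return each activity's fraction of the total recording time."""
    totals = class_totals(table)
    grand_total = sum(totals.values())
    return {activity: total / grand_total for activity, total in totals.items()}


if __name__ == "__main__":
    totals = class_totals(durations)
    for activity, share in class_shares(durations).items():
        print(f"{activity:8s} {totals[activity]:10.1f} s  {share:6.1%}")
```

Resting accounts for roughly three quarters of the recorded time, which is worth keeping in mind when reading the per-class results later in the chapter.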
4.3 Proposed Models
4.3.1 ResTime: Residual Network for Time Series
In a ResNet architecture for sequential time series data, the input is a series of time steps representing a sequence of sensor readings. These readings could include data from PPG sensors, accelerometers, gyroscopes, and other sensors commonly used in human activity recognition. The architecture uses a series of convolutional and pooling layers to extract features from the time series. Instead of traditional convolutional layers, temporal convolutional layers, which are designed to work with time series data, are used. These layers extract features from the sensor readings at different time steps and pass them through several fully connected layers to classify the human activity. The residual connections allow better gradient flow and enable the model to learn more complex, nonlinear representations of the data, which is particularly useful for time series data that can be nonlinear and complex in nature. In addition, the ResNet architecture can be adapted to use attention mechanisms, which help the model focus on the most important information in the time series and can further improve the accuracy of activity recognition.

The ResTime model architecture is shown in Fig. 4.2. The input is first processed by a stack of three 1D convolutional layers with 64, 128, and 128 feature maps and kernel sizes of 7, 5, and 3, respectively. The output of this stack is combined with a skip connection and concatenated. The concatenated output is then fed into a second stack of three 1D convolutional layers with the same configuration (64, 128, and 128 feature maps; kernel sizes of 7, 5, and 3), and the output of this stack is again concatenated with a skip connection.

Skip connections connect the output of one block of layers to the input of another block, skipping one or more layers in between. This allows information to flow directly from the input to a deeper layer, bypassing the intermediate layers. Here, they connect the output of a block that processes the data at a specific time step to the input of another block that processes the data at a later time step, which allows the model to maintain information about the current activity state as it processes data from the next time step. The skip connections in ResNet are implemented using identity mapping, in which the output of a block is added element-wise to the input of the next block, as shown in Fig. 4.3. Finally, the output of these convolutional layers and skip connections is fed to a fully connected layer consisting of three units (one per class) for activity prediction.

Fig. 4.2 Proposed ResTime architecture

Fig. 4.3 Complete architecture of the minimized inception model with the block structure. (a) Block structure of the inception module. (b) Block diagram of the mini inception architecture
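The stacked convolutions and skip connections described above can be illustrated with a small NumPy forward pass. This is a sketch under stated assumptions rather than the authors' implementation: the weights are random, the skip connection is realized as an element-wise addition, a kernel-size-1 projection is used where the channel counts differ, global average pooling precedes the three-unit output layer, and the input has four channels (PPG plus three accelerometer axes) over 100 time steps. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_same(x, w):
    """Zero-padded 1D convolution that preserves the sequence length.
    x: (time, in_channels); w: (kernel, in_channels, out_channels), odd kernel."""
    k = w.shape[0]
    pad = k // 2
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    # shifted[i, t] holds the input sample at offset i inside the window at t.
    shifted = np.stack([x_pad[i:i + x.shape[0]] for i in range(k)], axis=0)
    return np.einsum("ktc,kco->to", shifted, w)


def relu(x):
    return np.maximum(x, 0.0)


def init(kernel, c_in, c_out):
    """Random weights; a trained model would learn these."""
    scale = np.sqrt(2.0 / (kernel * c_in))
    return rng.normal(0.0, scale, size=(kernel, c_in, c_out))


def make_block(c_in, feature_maps=(64, 128, 128), kernel_sizes=(7, 5, 3)):
    """Three conv layers with the feature maps and kernel sizes from the text."""
    convs, c = [], c_in
    for f, k in zip(feature_maps, kernel_sizes):
        convs.append(init(k, c, f))
        c = f
    # Assumption: project the input with a kernel-size-1 conv when channels differ.
    shortcut = None if c_in == c else init(1, c_in, c)
    return {"convs": convs, "shortcut": shortcut}


def residual_block(x, block):
    y = x
    last = len(block["convs"]) - 1
    for i, w in enumerate(block["convs"]):
        y = conv1d_same(y, w)
        if i < last:
            y = relu(y)
    skip = x if block["shortcut"] is None else conv1d_same(x, block["shortcut"])
    return relu(y + skip)  # element-wise addition of the skip path


def restime_like_forward(window, blocks, dense_w):
    h = window
    for block in blocks:
        h = residual_block(h, block)
    pooled = h.mean(axis=0)          # assumption: global average pooling over time
    logits = pooled @ dense_w        # three output units, one per class
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


n_channels = 4                                   # hypothetical input channels
blocks = [make_block(n_channels), make_block(128)]
dense_w = rng.normal(0.0, 0.1, size=(128, 3))
window = rng.normal(size=(100, n_channels))      # one hypothetical window
probs = restime_like_forward(window, blocks, dense_w)
print(probs.shape, float(probs.sum()))
```

In the second block the input already has 128 channels, so the skip path is a plain identity mapping; only the first block needs the projection.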
Additionally, we proposed another method based on inception architecture, namely mini inception (Minception). The Minception model architecture is shown in Fig. 4.3.
4.3.2 Minception: Minimized Inception Architecture
The minimized inception architecture is adapted for time series data analysis and is used to classify human activities from PPG and inertial sensor data. The model consists of multiple parallel convolutional layers with different kernel sizes, which allows it to learn features at different scales, and it is trained on a dataset of labeled data.

The primary feature of the Minception architecture is the inception module, whose architecture is shown in Fig. 4.3a. The module consists of two one-dimensional convolutional layers in parallel and a max pooling layer. The convolutional layers have 32 and 64 filters with kernel sizes of 3 and 5, respectively. The idea behind the module is to extract features at multiple scales and from multiple perspectives, which allows the network to learn more complex representations of the input activity signal. This module is stacked in parallel with other one-dimensional convolutional layers to form the entire architecture, as shown in Fig. 4.3b. The input to the model is fed in parallel to the inception module, a 1D convolutional layer, and a max pooling layer. The outputs are concatenated and sent to another block of similar structure. Finally, the outputs of the stacked blocks of inception modules and convolutional layers are passed to fully connected layers. For our model, we employed two fully connected layers with 96 hidden units each, both using ReLU activation. The output layer consists of three units, corresponding to the output classes.

The data is preprocessed to extract features such as heart rate and body orientation and then divided into overlapping windows of different sizes. Each window is then passed through the inception model for classification.
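The inception module described above, with two parallel convolution branches and a pooling branch whose outputs are concatenated, can be sketched in NumPy as follows. The weights are random, and several details are assumptions made only for illustration: the ReLU after each convolution, a pooling size of 3 with stride 1, zero padding that keeps the sequence length, and a four-channel input. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1d_same(x, w):
    """Zero-padded 1D convolution that preserves the sequence length.
    x: (time, in_channels); w: (kernel, in_channels, out_channels), odd kernel."""
    k = w.shape[0]
    pad = k // 2
    x_pad = np.pad(x, ((pad, pad), (0, 0)))
    shifted = np.stack([x_pad[i:i + x.shape[0]] for i in range(k)], axis=0)
    return np.einsum("ktc,kco->to", shifted, w)


def maxpool1d_same(x, size=3):
    """Max pooling with stride 1; padding with -inf keeps the length unchanged."""
    pad = size // 2
    x_pad = np.pad(x, ((pad, pad), (0, 0)), constant_values=-np.inf)
    shifted = np.stack([x_pad[i:i + x.shape[0]] for i in range(size)], axis=0)
    return shifted.max(axis=0)


def inception_module(x, w3, w5):
    """Two parallel conv branches (kernel sizes 3 and 5) plus a max pooling
    branch, concatenated along the channel axis."""
    branch3 = np.maximum(conv1d_same(x, w3), 0.0)   # ReLU is an assumption
    branch5 = np.maximum(conv1d_same(x, w5), 0.0)
    branch_pool = maxpool1d_same(x)
    return np.concatenate([branch3, branch5, branch_pool], axis=1)


n_channels = 4                                   # hypothetical input channels
x = rng.normal(size=(50, n_channels))            # one hypothetical window
w3 = rng.normal(size=(3, n_channels, 32)) * 0.1  # 32 filters, kernel size 3
w5 = rng.normal(size=(5, n_channels, 64)) * 0.1  # 64 filters, kernel size 5
out = inception_module(x, w3, w5)
print(out.shape)
```

Because every branch preserves the number of time steps, the branches can be concatenated channel-wise, so the output has 32 + 64 channels from the convolutions plus the input channels carried through the pooling branch.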
The window size and overlap can be varied to adjust the model’s performance. A larger window size would capture more context but might also introduce more noise. Overlapping the windows can help to capture patterns that might span across multiple windows.
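The windowing step can be sketched as a small helper that cuts a signal into fixed-length, overlapping segments. This is an illustrative sketch only; the function name `sliding_windows` and the choice to express the overlap in samples are assumptions, not details taken from the authors' pipeline.

```python
def sliding_windows(signal, window_size, overlap):
    """Split `signal` (a sequence of samples) into windows of `window_size`
    samples, where consecutive windows share `overlap` samples.
    Trailing samples that do not fill a whole window are dropped."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    step = window_size - overlap
    last_start = len(signal) - window_size
    return [signal[start:start + window_size]
            for start in range(0, last_start + 1, step)]


if __name__ == "__main__":
    samples = list(range(10))
    for window in sliding_windows(samples, window_size=4, overlap=2):
        print(window)
```

With a window of 4 samples and an overlap of 2, each sample in the interior of the signal appears in two windows, which is how a pattern that straddles a window boundary can still be seen whole by the classifier.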
4.4 Experiment Results and Discussions
4.4.1 ResTime Model Result
The dataset consists of PPG and triaxial accelerometer signals collected during three different activities: rest, squat, and stepper exercises. To create multiple samples, the signals were divided into smaller segments using windows. Window sizes ranging from 1 to 5 s were employed in our experiments to account for variations in the time required for the heart rate to change during different activities. The ResTime architecture's use of residual connections improves gradient flow and helps the model learn more intricate, nonlinear data representations. This is especially helpful for sequence data, which has a complicated and nonlinear structure.

Table 4.2 shows the performance of the ResTime model. The model achieved test accuracies above 96% across all window sizes, with the highest accuracy of 99.62% at a 3 s window interval and an overall average accuracy of 98.73%. As given in Table 4.4, the ResTime model had fewer false negatives and false positives in this interval and was able to effectively differentiate between the squat activity and other activities. This is likely because smaller window sizes do not capture enough information from the PPG signal to accurately identify irregularities. Fig. 4.4a illustrates the confusion matrix for all intervals. The squat and stepper classes had more FP and FN than the rest class, which is likely due to having less data available for these classes. In comparison with the ResTime model, the Minception model performed slightly better on average across the time intervals.

Table 4.2 ResTime model's results along with the activities and respective window sizes
Model         WS [s]  Activity  Pre   Recall  F1    Spec.  Sup.  Train acc.  Test acc.
ResTime       1       Rest.     1.00  0.99    0.99  1.00   453   97.10       96.57
                      Squat.    0.88  0.98    0.93  0.96   261
                      Step.     0.98  0.91    0.95  0.99   395
                      Avg.      0.95  0.96    0.95  0.98   1109
ResTime       2       Rest.     0.99  1.00    0.99  0.99   442   98.84       98.80
                      Squat.    0.99  0.96    0.97  0.99   264
                      Step.     0.97  0.99    0.98  0.98   373
                      Avg.      0.98  0.98    0.98  0.98   1079
ResTime       3       Rest.     1.00  1.00    1.00  1.00   445   99.96       99.62
                      Squat.    0.99  0.99    0.99  0.99   242
                      Step.     0.99  0.99    0.99  0.99   362
                      Avg.      0.99  0.99    0.99  0.99   1049
ResTime       4       Rest.     1.00  1.00    1.00  1.00   472   99.93       99.51
                      Squat.    0.99  0.98    0.98  0.99   214
                      Step.     0.98  0.99    0.99  0.98   333
                      Avg.      0.99  0.99    0.99  0.99   1019
ResTime       5       Rest.     1.00  1.00    1.00  1.00   428   99.84       99.19
                      Squat.    0.97  0.98    0.98  0.99   237
                      Step.     0.99  0.98    0.98  0.99   324
                      Avg.      0.98  0.98    0.98  0.99   989
Overall Avg.                    0.97  0.98    0.97  0.98   5245  99.13       98.73
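The per-class columns of Table 4.2 (precision, recall, F1, specificity) follow from the support of each class and its false positive and false negative counts. The sketch below applies the standard definitions to the squat class of the ResTime model at a 1 s window, using the support from Table 4.2 and the FP and FN counts reported in Table 4.4; the function names are illustrative, not the authors' code.

```python
def per_class_metrics(support, fp, fn, total):
    """Standard one-vs-rest metrics for a single class.
    support: true samples of the class; total: all test samples."""
    tp = support - fn
    tn = total - support - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return {"precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity}


def accuracy(total, false_negatives):
    """Overall accuracy: every misclassified sample is an FN of its true class."""
    return (total - sum(false_negatives)) / total


if __name__ == "__main__":
    # ResTime, 1 s window, squat class: support 261, FP 33, FN 4, 1109 samples.
    squat = per_class_metrics(support=261, fp=33, fn=4, total=1109)
    for name, value in squat.items():
        print(f"{name:12s} {value:.3f}")
    # FN counts for rest, squat, and stepper at the 1 s window.
    print(f"accuracy     {100 * accuracy(1109, [1, 4, 33]):.2f}")
```

The resulting values are close to the squat row for the 1 s window in Table 4.2, and the accuracy computed from the three FN counts reproduces the reported test accuracy for that window.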
4.4.2 Minception Model Results
The Minception model gave an overall average accuracy of 98.79% across all window sizes, as given in Table 4.3, which is higher than the 98.73% of the ResTime model, a difference of 0.06 percentage points. However, as shown in Fig. 4.4b, the ResTime model performed better than the Minception model when using a 4 s interval, showing a 1.47% improvement in accuracy. This also resulted in fewer false negatives and false positives in that interval, as seen in Table 4.4. Hence, we can conclude that the Minception model is more suitable both for activities that can be identified from shorter time intervals and for activities that require longer time intervals for accurate classification, except for the 4 s window, where ResTime gave better accuracy.

Table 4.3 Minception model's results along with the activities and respective window sizes
Model         WS [s]  Activity  Pre   Recall  F1    Spec.  Sup.  Train acc.  Test acc.
Minception    1       Rest.     1.00  1.00    1.00  1.00   453   98.33       97.84
                      Squat.    0.99  0.91    0.92  0.99   261
                      Step.     0.94  0.99    0.97  0.96   395
                      Avg.      0.97  0.96    0.96  0.98   1109
Minception    2       Rest.     1.00  1.00    1.00  1.00   442   99.88       99.07
                      Squat.    1.00  0.96    0.98  1.00   264
                      Step.     0.97  1.00    0.98  0.98   373
                      Avg.      0.99  0.98    0.98  0.99   1079
Minception    3       Rest.     1.00  1.00    1.00  1.00   445   99.98       99.71
                      Squat.    1.00  0.98    0.99  1.00   242
                      Step.     0.99  1.00    0.99  0.99   362
                      Avg.      0.99  0.99    0.99  0.99   1049
Minception    4       Rest.     1.00  1.00    1.00  1.00   472   99.47       98.04
                      Squat.    1.00  0.90    0.95  1.00   214
                      Step.     0.94  1.00    0.97  0.97   333
                      Avg.      0.98  0.96    0.97  0.99   1019
Minception    5       Rest.     1.00  0.99    0.99  1.00   428   99.98       99.29
                      Squat.    0.99  0.97    0.98  0.99   237
                      Step.     0.98  0.99    0.98  0.99   324
                      Avg.      0.99  0.98    0.98  0.99   989
Overall Avg.                    0.98  0.97    0.97  0.98   5245  99.52       98.79

Fig. 4.4 (a) ResTime model confusion matrix on all the windows. (b) Minception model confusion matrix on all the windows

Table 4.4 False negatives and false positives for each class of the dataset
Window size [s]  Model       Activity  FP  FN  Test samples  Test accuracy  Test time [s]
1                ResTime     Rest.     0   1   1109          96.57          0.51
                             Squat.    33  4
                             Step.     5   33
2                ResTime     Rest.     1   0   1079          98.80          0.59
                             Squat.    2   10
                             Step.     10  3
3                ResTime     Rest.     0   0   1049          99.62          0.77
                             Squat.    2   2
                             Step.     2   2
4                ResTime     Rest.     0   0   1019          99.51          0.94
                             Squat.    1   4
                             Step.     4   1
5                ResTime     Rest.     0   0   989           99.19          1.00
                             Squat.    5   3
                             Step.     3   5
1                Minception  Rest.     0   0   1109          97.84          0.58
                             Squat.    1   23
                             Step.     23  1
2                Minception  Rest.     0   0   1079          99.07          0.78
                             Squat.    0   10
                             Step.     10  0
3                Minception  Rest.     0   0   1049          99.71          1.00
                             Squat.    0   3
                             Step.     3   0
4                Minception  Rest.     0   0   1019          98.04          1.30
                             Squat.    0   20
                             Step.     20  0
5                Minception  Rest.     0   1   989           99.29          1.50
                             Squat.    1   5
                             Step.     6   1
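The model recommendation per window size can be expressed as a simple lookup over the test accuracies reported in Tables 4.2 to 4.4. The sketch below picks the more accurate model for each window and computes the overall averages; the names `test_accuracy`, `best_model_per_window`, and `mean_accuracy` are illustrative.

```python
# Test accuracy [%] per window size [s], copied from Tables 4.2 and 4.3.
test_accuracy = {
    "ResTime":    {1: 96.57, 2: 98.80, 3: 99.62, 4: 99.51, 5: 99.19},
    "Minception": {1: 97.84, 2: 99.07, 3: 99.71, 4: 98.04, 5: 99.29},
}


def best_model_per_window(accuracies):
    """Return, for each window size, the model with the highest test accuracy."""
    windows = sorted(next(iter(accuracies.values())))
    return {w: max(accuracies, key=lambda model: accuracies[model][w])
            for w in windows}


def mean_accuracy(accuracies):
    """Return the unweighted mean test accuracy of each model."""
    return {model: sum(per_window.values()) / len(per_window)
            for model, per_window in accuracies.items()}


if __name__ == "__main__":
    for window, model in best_model_per_window(test_accuracy).items():
        print(f"{window} s window: {model}")
    for model, mean in mean_accuracy(test_accuracy).items():
        print(f"{model}: {mean:.3f} %")
```

The lookup selects Minception for every window except 4 s, where ResTime is ahead, which matches the conclusion drawn in the text.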
4.4.3 Discussions and Performance Comparison
In Table 4.5, we compare the proposed models with baseline models for HAR. For a fair comparison, the results of the proposed methodologies were reported using a one-second window interval. The proposed Minception and ResTime models had the highest accuracies of 98.79% and 98.73%, respectively. This is an improvement of 1.03% and 0.94%, respectively, over MiniVGG, the best-performing baseline model. The VGG-16 [12] and MiniVGG [12] models were used as baselines along with the PBP [17] model, all of which had lower accuracy than our proposed models, as given in Table 4.5. As seen in Table 4.4, the Minception and ResTime models had their highest accuracies of 99.71% and 99.62%, respectively, when using a 3 s window interval, with very few false positives and negatives.

Table 4.5 Comparison of the proposed models with the existing methods
Models            Accuracy [%]
PBP [17]          96.42
KLT + GMM [18]    78.00
RNN/LSTM [19]     95.54
VGG-16 [12]       95.04
MiniVGG [12]      97.75
ResTime           98.73
Minception        98.79
4.5 Conclusions and Future Directions
In this research paper, two novel deep learning architectures, namely ResTime and Minception, are proposed. The models are based on ResNet and inception, respectively, and can be used to recognize human activities. The models were tested using a publicly available dataset that includes triaxial accelerometer and PPG data, and the data was divided into time intervals ranging from 1 to 5 s to evaluate the models' performance on varying lengths of time. The results showed that the Minception model had a highest accuracy of 99.71% and an overall average accuracy of 98.79% across all time intervals. Additionally, the proposed method performed better than other baseline models applied to the same dataset. Furthermore, as given in Tables 4.2 and 4.3, Minception and ResTime have a high overall recall of at least 97% and a high specificity of 98%.

The window size and overlap used in the proposed models can affect their performance. A smaller window size with more overlap can help the model capture more details but increases the computational cost and may lead to over-fitting. A larger window size with less overlap reduces the computational cost but loses some of the details. Therefore, it is important to find the window size and overlap that balance the trade-off between performance and computational cost. In future, we propose to include adaptive and personalized activity recognition targeting individual users, encompassing real-time and transfer learning modalities.
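The computational side of this trade-off can be made concrete by counting how many windows a recording yields for a given window size and overlap. The sketch below assumes a hypothetical 60 s recording sampled at a hypothetical 100 Hz; neither value is taken from the dataset, and `n_windows` is an illustrative helper.

```python
def n_windows(n_samples, window_size, overlap):
    """Number of full windows when consecutive windows share `overlap` samples."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    if n_samples < window_size:
        return 0
    step = window_size - overlap
    return (n_samples - window_size) // step + 1


if __name__ == "__main__":
    sampling_rate = 100              # hypothetical, in Hz
    n_samples = 60 * sampling_rate   # hypothetical 60 s recording
    window = 3 * sampling_rate       # 3 s window
    for fraction in (0.0, 0.5, 0.75):
        overlap = int(window * fraction)
        count = n_windows(n_samples, window, overlap)
        print(f"overlap {fraction:.0%}: {count} windows")
```

Under these assumptions, raising the overlap from 0 to 75% roughly quadruples the number of windows the classifier has to process for the same recording.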
References
1. Stiefmeier, T., Roggen, D., Ogris, G., Lukowicz, P., Tröster, G.: Wearable activity tracking in car manufacturing. IEEE Pervasive Comput. 7(2), 42–50 (2008)
2. Chen, L., Wei, H., Ferryman, J.: A survey of human motion analysis using depth imagery. Pattern Recogn. Lett. 34(15), 1995–2006 (2013)
3. Li, Y., Yang, G., Su, Z., Li, S., Wang, Y.: Human activity recognition based on multienvironment sensor data. Inf. Fusion 91, 47–63 (2023)
4. Bondugula, R.K., Sivangi, K.B., Udgata, S.K.: Identification of schizophrenic individuals using activity records through visualization of recurrent networks. In: Intelligent Systems, 653–664. Springer (2022)
5. Biagetti, G., Crippa, P., Falaschetti, L., Saraceni, L., Tiranti, A., Turchetti, C.: Dataset from PPG wireless sensor for activity monitoring. Data Brief 29, 105044 (2020)
6. Zhang, Z., Pi, Z., Liu, B.: Troika: A general framework for heart rate monitoring using wrist-type photoplethysmographic signals during intensive physical exercise. IEEE Trans. Biomed. Eng. 62(2), 522–531 (2014)
7. Alessandrini, M., Biagetti, G., Crippa, P., Falaschetti, L., Turchetti, C.: Recurrent neural network for human activity recognition in embedded systems using PPG and accelerometer data. Electronics 10(14), 1715 (2021)
8. Xu, K., Jiang, X., Ren, H., Liu, X., Chen, W.: Deep recurrent neural network for extracting pulse rate variability from photoplethysmography during strenuous physical exercise. In: 2019 IEEE Biomedical Circuits and Systems Conference (BioCAS), 1–4. IEEE (2019)
9. Şentürk, Ü., Yücedağ, I., Polat, K.: Repetitive neural network (RNN) based blood pressure estimation using PPG and ECG signals. In: 2018 2nd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), 1–4. IEEE (2018)
10. Reiss, A., Indlekofer, I., Schmidt, P., Van Laerhoven, K.: Deep PPG: large-scale heart rate estimation with convolutional neural networks. Sensors 19(14), 3079 (2019)
11. Bondugula, R.K., Udgata, S.K., Sivangi, K.B.: A novel deep learning architecture and minirocket feature extraction method for human activity recognition using ECG, PPG and inertial sensor dataset. In: Applied Intelligence, 1–26 (2022)
12. Bondugula, R.K., Sivangi, K.B., Udgata, S.K.: A deep learning architecture for human activity recognition using PPG and inertial sensor dataset. In: Next Generation of Internet of Things, 549–562. Springer (2023)
13. Bondugula, R.K., Udgata, S.K., Bommi, N.S.: A novel weighted consensus machine learning model for covid-19 infection classification using CT scan images. Arab. J. Sci. Eng. 1–12 (2021)
14. Brophy, E., Muehlhausen, W., Smeaton, A.F., Ward, T.E.: Optimised convolutional neural networks for heart rate estimation and human activity recognition in wrist worn sensing applications. arXiv preprint arXiv:2004.00505 (2020)
15. Aydemir, T., Şahin, M., Aydemir, O.: A new method for activity monitoring using photoplethysmography signals recorded by wireless sensor. J. Med. Biol. Eng. 40(6), 934–942 (2020)
16. Walse, K.H., Dharaskar, R.V., Thakare, V.M.: PCA based optimal ANN classifiers for human activity recognition using mobile sensors data. In: Proceedings of First International Conference on Information and Communication Technology for Intelligent Systems, vol. 1, 429–436. Springer (2016)
17. Mahmud, T., Akash, S.S., Fattah, S.A., Zhu, W.-P., Ahmad, M.O.: Human activity recognition from multi-modal wearable sensor data using deep multi-stage LSTM architecture based on temporal feature aggregation. In: 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS), 249–252. IEEE (2020)
18. Moghadam, Z.B., Noghondar, M.S., Goshvarpour, A.: Novel delayed Poincare's plot indices of photoplethysmogram for classification of physical activities. Appl. Medical Inform. 43(1), 43–55 (2021)
19. Brophy, E., Veiga, J.J.D., Wang, Z., Ward, T.E.: A machine vision approach to human activity recognition using photoplethysmograph sensor data. In: 2018 29th Irish Signals and Systems Conference (ISSC), 1–6. IEEE (2018)
Chapter 5
Link Prediction in Complex Networks: An Empirical Review

Y. V. Nandini, T. Jaya Lakshmi, and Murali Krishna Enduri

Abstract Any real-world system with entities and interactions between them can be modeled as a complex network. Complex networks are mathematically modeled as graphs, with nodes denoting entities and edges (links) depicting the interactions between entities. Many analytical tasks can be performed on such networks. Link prediction (LP) is one such task; it predicts missing or future links in a complex network modeled as a graph. Link prediction has potential applications in the domains of biology, ecology, physics, computer science, and many more. Link prediction algorithms can be used to predict future scientific collaborations in a collaborative network, recommend friends or connections in a social network, and predict future interactions in a molecular interaction network. The task of link prediction utilizes information pertaining to the graph, such as node neighborhoods and paths. The main focus of this work is to empirically evaluate the efficacy of a few neighborhood-based measures for link prediction. Complex networks are very large and sparse in nature, and choosing the candidate node pairs for future link prediction is one of the hardest tasks. The majority of existing methods consider all node pairs without an edge to be candidates, compute a prediction score for each, and output the node pairs with the highest prediction scores as future links. Due to the massive size and sparse nature of complex networks, examining all node pairs results in a large number of false positives. A few existing works select only a subset of node pairs to be candidates for prediction. In this study, a sample of candidates for LP is chosen based on the hop distance between the nodes. Five similarity-based LP measures are chosen for experimentation. The experimentation on six benchmark datasets from four domains shows that a maximum hop distance of three is optimum for the prediction task.
Y. V. Nandini (B) · T. J. Lakshmi · M. K. Enduri School of Engineering and Sciences, SRM University, Amaravati, Andhra Pradesh, India e-mail: [email protected] T. J. Lakshmi e-mail: [email protected] M. K. Enduri e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_5
5.1 Introduction
A network is a representation of any system that has entities and interactions between those entities. Networks, in which nodes stand for entities and links for relationships between nodes, can be used to depict a variety of social, technological, ecological, and informational systems [1]. Complex networks are dynamic because nodes and links are constantly added and deleted. Link prediction (LP) is the task of predicting future links in a complex network [2]. When an interaction between two nodes does not exist at the current time, the LP problem seeks to estimate the likelihood that it will occur in the future. The problem of link prediction is relevant in various disciplines. Identifying missing interactions between biological entities, such as unknown interactions in protein-protein interaction networks or unknown reactions in metabolic networks, requires expensive laboratory research. Link prediction measures help reduce experimental costs significantly in such applications [3]. Link prediction algorithms can also identify spurious links in computer networks. LP methods are used to propose friends on social networks like Facebook and LinkedIn. Users of online websites such as Amazon might receive product recommendations obtained by predicting links between users and products in a user-item bipartite graph that records purchases. Link prediction in coauthorship networks can suggest future collaborations [4]. Section 5.2 discusses the existing methods of link prediction. Implementation and results are given in Sect. 5.3. Conclusions of this study are given in Sect. 5.4.
5.1.1 Problem Statement
Link prediction is defined as follows: given a network $G(V, E)$, with $V$ and $E$ denoting the node set and edge set, the link prediction task is to generate a list of edges that do not exist in $G(t_0)$ but are expected to form in $G(t_n)$ for $t_0 < t_n$ [2]. Figure 5.1 depicts the problem. In Fig. 5.1, a new edge has been established between nodes $c$ and $i$ at time $t_n$. The following steps are typically included in link prediction.
• Divide the network data into train and test sets.
• List the node pairs without an edge in the training set.
• For each such node pair, assign a prediction score that estimates how likely a link is to form in the future.
• Sort the node pairs in descending order of the computed scores and deliver the top $k$ node pairs as the predicted list.
• Evaluate the performance using the links in the test set.
The following section discusses the existing measures that assign prediction scores to node pairs.
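The steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names (`split_edges`, `top_k_predictions`, `precision_at_k`), the random 80/20 split, the use of common neighbors as the score, and precision at $k$ as the evaluation are all assumptions made for the example.

```python
import random
from itertools import combinations


def split_edges(edges, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the edges as the test set."""
    rng = random.Random(seed)
    shuffled = sorted(edges)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def build_adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, p, q):
    """Example scoring function: number of shared neighbors."""
    return len(adj[p] & adj[q])


def top_k_predictions(train_edges, score, k):
    """Score every non-adjacent pair in the training graph and return the top k."""
    adj = build_adjacency(train_edges)
    candidates = [(p, q) for p, q in combinations(sorted(adj), 2) if q not in adj[p]]
    ranked = sorted(candidates, key=lambda pq: score(adj, *pq), reverse=True)
    return ranked[:k]


def precision_at_k(predicted, test_edges):
    """Fraction of predicted pairs that appear in the held-out test edges."""
    held_out = {frozenset(e) for e in test_edges}
    hits = sum(1 for pair in predicted if frozenset(pair) in held_out)
    return hits / len(predicted) if predicted else 0.0


# Hypothetical toy training graph (not one of the benchmark datasets).
toy_train = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
toy_top2 = top_k_predictions(toy_train, common_neighbors, k=2)
```

Any other scoring function with the same `(adj, p, q)` signature can be passed in place of `common_neighbors`.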
Fig. 5.1 Link prediction
5.2 Link Prediction Measures
LP measures are mainly categorized into similarity-based (neighborhood-based), probabilistic, and learning-based measures [5]. Figure 5.2 summarizes these measures. The following notation is used throughout this study.
• $p, q$: Two nodes in the network.
• $N(p)$: Neighborhood set of node $p$.
• $n$: Number of nodes in the network.
Neighborhood-based methods use a simple approach in which a similarity score is calculated for each pair of nodes $p$ and $q$. The scores are sorted, and the pairs with the highest scores are predicted to form links in the future. These measures are called local if the computation involves only the local neighborhood, and global if path information is used in computing the LP score [6]. Examples of global measures include Katz, Random Walk with Restart [7], and Shortest Path. Quasi-local measures use a combination of the two. Probabilistic models usually require information other than graph topology, such as knowledge of node or edge attributes [8, 9]. It is difficult to gather such attribute information due to privacy issues. Dimensionality reduction-based measures utilize matrix factorization and embedding techniques. The main focus of this study is on local neighborhood-based LP measures, which are discussed in the following section.
Fig. 5.2 Classification of LP measures

5.2.1 Neighborhood-Based Measures
Common neighbors and node degrees are typically used in the calculation of local indices. Five local neighborhood measures are described below.
1. Common Neighbors (CN) [10]: This measure works on the intuition that if two nodes have many neighbors in common, then the probability of link formation between them increases. $\mathrm{CN}(p, q)$ is given in Eq. 5.1.

$$\mathrm{CN}(p, q) = |N(p) \cap N(q)| \tag{5.1}$$

For a simple undirected graph, $\mathrm{CN}(p, q) = A^2[p][q]$, where $A$ is the adjacency matrix of graph $G$, because entry $[p][q]$ of $A^2$ counts the paths of length two between $p$ and $q$.
2. Jaccard Coefficient (JC) [5]: The Jaccard Coefficient is the normalized common neighbor measure. It is the fraction of common neighbors among all neighbors of the two nodes. $\mathrm{JC}(p, q)$ is defined as

$$\mathrm{JC}(p, q) = \frac{|N(p) \cap N(q)|}{|N(p) \cup N(q)|} \tag{5.2}$$

3. Preferential Attachment (PA) [11]: A node with a high degree is expected to connect to more nodes in the future. The "richness" of a node pair is measured by multiplying the degrees of nodes $p$ and $q$. $\mathrm{PA}(p, q)$ is defined as follows:
$$\mathrm{PA}(p, q) = |N(p)| \cdot |N(q)| \tag{5.3}$$
Only the degrees of the nodes are required for this measure. As a result, the computational complexity of PA is the lowest.
4. Adamic/Adar Index (AA) [12]: Adamic and Adar introduced a metric to determine the similarity between two web pages based on shared traits. Liben-Nowell et al. [12] modified this metric and used it to predict links between web sites.

$$\mathrm{AA}(p, q) = \sum_{r \in N(p) \cap N(q)} \frac{1}{\log |N(r)|} \tag{5.4}$$

where $|N(r)|$ is the degree of node $r$. The equation shows that common neighbors with lower degrees are given more weight. This makes sense in the real world as well; for instance, someone with more friends will spend less time and fewer resources on each friend than someone with fewer friends.
5. Resource Allocation Index (RA) [13]: Consider two vertices $p$ and $q$ that are not adjacent. Assuming that node $p$ sends some resources to node $q$ through their common neighbors, the similarity between the two vertices is measured as the amount of resources that $q$ receives from $p$. RA can be mathematically represented as

$$\mathrm{RA}(p, q) = \sum_{r \in N(p) \cap N(q)} \frac{1}{|N(r)|} \tag{5.5}$$

Plenty of other measures are available in the literature, such as Cosine Similarity [14], Sorensen Index [15], CAR-based Common Neighbor Index [16], CAR-based Adamic-Adar Index [16], CAR-based Resource Allocation Index [16], CAR-based Preferential Attachment Index [16], Hub Promoted Index and Hub Depressed Index, Local Naive Bayes-based Common Neighbors [17], Leicht-Holme-Newman Local Index [10], Node Clustering Coefficient [18], and Node and Link Clustering Coefficient, which are variations of the above-mentioned measures. One of the challenges in link prediction in complex networks is the selection of candidate node pairs. According to the problem description, all node pairs without an edge between them can be considered as candidate pairs for which the LP score is computed.
However, because complex networks are huge and very sparse, considering all node pairs induces a large number of false positives. To address this problem, many existing works consider only a sample of node pairs as candidates for prediction [19]. In this work, we examine the effect of the hop distance between candidate node pairs on prediction accuracy. The following section presents the experimental setting used in this study.
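The five local measures of Sect. 5.2.1 (Eqs. 5.1 to 5.5) can be written directly over adjacency sets. The following is a minimal sketch under stated assumptions: the graph is simple and undirected, `adj` maps each node to its neighbor set, the natural logarithm is used in AA (the text does not fix the base), and all function names are illustrative.

```python
import math


def cn(adj, p, q):
    """Common Neighbors, Eq. 5.1."""
    return len(adj[p] & adj[q])


def jc(adj, p, q):
    """Jaccard Coefficient, Eq. 5.2 (0 if both neighborhoods are empty)."""
    union = adj[p] | adj[q]
    return len(adj[p] & adj[q]) / len(union) if union else 0.0


def pa(adj, p, q):
    """Preferential Attachment, Eq. 5.3."""
    return len(adj[p]) * len(adj[q])


def aa(adj, p, q):
    """Adamic/Adar, Eq. 5.4. A common neighbor r of two distinct nodes
    has degree >= 2, so log(degree) is strictly positive."""
    return sum(1.0 / math.log(len(adj[r])) for r in adj[p] & adj[q])


def ra(adj, p, q):
    """Resource Allocation, Eq. 5.5."""
    return sum(1.0 / len(adj[r]) for r in adj[p] & adj[q])


# Hypothetical toy graph: a-b, a-c, b-d, c-d, d-e.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
toy_scores = {name: f(toy_adj, "a", "d")
              for name, f in [("CN", cn), ("JC", jc), ("PA", pa), ("AA", aa), ("RA", ra)]}
```

For the pair (a, d) the common neighbors are b and c, both of degree 2, so AA weights each by $1/\log 2$ and RA by $1/2$.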
5.3 Experimentation and Results
5.3.1 Data Set Description
Six network datasets from various disciplines are used in the experimentation. CA-GrQc and ca-netscience [20] are collaboration networks in which nodes represent authors and edges denote scientific collaborations. Web-polblogs [21] is a web graph in which web pages are nodes and hyperlinks are edges. Bio-celegans [22] is the fourth dataset, from the domain of biology, where nodes denote substrates and edges are the metabolic reactions between them. The fifth dataset is the E-road network (Euroroad) [23], a road network located mostly in Europe, in which nodes represent cities and an edge denotes that two cities are connected by an E-road. The sixth dataset is powerGrid [23], in which power stations and substations are represented as nodes and the power lines or transformers act as links between them. A few network statistics of these datasets are given in Table 5.1. Among all the networks, bio-celegans is the densest and has a negative assortativity coefficient. CA-GrQc is sparse with high assortativity. The clustering coefficient is highest for ca-netscience.
Table 5.1 Dataset statistics
| Network | Nodes | Links | Average clustering coefficient | Assortativity coefficient | Density |
|---|---|---|---|---|---|
| CA-GrQc | 5242 | 14496 | 0.5296 | 0.6593 | 0.0010 |
| ca-netscience | 379 | 914 | 0.7412 | −0.0816 | 0.0120 |
| web-polblogs | 643 | 2280 | 0.2320 | −0.2178 | 0.0110 |
| bio-celegans | 453 | 2025 | 0.6464 | −0.2258 | 0.0197 |
| Euroroad | 1174 | 1417 | 0.0167 | 0.1266 | 0.0020 |
| powerGrid | 4941 | 6594 | 0.0801 | 0.0034 | 0.0005 |

5.3.2 Evaluation Metrics
A confusion matrix (Fig. 5.3) can be used to illustrate the evaluation of the prediction performance of link prediction measures [24]. In the confusion matrix,
• True-Positive (TP): Number of node pairs for which the LP measure predicts a link and the link exists in the test set.
• True-Negative (TN): Number of node pairs for which the LP measure does not predict a link and the link does not exist in the test set.
• False-Positive (FP): Number of node pairs for which the LP measure predicts a link but the link does not exist in the test set.
• False-Negative (FN): Number of node pairs for which the LP measure does not predict a link but the link exists in the test set.

Fig. 5.3 Confusion matrix

The other metrics that are based on the confusion matrix are as follows:
$$\mathrm{TPR} = \frac{\#TP}{\#TP + \#FN}; \quad \mathrm{FPR} = \frac{\#FP}{\#FP + \#TN}; \quad \mathrm{Precision} = \frac{\#TP}{\#TP + \#FP} \tag{5.6}$$
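As a small worked illustration of Eq. 5.6, the sketch below counts the four confusion-matrix cells for a set of candidate pairs and derives TPR, FPR, and precision from them. The function names and the toy data are hypothetical; candidate pairs are represented by integers purely for brevity.

```python
def confusion_counts(predicted, actual, candidates):
    """Count TP, FP, FN, TN over the candidate pairs.

    predicted: pairs the LP measure outputs as links
    actual:    pairs that are links in the test set
    """
    predicted, actual = set(predicted), set(actual)
    tp = fp = fn = tn = 0
    for pair in candidates:
        if pair in predicted:
            if pair in actual:
                tp += 1
            else:
                fp += 1
        elif pair in actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """TPR, FPR and precision as in Eq. 5.6 (0.0 when a denominator is zero)."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return tpr, fpr, precision


# Ten hypothetical candidate pairs, three predicted, four truly present.
toy_counts = confusion_counts(predicted={0, 1, 2}, actual={1, 2, 3, 4}, candidates=range(10))
toy_rates = rates(*toy_counts)
```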
Threshold-curve metrics evaluate the performance of LP measures across all score thresholds. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) are two such measures.
• AUROC [25]: The ROC curve plots the true positive rate (sensitivity) on the Y-axis against the false positive rate (1 − specificity) on the X-axis. Both rates can be calculated using Eq. 5.6. Sensitivity measures performance on the positive part of the dataset, and specificity measures performance on the negative part. The area under the ROC curve is a single-number summary statistic with a range between 0 and 1 [25].
• AUPR [24]: AUPR is a single-number summary statistic of the performance of a binary classifier (predictor). It is calculated from the precision-recall curve, which plots precision on the Y-axis against recall on the X-axis. Precision is given in Eq. 5.6, and recall is identical to TPR. The higher the AUPR value, the better the model. As link prediction is highly imbalanced, with a huge number of negative (non-existing) links compared to positive (existing) links, AUPR is the more appropriate measure [19].
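Both summary statistics can be computed from a list of scores and binary labels. The sketch below is illustrative only and makes two assumptions not stated in the text: AUROC is computed through its rank interpretation (the probability that a random positive pair is scored above a random negative pair, with ties counted as one half), and AUPR is approximated by average precision, which is one common step-wise estimate of the area under the precision-recall curve.

```python
def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example.

    Tied scores are ordered arbitrarily by the sort.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda sy: sy[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical scores for four candidate pairs; label 1 means the link is in the test set.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
```

The pairwise AUROC computation is quadratic in the number of candidates, so it suits only small illustrations; a rank-based formula would be preferable on the full candidate sets.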
5.3.3 Results
The five similarity-based LP measures discussed in Sect. 5.2.1 are chosen for experimentation. The experiments focus specifically on the large number of false positives caused by the selection of candidate node pairs.
Table 5.2 AUPR results of LP measures based on hop distance between candidate node pairs
| LP measure | Candidate node pairs | CA-GrQc | ca-netscience | web-polblogs | bio-celegans | Euroroad | powerGrid |
|---|---|---|---|---|---|---|---|
| CN | All | 0.404 | 0.451 | 0.054 | 0.094 | 0.001 | 0.016 |
| CN | 2-hop | 0.501 | 0.459 | 0.085 | 0.081 | 0.064 | 0.096 |
| CN | 3-hop | 0.462 | 0.386 | 0.058 | 0.112 | 0.003 | 0.047 |
| CN | 4-hop | 0.464 | 0.442 | 0.070 | 0.081 | 0.022 | 0.042 |
| CN | 5-hop | 0.480 | 0.407 | 0.054 | 0.079 | 0.004 | 0.037 |
| JC | All | 0.263 | 0.192 | 0.006 | 0.031 | 0.003 | 0.013 |
| JC | 2-hop | 0.324 | 0.185 | 0.006 | 0.021 | 0.002 | 0.036 |
| JC | 3-hop | 0.295 | 0.208 | 0.006 | 0.035 | 0.005 | 0.027 |
| JC | 4-hop | 0.292 | 0.233 | 0.005 | 0.014 | 0.020 | 0.024 |
| JC | 5-hop | 0.298 | 0.280 | 0.007 | 0.029 | 0.003 | 0.014 |
| AA | All | 0.527 | 0.442 | 0.046 | 0.121 | 0.003 | 0.011 |
| AA | 2-hop | 0.625 | 0.591 | 0.074 | 0.104 | 0.003 | 0.029 |
| AA | 3-hop | 0.574 | 0.517 | 0.058 | 0.104 | 0.005 | 0.032 |
| AA | 4-hop | 0.572 | 0.520 | 0.055 | 0.126 | 0.005 | 0.026 |
| AA | 5-hop | 0.556 | 0.563 | 0.066 | 0.138 | 0.003 | 0.028 |
| PA | All | 0.020 | 0.005 | 0.012 | 0.047 | 0.000 | 0.001 |
| PA | 2-hop | 0.130 | 0.079 | 0.049 | 0.038 | 0.009 | 0.023 |
| PA | 3-hop | 0.070 | 0.246 | 0.021 | 0.053 | 0.005 | 0.010 |
| PA | 4-hop | 0.048 | 0.021 | 0.016 | 0.072 | 0.003 | 0.006 |
| PA | 5-hop | 0.026 | 0.011 | 0.013 | 0.036 | 0.003 | 0.004 |
| RA | All | 0.493 | 0.521 | 0.039 | 0.177 | 0.001 | 0.014 |
| RA | 2-hop | 0.544 | 0.541 | 0.057 | 0.187 | 0.005 | 0.022 |
| RA | 3-hop | 0.581 | 0.587 | 0.076 | 0.143 | 0.003 | 0.024 |
| RA | 4-hop | 0.569 | 0.564 | 0.057 | 0.136 | 0.005 | 0.016 |
| RA | 5-hop | 0.591 | 0.539 | 0.067 | 0.130 | 0.005 | 0.025 |
1. Step 1: The network is divided into two parts: a training set containing 80% of the links and a test set containing the remaining 20% of the links. Five sets of candidate node pairs are formed from the training set of each network, as explained below.
• All: This set contains all node pairs without an edge.
• 2-hop: All node pairs within a distance of 2 hops.
• 3-hop: All node pairs within a distance of 3 hops.
• 4-hop: All node pairs within a distance of 4 hops.
• 5-hop: All node pairs within a distance of 5 hops.
2. Step 2: The link prediction measures are applied to each of these sets individually.
3. Step 3: The performance of the LP measures on each network for each candidate node-pair set is evaluated using AUROC and AUPR.
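The hop-limited candidate sets of Step 1 can be generated with a breadth-first search that stops expanding at the hop limit. The sketch below is one possible reading of the procedure, not the authors' code: it assumes that a $k$-hop set contains the non-adjacent pairs whose shortest-path distance in the training graph is between 2 and $k$, and the function name `pairs_within_hops` is hypothetical.

```python
from collections import deque


def pairs_within_hops(adj, max_hops):
    """Non-adjacent node pairs at shortest-path distance 2..max_hops."""
    pairs = set()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_hops:
                continue  # do not expand beyond the hop limit
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if d >= 2:  # distance 0 is the node itself, 1 is an existing edge
                pairs.add(tuple(sorted((source, target))))
    return pairs


# Hypothetical path graph a-b-c-d-e used only to illustrate the growth of the sets.
toy_path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
two_hop = pairs_within_hops(toy_path, 2)
three_hop = pairs_within_hops(toy_path, 3)
```

Each larger hop limit yields a superset of the previous candidate set, and the "All" set additionally contains pairs that lie in different connected components.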
Fig. 5.4 AUPR scores of link prediction measures on CA-GrQc network

Table 5.3 AUROC results of LP measures based on hop distance between candidate node pairs
| LP measure | CA-GrQc | ca-netscience | web-polblogs | bio-celegans | Euroroad | powerGrid |
|---|---|---|---|---|---|---|
| CN | 0.978 | 0.944 | 0.868 | 0.903 | 0.550 | 0.738 |
| JC | 0.964 | 0.922 | 0.737 | 0.782 | 0.534 | 0.682 |
| AA | 0.971 | 0.942 | 0.830 | 0.919 | 0.515 | 0.694 |
| PA | 0.650 | 0.579 | 0.841 | 0.791 | 0.438 | 0.520 |
| RA | 0.966 | 0.930 | 0.846 | 0.943 | 0.528 | 0.677 |
The AUPR results are tabulated in Table 5.2. The results indicate that restricting the candidates to node pairs within a distance of 2 hops often gives better prediction accuracy than using all node pairs, supporting the claim that the connection becomes weaker as the distance between the nodes increases. To test this claim in depth, we conducted experiments up to 10 hops. The AUPR results up to 10 hops for the five LP measures on the CA-GrQc network are given in Fig. 5.4. The AUPR score is high at 2 hops, decreases slightly at 3 hops, then remains steady, and is lowest when all node pairs are taken. The node pairs within a 2-hop distance are far fewer than all node pairs. Therefore, considering node pairs within 2 hops not only improves prediction performance but also reduces the computation required. Out of the five LP measures considered, Adamic-Adar predicted future links more accurately, and Preferential Attachment is the least performing for almost all networks. As AUROC is the classical evaluation measure, the best AUROC scores are presented in Table 5.3. In terms of AUROC, Common Neighbors produced the most accurate predictions on most networks, and Preferential Attachment has the lowest prediction performance on most networks.
5.4 Conclusion
Link prediction in complex networks is one of the significant analytical tasks in many domains. In this work, five similarity-based link prediction measures are evaluated on six networks from various domains. We have taken a sample of node pairs from the training set as candidate node pairs for which prediction scores are computed. It is observed that node pairs within a 2-hop distance exhibit better prediction accuracy than all node pairs. Limiting the candidate node pairs based on hop distance not only improves prediction performance but also significantly reduces the computation required.
References
1. Albert, R., Barabási, A.: Statistical mechanics of complex networks. Rev. Mod. Phys. 74(1), 47 (2002)
2. Yao, L., Wang, L., Pan, L., Yao, K.: Link prediction based on common-neighbors for dynamic social network. Procedia Comput. Sci. 83, 82–89 (2016)
3. Stumpf, M.P.H., Thorne, T., Silva, E.D., Stewart, R., An, H.J., Lappe, M., Wiuf, C.: Estimating the size of the human interactome. Proc. Nat. Acad. Sci. 105(19), 6959–6964 (2008)
4. Ben Schafer, J., Frankowski, D., Herlocker, J., Sen, S.: Collaborative filtering recommender systems. In: The Adaptive Web, pp. 291–324. Springer (2007)
5. Kumar, A., Singh, S.S., Singh, K., Biswas, B.: Link prediction techniques, applications, and performance: a survey. Physica A Stat. Mech. Appl. 553, 124289 (2020)
6. Jaya Lakshmi, T., Durga Bhavani, S.: Link prediction measures in various types of information networks: a review. In: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 1160–1167. IEEE (2018)
7. Tong, H., Faloutsos, C., Pan, J.-Y.: Fast random walk with restart and its applications. In: Sixth International Conference on Data Mining (ICDM'06), pp. 613–622. IEEE (2006)
8. Wang, C., Satuluri, V., Parthasarathy, S.: Local probabilistic models for link prediction. In: Seventh IEEE International Conference on Data Mining (ICDM 2007), pp. 322–331. IEEE (2007)
9. Neville, J.: Statistical Models and Analysis Techniques for Learning in Relational Data. University of Massachusetts Amherst (2006)
10. Lü, L., Zhou, T.: Link prediction in complex networks: a survey. Physica A Stat. Mech. Appl. 390(6), 1150–1170 (2011)
11. Barabási, A.-L., Jeong, H., Néda, Z., Ravasz, E., Schubert, A., Vicsek, T.: Evolution of the social network of scientific collaborations. Physica A Stat. Mech. Appl. 311(3–4), 590–614 (2002)
12. Adamic, L., Adar, E.: Friends and neighbors on the web. Soc. Netw. 25(3) (2003)
13. Zhou, T., Lü, L., Zhang, Y.-C.: Predicting missing links via local information. Euro. Phys. J. B 71(4), 623–630 (2009)
14. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill (1983)
15. McCune, B., Grace, J.B., Urban, D.L.: Analysis of Ecological Communities, vol. 28. MjM Software Design, Gleneden Beach, OR (2002)
16. Cannistraci, C.V., Alanis-Lobato, G., Ravasi, T.: From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks. Sci. Rep. 3(1), 1–14 (2013)
17. Liu, Z., Zhang, Q.M., Lü, L., Zhou, T.: Link prediction in complex networks: a local Naïve Bayes model. EPL (Europhys. Lett.) 96(4), 48007 (2011)
18. Morales, A.J., Losada, J.C., Benito, R.M.: Users structure and behavior on an online social network during a political protest. Physica A Stat. Mech. Appl. 391(21), 5244–5253 (2012)
19. Lichtenwalter, R., Chawla, N.V.: Link prediction: fair and effective evaluation. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 376–383. IEEE (2012)
20. Leskovec, J., Kleinberg, J., Faloutsos, C.: Graph evolution: densification and shrinking diameters. ACM Trans. Knowl. Disc. Data (TKDD) 1(1), 2-es (2007)
21. Adamic, L.A., Glance, N.: The political blogosphere and the 2004 US election: divided they blog. In: Proceedings of the 3rd International Workshop on Link Discovery, pp. 36–43 (2005)
22. Duch, J., Arenas, A.: Community identification using extremal optimization. Phys. Rev. E 72, 027104 (2005)
23. Rossi, R., Ahmed, N.: The network data repository with interactive graph analytics and visualization. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
24. Yang, Y., Lichtenwalter, R.N., Chawla, N.V.: Evaluating link prediction methods. Knowl. Inf. Syst. 45, 751–782 (2015)
25. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
Chapter 6
High Utility Itemset Mining and Inventory Management: Theory and Use Cases Gutha Jaya Krishna
Abstract High utility itemset mining is a recent trend of finding not the most frequent items sold in the store, but finding the items sold of high utility to the store in terms of price and quantity. The knowledge gained from high utility itemset mining can be utilized in multiple ways for managing the inventory of a store. This paper envisions possible use cases of high utility itemset mining to inventory management. The use cases of this paper are based on a few synthetic examples and a real-world dataset in the retail domain. The motivation of the paper is to broaden the horizon by suggesting a few possible uses of high utility itemset mining in inventory management.
G. J. Krishna (B) Administrative Staff College of India, Raj Bhavan Road, Khairtabad, Hyderabad, Telangana 500082, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_6

6.1 Introduction to High Utility Itemset Mining
High utility itemset mining [1–6] is an approach used in data science to find items in transaction datasets that bring high value or profit. It is commonly used in retail and electronic business applications, helping companies optimize their stock management and pricing strategies by uncovering the most popular and profitable commodities. To achieve this, high utility itemset mining algorithms combine association rule mining with cost-benefit analysis to identify meaningful itemsets in large datasets. Consider high utility itemset mining in action [7, 8]: a retail store sells clothing, electronics, and home goods. Its database records the items purchased in every transaction, along with the cost and selling price of each item. To identify highly profitable items, a high utility itemset mining algorithm is executed on the dataset. This approach first identifies groups of frequently purchased items and then computes the overall gain from each item by subtracting its cost from its selling price for all transactions that include the item. To illustrate, the algorithm may detect that a certain type of jeans offers a high benefit because the jeans are regularly bought and have considerable profit margins.
Table 6.1 shows an example of how the data for high utility itemset mining can be represented. Each row of Table 6.1 represents one individual sale. The "Transaction ID" column gives the unique identifier of the transaction, the "Item" column states which item was bought, the "Cost" column gives its cost, and the "Selling Price" column gives the price at which it was sold. Finally, the "Utility" column shows the profit gained from selling the item (calculated as Selling Price − Cost). A high utility itemset mining algorithm [9, 10] can be applied to this dataset to uncover the items with the highest utility. Item A has a utility of $5 per sale and appears in two transactions, Items B and C each have a utility of $4 per sale and appear in two transactions each, and Item D has the highest per-sale utility, $6, in the single transaction in which it appears. The store manager can then choose to keep Item D stocked continually, as it is the most profitable per sale, and may also increase its price or boost its promotion. For Items A, B, and C, which have similar utility values, the shopkeeper may decide to maintain comparable prices and promotional activities or even reduce their price. This is a basic example; in real-world situations the dataset would be significantly bigger and include additional data such as purchase date, time and location, consumer details, and more. High utility itemset mining is a complex process that relies on various mathematical techniques such as set theory, probability, statistics, and optimization [3, 9, 11].
Table 6.1 Sample table to show profit margin, i.e., utility
| Transaction ID | Item | Cost | Selling price | Utility |
|---|---|---|---|---|
| 1 | A | $5 | $10 | $5 |
| 1 | B | $3 | $7 | $4 |
| 2 | A | $5 | $10 | $5 |
| 2 | C | $4 | $8 | $4 |
| 3 | B | $3 | $7 | $4 |
| 3 | D | $6 | $12 | $6 |
| 4 | C | $4 | $8 | $4 |

Set theory: High utility itemset mining aims at recognizing items usually purchased together by locating subsets of items, known as "itemsets", whose frequency or utility in the transaction dataset meets predetermined thresholds such as minimum support or minimum utility. The itemsets that meet the minimum support threshold are termed "frequent itemsets".
Probability and statistics: Support is the key quantity. It measures how likely an itemset is to appear in a transaction and is obtained by dividing the number of transactions in which the itemset appears by the total number of transactions. Utility, meanwhile, is the profit obtained from selling the items, which is why most high utility itemset mining algorithms use optimization techniques to identify the itemsets with maximum returns or the highest chance of occurring [12, 13].
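The notions of support and utility described above can be made concrete on the rows of Table 6.1. The sketch below is only an illustration under simple assumptions: one unit is sold per row, utility is selling price minus cost, and the names `TABLE_6_1` and `item_statistics` are introduced here for the example.

```python
from collections import defaultdict

# Rows of Table 6.1: (transaction ID, item, cost, selling price).
TABLE_6_1 = [
    (1, "A", 5, 10), (1, "B", 3, 7),
    (2, "A", 5, 10), (2, "C", 4, 8),
    (3, "B", 3, 7), (3, "D", 6, 12),
    (4, "C", 4, 8),
]


def item_statistics(rows):
    """Per-item support (share of transactions) and total utility (summed profit)."""
    n_transactions = len({tid for tid, _, _, _ in rows})
    total_utility = defaultdict(int)
    containing = defaultdict(set)
    for tid, item, cost, price in rows:
        total_utility[item] += price - cost
        containing[item].add(tid)
    return {
        item: {
            "support": len(containing[item]) / n_transactions,
            "total_utility": total_utility[item],
        }
        for item in total_utility
    }


stats_6_1 = item_statistics(TABLE_6_1)
```

On this data Item D has the highest per-sale utility but the lowest support, which shows why support and utility thresholds can select different items.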
6.2 Inventory Management and Uses
Inventory management [14–18] is essential for any successful business, as it keeps track of the ordering and storing of materials, parts, and finished products. Sound inventory management strategies can provide numerous advantages to a business, such as:
• Elevated efficiency.
• Fine-tuned inventory levels.
6.3 Motivations and Contributions
6.3.1 Motivations
• Providing insights for a competitive edge. Using this technology gives companies an advantage, since they gain a better comprehension of consumer behavior and demand patterns, allowing them to make informed decisions regarding inventory control and merchandise placement.
• By being aware of dynamic customer behavior and demand fluctuations in real time, organizations can quickly modify their supply chain management strategies to match current market trends. This improves efficiency in providing goods or services and enhances profitability for the company.
6.3.2 Contributions
• This article is a source of knowledge that can bring fresh insights to the field and energize advancement in its sector. By doing so, it has the potential to initiate considerable progress.
• By encouraging us to question long-standing beliefs and assumptions, it has the potential to foster a greater level of critical thinking and exploration.
• This article is a resource for people wishing to uncover new insights. Researchers can use established studies as a launching pad to accelerate their work and broaden the existing body of knowledge gathered over time.
• This article can be an instrumental resource for informing the public, supplying first-hand information on the area of expertise and heightening understanding of topical matters.
• Working across disciplines gives the opportunity to collaborate and exchange new ideas, forming diverse perspectives that can spur innovation. Through this kind of study, a synergy between different teams can be fostered that leads to greater development of knowledge and creative solutions alike.
6.4 High Utility Itemset Mining Use Cases to Inventory Management
6.4.1 Examples
Table 6.2 gives a more elaborate example of high utility itemset mining. It contains the details of a series of transactions, with every row representing one item sold within a transaction. The "Transaction ID" column is the unique identifier of each transaction, the "Item" column identifies what was bought, the "Cost" column denotes how much it cost, and the "Selling Price" column indicates its sale price. Lastly, the "Utility" column shows the profit from that sale (calculated as Selling Price − Cost).

Table 6.2 Sample table to show high utility itemset mining
| Transaction ID | Item | Cost | Selling price | Utility |
|---|---|---|---|---|
| 1 | A | $5 | $10 | $5 |
| 1 | B | $3 | $7 | $4 |
| 1 | C | $4 | $8 | $4 |
| 2 | A | $5 | $10 | $5 |
| 2 | B | $3 | $7 | $4 |
| 3 | A | $5 | $10 | $5 |
| 3 | C | $4 | $8 | $4 |
| 4 | B | $3 | $7 | $4 |
| 4 | D | $6 | $12 | $6 |
| 5 | A | $5 | $10 | $5 |
| 5 | D | $6 | $12 | $6 |
| 6 | B | $3 | $7 | $4 |
| 6 | C | $4 | $8 | $4 |
| 6 | D | $6 | $12 | $6 |

A high utility itemset mining algorithm can be applied to this data to discover itemsets with high utility. For example, {A, B} yields $9 per transaction and occurs in two transactions (1 and 2), {A, C} likewise yields $9 per transaction in two transactions (1 and 3), and {B, D} yields $10 per transaction in two transactions (4 and 6). Item A also appears in {A, D}, which yields $11, the highest per-transaction utility of any pair, although it occurs in only one transaction (5). The store manager should consider always keeping the itemset {A, D} in stock because of its high per-transaction utility. This can be accompanied by a price increase for this set or by promotional campaigns. For all other itemsets, prices and promotions may remain the same or be reduced if necessary.
Online retail datasets provide a wealth of knowledge for mining high utility itemsets. Such datasets usually include details like customer demographics, purchase records, browsing habits, and product data. By applying high utility itemset mining algorithms to these data, it is possible to detect products with higher value that can be used to enhance inventory management, pricing, and promotional strategies. An online retail dataset may include information such as a unique Item ID, the name of the item, the category it belongs to (e.g., clothing, electronics, or home goods), the cost to the retailer per unit, the selling price, and the total number of units sold during a specific time period. To better understand customers, their unique Customer ID can be tracked together with demographics such as age, gender, and location. Additionally, to gain insights into what they are interested in on the website and what they purchased before, both browsing history and purchase history can be recorded.
High utility itemset mining using differential evolution [19–22] is applied to a real-world online retail dataset, and the following results are obtained:
1. RED STAR CARD HOLDER, ROLL WRAP 50'S CHRISTMAS. Utility: 101122.250$
2. RED STAR CARD HOLDER, SET OF 3 COLOURED FLYING DUCKS. Utility: 94721.980$
3. GREEN PEONY CUSHION COVER, RED STAR CARD HOLDER. Utility: 89766.990$
4. PAPERWEIGHT VINTAGE COLLAGE, RED STAR CARD HOLDER. Utility: 88671.330$
5. PINK POLKADOT GARDEN PARASOL, RED STAR CARD HOLDER. Utility: 86096.890$
As displayed above, the top itemset mined is RED STAR CARD HOLDER and ROLL WRAP 50'S CHRISTMAS, with a maximum utility value of 101,122.250$. Customers who purchase both items therefore have the highest potential to generate revenue for the retail store. To further maximize Return-on-Investment (RoI), similar customer segments can be identified and offered targeted products [23, 24] in order to increase sales during their next visit.
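The pair utilities for the transactions in Table 6.2 can be recomputed with a brute-force enumeration. This is a deliberately naive sketch for illustration, not one of the optimized algorithms cited in the text: it assumes one unit per item per transaction, takes the utility of an itemset in a transaction as the sum of its items' utilities, and uses the hypothetical names `pair_utilities` and `high_utility_pairs`.

```python
from collections import defaultdict
from itertools import combinations

# Per-unit utility (selling price minus cost) and transaction contents from Table 6.2.
UNIT_UTILITY = {"A": 5, "B": 4, "C": 4, "D": 6}
TRANSACTIONS = {
    1: {"A", "B", "C"},
    2: {"A", "B"},
    3: {"A", "C"},
    4: {"B", "D"},
    5: {"A", "D"},
    6: {"B", "C", "D"},
}


def pair_utilities(transactions, unit_utility):
    """For every item pair: number of transactions containing it and its summed utility."""
    result = defaultdict(lambda: {"count": 0, "total": 0})
    for items in transactions.values():
        for pair in combinations(sorted(items), 2):
            result[pair]["count"] += 1
            result[pair]["total"] += unit_utility[pair[0]] + unit_utility[pair[1]]
    return dict(result)


def high_utility_pairs(utilities, min_utility):
    """Pairs whose total utility over the dataset meets the minimum utility threshold."""
    return sorted(pair for pair, u in utilities.items() if u["total"] >= min_utility)


pairs_6_2 = pair_utilities(TRANSACTIONS, UNIT_UTILITY)
```

On this data {A, D} has the highest utility within a single transaction ($11), while {B, D} has the highest total over the dataset ($20), so the ranking depends on which of the two quantities the threshold is applied to.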
6.4.2 Use Cases
High utility itemset mining is a valuable technique for inventory management. By recognizing the most profitable and sought-after items, one can determine which items should be given precedence regarding stock levels, costs, and advertising campaigns [25, 26].
Retail: Retail businesses rely on inventory management to monitor and manage the movement of goods, from creation to sale. This helps to guarantee that adequate stock is available when customers need it.
Manufacturing: The manufacturing sector depends on inventory management to monitor raw materials, components, and finished goods, in order to hold the items needed to meet production needs and to avoid shortages.
Health care: Hospitals and clinics use inventory management to monitor the movement of medical supplies and equipment, in order to ensure that an adequate amount is available and that patient needs are consistently met.
Examples of use cases include:
• Stock optimization.
• Boosting earnings.
• Strategic store layout.
• Utilizing high utility itemset mining, retailers can predict future consumer demand.
• By utilizing high utility itemset mining, fraudulent activity can be detected and prevented.
6.5 Future Directions and Conclusions
High utility itemset mining has been a beneficial asset for inventory management, helping to optimize stock levels, pricing strategies, and promotion tactics. Despite its effectiveness, many aspects of this technology still have scope for further development and enhancement:
• E-commerce and IoT devices.
• Predictive analytics.
• A more personalized shopping experience.
High utility itemset mining promises a wealth of opportunities in inventory management. As technology progresses and retailers and wholesalers explore this capability further, they can apply it to make better decisions that will drive greater profits. Ultimately, high utility itemset mining offers an abundance of advantages for inventory management. As technology continues to advance, so does the potential
6 High Utility Itemset Mining and Inventory Management: Theory …
that this incredible tool has in improving decision-making processes and increasing profits. Retailers and wholesalers have already experienced success through its usage, and harnessing its power further can prove to be beneficial on a much larger scale.
Declaration We have taken permission from competent authorities to use the images/data as given in the paper. In case of any dispute in the future, we shall be wholly responsible.
References
1. Lin, J.C.W., Yang, L., Fournier-Viger, P., Hong, T.P., Voznak, M.: A binary PSO approach to mine high-utility itemsets. Soft. Comput. 21, 5103–5121 (2017). https://doi.org/10.1007/s00500-016-2106-1
2. Lin, J.C.-W., Yang, L., Fournier-Viger, P., Wu, J.M.-T., Hong, T.-P., Wang, L.S.-L., Zhan, J.: Mining high-utility itemsets based on particle swarm optimization. Eng. Appl. Artif. Intell. 55, 320–330 (2016). https://doi.org/10.1016/j.engappai.2016.07.006
3. Chan, R., Yang, Q., Shen, Y.-G.: Mining high utility itemsets. In: Third IEEE International Conference on Data Mining, pp. 19–26. IEEE Computer Society, Melbourne, FL, USA (2003). https://doi.org/10.1109/ICDM.2003.1250893
4. Fournier-Viger, P., Wu, C.-W., Zida, S., Tseng, V.S.: FHM: faster high-utility itemset mining using estimated utility co-occurrence pruning. In: Andreasen, T., Christiansen, H., Cubero, J.C., Raś, Z.W. (eds.) 21st International Symposium on Methodologies for Intelligent Systems, pp. 83–92. Springer, Cham, Roskilde, Denmark (2014). https://doi.org/10.1007/978-3-319-08326-1_9
5. Zida, S., Fournier-Viger, P., Lin, J.C.W., Wu, C.W., Tseng, V.S.: EFIM: a highly efficient algorithm for high-utility itemset mining. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), pp. 530–546. Springer (2015). https://doi.org/10.1007/978-3-319-27060-9_44
6. Liu, M., Qu, J.: Mining high utility itemsets without candidate generation. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM’12), pp. 55–64. ACM Press, Maui, Hawaii, USA (2012). https://doi.org/10.1145/2396761.2396773
7. Zihayat, M., An, A.: Mining top-k high utility patterns over data streams. Inf. Sci. 285, 138–161 (2014). https://doi.org/10.1016/J.INS.2014.01.045
8. Han, J., Kamber, M., Pei, J.: Data Mining: Concepts and Techniques. Elsevier (2011)
9. Yao, H., Hamilton, H.J., Butz, C.J.: A foundational approach to mining itemset utilities from databases. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 482–486. Society for Industrial and Applied Mathematics, Philadelphia, PA (2004). https://doi.org/10.1137/1.9781611972740.51
10. Bhattacharyya, S.: Evolutionary algorithms in data mining. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’00), pp. 465–473. ACM Press, New York, New York, USA (2000). https://doi.org/10.1145/347090.347186
11. Yao, H., Hamilton, H.J.: Mining itemset utilities from transaction databases. Data Knowl. Eng. 59, 603–626 (2006). https://doi.org/10.1016/J.DATAK.2005.10.004
12. Christian, A.J., Martin, G.P.: Optimization of association rules with genetic algorithms. In: 2010 XXIX International Conference of the Chilean Computer Science Society, pp. 193–197 (2010). https://doi.org/10.1109/SCCC.2010.32
13. Shenoy, P.D., Srinivasa, K.G., Venugopal, K.R., Patnaik, L.M.: Dynamic association rule mining using genetic algorithms. Intell. Data Anal. 9, 439–453 (2005)
14. Singh, D., Verma, A.: Inventory management in supply chain. Mater. Today Proc. 5, 3867–3872 (2018). https://doi.org/10.1016/J.MATPR.2017.11.641
15. Ivanov, D., Tsipoulanidis, A., Schönberger, J.: Inventory Management, pp. 385–433 (2021). https://doi.org/10.1007/978-3-030-72331-6_13
16. MacAs, C.V.M., Aguirre, J.A.E., Arcentales-Carrion, R., Pena, M.: Inventory management for retail companies: a literature review and current trends. In: Proceedings of the 2021 2nd International Conference on Information Systems and Software Technologies (ICI2ST 2021), pp. 71–78 (2021). https://doi.org/10.1109/ICI2ST51859.2021.00018
17. Agrawal, N., Smith, S.A.: Optimal inventory management for a retail chain with diverse store demands. Eur. J. Oper. Res. 225, 393–403 (2013). https://doi.org/10.1016/J.EJOR.2012.10.006
18. Ehrenthal, J.C.F., Honhon, D., van Woensel, T.: Demand seasonality in retail inventory management. Eur. J. Oper. Res. 238, 527–539 (2014). https://doi.org/10.1016/J.EJOR.2014.03.030
19. Krishna, G.J., Ravi, V.: High utility itemset mining using binary differential evolution: an application to customer segmentation. Expert Syst. Appl. 181, 115122 (2021). https://doi.org/10.1016/J.ESWA.2021.115122
20. Eltaeib, T., Mahmood, A.: Differential evolution: a survey and analysis. Appl. Sci. 8, 1945 (2018). https://doi.org/10.3390/app8101945
21. Engelbrecht, A.P., Pampara, G.: Binary differential evolution strategies. In: 2007 IEEE Congress on Evolutionary Computation, pp. 1942–1947. IEEE, Singapore (2007). https://doi.org/10.1109/CEC.2007.4424711
22. Krishna, G.J., Ravi, V.: Feature subset selection using adaptive differential evolution: an application to banking. In: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, pp. 157–163. ACM Press, New York, New York, USA (2019). https://doi.org/10.1145/3297001.3297021
23. Krishna, G.J., Ravi, V.: Mining top high utility association rules using binary differential evolution. Eng. Appl. Artif. Intell. 96, 103935 (2020). https://doi.org/10.1016/J.ENGAPPAI.2020.103935
24. Sarath, K.N.V.D., Ravi, V.: Association rule mining using binary particle swarm optimization. Eng. Appl. Artif. Intell. 26, 1832–1840 (2013). https://doi.org/10.1016/j.engappai.2013.06.003
25. Krishna, G.J., Ravi, V.: Evolutionary computing applied to customer relationship management: a survey. Eng. Appl. Artif. Intell. 56, 30 (2016). https://doi.org/10.1016/j.engappai.2016.08.012
26. Krishna, G.J., Ravi, V.: Evolutionary computing applied to solve some operational issues in banks. In: Datta, S., Davim, J. (eds.) Optimization in Industry. Management and Industrial Engineering, pp. 31–53. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-01641-8_3
Chapter 7
Using Clustering Approach to Enhance Prioritization of Regression Test Cases Umakanta Dash, Arup Abhinna Acharya, and Satya Ranjan Dash
Abstract Regression testing is necessary to maintain software quality, but it is expensive. Test case prioritization is a popular strategy for lowering this expense. Regression testing is performed to check for faults when a change is made to an existing system. Scheduling test cases with a test case prioritization technique makes testing more effective at meeting specified performance criteria. Many researchers have developed regression test case prioritization algorithms based on clustering methodologies to minimize cost and improve the fault detection ability of testing. In this paper, we describe a method that can be used to increase the effectiveness of various clustering techniques. Code complexity and code coverage are used in prioritization strategies that rely on clustering to enhance the effectiveness of the prioritization. Ambiguities and uncertainties are present in the process of choosing an appropriate test case and locating incorrect functionalities.
U. Dash (B) · A. A. Acharya: School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Deemed to be University, Bhubaneswar, Odisha, India
S. R. Dash: School of Computer Application, Kalinga Institute of Industrial Technology (KIIT), Deemed to be University, Bhubaneswar, Odisha, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_7
7.1 Introduction
Using a prioritization strategy to plan the order in which test cases are executed helps to improve regression testing. Enhancing the commercial value of software systems requires substantial progress in software testing. Researchers have benefited from test selection, test suite minimization, and test case prioritization techniques. Test suite reduction discards a subset of the test cases, and this reduction revealed a significant decline in
the frequency of fault discovery [1, 2]. Testing can begin with test case prioritization (TCP), which is a type of development testing. Test case prioritization is discussed by many researchers using a variety of approaches. The objective of prioritization strategies that take coverage into account is to expand coverage: before test cases are scheduled for regression testing, they are prioritized to optimize some objective function [1]. The following priority groups, for example, may be defined for the test scenarios:
Priority 1: The test cases must be run so that serious defects are fixed before the release of the final product.
Priority 2: The test cases may be executed if there is time.
Priority 3: The test cases are not crucial before the current release. They may be run soon after the current software version is released.
Priority 4: The test case has little to no significance because its impact is small.
Such a priority system ensures the absence of low-priority test cases from the software. The main objective of this study is to evaluate and identify current methods for prioritizing the data sets, focusing primarily on clearly stated TCP research goals. Several approaches that enhance the efficiency and efficacy of regression testing are compared in this research. Within the research, these methods are grouped under the headings of eliminating, selecting, and prioritizing [2]. This motivates the following points:
(1) To better comprehend their relationships, we propose a novel 2-way clustering that groups both test cases and inappropriate functions.
(2) We introduce a novel similarity measure that takes advantage of the measurement capability and offers data smoothing for efficient convergence.
(3) We consider how the data are distributed when calculating the dominancy and similarity metrics, in order to optimize the accuracy-time trade-off.
The rest of the paper is organized as follows. The second section summarizes the background of TCP and the clustering approach, together with the related work. The third section describes prioritization techniques that incorporate different clustering approaches. The findings of the experiment, their analysis, and the conclusions of the study are described in the last section.
7.2 Background Studies and Related Work
A. Test Case Prioritization (TCP)
The principal objective of this technique is to reduce the expense of regression testing. Test case prioritization increases the effectiveness of the testing procedure. To achieve a certain purpose, such as a high fault detection rate or sufficient code coverage, test cases must be prioritized and scheduled in some sequence [2]. Test case prioritization can address, among others, the following objectives:
(a) To help software testers and developers find more faults faster.
(b) To identify high-risk problems early in software testing.
(c) To increase the likelihood of revealing regression faults caused by particular code modifications early in the testing procedure.
(d) To increase the rate at which coverable code is covered.
By ranking test cases according to a specific technique, test case prioritization serves
a function [3]. "Code-based test case prioritization" refers to the process of ranking test cases according to the system's source code. Code-based methods rely on information that connects the tests in the test suite to different parts of the code of the original system (before modification). Code-based test prioritization aims to achieve early fault discovery during regression testing of the modified system code [1]. The code-based methods include [1]:
a. Comparator Techniques
• No prioritization: No technique is applied, i.e., the test suite is left untreated.
• Random prioritization: The test cases are arranged in random order.
• Optimal prioritization: The test cases are arranged in descending order of fault detection rate. This technique is not practical because it requires prior knowledge of which faults exist and which test cases reveal them.
b. Statement-Level Techniques
• Total statement coverage prioritization: Test cases are ranked by the total number of statements they cover and are then sorted in decreasing order of that number.
• Total branch coverage prioritization: Test cases are sorted by the total number of branches they cover.
c. Function-Level Techniques
• Total function coverage prioritization: This is analogous to total statement coverage prioritization, but statements are replaced by functions. Because of the coarser granularity, collecting function-level traces costs less than collecting statement-level traces.
• Fault-Exposing Potential (FEP) prioritization: Coverage alone might obscure an important fact about test cases and faults: a test case's ability to expose a fault depends not only on whether it reaches (executes) a faulty statement but also on the likelihood that the fault in that statement will cause a failure.
B. Using Clustering with TCP
Clustering is among the most important unsupervised learning approaches; it focuses on finding structure in an unlabeled data set, here the test suite. A set of items is separated into clusters so that the items in a group are more similar to one another than to the items of other groups. Pair-wise comparison is extremely resilient due to its redundancy, but because of its high cost it cannot be used directly to prioritize test cases. A person may reliably perform up to 100 comparisons [1] before consistency fails significantly and effectiveness is reduced. A test suite could therefore include only 14 test cases if it had to require fewer than 100 pair-wise comparisons, since n test cases need n(n − 1)/2 comparisons. Scalability is a difficult problem to solve in real-world situations. The total number of pair-wise comparisons necessary, for instance, would be 1,124,250 if there were 1500 test cases to rank. To expect a human tester to accurately respond to
so many comparisons is impractical [4]. There are many different clustering methods, including K-means, CLARANS, BIRCH, CURE, DBSCAN, OPTICS, STING, and CLIQUE. These algorithms can be divided into further categories; partitioning, hierarchical, and density-based are three common ones. Clustering-based prioritization techniques are used to prioritize groups of test cases rather than a single test case at a time [5]. Figure 7.1 shows how different clustering approaches are applied to TCP for a test suite.
C. K-Means Clustering
K-means clustering is one of the best-known partitioning approaches and one of the simplest unsupervised learning algorithms for the clustering problem. The purpose is to divide the data into k clusters using an iterative relocation strategy that reaches a local minimum, where k is a predefined input value. The initial step involves selecting k randomly chosen centers, one for each cluster. Before a data point is assigned to the cluster closest to it, the distance between each data point in the data set and the cluster centers must first be determined [6]. The distance is frequently calculated as the Euclidean distance. Initial
Fig. 7.1 Systematic data flow of TCP using clustering approaches
grouping is completed once every data point has been assigned to a cluster. After that, new centers are determined by averaging the points in each cluster. This is done because the newly assigned points may shift the cluster centers. The center update is repeated for several iterations until the criterion function reaches its minimum or the centers stop changing. Drawbacks: (a) when clusters have different sizes and densities, K-means can have issues; (b) when the data include outliers, K-means has an issue.
D. DBSCAN Clustering Algorithm
DBSCAN is a clustering algorithm that uses the estimated density distribution of the input nodes to identify groups. Reachability and connectivity are critical at different densities. Density reachability determines whether two nearby points are part of the same cluster [6]. A shortcoming is that DBSCAN produces a good clustering only if the distance measure used in the function that obtains neighbors (P, Epsilon) is appropriate. The Euclidean distance measure may lose almost all of its value, especially for high-dimensional data.
E. Hierarchical Clustering Algorithm
Hierarchical algorithms are either agglomerative or divisive. (a) Agglomerative algorithms merge groups over several iterations based on a distance metric, starting with each object as its own cluster. Clustering may end when every object is grouped together or at any other point specified by the user. In most cases, these techniques use a greedy, bottom-up merging. (b) Divisive algorithms use the opposite approach. They start by creating a single group that contains all of the objects and then divide that group into subgroups to arrange the objects into the desired clusters [6]. At each stage, a divisive technique splits the data objects into groups, and the process repeats until each object is placed in its own cluster.
F. Related Work
Elbaum et al. [2] discussed ways to bring down the cost of regression testing. One of the advantages is an increased fault detection rate of the test suite. They suggested a new approach that integrates test suite selection and TCP into the test strategy and described how to use time frames to track faults in test suites [2]. Prioritization of Requirements for Test (PORT), a value-based, system-level testing technique developed by Srikanth et al. [5], helps identify the most severe system test failures more quickly and boosts the effectiveness of the software industry. Additionally, they created the minimum set of PORT requirements that may be effectively employed with TCP [5]. Korel and Koutsogiannakis [8] proposed several model-based test case prioritization techniques. Their experiments compared code-based and model-based prioritization, used a heuristic TCP rule set, and relied on extended observation of a finite state machine; they showed that code-based TCP is inferior to model-based TCP [8, 9]. Empirical evaluations of regression test selection methods were subjected to a systematic review [9]. The review also provided a qualitative
analysis of the findings, a description of regression test selection methods, and relevant empirical evidence. Because the outcomes are determined by a variety of factors, no technique was deemed superior [9]. The purpose of the research in [10] was to explain how PORT's system-level prioritization affects the rate at which severe faults are discovered. To address the shortcomings of the DBSCAN and K-means clustering algorithms, Mumtaz and Duraiswamy [11] developed a novel density-based K-means clustering method, which resulted in an enhanced version of the algorithm [11]. Carlson et al. [12] presented prioritization strategies that use code coverage, code complexity, and history data on real faults together with a clustering methodology. The clustering method described in their paper enhances the prioritization of test cases [12]. The authors of [13] suggested a semi-supervised regression testing method that integrates K-means with a semi-supervised nonlinear dimension [13, 14]. In [14], the UML state-chart diagram was converted into a component interaction graph using a new regression testing prioritization methodology. Catal [15] conducted a systematic literature review on the use of genetic algorithms for test case prioritization. Upadhyay and Misra [16] suggested clustering-based prioritization evaluated with the Average Percentage of Faults Detected (APFD) measure. Malhotra and Tiwari [17] proposed a Java-based, genetic algorithm-based tool to prioritize test cases. The design also emphasizes the importance and advantages of employing the average percentage of block coverage metric as the fitness function in the genetic algorithm [17, 18]. The Requirements, Design, and Code Collaboration test case prioritization method combines systematic requirement specification data, design documents, and source code, and the results show that using collaborative knowledge during the prioritization process can be beneficial [18].
Indumathi and Selvamani [19] implemented an open dependence paradigm in a TCP approach. They discuss how their strategy uses the capabilities of the functional inputs of the suggested method [19] to prioritize the test cases effectively. Pathania and Kaur [20] proposed prioritization methods that make use of code complexity, code coverage, and a clustering strategy to improve the effectiveness of the prioritization. To improve the effectiveness and fault detection rate of the suggested DBK-means clustering, the same test case prioritization method should be used [20]. Wang and Zeng [21] proposed a method for collecting chronological TCP data. Their TCP approach calculates priorities dynamically from historical data, considering the rate of fault detection and the Average Percentage of Faults Detected (APFD) in regression testing [21]. History-based test prioritization, according to Rosero et al. [22], is simple to use. To identify regression faults, they use history-based test prioritization in conjunction with coverage-based TCP [22]. Spieker et al. [23] proposed a machine learning method for classifying test cases based on their paths and detecting errors [23]. Sultan et al. [24] provided a comprehensive overview of numerous research papers, along with their methods, approaches, and methodologies. Many test case prioritization methods are investigated and compared in this review article, which helps researchers determine which methodology is best suited to which scenario [24]. Additionally, each system function was analyzed [24]. Researchers suggest an adaptive random
sequences strategy based on black-box information and clustering methods. The K-means and K-medoids clustering algorithms are used to group test cases based on the number of objects and methods, and the K-medoids algorithm is then used to group test cases based on the similarity of the objects and methods invoked. Experiments performed to evaluate the proposed strategy showed that it was more effective than random prioritization and method coverage TCP techniques [25] and had a higher chance of detecting faults early. The methodology for comparing regression test cases in [26] uses both an ant colony model and a hybrid particle swarm model, with the statement, branch, and fault techniques as the foundation of the idea [26]. Lachmann et al. [27] proposed a supervised machine learning approach for black-box testing on large data sets from the manufacturing industry. By analyzing metadata and natural language artifacts, the authors devised a method for determining the priority value of manually executed test cases [27]. Panwar et al. [28] presented an improved meta-heuristic method for selecting an optimal TCP approach using ant colony optimization. In time-constrained environments, it reorders and selects a suitable set of test cases from the testing process, revealing errors in the modified code [28]. Panda et al. [29] proposed a behavioral model that transforms the state diagram into a state-chart graph. This method is considered useful for determining the critical value of test cases and their requirements, as well as for creating a test case database in chronological order [29]. Panda et al. [30] included the following key points in their paper:
(a) the testing strategies that use a single UML model or a combination of UML diagrams,
(b) the methods for extracting information from the UML model into an intermediate representation,
(c) the meta-heuristic methods used in the model-based testing research field,
(d) a hybrid model-based testing methodology, and
(e) a model-based testing framework based on seven nature-inspired meta-heuristic algorithms and hybrid meta-heuristic algorithms [30].
As discussed, machine learning techniques can help with test case prioritization [31]. Recent research in this field shows that these techniques are very effective in addressing the problem of test case prioritization but have the drawback of requiring large training data sets to produce accurate results [31].
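Of the clustering algorithms described in this section, K-means is the simplest to sketch. The following is a minimal illustration of the assign-then-average iteration described in Sect. 7.2 C, using Euclidean distance. It is not the implementation used by any of the cited works. The function name `kmeans` is hypothetical, and the initial centers are passed in by the caller (rather than drawn at random, as the text describes) so that the result is reproducible.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Assign each point to its nearest center, then recompute the centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: nearest center by Euclidean distance.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(point, centers[j]))
            clusters[nearest].append(point)
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return centers, clusters


# Hypothetical two-dimensional points forming two obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
final_centers, final_clusters = kmeans(points, centers=[(0, 0), (10, 10)])
```

To follow the text literally, a caller could pick the initial centers with `random.sample(points, k)`; the outcome then depends on that random choice, which is one source of the local-minimum behavior mentioned above.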
7.3 Experimental Result and Analysis This section explains how test cases are clustered into groups and how clusters are initialized. Following test case clustering, it continues to show how test case prioritization is beneficial. The effectiveness of clustering approaches will next be assessed using a test suite. The following illustrates test case prioritization using clustering:
A. Procedure: Using Clustering Approach with TCP
Input: $T_i$, a test suite; $PT_i$, the set of permutations of $T_i$; $f$, a function from $PT_i$ to the real numbers.
Output: Find $T_i' \in PT_i$ such that $(\forall T_i'')\,(T_i'' \in PT_i)\,(T_i'' \neq T_i')\,[f(T_i') \geq f(T_i'')]$.
Prioritized test suite $T_i'$
{
  Set $T' = \emptyset$
  {
    For each test case in $T_i = \{TC_1, TC_2, TC_3, TC_4, \ldots, TC_n\}$ do
      Cluster$_i$, $i = 1$ to $n$, over $T_i$
      Product ($PT_i$) = St. Coverage * function call
      Calculate $\frac{1}{n}\sum_{i=1}^{n} T_i$, the mean of the cluster.
  }
  $\forall\, T_i \rightarrow PT_i$.
  $T_i' \rightarrow T_i$
}
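The procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the coverage figures and the cluster assignment are hypothetical inputs, the function name `prioritize` is invented here, and the ordering direction is exposed as a parameter because the text allows either ascending or descending order.

```python
# Hypothetical input: test case -> (statements covered, function calls).
coverage = {"T1": (4, 1), "T2": (6, 2), "T3": (3, 1), "T4": (5, 1)}
# Hypothetical cluster assignment; the procedure does not fix how it is obtained.
clusters = {"C1": ["T1", "T2"], "C2": ["T3", "T4"]}


def prioritize(coverage, clusters, descending=False):
    """Order clusters by mean product, then test cases by product within each."""
    # Product(PT_i) = statement coverage * number of function calls.
    product = {tc: stmts * calls for tc, (stmts, calls) in coverage.items()}
    # Mean product of each cluster: (1/n) * sum of its members' products.
    mean = {
        name: sum(product[tc] for tc in members) / len(members)
        for name, members in clusters.items()
    }
    ordered = []
    for name in sorted(clusters, key=lambda c: mean[c], reverse=descending):
        ordered.extend(
            sorted(clusters[name], key=lambda tc: product[tc], reverse=descending))
    return ordered, product, mean


ascending_order, product, mean = prioritize(coverage, clusters)
descending_order, _, _ = prioritize(coverage, clusters, descending=True)
```

With these inputs the cluster means are 8 for C1 and 4 for C2, so the ascending run schedules the test cases of C2 first and the descending run schedules those of C1 first.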
As discussed earlier, we multiply the number of function calls by the statement coverage. This product functions as the software metric, or the framework, for the set of test cases that have been selected. Since each test case is now assigned a value, the test cases are prioritized in increasing order, starting with the test case with the maximum priority. The example in Fig. 7.2 shows how the statement coverage and the number of function calls are obtained for the proposed method, which outputs the prioritized test suite (in some order, either ascending or descending). The code is tested using seven test cases, and Table 7.1 includes the coverage information. $PT_i$ is determined for each $T_i$ that is run in the test suite. Typically, only the number of statements and the number of function calls that a particular cluster has traversed are used to determine it. It is calculated by multiplying the number of statements by the number of function calls that $C_i$ covers; the results are given in Table 7.2 for the non-prioritized test suite and in Table 7.3 for the prioritized test suite (in ascending order). Figures 7.3 and 7.4 show the difference in the code coverage of the mean cluster, which was minimized when the prioritized method was applied rather than the non-prioritized one, both for the test cases and for the functionality.
B. Research Questions and Their Significance
The following is a summary of several performance-related issues that have been studied to determine how well an algorithm performs:
a. Can the TCP technique be made more effective with a clustering approach?
b. How are these methodological approaches related to the recently developed priority-based techniques for TCP during regression testing?
c. Can a prioritization technique based on clustering outperform non-clustering techniques in terms of optimizing the APFD value? The experimental solution can also support different clustering methods applied to different TCP methodologies.
Fig. 7.2 An illustrative example of sample code
Table 7.1 An illustrative example
Test case | Statement coverage | Total no. of statements covered | No. of function calls | Product ($PT_i$) = statement coverage * number of function calls
TC1 | 1-2-3-4-5-6-7 | 7 | 1 | 7
TC2 | 1-8-9-10-11-12-13 | 7 | 1 | 7
TC3 | 1-2-3-5-7 | 5 | 1 | 5
TC4 | 1-2-3-5-6-7 | 6 | 1 | 6
TC5 | 1-2-3-5-7-8 | 6 | 2 | 12
TC6 | 1-8-9-11-13 | 5 | 1 | 5
TC7 | 1-8-9-11-12-13 | 6 | 1 | 5
Table 7.2 Test cases under non-prioritized using cluster
Cluster ($C_i$) | Test case ($T_i$) | $\frac{1}{n}\sum_{i=1}^{n} T_i$
C1 | TC1, TC2 | 7
C2 | TC3, TC4, TC5 | 7.67
C3 | TC6, TC7 | 5
Statements covered of mean cluster: 6.55
Table 7.3 Test cases under prioritized using cluster
Cluster ($C_i$) | Test case ($T_i$) | $\frac{1}{n}\sum_{i=1}^{n} T_i$
C1 | TC3, TC6 | 5
C2 | TC4, TC5, TC7 | 6
C3 | TC1, TC2 | 7
Statements covered of mean cluster: 6
Fig. 7.3 Test cases under non-prioritized and prioritized using cluster
Fig. 7.4 Comparison between non-prioritization and prioritization with statements covered of mean cluster
d. What are the cost function and the accuracy of the clustering approach? This analytical survey of several experts uses the cost of evaluation for the experimental research.
e. How should the number of clusters for the test suite be decided or calculated? This involves details about the available performance metrics and objects, as well as the reasons for developing and connecting a specific test case prioritization approach.
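Question (c) refers to the APFD value. As a reminder of how that measure is computed, the sketch below uses the standard definition APFD = 1 − (TF_1 + … + TF_m)/(n·m) + 1/(2n), where n is the number of test cases, m the number of faults, and TF_i the position of the first test case that reveals fault i. The fault matrix is hypothetical and is not taken from the chapter's experiment; the function name `apfd` is chosen for illustration, and the sketch assumes every counted fault is revealed by at least one test case in the ordering.

```python
def apfd(order, detects):
    """Average Percentage of Faults Detected for one test ordering.

    order   -- list of test case ids in execution order
    detects -- dict mapping a test case id to the set of faults it reveals
    """
    n = len(order)
    first_position = {}
    for position, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            # Record only the earliest position at which a fault is revealed.
            first_position.setdefault(fault, position)
    m = len(first_position)
    if n == 0 or m == 0:
        raise ValueError("need at least one test case and one detected fault")
    return 1 - sum(first_position.values()) / (n * m) + 1 / (2 * n)


# Hypothetical fault matrix, not taken from the chapter.
detects = {"T1": {"f1"}, "T2": set(), "T3": {"f1", "f2"}}
unprioritized = apfd(["T1", "T2", "T3"], detects)
prioritized = apfd(["T3", "T2", "T1"], detects)
```

Running the test case that reveals both faults first raises the APFD in this toy case, which is the effect a good prioritization is expected to have.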
7.4 Conclusion
The level of testing necessary has a direct impact on the cost of software development. In this work, a clustering-based strategy for TCP was created to increase the effectiveness of code coverage. Under TCP, which schedules test cases in a way that makes them more successful at reaching particular performance targets, prioritized cases typically perform better in the analysis than non-prioritized cases. Future research might focus on the following:
a. The inter-cluster prioritization technique can be enhanced.
b. The approach might consider an additional regression testing parameter that affects the cost of the regression test.
c. Clustering has been tested using synthetic data produced by a Gaussian distribution function. The challenges associated with using the method in a specific application area may be addressed in future research.
Chapter 8
Nucleus Segmentation Using Adaptive Thresholding for Analysis of Blood and Bone Marrow Smear Images
Vikrant Bhateja, Sparshi Gupta, Siddharth Verma, Sourabh Singh, Ahmad Taher Azar, Aimé Lay-Ekuakille, and Jerry Chun-Wei Lin
Abstract The analysis of blood and bone marrow smear images aids in the detection of hematological malignancies. The colour levels of the nucleus and its neighbouring components in these smear images help identify cell malignancy. The primary goal of this work is to perform colour-based segmentation on blood and bone marrow smear images. The images are first subjected to contrast stretching using the dark contrast algorithm (DCA). This enhancement boosts the contrast of the nucleus and its surrounding components. Colour-based segmentation of the enhanced microscopic images is then performed using an adaptive thresholding approach. The colour-segmented image is converted to a binary mask, which is evaluated using the overlap ratio (OR) to determine segmentation performance.

V. Bhateja (B)
Department of Electronics Engineering, Faculty of Engineering and Technology, Veer Bahadur Singh Purvanchal University, Shahganj Road, Jaunpur, Uttar Pradesh 222003, India
e-mail: [email protected]

S. Gupta · S. Verma · S. Singh
Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial College of Engineering and Management, Lucknow, Uttar Pradesh, India

S. Gupta
Vodafone Idea Limited, Gomti Nagar, Lucknow, Uttar Pradesh, India

S. Verma
LTIMindtree Pune Hinjewadi, Pune, Maharashtra, India

A. T. Azar
College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia
Faculty of Computers and Artificial Intelligence, Benha University, Banha 13518, Egypt

A. Lay-Ekuakille
Department of Innovation Engineering, University of Salento, Ed. “Corpo O”, Via Monteroni, 73100 Lecce, Italy

J. C.-W. Lin
Western Norway University of Applied Sciences, Bergen, Norway

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_8
8.1 Introduction
The study of blood cells is the foundation of cell anatomy. This study draws on several features that aid in detecting malignancy in leukocytes (white blood cells, WBCs). Shape and colour variations, among other diagnostic traits, play an important role in the analysis. Colour characteristics not only distinguish healthy from cancerous cells but also help differentiate the types of WBCs. Manual observation is the traditional way of identifying cell types, and colour plays an important part in it. These traditional procedures, however, are prone to inaccuracies due to parallax error [1]. Computer-based analysis aids in identifying and calculating parameters and alterations that are imperceptible to the naked eye [2]. Several colour variations and irregularities can only be estimated through computer-aided analysis [3–6]. Harun et al. [7] suggested a contrast stretching approach that segments the nucleus by combining noise reduction via a median filter with the dark contrast algorithm. Clustering-based segmentation aids in the analysis of colour-based characteristics. Su et al. [8] proposed a strategy for segmenting bone marrow smear images that employs K-means clustering and a hidden Markov random field (HMRF). This segmentation was colour-based and successfully classified WBC types. K-means clustering has also been used in other publications [9]. K-medoid clustering was used in [10] to partition blood smear images; for colour-based segmentation aimed at detecting blood abnormalities, its results were good. Dasariraju et al. [11] used multi-Otsu thresholding for colour-based segmentation; however, it did not preserve the original colour of the nucleus. Eckardt et al. [12] developed a deep learning strategy for detecting mutations in bone marrow smear images, implemented using a faster region-based convolutional neural network (FRCNN) and an image annotator tool for segmentation and boundary identification. Experiments have shown adaptive thresholding to be a successful strategy among thresholding procedures for colour-based segmentation. The method chosen for this article employs DCA for image contrast stretching and adaptive thresholding for colour-based segmentation. The remainder of the paper is organised as follows: Sect. 8.2 discusses the suggested technique of contrast stretching and colour-based segmentation, Sect. 8.3 discusses the experimental results and analysis, and Sect. 8.4 concludes the article.
8.2 Proposed Approach of Contrast Stretching and Colour-Based Segmentation
The suggested method for colour-based segmentation of blood smear and bone marrow images begins with contrast stretching, which is followed by segmentation. Contrast stretching is performed with the dark contrast algorithm (DCA). This increases the contrast of the RGB image, making it better suited to segmentation. Adaptive thresholding is then used for colour-based segmentation.
8.2.1 Contrast Stretching Using Dark Contrast Algorithm (DCA)
The dark contrast algorithm (DCA) is a technique for expanding the contrast of darker regions. This aids in identifying the colour pigmentation of blood cells and their components. The rate at which the pixel intensity is stretched is determined by DCA factors such as the threshold value (TV) and the stretching factor (HV). TV should be smaller than HV for the dark stretching procedure [13]. Algorithm#1 [14] describes the stages through which DCA operates.

Algorithm#1: Approach of Contrast Stretching
Begin
Step 1: Input image as a
Step 2: Initialize DCA variables
        TV ← 100
        HV ← 150
Step 3: Read variables of image a
        N_r ← Number of rows
        N_c ← Number of columns
        min_a ← Minimum intensity
        max_a ← Maximum intensity
Step 4: For k = 1 to N_r
Step 5:   For l = 1 to N_c
Step 6:     If a(k, l) < TV
              a(k, l) = [((a(k, l) − min_a)/(TV − min_a)) * HV]
            Else
              a(k, l) = [((a(k, l) − TV)/(max_a − TV)) * (255 − HV) + HV]
            End If
          End For
        End For
Step 7: Display contrast stretched image a
End
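The steps of Algorithm#1 can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes a single 8-bit channel stored as a list of rows (an RGB image would need the function applied per channel), it rounds the stretched values to integers, and the function name `dark_contrast_stretch` is hypothetical.

```python
def dark_contrast_stretch(image, tv=100, hv=150):
    """Stretch a 2-D grayscale image (list of rows) following Algorithm#1.

    tv is the threshold value (TV) and hv the stretching factor (HV);
    the defaults are the values initialised in Step 2.
    """
    flat = [p for row in image for p in row]
    min_a, max_a = min(flat), max(flat)
    stretched = []
    for row in image:
        new_row = []
        for p in row:
            if p < tv:
                # Dark pixels: map [min_a, TV) onto [0, HV).
                # min_a <= p < tv here, so the denominator is positive.
                v = (p - min_a) / (tv - min_a) * hv
            else:
                # Bright pixels: map [TV, max_a] onto [HV, 255].
                span = max_a - tv
                # Guard (an assumption): if max_a == TV, the pixel maps to HV.
                v = (p - tv) / span * (255 - hv) + hv if span > 0 else float(hv)
            new_row.append(int(round(v)))
        stretched.append(new_row)
    return stretched


sample = [[0, 50, 100], [150, 200, 250]]
result = dark_contrast_stretch(sample)
```

With TV = 100 and HV = 150, intensities below 100 are spread over the wider range 0 to 150, which is what raises contrast in the dark nucleus region, while brighter intensities are compressed into 150 to 255.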
The contrast stretched output images are then evaluated for enhancement using the measure of enhancement (EME) and the measure of enhancement by entropy (EMEE) [15, 16].
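The chapter does not state which formulation of EME it uses. The sketch below assumes the commonly cited block-based form, in which the image is split into non-overlapping blocks and 20·log10(I_max/I_min) is averaged over the blocks. The block size, the small constant `eps`, and the function name `eme` are illustrative assumptions, and EMEE is not covered.

```python
import math


def eme(image, block=2, eps=1e-4):
    """Block-based measure of enhancement for a 2-D grayscale image.

    Assumed form: mean over blocks of 20 * log10(max / min).
    """
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            vals = [image[i][j]
                    for i in range(r, r + block)
                    for j in range(c, c + block)]
            # eps (an assumption) avoids division by zero in fully dark blocks.
            total += 20.0 * math.log10((max(vals) + eps) / (min(vals) + eps))
            count += 1
    if count == 0:
        raise ValueError("image is smaller than one block")
    return total / count


flat = [[50, 50, 50, 50] for _ in range(4)]
low = [[40, 60, 40, 60] for _ in range(4)]
high = [[10, 200, 10, 200] for _ in range(4)]
```

A uniform image scores zero, and the score grows as the ratio between the brightest and darkest pixel in each block grows, which is the behaviour relied on when comparing input and enhanced images.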
8.2.2 Colour-Based Segmentation Using Adaptive Thresholding
Adaptive thresholding is used to segment the enhanced blood and bone marrow smear images based on colour. It is effective when some sections of an image remain in shadow or when illumination varies across the image. In the traditional thresholding approach, a single global threshold, typically the mean value, is applied to the whole image [8]. In this work, the colour histogram of the image is used as the threshold parameter. Because the green channel contains the most prominent colour in these images that can be used for segmentation, the histogram of the green channel is employed as the thresholding parameter. Let I(x, y) be the image to be segmented using thresholding. G(x, y) is the normalised gradient magnitude calculated from Eq. (8.1).

\[
G(x, y) = \frac{\lvert \nabla I(x, y) \rvert}{\max_{(x, y)} \lvert \nabla I(x, y) \rvert} \tag{8.1}
\]

To apply adaptive thresholding, let T(x, y) be the adaptive threshold to be determined, by which the image is divided into a label image using Eq. (8.2).

\[
L(x, y) =
\begin{cases}
1, & \text{if } I(x, y) > T(x, y) \\
0, & \text{if } I(x, y) \le T(x, y)
\end{cases} \tag{8.2}
\]
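Equations (8.1) and (8.2) can be illustrated with the short Python sketch below. The chapter does not spell out how the threshold surface T(x, y) is derived, so the sketch substitutes a plain local-mean threshold as a stand-in; that choice, the central-difference gradient with clamped borders, and all function names are assumptions made for illustration only.

```python
def normalised_gradient(image):
    """Eq. (8.1): gradient magnitude divided by its maximum over the image."""
    rows, cols = len(image), len(image[0])
    mag = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            # Central differences, clamped at the borders (an assumption).
            gx = (image[y][min(x + 1, cols - 1)] - image[y][max(x - 1, 0)]) / 2.0
            gy = (image[min(y + 1, rows - 1)][x] - image[max(y - 1, 0)][x]) / 2.0
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    peak = max(max(row) for row in mag)
    if peak == 0:
        return mag  # flat image: the gradient is zero everywhere
    return [[m / peak for m in row] for row in mag]


def local_mean_threshold(image, radius=1):
    """Stand-in threshold surface T(x, y): mean of a (2r+1)x(2r+1) window."""
    rows, cols = len(image), len(image[0])
    surface = []
    for y in range(rows):
        row = []
        for x in range(cols):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(rows, y + radius + 1))
                      for i in range(max(0, x - radius), min(cols, x + radius + 1))]
            row.append(sum(window) / len(window))
        surface.append(row)
    return surface


def label_image(image, threshold):
    """Eq. (8.2): 1 where I(x, y) > T(x, y), otherwise 0."""
    return [[1 if p > t else 0 for p, t in zip(irow, trow)]
            for irow, trow in zip(image, threshold)]


spot = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
labels = label_image(spot, local_mean_threshold(spot))
edge = normalised_gradient([[0, 0, 10], [0, 0, 10], [0, 0, 10]])
```

Because the threshold varies per pixel, the bright centre is labelled 1 even though its neighbours pull the local mean up, which is the property that makes adaptive thresholding tolerant of uneven illumination.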
The nucleus obtained from this segmentation can be used to collect colour information for healthy and cancerous nuclei. The segmented image is also converted to a binary mask, both to obtain a clean segmented image and to evaluate segmentation performance. The segmentation is assessed against the ground truth using the overlap ratio (OR) [17]. Algorithm#2 lists the procedural steps of the suggested technique for colour-based segmentation of enhanced blood and bone marrow smear images.

Algorithm#2: Procedural Steps for Segmentation using Adaptive Thresholding
Begin
Step 1: Input contrast enhanced image as a
Step 2: Read histogram of image a
Step 3: Initialize thresholding parameter Th
Step 4: Initialize G(x, y) as Th
Step 5: Calculate normalized gradient using Eq. (8.1)
Step 6: Apply the adaptive threshold to obtain label image L using Eq. (8.2)
Step 7: Display segmented image L
Step 8: Convert L to binary format
Step 9: Display binary mask of segmented image as B
End
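The overlap ratio used for evaluation can be sketched as below. The chapter defers the definition to [17]; this sketch assumes the common intersection-over-union form between the binary mask B and the ground-truth mask, so the exact formula in the cited work may differ. The function name `overlap_ratio` is hypothetical.

```python
def overlap_ratio(mask, truth):
    """Intersection over union of two binary masks of equal size."""
    intersection = union = 0
    for mrow, trow in zip(mask, truth):
        for m, t in zip(mrow, trow):
            if m and t:
                intersection += 1
            if m or t:
                union += 1
    # Two empty masks are treated as a perfect match (an assumption).
    return intersection / union if union else 1.0


predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
score = overlap_ratio(predicted, ground_truth)
```

A value of 1 means the segmented nucleus coincides exactly with the ground truth, and values fall toward 0 as the two regions diverge.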
8.3 Experimental Results and Analysis
8.3.1 Dataset
The haematological datasets used in the experiments were acquired from online repositories: the ASH [10] dataset for bone marrow smear images and the CIA [11] dataset for blood smear images. These images depict both malignant and non-cancerous smears. They were captured with a high-resolution camera and have dimensions of 400 × 400 and 720 × 960, respectively. One test image from each dataset was chosen for colour-based segmentation. The proposed approach was applied to both categories of test images, and the results were documented.
8.3.2 Simulation Results and Analysis
Two test images (CASE#1 and CASE#2), one from each dataset, are used to perform contrast stretching with DCA followed by the adaptive thresholding method [7, 18]. Image quality assessment (IQA) criteria such as EME and EMEE were used to assess the effectiveness of the contrast stretched images produced by DCA, while the overlap ratio (OR) is used to assess the performance of the segmentation technique. As illustrated in Fig. 8.1, the difference in intensity between images (a) and (b) is clearly discernible. As seen in (b), the contrast of the nucleus has risen, and the colour of the nucleus is an essential criterion used by haematologists to screen for cancer. Furthermore, the colour-segmented images in (c) show the outcome of effective colour-based nucleus segmentation. The enhanced and segmented images are also appraised subjectively. Table 8.1 shows the image quality of the contrast stretched images. When the values of EME [15] and EMEE [16] for the input and contrast enhanced images are compared, a considerable rise in the IQA metrics is seen. This rise indicates that the contrast stretched image has superior texture and contrast compared with the input image. Similarly, the colour-segmented image is assessed using the overlap ratio (OR) [17], as given in Table 8.2. The value of OR approaches 1 in both images (0.991 and 0.903), indicating near-complete overlap with the ground truth image [17]. When the colour-based segmented image is compared with the enhanced image, the nucleus is observed to be successfully segmented. This outcome is further supported by the high numerical value of OR. The segmentation approach is therefore evaluated both visually and quantitatively and found to be successful.
Fig. 8.1 Simulation results of images showing: a input bone marrow smear test image (CASE#1) and blood smear test image (CASE#2), b contrast stretched/contrast enhanced image using DCA, c segmented image with RGB mask, and d segmented image with binary mask
Table 8.1 IQA metrics for input and enhanced microscopic images
Metric   Single cell image (CASE#1)                    Multi-cell image (CASE#2)
         Input image   Enhanced microscopic image      Input image   Enhanced microscopic image
EME      2.309         3.558                           3.99          4.523
EMEE     0.0305        0.4590                          0.250         0.599
Table 8.2 Evaluation of segmentation technique performance
Metric   Single cell image (CASE#1)   Multi-cell image (CASE#2)
         Nucleus                      Nucleus
OR       0.991                        0.903
8.4 Conclusion
This study offers a technique for colour-based segmentation of microscope images. First, DCA was used to stretch the intensity of the input image. This improved the contrast between the various cell components and the nucleus. EME and EMEE were used to analyse these images, and both showed a considerable rise in their values. Adaptive thresholding was then employed for colour-based segmentation. In this approach, only the nucleus is segmented, so the number of clusters was initialised to 2. The number of clusters can be adjusted depending on how many segmented images are required. The resulting cluster or segmented image contains the RGB mask of the nucleus. The segmentation results were then evaluated using OR, which indicated successful nucleus segmentation. The segmented image can also provide information on colour differences in different parts of the nucleus.
References
1. Jagadev, P., Virani, H.G.: Detection of leukemia and its types using image processing and machine learning. In: Proceedings of International Conference on Trends in Electronics and Informatics (ICTEI 2017), pp. 522–526. IEEE, Tirunelveli (2018)
2. Rezatofighi, S.H., Zadeh, H.S.: Automatic recognition of five types of white blood cells in peripheral blood. Comput. Med. Imaging Graph. 35(4), 333–343 (2011)
3. Rokaha, B., Ghale, D.P., Gautam, B.P.: Enhancement of supermarket business and market plan by using hierarchical clustering and association mining technique. In: Proceedings of International Conference on Networking and Network Applications, pp. 384–389. IEEE (2018)
4. Bhateja, V., Urooj, S., Mehrotra, R., Verma, R., Lay-Ekuakilli, A., Verma, V.D.: A composite wavelets and morphology approach for ECG noise filtering. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds.) Pattern recognition and machine intelligence. International Conference on Pattern Recognition and Machine Intelligence, vol. 8251, pp. 361–366. Springer, Heidelberg (2013)
5. Raj, A., Alankrita, S.A., Bhateja, V.: Computer aided detection of brain tumor in magnetic resonance images. Int. J. Eng. Technol. 3(5), 523–532 (2011)
6. Gupta, P., Tripathi, N., Bhateja, V.: Multiple distortion pooling image quality assessment. Int. J. Converg. Comput. 1(1), 60–72 (2013)
7. Harun, N.H., Bakar, J.A., Hambali, H.A., Khair, N.M., Mashor, M.Y., Hassan, R.: Fusion noise-removal technique with modified algorithm for robust segmentation of acute leukemia cell images. Int. J. Adv. Intell. Inform. 4(3), 202–211 (2018)
8. Su, J., Liu, S., Song, J.: A segmentation method based on HMRF for the aided diagnosis of acute myeloid leukemia. Comput. Methods Prog. Biomed. 152(7), 115–123 (2017)
9. Basavaraju, H.T., Aradhya, V.N.M., Pavithra, M.S., Guru, D.S., Bhateja, V.: Arbitrary oriented multilingual text detection and segmentation using level set and Gaussian mixture model. Evol. Intell. 14, 881–894 (2021)
10. Acharya, V., Ravi, V., Pham, T.D., Chakraborty, C.: Peripheral blood smear analysis using automated computer-aided diagnosis system to identify acute myeloid leukemia. In: IEEE Transactions on Engineering Management, pp. 1–14 (2021)
11. Dasariraju, S., Huo, M., McCalla, S.: Detection and classification of immature leukocytes for diagnosis of acute myeloid leukemia using random forest algorithm. Bioengineering 7(4), 120–131 (2020)
12. Eckardt, J.N., Middeke, J.M., et al.: Deep learning detects acute myeloid leukemia and predicts NPM1 mutation status from bone marrow smears. Leukemia 36, 111–118 (2022)
13. Gupta, S. et al.: Analysis of blood smear images using dark contrast algorithm and morphological filters. In: 10th International Conference on Frontiers of Intelligent Computing: Theory and Applications, pp. 1–10. Springer (2022)
14. Verma, S. et al.: Segmentation of blood smear images using dark contrast algorithm and K-medoid clustering. In: 7th International Conference on Microelectronics, Electromagnetics and Telecommunications, pp. 1–10. Springer (2022)
15. Trivedi, M., Jaiswal, A., Bhateja, V.: A no-reference image quality index for contrast and sharpness measurement. In: 3rd IEEE International Advance Computing Conference (IACC), pp. 1234–1239. IEEE (2013)
16. Prajapati, P., Narmawala, Z., Darji, N.P., Moorthi, S.M., Ramakrishnan, R.: Evaluation of perceptual contrast and sharpness measures for meteorological satellite images. In: Soni, A.K., Lobiyal, D.K. (eds.) 3rd International Conference on Recent Trends in Computing (ICRTC), Procedia Computer Science, vol. 57, pp. 17–24. Springer, India (2015)
17. Kumar, S.N., Lenin Fred, A., Ajay Kumar, H., Sebastin Varghese, P.: Performance metric evaluation of segmentation algorithms for gold standard medical images. In: Sa, P., Bakshi, S., Hatzilygeroudis, I., Sahoo, M. (eds.) Recent Findings in Intelligent Computing Techniques. Advances in Intelligent Systems and Computing, vol. 709. Springer, Singapore (2018)
18. Aristidis, L., Vlassis, N., Veerbeek, J.J.: The global k-means clustering algorithm. Pattern Recogn. 36(2), 451–461 (2003)
Chapter 9
A Systematic Review on Automatic Speech Recognition for Odia Language
Namita Mishra, Satya Ranjan Dash, Shantipriya Parida, and Ravi Shankar Prasad
Abstract Automatic speech recognition (ASR) systems need to be trained on large data sets of audio and the corresponding transcripts in a particular language to produce accurate results. In daily life, smartphone ASR systems such as Siri are very helpful for people who communicate in English; however, the same systems are far less helpful to those who speak "low resource" languages. Previous analysis has found that more than 10% of English-language searches are made by voice, the majority of them on smartphones. With all this in mind, a successful ASR system would help everyone communicate and would also prolong the life of the language. Otherwise, over time, people will face many problems communicating; as a result, a widely spoken language (English) may replace Odia, and the Odia language may die out. No such work has been done on building an ASR system for Odia speech. In this paper, we plan to prepare an Odia speech corpus and to use machine learning and deep learning techniques, together with various online toolkits, resources, and language models, to prepare an Odia ASR. Because no work has yet been done on Odia ASR, this effort will be of great help to other researchers as well as to society.
N. Mishra (B)
School of Computer Engineering, KIIT University, Bhubaneswar, India
e-mail: [email protected]

S. R. Dash
School of Computer Applications, KIIT University, Bhubaneswar, India

S. Parida
Silo AI, Helsinki, Finland

R. S. Prasad
Idiap Research Institute, Martigny, Switzerland

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_9
9.1 Introduction
Automatic speech recognition is the technique that converts human speech to a written text format. The most common problem with ASR is that it is available for only a small number of languages. ASR is used in a wide range of applications, such as voice assistants, dictation software, and call centres. Using complex algorithms and different machine learning techniques, ASR analyses and understands different languages. Despite the rapid advance of ASR technologies over the past few decades, speech recognition for low-resource languages remains highly challenging. The Odia language is spoken by more than 50 million people all over the world. Odia is spoken by more than 82% of the people of Odisha (formerly Orissa) and also in parts of West Bengal, Jharkhand, and Chhattisgarh. It is the official language of Odisha and the second official language of Jharkhand. Additionally, a sizable community of at least 1 million individuals in Chhattisgarh speaks the language [1]. Moreover, Odisha is an educational hub of India with significant scope for technical and medical education, so students from different Indian states, as well as neighbouring countries, come to Odisha for higher education. "Speech recognition is the technique of converting human speech to text". It enables users to interact with different machines using the human voice, which makes giving spoken instructions more comfortable than using traditional input systems such as keyboards or keypads. An automatic speech recognition (ASR) system should be able to receive human speech as input, "decipher" the input and extract the spoken words, and "reproduce" an output action for the next device. ASR is expected to grow dramatically in the future [2]. Human speech exhibits a wide variety of accents. This variation in accent is the major obstacle in training and realising speech recognition systems.
People who speak more than one language exhibit more variation in accent than monolingual people. This variation increases when further factors such as gender, location, and environmental conditions are taken into account [3]. Another major hindrance in creating ASR is the availability of training data sets in all required languages. Currently, training data is available only for a small set of prominent languages. The different aspects of speech processing are acquisition, manipulation, storage, transfer, and output of speech signals. Among the major speech processing tasks, speech recognition handles the input and speech synthesis the output [4]. Speech synthesis is the process of generating artificial human speech with the goal of matching human abilities; it is the exact opposite of speech recognition [5]. Speech detection, also known as voice activity detection (VAD) or speech activity detection, is the process of detecting the presence or absence of human speech in a given audio segment. VAD is mainly used as a pre-processing module in speech coding and speech recognition. It facilitates speech processing and allows the relevant processes to be deactivated during silent sections of an audio session [6]. A speech corpus is a database of audio recordings and their corresponding labels. The labels are chosen based on the desired task. For the ASR task, an audio file is the input and its corresponding text is the label; for the TTS task, text is the input and its corresponding audio is the label. The audio recordings can be collected from podcasts, YouTube platforms, and talk shows (with permission). The major obstacles encountered during speech processing are:
i. The audio files may contain noise, making it hard for the ASR system to extract the desired information.
ii. Required labels may not be available for all tasks.
iii. Multiple speakers may overlap in the recorded audio files.
iv. Speech or music may be present as background information, which is usually more difficult to remove than other noise signals [1].
Speech pre-processing is the first stage of speech recognition. In this phase, voiced and unvoiced signals are differentiated and feature vectors are created. The speech signal is modified so that it becomes more acceptable to the system. The observed speech x(n) is modelled as the clean speech signal s(n) plus background noise d(n), treated as an additive disturbance [1]:
x(n) = s(n) + d(n).
Over time, speech signals have also come to be used as a biometric recognition method and as a means of interacting with computers and mobile devices. The ultimate aim of speech recognition is to develop a proper channel for communicating speech data to the computer. Speech has also proven itself to be the most used communication medium among human beings. With the trend towards automation, automatic voice recognition has received significant attention for approximately six decades [7]. Two fundamental segments of a speech recognition system are feature extraction and classification. Acoustic measurement techniques are either parametric or nonparametric.
The parametric technique operates in the temporal domain, e.g. linear prediction [8], which is designed to match the resonant structure of the human vocal tract that generates the corresponding sound. Linear prediction coefficients (LPC) are generally not recommended for representing speech because they assume the signal to be stationary and hence cannot assess localised events precisely. They are also unable to capture voiceless and nasalised sounds accurately [9]. The nonparametric frequency-domain method is known as the Mel-Frequency Cepstral Coefficients (MFCC) [10]. The feature extraction methods include Linear Predictive Analysis (LPC), Linear Predictive Cepstral Coefficients (LPCC), Perceptual Linear Predictive Coefficient (PLP), Mel-Frequency Cepstral Coefficients (MFCC), power spectral analysis (FFT), Mel scale cepstral analysis (MEL), relative spectra filtering of log domain coefficients (RASTA), first-order derivative (DELTA), and so on. The general architecture of ASR is shown in Fig. 9.1.
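The additive noise model x(n) = s(n) + d(n) and the separation of voiced from unvoiced segments described in the pre-processing discussion can be illustrated with a small sketch. It assumes a simple short-time energy rule for the voiced/unvoiced decision, which is one common choice and is not prescribed by the text; the frame length, the threshold, and all function names are hypothetical.

```python
import math


def add_noise(clean, noise):
    """Form the observed signal x(n) = s(n) + d(n)."""
    return [s + d for s, d in zip(clean, noise)]


def snr_db(clean, noise):
    """Signal-to-noise ratio in decibels from average signal and noise power."""
    p_signal = sum(s * s for s in clean) / len(clean)
    p_noise = sum(d * d for d in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def frame_energies(signal, frame_len):
    """Average energy of consecutive non-overlapping frames."""
    return [sum(v * v for v in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def voiced_flags(signal, frame_len, threshold):
    """Mark a frame as voiced when its energy exceeds the threshold."""
    return [energy > threshold for energy in frame_energies(signal, frame_len)]


# Eight samples of silence followed by one period of a unit sine wave.
s = [0.0] * 8 + [math.sin(2 * math.pi * k / 8) for k in range(8)]
# Low-level alternating disturbance standing in for background noise.
d = [0.01 if k % 2 == 0 else -0.01 for k in range(16)]
x = add_noise(s, d)
flags = voiced_flags(x, frame_len=8, threshold=0.01)
snr = snr_db(s, d)
```

In this toy signal the first frame contains only the disturbance and is flagged as unvoiced, while the second frame carries the sine wave and is flagged as voiced. Real systems typically combine energy with other cues, since background speech or music can also have high energy.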
Fig. 9.1 General architecture of speech recognition
9.2 Data Collection Techniques in ASR
Several techniques are used for the collection of data. Some of the common techniques are listed below:
1. Crowdsourcing: In this process, the ASR system is trained and evaluated using data gathered from many people speaking specific phrases or sentences.
2. Professional voice actors: Here, the ASR system is trained and evaluated on a large amount of speech data recorded by professional voice actors.
3. Audio from real-world sources: ASR systems can be trained and evaluated using real-world audio such as telephone conversations, podcasts, and TV shows.
4. Synthetic data generation: This entails using text-to-speech (TTS) systems to create speech data from written text. It can be useful for building large amounts of data for training and assessing ASR systems.
5. Transcription services: Several firms provide transcription services that convert audio recordings into text. This data can be used to train and evaluate ASR systems.
9.3 Related Work
Prior to this research, a number of researchers gathered data and examined speech recognition. The work surveyed can be separated into three areas according to methodology: voice pre-processing, feature extraction, and feature classification.
The authors of [11] created an Indian language speech input–output system that can help bridge the gap between the lab and the real world. Indian farmers would find this voice-activated technology highly useful for retrieving information about farming practices, such as the availability of seeds, fertilisers, pesticides, and farming techniques. The speech input–output system is advantageous for the Indian agricultural sector from two perspectives. First, farmers enter their queries orally, which removes the requirement that users be literate. Second, the system's response is also given orally, making it more inclusive for all types of users. The system was built with CMU Sphinx4 to interpret the end user's query and produce speech. For that project, the authors gathered a speech corpus related to Indian agriculture in Odia, an official Indian language. The system's effectiveness was assessed using the mean opinion score (MOS) test, word accuracy and word error rate percentages, and other metrics. The word accuracy was 86.87% for known users and 75.13% for new users. The average MOS for voice output was 4.62.
Paper [12] demonstrates the application of a polynomial smoothing technique and then adjusts the standard magnitude difference function (SMDF) and the modified autocorrelation function (MACF) (W-AMDF). These techniques separate the voiced waveform according to its features. According to the experimental results, the method increases the accuracy of pre-signal extraction from sound files, improves the efficiency of signal processing, and decreases pitch. Each person's voice follows a set cycle. To produce changes in voice pitch, tone length, and temperature, the location of the speech signal pitch is changed in the voice file.
The study [13] examines artificial neural network-based pre-processing of voice signals for voice recognition systems. It employs the eigenvalue decomposition technique, which is based on eigenvalue analysis of the data matrix or autocorrelation matrix. Particularly at low signal-to-noise ratios, this method offers better resolution and parameter estimates than other parametric methods. To put this strategy into practice, the speech message is broken down into phonemes, which can distinguish signal peaks, rises, and decays. The MATLAB envelope() function, which returns the upper and lower boundaries of the input sequence, is used for the pre-processing.
An approach for automatic annotation of speech corpora was presented in paper [14], employing transcriptions from two complementary ASR systems. The tests demonstrated that ASR systems employing DNN plus HMM-GMM acoustic models produce far fewer identical errors than ASR systems using only HMM-GMM acoustic models. The technique seeks to generate a high-quality, automatically annotated, unsupervised speech corpus. The newly acquired speech corpus is then used to retrain the existing ASR systems, improving transcription accuracy by increasing the acoustic variability of the models.
In paper [15], the algorithm for detecting accent errors is examined from three aspects: computation of posterior probability, testing of the model hypothesis, and supervised classification. To improve the system's ability to distinguish between tone and stress, the authors concentrate on the simpler non-interpolation method of processing pitch curve information in the DNN and incorporate the pitch into the HMM-DNN acoustic model. The underlying shared network is a common feed-forward neural network, and the basic binary classifier is a binomial logistic regression model. Using a classifier trained on a learning database for English, they confirm the effectiveness of accent error identification.
In paper [16], a model for speech recognition is presented in which each word's LPC parameters are derived through feature extraction using the linear predictive approach. Support vector machines (SVM) are used in the speech recognition part of the application, with two SVM classifiers: soft margin and least square. The recognition accuracy was 71% for the least square SVM classifier and 91% for the soft margin SVM classifier. In practice, however, issues such as pitch shifts, microphone positioning, background noise on the recording media, and changing parameters all affected the success of recognition. One component of the application's voice recognition part is a large-scale parameter scan that identifies the best parameters for improving recognition.
Audio feature extraction mechanisms
ASR has long used the Mel-Frequency Cepstral Coefficients (MFCC) method. However, more recent research using deep learning has opened a new route to improving voice recognition performance [17–19]. A relevant data set can be used to train a large number of DNNs with sophisticated machine learning algorithms. The deep neural network-hidden Markov model (DNN-HMM) is the most effective method for applying a deep neural network to automatic speech recognition [20, 21]. The DNN-HMM model replaces the traditional Gaussian mixture model (GMM) with a simple projection between the states of the HMM and the acoustic feature inputs identified by the DNN for each of those states. Years ago, it was suggested to use a neural network in place of a GMM and to build a hybrid model that combines a multilayer perceptron with HMMs [22, 23]. Due to a lack of available computational resources, it was never put into practice. In earlier experiments, K. Noda et al. produced hybrid systems; however, they were unable to surpass GMM-HMM systems.
The deep autoencoder is one of the most important methods for applying DNNs to ASR, and it is crucial for feature extraction. As an illustration, Sainath et al. use the deep autoencoder as a dimensionality compression technique for self-organising higher-level features from raw sensory inputs, and use the higher-level features as inputs to a standard GMM-HMM system [24]. A deep denoising autoencoder was also suggested by Vincent et al. [25, 26]. In contrast to the earlier model, this one uses the outputs of the deep autoencoder as the sensory feature rather than the compressed vectors obtained from the network's intermediate layer. A key concept of the denoising model is to train the deep autoencoder to reconstruct the clean input from a partially destroyed input.
By teaching the network to reconstruct clean audio features from degraded audio features, a deep denoising autoencoder is used to obtain noise-tolerant audio features. The GMM-HMM system is used in this investigation to process the denoised audio features. The primary benefit of GMM-HMM over DNN-HMM is the seamless application of MSHMM as a multimodal integration mechanism for subsequent AVSR tasks.
Visual feature extraction mechanisms
It is well recognised that adding speaker lip movement as visual information to the ASR system can improve robustness and accuracy, especially in noisy conditions. Previous studies have suggested a variety of methods for extracting visual information from input images [27, 28]. These methods can be roughly split into two kinds. The first strategy is top-down and incorporates an a priori representation of lip shape in the model. The active shape model (ASM) [29] and the active appearance model (AAM) [30] are two examples. ASM and AAM extract higher-level, model-based features from the shape and appearance of the mouth region of the image. Model-based features can be used to analyse internal representations explicitly. However, creating a statistical model that accurately depicts a genuine lip shape requires a detailed lip shape model and training data that is meticulously annotated by hand.
The second strategy is bottom-up: visual features are estimated directly from the image using a number of techniques. Principal component analysis (PCA) [31], the discrete wavelet transform [31], and the discrete cosine transform are dimensional compression methods. Since these methods do not require specialised lip shape models or manually annotated training data, they are frequently used to extract useful low-level image-based information. However, they are susceptible to shifts or rotations in the input image as well as to variations in lighting. Because CNNs may be able to compensate for the shortcomings of traditional image-based feature extraction methods, we adopt a bottom-up strategy in this study and use them as a visual feature extraction mechanism.
Building the ideal ASR model for the intended use case requires a diverse approach to data collection. The model can learn to recognise and transcribe speech from many speakers when its training data come from a variety of sources covering different accents, languages, and dialects. It is important to collect data from a variety of speakers, since people from different generations may speak the same language with different accents, intonations, pronunciations, and other differences that could affect the accuracy of the model. Gathering data from different groups of speakers helps the model to handle speech that is spoken in a variety of ways and to produce more accurate transcriptions. A diverse and representative data set is also crucial for reducing bias in the model, which may otherwise affect how well it performs for particular groups of people.
9.4 Toolkits and Online Resources
Much work has been done on the task of speech recognition. Part of the work done over the years is available to us in the form of personal assistants such as Cortana and Siri. Although much research has been done in this area, most of the work has not been released. Some toolkits, published online resources, and other commonly used tools are summarised in Table 9.1.

Table 9.1 Analysis of available toolkits
Sl. no | Name of the tool | Implementation language | Open source | Supported languages
1 | CMU Sphinx | Java | Yes | English, French, Mandarin
2 | Kaldi | C++ | Yes | English
3 | Julius | C | Yes | English, Japanese
4 | HTK | C | No | English
5 | RWTH ASR | C++ | No | English
9.5 Proposed Model for ASR
See Fig. 9.2.
Fig. 9.2 Proposed model for ASR (block labels: Speech Input, Noise Reduction, Word Segmentation, Recurrent Character Generator, Word Matching, Confidence Evaluation, Text Output)
Speech Input: We will use mobile phones to gather data from a nominal number of speakers, say more than 5000, and will ensure that the speakers come from a variety of dialects and age groups. Five phonetically complex sentences are recorded for each speaker.
Noise Reduction: Noise reduction in automatic speech recognition (ASR) is the removal or reduction of unwanted background noise from speech signals. It aims to enhance speech signal quality and make speech easier for the ASR system to recognise. This is crucial because background noise can impede the system's ability to recognise speech sounds and can cause output errors. Statistical models, spectral subtraction, filtering, and other methods can all be used to reduce noise. These methods are intended to enhance the speech signal while suppressing the noise, thereby separating the two. In an ideal listening situation with no background noise, such as a lab where a technician is training a new system, speech recognition systems can swiftly recognise spoken words and convert them into strings of text that computers can use. In a real-world scenario, the ASR system receives more auditory input (speech plus noise rather than speech alone). A voice recognition engine must first distinguish the words from background noise (such as wind, other speakers, and cars) before it can begin to recognise them.
Word Segmentation: In ASR, word segmentation is the process of breaking continuous speech signals down into distinct words or semantic units. This phase is crucial because it enables the system to reliably recognise and transcribe words. Word segmentation is frequently carried out by an algorithm that analyses the audio stream and determines the boundaries between words from prosodic cues such as pauses, intonation, and stress patterns. The ASR system uses the generated word segments as input to convert the spoken words into text. It is crucial to use solid and trustworthy algorithms for this task because the accuracy of the word segmentation step directly affects the overall accuracy of the ASR system. Our aim is to find new words in low-resourced languages and to detect their dialects from audio files. Through cross-lingual word-to-phoneme alignment, we will investigate both the feasibility and the difficulties of dialect detection from phoneme sequences. In our case, a human translator speaks in the target language (which is not written) from prompts in a resource-rich source language. We include these source language prompts to aid word discovery and pronunciation extraction.
Transcription: In ASR, transcription is the conversion of spoken language into text. It is the output of an ASR system and is frequently used for dictation, voice-to-text conversion, and audio indexing, among other things. ASR transcription involves several phases, including speech recognition, language modelling, and post-processing. During the speech recognition stage, the ASR system analyses the audio stream and recognises the uttered words. In the language modelling stage, the system employs statistical models to predict the most likely transcription of the speech, based on the identified words and the context in which they were uttered. Finally, during the post-processing stage, the system applies a variety of rules and algorithms to improve the accuracy and readability of the transcription. The quality of the transcription depends on several variables, including the accuracy of the ASR system, the quality of the audio signal, and the complexity of the spoken language.
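As a rough illustration of the pause-based word segmentation step described earlier in this section, the sketch below splits a signal wherever the short-time energy stays low for several frames. The function name `segment_by_energy`, the frame length, the energy threshold, and the minimum pause length are all hypothetical choices made for this example; the chapter does not specify a particular segmentation algorithm, and intonation and stress cues are not modelled here.

```python
def segment_by_energy(samples, frame_len, threshold, min_pause_frames):
    """Return (start, end) sample indices of segments separated by low-energy pauses."""
    n_frames = len(samples) // frame_len
    voiced = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        voiced.append(energy > threshold)

    segments = []
    start = None   # first frame of the segment currently being built
    silence = 0    # consecutive low-energy frames seen inside that segment
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause_frames:
                end = i - silence + 1  # frame just after the last voiced frame
                segments.append((start * frame_len, end * frame_len))
                start, silence = None, 0
    if start is not None:
        # Close a segment that runs to the end of the signal.
        end = n_frames - silence
        segments.append((start * frame_len, end * frame_len))
    return segments


# Toy signal: two frames of "speech", a two-frame pause, one frame of "speech".
frame_len = 4
samples = [1.0] * 8 + [0.0] * 8 + [1.0] * 4 + [0.0] * 4
print(segment_by_energy(samples, frame_len, threshold=0.1, min_pause_frames=2))
# → [(0, 8), (16, 20)]
```

In practice the threshold would have to be tuned to the recording conditions, and short pauses inside a word would need to be distinguished from pauses between words.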
After gathering the voice data, we began the transcription process, which mainly relies on humans listening to the speech and manually recording what was said. Transcription of the data has been completed based on data collected from Odia phones and the ILSL12 software. A sample Odia phone is displayed below.
Word Matching: In ASR, word matching is the technique of comparing detected speech to a list of candidate transcription words. This is a crucial phase in the ASR process since it enables the system to select the best transcription out of a variety of options. A dynamic programming technique, such as the Viterbi algorithm, is frequently used to match words. This algorithm determines the likelihood of each candidate word based on the observed speech and a language model. The language model provides the probability of each word given the context in which it was uttered, so the system can make better-informed decisions about the proper transcription. The precision of word matching is a crucial component of ASR systems. The recognition unit can be at the word, syllable, or phone level.
Confidence Evaluation: Accuracy is assessed using the word error rate (WER) [32], which is calculated as
$$\mathrm{WER} = \frac{S + D + I}{N} \times 100 \qquad (9.1)$$
where S is the number of substitutions, D is the number of deletions, I is the number of insertions, and N is the number of words in the reference.
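Equation 9.1 can be evaluated by aligning the reference and hypothesis word sequences with a minimum edit distance and counting the three error types. The following is a minimal sketch; the function names `wer_counts` and `wer` and the example sentences are illustrative assumptions, not part of the proposed system.

```python
def wer_counts(reference, hypothesis):
    """Return (S, D, I) from a minimum-edit-distance alignment of two word lists."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total edits, S, D, I) for reference[:i] versus hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)      # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)      # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if reference[i - 1] == hypothesis[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]
                continue
            e, s, d, ins = cost[i - 1][j - 1]
            substitution = (e + 1, s + 1, d, ins)
            e, s, d, ins = cost[i - 1][j]
            deletion = (e + 1, s, d + 1, ins)
            e, s, d, ins = cost[i][j - 1]
            insertion = (e + 1, s, d, ins + 1)
            cost[i][j] = min(substitution, deletion, insertion)
    _, s, d, ins = cost[n][m]
    return s, d, ins


def wer(reference, hypothesis):
    """Word error rate in percent, following Eq. 9.1."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    s, d, i = wer_counts(ref_words, hyp_words)
    return 100.0 * (s + d + i) / len(ref_words)


print(f"{wer('the cat sat on the mat', 'the cat sit on mat'):.2f}")  # → 33.33
```

Here one substitution (sat → sit) and one deletion (the) against a six-word reference give (1 + 1 + 0)/6 × 100 ≈ 33.33. Note that WER can exceed 100 when the hypothesis contains many insertions.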
Text Output: In this step, the resulting text is generated for the input speech.
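Returning to the word matching step of the pipeline, the role of the Viterbi algorithm can be sketched on a toy problem. In this hypothetical setup, the hidden states are candidate words, the transition probabilities play the role of a bigram language model, and the emission probabilities score how well each coarse acoustic label matches a word. All names and probability values are invented for illustration and do not come from the chapter.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations (log domain)."""
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor of state s at time t.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


states = ["go", "no"]
start_p = {"go": 0.5, "no": 0.5}
trans_p = {"go": {"go": 0.2, "no": 0.8}, "no": {"go": 0.7, "no": 0.3}}  # toy language model
emit_p = {"go": {"G": 0.9, "N": 0.1}, "no": {"G": 0.2, "N": 0.8}}       # toy acoustic scores
print(viterbi(["G", "N", "G"], states, start_p, trans_p, emit_p))
# → ['go', 'no', 'go']
```

Working in the log domain avoids numerical underflow on long utterances; it requires every probability in the tables to be strictly positive.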
9.6 Discussion and Future Scope
This paper focuses on speech recognition and its various aspects. In summary, an ASR system consists mainly of three modules: the feature extraction module, the classification module, and the language model. We discussed several feature extraction techniques along with their scope and limitations. From the analysis of the language model, we observed that adding a language model to an ASR system improves its accuracy. To improve accuracy in the Odia ASR system, a large data set should be used together with a more effective noise reduction stage.
References
1. Speech Dataset Creation: https://ogunlao.github.io/blog/2021/01/26/how-to-create-speech-dataset
2. Gaikwad, S.K., Gawali, B.W., Yannawar, P.: A review on speech recognition technique. Int. J. Comput. Appl. 10(3), 16–24 (2010)
3. Forsberg, M.: Why is Speech Recognition Difficult. Chalmers University of Technology (2003)
4. Speech Processing: https://en.wikipedia.org/wiki/Speech_processin
5. Speech Synthesis: Wikipedia
6. Voice Activity Detection: https://en.wikipedia.org/wiki/Voice_activity_detection
7. Shrawankar, U., Thakare, V.M.: Techniques for Feature Extraction in Speech Recognition System: A Comparative Study (2013). arXiv preprint arXiv:1305.1145
8. Itakura, F.: Minimum prediction residual principle applied to speech recognition. IEEE Trans. Acoust. Speech Signal Process. 23(1), 67–72 (1975)
9. Rabiner, L., Juang, B.H.: Fundamentals of Speech Recognition. Prentice-Hall, Inc (1993)
10. Davis, S., Mermelstein, P.: Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 28(4), 357–366 (1980)
11. Mohanty, S., Swain, B.K.: Speech input-output system in Indian farming sector. In: 2012 IEEE International Conference on Computational Intelligence and Computing Research, pp. 1–5. IEEE (2012)
12. Fan, Y., Sun-Hua, X., Ming-Hui, L., Guo-Feng, P.: Research on a new method of preprocessing and speech synthesis pitch detection. In: 2010 International Conference on Computer Design and Applications, vol. 1, pp. V1–399. IEEE (2010)
13. Berdibaeva, G.K., Bodin, O.N., Kozlov, V.V., Nefed’ev, D.I., Ozhikenov, K.A., Pizhonkov, Y.A.: Pre-processing voice signals for voice recognition systems. In: 2017 18th International Conference of Young Specialists on Micro/Nanotechnologies and Electron Devices (EDM), pp. 242–245. IEEE (2017)
14. Georgescu, A.L., Cucu, H.: Automatic annotation of speech corpora using complementary GMM and DNN acoustic models. In: 2018 41st International Conference on Telecommunications and Signal Processing (TSP), pp. 1–4. IEEE (2018)
15. Rajasekhar, A., Hota, M.K.: A study of speech, speaker and emotion recognition using Mel frequency cepstrum coefficients and support vector machines. In: 2018 International Conference on Communication and Signal Processing (ICCSP), pp. 0114–0118. IEEE (2018)
16. Eray, O., Tokat, S., Iplikci, S.: An application of speech recognition with support vector machines. In: 2018 6th International Symposium on Digital Forensic and Security (ISDFS), pp. 1–6. IEEE (2018)
17. Feng, X., Zhang, Y., Glass, J.: Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1759–1763. IEEE (2014)
18. Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Kingsbury, B.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
19. Maas, A.L., O’Neil, T.M., Hannun, A.Y., Ng, A.Y.: Recurrent neural network feature enhancement: the 2nd CHiME challenge. In: Proceedings the 2nd CHiME Workshop on Machine Listening in Multisource Environments Held in Conjunction with ICASSP, pp. 79–80 (2013)
20. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20(1), 30–42 (2011)
21. Mohamed, A.R., Dahl, G.E., Hinton, G.: Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2011)
22. Bourlard, H.A., Morgan, N.: Connectionist Speech Recognition: A Hybrid Approach, vol. 247. Springer Science & Business Media (1994)
23. Renals, S., Morgan, N., Bourlard, H., Cohen, M., Franco, H.: Connectionist probability estimators in HMM speech recognition. IEEE Trans. Speech and Audio Process. 2(1), 161–174 (1994)
24. Sainath, T.N., Kingsbury, B., Ramabhadran, B.: Auto-encoder bottleneck features using deep belief networks. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4153–4156. IEEE (2012)
25. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1096–1103 (2008)
26. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A., Bottou, L.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(12) (2010)
27. Lan, Y., Theobald, B.J., Harvey, R., Ong, E.J., Bowden, R.: Improving visual features for lip-reading. In: Auditory-Visual Speech Processing 2010 (2010)
28. Matthews, I., Cootes, T., Bangham, J., Cox, S., Harvey, R.: Extraction of visual features for lipreading. IEEE Trans. Pattern Anal. Mach. Intell. 24(2), 198–213 (2002)
29. Luettin, J., Thacker, N.A., Beet, S.W.: Visual speech recognition using active shape models and hidden Markov models. In: 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, vol. 2, pp. 817–820. IEEE (1996)
30. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
31. Matthews, I., Potamianos, G., Neti, C., Luettin, J.: A comparison of model and transform-based visual features for audio-visual LVCSR. In: IEEE International Conference on Multimedia and Expo, pp. 210–210. IEEE Computer Society (2001)
32. Droua-Hamdani, G., Sellouani, S.A., Boudraa, M.: Effect of characteristics of speakers on MSA ASR performance. In: 2013 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), pp. 1–5. IEEE (2013)
Chapter 10
A Study on Influence Maximization in Complex Networks
Chennapragada V. S. S. Mani Saketh, Kakarla Pranay, Akhila Susarla, Dukka Ravi Ram Karthik, T. Jaya Lakshmi, and Y. V. Nandini
Abstract Influence maximization deals with finding the most influential subset of nodes in a given complex network. It is a research problem that can be useful in various markets, for instance, the advertising market. This study reviews the dominant algorithms in the field of influence propagation and maximization from the past decade.
C. V. S. S. Mani Saketh · K. Pranay · A. Susarla (B) · D. Ravi Ram Karthik · T. Jaya Lakshmi · Y. V. Nandini
School of Engineering and Sciences, SRM University, Amaravati, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_10

10.1 Introduction
The task of choosing a group of nodes (called a seed set) in a complex network that maximizes the spread of influence is known as influence maximization. It is useful for many applications, including rumor management, competitive viral marketing, detecting information outbreaks, and many more. The problem is defined as follows [1]:
Definition: Given a graph $G = (V, E)$ and an integer $k$, the task of influence maximization is to find a set $S \subset V$ of cardinality $k$ such that $\sigma(S)$ is maximized, where $\sigma(S)$ denotes the total influence of the elements of the set $S$.
The problem is an optimization problem, as shown in Eq. 10.1.
Fig. 10.1 Influence maximization
$$\max_{S \subset V,\ |S| = k} \sigma(S) \qquad (10.1)$$
Figure 10.1 illustrates the problem. Both finding the optimal seed set and fitting a function that measures influence are challenging. The problem is analogous to the classical subset selection problem, and influence maximization is NP-hard. Various algorithmic approaches are used in the literature to determine the most influential seed set of nodes. The independent cascade model and the linear threshold model are the two primary diffusion models. The following section describes the literature on influence maximization.
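To make Eq. 10.1 concrete, the sketch below solves it exhaustively on a tiny graph. The influence function used here, `one_hop_influence`, is a deliberately simple stand-in for $\sigma(S)$ (seeds plus their direct out-neighbours) chosen only for illustration; real studies use a diffusion model instead. The function names and the example graph are assumptions.

```python
from itertools import combinations


def one_hop_influence(graph, seeds):
    """Stand-in for sigma(S): the seeds plus their direct out-neighbours."""
    reached = set(seeds)
    for u in seeds:
        reached.update(graph.get(u, ()))
    return len(reached)


def best_seed_set(graph, k, sigma):
    """Exhaustively evaluate every k-subset of V and return the best one."""
    nodes = sorted(graph)
    return max(combinations(nodes, k), key=lambda seeds: sigma(graph, seeds))


graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": [],
    "d": ["e", "f"],
    "e": [],
    "f": [],
}
seeds = best_seed_set(graph, 2, one_hop_influence)
print(seeds, one_hop_influence(graph, seeds))  # → ('a', 'd') 6
```

The exhaustive search evaluates every one of the C(|V|, k) candidate sets, which is why it is only feasible on toy graphs and why approximation algorithms are studied.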
10.2 State of the Art
The work of Domingos et al. [1] aims to identify the network value of a customer. The authors model the market as a complex network with customers as nodes and their connections as edges. A customer's probability of purchasing a product is computed as a combination of the customer's intrinsic desire for the product and the influence of the customer's neighbors. The constructed network is then modeled as a Markov random field (MRF), in which the network value of a node (customer) is computed by maximizing the expected profit from the node given the influence of its neighbors.
Continuing the study of [1], the authors of [2] established the first provable approximation guarantee for selecting the most influential nodes, a problem that is NP-hard. The work of [2] discusses two basic diffusion models for influence spread, each of which starts from a small set of active nodes. In the linear threshold model, each node randomly chooses a threshold in the interval [0, 1] and becomes active if and only if the total influence weight of its active neighbors exceeds this threshold. Once the thresholds are chosen, diffusion unfolds deterministically in discrete steps: all nodes that are active at step $t$ remain active at step $t + 1$, and every inactive node whose threshold is now exceeded becomes active. In the independent cascade model, an active node gets a chance to change each inactive neighbor's status to active, succeeding with the influence probability of the connecting edge. Reference [2] also introduces the concept of submodularity: a function is submodular if it satisfies the diminishing returns property, i.e., adding an item to a set $S$ yields a benefit at least equal to that of adding the same item to a superset of $S$.
The authors of [3] adopted a data-driven approach to influence maximization. Credit distribution is a model that uses available propagation traces to learn how influence travels in the network and then uses that information to estimate the expected influence spread. This method also learns the different levels of user influenceability, and it is time-aware in the sense that it considers the temporal aspect of influence. Edge probabilities are rarely directly available in real life, even when the directed graph describing a complex network is. Early research assumed edge probabilities because real action propagation traces are hard to obtain. The following approaches have been used to assign probabilities to edges:
1. Treating them as constants.
2. Drawing values at random from a small set of constants.
3. Defining them as the reciprocal of a node's in-degree.
Real-life scenarios such as product advertising and movie promotion also give rise to influence maximization problems, with the underlying dataset treated as a social network.
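The independent cascade model and the third edge-probability scheme above (the reciprocal of the target node's in-degree) can be sketched as follows. The expected spread $\sigma(S)$ is estimated by Monte Carlo simulation. The function names, the example graph, and the number of runs are illustrative assumptions rather than settings taken from the cited papers.

```python
import random


def weighted_cascade_probs(edges):
    """Assign each directed edge (u, v) the probability 1 / in-degree(v)."""
    in_degree = {}
    for _, v in edges:
        in_degree[v] = in_degree.get(v, 0) + 1
    return {(u, v): 1.0 / in_degree[v] for u, v in edges}


def simulate_ic(edge_probs, seeds, rng):
    """Run one independent cascade and return the set of activated nodes."""
    out_edges = {}
    for (u, v), p in edge_probs.items():
        out_edges.setdefault(u, []).append((v, p))
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out_edges.get(u, []):
                # A newly active node gets a single chance to activate each neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(edge_probs, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(edge_probs, seeds, rng)) for _ in range(runs)) / runs


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
probs = weighted_cascade_probs(edges)
print(probs[("a", "c")])                      # → 0.5
print(round(estimate_spread(probs, ["a"]), 1))
```

For this small graph the exact expected spread from seed a is 3.5: a and b are always active, c is activated with probability 1 − 0.5 × 0.5 = 0.75, and d follows c with probability 1. The Monte Carlo estimate should be close to that value.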
Real-life social networks of this kind are large, so existing solutions have to be scaled up to obtain the optimal solution. In [4], Tang et al. propose a solution for influence propagation in a complex network where nodes are topics and links are topic associations. The authors propose a Topic Affinity Propagation (TAP) model to capture node topic distributions, network structure, and similarity between nodes. TAP is a cost-effective model that is built with efficient distributed learning techniques on the Map-Reduce framework. TAP models the problem as a factor graph, a graphical probabilistic model. Within the TAP model, Tang et al. [4] offer two propagation methods suitable for the Map-Reduce framework: (1) one based on information spread in graphical models and (2) a parallel update mechanism. The work improves the scalability of the distributed learning algorithm.
In [5], Fa-Hsien Li et al. introduced labeled influence maximization, which finds the set of seed nodes that induces the largest influence propagation on a set of targeted customers in a labeled social network. Target marketing is the key to this approach, as it involves splitting the market into subsets and then focusing the marketing on the relevant subsets. Different profits from different categories of users are also considered for effective marketing. Labeled influence maximization requires a network dataset in which each node is labeled and each edge has a weight (influence probability), together with target labels, a profit value for each target label, and a budget that depends on the number of seed nodes chosen. Three approaches have been proposed to solve this problem, in two categories. Labeled New Greedy and Labeled Degree Discount belong to the first category, in which existing influence maximization algorithms are modified to take label information and profits into account. In the second category, interactive mechanisms help in planning, evaluating, and advertising in real time. The idea behind [5] is to plan strategies in which the labeled network is obtained beforehand and the seed set is found offline, exhaustively, for all combinations of target labels. As this is impractical, the Maximum Coverage method has been proposed to evaluate influence potentials, which are measured from the proximities between nodes in the network.
Labeled New Greedy Algorithm: Edges are chosen beforehand according to the influence probability $p$, and the edges that are not chosen are removed from the graph to obtain a trimmed network. The nodes that are reachable in the trimmed network are the ones that can be successfully influenced.
Labeled Degree Discount Algorithm: A node with a high degree is expected to have greater influence and a higher probability of being in the seed set. This influence is evaluated and updated in each selection round (i.e., for $k$ seed nodes, Degree Discount is performed $k$ times). In this algorithm, the expectation is computed in terms of the probability that a given node fails to be activated and the influence profit expected when the node activates its neighbors with target labels. In addition, the degree of a node is redefined as the number of its neighbors with target labels, and the seed set is obtained accordingly.
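The trimming idea behind the Labeled New Greedy algorithm described above can be sketched as follows. This is a simplified illustration, not the algorithm of [5]: it samples a fixed pool of trimmed graphs once, scores a seed set by the average profit of the reachable nodes that carry a target label, and picks seeds greedily. All function names, the label and profit tables, and the sample count are hypothetical.

```python
import random


def sample_trimmed_graph(edge_probs, rng):
    """Keep each edge with its influence probability; drop the rest."""
    kept = {}
    for (u, v), p in edge_probs.items():
        if rng.random() < p:
            kept.setdefault(u, []).append(v)
    return kept


def reachable(graph, seeds):
    """Nodes reachable from the seeds in a trimmed graph (depth-first search)."""
    seen = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for v in graph.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def labeled_profit(graph, seeds, labels, profit):
    """Total profit of reached nodes (seeds included) that carry a target label."""
    return sum(profit.get(labels[v], 0.0) for v in reachable(graph, seeds))


def labeled_greedy(nodes, edge_probs, labels, profit, k, samples=200, seed=0):
    """Greedily pick k seeds that maximise the average labeled profit."""
    rng = random.Random(seed)
    graphs = [sample_trimmed_graph(edge_probs, rng) for _ in range(samples)]
    seeds = []
    for _ in range(k):
        def avg_profit(candidate):
            trial = seeds + [candidate]
            return sum(labeled_profit(g, trial, labels, profit) for g in graphs) / samples
        seeds.append(max((v for v in nodes if v not in seeds), key=avg_profit))
    return seeds


nodes = ["a", "b", "c", "d", "e"]
edge_probs = {("a", "b"): 1.0, ("a", "c"): 1.0, ("d", "e"): 1.0}
labels = {"a": "other", "b": "target", "c": "target", "d": "other", "e": "target"}
profit = {"target": 1.0}  # only nodes labeled "target" generate profit
print(labeled_greedy(nodes, edge_probs, labels, profit, k=1))  # → ['a']
```

With all edge probabilities set to 1.0 the trimmed graphs equal the full graph, so seed a reaches two target nodes while every other single seed reaches at most one.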
Maximum Coverage Greedy: This allows offline computation, as discussed, followed by retrieval of the top-k seed nodes for the target labels and corresponding profits; it is based on the closeness of the nodes. The problem is treated as a Maximum k-Coverage Problem, and the required seed set is the one that covers the nodes with maximum total profit. As the Maximum k-Coverage Problem is NP-hard, a greedy algorithm has been developed that, in each round of selection, picks the node that increases the total profit the most.

Another consideration is that, in a social network, influence differs from person to person and from topic to topic. For instance, one person may be more influential on fashion and another on gadgets. Hence, it may be more logical to implement topic-aware influence. This can be hard to achieve if there are too many items with various topic combinations, which is why Chen et al. [6] focus on preprocessing the influence of every topic so as to reduce the time needed to find the seed set. Barbieri et al. proposed independent cascade and linear threshold models in the context of topic-awareness [6]. Two algorithms in this context are discussed in [6], namely best topic selection (BTS) and marginal influence sort (MIS). The BTS algorithm minimizes computation time by simply using a seed set for the constituent topic in the topic mixture that has the best influence performance, whereas the MIS algorithm avoids slow greedy computation by using the pre-computed marginal influence of seeds on each topic. The authors conducted experiments on two real-world datasets: Flixster and Arnetminer. Flixster is a social movie site based in the USA. The data has been modeled as a
directed graph whose edges connect pairs of user nodes. The graph has around 29k nodes, 425k directed edges, and 10 themes. Because the number of sampled propagation events over the edges was insufficient when the maximum likelihood estimation approach was applied, the influence probabilities larger than 0.99 were replaced with random numbers based on the probability distribution of all 0.99 probabilities. Individual probabilities that were deemed insufficient were also eliminated. Arnetminer is a free web service that indexes and searches academic social networks. This dataset contains the topic distribution of all nodes as well as the network topology, in which authors are nodes and edges connect them if they coauthored a publication. The network has 5k nodes, 34k directed edges, and eight themes. The basic statistics reveal that, despite similar means and other characteristics, certain themes are more likely to spread than others. The overlap of different topics on edges and nodes has been investigated for both the Flixster and Arnetminer datasets, using edge overlap and node overlap coefficients defined for this purpose. Two topics are considered separable if both the stated threshold and the overlap coefficients are small, i.e., the networks are fully separable. When the two datasets were compared, the Flixster dataset was found to have higher topic overlap. As a result, Chen et al. found that topic classification based on influence probabilities is highly dependent on the network.

Another approach is discussed in [7], which studies the selection of the best subset under a noisy monotone objective function with additive or multiplicative noise. Influence maximization is one of the major applications of subset selection, which is NP-hard.
The greedy algorithm is known to give a good approximate solution for influence maximization, and Pareto Optimization for Subset Selection (POSS) is another algorithm that gives a similarly good solution. The main objective of [7] is to extend the approximation ratio of the greedy algorithm from the submodular case to the general situation, slightly improving it, and to prove that the approximation ratio of POSS is nearly the same as that of the greedy algorithm and at times even better. Finally, the PONSS algorithm is proposed in [7]. It is similar to POSS, with one difference: when two noisy objective values are close, POSS selects the best one, whereas PONSS keeps both in order to reduce the risk of losing a good solution. For influence maximization, the stated ordering is PONSS > POSS ≥ Greedy for problems with noise. For many subset selection situations without noise, the greedy method provides the best approximation ratio; however, its performance for noisy subset selection had yet to be studied theoretically.

In the greedy algorithm, the item with the largest improvement in the objective F is added iteratively until k items are selected. POSS reformulates the original problem as a bi-objective problem that maximizes the original objective and minimizes the subset size simultaneously. The algorithm starts from the empty set. In each iteration, a new candidate subset is generated; if no available subset dominates it, it is added and the available subsets that it weakly dominates are removed. After t iterations, the solution with the largest objective value that satisfies the size constraint is chosen.
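The two procedures described above can be sketched side by side. The example below is a minimal Python illustration under stated assumptions, not the code of [7]: the objective F is a simple noise-free set-coverage count chosen for illustration, the mutation step (flip each item with probability 1/n) and the rejection of subsets of size 2k or more follow common presentations of POSS, and all function names are hypothetical.

```python
import random


def coverage(subset, sets):
    """Illustrative objective F: number of elements covered by the chosen sets."""
    covered = set()
    for i in subset:
        covered |= sets[i]
    return len(covered)


def greedy(sets, k):
    """Add, at most k times, the item giving the largest improvement in F."""
    chosen = set()
    for _ in range(min(k, len(sets))):
        candidates = [i for i in range(len(sets)) if i not in chosen]
        best = max(candidates, key=lambda i: coverage(chosen | {i}, sets))
        chosen.add(best)
    return chosen


def poss(sets, k, iterations, seed=0):
    """Pareto optimization: maximize F and minimize subset size together."""
    rng = random.Random(seed)
    n = len(sets)
    archive = {frozenset(): 0}  # subset -> F value; start from the empty set
    for _ in range(iterations):
        parent = rng.choice(list(archive))
        # Mutation: flip the membership of each item with probability 1/n.
        child = frozenset(
            i for i in range(n) if (i in parent) != (rng.random() < 1.0 / n)
        )
        if len(child) >= 2 * k or child in archive:
            continue  # size cutoff (assumed to be 2k) or already archived
        f_child = coverage(child, sets)
        dominated = any(
            f >= f_child and len(s) <= len(child)
            and (f > f_child or len(s) < len(child))
            for s, f in archive.items()
        )
        if dominated:
            continue
        # Drop every archived subset that the child weakly dominates.
        archive = {
            s: f for s, f in archive.items()
            if not (f_child >= f and len(child) <= len(s))
        }
        archive[child] = f_child
    feasible = [s for s in archive if len(s) <= k]
    return max(feasible, key=lambda s: archive[s])
```

For example, with `sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1}]` and `k = 2`, both procedures are expected to settle on the pair of sets that covers all six elements. A noisy variant would replace `coverage` with a noisy evaluation of F.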
In many cases, such as problems with multiplicative or additive noise, the approximation bounds of the greedy algorithm and POSS are nearly or exactly the same; the difference is that POSS has no misleading search directions. Pareto Optimization for Noisy Subset Selection (PONSS) is an updated version of POSS that uses a threshold value when deleting subsets. Whenever a new subset is included, an old subset is excluded only if its F (noisy objective) value is below the threshold value. PONSS may therefore keep more subsets, which reduces efficiency. To limit this problem, a parameter B is assigned to cap the number of subsets: even if the F values of the remaining subsets are not below the threshold value, once the limit B is reached the subset with the smallest F value is deleted.

Wang et al. [8] define a variant of the influence maximization problem that incorporates social reinforcement and memory effects. A propagation model has been constructed to represent the dynamics of memory and social reinforcement, and two fundamental algorithms were then modified to address the problem under this model. The approach captures several fundamental aspects of viral marketing through social networks, such as social reinforcement, which may explain certain phenomena that cannot be described otherwise. The memory effect describes how previous interactions within a social network can affect the current flow of information. According to the social reinforcement effect, if multiple neighbors endorse and share a piece of information with a user, that user is more likely to endorse it as well. Memory and social reinforcement effects were considered in [8] in order to transform the standard influence maximization problem into a realistic viral marketing problem.
A dependent cascade model is proposed for viral marketing that captures the dynamics of the memory effect and social reinforcement; in this way, [8] introduces a novel model that accounts for dynamic memory and social reinforcement effects. According to experimental data, the new model is capable of capturing various crucial characteristics of viral marketing in networks and of explaining certain phenomena that cannot be explained by the current formulation of the problem. In addition, the time complexity of the new model is reduced [9]. The independent cascade model (ICM) is the most frequently employed propagation model. However, ICM considers only two states, active and inactive, and the history of the process has no bearing on the likelihood that a node changes from inactive to active. This is insufficient to address the issue of viral marketing. A new propagation model has been proposed in this study [9]. For viral marketing challenges, the dependent cascade model is better suited. The influence maximization problem has been reformulated and investigated using DCM4VM. The problem may have numerous optimal solutions, and it is NP-hard. The new algorithms developed in [8] are
• General Greedy Algorithm, with time complexity O(kNT).
• Degree Discount Algorithm, with time complexity O(kmnT).
• Exchangeable Greedy Algorithm, with time complexity O(kE).
where k is the initial seed size, N is the number of nodes, E is the number of edges, T is the propagation time, and m and n are usually less than N. To account for dynamic changes in memory and social reinforcement effects, a novel propagation model is introduced and improved. According to experimental data, this model is capable of capturing various crucial characteristics of viral marketing in social networks and of explaining certain phenomena that cannot be explained by the current formulation of the influence maximization problem. Moreover, the model has a reduced time complexity.

Finally, various behavior-aware models have been proposed to identify influential users in social networks. Both behavior-aware and behavior-agnostic models have been introduced. Behavior-agnostic models look at the topology of the graph to find influential users, whereas behavior-aware models look at user behavior. According to [10], behavior-aware models are more realistic for the influence maximization problem because they are more relevant to the real world. There are four types of behavior-aware models: interest-aware, opinion-aware, money-aware, and physical-world-aware. Interest-aware models consider the user’s preferences and interests. Opinion-aware models take users’ responses to a message into account. Money-aware approaches weigh the costs and benefits of disseminating an advertisement. Physical-world-aware models take the geographic location of the users into account. According to [10], behavior-agnostic models are not always precise enough to detect influential nodes in real-world applications because they link people over a network in a purely mechanical way. Behavior-aware models face their own set of challenges, such as obtaining user behavior data, which must then be pre-processed before it can be used. User information may also be private, and obtaining it may jeopardize user privacy.
Finally, because large amounts of new data are generated and exchanged between users on a daily basis, it is critical to consider the time complexity of various methods that address influence maximization when considering a behavior-aware approach. Table 10.1 summarizes the literature.
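As a concrete reference point for the propagation models discussed in this section, the plain independent cascade model with its two states (active and inactive) and a general greedy seed selection can be sketched as follows. This is a minimal Python illustration of the baseline ICM, not of the dependent cascade model of [8]; the adjacency format, function names, and the number of Monte Carlo runs are assumptions made for the sketch.

```python
import random


def simulate_ic(graph, seeds, rng):
    """One independent cascade run; returns the set of activated nodes.

    graph maps a node to a list of (neighbor, influence probability) pairs.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # A newly active node gets one chance per inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs, rng):
    """Average number of activated nodes over repeated cascade runs."""
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs


def general_greedy(graph, nodes, k, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated spread."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(min(k, len(nodes))):
        candidates = [n for n in nodes if n not in seeds]
        best = max(
            candidates,
            key=lambda n: estimate_spread(graph, seeds + [n], runs, rng),
        )
        seeds.append(best)
    return seeds
```

Each greedy round evaluates every remaining node with repeated simulations, which is why the cost of this style of algorithm grows with the seed size, the number of nodes, and the length of the propagation.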
Table 10.1 Literature summary

| S. No. | Citation | Year | Contribution |
|---|---|---|---|
| 1 | Domingos et al. [1] | 2001 | Among the first to represent a market as a complex network with customers as nodes and their connections as edges. The network value of a node (customer) is determined by modeling the constructed complex network as a Markov random field (MRF). In the induced MRF, the network value of a node is computed by maximizing the expected profit of the node given its neighbors' influence |
| 2 | Kempe et al. [2] | 2003 | A continuation of the study in [1], in which provable approximations are obtained for the NP-hard optimization problem. Diffusion models such as the linear threshold model and the independent cascade model are discussed, along with the greedy and hill-climbing search algorithms |
| 3 | Bharathi et al. [11] | 2007 | Scenarios in which multiple innovations compete within a social network are discussed. It is argued that the independent cascade model can be naturally generalized to multiple competing influences |
| 4 | Tang et al. [4] | 2009 | Topic-based influence analysis using a graphical probabilistic model and the proposed topical affinity propagation (TAP) approach. A new TFG model training algorithm is presented to address efficiency problems |
| 5 | Li et al. [5] | 2011 | Labeled influence maximization in social networks for target marketing using the Maximum Coverage Greedy method, the Labeled New Greedy algorithm, and the Labeled Degree Discount algorithm |
| 6 | Goyal et al. [3] | 2011 | Leverages available traces of past propagations; the credit distribution model directly estimates influence spread by utilizing historical data |
| 7 | Chen et al. [6] | 2016 | Topic-aware influence maximization combined with the best topic selection (BTS) and marginal influence sort (MIS) preprocessing algorithms on the Flixster and Arnetminer datasets |
| 8 | Qian et al. [7] | 2017 | The greedy, POSS, and PONSS algorithms are analyzed and compared for finding the best subset under noise |
| 9 | Wang et al. [8] | 2019 | Proposes a new propagation model that captures the dynamic changes of the memory effect and the social reinforcement effect |
| 10 | Zareie et al. [10] | 2021 | Various behavior-aware models are presented and explained |

10.3 Conclusion

In the context of influence maximization, this work summarizes various approximation algorithms and models, such as greedy algorithms, the hill-climbing search algorithm, the POSS and PONSS algorithms, the DCM4VM method, behavior-aware models, and the TAP model. It is observed that the diffusion models (linear threshold model, independent cascade model) are the base models for every other approach and are adapted according to the requirement. Many approximation algorithms are yet to be explored. In the literature, the problem of influence maximization has been addressed using pairwise interactions modeled as a graph, but most complex networks are naturally hypergraphs. Little work in the literature has adopted this representation. In the future, we aim to study this problem on hypergraphs.
References

1. Domingos, P., Richardson, M.: Mining the network value of customers. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 57–66 (2001)
2. Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 137–146 (2003)
3. Goyal, A., Bonchi, F., Lakshmanan, L.V.S.: A data-based approach to social influence maximization. arXiv preprint arXiv:1109.6886 (2011)
4. Tang, J., Sun, J., Wang, C., Yang, Z.: Social influence analysis in large-scale networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 807–816 (2009)
5. Li, F.-H., Li, C.-T., Shan, M.-K.: Labeled influence maximization in social networks for target marketing. In: 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, pp. 560–563. IEEE (2011)
6. Chen, W., Lin, T., Yang, C.: Real-time topic-aware influence maximization using preprocessing. Comput. Soc. Netw. 3(1), 1–19 (2016)
7. Qian, C., Shi, J.-C., Yu, Y., Tang, K., Zhou, Z.-H.: Subset selection under noise. Advances in Neural Information Processing Systems, vol. 30 (2017)
8. Wang, F., Zhu, Z., Liu, P., Wang, P.: Influence maximization in social network considering memory effect and social reinforcement effect. Future Internet 11(4), 95 (2019)
9. Azaouzi, M., Mnasri, W., Romdhane, L.B.: New trends in influence maximization models. Comput. Sci. Rev. 40, 100393 (2021)
10. Zareie, A., Sakellariou, R.: Influence maximization in social networks: a survey of behaviour-aware methods. arXiv preprint arXiv:2108.03438 (2021)
11. Bharathi, S., Kempe, D., Salek, M.: Competitive influence maximization in social networks. In: Internet and Network Economics: Third International Workshop, WINE 2007, San Diego, CA, USA, 12–14 Dec 2007. Proceedings, vol. 3, pp. 306–311. Springer (2007)
Chapter 11
A Survey on Smart Hydroponics Farming: An Integration of IoT and AI-Based Efficient Alternative to Land Farming

Snehal V. Laddha, Pratik P. Shastrakar, and Sanskruti A. Zade
Abstract Hydroponics-based farming is a sustainable, pesticide-free, and eco-friendly method that produces crops of higher quality and uses fewer resources than traditional methods. In hydroponics systems, temperature, humidity, and water are all important environmental factors that influence plant growth. An automated system adds the necessary nutrient solution to the water while also collecting data on how much solution is needed, based on the solution’s electrical conductivity. This ensures that plants receive the nutrients they need to thrive. In hydroponics, artificial intelligence (AI) can be used to analyze environmental conditions and make changes, as well as to analyze plant growth rates and calculate an estimated harvesting time for the plants. AI in farming can detect plant disease and recommend ways to treat it. This paper discusses various techniques based on IoT, AI, and image processing that have been presented by researchers all over the world for the implementation of an automated hydroponics system, as well as subsequent discussions that can help improve this domain.
S. V. Laddha (B) · P. P. Shastrakar · S. A. Zade
Department of Electronics Engineering, Shri Ramdeobaba College of Engineering and Management, Nagpur, Maharashtra 440013, India
e-mail: [email protected]
P. P. Shastrakar, e-mail: [email protected]
S. A. Zade, e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_11

11.1 Introduction

The challenge of food production in the twenty-first century becomes more pressing as the world’s population grows. By 2050, it is expected that the world’s biodiversity will have to support between 9.4 and 10.1 billion people. Planting and livestock areas are
increasingly needed for food production [1]. Human-caused environmental changes have the potential to prevent the development of new crops. Excessive use of fertilizers and chemicals to boost crop growth and yield is harming and eroding the soil. A consequence of urbanization is a reduction in labor in areas typically involved in food production, an increase in costs, and a reduction in production capacity [2]. Smart farming uses a variety of methods and technologies to overcome challenges such as rising food production requirements and a shrinking workforce [3, 4]. A smart farming solution, for example a management information system with data analysis, may collect data using various types of sensors (for instance, temperature, humidity, light, pressure, and TDS), transmit the data over communication networks, and use AI and neural networks to make predictions and take actions that benefit plant growth [5]. The Internet of things (IoT) is a term used to describe a network of interconnected devices. When smart farming data is used, the right amount, location, and timing of actions can boost productivity and reduce waste [6]. Technological advancements that support this include network communications, hardware size reduction, power consumption optimization, and lower device cost. In addition, the world’s largest agricultural producers have implemented incentives and policies to encourage the use of IoT in smart farming [7]. Several reviews of IoT solutions for hydroponics have been published in the recent past, indicating that new contributions are constantly being made to this research field and that it is continuously improving. Reviews of existing hydroponic systems cover a variety of topics, including various methods and topologies and cloud-based platforms.
11.1.1 Different Types of Agriculture Process

There are two types of scenarios: indoor and outdoor. Indoor environments are shielded from environmental factors such as solar radiation, rain, and wind. Greenhouses, hydroponics, crop beds, and pots are examples of indoor environments. Outdoor environments, on the other hand, are more vulnerable to climatic changes. To survive, every plant requires favorable environmental conditions. Hydroponics can be used to produce vegetables in controlled indoor environments in colder regions where certain crops cannot survive due to harsh weather conditions (Fig. 11.1).
11.1.2 Hydroponics

The technique of hydroponics involves growing plants without soil, in water (saline or fresh). Plant nutrients are delivered to the roots in either static or flowing solutions. Hydroponic crops can be grown in both greenhouses and glasshouses. Temperature, pressure, and humidity are the limiting factors of a greenhouse environment.
Fig. 11.1 Types of agricultural methods that are used around the world

Fig. 11.2 Illustration of hydroponics with a nutrients reservoir
A hydroponic growing system must also monitor and maintain the pH value and electrical conductivity. Monitoring the plants manually is simple; however, failing to do so will lead to the plants dying [8] (Fig. 11.2).
11.1.3 Techniques of Hydroponics

There are several hydroponic techniques used to grow plants without soil. Here are some of the most common ones:

Deep Water Culture (DWC). This is a simple hydroponic method in which the plants are suspended in a nutrient solution with their roots submerged in water. The nutrient solution is constantly aerated and circulated with an air pump, providing oxygen to the roots. DWC systems are easy to set up, inexpensive, and suitable for growing a variety of plants.

Nutrient Film Technique (NFT). NFT involves a shallow trough with a thin film of nutrient solution flowing continuously over the roots of the plants. The nutrient solution is pumped from a reservoir and flows through the trough by gravity. The excess nutrient solution is collected back into the reservoir for recirculation. NFT systems are efficient, require less water than other hydroponic methods, and are ideal for growing smaller plants with shallow roots.

Both DWC and NFT are popular hydroponic methods for beginners due to their simplicity and ease of use. They are also suitable for commercial growers who need a system that is efficient, productive, and requires minimal maintenance. Circulating hydroponic systems are more efficient in terms of nutrient use, require less water than traditional farming, and can produce higher yields in a smaller space. However, they require careful monitoring of pH, nutrient levels, and water quality to ensure optimal plant growth (Fig. 11.3).

Fig. 11.3 a Nutrient film technique, b deep water culture
11.1.4 Why Hydroponics Over Other Smart Agriculture Procedures

Studies on hydroponic farming are increasing, and there is still much potential for development. Hydroponics is a popular smart agriculture procedure because it uses resources efficiently, produces higher yields, allows for a controlled growing environment, is space efficient, reduces the use of pesticides, and enables year-round production. These advantages make it an attractive option for growers looking to increase efficiency, productivity, and sustainability [9]. However, hydroponics requires careful management and monitoring and can have high initial setup costs.
11.2 Related Work

Saraswathi et al. [10] note that various hydroponics systems rely on manual monitoring, a simple task that must nevertheless be carried out or the plants will perish. The first goal of their project is to automate the monitoring of the greenhouse environment. The next step is to automate the maintenance of pH levels and electrical conductivity. The data is uploaded to the Internet via IoT (mass storage), and a mobile app transmits the current status via the Internet to the user’s mobile phone, allowing for easier monitoring and maintenance.

Nalwade et al. [11] created a completely automated hydroponic system that can be used to teach business skills and be integrated into the agricultural curriculum. One of the main points of interest is that produce can be grown in areas where the soil is infertile. A good yield, with the right amount of nutrients, light, water, and temperature, can be achieved using Arduino-based automation of water and nutrient solution together with proper temperature management.

Eridani et al. [12] used electrical conductivity (EC) to determine the nutrient concentration in the solution. According to the results, the proposed system automatically delivers water whenever the water level reaches its lowest point and automatically adds nutrients when the nutrient solution concentration falls below 800 ppm.

In a project by Manohar et al. [13], ideal parameter values for numerous crops were kept in the cloud, and any parameter that dropped abruptly or otherwise went out of range was corrected by a NodeMCU connected to the IoT platform. The retrieved data is sent to the web (mass storage), and PCs are used to communicate the current status to users via the web in order to make monitoring and maintenance easier.

Singh et al. [14] discussed several techniques for recognizing diseases in crops, based on machine learning and image processing, that have been presented by researchers all over the world. Various image recognition techniques are used to analyze the disease, and solutions are suggested; further discussions are presented that can aid in improving this domain. A comparative analysis of various hydroponic techniques and the work done by different researchers is given in Table 11.1.
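The rule reported for the system of Eridani et al. [12], adding water at the lowest water level and adding nutrients when the concentration falls below 800 ppm, can be sketched as a simple decision function. This is a minimal Python illustration only: the `Reading` structure, the function name, and the 5 cm minimum water level are hypothetical, and only the 800 ppm threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One sensor sample (field names are assumptions for this sketch)."""
    water_level_cm: float
    nutrient_ppm: float


def decide_actions(reading, min_level_cm=5.0, min_ppm=800.0):
    """Return the actuator actions implied by one reading.

    min_ppm follows the 800 ppm threshold reported for [12];
    min_level_cm is a placeholder value, not taken from the source.
    """
    actions = []
    if reading.water_level_cm <= min_level_cm:
        actions.append("add_water")
    if reading.nutrient_ppm < min_ppm:
        actions.append("add_nutrients")
    return actions
```

In a real controller, the returned actions would drive a pump or servo motor, and readings would typically be smoothed over several samples before acting, to avoid reacting to sensor noise.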
11.3 Methodology

It is necessary to look further into some fundamental concepts in order to study the problems surrounding the usage of IoT in agriculture in the twenty-first century.
Table 11.1 Results of research on various hydroponics techniques by different researchers

| References | Type | Monitoring | Automated/function |
|---|---|---|---|
| [10] | IoT-based, pH sensor, EC sensor | pH and electrical conductivity monitored via a mobile app | Automate the greenhouse environment monitoring and automation of pH and electrical conductivity maintenance |
| [11] | IoT-based, Arduino UNO, LDR, pH, EC sensors | Temperature, pH, humidity, EC, environmental light | The non-circulating method and root dipping technique used for nutrients supply |
| [12] | Arduino UNO, GP2Y0A21 proximity sensor, TDS sensor | Electrical conductivity, water level | Nutrient solution concentration automation using servo motor and water level detection |
| [13] | Water float sensor, LM35, pH, dampness sensor | Various parameters are recorded, and plant growth is predicted using ANN model | Feed forward back propagation method using artificial neural networks (ANN) algorithm |
| [14] | Digital image recognition | Images are taken at regular intervals, diseases are predicted | Plant disease is predicted using probabilistic neural networks |
| [15] | IoT, deep neural networks, Raspberry Pi, Arduino | pH, temperature, humidity, level, lighting, water pump | Proper growth plan is analyzed by applying Bayesian network and artificial neural network to the system |
| [16] | IoT, Raspberry Pi 3 model B | pH, temperature, level, light sensor, water level sensor | AI system to do hydroponic farming in closed environment which will automatically deliver mix of water and nutrient solution along with light |
11.3.1 Intelligent Farming

Intelligent farming is the application of auxiliary technology to agricultural production practices in order to reduce waste and increase productivity. Intelligent farms therefore use technological tools for numerous aspects of the production process, including monitoring plantations, managing soil and water, irrigation, pest control, and tracking deliveries [17]. Examples of these resources include agricultural information management systems, global positioning systems, communication networks, sensors that can measure temperature, luminosity, humidity, pressure, and ground chemical concentration, and unmanned aerial vehicles [18]. An important factor to take into account is how technical resources are integrated into the agricultural production process. By 2023, the precision agriculture industry is expected to bring in $10 billion in revenue [19], providing opportunities for manufacturers and suppliers of agricultural gear and equipment.
Additionally, it is anticipated that smart farms will improve food production by improving fertilizer distribution to the soil, minimizing the need for pesticides, and using less water [20].
11.3.2 IoT-Based Farming

Essentially, the Internet of things (IoT) consists of interconnected intelligent objects that exchange messages and collect useful information about the environment they operate in. As a result, almost anything that can connect to the Internet, including household appliances, electronics, furniture, agricultural or industrial machinery, and even people, can be referred to as a “thing” in the context of the IoT [21]. The researchers in [22] state that, despite the similarities between IoT systems and traditional computer systems, the design of IoT systems must take into account their special characteristics, such as the computing limits of devices and the identification, detection, and control of remote objects.
11.3.3 Intelligent Agriculture

Intelligent agriculture comprises an integrated agricultural product control system, an organic agricultural product traceability system, and an intelligent system of agricultural experts. It links these components using an Internet platform and cloud computing, enabling the construction of a contemporary green farming system with a low carbon footprint, high production, and effective use of energy. It also allows for the intellectualization of agricultural management, the digitalization of agricultural data, and the automation of agricultural production [23]. Agriculture has thereby progressed to the next level of development. Intelligent agriculture is built on the business nodes of the production site, which include artificial and agricultural growth phases, early warning systems, real-time environmental monitoring, and data collection. In order to achieve scientific production and lean management, as well as to enhance the quality of agricultural product cultivation and aquaculture, it also encompasses soil and water resource monitoring, as well as seed, pesticide, and fertilizer screening and feeding requirements [24]. On the one hand, intelligent agriculture can use information technology to manage production processes in a standardized and quantitative way, allocate resources based on regional production conditions and conditions for agricultural productivity growth, cut resource consumption, achieve green agriculture, and direct the healthy growth of the agricultural market. On the other hand, the development of intelligent agriculture gives farmers additional options to make money [25–27].
Artificial intelligence can be used in hydroponics because all of the sensors record data and the various parameters are monitored. If the value of any parameter moves above or below its bounds, it is handled automatically. For example, if the pH of the nutrient solution falls below a certain level, the sensors record this and the appropriate action is taken, which is to add more nutrients to the solution.
11.4 Discussion
Farming has always been a labor-intensive activity, and traditional agriculture is completely reliant on the environment. Furthermore, excessive fertilizer use has degraded the quality of fertile soil. Traditional farming also has several other drawbacks, including the fact that it is a time-consuming process that requires a human to inspect the plants at each stage. Crops grown in hydroponics, by contrast, are self-sufficient and grow in controlled environments. They are grown organically, without the use of chemical fertilizers, and are therefore healthier than those grown traditionally. Machine learning and image processing techniques have been widely used in farming and recognition. The progress of each plant is tracked and sent to the cloud, where various data are gathered. Artificial intelligence in hydroponics can determine the best time to harvest crops so that they are fully mature. The height of the crop is measured, and the rate of growth is calculated to aid disease detection.
11.5 Conclusion
The integration of IoT and AI technologies in hydroponics farming has the potential to greatly improve the efficiency and productivity of the agricultural industry. The use of IoT sensors and devices allows for real-time monitoring and control of environmental conditions, such as temperature, humidity, and nutrient levels, in hydroponic systems. This can lead to optimized growth conditions for plants and a reduction in the use of resources such as water and fertilizers. Additionally, the use of AI algorithms can enable the prediction of crop yields and the detection of early signs of disease or pests, allowing for timely intervention and preventative measures. In summary, the use of IoT and AI in hydroponics farming has the potential to revolutionize the agricultural industry by enabling more efficient and sustainable crop production. However, more research is required to fully understand the potential benefits and drawbacks of these technologies and to develop practical solutions that are suitable for farmers.
11 A Survey on Smart Hydroponics Farming: An Integration of IoT …
References
1. United Nations, Department of Economic and Social Affairs, Population Division: World Population Prospects 2019: Highlights. United Nations Department for Economic and Social Affairs, New York, NY, USA, 2019; Satterthwaite, D.: The implications of population growth and urbanization for climate change. Environ. Urban. 21, 545–567 (2009)
2. Satterthwaite, D.: The implications of population growth and urbanization for climate change. Environ. Urban. 21, 545–567 (2009)
3. Walter, A., Finger, R., Huber, R., Buchmann, N.: Opinion: smart farming is key to developing sustainable agriculture. Proc. Natl. Acad. Sci. U.S.A. 114, 6148–6150 (2017)
4. Wolfert, S., Ge, L., Verdouw, C., Bogaardt, M.-J.: Big data in smart farming—a review. Agric. Syst. 153, 69–80 (2017)
5. Pivot, D., Waquil, P.D., Talamini, E., Finocchio, C.P.S., Corte, V.F.D., de Mores, G.V.: Scientific development of smart farming technologies and their application in Brazil. Inf. Process. Agric. 5, 21–32 (2018)
6. Leonard, E.C.: Precision agriculture. In: Encyclopedia of Food Grains, vol. 4, pp. 162–167. Elsevier, Amsterdam, The Netherlands (2016). ISBN 9780123947864
7. Joint Research Center (JRC) of the European Commission, Zarco-Tejada, P., Hubbard, N., Loudjani, P.: Precision Agriculture: An Opportunity for EU-Farmers—Potential Support with the CAP 2014–2020. European Commission, Brussels, Belgium (2014)
8. Saraswathi, D., Manibharathy, P., Gokulnath, R., Sureshkumar, E., Karthikeyan, K.: Automation of hydroponics Green House Farming using IOT. In: 2018 IEEE International Conference on System, Computation, Automation and Networking (ICSCA), 2018, pp. 1–4
9. Sardare, M.: A review on plant without soil—hydroponics. Int. J. Res. Eng. Technol. 02, 299–304 (2013)
10. Saraswathi, D., Manibharathy, P., Gokulnath, R., Sureshkumar, E., Karthikeyan, K.: Automation of hydroponics Green House Farming using IOT. In: 2018 IEEE International Conference on System, Computation, Automation and Networking (ICSCA), 2018, pp. 1–4. https://doi.org/10.1109/ICSCAN.2018.8541251
11. Nalwade, R., Mote, T.: Hydroponics farming. In: 2017 International Conference on Trends in Electronics and Informatics (ICEI), 2017, pp. 645–650. https://doi.org/10.1109/ICOEI.2017.8300782
12. Eridani, D., Wardhani, O., Widianto, E.D.: Designing and implementing the Arduino-based nutrition feeding automation system of a prototype scaled nutrient film technique (NFT) hydroponics using total dissolved solids (TDS) sensor. In: 2017 4th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE), 2017, pp. 170–175. https://doi.org/10.1109/ICITACEE.2017.8257697
13. Manohar, G., Sundari, V.K., Pious, A.E., Beno, A., Anand, L.D.V., Ravikumar, D.: IoT based automation of hydroponics using node MCU interface. In: 2021 Third International Conference on Inventive Research in Computing Applications (ICIRCA), 2021, pp. 32–36. https://doi.org/10.1109/ICIRCA51532.2021.9544637
14. Singh, T., Kumar, K., Bedi, S.S.: IOP conference series: materials science and engineering. In: 1st International Conference on Computational Research and Data Analytics (ICCRDA 2020), vol. 1022. IOP Publishing Ltd. 24th Oct 2020, Rajpura, India
15. Mehra, M., Saxena, S., Sankaranarayanan, S., Tom, R.J., Veeramanikandan, M.: IoT based hydroponics system using deep neural networks. Comput. Electron. Agric. 155, 473–486 (2018); Lakshmanan, R., Djama, M., Selvaperumal, S.K., Abdulla, R.: Automated Smart Hydroponics System Using Internet of Things. Faculty of Computing, Engineering and Technology, Asia Pacific University of Technology and Innovation (APU), Technology Park Malaysia; Malaysia Int. J. Inf. Educ. Technol. 9(8) (2019)
16. An AI Based System Design to Develop and Monitor a Hydroponic Farm: Hydroponic Farming for Smart City. Glenn Dbritto, Computer Department, St. Francis Institute of Technology, Mumbai, India
17. Bhagat, M., Kumar, D., Kumar, D.: Role of Internet of Things (IoT) in smart farming: a brief survey. In: Proceedings of the IEEE 2019 Devices for Integrated Circuit (DevIC), Kalyani, India, 23–24 Mar 2019, pp. 141–145
18. Stočes, M., Vaněk, J., Masner, J., Pavlík, J.: Internet of Things (IoT) in agriculture—selected aspects. Agris online Pap. Econ. Inf. VIII, 83–88 (2016)
19. Statista: Forecasted Market Value of Precision Farming Worldwide in 2018 and 2023
20. Kite-Powell, J.: Why Precision Agriculture Will Change How Food is Produced
21. Madoka, S., Ramaswamy, R., Tripathi, S.: Internet of Things (IoT): a literature review. J. Comput. Commun. 3, 164–173 (2015)
22. Verdouw, C., Sundmaeker, H., Tekinerdogan, B., Conzon, D., Montanaro, T.: Architecture framework of IoT-based food and farm systems: a multiple case study. Comput. Electron. Agric. 165, 104939 (2019)
23. He, X., Ai, Q., Qiu, R.C., Huang, W., Piao, L., Liu, H.: A big data architecture design for smart grids based on random matrix theory. IEEE Trans. Smart Grid 8(2), 674–686 (2017)
24. Qin, J.M., Wang, X., Li, C., Dong, H., Ge, X.: Applying big data analytics to monitor tourist flow for the scenic area operation management. Discrete Dyn. Nature Soc. 2019, 8239047 (2019)
25. Wang, K., Chen, N., Tong, D., Wang, K., Wang, W., Gong, J.: Optimizing precipitation station location: a case study of the Jinsha River Basin. Int. J. Geograph. Inf. Sci. 30(6), 1207–1227 (2016)
26. Zhao, J.C., Zhang, J.F., Feng, Y., Guo, J.X.: The study and application of the IoT technology in agriculture. In: Proceedings of the IEEE 2010 3rd International Conference on Computer Science and Information Technology, vol. 2, pp. 462–465, Chengdu, China, 9–11 July 2010
27. Ramakrishnam Raju, S.V.S., et al.: Design and implementation of smart hydroponics farming using IOT-based AI controller with mobile application system. J. Nanomater. (2022)
Chapter 12
Angiosperm Genus Classification by RBF-SVM
Shuwen Chen, Jiaji Wang, Yiyang Ni, Jiaqi Shao, Hui Qu, and Ziyi Wang
Abstract Angiosperm genus classification performance has plateaued in the last few years. This paper proposes a novel method for angiosperm genus classification based on the gray-level co-occurrence matrix and a radial basis function kernel support vector machine. Using a digital camera, we collected a dataset of 300 images: 100 of Hibiscus, 100 of Orchis, and 100 of Prunus. The results show that our method achieves an accuracy of 84.73 ± 0.41%. Overall, this method is promising for angiosperm genus classification.
S. Chen (✉) · J. Wang · Y. Ni · J. Shao · H. Qu · Z. Wang
School of Physics and Information Engineering, Jiangsu Second Normal University, Nanjing 211200, China
e-mail: [email protected]
Y. Ni
e-mail: [email protected]
S. Chen · Y. Ni
Jiangsu Province Engineering Research Center of Basic Education Big Data Application, Nanjing 211200, China
S. Chen
State Key Laboratory of Millimeter Waves, Southeast University, Nanjing 210096, China
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_12

12.1 Introduction
Nowadays, more and more researchers and developers are interested in identifying flowers [1–3]. Microsoft Research Asia (MSRA) released an app named “Flower Recognition,” which can identify flowers from a photo taken or selected on a phone without any internet connection. “Xingse” is another app for plant recognition, which can identify a plant, including flowers, in one second. There are some other systems that can recognize flowers. Most people cannot identify a flower unless they can search for it with tools like Google. A fast and accurate image classification system is therefore needed for this task. In particular, the input of such a system is an image instead of plain text.
Our study proposes a new algorithm to perform this task, which combines gray-level co-occurrence matrices (GLCM) [4, 5] and a radial basis function kernel support vector machine (RBF-SVM) [6–9]. We use the GLCM algorithm to extract the texture features of the given image. The extracted features are the input of the RBF-SVM, which returns the predicted label of the given image. This combined algorithm shows promising results. Experiments show that both accuracy and time cost are acceptable in real applications. The highlights of this study are twofold: (1) We use the combination of GLCM and RBF-SVM, and it works well. (2) We use an extension of SVM so that it can be applied to multi-class classification tasks.
12.2 Methodology
12.2.1 Gray-Level Co-Occurrence Matrices
The gray-level co-occurrence matrix (GLCM) [10–12] is a common method for describing texture by studying the spatial correlation of gray levels [13]. In our study, we find that GLCM [14] is a very useful feature extraction method for image classification tasks. A gray-level histogram is the result of statistics on every single pixel of an image [15], whereas a GLCM is obtained by counting the gray levels of pairs of pixels separated by a fixed offset in the image. Assuming we have an M × N image, we can pick two different pixels with coordinates A(x, y) and B(x + a, y + b). If the gray level of A is g1 and the gray level of B is g2, then (g1, g2) is the gray-level pair of the points A and B. By traversing every pixel of the image, we obtain all such pairs. We then regard g1 as the row index and g2 as the column index of a matrix (i.e., the GLCM). The value of each entry of this matrix equals the number of times the pair (g1, g2) appears. As an example, Fig. 12.1 shows a raw image and its GLCM, calculated with a = 1 and b = 0.
Fig. 12.1 Image and its GLCM (a = 1, b = 0)
Developers and researchers can choose different combinations of a and b, such as (1, 0), (1, 1), and (0, 1), according to the texture of the given image. In digital image processing, different filters (convolutional kernels) can be used to extract different features of an image [16]. Similarly, different combinations of a and b can be used to extract different texture features of an image. All GLCMs we obtain are useful for this task. The pseudocode for calculating the GLCM is listed in Algorithm 1.
Algorithm 1 Pseudocode of calculating GLCM
Step 1 Import the raw image
Step 2 Transform it to a gray-level image
Step 3 Calculate all (g1, g2) pairs
Step 4 Construct the GLCM
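As a minimal sketch of Steps 3 and 4 of Algorithm 1, the following Python function counts the pairs (g1, g2) for a chosen offset (a, b). It assumes that the image has already been converted to integer gray levels (Step 2), that x indexes columns and y indexes rows, and that pairs whose second pixel falls outside the image are skipped. The function name `glcm` and the 3 × 3 toy image are illustrative and are not the authors' code or data.

```python
def glcm(image, a=1, b=0, levels=None):
    """Count gray-level pairs (g1, g2) for pixels A(x, y) and B(x + a, y + b).

    `image` is a list of rows of integer gray levels; x indexes columns
    and y indexes rows (an assumption of this sketch).
    """
    height, width = len(image), len(image[0])
    if levels is None:
        levels = max(max(row) for row in image) + 1
    matrix = [[0] * levels for _ in range(levels)]
    for y in range(height):
        for x in range(width):
            xb, yb = x + a, y + b
            # Skip pairs whose second pixel lies outside the image.
            if 0 <= xb < width and 0 <= yb < height:
                g1, g2 = image[y][x], image[yb][xb]
                matrix[g1][g2] += 1  # g1 is the row index, g2 the column index
    return matrix


# Toy image with three gray levels (0, 1, 2).
image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
horizontal = glcm(image, a=1, b=0)  # right-hand neighbor
vertical = glcm(image, a=0, b=1)    # neighbor one row below
```

For the toy image, the offset (1, 0) produces the six pairs (0, 0), (0, 1), (1, 2), (2, 2), (0, 1), and (1, 2), so the entries for (0, 1) and (1, 2) are both 2. A different offset yields a different matrix, which is why several offsets can be combined into richer texture features.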
12.2.2 Radial Basis Function Kernel Support Vector Machine
The support vector machine (SVM) [17, 18] was initially a linear classifier designed for classification tasks with only two categories. In practice, however, samples are often not linearly separable. For example, Fig. 12.2a represents a dataset with two categories, black and white, which cannot be separated by a single straight line. This limitation greatly reduces the scope of application of a linear SVM. In image classification tasks, a nonlinear classifier [19, 20] is generally needed to achieve good performance. However, with appropriate modifications [21, 22], we can use SVM to solve this problem while retaining its advantages. We can map the raw features to a higher dimension, where a simple inequality is enough to classify any given sample. Figure 12.2b shows a solution: the dashed line is the decision boundary, any sample with a value less than 0 is white, and any sample with a value greater than 0 is black. In real applications, we use kernel functions to perform such mappings [23, 24], and we use the RBF here.
Fig. 12.2 Illustration of the linearly inseparable dataset: (a) linearly inseparable dataset; (b) samples separated via a nonlinear boundary
The radial basis function (RBF) [25–27] is a kind of radially symmetric function. An RBF is usually defined as a monotonic function of the Euclidean distance between any point x in the sample space and a certain center x_c. We use k(||x − x_c||) to represent the RBF here. Its effect is often local, which means that k(||x − x_c||) returns a very small value if x is very far away from x_c. The most commonly used RBF is the Gaussian kernel function shown below:
$$k(\|x - x_c\|) = \exp\left(-\frac{\|x - x_c\|^2}{2\sigma^2}\right), \tag{12.1}$$
where x is the sample, x_c is the center, and σ is the standard deviation, which controls the scope of the kernel function. The expression above is similar to the normal distribution, so it is called the Gaussian kernel [28, 29]. It maps the raw features to a higher dimension so that the SVM can find a hyperplane that serves as the decision boundary.
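The locality of the Gaussian kernel in Eq. (12.1) can be checked numerically. The sketch below is illustrative only: the function name `gaussian_rbf`, the sample points, and the choice σ = 1 are assumptions made for this example.

```python
import math


def gaussian_rbf(x, center, sigma=1.0):
    """Gaussian kernel of Eq. (12.1): exp(-||x - x_c||^2 / (2 * sigma^2))."""
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


center = [0.0, 0.0]
at_center = gaussian_rbf(center, center)  # distance 0, so the kernel equals 1
near = gaussian_rbf([0.1, 0.0], center)   # close to the center: value near 1
far = gaussian_rbf([5.0, 0.0], center)    # far from the center: value near 0
wide = gaussian_rbf([5.0, 0.0], center, sigma=5.0)  # larger sigma widens the scope
```

With σ = 1, a point at distance 5 from the center yields exp(−12.5), which is practically zero, whereas with σ = 5 the same point yields exp(−0.5). This is the sense in which σ controls the scope of the kernel.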
12.2.3 Multi-class Classification Using SVM
The dataset of our study has three categories, while the original SVM can only separate one category from another. There are several extensions of SVM for multi-class tasks.
One-Versus-Rest
If we have N categories (N is an integer greater than 2), then we train N SVMs. For the ith SVM, we mark samples of the ith category as positive and all others as negative [30, 31]. During testing, we apply the given sample to all these SVMs and obtain N results {y_1, y_2, ..., y_N}. The predicted label is k, the index of the maximum value in the result set, i.e.,
$$y_k = \max\{y_1, y_2, \ldots, y_N\}. \tag{12.2}$$
This method is very intuitive, but it takes too much time in real applications.
DAG-SVM
Platt, Cristianini and Shawe-Taylor [32] proposed the DAG-SVM algorithm, an improvement of SVM for multi-class classification tasks. Figure 12.3 shows how it works. Its workflow looks like a binary decision tree in which each node decides which branch the sample follows next. During training, it trains k(k − 1)/2 SVMs, where k is the number of categories [33, 34]. Each SVM only needs to know which branch to take. This is a simpler task, since sometimes either the left or the right branch will reach the same correct label, which means that no weight needs to be updated during training in this case [35, 36]. At test time, each sample always passes through (k − 1) SVMs to reach its label. Note that these are (k − 1) SVMs instead of k, and each SVM performs simpler calculations, so DAG-SVM is faster than the one-versus-rest method.
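The two decision rules can be sketched as follows. The sketch assumes that the binary SVMs have already been trained: `scores` stands for the N one-versus-rest outputs, and `prefers_first(i, j)` is a hypothetical stand-in for the pairwise SVM "i v j" that returns True when the sample looks more like class i than class j. Neither name comes from the original paper.

```python
def one_versus_rest(scores):
    """Return the 1-based index k of the largest SVM output, as in Eq. (12.2)."""
    return max(range(len(scores)), key=lambda i: scores[i]) + 1


def dag_svm(labels, prefers_first):
    """Walk the decision DAG: compare the first and last remaining classes
    and eliminate the loser until a single label is left."""
    candidates = list(labels)
    evaluations = 0
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        if prefers_first(first, last):
            candidates.pop()       # branch "not last"
        else:
            candidates.pop(0)      # branch "not first"
        evaluations += 1
    return candidates[0], evaluations


ovr_label = one_versus_rest([0.2, 1.5, -0.3])

# Toy pairwise classifier for a sample whose true class is 3.
true_class = 3
dag_label, used = dag_svm([1, 2, 3, 4], lambda i, j: i == true_class)
trained = 4 * (4 - 1) // 2  # number of pairwise SVMs for k = 4
```

For four classes, six pairwise SVMs are trained, but only three of them are evaluated for any single sample, which matches the (k − 1) evaluations stated above.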
Fig. 12.3 DAG-SVM classification. The root node 1v4 starts from the candidate set {1, 2, 3, 4}; each "not i" branch removes class i from the set, passing through the nodes 2v4 and 1v3 and then 3v4, 2v3, and 1v2, until a single label (4, 3, 2, or 1) remains.
12.2.4 Classification
We first extract the texture features of the given image by applying the GLCM algorithm, then vectorize the matrices obtained by GLCM, and next use this vector as the input of the RBF-SVM. In the end, the SVM returns the label. The pseudocode is listed in Algorithm 2.
Algorithm 2 Pseudocode of classification
Step 1 Import the image
Step 2 Get its GLCMs
Step 3 Vectorize the obtained GLCMs
Step 4 Make the vector the argument of the RBF-SVM
Step 5 The SVM returns the label of the imported image
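Steps 3 to 5 of Algorithm 2 can be illustrated with the standard kernel SVM decision function f(x) = Σ c_i k(x, s_i) + b, where the s_i are support vectors and each c_i combines a dual coefficient with its class sign. In the sketch below, the support vectors, coefficients, bias, and σ are made-up values standing in for a trained binary RBF-SVM; they are not parameters reported by the authors, and a multi-class system would combine several such binary decisions.

```python
import math


def vectorize(glcms):
    """Step 3: flatten a list of GLCMs (each a list of rows) into one vector."""
    return [float(value) for matrix in glcms for row in matrix for value in row]


def rbf_svm_decision(x, support_vectors, coefficients, bias, sigma):
    """Steps 4 and 5 for one binary SVM: a positive value means the positive class."""
    total = bias
    for s, c in zip(support_vectors, coefficients):
        squared_distance = sum((xi - si) ** 2 for xi, si in zip(x, s))
        total += c * math.exp(-squared_distance / (2.0 * sigma ** 2))
    return total


# One tiny 2 x 2 GLCM stands in for the matrices obtained in Step 2.
features = vectorize([[[1, 2], [0, 1]]])

# Hypothetical trained parameters, chosen only for illustration.
support_vectors = [[1.0, 2.0, 0.0, 1.0], [0.0, 0.0, 3.0, 1.0]]
coefficients = [1.0, -1.0]
score = rbf_svm_decision(features, support_vectors, coefficients, bias=0.0, sigma=1.0)
label = "positive" if score > 0 else "negative"
```

Because the feature vector coincides with the first support vector, the first kernel term equals 1 and dominates the second, so the sample falls on the positive side of the boundary.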
Fig. 12.4 Angiosperm petal images in our dataset: (a) Hibiscus, (b) Orchis, (c) Prunus
12.3 Experiment
12.3.1 Dataset
Figure 12.4 shows petal images of three different angiosperm plants. Figure 12.4a belongs to Hibiscus [37], Fig. 12.4b belongs to Orchis [38], and Fig. 12.4c belongs to Prunus [39]. We collected 100 flower images per genus under different angles and illumination conditions.
12.3.2 Cross-Validation Results
We use cross-validation to evaluate the performance of our system. Based on ten runs of fivefold cross-validation, we obtain the sensitivity for each type of flower: Hibiscus, Orchis, and Prunus. Table 12.1 lists the detection sensitivity for each fold of each run. The overall sensitivity is 85.80 ± 2.3% for Hibiscus, 82.90 ± 2.77% for Orchis, and 85.90 ± 2.01% for Prunus. Table 12.2 shows the detection accuracy over the three types of flowers; the overall accuracy is 84.73 ± 0.41%. Figure 12.5 shows the indices of the samples in the five folds.
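The protocol and its summary statistic can be sketched as follows. Two assumptions are made that the text does not state: the folds are stratified, so each fold holds 20 images per genus (consistent with the sensitivities being multiples of 5%), and the ± value is the sample standard deviation of the ten per-run overall sensitivities. The helper `stratified_folds` is hypothetical; the ten Hibiscus values are the per-run overall sensitivities from Table 12.1.

```python
import random
import statistics


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so that every class is spread evenly."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


labels = ["Hibiscus"] * 100 + ["Orchis"] * 100 + ["Prunus"] * 100
folds = stratified_folds(labels)  # one split; a new seed would give another run

# Per-run overall sensitivity for Hibiscus (unit: %), from Table 12.1.
hibiscus_overall = [84, 88, 86, 86, 82, 85, 89, 89, 85, 84]
mean = statistics.mean(hibiscus_overall)
spread = statistics.stdev(hibiscus_overall)  # sample standard deviation
summary = f"{mean:.2f} ± {spread:.2f}"
```

Under these assumptions the Hibiscus summary matches the value quoted in the text. Repeating the split with ten different seeds would correspond to the ten runs.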
Table 12.1 Detection sensitivity based on 10 × fivefold CV (unit: %, F = fold, O = overall)

Hibiscus
Run   F1    F2    F3    F4    F5    O
1     85    80    85    85    85    84
2     85    90    85    90    90    88
3     85    90    85    80    90    86
4     85    90    85    80    90    86
5     75    85    80    85    85    82
6     90    85    80    75    95    85
7     90    90    90    90    85    89
8     90    90    85    100   80    89
9     90    90    85    80    80    85
10    85    75    90    80    90    84
Overall: 85.80 ± 2.3

Orchis
Run   F1    F2    F3    F4    F5    O
1     90    65    75    90    80    80
2     85    75    85    80    85    82
3     75    85    85    90    80    83
4     85    75    85    85    85    83
5     85    85    95    75    90    86
6     90    85    85    80    80    84
7     85    80    85    85    80    83
8     70    85    80    80    70    77
9     85    85    90    80    90    86
10    80    80    90    90    85    85
Overall: 82.90 ± 2.77

Prunus
Run   F1    F2    F3    F4    F5    O
1     85    90    85    95    90    89
2     75    75    85    85    95    83
3     90    85    85    90    85    87
4     90    90    85    75    85    85
5     85    85    95    75    90    86
6     95    80    80    90    90    87
7     80    75    80    95    85    73
8     90    85    85    95    80    87
9     90    85    75    80    90    84
10    80    90    85    90    70    85
Overall: 85.90 ± 2.01
Table 12.2 Detection accuracy of flower types of the 10 × fivefold cross-validation (unit: %)

Run        Fold 1   Fold 2   Fold 3   Fold 4   Fold 5   Overall
1st run    81.67    80.00    85.00    85.00    90.00    84.33
2nd run    83.33    86.67    85.00    86.67    85.00    85.33
3rd run    86.67    85.00    85.00    80.00    86.67    84.67
4th run    81.67    85.00    86.67    83.33    86.67    84.67
5th run    91.67    83.33    81.67    81.67    88.33    85.33
6th run    85.00    81.67    85.00    90.00    83.33    85.00
7th run    83.33    86.67    83.33    91.67    76.67    84.33
8th run    88.33    86.67    83.33    80.00    86.67    85.00
9th run    81.67    81.61    88.33    86.67    83.33    84.33
10th run   83.33    81.67    81.67    88.33    91.67    85.33
Overall: 84.73 ± 0.41
Fig. 12.5 CV index of the flower detection based on the fivefold cross-validation
12.4 Conclusion
We proposed a novel angiosperm genus classification method based on GLCM and RBF-SVM. GLCM was used to extract texture features from the input image, and RBF-SVM was used to classify the image from the extracted features. This method achieved an accuracy of 84.73 ± 0.41%. In the future, we will try to train this system on larger datasets. We will also try to use this method to identify other angiosperm genera.
References
1. Botirov, A., An, S., Arakawa, O., Zhang, S.: Application of a visible/near-infrared spectrometer in identifying flower and non-flower buds on ‘Fuji’ apple trees. Indian J. Agric. Res. 56(2), 214–219 (2022)
2. Teixeira-Costa, L., Heberling, J.M., Wilson, C.A., Davis, C.C.: Parasitic flowering plant collections embody the extended specimen. Methods Ecol. Evol. 14(2), 319–331 (2023)
3. Veerendra, G., Swaroop, R., Dattu, D., Jyothi, C.A., Singh, M.K.: Detecting plant diseases, quantifying and classifying digital image processing techniques. Mater. Today Proc. 51, 837–841 (2022)
4. Davidovic, L.M., Cumic, J., Dugalic, S., Vicentic, S., Sevarac, Z., et al.: Gray-level co-occurrence matrix analysis for the detection of discrete, ethanol-induced, structural changes in cell nuclei: an artificial intelligence approach. Microsc. Microanal. 28(1), 265–271 (2022)
5. Saihood, A., Karshenas, H., Nilchi, A.R.N.: Deep fusion of gray level co-occurrence matrices for lung nodule classification. PLoS ONE 17(9), e0274516 (2022)
6. Borman, R.I., Ahmad, I., Rahmanto, Y.: Klasifikasi Citra Tanaman Perdu Liar Berkhasiat Obat Menggunakan Jaringan Syaraf Tiruan radial basis function. Bull. Inform. Data Sci. 1(1), 6–13 (2022)
7. Su, H., Zhao, D., Yu, F., Heidari, A.A., Zhang, Y., et al.: Horizontal and vertical search artificial bee colony for image segmentation of COVID-19 X-ray images. Comput. Biol. Med. 142, 105181 (2022)
8. Tanveer, M., Rajani, T., Rastogi, R., Shao, Y.-H., Ganaie, M.: Comprehensive review on twin support vector machines. Ann. Oper. Res. 1–46 (2022)
9. Sabanci, K., Aslan, M.F., Ropelewska, E., Unlersen, M.F.: A convolutional neural network-based comparative study for pepper seed classification: Analysis of selected deep features with support vector machine. J. Food Process Eng 45(6), e13955 (2022)
10. Christaki, M., Vasilakos, C., Papadopoulou, E.-E., Tataris, G., Siarkos, I., et al.: Building change detection based on a gray-level co-occurrence matrix and artificial neural networks. Drones 6(12), 414 (2022)
11. Pantic, I., Cumic, J., Dugalic, S., Petroianu, G.A., Corridon, P.R.: Gray level co-occurrence matrix and wavelet analyses reveal discrete changes in proximal tubule cell nuclei after mild acute kidney injury. Sci. Rep. 13(1), 4025 (2023)
12. Wang, H., Li, S., Qiu, H., Lu, Z., Wei, Y., et al.: Development of a fast convergence gray-level co-occurrence matrix for sea surface wind direction extraction from marine radar images. Remote Sens. 15(8), 2078 (2023)
13. Kisa, D.H., Ozdemir, M.A., Guren, O., Akan, A., IEEE.: Classification of hand gestures using sEMG signals and Hilbert-Huang transform. In: 30th European Signal Processing Conference (EUSIPCO). Belgrade, SERBIA (2022)
14. Zhang, Y.-D.: Secondary pulmonary tuberculosis recognition by 4-direction varying-distance GLCM and fuzzy SVM. Mob. Netw. Appl. (2022). https://doi.org/10.1007/s11036-021-01901-7
15. Kaduhm, H.S., Abduljabbar, H.M.: Studying the classification of texture images by K-means of co-occurrence matrix and confusion matrix. Ibn AL-Haitham J. Pure Appl. Sci. 36(1), 113–122 (2023)
16. Taye, M.M.: Theoretical understanding of convolutional neural network: concepts, architectures, applications, future directions. Computation 11(3), 52 (2023)
17. Halder, S., Das, S., Basu, S.: Use of support vector machine and cellular automata methods to evaluate impact of irrigation project on LULC. Environ. Monit. Assess. 195(1), 3 (2023)
18. Gordon, D., Norouzi, A., Blomeyer, G., Bedei, J., Aliramezani, M., et al.: Support vector machine based emissions modeling using particle swarm optimization for homogeneous charge compression ignition engine. Int. J. Engine Res. 24(2), 536–551 (2023)
19. Alshikho, M., Jdid, M., Broumi, S.: A study of a support vector machine algorithm with an orthogonal Legendre kernel according to neutrosophic logic and inverse Lagrangian interpolation. J. Neutrosophic Fuzzy Syst. (JNFS) 5(01), 41–51 (2023)
20. Tembhurne, J.V., Gajbhiye, S.M., Gannarpwar, V.R., Khandait, H.R., Goydani, P.R., et al.: Plant disease detection using deep learning based mobile application. Multimedia Tools Appl. 1–26 (2023)
21. Phillips, P.: Detection of Alzheimer’s disease and mild cognitive impairment based on structural volumetric MR images using 3D-DWT and WTA-KSVM trained by PSOTVAC. Biomed. Signal Process. Control 21, 58–73 (2015)
22. Wang, S.: Detection of dendritic spines using wavelet packet entropy and fuzzy support vector machine. CNS & Neurol. Disorders Drug Targets 16(2), 116–121 (2017)
23. Lu, H.M.: Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation. IEEE Access 4, 8375–8385 (2016)
24. Gorriz, J.M., Ramírez, J.: Wavelet entropy and directed acyclic graph support vector machine for detection of patients with unilateral hearing loss in MRI scanning. Front. Comput. Neurosc. 10 (2016)
25. Tayari, E., Torkzadeh, L., Domiri Ganji, D., Nouri, K.: Investigation of hybrid nanofluid SWCNT–MWCNT with the collocation method based on radial basis functions. Euro. Phys. J. Plus 138(1), 3 (2023)
26. Rashidi, M., Alhuyi Nazari, M., Mahariq, I., Ali, N.: Modeling and sensitivity analysis of thermal conductivity of ethylene glycol-water based nanofluids with alumina nanoparticles. Experi. Techn. 47(1), 83–90 (2023)
27. Jalili, R., Neisy, A., Vahidi, A.: Multiquadratic-radial basis functions method for mortgage valuation under jump-diffusion model. Int. J. Fin. Manage. Account. 8(29), 211–219 (2023)
28. Noori, H.: Gradient-Controled Gaussian Kernel for image Inpainting. AUT J. Electr. Eng. 55(1), 2 (2023)
29. González, B., Negrín, E.: Operators with complex Gaussian kernels: asymptotic behaviours. Filomat 37(3), 833–838 (2023)
30. Zhang, Y.: Comparison of machine learning methods for stationary wavelet entropy-based multiple sclerosis detection: decision tree, k-nearest neighbors, and support vector machine. SIMULATION 92(9), 861–871 (2016)
31. Wang, S.: Morphological analysis of dendrites and spines by hybridization of ridge detection with twin support vector machine. PeerJ 4 (2016)
32. Platt, J.C., Cristianini, N., Shawe-Taylor, J.: Large margin DAGs for multiclass classification. In: 13th Annual Conference on Neural Information Processing Systems (NIPS). Co.
33. Wang, S.: Dual-tree complex wavelet transform and twin support vector machine for pathological brain detection. Appl. Sci. 6(6) (2016)
34. Anupong, W., Jweeg, M.J., Alani, S., Al-Kharsan, I.H., Alviz-Meza, A., et al.: Comparison of wavelet artificial neural network, wavelet support vector machine, and adaptive neuro-fuzzy inference system methods in estimating total solar radiation in Iraq. Energies 16(2) (2023)
35. Zhang, Y.: Magnetic resonance brain image classification based on weighted-type fractional Fourier transform and nonparallel support vector machine. Int. J. Imaging Syst. Technol. 25(4), 317–327 (2015)
36. Shi, C.Y., Yin, X.X., Chen, R., Zhong, R.X., Sun, P., et al.: Prediction of end-point LF refining furnace based on wavelet transform based weighted optimized twin support vector machine algorithm. Metall. Res. Technol. 120(1) (2023)
37. Chen, J., Ye, H., Wang, J., Zhang, L.: Relationship between anthocyanin composition and floral color of Hibiscus syriacus. Horticulturae 9(1), 48 (2023)
38. Kropf, M., Kriechbaum, M.: Monitoring of Dactylorhiza sambucina (L.) Soó (Orchidaceae)—Variation in flowering, flower colour morph frequencies, and erratic population census trends. Diversity 15(2), 179 (2023)
39. Wang, L., Song, J., Han, X., Yu, Y., Wu, Q., et al.: Functional divergence analysis of AGL6 genes in Prunus mume. Plants 12(1), 158 (2023)
Chapter 13
Signage Detection Based on Adaptive SIFT
Jiaji Wang, Shuwen Chen, Jiaqi Shao, Hui Qu, and Ziyi Wang
Abstract In navigation and wayfinding applications, signage is crucial for finding destinations. This paper proposes a new method for detecting signage, with the aim of helping blind individuals navigate unfamiliar indoor environments. To enhance the accuracy of signage detection and eliminate interference, the proposed method first obtains the attended areas through a saliency map. The signage is then identified in these attended areas using a SIFT algorithm with an adaptive threshold. This approach is capable of detecting and recognizing multiple signs. Finally, the detection and recognition results are conveyed to users through audio. Our experimental results, based on indoor signage datasets that we collected, demonstrate the efficiency of the proposed method. On average, it takes 0.104 s to process the data.
J. Wang · S. Chen (✉) · J. Shao · H. Qu · Z. Wang
School of Physics and Information Engineering, Jiangsu Second Normal University, Nanjing 211200, China
e-mail: [email protected]
S. Chen
Jiangsu Province Engineering Research Center of Basic Education Big Data Application, Nanjing 211200, China
State Key Laboratory of Millimeter Waves, Southeast University, Nanjing 210096, China
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_13

13.1 Introduction
As of 2020, an estimated 43.3 million people worldwide were blind. Globally, the prevalence of moderate to severe visual impairment among adults aged 50 or above increased slightly from 1990 to 2020 [1]. Individuals with severe vision impairment face significant challenges when it comes to independent travel, particularly in unfamiliar environments, which can negatively impact their quality of life and safety. The City College of New York City’s multimedia institute conducted a survey of blind users, which identified the detection and recognition of signage as a high priority for providing assistance in wayfinding and navigation, emphasizing the critical role it plays in improving the quality of life and safety for visually impaired individuals [2]. In response to this challenge, this paper presents a new framework that detects and recognizes restroom signage in images captured by a wearable camera. The framework is intended to assist individuals with visual impairments in navigating unfamiliar environments with greater independence by providing reliable wayfinding and navigation assistance.
Based on a survey of existing assistive technologies, we found that many technologies have been applied to assist individuals who experience profound visual impairment [3, 4]. The voice vision technology described in [5] provided advanced image-to-sound renderings, captured with a live camera, that were specifically designed for individuals with complete blindness. The algorithm proposed in [6] was designed to recognize landmarks that are appropriately positioned on sidewalks. It used Peano–Hilbert space-filling curves to reduce the dimensionality of the image data, which resulted in a highly efficient and fast recognition method. Scholars have also developed computer vision-based technologies [7, 8] to assist people with severe vision impairment, such as banknote recognition [9], navigation and wayfinding [10], pattern matching and recognition [11], and text extraction [12]. To help the visually impaired, Ref. [13] developed a system based on the mobile phone. The interface of the system was user-friendly and straightforward, allowing for real-time local and global navigation assistance. The authors presented their progress concerning local navigation: the detection of doors [14, 15] in corridors was combined with path and obstacle detection [16, 17] beyond the range of the cane, resulting in an enhanced navigation system.
However, despite these efforts, how to apply such vision technologies [18, 19] to help people with severe vision impairment understand their environment remains an open question. The method proposed in this paper detects and recognizes restroom signage in images in two steps: (1) We employ the saliency map to eliminate impossible locations, which improves detection and recognition accuracy and reduces computing time. (2) A SIFT algorithm based on an adaptive contrast threshold is used to extract feature points and match the template and test images in order to recognize specific signage denoting “Men,” “Women,” or “Disabled” restrooms. SIFT features are robust to basic geometric transformations of images. Furthermore, they are partially invariant to lighting changes and 3D affine transformation, which contributes to their robustness and reliability [20]. However, the number of interest points detected by the SIFT algorithm decreases as the intensity distribution becomes more concentrated, and robustness drops sharply because the SIFT algorithm discards extreme points whose contrast is below a threshold of 0.3. This fixed threshold is not suitable for all situations [21]. We therefore use an improved SIFT matching algorithm based on an adaptive threshold.
13 Signage Detection Based on Adaptive SIFT
Fig. 13.1 Flowchart of the proposed method
13.2 Method
13.2.1 Overview
The proposed restroom signage recognition comprises two main stages: removal of impossible locations and SIFT matching based on an adaptive contrast threshold. Figure 13.1 shows the recognition process. In the first stage, saliency is used to remove impossible areas. The remaining attended area is passed to the second stage, where SIFT matching based on the adaptive contrast threshold yields the matched points. The detection and recognition results are then determined from the number of matched points.
13.2.2 Build SM
Saliency maps (SMs) quantify the saliency of each location in the field of view as a scalar value and guide the selection of positions of interest according to the spatial distribution of saliency [22]. As with the center-surround representations of primary visual features, bottom-up saliency depends on the difference in characteristics between a stimulus and its surroundings, across various sub-modalities and scales. As shown in Fig. 13.2, the saliency of a particular location is largely determined by the contrast between that position and its neighboring areas in terms of various features. A global measure of conspicuity is created by combining the normalized information from individual feature maps to generate the saliency map. The detailed procedures for generating the saliency map are discussed in the following sections.

Fig. 13.2 Architecture of building SMs

Initialization: SMs are computed for each pixel using linear “center-surround” operations that resemble visual receptive fields. To extract features with the center-surround method [23], the difference between fine and coarser scales is computed. At scales $v_c \in \{2, 3, 4\}$, the center is a pixel, while the surround is a pixel at a scale

$$ v_s = v_c + \delta, \tag{13.1} $$

with $\delta \in \{3, 4\}$. To calculate the difference between two maps at different scales, the coarser map is interpolated to the finer scale and then point-by-point subtraction is performed [24]. Using multiple scales for both $v_c$ and $\delta = v_s - v_c$ yields truly multi-scale feature extraction, which covers various size ratios between the center and surround regions.

Saliency Map for Intensity: A grayscale image $I$ is obtained as

$$ I = \frac{v_r + v_g + v_b}{3}, \tag{13.2} $$

where $v_r$, $v_g$, and $v_b$ are the red, green, and blue channels of the input image. A Gaussian pyramid $I(\sigma)$ is generated from $I$ at scales $\sigma \in \{0, 1, \ldots, 8\}$. The color channels are normalized by $I$ to separate hue from intensity, but this normalization is not performed at locations with very low luminance, where hue variation is not perceptible and therefore not salient [25]. The normalization process
is limited to positions where $I$ exceeds 10% of its maximum over the whole image, while the other positions are set to 0, following a previous study [26]. Four broadly tuned color channels are generated, namely $R$ (red), $G$ (green), $B$ (blue), and $Y$ (yellow):

$$
\begin{cases}
R = v_r - \dfrac{v_g + v_b}{2} \\[4pt]
G = v_g - \dfrac{v_r + v_b}{2} \\[4pt]
B = v_b - \dfrac{v_g + v_r}{2} \\[4pt]
Y = \dfrac{1}{2}\left(v_g + v_r - |v_r - v_g|\right) - v_b
\end{cases}
\tag{13.3}
$$
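The channel construction in (13.2) and (13.3) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' MATLAB code: the function name `broad_color_channels` is hypothetical, the input is assumed to be a float RGB array in [0, 1], and the yellow channel is taken in the form $Y = (v_r + v_g)/2 - |v_r - v_g|/2 - v_b$, which is an assumed reading of the printed formula.

```python
import numpy as np

def broad_color_channels(rgb):
    """rgb: float array of shape (H, W, 3) with values in [0, 1] (assumed layout)."""
    v_r, v_g, v_b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (v_r + v_g + v_b) / 3.0                     # Eq. (13.2)

    # Normalize the color channels by intensity only where I exceeds 10% of
    # its maximum over the image; all other positions are set to zero.
    mask = intensity > 0.1 * intensity.max()
    safe_i = np.where(mask, intensity, 1.0)                 # avoids division by zero
    v_r, v_g, v_b = (np.where(mask, c / safe_i, 0.0) for c in (v_r, v_g, v_b))

    # Eq. (13.3); negative responses are truncated to zero.
    red = np.maximum(v_r - (v_g + v_b) / 2.0, 0.0)
    green = np.maximum(v_g - (v_r + v_b) / 2.0, 0.0)
    blue = np.maximum(v_b - (v_g + v_r) / 2.0, 0.0)
    yellow = np.maximum((v_g + v_r - np.abs(v_r - v_g)) / 2.0 - v_b, 0.0)
    return intensity, red, green, blue, yellow
```

With this reading, a saturated red pixel responds only in the red channel, a saturated yellow pixel responds most strongly in the yellow channel, and very dark pixels are suppressed in all four channels.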
For the channel values obtained from (13.3), any negative values are truncated to zero. The color channels are used to construct Gaussian pyramids denoted as $R(\sigma)$, $G(\sigma)$, $B(\sigma)$, and $Y(\sigma)$. The feature maps are obtained by computing the center-surround difference between the “center” fine scale $v_c$ and the “surround” coarser scale $v_s$. The first group of feature maps concerns intensity contrast, which is detected by neurons responsive to either bright centers on dark surrounds or dark centers on bright surrounds. The algorithm computes both types of sensitivity simultaneously, using rectification, in six maps $I(v_c, v_s)$:

$$ I(v_c, v_s) = |I(v_c) - I(v_s)|. \tag{13.4} $$
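The six intensity feature maps of (13.4) can be sketched as follows. The sketch makes several simplifying assumptions that are not from the paper: the pyramid is built by 2 × 2 block averaging instead of Gaussian filtering, the coarse map is brought to the finer scale by nearest-neighbour interpolation, and the input is at least 256 × 256 pixels so that scale 8 is not empty. All function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by 2x2 block averaging (stand-in for Gaussian filtering)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def pyramid(img, levels=9):
    """Scales sigma = 0..8; scale 0 is the input image."""
    out = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out

def upsample_to(img, shape):
    """Nearest-neighbour interpolation of a coarse map to a finer shape."""
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]

def intensity_feature_maps(intensity):
    """Return the six maps I(v_c, v_s) for v_c in {2,3,4} and delta in {3,4}."""
    pyr = pyramid(intensity)
    maps = {}
    for v_c in (2, 3, 4):
        for delta in (3, 4):
            v_s = v_c + delta                                 # Eq. (13.1)
            surround = upsample_to(pyr[v_s], pyr[v_c].shape)
            maps[(v_c, v_s)] = np.abs(pyr[v_c] - surround)    # Eq. (13.4)
    return maps
```

For a 256 × 256 input, `intensity_feature_maps` returns maps keyed by `(v_c, v_s)`, each at the resolution of its center scale. A uniform image gives all-zero maps, as expected for a contrast measure.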
Color-based SM: The second group of SMs is constructed in a similar way for the color channels. The human primary visual cortex has R/G, G/R, B/Y, and Y/B color-opponent pairs. Accordingly, the maps $RG(v_c, v_s)$ in (13.5) capture both R/G and G/R double opponency, while the maps $BY(v_c, v_s)$ in (13.6) account for B/Y and Y/B double opponency:

$$ RG(v_c, v_s) = |(R(v_c) - G(v_c)) \ominus (G(v_s) - R(v_s))|, \tag{13.5} $$

$$ BY(v_c, v_s) = |(B(v_c) - Y(v_c)) \ominus (Y(v_s) - B(v_s))|, \tag{13.6} $$

where $\ominus$ denotes the across-scale difference, i.e., interpolation to the finer scale followed by point-by-point subtraction.
Orientation-based SMs: By building oriented Gabor pyramids, local orientation information is obtained from image $I$:

$$ O(\sigma, \theta), \quad \sigma \in \{0, 1, \ldots, 8\}, \quad \theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}, \tag{13.7} $$

where $\sigma$ represents the scale and $\theta$ the preferred orientation. Gabor filters are designed to imitate the response profile of orientation-sensitive neurons in the primary visual cortex. They accomplish this by combining a two-dimensional Gaussian envelope with a cosine grating [27, 28]. The orientation feature maps $O(v_c, v_s, \theta)$ encode the local orientation contrast between the center and surround scales:

$$ O(v_c, v_s, \theta) = |O(v_c, \theta) - O(v_s, \theta)|. \tag{13.8} $$
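A Gabor kernel of the kind described above, a two-dimensional Gaussian envelope multiplied by a cosine grating, can be sketched as follows. The kernel size, the envelope width `sigma`, the grating `wavelength`, and the isotropic envelope are illustrative assumptions; the paper does not state its filter parameters.

```python
import numpy as np

def gabor_kernel(theta_deg, size=9, sigma=2.0, wavelength=4.0):
    """Real-valued Gabor kernel: isotropic Gaussian envelope times a cosine grating."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    theta = np.deg2rad(theta_deg)
    # Coordinate along the direction in which the grating oscillates.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    grating = np.cos(2.0 * np.pi * x_rot / wavelength)
    return envelope * grating

# One kernel per preferred orientation in Eq. (13.7).
kernels = {theta: gabor_kernel(theta) for theta in (0, 45, 90, 135)}
```

Filtering each pyramid level with these four kernels would give $O(\sigma, \theta)$; the orientation feature maps in (13.8) then follow from the same center-surround differencing used for intensity.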
In total, the method generates 42 feature maps that capture various visual properties: 6 maps from intensity, 12 maps from color, and 24 maps from orientation.

Combination of SMs: Combining different feature maps is challenging because they represent distinct modalities with different dynamic ranges and extraction mechanisms, so they are not comparable a priori. Because saliency detection lacks top-down supervision, a map normalization operator denoted by $N(\cdot)$ is used. This operator promotes maps that have a few strong peaks of activity and suppresses those with numerous comparable peak responses. $N(\cdot)$ has three components, which are listed in Table 13.1. The feature maps are aggregated into three “conspicuity maps” at the scale $\sigma = 4$ of the saliency map: $I$ for intensity (13.9), $C$ for color (13.10), and $O$ for orientation (13.11), through across-scale addition $\oplus$. This involves reducing each map to scale four and performing a point-by-point addition to obtain the final conspicuity maps.

$$ I = \bigoplus_{v_c=2}^{4} \bigoplus_{v_s=v_c+3}^{v_c+4} N(I(v_c, v_s)), \tag{13.9} $$

$$ C = \bigoplus_{v_c=2}^{4} \bigoplus_{v_s=v_c+3}^{v_c+4} \left[ N(RG(v_c, v_s)) + N(BY(v_c, v_s)) \right]. \tag{13.10} $$

To create the orientation conspicuity map, four intermediary maps are formed by combining the six maps for a given $\theta$, and these maps are then combined into a single map:

$$ O = \sum_{\theta \in \{0^\circ, 45^\circ, 90^\circ, 135^\circ\}} \bigoplus_{v_c=2}^{4} \bigoplus_{v_s=v_c+3}^{v_c+4} N(O(v_c, v_s, \theta)). \tag{13.11} $$
The hypothesis behind creating three separate channels ($I$, $C$, $O$) is that similar features compete strongly for saliency [29], whereas features from different modalities contribute independently to the saliency map [30]. Once the normalization is complete, the three channels are combined and passed to the saliency map as the final input $S$:

$$ S = \frac{1}{3}\left( N(I) + N(C) + N(O) \right). \tag{13.12} $$

Table 13.1 Three parts in N(.)
Part  Action
I     Normalize the values in the map to a fixed range [0, M]
II    Find the global maximum M of the map and compute the mean m of all its other local maxima
III   Globally multiply the map by (M − m)^2

Fig. 13.3 (a) Original figure and (b) attended area
In Fig. 13.3, (a) is the original image obtained by the camera and (b) is the attended area obtained from the SM. The regions marked with white pixels are passed as the attended area to SIFT matching.
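The three parts of the normalization operator in Table 13.1 and the combination in (13.12) can be sketched as follows. This is a simplified illustration: local maxima are detected by strict comparison with the eight neighbours (so plateaus are ignored), the three conspicuity maps are assumed to be already at the same resolution, and the rule used to turn $S$ into a binary attended area (`ratio` times the maximum) is a hypothetical choice, since the paper does not state its threshold.

```python
import numpy as np

def local_maxima(m):
    """Boolean mask of pixels strictly greater than all 8 neighbours."""
    padded = np.pad(m, 1, mode="constant", constant_values=-np.inf)
    h, w = m.shape
    is_max = np.ones(m.shape, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            is_max &= m > neighbour
    return is_max

def normalize_map(m, M=1.0):
    """Map normalization operator N(.) following Table 13.1."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        return np.zeros_like(m)
    m = (m - m.min()) / span * M                   # Part I: rescale to [0, M]
    peak_values = np.sort(m[local_maxima(m)])
    others = peak_values[:-1]                      # Part II: all maxima but one global maximum
    m_bar = others.mean() if others.size else 0.0
    return m * (M - m_bar) ** 2                    # Part III: global multiplication

def saliency(i_map, c_map, o_map):
    """Eq. (13.12): average of the normalized conspicuity maps."""
    return (normalize_map(i_map) + normalize_map(c_map) + normalize_map(o_map)) / 3.0

def attended_area(s, ratio=0.5):
    """Hypothetical binarization: keep pixels above ratio * max(S)."""
    return s > ratio * s.max()
```

A map with a single strong peak passes through `normalize_map` unchanged in shape, whereas a map with two equally strong peaks is suppressed to zero, which is the promotion and suppression behaviour the operator is meant to provide.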
13.2.3 Signage Identification Based on SIFT Matching
SIFT Feature Extraction and Representation: The number of feature points extracted by the SIFT algorithm decreases as the intensity distribution becomes more concentrated, and robustness drops sharply [31]. This is because the SIFT algorithm discards, as non-critical points, the extreme points whose contrast is below a threshold of 0.3, which is not suitable for all situations. For images with a highly concentrated intensity distribution, this threshold is too high to retain all the critical feature points [32]. Therefore, we use a SIFT matching algorithm based on an adaptive threshold.

To begin, key points are extracted from a collection of reference images and stored in a database for future use. This is done by searching for potential feature points across different scales and locations in the image using a difference-of-Gaussian (DoG) function pyramid [33]. The DoG function [34] approximates the scale-normalized Laplacian-of-Gaussian and allows the most stable image features to be identified based on an adaptive contrast threshold. Once the interest points are detected, a feature descriptor is generated for each of them from the magnitudes and orientations of the image gradients within a 16 × 16 neighborhood region [35]. The region is centered on the interest point and is scaled and rotated according to the scale and orientation of the interest point. The region is then divided into 16 sub-regions of 4 × 4 pixels each. Within each sub-region, SIFT accumulates the pixel gradients into an orientation histogram with 8 bins; the resulting 4 × 4 array of histograms captures the approximate spatial structure of the neighborhood. Concatenating the 16 histograms of 8 bins each gives the 128-dimensional feature vector that serves as the descriptor of each interest point.

Signage Recognition by SIFT Matching: (1) According to the intensity distribution of the image, we calculate the contrast threshold for the SIFT matching algorithm [36], extract the SIFT features, and create the SIFT descriptors:

$$
\begin{cases}
\mathrm{norm\_I\_entropy} = \dfrac{-\sum_{i=0}^{255} P_i \log_2 P_i}{\log_2 L} \\[6pt]
x = \mathrm{norm\_I\_entropy} \times \dfrac{1}{s-3} \\[6pt]
\mathrm{contrast\_threshold} =
\begin{cases}
\dfrac{x}{20 \times [1 - (s-3)x] + \frac{100}{9}}, & x \ge 0.194 \\[6pt]
0.01, & x < 0.194
\end{cases}
\end{cases}
\tag{13.13}
$$

where $P_i$ is the proportion of pixels with gray level $i$ in the whole image, $L$ is the number of gray levels, and $s$ is the number of stages of the Gaussian pyramid in each SIFT octave. (2) We use the nearest Euclidean distance [37, 38] as the condition for discarding wrong matches. The similarity between an area and the signage is assessed by the number of matches between the signage template image and the area of interest. Finally, the location with the maximum number of feature matches [39] is selected as the most likely signage recognition result among all candidates, as shown in Fig. 13.4.
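Steps (1) and (2) can be sketched as follows. The threshold function follows (13.13) as read here, with $L = 256$ gray levels and $s = 6$ Gaussian stages per octave taken as assumed defaults. The matching function uses nearest-neighbour Euclidean distance; the distance-ratio test used to reject ambiguous matches is an assumption borrowed from common SIFT practice, since the text names only the nearest Euclidean distance. The function names and the `ratio` parameter are hypothetical.

```python
import numpy as np

def adaptive_contrast_threshold(gray, s=6, levels=256):
    """Contrast threshold from the normalized intensity entropy, Eq. (13.13).

    gray: integer-valued image with gray levels in [0, levels).
    """
    hist = np.bincount(gray.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()
    nz = p[p > 0]                                   # 0 * log(0) is treated as 0
    entropy = -(nz * np.log2(nz)).sum() / np.log2(levels)
    x = entropy / (s - 3)
    if x < 0.194:
        return 0.01
    return x / (20.0 * (1.0 - (s - 3) * x) + 100.0 / 9.0)

def count_matches(template_desc, region_desc, ratio=0.8):
    """Count template descriptors with an unambiguous nearest neighbour in the region."""
    if len(template_desc) == 0 or len(region_desc) < 2:
        return 0
    diff = template_desc[:, None, :] - region_desc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))         # pairwise Euclidean distances
    nearest_two = np.sort(dist, axis=1)[:, :2]
    return int((nearest_two[:, 0] < ratio * nearest_two[:, 1]).sum())

def best_region(template_desc, regions):
    """Index of the candidate region with the most matches to the template."""
    counts = [count_matches(template_desc, r) for r in regions]
    return int(np.argmax(counts))
```

Under these assumptions, an image whose histogram is perfectly uniform has normalized entropy 1 and receives a threshold of 0.03, while an image with a strongly concentrated histogram falls into the second branch and receives the lower threshold of 0.01, so more low-contrast extrema are kept.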
Fig. 13.4 SIFT feature points of signage template

13.3 Experimental Results
The program runs on a computer with the Windows XP operating system, 2 GB of memory, and MATLAB. We created a database of 100 images, of which 50 contain the “Women (W),” “Men (M),” and “Disabled (D)” restroom signage patterns and 50 are negative images without any restroom signage. The 50 signage images contain 50 “Men,” 48 “Women,” and 10 “Disabled” signs, 108 in total. As Fig. 13.5 shows, the collected database comprises images that vary in illumination, scale, rotation, camera view, and perspective projection. The results indicate that the proposed method can effectively handle such variations in restroom signage. We assess the recognition accuracy of the proposed method and report the results in Table 13.2. The algorithm achieves a detection and recognition rate of 87.96%, correctly detecting and recognizing 95 of the 108 restroom signs in our dataset. The accuracy for negative images is 100%. Figure 13.5 presents several examples of restroom signs detected in various surroundings.
Fig. 13.5 Examples of signage detection and recognition from our database
Table 13.2 Identification accuracy of restroom signs
     M(50)  W(48)  D(10)  Negative
M    43     1      0      0
W    5      42     0      0
D    0      0      9      0
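Per-class recognition rates can be read off Table 13.2 with a few lines of NumPy. The sketch assumes that the columns are the ground-truth classes (with the number of signs in parentheses) and the rows are the recognized classes; this orientation is an interpretation of the table header, not something the text states.

```python
import numpy as np

# Rows: recognized as M, W, D. Columns: ground truth M(50), W(48), D(10), Negative.
table = np.array([[43, 1, 0, 0],
                  [5, 42, 0, 0],
                  [0, 0, 9, 0]])
totals = np.array([50, 48, 10])            # number of signs per ground-truth class

correct = np.diag(table[:, :3])            # correctly recognized signs per class
rates = correct / totals                   # per-class recognition rate
missed = totals - table[:, :3].sum(axis=0) # signs assigned to no class at all

for name, rate in zip("MWD", rates):
    print(f"{name}: {rate:.1%}")
```

Under this reading, the per-class rates are 86.0% for “Men,” 87.5% for “Women,” and 90.0% for “Disabled,” and the zero “Negative” column reflects the absence of false detections on negative images.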
13.4 Conclusion
We have proposed a method that uses SM and SIFT features based on an adaptive contrast threshold to detect and recognize restroom signage, with the goal of helping blind individuals [40, 41] navigate unfamiliar environments independently. Our method can handle variations in scale, camera view, perspective projection, and rotation. Experimental results show that the proposed method is effective and efficient in detecting and recognizing restroom signage, with an accuracy of 87.96% (95 of the 108 signs in our dataset). The proposed method also achieves 100% accuracy on negative images. The contributions of this paper are as follows: (1) SM is employed to reduce the computing time. (2) The SIFT algorithm is employed to compute the matching confidence between the templates and the test images. (3) The combination of SM and SIFT improves the detection accuracy while reducing the time cost. In the future, we shall test the performance of combining our method with transfer learning [42] and fusion methods [43].
References
1. Bourne, R., Steinmetz, J.D., Flaxman, S., Briant, P.S., Taylor, H.R., Resnikoff, S., Casson, R.J., Abdoli, A., Abu-Gharbieh, E., Afshin, A., Ahmadieh, H.: Trends in prevalence of blindness and distance and near vision impairment over 30 years: an analysis for the global burden of disease study. Lancet Glob. Health 9(2), e130–e143 (2021)
2. Tian, Y.L., Yang, X.D., Yi, C.C., Arditi, A.: Toward a computer vision-based wayfinding aid for blind persons to access unfamiliar indoor environments. Mach. Vis. Appl. 24(3), 521–535 (2013)
3. Armstrong, N.M., Teixeira, C.V.L., Gendron, C., Brenowitz, W.D., Lin, F.R., et al.: Associations of dual sensory impairment with long-term depressive and anxiety symptoms in the United States. J. Affect. Disord. 317, 114–122 (2022)
4. Lottridge, D., Yoon, C., Burton, D., Wang, C., Kaye, J.: Ally: understanding text messaging to build a better onscreen keyboard for blind people. ACM Trans. Accessible Comput. 15(4) (2022)
5. Wang, J., Li, J.Y., Shi, X.T.: Integrated design system of voice-visual VR based on multidimensional information analysis. Int. J. Speech Technol. 24(1), 1–8 (2021)
6. Costa, P., Fernandes, H., Vasconcelos, V., Coelho, P., Barroso, J., et al.: Landmarks detection to assist the navigation of visually impaired people. Lect. Notes Comput. Sci. 6763, 293–300 (2011)
7. Wang, S.-H., Khan, M.A.: WACPN: a neural network for pneumonia diagnosis. Comput. Syst. Sci. Eng. 45(1), 21–34 (2023)
8. Wang, S.-H., Fernandes, S.: AVNC: attention-based VGG-style network for COVID-19 diagnosis by CBAM. IEEE Sensors J. (2021). https://doi.org/10.1109/JSEN.2021.3062442
9. Aseffa, D.T., Kalla, H., Mishra, S.: Ethiopian banknote recognition using convolutional neural network and its prototype development using embedded platform. J. Sensors 2022 (2022)
10. Deng, L.J., Romainoor, N.H.: A bibliometric analysis of published literature on healthcare facilities’ wayfinding research from 1974 to 2020. Heliyon 8(9) (2022)
11. Wang, S., Tian, Y.: Indoor signage detection based on saliency map and Bipartite Graph matching. In: IEEE International Conference on Bioinformatics and Biomedicine Workshops, Atlanta, USA, pp. 518–525 (2011)
12. Guo, W.W.: Correlation between the dissemination of classic English literary works and cultural cognition in the new media era. Adv. Multimedia 2022 (2022)
13. Moreno, M., Shahrabadi, S., Jose, J., du Buf, J.M.H., Rodrigues, J.M.F.: Realtime local navigation for the blind: detection of lateral doors and sound interface. Procedia Comput. Sci. 14, 1–10 (2012)
14. Ning, H., Li, Z.L., Ye, X.Y., Wang, S.H., Wang, W.B., et al.: Exploring the vertical dimension of street view image based on deep learning: a case study on lowest floor elevation estimation. Int. J. Geogr. Inf. Sci. 36(7), 1317–1342 (2022)
15. Goncalves, H.R., Santos, C.P.: Deep learning model for doors detection: a contribution for context-awareness recognition of patients with Parkinson’s disease. Expert Syst. Appl. 212 (2023)
16. Zust, L., Kristan, M.: Learning with weak annotations for robust maritime obstacle detection. Sensors 22(23) (2022)
17. Faggioni, N., Ponzini, F., Martelli, M.: Multi-obstacle detection and tracking algorithms for the marine environment based on unsupervised learning. Ocean Eng. 266 (2022)
18. Zhang, Y.D., Satapathy, S.: A seven-layer convolutional neural network for chest CT-based COVID-19 diagnosis using stochastic pooling. IEEE Sens. J. 22(18), 17573–17582 (2022)
19. Zhang, Y.D., Satapathy, S.C.: Fruit category classification by fractional Fourier entropy with rotation angle vector grid and stacked sparse autoencoder. Expert Syst. 39(3) (2022)
20. Huang, Y.R., Liu, Y.W., Han, T., Xu, S.Y., Fu, J.H.: Low illumination soybean plant reconstruction and trait perception. Agriculture-Basel 12(12) (2022)
21. Liu, H.L., Ji, H.B., Zhang, J.M., Zhang, C.L., Lu, J., et al.: A novel approach for feature extraction from a gamma-ray energy spectrum based on image descriptor transferring for radionuclide identification. Nuclear Sci. Techn. 33(12) (2022)
22. Davies, C., Tompkinson, W., Donnelly, N., Gordon, L., Cave, K.: Visual saliency as an aid to updating digital maps. Comput. Hum. Behav. 22(4), 672–684 (2006)
23. Arun, N., Gaw, N., Singh, P., Chang, K., Aggarwal, M., et al.: Assessing the trustworthiness of saliency maps for localizing abnormalities in medical imaging. Radiol.-Artif. Intell. 3(6) (2021)
24. Kulikov, D.A., Platonov, V.V.: Adversarial attacks on intrusion detection systems using the LSTM classifier. Autom. Control. Comput. Sci. 55(8), 1080–1086 (2021)
25. Li, X.L., Kong, W.W., Liu, X.L., Zhang, X., Wang, W., et al.: Application of laser-induced breakdown spectroscopy coupled with spectral matrix and convolutional neural network for identifying geographical origins of Gentiana rigescens Franch. Front. Artif. Intell. 4 (2021)
26. Lu, H.H., Liu, T., Zhang, H., Peng, G.F., Zhang, J.T.: High-resolution remote sensing scene classification based on salient features and DCNN. Laser Optoelectron. Progr. 58(20) (2021)
27. Guo, M.S.: Airport localization based on contextual knowledge complementarity in large scale remote sensing images. EAI Endorsed Trans. Scal. Inf. Syst. 9(35) (2022)
28. Majhi, M., Pal, A.K., Pradhan, J., Islam, S.K.H., Khan, M.K.: Computational intelligence based secure three-party CBIR scheme for medical data for cloud-assisted healthcare applications. Multimedia Tools Appl. 81(29), 41545–41577 (2022)
29. Hachaj, T., Stolinska, A., Andrzejewska, M., Czerski, P.: Deep convolutional symmetric encoder-decoder neural networks to predict students’ visual attention. Symmetry 13(12) (2021)
30. Kerdegari, H., Phung, N.T.H., McBride, A., Pisani, L., Nguyen, H.V., et al.: B-line detection and localization in lung ultrasound videos using spatiotemporal attention. Appl. Sci. 11(24) (2021)
31. Xing, Y.J., Wang, H., Ye, D., Zhang, J.X.: Relative pose measurement for non-cooperative target based on monocular vision. Chin. Space Sci. Technol. 42(4), 36–44 (2022)
32. Sudha, S.K., Aji, S.: An active learning method with entropy weighting subspace clustering for remote sensing image retrieval. Appl. Soft Comput. 125 (2022)
33. Invernizzi, A., Haak, K.V., Carvalho, J.C., Renken, R.J., Cornelissen, F.W.: Bayesian connective field modeling using a Markov Chain Monte Carlo approach. Neuroimage 264 (2022)
34. Muller, H., Lobanov, A.P.: DoG-HiT: a novel VLBI multiscale imaging approach. Astron. Astrophys. 666 (2022)
35. Hussain, L., Qureshi, S.A., Aldweesh, A., Pirzada, J.U.R., Butt, F.M., et al.: Automated breast cancer detection by reconstruction independent component analysis (RICA) based hybrid features using machine learning paradigms. Connect. Sci. 34(1), 2785–2807 (2022)
36. Lee, K.Y.J., Wee, S., Jeong, J.: Pre-processing filter reflecting human visual perception to improve saliency detection performance. Electronics 10(23) (2021)
37. Ma, R.J., Kong, W., Chen, T., Shu, R., Huang, G.H.: KNN based denoising algorithm for photon-counting LiDAR: numerical simulation and parameter optimization design. Remote Sensing 14(24) (2022)
38. Wang, W.F., Shi, B.W., He, C., Wu, S.Y., Zhu, L., et al.: Euclidean distance-based Raman spectroscopy (EDRS) for the prognosis analysis of gastric cancer: a solution to tumor heterogeneity. Spectrochim. Acta Part A Mole. Biomole. Spectrosc. 288 (2023)
39. Wang, S.Y., Zhong, Z.T., Zhao, Y.L., Zuo, L.: A variational autoencoder enhanced deep learning model for wafer defect imbalanced classification. IEEE Trans. Components Packaging Manuf. Technol. 11(12), 2055–2060 (2021)
40. Ajuwon, P.M., Olawuwo, S.O., Ahon, A.T., Griffin-Shirley, N., Nguyen, T., et al.: Orientation and mobility services in Nigeria by vision status. Int. J. Special Educ. 37(2), 1–13 (2022)
41. Seki, Y., Ito, K.: Objective evaluation of obstacle perception using spontaneous body movements of blind people evoked by movements of acoustic virtual wall. Human Behavior Emerg. Technol. 2022 (2022)
42. Wang, S.-H., Khan, M.A.: VISPNN: VGG-inspired stochastic pooling neural network. Comput. Mater. Continua 70, 3081–3097 (2022)
43. Zhang, Y.-D., Dong, Z.-C.: Advances in multimodal data fusion in neuroimaging: overview, challenges, and novel orientation. Inf. Fusion 64, 149–187 (2020)
Chapter 14
Surface Electromyography Assisted Hand Gesture Recognition Using Bidirectional LSTM and Unidirectional LSTM for the Hearing Impaired
Neel Gandhi and Shakti Mishra

Abstract Sign Recognition System (SRS) is a technology that enables efficient communication between the deaf-mute community and the rest of society. Recent developments in machine learning and biomedical sensor technology have led to improvements in sign recognition systems. In this paper, we propose a sign recognition model that uses sensor data from a surface electromyography (sEMG) device and recognizes particular signs using a Stacked Bidirectional and Unidirectional LSTM (SBUL) architecture. The proposed model has significant social impact, as it facilitates communication among the deaf-mute community by recognizing signs using sEMG data. The neural network architecture proposed in this paper improves the performance of the model by considering temporal data dependencies between sEMG data and the layers of the proposed LSTM. This model provides information on temporal sequential data and offers an efficient method for SRS. Experimental results using sensor data demonstrate that the proposed SBUL network for sign recognition has an average recognition rate of approximately 96%. The model based on SBUL and sEMG provides better accuracy and results for solving the problem of sign recognition.
N. Gandhi (B)
Dartmouth College, Hanover, NH 03755, USA
e-mail: [email protected]
S. Mishra
Pandit Deendayal Energy University, 382007 Gandhinagar, Gujarat, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_14

14.1 Introduction
The recognition of sign language plays a vital role in enabling effective communication between individuals with hearing impairments and the wider society. Sign language has two forms, manual and non-manual, with manual sign language being more suitable for hand gesture recognition. Different countries have their own sign languages. However, developing an accurate sign recognition system is challenging because of the complex spatial and temporal nature of sign language. The
ultimate objective of sign recognition is to convert sign language into a format that is understandable by humans, such as speech. Wearable devices like the Myo armband have gained popularity for sign recognition, and machine learning techniques are commonly employed for data processing. The proposed approach uses surface electromyography (sEMG) and a Stacked Bidirectional and Unidirectional LSTM (SBUL) architecture to detect four commonly used signs: rock, scissor, paper, and okay. LSTM is well suited to handling long sequences of data and is widely used in speech recognition, time series prediction, and natural language processing.
14.2 Related Works
Sign recognition is important for non-verbal communication by deaf and mute individuals. Various sensors, such as sensor gloves, Kinect, Intel RealSense, gyroscopes and accelerometers, electromyography, ToF sensors, and the Leap Motion controller, have been used to recognize specific skeletal movements of the hands or the human body. Starner et al. [20] proposed the use of a Hidden Markov model (HMM) for recognizing a vocabulary of 53 signs, with recognition rates of 92 and 95 from 97 sign sentences performed with and without bigram language, respectively. Sawant et al. [18] proposed a Principal Component Analysis-based approach for recognizing 26 Indian sign language signs. Chuan et al. [3] introduced an approach for recognizing English alphabets that employs k-nearest neighbor and support vector machine (SVM) algorithms for classification. Zhang [26] used a decision tree and an HMM to recognize data from sensors and convert it into control commands. Tamura et al. [23] presented a model that extracted 3D features of the hand and converted them into 2D image features. Srinivas et al. [19] discussed classification issues in hand gesture recognition. Kumar et al. [10] used Leap Motion and Kinect sensors to recognize Indian Sign Language. Yin et al. [25] compensated for the limitations of CNN by using recurrent neural networks. Geng et al. [6] experimented with extracting features from multichannel sEMG signals, and Lejmi et al. [13] adopted a fusion of two or more classifiers for better results. More recently, deep neural network architectures have been developed to enhance recognition accuracy. These include Convolutional Neural Networks (CNN), introduced by Atzori et al. [2] and Cote-Allard et al. [4], and Long Short-Term Memory (LSTM) models, introduced by Wu et al. [24]. These innovations have significantly improved the accuracy of recognition systems. Li et al. [14] proposed an extension of the basic LSTM model to learn the 3D coordinates of individual joints. Sarkar et al. [17] used Time-of-Flight (ToF) sensor data to train a Dynamic Long Short-Term Memory (D-LSTM) network, while Naguri et al. [15] used LSTM and Convolutional Neural Networks (CNN) for gesture recognition systems. Kumar et al. [11] used Bi-LSTM to tackle 3D handwriting recognition. Quivira et al. [16] introduced a probabilistic model that incorporated a Gaussian Mixture Model (GMM) and D-LSTM with five layers
to reconstruct hand motions. Stacked Bi-LSTM and LSTM layers have been successfully applied to various domains, including EEG, ECG [22], and other medical signals [12]. Our approach applies stacked Bi-LSTM and LSTM to sign recognition, which notably improves the effectiveness of the sign recognition model.
14.3 Proposed Methodology
14.3.1 Long Short-Term Memory (LSTM)
LSTM effectively addresses the vanishing gradient problem that arises in traditional RNNs. It achieves this by using memory blocks to store and retrieve information over extended periods. LSTM is equipped with four different gates designed to handle long-term dependencies. This architecture is valuable for nonlinear time series prediction, natural language processing tasks, and other sequential applications. For our sign recognition problem, the input originates from wearable devices and predictions must be made from long sequences of sEMG data, so LSTM is a suitable choice. The architecture and equations of the LSTM model are shown in Fig. 14.1. Here, the hidden vector sequence is denoted by $h_t$, the weight matrices by $W$, the logistic sigmoid function by $\sigma$, and the learned biases by $b$; the gate and cell activation vectors are the input gate ($i_t$), forget gate ($f_t$), output gate ($o_t$), and cell activation ($c_t$). LSTM has been widely employed in diverse research studies, including time series prediction [7] and analysis of sEMG data [21].
Fig. 14.1 LSTM
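Because the equations themselves appear only in the figure, the following block restates a commonly used formulation of the LSTM cell in the notation of the text. It is given as a reference sketch and may differ in detail from Fig. 14.1 (for example, it omits peephole connections). The input vector $x_t$, the element-wise product $\odot$, and the subscript convention for the weight matrices are assumptions introduced here.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} x_t + W_{hi} h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} x_t + W_{hf} h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc} x_t + W_{hc} h_{t-1} + b_c\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The forget gate $f_t$ scales the previous cell state while the input gate $i_t$ scales the new candidate, so the cell state $c_t$ can carry information across many time steps without repeated squashing, which is what mitigates vanishing gradients.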
Fig. 14.2 Bi-LSTM
Fig. 14.3 Deep stacked LSTM
14.3.2 Bi-LSTM
Bidirectional LSTM is an advancement of the LSTM architecture that operates on data in both the forward and backward directions, offering enhanced comprehension of sequential data [8]. It consists of two separate hidden LSTM layers that propagate in opposite directions. Bi-LSTM has improved performance compared to traditional LSTM [1]. The output at time step $t$ is computed as a combination of the forward hidden sequence $h_{t,f}$ and the backward hidden sequence $h_{t,b}$ using the sigmoid activation function $\sigma$, as shown in Fig. 14.2.
14.3.3 Deep Stacked LSTM
Deep stacked LSTM is an improved LSTM architecture that uses multiple LSTM layers to increase model accuracy and obtain a high-level representation of the dataset. It uses the output of the previous hidden layer as the input of the subsequent layer. The LSTM memory cell takes as inputs $L_t$ and $h_{l,t-1}$ at time $t$ and the previous hidden state output $h_{l-1,t}$. The architecture and equations of the deep stacked LSTM model are shown in Fig. 14.3. This architecture has been shown to improve model performance and accuracy compared to shallow LSTM networks [9, 27].
Fig. 14.4 Stacked bidirectional and unidirectional LSTM
14.3.4 Proposed Stacked Bidirectional and Unidirectional LSTM
A higher-level representation of sequential data can be built using Stacked Bidirectional and Unidirectional LSTM. This architecture has several hidden LSTM layers, where each LSTM layer receives input from the previous LSTM layer and feeds the next hidden LSTM layer or any other layer capable of enhancing the power of the model. As previously discussed, bidirectional LSTM is well suited to spatial-temporal data because it captures data dependencies in both directions. Bi-LSTM is therefore used for the earlier layers of our neural network, while unidirectional LSTM is the optimal choice for capturing forward dependencies and predicting future sequences of data; it is therefore used in the last layer of our neural network architecture [5]. The combined effect of stacked bidirectional and unidirectional LSTM networks is useful for achieving better performance in predicting future sequences of the data. The architecture and equations of the Stacked Bidirectional and Unidirectional LSTM model are shown in Fig. 14.4, where $f_{SBUL,t}$, $o_{SBUL,t}$, $c_{SBUL,t}$, and $i_{SBUL,t}$ represent the forget, output, cell, and input gates, respectively.
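The data flow of such a stack, a Bi-LSTM layer followed by a unidirectional LSTM layer and a classifier, can be illustrated with a small NumPy forward pass. This is a sketch with random, untrained weights and is not the authors' implementation: the layer sizes, the concatenation of forward and backward states, the softmax output over four classes, and all function names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_lstm(n_in, n_hidden, rng):
    """Random weights for one LSTM layer; gate blocks are stacked as [i, f, o, g]."""
    return {
        "W": rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden)),
        "b": np.zeros(4 * n_hidden),
    }

def lstm_forward(params, xs):
    """Run one LSTM layer over xs of shape (T, n_in); returns states of shape (T, n_hidden)."""
    n_hidden = params["b"].size // 4
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outputs = []
    for x in xs:
        z = params["W"] @ np.concatenate([x, h]) + params["b"]
        i, f, o = (sigmoid(z[k * n_hidden:(k + 1) * n_hidden]) for k in range(3))
        g = np.tanh(z[3 * n_hidden:])          # candidate cell activation
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm_forward(fwd, bwd, xs):
    """Bi-LSTM layer: forward pass plus a pass over the reversed sequence."""
    h_f = lstm_forward(fwd, xs)
    h_b = lstm_forward(bwd, xs[::-1])[::-1]    # re-align backward states in time
    return np.concatenate([h_f, h_b], axis=1)  # assumed combination: concatenation

def sbul_forward(xs, n_hidden=16, n_classes=4, seed=0):
    """Bi-LSTM -> unidirectional LSTM -> softmax over sign classes (untrained)."""
    rng = np.random.default_rng(seed)
    n_in = xs.shape[1]
    fwd, bwd = init_lstm(n_in, n_hidden, rng), init_lstm(n_in, n_hidden, rng)
    top = init_lstm(2 * n_hidden, n_hidden, rng)
    w_out = rng.normal(0.0, 0.1, (n_classes, n_hidden))
    h = lstm_forward(top, bilstm_forward(fwd, bwd, xs))
    logits = w_out @ h[-1]                     # classify from the last time step
    e = np.exp(logits - logits.max())
    return e / e.sum()
```

Calling `sbul_forward` on an array of shape (8, 8), for example eight time steps of eight sensor readings, returns a probability vector over four classes. In practice the weights would be learned with a deep learning framework rather than drawn at random.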
14.4 Experiment
The proposed model is an architecture for sign recognition utilizing sEMG signals acquired via the Myo armband. The sign recognition process involves several steps, as depicted in Fig. 14.5. The Myo armband comprises 8 sensors that capture sEMG signals, enabling the extraction of features and prediction of sign gestures. Figure 14.6 provides a cross-sectional view of the Myo armband, illustrating its $X$, $Y$, and $Z$ axes, which record changes in sensor values corresponding to skeletal muscle movements.
N. Gandhi and S. Mishra
Fig. 14.5 Working of proposed recognition model
Fig. 14.6 Cross sectional view of myo armband
14.4.1 Data Acquisition
The sEMG dataset for sign recognition was obtained using a Myo armband with eight sensors placed on the skin surface. The dataset includes recordings of four basic signs: rock, paper, scissor, and ok. Each row of the dataset holds eight consecutive readings of all eight sensors, giving 64 columns of sEMG data plus one label column for the corresponding sign class. Each sign produces a specific type of electrical activity, recorded at a frequency of 200 Hz, so the eight readings in a row span approximately 40 ms. The dataset includes six recordings of each sign, held in a fixed position for about 120 s using the right hand.
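The row layout described above can be unpacked into a small time window per sample. The sketch below is only an illustration: it assumes that the 64 columns are ordered reading by reading (all eight sensors for the first reading, then all eight for the second, and so on), which the text does not state, and the helper name `row_to_window` and the placeholder row are hypothetical.

```python
SENSORS = 8            # Myo armband channels (from the text)
READINGS = 8           # consecutive readings per row (from the text)
SAMPLE_RATE_HZ = 200   # sampling frequency (from the text)

def row_to_window(row):
    """Split one 65-value row into (window, label).

    window[t][s] is the value of sensor s at reading t, ASSUMING the 64
    sEMG columns are ordered reading by reading.
    """
    if len(row) != SENSORS * READINGS + 1:
        raise ValueError("expected 64 sEMG columns plus one label column")
    values, label = row[:-1], int(row[-1])
    window = [values[t * SENSORS:(t + 1) * SENSORS] for t in range(READINGS)]
    return window, label

# Duration covered by one row: 8 readings at 200 Hz.
window_ms = READINGS * 1000 / SAMPLE_RATE_HZ

row = list(range(64)) + [2]   # synthetic placeholder row, not real sEMG data
window, label = row_to_window(row)
print(len(window), len(window[0]), label, window_ms)
```

A window of 8 time steps by 8 channels is the kind of input shape a recurrent layer expects; if the actual column order is sensor by sensor, the slicing would need to be transposed.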
14.4.2 Data Preprocessing
The Myo armband with eight sensors is used to acquire surface electromyography (sEMG) data for sign recognition. Feature extraction is performed on the dataset, which has 64 columns of sEMG data and one additional label column holding the sign class. Each feature value is normalized to the range 0–1. Training a machine learning model on sEMG signals from the Myo armband proved effective for accurate sign prediction. The label takes values between 0 and 3, where Rock is 0, Scissor is 1, Paper is 2, and Ok is 3. The preprocessed data is fed into the model for training the sign recognition system.
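Scaling each feature to the range 0–1 is commonly done with min-max normalization. The sketch below shows one way to do it; the choice of min-max scaling, the decision to fit the minimum and maximum on the training rows only, and the function names are assumptions, since the chapter does not give these details. The toy rows are not real sEMG values.

```python
# Label encoding as stated in the text.
LABELS = {0: "rock", 1: "scissor", 2: "paper", 3: "ok"}

def fit_min_max(rows):
    """Return per-column minima and maxima of a list of feature rows."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return lows, highs

def apply_min_max(rows, lows, highs):
    """Scale every column to [0, 1]; constant columns are mapped to 0.0."""
    scaled = []
    for r in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(r, lows, highs)
        ])
    return scaled

train = [[-10.0, 4.0], [0.0, 4.0], [30.0, 4.0]]   # toy values, two features
lows, highs = fit_min_max(train)
scaled = apply_min_max(train, lows, highs)
print(scaled)   # [[0.0, 0.0], [0.25, 0.0], [1.0, 0.0]]
```

Reusing `lows` and `highs` from the training rows when scaling the test rows keeps information about the test set out of the preprocessing step.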
Fig. 14.7 Architecture of recognition model
14.4.3 Model Architecture
We introduce a novel approach for sign/gesture prediction using the sEMG dataset, called the Stacked Bidirectional and Unidirectional LSTM (SBUL) recognition model. The SBUL model consists of four LSTM layers, with the first three layers being bidirectional and the final layer being unidirectional. Figure 14.7 illustrates the architecture of the SBUL model. After the dataset is processed by the SBUL layers, the resulting output is fed into two fully connected layers and a softmax layer consisting of four neurons, representing the four sign classes. The model was trained on a dataset of 11,678 samples, which were divided into train and test sets containing 7700 and 3978 samples, respectively. The model achieved an average accuracy of 96%, with consistent loss values throughout the training and testing phases. To mitigate overfitting, dropout was applied between the LSTM layers. Our future work will be dedicated to enhancing the model’s speed and accuracy while minimizing computational costs.
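To get a feel for the size of the recurrent part of such a model, the sketch below traces the output width and the trainable parameter count of each of the four LSTM layers. It relies on the standard LSTM parameter formula and on several assumptions that are not stated in this subsection: eight input features per time step, 50 units per layer and direction, and bidirectional outputs that are concatenated. The fully connected layers are left out because their sizes are not given, and the function names are hypothetical.

```python
def lstm_params(input_size, units):
    """Trainable parameters of one standard LSTM layer (4 gates, with biases)."""
    return 4 * (units * (input_size + units) + units)

def sbul_lstm_stack(input_size=8, units=50):
    """Describe three bidirectional LSTM layers followed by one unidirectional."""
    layers = []
    size = input_size
    for index in range(1, 5):
        bidirectional = index <= 3
        directions = 2 if bidirectional else 1
        params = lstm_params(size, units) * directions
        out_size = units * directions   # concatenated forward/backward outputs
        kind = "Bi-LSTM" if bidirectional else "LSTM"
        layers.append((index, kind, size, out_size, params))
        size = out_size
    return layers

stack = sbul_lstm_stack()
for index, kind, in_size, out_size, params in stack:
    print(index, kind, in_size, out_size, params)
total = sum(layer[4] for layer in stack)
print(total)   # 174600
```

Under these assumptions, the second and third layers dominate the count because their input is the 100-wide concatenated output of the layer below.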
14.5 Results
We evaluated our proposed model on the Myo armband-based sEMG dataset and achieved an average accuracy of 96% with minimal loss. The model was found to be useful for sign recognition, as it can handle long sequential data and long-term data dependencies. Graphs of accuracy, precision, recall, and F1-score for the various signs are shown in Fig. 14.8. The proposed SBUL model was trained on a dataset with four types of signs, namely rock, scissor, paper, and ok. Figure 14.9 illustrates the evaluation of the model in terms of train, test, and correctly classified samples.
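The per-class metrics named above are all derived from a confusion matrix. The sketch below shows the standard definitions; the counts in `cm` are made up for illustration and are not the values behind the chapter's figures, and the function name is hypothetical.

```python
CLASSES = ["rock", "scissor", "paper", "ok"]

def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    report = {}
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[i][k] for i in range(n))   # column sum
        actual = sum(cm[k])                            # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[CLASSES[k]] = (precision, recall, f1)
    return accuracy, report

# Made-up counts for illustration only; NOT the chapter's results.
cm = [
    [9, 1, 0, 0],
    [0, 8, 2, 0],
    [0, 0, 10, 0],
    [1, 0, 0, 9],
]
accuracy, report = per_class_metrics(cm)
print(accuracy)
for name, (precision, recall, f1) in report.items():
    print(name, round(precision, 3), round(recall, 3), round(f1, 3))
```

Precision is computed down a column (everything predicted as a class) and recall along a row (everything that truly belongs to it), which is why a class can score high on one and lower on the other.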
Fig. 14.8 Performance metrics of accuracy, precision, recall, and F1-score across different sign categories
Fig. 14.9 Prediction accuracy of signs
Fig. 14.10 Performance of various LSTM models
A comparative analysis was conducted between the proposed SBUL model and various other machine learning models using a provided sEMG dataset consisting of four sign gestures. The dataset was divided into an 80:20 train-test split. The results of this comparison are presented in Fig. 14.11, showcasing the performance of our proposed SBUL model in relation to the other models. In our experimentation, we explored different configurations of the neural network, including varying the height (number of layers) and width (number of neurons). Ultimately, we selected a stacked LSTM model with four layers for sign recognition. Figure 14.10 visualizes the different stacked LSTM models utilized for sign recognition, along with their corresponding detection accuracy (Fig. 14.11). The four layers of the LSTM model consisted of three layers of bidirectional LSTM and the last layer of unidirectional LSTM. This configuration of the model was found to be appropriate for achieving better performance and adequate accuracy for classification of signs and detection of future sequences. To enhance the architecture of the model, the technique of dropout was incorporated. The accuracy of our proposed SBUL model is visualized in Fig. 14.12, while the error rate, represented as a percentage, is illustrated in Fig. 14.13.
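An 80:20 split of the kind mentioned above can be sketched as follows. The shuffling, the seed, and the function name are assumptions for illustration; the chapter does not say how the split was drawn. The train and test sizes reported in Sect. 14.4.3 (7700 and 3978) correspond to a different proportion, so the fraction is left as a parameter here.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_test
    return shuffled[:n_train], shuffled[n_train:]

rows = list(range(11678))            # stand-ins for the 11,678 samples
train, test = train_test_split(rows)
print(len(train), len(test))         # 9342 2336

# Test share implied by the sizes reported in Sect. 14.4.3.
print(round(3978 / 11678, 2))        # 0.34
```

Fixing the seed makes the split reproducible, which matters when several models are compared on the same partition.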
Fig. 14.11 Comparison of proposed model versus other models
Fig. 14.12 Model accuracy
Fig. 14.13 Model loss function
Fig. 14.14 Confusion matrix
Four SBUL layers with 50 neurons each were selected to perform forward and backward propagation. The model learned the classification problem reasonably well, achieving about 97% accuracy on the training dataset and about 95% on the testing dataset. These scores suggest that the model is probably neither overfitting nor underfitting and can achieve good accuracy. The architecture increases the accuracy of the model, but the trade-off is an increased computational cost, which might be handled using a graphics processing unit (GPU). The results also indicate that combining LSTM and Bi-LSTM layers improved the performance of the SBUL network. Further, the dense layers of the multilayer perceptron were also optimized to achieve adequate performance. The proposed SBUL model proved to be effective for predicting signs from biomedical devices such as sEMG sensors, compared to other machine learning models. Figure 14.14 displays the confusion matrix for the proposed SBUL model, showing the correct predictions made by the model.
14.6 Conclusions
This study introduces a novel approach, the Stacked Bidirectional and Unidirectional LSTM model, which uses the sEMG dataset for sign recognition. Our model achieves a train dataset accuracy of 97% and a test dataset accuracy of 95% across four sign classes: rock, paper, scissor, and ok. The integration of LSTM with external devices such as sEMG sensors proves to be effective in recognizing sign language. Our future work will focus on enhancing the model’s efficiency and precision while minimizing computational costs. Additionally, we aim to leverage GPUs to facilitate parallel processing. The proposed model serves as a tool for accurate prediction of lengthy sequences, offering support for communication between individuals with hearing impairments and the broader society.
Conflict of Interest Neel Gandhi and Dr. Shakti Mishra declare that they have no conflict of interest. Authors received support from Pandit Deendayal Energy University, Gandhinagar, Gujarat.
References
1. Aly, S., Aly, W.: DeepArSLR: a novel signer-independent deep learning framework for isolated Arabic sign language gestures recognition. IEEE Access 8, 83199–83212 (2020)
2. Atzori, M., Cognolato, M., Müller, H.: Deep learning with convolutional neural networks applied to electromyography data: a resource for the classification of movements for prosthetic hands. Front. Neurorobot. 10, 1–10 (2016)
3. Chuan, C.H., Regina, E., Guardino, C.: American sign language recognition using leap motion sensor. In: Proceedings - 2014 13th International Conference on Machine Learning and Applications, ICMLA 2014, pp. 541–544 (2014)
4. Côté-Allard, U., Fall, C.L., Drouin, A., Campeau-Lecours, A., Gosselin, C., Glette, K., Laviolette, F., Gosselin, B.: Deep learning for electromyographic hand gesture signal classification using transfer learning. IEEE Trans. Neural Syst. Rehabil. Eng. 27(4), 760–771 (2019)
5. Cui, Z., Ke, R., Wang, Y.: Deep Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction, pp. 1–11 (2018)
6. Geng, W., Du, Y., Jin, W., Wei, W., Hu, Y., Li, J.: Gesture recognition by instantaneous surface EMG images. Sci. Rep. 6, 6–13 (2016)
7. Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. IEE Conf. Publ. 2(470), 850–855 (1999)
8. Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF Models for Sequence Tagging (2015)
9. Jino, P.J., John, J., Balakrishnan, K.: Offline handwritten Malayalam character recognition using stacked LSTM. In: 2017 International Conference on Intelligent Computing, Instrumentation and Control Technologies, ICICICT 2017, Jan 2018, pp. 1587–1590 (2018)
10. Kumar, P., Gauba, H., Roy, P.P., Dogra, D.P.: Coupled HMM-based multi-sensor data fusion for sign language recognition. Pattern Recognit. Lett. 86, 1–8 (2017)
11. Kumar, P., Saini, R., Roy, P.P., Pal, U.: A lexicon-free approach for 3D handwriting recognition using classifier combination. Pattern Recognit. Lett. 103, 1–7 (2018)
12. Kuo, C.E., Chen, G.T.: Automatic sleep staging based on a hybrid stacked LSTM neural network: verification using large-scale dataset. IEEE Access 8, 111837–111849 (2020)
13. Lejmi, W., Khalifa, A.B., Mahjoub, M.A.: Fusion strategies for recognition of violence actions. In: Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA, Oct 2017, pp. 178–183 (2018)
14. Li, K., Zhou, Z., Lee, C.H.: Sign transition modeling and a scalable solution to continuous sign language recognition for real-world applications. ACM Trans. Access. Comput. 8(2) (2016)
15. Naguri, C.R., Bunescu, R.C.: Recognition of dynamic hand gestures from 3D motion data using LSTM and CNN architectures. In: Proceedings - 16th IEEE International Conference on Machine Learning and Applications, ICMLA 2017, Dec 2017, pp. 1130–1133 (2017)
16. Quivira, F., Koike-Akino, T., Wang, Y., Erdogmus, D.: Translating sEMG signals to continuous hand poses using recurrent neural networks. In: 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018, Jan 2018, pp. 166–169 (2018)
17. Sarkar, A., Gepperth, A., Handmann, U., Kopinski, T.: Dynamic hand gesture recognition for mobile systems using deep LSTM. In: International Conference on Intelligent Human Computer Interaction, pp. 19–31. Springer, Cham (2017)
18. Sawant, S.N., Kumbhar, M.S.: Real time sign language recognition using PCA. In: Proceedings of 2014 IEEE International Conference on Advanced Communication, Control and Computing Technologies, ICACCCT 2014, vol. 978, pp. 1412–1415 (2015)
19. Srinivas, K., Rajagopal, M.K.: Study of hand gesture recognition and classification. Asian J. Pharm. Clin. Res. 10, 25–30 (2017)
20. Starner, T.E.: Visual Recognition of American Sign Language Using Hidden Markov Models (1991)
21. Staudemeyer, R.C., Morris, E.R.: Understanding LSTM - A Tutorial into Long Short-Term Memory Recurrent Neural Networks, pp. 1–42 (2019)
22. Sun, L., Wang, Y., He, J., Li, H., Peng, D., Wang, Y.: A stacked LSTM for atrial fibrillation prediction based on multivariate ECGs. Health Inf. Sci. Syst. 8(1) (2020)
23. Tamura, S., Kawasaki, S.: Recognition of sign language motion images. Pattern Recognit. 21(4), 343–353 (1988)
24. Wu, Y., Zheng, B., Zhao, Y.: Dynamic gesture recognition based on LSTM-CNN. In: Proceedings 2018 Chinese Automation Congress, CAC 2018, pp. 2446–2450 (2019)
25. Yin, W., Kann, K., Yu, M., Schütze, H.: Comparative Study of CNN and RNN for Natural Language Processing (2017)
26. Zhang, X., Chen, X., Li, Y., Lantz, V., Wang, K., Yang, J.: A framework for hand gesture recognition based on accelerometer and EMG sensors. IEEE Trans. Syst. Man Cybern. Part A: Syst. Hum. 41(6), 1064–1076 (2011)
27. Zhu, W., Lan, C., Xing, J., Zeng, W., Li, Y., Shen, L., Xie, X.: Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks. arXiv preprint arXiv:1603.07772 (2016)
Chapter 15
The Potential of Using Corpora and Concordance Tools for Language Learning: A Case Study of ‘Interested in (Doing)’ and ‘Interested to (Do)’
Lily Lim and Vincent Xian Wang
Abstract This study uses corpus resources to investigate the rules of usage of two closely related expressions in English, ‘interested in (doing)’ and ‘interested to (do)’, in order to assess the potential of such resources for language learning. Our investigation developed in three stages: (a) gaining a general view of the trends of use of the two expressions, (b) identifying the verbs that construct the two, and (c) detecting the rules of usage by scrutiny of the available evidence. Our results attest that the corpus technology provides readily useful information in the first two stages, while in-depth human–computer interaction (HCI) is crucial for inquisitive language learners to discover the rules in the third stage. The process of discovery not only takes multiple stages but also requires intensive human intelligence. We discuss the model for HCI as a focus for future studies in the context of the new generation of AI such as ChatGPT.
L. Lim (B), MPU-Bell Centre of English, Macao Polytechnic University, Macao, China. e-mail: [email protected]
V. X. Wang, Department of English, University of Macau, Macau, China. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_15
15.1 Introduction
Language technologies in the last four decades have impacted (second) language teaching and learning in many ways. Authentic language materials and easy-to-use text analysing tools have not only benefitted lexicographers and language curriculum developers, but have also been made available to language learners and language users [1–4]. Texts collected from real-life language-use situations have been utilized by lexicographers so that dictionary entries can present authentic examples rather than those thought out by lexicographers. Mainstream dictionaries are now compiled based on large-scale language corpora, and this leads to the provision of information
about the distribution of the use of words and expressions across different genres and text types. Such information enables dictionary users to observe the use of words in authentic contexts, so that they are less reliant on the knowledge and expertise of their teachers and that of lexicographers. Language learners’ use of corpus and concordancing tools pertains to the paradigm of data-driven learning (DDL) [5–11], and this can be compared with putting a fishing pole in the hands of the learners. Given that making use of the technological tools to harvest involves techniques, skills, and a level of research competence, a stream of research has been developed to investigate the key factors underpinning effective DDL, such as learners’ attitude in using technology, their learning outcome compared with conventional modes of learning, and the potential affordance offered by the technologies [5–13]. In general, learners have been found to favour the use of language technologies, and the experimental groups who used corpus technologies tended to outperform the control groups in their language performance. Although the results are informative and promising, there is still a great need to gather evidence on how the learners interact with the corpus technologies in the context of their language learning and language production. We hold that, of a wide array of factors involved in a computer-assisted learning environment, the depth of human–computer interaction is the key to achieving effective DDL, an area that deserves attention in future research. The present study originated from a problem arising from language learning— that is, how to use ‘interested in (doing)’ and ‘interested to (do)’ properly. Language teaching textbooks tend to contain examples of ‘interested in (doing)’, while the instances of ‘interested to (do)’ are less notable. Some language teachers simply instruct that the former construction is the correct one, and the latter incorrect. 
This, of course, is not in line with the fact that the latter construction is still used by native speakers in various contexts, although not as frequently as the former one. It is plausible that the latter construction fulfils some function not covered by the former one and therefore should be part of the lexical repertoire of (second) language learners. The proper way to use the two constructions in context can perplex even highly proficient second language learners. This leads us to the issue of where language learners can find the rules about using the two constructions. Major English dictionaries are not ready resources for this purpose. For example, in the Oxford English Dictionary (https://www.oed.com) and the Merriam-Webster Dictionary (https://www.merriam-webster.com/), neither ‘interested in’ nor ‘interested to’ is an entry. The investigators found that only the Longman dictionary offers instruction about the use of the two expressions (https://www.ldoceonline.com/dictionary/interested), and this instruction is brief and does not elaborate on the rule of usage:
GRAMMAR: Patterns with interested
• You are interested in something: She is interested in politics. Don’t say: She is interested on politics.
• You are interested in doing something: Are you interested in working abroad? Don’t say: Are you interested to work abroad?
• You say that you would be interested to hear/know/see/find out something: I would be interested to know what she thinks about the idea. Don’t use interested to with other verbs such as ‘have’ or ‘buy’.
Another source of information is the books that explain the proper usage of words and expressions in English. The investigators found that popular usage references do not help much: Swan’s Practical English Usage [14] does not provide entries on the two expressions, and Collins COBUILD English Usage [15] introduces ‘interested in (doing)’ to language learners while warning them not to use ‘interested to (do)’. Given that the most available reference tools for language learners, dictionaries and usage books, do not provide ready answers, we turned to commonly available corpora with concordancing tools, intending to see whether language learners can access these resources in order to discover the rules of usage by themselves. This leads us to our research question for the present study:
To what extent can commonly available corpora and concordancing tools enable language learners to seek out information or even rules about how to use ‘interested in (doing)’ and ‘interested to (do)’?
This is an overarching question intended to probe the potential afforded by the language technologies that can enable language learners to interrogate, query, and analyse naturally occurring texts in the corpora. To answer our research question, we take the perspective of human–computer interaction in the context of inquisitive learning [16]. We laid out more specific research questions to guide the present study:
• What questions can be raised at different stages as language learners interrogate the corpora and develop their understanding?
• To what extent do the corpora-based searches offer useful information that answers the questions at the various stages?
• What level of expertise is needed, technically and intellectually, to seek out the answers at these stages?
Our intention is to tease out a prototypical model that encapsulates learners’ typical use of corpus resources to reach learning outcomes. We understand that, although a prototype is of value in its own right, there is certainly considerable room for individual language learners to develop their own ways of using the corpus resources.
15.2 Methodology
The corpus resources and the stages of investigation of the present research are introduced in this section.
15.2.1 Corpus Tools
We used the British National Corpus (BNC) as our main corpus for investigation, given that BNC has been used for dictionary construction and is commonly known for its language quality, coverage of genres, and the rigour of compilation. BNC was accessed on the platform of Sketch Engine (SkE, https://app.sketchengine.eu/), which provides concordancing tools and enables queries of a string of words and/or parts of speech using corpus query language (CQL). In addition, the Google Books Ngram Viewer (GNV) was used to describe the trends of word/phrase use, in view of its merits in capturing the trends diachronically and synchronically [17, 18].
15.2.2 Stages of Investigation
To understand the potential offered by the corpus resources for supporting language learners to seek out the rules on how to use the two expressions, we did not use highly sophisticated techniques to interrogate the corpus materials. We attempted to pursue the path that general language learners would normally tread to solve this language-use puzzle, in other words, a prototypical way of using corpora for rule discovery. In the present investigation, the journey to unpack the rules of usage takes at least three stages: (a) gaining a general understanding of whether the two expressions are in actual use, (b) identifying specific verbs that occur in each of the two constructions, and (c) conducting in-depth analysis to sort out the rules that stipulate which one of the two expressions should be used in what context.
15.3 Results
We present our findings in the three stages of investigation (cf. Sect. 15.2.2).
15.3.1 Stage One: A General Picture
In the first stage, language learners need to gain an overall picture of the two expressions in terms of their frequencies of occurrence. This allows them to see the extent to which ‘interested to (do)’ is used in authentic communication situations, and decide for themselves whether it is negligible compared to ‘interested in (doing)’. A search of the two expressions using the following query can reveal the overall frequency of occurrence of both expressions in the collection of books in the Google Books Ngram Viewer (GNV):
interested in *_VERB, interested to *_VERB
Fig. 15.1 ‘Interested in (doing)’ and ‘interested to (do)’ in Google Books Ngram Viewer
Figure 15.1 illustrates that the frequency of ‘interested to (do)’ stands at around one-third to a half of the frequency of ‘interested in (doing)’ from the 1950s until recent years. The structure ‘interested to (do)’ looks neither trivial nor negligible. The British National Corpus (BNC) also provides essential information about the frequency of occurrence of the two structures. Two basic queries can be formulated in corpus query language (CQL) as follows:
[word = "interested"] [word = "in"] [tag = "V.*"]
[word = "interested"] [word = "to"] [tag = "V.*"]
The former query returns 1300 hits and the latter 434. This result is consistent with the result from GNV: the construction ‘interested in (doing)’ occurs around three times as frequently as ‘interested to (do)’. Based on the results from both GNV and BNC, one should reach the understanding that ‘interested to (do)’, although used less commonly than ‘interested in (doing)’, still accounts for a substantial and non-negligible portion of use in comparison with the latter.
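The logic of the two CQL queries can be imitated on a small tagged token list. The sketch below is a toy emulation written for illustration, not Sketch Engine's implementation or API: each query is a sequence of attribute and regular-expression pairs that must match consecutive tokens. The function name, the hand-tagged sentence, and its tags (which only imitate a CLAWS-style tagset) are assumptions.

```python
import re

def match_pattern(tokens, pattern):
    """Return start indices where consecutive tokens satisfy the pattern.

    tokens  : list of (word, tag) pairs
    pattern : list of (attribute, regex) pairs, one per token position
    """
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        matched = True
        for offset, (attribute, regex) in enumerate(pattern):
            word, tag = tokens[start + offset]
            value = word if attribute == "word" else tag
            if not re.fullmatch(regex, value):
                matched = False
                break
        if matched:
            hits.append(start)
    return hits

IN_DOING = [("word", "interested"), ("word", "in"), ("tag", "V.*")]
TO_DO = [("word", "interested"), ("word", "to"), ("tag", "V.*")]

# Hand-tagged toy sentence, not corpus data.
tokens = [("I", "PNP"), ("am", "VBB"), ("interested", "AJ0"), ("to", "TO0"),
          ("know", "VVI"), ("whether", "CJS"), ("she", "PNP"), ("is", "VBZ"),
          ("interested", "AJ0"), ("in", "PRP"), ("working", "VVG"),
          ("abroad", "AV0")]

print(match_pattern(tokens, IN_DOING))   # [8]
print(match_pattern(tokens, TO_DO))      # [2]

# Ratio of the BNC hit counts reported in the text.
print(round(1300 / 434, 2))              # 3.0
```

The tag condition `V.*` accepts any tag that starts with V, which is why it also admits forms such as ‘is’ or ‘was’ after ‘interested in’; hits of that kind have to be cleaned manually.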
15.3.2 Stage Two: Verbs that Occur in the Two Constructions
Stage one should allow language learners to understand that both expressions are indispensable for their learning, and the subsequent investigation should be directed towards discovering the rules on when to use which. In this direction, the commonly used verbs that fill the ‘do’/‘doing’ slot of each of the two expressions should first be identified, and they are indeed readily identifiable with corpus resources. A single click on the red curve in Fig. 15.1 (the one for ‘interested to (do)’) displays the most frequently occurring verbs for realizing the ‘interested to (do)’ structure in GNV (see Fig. 15.2). The top six verbs are ‘know’, ‘see’, ‘hear’, ‘learn’, ‘read’, and ‘find’, listed in order of descending frequency. Similarly, the top six verbs in the ‘interested in (doing)’ structure can be retrieved: ‘learn’, ‘get’, ‘find’, ‘make’, ‘see’, and ‘know’, also listed in descending order. The items that occur on both lists are marked in bold-faced fonts for emphasis (‘know’, ‘see’, ‘learn’, and ‘find’), and the four are in different orders in the two lists. Given the fact that four of the six most frequently used verbs that construct the two expressions overlap according to GNV, it is reasonable to speculate that there is a scope of interchangeability between the two expressions, at least regarding the use of the four verbs. At this point, the rules of usage do not easily emerge in terms of the type (as opposed to the token) of words for constructing the two expressions, because the two lists overlap considerably rather than being distinct.
Fig. 15.2 Top verbs that construct ‘interested to (do)’ in GNV
In addition, interrogating BNC in the interface of SkE enables the verbs that construct the two expressions to be systematically retrieved (cf. Sect. 15.3.1), tabulated, and contrasted, as in Table 15.1, from which noise such as ‘interested in is/was’ has been manually cleared. Of the 22 most frequently occurring verbs that construct the two expressions, there are nine overlapping items (marked as bold-faced in Table 15.1): ‘see’, ‘find’, ‘be’, ‘have’, ‘get’, ‘know’, ‘receive’, ‘learn’, and ‘talk’. This corroborates the findings from GNV that the lists of the most commonly used verbs for both expressions overlap considerably. Even though the nine verbs can be used in either one of the two expressions, it is still possible that each of the verbs entails fine-grained meaning that differs from one construction to the other in which it occurs. Stage two involves straightforward queries, resulting in very useful lists of the commonly used verbs for the two expressions for language learners. The fact that the two lists overlap removes the possibility that there exists a simple rule stipulating that a group of verbs occur exclusively in ‘interested to (do)’, while another group of verbs are used only in ‘interested in (doing)’.
The rule/s for using the two expressions, if any, should be more complex and require more in-depth investigations to uncover.
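The overlap counts reported for Stage two can be reproduced with a simple list comparison. In the sketch below, the GNV lists are the top six verbs quoted in the text and the BNC lists are the 22 verbs per construction from Table 15.1; reducing the -ing forms to their base forms (for example, ‘taking’ to ‘take’) and the helper name `shared` are choices made for this illustration.

```python
GNV_TO = ["know", "see", "hear", "learn", "read", "find"]
GNV_IN = ["learn", "get", "find", "make", "see", "know"]

# Table 15.1, 'interested in' column, -ing forms reduced to base forms.
BNC_IN = ["take", "buy", "join", "do", "see", "find", "make", "go", "be",
          "develop", "have", "get", "work", "play", "know", "receive",
          "explore", "pursue", "attend", "learn", "talk", "run"]

# Table 15.1, 'interested to' column.
BNC_TO = ["know", "hear", "see", "read", "learn", "find", "note", "discover",
          "determine", "receive", "have", "observe", "experience", "meet",
          "look", "share", "want", "stay", "get", "come", "be", "talk"]

def shared(first, second):
    """Items of `first` that also occur in `second`, in the order of `first`."""
    other = set(second)
    return [verb for verb in first if verb in other]

gnv_overlap = shared(GNV_TO, GNV_IN)
bnc_overlap = shared(BNC_IN, BNC_TO)
print(gnv_overlap)                    # ['know', 'see', 'learn', 'find']
print(len(bnc_overlap), bnc_overlap)
```

Comparing verb types in this way shows only that the lists overlap; it says nothing about how often each shared verb occurs in each construction, which is where the token frequencies become informative.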
Table 15.1 Most commonly occurring verbs that construct the two expressions in BNC
Rank | ‘interested in’ | Freq. | ‘interested to’ | Freq.
1 | Taking | 47 | Know | 106
2 | Buying | 42 | Hear | 101
3 | Joining | 39 | See | 79
4 | Doing | 38 | Read | 39
5 | Seeing | 32 | Learn | 24
6 | Finding | 31 | Find | 12
7 | Making | 30 | Note | 9
8 | Going | 27 | Discover | 3
9 | Being | 24 | Determine | 3
10 | Developing | 21 | Receive | 3
11 | Having | 20 | Have | 2
12 | Getting | 18 | Observe | 2
13 | Working | 18 | Experience | 2
14 | Playing | 17 | Meet | 2
15 | Knowing | 17 | Look | 2
16 | Receiving | 15 | Share | 2
17 | Exploring | 14 | Want | 2
18 | Pursuing | 14 | Stay | 2
19 | Attending | 13 | Get | 2
20 | Learning | 13 | Come | 2
21 | Talking | 13 | Be | 2
22 | Running | 13 | Talk | 2
Notes The bold-faced words are the items that overlap between the top 22 lists of the two expressions. The underlined words are the items that occur more frequently in ‘interested to (do)’ than in ‘interested in (doing)’
15.3.3 Stage Three: In-Depth Observations and the Rules
From the investigations in Stage two, the types of the most commonly used verbs that construct the two expressions do not enable the discovery of the rules about how to use the expressions. However, a closer examination of the verbs on the two frequency lists in Table 15.1 in terms of the tokens (that is, the frequencies) of the verbs suggests a marked difference between the two. Sharp-eyed learners would notice that the verbs on the ‘interested in (doing)’ list gradually decrease in their frequency count from 47 at the top to 13 at the bottom of the list, while, by contrast, the verbs on the ‘interested to (do)’ list rapidly decline in frequency count, from 106 at the top to only 3 for the eighth item and to 2 from the eleventh item of the list onwards. This means there is a much wider repertoire of verbs contributing to the overall use of ‘interested in (doing)’, whereas only a small number of verbs are heavily used in ‘interested to (do)’. In the latter case, only seven verbs are used nine times or more, and these seven items account for 370 out of the total 434 (85.3%) instances of ‘interested to (do)’. This conspicuously uneven distribution in the frequency of occurrence would allow perceptive learners to understand that this small number of
the most frequently used verbs should be the foci of study, from which the rule/s of using ‘interested to (do)’ may be discoverable. The seven verbs on the list (‘know’, ‘hear’, ‘see’, ‘read’, ‘learn’, ‘find’, and ‘note’) should first be examined together to detect any characteristics or properties they share. These verbs occur in expressions such as ‘interested to know, to hear, to learn, and to find out’ to delineate an action of gaining information that one feels eager to perform. Obtaining the information pertains to mental processes involving human senses such as sight and hearing and their ability to search, notice, and learn. This group of verbs is therefore distinct from the numerous non-perception verbs on the ‘interested in (doing)’ list such as ‘take’, ‘buy’, ‘join’, ‘do’, ‘go’, and ‘play’ (cf. Table 15.1). We can now tease out the first rule of usage:
Rule 1: ‘interested to (do)’ typically delineates an information-gathering action willingly performed by a human agent using eye-sight, hearing, and various mental processes such as noticing, discovering, and learning.
In addition, the seven verbs for constructing ‘interested to (do)’ should be investigated individually, while the instances of each of these verbs occurring in ‘interested in (doing)’ can serve as references. Studying the examples of these verbs in context, in particular the concordance lines of their occurrences in the SkE interface, appears to be indispensable for gaining an in-depth understanding in order to tease out the rule/s or reasons why one of the two expressions is employed in context. The top item on the list, ‘interested to know’, occurs far more frequently than ‘interested in knowing’ in BNC (cf. Table 15.1). This tendency is confirmed by enTenTen13, a much larger corpus built in 2013 and also available at SkE, in which ‘interested to know’ occurs 15,405 times and ‘interested in knowing’ 9253 times.
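The frequency figures quoted in this subsection can be rechecked with a few lines of arithmetic. The sketch below uses only numbers given in the text and in Table 15.1; the variable names are arbitrary.

```python
# 'interested to (do)' verbs used nine times or more in BNC (Table 15.1).
TO_FREQ = {"know": 106, "hear": 101, "see": 79, "read": 39,
           "learn": 24, "find": 12, "note": 9}
TOTAL_TO_HITS = 434   # all BNC hits for 'interested to' + verb

top_seven = sum(TO_FREQ.values())
share = 100 * top_seven / TOTAL_TO_HITS
print(top_seven, round(share, 1))     # 370 85.3

# BNC: 'interested to know' versus 'interested in knowing' (Table 15.1).
print(round(106 / 17, 1))             # 6.2

# enTenTen13: 'interested to know' versus 'interested in knowing'.
print(round(15405 / 9253, 2))         # 1.66
```

The preference for ‘interested to know’ holds in both corpora, although the ratio is much smaller in the larger web corpus than in BNC.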
To collect examples from BNC, the Good Dictionary Examples feature is used that provides full sentences in which ‘interested to know’ occur, while the adjacent sentences can also be easily displayed when the investigator needs more information about the context. By comparing the examples of ‘interested to know’ with those of ‘interested in knowing’, it is clear that the difference between the two is not about spoken versus written communication, because both expressions occur in both modes of communication. The differences appear to be more about fine shades of meaning. The instances of ‘interested to know’ tend to occur in dynamic social interactions, in which one of the participants is looking forward (in the sense of ‘cannot wait’) to obtain certain information, such as in examples (1) and (2). Such dynamism in the specific context is often expressed by intensifiers, such as ‘eagerly (awaited)’, ‘extremely’, and ‘suddenly’ in (1)–(3). (1) The findings will be eagerly awaited. People in the Sellafield area will be very interested to know. (2) I would be extremely interested to know the thinking behind this policy, and look forward to your assurance […] (3) The atmosphere had suddenly changed in the windowless little room, and she was interested to know why. (4) We are always interested in knowing what you, as Friends, think about the changes we are making, to help us to do even better in the future.
15 The Potential of Using Corpora and Concordance Tools for Language …
(5) I later discovered that this country was called Brobdingnag. […] I went with them because I was interested in seeing a new country. (Bold-faced italics and underlines are added by the investigators) By contrast, using ‘interested in knowing’, the speaker does not stress the intensity of their desire to obtain some interesting information. Rather, their desire to gain access to the information is conveyed in a more objective and detached manner, often as a general statement about their inclination, as in (4). Uttering ‘interested in knowing’ sounds like a gentle request to the hearer who holds the information, who, in turn, can either grant or deny the speaker’s access to the information at their discretion. If the phrase ‘interested in knowing’ in (4) is substituted by ‘interested to know’, the sentence would sound more coercive to the addressees, and this is not the manner in which the speakers communicated with the addressees, who were known as Friends in (4). Similarly, by ‘interested in seeing’, the speaker in (5) explains his interest in general in visiting a new country. He is not referring to a specific country which he had a strong desire to see, because he did not even know the name of the country when he set off on board. Closely reading the examples in context becomes crucial for discovering the rules for using the two alternative structures. From the qualitative study illustrated above, we can formulate tentative rules (or tendencies) of usage as follows. Of the two constructions: Rule 2: ‘interested to (do)’ is relatively more powerful for expressing one’s strong desire to obtain some interesting information here and now. Rule 3: ‘interested in (doing)’ is relatively more useful for denoting a general fact about one’s interest in something, sounding more detached (there and then) and less intense regarding satisfying the interest.
15.4 Discussion and Conclusion The present research confirms the value of using corpus resources for assisting language learners to work out rules of using two closely related expressions— ‘interested in (doing)’ and ‘interested to (do)’. The corpus resources are effective in depicting the main trends of use—in particular, the frequency of occurrence of the two constructions and the verb types for constructing the two (Stages 1 and 2). However, the facility of the corpus resources and commonly used standard queries does not allow the rules of usage about the two expressions to emerge. At this juncture, intensive human intelligence is still required to examine the typical instances of the two expressions and tease out the fine differences between the two constructions in terms of emphasized meaning, the speaker’s intention, and interpersonal functions. Table 15.2 summarizes the main findings of the present research by the stages of investigation. This study reveals that conventional corpus technologies provide tangible support for language learners’ discovery of the rules of language use, while inquisitive learners’ in-depth interaction with the corpus resources through multiple stages is
L. Lim and V. X. Wang
Table 15.2 Summary of the three stages: findings and required competence
| Stage | Findings | IT techniques | Human intelligence required |
|---|---|---|---|
| 1 | A general understanding that both expressions are in use | Basic queries with GNV and SkE | Low |
| 2 | The lists of the verbs that construct the two expressions overlap considerably | Standard concordance queries at SkE | Medium |
| 3 | Rules of usage detected from observing typical instances in context | Standard concordance queries at SkE | High |
crucial for the outcome. Our research lends support to the model of human intelligence assisted by technology—i.e. ‘computer as tool’ [19]—in terms of human– computer interaction, attesting that intricate language rules are better perceived and articulated by human brains. However, it is noteworthy that conventional corpus technologies are no longer cutting-edge. The new generation of AI technology such as ChatGPT strongly promotes the model of ‘computer as tutor’ [19]. The relative advantage of human intelligence in language processing is now forcefully challenged by powerful AI tools. At this juncture, the model of computer-assisted language learning shall be one of the foci for investigation in future studies.
References 1. Chambers, A.: Integrating corpus consultation in language studies. Lang. Learn. Technol. 9, 111–125 (2005) 2. Kilgarriff, A., Marcowitz, F., Smith, S., Thomas, J.: Corpora and language learning with the sketch engine and SKELL. Revue Francaise De Linguistique Appliquee 20, 61–80 (2015) 3. Yang, Y.F., Harn, R.F., Hwang, G.H.: Using a Bilingual concordancer for text revisions in EFL writing. Educ. Technol. Soc. 22, 106–119 (2019) 4. Zanettin, F.: Corpus-based translation activities for language learners. Interpret Transl. Tra. 3, 209–224 (2009) 5. Ackerley, K.: Effects of corpus-based instruction on phraseology in learner English. Lang. Learn. Technol. 21, 195–216 (2017) 6. Boulton, A., Cobb, T.: Corpus use in language learning: a meta-analysis. Lang. Learn. 67, 348–393 (2017) 7. Cotos, E.: Enhancing writing pedagogy with learner corpus data. ReCALL 26, 202–224 (2014) 8. Perez-Paredes, P.: A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Comput. Assist. Lang. Learn. 35, 36–61 (2022) 9. Schmidt, N.: Unpacking second language writing teacher knowledge through corpus-based pedagogy training. ReCALL 35, 40–57 (2023) 10. Sha, G.Q.: Using Google as a super corpus to drive written language learning: a comparison with the British National Corpus. Comput. Assist. Lang. Learn. 23, 377–393 (2010) 11. Vyatkina, N.: Corpora as open educational resources for language teaching. Foreign Lang. Ann. 53, 359–370 (2020) 12. Ballance, O.J.: Narrow reading, vocabulary load and collocations in context: exploring lexical repetition in concordances from a pedagogical perspective. ReCALL 33, 4–17 (2021)
13. Frankenberg-Garcia, A.: The use of corpus examples for language comprehension and production. ReCALL 26, 128–146 (2014) 14. Swan, M.: Practical English usage. Oxford University Press, Oxford, England (2005) 15. Hands, P., Wild, K.: Collins COBUILD English usage. HarperCollins Publishers (2012) 16. Lim, L.: Interpreting training in China: past, present and future. In: Lim, L., Li, D. (eds.) Key Issues in Translation Studies in China: Reflections and New Insights, pp. 143–160. Springer Singapore, Singapore (2020) 17. Li, L., Huang, C.-R., Wang, V.X.: Lexical competition and change: a corpus-assisted investigation of gambling and gaming in the past centuries. SAGE Open 10, 1–14 (2020) 18. Lim, L.: Are TERRORISM and kongbu zhuyi translation equivalents? A corpus-based investigation of meaning, structure and alternative translations. In: Otoguro, R., Komachi, M., Ohkuma, T. (eds.) Proceedings of the 33rd Pacific Asia Conference on Language, Information and Computation, pp. 516–523, Future University Hakodate (2019) 19. Levy, M., Stockwell, G.: CALL Dimensions: Options and Issues in Computer-Assisted Language Learning. Routledge, London (2013)
Chapter 16
Analysis of Various Video-Based Human Action Recognition Techniques Using Deep Learning Techniques
Lakshmi Alekhya Jandhyam, Ragupathy Rengaswamy, and Narayana Satyala
Abstract Human action recognition is the task of identifying and naming activities with Artificial Intelligence (AI) from raw movement data collected from a variety of sources. Distinguishing human activities in images or video sequences is challenging because of problems such as background clutter, partial occlusion, and scale changes. This survey reviews recent research advances in the field of human action categorization. Human activity recognition methods are classified into four main categories according to the methods used, with a further group for other methods. The review is organized by the publication year of each article, the method used, and the performance metrics reported. Finally, the research gaps and open issues of existing systems are explained to support the development of efficient deep learning approaches to human action recognition.
L. A. Jandhyam (B) · R. Rengaswamy, Department of Computer Science and Engineering, Annamalai University, Annamalai Nagar, Tamil Nadu, India
N. Satyala, Department of Computer Science and Engineering, Seshadri Rao Gudlavalleru Engineering College, Gudlavalleru, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_16
16.1 Introduction The detection of human actions, and of a person's interactions with their surroundings, has been an active research area in the past few years owing to its potential applications in a range of areas. To accomplish this demanding task, several research fields concentrate on modeling human activities in their numerous aspects, including feelings, relational attitudes, behaviors, and so on [1]. In computer vision, recognizing human activity is a difficult task with many applications, including automated surveillance, monitoring the activity of elderly people, and human–computer interaction. In the last few years, a considerable amount of research on human activity recognition has been carried out to build effective systems. Nevertheless, recognizing human action remains difficult because of the great variation in human appearance and in the speed at which activities are performed. It also faces a variety of unresolved problems, including cluttered backgrounds, occlusion, and illumination changes [2]. The objective of human activity recognition is to understand the current activity and the intentions of a person through a sequence of observations and analyses of the person's actions and surroundings. Human action recognition has attracted attention because of its wide applicability in intelligent surveillance, healthcare systems, virtual reality interaction, smart homes, abnormal activity detection, and other areas, as well as its ability to offer individualized support in various fields. It has also been a significant topic in computer science [3]. Recently, numerous action recognition techniques inspired by brain mechanisms have been introduced, drawing on an improved understanding of the mechanisms responsible for recognizing activity. One of the most important benefits of deep learning approaches is the ability to achieve end-to-end optimization. For instance, a novel mechanism of action recognition based on Slow Feature Analysis (SFA) [4] was established, which extracts slowly varying features from a quickly varying signal.
Subsequently, in [5], a two-layered SFA learning method was developed that combines SFA with deep learning techniques to capture abstract and structural features from video for human action recognition. Moreover, Artificial Neural Networks (ANNs), a family of statistical learning algorithms inspired by biological neural networks, have been developed to identify human activity from video sequences [2]. Multi-task training is an approach in which related tasks can benefit from one another, as recommended by Kokkinos [6]. Action recognition and pose estimation are generally hard to stitch together into a useful joint optimization, which usually involves 3D convolutions [7] or heat map transformations [8]. Detection-based techniques need a non-differentiable argmax function to recover the joint coordinates in a post-processing phase, which breaks the back-propagation chain required for end-to-end learning [9]. Since the Convolutional Neural Network (CNN) achieved great success in image processing, many effective CNN-based networks have been proposed. Other deep networks also perform very well in action recognition, such as long short-term memory (LSTM) for long-term feature modeling [10]. Existing human action recognition methods usually operate on RGB data [11]. The main goal of this survey is to analyze existing approaches to video-based human activity recognition. The existing approaches are classified into methods based on CNN, methods based on fusion, approaches based on the skeleton, methods based on Deep Belief Network (DBN), and other methods. The survey considers the tools used, the evaluation metrics, the classification of methods, and so on. Accuracy is the evaluation metric most often reported for human action recognition techniques. The challenges faced by the existing techniques are described as research gaps and issues, which are intended to motivate the further development of efficient methods for human action recognition. The survey is organized as follows: Sect. 16.2 reviews existing research on human action recognition using deep learning techniques; Sect. 16.3 presents the research gaps and issues of human action recognition systems; Sect. 16.4 assesses action recognition methods by the tools used, the evaluation metrics, and the publication year of each research article; finally, Sect. 16.5 concludes the work.
16.2 Literature Review Most existing methods predict human activity effectively. This section describes the methods developed for video-based human activity recognition with deep learning techniques.
16.2.1 Classification of Human Action Recognition Methods Figure 16.1 shows the variety of human action recognition methods that use deep learning techniques. Here, the techniques are classified into four main categories plus a group of other methods: methods based on CNN, methods based on fusion, approaches based on the skeleton, methods based on Deep Belief Network (DBN), and other methods. The existing methods in each category are described below.
Fig. 16.1 Classification of human action recognition methods
(a) Methods Based on CNN CNN is a deep learning technique used to solve complex problems, and it overcomes limitations of traditional machine learning techniques. Because CNNs achieve high accuracy, they are commonly used for image classification and recognition. A CNN takes an image, assigns learnable weights to the various objects in the image, and then distinguishes them from one another. CNNs require very little data pre-processing compared with other classification algorithms. The CNN-based approaches considered in video-based human action recognition are described below: Baccouche et al. [12] proposed a fully automated deep model that classifies human actions without using any prior knowledge. The first step of this system extends CNN to 3D so that spatiotemporal features are learned automatically. A Recurrent Neural Network (RNN) is then trained to classify each sequence according to the temporal evolution of the learned features at each time step. This approach substantially improved classification performance. Ji et al. [13] developed a novel 3D CNN model for action recognition. It extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The system generates multiple channels of information from the input frames, and the final feature representation is obtained by combining information from all channels. Applied to recognizing human actions in real-world environments, it achieved superior performance without relying on handcrafted features. Guha and Ward [14] explored the effectiveness of sparse representations, obtained by learning a set of overcomplete bases (dictionaries), in the context of action recognition in videos.
This work investigates three overcomplete dictionary learning frameworks for representing human actions. An overcomplete dictionary is built from a set of spatiotemporal descriptors, and each descriptor is represented by a linear combination of a small number of dictionary elements. Compared with existing techniques, this approach provides a richer and more compact representation of video sequences. It consistently achieves state-of-the-art results on several public data sets, and it is fairly general and may be used to handle other classification problems. Moniruzzaman et al. [15] introduced a network that embeds a novel Discriminative Feature Pooling (DFP) method together with a novel Video Segment Attention Model (VSAM) for video-based human action recognition from both trimmed and untrimmed videos. The DFP method is an attentional pooling mechanism for 3D CNNs: it attentionally pools 3D convolutional feature maps to emphasize the most critical spatial, temporal, and channel-wise features related to the actions within a video segment. The joint design of the VSAM and DFP modules, optimized end to end, attained better performance. The network is also efficient and easy to implement.
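To make the 3D convolution used by the 3D CNN models in [12, 13] concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It slides a small kernel over the time, height, and width axes of a grayscale clip ('valid' mode, no padding, single channel). The function name `conv3d_valid`, the toy clip, and the temporal-difference kernel are assumptions chosen for illustration; real models learn many such kernels and use optimized libraries.

```python
def conv3d_valid(clip, kernel):
    """Naive 3D cross-correlation over (time, height, width), no padding."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(T - t + 1):
        plane = []
        for j in range(H - h + 1):
            row = []
            for k in range(W - w + 1):
                acc = 0.0
                # Sum over the t x h x w neighbourhood spanning adjacent frames.
                for a in range(t):
                    for b in range(h):
                        for c in range(w):
                            acc += clip[i + a][j + b][k + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip: 4 frames of 3x3 pixels; every pixel of frame f has value f,
# i.e. the whole image brightens by one unit per frame.
clip = [[[float(f)] * 3 for _ in range(3)] for f in range(4)]

# 2x1x1 kernel that subtracts the previous frame from the current one,
# so it responds to change over time rather than to static appearance.
kernel = [[[-1.0]], [[1.0]]]

features = conv3d_valid(clip, kernel)
print(len(features), len(features[0]), len(features[0][0]))  # output size
print(features[0][0])  # one row of the first temporal-difference map
```

Because the kernel spans two adjacent frames, its response depends on motion between frames, which is the property that lets 3D convolutions capture information a per-frame 2D convolution cannot.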
(b) Techniques Based on Fusion Fusion-based approaches are generally used to improve the performance of human activity recognition. Action-fusion, decision-level fusion, and feature-level fusion methods are employed. The fusion-based approaches used in video-based human action recognition are described below: Chen et al. [16] presented a fusion approach that improves human action recognition by using two sensors of differing modality, a depth camera and an inertial body sensor. The approach uses two kinds of features for recognizing human action, namely depth motion maps and statistical signal attributes. Both feature-level fusion and decision-level fusion were examined using a collaborative representation classifier. In feature-level fusion, the features generated from the two modalities were combined. In decision-level fusion, the Dempster–Shafer theory was used to combine the classification outcomes of the classifiers. Kamel et al. [17] developed an approach for human action recognition from depth maps and posture data with CNNs. Three CNN channels were trained with different inputs to maximize feature extraction for accurate action classification. The first channel was trained on Depth Motion Images (DMIs), the second on DMIs together with moving joint descriptors, and the third on moving joint descriptors alone. The action predictions generated by the three CNN channels were fused for the final action classification. The approach proposes several fusion score operations to maximize the score of the correct action. (c) Skeleton-Based Approaches Skeleton data, consisting only of the 2D/3D coordinates of human joints, have been extensively studied for recognizing human activity.
Because recognition depends primarily on the skeleton sequence, traditional skeleton-based methods generally need to extract motion patterns from a given skeleton sequence. A range of skeleton-based methods employed in video-based human action recognition is described below. Talha et al. [18] described a method for real-time, viewpoint-invariant, and early action recognition. A novel descriptor, Body-part Directional Velocity (BDV), which uses hierarchical information about the algebraic velocity of skeleton joints, was applied for early recognition of actions. A Hidden Markov Model (HMM) with Gaussian Mixture Model (GMM) state-output distributions was then used for classification.
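Velocity-based skeleton descriptors start from per-joint velocities. The sketch below shows only that first step, computed by finite differences between consecutive frames; it is not the BDV descriptor of [18], whose hierarchical construction is not detailed here. The function names `joint_velocities` and `speed`, the frame rate, and the toy two-joint sequence are assumptions for illustration.

```python
def joint_velocities(sequence, fps=30.0):
    """Finite-difference velocity of every joint between consecutive frames.

    sequence: list of frames; each frame is a list of (x, y, z) joint positions.
    Returns a list with one entry per frame pair, each a list of (vx, vy, vz).
    """
    dt = 1.0 / fps
    velocities = []
    for prev, curr in zip(sequence, sequence[1:]):
        velocities.append([
            tuple((c - p) / dt for p, c in zip(joint_prev, joint_curr))
            for joint_prev, joint_curr in zip(prev, curr)
        ])
    return velocities


def speed(velocity):
    """Euclidean norm of one velocity vector."""
    return sum(component * component for component in velocity) ** 0.5


# Toy sequence: joint 0 moves 0.5 units per frame along x, joint 1 is static.
seq = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.5, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
]

vel = joint_velocities(seq, fps=2.0)
print(vel[0][0])         # velocity of joint 0 between frames 0 and 1
print(speed(vel[1][1]))  # the static joint has zero speed
```

A descriptor built on such velocities depends on how joints move rather than where the body is located, which is one reason velocity features suit early recognition from partial sequences.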
Liu et al. [19] presented the first adversarial attack on skeleton-based action recognition with graph convolutional networks (GCNs). The proposed attack, Constrained Iterative Attack for Skeleton Actions (CIASA), perturbs the joint positions in an action sequence so that the resulting adversarial sequence preserves the temporal coherence, spatial integrity, and anthropomorphic plausibility of the skeletons. CIASA achieves this by satisfying several physical constraints and by employing spatial skeleton realignments for the perturbed skeletons, together with regularization of the adversarial skeletons using generative networks. Moreover, the attack can be executed in different modes depending on the needs of the attacker. (d) DBN-Based Approaches A DBN can be employed in either an unsupervised or a supervised setting. A DBN model usually consists of several layers of neural networks and is robust in classification. The DBN-based approaches used in video-based human action recognition are described below. Uddin et al. [20] developed a depth camera-based robust Facial Expression Recognition (FER) method for improved human–machine interaction. The system obtained eight directional strengths for each pixel, and the signs of some top strengths were arranged to represent distinctive and robust face features, denoted as Modified Local Directional Pattern (MLDP). The MLDP features were further processed by Generalized Discriminant Analysis (GDA) to improve the facial features. The MLDP-GDA features were then applied with a DBN to train on the various facial expressions. Finally, the trained DBN was used to recognize facial expressions in depth video during testing. Hassan et al. [21] presented a smartphone inertial sensor-based method for human action recognition. Efficient features were first extracted from the raw data, including the mean, median, autoregressive coefficients, and so on.
The features were further processed with Kernel Principal Component Analysis (KPCA) and Linear Discriminant Analysis (LDA) to make them more robust. Finally, the features were used to train a DBN for effective activity recognition. (e) Other Approaches Other techniques employed for video-based human action recognition using deep learning are described below. Wang et al. [22] developed a novel sparse model for human action recognition that uses high-level action units to represent human actions in videos. The approach has three interconnected components. First, a novel context-aware spatial–temporal descriptor, termed locally weighted word context, was designed to improve the discriminability of the conventionally used local spatial–temporal descriptors. Second, action units were learned from the context-aware descriptors using graph-regularized nonnegative matrix factorization, which encodes geometrical information. These units effectively bridge the semantic gap in action recognition. Finally, a sparse model was designed to preserve representative items and suppress noise in the action units.
Yu et al. [23] introduced a novel deep model (D3D-LSTM) based on 3D CNN and LSTM for both single-target and interaction action recognition to improve spatiotemporal processing performance. The system has several notable properties: a real-time feature fusion technique, an improved attention model, and an alternating optimization approach. The real-time feature fusion scheme obtains a more representative feature sequence through the composition of local mixtures, which improves discrimination between similar actions. In addition, an improved attention mechanism assigns different weights to each frame in real time. The current memory state is updated using a weight-controlled attention model for long-term relations, enabling the memory cells to store better long-term features. Furthermore, the densely connected bimodal structure forms the local perceptrons of D3D-LSTM and stores better short-term features. The approach offers better overall long-term feature processing and extracts spatiotemporal features that raise the recognition rate for difficult actions. Luvizon et al. [9] proposed a multi-task framework for jointly estimating 2D or 3D human poses from monocular color images and classifying human actions from video sequences. A single architecture is used to solve both problems efficiently while still achieving state-of-the-art or comparable results. The approach benefits from a high degree of parameter sharing between the two tasks by unifying still images and videos in a single pipeline. Furthermore, it provides significant insights for end-to-end training of the multi-task approach by decoupling key prediction parts, which leads to higher precision.
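The introduction notes that detection-based pose estimators rely on a non-differentiable argmax to recover joint coordinates, which breaks back-propagation. One common differentiable substitute is a soft-argmax: the expected coordinate under a softmax of the heat map. The sketch below illustrates that general idea only; the function names, the sharpness parameter `beta`, and the toy heat maps are assumptions, and it should not be read as the exact formulation used in [9].

```python
import math


def softmax(values, beta=1.0):
    """Numerically stable softmax with sharpness parameter beta."""
    peak = max(values)
    exps = [math.exp(beta * (v - peak)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmax_2d(heatmap, beta=1.0):
    """Expected (row, col) position under the softmax of a 2D heat map.

    Unlike a hard argmax, this is a smooth function of the heat map values,
    so gradients can flow from the coordinates back to the heat map.
    """
    width = len(heatmap[0])
    flat = [v for row in heatmap for v in row]
    probs = softmax(flat, beta)
    exp_row = sum(p * (i // width) for i, p in enumerate(probs))
    exp_col = sum(p * (i % width) for i, p in enumerate(probs))
    return exp_row, exp_col


# A sharply peaked heat map: the estimate lands on the peak location.
peaked = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 9.0],
    [0.0, 0.0, 0.0],
]
row, col = soft_argmax_2d(peaked, beta=5.0)
print(round(row, 3), round(col, 3))

# A flat heat map: the estimate falls back to the centre of the grid.
flat_row, flat_col = soft_argmax_2d([[1.0, 1.0], [1.0, 1.0]])
print(flat_row, flat_col)
```

The trade-off is that the output is a weighted average rather than an exact peak index, so a larger `beta` or a sharper heat map is needed when several locations have similar scores.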
16.3 Research Gaps This section describes the research problems faced by existing approaches to video-based human activity recognition using deep neural networks. The research issues faced by CNN-based approaches are as follows: In [12], a neural-based deep model was proposed to classify sequences of human actions without a priori modeling, relying only on automatic learning from training examples. The approach introduced in [14] cannot deal with multiple activities present in one video sequence, because the spatial and temporal orientation of the extracted features is disregarded. In [15], a pre-designed backbone CNN was used as a feature extractor, and the method was applied only on top of the last convolutional feature maps; thus, it cannot access the lower-level information of the backbone, and because the backbone network is not fine-tuned, the higher-level feature maps are not adapted to action recognition. The research issues faced by fusion-based approaches are as follows: A scheme for human activity recognition from depth maps and posture data using deep CNNs [17] was developed. However, it needs more computation than simple feed-forward neural networks because of the 2D processing. The research issues faced by skeleton-based approaches are as follows: A systematic adversarial attack on skeleton-based action recognition was introduced in [19]. It has a few drawbacks: for a few skeleton actions, the targeted system withstands the perturbations. Such cases were observed more often in the advanced mode of the CIASA attack, in which the attack was restricted to perturbing only a few joints. In [18], a novel approach to human activity recognition using RGB-D sensors was introduced. In this system, robustness to viewpoint variability remains the most important problem. The research issues faced by other approaches are as follows: A robust human activity recognition system based on smartphone sensor data [21] was proposed. In this system, the numbers of samples for training and testing the various actions were not equally distributed in the database, which affects performance.
16.4 Analysis and Discussion This section discusses the techniques used for video-based human action recognition with deep learning approaches. The analysis covers a range of research papers and is based on the datasets and methods used.
16.4.1 Analysis Based on Techniques This section analyzes the different techniques used in video-based human action recognition with deep learning approaches. The techniques developed for detecting and recognizing human activities are shown in Fig. 16.2. Figure 16.2 shows that 44% of the research works used CNN-based methods and 24% used skeleton-based approaches. Fusion-based and DBN-based approaches were each considered in 8% of the research papers, and other approaches accounted for 16%. Thus, this review shows that CNN-based approaches are the most widely used methods for identifying and recognizing human action.
Fig. 16.2 Analysis based on techniques
16.4.2 Analysis Using Evaluation Metrics This section presents the analysis by performance metrics. The evaluation metrics considered are accuracy, precision, recognition rate, recall, Area Under the Curve (AUC), fooling rate, error rate, F1 score, and mean average precision. The analysis shows that accuracy was reported in more papers than any other evaluation metric, as Table 16.1 demonstrates. Table 16.2 further breaks down the reported accuracy values into four ranges: 61–70, 71–80, 81–90, and 91–99%. Of the 19 research papers listed in Table 16.2, most achieved accuracy in the 91–99% range, whereas the research paper [24] obtained a lower accuracy, in the 61–70% range. Moreover, the accuracy of the research paper [9] lies in the 81–90% range.
Table 16.1 Analysis by means of evaluation metrics
| Evaluation metrics | Research papers |
|---|---|
| Accuracy | [1, 3, 9, 11–14, 17, 18, 21, 22, 24–32] |
| Recognition rate | [2, 16, 20] |
| Recall | [2, 19] |
| AUC | [13] |
| Fooling rate | [19] |
| Error rate | [21] |
| F1 score | [3] |
| Mean average precision | [15, 21] |
Table 16.2 Analysis of accuracy
| Accuracy (%) | Research papers |
|---|---|
| 61–70 | [24] |
| 71–80 | [13] |
| 81–90 | [9] |
| 91–99 | [3, 11, 12, 14, 17, 18, 21, 22, 25–32] |
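For reference, several of the metrics in Table 16.1 (accuracy, error rate, precision, recall, and F1 score) can be computed from the counts of correct and incorrect predictions. The sketch below does this for a single positive class in plain Python. The function name `classification_metrics` and the toy labels are hypothetical and are not taken from any surveyed paper; multi-class action recognition would typically average such per-class values.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, error rate, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = clip shows the action of interest, 0 = it does not.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Accuracy alone can be misleading when action classes are unevenly represented, which is one reason some of the surveyed papers also report recall, F1 score, or mean average precision.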
16.5 Conclusion This survey has covered various human action recognition techniques that use deep learning. The approaches surveyed were categorized, according to the methods used to recognize human activities, into four main groups and a group of other approaches. The study gathered 32 research papers and categorized them into methods based on CNN, techniques based on fusion, methods based on the skeleton, approaches based on DBN, and other approaches. Different sources, including Google Scholar, IEEE, and so on, were used to collect the research articles. The collected papers were examined, and the issues faced by existing methods were identified. This analysis also recommends that future work on human action recognition using deep learning take the identified research gaps and issues into account. The analysis was organized by classification methods, applied tools, datasets, and performance metrics. It clearly shows that CNN-based methods are the most frequently used in the research articles, and that accuracy is the most commonly used evaluation metric.
References
1. Poppe, R.: A survey on vision-based human action recognition. Image Vis. Comput. 28(6), 976–990 (2010)
2. Liu, H., Shu, N., Tang, Q., Zhang, W.: Computational model based on neural network of visual cortex for human action recognition. IEEE Trans. Neural Netw. Learn. Syst. 29(5), 1427–1440 (2017)
3. Wan, S., Qi, L., Xu, X., Tong, C., Gu, Z.: Deep learning models for real-time human activity recognition with smartphones. Mobile Netw. Appl. 25(2), 743–755 (2020)
4. Zhang, Z., Tao, D.: Slow feature analysis for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 34(3), 436–450 (2012)
5. Sun, L., Jia, K., Chan, T.H., Fang, Y., Wang, G., Yan, S.: DL-SFA: deeply-learned slow feature analysis for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2625–2632
6. Kokkinos, I.: Ubernet: training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6129–6138
16 Analysis of Various Video-Based Human Action Recognition …
7. Zolfaghari, M., Oliveira, G.L., Sedaghat, N., Brox, T.: Chained multi-stream networks exploiting pose, motion, and appearance for action classification and detection. In: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2904–2913
8. Choutas, V., Weinzaepfel, P., Revaud, J., Schmid, C.: Potion: pose motion representation for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7024–7033
9. Luvizon, D.C., Picard, D., Tabia, H.: Multi-task deep learning for real-time 3D human pose estimation and action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43(8), 2752–2764 (2020)
10. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
11. Liu, J., Wang, G., Duan, L.Y., Abdiyeva, K., Kot, A.C.: Skeleton-based human action recognition with global context-aware attention LSTM networks. IEEE Trans. Image Process. 27(4), 1586–1599 (2017)
12. Baccouche, M., Mamalet, F., Wolf, C., Garcia, C., Baskurt, A.: Sequential deep learning for human action recognition. In: International Workshop on Human Behavior Understanding, Nov 2011, pp. 29–39. Springer, Berlin, Heidelberg
13. Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 35(1), 221–231 (2012)
14. Guha, T., Ward, R.K.: Learning sparse representations for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 34(8), 1576–1588 (2011)
15. Moniruzzaman, M., Yin, Z., He, Z., Qin, R., Leu, M.C.: Human action recognition by discriminative feature pooling and video segment attention model. IEEE Trans. Multimedia 24, 689–701 (2021)
16. Chen, C., Jafari, R., Kehtarnavaz, N.: Improving human action recognition using fusion of depth camera and inertial sensors. IEEE Trans. Human-Mach. Syst. 45(1), 51–61 (2014)
17. Kamel, A., Sheng, B., Yang, P., Li, P., Shen, R., Feng, D.D.: Deep convolutional neural networks for human action recognition using depth maps and postures. IEEE Trans. Syst. Man Cybern. Syst. 49(9), 1806–1819 (2018)
18. Talha, S.A.W., Hammouche, M., Ghorbel, E., Fleury, A., Ambellouis, S.: Features and classification schemes for view-invariant and real-time human action recognition. IEEE Trans. Cogn. Dev. Syst. 10(4), 894–902 (2018)
19. Liu, J., Akhtar, N., Mian, A.: Adversarial attack on skeleton-based human action recognition. IEEE Trans. Neural Netw. Learn. Syst. (2020)
20. Uddin, M.Z., Hassan, M.M., Almogren, A., Zuair, M., Fortino, G., Torresen, J.: A facial expression recognition system using robust face features from depth videos and deep learning. Comput. Electr. Eng. 63, 114–125 (2017)
21. Hassan, M.M., Uddin, M.Z., Mohamed, A., Almogren, A.: A robust human activity recognition system using smartphone sensors and deep learning. Futur. Gener. Comput. Syst. 81, 307–313 (2018)
22. Wang, H., Yuan, C., Hu, W., Ling, H., Yang, W., Sun, C.: Action recognition using nonnegative action component representation and sparse basis selection. IEEE Trans. Image Process. 23(2), 570–581 (2013)
23. Yu, J., Gao, H., Yang, W., Jiang, Y., Chin, W., Kubota, N., Ju, Z.: A discriminative deep model with feature fusion and temporal attention for human action recognition. IEEE Access 8, 43243–43255 (2020)
24. Sahoo, S.P., Ari, S., Mahapatra, K., Mohanty, S.P.: HAR-depth: a novel framework for human action recognition using sequential learning and depth estimated history images. IEEE Trans. Emerg. Top. Comput. Intell. 5(5), 813–825 (2020)
25. Nie, Q., Wang, J., Wang, X., Liu, Y.: View-invariant human action recognition based on a 3D bio-constrained skeleton model. IEEE Trans. Image Process. 28(8), 3959–3972 (2019)
26. Wei, H., Kehtarnavaz, N.: Simultaneous utilization of inertial and video sensing for action detection and recognition in continuous action streams. IEEE Sens. J. 20(11), 6055–6063 (2020)
27. Ronao, C.A., Cho, S.B.: Human activity recognition with smartphone sensors using deep learning neural networks. Expert Syst. Appl. 59, 235–244 (2020)
28. Zhu, K., Wang, R., Zhao, Q., Cheng, J., Tao, D.: A cuboid CNN model with an attention mechanism for skeleton-based action recognition. IEEE Trans. Multimedia 22(11), 2977–2989 (2019)
29. Cheng, J., Ren, Z., Zhang, Q., Gao, X., Hao, F.: Cross-modality compensation convolutional neural networks for RGB-D action recognition. IEEE Trans. Circuits Syst. Video Technol. 32(3), 1498–1509 (2021)
30. Devanne, M., Wannous, H., Berretti, S., Pala, P., Daoudi, M., Del Bimbo, A.: 3-D human action recognition by shape analysis of motion trajectories on Riemannian manifold. IEEE Trans. Cybern. 45(7), 1340–1352 (2014)
31. Ryu, J., Patil, A.K., Chakravarthi, B., Balasubramanyam, A., Park, S., Chai, Y.: Angular features-based human action recognition system for a real application with subtle unit actions. IEEE Access 10, 9645–9657 (2022)
32. Du, Y., Fu, Y., Wang, L.: Representation learning of temporal dynamics for skeleton-based action recognition. IEEE Trans. Image Process. 25(7), 3010–3022 (2016)
Chapter 17
Penetration Testing of Web Server Using Metasploit Framework and DVWA Tamanna Jena Singhdeo, S. R. Reeja, Arpan Bhavsar, and Suresh Satapathy
Abstract Cyberspaces are ubiquitous today. These online spaces have made their mark in fields such as education, government, and ecommerce. It is believed that the development of cyberspace is inevitable, and we can expect to see its impact in all domains around us soon. Post-pandemic, we have seen a paradigm shift, with many services moving online. As vulnerable systems move online, there has been an exponential increase in cyber-attacks. Penetration testing is a powerful practice that can be used to safeguard against cyber-attacks. However, the framework of penetration testing, the extent of automation, and the metrics of security measures are still a work in progress. In this paper, we aimed to exploit the vulnerability of a web server. Using reverse TCP protocol, Metasploit framework and Burp Suite tool of Kali Linux, we successfully gained access into a web server based on Xampp. Our work suggests that penetration testing can be used to identify flaws of a system, and this knowledge can be used to create a more robust version.
T. J. Singhdeo (B) · A. Bhavsar, Fairleigh Dickinson University, Vancouver, Canada, e-mail: [email protected]
S. R. Reeja, VIT AP University, Amravati, India
S. Satapathy, KIIT University, Bhubaneswar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_17
17.1 Introduction
Since the onset of the pandemic, many businesses have moved fully or partially to remote access. People increasingly depend on computers and their applications for day-to-day activities, and end users treat an Internet connection as a utility service like water, electricity, and gas. Use of the Internet and of connected devices has become ubiquitous and inevitable, so keeping connections safe and secure is more important than ever. With this change in paradigm, a new concern has emerged. In order to provide services to the end user, enterprises and businesses are focusing on marketing, user-friendliness of applications, and user experience. In this rat race, the security of applications has often taken a back seat, and the repercussions of ignoring security are causing havoc. Businesses are losing huge amounts of money in lawsuits because of irresponsible security practices. Additionally, it is disturbing to find that cyber-attacks and attackers have become increasingly creative. Ransomware attacks have increased by 105% from 2020, and encrypted threats have risen by 170%. In a massive attack on Microsoft Exchange Servers in Jan 2021, hackers were able to access the accounts of 300,000 servers worldwide [1]. After such incidents, it was evident that security and data safety would be the topmost priority for every organization. Approaches like the STRIDE threat model, which have been practiced in industry over the last decades, were insufficient [2, 3]. To provide a thorough and secure environment for businesses, penetration testing is a must. Among the few open-source penetration testing frameworks, the Metasploit framework is a good place to begin [4]. Metasploit is an open-source framework that can use more than 1600 exploits and 495 payloads to attack networks and computer systems [5]. A payload is essentially the malware that the threat actor intends to deliver to the victim in a cyber-attack.
A proper understanding of attack models gives an informed insight into potential malicious networks, their intentions, and possible actions. This insight helps to protect the network from possible attacks, whereas insufficient understanding can neither predict nor stop them. It is crucial to analyze the network to identify the vulnerability list [6–8]. Research suggests that planning possible actions in advance of a cyber security attack helps to limit the extent of its impact.
A good defense rests on a detailed understanding of one's network: the history of any attacks, the vulnerabilities of the network, the possible motives of an attacker, and the security loopholes that must be closed to mitigate future attacks. It is important to identify which parts of a network are vulnerable and which components cyber-attacks are after. Modeling a cyber-attack saves time and secures a system effectively. Researchers have identified the Diamond model [9] and the Open Web Application Security Project (OWASP) threat model as efficient intelligence models from a cyber security perspective. The Diamond model describes cyber-attacks in four parts: adversary, infrastructure, capability, and target. The first part, adversary, analyzes where the attackers are from, information about them, the motive behind the attack, and their approach. The second part, infrastructure, entails infected devices, command and control servers, data management and control, and data leakage paths. The third part, capability, covers the skills the attackers need to deliver their attacks, exploit vulnerabilities, and use backdoor entries, as well as their tools. The last part, target, details the target country, region, industry, and type of data. The model is valued for its simple implementation and for dealing with advanced attackers. The OWASP threat model is used to improve the security of software through its open-source, community-led projects.
A good security strategy starts with knowing the vulnerabilities of the system or organization. Organizations are now hiring pentesters, otherwise called white hat hackers, to find these vulnerabilities, and the organization shares its information with the pentesters. They dig deep to understand the dataflow, the dependability, network flow, third-party involvement, command and control servers, details about the network, and the protocols of the organization. After the service level agreement, the pentesters launch attacks on the organization with prior permission. The process of finding vulnerabilities is called penetration testing or ethical hacking. The pentesters closely monitor the impact of the attack. The attacks ensure that the system is keeping up with patches, protocols, and monitors, and that response processes, policies, and services are up to date [10]. The job of a pentester is an ongoing process, as cyber criminals are always getting creative in launching their attacks, which makes the responsibility competitive. The data obtained from successful pentesting attacks often reveal underlying security issues [10]. As the process is supposed to be continuous, academics and researchers are trying to automate pentesting, and an efficient and effective automated framework is the need of the hour. Researchers have identified a framework and the steps required to automate pentesting [10]. Data obtained from a successful pentest often uncover problems that vulnerability evaluation alone cannot identify. In most cases, these data represent passwords, links between networks, and personally identifiable information (PII). Security engineers who run the pentest have access to the most sensitive resources of the company; a wrong action in these zones can have heavy real-world consequences.
Securing smart devices is relatively more challenging. The IEEE 802.11 standard gives detailed criteria for WLAN communications among smart devices [11]. The most common security attacks on wireless devices are caused by unwanted or automatic connections to the wrong network [12].
17.2 Problem Statement
Securing a system is very important nowadays. Hackers tirelessly try to find a vulnerability in a system to gain access. To make systems more secure, organizations use firewalls, and individual machines such as laptops and PCs also come with a built-in firewall. A network monitoring system is beneficial for tracking activities. The best defense against cyber-attacks is a thorough understanding of the network and the system. To ensure that a system is secure, one should run vulnerability tests as a pentester and launch exploits against it. In this paper, we show how to do pentesting using the Metasploit framework.
17.3 Our Implementation
In our implementation, we tried to find vulnerabilities of the system and to gain access to the web server using the Metasploit framework. We used the Apache HTTP server, an open-source web server that delivers our content using web services. We broke down our attack into the following steps:
(1) Scan for the available open ports and list the services and protocols used for the services.
(2) Search for vulnerabilities in the services.
(3) Identify and think through the exploit that can be used to attain access.
(4) Launch a payload using the Metasploit framework and search for open meterpreter sessions.
To implement the first step, we used the Nmap scanning available in the Metasploit framework [5, 13–16]. We installed two virtual machines (VMs) on our host machine: a Windows VM to create a web server and a Kali Linux VM to penetrate it. The two VMs are connected using a NAT Network, a private network created virtually among the VMs that connects the VMs to each other, to the host, and to the Internet. We downloaded the Xampp web server and installed it in the Windows VM. Xampp creates an Apache HTTP server on the Windows VM. We started the Apache and MySQL services in Xampp. The idea is to install Damn Vulnerable Web Application (DVWA) on the Windows 10 machine to use it as a vulnerable web server. The installation of DVWA involves a few steps. We found the source files for DVWA at github.com/digininja/DVWA. After downloading them, we copied the source files to the directory xampp/htdocs and modified config.inc.php in the config directory and the php.ini file according to the selected database. We used a MySQL database, which creates a root user, so in config.inc.php we changed the value of the db_user variable to root and the value of the db_password variable to blank. We then opened setup.php of DVWA in a browser at localhost/dvwa/setup.php, followed the setup process, and completed the DVWA installation. We verified it by logging in with the default username and password of DVWA. After that, we used Nmap on the Kali Linux VM to find the available devices on the network, their open ports, and their operating systems. From those results, we found that the Windows server has HTTP port 80 open.
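The port-scanning step above can be illustrated with a minimal TCP connect scan. This is a sketch of the general idea only, not how Nmap is implemented; the function name `scan_ports` is hypothetical, and the demonstration scans a listener that the script itself opens on the loopback interface. Such a scan should only ever be pointed at hosts one is authorized to test.

```python
import socket

def scan_ports(host: str, ports, timeout: float = 0.5) -> list[int]:
    """Return the ports on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports

# Demo against a listener we create ourselves on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

found = scan_ports("127.0.0.1", [demo_port])
listener.close()
print("open ports:", found)
```

A real scanner such as Nmap additionally identifies services and operating systems, which this sketch does not attempt.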
From that, we captured the IP address of the Windows machine and opened it in the browser of Kali Linux, using the URL windows_ip_address/dvwa. It led us to the DVWA login, and from there we set the security levels. Figure 17.1 shows the welcome page of DVWA installed in the VM, ensuring that the system is not vulnerable to real hackers. DVWA provides a range of security granularity, which helps to simulate attacks [17, 18]: low, medium, and high levels of security against attacks. The different security levels help to learn more about a system's vulnerabilities.
Fig. 17.1 DVWA login page
To perform the penetration testing on the web server, we scanned the ports with Nmap, a tool bundled with Kali Linux, to find vulnerabilities of the system. Figure 17.2 shows the result of the Nmap scan. It reveals the ports, the protocols, and the service each port is used for. This reconnaissance gives important insight into the system, and pentesters and hackers rely on such scans as the most important information for planning attacks.
Fig. 17.2 Using Nmap to find open ports and to create payload
Furthermore, we decided to launch an attack using a payload. We created a payload to inject on the web server using Metasploit, with the following command:
msfvenom -p php/meterpreter/reverse_tcp lhost=172.16.57.129 lport=4444 -f raw -o payload.php
Here, the -p option specifies the payload; we used php/meterpreter/reverse_tcp. We had found that the target is a web server to which we can upload a file and that it returns the file path in response, so we can open the address of the uploaded PHP payload in our Kali Linux browser to gain access to the web server. The payload uses a reverse TCP connection because the web server may be behind a firewall that does not allow a direct TCP connection from our machine to the web server; with reverse TCP, the connection is made from the web server to our Kali Linux machine. To create the reverse TCP connection, we specify our IP address and port number as lhost and lport. The -f option selects the output format, such as raw, ruby, rb, or perl; we used the raw format. The output file name and path are specified with the -o option.
Using the Metasploit framework, we were able to create a payload file. The payload is essentially a malicious file used as malware to attack the host system. The payload file has the extension .php, which refers to PHP, an open-source programming language used to write server-side scripts, so it can be executed in the web server. This payload is unusual in the Metasploit framework because it is one of the few payloads that are used in RFI vulnerabilities in web apps.
After generating the payload, we first set the file vulnerability option to the low security level. The option requires an image file, but on the low security level the file type was not checked, so the payload file was uploaded directly. Figure 17.3 depicts the payload file created. The staged payload is used to gain meterpreter access to a compromised system.
Later, we increased the security level to medium. The website then checks whether the uploaded file is an image. It only allows jpg and png file types, so it would not let us upload the payload as it was. The workaround to upload the payload in DVWA is to use Burp Suite to intercept the connection. We turned on interception in Burp Suite, changed our file name from payload.php to payload.php.jpg, and uploaded it. Burp Suite captured the request, and in that request we changed the file name from payload.php.jpg back to payload.php. After that, we released the request and the file was successfully uploaded. Here, the website was checking the file type and its size, but only on the client side and not on the server side, and it was not checking for more than one extension. As a result, we were able to upload our payload to the web server. Figure 17.4 shows that the payload is uploaded after the file type has been changed to .php.jpg.
On the high security level, by contrast, we need to embed the payload in an image, because this level checks the file extension, file type, file size, and also the file signature, and it does not accept a file with more than one extension. Figure 17.5 shows the interception in Burp Suite used to launch the attack at the high security level.
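The medium-level bypass described above works when the server trusts metadata that the client controls. The following minimal sketch contrasts such a check with a stricter one. It is not DVWA's actual PHP code; `weak_check`, `stricter_check`, and the allow-lists are hypothetical names, and the assumption that the server relied on the declared content type is ours.

```python
import os

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}
ALLOWED_TYPES = {"image/jpeg", "image/png"}

def weak_check(filename: str, declared_type: str) -> bool:
    """Trusts the declared content type, which the client can set freely."""
    return declared_type in ALLOWED_TYPES

def stricter_check(filename: str, declared_type: str) -> bool:
    """Also validates the name under which the file would be stored."""
    base = os.path.basename(filename)
    stem, ext = os.path.splitext(base)
    single_extension = "." not in stem       # rejects names like payload.php.jpg
    return (declared_type in ALLOWED_TYPES
            and ext.lower() in ALLOWED_EXTENSIONS
            and single_extension)

# Request as it looks after editing in an intercepting proxy:
# a PHP file name combined with an image content type.
tampered = ("payload.php", "image/jpeg")
print("weak check accepts tampered request:", weak_check(*tampered))
print("stricter check accepts tampered request:", stricter_check(*tampered))
print("stricter check accepts dog.png:", stricter_check("dog.png", "image/png"))
```

Neither function inspects the file content, so even the stricter variant would still accept a payload hidden inside a file with a valid image name.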
We were not able to upload the payload by just changing its extension to jpg or png, as the file signature is checked as well. A file signature, also known as a magic number or a file header, is a unique sequence of bytes that identifies the type of a file. These bytes typically appear at the beginning of a file and are used by operating systems and other software to determine the file type and how to handle it. We therefore uploaded an image file and intercepted the request using Burp Suite, as depicted in Fig. 17.5. We kept the first few lines of the image, as they contain the image signature, and removed the remaining portion. We then added the payload code in its place and forwarded the request. The updated payload was uploaded successfully. Figure 17.6 shows the image code while intervening using Burp Suite. We were able to upload a jpg file and not a php file. Figure 17.7 shows the payload code uploaded successfully on the web server.
We noticed that the attack cannot be launched directly from the web server. To execute the payload, we have to exploit another vulnerability, such as a file inclusion vulnerability. A file inclusion vulnerability is a type of security vulnerability that allows an attacker to include a file on a server, typically through a web application, as expressed in Fig. 17.8. This can allow the pentester to execute arbitrary code on the server, access sensitive information, or launch further attacks. To exploit the file inclusion vulnerability, we tried to open the URL “https://ip-address/dvwa/fi/?page=C:\xampp\htdocs\dvwa\hackable\uploads\dog_o.png”. From the Nmap scan we know that it is a Windows server, and Xampp is usually installed on the C drive, so we use the path “C:\xampp\dvwa”. From the webpage we received the remaining path for the image. As we exploit that vulnerability, the payload gets executed, and we can get access to the server using Metasploit, as shown in Fig. 17.9. After successfully injecting our payload into the web server, we opened the file path leading to that payload in the browser of Kali Linux. Then, we opened Metasploit to exploit the payload. We set our payload, lhost, and lport in Metasploit and ran the exploit. It provided us with different kinds of access to the web server, such as viewing the file system, downloading files, uploading files, and viewing network details. We gained access to the web server using msfconsole of the Metasploit framework, as expressed in Fig. 17.8. After attaining access to the system, we were able to access the file system and could extract the path of the current directory. We were able to exploit the second payload and gained access to the server's file system, as expressed in Fig. 17.10.
Fig. 17.3 Creating php payload using Metasploit
Fig. 17.4 Intercepting payload for medium security
Fig. 17.5 Intercepting image file on high security
Fig. 17.6 Adding payload to image in high security
Fig. 17.7 Payload code successfully uploaded
DVWA depicts different levels of security in websites, and it can be used as a reference to create and test the level of security in any website. As the low security level does not check the file extension, the attacker can gain access very easily. On the medium security level, the system checks the file extension and file size. To gain access at this level, the attacker needs to disguise the payload as an image. On the high security level, the system checks file type, file size, and file signature and allows only one file extension. To gain access to such a system, the payload must be injected while keeping the image signature.
Fig. 17.8 Gained access of web server after exploit
Fig. 17.9 Accessing file system and path
Fig. 17.10 Image downloaded from the web server
Even after that, the attacker has to exploit another vulnerability to gain access to the system. On the impossible security level, in contrast, the system rewrites the data of the image file byte by byte, including its metadata. On this level of security, it is effectively impossible for the attacker to inject the payload. This level can serve as a reference security level for keeping server and user data secure.
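The file signature check discussed above, and the reason it is not sufficient on its own, can be sketched as follows. The PNG and JPEG signatures are the standard leading bytes of those formats; the function names and the placeholder file content are hypothetical and only serve the illustration.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # the 8-byte PNG signature
JPEG_MAGIC = b"\xff\xd8\xff"       # leading bytes of a JPEG file

def detect_image_type(data: bytes):
    """Identify an image by its leading bytes, ignoring the file name."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None

def contains_php_tag(data: bytes) -> bool:
    """Crude content scan for an embedded PHP opening tag."""
    return b"<?php" in data

# A file that keeps a valid image signature but carries script text after it.
disguised = PNG_MAGIC + b"<?php /* placeholder for injected code */ ?>"

print("signature check result:", detect_image_type(disguised))
print("embedded PHP tag found:", contains_php_tag(disguised))
```

Because the signature check passes while the script text is still present, re-encoding the whole image, as the impossible level does, is the more robust defense; a content scan like `contains_php_tag` is shown only to make the gap visible.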
17.4 Conclusion
Penetration testing has proven beneficial for ensuring an organization's security. It is a pentester's responsibility to find all the vulnerabilities of the system before an attacker does. We have presented our implementation using DVWA, which offers several granularities of security. This versatility teaches pentesters about the workarounds attackers can use to gain access. Our research and implementation can be useful for pentesters who would like to learn about and understand the vulnerabilities of their system or organization. The experimental results give stepwise instructions for using the Metasploit framework to find vulnerabilities of a system. We showed how each granularity of security level can be exploited. Our intention is to extend our pentesting to mobile applications. Additionally, we would like to work on the automation of pentesting in bigger organizations.
References
1. Pitney, A.M., Penrod, S., Foraker, M., Bhunia, S.: A systematic review of 2021 Microsoft Exchange data breach exploiting multiple vulnerabilities. In: 7th International Conference on Smart and Sustainable Technologies (SpliTech), pp. 1–6. IEEE (2022, July)
2. Rouland, Q., Hamid, B., Jaskolka, J.: Specification, detection and treatment of STRIDE threats for software components: modeling, formal methods, and tool support. J. Syst. Architect. 117, 102073 (2021)
3. Schwartz, J., Kurniawati, H.: Autonomous Penetration Testing Using Reinforcement Learning. arXiv preprint arXiv:1905.05965 (15 May 2019)
4. Timalsina, U., Gurung, K.: Metasploit Framework with Kali Linux (Apr 2015)
5. Arote, A., Mandawkar, U.: Android hacking in Kali Linux using metasploit framework. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 7(3), 497–504 (2022)
6. Al-Mohannadi, H., Mirza, Q., Namanya, A., Awan, I., Cullen, A., Disso, J.: Cyber-attack modeling analysis techniques: an overview. In: 2016 IEEE 4th International Conference on Future Internet of Things and Cloud Workshops (FiCloudW), pp. 69–76. IEEE (22 Aug 2016)
7. Yohanandhan, R.V., Elavarasan, R.M., Manoharan, P., Mihet-Popa, L.: Cyber-physical power system (CPPS): a review on modeling, simulation, and analysis with cyber security applications. IEEE Access 8, 151019–151064 (2020)
8. Gao, Y., Li, X., Peng, H., Fang, B., Philip, S.Y.: HinCTI: a cyber threat intelligence modeling and identification system based on heterogeneous information network. IEEE Trans. Knowl. Data Eng. 34(2), 708–722 (2020)
9. Caras, C.J.: Diamond Model of Intrusion Analysis: Travelex Ransomware Attack
10. Valea, O., Oprisa, C.: Towards pentesting automation using the metasploit framework. In: IEEE 16th International Conference on Intelligent Computer Communication and Processing (ICCP), pp. 171–178. IEEE (2 Sept 2020)
11. Likhar, P., Yadav, R.S.: Securing IEEE 802.11 g WLAN using open VPN and its impact analysis. arXiv preprint arXiv:1201.0428 (2012)
12. Watts, S.: Secure authentication is the only solution for vulnerable public WiFi. J. Comput. Fraud Secur. 18–20 (1 Jan 2016)
13. Raj, S., Walia, N.K.: A study on metasploit framework: a pen-testing tool. In: 2020 International Conference on Computational Performance Evaluation (ComPE), pp. 296–302. IEEE (2 July 2020)
14. Holik, F., Horalek, J., Marik, O., Neradova, S., Zitta, S.: Effective penetration testing with metasploit framework and methodologies. In: 2014 IEEE 15th International Symposium on Computational Intelligence and Informatics (CINTI), pp. 237–242. IEEE (19 Nov 2014)
15. Kennedy, D., O’Gorman, J., Kearns, D., Aharoni, M.: Metasploit: The Penetration Tester’s Guide. No Starch Press (15 July 2011)
16. Marquez, C.J.: An analysis of the ids penetration tool: metasploit. J. InfoSec Writers Text Library 9 (2010)
17. Kotenko, I., Saenko, I., Lauta, O.: Modeling the impact of cyber attacks. J. Cyber Resilience Syst. Netw. 135–169 (2019)
18. Sanchez, H.S., Rotondo, D., Escobet, T., Puig, V., Quevedo, J.: Bibliographical review on cyber attacks from a control-oriented perspective. J. Ann. Rev. Control 48, 103–128 (2019)
Chapter 18
Periodic Rampart Line Inspired Circular Microstrip Patch Antenna Chirag Arora
Abstract In this paper, the authors design a conventional circular microstrip patch antenna that resonates at 3.8 GHz for WiMAX applications. A two-layer stacked structure is formed by loading a periodic leaky wave structure in the dielectric layer of this conventional patch antenna to improve its gain and bandwidth. The conventional circular patch antenna resonates at 3.8 GHz with a gain of 4.5 dBi and a bandwidth of 350 MHz. Under loaded conditions, however, the same antenna presents a gain of 6.4 dBi and a bandwidth of 450 MHz at the same resonant frequency. Both layers are made of FR-4 substrate 1.48 mm thick.
18.1 Introduction Since the development of microstrip patch antennas in1950s, they have been widely explored by the entire antenna community worldwide. The reason for this extensive research in field of patch antennas is their tiny size, uncomplicated planar structure, and multi-frequency operation. Moreover, they can be easily integrated with the driving circuitry on a common board or chip. Also, they can be easily fabricated using IC technology, which results in high fabrication accuracy. Hence, these antennas find various applications in mobile and wireless communication systems, including aircrafts, missiles, spaceships, etc., as aerodynamics of these systems are not disturbed by these antennas [1–4]. However, its structure suffers from the drawback that it behaves like a resonant cavity possessing limited fringing radiation. This results in poor impedance bandwidth. Further, these patch antennas have low RF power handling capability due to small gap between the ground plane and radiating patch. Moreover, gain of a single radiating patch is also not much appreciable. However, all the practical applications require antenna of broad bandwidth and large gain. But the basic microstrip patch antenna structure fails to provide such performance specifications. Therefore, various methods have been devised worldwide by the antenna community to overcome these drawbacks. Such methods include use of C. Arora (B) KIET Group of Institutions, Delhi-NCR, Ghaziabad, UP, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_18
thick substrates [5–8], integration of the traditional patch with parasitic elements [9–12], shorting pins [13], and slots inserted in the conventional patch [14]. In [5], E. Nishiyama and M. Aikawa proposed a patch antenna with a thick dielectric substrate. Although appreciable gain and bandwidth were obtained, the designed antenna has a large profile because of the thick parasitic substrate. Honda et al. [8] proposed an L-probe fed patch antenna with a 3.6 mm thick substrate. A gain improvement of 5 dB over a thin substrate was observed, but the fractional bandwidth did not improve. In [9], J. Shi et al. proposed a Yagi patch antenna loaded with reflectors and directors. The parasitic patches enhanced the beam tilt angle, but at the cost of an increased antenna profile and a complex fabrication process. In [10], C. Arora proposed a metamaterial superstrate-based patch antenna array for 5.8 GHz WiMAX applications. The loaded antenna provided enhanced gain and bandwidth, but aligning the superstrate with the radiating patch is tedious because it requires very high accuracy. To obtain high gain, X. Zhang and L. Zhu [13] proposed a patch antenna with shorting pins. This practice enhanced the performance parameters of the patch antenna, but determining the optimum location of the shorting pins is very difficult. To address these issues, the authors of this paper enhance the performance of a patch antenna using a method that yields a reasonable profile, a sturdy structure, and an easy fabrication process. In this article, the authors design a conventional circular microstrip patch antenna that resonates at 3.8 GHz for WiMAX and unmanned air vehicle applications. To improve its gain and bandwidth simultaneously, a leaky wave structure is embedded beneath this conventional circular patch.
This leaky wave structure is designed using a periodic rampart line. The bottom (third) layer serves as the full ground plane. The proposed structure is robust, easy to install, and simple to fabricate, and it improves gain and bandwidth simultaneously, which makes it novel. The following sections describe the design and results for the proposed antenna.
18.2 Antenna Design
Figure 18.1a, b presents the layout of the designed stacked patch antenna, a two-layer structure containing a standard circular patch antenna embedded with a rampart line structure. This periodic rampart line behaves like a leaky wave structure. Both layers are designed on FR-4 substrate with the same thickness (d = 1.48 mm) and dielectric constant (4.4). Thus, the proposed antenna consists of a standard circular patch of radius 10 mm at the top, a leaky wave structure in the middle, and a full ground plane at the bottom. The rampart line structure has four periods. Its dimensions are L = 22 mm, width a = 2.5 mm, and periodic unit cell length P = 20 mm. The antenna is fed through a coaxial SMA connector, whose position is determined by parametric analysis to obtain the best impedance matching. The double-layer stacked structure is bonded and tightly fixed using nylon nuts.
Fig. 18.1 Structure of proposed stacked patch antenna a decomposition view, b side view
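As a rough sanity check on the stated dimensions, the dominant-mode resonance of an unloaded circular patch can be estimated with the standard cavity-model approximation (effective radius with a fringing correction). This is only a first-order sketch: the source obtains 3.8 GHz from simulation, and the function names below are illustrative, not taken from the paper. The estimate ignores the feed position and the embedded rampart line.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s

def effective_radius(a, h, eps_r):
    """Effective patch radius (m) with the cavity-model fringing correction."""
    fringing = (2.0 * h) / (math.pi * a * eps_r) * (
        math.log(math.pi * a / (2.0 * h)) + 1.7726
    )
    return a * math.sqrt(1.0 + fringing)

def tm11_resonance(a, h, eps_r):
    """Approximate TM11 resonant frequency (Hz) of a circular patch."""
    a_e = effective_radius(a, h, eps_r)
    return 1.8412 * C0 / (2.0 * math.pi * a_e * math.sqrt(eps_r))

# Values stated in the text: radius 10 mm, FR-4 thickness 1.48 mm, eps_r = 4.4
f_r = tm11_resonance(a=10e-3, h=1.48e-3, eps_r=4.4)
print(f"Estimated resonance: {f_r / 1e9:.2f} GHz")
```

The estimate lands at roughly 4.0 GHz, within a few percent of the simulated 3.8 GHz, which is the level of agreement one would expect from a closed-form approximation.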
18.3 Results and Discussion
Simulation results for the traditional circular patch antenna and the proposed two-layer stacked patch antenna are presented in this section. Figure 18.2 presents the S11 curves of both antennas. The conventional circular patch antenna resonates at 3.8 GHz with a bandwidth of 350 MHz; when a periodic rampart line is embedded in this antenna, the bandwidth increases to 450 MHz at the same resonant frequency. Thus, a bandwidth improvement of 100 MHz is observed for the proposed patch antenna. Figure 18.3a, b presents the radiation patterns of both antennas in the XZ and YZ planes, respectively. The two antennas provide gains of 4.5 and 6.4 dBi, respectively, which is a gain improvement of 1.9 dB for the proposed stacked antenna. The proposed antenna therefore improves gain and bandwidth simultaneously, which is the novelty of this structure. The gain improvement can be explained by the leaky wave structure formed by the periodic rampart line, which changes the effective dielectric constant of the substrate. This change modifies the distribution of the electric field over the circular patch and hence expands the radiating aperture of the antenna, leading to an increase in gain. The bandwidth improvement can be explained by mutual coupling between the upper circular patch and the periodic rampart line structure above the ground plane, which together form a two-layer electromagnetically coupled system.
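The reported figures can be put on a common footing with a short calculation. The sketch below uses only numbers stated in the text (3.8 GHz, 350 and 450 MHz, 4.5 and 6.4 dBi); the helper names are illustrative. It expresses the bandwidths as fractional bandwidths and converts the gain difference from decibels to a linear power ratio.

```python
def fractional_bandwidth(bandwidth_hz, center_hz):
    """Bandwidth as a percentage of the resonant frequency."""
    return 100.0 * bandwidth_hz / center_hz

def db_to_linear(db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10.0 ** (db / 10.0)

f0 = 3.8e9
fbw_conventional = fractional_bandwidth(350e6, f0)  # about 9.2 %
fbw_proposed = fractional_bandwidth(450e6, f0)      # about 11.8 %

gain_delta_db = 6.4 - 4.5                # difference of two dBi values, in dB
gain_ratio = db_to_linear(gain_delta_db)  # about 1.55 in linear terms

print(f"{fbw_conventional:.1f} % -> {fbw_proposed:.1f} %")
print(f"{gain_delta_db:.1f} dB = x{gain_ratio:.2f}")
```

In other words, the loaded antenna radiates about 1.5 times the power density in its peak direction for the same input, with fractional bandwidth rising from about 9.2% to about 11.8%.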
Fig. 18.2 S11 curve of traditional circular patch antenna and proposed stacked patch antenna (S11 in dB versus frequency from 2 to 6 GHz; curves: conventional circular patch antenna, proposed stacked patch antenna)
18.4 Conclusions
In this paper, a periodic rampart line is positioned between a conventional circular microstrip patch and the full ground plane to form a two-layer stacked microstrip patch antenna. The periodic rampart line acts as a leaky wave structure and improves the gain and bandwidth of the stacked patch antenna simultaneously, compared with a conventional circular patch antenna designed with the same parameters and operating at the same resonant frequency. The proposed antenna resonates at 3.8 GHz and improves gain and bandwidth by 1.9 dB and 100 MHz, respectively. It is robust and can be integrated with communication systems for WiMAX and unmanned air vehicle applications. The proposed antenna will be fabricated to validate the simulated results experimentally.
Fig. 18.3 Radiation pattern characteristics of conventional and proposed microstrip patch antenna in a XZ plane, b YZ plane (polar plots of gain in dB; curves: conventional circular patch antenna, proposed stacked patch antenna)
References
1. Wong, K.L., Jian, M.F., Li, W.Y.: Low-profile wideband four-corner-fed square patch antenna for 5G MIMO mobile antenna application. IEEE Antennas Wirel. Propag. Lett. 20(12), 2554–2558 (2021)
2. Rahman, M.M., Ryu, H.G.: Compact multiple wideband slotted circular patch antenna for satellite and millimeter-wave communications. In: International Conference on Information and Communication Technology Convergence, pp. 233–236 (2021)
3. Lu, K., Chen, W.C.J., Li, W.Y.: Integrated four low-profile shorted patch dual-band WLAN MIMO antennas for mobile device applications. IEEE Trans. Antennas Propag. 69(6), 3566–3571 (2021)
4. Swelam, W., Mitkees, A.A., Ibrahim, M.M.: Wideband planar phased array antenna at Ku frequency-band for synthetic aperture radars and radar-guided missiles tracking and detection. In: IEEE Conference on Radar, pp. 174–179 (2006)
5. Nishiyama, E., Aikawa, M.: Wide-band and high-gain microstrip antenna with thick parasitic patch substrate. IEEE Antennas Propag. Society Symp. 1, 273–276 (2004)
6. Jang, H.B., Young, J.Y.: 5G dual (S-/Ka-) band antenna using thick patch containing slotted cavity array. IEEE Antennas Wirel. Propag. Lett. 20(6), 1008–1012 (2021)
7. Yong, X.G., Luk, K.M., Lee, K.F.: L-probe fed thick-substrate patch antenna mounted on a finite ground plane. IEEE Trans. Antennas Propag. 51(8), 1955–1963 (2003)
8. Honda, S., Saito, S., Kimura, Y.: A miniaturized frequency-tunable varactor-loaded dual-band shorted multi-ring microstrip antenna fed by an L-probe with a thick dielectric substrate. In: IEEE International Symposium on Antennas and Propagation and North American Radio Science Meeting, pp. 1857–1858 (2020)
9. Shi, J., Zhu, L., Liu, N.W., Wu, W.: A microstrip Yagi antenna with an enlarged beam tilt angle via a slot-loaded patch reflector and pin-loaded patch directors. IEEE Antennas Wirel. Propag. Lett. 18(4), 679–683 (2019)
10. Arora, C., Pattnaik, S.S., Baral, R.N.: Metamaterial inspired DNG superstrate for performance improvement of microstrip patch antenna array. Int. J. Microw. Wirel. Technol. 10(3), 318–327 (2018)
11. Arora, C., Pattnaik, S.S., Baral, R.N.: SRR inspired microstrip patch antenna array. Progr. Electromagn. Res. C 58, 89–96 (2015)
12. Arora, C., Pattnaik, S.S., Baral, R.N.: Series fed patch antenna array with CSRR inspired ground plane. In: Proceedings of the First International Conference on Smart Computing and Informatics, pp. 161–167 (2018)
13. Zhang, X., Zhu, L.: High-gain circularly polarized microstrip patch antenna with loading of shorting pins. IEEE Trans. Antennas Propag. 64(6), 2172–2178 (2016)
14. Arora, C.: Design of metamaterial-based multilayer dual band circularly polarized microstrip patch antenna intelligent system design. Proc. India 2022, 383–390 (2022)
Chapter 19
A Deep Learning-Based Prediction Model for Wellness of Male Sea Bass Fish
Velaga Sai Sreeja, Kotha Sita Kumari, Duddugunta Bharath Reddy, and Paladugu Ujjwala
Abstract The global aquaculture industry is expanding significantly and currently accounts for approximately 44% of overall fish production worldwide. This surge in production has been achieved despite various obstacles in the aquaculture ecosystem. To mitigate the negative effects of fish diseases, it is crucial to adopt scientifically proven and recommended methods for addressing health limitations. This paper highlights effective techniques for identifying fish in an image using the ResNet50 model and predicts whether a fish is normal or abnormal based on its physical characteristics. To avoid spreading viral infections, fishermen must discard damaged or dead fish. Even for skilled fishermen, it can be challenging to spot abnormal fish, since infected fish are harder to identify than dead fish. It is therefore desirable to detect anomalous fish automatically. Deep learning needs a large amount of visual data, including both healthy and unhealthy fish, to detect abnormal fish from an image dataset. Aqua farmers face problems in assessing the health of their fish. Periodic observation of length is one parameter for assessing fish health, but farmers struggle to measure length precisely. In this work, the length of the male sea bass fish is measured, and a length–weight relationship is applied to the derived length to obtain the weight. Finally, we predict whether the male sea bass fish is normal or abnormal with an accuracy of 92%.
V. S. Sreeja · K. S. Kumari (B) · D. B. Reddy · P. Ujjwala Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_19
19.1 Introduction
Fisheries are a crucial aspect of food security, livelihoods, and socio-economic development in developing nations, serving as a significant source of income. Aquaculture has gained significant attention in recent years, driving rapid growth in the sector. This is due to technological advancements and a surge in demand for fish as a source of animal protein. To meet the increasing demand, aquaculture methods have become
more intensive, resulting in higher yields. In 2019, aquaculture production accounted for 44% of total fish production, generating a market value of $160 billion from 74 million tons of fish, which were primarily produced for human consumption. Predicting whether a fish is normal or abnormal can help identify potential health issues early, before they become more severe. This allows farmers to act quickly to prevent the spread of disease and improve the overall health of their fish [1]. By monitoring the growth and health of their fish, farmers can make more informed decisions about feeding, breeding, and other management practices, which can help increase productivity and profitability. Healthy fish are more resilient to disease and stress, which reduces the need for antibiotics and other treatments that can have negative environmental impacts. By predicting whether a fish is normal or abnormal, farmers can reduce the use of these treatments and improve the sustainability of their operations. The model is trained on a dataset of labelled fish images so that it can accurately classify new fish images [2]. Once trained, the model can identify abnormalities in fish quickly and efficiently, enabling fishermen to take corrective action immediately. The outcome of this project is a tool that can help improve the quality of fish harvesting by enabling quick detection of abnormal fish. This reduces the losses associated with harvesting diseased fish, which can be a significant problem for the fishing industry.
19.2 Literature Study
Below is a study of the existing methodologies, techniques, and observations related to fish abnormality prediction that were referenced to create our proposed model. Bravata et al. [1] developed a multi-target ensemble regressor that achieved a mean percent error of 6.5, 9.2, and 24.0 for length, girth, and weight, respectively. The length predictions were found to be robust regardless of how the fish was presented in the image capture device. This research has the potential to increase the accuracy and efficiency of routine survey work for fishery professionals and enable automated collection of fish biometric data over time. Cheng et al. [2] combined image segmentation and stereo matching techniques. Specifically, the FCN segmentation algorithm is employed to identify the object of interest within the image, while the SGBM algorithm is used to create a disparity image of the segmented region. Based on the resulting disparity map and object position, the algorithm calculates the body length of the object. This approach can be applied to detect underwater creatures and decrease the amount of depth estimation computation required. Rekha et al. [3] used convolutional neural networks (CNNs) for detecting and classifying objects, with outcomes that surpass those of techniques relying on manual feature extraction and traditional image processing approaches. The VGGNet was the basis for the final CNNs used in the detection and
classification modules, which attained validation accuracies of approximately 90% and 92%, respectively. Although the networks display remarkable accuracy, the entire process is somewhat slow: the detection module requires roughly 5 s to process an image, while the classification module runs in under a second. Tamou et al. [4] use a deep convolutional neural network (CNN) trained with an approach based on incremental learning. The CNN is first trained to learn the more challenging species effectively and then learns additional species incrementally using a knowledge distillation loss, while maintaining high performance on the species previously learned. This approach reaches an accuracy of 81.83% on the LifeCLEF 2015 fish benchmark dataset. Bunonyo et al. [5] suggest that restricting the quantity and quality of food provided to fish can reduce the energy available for maintenance and physical activity, which can ultimately lead to weight loss or mortality. The energy required for optimal growth, development, and overall health of a fish species is determined by the balance between energy expenditure and food energy intake. The study demonstrates that food intake strongly influences fish weight growth and that limiting energy supply can significantly impede growth. A series of experiments was conducted to determine the feasibility of using YOLOv3 and Mask R-CNN models for detecting and classifying eight distinct species of fish captured by a high-resolution DIDSON imaging sonar in the Ocqueoc River. The experiments used metadata observations containing information about the fish, such as their species, approximate location, and the videos and frames in which they appeared, which were transformed into images for the detection and classification tasks [6, 7].
19.3 Proposed Work
The proposed system will involve taking images of the male sea bass fish and calculating their length and weight using deep learning techniques [8–13]. The length and weight will be fed into a deep learning model to train it to differentiate between normal and abnormal fish.
19.3.1 Dataset
The dataset used for this project is a real-time dataset containing images of male sea bass fish and shrimp. Two classes of images are used, each with about 464 images; the test data contains 203 images, and the validation data contains 197 images (Figs. 19.1 and 19.2).
Fig. 19.1 Male sea bass fish image dataset
Fig. 19.2 Shrimp image dataset
19.3.2 Preprocessing
The primary goal of our analysis is to identify and extract valuable insights from the dataset. We begin with image preprocessing, which modifies the height and width of the images to create a more standardized and organized dataset. Additionally, we apply various image augmentation techniques that present a single image in multiple ways, which improves the diversity and size of the dataset. This two-stage preprocessing phase (augmentation and resizing) prepares the dataset for analysis and reduces the processing time required by the machine learning algorithm. Once the dataset has been preprocessed, we divide it into three sets for training, testing, and validation. Partitioning the dataset in this way also helps to check that the model is optimized and not overfitting the data.
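The preprocessing stages described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: images are modelled as plain 2-D lists of pixel values, the only augmentation shown is a horizontal flip, resizing uses nearest-neighbour sampling, and the split fractions (70/15/15) are hypothetical since the source does not state them.

```python
import random

def resize_nearest(img, out_h, out_w):
    """Resize a 2-D grid of pixel values with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def flip_horizontal(img):
    """One simple augmentation: mirror the image left to right."""
    return [row[::-1] for row in img]

def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle and partition items into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Toy 2x4 "image" used only to exercise the helpers
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
resized = resize_nearest(image, 4, 2)
flipped = flip_horizontal(image)
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

One design point worth noting: if augmented copies are generated before the split, variants of the same photograph can end up in both the training and test sets. A common precaution is to split first and augment only the training portion.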
19.3.3 Design Methodology
The design methodology of the proposed model is described below (Fig. 19.3).
Fig. 19.3 Proposed architecture
To start with, a dataset comprising various images of fish is obtained. To prepare the dataset for analysis, a two-stage image preprocessing phase is carried out. The first stage augments the images to produce multiple variations of each image. The second stage adjusts the height and width of the images to a standardized size. After this, the dataset is partitioned into three sets, namely test,
train, and validation sets. To detect whether a fish is present in an image, the ResNet50 model is employed. ResNet50 is a deep neural network architecture that is widely used in image classification tasks because of its depth and high accuracy. In the proposed methodology, it determines whether an image contains a sea bass fish or not. This is achieved by training the model on a labeled dataset that includes images of sea bass fish and images without fish. Once trained, the model can classify new, unseen images as either containing a sea bass fish or not. Once a fish is detected, the length and girth of the fish are computed using the CenterNet object detection framework, which localizes and detects objects of interest in an image by estimating their center point and bounding box. The weight of the fish is then determined using a formula in which the length and girth are the input parameters [8]:
Weight of male sea bass fish = (length × length × girth)/1200.
A male sea bass fish that weighs between 2.5 and 4 kg is classified as normal, while other weights are regarded as abnormal. In summary, the proposed methodology involves acquiring and preprocessing a fish image dataset, identifying the presence of fish in the images using the ResNet50 model, computing the length and girth of the fish, determining its weight using the formula above, and classifying the fish as normal or abnormal based on its weight range [9].
ResNet50
ResNet50 is a deep neural network architecture that was introduced in 2015 by Microsoft Research.
The architecture of ResNet50 consists of 50 layers, including convolutional layers, pooling layers, and fully connected layers. The ResNet50 model is used for image classification to determine whether an image contains a sea bass fish or not. The model is trained on a labelled dataset that includes images of sea bass fish and images without fish. During the training process, the ResNet50 model learns to identify the unique features of sea bass fish images and differentiate them from non-fish images. Once the model is trained, it can be used to classify new, unseen images as either containing a sea bass fish or not. The model takes an input image and passes it through a series of convolutional layers, which extract the important features of the image. The extracted features are then passed through fully connected layers, which generate a probability score for each class.
Algorithm
Step 1 Collect male sea bass fish images and store them.
Step 2 Import the required libraries and load the datasets.
Step 3 Apply image preprocessing to adjust the height and width of the images.
Step 4 Split the data into train, test, and validation sets.
Step 5 Detect the fish using the ResNet50 algorithm.
Step 6 Store the image pixel measurements in a list and convert them into inches using 1 inch = 96 px.
Step 7 Find the length and girth of the male sea bass fish and use these values to calculate its weight with the formula Weight of sea bass fish = (length × length × girth)/1200.
Step 8 If the weight of the male sea bass fish is > 2.5 and < 4, then “Fish is normal”; otherwise “Fish is abnormal.”
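Steps 6 to 8 are simple enough to write out directly. The sketch below follows the conversion factor, formula, and thresholds exactly as stated in the text; the function names and the pixel measurements are hypothetical and serve only to exercise the logic. The source does not state the units the formula assumes, so the weight is compared against the 2.5 to 4 range as given.

```python
PX_PER_INCH = 96  # Step 6: conversion factor stated in the algorithm

def px_to_inches(pixels):
    """Convert a measurement in pixels to inches."""
    return pixels / PX_PER_INCH

def estimate_weight(length, girth):
    """Step 7: weight = (length * length * girth) / 1200, as in the source."""
    return (length * length * girth) / 1200

def classify(weight, low=2.5, high=4.0):
    """Step 8: strictly inside (low, high) is normal, anything else abnormal."""
    return "Fish is normal" if low < weight < high else "Fish is abnormal"

# Hypothetical pixel measurements, chosen only for illustration
length_in = px_to_inches(1440)  # 15.0
girth_in = px_to_inches(1344)   # 14.0
weight = estimate_weight(length_in, girth_in)
print(weight, classify(weight))
```

Because Step 8 uses strict inequalities, a fish weighing exactly 2.5 or exactly 4 falls into the abnormal class; whether the boundaries should be inclusive is a choice the source leaves open.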
19.4 Results and Observations
The images show the accuracy obtained when predicting the presence of a fish with the ResNet50 model, the evaluated weight, and whether the male sea bass fish is classified as normal or abnormal. In Fig. 19.4, the male sea bass fish is identified with an accuracy of 99% using the ResNet50 model. Its weight is 2.9 kg, which lies within the range of 2.5–4 kg, so the fish is considered normal. In Fig. 19.5, the male sea bass fish is identified with an accuracy of 97% using the ResNet50 model. Its weight is 4.3 kg, which lies outside the range of 2.5–4 kg, so the fish is considered abnormal.
Fig. 19.4 Male sea bass fish is normal
Fig. 19.5 Male sea bass fish is abnormal
Our research involved developing an efficient method for accurately detecting male sea bass fish in images using the ResNet50 algorithm. We also conducted a comparative analysis between the ResNet50 and VGG19 algorithms to evaluate
which algorithm performed better. Based on the images obtained from this comparison, it was evident that the ResNet50 algorithm was more accurate in detecting male sea bass fish. We used accuracy as a measure of success, where a higher accuracy score indicated the presence of a fish in the image and a lower accuracy score suggested the absence of a fish. Once we identified the male sea bass fish in the image, we measured its length and girth. Using these values, we calculated the weight of the fish using the formula Weight of sea bass fish = (length × length × girth)/1200. We then checked whether the weight fell within the range of 2.5–4 kg. If it did, the male sea bass fish was considered normal [10]; otherwise, the fish was classified as abnormal. Our work highlights the importance of accurate fish detection and measurement, which is essential for fisheries management and conservation efforts (Figs. 19.6, 19.7, 19.8; Table 19.1).
Fig. 19.6 Accuracy obtained using VGG19 algorithm
Fig. 19.7 Accuracy obtained using ResNet50 algorithm
Fig. 19.8 Plot showing accuracy, loss of training, and validation sets
Table 19.1 Performance metrics of the model
Performance metric        Proposed model
Precision                 0.94
Recall                    0.93
F1-score                  0.93
Testing accuracy (%)      93.50
Validation accuracy (%)   94.05
Training accuracy (%)     98.88
Although the model had a relatively low initial accuracy, its performance improved significantly as training progressed. Throughout the training period, the validation set loss fluctuated considerably. However, the model ultimately achieved a testing accuracy of 93.5%, a precision of 0.94, a recall of 0.93, and an F1-score of 0.93.
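The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the values given in the text; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported for the proposed model
reported_precision = 0.94
reported_recall = 0.93

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))
```

The result rounds to 0.93, which agrees with the F1-score reported for the proposed model.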
19.5 Advantages
The proposed methodology has the following advantages:
i. Automating the process of identifying and measuring fish in images can save a significant amount of time and effort compared to manual monitoring.
ii. By accurately identifying and measuring fish, the system can provide farmers with valuable data on the health status of their fish [11].
iii. With the ability to accurately monitor the health and behavior of their fish, farmers can optimize their feeding and other management practices.
iv. The system could potentially be implemented at a lower cost than hiring additional human labor to monitor fish health and behavior, making it a cost-effective solution for fish farmers.
19.6 Conclusion
The findings of our work highlight the effectiveness of the ResNet50 algorithm in detecting male sea bass fish in images with a high degree of accuracy. The comparison of the ResNet50 algorithm with the VGG19 algorithm revealed that the ResNet50 algorithm has a better accuracy rate of 92%, which indicates that the algorithm can reliably detect the presence of a fish in an image. In contrast, the VGG19 algorithm had an accuracy rate of 84%, indicating that it is less reliable in detecting the fish in an image. The identification of fish in the image allows us to measure the length and girth of the male sea bass fish. These measurements are essential in accurately calculating the weight of the fish using the formula. The weight of the fish is a critical parameter that determines the health and fitness of the fish. A male sea bass fish with a weight between 2.5 and 4 kg is considered normal, while a fish with weight outside this range is deemed abnormal. Our work demonstrates the importance of using accurate detection methods in measuring fish populations, enabling better management practices and conservation efforts. This approach can help in preserving the population of the male sea bass fish and its habitat.
19.7 Future Work
This study focused on detecting fish in images and classifying a fish as abnormal when its weight fell outside the range of 2.5–4 kg. In the future, we want to extend this work to detecting fish in video footage or in natural environments such as rivers or oceans. Further research could also investigate the potential impact of abnormal fish on an ecosystem and how to detect and address this issue.
References
1. Bravata, N., Kelly, D., Eickholt, J., Bryan, J., Miehls, S., Zielinski, D.: Applications of deep convolutional neural networks to predict length, circumference, and weight from mostly dewatered images of fish. Ecol. Evol. 9313–9325 (2020)
2. Cheng, R., Zhang, C., Xu, Q., Liu, G., Song, Y., Yuan, X., Sun, J.: Underwater fish body length estimation based on binocular image processing. Information 11(10), 476 (2020)
3. Rekha, B.S., Srinivasan, G.N., Reddy, S.K., Kakwani, D., Bhattad, N.: Fish detection and classification using convolutional neural networks. In: Computational Vision and Bio-Inspired Computing: ICCVBIC 2019, pp. 1221–1231. Springer International Publishing (2020)
4. Ben Tamou, A., Benzinou, A., Nasreddine, K.: Live fish species classification in underwater images by using convolutional neural networks based on incremental learning with knowledge distillation loss. Mach. Learn. Knowl. Extract. 4(3), 753–767 (2022)
5. Bunonyo, K.W., Awomi, P.Z., Amadi, U.C.: Application of mathematical modeling to determine the growth in weight of a fish species. Central Asian J. Med. Nat. Sci. 3(3), 831–842 (2022)
6. Zhao, L., Montanari, F., Heberle, H., Schmidt, S.: Modeling bioconcentration factors in fish with explainable deep learning. Artif. Intell. Life Sci. 2, 100047 (2022)
7. Atmore, L.M., Ferrari, G., Martínez-García, L., van der Jagt, I., Blevis, R., Granado, J.: Ancient DNA sequence quality is independent of fish bone weight. J. Archaeol. Sci. 149, 105703 (2023)
8. Xia, C., Wang, X., Song, J., Dai, F., Zhang, Y., Yang, J., Liu, D.: Length and weight relationships of six freshwater fish species from the main channel of Yangtze River in China. Egypt. J. Aquatic Res. (2022)
9. Smoliński, S., Berg, F.: Varying relationships between fish length and scale size under changing environmental conditions – multidecadal perspective in Atlantic herring. Ecol. Ind. 134, 108494 (2022)
10. Oliveira, L.K., Wasielesky, W., Tesser, M.B.: Fish culture in biofloc technology (BFT): insights on stocking density carbon sources, C/N ratio, fish nutrition and health. Aquaculture Fisheries (2022)
11. Akila, M., Anbalagan, S., Lakshmisri, N.M., Janaki, V., Ramesh, T., Merlin, R.J., Kamala-Kannan, S.: Heavy metal accumulation in selected fish species from Pulicat Lake, India, and health risk assessment. Environ. Technol. Innov. 27, 102744 (2022)
12. Xue, Y., Bastiaansen, J.W., Khan, H.A., Komen, H.: An analytical framework to predict slaughter traits from images in fish. Aquaculture 566, 739175 (2023)
13. Wąsikowska, B., Linowska, A.A.: Application of the rough set theory to the analysis of food safety in fish processing. Procedia Comput. Sci. 192, 3342–3350 (2021)
Chapter 20
Depression Detection Using Deep Learning
G. Gopichand, Anirudh Ramesh, Vasant Tholappa, and G. Sridara Pandian
Abstract It is of utmost importance to take care of mental health. This is especially true in the ongoing COVID-19 pandemic. Depression is a crippling problem and must be treated quickly. It is useful to focus efforts on new, efficient, and accurate methods of depression detection. We intend to detect whether a person is in a depressed state of mind. We will be using deep learning models. The algorithms that have been used are feed-forward deep neural network (FNN), long short-term memory (LSTM) neural network, simple recurrent neural network (RNN), 1D convolutional neural network (1D CNN), and gated recurrent units (GRU) neural network. These models have been trained by utilizing electroencephalogram (EEG) sensor datasets. They classify individuals into either “depressed” or “non-depressed” categories. These models have been compared using confusion matrix, precision, recall score, F1 score, and accuracy. The best-performing models identified are FNN and 1D CNN.
20.1 Introduction There are various ways in which depression can be detected. We approach this problem with data collected by an EEG sensor. An EEG is used medically to test for abnormalities in brain function and in the electrical activity of the brain. These devices observe minute electrical charges created by brain cell activity, which are magnified and visualized as a graph. EEG sensors are essential for this research due to the lack of conventional sensors which aid in detecting brain signals. Multiple studies have shown that early diagnosis of depression can significantly reduce negative effects. The task is challenging because vast numbers of observations are made every second, the observations are temporal, and there is no easy way to link EEG sensor data to the mental state of a person. The main implementation challenges are the huge amount of data and limited computational resources. Deep learning algorithms are necessary to interpret the volume of data acquired from the EEG sensor datasets. They also help in classifying people into “depressed” and “non-depressed” categories. Tricky correlations and elusive patterns can swiftly be detected using deep learning algorithms, so it makes sense to use them to interpret the data.
G. Gopichand (B) · A. Ramesh · V. Tholappa · G. Sridara Pandian, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_20
20.2 Literature Survey In [1], the authors proposed an inexpensive, contact-free method for depression detection based on gait analysis. The proposed pseudo-velocity model described the motor retardation and slow cadence displayed by depressed individuals with gait abnormalities. Zhang et al. [2] showcased a video-based network, TSDNet, that detects stress using a two-level approach: attention pooling on the face and multi-scale levels, and frame attention on the action level. Their findings show that TSDNet outperforms pre-existing feature-engineering strategies. In [3], logistic regression and random forest models are applied and combined to form a weighted average ensemble model. The random forest model produced better results. In [4], depression detection is carried out with a CNN, comparing an end-to-end model with a spectrogram-based one. The end-to-end model had better performance. Dubagunta et al. [5] showed that modeling raw speech signals proved to be less effective than filtering them based on prior knowledge. Chlasta et al. [6] proposed a potentially scalable and accurate method that uses audio spectrograms of brief speech samples for the early screening of depressed individuals. Lam et al. [7] suggested a novel approach that combines data-driven and context-aware methods. The CNN showed strong performance metrics for the audio modality, and the transformer models were suitable for the text modality. Kumar et al. [8] made use of IoT wearable device sensors and proposed a deep hierarchical neural network model. Cai et al. [9] used EEG sensor data and applied four classifiers after preprocessing the EEG sensor values. The KNN classifier had the best accuracy. In [10], a CNN-based model called DeprNet was developed to predict depression based on EEG data.
20.3 Methods and Experimental Procedure A comprehensive description of the different models we used for depression detection is given below along with their architectures.
20.3.1 Feed-Forward Deep Neural Network (FNN) Neural networks are algorithms that are analogous to the human brain. They help identify the connections and relations in the data. This type of neural network has a wide range of applications and use cases which include time-series data prediction and modeling. The architecture of the used model is given in Table 20.1.

Table 20.1 Architecture of FNN
Layer (type)    Output shape    Param #
Dense_1         (None, 64)      123,072
Dense_2         (None, 128)     8320
Dense_3         (None, 248)     31,992
Dense_4         (None, 128)     31,872
Dense_5         (None, 64)      8256
Dense_6         (None, 1)       65
Total params: 203,577 · Trainable params: 203,577 · Non-trainable params: 0

20.3.2 Simple Recurrent Neural Network (RNN) Artificial neural networks that take outputs from the previous step and use them as input for the next step are known as RNNs. The main feature and advantage of RNNs is the presence of a hidden state that remembers some information about the sequence. The architecture of the used model is given in Table 20.2.

Table 20.2 Architecture of the simple RNN model
Layer (type)    Output shape        Param #
Simple_RNN      (None, 1922, 10)    120
Simple_RNN_1    (None, 20)          620
Dense           (None, 10)          210
Dense_1         (None, 1)           11
Total params: 961 · Trainable params: 961 · Non-trainable params: 0
20.3.3 Long Short-Term Memory (LSTM) LSTMs are a type of RNN designed to take care of exploding and vanishing gradients. The concept of this type of network is to “remember” useful knowledge the model has seen before and “forget” the knowledge considered irrelevant. The difference between the naïve RNN and the LSTM is that the LSTM has “gates”, which are distinct layers with their own activation functions, and that every LSTM unit maintains an internal cell state. The cell state holds the information from the previous LSTM step that was chosen to be retained. Four gated transformations in an LSTM unit aid in this process (the input, forget, and output gates, plus the candidate cell update). The architecture of the used model is given in Table 20.3.

Table 20.3 Architecture of the LSTM model
Layer (type)    Output shape        Param #
LSTM            (None, 1922, 10)    480
LSTM            (None, 20)          2480
Dense_1         (None, 1)           21
Total params: 2981 · Trainable params: 2981 · Non-trainable params: 0

20.3.4 Gated Recurrent Units (GRU) GRUs are another type of RNN that aims to solve the vanishing gradient problem. They work like LSTMs and produce similar results. The difference between GRUs and naïve RNNs is the presence of gates. There are only two gates in a GRU. The architecture of the used model is given in Table 20.4.

Table 20.4 Architecture of GRU model
Layer (type)    Output shape        Param #
GRU             (None, 1922, 10)    390
GRU_1           (None, 20)          1920
Dense_1         (None, 1)           21
Total params: 2331 · Trainable params: 2331 · Non-trainable params: 0
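The parameter counts in Tables 20.1 to 20.4 can be checked by hand. The sketch below is not the authors’ code; it uses the standard per-layer formulas and two assumptions inferred from the counts themselves: the recurrent models read one feature per time step (input shape 1922 × 1), and the GRU keeps two bias vectors per gate, as Keras-style implementations do by default. The function names are hypothetical.

```python
def dense_params(n_in, units):
    """One weight per input-unit pair plus one bias per unit."""
    return n_in * units + units


def simple_rnn_params(n_in, units):
    """Input weights, recurrent weights, and one bias per unit."""
    return units * (n_in + units + 1)


def lstm_params(n_in, units):
    """Four weight blocks: input, forget, output gates and the candidate update."""
    return 4 * units * (n_in + units + 1)


def gru_params(n_in, units, two_biases=True):
    """Three weight blocks: update gate, reset gate, candidate state.

    two_biases=True is an assumption that matches the counts in Table 20.4.
    """
    biases = 2 if two_biases else 1
    return 3 * units * (n_in + units + biases)


# Table 20.1: 1922 input columns followed by six dense layers.
fnn_sizes = [1922, 64, 128, 248, 128, 64, 1]
fnn_total = sum(dense_params(a, b) for a, b in zip(fnn_sizes, fnn_sizes[1:]))

# Tables 20.2 to 20.4: assumed input of one feature per time step.
rnn_total = (simple_rnn_params(1, 10) + simple_rnn_params(10, 20)
             + dense_params(20, 10) + dense_params(10, 1))
lstm_total = lstm_params(1, 10) + lstm_params(10, 20) + dense_params(20, 1)
gru_total = gru_params(1, 10) + gru_params(10, 20) + dense_params(20, 1)

print(fnn_total, rnn_total, lstm_total, gru_total)
```

Under these assumptions the totals agree with the tables (203,577, 961, 2981, and 2331), which also shows how much smaller the recurrent models are than the FNN.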
20.3.5 Convolutional Neural Network (CNN) CNNs are neural networks used mainly in image-based applications or in applications that work on time-series data. The CNNs used for the latter are generally 1D CNNs, and therefore, we have proposed a 1D CNN model. It is based on the convolution operation, which occurs in the network’s hidden layers. These layers make use of kernels, which act like filters in image processing, to extract features from the input data. The main concept used in convolution is the dot product of vectors. This can be represented with the formula in (20.1).

$$y[n] = x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\, h[n-k]. \tag{20.1}$$

The architecture of the used model is given in Table 20.5.

Table 20.5 Architecture of the CNN model
Layer (type)    Output shape        Param #
Conv1D          (None, 1922, 32)    128
Conv1D_1        (None, 1918, 64)    6208
Conv1D_2        (None, 1916, 32)    6176
Flatten         (None, 61,312)      0
Dense           (None, 1)           61,313
Total params: 73,825 · Trainable params: 73,825 · Non-trainable params: 0
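As a small illustration of the convolution sum in (20.1) for finite sequences, the following sketch computes the full discrete convolution and then checks the Conv1D parameter counts of Table 20.5. It is an illustrative sketch, not the authors’ implementation. A kernel size of 3 and a single input channel are inferred from the parameter counts; the padding and dilation values used below are only one configuration that reproduces the listed output lengths and are an assumption.

```python
def convolve_full(x, h):
    """Discrete convolution y[n] = sum_k x[k] * h[n - k] for finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for j, hj in enumerate(h):
            y[k + j] += xk * hj  # term x[k] * h[n - k] with n = k + j
    return y


def conv1d_params(in_channels, filters, kernel_size):
    """One kernel per (input channel, filter) pair plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def conv1d_out_length(n, kernel_size, dilation=1, padding="valid"):
    """Output length of a stride-1 1D convolution layer."""
    if padding == "same":
        return n
    return n - dilation * (kernel_size - 1)


print(convolve_full([1, 2, 3], [0, 1, 0.5]))

# Parameter counts for the three Conv1D layers (kernel size 3 inferred).
print(conv1d_params(1, 32, 3), conv1d_params(32, 64, 3), conv1d_params(64, 32, 3))

# One assumed configuration matching the lengths 1922 -> 1918 -> 1916.
n1 = conv1d_out_length(1922, 3, padding="same")
n2 = conv1d_out_length(n1, 3, dilation=2)
n3 = conv1d_out_length(n2, 3)
print(n1, n2, n3, n3 * 32)
```

Deep learning libraries typically implement the convolution layer as cross-correlation, that is, without flipping the kernel; since the kernel weights are learned, the distinction does not affect what the layer can represent.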
20.3.6 Sigmoid The sigmoid is a special form of the logistic function that maps any real-valued input to a value in the interval (0, 1). It is used as the activation function in the output layer of a neural network in a binary classification problem. The sigmoid function is given by the formula in (20.2).

$$\sigma(x) = \frac{1}{1 + e^{-x}}. \tag{20.2}$$

As an activation function, the sigmoid receives the weighted sum of the outputs of the previous layer, and its output is used to compute the loss or predict the result.
20.3.7 Binary Cross-Entropy Loss Binary cross-entropy is a loss function. Cross-entropy, in general, calculates the difference between two probability distributions. Binary cross-entropy acts as a loss function which is an indicator of how well a model is performing in a binary classification task. The lower the loss, the better the model is performing. The formula for binary cross-entropy is given in (20.3).

$$\text{Loss} = -\frac{1}{N} \sum_{i=1}^{N} \bigl( y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \bigr). \tag{20.3}$$

Here, $N$ is the output size, $y_i$ is the actual label, and $p_i$ is the predicted probability that the label is 1. The first part of the formula becomes active when the actual label is 1, and the second part becomes active when the label is 0.
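To make (20.2) and (20.3) concrete, here is a minimal sketch in plain Python. It is not taken from the authors’ code; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


labels = [1, 0, 1]
confident = [sigmoid(4.0), sigmoid(-4.0), sigmoid(3.0)]
unsure = [0.5, 0.5, 0.5]
print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, unsure))
```

A prediction of 0.5 for every sample gives a loss of log 2 (about 0.693), so any useful binary classifier should score below that value.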
20.4 Results and Discussion The description of the dataset, the metrics used in the evaluation of the models, and finally the comparison of the models are discussed in this section.
20.4.1 Dataset Description The dataset consists of EEG sensor values taken from patients. The data is collected from 46 patients, each going through the data collection process for 6 min. This data is split into 12 events. The number of channels used is 62, and each produces 31 features. The Beck Depression Inventory (BDI) score of an individual determines the label of the data. The scores fall into four categories: minimal, mildly depressed, moderately depressed, and severely depressed. A score in the minimal range is treated as a normal patient, and any other BDI value is treated as depressed. The data is preprocessed into a 2D matrix in which each row represents a single epoch of data collected for a patient, with 1922 (31 × 62) columns plus the label corresponding to that patient. The data is split into a training set and a combined test and validation set in the ratio of 90:10. This split ensures that sufficient data is available for the model to be trained.
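The preprocessing described above can be sketched as follows. This is a hypothetical illustration: the channel-major ordering of the 1922 columns, the random row-wise split, and the seed are assumptions, since the chapter does not state them.

```python
import random

N_CHANNELS, N_FEATURES = 62, 31


def flatten_epoch(epoch):
    """Turn a 62 x 31 nested list into one row of 1922 values (channel-major)."""
    assert len(epoch) == N_CHANNELS
    assert all(len(channel) == N_FEATURES for channel in epoch)
    return [value for channel in epoch for value in channel]


def split_indices(n_rows, train_fraction=0.9, seed=0):
    """Shuffle row indices and cut them into training and held-out parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    cut = int(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


# Synthetic epoch: the value encodes (channel, feature) so the layout is visible.
epoch = [[c * 100 + f for f in range(N_FEATURES)] for c in range(N_CHANNELS)]
row = flatten_epoch(epoch)
train_idx, held_out_idx = split_indices(1000)
print(len(row), len(train_idx), len(held_out_idx))
```

One design point worth noting: a row-wise split can place epochs from the same patient in both parts. Splitting by patient instead would give a stricter estimate of how the models generalize to unseen people; the chapter does not say which variant was used.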
20.4.2 Evaluation Metrics The metrics for evaluation are confusion matrix, precision, recall score, F1 score, and accuracy. The confusion matrix is a visual representation of how an algorithm has performed. It helps identify the false positives, false negatives, and correctly predicted values. The formulas (20.4), (20.5), (20.6), and (20.7) define the other metrics.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{20.4}$$

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{20.5}$$

$$\text{Recall} = \frac{TP}{TP + FN}. \tag{20.6}$$

$$\text{F1 score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}. \tag{20.7}$$

The instances of true positives are represented by TP, true negatives by TN, false positives by FP, and false negatives by FN.
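The four metrics can be computed directly from the confusion-matrix counts. The sketch below is illustrative only: the counts are invented, and the macro-averaging helper is an assumption, because the chapter does not state how precision, recall, and F1 were averaged over the two classes.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 for one class treated as positive."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def macro_average(tp, tn, fp, fn):
    """Average the metrics obtained by treating each class in turn as positive."""
    positive = binary_metrics(tp, tn, fp, fn)
    negative = binary_metrics(tn, tp, fn, fp)  # roles of the classes swapped
    return {name: (positive[name] + negative[name]) / 2 for name in positive}


# Hypothetical counts with "depressed" as the positive class.
balanced = binary_metrics(tp=45, tn=46, fp=4, fn=5)

# Hypothetical degenerate classifier that labels every sample "normal".
always_normal = macro_average(tp=0, tn=6343, fp=0, fn=3657)
print(balanced)
print({name: round(value, 2) for name, value in always_normal.items()})
```

With these invented counts, a classifier that labels every sample as normal reaches 63.43% accuracy with macro-averaged precision 0.32, recall 0.5, and F1 0.39. This is offered only as a possible reading of how such figures can arise, not as a statement about the authors’ evaluation.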
20.4.3 Performance of the Deep Learning Models The FNN achieved an accuracy of 91.07%. The precision was 0.9, the recall was 0.91, and the F1 score was 0.9. The confusion matrix for this model in Fig. 20.1 suggests that the model would perform well in a real-world scenario. About 91% of the normal people were classified as normal, and only 9% were classified as depressed, so the false positive rate is low. Among depressed patients, the model classifies about 90% correctly, and 10% are classified as normal.

Fig. 20.1 Confusion matrix for feed-forward deep neural network

The RNN yielded an accuracy of 63.43%. The precision was 0.32, the recall was 0.5, and the F1 score was 0.39. From the confusion matrix in Fig. 20.2, we can see that the model performs excellently when classifying normal patients as normal: almost 100% of the normal patients are classified as normal. However, it does not perform as expected for depressed patients.

Fig. 20.2 Confusion matrix for simple recurrent neural network

The LSTM was the next model to be trained. It gave an accuracy of 63.74%, which is better than the simple RNN but poor compared to the FNN. The precision was 0.62, the recall was 0.51, and the F1 score was 0.41. As can be seen from the confusion matrix in Fig. 20.3, the difference between this model and the RNN in the percentage of normal patients classified as normal was negligible, while this model performed slightly better than the RNN at classifying depressed people correctly.

Fig. 20.3 Confusion matrix for long short-term memory neural network

The next model trained was the GRU neural network. The trained model gave an accuracy of 68.79%, the best among all the RNN models. The precision was 0.66, the recall was 0.65, and the F1 score was 0.66. From the confusion matrix in Fig. 20.4, it can be seen that this model classifies a smaller percentage of normal patients as normal than the other RNN models do, but it performs better than the other RNNs at classifying depressed patients correctly.

Fig. 20.4 Confusion matrix for gated recurrent unit neural network

The final model, the 1D CNN, was one of the two best performers, alongside the FNN. It has an accuracy of 90.48%, which is marginally less than the accuracy of the FNN. The precision was 0.89, the recall was 0.9, and the F1 score was 0.9. The F1 score is the same as that of the FNN, and the other metrics are slightly lower. From Fig. 20.5, we observed that the percentage of normal people classified as normal is 90.8%, which is 0.2 percentage points less than for the FNN. When it comes to classifying depressed people correctly, this model has an accuracy of 90%.

Fig. 20.5 Confusion matrix for 1D convolutional neural network
20.5 Conclusion We used the EEG sensor dataset from PREDICT (Patient Repository for EEG Data + Computation Tools) to classify a person’s mental health as either depressed or normal. We compared deep learning models on different metrics and observed that the FNN and the 1D CNN performed the best and did well in real-world situations, while the RNN models performed worse. The CNN model’s performance can be attributed to its suitability for time-series data. The disadvantage of the models is that they are computationally intensive, and thus re-training them with new data would take time. We have shown that depression, and mental health in general, can be classified using deep learning and EEG sensors. Future improvements include building a better dataset, training a hybrid algorithm, and fine-tuning to reduce false positives and false negatives.
References
1. Wang, T., Li, C., Wu, C., Zhao, C., Sun, J., Peng, H., Hu, X., Hu, B.: A gait assessment framework for depression detection using kinect sensors. IEEE Sensors J. 21(3), 3260–3270 (2020)
2. Zhang, H., Feng, L., Li, N., Jin, Z., Cao, L.: Video-based stress detection through deep learning. Sensors 20(19), 5552 (2020)
3. Mahendran, N., Vincent, D.R., Srinivasan, K., Chang, C.Y., Garg, A., Gao, L., Reina, D.G.: Sensor-assisted weighted average ensemble model for detecting major depressive disorder. Sensors 19(22), 4822 (2019)
4. Srimadhur, N.S., Lalitha, S.: An end-to-end model for detection and assessment of depression levels using speech. Procedia Comput. Sci. 171, 12–21 (2020)
5. Dubagunta, S.P., Vlasenko, B., Doss, M.M.: Learning voice source related information for depression detection. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6525–6529. IEEE (2019, May)
6. Chlasta, K., Wołk, K., Krejtz, I.: Automated speech-based screening of depression using deep convolutional neural networks. Procedia Comput. Sci. 164, 618–628 (2019)
7. Lam, G., Dongyan, H., Lin, W.: Context-aware deep learning for multi-modal depression detection. In: ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3946–3950. IEEE (2019, May)
8. Kumar, A., Sharma, K., Sharma, A.: Hierarchical deep neural network for mental stress state detection using IoT based biomarkers. Pattern Recogn. Lett. 145, 81–87 (2021)
9. Cai, H., Han, J., Chen, Y., Sha, X., Wang, Z., Hu, B., Yang, J., Feng, L., Ding, Z., Chen, Y., Gutknecht, J.: A pervasive approach to EEG-based depression detection. Complexity 2018 (2018)
10. Seal, A., Bajpai, R., Agnihotri, J., Yazidi, A., Herrera-Viedma, E., Krejcar, O.: DeprNet: a deep convolution neural network framework for detecting depression using EEG. IEEE Trans. Instrum. Meas. 70, 1–13 (2021)
Chapter 21
Fusion of Variational Autoencoder-Generative Adversarial Networks and Siamese Neural Networks for Face Matching
Garvit Luhadia, Aditya Deepak Joshi, V. Vijayarajan, and V. Vinoth Kumar
Abstract In this research paper, we propose a new methodology for content-based image retrieval, wherein the user queries the dataset by a visual example and our proposed model efficiently retrieves similar images from the dataset. Our method consists of a modified variational autoencoder for global feature extraction and a Siamese network as a similarity measure between the query image and the dataset. It renovates the models currently in practice, which use autoencoders and traditional feature vectors for image enhancement and image retrieval, respectively. With this new model, we are not only able to retrieve images with higher efficiency but have also gone a step closer toward identifying a person’s face even if it is disguised. We plan to evaluate the functioning ability of our framework using the CelebFaces Attributes Dataset, which contains more than 200,000 images with 40 binary attribute annotations. We expect to see decent improvements in accuracy and precision compared to the traditional image retrieval methods.
Keywords Autoencoders · Siamese networks · Content-based image retrieval · Variational autoencoders-generative adversarial networks · Image matching
G. Luhadia (B) · A. D. Joshi · V. Vijayarajan · V. Vinoth Kumar, Vellore Institute of Technology, 632014 Vellore, TN, India, https://vit.ac.in/schools/school-of-computer-science-and-engineering
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_21
21.1 Introduction We were first introduced to the notion of using deep learning methodologies for content-based image retrieval when we perused [1] and [2]. When analyzing these recent trends in the field of Content-Based Image Retrieval (CBIR), it quickly became clear that Deep Belief Networks (DBNs) were a compelling step in improving the preexisting image retrieval techniques. But traditional DBNs are notoriously difficult to work with. More research down this path led to our discovery of variational autoencoders and Generative Adversarial Networks (GANs). Likewise, when searching for an efficient similarity measure apart from k-means clustering, we came across Siamese Convolutional Neural Networks (CNNs) [3, 4]. After poring over the different methodologies of implementing Siamese networks in an image retrieval system, we settled on CNNs. A novel Siamese LSTM architecture was proposed for human re-identification in [5]; however, it does not prove particularly useful in our case, where we work with the latent variable representation of human faces. Similarly, implementing graph neural networks [6] would not prove feasible or particularly useful in our architecture, as it would limit our capability of extracting the maximum from the network. Although SGCNs as described in [6] are only useful in learning graphical representations of data, the lessons learnt from this research were instrumental in cementing our usage of twin networks in parsing through the latent representation of our images.
Once theoretical feasibility was achieved, our motivation for this project led us to the selection of the dataset in question. In today’s age of mass surveillance and the abundance of smartphone cameras, relying on traditional, unreliable techniques such as human eyewitnesses as a credible source of information makes little sense. It is hence our opinion that with advanced smartphone systems on a chip (SoCs) containing bespoke Artificial Intelligence (AI) cores, we can make the most of this technological revolution to reduce crime. Therefore, upon careful consideration, we saw fit to go with the CelebFaces Attributes (CelebA) dataset. Once the model is sufficiently trained, we believe that it will help find missing people in public places like shopping malls, railway stations, and bus stands. It may also be retrofitted for identifying people who use poor lighting conditions to shroud their appearance (for illegal activities).
This model is built from a Variational Autoencoder (VAE), a GAN, and a Siamese neural network. We were enthralled by the technique used in [7], where a fusion of autoencoders and a Siamese network performs global feature extraction for signature verification with improved results. Using this knowledge, in our case the image is processed through the VAE. While other facial recognition software uses traditional autoencoders, we use a VAE to obtain a probabilistic representation, which will help us identify faces even under disguise. The output of the VAE is fed to the GAN, which effectively creates a feedback loop for continual improvement of the training model. It is then fed to the Siamese network, which compares two probabilistic vectors to find a match within the given database. Through this method, we will be able to get a model which not only retrieves images more efficiently but also makes headway toward retrieving images even when the face is disguised.
21.1.1 About Variational Auto-Encoders As mentioned in [8], variational autoencoders provide a convenient framework for optimizing deep latent variable models (DLVMs) jointly with an inference model using stochastic gradient descent (SGD). DLVMs are probability distributions parametrised by a neural network [9], represented by $p_\theta(x, z \mid y)$. To convert the intractable inference and learning problems of a DLVM into tractable ones, we introduce a variational parameter $\phi$ such that

$$q_\phi(z \mid x) \approx p_\theta(z \mid x). \tag{21.1}$$

This approximation helps in the optimisation of the marginal likelihood.
21.1.2 About Generative Adversarial Networks Generative Adversarial Networks encompass a group of generative models that aim to discover the underlying distribution behind a certain data-generating process. This distribution is discovered through an adversarial competition between a generator and a discriminator [10]. The two models are trained so that the discriminator attempts to distinguish between generated and real images, while the generator aims to fool the discriminator by producing images as realistic and compelling as possible.

$$L_D = \mathrm{Error}(D(x), 1) + \mathrm{Error}(D(G(z)), 0) \tag{21.2}$$

The aim of the discriminator is to correctly label generated images as false and empirical data points as true. Here $D(x)$ is the discriminator’s output for a real image $x$, $G(z)$ is the image generated from noise $z$, and Error is a generic error function that measures the difference between its two arguments.

$$L_G = \mathrm{Error}(D(G(z)), 1) \tag{21.3}$$

A similar approach can be taken with the generator, which has to fool the discriminator as much as feasible so that it mislabels fake images as genuine. In the above equation, the loss function strives to minimize the difference between 1, the label for true data, and the discriminator’s evaluation of the generated fake data.
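As a minimal sketch of (21.2) and (21.3), the generic Error function can be instantiated with binary cross-entropy, which is a common choice but an assumption here. The discriminator outputs are passed in as plain probabilities, so no network is needed to see how the two losses pull in opposite directions.

```python
import math


def error(prediction, target, eps=1e-12):
    """Binary cross-entropy between a discriminator output in (0, 1) and a 0/1 target."""
    p = min(max(prediction, eps), 1.0 - eps)  # keep log() finite
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """L_D: real images should score 1, generated images should score 0."""
    return error(d_real, 1) + error(d_fake, 0)


def generator_loss(d_fake):
    """L_G: the generator wants its images to be scored as real (1)."""
    return error(d_fake, 1)


# d_real = D(x), d_fake = D(G(z)); the values are hypothetical.
print(discriminator_loss(d_real=0.9, d_fake=0.1), generator_loss(d_fake=0.1))
print(discriminator_loss(d_real=0.9, d_fake=0.8), generator_loss(d_fake=0.8))
```

When the discriminator is fooled (a high score for a generated image), its own loss rises while the generator loss falls, which is the adversarial competition described above.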
21.1.3 Applications of Face Matching
1. Improved Public Security: Facial matching makes it easier to track down burglars, thieves, and trespassers. Instead of relying on the old-fashioned eye-witness method (with a high spoof ratio), this technology can be used for analyzing CCTV footage and for matching against reference photographs taken a substantial number of years earlier (aging and de-aging). It can also be utilized in missing-person cases to provide fast and reliable outcomes. Furthermore, shoplifting is a major problem for supermarkets and retail units alike. People of concern can be pre-emptively identified to place security on alert.
2. Fast and Non-Invasive Identity Verification: Another benefit of face matching technology is its swift processing speed and the lack of physical contact required by alternative means of verifying the user. This point holds additional weight considering the pandemic, wherein physical contact was to be avoided as much as possible. Other identity verification methods, like passwords and patterns, require the user to remember them, while fingerprints require user contact.
3. Benefits in Banking: Despite many security measures taken worldwide by banking institutions, banking fraud is still a prevalent problem. Face matching for online banking could help ameliorate it. While banks currently utilize one-time passwords (OTPs) to authorize online transactions, facial matching could potentially be used instead. This would allow users to authorize transactions in a much faster, more reliable, and more secure way, even in the absence of their mobile phones. The benefits go beyond online banking: physical bank branches and ATMs can also make use of the technology. It would provide a much faster way of accessing an account and save the time otherwise spent waiting in long queues and filling out paper-based forms. Thanks to this, using debit cards and signatures may soon become history.
4. Advantages in Attendance Systems: Educational institutions and industrial floors all over the globe are plagued by time fraud using the favor or proxy system, which is still one of the most widespread ethics violations. A system of face scanners equipped with facial matching to check in employees passing through would eliminate the need for ID card-based verification and would also make the hourly wage system far more airtight and accurate. It also eliminates the problem of forgotten ID cards or cards being handed to an unauthorized person.
5. Benefits in Retail: A regular customer entering a store often prefers to be served by one particular shopkeeper as a personal touch. Now imagine a system notifying the respective store manager about the arrival of their regular customer. This could increase sales while also boosting customer loyalty.
21.2 Related Works Facial recognition technology is and has been widely used by organizations all over the globe. Watchlist as a Service (WaaS) is a relatively new facial recognition data platform specifically designed for use in public places to help prevent shoplifting and violent crime. It includes a managed database of known criminals that pose a safety, theft, or violent crime risk. It functions by matching the faces of the people entering the shop against the aforementioned database. The database works in conjunction with the FaceFirst biometric surveillance platform, which uses feature-matching technology to alert security or law enforcement agencies of real-time threats. Several facial matching technologies are currently deployed around the world, and the concept is not new by any means. However, the use of modern technology to better recognize and find latent facial attributes on a face has not been explored to its entire depth.
In order to progress further, we had to identify the type of generative model suitable for our application. To this end, we identified four prospective models, namely GANs, VAEs, flow-based models, and autoregressive models. GANs generate images from noise (as do diffusion models). They can also take an informative conditioning variable, such as a text encoding or a class label, and generate a highly photorealistic and lifelike image. They are inherently adversarial, as the discriminator works against the generator. VAEs work differently: the encoder provides a meaningful structure to the latent representation of an image with lower dimensionality, and the decoder then attempts to reconstruct the input with the goal of minimizing the distance between the input and its reproduction. In this representation, similar data points are close to each other and dissimilar data points are far apart. VAEs make sure that the latent representation is not arranged haphazardly but follows a predefined distribution (in our case, Gaussian). This allows us to better sample points situated between the training data points. With this regularization, VAEs implicitly learn the data distribution.
Flow-based models, on the other hand, are a class of models that explicitly learn the data distribution. They apply a transformation parameterized by a neural network to the data, and the decoder is the exact inverse of that function. Achieving such an invertible function with neural networks involves mathematical complications. Hence, flow-based models were not apt for our search. Autoregressive models, like Dall-E, are not very photorealistic: these GPT-like models autoregressively generate images with low fidelity. If one were used, the required retrieval system would not function satisfactorily, so the generation we need can only be achieved using GANs. However, the generator of a GAN takes random noise or a class-conditioning variable as input, which is not ideal for our purpose. This is when we turned to VAE-GANs [11], which improve the performance of autoencoders using information derived from generative adversarial training. This helps to generate consistent and hyper-realistic images with fine details. As shown in [12], this is particularly helpful in identifying the minutiae of facial attributes.
21.3 Methodology The first component of the system is the amalgam of a variational autoencoder and a generative adversarial network. An autoencoder is an artificial neural network used to efficiently learn data coding in an unsupervised manner. Autoencoders take the image received by the hypothetical camera and compress it into a code while the decoder uses the same code to execute the respective instruction to reconstruct the image. A variational autoencoder (VAE) provides a probabilistic manner for describing an observation in latent space, i.e., rather than building an encoder that outputs a single value to describe each latent state attribute, we’ll formulate our encoder to describe a probability distribution for each latent attribute by two outputs: the mean and the variance. Both these mathematical terms can be used to describe probability. A Generative Adversarial Network (GAN) takes a random noise as an input and generates a Fake image and matches it with the input data set and the discriminator matches if the images in the data set are real or not. A Variational AutoencoderGenerative Adversarial Network (VAE-GAN) works the same way as GAN but it works for a probabilistic noise input. It is essentially a sequential network, where the decoder from the VAE acts as the generator from the GAN. The discriminator then takes the reconstructed image as input and outputs a Boolean value, along with backward propagation information which ultimately trains the whole network without any intervention. Over the number of epochs, the discriminator fine-tunes the whole network to minimize the loss due to compression and decompression by the autoencoder this is where the Siamese network comes into play. After the VAE-GAN has finished training and the latent space is optimized, the complete 1024-dimensional vector is fed as input to the Siamese twin network, to both arms of the network. 
Training on these latent vectors teaches the three fully connected layers of the Siamese network to find patterns in the hidden representation without actually understanding what they represent. This provides a security benefit, as no network apart from our pair-trained VAE-GAN can decode the images. So even if the database of images falls into the wrong hands, the images are stored as 1024-dimensional representations which cannot be decoded by anyone else. When querying an image, as shown in Fig. 21.1, we send the image to the VAE, which builds a probability distribution over its features; the GAN generates a reconstruction that is checked by the discriminator; and the flattened latent vector is fed into one arm of the Siamese network, while the other arm iterates over the coded images from the database. The Siamese (twin) network compares the two feature vectors and displays the top 5 most similar images. If the querying computer is powerful enough, the code can be slightly modified so that retrieval undergoes a two-stage check. In the first stage, the otherwise dormant discriminator network compares images from the uncompressed database and gives a Boolean value indicating whether the image matches anything in the dataset. Then, in the second stage, the Siamese network steps in and displays the 5 most similar images based on hidden facial attributes that existing facial recognition systems do not use.
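The query step can be sketched as a nearest-neighbour search over the stored codes. This is a simplified stand-in, not the authors' implementation: plain Euclidean distance replaces the learned Siamese similarity, the codes are 4-dimensional instead of 1024-dimensional, and the names `top_k_similar`, `database` and `query` are hypothetical.

```python
import heapq
import math

def top_k_similar(query_code, database, k=5):
    """Return the k (distance, image_id) pairs closest to the query code."""
    scored = ((math.dist(query_code, code), image_id)
              for image_id, code in database.items())
    return heapq.nsmallest(k, scored)

# Toy database: image id -> latent code (real codes would be 1024-dimensional).
database = {f"img{i}": [float(i)] * 4 for i in range(10)}
query = [3.1] * 4

for distance, image_id in top_k_similar(query, database):
    print(f"{image_id}: distance {distance:.2f}")
```

Only the latent codes are touched during the search, which matches the point made above that the stored representations are useless without the paired decoder.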
21 Fusion of Variational Autoencoder-Generative Adversarial …
Fig. 21.1 System architecture
Although the present layered architecture does produce results, it is not the optimal solution to the problem. The ability of both internal models (the VAE-GAN and the Siamese network) to perform retrieval on their own is used to our advantage: by working together as a system, their outputs are inclusive and the limitations of the two internal architectures cancel out. This is partly due to our use of one deep neural network and another, shallower one. Apart from this, using activation functions such as a leaky Rectified Linear Unit (ReLU) instead of a plain ReLU would likely prove beneficial in future work. More detailed block diagrams containing all the neural layers used in each part of the retrieval system are shown in Figs. 21.2 and 21.3. Together, the diagrams convey the implementation and provide insight into what goes on behind the scenes. The neural network architecture is also represented, detailing all the processing steps an image goes through in our retrieval system.
Fig. 21.2 VAE-GAN network architecture
The combination of a very deep Variational Autoencoder-Generative Adversarial Network for dimensionality reduction and a comparatively shallow Siamese network as a measure of similarity provides quick parsing and retrieval at the cost of slow and deep training. This, however, is not a major concern, as the model can be pre-trained. The absence of pooling layers means that all the spatial information is preserved, as there is no reduction of spatial resolution; loss of spatial resolution is a common weakness of both max pooling and average pooling. Accepting lengthy, compute-intensive processing for the benefit of a better and more precise representation of the latent space was a conscious decision on our part. This makes the retrieval system both more reliable and more precise. That said, the accuracy depends entirely on the training of the network. The choice of anchor-positive-negative image trios also affects the performance of the system.
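The role of the anchor-positive-negative trios can be illustrated with the standard triplet margin loss. The paper does not state which loss was used, so the loss form, the margin of 1.0 and the two-dimensional toy embeddings below are assumptions made for illustration only.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pushes the negative at least `margin` farther than the positive."""
    d_pos = math.dist(anchor, positive)   # distance to an image of the same person
    d_neg = math.dist(anchor, negative)   # distance to an image of a different person
    return max(0.0, d_pos - d_neg + margin)

anchor = [0.0, 0.0]
positive = [0.0, 0.1]
easy_negative = [3.0, 4.0]   # already far away: contributes no gradient
hard_negative = [0.0, 0.2]   # nearly as close as the positive: contributes a large loss

print(triplet_loss(anchor, positive, easy_negative))
print(triplet_loss(anchor, positive, hard_negative))
```

Trios with easy negatives yield zero loss and teach the network nothing, which is one way to see why the selection of trios affects performance.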
21.4 Results
The researchers obtained three results when studying the new algorithm. Firstly, when compared with the Signal to Reconstruction Error (SRE) and Structural Similarity Index (SSIM) methods, the novel querying method (the VAE-GAN and Siamese network approach) required 44.35% less time than the SRE method and about 35.3% less time than the SSIM method, as shown in Table 21.1. Secondly, in subsequent runs of the algorithm on similar people, the model surpassed its own benchmarks by caching the matrices in system memory. This allowed for significantly lower inference times, as shown in Table 21.2. Lastly, due to the inherent nature of GANs, the target of the image was treated as a subject: a human with latent, unalterable features such as the jawline, a receding hairline, and eye color. These features require significant effort to be altered in real life. In addition, an image of a person taken years earlier can be successfully compared with the person's current appearance by taking these latent features into account.
Fig. 21.3 Siamese network architecture
Table 21.1 Query times with alternate algorithms
Algorithm used                          Time taken (s)
Signal to reconstruction error (SRE)    707.032
VAE-GAN approach                        393.438
Structural similarity index (SSIM)      608.127
Table 21.2 Query times with subsequent images
Image used (I0)       Time taken (s)
Preliminary image     393.4382016658783
Subsequent image-1    201.32328367233276
Subsequent image-2    65.58776712417603
Subsequent image-3    28.543447494506836
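The timing comparison can be rechecked directly from the values in Tables 21.1 and 21.2. The short script below is only an arithmetic check (the helper name `percent_reduction` is introduced here): it gives time reductions of about 44.35% relative to SRE and about 35.30% relative to SSIM, and shows that the third subsequent query ran roughly 14 times faster than the preliminary one.

```python
def percent_reduction(baseline_s, new_s):
    """Relative time saved by the new method, in percent of the baseline."""
    return 100.0 * (baseline_s - new_s) / baseline_s

# Values copied from Table 21.1 (seconds).
TABLE_21_1 = {"SRE": 707.032, "VAE-GAN": 393.438, "SSIM": 608.127}
# Values copied from Table 21.2 (seconds), preliminary image first.
TABLE_21_2 = [393.4382016658783, 201.32328367233276,
              65.58776712417603, 28.543447494506836]

vs_sre = percent_reduction(TABLE_21_1["SRE"], TABLE_21_1["VAE-GAN"])
vs_ssim = percent_reduction(TABLE_21_1["SSIM"], TABLE_21_1["VAE-GAN"])
speedup = TABLE_21_2[0] / TABLE_21_2[-1]

print(f"reduction vs SRE:  {vs_sre:.2f}%")
print(f"reduction vs SSIM: {vs_ssim:.2f}%")
print(f"cached speed-up:   {speedup:.1f}x")
```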
21.5 Conclusion and Future Scope
In this paper, the researchers have presented a global feature extraction framework for facial matching based on Variational Autoencoders coupled with Generative Adversarial Networks and Siamese networks. The method uses the unsupervised learning paradigm to train neural networks to find patterns in hidden facial features and to retrieve the most similar faces from a database. Since our approach is based on deep neural networks, we suspected that the performance of the retrieval system would correlate with the size of the dataset and the computational power of the training machine. This suspicion proved to be true, as seen in the reconstructed results of the VAE-GAN. Nevertheless, both the speed and the latent feature extraction are present in our Content-Based Image Retrieval model. Future work in this line should consist of better tuning of hyperparameters and the use of more powerful hardware, allowing training to run for more epochs. Furthermore, recent advancements in the field (e.g., Vision Transformers and diffusion models) should be explored and compared. Diffusion models gradually add Gaussian noise to the input and then reverse the process. As stated in [13], diffusion models define a Markov chain of diffusion steps that slowly adds random noise to data, and then learn to reverse the diffusion process to construct desired data samples from the noise. Determining the optimal input combination to feed to the Siamese network during training is a challenge that we were unable to overcome. This, together with further polishing of the results using more powerful computers and more training cycles, is left for future work. Finally, we would like to explore the ability of diffusion models to act as global feature extractors, as they are arguably more realistic and more faithful to the data than other generative architectures.
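As a pointer for that future work, the forward (noising) half of the diffusion process described in [13] can be sketched in a few lines. The sketch assumes the linear noise schedule with 1000 steps from [13] and uses the closed form x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta); the function names and the two-value toy input are hypothetical, and the learned reverse process is not shown.

```python
import math
import random

def alpha_bar(betas, t):
    """Product of (1 - beta_s) over the first t diffusion steps."""
    product = 1.0
    for beta in betas[:t]:
        product *= 1.0 - beta
    return product

def q_sample(x0, betas, t, rng):
    """Draw x_t given x_0 in one shot using the closed-form forward process."""
    a_bar = alpha_bar(betas, t)
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
            for x in x0]

T = 1000
# Linear schedule from 1e-4 to 0.02 (assumed here, following [13]).
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

rng = random.Random(0)
x0 = [0.3, -0.7]                      # toy "image" with two pixel values
print(q_sample(x0, betas, 10, rng))   # still close to x0
print(q_sample(x0, betas, T, rng))    # essentially pure Gaussian noise
```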
References
1. Inbaraj, R., Ravi, G.: A survey on recent trends in content based image retrieval system. J. Crit. Rev. 7(11), 2020 (2019)
2. Dubey, S.R.: A decade survey of content based image retrieval using deep learning (2020). arXiv preprint arXiv:2012.00641
3. Melekhov, I., Kannala, J., Rahtu, E.: Siamese network features for image matching. In: 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 378–383. IEEE (Dec 2016)
4. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recognition. In: ICML Deep Learning Workshop, Vol. 2 (July 2015)
5. Varior, R.R., Shuai, B., Lu, J., Xu, D., Wang, G.: A siamese long short-term memory architecture for human re-identification. In: European Conference on Computer Vision, pp. 135–153. Springer, Cham (Oct 2016)
6. Chaudhuri, U., Banerjee, B., Bhattacharya, A.: Siamese graph convolutional network for content based remote sensing image retrieval. Comput. Vision Image Underst. 184, 22–30 (2019)
7. Ahrabian, K., Babaali, B.: Usage of autoencoders and Siamese networks for online handwritten signature verification. Neural Comput. Appl. 31(12), 9321–9334 (2018). https://doi.org/10.1007/s00521-018-3844-z
8. Kingma, D.P., Welling, M.: An introduction to variational autoencoders. Found. Trends Mach. Learn. 12(4), 307–392 (2019)
9. Sensoy, M., Kaplan, L., Kandemir, M.: Evidential deep learning to quantify classification uncertainty. Adv. Neural Inf. Process. Syst. 31 (2018)
10. Aggarwal, A., Mittal, M., Battineni, G.: Generative adversarial network: an overview of theory and applications. Int. J. Inf. Manag. Data Insights 1(1), 100004 (2021)
11. Larsen, A.B.L., Sønderby, S.K., Larochelle, H., Winther, O.: Autoencoding beyond pixels using a learned similarity metric. In: Proceedings of The 33rd International Conference on Machine Learning, PMLR vol. 48, pp. 1558–1566 (2016)
12. Hou, X., Sun, K., Shen, L., Qiu, G.: Improving variational autoencoder with deep feature consistent and generative adversarial training. Neurocomputing 341, 183–194 (2019). https://doi.org/10.1016/j.neucom.2019.03.013
13. Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020)
Chapter 22
An Invasion Detection System in the Cloud That Use Secure Hashing Techniques Sridevi Sakhamuri, Gopi Krishna Yanala, Varagani Durga Shyam Prasad, Ch. Bala Subrmanyam, and Aika Pavan Kumar
Abstract All businesses face the same basic problem of inadequate protection, and adversaries are making major attempts to breach existing barriers in order to engage in unlawful insider trading. They recognise a wide range of channels for information theft, and intrusion into private information is now more likely than ever. While there are many safeguards against various attacks, hackers are always coming up with new ways to get past current barriers. This paper therefore attempts to develop a novel strategy that is particularly resistant to such attacks. The proposed scheme uses a hash map-based intrusion detection system in which the object is hashed and then saved as a shared key. The secure movement of data is a major problem today. Before uploading their files, users of the data-sharing system have the option to encrypt them using their own personal keys. The paper presents a secure and effective use of the approach together with a security proof. Data owners face several challenges when making their data accessible through server or cloud storage, and numerous approaches may be used to deal with these problems. Multiple techniques are needed for the secure maintenance of a shared key that belongs to the data owner. This paper examines the idea of using a trusted authority to confirm that the users of cloud data are who they claim to be. The trusted authority generates the key using the SHA algorithm and gives it to the user and the owner. After receiving an AES-encrypted file from the data owner, the trusted authority uses the MD5 algorithm to obtain its hash value.
S. Sakhamuri (B) · G. Krishna Yanala · V. Durga Shyam Prasad · Ch. Bala Subrmanyam · A. Pavan Kumar Department of Electronics and Computer Engineering, Koneru Lakshmaiah Education Foundation, GreenFields, Vaddeswaram, AP 522302, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_22
S. Sakhamuri et al.
22.1 Introduction
Cloud-based intrusion detection systems (IDSs) that use secure hashing algorithms are one method of monitoring a network for hostile activity. The system runs on a cloud computing platform, which provides flexibility and scalable memory [1]. Secure hash functions like SHA-256 and SHA-512 are used to generate a unique digital fingerprint, or “hash,” of network traffic. The validity of the traffic may be determined by comparing the hash against a database of known good and bad hashes. Cloud computing enables the IDS to store this database and to handle massive volumes of data. It also enables remote access, which is useful for both updating and maintaining the system. The IDS keeps an eye on all data passing across the network and raises an alarm if anything suspicious is seen. This has the potential to thwart a broad variety of malicious cyber activity, such as virus infections, malware, and attempted break-ins. All things considered, combining cloud computing and secure hashing algorithms in an IDS may be a powerful tool for spotting and stopping network intrusions [2]. The use of secure hash functions in an IDS has several benefits, one of which is that the hashes are very difficult to edit or tamper with [3]. The hash of the network activity an attacker generates will differ from the hashes of regular traffic, so it is simple to identify even if the attacker manages to circumvent the IDS. IDSs that make use of cloud computing also have the option of incorporating machine learning algorithms into their detection processes, further enhancing their precision. To better identify emerging threats, the IDS may, for instance, use past data to train a machine learning algorithm to spot patterns of malicious behavior [4]. Another perk of adopting a cloud-based IDS is its compatibility with other security measures [5], including intrusion prevention systems, firewalls, and antivirus programs.
As a result, network security may be tackled in a more holistic and unified manner. Protecting the confidentiality and integrity of data while employing a cloud-based IDS is a significant challenge. Data theft or manipulation can be prevented by using encryption and other protection measures. Using cloud computing to deploy an IDS built on secure hashing methods may be a challenging task that requires expertise in information security, cloud services, and secure hash functions. Organizations should collaborate with experienced security specialists to design, build, and manage the system [6]. Understanding the difficulties of deploying such an IDS is essential for making the most of its potential to identify and deter malicious behavior on a network. Cyber-attacks have been a growing cause for worry as Internet technology has advanced rapidly in recent years [7]. One method for spotting these intrusions is the Intrusion Detection System (IDS). Despite the impressive results of existing intrusion detection techniques, there is a growing need either to refine the existing approaches or to create entirely new ones [8]. IDSs have long been used to scan network traffic and identify malicious activity or threats in real time. Just like a firewall, an IDS is a security system whose primary function is to prevent unauthorized individuals from gaining access to sensitive information by ensuring the data’s integrity and availability. There are established criteria by which the effectiveness of an IDS may be evaluated: accuracy, low resource use, high efficiency, speed, and completeness. Intrusion detection systems use two types of attack detection techniques: anomaly detection and signature detection (also known as misuse detection). The former examines system behaviour over time to spot out-of-the-ordinary actions; for instance, if the number of database queries made by a user is much higher than average, an anomaly detection warning is triggered [9]. Since not every user should be treated equally, this is also a drawback. In the same vein, the number of queries a user runs every day will not always be the same, so it is best to make the threshold variable. The difficulty of adapting anomaly detection to an environment where user needs change constantly and cannot be compared with past data is another potential drawback. It follows that anomaly detection may produce false positives [10, 11]. In the latter technique, rules are built within the IDS based on stored information, called fingerprints or signatures, about previously successful attacks.
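The anomaly rule described above (flag a user whose query count is far above normal, with a per-user rather than a global threshold) can be sketched as follows. This is a minimal illustration, not the method of any cited system: the z-score test, the threshold of 3 and the name `is_anomalous` are assumptions introduced here.

```python
import statistics

def is_anomalous(history, todays_count, z_threshold=3.0):
    """Flag a count that lies far above this user's own historical baseline."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return todays_count != mean
    return (todays_count - mean) / stdev > z_threshold

light_user = [100, 110, 90, 105, 95]     # hypothetical daily query counts
heavy_user = [380, 420, 400, 390, 410]

print(is_anomalous(light_user, 400))   # far above this user's baseline
print(is_anomalous(heavy_user, 400))   # ordinary for this user
```

The same count of 400 queries is suspicious for one user and normal for the other, which is why a single fixed threshold tends to produce false positives.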
22.2 Literature Review
Scholars have devoted considerable attention to security intrusion detection systems. The methods used for IDS [12, 13] may be broken down into three broad classes: anomaly-based IDS, signature-based IDS, and evolutionary algorithms (Fig. 22.1). As a first step in working with massive data sets, dataset preprocessing is essential when dealing with IDS. Choosing relevant features is the primary step in preprocessing. After settling on a set of features, machine learning methods are used to classify activity as normal or intrusive. We first discuss the hybrid machine learning approaches used in IDS in current work, then describe the different types of IDS, and lastly describe the method for selecting features.
Fig. 22.1 Anomaly-based intrusion detection system architecture [13]
A. Anomaly-based IDS
Aljawarnehh [14] presented an advanced optimization approach for intrusion detection systems (IDSs) built around intrusion detection with image segmentation. This novel hybrid model helped estimate attacks by processing the best-case selection of training data. Developing a more accurate IDS model required additional optimization methods. A profile of a web-based application was created by observing invariants in its behaviour (e.g., a value associated with the user ought to remain the same when logging in) [15]. The software was then monitored for violations of these invariants, and each violation was recorded as an anomaly. By examining the activity of all online sessions, the authors of Ref. [16] were able to create an anomaly-based intrusion detection system. It was built from more manageable groupings of data items, and these categories were then applied to common data access patterns used in workflow procedures. The authors used a hidden Markov model (HMM). The findings indicated that clustering can maintain accuracy while achieving a low false-positive (FP) rate, and that the approach could identify abnormal web transactions. The DoubleGuard framework was created by Le et al. [17] to identify attacks that leak information by checking both database and web server logs. They found an FP rate of 0.6% for dynamic pages and 0% for static pages. Nascimento and Correia [18] examined an IDS that had been trained using data gathered from a large-scale online application. They considered only GET requests and ignored POST requests and their corresponding response pages. They used the T-Sharklog converter to standardise the logs it generated, relied on ancillary programs to generate the filtered data, and employed nine different detection models. Ariu [19] created an HMM-based, host-based IDS to prevent attacks on web applications. The input attributes and values of a web application were modelled using this technique. Multiple HMMs were combined to meet a requirement on the probabilities derived from the training sample, allowing various metrics and values to be calculated. Using an XML file that specified the required properties of parameter values, a web application firewall was created in Ref. [20] to identify irregular requests and record their behaviour. Attacks were detected if input values strayed from the profile.
The problem was that this method generated FP warnings because it ignored the more reliable path and page information.
B. Signature-based IDS
Signature-based intrusion detection systems rely on signatures created from threats that have already been found, so they are well equipped to detect known threats when they reappear. With this technique, the observed network traffic pattern is compared against the stored signatures. When an attack occurs and the traffic pattern fits a signature, the entry is flagged. Because it only requires knowledge of the fingerprints of network behaviour, this sort of detection scheme is simple to create and understand. It detects known attacks with a high degree of accuracy and almost no false positives. Additionally, the database can be extended with new signatures without affecting the existing ones (Fig. 22.2). The biggest problem with this IDS method is that it may be fooled by even the slightest change in the attack pattern, so it cannot prevent attacks that have not been seen before.
Fig. 22.2 Architecture for intrusion detection systems based on signatures [13]
The following works on signature-based IDS are notable. Saraniya [21] created a signature-based IDS algorithm for identifying network intrusions, a network intrusion detection system (NIDS). It was able to collect packets from the whole network under a wide range of operating conditions and compare them with attack patterns created by security experts. As a result, the network was protected and memory use was reduced. A signature-based IDS cannot recognize novel and unforeseen threats, since the signature library must be continually updated for every new type of intrusion identified. The research described by Gao and Morris [22] focused on cyberattacks against MODBUS-based industrial control systems and on signature-based intrusion detection for them. The rules were derived from previously observed attacks and were classified as either standalone or state-based. Rules that examined a single MODBUS packet for a matching signature were standalone. Snort, an IDS, enforced the standalone rules. Uddin [23] presented a new model for signature-based network IDS that moves signatures from a large complementary database to a small signature database and regularly updates the databases when new signatures are identified. The results showed that this IDS outperformed conventional systems that rely on a single central signature database. In [24], Kumar and Gobil built an IDS that relies on previously created signatures, using Snort, BASE, and TCPReplay. This technology has the potential to analyse network traffic in real time for signs of intrusion.
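The strength and the main weakness of signature matching described in this section can both be seen in a minimal sketch. The two byte patterns and the names `SIGNATURES` and `match_signatures` are hypothetical examples introduced here; real systems such as Snort use a much richer rule language.

```python
# Hypothetical signature database: rule name -> byte pattern of a known attack.
SIGNATURES = {
    "sqli-tautology": b"' OR 1=1",
    "path-traversal": b"../../",
}

def match_signatures(payload: bytes, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return sorted(name for name, pattern in signatures.items() if pattern in payload)

known_attack = b"GET /item?id=5' OR 1=1 --"
variant = b"GET /item?id=5' OR 2=2 --"     # same idea, slightly changed

print(match_signatures(known_attack))   # the known pattern is flagged
print(match_signatures(variant))        # the small variation slips through
```

The variant carries the same attack idea but matches no stored pattern, illustrating why the signature database has to be updated continually.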
22.3 Proposed System
An organization’s unique requirements and available resources will determine which of the many possible designs for a cloud-based IDS built on secure hashing algorithms is the best fit. Nevertheless, the following points should be considered while designing the system:
(1) Using secure hashing algorithms like SHA-256 and SHA-512, which are widely used in industry, to guarantee the integrity of data sent across a network.
(2) Using cloud computing as the foundation, which allows for scalability and the capacity to manage massive volumes of data.
(3) Using machine learning techniques to enhance the system’s detection capabilities and to respond more rapidly to emerging threats.
(4) Integrating with other security tools and services, such as firewalls, antivirus software, and intrusion prevention systems, for a more unified and effective network security strategy.
(5) Taking precautions to safeguard information stored in the cloud, such as using encryption and other safety protocols.
(6) Providing a user interface that is both straightforward and intuitive, so that system administrators can manage and monitor the IDS in real time.
(7) Updating and maintaining the system on a regular basis to keep it secure against new forms of cyberattack.
(8) Conforming to regulatory requirements like PCI-DSS, HIPAA, and ISO 27001.
(9) Planning the actions to take in the event of a breach.
To solve the problem of insecure digital certificates and data sharing in dynamic groups, we present a new secure data sharing technique. We provide a safe method of key distribution that does not need any private lines of communication. Users’ public keys may be verified by the group manager, allowing them to safely obtain their private keys without the need for Certificate Authorities. With the aid of the group user list, our approach establishes fine-grained control over who has access to which parts of the data. We provide a way of exchanging information that is resistant to collusion: once a user’s access has been revoked, they cannot recover the original data files even if they collude with an untrusted network. The use of a polynomial function allows our system to provide secure user revocation. The framework for the application procedure is shown in Fig. 22.3. Our approach efficiently supports dynamic groups, meaning that the keys of the other users do not need to be recomputed and updated whenever a new user joins the group or an existing user is removed from it. The system architecture is shown in Fig. 22.4. Detection accuracy is reported to demonstrate the safety of our approach.
Fig. 22.3 Framework for the application’s procedure
Fig. 22.4 System’s architecture
22.4 Methodology
Intrusion detection systems (IDS) built on cloud computing and secure hashing algorithms may use the following approach.
(1) The first stage is to gather data on network traffic for use in the IDS. A network sniffer or another network management tool may be used to acquire this information.
(2) Next, a secure hashing technique, such as SHA-256 or SHA-512, is applied to the acquired network traffic data to produce a one-of-a-kind hash.
(3) A database of trusted and unsafe hashes is built. Standard network traffic is used to produce the good hashes, while malicious network traffic is used to produce the bad hashes.
(4) The IDS then analyses network activity by comparing the traffic’s hashes with those stored in the database. The IDS sounds an alarm if the traffic’s hash corresponds to a previously identified malicious hash.
(5) The IDS is deployed in the cloud. This provides scalability in terms of storage and resources and makes remote access possible.
(6) The system’s precision can be increased with machine learning algorithms that analyse network data for indicators of malicious behaviour. A machine learning model may be trained using data from the past, allowing the system to identify new threats more rapidly and correctly.
(7) Data stored in the cloud is shielded from prying eyes and unauthorised access with the help of encryption and other security measures.
(8) The system is checked, updated, and repaired on a regular basis to make sure it is always working well and can identify and prevent new cyber threats.
In summary, the proposed method for an IDS based on secure hashing techniques in the cloud consists of collecting and analyzing network traffic data, building a collection of known good and bad hashes, deploying the IDS on a cloud platform, integrating machine learning algorithms for greater accuracy, and enforcing security precautions to safeguard data stored in the cloud. The protocol for ensuring data integrity during transmission from the data holder to the cloud platform is shown in Fig. 22.5.
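Steps (2) to (4) of the approach above can be sketched with Python's standard `hashlib` module. This is a simplified illustration under stated assumptions: whole payloads are hashed with SHA-256, the hash database is an in-memory set, and the names `fingerprint`, `classify` and the sample payloads are hypothetical.

```python
import hashlib

def fingerprint(payload: bytes) -> str:
    """Step (2): SHA-256 digest of the captured traffic, as a hex string."""
    return hashlib.sha256(payload).hexdigest()

def classify(payload: bytes, known_bad: set, known_good: set) -> str:
    """Step (4): compare the fingerprint with the stored hashes."""
    digest = fingerprint(payload)
    if digest in known_bad:
        return "alert"
    if digest in known_good:
        return "allow"
    return "unknown"

# Step (3): build the database from labelled sample traffic (toy payloads).
known_good = {fingerprint(b"GET /index.html HTTP/1.1")}
known_bad = {fingerprint(b"GET /cgi-bin/exploit.sh HTTP/1.1")}

for packet in (b"GET /index.html HTTP/1.1",
               b"GET /cgi-bin/exploit.sh HTTP/1.1",
               b"GET /about.html HTTP/1.1"):
    print(classify(packet, known_bad, known_good))
```

Traffic that matches neither list comes back as "unknown", which is where the machine learning stage in step (6) would have to take over.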
22.5 Experimental Results
Here, we provide the findings of the experiments conducted to evaluate the effectiveness of the various hashing techniques presented in the previous section. An important purpose of many algorithms is to prevent hackers from accessing sensitive information. Attacks designed to break encryption techniques have proliferated in tandem with the development of those techniques. These attacks may be thwarted in a number of ways, one of which is adopting a more modern security protocol or updating the one already in use.
Fig. 22.5 Protocol for ensuring data integrity during transmission from data holder to cloud platform
Some attacks succeeded in breaching the security provided by the SHA-1 algorithm, diminishing its promise for data privacy; SHA-1 therefore did not successfully protect privacy. For increased safety, SHA-2 was created by NSA experts. A variety of theoretical attacks aimed at breaking SHA-2’s security have been explored. However, while SHA-2 is more secure than SHA-1, there is no assurance that private information will never be leaked. The attacks most relevant to SHA-2 are the collision attack and the pre-image attack. In a collision attack, the attacker seeks two different inputs that result in the same hash value. Collision attacks may be divided into two categories: classical and chosen-prefix collisions. Since the SHA-2 family comprises six functions with digests of 224 to 512 bits, longer than the 160-bit digest of SHA-1, the collision approach that is effective against SHA-1 [12] is not effective against SHA-2’s collision resistance. With SHA-2, it is computationally infeasible to find two inputs that generate the same hash value. In a pre-image attack, the attacker starts from a given hash value (the image) and tries to find an input that hashes to it. This type of attack does not succeed in practice because SHA-2 is pre-image resistant: its one-way compression function makes it extremely difficult for an attacker to produce a valid pre-image. Hash collisions, in which attackers create different inputs with identical hash values, are not a practical threat to SHA-256. This is in contrast to the MD5 and SHA-1 hashing algorithms.
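Why digest length matters for collision resistance can be demonstrated on a deliberately weakened hash. The sketch below truncates SHA-256 to 2 bytes (16 bits), an artificial assumption made purely for illustration, and finds two different inputs with the same truncated digest in a fraction of a second; the function names are hypothetical. The same search against the full 256-bit digest is computationally infeasible.

```python
import hashlib

def truncated_digest(data: bytes, n_bytes: int) -> bytes:
    """First n_bytes of the SHA-256 digest (a deliberately weakened hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int):
    """Hash the messages b"0", b"1", b"2", ... until two share a truncated digest."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        tag = truncated_digest(message, n_bytes)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message
        counter += 1

first, second = find_collision(2)
print(first, second, truncated_digest(first, 2).hex())
```

With 16 bits there are only 65,536 possible tags, so a repeat is guaranteed after at most 65,537 messages and typically appears after a few hundred; every additional output bit makes the search harder, which is the practical argument for the longer SHA-2 digests.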
Fig. 22.6 Values hashed out by several algorithms and compared
The hash values produced by the different algorithms were compared, and the results are shown in Fig. 22.6.
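A comparison of this kind can be reproduced with Python's standard `hashlib` module. The sketch below is an illustration only: the sample message and the helper name `digest_table` are hypothetical and are not the inputs behind Fig. 22.6.

```python
import hashlib

ALGORITHMS = {
    "MD5": hashlib.md5,
    "SHA-1": hashlib.sha1,
    "SHA-256": hashlib.sha256,
    "SHA-512": hashlib.sha512,
}

def digest_table(message: bytes) -> dict:
    """Hex digest of the same message under each algorithm."""
    return {name: algorithm(message).hexdigest()
            for name, algorithm in ALGORITHMS.items()}

table = digest_table(b"sample file contents")   # hypothetical input
for name, hex_digest in table.items():
    # Each hex character encodes 4 bits of the digest.
    print(f"{name:8s} {len(hex_digest) * 4:4d} bits  {hex_digest}")
```

The digest lengths (128, 160, 256 and 512 bits) are fixed by each algorithm regardless of input size, and changing a single character of the message changes every digest.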
22.6 Conclusion
When combined with other forms of security, such as firewalls, an IDS may greatly improve the effectiveness of a network’s defences. The primary function of an IDS is to spot the telltale symptoms of an attack, notify the system administrators, and have false information relayed to the intruders. IDSs are often divided into two distinct types based on their approach to detection: misuse detection and anomaly detection. In terms of how they function, they fall into either the network or the host category of intrusion detection systems. Today’s IDSs combine data gathered from both the network and the host computer. The more threats an IDS detects and answers with false data, the more effective it appears to be. Hence, by protecting the secret message with a hash, so that an attacker who cracks the hash code obtains only false information, and by additionally requiring an OTP for confirmation, we try to overcome the disadvantage of the current system. The result is an integrated platform that not only controls and monitors our data with segregation of duties but also prevents malicious attacks.
22 An Invasion Detection System in the Cloud That Use Secure Hashing …
References
1. Medical Data in the Crosshairs: Why Is Healthcare an Ideal Target? 14 Aug 2021. Available at: https://www.trendmicro.com/vinfo/us/security/news/cyber-attacks/medical-data-in-the-crosshairs-why-is-healthcare-an-ideal-target. Accessed 15 May 2020
2. Conaty-Buck, S.: Cybersecurity and healthcare records. Am. Nurse Today 12, 62–64 (2017)
3. eGOVERNMENT: Cloud Computing Initiatives, 22 April 2021. Available at: https://www.bahrain.bh/. Accessed 15 July 2021
4. Moukhafi, M., El Yassini, K., Bri, S.: A novel hybrid GA and SVM with PSO feature selection for intrusion detection system. Int. J. Adv. Sci. Res. Eng. 4, 129–134 (2018)
5. Kuang, F., Xu, W., Zhang, S.: A novel hybrid KPCA and SVM with GA model for intrusion detection. Appl. Soft Comput. J. 18, 178–184 (2014)
6. Al-Yaseen, W.L., Othman, Z.A., Nazri, M.Z.A.: Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system. Expert Syst. Appl. 67, 296–303 (2017)
7. Feng, W., Zhang, Q., Hu, G., Huang, J.X.: Mining network data for intrusion detection through combining SVMs with ant colony networks. Future Gener. Comput. Syst. 37, 127–140 (2014)
8. Ambusaidi, M.A., He, X., Nanda, P., Tan, Z.: Building an intrusion detection system using a filter-based feature selection algorithm. IEEE Trans. Comput. 65, 2986–2998 (2016)
9. Mustapha, B., Salah, E.H., Mohamed, I.: A two-stage classifier approach using RepTree algorithm for network intrusion detection. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 8, 389–394 (2017)
10. Tuan, A., McLernon, D., Mhamdi, L., Zaidi, S.A.R., Ghogho, M.: Intrusion detection in SDN-based networks: deep recurrent neural network approach. In: Advanced Sciences and Technologies for Security Applications, pp. 175–195. Springer, Berlin/Heidelberg, Germany (2019)
11. Nguyen, K.K., Hoang, D.T., Niyato, D., Wang, P., Nguyen, D., Dutkiewicz, E.: Cyberattack detection in mobile cloud computing: a deep learning approach. In: Proceedings of the IEEE Wireless Communications and Networking Conference, Barcelona, Spain, 15–18 April 2018
12. He, D., Qiao, Q., Gao, Y., Zheng, J., Chan, S., Li, J., Guizani, N.: Intrusion detection based on stacked autoencoder for connected healthcare systems. IEEE Netw. 33, 64–69 (2019)
13. Mudzingwa, D., Agrawal, R.: A study of methodologies used in intrusion detection and prevention systems (IDPS). In: Proceedings of the IEEE, Southeastcon, Orlando, FL, USA, 15–18 Mar 2012
14. Aljawarneh, S., Aldwairi, M., Yassein, M.B.: Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J. Comput. Sci. 25, 152–160 (2018)
15. Ludinard, R., Totel, É., Tronel, F., Nicomette, V., Kaâniche, M., Alata, É., Bachy, Y.: Detecting attacks against data in web applications. In: Proceedings of the 7th International Conference on Risks and Security of Internet and Systems (CRiSIS), Cork, Ireland, 10–12 Oct 2012
16. Li, X., Xue, Y., Malin, B.: Detecting anomalous user behaviors in workflow-driven web applications. In: Proceedings of the IEEE Symposium on Reliable Distributed Systems, Irvine, CA, USA, 8–11 Oct 2012, pp. 1–10
17. Le, M., Stavrou, A., Kang, B.B.: DoubleGuard: detecting intrusions in multitier web applications. IEEE Trans. Dependable Secur. Comput. 9, 512–525 (2012)
18. Nascimento, G., Correia, M.: Anomaly-based intrusion detection in software as a service. In: Proceedings of the 2011 IEEE/IFIP 41st International Conference on Dependable Systems and Networks Workshops (DSN-W), Hong Kong, China, 27–30 June 2011, pp. 19–24
19. Ariu, D.: Host and Network Based Anomaly Detectors for HTTP Attacks. Ph.D. Thesis, University of Cagliari, Cagliari, Italy (2010)
20. Gimenez, C., Villaegas, A., Alvarez, G.: An anomaly-based approach for intrusion detection in web traffic. J. Inf. Assur. Secur. 5, 446–454 (2010)
21. Saraniya, G.: Securing the network using signature based IDS in network IDS. Shodhshauryam Int. Sci. Ref. Res. J. 2, 99–101 (2019)
22. Gao, W., Morris, T.: On cyber attacks and signature based intrusion detection for MODBUS based industrial control systems. J. Digit. Forens. Secur. Law 9, 37–56 (2014)
23. Uddin, M., Rehman, A.A., Uddin, N., Memon, J., Alsaqour, R., Kazi, S.: Signature-based multi-layer distributed intrusion detection system using mobile agents. Int. J. Netw. Secur. 15, 97–105 (2013)
24. Kumar, U., Gohil, B.N.: A survey on intrusion detection systems for cloud computing environment. Int. J. Comput. Appl. 109, 6–15 (2015)
Chapter 23
Detection of Suspicious Human Activities from Surveillance Camera Using Neural Networks
A. Kousar Nikhath, N. Sandhya, Sayeeda Khanum Pathan, and B. Venkatesh
Abstract Constant monitoring of data by humans to determine whether events are anomalous is a near-impossible undertaking that requires a crew and their undivided attention. Many organizations have installed CCTV cameras that record video and store it on a centralized server so that individuals and their interactions may be monitored at all times. The need for automatic systems that detect and characterize suspicious actions is growing as the amount of video data acquired every day by surveillance cameras grows. It is important to show which frame, and which part of it, contains the unusual activity so that the activity can be judged as abnormal or suspicious more quickly. The main task is to track the moving objects through the video sequence. This is accomplished by converting the video into frames and assessing the people and their activities within those frames.
23.1 Introduction
The major goal of this project is to create a video-based module that can identify suspicious human behaviour using surveillance camera data. The main task is to track moving objects through the video sequence. This is accomplished by converting the video into frames and assessing the people and their activities inside those frames. This is important for preventing threats before they occur, as well as for providing solid forensic evidence for identifying offenders when they do. Face recognition and gait recognition are used in this field, with face recognition being the more flexible. Face recognition may also be used to estimate a person's head orientation. Many applications combine motion recognition with face recognition, such as verifying a person's identity, identifying a person, and detecting the presence or absence of a person at a certain location and time. It is also important to show which frame, and which section of it, contains the unusual activity, so that the activity may be judged as abnormal or suspicious more quickly.
A. Kousar Nikhath (B) · N. Sandhya · S. Khanum Pathan · B. Venkatesh
Department of CSE-AIML and IoT, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_23
23.2 Related Work
Deep Learning for Human Suspicious Activity Detection [1]. This work uses ResNet-18, ResNet-34, and ResNet-50 to identify suspicious activity in pictures and videos with a deep learning technique. It can analyse video footage from cameras in real time and determine whether or not an action is suspicious. The main disadvantage is that it does not classify the frames before and after the action. Only five suspicious actions are detected.
Suspicious Behavior Pattern Detection Using Temporal Saliency [2]. Using a temporal saliency map, this work detects suspicious behavior by integrating the motion features of magnitude and gradient extracted by optical flow. Its biggest flaw is that it cannot recognize complicated behavior.
Abnormal Event Detection in Large Videos with Sparse Coding Guided Spatiotemporal Feature Learning [1]. The quadruplet idea is used to describe the multilevel similarity relationship between the inputs based on the statistical information of the shared atoms, which may be used to build a generalized triplet loss for C3D network training [2–22].
23.3 Existing System
• Existing systems detect human activity, but with a low detection probability and a high false-alarm rate [23–28].
• They fail to detect humans under partial occlusion in complex and challenging scenarios, and are limited to academic settings.
• They produce more misdetections as the number of people in the video frames increases [29–32].
23.4 Proposed System
We offer a method for identifying suspicious behavior of moving persons inside the field of view of a camera. Object tracking is carried out until suspicious behavior of the object is detected in a sequence of N frames. To automate this procedure, we created a ResNet model and trained it on the COCO dataset, which is accessible from Python. We created a user interface to make the model easier to use. The user may submit any video; the model extracts frames from that video and evaluates each frame to determine whether it is suspicious or not. To complete this project, we used a five-step module: uploading a video input file, segmenting the video into frames, feature extraction, comparison, and activity classification. The SNV technique introduces a framework for recognising human actions from depth camera video sequences. To jointly characterise local motion and shape information, its authors extend the surface normal to the polynormal by assembling local neighbouring hypersurface normals from a depth sequence. They then offer a general Super Normal Vector (SNV) scheme for aggregating the low-level polynormals into a discriminative representation.
23.4.1 Uploading Video Input File
We upload the surveillance video footage through the graphical user interface (GUI). The GUI offers an option to upload a file; clicking on it opens a file-selection dialog (Fig. 23.1).
Fig. 23.1 Output image of uploading video in python script
23.4.2 Frames Processing by CNN Model
After the video file is uploaded, it is converted into a large number of frames, and the CNN model processes each frame for suspicious human activity (Fig. 23.2).
Fig. 23.2 Output image of frames processing
23.4.3 Detecting Suspicious Images
Each frame is then processed by the CNN model and evaluated against what the trained model has learned. If any suspicious activity is detected, the frame is shown in the GUI together with its frame number. Object tracking continues until suspicious behavior of the object is detected in a sequence of N frames. The PredictImage function is used to predict whether the class of a frame is suspicious or not (Figs. 23.3 and 23.4).
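The "sequence of N frames" criterion can be illustrated with a small sketch. The chapter does not give the signature of PredictImage, so the example below assumes that each frame has already been reduced to a suspicion score between 0 and 1; the function name suspicious_runs and the parameters threshold and n_frames are hypothetical names introduced only for this illustration.

```python
from typing import Iterable, List, Tuple


def suspicious_runs(scores: Iterable[float],
                    threshold: float = 0.5,
                    n_frames: int = 3) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) index pairs for every run of at
    least `n_frames` consecutive frames whose score is >= `threshold`.

    `scores` stands in for per-frame classifier outputs (an assumption;
    the chapter's PredictImage interface is not specified).
    """
    runs = []
    start = None  # index where the current suspicious run began
    # A sentinel score below any threshold closes a run that reaches the end.
    for idx, score in enumerate(list(scores) + [float("-inf")]):
        if score >= threshold:
            if start is None:
                start = idx
        elif start is not None:
            if idx - start >= n_frames:
                runs.append((start, idx - 1))
            start = None
    return runs


if __name__ == "__main__":
    frame_scores = [0.1, 0.9, 0.8, 0.7, 0.2, 0.9, 0.9, 0.1, 0.6, 0.6, 0.6]
    print(suspicious_runs(frame_scores))
```

Requiring several consecutive positive frames, rather than a single one, is one simple way to reduce false alarms caused by isolated misclassified frames; the frame indices returned here correspond to the frame numbers that the GUI would display.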
23.5 Architecture
The architecture shown demonstrates how the project works and gives a brief overview of the model. The surveillance video file is converted into frames, the CNN model is trained on frames from a suspicious-activity video dataset, and the trained model then indicates which frames of the input video are suspicious.
Fig. 23.3 Output image of suspicious frames displayed in the GUI
Fig. 23.4 Architecture of the project
23.6 Process Execution See Fig. 23.5.
Fig. 23.5 The process execution
23.7 Results
We have achieved an accuracy of 95.9% for our ResNet50 model (Fig. 23.6). The increasing trend of the training and validation accuracies and the decreasing training and validation losses suggest that the model has learnt successfully without noticeable overfitting or underfitting (Fig. 23.7).
Fig. 23.6 Training and validation accuracy graph
Fig. 23.7 Training and validation loss graph
23.8 Future Scope
In future, the model can be extended to person identification using criminal images collected from police stations. It can also be extended to work with live video and to raise alarms by sending notifications to the monitoring authorities, so that they can take immediate action while the suspicious activity is still going on. This could help prevent suspicious activities and thereby decrease the crime rate.
23.9 Conclusion
We may infer that using the ResNet model to classify footage obtained by security cameras is an effective method for detecting suspicious human behaviour. We improved the accuracy to the 80–90% range. The model can deal with occlusion, and the time it takes to make a prediction is short. The suggested approach aids in discovering crimes more quickly, allowing appropriate steps to be taken before further repercussions arise.
References
1. Amrutha, C.V., Chandran, J., Joseph, A.: Deep Learning Approach for Suspicious Activity Detection from Surveillance Video, pp. 335–339 (2020). https://doi.org/10.1109/ICIMIA48430.2020.9074920
2. Cheoi, K.: Temporal saliency-based suspicious behavior pattern detection. Appl. Sci. 10, 1020 (2020). https://doi.org/10.3390/app10031020
3. Basu, S., Kannayaram, G., Ramasubbareddy, S., Venkatasubbaiah, C.: Improved genetic algorithm for monitoring of virtual machines in cloud environment. In: Smart Intelligent Computing and Applications, pp. 319–326. Springer, Singapore (2019)
4. Somula, R., Sasikala, R.: Round robin with load degree: an algorithm for optimal cloudlet discovery in mobile cloud computing. Scalable Comput.: Pract. Exp. 19(1), 39–52 (2018)
5. Somula, R., Anilkumar, C., Venkatesh, B., Karrothu, A., Kumar, C. P., Sasikala, R.: Cloudlet services for healthcare applications in mobile cloud computing. In: Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, pp. 535–543. Springer, Singapore (2019)
6. Somula, R.S., Sasikala, R.: A survey on mobile cloud computing: mobile computing+ cloud computing (MCC = MC+ CC). Scalable Comput.: Pract. Exp. 19(4), 309–337 (2018)
7. Somula, R., Sasikala, R.: A load and distance aware cloudlet selection strategy in multi-cloudlet environment. Int. J. Grid High-Perform. Comput. (IJGHPC) 11(2), 85–102 (2019)
8. Somula, R., Sasikala, R.: A honeybee inspired cloudlet selection for resource allocation. In: Smart Intelligent Computing and Applications, pp. 335–343. Springer, Singapore (2019)
9. Nalluri, S., Ramasubbareddy, S., Kannayaram, G.: Weather prediction using clustering strategies in machine learning. J. Comput. Theor. Nanosci. 16(5–6), 1977–1981 (2019)
10. Sahoo, K.S., Tiwary, M., Mishra, P., Reddy, S.R.S., Balusamy, B., Gandomi, A.H.: Improving end-users utility in software-defined wide area network systems. IEEE Trans. Netw. Serv. Manag. (2019)
11. Sahoo, K.S., Tiwary, M., Sahoo, B., Mishra, B.K., RamaSubbaReddy, S., Luhach, A.K.: RTSM: response time optimisation during switch migration in software-defined wide area network. IET Wirel. Sensor Syst. (2019)
12. Somula, R., Kumar, K.D., Aravindharamanan, S., Govinda, K.: Twitter sentiment analysis based on US presidential election 2016. In: Smart Intelligent Computing and Applications, pp. 363–373. Springer, Singapore (2020)
13. Sai, K.B.K., Subbareddy, S.R., Luhach, A.K.: IOT based air quality monitoring system using MQ135 and MQ7 with machine learning analysis. Scalable Comput.: Pract. Exp. 20(4), 599–606 (2019)
14. Somula, R., Narayana, Y., Nalluri, S., Chunduru, A., Sree, K.V.: POUPR: properly utilizing user-provided recourses for energy saving in mobile cloud computing. In: Proceedings of the 2nd International Conference on Data Engineering and Communication Technology, pp. 585–595. Springer, Singapore (2019)
15. Vaishali, R., Sasikala, R., Ramasubbareddy, S., Remya, S., Nalluri, S.: Genetic algorithm-based feature selection and MOE fuzzy classification algorithm on Pima Indians Diabetes dataset. In: 2017 International Conference on Computing Networking and Informatics (ICCNI), pp. 1–5. IEEE (2017)
16. Somula, R., Sasikala, R.: A research review on energy consumption of different frameworks in mobile cloud computing. In: Innovations in Computer Science and Engineering, pp. 129–142. Springer, Singapore (2019). Kumar, I.P., Sambangi, S., Somukoa, R., Nalluri, S., Govinda, K.: Server security in cloud computing using block-chaining technique. In: Data Engineering and Communication Technology, pp. 913–920. Springer, Singapore (2020)
17. Kumar, I.P., Gopal, V.H., Ramasubbareddy, S., Nalluri, S., Govinda, K.: Dominant color palette extraction by k-means clustering algorithm and reconstruction of image. In: Data Engineering and Communication Technology, pp. 921–929. Springer, Singapore (2020)
18. Nalluri, S., Saraswathi, R.V., Ramasubbareddy, S., Govinda, K., Swetha, E.: Chronic heart disease prediction using data mining techniques. In: Data Engineering and Communication Technology, pp. 903–912. Springer, Singapore (2020)
19. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: Task scheduling based on hybrid algorithm for cloud computing. In: International Conference on Intelligent Computing and Smart Communication, pp. 415–421. Springer, Singapore (2020)
20. Srinivas, T.A.S., Ramasubbareddy, S., Govinda, K., Manivannan, S.S.: Web image authentication using embedding invisible watermarking. In: International Conference on Intelligent Computing and Smart Communication, pp. 207–218. Springer, Singapore (2020)
21. Krishna, A.V., Ramasubbareddy, S., Govinda, K.: A unified platform for crisis mapping using web enabled crowdsourcing powered by knowledge management. In: International Conference on Intelligent Computing and Smart Communication, pp. 195–205. Springer, Singapore (2020)
22. Saraswathi, R.V., Nalluri, S., Ramasubbareddy, S., Govinda, K., Swetha, E.: Brilliant corp yield prediction utilizing Internet of Things. In: Data Engineering and Communication Technology, pp. 893–902. Springer, Singapore (2020)
23. Baliarsingh, S.K., Vipsita, S., Gandomi, A.H., Panda, A., Bakshi, S., Ramasubbareddy, S.: Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network. Comput. Methods Prog. Biomed. 105625 (2020)
24. Lavanya, V., Ramasubbareddy, S., Govinda, K.: Fuzzy keyword matching using N-gram and cryptographic approach over encrypted data in cloud. In: Embedded Systems and Artificial Intelligence, pp. 551–558. Springer, Singapore (2020)
25. Revathi, A., Kalyani, D., Ramasubbareddy, S., Govinda, K.: Critical review on course recommendation system with various similarities. In: Embedded Systems and Artificial Intelligence, pp. 843–852. Springer, Singapore (2020)
26. Sathish, K., Ramasubbareddy, S., Govinda, K.: Detection and localization of multiple objects using VGGNet and single shot detection. In: Emerging Research in Data Engineering Systems and Computer Communications, pp. 427–439. Springer, Singapore (2020)
27. Sennan, S., Ramasubbareddy, S., Nayyar, A., Nam, Y., Abouhawwash, M.: LOA-RPL: Novel Energy-Efficient Routing Protocol for the Internet of Things Using Lion Optimization Algorithm to Maximize Network Lifetime (2021)
28. Rout, J.K., Dalmia, A., Rath, S.K., Mohanta, B.K., Ramasubbareddy, S., Gandomi, A.H.: Detecting product review spammers using principles of big data. IEEE Trans. Eng. Manag. (2021)
29. Sennan, S., Ramasubbareddy, S., Balasubramaniyam, S., Nayyar, A., Kerrache, C.A., Bilal, M.: MADCR: mobility aware dynamic clustering-based routing protocol in Internet of Vehicles. China Commun. 18(7), 69–85 (2021)
30. Sahoo, K.S., Ramasubbareddy, S., Balusamy, B., Deep, B.V.: Analysing control plane scalability issue of software defined wide area network using simulated annealing technique. Int. J. Grid Util. Comput. 11(6), 827–837 (2020)
31. Devulapalli, S., Venkatesh, B., Somula, R.: Business analysis during the pandemic crisis using deep learning models. In: AI-Driven Intelligent Models for Business Excellence, pp. 68–80. IGI Global (2023)
32. Kirubasri, G., Sankar, S., Prasad, G., Naga Chandrika, G., Ramasubbareddy, S.: LQETA-RP: link quality based energy and trust aware routing protocol for wireless multimedia sensor networks. Int. J. Syst. Assur. Eng. Manag. 1–13 (2023)
Chapter 24
High Resolution Remote Sensing Image Classification Using Convolutional Neural Networks
K. Giridhar Sai, B. Sujatha, R. Tamilkodi, and N. Leelavathy
Abstract It is very important to extract key image features from high-resolution remote sensing imagery in order to serve various purposes, whether for government or for economic and ethical concerns. With the advent of big data technologies, high-resolution imagery can be managed easily, but because of the curse of dimensionality there remains the dilemma of which features to select. Extraction of the important, key features plays a prominent role in the whole methodology, and employing convolutional neural networks to handle these features is crucial in these scenarios. Extracting features from such images is useful for various applications. This can be achieved by employing feature selection in the classification of High-Resolution Remote Sensing images, which involves the use of Apache Spark, second-order Grey Level Co-occurrence Matrix features for additional processing, and Convolutional Neural Networks.
24.1 Introduction
Remote sensing is a core technology of earth observation. It covers ground reception, basic research, calibration, information transmission and storage, in-orbit processing, verification, and applied research, which together form the digital earth's fundamental information and resources [1]. Observation is carried out by spacecraft, aircraft, satellites, and several terrestrial platforms, involving photoelectric devices and humans, which observe and explore activities in the environment. In a definite, predefined time period, a satellite can repeatedly observe the globe as a whole or a part of it. These days, most of the observations of the earth made by satellites rely on SAR systems whose resolutions are very high. Spectral, temporal, radiometric, and spatial resolutions are the four categories of remote sensing data resolution. The projection of a pixel on the earth's surface or ground is referred to as the spatial resolution [2]. In other words, the spatial resolution can be defined as the minimum object size that can be determined by the sensor. Depending upon the application, the requirements for resolution will differ. Table 24.1 depicts the requirements for different applications. The progress in spatial resolution, which is currently at 0.30 m for panchromatic images at nadir view, is commendable, and resolutions are still being improved. Continuous demand for high-resolution data necessitates improvements in spacecraft technologies, camera systems, and the distribution network for image data. These days many government agencies and organizations are launching high-resolution satellites for government and commercial applications. High-resolution remote sensing (HRRS) images play a vital part in all areas of geography because they contain abundant spatial and structural information. In prior studies, the spectral information of an image has been used extensively in the analysis and interpretation of images. Remote sensing images are acquired by measuring the electromagnetic radiation reflected or emitted by the Earth's surface or atmosphere, and the interaction between the radiation and the materials on the surface can cause variations in the recorded spectra. As a result, different objects
K. Giridhar Sai (B) · B. Sujatha · R. Tamilkodi · N. Leelavathy
Department of CSE, Godavari Institute of Engineering and Technology (A), Rajahmundry, India
e-mail: [email protected]
B. Sujatha e-mail: [email protected]
R. Tamilkodi e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_24
Table 24.1 Applications versus resolutions
Application | Spatial resolution (m) | Spectral bands | Radiometric resolution (bits) | Temporal resolution (days)
Agriculture | 5–50 | Visible region - near infrared, shortwave infrared | 10 | 2–30
Forestry | 05–150 | Visible region - near infrared, shortwave infrared, thermal infrared | 8 | Few months
Limnology / Oceanography | 20–100 / 100–1000 | Visible region - near infrared, thermal infrared | 12–14 | Few weeks
Disaster management | < 10 | Visible region - near infrared | 10 | Few hours
Cartography
k. From Remark 2, any two given segments can have at most one intersection point. Hence, no pair of segments in $S_v$ can have a shared vertex different from $v$. Thus, if $v$ is not in $C$, more than $k$ distinct vertices (one from each segment in $S_v$) are required to be in $C$ to cover these segments. This would mean $|C| > k$. We conclude that if the arrangement $A$ contains any vertex with degree greater than $k$, then that vertex must be present in every solution set $C$ with $|C| \le k$.
Reduction Rule GSS 4 If $A$ contains a vertex $v \in V$ with degree greater than $k$, reduce the parameter by one and delete from $A$ all the segments passing through $v$. The updated instance becomes $(A - S_v, k - 1)$.
After exhaustively applying the reduction rules GSS 1 to GSS 4, we are left with a GSS instance in which, for every vertex $v$, $2 \le d(v) \le k$. If the highest-degree vertex in an arrangement has degree $h$, then the number of segments that a solution set $C$ (where $|C| \le k$) can cover is at most $kh$. The following lemma follows from this fact.
Lemma 2 Given a yes-instance of GSS, if the reduction rule GSS 4 is not applicable, then $|S| \le k^2$.
Proof Since the given instance is a yes-instance, it contains a solution set $C$ with $|C| \le k$, and every segment in $A$ is covered by some vertex from $C$. As reduction rule GSS 4 is not applicable, the number of segments intersecting at a vertex is at most $k$. Hence $A$ can have at most $k^2$ segments. If $A$ has more than $k^2$ segments, it is a no-instance. □
Reduction rules GSS 1, GSS 2 and GSS 4 decrement the parameter $k$. If the parameter $k$ becomes negative during the application of the reduction rules, then the given instance is a no-instance. This observation and Lemma 2 together lead to the last reduction rule.
Reduction Rule GSS 5 For the input instance $(A, k)$, if $k < 0$ or $|S| > k^2$, then it is a no-instance.
All the reduction rules are safe, as they preserve the equivalence between the instances, and each of them takes only linear time to execute. After applying all the reduction rules exhaustively, we get a kernel with at most $k^2$ segments.
Theorem 2 GSS admits a kernel with $O(k^2)$ segments. Consequently, the problem is FPT.
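Reduction rules GSS 4 and GSS 5 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the arrangement is represented as a mapping from each vertex $v$ to the set $S_v$ of segments through it, the degree of $v$ is taken to be $|S_v|$, and the function name kernelize_gss is hypothetical. Rules GSS 1 to GSS 3 are not reproduced in this excerpt and are therefore omitted.

```python
from typing import Dict, Hashable, List, Optional, Set, Tuple

Vertex = Hashable
Segment = Hashable


def kernelize_gss(
    vertices: Dict[Vertex, Set[Segment]],
    segments: Set[Segment],
    k: int,
) -> Optional[Tuple[Dict[Vertex, Set[Segment]], Set[Segment], int, List[Vertex]]]:
    """Apply rules GSS 4 and GSS 5 exhaustively.

    `vertices` maps each vertex v to S_v, the segments passing through it
    (degree is assumed to be |S_v|).  Returns None for a no-instance,
    otherwise (reduced vertex map, remaining segments, remaining k,
    vertices forced into the solution).
    """
    segments = set(segments)
    vertices = {v: set(s) & segments for v, s in vertices.items()}
    forced: List[Vertex] = []
    while True:
        if k < 0:                       # Rule GSS 5: parameter exhausted
            return None
        high = next((v for v, s in vertices.items() if len(s) > k), None)
        if high is None:
            break
        # Rule GSS 4: a vertex on more than k segments must be in the solution.
        forced.append(high)
        segments -= vertices[high]
        k -= 1
        vertices = {v: s & segments for v, s in vertices.items() if s & segments}
    if len(segments) > k * k:           # Rule GSS 5: kernel size bound
        return None
    return vertices, segments, k, forced
```

For example, four segments meeting at a single vertex with $k = 2$ force that vertex into the solution and leave an empty arrangement with $k = 1$, whereas five segments whose vertices all lie on at most two segments are rejected for $k = 2$ because $5 > k^2$.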
29 Classical and Parameterized Complexity of Line Segment …
29.3.2 FPT Algorithm for GSS Problem
After the kernelization of the input instance $(A, k)$, we obtain a kernel with at most $k^2$ segments. As the kernel can have a maximum of $k^2$ segments, the number of intersections (vertices) on a segment is at most $k^2$. A maximum of $k^2$ segments, each having at most $k^2$ vertices, restricts the kernel to a maximum of $k^4$ vertices. The FPT algorithm for GSS follows the bounded search tree method [7], which exploits the upper bound on the number of intersection points (vertices) that a segment in a kernel can have. The algorithm performs recursive branching over the vertices on a segment, as follows. For the input GSS instance, all the segments have to be guarded. The strategy is to pick an arbitrary segment $s$, choose a vertex $v$ on $s$ ($v \in V_s$) as a candidate for the solution set, and decrement the parameter $k$ by one. By selecting $v$ in the cover, the set of segments $S_v$ is covered by $v$, and hence we can remove these segments from $A$. We then recursively branch on the reduced instance $(A - S_v, k - 1)$. If the branching with vertex $v$ does not give a yes-instance, we backtrack and choose another vertex from $V_s$ for branching. For a recursive call, if any of the branches returns a yes-instance, we conclude that the given instance is a yes-instance. If none of the branches returns a yes-instance, we conclude that the given instance is a no-instance.
Time Complexity: Since the number of vertices on each segment, $|V_s|$, is $O(k^2)$, the maximum number of branches at a node of the search tree (the branching factor) is $O(k^2)$. The depth of the search tree is bounded by $k$, as the bounded search tree method terminates when $k$ becomes zero. Thus the algorithm searches over $O((k^2)^k)$ sub-problems, and the overall running time of the algorithm is $O^*(k^{2k})$.
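The branching strategy described above can be sketched as a short recursive function. This is a minimal illustration rather than the authors' code: the instance is assumed to be given as a mapping from each segment $s$ to its vertex set $V_s$, and the name guard_segments is hypothetical.

```python
from typing import Dict, Hashable, Optional, Set


def guard_segments(vertices_on: Dict[Hashable, Set[Hashable]],
                   k: int) -> Optional[Set[Hashable]]:
    """Bounded search tree for GSS.

    `vertices_on` maps each unguarded segment s to V_s, the vertices on it.
    Returns a set of at most k vertices guarding every segment, or None if
    no such set exists.
    """
    if not vertices_on:
        return set()                    # every segment is already guarded
    if k == 0:
        return None                     # segments remain but budget is spent
    s = next(iter(vertices_on))         # pick an arbitrary unguarded segment
    for v in vertices_on[s]:            # branch over the vertices on s
        # Choosing v guards all segments through v (the set S_v); drop them.
        remaining = {t: vs for t, vs in vertices_on.items() if v not in vs}
        solution = guard_segments(remaining, k - 1)
        if solution is not None:
            return solution | {v}
    return None                         # no branch succeeded: backtrack


if __name__ == "__main__":
    # A "#"-shaped arrangement: two horizontal and two vertical segments.
    grid = {"h1": {"a", "b"}, "h2": {"c", "d"},
            "v1": {"a", "c"}, "v2": {"b", "d"}}
    print(guard_segments(grid, 1), guard_segments(grid, 2))
```

Each level of the recursion decrements $k$, so the depth is at most $k$, and the loop over $V_s$ gives the branching factor discussed in the time complexity analysis.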
29.3.3 GSS Parameterized by Face Density of the Arrangement
The efficiency of the FPT algorithm for GSS given in Sect. 29.3.2 depends on the number of vertices on the segment selected for branching. In this section, we give an upper bound on the number of intersection points (vertices) on a segment in a kernel as a function of $f$, the number of faces in the planar embedding of the arrangement. We show that, in an arrangement formed by $n$ segments, we can always find a segment with at most $(2f - 4)/n + 2$ vertices. We define this value as a structural parameter called the face density of the arrangement, which gives a better upper limit on the branching factor of the FPT algorithm in Sect. 29.3.2.
Lemma 3 If the number of faces in the planar embedding of an arrangement $A$ of $n$ segments is $f$, then there exists a segment $s_i \in S$ with at most $(2f - 4)/n + 2$ vertices.
M. Rema et al.
Proof Consider an arrangement $A$ with $n$ segments. Let the number of vertices on a segment $s_i \in S$ be $j_i$. That is, the segment $s_i$ is divided into $j_i - 1$ edges in the planar embedding of $A$. To count the edges in the planar embedding, we sum over all the segments in $S$:
\[ E = \sum_{i=1}^{n} (j_i - 1) = \sum_{i=1}^{n} j_i - n \tag{29.1} \]
The sum $\sum_{i=1}^{n} j_i$ counts the vertices over all the segments in $S$. In this summation, each vertex is counted at least twice, as each vertex is shared by at least two segments. To compensate for the over-counting, we divide by two and obtain an upper limit on the total number of vertices in the planar embedding:
\[ V \le \frac{1}{2} \sum_{i=1}^{n} j_i \tag{29.2} \]
Substituting the values of $E$ and $V$ in Euler's formula, $E + 2 = V + f$, we get
\[ \sum_{i=1}^{n} j_i - n + 2 \le \frac{1}{2} \sum_{i=1}^{n} j_i + f \]
\[ \frac{1}{2} \sum_{i=1}^{n} j_i \le f + n - 2 \tag{29.3} \]
Now, let $d$ be the minimum number of vertices on any of the segments in $S$, i.e., $d = \min_{s \in S} j_s$. Then
\[ nd \le \sum_{i=1}^{n} j_i, \qquad \frac{1}{2} nd \le \frac{1}{2} \sum_{i=1}^{n} j_i \]
From Eq. 29.3,
\[ \frac{1}{2} nd \le f + n - 2, \qquad d \le (2f - 4)/n + 2 \tag{29.4} \]
□
FPT Algorithm with Parameter $d$: Lemma 3 ensures that we can always find a segment with at most $(2f - 4)/n + 2$ vertices. We define this bound as a new parameter, the face density $d$. In the FPT algorithm discussed in Sect. 29.3.2, instead of selecting an arbitrary segment in each recursive call, we can select a segment with at most $d$ vertices to perform branching. This brings down the maximum number of branches in the algorithm from $k^2$ to $d$, resulting in a time complexity of $O^*(d^k)$.
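As a sanity check of the bound in Lemma 3, consider a small worked example. The arrangement below (two horizontal and two vertical segments crossing in a "#" shape) is an illustration chosen here, not an instance taken from the chapter, and only intersection points are counted as vertices, as in the proof.

```latex
% "#"-shaped arrangement: n = 4 segments, each crossing two others,
% so j_i = 2 for every segment and there are V = 4 vertices.
\begin{align*}
  E &= \sum_{i=1}^{4} (j_i - 1) = 4(2 - 1) = 4
      && \text{(Eq. 29.1)} \\
  V &= 4 \le \tfrac{1}{2} \sum_{i=1}^{4} j_i = \tfrac{1}{2} \cdot 8 = 4
      && \text{(Eq. 29.2, with equality)} \\
  f &= E + 2 - V = 4 + 2 - 4 = 2
      && \text{(Euler's formula)} \\
  d &= \min_{s \in S} j_s = 2 \le \frac{2f - 4}{n} + 2
     = \frac{2 \cdot 2 - 4}{4} + 2 = 2
      && \text{(Eq. 29.4)}
\end{align*}
```

In this example the bound is tight: every segment has exactly two vertices, so branching on any segment explores at most $d = 2$ choices per level.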
29.4 Conclusion
Given an arrangement, defined as the geometric structure induced by given line segments in a 2D plane, we analyzed the complexity of two segment covering problems: Cell Cover for Segments (CCS) and Guarding a Set of Segments (GSS). The NP-completeness of CCS is proved by giving a reduction from the 3-connected planar face cover problem to CCS. We have shown that the GSS problem is fixed-parameter tractable by coming up with a kernel of $O(k^2)$ segments, and provided an $O^*(k^{2k})$ FPT algorithm for solving GSS. We also defined a structural parameter called the face density ($d$) of the arrangement, and gave an $O^*(d^k)$ FPT algorithm for the GSS problem.
Chapter 30
User Story-Based Automatic Keyword Extraction Using Algorithms and Analysis
Arantla Jaagruthi, Mallu Varshitha, Karumuru Sai Vinaya, Vayigandla Neelesh Gupta, C. Arunkumar, and B. A. Sabarish
Abstract Writing effective requirement specification documents in the face of dynamic user needs is a significant challenge in modern software development. Keyword extraction algorithms can help to retrieve relevant information from functional requirements provided by users. The process of identifying a concise set of words that capture the essence of user stories without losing important information is accomplished through automatic keyword extraction. This paper presents a comparative study of four popular algorithms for keyword extraction: Rapid Automatic Keyword Extraction (RAKE), Yet Another Keyword Extraction (YAKE), TextRank, and KeyBERT. The algorithms are analyzed based on their ability to extract keywords and provide corresponding scores. Additionally, N-gram analysis is performed using the extracted keywords. The study concludes that the RAKE algorithm exhibits better performance in extracting relevant keywords from user stories compared to the other algorithms.
A. Jaagruthi · M. Varshitha · K. S. Vinaya · V. N. Gupta · C. Arunkumar (B) · B. A. Sabarish
Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Coimbatore, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_30
30.1 Introduction
The constantly evolving software environment and technologies have a significant impact on the requirements process, making the selection of an appropriate requirements engineering model increasingly complex. The role of requirements engineering is to bridge the gap between software design and construction by collecting software requirements from customers and then understanding, evaluating, and documenting them. Proper selection of requirements engineering methods and strategies is crucial for successful software development, as poor choices can lead to failure. Developing high-quality software effectively requires an efficient requirements engineering process.

Quality Function Deployment (QFD) is a structured approach that translates customer needs into technical software requirements, ensuring that the user's needs are met. The "voice of the user" represents both stated and unstated user requirements, which can be recorded in different ways, including face-to-face discussions, surveys, user parameters, observation, guarantee data, and field reports. These user needs are then summarized in a "house of quality." Poorly defined requirements can lead to missed deadlines and additional costs, ultimately resulting in software failure.

During requirements elicitation in software engineering, a data analyst works with users and stakeholders to analyze and validate user requirements and assumptions, as well as project risks. The discussion concludes when stakeholders have no new use cases, or when new use cases have a low priority and can be implemented in subsequent iterations. User stories allow individual keywords or groups of keywords to be estimated by analyzing what the user is trying to convey.

Keyword extraction plays a pivotal role in data analysis. The initial task is to extract precise words or groups of keywords that emphasize the crucial content of a document. The fundamental operations related to keyword extraction in data mining include automatic filtration and indexing, automatic word summarization, data visualization, concept identification, and continuous integration. Statistical methods are a crucial approach because they recognize keywords from statistical data without requiring any training data. Prevalent statistical techniques include co-occurrence and co-allocation of words, graph-based models, and lexical and syntactic analysis [1].
30.2 Literature Survey
One paper explores keyword extraction using a graph model with three features: semantic space, word location, and co-occurrence. The first step is to download the user's micro-blog data through the API, followed by data cleansing, word segmentation, part-of-speech tagging, and removal of redundant words. Next, a graph model is created to identify keywords based on their co-presence and numerical order [2].

A review presents various approaches, techniques, and procedures for keyword extraction, including graph-based techniques that rely on node extraction. It also includes Croatian language extraction and a selectivity-based keyword extraction method for text summarization that can handle texts of different lengths, multiple languages, and other datasets [3].

Another work focuses on different types of graphs and considers vertex and edge representations. It covers many techniques and methods for keyword extraction, including degree, out-degree, proximity, and selectivity-based approaches to represent text vertices and edges [4].

A further study applies two techniques, entropic and clustering, to the problem of statistical keyword extraction from text. A new strategy is proposed to find keywords based on user requirements. Both strategies are effective for long texts; the study focuses on short texts such as articles, and the results indicate that clustering is more precise than the entropic approach. The main objective is to identify and rank the most important words in the text [5].

A novel approach to word ranking uses Shannon's entropy to differentiate between internal and external modes. This method performs well when dealing with a single document with no prior knowledge, and it sorts extracted words based on their entropy differences. The concept of inside and outside refers to related words being grouped together [6].

A concept extraction method is proposed that extracts both single and multiple terms from text in three languages. The results are evaluated using accuracy and recall measures, and the article proposes specificity metrics for single-word and multi-word phrases that can be used for evaluating other languages. The concept of N-grams and unigrams is also discussed [7].

Another study focuses on Chinese keyword extraction methods and proposes an enhanced term frequency technique that incorporates Chinese elements into the TF method. A support vector classifier is also created, resulting in a significant increase in accuracy and recall rates. The study also examines the grammar model for four improvement strategies: noun, modifier, noun phrase, and verb phrase [8].

An unsupervised learning method, keyphrase frequency analysis, is proposed for feature extraction of keywords in a document. This method involves data acquisition, preprocessing, and visualization of graphs [9].

Conditional random fields (CRFs) are proposed as a more successful approach for sentence segmentation and labeling, as well as for extracting keywords from documents. The process involves feature extraction and preprocessing, training the CRF model, extracting keywords and labeling with CRF, and evaluating the results. Programs like POS-tag and CRF++ are employed to extract keywords from documents [10].

Extracting keywords from patent text through unsupervised learning has become a challenging task. One paper proposes an improved version of the TextRank model that considers two points: utilizing the model for each patent text to construct a network, and incorporating prior public knowledge from a dictionary data network with edges representing all relationships between nodes. This approach leads to an increase in the rank of patent text for each node [11].

Automatic keyword extraction has numerous applications in natural language processing, including text summarization. One paper presents a post-processing method to enhance the performance of automatic keyword extraction techniques. The approach uses part-of-speech tagging to support the extraction of keywords [12].

The keyphrase extraction technique referred to as EA relies on machine learning based on the Bayes decision rule. The technique utilizes TF-IDF scores to separate keyphrases from non-keyphrases. Nouns are categorized based on their occurrence in a document, their composition, and their domain-specificity, which can be determined from a database of domain-specific keyphrases [13].

Another paper analyzes different supervised and unsupervised keyword extraction algorithms, including the keyphrase extraction algorithm, YAKE, TopicRank, and MultipartiteRank, on different articles. Extracted words are used to calculate similarity between articles through cosine and Jaccard techniques [14].

The recurrence of words has been identified as an essential trait, and to decrease the word co-occurrence threshold, probabilistic measures such as KWNA, EPKLN, and EPKRN have been used. AAS, which is a method for comparing words to popular terms, uses phrases that identify the number of keywords. In that paper, 8 keywords were extracted from each page for the major set of keywords, and some were deleted using the named post-processing approach. Additionally, 800 datasets were employed [15].

One research model uses two procedures: preprocessing and post-processing. The model creates relationships for selected frequent words according to the document set and expands them to nodes. Nodes represent the rest of the words, and links between headings and nodes are based on semantic meaning [16].

Another article contributes a document-centric approach to the automatic labeling of research articles. Auto-tagging is the process of classification and tag selection. The classification process involves automatic keyword extraction using the RAKE algorithm, which uses a keyword scoring matrix. Top keyphrases are dynamically adjoined to training records for future reference [17].

Text extraction is a crucial step in analyzing journal articles, which are commonly presented in PDF format and organized into sections such as Introduction, Methodology, Experimental Design, Results, and Analysis. Partition extraction is significant in identifying a representative sample of data that embodies knowledge of the entire set [18].

A document's features, and methods for comparing documents and generating scores, are described in [19]. Information extraction techniques are used to compare and score documents against templates. Keyword extraction algorithms are utilized to tag documents with suitable categories for improved accuracy in project document searching [19].

Keywords and keyphrases provide a brief and compact text that sets the research scope. One article presents a comparative study of offline keyword extraction algorithms, including PositionRank, TextRank, and RAKE, considering the position of each word in the document [20].
30.3 Problem Statement
Let the list of requirements for a project be specified as a set $R = \{R_1, R_2, \ldots, R_n\}$. For each requirement $R_i$, a set of keywords $K_i = \{K_1, K_2, \ldots, K_n\}$ is extracted, and each keyword is associated with a score, giving the score set $S_i = \{S_1, S_2, \ldots, S_n\}$. This paper compares the performance of keyword extraction algorithms in terms of time complexity and N-gram analysis to identify the effectiveness of the extracted keywords for writing effective requirements specifications.
30.4 Methodology
As shown in Fig. 30.1, user stories written to collect customer requirements are the input for the automatic extraction of salient keywords. This paper presents an analysis of four keyword extraction algorithms, Rapid Automatic Keyword Extraction (RAKE), Yet Another Keyword Extraction (YAKE), TextRank, and KeyBERT, together with the scores they assign. N-gram analysis is then performed on the extracted keywords, and conclusions are drawn from the results.
Fig. 30.1 Process workflow
FUNCTION RAKE(text)
DO
    rake.extract_keywords_from_text(text)    // keyword extraction
    MyList ← rake.get_ranked_phrases()
    res ← {}    // extracted keyphrases stored in result
    FOR i IN MyList DO
        res[i] ← MyList.count(i)
    THEN OUTPUT(rake.get_ranked_phrases_with_scores(), res)    // printing keyphrases with scores
END FUNCTION
Fig. 30.2 RAKE pseudocode
30.5 RAKE
The Rapid Automatic Keyword Extraction (RAKE) algorithm is a text mining technique used to identify and extract important words and phrases from a piece of text. It first splits the text into individual words and identifies candidate keywords based on their co-occurrence with other words in the text. A score is then calculated for each candidate keyword based on its frequency. Candidates with high scores are selected as the final list of extracted keywords. The pseudocode of this algorithm is shown in Fig. 30.2.
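The steps described above can be sketched in a few lines of self-contained Python. This is a minimal illustration and not the implementation used in the study: it assumes the scoring of the original RAKE formulation (word score = co-occurrence degree divided by frequency, phrase score = sum of its word scores), and the tiny `STOP_WORDS` set is a hypothetical stand-in for a full stop-word list.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real RAKE uses a full list).
STOP_WORDS = {"a", "an", "and", "as", "be", "can", "i", "it", "me", "on",
              "so", "that", "the", "to", "up", "want", "will"}


def rake(text):
    """Return (score, phrase) pairs, highest score first."""
    phrases = []
    # Split on punctuation first, then cut candidate phrases at stop words.
    for fragment in re.split(r"[.,;:!?\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9]+", fragment):
            if word in STOP_WORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    freq = defaultdict(int)    # how often each word occurs
    degree = defaultdict(int)  # co-occurrence degree, counting the word itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(((s, p) for p, s in scores.items()), reverse=True)


story = ("As an unregistered user, I want to fill up the username form "
         "so that I can uniquely identify a computer")
for score, phrase in rake(story):
    print(f"{score:.1f}  {phrase}")
```

With this scoring, multi-word candidates such as "unregistered user" rank above isolated words, which is consistent with RAKE favouring keyphrases over single terms.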
30.6 YAKE
YAKE is an unsupervised automatic keyword extraction method that determines the most relevant keywords in a text from statistical features extracted from the individual document. The system does not need to be trained on a specific set of documents and is independent of dictionaries, text size, domain, and language. YAKE defines a set of five keyword features that are heuristically combined to give each keyword a single score. The lower the score, the more important the keyword. The pseudocode of this algorithm is shown in Fig. 30.3.
FUNCTION YAKE(text, num_keywords)
DO
    words ← tokenizeIntoWords(text)    // tokenization of words
    term_freq ← calculateTermFrequency(words)
    background_freq ← calculateBackgroundFrequency(words)
    scores ← {}
    FOR word, tf IN term_freq.items DO
        scores[word] ← (tf / background_freq[word]) * log(len(words) / tf)    // calculating score using frequency and length
    THEN DO
        sorted_keywords ← sortKeywordsByScore(scores)    // sorting keywords
        RETURN sorted_keywords[:num_keywords]
Fig. 30.3 YAKE pseudocode
FUNCTION textRank(text)
DO
    sentences ← tokenizeIntoSentences(text)    // tokenization of sentences
    graph ← createGraph(sentences)
    scores ← initializeScores(sentences)
THEN REPEAT
    scores ← calculateScores(graph, scores)
    IF converged(scores) THEN BREAK
sorted_sentences ← sortSentencesByScore(scores)    // sorting keywords with decreasing order of frequency
RETURN generateSummary(sorted_sentences)
Fig. 30.4 TextRank pseudocode
30.7 TextRank
TextRank is an unsupervised graph-based model in which each node represents a word and the edges represent relationships between words, formed by determining the co-occurrence of words in a moving window of a given size. The algorithm is inspired by PageRank, which Google uses to rank websites. First, it tokenizes the text and annotates it with parts of speech. Only single words are considered as nodes; N-grams are not used at this stage, and multi-word keywords are restored later. An edge is created when two lexical items occur within a window of N words, giving an unweighted undirected graph. A ranking algorithm is then run on the graph to rank the words. Finally, the most important words are selected, and selected words that are adjacent in the text are joined into multi-word keywords. The pseudocode of this algorithm is shown in Fig. 30.4.
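The word-graph variant described in this section can be sketched as follows. This is a simplified illustration under stated assumptions: part-of-speech filtering and the final merging of adjacent words are omitted, the function name `textrank_keywords` and its parameters are hypothetical, and a fixed number of PageRank-style iterations replaces a convergence test.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=2, damping=0.85, iterations=50, stop_words=()):
    """Rank words with PageRank over an unweighted, undirected co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]
    neighbours = defaultdict(set)
    # Connect every pair of distinct words that fall inside the same window.
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        # Synchronous update: the new scores are built from the previous ones.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


story = "Username should have letters and numbers. Username should not have any special character."
for word, score in textrank_keywords(story, stop_words={"and", "any", "not"})[:3]:
    print(f"{word}: {score:.3f}")
```

A window of 2 links only directly adjacent words; a larger window produces a denser graph and usually flatter scores.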
30.8 KeyBERT
KeyBERT is a simple and easy-to-use keyword extraction algorithm that leverages SBERT embeddings to find the keywords and phrases that are most similar to the document. First, a document embedding is created using the sentence-BERT model. Next, word embeddings for N-gram phrases are extracted. The similarity of each keyphrase to the document is measured using cosine similarity. The most similar words can then be identified as the words that best describe the entire document and are considered keywords. The pseudocode of this algorithm is shown in Fig. 30.5.
FUNCTION KeyBERT(text, num_keywords, pre-trained_model)
DO
    sentences ← tokenizeIntoSentences(text)    // tokenization of sentences
    encoded_sentences ← encodeSentences(sentences, pre-trained_model)
    scores ← calculateScores(encoded_sentences)    // calculating score
    sorted_sentences ← sortSentencesByScore(scores)
    keywords ← extractKeywords(sorted_sentences[:num_keywords])
    RETURN keywords
Fig. 30.5 KeyBERT pseudocode
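The ranking step of KeyBERT, in which candidate phrases are compared with the document by cosine similarity, can be illustrated without a language model. In the sketch below, `embed` is a deliberately crude stand-in (a bag-of-words count vector) and is not a sentence-BERT embedding; the function names and the candidate list are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT a sentence-BERT embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_candidates(document, candidates, top_n=3):
    """Rank candidate phrases by cosine similarity to the whole document."""
    doc_vec = embed(document)
    scored = [(cosine(embed(c), doc_vec), c) for c in candidates]
    return sorted(scored, reverse=True)[:top_n]


document = "Username should have letters and numbers. Username should not have any special character."
for score, phrase in rank_candidates(document, ["username", "special character", "password"]):
    print(f"{score:.3f}  {phrase}")
```

Replacing `embed` with a real sentence-embedding model would turn this lexical overlap measure into the semantic similarity that the section describes.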
30.9 Results and Discussion
Keywords, and the number of times each extracted keyword is repeated, are obtained from sample user story I (Fig. 30.7). A graph of keywords against repetitions is plotted for each algorithm, as shown in Fig. 30.6. The visualization shows that RAKE extracts the largest number of keywords, with the highest repetitions. "Unregistered user", "uniquely identify", and "username" are the RAKE keywords that show maximum repetition. The sample user story used for keyword extraction is shown in Fig. 30.7.

Fig. 30.6 Keywords versus no. of repetitions

As an unregistered user, I want to fill up the username form so that I can uniquely identify a computer
Scenario: Filling in username
GIVEN as an unregistered user
WHEN I select option
AND Username should have letters and numbers
AND Username should be between 8 and 16 characters
AND Username should not have more than 7 numbers
AND Username should not have any special character
THEN It will uniquely identify me on a computer

Fig. 30.7 Sample user story I

User stories are divided into sentences, and these sentences become the nodes of the graph. Sentences that share similar words are connected by an edge, and each edge carries a cost or weight: the greater the similarity between two sentences, the greater the weight. Taking the sample user story of Fig. 30.7 as input, the graph in Fig. 30.8 is drawn using the TextRank algorithm.

Fig. 30.8 TextRank graph

N-grams are contiguous sequences of words: a single word is a unigram, a combination of two words is a bigram, and a sequence of more than two words is a multigram. N varies with the input user story. N-gram analysis is performed on the user stories rather than relying solely on single words to provide relevant information. Figure 30.9 shows the N-grams obtained from RAKE, YAKE, and TextRank for the sample user story, together with their number of repetitions. N-gram analysis is carried out on the extracted keywords to assess their usefulness, and N-grams are plotted against the number of repetitions. YAKE extracts more N-grams than the other algorithms.
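Counting N-gram repetitions, as used for the plots discussed here, can be sketched as follows. The function name `ngram_counts` is an assumption, and for simplicity the sketch lets N-grams run across sentence boundaries, which a fuller analysis might prevent.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count how often each run of n consecutive words occurs in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


story = ("Username should have letters and numbers. "
         "Username should be between 8 and 16 characters. "
         "Username should not have any special character.")
print(ngram_counts(story, 1).most_common(3))  # most repeated unigrams
print(ngram_counts(story, 2).most_common(1))  # most repeated bigram
```

The same function can be applied to the keyphrases returned by each extractor instead of the raw story, so that repetitions are compared on extracted keywords only.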
30.10 Time Complexity Analysis
Figure 30.10 shows the time taken to extract keywords from a single user story (Fig. 30.11) using RAKE, YAKE, TextRank, and KeyBERT. RAKE has the lowest execution time, measured in seconds, for producing keywords with scores. KeyBERT takes twice the time taken by RAKE, whereas the other algorithms take approximately 20% more time than RAKE (Fig. 30.12).

Execution time is also analyzed for a varied set of user stories using RAKE, YAKE, TextRank, and KeyBERT. Across the 10 user stories, RAKE and TextRank show the lowest execution time in seconds compared to the other algorithms. RAKE also performs better in terms of accuracy: it produces high scores while taking less time for keyword extraction.
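Elapsed time per extractor can be measured along the following lines. This is a sketch of one reasonable measurement setup, not the procedure used in the study: the helper `time_extractor`, the number of repeats, and the placeholder extractor `split_words` are all assumptions.

```python
import time


def time_extractor(extract, text, repeats=5):
    """Return the best wall-clock time in seconds over several runs of extract(text)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        extract(text)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical stand-in extractor; swap in a RAKE, YAKE, TextRank, or KeyBERT wrapper.
def split_words(text):
    return text.lower().split()


elapsed = time_extractor(split_words, "As an unregistered user, I want to give email id")
print(f"{elapsed:.6f} s")
```

Taking the minimum over several runs reduces the influence of one-off delays such as model loading or caching, which matters when comparing a lightweight method with an embedding-based one.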
Fig. 30.9 N-grams versus number of repetitions
Fig. 30.10 User story versus time calculated
As an unregistered user, I want to give email id, so that help to get confirmation email with a unique link
Scenario: Confirmation email with link
GIVEN unregistered user
WHEN I select option
AND I enter valid email id
THEN I receive a link for verification
GIVEN the user receives the link via the email
WHEN the user navigates through the link received in the email
THEN the system enables to set a new password
Fig. 30.11 Sample user story II
Fig. 30.12 User stories versus time calculated
30.11 Conclusion and Future Work
Analyzing user stories is essential for identifying relevant keywords. Visualizing graphs of extracted N-grams and their repetitions, along with time complexity and scores, can help to determine a preferable algorithm. In this study, RAKE extracted the most keywords, and the graph of N-grams versus time calculated showed that it required the minimum time. KeyBERT extracted more keywords than RAKE but took the maximum time, while YAKE and TextRank extracted N-grams, but their calculated time was not optimal compared to RAKE. The purpose of this study is to compare keyword extraction across different algorithms and conclude which algorithm performs better in extraction and timing. Ultimately, RAKE is preferred for keyword extraction from user stories.
References
1. Zhang, C., Wang, H., Liu, Y., Wu, D., Liao, Y., Wang, B.: Automatic keyword extraction from documents using conditional random fields. J. Comput. Inf. Syst. 3 (2008)
2. Zhao, H., Zeng, Q.: Micro-blog keyword extraction method based on graph model and semantic space. J. Multimed. 8(5), 611–617 (2013)
3. Beliga, S.: Keyword Extraction: A Review of Methods and Approaches. University of Rijeka, Department of Informatics, Rijeka (2014)
4. Beliga, S., Meštrović, A., Martinčić-Ipšić, S.: An overview of graph-based keyword extraction methods and approaches. J. Inf. Organ. Sci. 39(1), 1–20 (2015)
5. Carretero-Campos, C., Bernaola-Galván, P., Coronado, A.V., Carpena, P.: Improving statistical keyword detection in short texts: entropic and clustering approaches. Phys. A 392(6), 1481–1492 (2013)
6. Yang, Z., Lei, J., Fan, K., Lai, Y.: Keyword extraction by entropy difference between the intrinsic and extrinsic mode. Phys. A 392(19), 4523–4531 (2013)
7. Ventura, J., Silva, J.: Mining concepts from texts. Procedia Comput. Sci. 9, 27–36 (2020)
8. Hong, B., Zhen, D.: An extended keyword extraction method. Phys. Procedia 24, 1120–1127 (2012)
9. Miah, M.B.A., Awang, S., Azad, M.S.: Region-based distance analysis of keyphrases: a new unsupervised method for extracting keyphrase feature from articles. In: 2021 International Conference on Software Engineering & Computer Systems and 4th International Conference on Computational Science and Information Management (ICSECS-ICOCSIM), pp. 124–129. IEEE (2021)
10. Hulth, A.: Improved Automatic Keyword Extraction Given More Linguistic Knowledge
11. Huang, Z., Xie, Z.: A patent keywords extraction method using TextRank model with prior public knowledge. Complex Intell. Syst. 8, 1–12 (2022). https://doi.org/10.1007/s40747-021-00343-8
12. Merrouni, Z.A., Frikh, B., Ouhbi, B.: Automatic keyphrase extraction: a survey and trends. J. Intell. Inf. Syst. 54(2), 391–424 (2020)
13. Dutta, A.: A novel extension for automatic keyword extraction. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 6(5) (May 2016)
14. Sarwar, T.B., Noor, N.M., Saef Ullah Miah, M.: Evaluating keyphrase extraction algorithms for finding similar news articles using lexical similarity calculation and semantic relatedness measurement by word embedding. PeerJ Comput. Sci. 8, e1024 (2022). https://doi.org/10.7717/peerj-cs.1024
15. Kian, H.H., Zahedi, M.: Improving precision in automatic keyword extraction using attention attractive strings. Arab. J. Sci. Eng.
16. Hasan, M., Sanyal, F., Chaki, D., Ali, H.: An empirical study of important keywords extraction techniques from documents. IEEE (2017)
17. Thushara, M.G., Krishnapriya, M.S., Nair, S.S.: A model for auto-tagging of research papers based on keyphrase extraction methods. In: 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE, Udupi, India (2017)
18. Jayaram, K., Sangeeta, K.: A review: information extraction techniques from research papers. In: IEEE International Conference on Innovative Mechanisms for Industry Applications, ICIMIA 2017 - Proceedings, pp. 56–59 (2017)
19. Thushara, M.G., Dominic, N.: A template based checking and automated tagging algorithm for project documents. In: Second International Conference on Computing Paradigms (International Journal of Control Theory and Applications), vol. 9, no. 10, pp. 4537–4544 (2016)
20. Thushara, M.G., Mounika, T., Mangamuru, R.: A comparative study on different keyword extraction algorithms. In: Proceedings of the Third International Conference on Computing Methodologies and Communication (ICCMC 2019)
Chapter 31
Prediction of Sepsis Disease Using Random Search to Optimize Hyperparameter Tuning Based on Lazy Predict Model
E. Laxmi Lydia, Sara A. Althubiti, C. S. S. Anupama, and Kollati Vijaya Kumar
Abstract Sepsis is a severe infection-related host response that is linked with high mortality, morbidity, and healthcare expenditures. It must be treated quickly, since each hour of delay increases the risk of death owing to irreparable organ damage. Yet, despite decades of clinical study, there are no reliable biomarkers for sepsis. As a result, early detection of sepsis using the abundance of high-resolution intensive care data remains a difficult task. Some machine learning (ML) based models could cut death rates, although their accuracy is not always reliable. This research offers a lazy predict (LP) model of ML algorithms for identifying and forecasting sepsis in intensive care unit (ICU) patients. The LP model is a Python package for semi-automating ML tasks. It generates a large number of basic models with little code and helps determine which models perform best without any parameter tuning. This study evaluates several models, namely the XGB classifier, LGBM classifier, extra tree classifier, random forest classifier, bagging classifier, and decision tree classifier, which are based on vital signs and clinical laboratory results and are simulated using information taken from an intensive care unit patient database. After all the models are evaluated, the XGB classifier attains the highest accuracy, 0.98, making it the best fit among the ML models compared with the LP library. Moreover, hyperparameter tuning based on random search is applied to the XGB classifier used to train the model. To overcome the classification challenge, this research work introduces a Lazy Classifier. The best score across all searched parameters, using random search to optimize hyperparameter tuning, reaches 0.99 for lazy predict with the XGB algorithm.

E. L. Lydia (B) Department of Computer Science and Engineering, Vignan's Institute of Information Technology, Visakhapatnam 530049, India
S. A. Althubiti Department of Computer Science, College of Computer and Information Sciences, Majmaah University, Al-Majmaah 11952, Saudi Arabia
C. S. S. Anupama Department of Electronics and Instrumentation Engineering, V.R. Siddhartha Engineering College, Vijayawada 520007, India
K. V. Kumar Department of Computer Science and Engineering, GITAM School of Technology, Vishakhapatnam Campus, GITAM (Deemed to be a University), Visakhapatnam, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_31
31.1 Introduction Sepsis is a potentially fatal infection-related clinical disease caused by an abnormal host reaction [1]. Sepsis, as well as the inflammatory response which follows, can result in death and multiple organ failure syndrome. Sepsis is assessed to be prevailing in adult hospital admissions to the tune of 6% of [2] and around 1/3 of ICU patients [3]. Every year, it distresses around 49 million individuals throughout the world [4]. Sepsis is the most prevalent complication among adult inpatients who is released or died at Wuhan Pulmonary Hospital (Wuhan, China) and Jinyintan Hospital (as of 31 January 2020) during the pandemic time of coronavirus disease 2019 [5]. From sepsis to septic shock, there is a spectrum of severity. Although estimates vary widely and are based on research populations, mortality is thought to be a minimum of 10%, and while septic shock is existent, it is estimated to be 40%. In spite of several years of study, sepsis continues to be a primary reason of death and sickness in contemporary critical care units across the world. The World Health Organization as well as the World Health Assembly proclaimed sepsis a worldwide well-being precedence in 2017, with the goal of promoting sepsis prevention, identification and treatment. Sepsis must be detected early and effectively managed particularly in ICUs, wherein the utmost critically ill individuals are cared for. Primary detection of sepsis is demonstrated to decrease treatment deferments, improve suitable upkeep and lower mortality [6–9]. According to a study of 17,000 patients, each hour of antibiotic administration delay raises the risk of mortality by a factor of three [10]. Despite the fact that sepsis is a potentially lethal illness, recommendations [11] agree that early and reasonably affordable care with antibiotics, source control, fluid resuscitation and sustenance for major organ function significantly enhanced patient outcomes. 
Because of its syndromic character and patient variability, early detection of sepsis can be challenging. The absence of precise blood- or plasma-based biomarkers makes early detection even more challenging. Hundreds of biomarkers have been explored as predictive indicators in sepsis [12–14], but none has shown adequate sensitivity or specificity to be used routinely in clinical practice. In this scenario, there is an unmet medical need to help clinicians recognize hospitalized individuals at risk of developing sepsis. Sepsis is now diagnosed by combining data obtained from clinical examinations done by healthcare providers with data from laboratory findings and monitoring devices (i.e. based on empirical clinical decision rules). This is a subjective
31 Prediction of Sepsis Disease Using Random Search to Optimize …
as well as laborious technique (i.e. greatly reliant on the doctor’s or nurse’s abilities and expertise). For patients suffering from sepsis, prompt action is crucial. However, with current manual methods, there is a danger of delayed diagnosis and treatment. Some deaths in emergency rooms occur because of human error; a ML-based model might help bridge the gap between patients and physicians. The framework uses ML to lower death rates through smart alerts and preventative approaches. Symptoms are sometimes difficult to notice, even for doctors. As a result, a framework is suggested that uses a LP model, which is simple and straightforward for anyone who is familiar with scikit-learn: create an instance of the estimator, in this case Lazy Classifier, and then use the fit method to fit it to the data. If predictions are requested when the Lazy Classifier instance is created, it returns the predictions of all the models for every observation. Without such a tool, the data must be fitted to each model in turn, metrics must be used to determine which model has the highest accuracy for the current dataset, and the optimal model must finally be picked. That manual procedure is laborious and may not be particularly successful. ML aids in the recognition and diagnosis of illnesses, the selection of antibiotics, the development of pharmaceuticals, the use of smart health monitors, the storage of large amounts of historical medical data and so on. To understand the pattern of a problem and determine the proper label, approaches to automated learning include supervised learning (regression and classification) and unsupervised learning (association and clustering). The suggested framework makes use of several widely used ML techniques in a hospital setting. In the current hospitalization system, automated learning lowers treatment costs, the workload of doctors and personnel, the duration of hospital stay, resource use and mortality rates, as well as other factors.
Based on the situations described above, this research offers a LP model of ML for the early detection and prediction of sepsis in critical care unit patients. The paper is organized as follows: Sect. 31.2 reviews the literature on existing approaches. Section 31.3 discusses the recommended approach of the LP model with ML algorithms. Section 31.4 discusses the experiments and their outcomes. Section 31.5 concludes the paper.
31.2 Literature Review
Singh et al. [15] offer a ML model for forecasting and identifying sepsis in ICU patients. To begin, missing data are filled in via an imputation procedure, and the model’s performance is improved by applying matrix factorization. The ensemble learning model is then built using several ML packages and models such as Logistic Regression (LR), XGBoost, Support Vector Machine (SVM), Naive Bayes (NB) and random forest (RF). The majority vote method is used to aggregate the findings of each classification model into a single final forecast. Kuo et al. created a technique for diagnosing sepsis early by simulating clinical settings using an artificial neural network (ANN) with intentionally preserved missing real-world data [16]. It is built with a small proportion of missing values and a higher
E. L. Lydia et al.
rate of missing and inaccurate information to allow forecasting in the presence of noisy, missing and faulty inputs, such as those seen in a clinical context. Zhang et al. recommended a deep learning method for predicting sepsis in the emergency room [17]. It uses a Long Short-Term Memory (LSTM) based model that captures irregular time intervals using temporal encodings, with the model remaining interpretable for real-world medical applications. Kok et al. demonstrated automated prediction of sepsis based on a temporal convolutional network [18]. It is reliable, with high accuracy, and has the potential to be used in hospitals to forecast sepsis. Chaudhary et al. use ML models to predict patient outcomes at different phases of sepsis [19]. The study investigates several ML models to see whether they can forecast the current phase of sepsis from available clinical information, such as vital signs in high-risk patients and clinical laboratory test outcomes. Mitra et al. described a sepsis prediction and ICU patient assessment technique based on required symptoms [20]. To diagnose sepsis, it uses a range of rule-based and machine learning models, and it reports the first neural network detection and prediction findings on three forms of sepsis. It makes use of the MIMIC-III (Medical Information Mart for Intensive Care) retrospective dataset, which is limited to ICU patients. Desautels et al. forecasted sepsis in the critical care unit using a ML technique with limited electronic health record data [21]. The MIMIC-III dataset, which is confined to ICU patients, is used to predict sepsis by integrating multivariable groups of easily obtainable patient data (Glasgow Coma Score, peripheral capillary oxygen saturation, vitals and age).
Several ensemble approaches for automated breast cancer detection, language function analysis, text classification, web page classification, text sentiment classification and text genre classification are described in papers [22–25]. Kim et al. [26] proposed a model in which the weights of all possible neural network connection nodes are shared. In the long term, weight sharing among the genetic algorithm (GA) genotypes reduces the search cost. The MIMIC-III dataset is used to undertake a predictive analysis with the main goal of anticipating sepsis onset 3 h before it occurs. The suggested model’s AUROC score is 13% greater than that of a simple model based on a LSTM-NN. Umut Kaya et al. [27] created a model for sepsis diagnostics that uses a multi-layered ANN. The Levenberg–Marquardt training method and the feed-forward back-propagation network topology are used to create the artificial neural network models. The model’s input and output variables are the parameters that doctors use to detect septic disease and measure the severity of sepsis. The approach provides an alternative prediction model for the early detection of sepsis. The model attempts to predict the possibility of contracting sepsis using data from critical care patients aged 18–65 in Istanbul. The 2017 sepsis criteria, together with an assessment of the methods and variables used by physicians, are used to define the ANN’s inputs and outputs for patients admitted to the critical care unit and diagnosed with sepsis. Physicians use these indicators to diagnose sepsis and evaluate its severity, and they serve as the simulated inputs for the early detection of sepsis. The training, test and accuracy figures for this model are all 99%.
Fleuren et al. [28] and Moor et al. [29] analysed previously published sepsis prediction algorithms and found that very few of them have been validated in clinical practice, and that those have only been assessed in the USA. To the best of our understanding, only one ICU algorithm is currently accessible for clinical use [30, 31], and others are being prospectively evaluated [32]. According to Ghias et al. [33], ML algorithms are capable of consistently predicting sepsis at the stage of patient admission to the ICU using six vital signs gathered from the data of patients above 18 years of age. On publicly accessible data, the XGBoost model achieved the highest scores in a comparison of ML models: precision of 0.97, accuracy of 0.98 and recall of 0.98. According to Raja et al. [34], sepsis may be predicted with greater accuracy (97.7%) using an enhanced random forest. In comparison with the existing method, their proposed system assists clinicians in swiftly diagnosing sepsis. According to Gunnarsdottir et al. [35], septicemia is a systemic immunological response to infection that affects millions of sick people each year in ICUs throughout the world. Their work describes a Generalized Linear Model (GLM) that predicts the probability of an ICU patient developing septicemia based on demographic and bedside data. Evaluations on demographic data achieve a classification accuracy of 62.5%. Lucas et al. [36] considered studies on septic shock, severe sepsis or sepsis in any hospital setting. The index test is any supervised ML model that predicts these conditions in real time. The most important contributors to model performance are identified by meta-analysing models with a reported Receiver Operating Characteristic (ROC) measure. The Synthetic Minority Oversampling Technique (SMOTE) models show the typical findings seen in ML prediction models where the proportion of sepsis patients is high and equivalent to that of non-sepsis cases [37].
The non-SMOTE models demonstrate the model’s efficacy in typical clinical settings with a low occurrence of sepsis. For the sake of simplicity, only the results for the SMOTE models, as described in previous research, are provided [38–40].
31.3 Proposed Methodology
The objective of this work is to predict sepsis during a patient’s admission to the ICU using the LP model of ML algorithms and to discover the best model for the prediction. The early identification and prediction of distinct phases of sepsis is framed as a classification problem in this paper. The idea is to continually update the expected likelihood of sepsis based on all patient data known up to that moment. To attain this aim, four steps are taken:
1. Tools utilized
2. Data cleansing
3. Data pre-processing
4. LP model of ML algorithms based on random search
31.3.1 Tools Used
Many open-source ML libraries, such as NumPy, scikit-learn, pandas and matplotlib, are used for clustering, classification, dimensionality reduction and regression. Scikit-learn is one of the most widely used packages for evaluating models and extracting significant features. A significantly imbalanced dataset can be difficult to deal with; hence, there is a package called imbalanced-learn that offers different resampling strategies for imbalanced datasets.
31.3.2 Data Cleaning
To create the models, a dataset of 1,524,294 patients’ Electronic Health Records is utilized [41]. The dataset utilized is quite small, with a significant number of missing values. It describes the demographic and physiological parameters of ICU patients. Demographic data such as gender, age, duration of stay in the ICU and hospital admission time are also included in the dataset. The data contains 44 variables, including 26 laboratory values (white blood cell count, bicarbonate and so on), eight vital signs (systolic blood pressure, temperature, heart rate and oxygen saturation, among others) and six demographics (ICULOS, age, gender, etc.). The last variable holds the sepsis label, 0 or 1, where 1 indicates that the patient has sepsis according to the Sepsis-3 criteria. According to the statistics, only 4061 people out of 280,400 have sepsis. Furthermore, numerous variables (16 out of 44) have a missing value rate greater than 80%, i.e. records are not available for attributes such as EtCO2, base excess, FiO2, HCO3, PaCO2, pH, AST, SaO2, Alkalinephos, BUN, Calcium, Chloride, Bilirubin direct, Creatinine, Lactate, Glucose, Phosphate, Magnesium, Potassium, Hct, TroponinI, PTT, Hgb, Fibrinogen, WBC and Platelets. Hence, the above-mentioned attributes have been removed in the data cleansing process. Because measurements are taken only when required, the vast bulk of vitals and other data are filled in with null.
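The column-removal rule described above can be sketched in a few lines. This is a minimal, library-free illustration and not the authors' code: the toy records, the column names HR, Temp and SepsisLabel, and the helper names `missing_rate` and `drop_sparse_columns` are assumptions made for the example; only the 80% threshold and the Lactate attribute come from the text.

```python
# Library-free sketch: drop columns whose missing-value rate exceeds a threshold.
# Records, column names and helper names are hypothetical; the 80% cut-off
# is the one quoted in the text.

def missing_rate(rows, column):
    """Fraction of rows in which `column` is None (i.e. not measured)."""
    return sum(1 for row in rows if row.get(column) is None) / len(rows)


def drop_sparse_columns(rows, threshold=0.8):
    """Return (cleaned_rows, dropped_columns) for a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    dropped = [c for c in columns if missing_rate(rows, c) > threshold]
    kept = [c for c in columns if c not in dropped]
    cleaned = [{c: row.get(c) for c in kept} for row in rows]
    return cleaned, dropped


# Toy hourly records: Lactate is missing in 5 of 6 rows (about 83%).
records = [
    {"HR": 88, "Temp": 37.1, "Lactate": None, "SepsisLabel": 0},
    {"HR": 92, "Temp": None, "Lactate": None, "SepsisLabel": 0},
    {"HR": 110, "Temp": 38.4, "Lactate": 2.9, "SepsisLabel": 1},
    {"HR": 75, "Temp": 36.8, "Lactate": None, "SepsisLabel": 0},
    {"HR": None, "Temp": 37.0, "Lactate": None, "SepsisLabel": 0},
    {"HR": 101, "Temp": 38.0, "Lactate": None, "SepsisLabel": 1},
]

cleaned_records, dropped_columns = drop_sparse_columns(records)
```

Columns with occasional gaps (HR and Temp here) are kept and left for the imputation step; only the column above the threshold is removed.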
31.3.3 Data Pre-processing
Pre-processing is the most crucial step in the data formatting and normalization process. To avoid misleading outcomes, the data should be thoroughly examined, so that correct data are available before a model is developed. Missing values are addressed in pre-processing by substituting numeric and discrete integer attributes with the attribute mean of all instances that have the same class label as the instance under consideration, while nominal attributes are substituted using the attribute mode. The features are rescaled so that they have the
Fig. 31.1 Recommended dataset based on randomizing the rows
Fig. 31.2 Representation of correlation heatmap
properties of a standard normal distribution, with a mean of zero and a standard deviation of one. The MinMaxScaler from scikit-learn is used for scaling; it applies Eq. (31.1), which maps each feature to the range [0, 1]:

$$X_s = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{31.1}$$
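As a small worked instance of Eq. (31.1), the sketch below rescales one feature column by hand. It is a plain-Python illustration rather than the scikit-learn implementation; the function name `min_max_scale` and the heart-rate values are assumptions chosen for the example.

```python
# Minimal sketch of Eq. (31.1): X_s = (X - X_min) / (X_max - X_min).
# The function name and sample values are hypothetical.

def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature carries no spread; map it to 0.0 in this sketch.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


heart_rate = [60.0, 75.0, 90.0, 120.0]
scaled = min_max_scale(heart_rate)  # [0.0, 0.25, 0.5, 1.0]
```

The minimum of the column maps to 0 and the maximum to 1, so features measured in different units become directly comparable.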
The dataset of 16 variables utilized to forecast sepsis is shown in Fig. 31.1. A correlation heatmap constructed from a few data points of the dataset is shown in Fig. 31.2.
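The class-conditional imputation described in Sect. 31.3.3 (mean for numeric attributes, mode for nominal ones, both computed over instances with the same class label) can be sketched as follows. The helper name `impute_by_class`, the column names and the toy rows are assumptions made for illustration; this is not the authors' pre-processing code.

```python
# Sketch of class-conditional imputation: a missing value is replaced by the
# mean (numeric) or mode (nominal) of the rows that share the same label.
# Helper name, column names and data are hypothetical.
from statistics import mean, mode


def impute_by_class(rows, column, label="SepsisLabel", nominal=False):
    """Return new rows with None in `column` filled from same-class rows."""
    filled = []
    for row in rows:
        if row[column] is None:
            # Observed values of this column among rows with the same label.
            # (An empty list here would raise StatisticsError.)
            same_class = [r[column] for r in rows
                          if r[label] == row[label] and r[column] is not None]
            value = mode(same_class) if nominal else mean(same_class)
            row = {**row, column: value}
        filled.append(row)
    return filled


rows = [
    {"HR": 80, "Gender": "F", "SepsisLabel": 0},
    {"HR": None, "Gender": "F", "SepsisLabel": 0},
    {"HR": 90, "Gender": None, "SepsisLabel": 0},
    {"HR": 120, "Gender": "M", "SepsisLabel": 1},
    {"HR": None, "Gender": "M", "SepsisLabel": 1},
]
rows = impute_by_class(rows, "HR")                    # numeric: class mean
rows = impute_by_class(rows, "Gender", nominal=True)  # nominal: class mode
```

Because the fill value depends on the class label, this scheme can only be applied where the label is known, i.e. on training data.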
31.3.4 Lazy Predict Model of ML Algorithms Based on Random Search
There are numerous traditional ways of detecting sepsis, such as laboratory tests, qSOFA scores and SIRS, but delayed diagnosis owing to ambiguous symptoms results
in a high death rate and increases hospital costs; hence, there is a requirement to forecast sepsis sooner than clinical reports. For this purpose, many ML approaches with high specificity and sensitivity may be used. The LP model is proposed in this study. It makes good predictions on a dataset by combining various ML models. A LP model functions by training many classification models on the same dataset and letting each classification model make its own predictions. One of the most challenging tasks is choosing the right model for the ML problem statement. Every machine learning model has advantages as well as disadvantages. Some models perform well on a given dataset, while others fail miserably. Conventionally, all of the libraries are imported first, the parameters are then modified, all of the models are compared and each model’s performance is verified against various targets. This procedure takes a long time to complete. The LP model has been presented as a solution to this problem. It is one of the greatest Python packages for semi-automating ML tasks. It generates a large number of basic models with little code and helps to determine which models work best without any parameter adjustment. Given a problem statement, all of the models need to be applied to the dataset to evaluate how well a basic model works. The term “basic model” refers to a model whose parameters have not been tuned. LP may be used to do this task directly in this research. The LP package rates the ML models that are most likely to be suitable. Both the Lazy Classifier and the Lazy Regressor are included in LP, allowing binary and continuous variables to be predicted, respectively. This work used hyperparameter tuning, fivefold cross-validation and parameter candidates to determine the most accurate parameters for a particular model, and RandomizedSearchCV to select the parameters with the best score [42].
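The idea of fitting many untuned baseline models on the same split and ranking them can be illustrated without any ML library. The sketch below is a stand-in for that idea only: it does not use the lazypredict API, and the two toy models, the helper names and the data are assumptions made for the example.

```python
# Library-free stand-in for "fit every baseline model, then rank them".
# This is NOT the lazypredict API; models, names and data are hypothetical.

def fit_majority(X_train, y_train):
    """Baseline that always predicts the most frequent training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def fit_threshold(X_train, y_train):
    """One-feature rule: predict 1 above the midpoint of the two class means."""
    pos = [x[0] for x, y in zip(X_train, y_train) if y == 1]
    neg = [x[0] for x, y in zip(X_train, y_train) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] > cut else 0


def accuracy(model, X, y):
    """Share of samples for which the model's prediction equals the label."""
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)


def leaderboard(models, X_train, y_train, X_test, y_test):
    """Fit each model on the training split and sort by test accuracy."""
    scores = {name: accuracy(fit(X_train, y_train), X_test, y_test)
              for name, fit in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


X_train = [[70], [80], [85], [110], [120], [75]]
y_train = [0, 0, 0, 1, 1, 0]
X_test = [[72], [115], [90], [125]]
y_test = [0, 1, 0, 1]

ranking = leaderboard({"majority": fit_majority, "threshold": fit_threshold},
                      X_train, y_train, X_test, y_test)
best_name, best_accuracy = ranking[0]
```

With the lazypredict package installed, the corresponding call is, to our understanding, `LazyClassifier().fit(X_train, X_test, y_train, y_test)`, which returns a table of per-model scores together with the predictions; the package documentation should be checked for the exact options.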
At each fold, the training dataset is split and tested with a new batch of data, and performance measures are defined to evaluate the models’ correctness. Figure 31.3 displays the experimental approach for the proposed strategy. The dataset loader is the first stage, followed by data imputation, data cleaning, data pre-processing and feature extraction; the preceding sections cover the specifics of all data processing and data visualization methodologies. Cross-validation is a method for evaluating how well a model works; it is always necessary to test the correctness of the model to ensure that it is successfully trained with the data and is free of overfitting and underfitting [43]. This validation step is performed immediately after the model has been trained with data. This study used k-fold cross-validation to cross-validate the training dataset. With k = 5, the dataset is divided into five folds, and 5 folds are fitted for each of the 5 candidates, totalling 25 fits. The procedure is repeated k times, with one of the k subsets acting as the test/validation set and the remaining k-1 subsets serving as the training set each time. The model’s scores on the folds are then averaged before the model is finalized. Following cross-validation, hyperparameter tuning is required, with the optimal parameters chosen for model tuning. Hyperparameters are crucial for generating good results with models. Hyperparameters are model-specific settings that must be “fixed” before the model can be trained and evaluated on data. Finding the proper hyperparameters is still a bit of a black art, and it now entails either a random search or
Fig. 31.3 Flow diagram for evaluating the performance of various models [diagram boxes: load sepsis dataset; input of missing data; data standardization; feature extraction; data partitioning; 5-fold cross-validation based on training and testing dataset (k-1 folds are used for training); lazy predict model with ML algorithms; hyper-tune the parameters; apply random search for best score; optimization to avoid overfitting; prediction of sepsis or healthy patient; performance evaluation of predictive model]
a grid search across Cartesian products of hyperparameter sets. Random search is used in this study to evaluate n uniformly random locations in the hyperparameter space and choose the one that produces the best results. It is a method for training a model in which random combinations of hyperparameters are chosen and employed. This method limits the number of hyperparameter combinations that are attempted. Unlike grid search, which attempts every possible combination, random search allows the number of models to train to be specified. Once the appropriate hyperparameters have been found using random search, the label is classified to predict which patients have sepsis and which do not. Figure 31.4 depicts the intended operation of the LP models. The following actions must be taken in order to develop the LP models.
Step 1. Load the CSV-formatted dataset.
Step 2. Import Python’s scikit-learn library.
Step 3. Using “from sklearn.model_selection import train_test_split” and “from sklearn.preprocessing import StandardScaler”, split the entire dataset into training and test records by passing arguments such as train_size, test_size and random_state to the train_test_split function.
Step 4. Select the classifier models by importing the following Python library: “from lazypredict.Supervised import LazyClassifier”.
Step 5. Import the search utilities: “from sklearn.model_selection import RandomizedSearchCV, cross_validate”.
Step 6. In the training records, x_train holds the independent variables, whereas y_train is the target variable.
Fig. 31.4 Proposed lazy predict model architecture [diagram boxes: use sepsis dataset; XGB classifier, LGBM classifier, random forest classifier, extra tree classifier, decision tree classifier and bagging classifier; lazy predict to find the best model; final prediction]
Step 7. Use tests on various parameters, cross-validation and hyperparameter tuning to increase the performance of the LP model.
Step 8. Using random search, evaluate n uniformly random locations in the hyperparameter space and choose the one that produces the highest performance.
Step 9. Calculate the performance metrics for the LP models. Repeat steps 7 and 8 until the LP model produces the best outcomes.
Step 10. Determine the best score across all parameters that have been searched.
Step 11. Sort the patients into two groups: those who will have sepsis and those who will not.
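Steps 7 to 10 above combine k-fold cross-validation with random search. The sketch below shows the mechanics under stated assumptions: a tiny one-dimensional nearest-neighbour classifier stands in for the real models, `n_neighbors` is the only hyperparameter, and all names and data are hypothetical. With 5 sampled candidates and 5 folds it performs the 25 fits mentioned in the text.

```python
# Library-free sketch of random search with k-fold cross-validation.
# The k-NN stand-in model, parameter grid and data are assumptions.
import random


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx); every sample is validated exactly once."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        yield indices[:start] + indices[start + size:], indices[start:start + size]
        start += size


def knn_predict(train_X, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest 1-D training points."""
    nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    votes = sum(train_y[i] for i in nearest[:n_neighbors])
    return 1 if 2 * votes > n_neighbors else 0


def cross_val_accuracy(X, y, n_neighbors, k=5):
    """Return (mean validation accuracy over k folds, number of fits)."""
    scores = []
    for train, val in k_fold_indices(len(X), k):
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        hits = sum(knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
                   for i in val)
        scores.append(hits / len(val))
    return sum(scores) / len(scores), len(scores)


def random_search(X, y, param_choices, n_iter=5, k=5, seed=0):
    """Sample n_iter random settings; keep the one with the best CV score."""
    rng = random.Random(seed)
    best_params, best_score, total_fits = None, -1.0, 0
    for _ in range(n_iter):
        params = {name: rng.choice(vals) for name, vals in param_choices.items()}
        score, fits = cross_val_accuracy(X, y, k=k, **params)
        total_fits += fits
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, total_fits


# Toy feature (e.g. heart rate) with binary labels, deliberately unsorted.
X = [60, 110, 72, 124, 85, 100, 65, 118, 78, 130,
     104, 62, 112, 80, 108, 70, 120, 75, 115, 82]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

best_params, best_score, total_fits = random_search(
    X, y, {"n_neighbors": [1, 3, 5, 7]}, n_iter=5, k=5)
```

In scikit-learn, the same search is provided by `RandomizedSearchCV` with `n_iter=5` and `cv=5`; unlike a grid search, the number of sampled settings is fixed in advance regardless of the size of the hyperparameter space.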
31.4 Result and Discussion
The research in this study is conducted using publicly accessible ICU information on sepsis patients. This paper presents the evaluation metrics needed to analyse the performance of the LP model of ML algorithms trained for the prediction and detection of various sepsis scenarios. The performance of the LP model is evaluated using six different classification models: the XGB classifier, LGBM classifier, extra tree classifier, random forest classifier, bagging classifier and decision tree classifier. The assessment metrics help to determine which model performs best and which performs worst for a given situation. Four outcomes define the confusion matrix (CM): the true negatives (TN) and true positives (TP) keep track of which classifications are predicted correctly, while the false positives (FP) and false negatives (FN) capture the confusion. The first, TN, is a case where the classifier predicted “no sepsis” for a patient who is not affected by sepsis (y_true = 0, y_pred = 0). The second, FP, is a case where the classifier predicted “sepsis” for a patient who is not affected by sepsis (y_true = 0, y_pred = 1). The third, FN, is a case where the classifier predicted “no sepsis” for a patient with sepsis (y_true = 1, y_pred = 0). The fourth, TP, is a
Fig. 31.5 CM for XGB classifier
Fig. 31.6 CM for LGBM classifier
case where the classifier predicted “sepsis” for a patient who has it (y_true = 1, y_pred = 1). The CM for each classifier is shown in Figs. 31.5, 31.6, 31.7, 31.8, 31.9 and 31.10. The data are split into 75% for training and 25% for testing. The proposed LP model is compared with existing ML approaches in terms of precision, F1-score, accuracy, specificity, recall and AUC for diagnosing sepsis in Fig. 31.11 and Table 31.1. For the prognosis of sepsis, the suggested LP model of the XGB classifier with fivefold cross-validation achieved the maximum Area Under Curve (AUC) and accuracy [44, 45]. Based on the collected experimental data, the LP model with XGB shows a significant improvement over the other ML frameworks, and the XGB method is the best model for the exact forecast of diverse stages of sepsis. The XGB algorithm is selected from among the top six classifiers for hyperparameter optimization: random search, via RandomizedSearchCV, is used to discover the most suitable parameter values for the model, which attains a precision of 0.99, accuracy of 0.98, recall of 0.96, sensitivity of 0.96 and specificity of 0.99. The AUC summarizes the
Fig. 31.7 CM for extra tree classifier
Fig. 31.8 CM for random forest classifier
Fig. 31.9 CM for bagging classifier
Fig. 31.10 CM for decision tree classifier
Fig. 31.11 Graphical representation of performance evaluation based on various models [bar chart; x-axis: ML algorithm working with the lazy predict model (XGB, LGBM, extra tree, random forest, bagging and decision tree classifiers); y-axis: performance metrics, from 0.86 to 1; series: accuracy, precision, recall, F1-score, AUC, sensitivity and specificity, with a linear trend line for accuracy]
Receiver Operating Characteristic (ROC) curve, a probability curve that plots the True Positive Rate (TPR) on the y-axis against the False Positive Rate (FPR) on the x-axis, and measures separability. The ROC curve is based on two key assessments, specificity and sensitivity, as shown in Fig. 31.12a; the CM of the predicted labels is shown in Fig. 31.12b. The output of the recommended model gives the best estimator, score and parameters across all searched parameters as “colsample_bytree”: 0.6393, “learning_rate”: 0.1490017, “max_depth”: 9, “min_child_weight”: 9, “min_child_weight”: 4, “n_estimators”: 801, “subsample”: 0.8497. The best score obtained from the ROC for all searched parameters is 0.99504.
Table 31.1 Performance metrics based on various models using lazy predict model

Model                     Accuracy  Precision  Recall  F1-score  AUC   Sensitivity  Specificity
XGB classifier            0.98      0.99       0.96    0.98      0.98  0.96         0.99
LGBM classifier           0.97      0.99       0.94    0.97      0.97  0.94         0.99
Extra tree classifier     0.96      0.99       0.93    0.96      0.96  0.92         0.99
Random forest classifier  0.95      0.98       0.92    0.95      0.95  0.917        0.98
Bagging classifier        0.95      0.98       0.92    0.95      0.95  0.92         0.98
Decision tree classifier  0.94      0.95       0.93    0.94      0.94  0.928        0.94
Fig. 31.12 XGB with hyperparameters optimized using random search to find the highest accuracy: a ROC curve and b CM heatmap
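The metrics reported in Table 31.1 all derive from the four confusion-matrix cells (TN, FP, FN, TP) defined in Sect. 31.4. The sketch below shows the standard formulas; the counts passed in are hypothetical and are not taken from the paper's confusion matrices. Note that recall and sensitivity are the same quantity by definition.

```python
# Standard classification metrics from confusion-matrix counts.
# The counts used below are hypothetical, chosen only for illustration.

def classification_metrics(tp, tn, fp, fn):
    """Return the usual metrics as a dict of floats."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called sensitivity or TPR
    specificity = tn / (tn + fp)         # equals 1 - FPR
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical test fold: 100 sepsis and 100 non-sepsis patients.
metrics = classification_metrics(tp=96, tn=99, fp=1, fn=4)
```

Here 4 missed sepsis cases out of 100 give a recall of 0.96, and 1 false alarm among 100 healthy patients gives a specificity of 0.99.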
31.5 Conclusion
This research proposes a LP model of ML algorithms for the early prediction and identification of sepsis in ICU patients. To begin, missing data are filled in via the imputation procedure, and the model’s performance is improved by applying matrix factorization. With a prediction accuracy of 0.98, the suggested LP model with the XGB method is the most suitable model, and the XGB classifier produced a good classification, which enhances the performance of the proposed approach. The suggested XGB classifier model achieves the best score across all searched parameters, which is 0.99504. Patients admitted to the critical care unit benefit from this approach. In the future, feature selection techniques may be used to identify the most important features in the datasets. Many different single and ensemble classifiers can also be used to increase accuracy and other performance assessment criteria for predicting sepsis.
References
1. Singer, M., Deutschman, C.S., Seymour, C.W., Shankar-Hari, M., Annane, D., Bauer, M., et al.: The third international consensus definitions for sepsis and septic shock (sepsis-3). JAMA 315(8), 801–810, 23 Feb 2016. [https://doi.org/10.1001/jama.2016.0287] [Medline: 26903338]
2. Rhee, C., Dantes, R., Epstein, L., Murphy, D.J., Seymour, C.W., Iwashyna, T.J.: CDC Prevention Epicenter Program. Incidence and trends of sepsis in US hospitals using clinical vs claims data, 2009–2014. JAMA 318(13), 1241–1249, 03 Oct 2017. [https://doi.org/10.1001/jama.2017.13836] [Medline: 28903154]
3. Sakr, Y., Jaschinski, U., Wittebole, X., Szakmany, T., Lipman, J., Ñamendys-Silva, S.A.: ICON Investigators. Sepsis in intensive care unit patients: worldwide data from the intensive care over nations audit. Open Forum Infect. Dis. 5(12), ofy313, Dec 2018. [https://doi.org/10.1093/ofid/ofy313] [Medline: 30555852]
4. Rudd, K.E., Johnson, S.C., Agesa, K.M., Shackelford, K.A., Tsoi, D., Kievlan, D.R., et al.: Global, regional, and national sepsis incidence and mortality, 1990–2017: analysis for the Global Burden of Disease Study. Lancet 395(10219), 200–211, 18 Jan 2020. [https://doi.org/10.1016/S0140-6736(19)32989-7] [Medline: 31954465]
5. Zhou, F., Yu, T., Du, R., Fan, G., Liu, Y., Liu, Z., et al.: Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 395(10229), 1054–1062, 28 Mar 2020. [https://doi.org/10.1016/S0140-6736(20)30566-3] [Medline: 32171076]
6. Kumar, A., Roberts, D., Wood, K.E., Light, B., Parrillo, J.E., Sharma, S., et al.: Duration of hypotension before initiation of effective antimicrobial therapy is the critical determinant of survival in human septic shock. Crit. Care Med. 34(6), 1589–1596, Jun 2006. [https://doi.org/10.1097/01.CCM.0000217961.75225.E9] [Medline: 16625125]
7. Mok, K., Christian, M.D., Nelson, S., Burry, L.: Time to administration of antibiotics among inpatients with severe sepsis or septic shock. Can. J. Hosp. Pharm. 67(3), 213–219, May 2014. [https://doi.org/10.4212/cjhp.v67i3.1358] [Medline: 24970941]
8. Husabø, G., Nilsen, R.M., Flaatten, H., Solligård, E., Frich, J.C., Bondevik, G.T., et al.: Early diagnosis of sepsis in emergency departments, time to treatment, and association with mortality: an observational study. PLoS One 15(1), e0227652 (2020). [https://doi.org/10.1371/journal.pone.0227652] [Medline: 31968009]
9. Seymour, C.W., Gesten, F., Prescott, H.C., Friedrich, M.E., Iwashyna, T.J., Phillips, G.S., et al.: Time to treatment and mortality during mandated emergency care for sepsis. N. Engl. J. Med. 376(23), 2235–2244 (2017). [https://doi.org/10.1056/nejmoa1703058]
10. Ferrer, R., Martin-Loeches, I., Phillips, G., Osborn, T.M., Townsend, S., Dellinger, R.P., et al.: Empiric antibiotic treatment reduces mortality in severe sepsis and septic shock from the first hour: results from a guideline-based performance improvement program. Crit. Care Med. 42(8), 1749–1755, Aug 2014. [https://doi.org/10.1097/CCM.0000000000000330] [Medline: 24717459]
11. Rhodes, A., Evans, L.E., Alhazzani, W., Levy, M.M., Antonelli, M., Ferrer, R., et al.: Surviving sepsis campaign: international guidelines for management of sepsis and septic shock: 2016. Crit. Care Med. 45(3), 486–552, Mar 2017. [https://doi.org/10.1097/CCM.0000000000002255] [Medline: 28098591]
12. Pierrakos, C., Vincent, J.: Sepsis biomarkers: a review. Crit. Care 14(1), R15 (2010). [https://doi.org/10.1186/cc8872] [Medline: 20144219]
13. Cho, S., Choi, J.: Biomarkers of sepsis. Infect. Chemother. 46(1), 1–12, Mar 2014. [https://doi.org/10.3947/ic.2014.46.1.1] [Medline: 24693464]
14. Pierrakos, C., Velissaris, D., Bisdorff, M., Marshall, J.C., Vincent, J.: Biomarkers of sepsis: time for a reappraisal. Crit. Care 24(1), 287, 05 Jun 2020. [https://doi.org/10.1186/s13054-020-02993-5] [Medline: 32503670]
15. Singh, Y.V., Singh, P., Khan, S., Singh, R.S.: A ML model for early prediction and detection of sepsis in intensive care unit patients. Hindawi J. Healthc. Eng. 11 (2022), Article ID 9263391. [https://doi.org/10.1155/2022/9263391]
16. Kuo, Y.-Y., Huang, S.-T., Chiu, H.-W.: Applying artificial neural network for early detection of sepsis with intentionally preserved highly missing real-world data for simulating clinical situation. BMC Med. Inf. Decis. Making 21(1), 290 (2021)
17. Zhang, D., Yin, C., Hunold, K.M., Jiang, X., Caterino, J.M., Zhang, P.: An interpretable deep-learning model for early prediction of sepsis in the emergency department. Patterns 2(2) (2021), Article ID 100196
18. Kok, C., Jahmunah, V., Oh, S.L., et al.: Automated prediction of sepsis using temporal convolutional network. Comput. Biol. Med. 127 (2020), Article ID 103957
19. Chaudhary, P., Gupta, D.K., Singh, S.: Outcome prediction of patients for different stages of sepsis using ML models. In: Advances in Communication and Computational Technology. Lecture Notes in Electrical Engineering, vol. 668. Springer, Singapore (2021)
20. Mitra, A., Ashraf, K.: Sepsis prediction and vital signs ranking in intensive care unit patients. Clin. Orthop. Relat. Res. 1812, 1–10 (2019), Article ID 06686
21. Desautels, T., Calvert, J., Hoffman, J., et al.: Prediction of sepsis in the intensive care unit with minimal electronic health record data: a ML approach. JMIR Med. Inform. 4(3), e28 (2016), Article ID 27694098
22. Onan, A., Korukoğlu, S., Bulut, H.: Ensemble of keyword extraction methods and classifiers in text classification. Expert Syst. Appl. 57, 232–247 (2016)
23. Onan, A., Korukoğlu, S.: A feature selection model based on genetic rank aggregation for text sentiment classification. J. Inf. Sci. 43(1), 25–38 (2017)
24. Onan, A.: Classifier and feature set ensembles for web page classification. J. Inf. Sci. 42(2), 150–165 (2016)
25. Zhang, Z., Chen, L., Xu, P., Hong, Y.: Predictive analytics with ensemble modeling in laparoscopic surgery: a technical note. Laparoscopic, Endoscopic Rob. Surg. 5 (2022)
26. Kim, J.K., Ahn, W., Park, S., Lee, S.-H., Kim, L.: Early prediction of sepsis onset using neural architecture search based on genetic algorithms. Int. J. Environ. Res. Publ. Health 19, 2349 (2022). [https://doi.org/10.3390/ijerph19042349]
27. Kaya, U., Yilmaz, A., Dikmen, Y.: Prediction of sepsis disease by artificial neural networks. J. Selcuk-Technic Spec. Issue 2018 (ICENTE’18)
28. Fleuren, L.M., Klausch, T.L.T., Zwager, C.L., Schoonmade, L.J., Guo, T., Roggeveen, L.F., et al.: Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 46(3), 383–400, Mar 2020. [https://doi.org/10.1007/s00134-019-05872-y]
29. Moor, M., Rieck, B., Horn, M., Jutzeler, C.R., Borgwardt, K.: Early prediction of sepsis in the ICU using machine learning: a systematic review. Front. Med. (Lausanne) 8, 607952 (2021). [https://doi.org/10.3389/fmed.2021.607952] [Medline: 34124082]
30. Desautels, T., Calvert, J., Hoffman, J., Jay, M., Kerem, Y., Shieh, L., et al.: Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR Med. Inform. 4(3), e28, 30 Sep 2016. [https://doi.org/10.2196/medinform.5909] [Medline: 27694098]
31. Shimabukuro, D.W., Barton, C.W., Feldman, M.D., Mataraso, S.J., Das, R.: Effect of a machine learning-based severe sepsis prediction algorithm on patient survival and hospital length of stay: a randomised clinical trial. BMJ Open Respir. Res. 4(1), e000234 (2017). [https://doi.org/10.1136/bmjresp-2017-000234]
32. Nemati, S., Holder, A., Razmi, F., Stanley, M.D., Clifford, G.D., Buchman, T.G.: An interpretable machine learning model for accurate prediction of sepsis in the ICU. Crit. Care Med. 46(4), 547–553, Apr 2018. [https://doi.org/10.1097/CCM.0000000000002936] [Medline: 29286945]
33. Ghias, N., Ul Haq, S., Arshas, H., Sultan, H., Bashir, F., Ghaznavi, S.A., Shabbir, M., Badshah, Y., Rafiq, M.: Using ML algorithms to predict sepsis and its stages in ICU patients. medRxiv preprint (2022). [https://doi.org/10.1101/2022.03.15.22271655]
34. Kanaga Suba Raja, S., Valarmathi, K., Deepthi Sri, S., Harishita, S., Keerthanna,V.: Sepsis prediction using ensemble random forest. AIP Conf. Proc. 2405, 030027 (2022). https://doi. org/10.1063/5.0072499 35. Gunnarsdottir, K., Sadashivaiah, V., Kerr, M., Santaniello, S., Sarma, S.V.: Using demographic and time series physiological features to classify sepsis in the intensive care unit. In: 2016 38th annual international conference of the IEEE engineering in medicine and biology society (EMBC), pp. 778–782. Orlando, FL (2016) 36. Fleuren, L.M., Klausch, T.L.T., Zwager, C.L., Schoonmade, L.J., Guo, T., Roggeveen, L.F., Swart, E.L., Girbes, A.R.J., Thoral, P., Ercole, A., Hoogendoorn, M., Elbers, P.W.G..: ML for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med. 46, 383–400 (2020). https://doi.org/10.1007/s00134-019-05872-y 37. Liu, R., Greenstein, J.L., Sarma, S.V., Winslow, R.L.: Natural language processing of clinical notes for improved early prediction of septic shock in the ICU. In: Proceeding of the 41st annual international conference of the IEEE engineering in medicine and biology society, pp. 6103–6108 (2019) 38. Carnielli, C.M., et al.: Combining discovery and targeted proteomics reveals a prognostic signature in oral cancer. Nat. Commun. 9, 3598 (2018) 39. Xia, B., et al.: Machine learning uncovers cell identity regulator by histone code. Nat. Commun. 11, 2696 (2020) 40. Rennie, S., Dalby, M., van Duin, L., Andersson, R.: Transcriptional decomposition reveals active chromatin architectures and cell specific regulatory interactions. Nat. Commun. 9, 487 (2018) 41. Nakhashi, M., Toffy, A., Achuth, P.V., Palanichamy, L., C.M., Vikas, C.M.: Early prediction of sepsis: using state of-the-art ML techniques on vital sign inputs. Comput. Cardiol. Conf. 2019 42. 
Yao, R.-Q., Jin, X., Wang, G.-W., Yu, Y., Wu, G.-S., Zhu, Y.-B., Li, L., Li, Y.-X., Zhao, P.-Y., Zhu, S.-Y., et al.: A machine learning-based prediction of hospital mortality in patients with postoperative sepsis. Front. Med. 7, 445 (2020) 43. Taylor, R.A., Pare, J.R., Venkatesh, A.K., Mowafi, H., Melnick, E.R., Fleischman, W., Hall, M.K.: Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data–driven, machine learning approach. Acad. Emerg. Med. 23(3), 269–278 (2016) 44. T. Chen and C. Guestrin, “XGBoost: a scalable tree boosting system,” Clinical Orthopaedics and Related Research, vol. 1603, pp. 785–794, Article ID 02754, 2016. 45. Freund, Y., Schapire, R.E.: A short introduction to boosting. J. Japan. Soc. Artif. Intell. 14(5), 771–780 (1999)
Chapter 32
Surveillance Video-Based Object Detection by Feature Extraction and Classification Using Deep Learning Architecture Elvir Akhmetshin, Sevara Sultanova, C. S. S. Anupama, Kollati Vijaya Kumar, and E. Laxmi Lydia
Abstract Recently, deep learning has achieved top performance in object detection tasks. In real-time systems with memory or processing constraints, however, very wide and deep networks with numerous parameters are a major obstacle. Deep learning-based object detection solutions arising from computer vision have attracted considerable attention in recent years. This study proposes a novel technique for surveillance video-based object detection by feature extraction and classification using deep learning. The input data are collected as surveillance video and processed for noise removal, smoothing, and normalization. The processed video is then feature-extracted and classified using masked convolutional fuzzy perceptron neural networks. The experimental analysis is carried out in terms of accuracy, precision, recall, F-1 score, RMSE, and NSE. The proposed technique attained an accuracy of 91%, precision of 84%, recall of 86%, F-1 score of 76%, RMSE of 61%, and NSE of 48%.
E. Akhmetshin
Department of Economics and Management, Kazan Federal University, Elabuga Institute of KFU, Elabuga, Russia 423604
S. Sultanova
Department of Construction, Urgench State University, Urganch, Uzbekistan 220100
C. S. S. Anupama
Department of Electronics and Instrumentation Engineering, V.R. Siddhartha Engineering College, Vijayawada 520007, India
K. V. Kumar
Department of Computer Science and Engineering, GITAM School of Technology, Vishakhapatnam Campus, GITAM (Deemed to Be a University), Visakhapatnam, India
E. L. Lydia (B)
Department of Computer Science and Engineering, Vignan’s Institute of Information Technology, Visakhapatnam 530049, India
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_32
32.1 Introduction
Surveillance is a tedious task that involves collecting and analyzing huge amounts of visual evidence. Data collection is a major point of failure in any monitoring scheme. Surveillance cameras can see only people within the camera's line of sight [1]. Criminals may therefore choose times or places where they are not tracked. In terms of prevention and tracking, standalone surveillance cameras have drawbacks. The use of UAVs for surveillance becomes essential for the reliability of the system. Recently, computer vision technology has developed at a fast and promising pace. Part of this success may be credited to the introduction and use of machine learning and deep learning methods, while the rest can be attributed to the development of novel representations and models for real-world computer vision challenges, or the development of effective solutions [2]. Video analysis is a growing field in the research domain. Many video resources are available in libraries, but for lack of video processing infrastructure they have not been used effectively. The rise of deep learning, with its remarkable ability to cater to the needs of video analysis, has changed the life of every individual. Deep learning was first introduced to reduce cost and to speed up tedious work. This has helped industries such as automated vehicle manufacturing plants to increase automobile production, and the food and beverage industry with food quality testing and packaging. Deep learning now affects the social life of the individual. In a developed country with a population of millions, every person is captured by a camera all the time [3]. Many videos are generated and stored for a specific length of time. Constant monitoring of these surveillance videos by the authorities, to judge whether events are suspicious, is a nearly impossible task because it requires a workforce and their constant attention. This creates a need to automate the process with high accuracy [4]. Automation will help the concerned authorities identify the root cause of anomalies while saving the time and labor required to search the recordings manually. An anomaly recognition system is defined as a real-time surveillance program designed to automatically detect and report signs of offensive or disruptive activities immediately [5].
32.2 Related Works
Object detection [6] in videos or images can be grouped into two broad research areas. The first addresses gun and knife detection using classical (non-deep-learning) algorithms, while the second focuses on improving object detection accuracy using deep learning algorithms.
Classical, non-deep-learning algorithms are based mainly on color-based segmentation, corner detectors, and appearance. A drawback of the existing classical algorithms is that they depend heavily on the quality of the frames or images. Frames with occlusion and noise are difficult to interpret; moreover, when foreground and background color segments match, interpretation is difficult with color-based segmentation [7]. To handle pedestrian detection in videos from surveillance cameras mounted at different heights, work [8] proposed size of interest (SOI) and region of interest (ROI) estimation to minimize unnecessary computation in practical multi-scale pedestrian detection. The role of the SOI is to determine the image scaling level by estimating the perspective of the image, and that of the ROI is to determine the search area of a scaled image. Work [9] proposed HWMs for determining image scaling levels with a divide-and-conquer algorithm to reduce the computational complexity involved in processing surveillance video sequences. The authors of [10] proposed a spatially varying pedestrian appearance CNN model that accounts for the perspective geometry of the scene, since a scene-specific pedestrian detector must first be trained whenever a new surveillance system is installed at a new location. Work [11] proposed a multi-scale CNN for fast multi-scale pedestrian detection, consisting of receptive fields of different scales and a scale-specific detector, to produce a strong multi-scale pedestrian detector. Work [3] proposed pedestrian detection based on sharing features across a group of CNNs that correspond to pedestrian models of different sizes. In recent years, deep learning-based object detection has emerged to reduce human effort in processing and to increase the efficiency of algorithms. Modern object detection approaches achieve better accuracy than classical approaches owing to greater computational processing capability [12].
32.3 System Model
This section discusses a novel technique for surveillance video-based object detection by feature extraction and classification using deep learning. The input data are collected as surveillance video and processed for noise removal, smoothing, and normalization. The processed video is then feature-extracted and classified using masked convolutional fuzzy perceptron neural networks. The proposed architecture is shown in Fig. 32.1. For object detection it uses convolutional networks. YOLO can recognize multiple objects in a single image: it locates these objects in the image and predicts their classes. In YOLO, the entire image is processed by a single neural network. This network divides the image into regions and produces probabilities for each region. YOLO then predicts the number and probability of bounding boxes covering regions of the image and selects the best ones according to the probabilities.
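Selecting the best bounding boxes according to their probabilities, as described above, is commonly done with greedy non-maximum suppression. The chapter does not state which selection rule it uses, so the sketch below is an assumption for illustration only: the box format `(x1, y1, x2, y2)`, the function names, and both thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_boxes(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of kept boxes (thresholds are illustrative assumptions)."""
    # Discard low-probability boxes, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 150), (0, 0, 5, 5)]
scores = [0.90, 0.80, 0.75, 0.20]
kept = select_boxes(boxes, scores)
```

In this toy input the second box overlaps the first heavily and is suppressed, and the last box falls below the score threshold, so only the first and third boxes survive.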
Fig. 32.1 Proposed architecture
32.4 Masked Convolutional Fuzzy Perceptron Neural Networks-Based Feature Extraction and Classification
The method begins with arranging and refining anchors, which play a significant part in the overall performance of Mask ConVol. This is followed by the generation of bounding boxes, or regions, by the Region Proposal Network (RPN). This step is critical because the RPN decides whether a particular object is background, and bounding boxes are marked accordingly. The objects in the regions are determined by a classifier and a regressor through region-of-interest pooling. After the bounding boxes are refined, masks are generated and placed at their appropriate positions on the object. Objects at different scales are detected by the Feature Pyramid Network (FPN). Although pyramid rendering is usually avoided because of its computational and memory demands, a pyramid hierarchy is used to obtain feature pyramids without heavily compromising efficiency. FPN gives access to features at both higher and lower levels. Objects in images appear at different scales, and the FPN, which is the main backbone of the proposed technique, detects them to determine features. During this stage, the most suitable bounding boxes are determined along with the objectness score. The anchor boxes can vary in size because of the variability of object dimensions in the image. The classifier is trained to distinguish background from foreground. Moreover, if the anchors are turned into vectors with two values and fed to an activation function, labels are predicted. A bounding box regressor refines the anchors, and a regressor loss function determines the regression loss by Eqs. (32.1) and (32.2), where Eq. (32.2) is the smooth $L_1$ loss function of the regressor:

$$L_{\mathrm{loc}}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\left(t_i^{u} - v_i\right) \tag{32.1}$$

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2 & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \tag{32.2}$$
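The regression loss in Eqs. (32.1) and (32.2) can be checked numerically with a few lines of code. This is a minimal sketch: the function names and the sample offsets are hypothetical, and the four tuple entries simply stand for the four box coordinates summed over in Eq. (32.1).

```python
def smooth_l1(x):
    """Smooth L1 term of Eq. (32.2): quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def loc_loss(t, v):
    """Sum of smooth L1 terms over the box coordinates, as in Eq. (32.1)."""
    return sum(smooth_l1(ti - vi) for ti, vi in zip(t, v))


predicted = (0.5, 0.2, 2.0, 1.0)   # illustrative predicted offsets
target = (0.0, 0.2, 0.0, 1.5)      # illustrative regression targets
loss = loc_loss(predicted, target)
```

The coordinate differences here are 0.5, 0, 2.0, and -0.5, contributing 0.125, 0, 1.5, and 0.125, so the large error is penalized linearly rather than quadratically.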
The region proposal network determines the most suitable bounding boxes for the objects with the help of the classification loss and the regression loss, and provides the most suitable bounding box locations. At this stage, features are extracted and can also be used for further predictions. After this, the RPN, the classifier, and the regressor are trained both jointly and separately. When training the first branch, we used grayscale images because we were concerned with detecting very small objects (e.g., handgun, knife). When training the second branch, we used color images and were concerned with larger boundaries. Had we converted them completely to grayscale, color information would have been lost, so to preserve the small object's information it is essential to retain the color. Mathematically, an image represented in tensor form is given by Eq. (32.3):

$$\dim(\text{image}) = (n_H, n_W, n_C) \tag{32.3}$$
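To make the tensor shape in Eq. (32.3) and the loss of color information concrete, the sketch below stores a tiny image as nested lists and collapses it to one channel. The helper names are hypothetical, and the luminance weights (ITU-R BT.601) are an assumption; the chapter does not say how its grayscale images were produced.

```python
def dims(image):
    """Return (n_H, n_W, n_C) for an image stored as nested lists."""
    n_h, n_w = len(image), len(image[0])
    pixel = image[0][0]
    n_c = len(pixel) if isinstance(pixel, (list, tuple)) else 1
    return n_h, n_w, n_c


def to_grayscale(image, weights=(0.299, 0.587, 0.114)):
    """Collapse RGB pixels to one channel (BT.601 weights, an assumption)."""
    return [
        [sum(w * c for w, c in zip(weights, px)) for px in row]
        for row in image
    ]


rgb = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 10, 10), (200, 200, 200), (0, 0, 0)],
]
gray = to_grayscale(rgb)
rgb_dims = dims(rgb)
gray_dims = dims(gray)
```

After conversion the channel dimension drops from 3 to 1, so pure red, green, and blue pixels can no longer be told apart by hue, only by brightness.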
At the $l$th layer, the notation is as follows:
• Input: $a^{[l-1]}$, with size $(n_H^{[l-1]}, n_W^{[l-1]}, n_C^{[l-1]})$.
• Number of filters: $n_C^{[l]}$, where each $K^{(n)}$ has dimensions $(f^{[l]}, f^{[l]}, n_C^{[l-1]})$.
• Activation function: $\varphi^{[l]}$.
• Output: $a^{[l]}$, with size $(n_H^{[l]}, n_W^{[l]}, n_C^{[l]})$.
So we have Eqs. (32.4)–(32.6):

$$\forall n \in [1, 2, \ldots, n_C^{[l]}] \tag{32.4}$$

$$\mathrm{conv}\left(a^{[l-1]}, K^{(n)}\right) = \varphi^{[l]}\left(\gamma^{[l]}\right), \quad \text{taken over } a^{[l-1]} \text{ of size } \left(n_H^{[l-1]}, n_W^{[l-1]}, n_C^{[l-1]}\right) \tag{32.5}$$

$$\dim\left(a^{[l]}\right) = \left(n_H^{[l]}, n_W^{[l]}, n_C^{[l]}\right) \tag{32.6}$$
At the $l$th layer, the learned parameters are:
• Filters, with $(f^{[l]} \times f^{[l]} \times n_C^{[l-1]}) \times n_C^{[l]}$ parameters.
• Bias, with $(1 \times 1 \times 1) \times n_C^{[l]}$ parameters.
The weighted sum over the flattened input of size $(n_H^{[i-1]} \times n_W^{[i-1]} \times n_C^{[i-1]}, 1)$ and the resulting activation are given by Eqs. (32.7)–(32.9):

$$\gamma_j^{[i]} = \sum_{l=1}^{n_{i-1}} w_{j,l}^{[i]}\, a_l^{[i-1]} + b_j^{[i]} \tag{32.7}$$

$$a_j^{[i]} = \varphi^{[i]}\left(\gamma_j^{[i]}\right) \tag{32.8}$$

$$n_{i-1} = n_H^{[i-1]} \times n_W^{[i-1]} \times n_C^{[i-1]} \tag{32.9}$$
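The parameter counts for the filters and for the layer that follows the flattening in Eq. (32.9) can be tallied with a small sketch. The function names and the layer sizes below are hypothetical and chosen only for illustration; they are not the configuration used in the chapter.

```python
def conv_params(f, n_c_prev, n_c):
    """Filter weights (f * f * n_c_prev per filter) plus one bias per filter."""
    weights = f * f * n_c_prev * n_c
    biases = n_c
    return weights + biases


def flattened_size(n_h, n_w, n_c):
    """Length of the flattened activation, as in Eq. (32.9)."""
    return n_h * n_w * n_c


def dense_params(n_prev, n_units):
    """Weights (n_prev * n_units) plus one bias per unit."""
    return n_prev * n_units + n_units


# Illustrative sizes: 16 filters of 3 x 3 over a 3-channel input,
# then a 7 x 7 x 16 activation flattened into a 10-unit layer.
conv_total = conv_params(f=3, n_c_prev=3, n_c=16)
n_prev = flattened_size(7, 7, 16)
dense_total = dense_params(n_prev, 10)
```

With these sizes the convolution has 432 weights and 16 biases, while the layer after flattening has 7840 weights and 10 biases, which shows how quickly the flattened layer dominates the parameter budget.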
The resulting learned parameters are:
• Weights $w_{j,l}$, with $n_{l-1} \times n_l$ parameters.
• Bias, with $n_l$ parameters.
The objective function is based on the weighted distance between each image pixel and each cluster center in the image. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a sample set, where $n$ is the number of pixels in the sample set. The FCM clustering algorithm divides the sample set into $c$ classes using fuzzy c-means theory and finds the set of fuzzy cluster centers at which the objective function reaches its minimum. A fuzzy c-partition of $X$ is a $c \times n$ matrix $U = [u_{ik}]$, where $u_{ik}$ is the membership degree of $x_k$ in class $i$, subject to $0 \le u_{ik} \le 1$ and $\sum_{i=1}^{c} u_{ik} > 0$. The FCM algorithm requires that the membership degrees of each pixel over all cluster centers sum to 1, as given by Eq. (32.10):

$$\sum_{i=1}^{c} u_{ik} = 1, \quad k = 1, 2, \ldots, n \tag{32.10}$$

The objective function of FCM is given by Eq. (32.11):

$$J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^{m} (d_{ik})^{2} \tag{32.11}$$

Here $v_i$ is the cluster center of the $i$th fuzzy partition, $d_{ik}$ is the distance between pixel $x_k$ and center $v_i$, and $m \in [1, +\infty)$ is the weighting exponent, which controls the fuzziness of the clustering result. To obtain the minimum of $J_m(U, V)$, set the partial derivatives to zero as in Eq. (32.12):

$$\begin{cases} \dfrac{\partial J_m(U, V)}{\partial u_{ik}} = 0 \\[2ex] \dfrac{\partial J_m(U, V)}{\partial v_i} = 0 \end{cases} \tag{32.12}$$

The following update rules can then be derived, Eqs. (32.13) and (32.14):

$$u_{ik} = \frac{1}{\sum_{j=1}^{c} \left(d_{ik} / d_{jk}\right)^{2/(m-1)}} \tag{32.13}$$

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^{m} x_k}{\sum_{k=1}^{n} (u_{ik})^{m}} \tag{32.14}$$
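Alternating the center update (32.14) and the membership update (32.13) until the objective (32.11) stops changing gives the usual fuzzy c-means loop. The sketch below is a minimal pure-Python version under stated assumptions: samples are scalar pixel intensities, the distance is the absolute difference, and the function name, tolerance, seed, and toy data are hypothetical.

```python
import random


def fcm(xs, c=2, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Fuzzy c-means on scalar samples; returns (memberships U, centers V)."""
    if m <= 1.0:
        raise ValueError("the update in Eq. (32.13) needs m > 1")
    rng = random.Random(seed)
    n = len(xs)
    # Random initial memberships, scaled so each column sums to 1 (Eq. 32.10).
    u = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        col = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= col
    prev_obj = float("inf")
    centers = []
    for _ in range(max_iter):
        # Cluster centers from the current memberships, Eq. (32.14).
        centers = [
            sum((u[i][k] ** m) * xs[k] for k in range(n))
            / sum(u[i][k] ** m for k in range(n))
            for i in range(c)
        ]
        dist = [[abs(xs[k] - centers[i]) for k in range(n)] for i in range(c)]
        # Objective value, Eq. (32.11).
        obj = sum(
            (u[i][k] ** m) * dist[i][k] ** 2 for i in range(c) for k in range(n)
        )
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
        # Membership update, Eq. (32.13); a sample lying exactly on a
        # center is assigned to that center to avoid dividing by zero.
        for k in range(n):
            on_center = [i for i in range(c) if dist[i][k] == 0.0]
            for i in range(c):
                if on_center:
                    u[i][k] = 1.0 / len(on_center) if i in on_center else 0.0
                else:
                    u[i][k] = 1.0 / sum(
                        (dist[i][k] / dist[j][k]) ** (2.0 / (m - 1.0))
                        for j in range(c)
                    )
    return u, centers


intensities = [10, 12, 11, 200, 205, 198]
memberships, centers = fcm(intensities)
low, high = sorted(centers)
```

On this toy input the two centers settle near the means of the dark and bright groups, and every sample's memberships still sum to 1 as Eq. (32.10) requires.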
Fuzzy c-means clustering is an iterative process. First, initialize the membership matrix $U$ and use (32.14) to compute the cluster centers. Then compute the objective function according to (32.11). If the value of the objective function is below a given threshold, or its change relative to the previous objective value is below a given threshold, the algorithm stops; otherwise, compute the new matrix $U$ from (32.13) and continue. With crisp data treated as fuzzy numbers of singleton type, the numerical data to be classified are regarded as a special form of linguistic data represented by (32.16), in which the components are fuzzy singletons. Hence, (32.16) can accommodate a given crisp data set as well. In this setting, the FPNN is designed so that the augmented $(n+1)$-dimensional fuzzy vectors $A_p = (A_{p1}, \ldots, A_{pn}, A_{p(n+1)})$ are classified. Each input can be either a linguistic term represented by fuzzy numbers or a fuzzy singleton of crisp data. It follows from (32.14) that the level sets of the output of a fuzzy function can be propagated through the neural network. The fuzzy perceptron neural network for the $p$th fuzzy IF-THEN rule is then defined by Eqs. (32.15)–(32.17).

Input units:

$$X_{pi}^{h_j} = \left[x_{pi}^{h_j L},\, x_{pi}^{h_j U}\right] = \left[A_{pi}\right]_{h_j}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m \tag{32.15}$$

$$X_{p(n+1)}^{h_j} = \left[A_{p(n+1)}\right]_{h_j} = 1 \tag{32.16}$$

Output unit:

$$Y_p^{h_j} = \mathrm{sgn}\left(\mathrm{net}_p^{h_j}\right) = \mathrm{sgn}\left(\left[\mathrm{net}_p^{h_j L},\, \mathrm{net}_p^{h_j U}\right]\right) = \left[\mathrm{sgn}\left(\mathrm{net}_p^{h_j L}\right),\, \mathrm{sgn}\left(\mathrm{net}_p^{h_j U}\right)\right] \tag{32.17}$$
The first layer is composed of neurons whose activation functions are the membership functions of the fuzzy sets defined for the input variables. For each input variable $x_{ij}$, $K$ fuzzy sets $A_k$, $k = 1, \ldots, K$, are defined, whose membership functions are the activation functions of the corresponding neurons. The outputs of the first layer are therefore the membership degrees associated with the input values, i.e., $a_{jk} = \mu_{A_k}$. The second layer is formed by $L$ fuzzy logic neurons. Each neuron performs a weighted aggregation of some of the first-layer outputs. This aggregation is performed using the weights. For each input variable $j$, only one first-layer output $a_{jk}$ is defined as an input of the $l$th neuron. Furthermore, to generate sparse topologies, each second-layer neuron is associated with only $n_l < n$ input variables; that is, the weight matrix $w$ is sparse.
32.5 Performance Analysis
Dataset description: MOT20 dataset. The goal of the MOT Challenge was to provide a consistent evaluation of multiple object tracking frameworks. Since pedestrians are well studied in the tracking field, and accurate tracking and detection have great practical importance, the challenge focuses on multiple people tracking. MOT15, MOT16, and MOT17 have all made significant contributions by offering a clean dataset and a precise methodology for benchmarking multi-object trackers since their first release. The MOT20 dataset has eight sequences, half of which are used for training and the other half for testing. Custom drone-based testing dataset: the inferences were recorded on a custom dataset created using the drone. The drone is a quadcopter integrated with a Wi-Fi enabled SJ4000 action camera supported and controlled by a gimbal, a GPS device, a Pixhawk flight controller, a telemetry unit, and a 5.8G 48CH transmitter for long-range communication.
Table 32.1 gives a comparative analysis of the proposed and existing techniques on the two video datasets, MOT20 and the custom drone-based testing dataset. The parametric analysis is carried out in terms of accuracy, precision, recall, F-1 score, RMSE, and NSE.

Table 32.1 Comparative analysis between proposed and existing technique based on various video dataset (all values in %)
Datasets                             Techniques  Accuracy  Precision  Recall  F1_Score  RMSE  NSE
MOT20                                HWMs        85        77         81      65        55    41
MOT20                                RCNN        88        79         83      68        57    43
MOT20                                SV_OB_DLA   89        81         85      71        59    45
Custom drone-based testing dataset   HWMs        86        79         82      68        56    42
Custom drone-based testing dataset   RCNN        89        82         84      72        58    44
Custom drone-based testing dataset   SV_OB_DLA   91        84         86      76        61    48

Figure 32.2 shows the analysis of the proposed and existing techniques for the MOT20 dataset. The proposed technique attained accuracy of 89%, precision of 81%, recall of 85%, F-1 score of 71%, RMSE of 59%, and NSE of 45%; the existing HWMs attained accuracy of 85%, precision of 77%, recall of 81%, F-1 score of 65%, RMSE of 55%, and NSE of 41%; RCNN attained accuracy of 88%, precision of 79%, recall of 83%, F-1 score of 68%, RMSE of 57%, and NSE of 43%.

Fig. 32.2 Comparative analysis between proposed and existing technique for MOT20 dataset

Figure 32.3 shows the comparative analysis of the proposed and existing techniques for the custom drone-based testing dataset. The proposed technique attained accuracy of 91%, precision of 84%, recall of 86%, F-1 score of 76%, RMSE of 61%, and NSE of 48%; the existing HWMs attained accuracy of 86%, precision of 79%, recall of 82%, F-1 score of 68%, RMSE of 56%, and NSE of 42%; RCNN attained accuracy of 89%, precision of 82%, recall of 84%, F-1 score of 72%, RMSE of 58%, and NSE of 44%.

Fig. 32.3 Comparative analysis between proposed and existing technique for custom drone-based testing dataset
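The chapter reports accuracy, precision, recall, F-1 score, RMSE, and NSE without giving their formulas. The sketch below uses the standard textbook definitions and assumes that NSE means the Nash-Sutcliffe efficiency; the function names and the toy counts are hypothetical and are not the chapter's data.

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def nse(y_true, y_pred):
    """Nash-Sutcliffe efficiency (assumed meaning of NSE): 1 is a perfect fit."""
    mean = sum(y_true) / len(y_true)
    num = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    den = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - num / den


# Illustrative counts only.
acc, prec, rec, f1 = classification_metrics(tp=80, fp=20, fn=10, tn=90)
err = rmse([1, 2, 3], [1, 2, 5])
fit = nse([1, 2, 3], [1, 2, 3])
```

Under these definitions F1 is the harmonic mean of precision and recall, so it always lies between the two, and an NSE of 0 corresponds to predicting the mean of the targets.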
32.6 Conclusion
This study proposes a novel technique for surveillance video-based object detection by feature extraction and classification using deep learning. The processed video is feature-extracted and classified using masked convolutional fuzzy perceptron neural networks. In addition, when a large crowd is detected, bounding boxes are drawn on the objects, and red boxes are shown when social distancing is violated. When empirically tested on real-time data, the proposed method is found to be more reliable than the existing approaches in terms of inference time and frame rate. The proposed technique attained an accuracy of 91%, precision of 84%, recall of 86%, F-1 score of 76%, RMSE of 61%, and NSE of 48%.
References
1. Zeng, T., Wang, J., Cui, B., Wang, X., Wang, D., Zhang, Y.: The equipment detection and localization of large-scale construction jobsite by far-field construction surveillance video based on improving YOLOv3 and grey wolf optimizer improving extreme learning machine. Constr. Build. Mater. 291, 123268 (2021)
2. Liu, Y.X., Yang, Y., Shi, A., Jigang, P., Haowei, L.: Intelligent monitoring of indoor surveillance video based on deep learning. In: 2019 21st International Conference on Advanced Communication Technology (ICACT), pp. 648–653. IEEE (2019)
3. Magoo, R., Singh, H., Jindal, N., Hooda, N., Rana, P.S.: Deep learning-based bird eye view social distancing monitoring using surveillance video for curbing the COVID-19 spread. Neural Comput. Appl. 33(22), 15807–15814 (2021)
4. Elhoseny, M.: Multi-object detection and tracking (MODT) machine learning model for real-time video surveillance systems. Circuits Syst. Signal Process. 39(2), 611–630 (2020)
5. Kim, S., Kwak, S., Ko, B.C.: Fast pedestrian detection in surveillance video based on soft target training of shallow random forest. IEEE Access 7, 12415–12426 (2019)
6. Hou, B., Zhang, J.: Real-time surveillance video salient object detection using collaborative cloud-edge deep reinforcement learning. In: 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2021)
7. Junayed, M.S., Islam, M.B.: A deep-learning based automated COVID-19 physical distance measurement system using surveillance video. In: International Conference on Recent Trends in Image Processing and Pattern Recognition, pp. 210–222. Springer, Cham (2022)
8. Lyu, Z., Zhang, D., Luo, J.: A GPU-free real-time object detection method for apron surveillance video based on quantized MobileNet-SSD. IET Image Process (2022)
9. Rekavandi, A.M., Xu, L., Boussaid, F., Seghouane, A.K., Hoefs, S., Bennamoun, M.: A guide to image and video based small object detection using deep learning: case study of maritime surveillance. arXiv preprint arXiv:2207.12926 (2022)
10. Khan, S., AlSuwaidan, L.: Agricultural monitoring system in video surveillance object detection using feature extraction and classification by deep learning techniques. Comput. Electr. Eng. 102, 108201 (2022)
11. Raja, R., Sharma, P.C., Mahmood, M.R., Saini, D.K.: Analysis of anomaly detection in surveillance video: recent trends and future vision. Multimedia Tools Appl. 1–17 (2022)
12. Vasavi, S., Vineela, P., Raman, S.V.: Age detection in a surveillance video using deep learning technique. SN Comput. Sci. 2(4), 1–11 (2021)
Chapter 33
Deep Learning-Based Recommender Systems—A Systematic Review and Future Perspective S. Krishnamoorthi and Gopal K. Shyam
Abstract With the ever-increasing volume of online information, recommender systems (RS) have become an effective technique for overcoming information overload. Given their widespread use in a variety of web applications, the value of RS cannot be emphasized enough. RS have the potential to alleviate numerous problems associated with over-choice. Deep learning's influence is also widespread, with new evidence of its efficacy in information retrieval and RS research, and the use of deep learning in RS is an emerging field. This article presents a thorough evaluation of recent research findings on traditional RS and deep learning-based recommender systems (DLRS). The review covers articles published between 2018 and 2022 in four major research databases: Science Direct, IEEE Xplore, Springer, and Wiley. First, we provide a complete overview, comparison, and summary of traditional RS. We then systematically analyze DLRS challenges and related solution approaches. Finally, in discussing open issues in DLRS, we highlight possible research directions in this area. In this review, we report quantitative data from previous studies to highlight comparisons based on metrics such as F1-score, recall, accuracy, and error functions.
S. Krishnamoorthi (B) · G. K. Shyam
Department of Computer Science and Engineering, School of Engineering, Presidency University, Bengaluru, Karnataka, India
e-mail: [email protected]
G. K. Shyam
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_33

33.1 Introduction
Modern web services are on the rise, and over the previous two decades a vast amount of digital information has accumulated. As a result, users have difficulty obtaining the information they need [1]. Users therefore turn to social media for advice, recommendations, and relevant information. The selection process is challenging because of the abundance of information available online [2]. Users use social networking sites on an Internet platform to get relevant information about their choices, resulting in the rise of recommender systems (RS). RS is a technology that provides accurate recommendations for users. RS addresses information overload by tailoring the user experience to the user's preferences and context. Its recommendations are based on the user's activity: it examines the user's historical behavior and preferences, builds a model, and gives automated recommendations. It also recommends unknown (new) items based on the user's interests. Traditional RS are generally divided into three categories [3]. First, content-based filtering systems recommend items to a user based on a description of the items and a profile of the user's interests [4]. Second, collaborative filtering (CF)-based RS use behavioral history and user preferences without requiring personal information or descriptions of the items of interest. Third, to overcome the limitations of native content-based and CF techniques, hybrid recommendation combines collaborative filtering and content-based recommendation. This improves the accuracy of predictions and, importantly, avoids issues such as data sparsity and information loss. Deep learning has emerged as a solution to the technical shortcomings of traditional recommendation models, improving the quality of recommendations by capturing non-trivial nonlinear relationships between users and items. With deep learning, complex abstractions can be encoded as data representations. It also captures the intricate relationships within the data itself from readily available sources, including contextual, textual, and visual data. Deep learning-based recommender systems (DLRS) have brought a significant change in the performance of RS.
The accomplishments of deep learning have led to an exponential increase in RS research activity and have drawn the attention of many academic researchers to the field. A thorough review and summary of this successive research work helps researchers identify the challenges and opportunities effectively. Even though many research activities have investigated DLRS [5], relatively few papers have been published that review the existing work. This paper presents a literature review that examines and summarizes recent trends, featuring high-quality research articles published on DLRS. It also provides researchers with information on future research directions. The main contributions of this paper are as follows: (1) Classifying RS as traditional or DLRS. Traditional RS are further subclassified as collaborative filtering, content-based, and hybrid recommender systems. Similarly, DLRS are further subclassified based on the types of learning models they use. (2) Presenting a comprehensive review of recent work in both traditional RS and DLRS. A variety of domain applications, metrics, and data sets involved in each of these categories are presented. (3) Investigating open issues in DLRS, analyzing practical challenges, and proposing some future directions for RS research and application.
33 Deep Learning-Based Recommender Systems—A Systematic Review …
The remainder of this paper is organized as follows: Sect. 33.2 presents the systematic literature review, Sect. 33.3 briefly covers the evaluation and analysis, Sect. 33.4 discusses the open challenges, and Sect. 33.5 concludes the paper.
33.2 Systematic Review Literature
Several studies on the features and challenges of RS have been conducted over the last few years by various researchers. However, no quality review has covered all aspects of RS development. This review paper presents a comprehensive analysis of RS development in recent research. The taxonomy of RS is depicted in Fig. 33.1. Traditional filtering-based RS and deep learning-based RS are the two major types. Traditional RS are subclassified into three groups based on the filtering technique employed: content-based filtering, collaborative filtering, and hybrid techniques.
[Figure 33.1: taxonomy tree. Recommender systems divide into traditional filtering-based (collaborative, content, hybrid) and deep learning-based. Deep learning models and their recommendation scenarios/applications: RNN, sequential RS; CNN, media RS; DNN, item RS; DRL, dynamic RS; DBN, image/item RS.]
Fig. 33.1 Taxonomy of RS
Collaborative filtering is a domain-independent prediction technique for content such as movies and music that cannot be easily and adequately described by metadata [6]. The collaborative filtering method works by (i) building a database of user preferences for items, (ii) matching new users against the database to discover neighbors with historically similar tastes, and (iii) recommending items that those neighbors like to the new users.
Content-based filtering is a domain-dependent algorithm that focuses on analyzing item characteristics to make predictions. It makes recommendations based on user profiles and on characteristics gleaned from the content of items the user has already rated.
Hybrid filtering techniques combine multiple recommendation techniques to provide better system optimization and improve search results. A hybrid approach produces more effective and accurate recommendations than a single algorithm because the deficiencies of one algorithm can be corrected by another.
The other broad category of RS is developed using deep learning techniques. DLRS are further classified based on the types of learning models they use. Convolutional neural networks (CNN), recurrent neural networks (RNN), deep belief networks (DBN), deep neural networks (DNN), and deep reinforcement learning (DRL) are the major deep learning models applied in RS. Figure 33.1 also highlights the recommendation scenarios or applications of each type of deep learning-based recommender system.
Several research articles related to filtering-based RS are reviewed, and the findings and shortcomings of each proposed RS are noted. A brief comparison of the surveyed research articles on filtering-based RS techniques is given in Table 33.1.
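The three-step collaborative filtering procedure described above (build a preference database, find neighbors, recommend what the neighbors like) can be illustrated with a minimal user-based sketch. The toy ratings, the cosine similarity measure, the neighborhood size `k`, and all function names are assumptions made for illustration; none of them come from the surveyed papers.

```python
import math

# Step (i): a toy database of user preferences (hypothetical ratings on a 1-5 scale).
ratings = {
    "alice": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 4, "m2": 5, "m4": 4},
    "carol": {"m3": 5, "m5": 4},
}


def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user_ratings, db, k=2, n=3):
    """Recommend up to n unseen items using the k most similar users."""
    # Step (ii): match the new user against the database to find neighbors.
    neighbors = sorted(
        ((cosine(user_ratings, r), u) for u, r in db.items()), reverse=True
    )[:k]
    # Step (iii): score unseen items by the similarity-weighted neighbor ratings.
    scores, weights = {}, {}
    for sim, neighbor in neighbors:
        if sim <= 0:
            continue
        for item, rating in db[neighbor].items():
            if item in user_ratings:
                continue
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:n]


new_user = {"m1": 5, "m2": 5}
print(recommend(new_user, ratings))
```

In this toy data the new user resembles "alice" and "bob", so the unseen items those two rated are the only candidates, ordered by their weighted average rating. A production system would typically also normalize ratings per user and handle users with no overlapping items.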
33.2.1 Overview of Deep Learning-Based Recommender Systems
Deep learning is a subfield of artificial neural network research. A multilayer perceptron (MLP) with several hidden layers is an example of a deep learning structure. Deep learning is a recent development in machine learning, intended to mirror the way the human brain processes information such as text, images, and audio. Like machine learning in general, deep learning is split into supervised and unsupervised learning. It is used for representation learning by exploiting several layers of information-processing stages in hierarchical architectures. With the rise of deep neural networks, several achievements have been made in recommendation-related application areas. Because of their capacity to model nonlinear data, DLRS outperform standard RS. The main advantages of using deep learning for recommendations include representation learning, flexibility, sequence modeling, and nonlinear transformation. Recent research articles on DLRS are reviewed in this section.
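To make the idea of an MLP with several hidden layers concrete, the following is a minimal sketch of a forward pass that scores one user-item pair. It assumes randomly initialized, untrained weights, hypothetical embedding and layer sizes, and ReLU hidden activations with a sigmoid output; it is an illustration of the nonlinear transformation, not a method taken from any of the reviewed papers.

```python
import math
import random


def dense(vector, weights, bias):
    """Affine layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def relu(vector):
    """Element-wise nonlinearity applied after each hidden layer."""
    return [max(0.0, v) for v in vector]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlp_score(user_vec, item_vec, layers):
    """Concatenate user and item vectors, pass them through the hidden
    layers, and squash the final logit into a (0, 1) preference score."""
    hidden = user_vec + item_vec
    for weights, bias in layers[:-1]:
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = layers[-1]
    logit = dense(hidden, weights, bias)[0]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(42)
EMB = 4  # hypothetical embedding size
user_vec = [rng.uniform(-1, 1) for _ in range(EMB)]
item_vec = [rng.uniform(-1, 1) for _ in range(EMB)]
layers = [init_layer(2 * EMB, 8, rng), init_layer(8, 4, rng), init_layer(4, 1, rng)]
score = mlp_score(user_vec, item_vec, layers)
print(round(score, 4))
```

Because of the ReLU layers, the score is a nonlinear function of the two vectors, in contrast to the plain dot product used in classical matrix factorization. In practice the weights and embeddings would be learned from interaction data.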
Table 33.1 Comparison of filtering-based recommender systems
Authors and citation | Year | Type | Technique | Findings | Shortcomings
Zhang et al. [7] | 2020 | CF | Neighborhood reduction in UBCF approach | Resolves cold-start and data sparsity issues | Failed to incorporate neighborhood reduction into item-based CF
Zhao et al. [8] | 2019 | CF | Regression-based collaborative filtering | Suggests a suitable ordinary differential equation (ODE) | Computational cost is higher than a fixed-step algorithm
D’Auria et al. [9] | 2019 | CF | CF-based network representation learning approach | It computes the trust-enhanced user similarity of different users | –
Luo et al. [10] | 2020 | CF | Personalized recommender system | Discovers energy-effective home appliances from large-scale residential consumers | It does not integrate with home automation products
Cui et al. [11] | 2020 | CF | Preference pattern-based personalized RS | Gives users accurate and quick results in a short period of time | A simple data mining technique is used to study the user’s model
Yang Li et al. [12] | 2019 | CF | Deep learning approach | Minimizes the quantization loss and analyzes the issues in the deep collaborative hashing codes on user ratings | It is not suitable for the implicit feedback ratings
Natarajan et al. [13] | 2020 | CF | RS—LOD and MFLOD | Resolves the data sparsity and cold-start issues | –
Feng et al. [14] | 2020 | CF | PMF model combining the multifactor similarities and global rating information | Maximizes the quality of the RS in data sparsity conditions | This is not suitable for the incremental learning approach
Pujahari et al. [15] | 2020 | CF | Probabilistic matrix factorization model | Produces an effective ranking for items |
Sun et al. [16] | 2020 | CF | Sampling approach | Suggests an automatic sampling approach for the new defect data | Not supported by heterogeneous data. Not suitable for social and location-type RS
Wang et al. [17] | 2021 | CF | A light relational graph convolutional network (RGCN)-based CF method | Lightens the negative impacts of information sparsity and makes strides in the execution faster | Scalability and performance of RGCN are of concern for larger data sets
Fu et al. [18] | 2018 | CB | Linear programming | Minimizes the weighted-sum costs of the network | Revenue maximization protocol not designed to support bundle recommendation
Wang et al. [19] | 2018 | CB | Chi-square feature selection and softmax regression approach | Attains the interactive online response and supports the authors’ submission of their documents | Constrained capacity to grow the users’ existing interests
Ravi et al. [20] | 2019 | Hybrid | Ensemble-based co-training approach integrated swarm intelligence algorithm | Maximizes the effectiveness of the travel RS | Agreement decision-making issues in group travel are not addressed
Qian et al. [21] | 2019 | Hybrid | Emotion-aware recommender framework | Improves the RS’s effectiveness by categorizing explicit/implicit/emotional information | This framework is not suitable for the online RS
Several research articles related to the DLRS are reviewed. A brief comparison of the surveyed research in DLRS techniques is discussed in Table 33.2.
33.3 Evaluation and Analysis
This section discusses the process of selecting the research papers for review and the comparative analysis of the papers reviewed.
Table 33.2 Recent research methodologies utilized for deep learning-based RS in the years 2018–2022
Author and citation | Year | Learning model | Description | Limitations
Liu et al. [22] | 2018 | RNN | GRU approach was utilized to examine the reward sequences in different intervals and to prevent vanishing gradients during training | It is not suitable to handle large volumes of data in RS
Cui et al. [23] | 2018 | RNN | MV-RNN approach was implemented to form the sequential RS and to address the cold-start issues | Item detection and segmentation are not obtained due to the large proportion of unrelated background
Xu et al. [24] | 2019 | RNN | A slanderous user detection RS (SDRS) incorporating hierarchical dual attention RNN (HDAN) and a modified GRU (mGRU) approach was utilized to estimate the opinion range for the reviews | It does not resolve the problems of data sparsity and cold-start
Tian et al. [25] | 2020 | RNN | Generic recommender systems based on deep RNN were utilized to improve the practicability and transferability of RS and enhance the training process | It is a computationally complex method
Wang et al. [26] | 2020 | RNN | CDHRM was utilized to integrate the cross-domain sequential data by discovering the connections between the cross-domain behavior of the consumers | The presented work failed to integrate more kinds of data like behavior’s content
Chen et al. [27] | 2019 | CNN | Contextual data of the document was intensively captured by the deformable convolutional network matrix factorization (DCNMF) | Too many recommendation filters lead the system to produce information overload
Shu et al. [28] | 2018 | CNN | CNN-based content-based recommendation algorithm to forecast the text data’s latent factors in the multimedia resources | Inability to be spatially invariant for the supplied data
Zhang et al. [29] | 2020 | CNN | Observes the maximum gap dependence between auxiliary information of the various items to rectify the data sparsity issue in recommendation | Auxiliary data of users and time sequence rating data are not considered
Da’u et al. [30] | 2021 | CNN | The accuracy of item recommendations was improved by leveraging the neural attention approach to develop adaptable user-item representations and user-item interaction | Requires significant time to train, and hence users may not see immediate impact
Farhan et al. [31] | 2020 | CNN | CNN-based bug triage approach automatically suggests the proper developer by learning the past assignment patterns | Not suitable for larger bug data sets
Gong et al. [32] | 2019 | DNN | Hybrid deep neural network model integrating network embeddings and attribute attention (DFRec++) to construct social friend recommendations using interactive semantics and contextual improvement | Not considered an effective method because the prediction accuracy is low
Zhang et al. [33] | 2018 | DNN | To resolve the cold-start issue, a DNN-based CF recommendation algorithm was used to obtain accurate latent features by improving the conventional matrix factorization algorithm | Due to the large training model, there is a burden on computational resources
Ma et al. [34] | 2020 | DNN | MISR approach was utilized to extract the features and hidden structures from different kinds of interactions between services and mashups to rectify the cold-start issues | This is not suitable for the service recommendation scenario
Zhang et al. [35] | 2021 | DNN | A DNN-based dual adversarial network aimed to address the accuracy and quality issues in cross-domain RS | May not perform well in other target domains with data sparsity issues
Liu et al. [36] | 2020 | DRL | To rectify the issues in the interactive RS, the DRL-based recommendation approach was employed | Optimizing long-term performance leads to a negative impact on the system
Xiao et al. [37] | 2020 | DRL | A DRL-based user profile perturbation model was utilized to secure the consumer’s privacy and to select the privacy budget against attackers | Does not provide effective privacy protection for the user profile perturbation system
Mulani et al. [38] | 2020 | DRL | An RS based on the DRL approach was utilized for the healthcare field to give an appropriate medical suggestion | People’s dynamic sentiments were not captured, and the long-term impacts were not considered
Chang et al. [39] | 2019 | DRL | RPMRS is utilized to recognize the consumer’s preferences from the song search list | Computationally hard to process
Fu et al. [40] | 2021 | DRL | DRL-based deep hierarchical category-based RS (DHCRS) reconstructs the flat action space into a two-level item and category hierarchy using categories of items | DHCRS relies on explicit category classification. However, items without well-defined categories may exist in the real world
Pujahari et al. [41] | 2019 | DBN | Preference relation-based restricted Boltzmann machine and collaborative filtering approach were utilized to maximize the standard of the RS | While changing the user preference, this approach does not perform efficiently
Hazrati et al. [42] | 2020 | DBN | The visual features-based RS is employed for a human demonstration-free RS and to present the visual features of the video items | Not suitable for facial expressions within the video
Chen et al. [43] | 2020 | DBN | Conditional restricted Boltzmann machine (CRBM) was employed to forecast the rating preference for the top-k RS | It is not suitable for data points like geographic location, text comments of users on items, and information about users’ social networks
Zhang et al. [44] | 2021 | DNN | The proposed model is able to correct prediction values and learn high-dimensional and nonlinear interactions between services and customers by combining an MLP and a similarity adaptive corrector | It requires significant time to train with real-time data sets. The impact of context information is not considered
Wu et al. [45] | 2021 | DNN | Developed a neural architecture for social recommendation that naturally integrates the relationship between user-item interaction behavior and social network structure | Dynamic changes in the social network structure are not handled
Liang [46] | 2022 | DRL | A multi-iteration training procedure teaches an RL agent how to interact with the external heterogeneous information network environment and collect rewards | Instead of being trained and tested in a real-time, dynamic online environment, the learning process is conducted offline
Li et al. [47] | 2022 | CNN | An auxiliary review-based personalized attentional CNN (ARPCNN) is presented to address the sparsity issue and improve explainability | Performance and training time are not evaluated with real-time data sets
Lei et al. [48] | 2020 | DRL | With the help of the social attention layer to model the influence of individual users and social neighbors, the Social Attentive Deep Q-Network (SADQN) is able to approximate the optimized value function | When used in production systems with billions of items, SADQNs may experience computing problems
Du et al. [49] | 2019 | CNN | An attention-based CNN was developed to measure the degree to which user preferences and suggested articles are similar, and it greatly increases performance even with a small number of user-rated articles | This model combines the data from diverse support articles at a high level
Huang et al. [50] | 2021 | CNN | To make the user and item vectors more explainable, the neural explicit factor model (NEFM) adds both a user-feature attention matrix and an item-feature quality matrix | It works only based on the item feature and does not support description-based data
Zheng et al. [51] | 2022 | DNN | To address the issue of nonlinear connection between users and items and the challenge of information sparsity, a new model, the DNN matrix factorization recommendation model (K-DNNMF), was proposed | This approach does not handle the dynamic nature of the user-item relationship
Feng et al. [52] | 2022 | DNN | The Social Regularized Neural Framework Factorization (SoNeuMF) model is presented to reconstruct user-to-user and user-to-item interactions for social recommenders | The dynamic nature of deep networks for social recommendations is not taken into consideration because interactions continuously evolve
33.3.1 Selection Process
Deep learning approaches are applied to recommendation because of the following four key benefits:
• Nonlinear transformation: Compared to traditional linear RS, deep learning-based RS are capable of processing complex user/item interaction patterns.
• Sequence modeling: Deep neural networks are a good fit for sequence modeling-based next-basket or session-based recommendations.
• Representation learning: Since a huge amount of descriptive data is available about user/item interactions, representation learning helps in making better recommendations.
• Flexibility: Deep learning-based RS can build hybrid recommendation models to tailor recommendations to any need.
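The sequence modeling benefit listed above can be sketched with a single gated recurrent unit (GRU) that reads a session item by item and then ranks candidate next items. This is a minimal, untrained illustration: the random weights, the item identifiers, the shared dimension for embeddings and hidden state, the omission of bias terms, and the dot-product scoring are all assumptions chosen for brevity, not a reproduction of any reviewed model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_gru(input_dim, hidden_dim, rng):
    """Input (W*) and recurrent (U*) weights for the update gate z,
    the reset gate r, and the candidate state h. Biases are omitted."""
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = random_matrix(hidden_dim, input_dim, rng)
        params["U" + gate] = random_matrix(hidden_dim, hidden_dim, rng)
    return params


def gru_step(h, x, p):
    """One GRU update: gates decide how much of the old state to keep."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated_h))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode_session(item_ids, embeddings, params, hidden_dim):
    """Fold the ordered session into one hidden state vector."""
    h = [0.0] * hidden_dim
    for item_id in item_ids:
        h = gru_step(h, embeddings[item_id], params)
    return h


def rank_next_items(session, embeddings, params, hidden_dim):
    """Rank unseen items by dot product with the session state."""
    h = encode_session(session, embeddings, params, hidden_dim)
    scores = {
        item: sum(a * b for a, b in zip(h, emb))
        for item, emb in embeddings.items()
        if item not in session
    }
    return sorted(scores, key=scores.get, reverse=True)


rng = random.Random(0)
DIM = 4  # hypothetical size shared by item embeddings and the hidden state
embeddings = {i: [rng.uniform(-1, 1) for _ in range(DIM)] for i in ["i1", "i2", "i3", "i4", "i5"]}
params = init_gru(DIM, DIM, rng)
session = ["i1", "i3"]
state = encode_session(session, embeddings, params, DIM)
ranking = rank_next_items(session, embeddings, params, DIM)
print(ranking)
```

With untrained weights the ranking itself carries no meaning; the point is only the data flow, in which the hidden state summarizes the ordered session before scoring. A trained model would learn the weights and embeddings from logged sessions.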
[Figure 33.2: bar chart titled "Articles selection yearwise"; x-axis: publication year (2018–2022); y-axis: article counts.]
Fig. 33.2 Number of published research articles taken for analysis (year-wise)
[Figure 33.3: bar chart titled "Journal wise article selection"; x-axis: publisher (IEEE, Elsevier, Springer, Wiley/Others); y-axis: article counts.]
Fig. 33.3 Number of published research articles taken for analysis (journal-wise)
The articles analyzed in this review describe research work that (i) leveraged the above-mentioned deep learning capabilities to improve RS, and (ii) was published between 2018 and 2022 in four renowned journal databases: ScienceDirect, IEEE Xplore, Springer, and Wiley. Figure 33.2 shows the breakdown of the research papers reviewed (both traditional RS and DLRS) by year of publication. Figure 33.3 shows the number of research articles reviewed by journal database. The analysis reveals that deep learning methods are widely and actively used in current research. This review highlights 32 research articles on DLRS-related topics, covering RNNs, CNNs, DNNs, DBNs, and DRL for recommender systems, published between 2018 and 2022.
33.3.2 Comparative Analysis
The goal of an RS is to suggest relevant items to the user. A variety of strategies and metrics have been introduced to evaluate RS. The research papers analyzed rely mainly on four common families of metrics: predictive accuracy metrics, decision support metrics, rank accuracy metrics, and business-specific measures.
• Predictive accuracy metrics: Predictive accuracy, often known as a rating prediction metric, assesses how close the ratings predicted by the RS are to actual consumer ratings. This type of metric is commonly used to evaluate non-binary ratings. It is best used when accurate rating predictions are important. The most essential measures are: (i) mean absolute error (MAE), (ii) mean squared error (MSE), (iii) root mean squared error (RMSE), and (iv) normalized mean absolute error (NMAE).
• Decision support metrics: Decision support metrics give insight into how helpful the recommender was in helping users make better choices by selecting good items and avoiding bad ones. Precision and recall are two of the most frequently used metrics.
• Rank accuracy metrics: An RS’s capacity to estimate the appropriate order of items based on the user’s preferences is measured by rank accuracy, or ranking prediction. This is particularly useful in scenarios where a long, sorted list of recommended items is provided to the user.
• Business-specific measures: Many factors influence how a company evaluates the efficacy and financial worth of a deployed RS, including the application domain and, more importantly, the firm’s business strategy. Click-through rates, effects on sales distribution, sales and revenue, customer engagement and behavior, and conversion and adoption are examples of business metrics for RS.
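The first three metric families above can be written down in a few lines. The sketch below uses their standard definitions; the NMAE normalization by the rating range, the binary-relevance NDCG as a representative rank accuracy metric (the text does not name a specific one), and the sample data are assumptions made for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(actual, predicted))


def nmae(actual, predicted, r_min, r_max):
    """MAE normalized by the width of the rating scale (assumed convention)."""
    return mae(actual, predicted) / (r_max - r_min)


def precision_recall_at_k(recommended, relevant, k):
    """Decision support: share of the top-k that is relevant, and share
    of the relevant items that made it into the top-k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k, hits / len(relevant)


def ndcg_at_k(recommended, relevant, k):
    """Rank accuracy with binary relevance: hits near the top count more."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0


actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 4]
print(mae(actual, predicted), mse(actual, predicted), rmse(actual, predicted))
print(nmae(actual, predicted, r_min=1, r_max=5))

recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_recall_at_k(recommended, relevant, k=3))
print(ndcg_at_k(recommended, relevant, k=3))
```

Business-specific measures such as click-through rate depend on logged production traffic, so they are not sketched here.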
33.4 Open Challenges
DLRS are superior to traditional RS because of their ability to process nonlinear data. The main advantages of using deep learning for recommendations include nonlinear transformation, representation learning, sequence modeling, and flexibility. Deep learning algorithms can also be customized for specific needs. CNNs, for example, are well suited to grid-structured data such as images and text, while RNNs are well suited to sequential data processing. Auto-encoders aid data dimensionality reduction, and neural attention-based systems are excellent for filtering the needed data and selecting the most representative objects. On the basis of the survey results, we propose a few potential future directions for DLRS.
• Cross-domain DLRS: Because it can identify generalizations and differences across multiple domains, deep learning is effective in producing good recommendations on cross-domain platforms. Cross-domain recommendation can use knowledge learned in the source domain to help recommendation in the target domain. A single-domain RS focuses entirely on one domain and ignores the consumers’ preferences in other domains, which results in cold-start and data sparsity difficulties. Because deep learning learns complex abstractions that untangle the variation between domains, it is highly suited to transfer learning, a method for incorporating knowledge from other domains into learning tasks in one domain. Cross-domain RS can therefore employ transfer learning, and this is a promising direction for DLRS research.
• Explainability in DLRS: By combining heterogeneous data from multiple sources with deep learning models, DLRS can directly predict user preferences. The weights between the neurons of the network are determined by model training, so it is difficult to provide an adequate explanation that is directly related to the recommended outcomes. Offering explainable recommendations is therefore hard. One option is to provide consumers with explainable predictions that allow them to comprehend the reasons behind the recommendations, giving a good reason why the RS considers a suggestion reasonable. Another option is to focus on explainability to the practitioner by probing weights and activations to understand more about the model. Explainability is thus another possible DLRS research topic, with the next step being to build good deep learning-based RS that can deliver conversational or generative explanations.
• Scalability of DLRS: Scalability is essential to the practical application of RS as the amount of data increases. Deep learning has demonstrated tremendous success and promise in processing large amounts of data. As the number of parameters grows exponentially, a compromise must be struck between model complexity and scalability. Network scaling is now moving toward deeper layers with limited parameters. Another interesting research topic is the compression of high-dimensional input data into compact embeddings to reduce the memory needed for training the model and the computation time. Future studies will focus on compressing redundant parameters to improve network accuracy while reducing the number of parameters.
• Security and Privacy: Most researchers overlook key issues like user privacy and system security. For instance, because the connections between users and products are made public when DLRS and a knowledge graph (KG) are used to make explainable recommendations, personal privacy is potentially jeopardized. DLRS can be utilized to select a privacy budget and mitigate inference attacks, while differential privacy is typically utilized to safeguard user privacy. According to recent research, DNNs are also susceptible to attacks such as adversarial attacks and data poisoning.
• Multitask learning in DLRS: Many deep learning applications, from visual assistants to robotics, have shown great success due to recent developments in multitask learning. Several of the studies evaluated also used multitask learning in a deep neural network to increase RS performance over single-task learning. The benefits of deep neural network-based multitask learning include reduced overfitting, explainable recommendations, and alleviated data sparsity. Further studies may consider deploying multitask learning in cross-domain RS.
• Attention-based DLRS: The interpretability problems of deep learning models have been mitigated to some extent by attention mechanisms. Because attention weights offer insight into the model’s inner workings, the attention mechanism has brought higher degrees of interpretability and can give explainable results to the users. Applying the attention mechanism to RS can aid in determining the most informative aspects of items, suggesting the most representative items, and improving the model’s interpretability. The attention mechanism is now being used in DLRS. For example, it helps an RNN memorize inputs better, and attention-based CNNs can determine the most significant parts of the input. We believe that this is a potential research direction, given that models can now capture the most informative components of the inputs.
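The way attention weights expose which inputs matter most, as discussed in the last item above, can be seen in a minimal scaled dot-product attention sketch. The query is taken to be a user vector and the keys and values to be item vectors; these roles, the toy numbers, and the function names are assumptions for illustration rather than the design of any surveyed model.

```python
import math


def softmax(scores):
    """Numerically stable softmax: weights are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights (one per key) and the weighted
    average of the value vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# Hypothetical user vector and three item vectors used as keys and values.
user = [1.0, 0.0]
items = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = attend(user, items, items)
print([round(w, 3) for w in weights])
```

The item whose vector is most aligned with the user vector receives the largest weight, and that weight can be shown to the user as a simple indication of which past item drove a recommendation.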
33.5 Conclusion
The sharp increase in data generated by electronic and automated equipment necessitates the development of smart technologies and applications that store, process, access, and analyze data appropriately and carefully for the greatest benefit to the user. The main aim of this article is to examine the recent advances in RS technologies developed using traditional and deep learning frameworks. This article discussed the techniques, findings, and shortcomings of each filtering-based technology (collaborative, content-based, and hybrid methods) used in traditional recommendation systems. Furthermore, this work critically reviews recent research findings and their limitations in the deep learning approaches used in RS. Unlike traditional RS, deep learning techniques used in RS combine different types of heterogeneous data from multiple sources and model sequential patterns of user behavior. They automatically learn various characteristics of the users and items and effectively reflect the various preferences of the users, thereby improving the accuracy of recommendations. This paper is intended to help new and beginner researchers comprehend the current evolution of traditional filtering-based RS and DLRS. Expert researchers can also use this study as a reference for developing advanced RS and for understanding their limitations.
References 1. Zhang, X., Liu, H., Chen, X., Zhong, J., Wang, D.: A novel hybrid deep recommendation system to differentiate user’s preference and item’s attractiveness. Inf. Sci. 519, 306–316 (2020) 2. Shahbazi, Z., Hazra, D., Park, S., Byun, Y.C.: Toward improving the prediction accuracy of product recommendation system using extreme gradient boosting and encoding approaches. Symmetry 12(9), 1566 (2020) 3. Lin, H., Huang, Y., Luo, Y.: The construction of learning resource recommendation system based on recognition technology. In: Smart Innovations in Communication and Computational Sciences, pp. 255–261. Springer, Singapore (2021) 4. Sreepada, R.S., Patra, B.K.: Enhancing long tail item recommendation in collaborative filtering: an econophysics-inspired approach. Electron. Commer. Res. Appl. 49 (2021), Article No. 101089 5. Chiu, M.C., Chen, T.C.T.: Assessing sustainable effectiveness of the adjustment mechanism of a ubiquitous clinic recommendation system. Health Care Manag. Sci. 23(2), 239–248 (2020)
33 Deep Learning-Based Recommender Systems—A Systematic Review …
395
6. Vithya, M., Sangaiah, S.: Recommendation system based on optimal feature selection algorithm for predictive analysis. In: Emerging Research in Data Engineering Systems and Computer Communications, pp. 105–119. Springer, Singapore (2020) 7. Zhang, Z., Zhang, Y., Ren, Y.: Employing neighborhood reduction for alleviating sparsity and cold start problems in user-based collaborative filtering. Inf. Retrieval J. 23(4), 449–472 (2020) 8. Zhao, J., Wang, H., Zhang, H.: A regression-based collaborative filtering recommendation approach to time-stepping multi-solver co-simulation. IEEE Access 7, 22790–22806 (2019) 9. Wang, W., Chen, J., Wang, J., Chen, J., Liu, J., Gong, Z.: Trust-enhanced collaborative filtering for personalized point of interests recommendation. IEEE Trans. Ind. Inf. 16(9), 6124–6132 (2019) 10. Luo, F., Ranzi, G., Kong, W., Liang, G., Dong, Z.Y.: Personalized residential energy usage recommendation system based on load monitoring and collaborative filtering. IEEE Trans. Ind. Inf. 17(2), 1253–1262 (2020) 11. Cui, Z., Xu, X., Fei, X.U.E., Cai, X., Cao, Y., Zhang, W., Chen, J.: Personalized recommendation system based on collaborative filtering for IoT scenarios. IEEE Trans. Serv. Comput. 13(4), 685–695 (2020) 12. Li, Y., Wang, S., Pan, Q., Peng, H., Yang, T., Cambria, E.: Learning binary codes with neural collaborative filtering for efficient recommendation systems. Knowl.-Based Syst. 172, 64–75 (2019) 13. Natarajan, S., Vairavasundaram, S., Natarajan, S., Gandomi, A.H.: Resolving data sparsity and cold start problem in collaborative filtering recommender system using linked open data. Expert Syst. Appl. 149 (2020), Article No. 113248 14. Feng, C., Liang, J., Song, P., Wang, Z.: A fusion collaborative filtering method for sparse data in recommender systems. Inf. Sci. 521, 365–379 (2020) 15. Pujahari, A., Sisodia, D.S. : Pair-wise preference relation based probabilistic matrix factorization for collaborative filtering in recommender system. Knowl. 
Based Syst. 196 (2020), Article No.105798 16. Sun, Z., Zhang, J., Sun, H., Zhu, X.: Collaborative filtering based recommendation of sampling methods for software defect prediction. Appl. Soft Comput. 90(6), Article No.106163 (2020) 17. Wang, C., Guo, Z., Li, G., Li, J., Pan, P., Liu, K.: A light heterogeneous graph collaborative filtering model using textual information. Knowl. Based Syst. 234 (2021), Article No.107602 18. Fu, Y., Yu, Q., Quek, T.Q., Wen, W.: Revenue maximization for content-oriented wireless caching networks (CWCNs) with repair and recommendation considerations. IEEE Trans. Wireless Commun. 20(1), 284–298 (2020) 19. Wang, D., Liang, Y., Xu, D., Feng, X., Guan, R.: A content-based recommender system for computer science publications. Knowl. Based Syst. 157, 1–9 (2018) 20. Ravi, L., Subramaniyaswamy, V., Vijayakumar, V., Chen, S., Karmel, A., Devarajan, M.: Hybrid location-based recommender system for mobility and travel planning. Mob. Netw. Appl. 24(4), 1226–1239 (2019) 21. Qian, Y., Zhang, Y., Ma, X., Yu, H., Peng, L.: EARS: emotion-aware recommender system based on hybrid information fusion. Inf. Fusion 46, 141–146 (2019) 22. Liu, J., Wu, C., Wang, J.: Gated recurrent units based neural network for time heterogeneous feedback recommendation. Inf. Sci. 423, 50–65 (2018) 23. Cui, Q., Wu, S., Liu, Q., Zhong, W., Wang, L.: MV-RNN: A multi-view recurrent neural network for sequential recommendation. IEEE Trans. Knowl. Data Eng. 32(2), 317–331 (2018) 24. Xu, Y., Yang, Y., Han, J., Wang, E., Ming, J., Xiong, H.: Slanderous user detection with modified recurrent neural networks in recommender system. Inf. Sci. 505, 265–281 (2019) 25. Tian, Y., Peng, S., Zhang, X., Rodemann, T., Tan, K.C., Jin, Y.: A recommender system for metaheuristic algorithms for continuous optimization based on deep recurrent neural networks. IEEE Trans. Artif. Intell. 1(1), 5–18 (2020) 26. 
Wang, Y., Guo, C., Chu, Y., Hwang, J.N., Feng, C.: A cross-domain hierarchical recurrent model for personalized session-based recommendations. Neurocomputing 380, 271–284 (2020) 27. Chen, H., Fu, J., Zhang, L., Wang, S., Lin, K., Shi, L., Wang, L.: Deformable convolutional matrix factorization for document context-aware recommendation in social networks. IEEE Access 7, 66347–66357 (2019)
S. Krishnamoorthi and G. K. Shyam
28. Shu, J., Shen, X., Liu, H., Yi, B., Zhang, Z.: A content-based recommendation algorithm for learning resources. Multimedia Syst. 24(2), 163–173 (2018) 29. Zhang, C., Wang, C.: Probabilistic matrix factorization recommendation of self-attention mechanism convolutional neural networks with item auxiliary information. IEEE Access 8, 208311–208321 (2020) 30. Da’u, A., Salim, N., Idris, R.: An adaptive deep learning method for item recommendation system. Knowl.-Based Syst. 213(8) (2021). Article No.106681 31. Zaidi, S.F.A., Awan, F.M., Lee, M., Woo, H., Lee, C.G.: Applying convolutional neural networks with different word representation techniques to recommend bug fixers. IEEE Access 8, 213729–213747 (2020) 32. Gong, J., Zhao, Y., Chen, S., Wang, H., Du, L., Wang, S., Du, B.: Hybrid deep neural networks for friend recommendations in edge computing environment. IEEE Access 8, 10693–10706 (2019) 33. Libo, Z., Tiejian, L., Fei, Z.: A recommendation model based on deep neural network. IEEE Access 6, 9454–9463 (2018) 34. Ma, Y., Geng, X., Wang, J.: A deep neural network with multiplex interactions for cold-start service recommendation. IEEE Trans. Eng. Manage. 68(1), 105–119 (2020) 35. Zhang, Q., Liao, W., Zhang, G., Yuan, B., Lu, J.: A Deep Dual Adversarial network for Crossdomain recommendation. IEEE Trans. Knowl. Data Eng. 1–1 (2021) 36. Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., & He, X., 2020. “State representation modeling for deep reinforcement learning based recommendation”. Knowledge-Based Systems, 205, Article No.106170. 37. Chen, X., Yao, L., McAuley, J., Zhou, G., Wang, X.: A survey of deep reinforcement learning in recommender systems: a systematic review and future directions (2021). arXiv preprint arXiv: 2109.03540 38. Mulani, J., Heda, S., Tumdi, K., Patel, J., Chhinkaniwala, H., Patel, J.: Deep reinforcement learning based personalized health recommendations. In: Deep Learning Techniques for Biomedical and Health Informatics, pp. 231–255. 
Springer, Cham (2020) 39. Chang, J.W., Chiou, C.Y., Liao, J.Y., Hung, Y.K., Huang, C.C., Lin, K.C., Pu, Y.H.: Music recommender using deep embedding-based features and behavior-based reinforcement learning. Multimedia Tools Appl. 80(26), 34037–34064 (2021) 40. Fu, M., Agrawal, A., Irissappane, A.A., Zhang, J., Huang, L., Qu, H.: Deep reinforcement learning framework for category-based item recommendation. IEEE Trans. Cybern. 1–14 (2021) 41. Pujahari, A., Sisodia, D.S.: Modeling side information in preference relation based restricted boltzmann machine for recommender systems. Inf. Sci. 490, 126–145 (2019) 42. Hazrati, N., Elahi, M.: Addressing the New Item problem in video recommender systems by incorporation of visual features with restricted Boltzmann machines. Expert Syst. 38(6) (2021). Article No. 12645 43. Chen, Z., Ma, W., Dai, W., Pan, W., Ming, Z.: Conditional restricted Boltzmann machine for item recommendation. Neurocomputing 385, 269–277 (2020) 44. Zhang, Y., Yin, C., Wu, Q., He, Q., Zhu, H.: Location-aware deep collaborative filtering for service recommendation. IEEE Trans. Syst. Man Cybern. Syst. 6, 3796–3807 (2021) 45. Wu, L., Member, Sun, P., Hong, R., Ge, Y., Wang, M.: Collaborative neural social recommendation. IEEE Trans. Syst. Man Cybern. Syst. 51(1) 464–476 (2021) 46. Liang, H.: DRprofiling: deep reinforcement user profiling for recommendations in heterogenous information networks. IEEE Trans. Knowl. Data Eng. 34(4), 1723–1734 (2022) 47. Li, Z., Chen, H., Ni, Z., Deng, X., Liu, B., Liu, W.: ARPCNN: auxiliary review based personalized attentional CNN for trustworthy recommendation. IEEE Trans. Ind. Inf. 1, 1–11 (2022) 48. Lei, Y., Wang, Z., Li, W., Pei, H., Dai, Q.: Social attentive deep Q-networks for recommender systems. IEEE Trans. Knowl. Data Eng. 34(4), 2443–2457 49. Du, Z., Tang, J., Ding, Y.: POLAR++: active one-shot personalized article recommendation. IEEE Trans. Knowl. Data Eng. 33(6):2709–2722 (2019)
33 Deep Learning-Based Recommender Systems—A Systematic Review …
50. Hung, H., Luo, S., Tian, X., Yang, S., Zhang, X.: Neural explicit factor model based on item features for recommendation systems. IEEE Access 9, 58448–58454 (2021) 51. Zheng, X., Ni, Z., Zhong, X., Luo, Y.: Kernelized deep learning for matrix factorization recommendation system using explicit and implicit information. IEEE Trans. Neural Netw. Learn. Syst. 1, 1–12 (2022) 52. Feng, X., Liu, Z., Wu, W., Zuo, W.: Social recommendation via deep neural network-based multi-task learning. Expert Syst. Appl. 206 (2022), Article No. 117755
Chapter 34
Hybrid Security Against Black Hole and Sybil Attacks in Drone-Assisted Vehicular Ad Hoc Networks Aryan Abdlwhab Qader, Mohammed Hasan Mutar, Sameer Alani, Waleed Khalid Al-Azzawi, Sarmad Nozad Mahmood, Hussein Muhi Hariz, and Mustafa Asaad Rasol
Abstract Vehicular ad hoc networks (VANETs) are a trending research area and are used in most intelligent transportation system (ITS) applications. The major challenges during communication between vehicles are delay caused by obstacles at ground level and packet loss caused by attacks. To overcome these drawbacks, hybrid security-based drone-assisted VANETs are introduced in this paper. The proposed approach is based on three transmission models: (i) vehicle to vehicle (V-V), (ii) vehicle to drone (V-D), and (iii) drone to drone (D-D). Drones are launched in the VANET to create obstacle-free communication. Hybrid security is performed in two stages. In the first stage, trusted flag-based security is provided to both the V-V and V-D communication models. In the second stage, trust evaluation is performed in D-D communication through trust score calculation. To analyze the performance of the hybrid security approach, black hole and Sybil attacks are introduced in the network. The parameters considered for the performance analysis are packet delivery ratio, throughput, packet loss, and delay. The values are calculated and compared with the RSAH method under black hole and Sybil attacks. Finally, the proposed hybrid security approach achieves 9% higher packet delivery ratio, 200 Kbps higher throughput, 350 packets lower packet loss, and 150 ms lower delay when compared with the earlier method.
A. A. Qader (B) Department of Computer Technical Engineering, Bilad Alrafidain University College, Baghdad, Diyala 32001, Iraq
M. H. Mutar Department of Computer Technical Engineering, College of Information Technology, Imam Ja’afar Al-Sadiq University, Baghdad, Al-Muthanna 66002, Iraq
S. Alani University of Mashreq, Research Center, Baghdad, Iraq
W. K. Al-Azzawi Department of Medical Instruments Engineering Techniques, Al-Farahidi University, Baghdad, Iraq
S. N. Mahmood Computer Technology Engineering, College of Engineering Technology, Al-Kitab University, Altun Kupri, Iraq
H. M. Hariz Department of Computer Techniques Engineering, Mazaya University College, Dhi-Qar, Annasiriyah 64001, Iraq
M. A. Rasol Department of Medical Device Industry Engineering, College of Engineering Technology, National University of Science and Technology, Dhi Qar, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_34
34.1 Introduction
VANETs are groups of vehicles that are widely used in ITS-based advanced communication systems. The vehicles in a VANET are equipped with an onboard unit (OBU), which allows them to communicate with other devices [1–3]. VANETs perform vehicle-to-vehicle and vehicle-to-infrastructure data transmission using the dedicated short-range communication (DSRC) protocol. The specialty of VANETs is that they can provide cost-effective, speedy transmission with security measures. Because of their distinctive characteristics, such as highly dynamic behavior, vehicle speed, and complex scenarios, effective communication is the biggest challenge in VANETs, owing to connectivity issues, obstacles, and so on [4]. In most earlier VANET research, the network suffered from delay and packet loss during communication [5]. Additionally, securing VANETs against external threats is an essential task [6]. Vehicle-to-vehicle communication becomes ineffective when obstacles are present in the network; to overcome this issue, drones are introduced in VANETs to create an obstacle-free network. Various types of routing protocols support drone communication, such as geographical, topology, cluster, and geo-based routing, but these protocols produce high delay and packet loss during communication between the drones [7]. The most suitable routing protocols for drones are position-based routing protocols. The drones are updated by transmitting periodic hello packets before communication. The main aim of the routing protocol is to provide optimal communication services to the vehicles or drones in the network [8, 9]. From the above discussion, the major drawbacks of recent VANET techniques are the delay caused by obstacles in ground-level communication and the threats that arise from the lack of security in the network. To overcome these drawbacks, hybrid security is introduced in drone-assisted VANETs in this research work. The contributions of this study are: (i) Drone-assisted VANETs are introduced to improve the effectiveness of communication by creating an obstacle-free network. (ii) To minimize delay during communication, drone-to-vehicle and drone-to-drone communication is performed. (iii) To improve the security of the network, hybrid security is introduced in VANETs, which performs trust evaluation in both vehicle and drone-to-drone communication. (iv) To assess network performance, the parameters considered for analysis are packet delivery ratio, throughput, delay, and packet loss. The remainder of this paper is organized as follows: In Sect. 34.2, the earlier works related to drone-assisted VANETs are discussed. In Sect. 34.3, the network model is elaborated. In Sect. 34.4, the proposed hybrid security is explained. In Sect. 34.5, performance analysis is performed. In Sect. 34.6, the conclusion with future work is given.
34.2 Related Works
In [10], the authors proposed a UAV-assisted data dissemination scheduling strategy to predict the mobility pattern in VANETs. The algorithms used in this process are the recursive least squares (RLS) algorithm and the maximum vehicle coverage (MVC) algorithm. The simulation shows good results in terms of delay and throughput, but the approach fails to improve the packet delivery ratio. In [11], the authors presented a robust routing scheme for high-level communication in VANETs that helps to increase energy efficiency in UAV communication. This method offers better reliability but fails to reduce delay and packet loss in the network. In [12–14], efficient data scheduling schemes were developed to enhance the quality of service (QoS) of VANETs. Graph theory is used to reduce random data transmission, and a data scheduling scheme is used to reduce delay. However, this method fails to achieve high throughput and packet delivery ratio. In [15], the authors presented a trust and priority-based drone-assisted Internet of vehicles (TPDA-IoV) scheme to improve the overall performance of the network. The simulation shows better results in terms of packet delivery ratio, packet loss, and delay, but the scheme fails to achieve high throughput. In [16, 17], the authors created hybrid communication between vehicles to achieve effective communication. The results show better performance in terms of delay and throughput, but the approach fails to achieve a high packet delivery ratio and low packet loss during communication. In [18], the authors proposed a trust-based adaptive privacy-preserving authentication scheme to improve the trust value of vehicles. With this method, variable trust values are assigned to vehicles according to their priority, which helps to improve the security of the network. While this method yields favorable results in terms of packet delivery ratio, it falls short of achieving high efficiency and throughput. In [19], the authors developed a hybrid optimization-based Deep Maxout Network (DMN) to secure the network from external threats and attacks. A hybrid optimization-based cluster head (CH) selection is performed, which helps to achieve high efficiency and trust score, but it produces more overhead during communication in VANETs.
In [20], the authors presented an extended review of the performance of machine learning and artificial intelligence against attacks in VANETs. In [21], the authors proposed a blockchain-based routing model using the Optimized Link State Routing (OLSR) protocol. Thanks to the Multi-point Relay (MPR) model, malicious vehicles are isolated in the network, which increases efficiency, but this method is not suitable for high-speed VANETs. In [22], the authors proposed a trust management model based on blockchain technology. A hidden Markov model (HMM) is used for vehicle trust evaluation. This method increases the security of the network but fails to achieve high efficiency. In [23], the authors presented a privacy-preserving mutual authentication scheme, a novel approach to improving security and privacy in group communication. Its performance is moderate, and it is not suitable for high-speed networks. In [24–26], the authors investigated the performance of black hole and Sybil attacks in VANETs. To improve the security of VANETs, the RSA algorithm is proposed. The RSA algorithm can identify the node from which the information was sent, so it can secure both the information and the nodes from the attacker. It produces better security results in combination with the AODV routing protocol [27, 28], but it fails to reduce the overhead produced during communication. The analysis of recent VANET research shows that open research challenges remain in reducing delay and packet loss and in addressing the lack of security. To overcome these issues, in this research drones are introduced in VANETs, and hybrid security is applied to improve network security [29].
34.3 Network Model
The network transmission model is built from the following modules: (i) vehicle-to-vehicle (V-V) communication, (ii) vehicle-to-drone (V-D) communication, and (iii) drone-to-drone (D-D) communication, with the data finally collected by the trusted authority (TA). The principles of each module are described below.
34.3.1 Vehicle-to-Vehicle (V-V) Communication
Vehicles are transportation devices equipped with an OBU and a tamper-proof device (TPD), which is responsible for sensing and storing data. Each vehicle maintains its own coverage area, and any vehicle can transfer data to another vehicle within its coverage area. Drones are used to transfer information from the source to the destination in two situations: (i) if the source is far away from the destination and (ii) if many obstacles are present in the path.
34.3.2 Vehicle-to-Drone (V-D) Communication
Drones have the special authority to communicate with any vehicle without the intervention of the TA. The TA therefore acts as a data provider as well as a drone monitor. It provides drones with all the facilities, such as energy, maximum density, and decision-making capacity, needed to handle communication in the network. There is no direct communication between the vehicles and the TA; because drones are obstacle-free data collectors, transmission takes place only through the drones at any instant of time.
34.3.3 Drone-to-Drone (D-D) Communication
Drones are introduced in VANETs to provide obstacle-free communication between vehicles, and they help to protect the network from traffic and link failure, etc. Using drones greatly increases the stability of the network. Routing among drones differs from the vehicle routing model: vehicles use topology-based routing, and drones use position-based routing to perform data transmission. Drones are responsible for collecting the required information and details about the road segments and for keeping the TA informed.
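The text does not say which position-based protocol the drones run. As an illustration only, the sketch below assumes greedy geographic forwarding, in which a drone hands the packet to the in-range neighbor that is closest to the destination. The names `Drone` and `next_hop` and the coordinates are hypothetical; the 800 m radio range is taken from the drone range listed in Sect. 34.4.2.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Drone:
    name: str
    x: float  # position in metres, e.g. learned from periodic hello packets
    y: float


def distance(a: Drone, b: Drone) -> float:
    """Euclidean distance between two drones."""
    return math.hypot(a.x - b.x, a.y - b.y)


def next_hop(current, destination, drones, radio_range=800.0):
    """Greedy forwarding (assumed technique): return the in-range neighbor
    closest to the destination, or None if no neighbor makes progress."""
    if distance(current, destination) <= radio_range:
        return destination
    neighbors = [d for d in drones
                 if d != current and distance(current, d) <= radio_range]
    # Accept only neighbors that are strictly closer to the destination.
    closer = [d for d in neighbors
              if distance(d, destination) < distance(current, destination)]
    return min(closer, key=lambda d: distance(d, destination), default=None)


d0 = Drone("D0", 0, 0)
d1 = Drone("D1", 600, 0)
d2 = Drone("D2", 1200, 0)
hop = next_hop(d0, d2, [d0, d1, d2])
print(hop.name)  # D1
```

Here D2 is out of range of D0, so the packet goes to D1 first; a second call from D1 reaches D2 directly.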
34.3.4 Threat Model
The threats incorporated in this network are the black hole attack and the Sybil attack. In general, a black hole attack destroys vehicle connectivity. Before transferring data, the source vehicle transmits a request packet to find the path to the destination. At that time, the attacker poses as an optimal intermediate node by sending a false reply; once the source starts transmitting data, the attacker drops all of it. In a Sybil attack, the attacker presents a fake identity as the destination and misuses the data.
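The two attack behaviors described above can be mimicked with a small toy model. This is a simplified sketch, not the simulation used in the chapter: the `Node` class, the `send` function, and the node names are hypothetical, and a route is reduced to an ordered list of nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    kind: str = "honest"   # "honest", "black_hole", or "sybil"
    claimed_id: str = ""   # identity that a Sybil node pretends to have

    def advertised_id(self) -> str:
        return self.claimed_id if self.kind == "sybil" else self.name


def send(path, destination_id, packets):
    """Push packets hop by hop; return (delivered, dropped, stolen)."""
    delivered = dropped = stolen = 0
    for _ in range(packets):
        for node in path:
            if node.kind == "black_hole":
                dropped += 1          # discards all data it attracted
                break
            if node.advertised_id() == destination_id:
                if node.kind == "sybil":
                    stolen += 1       # accepted under a forged identity
                else:
                    delivered += 1
                break
        else:
            dropped += 1              # nobody on the path claimed the destination
    return delivered, dropped, stolen


honest_path = [Node("V1"), Node("V2"), Node("V9")]
black_hole_path = [Node("V1"), Node("A1", "black_hole"), Node("V9")]
sybil_path = [Node("V1"), Node("A2", "sybil", claimed_id="V9"), Node("V9")]

print(send(honest_path, "V9", 10))      # (10, 0, 0)
print(send(black_hole_path, "V9", 10))  # (0, 10, 0)
print(send(sybil_path, "V9", 10))       # (0, 0, 10)
```

The black hole shows up as packet loss, whereas the Sybil node intercepts the data under a forged destination identity, which matches the distinction drawn in the text.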
34.4 Hybrid Security in VANETs
To secure the network from current threats, hybrid security is introduced in VANETs. Security is provided at three levels: (i) security during V-V communication, (ii) security during V-D communication, and (iii) security during D-D communication. The proposed network model with hybrid security is shown in Fig. 34.1. The hybrid security process is detailed below.
Fig. 34.1 Proposed network model with the hybrid security
34.4.1 Security During V-V and V-D Communication
The security method used for V-V and V-D communication is the same. The only difference between the two communication models is that V-V follows topology-based communication and V-D follows position-based communication. Most attacks happen at ground level, so to protect the network from black hole and Sybil attacks, the trusted flag-based security approach is used in V-V and V-D communication. At the initial stage, each vehicle is assigned a trusted flag value. The flag values are dynamic and depend on the distance to the drones: when a vehicle moves nearer to a drone, its trusted flag value increases, and vehicles far away from the drones hold a lower value. The flag value ranges from 0 to 2. Before path selection, the vehicle checks the trusted flag values of its neighbor vehicles. A vehicle with a value of zero is neglected. Vehicles with trusted flag values of 1 and 2 are chosen, and the RREQ packets are then transmitted to the other vehicle or drone according to the situation. After receiving the RREQ packet, the first hop node transmits it to the next hop, and this continues until the packet reaches the destination. At the end, the destination verifies the trusted flag values, which must satisfy the condition (trusted flag value > 0), and then transmits the RREP to the requesting vehicle; the details are stored in the routing table. The step-by-step process of the trusted flag-based security approach is given in the pseudocode below.
Process initiates;
Before route discovery, set T_flag ∈ {0, 1, 2} randomly for all nodes;
Process route discovery, with conditions:
  T_flag = 0; (node neglected)
  T_flag = 1; (normal trust)
  T_flag = 2; (maximum trust)
Send RREQ packet to nodes with T_flag = 1 and T_flag = 2;
Neighbor node forwards the RREQ to the next hop;
Verification of (T_flag > 0); if condition satisfied,
  hop node transfers the RREQ to the destination;
Destination checks the condition (T_flag > 0);
Broadcast the RREP;
Link established with high security;
End process.
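The pseudocode above can be sketched in Python as a flag-filtered route discovery. This is a minimal sketch under stated assumptions: the RREQ flood is modeled as a breadth-first search, and the topology, node names, and the function `discover_route` are hypothetical. Only the rule that nodes with a trusted flag of 0 are skipped and that the destination re-checks the flags comes from the text.

```python
from collections import deque


def discover_route(links, t_flag, source, destination):
    """Flood an RREQ breadth-first, skipping neighbors whose trusted flag is 0.

    links  : dict mapping node -> list of neighbor nodes
    t_flag : dict mapping node -> trusted flag in {0, 1, 2}
    Returns the hop list confirmed by the RREP, or None if no trusted path exists.
    """
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            # Destination verifies T_flag > 0 for every hop before replying.
            if all(t_flag[hop] > 0 for hop in path):
                return path
            continue
        for neighbor in links.get(node, []):
            if neighbor in visited or t_flag.get(neighbor, 0) == 0:
                continue  # T_flag = 0: node neglected
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


links = {"S": ["A", "B"], "A": ["D"], "B": ["C"], "C": ["D"], "D": []}
t_flag = {"S": 2, "A": 0, "B": 1, "C": 2, "D": 1}
route = discover_route(links, t_flag, "S", "D")
print(route)  # ['S', 'B', 'C', 'D']
```

Although S-A-D is the shorter path, node A carries a flag of 0 and is neglected, so the longer route through B and C is selected.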
34.4.2 Security During D-D Communication
Trust estimation is performed in the initial stage of route establishment among the drones. Each drone maintains its own trust score, which increases according to the drone's performance. The drone with a high trust score has priority for data transmission. The process of providing security during route establishment is given in algorithmic form below:
Start
{
  Network coverage area (1500 m × 1500 m)
  Drone count = 3 (D0 to D2)
  Sender drone (DR_s)
  Destination drone (DR_d)
  Attackers = 10 (W0 to W4) and (B0 to B4)
  Drone range = 800 m
  Drone trust score T_score
  {
    Initiate route establishment based on T_score();
    DR_s → RREQ();
    DR_d → RREP();
  }
  // Black hole attack initiated
  {
    If (attacker transmits false RREP to DR_s)
    {
      Hop = Hop + 1;
      If (data received) { data discarded }
    }
  }
  // Sybil attack initiated
  {
    If (attacker (A) creates false identity to DR_s)
    {
      Data transmitted from DR_s → A
      A collects the packets and misuses them
    }
  }
  // Drone attack discovery
  {
    DR_s verifies T_score();
    Check the condition T_score > 0;
    If condition satisfied
    { DR_s → RREQ(); DR_d → RREP(); }
    Else
    { Discard the communication link }
  }
}
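The chapter does not give the formula used to update the trust score, so the sketch below assumes a simple hypothetical rule: a drone gains one point when it forwards everything it receives and loses one point otherwise. Only the acceptance condition (trust score > 0) and the priority of the highest-scoring drone are taken from the text; the function names and the numbers are illustrative.

```python
def update_trust(score, forwarded, received, step=1.0):
    """Hypothetical update rule: reward full forwarding, penalize drops."""
    if received == 0:
        return score  # nothing observed, score unchanged
    return score + step if forwarded == received else score - step


def select_relay(trust_scores):
    """Pick the drone with the highest trust score, provided it is > 0."""
    trusted = {d: s for d, s in trust_scores.items() if s > 0}
    if not trusted:
        return None  # discard the communication link
    return max(trusted, key=trusted.get)


scores = {"D0": 1.0, "D1": 1.0, "D2": 1.0}
# D1 behaves like a black hole: it receives 20 packets and forwards none.
scores["D1"] = update_trust(scores["D1"], forwarded=0, received=20)
# D2 forwards everything it receives.
scores["D2"] = update_trust(scores["D2"], forwarded=20, received=20)

print(scores)                # {'D0': 1.0, 'D1': 0.0, 'D2': 2.0}
print(select_relay(scores))  # D2
```

With this rule the misbehaving drone falls to a score of 0 and is excluded from route establishment, while the well-behaved drone gains priority.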
34.5 Performance Analysis
The network construction, attack creation, and security are performed in the network simulator NS-2, version 2.35. NS-2 combines two languages: Object-oriented Tool Command Language (OTcl) as the front end, used for assembly and configuration, and C++ as the back end, used to build the internal mechanisms [30]. Trace files are used to analyze the outputs. The parameters considered for the performance analysis are packet delivery ratio, throughput, packet loss, and delay under black hole and Sybil attacks. The performance of the proposed hybrid security method is compared with the earlier RSA security work [16, 25, 26, 30, 31] under black hole and Sybil attacks. The simulation parameters are shown in Table 34.1.
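The chapter does not state the formulas behind the four metrics. The sketch below uses their common definitions as an assumption: packet delivery ratio as received over sent, packet loss as sent minus received, throughput as delivered bits over elapsed time, and delay as the mean end-to-end latency. The record format is a hypothetical simplification and is not the NS-2 trace file format; the 510-byte packet size is taken from Table 34.1.

```python
def summarize(records, packet_size_bytes=510):
    """Compute the four metrics from per-packet records.

    records: list of (send_time_s, recv_time_s) tuples, with recv_time_s set
             to None for packets that never arrived.
    """
    sent = len(records)
    delivered = [(s, r) for s, r in records if r is not None]
    received = len(delivered)

    pdr_percent = 100.0 * received / sent if sent else 0.0
    packet_loss = sent - received

    if delivered:
        # Elapsed time from the first transmission to the last reception.
        duration = max(r for _, r in delivered) - min(s for s, _ in records)
        throughput_kbps = received * packet_size_bytes * 8 / duration / 1000
        delay_ms = 1000 * sum(r - s for s, r in delivered) / received
    else:
        throughput_kbps = delay_ms = 0.0

    return {
        "pdr_percent": pdr_percent,
        "packet_loss": packet_loss,
        "throughput_kbps": throughput_kbps,
        "delay_ms": delay_ms,
    }


# Four packets sent one second apart; the third one is lost.
records = [(0.0, 0.0625), (1.0, 1.0625), (2.0, None), (3.0, 3.0625)]
result = summarize(records)
print(result)
```

In this toy run three of four packets arrive, so the delivery ratio is 75% with one packet lost and a mean delay of 62.5 ms.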
Table 34.1 Simulation table
Input parameters      Values
Operating system      Ubuntu 16.04
NS version            NS-2.35, SUMO-1.1.0
Running time          100 ms
Network size          1000 × 1000 m
No. of vehicles       100 vehicles
Attackers             10 attackers
Antenna type          Omnidirectional antenna
Propagation model     Two-ray ground model
Queue type            DropTail
Traffic flow          CBR
Traffic agent         UDP
Speed                 50 km/h
Transmission power    0.500 J
Receiving power       0.050 J
Connection            Multiple
Packet size           510 bytes
34.5.1 Packet Delivery Ratio
The packet delivery ratio of the proposed HSV approach and the RSAH approach under black hole and Sybil attacks is shown in Fig. 34.2; the suffixes BL and SY denote the black hole and Sybil cases. For one to ten attackers, the packet delivery ratio of the compared methods is: RSAH-BL (falls from 86 to 60%), RSAH-SY (falls from 89 to 64%), HSV-BL (falls from 95 to 88%), and HSV-SY (falls from 97 to 91%). The packet delivery ratio achieved by the proposed HSV approach is around 9% better than that of the RSAH approach for both attack types. The use of drones enables obstacle-free transmission, and hybrid security protects the network from attacks. Table 34.2 shows the packet delivery ratio values for a varying number of attackers.
Fig. 34.2 Attacks versus packet delivery ratio
Table 34.2 Packet delivery ratio values (%)
No. of attackers   RSAH-BL   RSAH-SY   HSV-BL   HSV-SY
1                  86.25     89.15     95.23    97.21
2                  85.13     88.25     94.23    96.12
3                  82.14     85.28     93.96    95.65
4                  78.25     80.47     93.13    95.14
5                  77.46     79.16     92.14    93.56
6                  68.24     75.44     91.46    93.14
7                  65.14     74.24     89.13    92.87
8                  61.28     72.14     88.47    92.21
9                  60.23     65.28     88.14    91.45
10                 60.11     64.23     88.02    91.02
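To see how the gap between the proposed approach and the baseline changes with the number of attackers, the values of Table 34.2 can be subtracted row by row. The short sketch below does only that; the variable names are illustrative, and the lists are copied from the baseline (RSAH) and proposed (HSV) columns of the table.

```python
attackers = list(range(1, 11))

# Packet delivery ratio (%) copied from Table 34.2.
rsah_bl = [86.25, 85.13, 82.14, 78.25, 77.46, 68.24, 65.14, 61.28, 60.23, 60.11]
rsah_sy = [89.15, 88.25, 85.28, 80.47, 79.16, 75.44, 74.24, 72.14, 65.28, 64.23]
hsv_bl = [95.23, 94.23, 93.96, 93.13, 92.14, 91.46, 89.13, 88.47, 88.14, 88.02]
hsv_sy = [97.21, 96.12, 95.65, 95.14, 93.56, 93.14, 92.87, 92.21, 91.45, 91.02]


def gaps(proposed, baseline):
    """Per-row difference in percentage points, rounded to two decimals."""
    return [round(p - b, 2) for p, b in zip(proposed, baseline)]


bl = gaps(hsv_bl, rsah_bl)
sy = gaps(hsv_sy, rsah_sy)

for n, g_bl, g_sy in zip(attackers, bl, sy):
    print(f"{n:>2} attackers: black hole +{g_bl:.2f}, Sybil +{g_sy:.2f}")

print("smallest gap:", min(bl + sy))  # 7.87
print("largest gap:", max(bl + sy))   # 27.91
```

On these values the black hole gap is about 9 percentage points with a single attacker and widens to roughly 28 points with ten attackers.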
34.5.2 Throughput
The throughput of the proposed HSV approach and the RSAH approach under black hole and Sybil attacks is shown in Fig. 34.3. For one to ten attackers, the throughput of the compared methods is: RSAH-BL (reduces from 455 to 239 Kbps), RSAH-SY (reduces from 476 to 246 Kbps), HSV-BL (reduces from 685 to 452 Kbps), and HSV-SY (reduces from 725 to 496 Kbps). The throughput attained by the proposed HSV approach is around 200 Kbps better than that of the RSAH approach for both attack types. The hybrid security approach thus increases network throughput. Table 34.3 shows the throughput values for a varying number of attackers.
Fig. 34.3 Attacks versus throughput
Table 34.3 Throughput values (Kbps)
No. of attackers   RSAH-BL   RSAH-SY   HSV-BL   HSV-SY
1                  455.23    476.74    685.47   725.65
2                  425.13    465.12    654.85   702.14
3                  368.47    449.47    624.13   672.14
4                  352.74    401.39    586.17   645.14
5                  325.41    385.14    552.34   596.46
6                  296.16    349.17    496.47   574.23
7                  271.49    302.46    482.46   531.74
8                  268.47    285.85    476.85   529.65
9                  246.13    264.13    461.29   501.24
10                 239.74    246.22    452.11   496.12
34.5.3 Packet Loss
The packet loss of the proposed HSV approach and the RSAH approach under black hole and Sybil attacks is shown in Fig. 34.4. For one to ten attackers, the packet loss of the compared methods is: RSAH-BL (increases from 214 to 564 packets), RSAH-SY (increases from 186 to 524 packets), HSV-BL (increases from 22 to 254 packets), and HSV-SY (increases from 15 to 196 packets). The packet loss of the proposed HSV approach is around 350 packets lower than that of the RSAH approach for both attack types. The hybrid security approach thus reduces the loss of packets. Table 34.4 shows the packet loss values for a varying number of attackers.
Fig. 34.4 Attacks versus packet loss
Table 34.4 Packet loss values (packets) based on number of attackers
No. of attackers   RSAH-BL   RSAH-SY   HSV-BL   HSV-SY
1                  214       186       22       15
2                  245       208       49       42
3                  268       247       65       58
4                  324       286       71       68
5                  334       302       86       75
6                  375       351       127      96
7                  419       385       154      110
8                  457       421       186      153
9                  524       486       214      181
10                 564       524       254      196
Fig. 34.5 Attacks versus delay
34.5.4 Delay
The delay of the proposed HSV approach and the RSAH approach under black hole and Sybil attacks is shown in Fig. 34.5. For one to ten attackers, the delay produced by the compared methods is: RSAH-BL (increases from 52 to 246 ms), RSAH-SY (increases from 32 to 219 ms), HSV-BL (increases from 14 to 102 ms), and HSV-SY (increases from 6 to 96 ms). The delay produced by the proposed HSV approach is around 150 ms lower than that of the RSAH approach for both attack types. The hybrid security approach greatly helps to reduce the delay of the network. Table 34.5 shows the delay values for a varying number of attackers.
Table 34.5 Delay values (ms)
No. of attackers   RSAH-BL   RSAH-SY   HSV-BL   HSV-SY
1                  52.14     32.74     14.22    6.23
2                  69.46     49.24     25.38    16.85
3                  81.47     61.32     31.56    22.46
4                  96.74     73.46     42.74    35.29
5                  105.79    82.74     58.63    41.74
6                  154.47    148.74    65.27    52.74
7                  172.86    169.47    78.34    65.47
8                  201.74    185.45    81.76    73.58
9                  224.75    196.47    95.14    92.46
10                 246.24    219.47    102.47   96.33
34.6 Conclusion
This research presents a hybrid security approach for drone-assisted VANETs. Drones are introduced in VANETs to reduce delay through obstacle-free communication between the source and the destination. To improve security in VANETs, trust establishment is performed in each communication model, namely V-V, V-D, and D-D. V-V and V-D use the same trusted flag-based security and follow topology-based and position-based routing, respectively. For D-D communication, security is provided through trust score calculation. Black hole and Sybil attackers are introduced into the network to disrupt it. As a result of using drones and providing hybrid security in VANETs, delay is reduced in ground-level communication and packet loss is reduced by the added security. Through simulation analysis, the values are measured and compared with the RSAH method under black hole and Sybil attacks. Finally, the proposed hybrid security approach achieves 9% higher packet delivery ratio, 200 Kbps higher throughput, 350 packets lower packet loss, and 150 ms lower delay when compared with the earlier method. In future work, the proposed system will be applied to densely populated areas to analyze its performance.
References 1. Abbas, A.H., Ahmed, A.J., Rashid, S.A.: A cross-layer approach MAC/NET with updated-GA (MNUG-CLA)-based routing protocol for VANET network. World Electr. Veh. J. 13(5), 87 (2022) 2. Mansour, H.S., Mutar, M.H., Aziz, I.A., Mostafa, S.A., Mahdin, H., Abbas, A.H., Jubair, M.A.: Cross-layer and energy-aware AODV routing protocol for flying Ad-Hoc networks. Sustainability 14(15), 8980 (2022)
3. Malik, R.Q., Ramli, K.N., Kareem, Z.H., Habelalmatee, M.I., Abbas, A.H., Alamoody, A.: An overview on V2P communication system: Architecture and application. In: 2020 3rd International Conference on Engineering Technology and its Applications (IICETA), pp. 174–178. IEEE (2020) 4. Naseer Qureshi, K., Moghees Idrees, M., et al.: Self-assessment based clustering data dissemination for sparse and dense traffic conditions for internet of vehicles. IEEE Access 8, 10363–10372 (2020) 5. Habelalmateen, M.I., Abbas, A.H., Audah, L., Alduais, N.A.M.: Dynamic multiagent method to avoid duplicated information at intersections in VANETs. TELKOMNIKA (Telecommun. Comput. Electron. Control) 18(2), 613–621 (2020) 6. Qureshi, K.N., Bashir, F., et al.: Link aware high data transmission approach for internet of vehicles. In: International Conference on Computer Applications & Information Security (ICCAIS) (2019) 7. Amirshahi, A., Romoozi, M., et al.: Modeling geographical anycasting routing in vehicular networks. KSII Trans. Internet Inf. Syst. (TIIS) 14(4), 1624–1647 (2020) 8. Dias, J.A., Rodrigues, J.J., et al.: Network management and monitoring solutions for vehicular networks: a survey. Electronics 9(5) (2020) 9. Jubair, M.A., Hassan, M.H., Mostafa, S.A., Mahdin, H., Mustapha, A., Audah, L.H., Abbas, A.H.: Competitive analysis of single and multi-path routing protocols in mobile Ad-Hoc network. Indonesian J. Electr. Eng. Comput. Sci. 14(2) (2019) 10. Zeng, F., Zhang, R., et al.: UAV-assisted data dissemination scheduling in VANETs. In: IEEE International Conference on Communications (ICC) (2018) 11. Oubbati, O.S., Lakas, A., et al.: Leveraging communicating UAVs for emergency vehicle guidance in urban areas. IEEE Trans. Emerg. Top. Comput. 9(2), 1070–1082 (2021) 12. Fan, X., Liu, B., et al.: Utility maximization data scheduling in drone-assisted vehicular networks. Comput. Commun. 175, 68–81 (2021) 13. Abbas, A.H., Habelalmateen, M.I., Audah, L., Alduais, N.A.M.: A novel intelligent clusterhead (ICH) to mitigate the handover problem of clustering in VANETs. Int. J. Adv. Comput. Sci. Appl. 10(6) (2019) 14. Abbas, A.H., Audah, L., Alduais, N.A.M.: An efficient load balance algorithm for vehicular ad-hoc network. In: 2018 Electrical Power, Electronics, Communications, Controls and Informatics Seminar (EECCIS), pp. 207–212. IEEE (2018) 15. Qureshi, K.N., Alhudhaif, A., et al.: Trust and priority-based drone assisted routing and mobility and service-oriented solution for the internet of vehicles networks. J. Inf. Secur. Appl. 59 (2021) 16. Oubbati, O.S., Lakas, A., et al.: Intelligent UAV-assisted routing protocol for urban VANETs. Comput. Commun. 107(15), 93–111 (2017) 17. Abbas, A.H., Mansour, H.S., Al-Fatlawi, A.H.: Self-adaptive efficient dynamic multi-hop clustering (SA-EDMC) approach for improving VANET’s performance. Int. J. Interact. Mob. Technol. 17(14) (2022) 18. Zhang, S., Liu, Y., et al.: A trust based adaptive privacy preserving authentication scheme for VANETs. Veh. Commun. 37 (2022) 19. Kaur, G., Kakkar, D.: Hybrid optimization enabled trust-based secure routing with deep learning-based attack detection in VANET. Ad Hoc Netw. (2022) 20. Junejo, M.H., Rahman, A.A.-H.A.: Lightweight trust model with machine learning scheme for secure privacy in VANET. 194, 45–59 (2021) 21. Inedjaren, Y., Maachaoui, M., et al.: Blockchain-based distributed management system for trust in VANET. Veh. Commun. 30 (2021) 22. Liu, H., Han, D., Li, D.: Behavior analysis and blockchain based trust management in VANETs. J. Parallel Distrib. Comput. 151, 61–69 (2021) 23. Nath, H.J., Choudhury, H.: A privacy-preserving mutual authentication scheme for group communication in VANET. Comput. Commun. 192, 357–372 (2022) 24. Shah, P., Kasbe, T.: Detecting sybil attack, black hole attack and DoS attack in VANET using RSA algorithm. Emerg. Trends Ind. 4.0 (ETI 4.0) (2021)
25. Hassan, M.H., Jubair, M.A., Mostafa, S.A., Kamaludin, H., Mustapha, A., Fudzee, M.F.M., Mahdin, H.: A general framework of genetic multi-agent routing protocol for improving the performance of MANET environment. IAES Int. J. Artif. Intell. 9(2), 310 (2020)
26. Hamdi, M.M., Flaih, A.F., Jameel, M.L., Mustafa, A.S., Abdulelah, A.J., Jubair, M.A., Ahmed, A.J.: A study review on gray and black hole in mobile Ad Hoc networks (MANETs). In: 2022 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), pp. 1–6. IEEE (2022)
27. Obaid, A.J.: Wireless sensor network (WSN) routing optimization via the implementation of fuzzy ant colony (FACO) algorithm: Towards enhanced energy conservation. In: Kumar, R., Mishra, B.K., Pattnaik, P.K. (eds.) Next Generation of Internet of Things. Lecture Notes in Networks and Systems, vol. 201. Springer, Singapore (2021). https://doi.org/10.1007/978-98116-0666-3_33
28. Regin, R., Obaid, A.J., Alenezi, A., Arslan, F., Gupta, A.K., Kadhim, K.H.: Node replacement based energy optimization using enhanced salp swarm algorithm (Es2a) in wireless sensor networks. J. Eng. Sci. Technol. 16(3), 2487–2501 (2021)
29. Abdulsattar, N.F., Hassan, M.H., Mostafa, S.A., Mansour, H.S., Alduais, N., Mustapha, A., Jubair, M.A.: Evaluating MANET technology in optimizing IoT-based multiple WBSN model in soccer players health study. In: International Conference on Applied Human Factors and Ergonomics. Springer, Cham (2022)
30. Ali, R.R., Mostafa, S.A., Mahdin, H., Mustapha, A., Gunasekaran, S.S.: Incorporating the Markov Chain model in WBSN for improving patients’ remote monitoring systems. In: International Conference on Soft Computing and Data Mining, pp. 35–46. Springer, Cham (2020)
31. Mostafa, S.A., Ramli, A.A., Jubair, M.A., Gunasekaran, S.S., Mustapha, A., Hassan, M.H.: Integrating human survival factor in optimizing the routing of flying Ad-hoc networks in search and rescue tasks. In: International Conference on Applied Human Factors and Ergonomics. Springer, Cham (2022)
Chapter 35
Optimization of Metal Removal Rate, Surface Roughness, and Hardness Using the Taguchi Method in CNC Turning Machine
Zahraa N. Abdul Hussain and Mohammed Jameel Alsalhy
Abstract Due to the widespread use of automation in industrial operations and cutting machines, manufacturing processes require highly reliable modeling methods to predict output during operation. In this study, the Taguchi method and regression analysis were used to investigate the influence of three machining parameters (cutting speed, feed rate, and depth of cut) on surface roughness, metal removal rate, and hardness in CNC turning of AISI 1025 steel. The experiments were carried out on a CNC machine according to an L25 orthogonal array. Analysis of variance (ANOVA) was applied to evaluate the impact of the machining parameters on surface roughness, metal removal rate, and hardness. The results indicate that feed rate, cutting speed, and depth of cut were the dominant parameters affecting surface roughness (Ra), hardness, and metal removal rate (MRR), respectively; in addition, the experimental and predicted values are in close agreement.
35.1 Introduction
Medium carbon steel (AISI 1025) is the most common type of carbon steel because it is relatively inexpensive while being suitable for many mechanical applications. Its carbon content lies in the range 0.16–0.29, so it is not brittle and retains its elasticity. It has a wide range of applications, including pipelines, electrical appliances, automotive and railway components, doors, and windows [1].
Z. N. A. Hussain (B) · M. J. Alsalhy Department of Medical Devices Technology Engineering, College of Engineering Technology, National University of Science and Technology, Dhi Qar, Iraq e-mail: [email protected] M. J. Alsalhy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_35
35.2 Literature Review
Nalbant et al. [2] applied the Taguchi method to find the optimal cutting parameters for surface roughness (Ra) in turning. The orthogonal array, the signal-to-noise (S/N) ratio, and the analysis of variance were used to study the performance characteristics in turning AISI 1030 steel bars with TiN-coated tools, and the experimental results were presented to demonstrate the effectiveness of the approach. Cemal et al. [3] investigated the effect of cutting speed, depth of cut, and feed rate on surface roughness when turning AISI P20 steel with two types of carbide cutting tools that had the same geometry but different coating layers. The experimental data were used to build a mathematical model. The results showed that the feed rate has the greatest effect on surface roughness, followed by the cutting speed, which also has a significant effect, while the depth of cut has no important effect. Mandal et al. [4] used the Taguchi method and regression analysis to evaluate the machinability of AISI 4340 steel with newly developed zirconia-toughened alumina ceramic inserts. The results showed that the depth of cut has an important effect on tool wear. Shetty et al. [5] investigated the effect of feed rate and cutting speed on surface roughness when turning carbon steel. Five carbon steel alloys (EN47, EN24, EN19, EN8, and SAE8620) were turned on a CNC machine with a carbide cutting tool. For all the alloys included in the study, the surface roughness decreased as the cutting speed increased and increased as the feed rate increased. Guo et al. [6] proposed mathematical models to optimize surface quality and energy consumption in turning aluminum (AlCuMgPb) and steel (11SMnPb30) with carbide cutting tools; cutting speed, depth of cut, and feed rate were the input variables of the models. The experimental results show that the overall energy for both aluminum and steel decreases with increasing feed rate and depth of cut, while the surface roughness decreases with increasing cutting speed and increases with increasing feed rate and depth of cut. Yi et al. [7] studied the effect of the cutting conditions (feed rate, cutting speed, and depth of cut) and of different cutting fluid levels on surface roughness and tool wear when turning mild steel on a CNC machine. They used the Taguchi method to select the best cutting parameters and applied ANOVA to analyze the variance of the results. The feed rate had a strong effect on surface roughness (roughness increased with feed rate), the cutting speed had a strong effect on tool wear, and the flow rate of the cutting fluid had a clear effect on both tool wear and surface roughness. Qehaja et al. [8] studied the effect of machining parameters such as machining time, nose radius, and feed rate on surface roughness (Ra). The purpose of that study was to develop an empirical model to predict the surface roughness (Ra) in dry turning of cold-rolled steel C62D. The results showed that machining time and nose radius have little effect compared with the feed rate.
Table 35.1 Chemical composition of AISI 1025 carbon steel in wt% (sample: shaft, ∅ = 35)
Element   wt%       Element   wt%
C         0.248     Mo        0.355
Si        0.233     Ni        0.0847
Mn        0.832     Al        0.0201
P         0.012     Cu        0.1940
S         0.012     Fe        Balance
Cr        0.847
Hameed et al. [9] developed two mathematical models to predict surface roughness and tool temperature as functions of spindle speed, feed rate, and depth of cut. LAB Fit software was used to obtain these equations. The workpiece material, turned with carbide cutting tools, was AISI 1045 steel. A genetic algorithm was used to find the optimum cutting conditions. The results showed that increasing the depth of cut and the spindle speed decreases the surface roughness, whereas increasing the feed rate increases it, and that increasing any of the cutting parameters (depth of cut, feed rate, and spindle speed) increases the tool temperature.
35.3 Experimental Methods
35.3.1 CNC Turning Machine
As a result of the wide development in cutting machines and industrial operations, manufacturing processes must be carried out with high precision and with minimum time and cost. In the current work, the experiments were run on a CNC machine (Fig. 35.1) because of the accuracy it provides. The chemical composition of the workpiece material is given in Table 35.1.
35.3.2 Operating Conditions and Cutting Tools
The test samples were cylindrical, with a diameter of 35 mm and a length of 100 mm. Turning tests were performed at five spindle speeds (950, 1150, 1550, 1850, and 2150 rpm), five feed rates (0.01, 0.05, 0.1, 0.2, and 0.3 mm/rev), and five depths of cut (0.5, 1, 1.5, 2, and 2.5 mm). These conditions were arranged using the Minitab program. The cutting tool used in the experiments was the carbide tool shown in Fig. 35.2.
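The spindle speeds above are given in rpm, while Table 35.2 lists cutting speeds in m/min. The two are related by the standard turning relation v = πDN/1000. The sketch below is an illustration only; the function name is hypothetical. With the stated 35 mm diameter, 950 rpm gives about 104.46 m/min, whereas the Level 1 value in Table 35.2 (107.44 m/min) is reproduced with a 36 mm diameter; that 36 mm figure is an inference from the numbers, not something stated in the text.

```python
import math


def cutting_speed_m_per_min(diameter_mm: float, spindle_rpm: float) -> float:
    """Surface (cutting) speed v = pi * D * N / 1000 for a turned workpiece.

    diameter_mm : workpiece diameter D in millimetres
    spindle_rpm : spindle speed N in revolutions per minute
    """
    return math.pi * diameter_mm * spindle_rpm / 1000.0


if __name__ == "__main__":
    # Spindle speeds quoted in Sect. 35.3.2; the diameter is the stated 35 mm.
    for rpm in (950, 1150, 1550, 1850, 2150):
        print(rpm, "rpm ->", round(cutting_speed_m_per_min(35.0, rpm), 2), "m/min")
```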
Fig. 35.1 CNC turning machine of FANUC (Series Oi mate-TC)
Fig. 35.2 Carbide cutting tool used in the experiment
35.3.3 Measurements Extracted from Experiments
The average roughness (Ra) was measured with a PCE-RT 1200 device, shown in Fig. 35.3, with the cutoff length fixed at 0.8 mm. The roughness was measured at three different points and the mean of the readings was taken. The metal removal rate (MRR) and tool wear were determined from the difference in weight before and after the operation. The hardness was measured using the Rockwell device shown in Fig. 35.4.
Fig. 35.3 Device used to measure surface roughness
Fig. 35.4 Rockwell device for measuring hardness
35.4 The Experimental Design and Optimization
35.4.1 The Design of Experiments and Taguchi Method
The tests in this work were designed using Taguchi’s design of experiments (DOE). Taguchi’s approach gives the engineer a systematic and efficient way to determine the optimal design and cost, and it can significantly reduce the number of experiments needed to obtain the required data. The control factors selected in this study were cutting speed (v), feed rate (f), and depth of cut; their levels are listed in Table 35.2. The L25 orthogonal array was selected as the most appropriate one to determine the optimal cutting parameters and to analyze the impact of the machining parameters [10, 11]. The L25 orthogonal array shown in Table 35.3 was used for conducting the experiments. The Taguchi technique uses a loss function to calculate the deviation between the desired values and the experimental values. This loss function is transformed into a signal-to-noise (S/N) ratio (η). Generally, there are three types of quality characteristics in the S/N analysis, viz. the higher-the-better, the nominal-the-best, and the lower-the-better. The type is chosen according to the goal of the study: “lower-the-better” when the response is to be minimized and “higher-the-better” when it is to be maximized. The goal of this study was to maximize metal removal rate (MRR) and hardness and to minimize surface roughness (Ra) and tool wear.

Table 35.2 Turning parameters and their levels
Parameters               Symbol   Level 1   Level 2   Level 3   Level 4   Level 5
Cutting speed (m/min)    A        107.44    141.37    175.31    209.23    243.16
Feed rate (mm/rev)       B        0.001     0.005     0.1       0.2       0.3
Depth of cut (mm)        C        0.5       1         1.5       2         2.5

Table 35.3 Design of experiments using an L25 orthogonal array by the Taguchi method
Experiment no   Factor A   Factor B   Factor C
1               1          1          1
2               1          2          2
3               1          3          3
4               1          4          4
5               1          5          5
6               2          1          2
7               2          2          3
8               2          3          4
9               2          4          5
10              2          5          1
11              3          1          3
12              3          2          4
13              3          3          5
14              3          4          1
15              3          5          2
16              4          1          4
17              4          2          5
18              4          3          1
19              4          4          2
20              4          5          3
21              5          1          5
22              5          2          1
23              5          3          2
24              5          4          3
25              5          5          4
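The three factor columns of Table 35.3 follow a cyclic pattern: the level of factor C equals ((A − 1) + (B − 1)) mod 5 + 1. The sketch below regenerates the 25 runs from that rule. It is only an illustration of how such an array can be built; the authors arranged their design in Minitab, and the function name here is hypothetical.

```python
def l25_three_factor_array(levels: int = 5):
    """Return the runs (A, B, C) of a three-column, five-level orthogonal array.

    Columns A and B enumerate all level pairs; column C is a cyclic Latin
    square of A and B, so every pair of columns contains each level
    combination exactly once.
    """
    runs = []
    for a in range(levels):
        for b in range(levels):
            c = (a + b) % levels          # cyclic shift defines the third column
            runs.append((a + 1, b + 1, c + 1))  # report 1-based levels
    return runs


if __name__ == "__main__":
    for number, (a, b, c) in enumerate(l25_three_factor_array(), start=1):
        print(number, a, b, c)
```

Because every pair of columns is balanced, the mean response at each level of one factor is averaged evenly over the levels of the other two, which is what makes the level-wise S/N means comparable.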
35.5 Analysis and Estimation of Experimental Results
35.5.1 Analysis of the Signal-To-Noise (S/N) Ratio
The experiments designed with the Taguchi technique were used to measure the surface roughness, wear performance, metal removal rate, and hardness, and the control (input) parameters were optimized by means of signal-to-noise (S/N) ratios [5]. Low values of surface roughness and tool wear are important for quality because they reduce cost and improve the product; for this reason, the “lower-the-better” equation was used to compute the S/N ratio [12]. Conversely, because high values of hardness and metal removal rate (MRR) improve the product and reduce cost, the “higher-the-better” equation was used to compute the S/N ratio for these responses. Table 35.4 lists the measured values and the S/N ratios for surface roughness (Ra), metal removal rate (MRR), and hardness. The two equations are

“The lower-the-better”:
$$\text{S/N} = -10 \log_{10}\left(\frac{1}{n}\sum_{i=1}^{n} y_i^{2}\right) \tag{35.1}$$

“The higher-the-better”:
$$\text{S/N} = -10 \log_{10}\left(\frac{1}{n}\sum_{i=1}^{n} \frac{1}{y_i^{2}}\right) \tag{35.2}$$

where y_i is the observed data at the ith experiment and n is the number of observations of the experiment [13]. The effect of the input parameters on surface roughness, hardness, and metal removal rate was analyzed with S/N response tables; the response tables for Ra, hardness, and MRR are shown in Table 35.5. These tables were built using the Taguchi technique to identify the levels of the input parameters that give the best responses (Ra, hardness, and MRR). The level values of the control factors for Ra, hardness, and MRR given in Table 35.5 are plotted in Figs. 35.5, 35.6, and 35.7. The optimal machining parameters for minimizing the surface roughness can be read directly from these graphs. The optimum level of each control factor is the level with the highest S/N ratio among the levels of that factor. Accordingly, the levels and S/N ratios giving the best Ra value were factor A (Level 5, S/N = −3.213), factor B (Level 1, S/N = −2.242), and factor C (Level 1, S/N = −3.435). The optimal machining parameters for maximizing the hardness and MRR can be read from these diagrams in the same way. The levels and S/N ratios giving the best hardness value were factor A (Level 5, S/N = 31.72), factor B (Level 1, S/N = 31.48), and factor C (Level 5, S/N = 31.12); for MRR they were factor A (Level 4, S/N = 35.79), factor B (Level 1, S/N = 35.77), and factor C (Level 5, S/N = 38.10) [14, 15].

Table 35.4 Results of tests and S/N ratio values for Ra, MRR, and hardness
No   Cutting speed (m/min)   Feed rate (mm/rev)   Depth of cut (mm)   Ra      S/N for Ra   MRR (g)   S/N for MRR   Hardness   S/N for H
1    107.44                  0.01                 0.5                 1.318   −2.3983      42.9      32.649        35         30.881
2    107.44                  0.05                 1                   1.356   −2.6451      51.644    34.260        34.5       30.756
3    107.44                  0.1                  1.5                 2.011   −6.0682      60.803    35.678        33.5       30.500
4    107.44                  0.2                  2                   2.112   −6.4938      69.035    36.781        33         30.370
5    107.44                  0.3                  2.5                 2.422   −7.6834      78.749    37.924        32.4       30.210
6    141.37                  0.01                 1                   1.348   −2.5938      53.5      34.567        35.5       31.004
7    141.37                  0.05                 1.5                 1.365   −2.7026      62.495    35.916        35         30.881
8    141.37                  0.1                  2                   1.988   −5.9683      69.149    36.795        34         30.629
9    141.37                  0.2                  2.5                 1.996   −6.0032      79.353    37.991        33.6       30.526
10   141.37                  0.3                  0.5                 2.188   −6.8009      37.197    31.410        32.8       30.317
11   175.31                  0.01                 1.5                 1.332   −2.4900      64.5      36.191        36.5       31.245
12   175.31                  0.05                 2                   1.444   −3.1913      70.292    36.938        36         31.126
13   175.31                  0.1                  2.5                 1.977   −5.9201      80.17     38.080        35         30.881
14   175.31                  0.2                  0.5                 1.885   −5.5062      47.133    33.466        34.6       30.781
15   175.31                  0.3                  1                   2.246   −7.0282      48.425    33.701        33.7       30.552
16   209.23                  0.01                 2                   1.255   −1.9728      71.8      37.122        39.5       31.931
17   209.23                  0.05                 2.5                 1.466   −3.3226      81.179    38.188        38         31.595
18   209.23                  0.1                  0.5                 1.179   −1.4327      50.498    34.065        37         31.364
19   209.23                  0.2                  1                   1.889   −5.5246      50.145    34.004        36         31.126
20   209.23                  0.3                  1.5                 2.244   −7.0204      60.196    35.591        35.5       31.004
21   243.16                  0.01                 2.5                 1.224   −1.7556      82.416    38.320        41.5       32.361
22   243.16                  0.05                 0.5                 1.127   −1.0384      41.906    32.445        40         32.041
23   243.16                  0.1                  1                   1.258   −1.9977      50.669    34.094        38.2       31.641
24   243.16                  0.2                  1.5                 1.512   −3.5910      60.237    35.597        37         31.364
25   243.16                  0.3                  2                   2.422   −7.6834      67.23     36.551        36.2       31.174
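The S/N values in Table 35.4 can be checked against Eqs. (35.1) and (35.2). The sketch below is a minimal illustration with hypothetical function names; each run is treated as a single observation (n = 1), which is an assumption that matches the tabulated numbers. It also averages the Ra S/N ratios of experiments 1 to 5 to recover the Level 1 entry for cutting speed in the response table.

```python
import math
from statistics import mean


def sn_lower_the_better(values):
    """S/N = -10 log10((1/n) * sum(y_i^2)); used for surface roughness."""
    values = list(values)
    return -10.0 * math.log10(sum(y * y for y in values) / len(values))


def sn_higher_the_better(values):
    """S/N = -10 log10((1/n) * sum(1 / y_i^2)); used for MRR and hardness."""
    values = list(values)
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in values) / len(values))


def level_mean(sn_by_run, runs):
    """Mean S/N over the experiment numbers that share one factor level."""
    return mean(sn_by_run[r] for r in runs)


if __name__ == "__main__":
    # Experiment 1 of Table 35.4: Ra = 1.318, MRR = 42.9 g, hardness = 35.
    print(round(sn_lower_the_better([1.318]), 4))
    print(round(sn_higher_the_better([42.9]), 3))
    print(round(sn_higher_the_better([35]), 3))

    # Ra S/N ratios of experiments 1-5 (cutting speed at Level 1).
    ra_sn = {1: -2.3983, 2: -2.6451, 3: -6.0682, 4: -6.4938, 5: -7.6834}
    print(round(level_mean(ra_sn, [1, 2, 3, 4, 5]), 3))
```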
35.5.2 ANOVA Method
ANOVA is a statistical procedure used to estimate the individual effects of the input parameters (control factors) in the test design. In this paper, ANOVA was used to analyze the effects of cutting speed, feed rate, and depth of cut on surface roughness, tool wear, hardness, and MRR. The ANOVA results for surface roughness, hardness, and MRR are shown in Table 35.6. The importance of the control factors in ANOVA is determined by comparing their F values. The F values of factors A, B, and C for surface roughness were 3.50, 30.76, and 3.01, respectively; thus, the most significant factor affecting the surface roughness was the feed rate (factor B, 30.76). The F values of factors A, B, and C for hardness were 87.92, 40.12, and 1.18, respectively; thus, the most significant factor affecting the hardness was the cutting speed (factor A, 87.92). The F values of factors A, B, and C for MRR were 1.12, 2.92, and 195.49, which means that the most important factor affecting the metal removal rate was the depth of cut (factor C, 195.49).
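The F values quoted above follow from the sums of squares and degrees of freedom in Table 35.6: each mean square is SS/DoF, and F is the factor mean square divided by the error mean square. A minimal sketch for the surface roughness block is given below; the function name is hypothetical, and the error sum of squares is taken as the total minus the three factor sums of squares, which is an assumption about how the table was built.

```python
def anova_f_ratios(factor_ss, factor_dof, total_ss, total_dof):
    """Return (mean squares, F ratios) for a main-effects ANOVA table.

    The error term is whatever remains of the total sum of squares and
    degrees of freedom after the factors are accounted for.
    """
    error_ss = total_ss - sum(factor_ss.values())
    error_dof = total_dof - sum(factor_dof.values())
    ms_error = error_ss / error_dof
    mean_squares = {name: factor_ss[name] / factor_dof[name] for name in factor_ss}
    f_ratios = {name: ms / ms_error for name, ms in mean_squares.items()}
    return mean_squares, f_ratios


if __name__ == "__main__":
    # Surface roughness block of Table 35.6.
    ss = {"Speed (A)": 0.3889, "Feed (B)": 3.4131, "Depth (C)": 0.3337}
    dof = {"Speed (A)": 4, "Feed (B)": 4, "Depth (C)": 4}
    ms, f = anova_f_ratios(ss, dof, total_ss=4.4686, total_dof=24)
    for name in ss:
        print(name, round(ms[name], 5), round(f[name], 2))
```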
35.5.3 Regression Analysis of Hardness, Surface Roughness, and MRR
Regression analysis is used to model and analyze the relationship between one or more independent variables and a dependent variable [5]. In this study, the dependent variables are surface roughness (Ra), hardness, and MRR, whereas the independent variables are cutting speed (S), feed rate (F), and depth of cut (D). Regression analysis was used to obtain predictive equations for surface roughness, hardness, and MRR, with both linear and quadratic regression models. The predictive equations obtained with the linear regression model are given below:
Table 35.5 S/N response table for hardness, MRR, and Ra
        Hardness                          MRR                               Surface roughness (Ra)
Level   Speed (A)   Feed (B)   Depth (C)  Speed (A)   Feed (B)   Depth (C)  Speed (A)   Feed (B)   Depth (C)
1       30.54       31.48      31.08      35.46       35.77      32.81      −5.058      −2.242     −3.435
2       30.67       31.28      31.02      35.34       35.55      34.80      −4.814      −2.580     −3.958
3       30.92       31.00      31.00      35.68       35.74      35.80      −4.827      −4.277     −3.958
4       31.40       30.83      31.05      35.79       35.57      36.84      −3.855      −5.424     −5.062
5       31.72       30.65      31.12      35.40       35.04      38.10      −3.213      −7.243     −4.937
Delta   1.17        0.83       0.12       0.46        0.73       5.29       1.845       5.001      1.627
Rank    1           2          3          3           2          1          2           1          3
Table 35.6 Results of ANOVA for surface roughness, hardness, and MRR
Variance source   Degree of freedom (DoF)   Sum of squares (SS)   Mean square (MS)   F ratio
Surface roughness (Ra)
Speed (A)         4                         0.3889                0.09721            3.50
Feed (B)          4                         3.4131                0.85327            30.76
Depth (C)         4                         0.3337                0.08344            3.01
Error             12                        0.3329                0.02774
Total             24                        4.4686
Hardness (Ha)
Speed (A)         4                         86.044                21.5110            87.92
Feed (B)          4                         39.268                9.8170             40.12
Depth (C)         4                         1.152                 0.2880             1.18
Error             12                        2.936                 0.2447
Total             24                        129.200
Metal removal rate (MRR)
Speed (A)         4                         24.07                 6.02               1.12
Feed (B)          4                         62.92                 15.73              2.92
Depth (C)         4                         4208.22               1052.05            195.49
Error             12                        64.58                 5.38
Total             24                        4359.78
Fig. 35.5 Main effects plot for S/N for surface roughness
Fig. 35.6 Main effects plot for S/N for hardness
Fig. 35.7 Main effects plot for S/N for MRR
$$\mathrm{Ra}_{l} = 1.446 - 0.002477\,S + 3.464\,F + 0.1559\,D \tag{35.3}$$
R-sq = 89.19%, R-sq(adj) = 87.65%

$$\mathrm{Hard}_{l} = 30.462 + 0.03778\,S - 11.40\,F + 0.120\,D \tag{35.4}$$
R-sq = 91.51%, R-sq(adj) = 90.30%

$$\mathrm{MRR}_{l} = 34.50 + 0.00635\,S - 13.69\,F + 18.304\,D \tag{35.5}$$
R-sq = 97.30%, R-sq(adj) = 96.91%
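As a quick check of how the linear models in Eqs. (35.3) to (35.5) are used, the sketch below evaluates them at the settings of experiment 1 in Table 35.4 (S = 107.44 m/min, F = 0.01 mm/rev, D = 0.5 mm). The coefficients are copied from the equations; the dictionary and function names are hypothetical.

```python
# Coefficients (intercept, S, F, D) of the linear models, Eqs. (35.3)-(35.5).
LINEAR_MODELS = {
    "Ra": (1.446, -0.002477, 3.464, 0.1559),
    "Hardness": (30.462, 0.03778, -11.40, 0.120),
    "MRR": (34.50, 0.00635, -13.69, 18.304),
}


def predict_linear(response: str, speed: float, feed: float, depth: float) -> float:
    """Evaluate b0 + bS*S + bF*F + bD*D for the chosen response."""
    b0, b_speed, b_feed, b_depth = LINEAR_MODELS[response]
    return b0 + b_speed * speed + b_feed * feed + b_depth * depth


if __name__ == "__main__":
    for name in LINEAR_MODELS:
        print(name, round(predict_linear(name, 107.44, 0.01, 0.5), 3))
```

For this run the models give roughly 1.29 for Ra, 34.47 for hardness, and 44.20 for MRR, against measured values of 1.318, 35, and 42.9 in Table 35.4.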
Here Ra_l, Hard_l, and MRR_l denote the linear predictive equations for surface roughness, hardness, and MRR, respectively. Figure 35.8 compares the values predicted by the linear regression model with the actual test results. The R-sq values of the linear regression equations for Ra, hardness, and MRR were 89.19%, 91.51%, and 97.30%, respectively. The predictive equations obtained with the quadratic regression model for surface roughness, hardness, and MRR are given below:

$$\mathrm{Ra}_{q} = 2.341 - 0.01786\,S - 7.22\,F + 1.288\,D + 0.000001\,S^{2} + 15.94\,F^{2} + 0.307\,D^{2} + 0.00672\,S F - 0.000943\,S D - 3.62\,F D \tag{35.6}$$
R-sq = 93.42%, R-sq(adj) = 89.47%

$$\mathrm{Hard}_{q} = 34.92 - 0.0102\,S - 3.3\,F - 1.50\,D + 0.000002\,S^{2} + 22.7\,F^{2} - 0.104\,D^{2} - 0.01114\,S F + 0.00096\,S D + 1.89\,F D \tag{35.7}$$

$$\mathrm{MRR}_{q} = 10.5 + 0.418\,S + 120.9\,F - 3.04\,D - 0.000017\,S^{2} - 270\,F^{2} - 2.69\,D^{2} - 0.0797\,S F + 0.01339\,S D + 55.6\,F D \tag{35.8}$$
R-sq = 98.47%, R-sq(adj) = 97.56%
Here Ra_q, Hard_q, and MRR_q denote the quadratic predictive equations for surface roughness, hardness, and MRR, respectively. Figure 35.9 compares the values predicted by the quadratic regression model with the actual test results. The R-sq values of the quadratic regression equations for Ra, hardness, and MRR were 93.42%, 97.74%, and 98.47%, respectively. Hence, the quadratic regression model gave more accurate predictions than the linear regression model. The quadratic regression model was therefore shown to be effective for estimating surface roughness, hardness, and metal removal rate (MRR), as shown in Figs. 35.8 and 35.9.
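When comparing the two model families it helps to keep in mind that R-sq always rises as terms are added, which is why R-sq(adj) is reported alongside it. The sketch below applies the usual adjustment, 1 − (1 − R²)(n − 1)/(n − p − 1), with n = 25 runs and p = 3 terms for the linear models or p = 9 for the quadratic ones. These values of n and p are read off the design and the equations rather than stated by the authors, and the function name is hypothetical.

```python
def adjusted_r2(r2: float, n_obs: int, n_terms: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    r2      : coefficient of determination as a fraction (0.8919 for 89.19%)
    n_obs   : number of observations n
    n_terms : number of model terms p, excluding the intercept
    """
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_terms - 1)


if __name__ == "__main__":
    # Linear Ra model, Eq. (35.3): three terms.
    print(round(100 * adjusted_r2(0.8919, 25, 3), 2))
    # Quadratic Ra model, Eq. (35.6): nine terms.
    print(round(100 * adjusted_r2(0.9342, 25, 9), 2))
```

For Ra, the adjusted value increases only from 87.65% to 89.47% when moving to the quadratic model, a smaller gain than the raw R-sq figures suggest.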
35.6 Conclusion
In the current research, the Taguchi method was applied to determine the optimal cutting conditions, and the variance was analyzed using ANOVA. The following conclusions were drawn:
Fig. 35.8 a, b, and c Comparison of the linear regression model with experimental results for Ra, MRR, and hardness
Fig. 35.9 a, b, and c Comparison of the quadratic regression model with experimental results for Ra, MRR, and hardness
• The best levels of the control factors for reducing the surface roughness were determined using S/N ratios. The optimal condition for surface roughness was observed at A5B1C1 (i.e., cutting speed = 243.16 m/min, feed rate = 0.001 mm/rev, and depth of cut = 0.5 mm).
• The optimum levels of the control factors for maximizing the hardness and the metal removal rate (MRR) were determined using S/N ratios. The optimal condition for hardness was observed at A5B1C5 (i.e., cutting speed = 243.16 m/min, feed rate = 0.001 mm/rev, and depth of cut = 2.5 mm), and the optimal condition for MRR at A4B1C5 (i.e., cutting speed = 209.23 m/min, feed rate = 0.001 mm/rev, and depth of cut = 2.5 mm).
• The statistical analysis established that the feed rate was the most important parameter for surface roughness (Ra), with an F value of 30.76; the cutting speed was the most important parameter for hardness (H), with an F value of 87.92; and the depth of cut was the most important parameter for metal removal rate (MRR), with an F value of 195.49.
• Quadratic regression models were developed based on the input parameters. The quadratic regression gives a more accurate relationship than the linear regression and yields high correlation coefficients (Ra = 0.9256, hardness = 0.983, and MRR = 0.9824) between the measured and predicted values of surface roughness, hardness, and MRR.
Overall, the results showed that the Taguchi technique is a dependable way to reduce manufacturing cost and manufacturing time. In the future, the results obtained in the current research can be used in industrial applications as well as in academic research. Further studies can consider other factors such as the geometry of the cutting tool, coatings, tool nose radius, and lubricating oils.
References
1. Erfani, T., Utyuzhnikov, S.V.: A method for even generation of the Pareto frontier in multi objective optimization. Eng. Optim. 43(5), 467–484 (2011)
2. Nalbant, M., Gokkaya, H., Sur, G.: Application of Taguchi method in the optimization of cutting parameters for surface roughness in turning. Mater. Des. 28, 1379–1385 (2007)
3. Ensarioglu, C., Cemal, C.M., Demirayak, I.: Mathematical modeling of surface roughness for evaluating the effects of cutting parameters and coating material. J. Mater. Process. Technol. 209 (2009)
4. Mandal, N., Doloi, B., Mondal, B., Das, R.: Optimization of flank wear using Zirconia Toughened Alumina (ZTA) cutting tool: Taguchi method and regression analysis. Measurement 44, 2149–2155 (2011)
5. Kumara, N.S., Shetty, A., Shetty, A., Ananth, K., Shetty, H.: Effect of spindle speed and feed rate on surface roughness of carbon steels in CNC turning. Procedia Eng. 38 (2012)
6. Guo, Y., Leonders, J., Duflou, J., Lauwers, B.: Optimization of energy consumption and surface quality in finish turning. Procedia CIRP 1 (2012)
7. Yi, Q.S., Sujan, D., Reddy, M.M.: Influence of cutting fluid conditions and cutting parameters on surface roughness and tool wear in turning process using Taguchi method. Measurement 78 (2015)
8. Qehaja, N., Jakupi, K., Bunjaku, A., Bruci, M., Osmani, H.: Effect of machining parameters and machining time on surface roughness in dry turning process. Procedia Eng. 100 (2015)
9. Hameed, R., Maath, H.: Optimization of Sustainable Cutting Conditions in Turning Carbon Steel by CNC Machine. University of Dhi Qar University of Engineering Sciences (2017)
10. Abdulbaqi, A.S., Obaid, A.J., Hmeed Alazawi, S.A.: A smart system for health caregiver based on IoMT: Toward tele-health caregiving. Int. J. Online Biomed. Eng. 17(7), 70–87 (2021)
11. Agarwal, P., Idrees, S.M., Obaid, A.J.: Blockchain and IoT technology in transformation of education sector. Int. J. Online Biomed. Eng. (iJOE) 17(12), 4–18 (2021). https://doi.org/10.3991/ijoe.v17i12.25015
12. Gupta, A., Singh, H., Aggarwal, A.: Taguchi-fuzzy multi output optimization (MOO) in high speed CNC turning of AISI P-20 tool steel. Expert Syst. Appl. 38, 6822–6828 (2011)
13. Holman, J.P.: Experimental Methods for Engineers, 8th edn. McGraw-Hill Companies (2012)
14. ASTM International: Specification for Steel Bars, Carbon and Alloy, Hot-Wrought and Cold-Finished, General Requirements For. ASTM SA-29/SA-29M (1998)
15. Koksoy, O., Muluk, Z.F.: Solution to the Taguchi’s problem with correlated responses. Gazi University J. Sci. 17(1), 59–70 (2004)
Chapter 36
Qualitative Indicator of Growth of Segments of the Network with Various Service Disciplines
Zamen Latef Naser, Ban Kadhim Murih, and M. W. Alhamd
Abstract As a preliminary assessment, a quality survey of telecommunication channels provides an estimate of unused capacity and of the development of the telecommunication network within a hybrid framework. The assessment relates the capacity of the circuit-switched segment of the network to that of the packet-switched segment, with both expressed in units of (kbit/s)·km. This comparison shows which part of the network should be invested in first for its further development, and it links that development to the tariff plans, which reflect the quality of the competing services. Until recently, one of the main indicators of network growth was the number of serviced and newly commissioned channel-kilometers. This indicator reflected the growth in the volume of services for analog subscribers of the Public Switched Telephone Network (PSTN).
36.1 Introduction
With the increase in the share of data transmission traffic, the above-mentioned indicator of network growth in channel-kilometers ceased to reflect the real state of affairs, since the load became mixed. A mixed load is served according to different service disciplines: voice traffic is served with loss, and data packets are served with delay.
Z. L. Naser (B) College of Engineering Technology, Department of Medical Device Industry Engineering, National University of Science and Technology, Dhi-Qar, Iraq e-mail: [email protected] B. K. Murih Al-Mustaqbal University College, Babylon, Iraq e-mail: [email protected] M. W. Alhamd Directorate of Research and Development, Iraqi Atomic Energy Commission, Baghdad, Iraq © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_36
Fig. 36.1 MAN network as a bottleneck in the PSTN network (figure labels: legacy MAN links from 1.5 Mbps (DS-1) to 45 Mbps (DS-3); bottleneck)
Additionally, the unfurling joining of broadband applications starting from the buyer and the presence of high-capacity numbering capacity within the long-distance spine (worldwide and interconversion systems) has contributed to the foundation of far reaching noteworthy data weight on the boundaries of Metro systems. In other words, the framework of the Metropolitan Zone Organize (MAN) has slacked behind the advancements within the Local-Area Arrange (LAN) and Wide Zone Organize (Faded) organize foundation. Both LANs and WANs have performed critical picks up in channel transfer speed, whereas MAN remained unaltered. Figure 36.1 outlines the issue of blockage within the MAN arrange [1]. MAN interconnects were primarily restricted to DS1 joins (serving 1.5 Mbps T1 streams), or private DS3 joins (serving 45 Mbps T3 streams), which utilized time-division innovation multiplexing—time-division multiplexing (TDM). For numerous customers, T1 did not give sufficient transmission capacity and DS3 was as well costly. Due to the previously mentioned confinements of the administrations advertised, the bottleneck within the arrange is presently the get to (and center) of Metro systems. Unused era of optical Metro get to and transport advances such as Gigabit Ethernet (GigE) guarantee to extend this bottleneck between conclusion clients and the organize spine. The engineering of the open Metro arrange must alter totally on the off chance that the very final bottleneck within the end-to-end arrange is continuously extended. Hence, with the move from analog to computerized transmission strategies, unused media transmission innovations started to create in communication systems. The benefit pointer of communication channels has too changed. 
It began to be measured in basic digital channels (BCCs), also called Digital Service/Signal level 0 (DS0) channels, with a pulse code modulation (PCM) signal transmission rate of 64 kbit/s, in full compliance with Recommendation G.711 of the International Telecommunication Union (ITU-T). However, with the deployment of Ethernet technology, the share of data traffic reached 40% (USA, 2002) [1], and the task of re-evaluating the power of the telecommunication network, now carrying a mixed load in its voice and data segments, has arisen once more.
36.2 Material and Methods
With the increase in the volume of telephone (voice) traffic, the channel service indicator began to be measured in the primary digital channels of the plesiochronous digital hierarchy (PDH) (in Europe, E1 (30 DS0); in the USA and Japan, DS1 (24 DS0)) and in tertiary digital channels (in Europe, E3 (480 DS0); in the USA, DS3 (672 DS0); in Japan, DSJ3 (480 DS0)). Then, as the share of data traffic increased, this metric came to be measured in the OC1 (672 DS0) and OC3 (1916 DS0) synchronous optical carriers of the US SONET synchronous digital hierarchy. Alongside them, the channel service indicator began to be measured in the STM-1 (1920 DS0) synchronous transport modules of the first level of the European synchronous digital hierarchy (SDH) [2, 3]. Characteristically, the new service indicator of communication channels no longer took the length of communication lines into account. A traffic "closure" occurred: in pursuit of subscriber base growth, telecom operators began to serve subscribers at a reduced rate regardless of the distance and direction of long-distance and international calls, and the tariff for telephone services ceased to depend on that distance and direction. Let us compare the capabilities of the competing technologies: the PDH hierarchy; Asynchronous Transfer Mode (ATM), an ITU-standardized fixed-length packet switching technology that is asynchronous in the sense that packets from individual users are transmitted aperiodically, and that provides efficient transfer of different types of data (voice, video, multimedia, LAN traffic) over long distances; and Ethernet technology.
The DS0 voice channel (BCC) carries 8 bits (1 byte) during the time interval Td = 125 μs; an ATM cell contains 53 bytes (48 bytes of data and a 5-byte header); and the payload field of an IP/MPLS data packet carried over Ethernet can contain from 64 to 1518 or more bytes. (IP/MPLS stands for Internet Protocol/Multiprotocol Label Switching; the MPLS specification makes it possible to direct network traffic over specific virtual channels by switching IP packets.) On the one hand, it becomes clear that using voice B-channels (BCC data channels in the PDH hierarchy) and ATM cells for data transmission is disadvantageous, since Ethernet technology scales bandwidth on demand. On the other hand, real-time transmission of voice packets over Ethernet is still difficult (it requires highly skilled users). Nevertheless, it is still necessary to compare the volumes of services provided for voice and data transmission, and again there is a need to count traffic in units of telephone traffic, in Erlangs.
The following method is proposed for converting voice and data packet traffic into traffic units in Erlangs. First, we assume that, in accordance with the Service Level Agreement (SLA), which is an agreement between the access service provider (telecom operator) and the customer outlining quantitative and qualitative service characteristics such as backbone availability, customer support, and fault recovery time, a flow control scheme is implemented on the backbone lines. This scheme involves two primary mechanisms: policing of incoming (ingress) traffic and, if necessary, shaping of outgoing traffic. Second, if the flow control mechanisms are deployed throughout the network, then every packet or data source supports the SLA described above, and as a result the quality of service (QoS) provided to the packet can be determined. QoS denotes the quality and class of the data transmission services provided; it usually describes the network in terms of delay, bandwidth, and signal jitter. Both flow control mechanisms (policing and shaping) use traffic descriptors on a packet to indicate the classification of the packet and so ensure correct transmission and service. In this case, outgoing flows begin to obey the Markov property, which in turn makes it possible to classify them as Poisson flows. The Markov property is memorylessness: for Markov processes, the future does not depend on the past and is determined only by the present state. Poisson flows are streams of service requests entering the telecommunication network that are exponentially distributed. Third, the call traffic load for a Poisson call flow is defined as

$$a = \lambda \cdot h \quad [\text{Erl}], \tag{36.1}$$

where λ is the packet arrival rate [packets/s]; h = l_pak/υ is the average service (transmission) time of one packet [s/packet]; l_pak is the average length of one packet [bit/packet]; and υ is the transmission speed [bit/s]. We then equate this load to the telephone load over time. Fourth, the product of the telephone load in Erlangs per hour of occupation (hr) and the hour of maximum load (PNN) sets the capacity s of the trunk line (SL) bundle for a given quality of service:

$$s = \left[ \frac{a\,(1 - P)}{\eta} \right], \tag{36.2}$$

where P is the packet loss rate (blocking probability) and η is the line efficiency, or occupancy.
Fifth, from expression (36.2), we determine the power of the Ethernet data transmission network in channel-kilometers [4] for a given quality P:

$$D = \sum_{\forall i,j} s_{ij}\, l_{ij} \quad [\text{channel-kilometers}], \tag{36.3}$$

where $l_{ij}$ is the length of a network graph edge [km].
36.2.1 Numerical Experiment
Consider two numerical examples, one for a data network segment and one for a voice network segment.
Example 1 Let us calculate the power of a data transmission network segment. Let the average packet length be l_pak = 1518 bytes and the transmission speed on the Fast Ethernet link be υ = 100 Mbit/s. Then the average service (transmission) time of a packet and, for a packet arrival rate of λ = 1600 packets/s, the load (36.1) will be

$$h = \frac{1518\ \frac{\text{bytes}}{\text{pack}} \cdot 8\ \frac{\text{bit}}{\text{byte}}}{100 \cdot 10^{6}\ \frac{\text{bit}}{\text{s}}} = 121.44\ \frac{\mu\text{s}}{\text{pack}}, \qquad a = 1600\ \frac{\text{pack}}{\text{s}} \cdot 121.44 \cdot 10^{-6}\ \frac{\text{s}}{\text{pack}} \approx 0.1943\ \text{Erl}.$$
Taking into account the channel utilization factor ρ = 0.2 at the packet loss rate P = 7.53 × 10^−9, we calculate the capacity of the SL bundle (36.2): s = [0.1943 Erl · (1 − 7.53 × 10^−9)/0.2] = 1. The power of the Ethernet network (point-to-point topology) at l = 10 km will then be, by (36.3), D = 1 × 10 km = 10 channel-kilometers.
Example 2 Let us calculate the power of a voice network segment through its capacity in DS0 channels. An STM-1 stream can carry 1920 DS0 channels at 155.52 Mbps. Since the transmission rate of the STM-1 stream is higher than that of the Fast Ethernet stream, to equalize the average rates in both network segments at up to 100 Mbit/s we lease three C-31 containers (480 DS0 each) in the STM-1 synchronous transport module, i.e., 480 channels × 3 = 1440 channels. Taking into account the voice traffic line efficiency η = 0.75 at losses P = 0.05, the STM-1 stream for our order can actually carry voice traffic of the above quality in 1440 channels × 0.75 = 1080 channels, randomly switched within the field of 1440 leased channels. Given that the actual load per channel in Example 1 is only a = 0.1943 Erl, a system with 1080 channels carries a total load of about 1080 × 0.1943 ≈ 209.8 Erl. Hence, by formula (36.3), the power of the loaded telephone network will be D = 209.8 channels × 10 km = 2098 channel-kilometers.
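The arithmetic of Examples 1 and 2 can be reproduced with a short script. The following is a minimal sketch in Python, not the authors' code: the function names are hypothetical, the arrival rate of 1600 packets/s is read off the calculation in Example 1, and rounding the bundle capacity up to a whole channel is an assumption suggested by the result s = 1.

```python
import math


def offered_load_erl(arrival_rate_pps, packet_len_bytes, link_rate_bps):
    """Load a = lambda * h from (36.1), with h = packet length / link rate."""
    service_time_s = packet_len_bytes * 8 / link_rate_bps
    return arrival_rate_pps * service_time_s


def trunk_channels(load_erl, loss_prob, efficiency):
    """Bundle capacity s from (36.2); rounding up is an assumption."""
    return math.ceil(load_erl * (1 - loss_prob) / efficiency)


def power_channel_km(channels, length_km):
    """Network power D from (36.3) for a single graph edge."""
    return channels * length_km


# Example 1: Fast Ethernet segment, point-to-point, 10 km.
a = offered_load_erl(1600, 1518, 100e6)
s = trunk_channels(a, 7.53e-9, 0.2)
d_ethernet = power_channel_km(s, 10)

# Example 2: three leased containers of 480 DS0 channels at efficiency 0.75.
usable_channels = 3 * 480 * 0.75
loaded_channels = usable_channels * a
d_pstn = power_channel_km(loaded_channels, 10)

print(f"a = {a:.4f} Erl, s = {s}, D_Ethernet = {d_ethernet} channel-km")
print(f"usable = {usable_channels:.0f}, D_PSTN = {d_pstn:.0f} channel-km")
```

Without intermediate rounding the PSTN figure comes out slightly above the 2098 channel-kilometers quoted in the text, because the text rounds the total load to 209.8 before multiplying by the line length.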
36.3 Results and Discussion
As can be seen from the examples above, the power of the Fast Ethernet network in channel-kilometers, at the same load and transmission rate as in the telephone network, turned out to be much less than the power of the telephone network. The results are not comparable. While centralized telephone networks can artificially inflate data rates, there is a serious threat from cheaper competing Ethernet bypass networks. Now let us look at the two examples in terms of bandwidth. In this case, the network capacity is calculated through the throughput of the trunk communication channels:

$$D = \sum_{\forall i,j} C_{ij}\, l_{ij} \quad \left[\frac{\text{kbit}}{\text{s}} \cdot \text{km}\right], \tag{36.4}$$

where $C_{ij}$ is the bandwidth of a network graph edge [kbit/s]. The power of the network under the conditions of Examples 1 and 2 will then be, by formula (36.4), as follows.
For the PSTN network:

$$D = 209.8\ \text{channels} \cdot 64\ \frac{\text{kbit}}{\text{s} \cdot \text{channel}} \cdot 10\ \text{km} = 134{,}272\ \frac{\text{kbit}}{\text{s}} \cdot \text{km}.$$

For the Ethernet network:

$$D = 1600\ \frac{\text{pack}}{\text{s}} \cdot 1518\ \frac{\text{byte}}{\text{pack}} \cdot 8\ \frac{\text{bit}}{\text{byte}} \cdot 10\ \text{km} = 19{,}430.4\ \frac{\text{kbit}}{\text{s}} \cdot 10\ \text{km} = 194{,}304\ \frac{\text{kbit}}{\text{s}} \cdot \text{km}.$$

As might be expected, the power of the PSTN segment is less than the power of the Ethernet segment, as the PSTN has historically been optimized for voice traffic. At the same time, the capacities of network segments carrying different traffic, expressed in (kbit/s) · km, became comparable to each other. Figure 36.2 shows the dependence of the network capacity on the indicators of network growth.

Fig. 36.2 Growth rates of network segments with various service disciplines differ significantly

Figure 36.2 shows that the network capacity of the circuit-switched segment (PSTN), expressed in channel-kilometers, exceeds that of the packet-switched (Ethernet) segment by a factor of at least 209. Hence, it may seem more profitable to develop the network segment built according to the discipline of service with losses (that is, if a call is lost, the subscriber must redial the number). Indeed, the graph is based on the calculations of Examples 1 and 2. But the fact that the calculations show a significant discrepancy in network power between the segments, even though the volume of transmitted traffic is set the same in both (as follows from the initial data of the examples), leads to the wrong conclusion that capital investments must be made in the circuit-switched network segment. Although the subscription income from operating the PSTN segment may still exceed that from operating the Ethernet segment, the statistics will in fact record growth of the subscriber base in the Ethernet segment. A contradiction arises. The fallacy of this conclusion lies in the fact that the indicator of network growth in channel-kilometers has ceased to reflect the actual capacity of the mixed network. The transition to the new indicator of network growth proposed by the author, in units of (kbit/s) · km, eliminates this contradiction, as can be seen from Fig. 36.2: with the same initial data, the capacities in both network segments are equalized.
The rapid growth of transmission speed on the Internet has caused a high degree of demand for communication services. Expectations among users, suppliers, and the public have reached an all-time high as civilization enters a new phase of development, the Global Information Society (GIS), in which information and knowledge become the main products of production [5–8]. That is why the Internet is constantly changing the boundaries of long-distance communication. Its sheer volume and growing importance in business mean that compatible technical standards and good practice must be brought into use globally.
Thus, access networks are becoming more and more broadband, and the estimate of network growth under mixed load obtained in this article makes it possible to single out the most promising among the kaleidoscopic set of newly emerging technologies. The direction in which telecommunication networks are developing is bringing tough competition between operators directly to the subscriber socket.
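The bandwidth-based indicator (36.4) is a weighted sum over graph edges, which makes it easy to evaluate for both segments with the same routine. The sketch below is illustrative only: the function name and the representation of a network as a list of (bandwidth in kbit/s, length in km) pairs are assumptions, and each segment is modeled as the single 10 km edge used in Examples 1 and 2.

```python
def power_kbps_km(edges):
    """Network power D from (36.4): sum of C_ij * l_ij over all edges.

    `edges` is a list of (bandwidth_kbit_per_s, length_km) pairs.
    """
    return sum(bandwidth * length for bandwidth, length in edges)


# PSTN segment: 209.8 loaded channels at 64 kbit/s each.
pstn_rate_kbps = 209.8 * 64
# Ethernet segment: 1600 packets/s of 1518 bytes, converted to kbit/s.
ethernet_rate_kbps = 1600 * 1518 * 8 / 1000

d_pstn = power_kbps_km([(pstn_rate_kbps, 10)])
d_ethernet = power_kbps_km([(ethernet_rate_kbps, 10)])
ratio = d_ethernet / d_pstn

print(f"PSTN:     {d_pstn:,.0f} (kbit/s) km")
print(f"Ethernet: {d_ethernet:,.0f} (kbit/s) km")
print(f"Ethernet / PSTN = {ratio:.2f}")
```

In these units the two segments differ by a factor of roughly 1.4, compared with a factor of about 209 in channel-kilometers, which is the sense in which the new indicator makes the segments comparable.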
36.4 Conclusion
An engineering methodology has been developed for measuring packet traffic in units of telephone load, in Erlangs, which allows the use of standardized ITU-T methods for measuring the quality of service of Plain Old Telephone Service (POTS) and IP/MPLS. It was found that, when assessing the capacity of a mixed communication network, it is advisable to use the indicator of network growth in units of (kbit/s) · km, which to a first approximation makes it possible to compare the network power in circuit-switched segments with the network power in packet-switched segments and to provide guaranteed quality of service to subscribers in different service disciplines. A characteristic feature of this indicator of network growth is the need to assess mixed traffic (voice and data) in Erlangs.
References
1. Kostyukovsky, A.G.: Combining digital signals when switching time-separated channels. Minsk (2007)
2. Recommendations ITU-T series G. [Electronic resource]. Mode of access: http://www.itu.int. Date of access: 20 Oct 2014
3. Qualitative indicator of the growth of network sectors with various service specialties. Bsuir Rep. 5(91) (2015)
4. Gigabit Ethernet for Metro Area Networks [Electronic resource]. Access mode: www.accessengineeringlibrary.com. Date of access: 20 Oct 2004
5. Theory of communication networks: textbook. In: Roginsky M.V.N. (Ed.) Allowance (2017)
6. International Telecommunication Union (ITU) [Electronic resource]. The Global Information Society: a Statistical View. Mode of access: http://www.itu.int. Access date: 20 Oct 2019
7. Agarwal, P., Idrees, S.M., Obaid, A.J.: Blockchain and IoT technology in transformation of education sector. Int. J. Online Biomed. Eng. (iJOE) 17(12), 4–18 (2021). https://doi.org/10.3991/ijoe.v17i12.25015
8. Saeed, M.M., Hasan, M.K., Obaid, A.J., Saeed, R.A., Mokhtar, R.A., Ali, E.S., Akhtaruzzaman, M., Amanluo, S., Hossain, A.K.M.Z.: A comprehensive review on the users’ identity privacy for 5G networks. IET Commun. 00, 1–16 (2022). https://doi.org/10.1049/cmu2.12327
Chapter 37
Hidden Attractor in a Asymmetrical Novel Hyperchaotic System Involved in a Bounded Function of Exponential Form with Image Encryption Application
Ali A. Shukur and Mohanad A. AlFallooji

Abstract In this paper, an asymmetrical novel 4D system with a bounded function of exponential form, which can exhibit chaotic and hyperchaotic behaviors, is proposed. The dynamical behaviors of the system are explored by calculating its Lyapunov exponents and bifurcation diagram. The proposed system involves a bounded function, and we show how the behavior changes according to that function. An application to image encryption is presented.
A. A. Shukur (B) Faculty of Computer Science and Mathematics, University of Kufa, Kufa, Iraq e-mail: [email protected] Faculty of Mechanics-Mathematics, Belarusian State University, Minsk, Belarus
M. A. AlFallooji National University of Malaysia, Kuala Lumpur, Malaysia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_37

37.1 Introduction and Formulation of the System
In the last two decades, a large number of chaotic systems have been studied, with applications in weather forecasting [1], telecommunication [2], biological modeling [3], and so on. Chaotic systems can be classified into different categories by physical, dynamical, and algebraic features. After Lorenz's system (1963) [4], researchers overcame their initial shock and accepted the reality of chaotic systems. Attempts then began to construct chaotic systems with unique physical and dynamical peculiarities, such as the biological model of Rössler, which contains only one nonlinear term [5]; the electronic circuit of Chua, which exhibits double-scroll chaotic behavior [6]; and the simplifications of the Lorenz system by Chen and Lu [7, 8]. On the other hand, chaotic systems with special algebraic structure were introduced, such as Wei's system, which has no equilibria [9], and Wang's system, which has only one stable equilibrium [10]. In the late 1970s, Rössler proposed a very interesting chaotic system with two positive Lyapunov exponents, which was later called hyperchaotic [13]. After that, hyperchaotic systems attracted the interest of researchers in different areas, and many hyperchaotic systems have been introduced, in particular the 4D hyperchaotic Lorenz-type system [14]. Note that systems with such unusual peculiarities may have neither heteroclinic nor homoclinic orbits, so the Shilnikov method [15] may not help to verify the chaos; hyperchaotic systems are therefore more complicated. It is thus clear that designing a hyperchaotic system is not an easy task and requires a compromise between the Lyapunov exponents and the dissipation criteria. In [10], Yang, Wei and Chen proposed a generalized Lorenz-type system:

$$\begin{cases} \dot{x} = -a(x - y), \\ \dot{y} = -cy - xz, \\ \dot{z} = xy - d. \end{cases} \tag{37.1}$$

In this paper, we propose an asymmetrical four-dimensional system with a simple structure, based on the above three-dimensional system:

$$\begin{cases} \dot{x} = -ax + byw, \\ \dot{y} = cy - xz, \\ \dot{z} = xy - dz f(x), \\ \dot{w} = my + r, \end{cases} \tag{37.2}$$

where $f(x) = \frac{1}{1 + e^{-x}}$ is a sigmoid function and a, b, c, d, m, r are system parameters. Note that f(x) is a bounded function, i.e., 0 < f(x) < 1. One can ask about the role of f(x). The function f(x) changes the topological structure: system (37.2) without f(x) is symmetric about the z- and w-axes under the transformation (x, y, z, w) → (−x, −y, z, w). See Fig. 37.1.

Fig. 37.1 a Symmetrical system (37.2) without f(x). b Asymmetrical system (37.2)

The paper is organized as follows. Section 37.1 introduces the theoretical model of the system and studies some of its fundamental properties. In Sect. 37.2, the ultimate boundedness of the proposed system is obtained. In Sect. 37.3, an application to image encryption is presented. In Sect. 37.4, the possibility of a synchronization scheme for the proposed system is studied.
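System (37.2) can be explored numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it uses a fixed-step fourth-order Runge-Kutta scheme (the paper does not name its solver), the parameter values and initial state quoted in Sect. 37.2, and hypothetical helper names. The `use_f` switch replaces f(x) by 1 to obtain the variant without the sigmoid.

```python
import math

# Parameter values quoted in Sect. 37.2 of the paper.
PARAMS = {"a": 10.0, "b": 1.0, "c": 12.5, "d": 20.0, "m": -1.5, "r": 0.5}


def sigmoid(x):
    """Bounded function f(x) = 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rhs(state, p=PARAMS, use_f=True):
    """Right-hand side of system (37.2); use_f=False drops f(x)."""
    x, y, z, w = state
    f = sigmoid(x) if use_f else 1.0
    return (
        -p["a"] * x + p["b"] * y * w,
        p["c"] * y - x * z,
        x * y - p["d"] * z * f,
        p["m"] * y + p["r"],
    )


def rk4_step(state, dt, p=PARAMS, use_f=True):
    """One classical Runge-Kutta step of size dt."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = rhs(state, p, use_f)
    k2 = rhs(shifted(k1, dt / 2), p, use_f)
    k3 = rhs(shifted(k2, dt / 2), p, use_f)
    k4 = rhs(shifted(k3, dt), p, use_f)
    return tuple(
        s + dt / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
        for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4)
    )


def integrate(state, dt=1e-3, steps=10, use_f=True):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, use_f=use_f)
        trajectory.append(state)
    return trajectory


trajectory = integrate((1.0, 1.0, 1.0, 1.0))
print(trajectory[-1])
```

The step size and number of steps are placeholders; reproducing phase portraits such as those in the paper's figures would require a much longer run and a step size chosen with care.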
37.2 Dynamics Analysis
To investigate the stability of system (37.2), we consider the following algebraic equations:

$$\begin{cases} -ax + byw = 0, \\ cy - xz = 0, \\ xy - \dfrac{dz}{1 + e^{-x}} = 0, \\ my + r = 0. \end{cases} \tag{37.3}$$

Obviously, if r = 0, then system (37.2) has a line of equilibria at (0, 0, 0, w). For r ∈ Z \ {0}, system (37.2) has no equilibrium. In both cases, system (37.2) can be classified as a chaotic system with a hidden strange attractor, that is, an attractor whose basin of attraction does not contain neighborhoods of equilibria. Hidden attractors have been observed in some chaotic and hyperchaotic systems, such as systems with no equilibria. For a better understanding of hidden attractors, we refer to [11, 17]. The divergence of the system is

$$\nabla V = \frac{\partial \dot{x}}{\partial x} + \frac{\partial \dot{y}}{\partial y} + \frac{\partial \dot{z}}{\partial z} + \frac{\partial \dot{w}}{\partial w} = -a + c - \frac{d e^{x}}{1 + e^{x}}.$$

Note that for x ∈ Z we have $\lim_{x \to -\infty} \frac{d e^{x}}{1 + e^{x}} = 0$ and $\lim_{x \to +\infty} \frac{d e^{x}}{1 + e^{x}} = d$. Thus system (37.2) is dissipative for c < 0 and for any x(t), t → ∞. In this way, each volume containing the system trajectory shrinks to zero as t approaches infinity. It follows that all orbits are ultimately confined to a specific limit set of zero volume, and the asymptotic motion settles onto an attractor. Thereby, the existence of an attractor of system (37.2) is proved for any value of the given sigmoid function. One of the important tools for indicating hyperchaos is the set of Lyapunov exponents (LEs), at least two of which must be positive. Now, setting a = 10, b = 1, c = 12.5, d = 20, m = −1.5 and r = 0.5 with x(0) = 1, y(0) = 1, z(0) = 1 and w(0) = 1, we obtain the finite-time LEs L1 = 3.70957, L2 = 0.666986, L3 = −4.85606, L4 = −10.715758. This means that the proposed system (37.2) shows hyperchaotic behavior. The LEs are shown in Fig. 37.2a and the dynamics in Fig. 37.3. Moreover, the bifurcation diagram is shown in Fig. 37.2b; it was obtained by plotting the local maxima of z(t) while varying r over the interval [−2, 2].
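The divergence expression and the reported Lyapunov exponents can be checked against each other numerically. The following sketch simply evaluates the formula above with the quoted parameter values; the function names are hypothetical, and the closed-form sign-change point is elementary algebra on that formula rather than a result stated in the paper.

```python
import math

# Finite-time Lyapunov exponents reported in the text.
REPORTED_LES = (3.70957, 0.666986, -4.85606, -10.715758)


def sigmoid(x):
    """f(x) = 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def divergence(x, a=10.0, c=12.5, d=20.0):
    """-a + c - d*e^x/(1+e^x); the last term equals d*f(x)."""
    return -a + c - d * sigmoid(x)


def sign_change_point(a=10.0, c=12.5, d=20.0):
    """Value of x where the divergence is zero (needs 0 < (c - a)/d < 1)."""
    ratio = (c - a) / d
    return math.log(ratio / (1.0 - ratio))


for x in (-20.0, 0.0, 20.0):
    print(f"x = {x:6.1f}: divergence = {divergence(x):.4f}")
print(f"divergence is zero at x = {sign_change_point():.4f}")
print(f"sum of reported LEs = {sum(REPORTED_LES):.6f}")
```

With these parameter values the expression tends to −a + c = 2.5 for large negative x and to −a + c − d = −17.5 for large positive x, so its sign depends on where the trajectory spends its time. The sum of the reported exponents is negative, which is what one would expect if volumes contract on average along the computed trajectory.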
Fig. 37.2 a Lyapunov exponents of system (37.2) for c = 12.5. b Bifurcation diagram of (37.2) with varying r ∈ [−2, 2]
Fig. 37.3 a x-y plane. b y-z plane. c z-w plane. d y-z-w plane
37.3 An Application to Image Encryption
37.3.1 Encryption Procedure
In this section, we study the use of the proposed system in an image cryptosystem. The scheme is shown in Fig. 37.4. In particular, the following steps are carried out:
1. Calculate the value
$$X = \frac{\sum P + MN}{MN + 2^{23}}$$
and then update the value of X using the formula
$$X(i) = \operatorname{mod}\left(X(i-1) \cdot 10^{K},\ 1\right),$$
where MN is the total number of pixels in the image (M is the number of rows and N the number of columns), ΣP is the sum of the pixel values, K is the dimension of the system, and i = 2, …, K.
2. Solve system (37.2) using the initial values from the previous step.
3. Sort the output of the differential equations of system (37.2) and store the sorting indices.
4. Convert the original image to a vector of pixel values and rearrange it according to the sorting indices from the previous step.
5. Reshape the vector of pixel values into a matrix R with M rows and N columns, then extract each 2 × 2 block of elements from R and store it in a matrix Cx.
6. Perform the matrix multiplication Cx ∗ A and store the resulting 2 × 2 matrix in the corresponding block of C, which represents the encrypted image, where A is a secret 2 × 2 matrix.
Decryption is the reverse of encryption; see Fig. 37.5. First, the encrypted image is converted to a double-precision floating-point array P and its size (M × N) is obtained. Then, for each round used in encryption, going from the last round to the first, a linear transformation is applied to each 2 × 2 block of pixels of the encrypted image using the adjoint of the matrix A used in encryption. The elements of the transformed image are then reordered using the indices in the key for that round: the transformed image is flattened into a single vector, and the vector is sorted using the same indices. The sorted vector is reshaped into a matrix of the same size as the original image, and the modulo operation is applied element-wise to ensure that all values lie in the range 0–255. After all rounds have been completed, the decrypted image is converted to unsigned 8-bit integers.
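The steps above can be sketched in a few functions. This is a simplified illustration with several labeled assumptions, not the authors' scheme: a logistic map stands in for the solution of system (37.2) to keep the example short, the block product is reduced modulo 256 during encryption, the example key matrix has determinant 1 so that its adjugate undoes the multiplication, and all function names are hypothetical.

```python
def derive_seeds(pixels, rows, cols, k=4):
    """Step 1: X = (sum(P) + MN) / (MN + 2**23), then X(i) = (X(i-1) * 10**k) mod 1."""
    mn = rows * cols
    seeds = [(sum(pixels) + mn) / (mn + 2 ** 23)]
    for _ in range(k - 1):
        seeds.append((seeds[-1] * 10 ** k) % 1.0)
    return seeds


def standin_sequence(seed, n):
    """Stand-in for steps 2-3 (assumption): a logistic map, not system (37.2)."""
    x = seed % 1.0 or 0.5  # avoid the degenerate start x = 0
    values = []
    for _ in range(n):
        x = 3.99 * x * (1.0 - x)
        values.append(x)
    return values


def permute(pixels, sequence):
    """Step 4: reorder pixels by the sorting indices of the sequence."""
    order = sorted(range(len(sequence)), key=sequence.__getitem__)
    return [pixels[i] for i in order], order


def unpermute(shuffled, order):
    """Inverse of permute, given the stored sorting indices."""
    original = [0] * len(order)
    for position, source in enumerate(order):
        original[source] = shuffled[position]
    return original


def mul2(block, key):
    """Steps 5-6 for one block: 2 x 2 matrix product, reduced mod 256 (assumption)."""
    (a, b), (c, d) = block
    (e, f), (g, h) = key
    return (
        ((a * e + b * g) % 256, (a * f + b * h) % 256),
        ((c * e + d * g) % 256, (c * f + d * h) % 256),
    )


def adjugate2(key):
    """Adjugate of a 2 x 2 matrix mod 256; inverts mul2 when det(key) = 1."""
    (e, f), (g, h) = key
    return ((h % 256, -f % 256), (-g % 256, e % 256))


# Toy 4 x 4 "image" stored as a flat list of pixel values.
image = [12, 200, 33, 255, 0, 17, 98, 64, 5, 5, 250, 131, 77, 42, 9, 180]
seeds = derive_seeds(image, 4, 4)
shuffled, order = permute(image, standin_sequence(seeds[1], len(image)))

key = ((2, 3), (1, 2))  # hypothetical secret matrix with determinant 1
block = ((shuffled[0], shuffled[1]), (shuffled[2], shuffled[3]))
cipher_block = mul2(block, key)
recovered_block = mul2(cipher_block, adjugate2(key))
recovered_image = unpermute(shuffled, order)
```

Choosing a key with determinant 1 is what makes the adjugate an exact inverse modulo 256 in this sketch; the paper does not state which condition its secret matrix satisfies.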
Fig. 37.4 The scheme of the encryption algorithm
Fig. 37.5 The scheme of the decryption algorithm
37.3.2 Performance and Security Analysis
It is well known that a cryptosystem should be sensitive to minor modifications of the plain image. To test this sensitivity, two measures are used, NPCR and UACI:

$$\text{NPCR} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \operatorname{dist}(i,j), \qquad \text{UACI} = \frac{1}{255 \times MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| C_1(i,j) - C_2(i,j) \right|,$$

where dist(i, j) is

$$\operatorname{dist}(i,j) = \begin{cases} 0, & \text{if } C_1(i,j) = C_2(i,j); \\ 1, & \text{if } C_1(i,j) \neq C_2(i,j). \end{cases}$$

Table 37.1 NPCR and UACI
                      NPCR      UACI
Ref. [18]             99.6100   33.4674
Ref. [19]             99.2000   31.9249
Ref. [20]             99.6095   33.4615
Proposed with f       99.5830   26.3126
Proposed without f    99.6140   26.1285

Table 37.2 X² test
Ref. [18]   Ref. [19]   Ref. [20]   Proposed with f   Proposed without f
0.5113      0.3661      0.5339      0.28155           0.27177

Table 37.1 displays the comparative results of UACI and NPCR.
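Both measures are direct sums over pixel positions and can be computed in a few lines. In the sketch below the function names are hypothetical, images are plain nested lists of 8-bit values, and the results are scaled by 100 to give percentages, which is an assumption made to match the magnitudes reported in Table 37.1.

```python
def npcr(c1, c2):
    """Percentage of pixel positions at which two cipher images differ."""
    total = sum(len(row) for row in c1)
    changed = sum(p != q for r1, r2 in zip(c1, c2) for p, q in zip(r1, r2))
    return 100.0 * changed / total


def uaci(c1, c2):
    """Mean absolute intensity difference, as a percentage of 255."""
    total = sum(len(row) for row in c1)
    accumulated = sum(abs(p - q) for r1, r2 in zip(c1, c2) for p, q in zip(r1, r2))
    return 100.0 * accumulated / (255 * total)


# Two toy 2 x 2 cipher images differing at two positions.
cipher_a = [[0, 255], [10, 20]]
cipher_b = [[0, 0], [11, 20]]
print(f"NPCR = {npcr(cipher_a, cipher_b):.2f}%")
print(f"UACI = {uaci(cipher_a, cipher_b):.4f}%")
```

In practice the two inputs would be the cipher images obtained from a plain image and from a copy of it with a single pixel changed.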
37.3.2.1 Statistical Analysis
Here, we check the following statistical properties (Table 37.2):
1. Correlation analysis. Let N_p denote neighboring pixels. In ordinary images, the values of N_p are very close to each other, which means that adjacent pixels are highly correlated in the original image. A specialist can exploit this high correlation to break the cipher. Thus, N_p in the cipher image should be highly uncorrelated. The correlation coefficient is

$$C_{ab} = \frac{\sum_{i=1}^{K_t} (\alpha_i - E\{\alpha\})(\beta_i - E\{\beta\})}{\sqrt{\sum_{i=1}^{K_t} (\alpha_i - E\{\alpha\})^2}\ \sqrt{\sum_{i=1}^{K_t} (\beta_i - E\{\beta\})^2}},$$

where α_i, β_i denote the grayscale values of N_p, K_t is the total number of pixels taken for the calculation, and E{·} denotes the expected value of a random variable. It can be observed from Fig. 37.6a–c that the correlation coefficients of the cipher image in all three directions (horizontal, vertical, and diagonal) are close to zero. Therefore, adjacent pixels in the cipher image are highly uncorrelated.
Fig. 37.6 Distribution of adjacent pixel pairs in the Airplane image. a Horizontally. b Vertically. c Diagonally
2. Histogram and chi-square test. To examine visually the distribution of pixel intensities in an image, we consider its histogram, which gives a graphical view. Figure 37.7 shows the grayscale image Airplane along with its histogram. The chi-square test is given by

$$X^2 = \sum_{i=0}^{255} \frac{(\varepsilon_i - l)^2}{l},$$

where ε_i is the frequency of occurrence of a particular pixel value and l = (MN)/256. In the experiment, the distribution is considered uniform when the chi-square test result exceeds a significance level τ (τ ∈ [0, 1]); in that case the null hypothesis is accepted. Table 37.2 shows the chi-square test.
3. Entropy information. One of the most important measures in dynamical systems theory is entropy, which is defined as

$$H(X) = -\sum_{i=1}^{K_s} p(x_i) \log\left(p(x_i)\right),$$

where X is the source, p(x_i) is the probability of the element x_i, and K_s is the number of different elements generated by X. The ideal entropy is attained when all pixel levels appear with equal probability, meaning that the pixels are uniformly distributed. Table 37.3 shows the entropies reported in the literature alongside those of the proposed system.

Fig. 37.7 Original Airplane image (top, a) and its histogram (top, b). Ciphered Airplane image (bottom, a) and its histogram (bottom, b)

Table 37.3 Entropy information
Ref. [18]   Ref. [19]   Ref. [20]   Proposed with f   Proposed without f
7.9993      7.98615     7.9993      7.9970            7.9971
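The three statistical measures in this list can be sketched as follows. The function names are hypothetical and the inputs are flat lists of 8-bit pixel values. Two assumptions are worth flagging: the entropy uses a base-2 logarithm, so that the ideal value for 256 gray levels is 8 bits, and the chi-square routine returns only the statistic, whereas the values in Table 37.2 lie in [0, 1] and so are presumably p-values obtained from it.

```python
import math
from collections import Counter


def correlation(alpha, beta):
    """Correlation coefficient of paired pixel values (adjacent-pixel test)."""
    n = len(alpha)
    mean_a = sum(alpha) / n
    mean_b = sum(beta) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(alpha, beta))
    var_a = sum((a - mean_a) ** 2 for a in alpha)
    var_b = sum((b - mean_b) ** 2 for b in beta)
    return cov / math.sqrt(var_a * var_b)


def chi_square(pixels, levels=256):
    """Chi-square statistic of the histogram against a uniform distribution."""
    expected = len(pixels) / levels
    counts = Counter(pixels)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(levels))


def entropy_bits(pixels):
    """Shannon entropy of the pixel distribution, in bits (base-2 logarithm)."""
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(pixels).values())


# A perfectly uniform "image": every gray level appears exactly four times.
uniform = list(range(256)) * 4
print(chi_square(uniform), entropy_bits(uniform))

# Horizontally adjacent pairs would be built as (pixels[:-1], pixels[1:]) per row.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

For a real cipher image one would sample pairs of adjacent pixels in each of the three directions and expect correlations near zero, a small chi-square statistic, and an entropy close to 8.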
37.4 Conclusion
In this paper, the complex dynamics of an asymmetrical 4D hyperchaotic system involving a bounded function were investigated. The corresponding ultimate bound region of this hyperchaotic system was shown. As an application, a cryptosystem consisting of several stages was proposed. First, initial values are calculated from an equation whose parameters depend on the dimensions of the image to be encrypted. These initial values are used to solve a system of differential equations. The sorting indices of the output of the differential equations are then used to rearrange the pixels of the input image. The resulting image is rebuilt in 2 × 2 matrices, each of which is multiplied by a secret 2 × 2 matrix. The resulting matrix is converted to uint8, which represents the encrypted image. Adaptive control of the new system was obtained. Finally, throughout this study we observed that involving the sigmoid function makes the proposed system weaker. The other dynamics of the system introduced in this paper are expected to be studied further.
References
1. Sooraksa, P., Chen, G.: Chen system as a controlled weather model-physical principle, engineering design and real applications. Int. J. Bifurc. Chaos 28, 1–12 (2018)
2. Udaltsov, V.S., Goedgebuer, J.P., Larger, L., Cuenot, J.B., Levy, P., Rhodes, W.T.: Communicating with hyperchaos: the dynamics of a DNLF emitter and recovery of transmitted information. Opt. Spectrosc. 95, 114–118 (2003)
3. Schiff, S.J., Jerger, K., Duong, D.H., Chang, T., Spano, M.L., Ditto, W.L.: Controlling chaos in the brain. Nature 370, 615–620 (1994)
4. Lorenz, E.N.: Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130–141 (1963)
5. Lorenz, E.: On the prevalence of aperiodicity in simple systems. In: Grmela, M., Marsden, J.E. (eds.) Global Analysis, Lecture Notes in Mathematics, 755, pp. 53–75. Springer, Berlin
6. Chua, L., Komuro, M., Matsumoto, T.: The double scroll family. IEEE Trans. Circuits Syst. 33, 1073–1118 (1986)
7. Chen, G.R., Ueta, T.: Yet another chaotic attractor. Int. J. Bifurc. Chaos 9, 1465–1466 (1999)
8. Lu, J.H., Chen, G.: A new chaotic attractor coined. Int. J. Bifurc. Chaos 12, 659–661 (2002)
9. Wei, Z., Yang, Q.: Dynamical analysis of a new autonomous 3-D chaotic system only with stable equilibria. Nonlin. Anal. Real World Appl. 12, 106–118 (2011)
10. Yang, Q., Wei, Z., Chen, G.: An unusual 3D autonomous quadratic chaotic system with two stable node-foci. Int. J. Bifurc. Chaos 20, 1061–1083 (2010)
11. Kuznetsov, N.V.: The Lyapunov dimension and its estimation via the Leonov method. Phys. Lett. A 380(25–26), 2142–2149 (2016)
12. Kuznetsov, N.V.: The Lyapunov dimension and its estimation via the Leonov method. Phys. Lett. A 380(25–26), 2142–2149 (2016)
13. Rossler, O.E.: An equation for hyperchaos. Phys. Lett. A 71, 155–157 (1979)
14. Barboza, R.: Dynamics of a hyperchaotic Lorenz system. Int. J. Bifurc. Chaos 17, 4285 (2007)
15. Neimark, Y., Shilnikov, L.: A condition for the generation of periodic motions. Sov. Math. Dokl. 6(163), 1261–1264 (1965)
16. Kuznetsov, N.V.: The Lyapunov dimension and its estimation via the Leonov method. Phys. Lett. A 380(25–26), 2142–2149 (2016)
17. Leonov, G.A., Kuznetsov, N.V.: Hidden attractors in dynamical systems. From hidden oscillation in Hilbert–Kolmogorov, Aizerman, and Kalman problems to hidden chaotic attractor in Chua circuits. Int. J. Bifurc. Chaos 23, 1–69
18. Hua, Z., Zhou, Z.: Design of image cipher using block-based scrambling and image filtering. Inf. Sci. 396, 97–113 (2017)
19. Souyah, A., Faraoun, K.M.: Fast and efficient randomized encryption scheme for digital images based on quadtree decomposition and reversible memory cellular automata. Nonlin. Dyn. 84, 715–732 (2016)
20. Nestor, T., Belazi, A., Abd-El-Atty, B., Aslam, Md., Volos, C., De Dieu, N., El-Latif, A.: A new 4D hyperchaotic system with dynamics analysis, synchronization, and application to image encryption. Symmetry 14, 424 (2022)
Chapter 38
AI-Based Secure Software-Defined Controller to Assist Alzheimer’s Patients in Their Daily Routines
S. Nithya, Satheesh Kumar Palanisamy, Ahmed J. Obaid, K. N. Apinaya Prethi, and Mohammed Ayad Alkhafaji

Abstract According to reports, 60–70% of the estimated 50 million dementia patients globally have Alzheimer’s disease. Forgetting earlier encounters or occurrences is one of the early warning signs of the illness. People with Alzheimer’s frequently repeat words and questions, forget conversations, appointments, or activities and fail to recall them later, routinely misplace their belongings, and become lost in places they know well. There is no proven medication for Alzheimer’s disease that can stop the disease process in the brain. The goal of this study is to develop a personal assistant that can remind a person with Alzheimer’s or Parkinson’s disease of important chores. The proposed solution is a voice-based smart controller built on a convolutional neural network. It makes it easier for users to remember where they put things. The proposed solution also offers several extra features that will benefit both Alzheimer’s patients and other people. The controller sends alerts at predetermined intervals so that individuals can pick up their phones, remind themselves of tasks, and see their own to-do list. The controller sends a notification before the meeting if the user has a meeting booked. This smart controller will serve as an assistant to the individual with Alzheimer’s and might also be helpful to other people.

S. Nithya · K. N. Apinaya Prethi Department of Computer Science and Engineering, Kumaraguru College of Technology, Coimbatore, India e-mail: [email protected] K. N. Apinaya Prethi e-mail: [email protected]
S. K. Palanisamy Department of Electronics and Communication Engineering, Coimbatore Institute of Technology, Coimbatore, India e-mail: [email protected]
A. J. Obaid (B) Faculty of Computer Science and Mathematics, University of Kufa, Kufa, Iraq e-mail: [email protected]
M. A. Alkhafaji College of Engineering Technology, National University of Science and Technology, Dhi Qar, Iraq
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_38
38.1 Introduction
As of 2022, an estimated 6.5 million people aged 65 and older are living with Alzheimer’s. Seventy-three percent of them are 75 or older. About 1 in 9 people aged 65 and older (10.7%) has Alzheimer’s. Because age is a known risk factor for Alzheimer’s disease and more people are expected to develop the condition in an aging population, it is becoming an increasingly serious public health problem. Simple daily actions, like brushing their teeth or eating meals, can be difficult for those with Alzheimer’s. They might also become disoriented and lose their sense of place and time, so that they easily get lost away from home. Additionally, they may forget recent events or the names of some of their loved ones. These difficulties burden not only the patient but also the patient’s caregiver. The application would also assist other people who easily forget certain things. The main objective of our project is to develop a secure, progressive web application that serves as a personal assistant and reminder for Alzheimer’s patients. We therefore decided to create a progressive web application that collects data from users in the form of speech, converts it to text, and stores it in a database. We use encryption and decryption techniques to store the data securely and avoid the loss of personal information. The user can use the web app to query the database and retrieve his/her information. To remind the user of critical appointments, we also send push notifications. The patient’s to-do list is displayed on a daily basis.
38.2 Literature Review
Ali Bou Nassif et al. [1] discussed the use of convolutional neural networks in a paper titled “Speech recognition using deep neural networks: a systematic review.” Research has concentrated on applying deep learning to speech-related applications. In several applications, including speech, this branch of machine learning has outperformed others, making it an extremely appealing research topic. The paper presents a comprehensive review of the studies on deep learning for speech applications carried out since 2006, when deep learning emerged as a new field of machine learning. The review extracts specific data from 174 papers published between 2006 and 2018 and provides a complete statistical analysis. Trivedi et al. [2] note that, not just at the management level but also on an individual basis, it is crucial to convey information to the appropriate person and
in the appropriate way. In today’s technologically advanced society, communication methods such as phone conversations, emails, and text messages have become essential. Numerous applications now act as intermediaries that transmit messages efficiently, as text or audio signals, over long-distance networks so that two parties can communicate without obstacles. Most of these applications rely on features such as articulatory and acoustic-based speech recognition, text-to-speech and text-to-text conversion, and language translation. Isewon et al. [5] address the reading difficulties faced by visually impaired people. To help them with reading tasks, a text-to-speech synthesizer that converts input text into synthesized speech and reads it aloud to the user was designed as a Java application. It has two parts: the GUI and the main module. The GUI elements control processes such as direct keyboard input from the browser or file-based parameter entry for conversions. The DJNativeSwing and SWT APIs were used for these activities, while the main module employed the FreeTTS API. FreeTTS comprises NLP and DSP components. The NLP component performs phonetic transcription of the text using text analysis, dictionary-based pronunciation rules for managing phonemes, and prosody for phrasing and accentuation. It produces symbolic data, which the DSP component transforms into audible speech. D. M. Harikrishna [3] deals with the growing variety of narrative book genres, noting that manual reading and keyword extraction approaches are time- and energy-consuming and that their outcomes are frequently subpar. A children’s story classification system for Indian languages was created to save time when classifying the books. 
A vector space model (VSM), which places stories into a vector space, was used to categorize the stories. Three feature reduction techniques (sparse word elimination, latent semantic analysis, and linear discriminant analysis) were used to reduce the dimensionality of the feature vectors. The article also examined linguistic components, such as part-of-speech (POS) tags, for categorizing stories. For keyword-based feature selection, term frequency (TF) and term frequency inverse document frequency (TFIDF) features were investigated. According to the Fleiss Kappa values, the interannotator agreement for Hindi and Telugu was 0.95 and 0.9, respectively. The development of software that mimics human communication [6] remains relevant today. A database of questions and their responses is the simplest representation of communication. The difficulties in this setting lie in describing the knowledge base and implementing the interpreter software. The aim of that paper is to identify the gender of the user in order to make the chatbot’s responses more human-like. The chatbot response is generated automatically by an algorithm that analyzes and parses the user’s text. The scientific novelty of the results is a method for translating audio input into text based on a sequential ensemble of recurrent encoding and decoding networks. Munde et al. [7] proposed a method for voice-based natural language query processing that uses natural language processing and Structured Query Language (SQL). Storing and retrieving data from databases is now a necessity. Non-expert users require a system that allows them to communicate with databases using natural language, like English,
since they would otherwise have to learn SQL in order to use databases. The primary goal of natural language processing (NLP) is to make it possible for people and computers to communicate. A voice-based user interface is employed to obtain the required result. The technique is very beneficial for placement cell officials, who work on student databases and can use it to extract data. NLP enables the computer to comprehend the languages spoken by people, and NLP techniques help resolve both simple and complex queries. The most significant uses of natural language processing are machine translation, information organization, and information retrieval.
38.3 Proposed System
The main objective of our project is to develop an AI-based secure software-defined controller that serves as a personal assistant and reminder for people affected by Alzheimer’s, dementia, or Parkinson’s. We therefore decided to create a secure controller that collects data from users in the form of speech, converts it to text, and stores it in a database. The controller applies encryption and decryption techniques to store the data securely and avoid the loss of personal information. The user can use the controller to query the database and retrieve his/her information. To remind the user of critical appointments, we also send push notifications. The controller also displays the patient’s to-do list on a daily basis.
38.3.1 Workflow
The controller is built around an essential machine learning module and other necessary options. The workflow of the controller is depicted in Fig. 38.1. Initially, the user registers the information they want to remember; this information is collected as voice input and converted to text. Encryption is applied to secure the user data, which is the most crucial aspect of this project. We employ the Advanced Encryption Standard (AES) algorithm, with an AES key, to encrypt the text obtained from the user’s speech. The encrypted text is then sent to the backend, where the AES decryption algorithm is used to recover it. The Natural Language Toolkit (NLTK) is then used to carry out operations such as tokenization and the labeling of words as nouns, verbs, adjectives, and other types. The data is then encrypted once more, using a separate AES key, and stored in a database. ReactJS and JavaScript were used for user interaction, together with Python (Flask) and MongoDB to store the data in the database. To retrieve data from the database, the user speaks a query, which is captured using the Web Speech API. As already indicated, the info module encrypts the data before storing it, and the query module follows suit by employing the AES
Fig. 38.1 Workflow of the proposed controller
encryption algorithm and key. The text is then transferred to the backend, where the AES key is used to decrypt it. After NLTK parsing, the tokenized text is encrypted with the info key, which constitutes the second (double) encryption. The subsequent steps tokenize the query text, validate it, and search for the keyword by comparing it with the encrypted text stored in the backend. When the keyword is matched, the result is encrypted using the communication key, returned to the front end, and then decrypted. This decrypted data is then presented to the user for convenience in
Fig. 38.2 Architecture of the proposed model
the form of speech. If, once the final decryption is complete, the keyword is not matched, a “not found” message is shown to the user. Figure 38.2 depicts an overview of the architecture. This smart controller has the following strengths:
• Cost-effective and powerful querying and analytics tool
• Widely supported and code-native data access
• Change-friendly design
• Easy horizontal scale-out with sharding.
38.3.2 Components of the Controller
The smart controller has two main verticals:
• Store vertical
• Retrieve vertical (Query vertical).
38.3.2.1 Store Vertical
The store vertical has the following main functionalities:
i. Speech Recognition
ii. Encrypt
iii. NLTK parse().
The user provides input to the application through the “info” functionality. Once “info” is enabled, Speech Recognition() records everything the user says until the “Stop” functionality is invoked. encrypt() is then called automatically to encrypt the recognized text, and the encrypted text, along with the key (communication key), is sent to the backend. Once the text reaches the backend, it is decrypted using the communication key. The decrypted text is tokenized using the NLTK parse(). The tokenized words are then encrypted using AES with a different key (the information database key), and the encrypted text is stored in the database. Double encryption is thus used to improve the security of the information [17–20].
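The store path described above can be sketched as follows. This is only an illustration of the two-key data flow, not the authors' implementation: `toy_cipher` is a hypothetical stand-in for AES built from a SHA-256 keystream so that the sketch runs with the standard library alone (it is not secure), a regular expression stands in for NLTK tokenization, and the key values and function names are assumptions made for the example.

```python
import hashlib
import re


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR data with a SHA-256 keystream. Stand-in for AES, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


COMM_KEY = b"hypothetical-communication-key"  # protects front end <-> backend traffic
DB_KEY = b"hypothetical-database-key"         # protects tokens at rest


def front_end_send(spoken_text: str) -> bytes:
    """Encrypt the recognized text with the communication key."""
    return toy_cipher(spoken_text.encode("utf-8"), COMM_KEY)


def back_end_store(payload: bytes, database: list) -> None:
    """Decrypt, tokenize, re-encrypt each token with the database key, store."""
    text = toy_cipher(payload, COMM_KEY).decode("utf-8")
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # stand-in for NLTK
    database.append([toy_cipher(t.encode("utf-8"), DB_KEY) for t in tokens])


database = []
back_end_store(front_end_send("My keys are in the kitchen drawer"), database)
```

Because the stand-in cipher is deterministic, equal tokens always produce equal ciphertexts. That property is what makes a comparison against stored encrypted keywords possible, but it also reveals repeated words, so a real deployment would have to weigh this trade-off when choosing an AES mode.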
38.3.2.2 Retrieve/Query Vertical
This vertical has two main functionalities:
i. Query()
ii. Decrypt().
When the user has forgotten something and needs assistance, this vertical comes into play. The user raises a query to the controller through the “query” functionality. The controller then records everything the user says until the “stop()” function is invoked, which stops the recording and converts the speech into text. The text is encrypted using AES, and the encrypted text, along with the key (communication key), is sent to the backend, where it is decrypted using the communication key. The keyword matcher then matches the query keyword against the words retrieved from the repository, and the result is sent back to the user as voice.
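The keyword-matching step of the query vertical can be illustrated with a minimal sketch. It works on plain text for readability, whereas the chapter describes matching against encrypted data; the stop-word list, the function names, and the sample notes are all hypothetical, and a regular expression stands in for NLTK tokenization.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"where", "are", "is", "my", "the", "i", "did", "put",
             "in", "a", "an", "what", "when"}


def keywords(text: str) -> set:
    """Lower-case, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def answer(query: str, notes: list) -> str:
    """Return the stored note sharing the most keywords with the query."""
    wanted = keywords(query)
    best, best_overlap = None, 0
    for note in notes:
        overlap = len(wanted & keywords(note))
        if overlap > best_overlap:
            best, best_overlap = note, overlap
    return best if best is not None else "not found"


notes = ["My keys are in the kitchen drawer",
         "Doctor appointment on Friday at ten"]
print(answer("Where are my keys", notes))
print(answer("Where is my wallet", notes))
```

The first query shares the keyword "keys" with the first note and returns it; the second shares no keyword with any note and falls back to the "not found" reply mentioned in the workflow.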
38.3.3 Additional Features of the Controller
i. Reminder assist: Reminder assist is one of the crucial features, created to help users carry out their daily tasks. It eases the user’s burden of remembering every job that needs to be completed at a specific time. It covers:
(a) Managing the task
(b) Keeping the task in mind
(c) Ensuring the task is completed by the deadline.
ii. History of speech stored in the database: With this function, the user can read the details of their previous speech, as well as the type of information they provided in a query.
iii. Alert functions: For patients with Alzheimer’s disease, the alert function is an important feature. It includes:
(a) Scheduling particular events or activities;
(b) Updating specific user data and letting everyone concerned know about the upcoming event.
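The reminder and alert features listed above amount to selecting the unfinished tasks whose deadline is near. A minimal sketch, assuming a hypothetical task record with `title`, `deadline`, and `done` fields and an assumed 30-minute notification window (none of these names or values come from the chapter):

```python
from datetime import datetime, timedelta


def due_reminders(tasks, now, window=timedelta(minutes=30)):
    """Titles of unfinished tasks whose deadline lies within `window` after `now`."""
    return [t["title"] for t in tasks
            if not t["done"] and now <= t["deadline"] <= now + window]


tasks = [
    {"title": "Take morning medicine", "deadline": datetime(2023, 1, 1, 9, 15), "done": False},
    {"title": "Call daughter", "deadline": datetime(2023, 1, 1, 12, 0), "done": False},
    {"title": "Brush teeth", "deadline": datetime(2023, 1, 1, 9, 10), "done": True},
]
print(due_reminders(tasks, datetime(2023, 1, 1, 9, 0)))
```

Calling such a function at the controller's predetermined alert interval would yield the list of tasks to push as notifications; completed tasks are skipped even when their deadline falls inside the window.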
38.4 Results and Discussion
The major difference between the related work and the proposed system is the use of the AES algorithm and the Web Speech API. The main benefit of the AES encryption method is the availability of several key lengths, even though the algorithm is comparatively complex. To increase security, AES uses a key expansion procedure that derives a series of new keys, known as round keys, from the initial key. These round keys are applied over multiple rounds of transformation, each of which makes decryption without the key more difficult. The Web Speech API is a W3C-supported specification that lets browser vendors provide the speech recognition engine of their choice (local or cloud-based) behind an API that can be used straight from the browser without having to worry about API restrictions (Tables 38.1 and 38.2; Figs. 38.3 and 38.4).
How is the proposed smart controller better than other speech recognition approaches?
• The Web Speech API converts speech to text more effectively than conventional speech recognition approaches.
• It is a sophisticated technology that provides speech-to-text transcription with a high level of accuracy.
• Applications: dictation, voice control, translation.
• The Web Speech API correctly identified 4218 of 5000 uttered words (about 84%).

Table 38.1 Basic parameter comparison between AES and other encryption algorithms

| Parameter under consideration | AES | DES | RSA |
|---|---|---|---|
| Computation time | Faster | Moderate | Slower |
| Memory utilization | Moderate memory space | Least memory space | More memory space |
| Security level | High | Medium | Low |
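The recognition rate quoted in the list preceding Table 38.1 follows directly from the reported word counts. A short check, in which the function name is illustrative only:

```python
def word_accuracy(correct: int, total: int) -> float:
    """Percentage of uttered words that were recognized correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


accuracy = word_accuracy(4218, 5000)
print(f"{accuracy:.2f}% correct, {100.0 - accuracy:.2f}% missed")
```

With 4218 of 5000 words recognized, the rate is 84.36%, which the text rounds to 84%.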
Table 38.2 Advanced parameter comparison between AES and other encryption algorithms

| Factors | AES | Other encryption algorithms |
|---|---|---|
| Key length | 128, 192, or 256 bits | 56, 168, 112 bits; depends on the number of bits in the modulus n = p*q (RSA) |
| Rounds | 10 (128-bit key), 12 (192-bit key), 14 (256-bit key) | 16, 48, and 1 |
| Block size | 128 bits | 64 bits, variable |
| Speed | Fast | Slow, slowest |
| Cipher type | Symmetric block cipher | Symmetric/asymmetric block cipher |
| Security | AES has not been cracked so far and is considered safe against brute-force attacks | Not secure enough |
| Principle used | Substitution and permutation | Feistel structure |
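The AES round counts in Table 38.2 and the round keys mentioned in the discussion are tied to the key length by a simple rule from the AES specification (FIPS 197): with a key of Nk 32-bit words, the cipher runs Nr = Nk + 6 rounds and the key expansion produces Nr + 1 round keys of four words each. The helper below, whose name is chosen only for this illustration, tabulates these values:

```python
def aes_parameters(key_bits: int) -> dict:
    """Round count and key-schedule size for a given AES key length."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES supports 128-, 192-, or 256-bit keys")
    nk = key_bits // 32        # key length in 32-bit words
    rounds = nk + 6            # Nr = Nk + 6
    return {
        "rounds": rounds,
        "round_keys": rounds + 1,              # one per round plus the initial key addition
        "expanded_key_words": 4 * (rounds + 1),
    }


for bits in (128, 192, 256):
    print(bits, aes_parameters(bits))
```

For the three supported key lengths this gives 10, 12, and 14 rounds, matching the AES column of the table.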
Fig. 38.3 Throughput of AES with other algorithms
38.5 Corporate Social Responsibility
This controller can be used by anyone. The user first installs it as an application from the website so that there is no need to search for the link each time. This model does not violate any user privileges. It does not access the user’s details under any circumstances, and it is very user-friendly. As an assistant for people affected by Alzheimer’s, and for other users as well, this AI-based smart controller could be a pacesetter.
Fig. 38.4 Time taken for encryption and decryption of data
38.6 Conclusions
An application’s usability must be taken into account to ensure that users can take full advantage of it and benefit from its use. This project has several benefits. The application was deemed user-friendly, and people reported being able to handle the system, especially when entering or extracting data. It was also perceived as a fast and effective system. Users of the Alzheimer’s assistant expressed confidence in their ability to use it and in the effectiveness of the system itself. The Alzheimer’s assistant has a verification mechanism in place to check the accuracy of data being submitted or extracted.
References
1. Nassif, A.B., Shahin, I., Attili, I., Azzeh, M., Shaalan, K.: Speech recognition using deep neural networks: a systematic review. IEEE Access 7 (2019)
2. Trivedi, A., Pant, N., Shah, P., Sonik, S., Agrawal, S.: Speech to text and text to speech recognition systems—a review. Int. J. Adv. Res. Comput. Commun. Eng. 10(1); 20(2), 36–43 (2021)
3. Harikrishnan, D.M., Sreenivasa Rao, K.: Children’s story classification in Indian languages using linguistic and keyword-based features. ACM Trans. Asian Low-Resour. Lang. Inform. Process. 19(2), 1–22 (2020)
4. Dimauro, G., Di Nicola, V., Bevilacqua, V., Caivano, D., Girardi: Assessment of speech intelligibility in Parkinson’s disease using a speech-to-text system. IEEE Access (2017)
5. Isewon, I., Oyelade, J., Oladipupo, O.: Design and implementation of text to speech conversion for visually impaired people. Int. J. Appl. Inform. Syst. 7(2) (2014)
6. Basystiuk, O., Shakhovska, N., Bilynska, V., Syvokon, O., Shamuratov, O.: The developing of the system for automatic audio to text conversion. In: Symposium on Information Technologies and Applied Sciences (2021)
7. Munde, P., Tambe, S., Shaikh, A., Sawant, P., Mahajan, D.: Voice based natural language query processing. Int. Res. J. Eng. Technol. 7 (2020)
8. Palanisamy, S., Thangaraju, B., Khalaf, O.I., Alotaibi, Y., Alghamdi, S.: Design and synthesis of multi-mode bandpass filter for wireless applications. Electronics 10(22), 2853 (2021). https://doi.org/10.3390/electronics10222853
9. Kumar, S., Balakumaran, T.: Modeling and simulation of dual layered U-slot multiband microstrip patch antenna for wireless applications. Nanoscale Rep. 4(1), 15–18 (2021). https://doi.org/10.26524/nr.4.3
10. Mahajan, R., Roy, A., Jadhav, A., Rao, A.: Database interaction using speech recognition. Int. J. Adv. Res. Innov. Ideas Educ. 7 (2021)
11. Li, X.: A method for extracting keywords from English literature based on location feature weighting. In: Proceedings of the International Conference on Communication Technology (2020)
12. Palanisamy, S., Thangaraju, B.: Design and analysis of clover leaf-shaped fractal antenna integrated with stepped impedance resonator for wireless applications. Int. J. Commun. Syst. 35(11), e5184 (2022). https://doi.org/10.1002/dac.5184
13. Nivethitha, T., Palanisamy, S.K., Mohana Prakash, K., Jeevitha, K.: Comparative study of ANN and fuzzy classifier for forecasting electrical activity of heart to diagnose Covid-19. Mater. Today Proc. 45, 2293–2305 (2021). https://doi.org/10.1016/j.matpr.2020.10.400
14. Sam, P.J.C., Surendar, U., Ekpe, U.M., Saravanan, M., Satheesh Kumar, P.: A low-profile compact EBG integrated circular monopole antenna for wearable medical application. In: Malik, P.K., Lu, J., Madhav, B.T.P., Kalkhambkar, G., Amit, S. (eds.) Smart Antennas. EAI/Springer Innovations in Communication and Computing. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-76636-8_23
15. Satheesh Kumar, P., Jeevitha, Manikandan: Diagnosing COVID-19 virus in the cardiovascular system using ANN. In: Oliva, D., Hassan, S.A., Mohamed, A. (eds.) Artificial Intelligence for COVID-19. Studies in Systems, Decision and Control, vol. 358. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69744-0_5
16. Palanisamy, S., Thangaraju, B., Khalaf, O.I., Alotaibi, Y., Alghamdi, S., Alassery, F.: A novel approach of design and analysis of a hexagonal fractal antenna array (HFAA) for next-generation wireless communication. Energies 14(19), 6204 (2021). https://doi.org/10.3390/en14196204
17. Kwak, Y., Huh, J.H., Han, S.T., Kim, I.: Voice presentation attack detection through text-converted voice command analysis. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, pp. 1–12 (2019)
18. Abdulbaqi, A.S., Obaid, A.J., Abdulameer, M.H.: Smartphone-based ECG signals encryption for transmission and analyzing via IoMTs. J. Discr. Math. Sci. Cryptogr. (2021). https://doi.org/10.1080/09720529.2021.1958996
19. Dutta, P., et al.: J. Phys. Conf. Ser. 1963, 012167 (2021)
20. Abdulbaqi, A., Abdulhameed, A., Obaid, A.: A secure ECG signal transmission for heart disease diagnosis. Int. J. Nonlinear Anal. Appl. 12(2), 1353–1370 (2021). https://doi.org/10.22075/ijnaa.2021.5235
Chapter 39
An Optimized Deep Learning Algorithm for Cyber-Attack Detection M. Eugine Prince, P. Josephin Shermila, S. Sajithra Varun, E. Anna Devi, P. Sujatha Therese, A. Ahilan, and A. Jasmine Gnana Malar
Abstract Cyber security is the practice of protecting network applications, servers, information, etc., from unauthorized users. The emergence of network-related applications increases privacy and security issues. Therefore, much research has been conducted to design an effective Intrusion Detection System (IDS). However, the conventional approaches fail to detect unknown attacks in the network. Hence, in this article, an optimized cyber-security framework named Bat Optimization-based Spiking Neural System (BAbSNS) was designed to identify cyber-attacks accurately. The presented algorithm includes three stages, namely, data preprocessing, feature selection, and attack classification. Initially, the raw input dataset was preprocessed to eliminate the errors. Then, the important attack features are tracked and extracted using the bat-optimal fitness. Finally, the extracted features are compared with trained attack features for intrusion detection. The presented model was tested on the NSL-KDD dataset and the results were determined. Moreover, a comparative assessment was made to validate the performance of the presented model. The experimental analysis shows that the developed scheme outperforms the traditional schemes in terms of accuracy, precision, recall, and f-measure.
M. Eugine Prince Department of Physics, S.T. Hindu College, Nagercoil, Tamil Nadu 629002, India
P. Josephin Shermila (B) Department of Artificial Intelligence and Data Science, R. M. K. College of Engineering and Technology, Chennai, Tamil Nadu 601206, India e-mail: [email protected]
S. Sajithra Varun Department of IIOT, MVJ College of Engineering, Bangalore, Karnataka 560067, India
E. Anna Devi Department of ECE, Sathyabama Institute of Science and Technology, Jeppiaar Nagar, Chennai, Tamil Nadu 600119, India
P. Sujatha Therese Department of Electrical and Electronics Engineering, Noorul Islam Centre for Higher Education, Kumaracoil, Tamil Nadu, India
A. Ahilan Department of Electronics and Communication Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
A. Jasmine Gnana Malar Department of Electrical and Electronics Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_39
39.1 Introduction
Cyber security deals with protecting computer networks, data/information, servers, mobile devices, etc., from unauthorized users or access. Data security is a broader category that protects information from attackers [1]. Artificial Intelligence techniques integrated into cyber-attack detection systems have achieved significant results. The security system in existing techniques is usually developed to monitor and control network traffic. Moreover, it helps in detecting malicious activities within the traffic [2]. Typically, an Intrusion Detection System (IDS) is either anomaly-based or signature-based. A signature-based IDS detects malicious events by matching traffic against the signatures or learned rules of known attacks [3]. An anomaly-based IDS, on the other hand, observes the network traffic and compares it with learned patterns to identify suspicious events. Experimental analysis shows that anomaly-based detection outperforms the traditional signature-based approach. Although anomaly-based detection attains better results, it is still unable to recognize new attacks. Moreover, the false-positive rate of this approach is higher. Recent research shows that applying AI techniques such as machine learning (ML) and deep learning (DL) to cyber-security problems such as anomaly detection, malware detection, and threat intelligence yields better results in terms of detection rate and false-detection rate [4]. ML-based techniques utilize network resources to train the system to detect malicious activities. DL-based approaches, on the other hand, utilize a learning model with multiple layers to identify cyber threats [5]. 
Traditional IDS techniques, such as the IDS framework based on ML techniques [6], the cyber-security model based on a Generative Adversarial Network (GAN) [7], and the IDS scheme based on machine learning [8], increase resource usage and computational complexity. Hence, to overcome these issues, an optimized neural system was designed in this article for malware detection in the network. The article is organized as follows: the literature related to cyber security is reviewed in Sect. 39.2, the problems in the traditional cyber-security framework are described in Sect. 39.3, the developed framework is explained in Sect. 39.4, the results of the proposed model are analyzed in Sect. 39.5, and the conclusion of the article is given in Sect. 39.6.
39.2 Related Works
Some recent literature related to cyber security is summarized below. The wide usage of system engineering raises challenges such as system security and data privacy. Network detection is the process of screening the security of the system. Parasuraman et al. [6] designed an IDS framework based on ML techniques (supervised and unsupervised learning). This model employs a Convolutional Neural Network (CNN) to extract features from the dataset. The presented technique attained a 93% detection rate for the User-to-Root (U2R) attack type. However, this model cannot detect unknown attacks. Cyber-attacks on computer networks damage servers and physical objects and cause safety risks such as privacy and data risks. Freitas et al. [7] designed a cyber-security model based on a Generative Adversarial Network (GAN). GAN is an unsupervised learning algorithm, which identifies cyber-attacks in a simple and effective manner. The experimental analysis shows that the presented approach detects attacks 5.5 times faster than the existing techniques. The growth of network-related services has placed a large amount of sensitive information on the internet. This increases the possibility of intrusions, in which sensitive data from various applications are accessed by unauthorized users. Hence, Soumyadeep et al. [8] presented an IDS scheme based on machine learning to detect and classify cyber-attacks. This model was implemented and evaluated on the CICIDS 2017 dataset. Yunfeng et al. [9] designed a Multivariate Ensemble Classification approach to predict intrusions in the network effectively. This system reduces cyber risks and minimizes security threats. The presented scheme was validated on the IEEE standard 14- and 118-bus systems, and the results were estimated. This ensemble-based decision-making algorithm provides increased network security. 
Moreover, extreme and light gradient-boosting machine learning algorithms are used as individual detectors for classification of attacks in the network.
39.3 System Model with Problem Statement
Network security protects servers, data, and other electronic devices from intrusions, breaches, and other threats. It protects data/information from security threats such as viruses and unauthenticated users. To protect networks from safety and privacy threats, anomaly-based detection frameworks have been developed. Although anomaly-based models achieve better results, they still cannot detect unknown attacks. Moreover, the spread of networks into various fields gives rise to new attacks on network-related services. Recent research on network security shows that integrating AI techniques yields better attack detection rates. However, the system training requires huge resources
and increases the computational time. Therefore, an optimized neural system-based IDS framework was introduced in this article to detect the intrusions in the network.
39.4 Proposed BAbSNS for Cyber-Attack Detection
A hybrid Bat Optimization-based Spiking Neural System (BAbSNS) framework was designed in this article to detect intrusions in the network. This framework integrates the Bat Optimization Algorithm (BOA) and a Spiking Neural Network (SNN) to identify malicious data in the network. The presented approach was implemented in MATLAB and validated with the NSL-KDD dataset. Initially, the dataset was collected from the Kaggle site and imported into the system. The raw dataset was then filtered to remove the null or erroneous data present in it. The important features in the filtered dataset were extracted in the feature extraction phase. The extracted features were then compared with the learned features to detect attacks or malicious events. Finally, the results were determined and validated with a comparative analysis. The proposed BAbSNS framework is displayed in Fig. 39.1.
Fig. 39.1 Proposed BAbSNS structure
39.4.1 Dataset Preprocessing
To design and validate the cyber-security model, the NSL-KDD network dataset was collected and imported into the system. The input dataset is split in a 7:3 ratio for training and testing, respectively. The dataset is first initialized and preprocessed to remove unwanted data. This filtering mechanism helps identify false-injected data more accurately and reduces the detection error. In the proposed framework, the SNN properties are integrated in the filtering phase to eliminate the errors present in the dataset. The filtering mechanism is formulated in Eq. (39.1).
$$\tau_p(N_{sld}) = E_{ds} - E'_{ds} + N_{sld} \tag{39.1}$$
where $\tau_p$ indicates the filtering function, $E_{ds}$ refers to the errorless data, $E'_{ds}$ denotes the error data, and $N_{sld}$ represents the NSL-KDD dataset. The integration of SNN in the proposed framework reduces the computational time effectively. The filtered dataset contains both important and unimportant features.
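The preprocessing step can be pictured with a minimal Python sketch. It assumes that "unwanted data" means records with a missing field, and it uses toy records in place of NSL-KDD rows. The helper names `drop_incomplete` and `split_70_30` and the field names are hypothetical; the chapter's own implementation was in MATLAB and relies on the SNN-based filter of Eq. (39.1), which is not reproduced here.

```python
import random

def drop_incomplete(records):
    """Keep only records in which no field is None or an empty string."""
    return [r for r in records
            if all(v is not None and v != "" for v in r.values())]

def split_70_30(records, seed=0):
    """Shuffle a copy of the records and split it 7:3 into train and test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 7) // 10
    return shuffled[:cut], shuffled[cut:]

# Toy records standing in for NSL-KDD rows (field names are illustrative only).
raw = [{"duration": i, "label": "normal" if i % 2 else "attack"}
       for i in range(10)]
raw.append({"duration": None, "label": "attack"})  # a row with a missing value

clean = drop_incomplete(raw)
train, test = split_70_30(clean)
```

A fixed seed keeps the split reproducible, which matters when results are later compared across methods.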
39.4.2 Feature Selection
In intrusion detection, an optimal feature selection algorithm plays a significant role, as it isolates the features useful for classification and discards the unimportant ones. This improves the detection rate and reduces resource usage. In the proposed scheme, the optimal fitness of BOA was incorporated into the feature extraction layer to track and extract the important features. BOA is a meta-heuristic approach based on the behavior of bats. Bats are the only mammals with wings, and they have a well-developed capacity for echolocation, which they use to determine the distance between themselves and their prey. This foraging behavior is taken as the fitness solution of BOA. In the proposed scheme, this foraging property is used to track the important features in the filtered dataset. The BOA fitness function is expressed in Eq. (39.2).
$$F_{xf}(N_{sld}) = B_f + N_{sld} - \hat{I}_{mk}\,\vartheta \tag{39.2}$$
Here, $\vartheta$ defines the feature tracking function parameter, $F_{xf}$ refers to the feature extraction function, $B_f$ denotes the bat-optimal fitness, and $\hat{I}_{mk}$ denotes the unimportant features. The extracted features contain malicious and benign data. To detect intrusions in the network dataset, the extracted features are compared with the trained features. If the trained attack features match the extracted features, the record is predicted as "malicious". If they do not match, it is identified as "benign". The workflow of the proposed framework is illustrated in Fig. 39.2.
Fig. 39.2 Flowchart of BAbSNS
39.5 Results and Discussion
In this article, an optimized deep neural-based cyber-security framework was developed to detect intrusions in the network. The approach was trained and tested with the NSL-KDD dataset (a network intrusion dataset). It involves three phases: data filtration, feature selection, and classification. The scheme was implemented in MATLAB R2020a, and the outcomes are evaluated in terms of accuracy, precision, recall, and f-measure.
39.5.1 Performance Analysis
In performance assessment, the outcome metrics like accuracy, recall, f-measure, and precision are evaluated and compared with existing IDS frameworks like Intelligent Tree-based Intrusion Detection Scheme (ITbIDS) [10], Intrusion Detection Scheme-based on Specific Deep Auto-encoder (IDSbSDA) [8], Multivariable Ensemble Classification-based Intrusion Detection Scheme (MECbIDS) [9], and Deep Learning-based Intrusion Detection Scheme (DLbIDS) [11]. The performance metrics like accuracy, precision, recall, and f-measure are expressed in Eqs. (39.3), (39.4), (39.5), and (39.6).
$$A_{rq} = \frac{m^{+} + m^{-}}{m^{+} + m^{-} + n^{+} + n^{-}} \tag{39.3}$$
$$P_{sc} = \frac{m^{+}}{m^{+} + n^{+}} \tag{39.4}$$
$$R_{lv} = \frac{m^{+}}{m^{+} + n^{-}} \tag{39.5}$$
$$F_{ms} = 2\,\frac{P_{sc} \cdot R_{lv}}{P_{sc} + R_{lv}} \tag{39.6}$$
Here, $A_{rq}$, $P_{sc}$, $R_{lv}$, $F_{ms}$, $m^{+}$, $m^{-}$, $n^{+}$, and $n^{-}$ refer to accuracy, precision, recall, f-measure, true-positive, true-negative, false-positive, and false-negative, respectively. The overall comparative analysis is illustrated in Fig. 39.3. When tested on the NSL-KDD dataset, the presented model achieved 98.12% accuracy, 98.45% precision, 99.34% recall, and 98.19% f-measure. The existing techniques achieved lower performance than the developed scheme.
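As an illustration of how the four metrics follow from the confusion counts, the short Python sketch below evaluates accuracy, precision, recall, and f-measure from true-positive, true-negative, false-positive, and false-negative counts. The function name and the example counts are hypothetical and are not taken from the chapter's experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f_measure) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # share of predicted attacks that are real
    recall = tp / (tp + fn)             # share of real attacks that are detected
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# Illustrative counts only; they do not come from the NSL-KDD evaluation.
acc, prec, rec, f1 = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

The function assumes that at least one positive prediction and one actual positive exist; otherwise precision or recall would divide by zero.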
Fig. 39.3 Comparative performance
39.6 Conclusion The evolution of intelligent networks increases the possibilities of network vulnerabilities. Thus, in this paper, an optimized deep learning-based cyber-security framework was developed to identify the malicious events effectively. This model combines SNN algorithm and BOA to predict and classify the attacks. Initially, the dataset was filtered and the important data features are extracted using the bat-optimal fitness function. The presented algorithm was tested and validated with NSL-KDD dataset, and the outcomes are evaluated. Furthermore, the obtained results are compared with traditional IDS framework and performance enhancement rate is determined. It is observed in the presented approach that the metrics like accuracy, precision, recall, and f-measure are improved by 6.30%, 8.96%, 6.98%, and 7.90%, respectively. Thus, the developed IDS framework detects the intrusions in the network effectively.
References
1. Saharkhizan, M., et al.: An ensemble of deep recurrent neural networks for detecting IoT cyber-attacks using network traffic. IEEE Internet Things J. 7, 8852–8859 (2020)
2. Lee, J., et al.: Cyber threat detection based on artificial neural networks using event profiles. IEEE Access 7, 165607–165626 (2019)
3. Habibi, M.R., et al.: Detection of false data injection cyber-attacks in DC microgrids based on recurrent neural networks. IEEE J. Emerg. Sel. Topics Power Electron. 9, 5294–5310 (2020)
4. Zografopoulos, I., Konstantinou, C.: Detection of malicious attacks in autonomous cyberphysical inverter-based microgrids. IEEE Trans. Ind. Inform. 18, 5815–5826 (2021)
5. Singh, S.K., Roy, P.K.: Detecting malicious DNS over https traffic using machine learning. In: 2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT). IEEE (2020)
6. Kumar, P., et al.: Analysis of intrusion detection in cyber-attacks using DEEP learning neural networks. Peer-to-Peer Netw. Appl. 14, 2565–2584 (2021)
7. de Araujo-Filho, P.F., et al.: Intrusion detection for cyber–physical systems using generative adversarial networks in fog environment. IEEE Internet Things J. 8, 6247–6256 (2020)
8. Thakur, S., et al.: Intrusion detection in cyber-physical systems using a generic and domain specific deep autoencoder model. Comput. Electr. Eng. 91, 107044 (2021)
9. Li, Y., et al.: Intrusion detection of cyber physical energy system based on multivariate ensemble classification. Energy 218, 119505 (2021)
10. Al-Omari, M., et al.: An intelligent tree-based intrusion detection model for cyber security. J. Netw. Syst. Manage. 29, 1–18 (2021)
11. Ashiku, L., Dagli, C.: Network intrusion detection system using deep learning. Procedia Comput. Sci. 185, 239–247 (2021)
Chapter 40
Estimation of Wind Energy Reliability Using Modeling and Simulation Method
A. Jasmine Gnana Malar, M. Ganga, V. Parimala, and S. Chellam
Abstract In wind energy systems, reliability analysis plays a significant role in increasing the lifetime of wind turbines. Moreover, improving the reliability of wind turbines minimizes the maintenance cost of the energy systems. Reliability refers to the probability that the wind turbine continues to attain its projected function without failure under operational conditions. In recent times, wind energy installation is increasing rapidly to meet the demand for pollution-free energy. However, the problems in wind energy systems like uncertainty, reliability issues, etc., need to be addressed to improve the performance. An analysis of wind energy system reliability is conducted using the Monte Carlo simulation technique. An estimate of the wind farm’s performance is achieved by establishing a resistance-load relationship. Furthermore, the Weibull probability and cumulative distribution function were used to estimate the performance of the system. The simulation results illustrate that when the number of trials increases the probability of failure and error reduces.
A. Jasmine Gnana Malar (B) Department of Electrical and Electronics Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India e-mail: [email protected] M. Ganga Department of Biomedical Engineering, Hindusthan College of Engineering and Technology, Coimbatore, Tamil Nadu 641050, India V. Parimala Department of Electrical and Electronics Engineering, KPR Institute of Engineering and Technology, Coimbatore, India S. Chellam Department of Electrical and Electronics Engineering, Velammal College of Engineering and Technology (Autonomous), Madurai, Tamil Nadu 625009, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_40
A. Jasmine Gnana Malar et al.
40.1 Introduction
The rapid growth in electricity demand increases energy costs and environmental pollution [1]. To meet this demand, renewable energy sources such as wind, solar, and hydro are used to produce electric power. Among the available renewable resources, wind energy is widely deployed in electricity production because it is consistent, inexhaustible, and non-polluting [2]. In recent times, reliability has become a significant factor in improving the performance of wind power systems, and developing a wind turbine with high reliability remains a challenge. Many studies have recognized the merits of combining reliability and performance in an amalgamated derivative model [3]. Recently, a Monte Carlo simulation technique was used to design a dependability assessment for a laboratory-based wind turbine microgrid system in order to analyze the system effectively [4]. Research on simulation techniques showed that the wind power system outcome computational index creates a link between the deterministic and probabilistic approaches [5]. Moreover, it defines indices that are helpful in the practical evaluation of microsystems. Additionally, a performance assessment of wind power systems under harsh weather was designed using a dependability analysis of the microsystems with Monte Carlo methods based on system transport theory [6]. In addition, an improved design for calculating the reliability index of different wind turbines was configured. An autoregressive average model and the Monte Carlo simulation method were used to detect energy failure in the microgrid system [7]. Researchers have recently suggested a reliability simulation method for a microgrid system with diverse wind turbines at various heights [8].
According to the experimental examination of this strategy, certain wind turbines are taller, and the shear effect reduces the energy loss caused by the wake effect. Furthermore, they suggested a new Monte Carlo simulation method to identify and analyze the turbine energy and wind speed. This framework was tested with data from a nearby wind turbine, so the results are not accurate. Thus, on-site weather data is utilized to obtain accurate wind speed estimates. However, accurately predicting wind speed and turbine energy remains a significant challenge, and estimating the probability of failure in the wind power system is more complex still. In this paper, the Monte Carlo simulation method is applied to assess the reliability of the power system. The article is organized as follows: the background of the Monte Carlo simulation technique is explained in Sect. 40.2, the methodology of the developed scheme is detailed in Sect. 40.3, the results and performance of the presented technique are explained in Sect. 40.4, and the conclusion is presented in Sect. 40.5.
40 Estimation of Wind Energy Reliability Using Modeling and Simulation …
40.2 Monte Carlo Simulation
Monte Carlo simulation is a statistical analysis method used in many engineering fields to calculate the likelihood that power systems will fail [8]. The method uses random sampling and runs many computer experiments to reveal the statistical properties of the system's outcomes. The procedure of the Monte Carlo simulation is illustrated in Fig. 40.1. In the first step, the cumulative distribution function (CDF) of the random variable is linked to the generated random number $r_{nu}$. The analysis model of the system is expressed in Eq. (40.1).
$$k_{nu} = F_K^{-1}(r_{nu}) \tag{40.1}$$
Fig. 40.1 Monte Carlo simulation
Here, $K$ defines the random variable and $r_{nu}$ indicates the random number. In the second step, the performance function is evaluated. The random variable with parameters $\alpha_K$ and $\beta_K$ is log-normally distributed. The random variable generation at the ith iteration is expressed in Eqs. (40.2), (40.3), and (40.4).
$$\ln(k_{nu}) = \alpha_K + \beta_K\,\varepsilon^{-1}(r_{nu}) \tag{40.2}$$
$$r_{nu} = \varepsilon\!\left(\frac{\ln(k_{nu}) - \alpha_K}{\beta_K}\right) \tag{40.3}$$
$$k_{nu} = \exp\!\left(\alpha_K + \beta_K\,\varepsilon^{-1}(r_{nu})\right) \tag{40.4}$$
where $r_{nu}$ is a random number between 0 and 1, $\alpha_K$ and $\beta_K$ are the log-normal distribution parameters, and $\varepsilon$ is the standard normal CDF with inverse $\varepsilon^{-1}$. The distribution can be used to create random numbers with a computer algorithm. Typically, the system itself generates random variables for common distributions. Equation (40.1) is used if the system cannot generate a random variable for a specific distribution. The third step is the statistical analysis, which is used to determine the probability of failure. Resistance-Load (RL) analysis is one of the major applications of Monte Carlo methods. This RL-based technique is frequently used to assess dependability in energy systems with a certain resistance (R) for the applied load (L). To assess the reliability performance, the method takes into account the randomness of both power generation and load. It also provides an expectation for the energy dependability of the selected wind farm by calculating the probability of R > L.
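The three steps can be sketched in Python as follows. The sketch assumes that both resistance and load are log-normally distributed and independent, which the chapter does not state; the parameter values, the function names, and the binomial standard-error formula used as the error measure are illustrative assumptions. Sampling uses the inverse-transform relation of Eq. (40.4), with the standard normal inverse CDF taken from Python's `statistics.NormalDist`.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sigma 1)

def open_unit(rng):
    """Uniform random number in (0, 1); 0 is rejected because inv_cdf needs 0 < p < 1."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def lognormal_sample(alpha, beta, r):
    """Inverse-transform sample: exp(alpha + beta * inverse_normal_cdf(r))."""
    return math.exp(alpha + beta * STD_NORMAL.inv_cdf(r))

def failure_probability(n_trials, r_params, l_params, seed=0):
    """Estimate P(R < L) and its binomial standard error over n_trials samples."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        resistance = lognormal_sample(*r_params, open_unit(rng))
        load = lognormal_sample(*l_params, open_unit(rng))
        if resistance < load:           # the system fails when R does not exceed L
            failures += 1
    p = failures / n_trials
    return p, math.sqrt(p * (1.0 - p) / n_trials)

# Hypothetical (alpha, beta) pairs for resistance and load.
p_fail, std_err = failure_probability(20000, (2.5, 0.6), (2.1, 0.2))
```

Because the standard error shrinks roughly with the square root of the number of trials, a tenfold increase in trials reduces it by a factor of about three, at the cost of a proportionally longer run.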
40.3 Methodology There are various processes involved in estimating the reliability of the traditional wind energy system using the RL-based method [10]. First, it’s crucial to analyze the probability distribution of the developed system’s random and deterministic variables. To model the RL values, the estimation of probability distribution parameters is also important. The Monte Carlo method is also used to assess the likelihood that the entire system would fail.
40.3.1 Identification of Random Variables
Reliability forecasting using the RL-based Monte Carlo technique involves identifying the deterministic and random variables. The parameter R in the proposed study defines the overall energy produced over numerous years by the selected wind farm. The proposed model needs to be designed using the simulated wind speed ($s$) and system losses ($l_s$). Equation (40.5) gives the power produced by a single wind turbine, which grows with the cube of the wind speed.
$$P_w = \frac{1}{2} A_R\,\rho_{ai}\,P_{cf}\,s^{3} \tag{40.5}$$
where $P_w$ refers to the generated power, $A_R$ indicates the rotor area, $\rho_{ai}$ denotes the air density, $P_{cf}$ represents the performance coefficient, and $s$ defines the wind speed. The system losses for a known probability distribution are calculated by the Monte Carlo method using distribution parameters such as the mean and variance. The performance of the system is represented in Eq. (40.6).
$$P_{rf} = \frac{1}{2} A_R\,\rho_{ai}\,P_{cf}\,s^{3} - l_s - l_d \tag{40.6}$$
Here $P_{rf}$ refers to the performance function, $l_s$ indicates the system loss, and $l_d$ defines the load. The total loss of the system is formulated in Eq. (40.7).
$$T_{pl} = \frac{l_{s1} + l_{s2} + l_{s3} + \cdots + l_{sn}}{100} \times N_{tur} \times P_{ot} \tag{40.7}$$
Here $T_{pl}$ indicates the total power loss, $N_{tur}$ refers to the total number of turbines in the energy system, and $P_{ot}$ determines the total output power.
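A brief Python sketch of the power, performance, and loss relations is given below. It assumes the standard aerodynamic relation in which turbine power scales with the cube of the wind speed, and it reads the individual losses as percentages of output power; both readings are assumptions here, and the rotor radius, air density, and performance coefficient are hypothetical values chosen only for illustration.

```python
import math

def turbine_power_w(rotor_area_m2, air_density, perf_coeff, wind_speed):
    """Power in watts: 0.5 * area * density * coefficient * speed cubed."""
    return 0.5 * rotor_area_m2 * air_density * perf_coeff * wind_speed ** 3

def performance_w(power_w, system_loss_w, load_w):
    """Performance function: generated power minus system loss minus load."""
    return power_w - system_loss_w - load_w

def total_power_loss(loss_percentages, n_turbines, output_power):
    """Total loss: summed loss percentages / 100 * turbine count * output power."""
    return sum(loss_percentages) / 100.0 * n_turbines * output_power

rotor_area = math.pi * 40.0 ** 2                 # hypothetical 40 m blade radius
p10 = turbine_power_w(rotor_area, 1.225, 0.4, 10.0)   # power at 10 m/s, in watts
p20 = turbine_power_w(rotor_area, 1.225, 0.4, 20.0)   # power at 20 m/s, in watts
margin = performance_w(p10, 0.1 * p10, 1.0e6)         # 10% loss, 1 MW load
```

A negative value of `margin` corresponds to the failure condition in the resistance-load view, since generation net of losses does not cover the load.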
40.3.2 Probability Analysis
Graphical techniques like probability plots and histograms are utilized to analyze the probability distributions of wind speed, load, and power losses. In addition, the R and L values are modeled using the simulation methods and distribution parameters. The optimal wind speed distribution is determined using the Easy-fit tool. This program supports different distribution types and compares them using a number of tests, including the Anderson-Darling test and the K-S test. Traditionally, the Weibull distribution and the chi-square test were used to obtain the simulated wind speed and probability distribution. The Weibull distribution has two parameters, the scale parameter and the shape parameter, expressed in Eqs. (40.8) and (40.9).
$$W_{shp} = \left(\frac{\sigma_{sd}}{m}\right)^{-1.086} \tag{40.8}$$
$$W_{cp} = m\left(0.568 + \frac{0.433}{W_{shp}}\right)^{-1/W_{shp}} \tag{40.9}$$
The Weibull scale parameter $W_{cp}$ and Weibull shape parameter $W_{shp}$ are determined using the standard deviation $\sigma_{sd}$ and mean $m$. Furthermore, the CDF of the Weibull distribution is expressed in Eq. (40.10).
$$F(s) = P(S \le s) = \int_{0}^{s} f(s)\,ds = 1 - \exp\!\left[-\left(\frac{s}{W_{cp}}\right)^{W_{shp}}\right] \tag{40.10}$$
Here $P(S \le s)$ is the probability that the measured wind speed is less than or equal to $s$.
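The parameter estimates and the CDF can be checked numerically with a small Python sketch. It assumes the usual empirical form of the shape and scale estimates (shape from the ratio of standard deviation to mean, scale from the mean and shape) and the two-parameter Weibull CDF with the speed divided by the scale parameter. The mean of 8 m/s and standard deviation of 4 m/s are hypothetical inputs, not site data from the chapter.

```python
import math

def weibull_shape(mean, std):
    """Empirical shape estimate: (std / mean) ** -1.086."""
    return (std / mean) ** -1.086

def weibull_scale(mean, shape):
    """Empirical scale estimate: mean * (0.568 + 0.433 / shape) ** (-1 / shape)."""
    return mean * (0.568 + 0.433 / shape) ** (-1.0 / shape)

def weibull_cdf(s, scale, shape):
    """P(S <= s) for a two-parameter Weibull distribution."""
    if s <= 0.0:
        return 0.0
    return 1.0 - math.exp(-((s / scale) ** shape))

k = weibull_shape(8.0, 4.0)          # hypothetical mean 8 m/s, std 4 m/s
c = weibull_scale(8.0, k)
p_below_10 = weibull_cdf(10.0, c, k)  # probability that wind speed is <= 10 m/s
```

The scale estimate is an approximation of the exact relation between the mean and the gamma function of the shape; for moderate shape values the two agree closely, which is why the shortcut is common in wind resource studies.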
40.4 Result and Discussion
The simulation results show that the system's performance changes markedly as the wind speed reaches 10 m/s. Below 10 m/s, the model's performance is negative, which indicates that the wind farm's output is insufficient to meet the region's energy needs (Fig. 40.2). Table 40.1 shows the performance of the developed scheme as a function of wind speed, together with the power produced by the system and the load of the selected region. The results of the Monte Carlo simulation for different numbers of trials are shown in Table 40.2. As the number of trials increases, the failure probability and the error decrease, while the computation time increases.
40.5 Conclusion
A method based on Monte Carlo simulation for estimating the reliability of a wind power system was developed in this article. To calculate the likelihood of failure in the region served by the chosen wind farm, the proposed model examines the power generated and the demand. According to the simulation results, the simulation time rises sharply, while the failure probability and error drop, as the number of trials increases. Additionally, the performance of the wind energy system is influenced by wind speed and power demand. The Weibull distribution is used to simulate the wind speed distributions, and the effectiveness of the system is assessed in relation to wind speed. The negative performance of the system between 2 and 8 m/s shows the probability of failure in that range. As a result, the system effectively determines the reliability of wind energy systems.
Fig. 40.2 a Measurement versus wind speed, b wind speed versus performance, c Weibull distribution function

Table 40.1 Simulation outcomes
Wind speed (m/s)   Resistance (MW)   Load (MW)   Performance (MW)
2                  0.8               7.8         -6.8
4                  1.8               8           -5.3
6                  4.2               8.2         -3.7
8                  8                 8.4         -0.89
10                 16.7              9           14.8
12                 30.3              9.1         38.2
14                 47.5              8.6         44.4
16                 69                8           64.9
18                 90                8.6         84
20                 143               10          137
Table 40.2 Probability of failure
No. of trials   No. of failures   Failure probability (%)   Error (%)   Time (s)
10,000          2280              22.8                      6.9         1.56
100,000         22,180            22.18                     2.12        15.99
1,000,000       221,000           22.1                      0.76        1856
References
1. Nazir, M.S.: Environmental impact and pollution-related challenges of renewable wind energy paradigm - a review. Sci. Total Environ. 683, 436–444 (2019)
2. Behera, B.K.: eEnergy Security. Bioenergy for Sustainability and Security, pp. 1–77 (2019)
3. Mukherjee, A.: iGridEdgeDrone: hybrid mobility aware intelligent load forecasting by edge enabled Internet of Drone Things for smart grid networks. Int. J. Parallel Prog. 49, 285–325 (2021)
4. Chatterjee, A.: Wind-PV based generation with smart control suitable for grid-isolated critical loads in Onshore, India. J. Inst. Eng. India Ser. B 1–11 (2022)
5. Yang, L.: A continual learning-based framework for developing a single wind turbine cybertwin adaptively serving multiple modeling tasks. IEEE Trans. Ind. Inf. 18, 4912–4921 (2021)
6. Javed, M.S.: Solar and wind power generation systems with pumped hydro storage: review and future perspectives. Renew. Energ. 148, 176–192 (2020)
7. Abud, T.P.: State of the art Monte Carlo method applied to power system analysis with distributed generation. Energies 16, 394 (2023)
8. Shezan, S.A.: Effective dispatch strategies assortment according to the effect of the operation for an islanded hybrid microgrid. Energ. Convers. Manage. X 14, 100192 (2022)
9. Krupenev, D., Boyarkin, D., Iakubovskii, D.: Improvement in the computational efficiency of a technique for assessing the reliability of electric power systems based on the Monte Carlo method. Reliab. Eng. Syst. Saf. 204, 107171 (2020)
10. Kannan, P.: Evaluating prolonged corrosion inhibition performance of benzyltributylammonium tetrachloroaluminate ionic liquid using electrochemical analysis and Monte Carlo simulation. J. Mol. Liq. 297, 111855 (2020)
Chapter 41
Authentication Protocol for Secure Data Access in Fog Computing-Based Internet of Things
Priyanka Surendran, Bindhya Thomas, Densy John, Anupama Prasanth, Joy Winston, and AbdulKhadar Jilani
Abstract The Internet of things (IoT) is an infrastructure of millions of computers and sensors that produce massive amounts of information. Computing paradigms are required near the edge devices due to latency, bandwidth, and storage constraints. Fog computing (FC) is a more sophisticated form of cloud computing that gives end-users access to some of its features. As part of the IoT, FC offers end-users processing, storage, and network resources. One of the primary issues with FC is authentication and safe data access between fog servers and fog nodes, as well as between fog nodes and IoT users. This research presents a technique for safe information access and authentication between fog and IoT end-users. Initial encryption is performed using the SHA1 and AES algorithms. A homomorphic approach is used to provide additional data security in the first phase. Homomorphic encryption is a cryptographic approach that permits computations on encrypted data, avoiding the need to reveal the original message to intermediaries (servers) that merely provide a service and are not data consumers. In this paper, an effective mutual authentication technique between cloud and fog is proposed. The performance of the proposed system is evaluated in terms of execution time. According to the performance and security studies, the technique is suitable for FC in an IoT setting.
P. Surendran (B) · B. Thomas · D. John · A. Prasanth · J. Winston · A. Jilani
College of Computer Studies, University of Technology Bahrain, Salmabad, Bahrain
e-mail: [email protected]
B. Thomas e-mail: [email protected]
D. John e-mail: [email protected]
A. Prasanth e-mail: [email protected]
J. Winston e-mail: [email protected]
A. Jilani e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_41
P. Surendran et al.
41.1 Introduction
The Internet of things (IoT) refers to an ever-increasing number of physical components that are linked to one another [1]. Network connectivity and the administration of diverse things on a network enable communication and data exchange. IoT plays a critical role in smart cities, smart homes, health care, intelligent mobility, factory automation, and disaster management. An IoT application is made up of a range of IoT devices that capture data and share it in real time so that better and smarter decisions can be made [2]. IoT development contributes to the creation of massive amounts of data while consuming massive computational power, physical capacity, and connection bandwidth [3]. A large share of IoT data (45% of the total) is gathered, evaluated, and analyzed at or near the network edge [4]. Some IoT applications need to respond rapidly, others involve sensitive data that should be stored and processed locally, and still others generate massive amounts of data that can cause network congestion. IoT devices have limited processing power and battery capacity, which makes them more vulnerable to hacking, physical damage, and theft [5]. Existing cloud storage solutions have evolved to address the special safety and privacy issues in fog computing (FC) that stem from its unique properties, such as unified architecture, accessibility support, location awareness, and low latency. Because data is stored locally and shared with the cloud in non-real time, FC is more secure than cloud computing. End-users that lack adequate resources can use fog nodes as proxies to carry out secure operations [6]. In FC, security and privacy issues, as well as technical services, are not yet explicitly understood. As a result, evaluating the security and protection objectives of FC is crucial before building and deploying fog-aided IoT applications.
Transportation, agriculture, health care, energy generation, and distribution are just a few of the domains where IoT is being used. An IoT system is made up of a number of functional blocks that help with various system functions including detecting, identifying, actuating, communicating, and managing. The major components of IoT devices are depicted in Fig. 41.1.
Fig. 41.1 Components of IoT devices
41 Authentication Protocol for Secure Data Access in Fog …
The devices in an IoT system are capable of sensing, actuating, controlling, and monitoring. IoT devices may exchange data with other connected devices and applications, as well as acquire data from other devices. Depending on time and space restrictions, a device analyzes data locally, transfers data to back-end remote servers or cloud-based systems for processing, or performs other functions locally, including specific IoT network activities [7]. The interface between devices and servers is handled by the communication block. IoT communication protocols operate at the data link, network, transport, and application layers. An IoT framework performs application modeling services, computer administration, data dissemination, data processing, and software discovery services. IoT tools and systems must be able to react swiftly to changing situations and take appropriate action based on their working settings, user context, and sensing surroundings. IoT systems may communicate via a variety of compatible protocols and link to other devices as well as networks. A unique identity is assigned to each IoT device. Smart interfaces for IoT systems should respond to context, engage with people, and interact with the outside environment [8]. FC does not replace cloud processing and storage, but rather complements it. Fog nodes and the cloud work together to establish a hierarchical network. Fog nodes store transient information, analyze historical data, and perform global analysis. The fog nodes are used in a variety of ways close to the network's edge devices. Figure 41.2 depicts the basic architecture of FC. Cisco proposed FC in 2012; it is characterized as an extension of cloud computing that provides networking, processing, and storage capabilities between end devices and a regular cloud platform. IoT systems are linked to large-scale cloud infrastructure and storage via FC.
The data gathered by these devices/objects must be evaluated and processed in real time to improve the performance of IoT initiatives. FC delivers cloud networking, processing, and storage to the network edge, which addresses the real-time concerns of IoT devices and supports robust, functional IoT applications. Fog and IoT together offer a number of advantages for a variety of applications.
Fig. 41.2 Architecture of fog computing
41.2 Related Works FC connects IoT devices to large-scale cloud and storage infrastructure. Fog and IoT together offer a lot of benefits in a variety of applications. Following that, the sensor nodes make context-aware judgments. This feature enhances the overall network energy efficiency and consequently extends the network’s lifespan across a large region. IoT connects things to sensing devices for data exchange, monitoring, and administration. In terms of logic, IoT has three layers: vision, application, and movement. The information is sensed by the layer of perception and sent to the layer of application via the Internet or network transmission. The sensor nodes are exploited in an unmanned environment that is exposed to malicious attacks. The IoT is an evolving Internet movement that encourages low latency, accessibility, and globally dispersed resources. The IoT from fog nodes is cloud-assisted. The untrusted cloud practices the confidentiality of data through many cryptographic techniques. The security in three-layer architecture, i.e., user-fog-cloud should now be ensured. Figure 41.3 shows the security issues involved in IoT. Kumar et al. [9] suggested IoT-based FC paradigms in which Cooja and Contiki were used to conduct the quality testing. The technology was put to the test in a real-time power distribution network with a variety of transmission ranges and node counts. It employs destination-oriented directed acyclic graph (DODAG) dynamic output and has unidirectional and bidirectional correspondence. All routers and their interconnections are restricted by the low-power and lossy networks (LLN). On LLN routers, power, resources, memory, and connectivity are all constrained. The LLN network consists of a few hundred to 1000 routers with a high error rate, poor data rate, and instability. Alreshidi et al. [10] introduced cloud computing, FC, IoT, and facial recognition method. 
These are examples of modern advanced computing technologies that might be used to augment present anti-theft systems. The cloud’s architecture is centralized, with large data centers scattered around the world, far away from client systems. The cloud provides a resource-intensive direct real-time interface with devices, whereas fog functions as a bridge between data centers and hardware, bringing it closer to end-users. Fog allows data to be processed and stored near to the data source, which is important for real-time operations. Face recognition technologies are presently being developed to help and improve the tracking process, as well as to alleviate many of the problems it confronts. Shukla et al. [11] recommended the healthcare IoT identification and authentication-oriented fog computing model. The combination of FC and blockchain addressed the problem of identifying, authenticating, and certifying healthcare IoT devices for scalable, frequent data exchange. The suggested
41 Authentication Protocol for Secure Data Access in Fog …
485
Fig. 41.3 Three-layer architecture
task is solved and implemented using the SimBlock, iFogSim, and the Python Editor tools. The suggested method increases the identification of malicious nodes’ accuracy and consistency. The ASE technique also reduces packet error for patient health data (PHD) communication between end-users and healthcare IoT. The mentioned analytical approach is used to verify certificates and keys. For further FC system verification, PHD is created and disseminated to additional fog nodes. Ionita et al. [12] proposed a fog controller in which the implementation fits nicely with the fog computing architecture. Their suggested solution employs STIX expressions to define threat and attack data, which is gathered from branch agents and integrated into alien vault’s Open Threat Exchange (OTX) feed for actionable correlations. Oma et al. [13] proposed a tree-based fog computing (TBFC) technique for spreading operations and information among hosts and fog nodes in the IoT to save energy. They discovered that the TBFC model’s overall electric energy usage is lower than the cloud computing model in the evaluation. Each fog node in the tree-based fog computing (TBFC) paradigm is responsible for both calculating and routing. In the cloud computing infrastructure, each fog node, on the other hand, serves as a routing node. Fog computing paradigm, in which operations and information are disseminated not just to servers but also to fog nodes, is presented as a way to achieve the IoT. They observed that the TBFC model uses less overall electric energy than the cloud computing method in the test. Guardo et al. [14] proposed the fog-based
P. Surendran et al.
IoT architecture that uses two-tier fog and its related resources to reduce the information transferred to the cloud, improve computational load balancing, and reduce waiting times. The proposed FC technology is applied to precision agriculture, which covers all agricultural land management systems. Furthermore, they modeled and highlighted how the two-tier FC technology of this architecture can drastically reduce the amount of information transferred to the cloud. They also present and discuss a prototype application for managing and monitoring farms that is based on this framework and has a considerable impact on both commercial and environmental performance.
41.3 Methodology
The IoT connects millions of devices and sensors, resulting in tremendous data volumes. The data are sent to the cloud for processing and computation; however, because of latency, bandwidth, and storage constraints, computing models that operate near the edge components are required. An FC-based network gathers and distributes data across routers, reducing traffic between entry points. Fog improves service levels by lowering latency and improving access to nearby Internet services [15]. Figure 41.3 depicts the structure of the three-layer fog architecture. Because the cloud network is currently untrusted, it is critical to encrypt the data, and to analyze the data in encrypted form, in order to protect secret data held on fog nodes. However, the most recent cloud solutions are incompatible with the fog environment. Research is under way in this area to address security and privacy concerns. However, delivering resource-constrained applications such as IoT applications is still a complicated task. A fog server or node may claim to be legitimate and connect to the edge without being authorized. A malicious party might take advantage of user behavior to carry out further attacks. When fog computing adds a large number of end-users, connections, and services, authentication and data security become key factors. The data owner must communicate encrypted data in a secure manner that prevents unauthorized users and repositories from accessing it. For policy-based secure data transfer between the user and the fog node, ciphertext-policy homomorphic Paillier encryption (CP-HPE) is used.
41.3.1 Authentication and Privacy Preservation in FOG
When FC security is offered as a service to a large number of end-users, authentication at the front fog nodes is a crucial challenge. FC frequently requires edge equipment (sensing devices, telephones, etc.), fog devices (base stations, gateways, microservers, and other computing equipment), and cloud data centers. Standard authentication mechanisms such as certificates and PKI are ineffective in IoT devices due to resource constraints. Fog nodes also enter and exit the fog layer on a regular basis, which raises the issue of mobility for end clients. In centralized systems, a trusted third party (TTP) is employed to create a link between users and the location-based service (LBS) server. Compared with traditional security models, Laplacian-based models are advantageous in terms of execution efficiency, privacy preservation, data utility, and energy consumption. To reduce resource waste, many fog storage service providers want to use data deduplication techniques as an extension of cloud storage. The index ciphertext and the data ciphertext are both based on a single ciphertext-policy attribute-based encryption (CP-ABE) scheme and share the same key pair, which speeds up data access and lowers key management costs. Resource-constrained end devices can quickly retrieve ciphertexts over the Internet and safely outsource a considerable portion of the decryption process to fog nodes. Authentication across several gateway levels, where fog nodes act as data collection and control points for resource-constrained devices, is the most difficult challenge in fog computing. As a result, in this scenario, lightweight authentication is just as critical as end-to-end authentication.
41.3.2 Secured Communication Between FOG and IoT
A cryptographic technique is used to safeguard data end to end and so ensure data security for end-users and fog nodes. Furthermore, when a user is compromised, the complete network key must be updated, which significantly increases the network's computation and communication overhead within a short period of time. A novel one-time-key setup protocol for communication among three parties has been presented. This protocol requires only four message exchanges. In the performance evaluation, its communication and computation costs were around 20% lower than those of other protocols. The architecture of fog nodes is depicted in Fig. 41.4. The fog servers interact across secure channels provided by the regulatory agency. The public key of the registration system (RS) and the secret key should be exchanged with each fog server. The fog servers are not required to connect with one another; only the RS and the edge devices connect with a fog server. In practice, any server may be insecure or faulty, so the RS is treated as the trusted party. The fog architecture includes a cloud server for the storage and retrieval of data. The registering authority comprises the data owners, fog servers, and users. The fog server provides a link between the cloud and the user. The data owner stores encrypted files in the cloud. The data user searches for files in the cloud and retrieves them.

Fig. 41.4 Architecture of fog nodes

The authentication procedure is depicted in Fig. 41.5. The major steps in this process are key generation, key exchange, and authentication. In the key generation step, the users, the RS, and the fog server create public/private key pairs. In the key exchange step, the public keys are transferred between the RS and the fog server, as well as between the fog server and the users. In the authentication step, the RS validates the fog server with the generated key, and the fog server validates the clients with the generated key. The initial encryption is performed using the SHA1 algorithm or the AES algorithm. The homomorphic encryption paradigm allows the user to perform logical or mathematical operations on encrypted data. For example, given two numbers $p_1$ and $p_2$, it is possible to encrypt them using public-key encryption. This scheme uses two keys, a public key denoted by $K_{pb}$ and a private key denoted by $K_{pv}$. Both numbers are encrypted under the public key, and the two ciphertexts obtained are given by Eqs. (41.1) and (41.2).

$$c_1 = E_{pb}(p_1) \tag{41.1}$$

$$c_2 = E_{pb}(p_2) \tag{41.2}$$
Normally, encryption aims to make all encrypted numbers indistinguishable from random numbers for anyone who does not have the private key required for decryption. In a public-key encryption method, E denotes the encryption function and D the decryption function. To generate the keys, two large prime numbers x and y are first selected at random. The public key $K_{pb}$ and the private key $K_{pv}$ are then computed using Eqs. (41.3) and (41.4).

Fig. 41.5 Authentication process

$$K_{pb} = N = xy \tag{41.3}$$

$$K_{pv} = (\lambda, \mu) = \left(\operatorname{LCM}(x - 1,\, y - 1),\ \lambda^{-1} \bmod N\right) \tag{41.4}$$

Plain text p can be encrypted using Eq. (41.5). The value of r is chosen randomly from $\{a \in \mathbb{Z}_N \mid \gcd(a, N) = 1\}$. The plain text can be recovered by decrypting the ciphertext using Eq. (41.6), where the function L is defined in Eq. (41.7).

$$E(p) = c = (N + 1)^{p}\, r^{N} \bmod N^{2} \tag{41.5}$$

$$D(c) = p = L\!\left(c^{\lambda} \bmod N^{2}\right) \cdot \mu \bmod N \tag{41.6}$$

$$L(u) = \frac{u - 1}{N} \tag{41.7}$$
Algorithm 1. Paillier Scheme 1
Step 1: The inputs are the prime numbers x and y; $N = xy$, $\lambda = \operatorname{LCM}(x - 1, y - 1)$, and $g \in \mathbb{Z}^{*}_{N^{2}}$, where the order of g is a multiple of N
Step 2: The public keys are N and g
Step 3: The private keys are x, y and λ
Step 4: During encryption, for plain text $p < N$, the ciphertext is computed using the equation $c = g^{p}\, r^{N} \bmod N^{2}$
Step 5: During decryption, for ciphertext $c < N^{2}$, the plain text is computed using the equation $p = \dfrac{L(c^{\lambda} \bmod N^{2})}{L(g^{\lambda} \bmod N^{2})} \bmod N$, where $L(u) = \dfrac{u - 1}{N}$ for $u \equiv 1 \pmod{N}$

Algorithm 2. Paillier Scheme 2
Step 1: The inputs are the prime numbers x and y; $N = xy$, $\lambda = \operatorname{LCM}(x - 1, y - 1)$, and $g \in \mathbb{Z}^{*}_{N^{2}}$, where the order of g is $\alpha N$
Step 2: The public keys are N and g
Step 3: The private keys are x, y and α
Step 4: During encryption, for plain text $p < \alpha$, the ciphertext is computed using the equation $c = g^{p}\, g^{N r} \bmod N^{2}$
Step 5: During decryption, for ciphertext $c < N^{2}$, the plain text is computed using the equation $p = \dfrac{L(c^{\alpha} \bmod N^{2})}{L(g^{\alpha} \bmod N^{2})} \bmod N$, where $1 < \alpha \le \lambda$
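The key generation, encryption, and decryption steps of the first Paillier scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are hypothetical, the primes used in the usage example are toy values chosen only so the arithmetic is easy to follow, and no padding or parameter validation suitable for real use is included. With the default $g = N + 1$ the sketch reduces to Eqs. (41.3) to (41.7); passing another valid g gives the general form with $\mu = L(g^{\lambda} \bmod N^{2})^{-1} \bmod N$.

```python
import math
import secrets


def L(u, n):
    """Paillier's L function, L(u) = (u - 1) / N, for u congruent to 1 mod N."""
    return (u - 1) // n


def keygen(x, y, g=None):
    """Return ((N, g), (lam, mu)) for primes x and y. Toy sizes only."""
    n = x * y
    lam = (x - 1) * (y - 1) // math.gcd(x - 1, y - 1)  # LCM(x - 1, y - 1)
    if g is None:
        g = n + 1  # the generator used in Eq. (41.5)
    # mu = L(g^lam mod N^2)^-1 mod N; this equals lam^-1 mod N when g = N + 1.
    # pow(..., -1, n) raises ValueError if g is not a suitable generator.
    mu = pow(L(pow(g, lam, n * n), n), -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, p, r=None):
    """Encrypt plain text p as c = g^p * r^N mod N^2."""
    n, g = pub
    if not 0 <= p < n:
        raise ValueError("plain text must satisfy 0 <= p < N")
    if r is None:
        # Draw r at random until it is coprime to N.
        r = secrets.randbelow(n - 1) + 1
        while math.gcd(r, n) != 1:
            r = secrets.randbelow(n - 1) + 1
    return (pow(g, p, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c):
    """Recover p = L(c^lam mod N^2) * mu mod N."""
    n, _ = pub
    lam, mu = priv
    return (L(pow(c, lam, n * n), n) * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plain texts modulo N."""
    n, _ = pub
    return (c1 * c2) % (n * n)


# Usage with illustrative toy primes (far too small for real security).
pub, priv = keygen(1009, 1013)
c1, c2 = encrypt(pub, 42), encrypt(pub, 58)
print(decrypt(pub, priv, c1))
print(decrypt(pub, priv, add_encrypted(pub, c1, c2)))
```

The last call illustrates the homomorphic property mentioned earlier: the product of two ciphertexts decrypts to the sum of the two plain texts, so a fog node could add encrypted values without holding the private key.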
41.4 Results and Discussion
This section explains and analyzes the performance and security of the proposed work. Table 41.1 lists the time taken to authenticate IoT users and fog nodes using the SHA mechanism and the AES encryption and decryption processes. The performance of an encryption algorithm depends on the time required for encryption and decryption, and this computation time depends on the complexity of the algorithm. The encryption algorithm must be complex enough that attackers can seldom retrieve data from the ciphertext.
Table 41.1 Computation time for authentication

| Structure | SHA1 encryption (ms) | SHA1 decryption (ms) | AES encryption (ms) | AES decryption (ms) | Overall time (ms) |
|---|---|---|---|---|---|
| IoT user (IU) | 0.19 | 0.23 | 1.023 | 2.013 | 3.456 |
| Fog server (FS) | 0.06 | 0.08 | 1.061 | 2.023 | 3.221 |
Fig. 41.6 Computation time for authentication
The IU and FS participate in the authentication process and execute the SHA1 algorithm and the AES encryption and decryption processes. SHA1 is a less complicated algorithm and requires less time to execute. In the proposed framework, executing the encryption part of the SHA1 algorithm takes 0.19 ms on the IU and only 0.06 ms on the FS. Executing the decryption part of the SHA1 algorithm takes 0.23 ms on the IU and only 0.08 ms on the FS. This implies that the FS executes SHA1 far faster than the IU. For AES encryption, however, the computation time of the IU is 0.038 ms less than that of the FS, and for AES decryption it is 0.01 ms less. The overall computation time for authentication is 3.456 ms for the IU and 3.221 ms for the FS, so by the totals in Table 41.1 the FS is 0.235 ms faster than the IU overall (Fig. 41.6). The key generation time is an important factor in the efficiency of an authentication algorithm. The proposed system supports three key sizes, chosen at the discretion of the user: 64, 128, or 256 bits. The key generation time varies with the number of bits used for the key; this variation is depicted in Fig. 41.7. Table 41.2 reports the overall execution time of the complete authentication process with various node counts as the size of the transferred data varies. In this experiment, files with sizes from 1 to 100 KB are evaluated. The proposed authentication algorithm combines the SHA1 and AES algorithms with homomorphic Paillier schemes. RSA-3DES and RSA-AES are the two algorithms used for the performance comparison. The execution time of the proposed algorithm is lower than that of the other schemes, as depicted
Fig. 41.7 Variation in key generation time
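Table 41.2 Comparison of encryption time

| File size (KB) | RSA-3DES time (ms) | RSA-AES time (ms) | Proposed time (ms) |
|---|---|---|---|
| 1 | 234 | 275 | 152 |
| 5 | 369 | 535 | 299 |
| 10 | 634 | 710 | 417 |
| 25 | 1343 | 1527 | 1268 |
| 50 | 2684 | 3200 | 1211 |
| 75 | 4913 | 5838 | 3694 |
| 100 | 7104 | 7921 | 5786 |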
in Fig. 41.8, which also shows how the execution time increases with file size. The proposed authentication algorithm is robust against various attacks, and its computation time is low. Thus the authentication between IU and FS can be secured using the proposed algorithm.
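The comparison in Table 41.2 can be made more concrete by expressing the saving of the proposed scheme as a percentage of each baseline. The short sketch below does only that arithmetic on the values copied from the table; the helper name `reduction_percent` is hypothetical and the script is not part of the reported experiments.

```python
# Encryption times in ms, copied from Table 41.2 (one entry per file size).
file_sizes_kb = [1, 5, 10, 25, 50, 75, 100]
rsa_3des = [234, 369, 634, 1343, 2684, 4913, 7104]
rsa_aes = [275, 535, 710, 1527, 3200, 5838, 7921]
proposed = [152, 299, 417, 1268, 1211, 3694, 5786]


def reduction_percent(baseline, candidate):
    """Percentage by which each candidate time is lower than its baseline."""
    return [100 * (b - c) / b for b, c in zip(baseline, candidate)]


vs_3des = reduction_percent(rsa_3des, proposed)
vs_aes = reduction_percent(rsa_aes, proposed)

# Print one row per file size: reduction against each baseline.
for size, a, b in zip(file_sizes_kb, vs_3des, vs_aes):
    print(f"{size:>4} KB  vs RSA-3DES: {a:5.1f}%  vs RSA-AES: {b:5.1f}%")
```

Computed this way, the reduction differs noticeably from row to row, so a single summary figure would hide how much the saving depends on file size.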
Fig. 41.8 Comparison of encryption time
41.5 Conclusion
Fog computing is a form of cloud computing in which some services are brought closer to the user. As part of the Internet of things, fog computing offers end-users processing, storage, and network capabilities. Through fog nodes or servers, several fog nodes interact and monitor the edge independently. This work uses a multilevel encryption concept to manage user-fog authentication, which is useful when a new fog server enters the system. The system builds a secure index and data encryption scheme in which encrypting and decrypting the data protects confidential user data, which any approved server can subsequently decrypt. With an emphasis on access control, the study achieves user authorization with limited key replacements. In this authentication system, the SHA1 and AES algorithms were used. To increase the complexity of encryption, homomorphic Paillier schemes were introduced to encrypt the ciphertext. The analysis covers encryption time, decryption time, and key generation time, as well as the variation in execution time with respect to the transferred file size. These analyses indicate that the proposed system outperforms existing systems in terms of execution time. Because the method keeps just one key for each user, it reduces storage space and key-related security costs, which makes it well suited to fog networks.
References
1. Rana, S., Mishra, D., Arora, R.: Privacy-preserving key agreement protocol for fog computing supported internet of things environment. Wirel. Pers. Commun. 119(1), 727–747 (2021)
2. Verma, U., Bhardwaj, D.: Design of lightweight authentication protocol for fog enabled internet of things—a centralized authentication framework. Int. J. Commun. Netw. Inform. Sec. 12(2), 162–167 (2020)
3. Maharaja, R., Iyer, P., Ye, Z.: A hybrid fog-cloud approach for securing the Internet of Things. Clust. Comput. 23(2), 451–459 (2020)
4. Andrade, L., Lira, C., Mello, B.D., Andrade, A., Coutinho, A., Prazeres, C.: Fog of things: fog computing in internet of things environments. In: Special Topics in Multimedia, IoT and Web Technologies, pp. 23–50. Springer, Cham (2020)
5. Yang, R., Xu, Q., Au, M.H., Yu, Z., Wang, H., Zhou, L.: Position based cryptography with location privacy: a step for fog computing. Futur. Gener. Comput. Syst. 78, 799–806 (2018)
6. Liu, X., Yang, Y., Choo, K.K.R., Wang, H.: Security and privacy challenges for internet-of-things and fog computing. Wirel. Commun. Mob. Comput. (2018)
7. Patil, A.S., Hamza, R., Yan, H., Hassan, A., Li, J.: Blockchain-PUF-based secure authentication protocol for Internet of Things. In: International Conference on Algorithms and Architectures for Parallel Processing, pp. 331–338. Springer, Cham (2019, December)
8. Naeem, R.Z., Bashir, S., Amjad, M.F., Abbas, H., Afzal, H.: Fog computing in internet of things: practical applications and future directions. Peer-to-Peer Netw. Appl. 12(5), 1236–1262 (2019)
9. Kumar, A., Sharma, S., Goyal, N., Gupta, S.K., Kumari, S., Kumar, S.: Energy-efficient fog computing in Internet of Things based on routing protocol for low-power and Lossy network with Contiki. Int. J. Commun. Syst. 35(4), e5049 (2022)
10. Alreshidi, E.J.: Introducing fog computing (FC) technology to internet of things (IoT) cloud-based anti-theft vehicles solutions. Int. J. Syst. Dyn. Appl. (IJSDA) 11(3), 1–21 (2022)
11. Shukla, S., Thakur, S., Hussain, S., Breslin, J.G., Jameel, S.M.: Identification and authentication in healthcare internet-of-things using integrated fog computing based blockchain model. Internet Things 15, 100422 (2021)
12. Ionita, M.G., Patriciu, V.V.: Secure threat information exchange across the internet of things for cyber defense in a fog computing environment. Informatica Economica 20(3) (2016)
13. Oma, R., Nakamura, S., Duolikun, D., Enokido, T., Takizawa, M.: An energy-efficient model for fog computing in the internet of things (IoT). Internet Things 1, 14–26 (2018)
14. Guardo, E., Di Stefano, A., La Corte, A., Sapienza, M., Scatà, M.: A fog computing-based iot framework for precision agriculture. J. Internet Technol. 19(5), 1401–1411 (2018)
15. Saad, M.: Fog computing and its role in the internet of things: concept, security and privacy issues. Int. J. Comput. Appl. 180(32), 7–9 (2018)
Chapter 42
Efficient Data Security Using Hybrid RSA-TWOFISH Encryption Technique on Cloud Computing A. Jenice Prabhu, S. Vallisree, S. N. Kumar, R. Sitharthan, M. Rajesh, A. Ahilan, and M. Usha
Abstract In recent times, no technology can be considered flawless with respect to its vulnerabilities. Several technologies have emerged in everyday life that offer accessible data storage, the ability to reach the data from anywhere at any time and, most importantly, delivery over the Internet through software. Cloud computing is one example: it provides data storage and also offers various services on a rental basis. In the context of cloud computing, cloud security is a big concern. To protect the cloud environment, several research mechanisms are being proposed. Cryptography is used to overcome the security concern and attain the CIA properties (confidentiality, integrity, and availability). Cryptography is the great equalizer of data transmission and storage security; it is an ideal and valuable method of protection. Traditional symmetric and asymmetric schemes have certain limitations. To resolve this, we present a new hybrid method to attain high data security and privacy. In this paper, we examine RSA and Twofish to implement a hybrid algorithm. The performance of the hybrid method is compared with the existing hybrid technique and shows that the proposed technique delivers high security and privacy of patient data. Hybrid cryptography is used to overcome the problems of both symmetric and asymmetric schemes.

A. Jenice Prabhu (B) Arunachala College of Engineering for Women, Vellichanthai, Tamil Nadu 629203, India
S. Vallisree Department of Electronics and Communication Engineering, Geethanjali College of Engineering and Technology, Hyderabad 501301, India
S. N. Kumar Department of Electrical and Electronics Engineering, Amal Jyothi College of Engineering, Kanjirappally, India
R. Sitharthan Department of Electrical and Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
M. Rajesh Department of Electrical and Electronics Engineering, Sanjivani College of Engineering, Korapakkam, Maharashtra 423603, India
A. Ahilan Department of Electronics and Communication Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
M. Usha Department of MCA, MEASI Institute of Information Technology, Chennai, Tamil Nadu 600014, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_42
42.1 Introduction
Cloud computing is described as a shared pool of configurable computing resources, such as servers, applications, networks, services, and storage, available on demand over the Internet [1]. Cloud computing is used to store data or run processes on remote servers. Securing data is always of vital importance, and because of the critical nature of cloud computing and the large quantities of complex data it carries, the need is even greater [2]. Hence, concerns about data privacy and security have proved to be a barrier to the wider adoption of cloud computing services [3]. As many organizations move their data to the cloud, the data undergo various changes and there are several challenges to overcome. To be effective, cloud data security depends on applying suitable data security techniques and security methods [4]. Computer-based security methods typically rely on user authorization and authentication. Many of the previously proposed cryptosystems are public-key arithmetic cryptosystems that require a significant amount of processor power and often include a very time-consuming, complex procedure for generating the secret key [5]. Numerous other approaches and methods have examined these limits. Among them, one idea could be used to construct a hidden key for neural networks, and several might deliver a possible solution to a critical problem after the key exchange [6]. Cryptography deals with the involvement of adversaries or third parties; it is a technique for developing and analyzing protocols for secure communication, for example to prevent data leakage to unauthorized parties [7]. Data security has several aspects: confidentiality, data integrity, and authentication [8]. The algorithms used for this purpose are known as cryptographic algorithms or ciphers; based on the keys used, they are classified into two basic types, symmetric-key and asymmetric-key algorithms. Symmetric encryption is the oldest method of encryption; here, a single secret key is used to encrypt and decrypt data. That the sender and the recipient share the key is a major disadvantage, since the key exchange channel can be monitored by an attacker to decrypt the data. In this study, we examine RSA and Twofish to implement a hybrid algorithm (HA). The performance of the hybrid method is compared with the existing hybrid technique and shows that the proposed technique delivers high security and privacy of patient data. The introduction of this research is presented in Sect. 42.1. Related works on this topic are presented in Sect. 42.2. In Sect. 42.3 the proposed models are discussed. The results and discussion are given in Sect. 42.4. Finally, the research is concluded in Sect. 42.5.
42.2 Related Works
Priyadarshini et al. [9] presented an enhanced encryption-based security framework for the CPS cloud. Owing to physical connectivity limitations, networks are more prone to security threats. The proposed method performs bulk data encryption and decryption in a transformed format. The encrypted data are then stored in the cloud database for security purposes. The route-discovery process is unique and transfers the data from one end to the other. The data are encrypted based on the source and destination. Cloud data security and various security algorithms were presented by Tyagi et al. [10]. In cloud computing, users do not need to carry all their documents with them, as cloud access is global. Security in the cloud is the main obstacle for the investment industry in adopting this trending technology and its services. Consequently, there is a need to provide security for the cloud. Their paper proposes a simple cloud algorithm and examines certain security algorithms and their application in a cloud investment environment. Imam et al. [11] provided a systematic and detailed review of RSA-based cryptography that covers cryptosystems with modifications to the core or the applications of the algorithm across different domains, systematically classifying them into several groups and finally providing solutions and indications. As a result, the paper guides researchers and practitioners in understanding the past and present status of cryptography along with the possibility of its application in other areas. Soni and Malik [12] presented an efficient cipher scheme for hybrid models by internal structure modification. The purpose of the proposed scheme is to improve the performance of the existing hybrid model by modifying the internal structure of each algorithm used in the model so as to reduce the number of rounds necessary for encryption and decryption. In that study, the performance of the existing hybrid cryptography model is analyzed against the proposed cipher scheme, which uses a reduced number of rounds.
42.3 Proposed Model
42.3.1 Twofish Algorithm
Twofish is a symmetric block cipher that was proposed as a candidate for the AES standard of the National Institute of Standards and Technology (NIST). It operates on 128-bit input blocks combined with a key and is designed for three key lengths: 128, 192, and 256 bits. Its features are a flexible design and strong keys. It is efficient, fast in both hardware and software, and suitable for a variety of platforms. Finally, it is also suitable for stream ciphering. The core of Twofish is a Feistel network with 16 rounds. At the start, Twofish splits the 128-bit plain text into four 32-bit words; each word is then combined by XOR with one of four 32-bit key words in the input whitening step. The result of whitening is passed to the F function and its components. The round function F is built from five kinds of operations, including four key-dependent S-boxes with an 8-bit input and an 8-bit output, followed by a fixed maximum distance separable (MDS) matrix. The pseudo-Hadamard transform (PHT) is a simple 32-bit mixing operation that uses addition modulo $2^{32}$. After 16 rounds, Twofish also performs output whitening (Fig. 42.1).
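Two of the Twofish building blocks named above, whitening and the pseudo-Hadamard transform, are simple enough to sketch directly. The fragment below is an illustration of those two operations only, assuming the standard PHT definition $a' = a + b \bmod 2^{32}$, $b' = a + 2b \bmod 2^{32}$; it is not a Twofish implementation (there are no S-boxes, MDS matrix, key schedule, or rounds), and the function names and sample words are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def whiten(words, key_words):
    """XOR each 32-bit data word with the matching 32-bit key word."""
    return [w ^ k for w, k in zip(words, key_words)]


def pht(a, b):
    """Pseudo-Hadamard transform: a' = a + b, b' = a + 2b (mod 2^32)."""
    return (a + b) & MASK32, (a + 2 * b) & MASK32


def inverse_pht(a2, b2):
    """Undo pht(): b = b' - a', then a = a' - b (mod 2^32)."""
    b = (b2 - a2) & MASK32
    a = (a2 - b) & MASK32
    return a, b


# Illustrative values only: a 128-bit block as four 32-bit words.
block = [0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210]
key_words = [0xA5A5A5A5, 0x5A5A5A5A, 0x0F0F0F0F, 0xF0F0F0F0]

whitened = whiten(block, key_words)
mixed = pht(whitened[0], whitened[1])
print([hex(w) for w in whitened], [hex(w) for w in mixed])
```

Both operations are invertible, which is what lets decryption retrace them: XOR whitening with the same key words restores the block, and `inverse_pht` restores the two input words.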
Fig. 42.1 Block diagram of proposed model
42.3.2 RSA Algorithm
RSA is an asymmetric cryptographic algorithm. It uses a public and private key encryption technique. The public key is given to everybody and consists of two numbers, one of which is the product of two large prime numbers. The private key is kept private and is also derived from the same two prime numbers. RSA keys are typically 1024 or 2048 bits long, although experts believe that 1024-bit keys might be broken in the near future. Until then, however, breaking them appears to be an impractical task. Key generation, key delivery, encryption, and decryption are the stages of RSA.
42.3.2.1 Key Generation
To generate a key, select two distinct prime numbers p and q. These numbers are chosen at random and kept secret for security. After selecting p and q, compute $n = p \cdot q$. The value of n is used as the modulus for both the public and private keys. Next, select an integer e that lies between 1 and $\lambda(n)$ and is coprime to $\lambda(n)$, where $\lambda(n) = \operatorname{LCM}(p - 1, q - 1)$. Finally, the private key d is computed using Eq. (42.1).

$$d = e^{-1} \bmod \lambda(n) \tag{42.1}$$

42.3.2.2 Key Delivery
To send information, the sender must know the recipient's public key. Public key distribution is done over a reliable channel, which does not need to be secret.

42.3.2.3 Encryption and Decryption
Encryption and decryption are carried out with the public and private keys, respectively: to encrypt data sent over a network, the sender uses the recipient's public key, and to decrypt the data, the recipient uses its private key.
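The three RSA stages described above can be traced with a textbook-sized example. The sketch below is a minimal illustration, assuming tiny primes and no padding, so it shows only the arithmetic of Eq. (42.1) and of modular exponentiation; it is not the chapter's Java implementation, and the function names and the sample primes 61 and 53 with exponent 17 are illustrative choices.

```python
import math


def rsa_keygen(p, q, e):
    """Return ((n, e), (n, d)) with d = e^-1 mod lambda(n), as in Eq. (42.1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lambda(n) = LCM(p-1, q-1)
    if not 1 < e < lam or math.gcd(e, lam) != 1:
        raise ValueError("e must lie between 1 and lambda(n) and be coprime to it")
    d = pow(e, -1, lam)  # modular inverse (Python 3.8+)
    return (n, e), (n, d)


def rsa_encrypt(public_key, m):
    """Sender side: c = m^e mod n, using the recipient's public key."""
    n, e = public_key
    if not 0 <= m < n:
        raise ValueError("message must satisfy 0 <= m < n")
    return pow(m, e, n)


def rsa_decrypt(private_key, c):
    """Recipient side: m = c^d mod n, using the private key."""
    n, d = private_key
    return pow(c, d, n)


# Toy parameters for illustration only; real keys use primes of 1024+ bits.
public_key, private_key = rsa_keygen(61, 53, 17)
ciphertext = rsa_encrypt(public_key, 65)
print(public_key, private_key, ciphertext, rsa_decrypt(private_key, ciphertext))
```

In a hybrid design of the kind this chapter proposes, the value passed to `rsa_encrypt` would typically be a symmetric session key rather than the data itself; that reading is an assumption here, since the chapter does not spell out the exact key-wrapping step.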
Table 42.1 Time analysis of encryption

| Data | AES | Twofish | DES | Proposed |
|---|---|---|---|---|
| 1 | 4.50 | 3.90 | 3.50 | 1.30 |
| 5 | 4.05 | 4.10 | 2.90 | 1.10 |
| 7 | 4.85 | 3.99 | 3.90 | 1.22 |
| 10 | 5.00 | 4.50 | 4.10 | 2.35 |
Table 42.2 Time analysis of decryption

| Data | AES | Twofish | DES | Proposed |
|---|---|---|---|---|
| 1 | 4.30 | 4.45 | 2.99 | 1.23 |
| 5 | 4.25 | 4.10 | 2.90 | 1.04 |
| 7 | 4.85 | 4.40 | 3.50 | 1.17 |
| 10 | 5.90 | 4.50 | 4.10 | 2.78 |
Fig. 42.2 Time comparison of encryption
42.4 Result and Discussion
Tables 42.1 and 42.2 show the encryption and decryption times measured for different input data sizes. The proposed technique has lower encryption and decryption times than other symmetric methods such as AES and Twofish. In Figs. 42.2 and 42.3, the DES, Twofish, and AES algorithms are compared with our hybrid algorithm. DES, Twofish, and AES fall under symmetric-key cryptography. Symmetric algorithms have the main advantage of faster execution and efficiency for large quantities of data. Figures 42.2 and 42.3 show that the proposed hybrid algorithm is more efficient than the other algorithms.
Fig. 42.3 Time comparison of decryption
42.5 Conclusion
The secure data storage problem is addressed by the proposed hybrid cryptography technique. The main problems of the cloud are the lack of adequate security and privacy. The proposed scheme is designed and implemented in Java, combining the best features of both Twofish (symmetric key) and RSA (asymmetric key). The RSA and Twofish algorithms are used for key generation, key delivery, encryption, and decryption. To store data in cloud storage, the data are encrypted with Twofish, and the keys are obtained using the RSA algorithm. This hybrid technique offers benefits such as fast encryption, large prime numbers for key generation, and efficient key management. The simulation results clearly show that the encryption and decryption times of the proposed hybrid method are better than those of the other approaches considered for comparison.
Acknowledgements The authors would like to thank the reviewers for all of their careful, constructive and insightful comments in relation to this work.
References
1. Mishra, S., Tyagi, A.K.: The role of machine learning techniques in internet of things-based cloud applications. In: Artificial Intelligence-Based Internet of Things Systems, pp. 105–135 (2022)
2. Akhtar, N., et al.: A comprehensive overview of privacy and data security for cloud storage. Int. J. Sci. Res. Sci. Eng. Technol. (2021)
3. Bhadra, S.: Cloud computing threats and risks: uncertainty and uncontrollability in the risk society (2020)
4. Krishna, R.R., et al.: State-of-the-art review on IoT threats and attacks: taxonomy, challenges and solutions. Sustainability 13(16), 9463 (2021)
5. Saad, W., et al.: Wireless Communications and Networking for Unmanned Aerial Vehicles. Cambridge University Press (2020)
6. Hashim, F.A., et al.: Archimedes optimization algorithm: a new metaheuristic algorithm for solving optimization problems. Appl. Intell. 51, 1531–1551 (2021)
7. Sander, B.: Democracy under the influence: paradigms of state responsibility for cyber influence operations on elections. Chin. J. Int. Law 18(1), 1–56 (2019)
8. Paiva, T.A.B.: Attacking and defending post-quantum cryptography candidates. Diss., Universidade de São Paulo (2022)
9. Priyadarshini, R., et al.: An enhanced encryption-based security framework in the CPS cloud. J. Cloud Comput. 11(1), 64 (2022)
10. Tyagi, K., Yadav, S.K., Singh, M.: Cloud data security and various security algorithms. J. Phys. Conf. Ser. 1998 (2021). IOP Publishing
11. Imam, R., et al.: Systematic and critical review of RSA based public key cryptographic schemes: past and present status. IEEE Access 9, 155949–155976 (2021)
12. Soni, P., Malik, R.: Efficient cipher scheme for hybrid models with internal structure modification
Chapter 43
Fuzzy-Based Cluster Head Selection for Wireless Sensor Networks R. Surendiran, D. Nageswari, R. Jothin, A. Jegatheesh, A. Ahilan, and A. Bhuvanesh
Abstract Selecting the cluster head (CH) in a WSN is a challenge. In existing methodologies, the CHs require high residual energy and also reduce the quality of the system. To make the system more efficient and to enhance its quality, a novel fuzzy-based sea lion optimization (FbSLO) mechanism has been developed in this research. Here, cluster head selection is based entirely on a fuzzy algorithm. Moreover, sea lion optimization is used to determine the shortest transmission path for sending the collected node information to the base station (BS). The performance of the proposed model is measured with four parameters: throughput, network lifetime, delay, and residual energy. Compared with existing models, the developed model attains better performance: a high throughput of about 75 Mbps and a low delay of about 55 ms.
R. Surendiran (corresponding author), School of Information Science, Annai College of Arts and Science, Kumbakonam 612503, India
D. Nageswari, Department of ECE, Nehru Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India
R. Jothin, Department of ECE, PSN Engineering College, Tirunelveli, Tamil Nadu, India
A. Jegatheesh, Department of ECE, Arunachala College of Engineering for Women, Manavilai, Nagercoil, Tamil Nadu 629203, India
A. Ahilan, Department of ECE, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
A. Bhuvanesh, Department of EEE, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_43
43.1 Introduction
A WSN consists of a large number of small sensor nodes that are densely deployed either inside the phenomenon to be detected or very close to it [1]. Each sensor node contains components for data sensing, data processing, and data communication [2]. In such networks, the locations of the sensor nodes are not predetermined. Applications of WSNs include sensing, capturing, monitoring, detection, and processing. Clustering is considered a major method for enhancing the lifetime of a network [3]. A group of sensor nodes is called a cluster, and selecting a cluster head (CH) for each cluster enhances the network lifetime [4]. The CH gathers the data from the nodes of its cluster and transmits the collected data to the BS. In this research, a fuzzy-based clustering algorithm is used for clustering [5], which helps to enhance the network lifetime [6]. A routing mechanism is used to find the shortest transmission path over the network [7]. A WSN contains numerous sensing nodes and therefore has many possible paths to the BS for transmitting the collected node information [8]. Routing by sea lion optimization helps to find the shortest communication path for data transmission [9]. The goal of the present research is to enhance the network lifetime by improving the performance parameters [10]. The basic clustering process is illustrated in Fig. 43.1. The rest of the chapter is organized as follows. Recent literature related to this topic is discussed in Sect. 43.2. The proposed model is explained in Sect. 43.3, its performance is detailed in Sect. 43.4, and Sect. 43.5 concludes the research.
Fig. 43.1 Clustering in WSN
43.2 Related Work
Some recent literature related to this topic is discussed below. Radhika and Rangarajan [11] developed a fuzzy-based clustering mechanism combined with machine learning (ML), in which a fuzzy algorithm is used for cluster head selection. Clustering helps to extend the lifespan of a WSN, but existing mechanisms suffer from power loss due to re-clustering and from poor lifetime. The major aim of that work is to decrease the clustering overhead and to increase the network lifetime. However, fuzzy-based systems attain poor accuracy. Murugaanandam and Ganapathy [12] developed fuzzy-based clustering to enhance performance in WSNs, using a reliability-oriented mechanism based on the technique for order of preference by similarity to ideal solution. In their design, the selection of cluster heads is based on the residual energy (RE), the energy usage rate, the number of neighboring nodes, the distance from the nearby nodes, and the reliability index of the node. However, fuzzy-based systems are not widely accepted because of poor accuracy. Robinson et al. [13] presented probability-based CH selection and a fuzzy-based multipath routing mechanism for extending the network lifetime in WSNs. In that work, a power-based routing mechanism is proposed to enhance the threshold rate and to improve energy efficiency. The cumulative residual node energy is used to validate the overall energy used by the network. Finally, the collected data are transmitted to the BS via multi-hop communication. However, the fuzzy-based system works with inaccurate input data. Thangaramya et al. [14] presented an energy-aware clustering and neuro-fuzzy-based routing algorithm for WSNs in IoT. In IoT, WSNs are used for collecting data and for sending the collected information to the BS. Packet drop, high energy depletion, high delay, and poor packet delivery ratio are considered the major demerits of existing IoT-based WSN models. To avoid these demerits, a neuro-fuzzy-based clustering and routing model was developed with higher efficiency. The comparative analysis shows that the model has a longer network lifetime, a higher packet delivery ratio, and a lower delay. Askari [15] developed a fuzzy C-means algorithm for clustering. Inaccuracies and misplaced clustering are caused by noise, and the severity of this problem can increase with cluster size. Moreover, in some cases, small clusters are removed by large clusters, which causes more noise. In that work, a fuzzy C-means (FCM) model is developed and its problems are discussed in detail. Apart from FCM, other existing mechanisms related to this topic are also discussed. Among them, the revised FCM is able to reduce noise more efficiently than the other existing algorithms.
The key contributions of the research work are as follows:
• Initially, clusters are formed from the WSN nodes. After that, a cluster head is selected for quick and easy data transmission.
• The cluster head selection is based on a fuzzy-based optimization algorithm.
• The cluster head collects all the data present in the sensor nodes and passes it to the BS.
• Sea lion optimization (SLO) is used for routing, that is, for choosing the shortest transmission path for the data.
• The performance of the proposed model is validated with different metrics such as throughput, network lifetime, delay, and residual energy.
43.3 Proposed Model
In existing methodologies such as fuzzy C-means [15] and ML [11], poor throughput, high delay, high noise, and poor network lifetime are considered the major demerits. To avoid these demerits, a novel fuzzy clustering-based sea lion optimization (FCbSLO) model is developed. In this research work, a fuzzy clustering algorithm is used for clustering and the SLO algorithm is used for routing. The routing process finds the shortest path in the WSN for transmitting the collected node information to the BS. The overall working of the proposed model is illustrated in Fig. 43.2.
Fig. 43.2 Proposed model
43.3.1 Fuzzy Clustering Algorithm
A group of sensor nodes in the WSN is called a cluster. For quick and easy data transmission, a cluster head is chosen in each cluster. The cluster head collects all the information present in the sensor nodes and then provides the collected data to the BS. To increase the performance of the proposed system, a fuzzy-based clustering algorithm is used in this research. In fuzzy clustering, a single data point can belong to more than one cluster.
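The idea that one data point can belong to several clusters can be illustrated with the standard fuzzy C-means membership formula. The following is only a minimal sketch and not the chapter's cluster head selection procedure: the function name, the fuzzifier value m = 2, and the node and center coordinates are assumptions made for illustration.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Return the fuzzy C-means membership of `point` in each cluster center.

    `m` is the fuzzifier (m > 1); the memberships always sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    zero_hits = [d == 0.0 for d in dists]
    if any(zero_hits):
        # The point coincides with one or more centers: share full membership.
        share = 1.0 / sum(zero_hits)
        return [share if hit else 0.0 for hit in zero_hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]

# Hypothetical layout: two cluster centers and one sensor node between them.
centers = [(0.0, 0.0), (10.0, 0.0)]
node = (2.0, 0.0)
memberships = fcm_memberships(node, centers)
print(memberships)  # the node belongs mostly, but not only, to the first cluster
```

A node near the first center receives a membership close to 1 for that cluster and a small but non-zero membership for the other one, which is the partial-membership behavior described above.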
43.3.2 Sea Lion Optimization
SLO is a meta-heuristic algorithm based on the hunting and prey detection characteristics of sea lions, and it is used in this research for the routing process. Routing finds the shortest transmission path, which helps the system to transmit the data to the BS more quickly than the existing models.
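The chapter does not list the SLO update rules, so they are not reproduced here. As a reference point for what "shortest transmission path" means, the sketch below uses Dijkstra's algorithm, a deterministic technique swapped in for illustration and not the sea lion optimization itself. The topology, node names, and link costs are hypothetical.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm: return (total_cost, path) from source to target."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    # Walk back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]

# Hypothetical topology: three cluster heads and the base station (BS).
links = {
    "CH1": {"CH2": 4.0, "CH3": 1.0},
    "CH3": {"CH2": 1.5, "BS": 6.0},
    "CH2": {"BS": 2.0},
}
cost, path = shortest_path(links, "CH1", "BS")
print(cost, path)
```

A metaheuristic such as SLO would search the same space of candidate routes stochastically; an exact solver like this one can serve as a baseline for checking the routes it returns on small networks.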
43.4 Result and Discussion
This section discusses the performance of the proposed model in detail. The improved parameter values show that the system is able to select the cluster head more efficiently than the existing models. The performance of the proposed model is measured through different metrics: throughput, residual energy, network lifetime, and delay. These metrics are measured and then compared with existing models, namely the hybrid grey wolf and crow search optimization algorithm oriented optimal CHs (HGWCSOA-OCHS) [16], synchronous firefly algorithm related CHs (SFA-CHS) [16], artificial bee colony oriented CHs (ABC-CHS) [16], and firefly cyclic grey wolf optimization related CHs (FCGWO-CHS) [16] models. Among them, the developed design attains the better performance score. The values correspond to a network of 1000 sensor nodes.
43.4.1 Throughput and Delay
In this research, throughput is defined as the total amount of information that can be passed to the BS in a certain time period. The proposed model attains 75 Mbps, which is high compared with the other four existing techniques. The throughput of the proposed model is measured through Eq. (43.1),

$$t = \frac{1}{T} \tag{43.1}$$

where $t$ refers to the throughput of the developed design and $T$ is the total time the system takes to transmit the collected information to the BS. Delay refers to the amount of delay incurred in the data transmission of the developed model. The proposed design achieves a low delay of about 55 ms. Compared with the other existing models, the proposed model attains a very low delay. The comparison of throughput and delay with the existing models is illustrated in Fig. 43.3.

Fig. 43.3 Comparison of delay and throughput
43.4.2 Network Lifetime and Residual Energy
In a WSN, the time until the first sensor node exhausts its energy is termed the lifetime of the network. The proposed model attains a network lifetime of 90%, which is higher than that of the existing models. The network lifetime of the proposed model is calculated through Eq. (43.2),

$$\beta(e) = \frac{e_0}{e_n(t)} \tag{43.2}$$

where $\beta(e)$ is the network lifetime of the model, $e_0$ refers to the initial node energy, and $e_n(t)$ is the energy consumption of the node per unit time. The residual energy of the proposed design is 200 J, which is higher than that of the existing models. Residual energy is the energy remaining in the system for transmitting the data over the network. The comparison of residual energy and network lifetime is given in Fig. 43.4, and the overall performance of the proposed model is given in Table 43.1.

Fig. 43.4 Comparison of network lifetime and residual energy

Table 43.1 Overall performance of the proposed design
Parameter            Value
Network lifetime     90%
Residual energy      200 J
Throughput           75 Mbps
Delay                55 ms
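Equation (43.2) can be made concrete with a short worked sketch. It assumes a constant per-round energy consumption for each node and takes the network lifetime as the time until the first node is exhausted, as defined in the text. The function names and all energy figures are hypothetical and are not taken from the chapter's simulation.

```python
def node_lifetime(initial_energy_j, consumption_j_per_round):
    """Eq. (43.2): lifetime = e0 / e_n(t), expressed in rounds."""
    if consumption_j_per_round <= 0:
        raise ValueError("consumption must be positive")
    return initial_energy_j / consumption_j_per_round

def network_lifetime(initial_energies, consumption_rates):
    """Time until the first node runs out of energy."""
    return min(
        node_lifetime(e0, rate)
        for e0, rate in zip(initial_energies, consumption_rates)
    )

# Hypothetical nodes: equal initial energy, different per-round consumption.
energies = [2.0, 2.0, 2.0]       # joules
rates = [0.004, 0.005, 0.002]    # joules per round
print(network_lifetime(energies, rates))  # limited by the most heavily loaded node
```

With these numbers the second node drains fastest, so it bounds the lifetime at roughly 400 rounds; this is why balancing the cluster head load matters for the metric.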
43.5 Conclusion
In this research work, fuzzy-based sea lion optimization is developed to select the CHs in a WSN environment more effectively. Initially, the sensor nodes in the WSN are formed into clusters. After cluster formation, a cluster head is chosen through a fuzzy-based algorithm. The data collected from the sensor nodes are then passed to the BS. Before the data are transmitted, routing is performed with sea lion optimization to select the shortest path in the network. The performance of the proposed model is measured in terms of different metrics and compared with the existing models. The throughput of the proposed model is 75 Mbps, an improvement of 18% over the existing models. The proposed model achieves a low delay of about 55 ms, a reduction of 190 ms compared with the existing models. The network lifetime of the developed design is about 90%, an enhancement of 11%. Finally, the residual energy of the network is about 200 J, an improvement of 10 J. The improved performance of the proposed model shows that the developed system is able to transmit the node information to the BS more efficiently.
References
1. Bochie, K., et al.: A survey on deep learning for challenged networks: applications and trends. J. Netw. Comput. Appl. 194, 103213 (2021)
2. Adimoolam, M., et al.: Green ICT communication, networking and data processing. In: Green Computing in Smart Cities: Simulation and Techniques, pp. 95–124 (2021)
3. Xu, C., et al.: An energy-efficient region source routing protocol for lifetime maximization in WSN. IEEE Access 7, 135277–135289 (2019)
4. Sudha, C., Suresh, D., Nagesh, A.: An enhanced dynamic cluster head selection approach to reduce energy consumption in WSN. In: Innovations in Electronics and Communication Engineering: Proceedings of the 8th ICIECE 2019. Springer Singapore (2020)
5. Yadav, R.K., Mahapatra, R.P.: Hybrid metaheuristic algorithm for optimal cluster head selection in wireless sensor network. Pervasive Mob. Comput. 79, 101504 (2022)
6. Salem, A., Amer, O., Shudifat, N.: Enhanced LEACH protocol for increasing a lifetime of WSNs. Pers. Ubiquitous Comput. 23, 901–907 (2019)
7. Mittal, M., et al.: Analysis of security and energy efficiency for shortest route discovery in low-energy adaptive clustering hierarchy protocol using Levenberg-Marquardt neural network and gated recurrent unit for intrusion detection system. Trans. Emerg. Telecommun. Technol. 32(6), e3997 (2021)
8. Toor, A.S., Jain, A.K.: Energy aware cluster based multi-hop energy efficient routing protocol using multiple mobile nodes (MEACBM) in wireless sensor networks. AEU-Int. J. Electron. Commun. 102, 41–53 (2019)
9. Kumar Pulligilla, M., Vanmathi, C.: An authentication approach in SDN-VANET architecture with Rider-Sea Lion optimized neural network for intrusion detection. Internet Things 100723 (2023)
10. Sharma, H., Haque, A., Jaffery, Z.A.: Maximization of wireless sensor network lifetime using solar energy harvesting for smart agriculture monitoring. Ad Hoc Netw. 94, 101966 (2019)
11. Radhika, S., Rangarajan, P.: On improving the lifespan of wireless sensor networks with fuzzy based clustering and machine learning based data reduction. Appl. Soft Comput. 83, 105610 (2019)
12. Murugaanandam, S., Ganapathy, V.: Reliability-based cluster head selection methodology using fuzzy logic for performance improvement in WSNs. IEEE Access 7, 87357–87368 (2019)
13. Robinson, Y.H., et al.: Probability-based cluster head selection and fuzzy multipath routing for prolonging lifetime of wireless sensor networks. Peer-to-Peer Netw. Appl. 12, 1061–1075 (2019)
14. Thangaramya, K., et al.: Energy aware cluster and neuro-fuzzy based routing algorithm for wireless sensor networks in IoT. Comput. Netw. 151, 211–223 (2019)
15. Askari, S.: Fuzzy C-means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: review and development. Exp. Syst. Appl. 165, 113856 (2021)
16. Subramanian, P., et al.: A hybrid grey wolf and crow search optimization algorithm-based optimal cluster head selection scheme for wireless sensor networks. Wirel. Pers. Commun. 113, 905–925 (2020)
Chapter 44
An Optimized Cyber Security Framework for Network Applications B. Veerasamy, D. Nageswari, S. N. Kumar, Anil Shirgire, R. Sitharthan, and A. Jasmine Gnana Malar
Abstract The evolution of computer networks and the Internet of Things (IoT) in various fields increases privacy and security concerns. The increased usage of network-related applications demands a cost-efficient cyber security framework to protect the system from attackers. In this article, an optimized neural-based cyber security model named Golden Eagle-based Dense Neural System (GEbDNS) is designed to detect intrusions in the network. Initially, the network dataset CICIDS 2017 is collected and imported into the system. The dataset contains both normal and abnormal data. The raw dataset is pre-processed to eliminate training flaws and errors, and the important data features are extracted. The optimal data features are then selected for the detection phase using the optimal solution of golden eagle optimization, and the selected features are matched with the trained attack data for attack classification. Finally, the results are evaluated and compared with existing techniques in terms of accuracy, true-positive rate, and false-positive rate. The experimental analysis shows that the developed security framework outperforms the traditional schemes with a greater accuracy of 98.45%.
B. Veerasamy (corresponding author), Department of Electronics and Communication Engineering, Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu 626126, India
D. Nageswari, Department of ECE, Nehru Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India
S. N. Kumar, Department of Electrical and Electronics Engineering, Amal Jyothi College of Engineering, Kanjirappally, Kottayam, India
A. Shirgire, Civil Engineering, Dr D Y Patil Institute of Technology, Pimpri, Pune, Maharashtra 411018, India
R. Sitharthan, Department of Electrical and Electronics Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
A. Jasmine Gnana Malar, Department of Electrical and Electronics Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_45
44.1 Introduction
Concerns over privacy and security in computer networks are growing rapidly, and computer security has become a basic requirement for exchanging information [1]. The wide usage of network applications and modern technologies such as the Internet of Things (IoT) and cloud computing increases the possibility of security threats [2]. In IoT, a huge number of network devices such as sensors, computers, and servers are interconnected without the need for human intervention [3]. The IoT network plays a significant role in various day-to-day applications such as healthcare units, smart grids, farming, and transportation. These IoT-connected applications can be accessed by anyone from any place over an Internet connection. They save significant time and resources, and thus open enormous opportunities for the transfer of information and knowledge, and for growth [4]. The Internet is the core of IoT applications, so security threats over the Internet affect the performance of IoT applications [5]. In IoT networks, the sensor nodes gather information from fields such as agriculture, healthcare, military, transportation, and industry and store it in a cloud server for further data analysis. Cyber-attacks on IoT applications lead to false-data injection, which enables attackers to alter the collected information [6]. Therefore, developing an effective cyber security framework is important to protect the system from attackers. Cyber-attacks in the network increase traffic and congestion, which leads to loss of information [7]. The traditional cyber security framework includes a training and testing process in which the system is trained to detect attacks. Although the conventional approach identifies cyber-attacks effectively, the massive increase in the amount of information available in the network makes it inappropriate. Thus, the traditional schemes are not suitable for large-scale network datasets [8]. To overcome these challenges, Artificial Intelligence (AI) techniques are applied in cyber security models. AI techniques include machine learning (ML) and deep learning (DL) approaches to detect cyber-attacks [9]. ML-based methods such as Random Forest, Support Vector Machine, and Gradient Boosting involve training the system with numerous resources to predict attacks, whereas DL-based techniques involve feature extraction and selection to track cyber-attacks [10]. The comparative security performance of ML- and DL-based techniques shows that deep neural systems perform better in terms of true-positive rate and detection accuracy. However, the detection of unknown attacks is not possible using these techniques. Moreover, they utilize huge resources and increase the time complexity. Thus, in this article an optimal deep learning algorithm is designed to detect cyber-attacks effectively. The major contributions of the proposed work are as follows:
• Initially, a network intrusion dataset is collected and initialized in the MATLAB system.
• A hybrid optimized deep neural framework named GEbDNS is designed in the system with a detection mechanism to identify cyber-attacks.
• The raw dataset is standardized using the pre-processing function, and the data features are tracked using the optimal fitness solution.
• The outcomes of the proposed method are evaluated in terms of true-positive rate, false detection rate, accuracy, etc., and a comparative analysis is performed to validate the results.
The article is arranged as follows: the motivation for cyber security is illustrated in Sect. 44.2, the proposed technique is explained in Sect. 44.3, the results of the proposed method are described in Sect. 44.4, and the conclusion of the article is given in Sect. 44.5.
44.2 Related Work
A cyber-attack is the act of hijacking the information or data available in a system over the Internet. In day-to-day life, Internet-related applications are growing rapidly, which leads to network vulnerabilities. This huge usage of data demands an effective cyber security framework that accurately predicts and neglects the attacks in the network. Various security models have been developed in the past to detect intrusions. However, the traditional schemes are not applicable to large-scale networks. To analyze huge volumes of network data, ML- and DL-based techniques are integrated into the security system to detect attacks in a timely manner. Although the AI techniques attain greater accuracy, they demand huge resources to train the system and increase the computational cost. Therefore, in this paper an optimized neural-based approach is developed to detect cyber-attacks optimally.
44.3 Proposed GEbDNS Model for Cyber-Attack Detection
A hybrid Golden Eagle-based Dense Neural System (GEbDNS) is designed to detect cyber-attacks in the IoT network. The designed scheme incorporates a dense neural network (DNN) and golden eagle optimization (GEO). Initially, the IDS dataset is collected and imported into the system. To initialize the detection process, the raw dataset is standardized using the pre-processing function. In the proposed model, the DNN features are utilized as the filtering mechanism. The optimal fitness solution of the GEO is applied to track and extract the data features. The optimal data features are then selected from the extracted features for attack detection. The framework of the proposed model is illustrated in Fig. 44.1. The results of the presented algorithm are estimated and validated against the traditional schemes in a comparative analysis.

Fig. 44.1 Proposed GEbDNS framework
44.3.1 Data Collection and Pre-processing
The initial phase of cyber-attack detection is data collection, in which the data are gathered from the Kaggle site and imported into the system. The imported dataset contains errors and null values. Thus, the dataset is pre-processed to eliminate the training flaws, errors, and null values. The pre-processing mechanism not only removes the errors but also increases the detection accuracy and reduces the computational time. In the developed scheme, the DNN attributes are integrated in the pre-processing phase to filter the raw dataset. The pre-processing function is represented in Eq. (44.1),

$$\mathrm{Pre}[d_t, n_{dt}] = \frac{1}{2\pi\alpha}\exp(d_t - n_{dt})^{2} \tag{44.1}$$

where $\mathrm{Pre}$ represents the pre-processing function, $d_t$ denotes the input data, $n_{dt}$ indicates the noise data, and $\alpha$ refers to the pre-processing variable. The next step of attack detection is feature extraction.
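The cleaning goal described above (removing null values and errors, then standardizing the data) can be sketched in a few lines. This is a generic illustration and not the DNN-based filter of Eq. (44.1): the helper names, the use of z-score standardization, and the sample rows are assumptions.

```python
import math
import statistics

def clean_rows(rows):
    """Drop rows that contain missing (None), NaN, or infinite values."""
    return [
        row for row in rows
        if all(v is not None and math.isfinite(v) for v in row)
    ]

def standardize(rows):
    """Scale every column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(v - mean) / std for v, mean, std in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical flow records with two numeric features each.
raw = [
    [1.0, 10.0],
    [2.0, None],          # null value, dropped
    [3.0, 30.0],
    [float("inf"), 5.0],  # invalid value, dropped
    [5.0, 50.0],
]
cleaned = clean_rows(raw)
scaled = standardize(cleaned)
print(len(cleaned), scaled[1])
```

Standardizing after cleaning keeps a single invalid entry from distorting the column statistics that every other record is scaled by.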
44.3.2 Feature Extraction
Feature extraction is one of the important steps in the attack detection algorithm. In this phase, the meaningful data features are tracked and extracted using the golden eagle fitness function. GEO is a meta-heuristic approach developed to solve multi-objective problems. Here, the attack (exploitation) attribute of the GEO fitness function is utilized to track and extract the meaningful features. The optimal data features are then selected from the extracted features using the prey selection attribute. The feature extraction and selection are represented in Eqs. (44.2) and (44.3),

$$G_{ext} = \grave{e} \sum_{i=1}^{m} m_{dt_i} \cdot m'_{dt_i} \tag{44.2}$$

$$F_{sel} = \kappa \left( m_{dt} - m'_{dt} \right) \tag{44.3}$$

where $G_{ext}$ indicates the feature extraction function, $\grave{e}$ denotes the feature tracking variable, $m_{dt}$ refers to the meaningful data, $m'_{dt}$ indicates the meaningless data, $F_{sel}$ denotes the feature selection function, and $\kappa$ is the GEO optimal fitness solution. Thus, the optimal features are selected from the dataset for attack detection. The selected features are matched with the trained attack data for detection. If the selected features match the trained attack features, the data are predicted as "Malicious data". If the selected features do not match the trained attack features, the data are detected as "Benign data". Thus, the presented scheme detects cyber-attacks in the network effectively.
44.4 Result and Discussion
In recent times, the growth of network-related applications has increased the possibility of cyber-attacks over the Internet. In this article, a hybrid cyber security framework is designed to detect attacks in the network effectively. The presented model is tested and validated with the CICIDS 2017 dataset and the results are evaluated. In the developed model, the golden eagle optimal fitness is integrated in the feature extraction phase to track and select the optimal features for attack detection. The results are estimated by executing the proposed technique in MATLAB version R2020a. Furthermore, a comparative analysis is performed to validate the obtained results against conventional cyber security models. The performance of the proposed technique is tabulated in Table 44.1, and its comparison with the existing techniques is displayed in Fig. 44.2.

Table 44.1 Performance assessment
Metrics                      Performance
Accuracy (%)                 98.45
False-positive rate (%)      1.02
True-positive rate (%)       98.54
Computational time (ms)      7.45

Fig. 44.2 Comparative analysis
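The metrics reported in Table 44.1 follow the usual confusion-matrix definitions. The sketch below shows how accuracy, true-positive rate, and false-positive rate are computed from predicted and actual labels; the function name and the ten sample labels are hypothetical and unrelated to the CICIDS 2017 results.

```python
def detection_metrics(y_true, y_pred, positive="Malicious"):
    """Return accuracy, true-positive rate, and false-positive rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        # Share of real attacks that were detected.
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of benign records that were wrongly flagged.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }

# Hypothetical labels: 4 malicious and 6 benign records.
actual = ["Malicious"] * 4 + ["Benign"] * 6
predicted = (["Malicious"] * 3 + ["Benign"]      # one attack missed
             + ["Malicious"] + ["Benign"] * 5)   # one false alarm
print(detection_metrics(actual, predicted))
```

Reporting the false-positive rate alongside accuracy matters for intrusion datasets, where benign traffic typically dominates and accuracy alone can hide a high false-alarm count.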
44.5 Conclusion
In this article, a novel cyber security framework is developed with a dense neural system and golden eagle optimization to detect cyber-attacks effectively. The framework involves data pre-processing, feature extraction, feature selection, and attack detection. The fitness function of GEO is utilized to select the optimal data features for attack detection. The presented approach is trained and tested with a network intrusion dataset. The outcomes of the presented approach are evaluated and compared with existing techniques in the comparative analysis, from which the performance improvement is also determined. In the developed scheme, accuracy and true-positive rate are increased by 3.56% and 11.58%, respectively. In addition, the false-positive rate is reduced by 2.87%.
Acknowledgements The authors would like to thank the reviewers for all of their careful, constructive and insightful comments in relation to this work.
References
1. Arauz, T., Chanfreut, P., Maestre, J.M.: Cyber-security in networked and distributed model predictive control. Annu. Rev. Control 53, 338–355 (2022)
2. Al-Sanjary, O.I., et al.: Challenges on digital cyber-security and network forensics: a survey. In: Advances on Intelligent Informatics and Computing: Health Informatics, Intelligent Systems, Data Science and Smart Computing, pp. 524–537. Springer International Publishing, Cham (2022)
3. Mandru, D.B., et al.: Assessing deep neural network and shallow for network intrusion detection systems in cyber security. In: Computer Networks and Inventive Communication Technologies: Proceedings of Fourth ICCNCT 2021. Springer Singapore (2022)
4. Zhu, J., et al.: A few-shot meta-learning based siamese neural network using entropy features for ransomware classification. Comput. Sec. 117, 102691 (2022)
5. Ullah, I., Mahmoud, Q.H.: An anomaly detection model for IoT networks based on flow and flag features using a feed-forward neural network. In: 2022 IEEE 19th Annual Consumer Communications and Networking Conference (CCNC). IEEE (2022)
6. Lo, W.W., et al.: E-GraphSAGE: a graph neural network-based intrusion detection system for IoT. In: NOMS 2022–2022 IEEE/IFIP Network Operations and Management Symposium. IEEE (2022)
7. Tekerek, A., Yapici, M.M.: A novel malware classification and augmentation model based on convolutional neural network. Comput. Sec. 112, 102515 (2022)
8. Kanna, P.R., Santhi, P.: Hybrid intrusion detection using mapreduce based black widow optimized convolutional long short-term memory neural networks. Exp. Syst. Appl. 194, 116545 (2022)
9. Gehlot, A., et al.: Application of neural network in the prediction models of machine learning based design. In: 2022 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES). IEEE (2022)
10. Zhang, Z., et al.: Artificial intelligence in cyber security: research advances, challenges, and opportunities. Artif. Intell. Rev. 1–25 (2022)
11. Evangelou, M., Adams, N.M.: An anomaly detection framework for cyber-security data. Comput. Sec. 97, 101941 (2020)
12. Hossein, M.R., et al.: Anomaly detection in cyber-physical systems using machine learning. In: Handbook of Big Data Privacy, pp. 219–235 (2020)
13. Jia, Y., et al.: Adversarial attacks and mitigation for anomaly detectors of cyber-physical systems. Int. J. Crit. Infrast. Prot. 34, 100452 (2021)
14. Alguliyev, R., Imamverdiyev, Y., Sukhostat, L.: Hybrid DeepGCL model for cyber-attacks detection on cyber-physical systems. Neural Comput. Appl. 33(16), 10211–10226 (2021)
Chapter 45
Modified Elephant Herd Optimization-Based Advanced Encryption Standard R. Surendiran, S. Chellam, R. Jothin, A. Ahilan, S. Vallisree, A. Jasmine Gnana Malar, and J. Sathiamoorthy
Abstract Cryptography is commonly employed to ensure secure data transfer over insecure communication networks. With the rising need for confidentiality and privacy in picture transmission, an efficient encryption approach becomes vital. This work presents the architectural flow of a newly created technique, an efficient Modified Elephant Herd Optimization-based Advanced Encryption Standard (MEHO-AES) cryptographic method for digital picture security. The suggested MEHO-AES approach employs a multilevel discrete cosine transform (DCT) for image decomposition, with the input picture divided into RGB components for examining the fundamental colors of all image sections. The AES encryption technology is used throughout the encryption procedure, and the MEHO method is used to determine the best encryption keys.
R. Surendiran (corresponding author), School of Information Science, Annai College of Arts and Science, Kumbakonam 612503, India
S. Chellam, Department of Electrical and Electronics Engineering, Velammal College of Engineering and Technology (Autonomous), Madurai, Tamil Nadu 625009, India
R. Jothin, Department of Electronics and Communication Engineering, PSN Engineering College, Tirunelveli, Tamil Nadu, India
A. Ahilan, Department of Electronics and Communication Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
S. Vallisree, Department of Electronics and Communication Engineering, Geethanjali College of Engineering and Technology, Hyderabad, India
A. Jasmine Gnana Malar, Department of Electrical and Electronics Engineering, PSN College of Engineering and Technology, Tirunelveli, Tamil Nadu 627152, India
J. Sathiamoorthy, Department of Computer Science and Engineering, R.M.K. Engineering College, Kavaraipettai, Tamil Nadu 601206, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023. V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_46
45.1 Introduction Cryptography, sometimes known as cryptology, is the art and study of secure communication techniques in the presence of adversarial behavior [1]. In a broader sense, cryptography is the development and application of techniques that prevent third parties or the general public from reading private messages [2]. In cryptography, encryption is the process of encoding information: it transforms plaintext, the original representation of the information, into ciphertext, an alternative representation [3]. In symmetric key cryptography, identical keys are used for both encryption and decryption [4]. The secret key is known only to the intended transmitter and receiver. As a result, a symmetric key cryptosystem demands a secure means of transporting the key [5]. In an asymmetric key cryptosystem, by contrast, non-identical keys are used. The main idea is that while it is computationally feasible to generate a pair of public and private keys, it remains difficult for another party to obtain the private key from the public key [6]. The Advanced Encryption Standard (AES) is a cryptographic method that, when applied correctly, may be used to protect data [7]. AES operates on data blocks arranged as 4 × 4 matrices of bytes. As a symmetric block cipher, it uses the same key to encrypt and decrypt information [8]. AES is used to safeguard vital data in software and hardware all around the world. It is critical for government computer security, cybersecurity, and the safeguarding of electronic data [9]. The main issue with AES symmetric key encryption is that the key must be delivered to the entity with whom data is exchanged [10]. Symmetric encryption keys are therefore frequently encrypted and transferred separately using an asymmetric technique such as RSA.
The main contributions of this work are as follows:
• An efficient Modified Elephant Herd Optimization-based Advanced Encryption Standard (MEHO-AES) is designed as a cryptographic method for digital image security.
• The MEHO-AES model employs a multilevel discrete cosine transform (DCT) for image decomposition, with the input image divided into RGB components for examining the primary colors of each image component.
• The AES encryption method is used throughout the encryption procedure, and the MEHO method is used to choose the best encryption keys.
The remainder of the paper is organized as follows. The literature review is presented in Sect. 45.2. The proposed methodology is described in Sect. 45.3. Section 45.4 presents the experimental results, and Sect. 45.5 gives the conclusion and future work.
45.2 Literature Review In 2022, El-Shafai et al. [11] proposed a hybrid optical-based method for the fast and secure transmission of medical images, in either color or grayscale, through unreliable channels. It employs effective steganography, encryption, and hashing methods. The final ciphertext medical image is created by encrypting the digital pixels of the image components using a Rubik's cube-based encryption technique. In 2020, Zhou et al. [12] proposed an image encryption technique based on double random-phase encoding (DRPE) and compressed sensing (CS) with reliable and blind authentication capability. DRPE is used to extract the phase information of the plaintext image, which is then quantized to produce authentication information. CS compresses the plaintext image, and a sigmoid map is used to quantize the measurements. In 2020, Chai et al. [13] developed a color image compression and encryption scheme based on compressive sensing and a double random encryption strategy. A discrete wavelet transform (DWT) is first used to convert the red, green, and blue components of a color plain image into three sparse coefficient matrices. The coefficient matrices are then scrambled further by a double random position permutation (DRPP). In 2022, Rani et al. [14] developed a technique for encrypting images based on the fused magic cube, which is created by fusing two magic cubes of the same or different orders. The fused magic cube's structural complexity, larger key space, and pseudo-randomness properties enable its use in the encryption of digital images. The approach does not require any additional components because the fused magic cube is used in both the confusion and diffusion phases. In 2022, Zhao et al. [15] suggested a color image encryption technique based on the alternate quantum walk and a controlled Rubik's Cube transformation. The three color channels of the image (R, G, and B) are first separated from one another. After that, each image pixel is randomly allocated to one of the six Rubik's Cube faces, which are then divided up and arranged in a certain order on a two-dimensional plane. In 2022, Liu et al. [16] suggested a more effective version of the Advanced Encryption Standard (AES) that uses the alternating quantum walk (AQW) and its probability distribution matrix to generate the keystream. The AQW was applied to the conventional AES algorithm, which theoretically results in a secure key for the algorithm by combining the chaotic dynamics of the two systems.
45.3 Proposed Methodologies An efficient MEHO-based Advanced Encryption Standard (MEHO-AES) is built as a cryptographic method for digital image security. The MEHO-AES model employs a multilevel discrete cosine transform (DCT) for image decomposition, with the input image split into RGB components for examining the fundamental colors of all image sections. The AES encryption technique is used throughout the encryption procedure, and the MEHO method is used to determine the best encryption keys. Figure 45.1 shows the flow diagram of the proposed methodology.
45.3.1 Medical Data The term “medical data” refers to a broad category of information on a patient, including their history, clinical findings, diagnostic test results, pre-operative care, operation notes, post-operative care, and daily records of their progress and medication. Health data classification (also known as medical coding or medical classification) is the process of converting the nomenclature of medical diagnoses and procedures into a widely accepted code number system.
45.3.2 Separation of RGB Components RGB is the abbreviation for red, green, and blue. The RGB color model is a technique for specifying the colors used on a visual display device. Colors are created by mixing different ratios of red, green, and blue, and the model's name is formed from the initials of these three additive primary colors. A good RGB separation is indicated by RGB lines in ColourSpace that are close to the white line (or the black line in LightSpace). After calibration, the RGB separation will be outstanding. It should be noted that the RGB lines will not cross the white/black line. RGB channels are used in computer displays and roughly correspond to the color receptors of the human eye.
Fig. 45.1 Overall flow diagram of proposed method
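The channel separation described above can be illustrated with a minimal sketch. It assumes the image is held as rows of (r, g, b) tuples; the function names `split_rgb` and `merge_rgb` and the 2 × 2 test image are hypothetical and are not taken from the paper.

```python
def split_rgb(image):
    """Split an image given as rows of (r, g, b) tuples into three 2-D channel planes."""
    red = [[px[0] for px in row] for row in image]
    green = [[px[1] for px in row] for row in image]
    blue = [[px[2] for px in row] for row in image]
    return red, green, blue


def merge_rgb(red, green, blue):
    """Inverse of split_rgb: interleave three channel planes back into (r, g, b) pixels."""
    return [list(zip(r_row, g_row, b_row)) for r_row, g_row, b_row in zip(red, green, blue)]


# Hypothetical 2 x 2 test image with 8-bit channel values.
image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]
red, green, blue = split_rgb(image)
```

Keeping the three planes separate allows each one to be transformed and encrypted independently before the planes are merged again.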
45.3.3 Discrete Cosine Transform The DCT is applied to extract the coefficients that are subsequently encrypted. In the DCT stage, the picture is transformed first, the message is then embedded inside the image, and the LSB-sensitive information is included in the “Sensitive Bits.” As a result, the operational characteristics of these models vary depending on the variables used. Before being validated on a regression framework, the ELM approach is initially sampled separately from a picture. Choosing the ideal location for message embedding is therefore linked to the estimated metrics. Homogeneity, contrast, and texture are a few of the training features. ELM is also used to address overfitting. Extended ELM is intended to outperform existing techniques in terms of imperceptibility.
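As an illustration of the transform itself, the following is a minimal sketch of a single-level, orthonormal 2-D DCT-II on a square block, written with the standard library only. The paper does not specify the DCT variant, block size, or number of levels, so these choices, the helper names, and the sample block are assumptions.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix C, so that C times x is the DCT of x."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2-D DCT of a square block: C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C (valid because C is orthogonal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


# Hypothetical 4 x 4 block of 8-bit intensities from one color channel.
block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
coeffs = dct2(block)
restored = idct2(coeffs)
```

Because the transform matrix is orthogonal, the inverse recovers the block up to floating-point error, which is the property a decryption stage would rely on.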
45.3.4 Elephant Herd Optimization-Based Advanced Encryption Standard Improved EHO Algorithm: Careful tuning of the model's parameters allows it to find the global minimum with dependable convergence while balancing the exploration and exploitation phases. Adaptive measures and an accurate shift between exploration and exploitation are useful for further optimization. One half of the iterations is used for exploration and the other half for exploitation. The goal of maximal search space exploration is to reduce the likelihood of stagnating in a local optimum. Excessive exploration, however, behaves like random search and produces no better outcomes, whereas exploitation requires minimal randomness. Exploration and exploitation should thus be kept in balance.

$$B = 2\left(1 - \frac{d}{D}\right), \tag{45.1}$$

where $d$ denotes the current iteration and $D$ the maximum number of iterations. The proposed MEHO model uses an exponential function to decay the value. Suppose that

$$B = 1 - \frac{d^{2}}{D^{2}}. \tag{45.2}$$

The exponential decay function is applied over the maximum number of iterations to balance exploration and exploitation.
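The two schedules can be compared numerically with a short sketch. It reads Eq. (45.1) as B = 2(1 − d/D) and Eq. (45.2) as B = 1 − d²/D², with d the current iteration and D the maximum number of iterations; this reading is an interpretation of the printed equations, and the function names and the value D = 100 are hypothetical.

```python
def linear_decay(d, D):
    """Eq. (45.1) as read here: B = 2 * (1 - d / D)."""
    return 2.0 * (1.0 - d / D)


def quadratic_decay(d, D):
    """Eq. (45.2) as read here: B = 1 - d**2 / D**2."""
    return 1.0 - (d * d) / (D * D)


D = 100  # hypothetical maximum number of iterations
schedule = [(d, linear_decay(d, D), quadratic_decay(d, D)) for d in (0, 25, 50, 75, 100)]
for d, b_lin, b_quad in schedule:
    print(f"d={d:3d}  B_linear={b_lin:.4f}  B_quadratic={b_quad:.4f}")
```

Under this reading, the second schedule stays close to its initial value for longer and then drops more steeply near the end, so more of the early iterations favor exploration.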
45.3.5 Encryption Process Using DES DES converts plaintext, delivered in 64-bit blocks, into ciphertext using a 56-bit key from which 48-bit round keys are derived. The secret image is first decomposed into its RGB components, and each component is then encrypted with AES. This encryption provides a high level of security throughout the embedding process. Decryption applies the encryption steps in reverse.
45.4 Performance Evaluation Figure 45.2 shows the sample images for the validation of the MEHO-AES method.
Fig. 45.2 Sample images
Table 45.1 Comparison of existing EHO with proposed MEHO-AES method

Images        MSE                  PSNR (dB)            NCC
              MEHO-AES   EHO       MEHO-AES   EHO       MEHO-AES   EHO
Lungs         0.124      0.532     57.175     55.86     0.999      0.998
Stethoscope   0.061      0.282     60.245     58.62     0.999      0.999
CT-scan       0.061      0.305     60.263     58.45     0.999      0.999
X-ray         0.097      0.683     58.240     54.78     0.999      0.999
EEG           0.043      0.315     61.783     58.18     0.999      0.999
Table 45.1 summarizes the MEHO-AES model's results in terms of MSE, PSNR, and NCC. On the lungs image, the EHO technique yielded a higher MSE of 0.532, whereas the MEHO-AES method obtained a lower MSE of 0.124. On the stethoscope image, the EHO method yielded a higher MSE of 0.282, while the MEHO-AES method obtained a lower MSE of 0.061. On the CT-scan image, the EHO algorithm yielded a higher MSE of 0.305, while the MEHO-AES method obtained a lower MSE of 0.061. On the X-ray image, the EHO technique yielded a higher MSE of 0.683, whereas the MEHO-AES model obtained a lower MSE of 0.097. Finally, on the EEG image, the EHO method yielded a higher MSE of 0.315, whereas the MEHO-AES method obtained a lower MSE of 0.043.
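For reference, the three metrics can be computed as in the following minimal sketch. The paper does not give its formulas, so the definitions below are the commonly used ones and are assumptions here: PSNR uses an 8-bit peak value of 255, and NCC is taken as the sum of products divided by the root of the product of the energies. The two small test images are hypothetical.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized 2-D images."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def psnr_from_mse(err, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    return math.inf if err == 0 else 10.0 * math.log10(peak * peak / err)


def ncc(a, b):
    """Normalized cross-correlation: sum(a*b) / sqrt(sum(a^2) * sum(b^2))."""
    num = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    den_a = sum(x * x for row in a for x in row)
    den_b = sum(y * y for row in b for y in row)
    return num / math.sqrt(den_a * den_b)


# Hypothetical original image and its decrypted counterpart.
original = [[100, 120], [130, 140]]
decrypted = [[100, 121], [130, 139]]
err = mse(original, decrypted)
print(err, psnr_from_mse(err), ncc(original, decrypted))
```

With a peak of 255, an MSE of 0.124 corresponds to a PSNR of about 57.2 dB, which is close to the 57.175 dB reported for the lungs image, so the 8-bit assumption appears broadly consistent with the table.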
45.4.1 Experimental Results The experimental setup of the proposed technique was implemented using MATLAB. Accuracy, specificity, precision, and recall are the metrics used to evaluate it. The performance of the proposed technique is compared with HGPDO, CNN, and ANN. Figure 45.3 presents the MSE analysis of the MEHO-AES method against existing techniques on the individual images. The graph shows that the MEHO-AES method achieves a low MSE value. Figure 45.4 shows the PSNR analysis of the MEHO-AES method against existing methods on the various test images; the MEHO-AES model achieves a higher PSNR. Figure 45.5 shows the NCC analysis of the MEHO-AES method against existing methods on the various test images; the MEHO-AES model achieves a higher NCC. Figure 45.6 shows a comparative analysis of the proposed model, in which accuracy, specificity, precision, and recall are compared for the DRPE, AES, and DWT techniques.
Fig. 45.3 MSE analysis of MEHO-AES method
Fig. 45.4 PSNR analysis of MEHO-AES method
Fig. 45.5 NCC analysis of MEHO-AES model
Fig. 45.6 Comparison via performance analysis
45.5 Conclusion An efficient MEHO-based Advanced Encryption Standard (MEHO-AES) has been built as a cryptographic method for digital image security. The MEHO-AES model employs a multilevel discrete cosine transform (DCT) for image decomposition, with the input image split into RGB components for examining the fundamental colors of all image sections. The AES encryption technique is used throughout the encryption procedure, and the MEHO method is used to determine the best encryption keys. A number of tests were carried out to assess the performance of the MEHO-AES method, and the findings were analyzed across several dimensions. The simulation results indicate that the proposed strategy produces effective outcomes. Watermarking techniques can be used to enhance performance in the future.
Acknowledgements The authors would like to thank the reviewers for all of their careful, constructive, and insightful comments in relation to this work.
References
1. Zhao, C., Zhao, S., Zhao, M., Chen, Z., Gao, C.Z., Li, H., Tan, Y.A.: Secure multi-party computation: theory, practice and applications. Inf. Sci. 476, 357–372 (2019)
2. Hakamin, Z.U., Mary, P., Kirshnakanth, K.K.: Lossless and reversible data hiding in encrypted images with public key cryptography. Int. J. Human Comput. Intell. 1(1), 13–17 (2022)
3. Tayal, S., Gupta, N., Gupta, P., Goyal, D., Goyal, M.: A review paper on network security and cryptography. Adv. Comput. Sci. Technol. 10(5), 763–770 (2017)
4. Anusha, R., Shankari, N., Shetty, V.S., Bhat, S.: Analysis and comparison of symmetric key cryptographic algorithms on FPGA. In: 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 293–300. IEEE (2022)
5. Jyothi, V.E., Prasad, B.D.C.N., Mojjada, R.K.: Analysis of cryptography encryption for network security. In: IOP Conference Series: Materials Science and Engineering, vol. 981, pp. 022028. IOP Publishing (2020)
6. Hebbar, P., Hegde, P., Nayak, S., Kerni, S., Rajgopal, K.T.: Study and performance evaluation of different symmetric key cryptography technique for encryption. Int. Res. J. Eng. Technol. (IRJET) 6(5), 1151–1154 (2019)
7. Arpaia, P., Bonavolonta, F., Cioffi, A.: Problems of the advanced encryption standard in protecting Internet of Things sensor networks. Measurement 161, 107853 (2020)
8. Chhabra, S., Lata, K.: Hardware obfuscation of AES IP core using combinational hardware Trojan circuit for secure data transmission in IoT applications. Concurr. Comput. Pract. Exp. 34(21), e7058 (2022)
9. Mulya, M., Arsalan, O., Alhaura, L., Wijaya, R., Ramadhan, A.S., Yeremia, C.: Text steganography on digital video using discrete wavelet transform and cryptographic advanced encryption standard algorithm. In: Sriwijaya International Conference on Information Technology and Its Applications (SICONIAN 2019), pp. 141–145. Atlantis Press (2019)
10. Rahman, Z., Yi, X., Billah, M., Sumi, M., Anwar, A.: Enhancing AES using chaos and logistic map-based key generation technique for securing IoT-based smart home. Electronics 11(7), 1083 (2022)
11. El-Shafai, W., Almomani, I., Ara, A., Alkhayer, A.: An optical-based encryption and authentication algorithm for color and grayscale medical images. Multimed. Tools Appl. 82, 23735–23770 (2022)
12. Zhou, K., Fan, J., Fan, H., Li, M.: Secure image encryption scheme using double random-phase encoding and compressed sensing. Opt. Laser Technol. 121, 105769 (2022)
13. Chai, X., Bi, J., Gan, Z., Liu, X., Zhang, Y., Chen, Y.: Color image compression and encryption scheme based on compressive sensing and double random encryption strategy. Signal Process. 176, 107684 (2020)
14. Rani, N., Sharma, S.R., Mishra, V.: Grayscale and colored image encryption model using a novel fused magic cube. Nonlinear Dyn. 108(2), 1773–1796 (2022)
15. Zhao, J., Zhang, T., Jiang, J., Fang, T., Ma, H.: Color image encryption scheme based on alternate quantum walk and controlled Rubik’s Cube. Sci. Rep. 12(1), 14253 (2022)
16. Liu, G., Li, W., Fan, X., Li, Z., Wang, Y., Ma, H.: An image encryption algorithm based on discrete-time alternating quantum walk and advanced encryption standard. Entropy 24(5), 608 (2022)
Chapter 46
Phonocardiographic Signal Analysis for the Detection of Cardiovascular Diseases Deena Nath Gupta, Rohit Anand, Shahanawaj Ahamad, Trupti Patil, Dharmesh Dhabliya, and Ankur Gupta
Abstract In this work, we propose a novel method for analyzing heart sounds to detect cardiovascular diseases. The heart sound segmentation process divides the phonocardiography (PCG) signal into four segments: S1, the systolic interval, S2, and the diastolic interval. Segmentation is an essential stage in the automated analysis of PCG signals. Heart sounds convey the mechanical activity of the circulatory system. This information comprises the subject's precise physiological condition as well as any short-term variations connected to the respiratory cycle. This paper focuses on an issue that is currently open: how to analyze these sounds and extract physiological state changes while preserving short-term variability.
D. N. Gupta CDAC, Mumbai, Maharashtra, India R. Anand (B) Department of ECE, G. B. Pant DSEU Okhla-I Campus (Formerly G. B. Pant Engineering College), New Delhi 110020, India e-mail: [email protected] S. Ahamad College of Computer Science and Engineering, University of Hail, Hail City, Saudi Arabia T. Patil Department of Engineering and Technology, Bharati Vidyapeeth Deemed to be University, Navi Mumbai, Maharashtra, India e-mail: [email protected] D. Dhabliya Department of Information Technology, Vishwakarma Institute of Information Technology, Pune, Maharashtra, India e-mail: [email protected] A. Gupta Department of Computer Science and Engineering, Vaish College of Engineering, Rohtak, Haryana, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_47
46.1 Introduction An image is an array, or matrix, of square pixels organized in rows and columns. Images are the most common and economical way to share or disseminate information [1–3]. Images clearly communicate information about the locations, dimensions, and relationships between objects. They depict spatial information that humans can recognize as objects [4, 5]. Humans are skilled at interpreting such images owing to our inherent visual and cognitive abilities, and receive around 75% of their information in visual form. Digitizing an image converts it into a format that can be stored in the computer's memory or on a storage device such as a CD-ROM or hard disk [6, 7]. Most people are aware of the concept of image compression, which reduces the amount of memory required to store a digital image [6]. The three main categories of image processing operations are image compression, image enhancement and restoration, and measurement extraction [8–11]. Enhancement and restoration help rectify image flaws introduced by either the digitization process or the imaging system. All of the examples operate on 256-level grayscale images; these processes can be extended to color images. Once the image is of good quality, measurement extraction techniques may be used to extract valuable information from it [12, 13]. Below are a few illustrations of image enhancement and measurement extraction. Numerous transformations and techniques are used in image processing, most of which are derived from signal processing [14, 15]. Standard geometric transformations include rotation, linear translation, size reduction, and enlargement. Using a suitable mapping, it is possible to alter the colors of images by boosting contrast or even changing the color palette altogether. Image compositing is widely used to combine parts of different images [16, 17].
Our focus will be on digital image processing, which entails altering the characteristics of a digital image with a computer. It is crucial to recognize that image processing serves two distinct but equally significant purposes: improving an image for human viewing and preparing it for machine interpretation [18, 19]. A process that improves the appearance of a picture does not necessarily satisfy a requirement set by a machine [20, 21]. Humans like visuals that are crisp, clear, and detailed, whereas machines favor images that are uncomplicated and uncluttered. The mechanical activity of the circulatory system is conveyed via heart sounds. This information comprises the subject's precise physiological condition as well as any short-term variations connected to the respiratory cycle [22, 23]. This paper focuses on an issue that is currently open: how to analyze these sounds and extract physiological state changes while preserving short-term variability. Better outcomes are obtained by using a feature extraction and wavelet decomposition approach to identify the heartbeat state [24, 25]. Different characteristics of the signal can be obtained by extracting its features. Features are then matched with database signals to get the desired results [26–29].
Paper Organization Section 46.1 introduces different transformation approaches used in image processing and discusses the focus and evolution of the research. Section 46.2 presents existing research in the relevant area along with its methodology. Section 46.3 presents the research methodology and process flow of the proposed work. Section 46.4 presents the results and implementation of the proposed work. Section 46.5 concludes with the outcome of the research. Section 46.6 considers the future scope of the work.
46.2 Literature Survey One line of work discusses how to analyze acoustic heart sounds. To provide a list of potential diagnoses, the system analyzes heart sound data and isolates essential characteristics of the heartbeat signatures. Its drawback is that it does not incorporate GUI development, software integration, or hardware design for data collection. Another work offers a signal processing technique that identifies heart sounds and uses autoregressive modeling to generate clinical features that distinguish systolic and diastolic heart murmurs. However, this technique is unable to analyze cardiac events that combine systolic and diastolic murmurs. In a further study, cardiac disorders are categorized from heart sound data using an extreme learning machine with an autonomous segmentation technique. According to the experimental data, this approach greatly outperforms HMM-, MLP-, and SVM-based classifiers in categorizing nine different heart disorders. The HARD approach has been proposed for determining the instantaneous rotation speed. The HARD-based methodology for instantaneous frequency estimation has a moderate processing cost and is easy to implement, allowing the method to be used even online. Mangione and Nieman studied the cardiac auscultatory skills of internal medicine and family practice trainees and compared their diagnostic proficiency [1]. Kumar et al. presented a methodology for image compression based on the wavelet approach using the SPIHT algorithm [2]. Vyas et al. developed advanced image compression using the wavelet transform and the SPHIT method [3]. The classification of heart sounds was carried out by Olmez and Dokur.
They used an artificial neural network [4]. Leung et al. classified heart sounds using a time–frequency method and artificial neural networks [5]. Hebden and Torry identified aortic stenosis and mitral regurgitation through heart sound analysis [6]. Myint and Dillard developed an electronic stethoscope with diagnostic capability [7]. Sindhwani et al. analyzed the performance of deep neural networks using computer vision [8]. Juneja and Anand studied contrast enhancement of images using DWT-SVD and DCT-SVD [9]. Saini and Anand reviewed the identification of defects in plastic gears using image processing and computer vision [10].
46.3 Proposed Methodology To determine the condition of the heartbeat in our suggested study, we employed a PCG signal. Using a Butterworth low-pass filter, the phonocardiography signal must be preprocessed. Discrete wavelet transform is then used to deconstruct it. For subsequent processing, the level with the lowest error rate must be chosen. It is necessary to find all maximum peaks before identifying S1 and S2. The next step is to determine the systolic and diastolic intervals. It is necessary to choose important characteristics, and the KNN method [30] must then be used to classify cardiac disorders. The suggested method includes several phases, as shown in Fig. 46.1.
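The final classification step can be sketched with a minimal k-nearest-neighbour classifier. The feature vectors, labels, and the choice k = 3 below are hypothetical illustrations and are not data or settings from the paper.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature vectors, e.g. (systolic interval, diastolic interval) in seconds.
train_x = [(0.30, 0.50), (0.32, 0.52), (0.29, 0.48), (0.45, 0.35), (0.47, 0.33), (0.44, 0.36)]
train_y = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]

print(knn_predict(train_x, train_y, (0.31, 0.49)))
print(knn_predict(train_x, train_y, (0.46, 0.34)))
```

In practice the features would usually be scaled to comparable ranges first, since Euclidean distance is sensitive to the units of each feature.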
46.3.1 Preprocessing One PCG signal is selected. At runtime, the signal is provided in .wav format, and the selected signal is displayed on the axes. Filtering is used to eliminate noise from the signal; for this purpose, we employ the Butterworth filter [31].
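A low-pass Butterworth stage of this kind can be sketched as follows, using only the standard library. The sketch assumes a second-order filter designed with the bilinear transform; the filter order, the 150 Hz cutoff, the 2000 Hz sampling rate, and the synthetic test tones are all assumptions, since the paper does not state these values.

```python
import math


def butter2_lowpass_coeffs(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass biquad (bilinear transform, Q = 1/sqrt(2))."""
    w0 = 2.0 * math.pi * cutoff_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * (1.0 / math.sqrt(2.0)))
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / 2.0 / a0, (1.0 - cos_w0) / a0, (1.0 - cos_w0) / 2.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def lfilter(b, a, x):
    """Direct-form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


fs = 2000.0      # hypothetical sampling rate in Hz
cutoff = 150.0   # hypothetical cutoff frequency in Hz
b, a = butter2_lowpass_coeffs(cutoff, fs)
t = [n / fs for n in range(2000)]
low = [math.sin(2 * math.pi * 50 * ti) for ti in t]     # in-band test tone
high = [math.sin(2 * math.pi * 800 * ti) for ti in t]   # out-of-band "noise" tone
filtered = lfilter(b, a, [lo_s + hi_s for lo_s, hi_s in zip(low, high)])
```

The 50 Hz tone passes almost unchanged while the 800 Hz tone is strongly attenuated; a higher-order filter would give a sharper transition at the cost of more phase distortion.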
46.3.2 Segmentation Compared with the continuous wavelet transform (CWT), the discrete wavelet transform (DWT) is far simpler to implement. This part covers the fundamental ideas of the DWT, its characteristics, and the techniques used to compute it. A half-band low-pass filter removes all frequencies above half of the highest frequency in the signal. For example, if a signal has a maximum frequency component of 1000 Hz, a half-band low-pass filter removes everything above 500 Hz. The unit of frequency matters here: for discrete signals, frequency is expressed in radians, so the sampling frequency of the signal corresponds to 2π radians in terms of radial frequency. After filtering, the signal can be subsampled by two. Because the signal is then characterized by only half as many samples as before, this decomposition halves the time resolution. At the same time, the signal now spans only half of the previous frequency band, which halves the uncertainty in frequency and thus doubles the frequency resolution.
Fig. 46.1 Flow diagram of proposed system (blocks: Document Selection, Pre-Processing, Segmentation, Feature Extraction, Feature Selection, Classification Algorithm, Performance Evaluation)
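The filter-and-subsample step can be made concrete with a minimal sketch of a multi-level decomposition. It assumes the Haar wavelet, which the paper does not specify, and the function names and sample signal are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail), each half as long."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from one level of coefficients."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([s * (a + d), s * (a - d)])
    return out


def wavedec(signal, levels):
    """Multi-level decomposition: repeatedly split the approximation band."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details


# Hypothetical short signal whose length is a multiple of 4.
x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = wavedec(x, 2)
```

Each level halves the number of samples in the approximation band, which mirrors the halving of time resolution and doubling of frequency resolution described above.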
46.3.3 Feature Extraction From the PCG signal, we can extract a lot of important features to be discussed in the upcoming sections.
46.3.4 Feature Selection by Genetic Algorithm (GA) GAs offer a straightforward, general, and effective framework for feature selection. A GA is an optimization method for exploring very large search spaces, inspired by the biological processes of reproduction and natural selection. It is a method of global optimization [32–35]. A population of candidate structures, each represented as a string of symbols, is searched probabilistically using values of the objective function rather than its derivatives.
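A GA-based feature selector can be sketched in a few lines. Each individual is a bit mask over the features, and the search uses elitism, one-point crossover, and bit-flip mutation. The population size, mutation rate, and the toy fitness function are hypothetical; in the actual pipeline the fitness would presumably be a classifier score on the selected features.

```python
import random


def ga_select(fitness, n_features, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Tiny genetic algorithm over bit masks (1 = keep feature). Returns the best mask found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]                                   # elitism: keep the two best masks
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(pop[: pop_size // 2], 2)     # parents from the fitter half
            cut = rng.randint(1, n_features - 1)             # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]  # mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


# Hypothetical fitness: reward "useful" features 0, 2 and 5, penalise every feature kept.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    return sum(2.0 for i, bit in enumerate(mask) if bit and i in USEFUL) - 0.5 * sum(mask)


best = ga_select(toy_fitness, n_features=8)
```

Because only fitness values are compared, the same loop works with any scoring function, which reflects the derivative-free nature of the search described above.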
46.3.5 Classification As a kind of prediction model, decision trees map observations about an object to conclusions about that object's target value. This technique is used in data mining [36], statistics, and machine learning [37, 38] for making predictions. Such tree models are also known as classification trees or regression trees, and the approach as decision tree learning. In a classification tree, each internal node represents a test on a feature, and each leaf represents a class label or a probability distribution over the classes. Following the classification process, the supplied signal is classified as either a normal signal or an abnormal signal.
46.4 Results and Implementation In this section, we provide a cutting-edge technique for the analysis of cardiac sounds to detect cardiovascular diseases. Heart sounds are used to communicate the mechanical action of the circulatory system. This information includes both the subject’s precise physiological state and any transient alterations related to the respiratory cycle. Physiological state changes may be extracted from sounds while preserving short-term variability, which is the main topic of this work. This problem is yet unresolved. To determine the condition of the heartbeat in our suggested study, we employed a PCG signal. Phonocardiography signal analysis is shown in Fig. 46.2. Using a Butterworth low-pass filter of any order (as shown in Fig. 46.3), the phonocardiography signal must be preprocessed. Discrete wavelet transform is then used to deconstruct it. For subsequent processing, the level with the lowest error rate must be chosen. It is necessary to find all maximum peaks before identifying S1 and S2. The next step is to determine the systolic and diastolic intervals. The necessary characteristics must be chosen. Finally, the KNN algorithm must be used to categorize cardiac disorders.
Fig. 46.2 Phonocardiography signal analysis for the detection of diseases
Fig. 46.3 Graphic representation of low-pass filter in MATLAB
46.5 Conclusion Better outcomes are obtained by using a feature extraction and wavelet decomposition approach [39, 40] to identify the heartbeat state. Different characteristics of the signal can be obtained by extracting its features. Features are matched with database signals to get the desired results. Clinicians may use this program to determine the state of the heartbeat with ease, and the automation minimizes manual labor. Automatic applications also provide outcomes for signal content that does not exist. This is highly beneficial to doctors in diagnosing and treating cardiac disease. The findings of the proposed work are precise. A decision tree classifier [41] sorts the signal to produce the desired outcome.
46.6 Future Scope The phonocardiogram (PCG) has been used to represent the heart’s murmurs and other noises. The phonocardiograph is the device that records these sounds. It is one of the unobtrusive devices that records the audible state of the heart. Present research could play a significant role in the detection of cardiovascular diseases in a more efficient manner.
Chapter 47
Object Localization in Emoji-Based Social Networks Using Deep Learning Techniques
Galiveeti Poornima, Y. Sudha, B. C. Manujakshi, R. Pallavi, Deepak S. Sakkari, and P. Karthikeyan
Abstract Unlike image classification, object localization is a computer vision problem for which, despite the use of deep learning, solutions have not yet reached human-level performance. Common ways to approach this problem within the deep learning framework include moving sliding windows over the image, training neural architectures to predict bounding boxes, and applying classical image processing methods such as Scale-Invariant Feature Transform (SIFT) and region proposals. These techniques localize objects by using the structure of the classification neural network. Emojis (such as smileys, memes, and hearts) are frequently used by members of online or chat social networks to express the emotions behind their textual interactions and to improve interpretability, particularly for brief polysemous words. Any chat or social network that uses semantic-based context recognition technologies may interpret text-based emoticons (i.e., emojis made up of a combination of symbols and characters) and convert them into audio data (e.g., text-to-speech readers for individuals with vision impairment). For the purpose of object localization, the proposed method assumes that exactly one object is present in any given image. The proposed convolutional neural network (CNN) model can successfully classify and localize that object. The objectives of the proposed study are to (i) create synthetic data for model training, (ii) create custom metrics and callbacks in Keras, and (iii) create and train a multi-output neural network to perform object localization. The outcome of the work is the location of the emoji, which can be used for sentiment analysis.

G. Poornima · Y. Sudha (B) · B. C. Manujakshi · R. Pallavi
Vision and Learning Lab, School of CSE, Presidency University, Bengaluru, India
e-mail: [email protected]
G. Poornima e-mail: [email protected]
R. Pallavi e-mail: [email protected]
D. S. Sakkari
Computer Science Lab, Department of CSE, SKIT, Bengaluru, India
P. Karthikeyan
Computer Science Lab, National Chung Cheng University, Chiayi, Taiwan
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_48
47.1 Introduction
In SMS messaging, there has been a growing demand for additional visual data to help establish the context of brief messages. Sentiment analysis (SA) in natural language processing traditionally relies on manually labeled emotions for sentences; these labels can instead be predicted from the emojis in text posted on social media, without manual labeling [1]. Emojis are used by people worldwide as a tool to express their emotional states and have recently been considered for assessment in research [2]. In textual communication, people include emotional cues through emojis to make up for restricted or unavailable visual expression, even when users send their messages using a variety of facial emotions [3]. People also value contributing reactions as feedback to live video conversations (e.g., live streaming, video calls). One of the easiest ways to evoke emotional and empathic connection through communication media is through pictures. Recent work in SA has attempted to employ emotional emojis as noisy labels of sentiment on social media [4].

Although one can obtain a large dataset in application domains such as biology or astronomy, annotating bounding boxes is often impractical because it requires domain-level expertise, and those who possess this expertise do not have the time to devote weeks to doing so. Additionally, many algorithms that do use annotated bounding box data also use region proposals [5, 6] that indicate potential object locations. These techniques feed the proposed regions to a CNN and check the class scores. If a certain class score is very high, the algorithm can infer that the proposed region belongs to that class. These region proposals are typically produced by methods that were created before deep learning became popular and have no neural network foundation. In the first scenario, in which there is no bounding box data, the only tool that may be available is a classification neural network. In the latter case, one has access to such data, but a technique like region proposals looks inaccurate and inefficient, especially when a classification neural network has already been trained to distinguish between various objects, which requires spatial knowledge. Additionally, CNNs are nothing more than a succession of convolutions applied one after another, with the outputs of the convolution operations corresponding to particular spatial locations in the previous layer. It would be interesting and perhaps more useful to see regions proposed by the classification network itself rather than regions of interest from algorithms like Selective Search [7]. It therefore makes sense to look inside a pre-trained CNN to find out how to construct object locations. In fact, it is surprising that there has been little or no research on this issue, even though using a classification network for localization seems so obvious. This work demonstrates that classification networks provide highly important information for localizing objects, and there are many excellent practical and scientific motives for doing so.

Overall, the proposed method includes two strategies to solve the problem of localizing objects using only a pre-trained CNN. The visualization of the aspects of images that cause neurons to activate has received a lot of attention. Neuron visualization techniques like deconv [8] and guided backpropagation [9] have been shown to be quite successful. The proposed method identifies the objects of interest by fusing the abilities to identify significant neurons and to map neurons back into the image. To be more precise, the proposed method takes an image as input and outputs a bounding box for an object that is present in the source image.
47.2 Background
According to the 38th statistical report on the growth of the Chinese Internet published by the China Internet Information Center [10], China had 710 million Internet users and a penetration rate of 51.7% in June 2016. Around 656 million of them used the Internet on their mobile devices, 242 million used micro-blogs, and more than 100 million maintained daily blogs. The majority of these countless short text messages are filled with negative emotions. Emojis are becoming more and more popular, so there is a lot of interest in examining how they are used. When analyzed from an NLP perspective, emojis can be shown to act as a potent sentiment signal that generalizes well [11]. A few studies have recently attempted to determine the relationships between emoji and image modalities [12].

As far as we are aware, no effort has been made to produce object locations solely on the basis of a classification CNN. There are techniques [5, 6, 8] that learn from bounding box data and partially utilize a classification network. In the OverFeat framework, which integrates classification, localization, and detection, a classification CNN is trained first [8]. The activations of a specific layer inside the CNN are then used to train a bounding box regression network. In the Regions with CNN (R-CNN) framework [5], an algorithm like Selective Search [7] proposes roughly 2000 regions of interest, which are then sent to a CNN. The activations at some layer of the CNN are fed into a regression network to predict bounding box data. Although these techniques use a CNN to extract features, they also have bounding box data as additional information, which simplifies the problem. With the exception of [6], these systems are not entirely deep learning systems because they need region proposals from classical computer vision methods. Although this should not be taken as a criticism of these techniques per se, one would expect a deep learning model to work entirely on its own, as is the case with classification, without the need for further support from signal processing.

In addition to these techniques, a localization technique that does not make use of bounding box data is described in a study by Oquab et al. [13]. The authors use a different architecture, which distinguishes their solution from the proposed one: rather than having K outputs that correspond to K potential classes, the output provided by the authors is m by n by K, where m and n represent a range of potential positions in the image. Therefore, if location (1, 2, 10) and location (7, 8, 12) both have high outputs, it indicates that an object of class 10 is at one location and an object of class 12 is at a different position farther away. Since the input size of the neural network is actually smaller, the authors achieve this by expanding images up to 500 times and sliding the neural network across the image. Consequently, the CNN’s output after processing the image is a three-dimensional array (two dimensions for the locations that the neural network operated on and a third dimension for class scores). In a follow-up study by Oquab et al. [14], the affine layer is changed into an equivalent convolution layer, which eliminates sliding windows. The authors of [13] and [14] create a training objective that takes into consideration numerous locations and the presence of different objects in images. Despite the lack of bounding box data in [14], the authors are nonetheless able to localize objects quite effectively. Although the researchers use a dataset with far fewer classes than ImageNet, they achieve localization accuracy of up to roughly 70%. This result indicates that training for effective localization performance does not require bounding box data, which is crucial for real-world applications. The work in [5, 6, 8, 13, 14] rescales the original image to produce many copies of the image, so that the classifiers and regressors handle objects at various scales more effectively.

Two methods are particularly helpful for understanding how a CNN works. Researchers have used techniques like deconv [8] and guided backpropagation [9] to view the aspects of images that cause neurons to become highly active. These methods can produce visualizations of the source image like those in Fig. 47.1. Although these images look good, no theory has yet been developed to explain why they are more visually pleasing than those created using standard backpropagation [15]. Although we intuitively assume that the pictures produced by guided backpropagation represent what causes a neuron to fire, there is no convincing theoretical justification for this. The capacity to map neurons back into image space would be highly helpful in locating objects because guided backpropagation maps onto distinct features in the image. Overall, there are clues in the present research that a classification CNN might be used on its own to help locate objects even in the absence of bounding box data [7, 8, 13, 14]. However, no published work has yet been able to locate objects using only a trained CNN.
47.3 Proposed Approach
The proposed approach is divided into the following activities:

Activity 1: Introduction
The study uses TensorFlow to create a model that is trained to classify and locate emojis in the input images. Localization means finding the position of the emoji in the image, so the model has one input and two outputs. This object localization task [12] works with the assumption that there is just one object in any given image, and the model classifies and localizes that object. TensorFlow is used as the machine learning framework, and the Google Colab platform is used to run the Python code.

Fig. 47.1 Difference between guided backpropagation versus regular backpropagation

Activity 2: Download and Visualize Data
The data used is synthesized from a few emojis. Some of the emojis used in the study, each 72 × 72 pixels, are shown in Fig. 47.2. There are nine objects that are ultimately localized in the input images once the model is trained. The sample output of the study is shown in Fig. 47.3.

Fig. 47.2 Sample input emojis
Fig. 47.3 Sample output

Activity 3: Create Examples
In this activity, the synthesized examples that are later used to train the model are created. A dictionary is created with a unique class id for each of the nine emotions. A class is then picked at random and an image is synthesized for that class. The emoji is placed at a random position in a blank 144 × 144 image, as shown in Fig. 47.4.

Fig. 47.4 Randomly picked emoji placed in 144 × 144 black image

Activity 4: Plot Bounding Boxes
In data processing for projects involving image and video annotation, bounding boxes are among the most popular and best-known techniques. Data annotators outline the desired object within each image by drawing a box over it and specifying its X and Y coordinates. This saves significant processing resources and makes it simpler for predictive modeling algorithms to locate what they are looking for and identify collision paths. Bounding boxes, or rotated bounding boxes, are one of the most widely used image annotation methods in deep learning. Compared with existing image segmentation techniques, this method can reduce cost and improve annotation efficiency.¹ A green bounding box around the identified smiley is shown in Fig. 47.5.

Fig. 47.5 Boundary box around the identified emoji

Activity 5: Two-Dimensional Convolution Model
In the proposed method, a 2D convolution model with five layers is used, as shown in Fig. 47.6.

Activity 6: Custom Metric: IoU
Machine learning models built for real-world use are evaluated to see how well they perform on unseen data points. An evaluation metric makes it possible to compare the performance of several models on actual data points with the same goal in mind. Intersection over Union (IoU)² is a common evaluation metric for the task of localizing objects in images.

Activity 8: Model Training and Testing
The last stage of the proposed approach is training and testing.

¹ https://keymakr.com/blog/what-are-bounding-boxes/
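The example synthesis described in Activity 3 can be sketched as follows. This is a minimal, framework-free illustration under stated assumptions, not the authors' code: the function names `sample_placement` and `box_from_target`, the choice of the top-left corner as the localization target, and the normalization by the canvas size are all hypothetical. A two-value target is suggested by the two-unit box output in Table 47.1, and because every emoji is 72 × 72 pixels, the full box can be recovered from the corner alone.

```python
import random

CANVAS = 144      # side of the blank image, from the text
EMOJI = 72        # side of each emoji, from the text
NUM_CLASSES = 9   # number of emoji classes, from the text


def sample_placement(rng):
    """Pick a random class and a random top-left corner for the emoji.

    Returns (class_id, (row, col), (row_norm, col_norm)). The normalized
    pair is a hypothetical regression target; it lies in [0, 0.5] because
    the emoji must fit entirely inside the canvas.
    """
    class_id = rng.randrange(NUM_CLASSES)
    row = rng.randint(0, CANVAS - EMOJI)
    col = rng.randint(0, CANVAS - EMOJI)
    return class_id, (row, col), (row / CANVAS, col / CANVAS)


def box_from_target(row_norm, col_norm):
    """Recover pixel box corners (x_min, y_min, x_max, y_max) from the target."""
    y_min = round(row_norm * CANVAS)
    x_min = round(col_norm * CANVAS)
    return x_min, y_min, x_min + EMOJI, y_min + EMOJI


rng = random.Random(0)  # fixed seed so the example is repeatable
class_id, (row, col), target = sample_placement(rng)
box = box_from_target(*target)
```

The recovered `box` is what a plotting routine would draw around the emoji, as in Activity 4.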
47.4 Results and Discussion
In the proposed approach, the number of layers is varied from 5 to 15 and the results are compared. For N = 5, the summary of the model is shown in Table 47.1. For 50 epochs with 500 steps per epoch, the actual loss, class out loss, box out loss, class out accuracy, and box out IoU are shown in Table 47.2. The actual loss, shown in Fig. 47.7, is reduced to approximately zero after the fifth epoch. The class out loss (Fig. 47.8) and the box out loss (Fig. 47.9) are also reduced after the fifth epoch. The accuracy of class identification reaches 100% after the third epoch, as shown in Fig. 47.10. An IoU of 0.5 is typically considered a good score, while 1 is perfect in theory (Fig. 47.11).³ The IoU improves gradually, as shown in Fig. 47.12. The outcome of the proposed work is applicable to sentiment analysis using the localization of emojis.

² https://www.einfochips.com/blog/understanding-object-localization-with-deep-learning/
³ https://blog.superannotate.com/intersection-over-union-for-object-detection/
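As an illustration of the IoU metric discussed above, the following sketch computes IoU for two axis-aligned boxes. The function name `iou` and the (x_min, y_min, x_max, y_max) box convention are assumptions made for this example; the chapter describes the metric as a custom Keras metric, which is not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an intersection of zero.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 72 x 72 boxes, the prediction shifted 10 pixels to the right.
truth = (20, 30, 92, 102)
pred = (30, 30, 102, 102)
score = iou(truth, pred)  # (62 * 72) / (82 * 72) = 62 / 82, about 0.756
```

For a 72 × 72 emoji shifted by d pixels along one axis, the IoU is (72 − d)/(72 + d), so an IoU near 0.75 corresponds to an offset of roughly 10 pixels.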
Fig. 47.6 Model of proposed system
Table 47.1 Summary of the model when N = 5

Layer (type)           Output shape             Param #
Image (input layer)    [(None, 144, 144, 3)]    0
Conv2d (conv2D)        (None, 142, 142, 16)     448
Normalization          (None, 142, 142, 16)     64
Pooling                (None, 71, 71, 16)       0
Conv2d1 (conv2D)       (None, 69, 69, 32)       4640
Normalization          (None, 69, 69, 32)       128
Pooling                (None, 34, 34, 32)       0
Conv2d2 (conv2D)       (None, 32, 32, 64)       18,496
Normalization          (None, 32, 32, 64)       256
Pooling                (None, 16, 16, 64)       0
Conv2d3 (conv2D)       (None, 14, 14, 128)      73,856
Normalization          (None, 14, 14, 128)      512
Pooling                (None, 7, 7, 128)        0
Conv2d4 (conv2D)       (None, 5, 5, 256)        295,168
Normalization          (None, 5, 5, 256)        1024
Pooling                (None, 2, 2, 256)        0
Flatten (flatten)      (None, 1024)             0
Dense (dense)          (None, 256)              262,400
Class out (dense)      (None, 9)                2313
Box out (dense)        (None, 2)                514
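The shapes and parameter counts in Table 47.1 can be checked with a few lines of arithmetic. The sketch below assumes what the table's numbers imply but the text does not state explicitly: 3 × 3 convolutions without padding, a normalization layer with four parameters per channel, and 2 × 2 pooling. The helper name `summarize` is hypothetical.

```python
def summarize(size=144, channels=3, filters=(16, 32, 64, 128, 256),
              kernel=3, dense_units=256, num_classes=9, box_outputs=2):
    """Return (rows, total_params); each row is (layer, output_shape, params)."""
    rows = []
    for f in filters:
        # Unpadded convolution shrinks each side by kernel - 1.
        size = size - kernel + 1
        rows.append(("conv", (size, size, f), kernel * kernel * channels * f + f))
        # Assumed: scale, offset, moving mean, moving variance per channel.
        rows.append(("norm", (size, size, f), 4 * f))
        # 2 x 2 pooling halves each side, rounding down.
        size //= 2
        rows.append(("pool", (size, size, f), 0))
        channels = f

    flat = size * size * channels
    rows.append(("flatten", (flat,), 0))
    rows.append(("dense", (dense_units,), flat * dense_units + dense_units))
    rows.append(("class_out", (num_classes,), dense_units * num_classes + num_classes))
    rows.append(("box_out", (box_outputs,), dense_units * box_outputs + box_outputs))
    return rows, sum(params for _, _, params in rows)


rows, total = summarize()
for layer, shape, params in rows:
    print(f"{layer:<10} {str(shape):<16} {params}")
```

Under these assumptions every row matches the table, and the Param # column sums to 659,819.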
Table 47.2 Summary of the model with 50 epochs

Epoch #   Actual loss   Class out loss   Box out loss   Class out accuracy   Box out IoU   Lr
1         1.2045        0.9267           0.2778         0.6785               0.1774        0.0010
5         0.0043        0.0014           0.0029         0.9999               0.4347        2.00E-04
10        0.0018        4.2458E-04       0.0014         1                    0.5739        4.00E-05
15        0.0013        2.4951E-04       0.001          1                    0.6375        8.00E-06
20        0.0011        2.0556E-04       9.1179E-04     1                    0.674         1.60E-06
25        0.001         1.9340E-04       8.3542E-04     1                    0.6975        3.20E-07
30        0.001         2.1079E-04       8.2670E-04     1                    0.7137        3.20E-07
35        0.001         1.9198E-04       8.2132E-04     1                    0.7254        3.20E-07
40        0.001         2.1648E-04       8.0842E-04     1                    0.7344        3.20E-07
45        0.0011        2.2272E-04       8.3200E-04     1                    0.7413        3.20E-07
50        9.87E-04      1.8437E-04       8.0241E-04     1                    0.7469        3.20E-07
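Two regularities in Table 47.2 can be verified directly from the reported values. The sketch below checks that the actual loss at epochs 1 and 5 equals the sum of the class out and box out losses, and that successive learning rates up to epoch 25 differ by a constant factor of 0.2. Reading this pattern as a step-wise learning-rate reduction that stops at 3.20E-07 is an interpretation; the text does not name the schedule or callback used.

```python
import math

# (epoch, actual loss, class out loss, box out loss), copied from Table 47.2
losses = [
    (1, 1.2045, 0.9267, 0.2778),
    (5, 0.0043, 0.0014, 0.0029),
]

# Learning rates reported at epochs 1, 5, 10, 15, 20, and 25
learning_rates = [1e-3, 2e-4, 4e-5, 8e-6, 1.6e-6, 3.2e-7]

# The total loss should equal the sum of the two output losses,
# up to the rounding used in the table.
loss_is_sum = all(
    math.isclose(total, class_loss + box_loss, abs_tol=5e-5)
    for _, total, class_loss, box_loss in losses
)

# Ratio between each reported learning rate and the previous one.
ratios = [later / earlier
          for earlier, later in zip(learning_rates, learning_rates[1:])]
constant_decay = all(math.isclose(r, 0.2, rel_tol=1e-9) for r in ratios)
```

Both checks hold for the tabulated values; the remaining rows are rounded too coarsely in the table to test the loss sum precisely.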
Fig. 47.7 Actual loss versus epochs
Fig. 47.8 Class out loss versus epochs
Fig. 47.9 Box out loss versus epochs
Fig. 47.10 Class out accuracy versus epochs
Fig. 47.11 Box out IoU versus epochs
Fig. 47.12 Box_out_iou
47.5 Conclusion
With the proposed method, the emojis in the text of any chat application can be classified and localized as objects for sentiment analysis. The results obtained using the deep learning method are up to 90% for emoji classification, and the emoji is localized using the plot bounding box method. In the future, with an increased number of epochs, more exact localization of an emoji may be possible.
References
1. Tomihira, T., Otsuka, A., Yamashita, A., Satoh, T.: Multilingual emoji prediction using BERT for sentiment analysis. Int. J. Web Inf. Syst. 16(3), 265–280 (2020)
2. Kutsuzawa, G., Umemura, H., Eto, K., Kobayashi, Y.: Classification of 74 facial emoji’s emotional states on the valence-arousal axes. Sci. Rep. 12(1), 398 (2022)
3. Dalugama, T.U.: Three-dimensional emoji prediction using facial emotion recognition. Doctoral dissertation (2021)
4. Liu, C., Fang, F., Lin, X., Cai, T., Tan, X., Liu, J., Lu, X.: Improving sentiment analysis accuracy with emoji embedding. J. Saf. Sci. Resilien. 2(4), 246–252 (2021)
5. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
6. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28 (2015)
7. Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vision 104(2), 154–171 (2013)
8. Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: Overfeat: integrated recognition, localization and detection using convolutional networks. Preprint at arXiv:1312.6229 (2013)
9. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. CoRR abs/1311.2901. Preprint at arXiv:1311.2901 (2013)
10. Peng, S., Cao, L., Zhou, Y., Ouyang, Z., Yang, A., Li, X., Yu, S.: A survey on deep learning for textual emotion analysis in social networks. Digital Commun. Netw. 8(5), 745–762 (2022)
11. Priyanka, G., Bharathi, K., Kavin, N.: Sentiment analysis and emoji mapping. In: 2022 8th International Conference on Advanced Computing and Communication Systems (ICACCS), vol. 1, pp. 1973–1976. IEEE (2022)
12. Al-Halah, Z., Aitken, A., Shi, W., Caballero, J.: Smile, be happy :) emoji embedding for visual sentiment analysis. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (2019)
13. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
14. Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Is object localization for free? Weakly-supervised learning with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 685–694 (2015)
15. Baydin, A.G., Pearlmutter, B.A., Radul, A.A., Siskind, J.M.: Automatic differentiation in machine learning: a survey. CoRR abs/1502.05767. Preprint at arXiv:1502.05767 (2015)
Chapter 48
Mathematical Gann Square Model and Elliott Wave Principle with Bi-LSTM for Stock Price Prediction
K. V. Manjunath and M. Chandra Sekhar
Abstract Stock price prediction is an effort to estimate future stock values in order to enhance the profit of the company. The major objective of this research is to use the Gann square mathematical model and the Elliott Wave Principle to predict stock price values. The Elliott Wave Principle is combined with Bi-directional Long Short-Term Memory (Bi-LSTM) to analyze the recurrent long-term changes of price patterns in waveforms associated with consistent changes. The performance of the proposed stock prediction model is evaluated using Mean Absolute Error (MAE), Mean Square Error (MSE), and Root Mean Square Error (RMSE). On these metrics, the Gann square model and Elliott Wave Principle with Bi-LSTM achieves an MAE of 0.56, an MSE of 0.42, and an RMSE of 0.54.
K. V. Manjunath (B) · M. Chandra Sekhar
Department of Computer Science and Engineering, Presidency University, Chennai, India
e-mail: [email protected]
M. Chandra Sekhar e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_49

48.1 Introduction
Predicting stock prices to maximize profits makes the stock market a lucrative investment channel. Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been tested for continual prediction of market prices [1]. Predicting future price rates is crucial because investors invest limited income and anticipate larger returns [2]. Investors earn from stock market decisions that enhance prices [3]. A CNN classifies data, while an LSTM captures changes in the data for efficient stock price prediction [4]. LSTM, GRU, and CNN models successfully estimate stock price values based on past data; however, near price rates create a delay [5, 6]. Data denoising improves prediction accuracy [7]. The stock price is predicted using previous and present price growth rates and the projected price improvement [3]. Existing approaches provide good input–output learning patterns, but network paralysis and local minima prevent accurate predictions [8]. Approximate price prediction using multiple approaches yields efficient results and makes it simple for investors to invest for maximum earnings [9]. Noise and nonlinearity in the input data make stock price prediction challenging [10]. Time-series projections make it easy to track stock market price fluctuations [11]. Accurate price prediction helps investors make money [12]. Hence, a mathematical model using the Bi-LSTM method is suggested to address these limitations [13]. The Bi-LSTM efficiently analyzes data through forward and backward passes and provides rich information for price prediction [14]. The resistance levels of the Gann square and the Elliott Wave Principle help to anticipate future prices. This study obtains efficient outcomes using continuous prediction. The significant contributions of the ensemble model are as follows:
1. The Gann square model and Elliott Wave Principle track stock market price changes based on previous and present data. The Gann square model and the Bi-LSTM algorithm forecast efficiently.
2. The mathematical equations of the Gann square model compute price prediction fluctuations and future prices. Ongoing prediction improves accuracy.
3. The Gann square model and Elliott Wave Principle with the Bi-LSTM algorithm predict stock prices continuously and help investors to generate large returns. Bi-LSTM predicts efficiently using forward and backward techniques.
The paper is organized as follows: Sect. 48.2 presents the literature review, Sect. 48.3 the proposed methodology and the related equations, Sect. 48.4 the results obtained with the proposed algorithm, Sect. 48.5 the comparative analysis, and Sect. 48.6 the conclusion.
48.2 Literature Review Lu et al. [15] used a CNN-Bi-LSTM-AM method to predict the next day's stock market price. The CNN and Bi-LSTM used input data to estimate the stock market's price on the following day. CNN-Bi-LSTM-AM improved stock price prediction accuracy but suffered from unclear components and noise. Yu et al. [16] predicted prices by combining Phase-Space Reconstruction (PSR) with DNN-LSTM models. Pricing data of several elements were used to predict stock market prices in a chaotic environment. Preprocessing minimized time-dependent series and noise. The diminishing gradient of the PSR method hampered this model. Wu et al. [17] created a CNN-LSTM framework, the Stock Sequence Array Convolutional LSTM (SACLSTM), for stock price prediction. This study considered SACLSTM alongside the CNNpred, CNN-corr, NN, and SVM classification algorithms. The LSTM statistically predicted financial variables from the convolutional layers. The convolutional LSTM efficiently predicts stock market prices by removing unnecessary data and nonlinear financial market behavior. The model anticipated stock market rises and falls better but failed to pinpoint the exact turning point. Li et al. [18] proposed a stock correlation matrix-based LSTM-RGCN to predict stock market price movements. The RGCN and the graph predict news-related stock movements.
48 Mathematical Gann Square Model and Elliott Wave Principle …
Market data provided the stock price correlation matrix of associated companies. The RGCN was not tested on stock price movements for complicated datasets. Khuwaja et al. [19] predicted stock prices using an Adversarial Learning Network (ALN). An LSTM collected authentic financial market data, while NDDP imputed missing data. The ALN examined tweets, global indicators, and stocks. Hyperparameter exploration was the ALN's downside. Jin et al. [20] constructed an LSTM-based model to anticipate stock market values, assisting investors. EMD-LSTM has simplified attention and prediction sequences. EMD predicts stock values simply. Volatility and noise hindered LSTM-based EMD time-series prediction. Rezaei et al. [21] extracted temporal sequences, deep features, and efficient predictions using a Complete Ensemble Empirical Mode Decomposition-CNN-LSTM (CEEMD) model. CEEMD eliminates cloud enhancement interactions and improves model analysis. CEEMD and EMD have predicted stock prices well, but data fluctuations made CEEMD harder to apply. Existing approaches are complex, fluctuating, and unpredictable. This work proposes the Gann square model combined with the Elliott Wave Principle and the Bi-LSTM algorithm to address these restrictions. By combining forward and backward data transmission with the Elliott wave concept, the Gann square model can handle complicated data and forecast well. The Bi-LSTM produces the time-series forecasts.
48.3 Methodology The mathematical Gann square model and the Elliott Wave Principle with the Bi-LSTM algorithm are used to forecast price movements in the stock market. The Gann square model is used to predict the stock price and to extract the relevant features that are presented to the bi-directional LSTM. The efficiency of the prediction is measured using MAE, MSE, and RMSE (Fig. 48.1).
48.3.1 Data Collection Stock data are obtained from Yahoo Finance, which gives investors closing prices, opening prices, profits, and historical prices. Preprocessing removes noise from the data. The Yahoo databases include the S&P500, DJIA, N225, and CSI 300 indices. The S&P500 is an American (US) equities index that tracks the stock performance of 500 American corporations. S&P500 stock market prices are more accurate and quicker. Data are collected from July 2016 to June 2018.
Fig. 48.1 Stock price prediction using the Gann square model and Elliott Wave Principle with Bi-LSTM
48.3.2 Data Preprocessing Accurate stock price prediction depends on data quality, because noise reduces model accuracy and performance. Normalization is applied during data preparation to reduce the effect of noisy data and improve accuracy. Normalization lowered data complexity, and the stock price time series shows aberrant prediction variances, indicating extrinsic elements that do not change without constrained data. Data values were scaled for price prediction. Min–Max normalization, which scales the data to the range [0, 1], is given by Eq. (48.1):

$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}, \tag{48.1}$$

where $X$ represents the value of a feature in the stock data, and $X_{\min}$ and $X_{\max}$ represent the minimum and maximum values of that feature in the stock price data.
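As a minimal sketch of the Min–Max scaling in Eq. (48.1), the function below rescales one feature column to [0, 1]. The function name, the sample prices, and the handling of a constant column are illustrative assumptions, not part of the original method.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1] following Eq. (48.1)."""
    x_min = min(values)
    x_max = max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant column has no spread, so map it to 0.0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Made-up closing prices, used only to illustrate the scaling.
closing_prices = [100.0, 105.0, 110.0, 120.0]
normalized = min_max_normalize(closing_prices)
print(normalized)
```

In practice the minimum and maximum would be taken from the training split only, so that test data do not leak into the scaling.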
48.3.3 Elliott Wave Patterns Elliott waves predict stock price changes. The wave theory is driven by stock market psychology. Elliott wave patterns for stock price forecasts are first discovered on candlestick or stick charts. Gann angles are used to chart the data. Corrective and impulsive behaviors create wave patterns that follow location rules. Trends in the stock market reveal patterns. The second wave's length and pricing should match the previous waves. The diagonal triangle arises when the fifth wave is on the same line as the third and fourth waves, the first and fifth waves are shorter than the third wave, and the first and third waves are in opposing directions. Technical analysis of stock prices relies on chart support and resistance levels. These two price chart levels show the market range. If there are more buyers than sellers, the price rises; if there are more sellers, it falls.
• Support level: the level at which the price frequently stops falling and bounces back up in the price chart.
• Resistance level: the level at which the price regularly stops rising and dips back down.
These levels are identified by considering historical price data and previous support and resistance levels. To predict stock values, the 90-degree angle for a given data set is divided into nine portions in the chart. These portions are segmented using a Gann square model.
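One simple way to read support and resistance candidates from historical prices is to look for local minima and maxima. The sketch below is only an illustration under that assumption; the paper does not specify its detection rule, and the function name, window size, and prices are hypothetical.

```python
def find_levels(prices, window=2):
    """Return (support, resistance) as lists of (index, price) local extrema.

    A point is a support candidate if it is the lowest price within
    `window` steps on either side, and a resistance candidate if it is
    the highest.
    """
    support, resistance = [], []
    for i in range(window, len(prices) - window):
        neighbourhood = prices[i - window:i + window + 1]
        if prices[i] == min(neighbourhood):
            support.append((i, prices[i]))
        if prices[i] == max(neighbourhood):
            resistance.append((i, prices[i]))
    return support, resistance


# Made-up price series with one trough and two peaks.
prices = [10, 12, 15, 13, 11, 9, 11, 14, 16, 14, 12]
support, resistance = find_levels(prices)
print("support:", support)
print("resistance:", resistance)
```

A larger window gives fewer, more significant levels; a smaller one reacts to short-term swings.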
48.3.4 Gann Square Model Gann theory predicts stock price changes using current, historical, and future market data. Stock prices change along diverse angles, and each angle has a distinct feature for forecasting price behavior. The square starts with one and increases clockwise. The Gann square model follows a harmonic pattern; hence, 54 is followed by 29. Every integer in the square is computed by taking the square root, subtracting 2, and re-squaring. Instead of subtracting 2, the square root of the number to the left of the input number can be added. Forecasts use price and time movements. The degrees and the factor of each degree in the Gann square model are: 45° = 0.25, 90° = 0.50, 120° = 0.66, …, 360° = 2. For additional numbers in the square, the ordinal and cardinal crosses are shown as diagonal and straight lines, respectively. The square's ordinal and cardinal crossings reflect resistance and support values. Trigonometry gives 360° as 2 and 250° as 1.4; hence, 1.4 is used. To compute the square's high value, add the degree factor to the square root of the low value and re-square (Fig. 48.2). Gann noted that all angles provide support and resistance levels based on the trend. Strong support is provided by the 1 × 1 angle. When the price begins to fall below the 1 × 1 angled trend line, it is a major signal of a reversal. The Gann square model evaluates the time and price predictions by using the Gann angles at various pivot levels. The stock price is determined using the resistance and support levels, and the variation of the stock price is identified by Gann's time study model. Investor activity and stock price movements are predicted continuously from the stock price patterns that result in a reversal of the stock's movement.
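The square-root rule described above can be sketched as follows. The mapping from angle to factor (degrees divided by 180) is an assumption inferred from the listed pairs 45° = 0.25, 90° = 0.50, and 360° = 2, and the use of the same factor with a minus sign for support is likewise an assumption; the function names and the example price are hypothetical.

```python
import math


def degree_factor(degrees):
    """Map a Gann angle to its square-root increment (assumed: 360 degrees -> 2)."""
    return degrees / 180.0


def gann_levels(price, angles=(45, 90, 180, 360)):
    """Return {angle: (support, resistance)} around a pivot price.

    Resistance adds the degree factor to the square root of the price and
    re-squares; support subtracts it (clamped at zero) and re-squares.
    """
    root = math.sqrt(price)
    levels = {}
    for angle in angles:
        step = degree_factor(angle)
        resistance = (root + step) ** 2
        support = max(root - step, 0.0) ** 2
        levels[angle] = (support, resistance)
    return levels


levels = gann_levels(100.0)
for angle, (support, resistance) in levels.items():
    print(angle, round(support, 4), round(resistance, 4))
```

For a pivot of 100, a full 360° rotation moves the square root from 10 to 12 or 8, giving levels of 144 and 64.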
Fig. 48.2 Gann angles in Elliot waves
48.3.5 Bi-directional Long Short-Term Memory (Bi-LSTM) Based on the angles of the Gann chart, the stock price values are identified using Bi-LSTM. The Bi-LSTM layer captures bi-directional time dependencies and informative features, improving prediction accuracy compared with the unidirectional LSTM. The Bi-LSTM layer processes the data in the forward and backward directions to predict the variations in stock price values. The unidirectional LSTM consists of a memory cell $C_t$, instead of neurons, with three gates: the input gate $i_t$, the output gate $O_t$, and the forget gate $f_t$. The input gate determines how much of the new input information at the current moment enters the internal memory unit. The forget gate controls the internal memory unit, where the data from previous time steps are stored. The output gate controls the output of the internal memory unit. The input data of the LSTM is denoted by $x$, $h$ is the hidden state that represents the memory ability of the network, and the time steps are indicated by the subscripts $t$ and $t-1$. The connections between the nodes of the directed graph along a sequence are calculated from the hidden state output by the previous step and the input at the current moment. The LSTM is described by Eqs. (48.2), (48.3), and (48.4). The value of the input gate $i_t$ is calculated by Eq. (48.2), the candidate cell state $\tilde{C}_t$ at time $t$ is calculated by Eq. (48.3), and the activation of the forget gate $f_t$ at time $t$ is calculated by Eq. (48.4).

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{48.2}$$

$$\tilde{C}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{48.3}$$

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{48.4}$$
The forget and input gates control the initial information, and the updated cell state at time $t$ is calculated by Eq. (48.5):

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t. \tag{48.5}$$

The output gate value is calculated by Eq. (48.6), and the hidden state is then computed from the updated memory cell by Eq. (48.7):

$$O_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \tag{48.6}$$

$$h_t = O_t * \tanh(C_t), \tag{48.7}$$
where $W_i$, $W_c$, $W_f$, and $W_o$ are the four weight matrices, $b_i$, $b_c$, $b_f$, and $b_o$ are the offsets (biases), $*$ represents the element-wise product, and $\sigma$ represents the sigmoid function. The LSTM effectively implements long-term memory and obtains long-distance feature information, that is, data observed before the output time; reverse information is not used. Bi-directional LSTMs improve prediction accuracy by using time-series data from both directions. The backward and forward LSTMs update data before and after transmission, enabling full judgments based on past and future data. Figure 48.3 displays the forward and backward computations. The horizontal arrows indicate time-series information flowing from the input layer to the hidden layer to the output layer in both directions of the model.
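To make Eqs. (48.2) to (48.7) concrete, the sketch below implements one LSTM step for the simplest possible case, a scalar input and a scalar hidden state, and then runs it forward and backward over a sequence. It is an illustration only: the scalar simplification, the parameter layout, and all names and values are assumptions, and a real model would use a deep learning library with vector-valued states and trained weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step with scalar input and scalar hidden state.

    p maps each gate name to (w_h, w_x, b), so that W . [h_{t-1}, x_t] + b
    reduces to w_h * h_prev + w_x * x_t + b.
    """
    def pre(gate):
        w_h, w_x, b = p[gate]
        return w_h * h_prev + w_x * x_t + b

    i_t = sigmoid(pre("i"))              # Eq. (48.2): input gate
    c_tilde = math.tanh(pre("c"))        # Eq. (48.3): candidate cell state
    f_t = sigmoid(pre("f"))              # Eq. (48.4): forget gate
    c_t = f_t * c_prev + i_t * c_tilde   # Eq. (48.5): updated cell state
    o_t = sigmoid(pre("o"))              # Eq. (48.6): output gate
    h_t = o_t * math.tanh(c_t)           # Eq. (48.7): hidden state
    return h_t, c_t


def run_lstm(sequence, p):
    """Return the hidden state after each time step, starting from zeros."""
    h, c = 0.0, 0.0
    hidden = []
    for x_t in sequence:
        h, c = lstm_step(x_t, h, c, p)
        hidden.append(h)
    return hidden


def run_bilstm(sequence, p_fwd, p_bwd):
    """Pair the forward and backward hidden states at every time step."""
    forward = run_lstm(sequence, p_fwd)
    backward = run_lstm(sequence[::-1], p_bwd)[::-1]
    return list(zip(forward, backward))


# Arbitrary illustrative parameters (not trained values).
params = {"i": (0.5, 0.8, 0.0), "c": (0.3, 1.0, 0.0),
          "f": (0.4, 0.2, 1.0), "o": (0.6, 0.7, 0.0)}
series = [0.1, 0.4, 0.9, 0.4, 0.1]
states = run_bilstm(series, params, params)
for t, (h_fwd, h_bwd) in enumerate(states):
    print(t, round(h_fwd, 4), round(h_bwd, 4))
```

At each time step the forward state summarizes the inputs up to that step and the backward state summarizes the inputs from that step to the end, which is the sense in which the Bi-LSTM uses both past and future data.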
Fig. 48.3 Structure of Bi-LSTM
48.4 Results The mathematical Gann square model and Elliott Wave Principle using the Bi-LSTM algorithm are proposed for continuous prediction of future stock price values and validated on a system with 8 GB of Random Access Memory (RAM) and a 2.2 GHz processor using Python 3.7. The proposed Gann square method with the Bi-LSTM algorithm improves the prediction performance for stock price values. The performance of the continuous stock prediction is validated using MAE, RMSE, and MSE. MSE measures the squared difference between the values predicted by the proposed method and the original values. A lower MSE indicates more accurate stock price values. MSE is calculated using Eq. (48.8):

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2. \tag{48.8}$$

Because MSE is expressed on a squared error scale, it does not allow an accurate comparison with MAE; hence, the square root of MSE is taken as the RMSE, Eq. (48.9):

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(\hat{y}_i - y_i\right)^2}. \tag{48.9}$$

The gap between RMSE and MAE is larger when the individual errors are large. MAE is calculated using Eq. (48.10):

$$\text{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left|\hat{y}_i - y_i\right|, \tag{48.10}$$

where $y_i$ represents the original values, $N$ denotes the total number of samples, and $\hat{y}_i$ represents the values predicted by the proposed method.
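The three error measures in Eqs. (48.8) to (48.10) can be computed directly, as in the following sketch. The function names and the sample values are illustrative assumptions; the example is chosen so that one large error shows how RMSE exceeds MAE.

```python
import math


def mse(actual, predicted):
    """Mean squared error, Eq. (48.8)."""
    n = len(actual)
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n


def rmse(actual, predicted):
    """Root mean squared error, Eq. (48.9)."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean absolute error, Eq. (48.10)."""
    n = len(actual)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / n


# Made-up values: three exact predictions and one error of 4.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 8.0]
print(mse(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

Here a single error of 4 yields an MAE of 1 but an RMSE of 2, because squaring weights large individual errors more heavily.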
48.4.1 Quantitative Analysis The Gann square model and Elliott Wave Principle employing Bi-LSTM for continuous stock price forecasting are validated using MSE, MAE, and RMSE. Bi-LSTM outperforms LSTM and RNN in stock price prediction (Figs. 48.4, 48.5, and 48.6). Compared to previous approaches, predictions are more accurate. The Gann square model attains MAE, MSE, and RMSE values of 0.56, 0.42, and 0.54. Table 48.1 compares the MSE, MAE, and RMSE of current models and the Bi-LSTM on diverse datasets.
Fig. 48.4 MSE performance using various methods
Fig. 48.5 MAE performance using various methods
48.5 Comparative Analysis The comparative analysis between the existing methods, namely CNN-LSTM [21], LSTM [20], and DNN-LSTM [16], and the Bi-LSTM algorithm is presented in Table 48.2. With Bi-LSTM, the MSE is 0.042, the RMSE is 0.054, and the MAE is 0.056. The Gann square model using Bi-LSTM improves performance compared to the existing CNN-LSTM, LSTM, and DNN-LSTM models. The prediction performance of the model is improved by using machine learning algorithms, and the main advantage of the Gann square model and Elliott Wave Principle with Bi-LSTM is that it predicts stock price values accurately by efficiently analyzing the data in the forward and backward directions to improve profits. Table 48.2 is represented graphically in Fig. 48.7. Investors make and lose money due to stock price predictions. Common investors, businesspeople, and top corporations utilize stock market price prediction to increase earnings and reduce risk. Stock market history and data forecast minute-by-minute prices. With basic stock market understanding, investors and traders may anticipate future stock prices. Forecasting uses past occurrences to predict stock market values. Existing techniques use machine learning algorithms for efficient predictions, but complexity limits performance. The Gann square model and Elliott Wave Principle utilizing the Bi-LSTM algorithm decrease error and make accurate stock price forecasts.

Fig. 48.6 RMSE performance using various methods

Table 48.1 Experimental results of various parameters
Metric  Datasets  RNN    CNN    LSTM   Bi-LSTM
MSE     S&P500    0.328  0.256  0.196  0.096
MSE     DJIA      0.349  0.324  0.184  0.084
MSE     N225      0.384  0.346  0.157  0.056
MSE     CSI 300   0.259  0.267  0.144  0.042
MAE     S&P500    0.473  0.345  0.505  0.084
MAE     DJIA      0.386  0.348  0.569  0.064
MAE     N225      0.445  0.365  0.588  0.075
MAE     CSI 300   0.425  0.365  0.516  0.056
RMSE    S&P500    0.324  0.255  0.176  0.069
RMSE    DJIA      0.254  0.334  0.154  0.079
RMSE    N225      0.235  0.346  0.146  0.066
RMSE    CSI 300   0.246  0.257  0.132  0.054

Table 48.2 Comparison between existing and Bi-LSTM models
Methods         MSE    RMSE   MAE
CNN-LSTM [21]   0.245  0.242  0.242
LSTM [20]       0.216  0.226  0.226
DNN-LSTM [16]   0.084  0.165  0.156
Gann-Bi-LSTM    0.042  0.054  0.056

Fig. 48.7 Comparison between existing and Bi-LSTM models
48.6 Conclusion Continuous stock price prediction using the mathematical Gann square model and Elliott Wave Principle with the Bi-LSTM algorithm is proposed to predict the high movements in stock price values. Future stock price values are predicted from past and present information. The open-source Yahoo dataset is used as input data, and the noisy content of the data is reduced in preprocessing using the Min–Max normalization technique. The Gann square model and Elliott Wave Principle predict stock price values efficiently by using the forward and backward propagation of the Bi-LSTM. The prediction performance for continuous stock prices is validated using MSE, RMSE, and MAE. The Bi-LSTM model beats previous approaches in forecasting stock price values, giving excellent performance measures such as an MAE (Mean Absolute Error) of 0.56, MSE (Mean Squared Error) of 0.42, and RMSE (Root Mean Square Error) of 0.54. Further, this model makes real-time stock price predictions every second, leading to highly accurate profit forecasts.
References
1. Khan, W., Ghazanfar, M.A., Azam, M.A., Karami, A., Alyoubi, K.H., Alfakeeh, A.S.: Stock market prediction using machine learning classifiers and social media, news. J. Ambient Intell. Human. Comput. 13(7), 3433–3456 (2020)
2. Polamuri, S.R., Srinivas, K., Mohan, A.K.: Multi-model generative adversarial network hybrid prediction algorithm (MMGAN-HPA) for stock market prices prediction. J. King Saud Univ. Comput. Inform. Sci. 34(9), 7433–7444 (2022)
3. Nemes, L., Kiss, A.: Prediction of stock values changes using sentiment analysis of stock news headlines. J. Inform. Telecommun. 5(3), 375–394 (2021)
4. Sharma, D.K., Hota, H.S., Brown, K., Handa, R.: Integration of genetic algorithm with artificial neural network for stock market forecasting. Int. J. Syst. Assur. Eng. Manag. 13(2), 828–841 (2022)
5. Chung, H., Shin, K.S.: Genetic algorithm-optimized multi-channel convolutional neural network for stock market prediction. Neural Comput. Appl. 32(12), 7897–7914 (2020)
6. Parray, I.R., Khurana, S.S., Kumar, M., Altalbe, A.A.: Time series data analysis of stock price movement using machine learning techniques. Soft Comput. 24(21), 16509–16517 (2020)
7. Kumar, R., Srivastava, S., Dass, A., Srivastava, S.: A novel approach to predict stock market price using radial basis function network. Int. J. Inform. Technol. 13(6), 2277–2285 (2021)
8. Kamalov, F.: Forecasting significant stock price changes using neural networks. Neural Comput. Appl. 32(23), 17655–17667 (2020)
9. Jing, N., Wu, Z., Wang, H.: A hybrid model integrating deep learning with investor sentiment analysis for stock price prediction. Expert Syst. Appl. 178, 115019 (2021)
10. Yu, X., Li, D.: Important trading point prediction using a hybrid convolutional recurrent neural network. Appl. Sci. 11(9), 3984 (2021)
11. Patel, M.M., Tanwar, S., Gupta, R., Kumar, N.: A deep learning-based cryptocurrency price prediction scheme for financial institutions. J. Inform. Secur. Appl. 55, 102583 (2020)
12. Mughees, N., Mohsin, S.A., Mughees, A., Mughees, A.: Deep sequence to sequence Bi-LSTM neural networks for day-ahead peak load forecasting. Expert Syst. Appl. 175, 114844 (2021)
13. Pang, X., Zhou, Y., Wang, P., Lin, W., Chang, V.: An innovative neural network approach for stock market prediction. J. Supercomput. 76(3), 2098–2118 (2020)
14. Li, X., Wu, P.: Stock price prediction incorporating market style clustering. Cogn. Comput. 14(1), 149–166 (2022)
15. Lu, W., Li, J., Wang, J., Qin, L.: A CNN-BiLSTM-AM method for stock price prediction. Neural Comput. Appl. 33(10), 4741–4753 (2021)
16. Yu, P., Yan, X.: Stock price prediction based on deep neural networks. Neural Comput. Appl. 32(6), 1609–1628 (2020)
17. Wu, J.M.T., Li, Z., Herencsar, N., Vo, B., Lin, J.C.W.: A graph-based CNN-LSTM stock price prediction algorithm with leading indicators. Multimed. Syst. 29(3), 1751–1770 (2021)
18. Li, W., Bao, R., Harimoto, K., Chen, D., Xu, J., Su, Q.: Modeling the stock relation with graph network for overnight stock movement prediction. In: Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence, pp. 4541–4547 (2021)
19. Khuwaja, P., Khowaja, S.A., Dev, K.: Adversarial learning networks for FinTech applications using heterogeneous data sources. IEEE Internet Things J. 10, 2194–2201 (2021)
20. Jin, Z., Yang, Y., Liu, Y.: Stock closing price prediction based on sentiment analysis and LSTM. Neural Comput. Appl. 32(13), 9713–9729 (2020)
21. Rezaei, H., Faaljou, H., Mansourfar, G.: Stock price prediction using deep learning and frequency decomposition. Expert Syst. Appl. 169, 114332 (2021)
Chapter 49
Crowd Monitoring System Using Facial Recognition Sunanda Das, R. Chinnaiyan, G. Sabarmathi, A. Maskey, M. Swarnamugi, S. Balachandar, and R. Divya
Abstract The World Health Organization (WHO) suggests social distancing as a measure to lessen the transmission of COVID-19 in public areas. Most countries and national health authorities have established a 2-m physical distance as a required safety measure in shopping malls, schools, and other covered locations. In this study, we use standard CCTV security cameras to create an automated system for detecting people and crowds in indoor and outdoor settings. Popular computer vision algorithms and a CNN model are used to build the system, and a comparative study is performed with algorithms such as Support Vector Machine and KNN. The resulting model is a general and precise solution for tracking and identifying people that may be used in a wide range of other study areas where the focus is on person detection, including autonomous cars, anomaly detection, crowd analysis, and many more.
S. Das · A. Maskey Department of CSE-Cyber Security, JAIN (Deemed-to-be-University), Bengaluru, India e-mail: [email protected] R. Chinnaiyan (B) Department of CSE, Alliance College of Engineering and Design, Alliance University, Bengaluru, India e-mail: [email protected] G. Sabarmathi School of Business and Management, CHRIST (Deemed to be University), Bangalore, India M. Swarnamugi Department of CS, Jyoti Nivas College, Bengaluru, India S. Balachandar · R. Divya VTU-RC, MCA, CMR Insitute of Technology, Bengaluru, India © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_50
49.1 Introduction Facial recognition is a biometric technique that identifies people by their facial features. It consists mainly of two tasks: verification and identification [1]. Face verification is a 1:1 matching procedure that compares a query face image with a single face image, whereas face identification is a 1:N matching of a query face image against a set of faces [2]. The purpose of this approach is to create a travel plan based on face recognition strategies. Facial detection technology uses AI concepts such as computer vision to find and identify human faces in images or videos. The primary goal of the model is to provide a smart crowd-monitoring system capable of counting and detecting people in the scene. AI technologies such as computer vision with facial recognition and facial detection are increasingly being used to power crowd control solutions. Additionally, privacy and security have emerged as major concerns, and many companies and firms have started implementing RFID technology with smart barcodes as well as face detection and face recognition scanners. Although a wide range of businesses and organizations may profit from crowd control technologies, entertainment and sporting events stand to gain the most by improving security, saving money, and guaranteeing a great patron experience. Sensors can be employed at music festivals to monitor crowd density at various performances and to identify which meal and refreshment stands are getting the greatest foot traffic. By tracking the timing and mode of arrival of customers, sensors can also help to manage sudden influxes of visitors. Such practical insights enable planners to control and improve venue traffic flow.
49.2 Related Work
A. Selecting a Template Convolutional Neural Network (CNN)
CNN is the most widely used and successful architecture in the deep learning community for computer vision tasks. CNN was first proposed by Fukushima [3] in his seminal study, based on the hierarchical receptive field model of the visual brain by Wiesel and Hubel. Waibel et al. later reintroduced CNNs with weights shared across temporal receptive fields and backpropagation training for phoneme recognition. LeCun et al. [4] developed a practical CNN architecture for document recognition. The first deep model for face detection built on a CNN, the Cascade-CNN-based model, was proposed by Li et al. [5]. Similarly, R-CNN and Faster R-CNN-based models were studied by Jiang and Miller [6], Wang et al. [5], and Sun et al. [7], who obtained cutting-edge results on two popular face detection benchmarks [8].
B. Feature Pyramid Network-Based Model
One of the families of deep models used for face detection is the Feature Pyramid Network-based model [9], a neural network structure. These models have also been used in the field of object detection and classification [19–30]. Zhang et al. [10] proposed Feature Agglomeration Networks (FANet), a face detection system built on the Feature Pyramid Network. The main objective of this model is to exploit the inherent multi-level features of a single convolutional neural network by combining higher-level semantic feature maps of different scales as contextual cues to extend the lower-level feature maps through a hierarchical agglomeration method, with only a slight increase in computational cost.
C. Face Detection in Uncontrolled Environments
Chen et al. [11] suggest using indexed shape features in an enhanced cascade with a simple feature framework to perform face detection and face alignment together. Similarly, Zhang et al. [12] and Park et al. [13] adopt the idea of multiple resolutions in general object detection.
49.3 Methodology The first step is face detection: we need to find the positions of the faces in the image. Because we are concerned only with finding faces, we convert the image to grayscale, since color is not an important factor in locating a face. Faces are found using the HOG (Histogram of Oriented Gradients) algorithm. HOG plots image pixel orientations and gradients on a histogram. It simplifies the representation drastically by retaining only the crucial information. This way HOG minimizes the error.
A. Gradient Calculation
Gradient calculation is one of the important steps in creating descriptors. This stage determines the precision of the computed orientations and histograms, and the outcomes are consequently tightly tied to the technique employed to calculate the image gradient [14]. HOG uses the Sobel algorithm to find the gradient and magnitude of the images. It is a simple algorithm that nevertheless gives correct results. In this step, we apply 1D derivative masks in both the horizontal and vertical directions. The mask dimensions are 1 * 3 for the horizontal direction and 3 * 1 for the vertical direction (Fig. 49.1) [15]:

$$D_x = [\,-1 \;\; 0 \;\; 1\,], \qquad D_y = [\,-1 \;\; 0 \;\; 1\,]^{T}$$
Fig. 49.1 8 * 8 gradient directions (left), 8 * 8 gradient magnitude (right)
The gradients in the x and y directions are calculated as follows: $G_x$ is the value of the pixel on the left minus the value of the pixel on the right, and $G_y$ is the value of the pixel above minus the value of the pixel below. The gradient angle and magnitude are then given by:

$$\text{Angle} = \theta = \arctan\left(\frac{G_y}{G_x}\right), \qquad \text{Magnitude} = \sqrt{G_x^{2} + G_y^{2}}.$$
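The per-pixel gradient computation can be sketched as below for an interior pixel of a grayscale image stored as a list of rows. The sign convention follows the text (left minus right, above minus below); using `atan2` and folding the angle into [0°, 180°) for unsigned orientations is an assumption of this sketch, as are the function name and the toy image.

```python
import math


def pixel_gradient(image, row, col):
    """Return (gx, gy, magnitude, angle_degrees) at an interior pixel."""
    gx = image[row][col - 1] - image[row][col + 1]   # left minus right
    gy = image[row - 1][col] - image[row + 1][col]   # above minus below
    magnitude = math.sqrt(gx ** 2 + gy ** 2)
    # atan2 avoids division by zero when gx == 0; the modulo folds the
    # angle into [0, 180) so opposite directions share one orientation.
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    return gx, gy, magnitude, angle


# Toy 3 x 3 grayscale patch chosen so that gx = 3 and gy = 4.
patch = [
    [0, 5, 0],
    [3, 0, 0],
    [0, 1, 0],
]
gx, gy, magnitude, angle = pixel_gradient(patch, 1, 1)
print(gx, gy, magnitude, round(angle, 2))
```

Flipping the sign of both differences leaves the magnitude and the unsigned orientation unchanged, so the choice of convention does not affect the resulting histogram.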
The next stage in extracting the HOG features is orientation binning. Each pixel in a cell casts a weighted vote for an orientation-based histogram channel, with the weight based on the values obtained in the gradient calculation. The cells may be radial or rectangular, and the channels are spread over 0–360° (or 0–180° for unsigned gradients). HOG prepares histogram bins of 20 degrees each, so there are nine bins in total, covering 0–180° [16]. HOG inserts the gradient magnitude values into the nine bins according to the pixel orientation. We get a 1 * 9 matrix for the cell. HOG computes such 1 * 9 matrices for the remaining cells. To take local variations in contrast and illumination into account, the gradient strengths must be normalized. For this, cells must be arranged into bigger, spatially related blocks. The HOG descriptor is constructed by concatenating the elements of the normalized cell histograms from all block regions. Because these blocks often overlap, each cell contributes at least twice to the final descriptor. The feature vector obtained from the orientation binning should be normalized (Fig. 49.2). Let $v$ be the non-normalized vector that contains every histogram in a specific block, let $\|v\|_k$ be its $k$-norm for $k = 1, 2$, and let $e$ be a small constant. The normalizing factor can then be calculated as:

$$\text{L2-norm:} \quad f = \frac{v}{\sqrt{\|v\|_2^{2} + e^{2}}},$$

$$\text{L1-sqrt:} \quad f = \sqrt{\frac{v}{\|v\|_1 + e}}.$$
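The binning and normalization steps can be sketched as follows. This is a simplified illustration: each gradient votes for a single 20° bin with its magnitude, whereas full HOG implementations usually split the vote between neighbouring bins. The function names, the value of the small constant, and the sample gradients are assumptions.

```python
import math


def cell_histogram(magnitudes, angles, bins=9, bin_width=20.0):
    """Accumulate gradient magnitudes into orientation bins (unsigned, 0-180)."""
    hist = [0.0] * bins
    for mag, ang in zip(magnitudes, angles):
        index = int((ang % (bins * bin_width)) // bin_width)
        hist[index] += mag
    return hist


def l2_normalize(vector, eps=1e-6):
    """L2-norm block normalization: v / sqrt(||v||_2^2 + e^2)."""
    norm = math.sqrt(sum(v * v for v in vector) + eps ** 2)
    return [v / norm for v in vector]


# Made-up gradients for four pixels of one cell.
magnitudes = [2.0, 1.0, 3.0, 4.0]
angles = [10.0, 15.0, 95.0, 170.0]
hist = cell_histogram(magnitudes, angles)
block = l2_normalize(hist)
print(hist)
print([round(v, 4) for v in block])
```

After normalization the block vector has approximately unit length, which reduces the effect of local changes in contrast and illumination.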
Fig. 49.2 Histogram of nine bins
The procedure starts by converting the images to grayscale, as we do not need color data to find faces. We traverse every pixel of the image and look at its neighboring pixels. The objective is to compare the intensity of the current pixel with that of the pixels surrounding it. We draw an arrow in the direction in which the intensity decreases. We repeat this process for every pixel in the image. The result is an image in which every pixel is replaced by an arrow. These arrows, also called gradients, depict the movement from high intensity to low intensity across the image. After calculating the gradient and magnitude of each pixel, we have too much information. Since we simply want to see the underlying pattern of the image, we are only interested in the light/dark flow at a higher level. To do this, the image is divided into squares of 16 * 16 pixels. For each square, the number of gradients pointing in each major direction is counted, and the square is replaced by an arrow in the strongest direction (Fig. 49.3). Therefore, to identify the face in the picture, we must locate the region of the image that most resembles the Histogram of Gradient pattern extracted by training on other faces. This way we can find the face in the image.
Fig. 49.3 HOG
B. Face Landmark Estimation
Fig. 49.4 Face landmark estimation
Faces in images are often turned in different directions, and a computer may then fail to match two views of the same face. Fiducial landmarks play a crucial role in establishing facial identity [17]. The most frequently used landmarks are the corners of the eyes, the nostrils, the corners of the mouth, the tip of the nose, the chin, and the ear lobes. To solve the pose problem, we warp each image so that the eyes, lips, nostril corners, mouth corners, chin, etc., are consistently in the same location, which makes it much simpler to compare faces later on. For this, we used an algorithm known as face landmark estimation [18]. The basic idea is to identify 68 specific points, known as landmarks, on each face, for example the top of the chin, the outer edge of each eye, and the inner edge of each brow. A machine learning model is then trained to find these 68 points on any face [18]. Once we know where the eyes and mouth are, all that remains is to rotate, scale, and shear the image so that they lie in the middle of the picture. This roughly centers the eyes and mouth in the image, which helps in the next phase (Fig. 49.4).
C. Face Encoding
Face encoding uses a deep convolutional neural network (CNN). Instead of training the CNN to recognize pictures of objects, we train it to produce 128 measurements for each face. During training, the algorithm examines the measurements produced for each training picture and adjusts the network slightly so that the measurements of the same person's faces move closer together while the measurements of different people's faces move farther apart. We do not know exactly which parts of the face these 128 numbers measure, and it does not matter to us. All we care about is that the network produces nearly the same numbers for two images of the same individual.
D. Face Recognition
Given the encoding of the face obtained from the camera, we now have to match it against our database to recognize the individual. To classify the
Fig. 49.5 SVM classifier
access, we must find the person in the database of known individuals whose encoding is closest to that of our test image. A simple linear SVM classifier is used for this. The last stage of the process is therefore to train a classifier that takes the encoding of a test image and returns the matching person.
E. Support Vector Machines (SVMs)
An SVM is a classifier that computes the best decision boundary for separating the classes. The data are mapped to a high-dimensional feature space so that they can be categorized even when they are not linearly separable. An SVM is a discriminative classifier defined by a separating hyperplane: given labeled training data, it builds a hyperplane that categorizes new samples. The separation between the classes is found, and the data are transformed so that a hyperplane (decision boundary) can be drawn. If we have two classes f1 and f2, they are separated by the hyperplane for which the margin is maximum. The data points closest to this hyperplane are called support vectors (Fig. 49.5). Let (x_1, …, x_n) be the face feature vector obtained from the above encodings. Then, the equation of the line is given by:
$$f(x) = w_1 x_1 + w_2 x_2 + \dots + w_n x_n = \sum_{i=1}^{n} w_i x_i = w^T x.$$
So, the hypothesis h(x) can be given by:
$$h(x) = \begin{cases} 1, & \text{if } w^T x \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$
So, the cost function of SVM can be denoted by:
$$\min_{w} J(w) = \min_{w} \frac{1}{2} \sum_{j=1}^{n} (w_j)^2,$$
subject to the constraints
$$w^T x^{(i)} \ge +1 \ \text{if } y^{(i)} = 1, \qquad w^T x^{(i)} \le -1 \ \text{if } y^{(i)} = 0.$$
The result is a decision boundary that maximizes the margin from the support vectors. A boundary with the maximum margin is more robust than one with a smaller margin. Minimizing the above cost function subject to these constraints maximizes the margin, so we obtain the optimal decision boundary that classifies the face of the person from the database and test image (Fig. 49.6).
Fig. 49.6 Architecture of model
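The decision function, hypothesis, cost function, and margin constraints given above can be checked numerically on a toy example. The following is a minimal sketch in Python and not the authors' implementation: the 2-D points and the two candidate weight vectors are hypothetical values chosen for illustration (real face encodings would have 128 components), the labels follow the text's convention of 1 and 0, and, as in the equations above, no bias term is used.

```python
import math

def decision(w, x):
    """f(x) = w^T x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def hypothesis(w, x):
    """h(x) = 1 if w^T x >= 0, else 0."""
    return 1 if decision(w, x) >= 0 else 0

def cost(w):
    """J(w) = 1/2 * sum_j w_j^2."""
    return 0.5 * sum(wj * wj for wj in w)

def satisfies_constraints(w, samples):
    """Check w^T x >= +1 when y = 1 and w^T x <= -1 when y = 0 for every sample."""
    return all(
        decision(w, x) >= 1 if y == 1 else decision(w, x) <= -1
        for x, y in samples
    )

def margin_width(w):
    """Distance between the hyperplanes w^T x = +1 and w^T x = -1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

# Hypothetical, linearly separable toy data: (feature vector, label)
samples = [((2.0, 2.0), 1), ((3.0, 1.0), 1), ((-2.0, -2.0), 0), ((-1.0, -3.0), 0)]

w_small = (0.25, 0.25)  # feasible weights with a small cost
w_large = (1.0, 1.0)    # also feasible, but with a larger cost

for w in (w_small, w_large):
    print(w, satisfies_constraints(w, samples), cost(w), margin_width(w))
```

Both weight vectors satisfy the constraints, but the one with the smaller cost J(w) has the wider margin, which is why minimizing J(w) under the constraints yields the maximum-margin boundary.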
49.3.1 Result and Discussion
Table 49.1 lists the two datasets used, CelebA and Annotated Faces in the Wild, with the number of samples, identifiers, landmarks, attributes, and labels of each. Table 49.2 reports the results on CelebA for four algorithms: Support Vector Machine, K-Nearest Neighbor, RESNET, and VGGNET. Table 49.3 reports the results of the same four algorithms on Annotated Faces in the Wild. In both tables, the algorithms are compared on accuracy, time complexity, precision, recall, and F1-score.
Table 49.1 Dataset/details
| Dataset/details | Number of samples | Number of identifier | Number of landmark | Number of attribute | Number of label |
| --- | --- | --- | --- | --- | --- |
| CelebA | 202,599 | 10,177 | 5 | 40 | 64 |
| Annotated faces in the wild | 205 | 107 | 6 | 25 | 473 |
Table 49.2 Measured report of CelebA
| Algorithm | Accuracy (%) | Time complexity (s) | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- |
| Support Vector Machine | 80.22 | 485 | 0.81 | 0.79 | 0.78 |
| K-Nearest Neighbor | 99 | 128 | 0.98 | 0.99 | 0.96 |
| RESNET | 98 | 135 | 0.87 | 0.88 | 0.87 |
| VGGNET | 96 | 119 | 0.91 | 0.89 | 0.85 |
Table 49.3 Measured report of annotated faces in the wild
| Algorithm | Accuracy (%) | Time complexity (s) | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- |
| Support Vector Machine | 80.16 | 490 | 0.79 | 0.76 | 0.77 |
| K-Nearest Neighbor | 96 | 126 | 0.97 | 0.98 | 0.97 |
| RESNET | 97 | 125 | 0.83 | 0.89 | 0.89 |
| VGGNET | 98 | 123 | 0.89 | 0.9 | 0.89 |
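For reference, the metrics reported in Tables 49.2 and 49.3 are conventionally defined from the counts of true and false positives and negatives. The following minimal Python sketch shows those standard definitions for a binary case; the counts are hypothetical and are not taken from the chapter's experiments, and the function name is chosen only for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # share of predicted positives that are correct
    recall = tp / (tp + fn)             # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1

# Hypothetical confusion counts, for illustration only
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=30, tn=70)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
```

Because the F1-score is the harmonic mean of precision and recall, it always lies between the two.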
49.4 Conclusion
The crowd-monitoring system we proposed and implemented should help with crowd management in various public domains. The system has turned out to be reliable, with an accuracy of 99%. Face detection and recognition were implemented using dlib, which was efficient at detecting faces in the crowd. Technology is improving rapidly. Hardware improvements such as higher-definition cameras and better and faster data processing, along with improved facial-recognition algorithms and Python modules, should in the future allow faces to be detected and recognized more accurately. They should also help solve problems such as deep learning failing when parts of the face are obscured by sunglasses or when a person's hairstyle changes.
References
1. Singh, H., Kaur Student, J., Bhsbiet, Bt., Harjeet Singh Assistant, L., Bhsbiet, E.: Face detection and recognition: a review. https://www.researchgate.net/publication/323390774
2. Bhagat, S.: Face recognition attendance system. Int. J. Res. Appl. Sci. Eng. Technol. 10(1), 280–283 (2022). https://doi.org/10.22214/ijraset.2022.39702
3. Fukushima, K.: Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36(4), 193–202 (1980). https://doi.org/10.1007/BF00344251
4. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
5. Li, H., Lin, Z., Shen, X., Brandt, J., Hua, G.: A convolutional neural network cascade for face detection. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015, pp. 5325–5334 (2015). https://doi.org/10.1109/CVPR.2015.7299170
6. Wang, H., Li, Z., Ji, X., Wang, Y.: Face R-CNN (2017). Preprint at https://doi.org/10.48550/arxiv.1706.01061
7. Sun, X., Wu, P., Hoi, S.C.H.: Face detection using deep learning: an improved faster RCNN approach. Neurocomputing 299, 42–50 (2018). https://doi.org/10.1016/j.neucom.2018.03.030
8. Minaee, S., Luo, P., Lin, Z., Bowyer, K.: Going deeper into face detection: a survey (2021). Preprint at http://arxiv.org/abs/2103.14983
9. Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection (2016). https://doi.org/10.48550/arxiv.1612.03144
10. Du, H., Shi, H., Zeng, D., Zhang, X.-P., Mei, T.: The elements of end-to-end deep face recognition: a survey of recent advances (2020). Preprint at http://arxiv.org/abs/2009.13290
11. Chen, D., Ren, S., Wei, Y., Cao, X., Sun, J.: Joint cascade face detection and alignment. Springer, Singapore (2014)
12. Zhang, W., Zelinsky, G.J., Samaras, D.: Real-time accurate object detection using multiple resolutions. In: 2007 IEEE 11th International Conference on Computer Vision, pp. 1–8 (2007)
13. Park, D., Ramanan, D., Fowlkes, C.: Multiresolution models for object detection. LNCS 6314 (2010)
14. Said, Y., Atri, M., Tourki, R.: Human detection based on integral histograms of oriented gradients and SVM. In: 2011 International Conference on Communications, Computing and Control Applications, CCCA 2011 (2011). https://doi.org/10.1109/CCCA.2011.6031422
15. Mohammed, M.G., Melhum, A.I.: Implementation of HOG feature extraction with tuned parameters for human face detection. Int. J. Mach. Learn. Comput. 10(5), 654–661 (2020). https://doi.org/10.18178/ijmlc.2020.10.5.987
16. Dadi, H.S., Mohan Pillutla, G.K.: Improved face recognition rate using HOG features and SVM classifier. IOSR J. Electron. Commun. Eng. 11(04), 34–44 (2016). https://doi.org/10.9790/2834-1104013444
17. Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression trees (2014)
18. Ouanan, H., Ouanan, M., Aksasse, B.: Facial landmark localization: past, present and future. CIST 2016, 487–493 (2016). https://doi.org/10.1109/CIST.2016.7805097
19. Sabarmathi, G., Chinnaiyan, R.: Reliable machine learning approach to predict patient satisfaction for optimal decision making and quality health care. In: 2019 International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, pp. 1489–1493 (2019)
20. Sabarmathi, G., Chinnaiyan, R.: Big data analytics framework for opinion mining of patient health care experience. In: International Conference on Computing Methodologies and Communication (ICCMC 2020), IEEE Xplore Digital Library (2020)
21. Sabarmathi, G., Chinnaiyan, R.: Reliable feature selection model for evaluating patient home health care services opinion mining systems. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), pp. 1–4 (2021). https://doi.org/10.1109/ICAECA52838.2021.9675485
22. Sabarmathi, G., Chinnaiyan, R.: Envisagation and analysis of mosquito borne fevers: a health monitoring system by envisagative computing using big data analytics. In: ICCBI 2018, Springer, 19.12.2018 to 20.12.2018 (Recommended for Scopus Indexed Publication IEEE Xplore digital library) (2018)
23. Sabarmathi, G., Chinnaiyan, R.: Mining patient health care service opinions for hospital recommendations. Int. J. Eng. Trends Technol. 69(9), 161–167 (2021)
24. Hari Pranav, A., Senthilmurugan, M., Pradyumna Rahul, K., Chinnaiyan, R.: IoT and machine learning based peer to peer platform for crop growth and disease monitoring system using blockchain. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–5 (2021)
25. Latha, M., Senthilmurugan, M., Chinnaiyan, R.: Brain tumor detection and classification using convolution neural network models. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), pp. 1–5 (2021)
26. Senthilmurugan, M., Latha, M., Chinnaiyan, R.: Analysis and prediction of tuberculosis using machine learning classifiers. In: 2021 International Conference on Advancements in Electrical, Electronics, Communication, Computing and Automation (ICAECA), pp. 1–4 (2021)
27. Preetika, B., Latha, M., Senthilmurugan, M., Chinnaiyan, R.: MRI image based brain tumour segmentation using machine learning classifiers. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–9 (2021)
28. Chinnaiyan, R., Stalin Alex, D.: Early analysis and prediction of fetal abnormalities using machine learning classifiers. In: 2021 2nd International Conference on Smart Electronics and Communication (ICOSEC), pp. 1764–1767 (2021). https://doi.org/10.1109/ICOSEC51865.2021.9591828
29. Chinnaiyan, R., Alex, S.: Machine learning approaches for early diagnosis and prediction of fetal abnormalities. In: 2021 International Conference on Computer Communication and Informatics (ICCCI), pp. 1–3 (2021). https://doi.org/10.1109/ICCCI50826.2021.9402317
30. Chinnaiyan, R., Alex, S.: Optimized machine learning classifier for early prediction of fetal abnormalities. Int. J. Comput. Intell. Control 13(2) (2021)
Chapter 50
Performance Augmentation of DIMOS Transistor S. Jafar Ali Ibrahim, V. Jeya Kumar, N. S. Kalyan Chakravarthy, Alhaf Malik Kaja Mohideen, M. Mani Deepika, and M. Sathya
Abstract In this study, a tunnel diode with a high peak-to-valley ratio is buried within the source region of a partially depleted (PD) SOI n-MOSFET. Intrinsic germanium is added to the MOSFET source region at various micro-lengths. The result is compared across various embedded tunnel diode peak-to-valley ratios to determine the constant saturation drain current with a large voltage swing. Other phenomena that come from this device, such as low output resistance, capacitance, high voltage gain, and output signal distortion, were greatly reduced. The proposed device is one of the most promising candidates for low-power digital and analog circuits. The performance of this structure is evaluated from its I–V characteristics at various voltages using the COGENDA device simulator.
S. J. A. Ibrahim (B)
Department of IoT, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamilnadu 632 014, India
V. J. Kumar · M. M. Deepika
Department of ECE, QIS College of Engineering and Technology, Ongole, Andhra Pradesh, India
N. S. K. Chakravarthy
School of Computer Science and Engineering, QIS College of Engineering and Technology, Ongole, Andhra Pradesh, India
A. M. K. Mohideen
CFY Deanship, King Saud University, Riyadh, Saudi Arabia
M. Sathya
Department of Information Technology, Nadar Saraswathi College of Engineering and Technology, Theni, Tamilnadu, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_51
50.1 Introduction
Fundamentally, microelectronics is about making electronic chips smaller while improving their functionality. To appreciate the path taken in this field, recall that 50 years ago a small firm named Intel developed the 4004, the first microprocessor [1]. This chip, which measured 12 mm², was quickly regarded as the eighth wonder of the world because of its 2300 transistors. About 10 µm separated each transistor from the next. This is quite small, but still conceivable on a human scale, given that the transistors could be seen under a small microscope. When considering the transistors of one of the most recent Intel architectures, the Skylake CPUs, which were introduced by.
50.2 Literature Review
The FET is a popular biosensor for the label-free detection of charged biomolecules. With the idea of dielectric modulation of a vertical nanogap at the FET's gate, caused by the presence of biomolecules, it has also become possible to use FET biosensors for the detection and identification of biological macromolecules. Biosensors based on the dielectric-modulated FET (DMFET) [2] are sensitive to both the dielectric modulation and the charges of the biomolecules, but the two effects usually have opposing impacts on the device parameters, which decreases sensitivity. For short channel lengths, the DMFET also shows a strong dependence of its biomolecule-detection sensitivity on the nanogap length. We initially suggest in this letter that
P. Bergveld, of the Department of Electrical Engineering [3], Twente University of Technology, Enschede, The Netherlands [4], reviewed the development and application of FET-based biosensors. After a general description of biosensors, one type is considered in detail: the pH-sensitive ISFET, at the time the subject of clinical investigation for measuring intravascular blood pH.
Recently, silicon nanowires have been produced as transducers for extremely sensitive chemical sensors and biosensors. Because of their small size, high surface-to-volume ratio, and surface sensitivity to charged species, silicon nanowires (SiNWs) [5] are the subject of this inquiry. Significant work is being conducted to develop a new type of high-performance biochemical and chemical sensor using SiNWs as the sensitive functional units, with the prospect that SiNW production will be compatible with current silicon-based technology. In addition to the superior electrical properties obtained by combining sensing and digital signal processing in silicon, the integration of SiNW-based sensors will boost reliability [6] and decrease production costs. Several silicon nanowire synthesis methods have been examined in a number of recent investigations.
50.3 System Modeling
The diagram shows the standard p-channel I-MOS and p-channel DIMOS layouts. The following values were used in our simulation: silicon film thickness (TSi) = 100 nm, silicon film doping (SFD) = 2 × 10^15 cm−3, buried oxide thickness (TBOX) = 300 nm, gate oxide thickness (Tox) = 5 nm, nominal channel length (L) = 200 nm, nanogap thickness (TGAP) = 15 nm, and chromium gate length (LCr) = 25 nm. A gate work function (ΦM) of 4.6 eV [7] and a gate length (LG) of 125 nm are the additional parameters used for the I-MOS. The length of the intrinsic region (LIN) is the same for both devices. The doping profiles in the source and drain regions are modeled as Gaussian, with a source donor doping (ND) and a drain acceptor doping (NA) of 10^20 cm−3. Experimental support for creating the nanogap in the gate structure has been reported. The edges of the chromium gate structure are etched away, producing a nanogap that is filled with air and therefore has a dielectric constant (K) of 1. When biomolecules (K > 1) are adsorbed in the nanogap, the effective dielectric constant increases, which increases the gate capacitance and lowers the threshold voltage VT [8]. Detecting this VT shift paves the way for label-free biomolecule identification.
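The influence of the nanogap's dielectric constant can be illustrated with a simple parallel-plate estimate. The sketch below is not the device simulation used in the chapter: it assumes that the nanogap and the gate oxide act as two capacitors in series, and that the gate oxide is SiO2 with a relative permittivity of 3.9, neither of which is stated in the text. Only the thicknesses Tox = 5 nm and TGAP = 15 nm come from the parameter list above; the filled-gap values of K are illustrative.

```python
EPS0 = 8.854e-12   # vacuum permittivity, F/m
K_OXIDE = 3.9      # assumed relative permittivity of SiO2 (not stated in the text)
T_OX = 5e-9        # gate oxide thickness in metres (from the text)
T_GAP = 15e-9      # nanogap thickness in metres (from the text)

def gate_capacitance_per_area(k_gap):
    """Series combination of nanogap and gate-oxide capacitances, in F/m^2."""
    return EPS0 / (T_GAP / k_gap + T_OX / K_OXIDE)

# K = 1 is the air-filled gap; the larger values are illustrative filled-gap cases
for k_gap in (1.0, 2.0, 12.0):
    c = gate_capacitance_per_area(k_gap)
    print(f"K = {k_gap:4.1f}: C = {c * 1e3:.3f} mF/m^2")
```

Under these assumptions the capacitance rises monotonically with K, which is the qualitative trend behind the VT shift used for sensing.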
50.4 Summary with Data
To evaluate the sensitivity of the device in detecting biomolecules immobilized in the nanogap, we built a simulation of the DIMOS biosensor [9]. The immobilization event is simulated by changing the dielectric constant of the nanogap, since the dielectric constants of biomolecules differ from that of air.
Fig. 50.1 P-channel DIMOS ID versus VG characteristics
Following the technique outlined, K = 2 and K = 12 are chosen to represent biomolecules with low and high dielectric constants, respectively. The simulation includes band-to-band tunneling and uses Selberherr's model for impact ionization. The ID versus VG characteristics of the DIMOS biosensor are shown in Fig. 50.1 for VS = 7.8 V and VD = 0 V. Because the simulated device is a p-channel DIMOS [10], a negative gate-voltage sweep is used.
A. Effect of Gate Length (LG) at a fixed Cr Gate Length (LCr)
We used simulation to determine the LG value that maximizes the sensitivity of the DIMOS biosensor to biomolecule immobilization. The variation of VT and ION for various LG values is displayed. If the VT shift is used as the sensing parameter, the sensor is most sensitive for LG = 115 nm. DMFETs with a channel length of 300 nm have been shown to be less sensitive. The proposed DIMOS [11], in contrast, has a short channel length of 200 nm and still exhibits excellent sensitivity, so the device can be scaled down. When the DIMOS simulation is repeated with a channel length (L) of 1 µm, the biomolecule detection still works, which shows the value of the DIMOS concept.
B. Impact of Biomolecular Charges
Charges on the biomolecules have a substantial impact on DMFET biosensing. Depending on the polarity of the charges, the type of device (PMOS or NMOS) might have to be changed. Using LG = 115 nm [12] as the value that gives the best device sensitivity, we examined how the charges of the biomolecules affect the performance of the DIMOS biosensor. The change in VT and ION for various biomolecular charges is shown in Figs. 50.2 and 50.3. The charges on the biomolecules are modeled as fixed charges at the silicon-oxide interface. Unlike the conventional DMFET, in which the charge effect can even outweigh the dielectric effect, the DIMOS exhibits little variation in sensitivity with biomolecular charge.
Because the dielectric-modulation effect predominates in DIMOS, it is possible to detect biomolecules with different charge polarities using the same device type (p-channel or n-channel), which is not achievable with traditional DMFETs.
Fig. 50.2 ΔVT and ΔION for the DIMOS as a function of gate length, LG
Fig. 50.3 Horizontal electric field (EX) profile along the x-axis
50.5 Conclusions
The use of a dielectric-modulated impact-ionization MOS (DIMOS) transistor for biosensing has been proposed and investigated. In contrast to standard DMFETs, in which the charges of the biomolecules have a major effect on device sensitivity, the proposed structure exhibits excellent sensitivity for biomolecule detection and keeps dielectric modulation as the dominant effect. When low cost is the deciding criterion, the described sensor scales well across manufacturing technology nodes and could be an interesting alternative to FET biosensors at both older and more advanced nodes.
References
1. Bergveld, P.: The development and application of FET-based biosensors. Biosensors 2(1), 15–33 (1986)
2. Wenga, G., Jacques, E., Salaun, A.-C., Rogel, R., Pichon, L., Geneste, F.: Step-gate polysilicon nanowires field effect transistor compatible with CMOS technology for label-free DNA biosensor. Biosens. Bioelectron. 40, 141–146 (2013)
3. Guan, W., Duan, X., Reed, M.A.: Highly specific and sensitive nonenzymatic determination of uric acid in serum and urine by extended gate field effect transistor sensors. Biosens. Bioelectron. 51, 225–231 (2014)
4. Im, H., Huang, X.J., Gu, B., Choi, Y.K.: A dielectric-modulated field-effect transistor for biosensing. Nat. Nanotechnol. 2(7), 430–434 (2007)
5. Kim, C.H., Jung, C., Lee, K.B., Park, H.G., Choi, Y.K.: Label-free DNA detection with a nanogap embedded complementary metal oxide semiconductor. Nanotechnology 22(13), 135502 (2011)
6. Gu, B., Park, T.J., Ahn, J.H., Huang, X.J., Lee, S.Y., Choi, Y.K.: Nanogap field-effect transistor biosensors for electrical detection of Avian influenza. Small 5(21), 2407–2412 (2009)
7. Kim, C.H., Jung, C., Park, H.G., Choi, Y.K.: Novel dielectric modulated field-effect transistor for label-free DNA detection. BioChip J. 2(2), 127–134 (2008)
8. Choi, J.M., Han, J.W., Choi, S.J., Choi, Y.K.: Analytical modeling of a nanogap-embedded FET for application as a biosensor. IEEE Trans. Electron Dev. 57(12), 3477–3484 (2010)
9. Gopalakrishnan, K., Griffin, P.B., Plummer, J.D.: Impact ionization MOS (I-MOS)-Part I: device and circuit simulations. IEEE Trans. Electron Dev. 52(1), 69–76 (2005)
10. Gopalakrishnan, K., Woo, R., Jungemann, C., Griffin, P.B., Plummer, J.D.: Impact ionization MOS (I-MOS)-Part II: experimental results. IEEE Trans. Electron Dev. 52(1), 77–84 (2005)
11. Jafar Ali Ibrahim, S., et al.: Rough set based on least dissimilarity normalized index for handling uncertainty during E-learners learning pattern recognition. Int. J. Intell. Netw. 3, 133–137 (2022). ISSN 2666-6030. https://doi.org/10.1016/j.ijin.2022.09.001. https://www.sciencedirect.com/science/article/pii/S2666603022000148
12. Jeyaselvi, M., Jayakumar, C., Sathya, M., Jafar Ali Ibrahim, S., Kalyan Chakravarthy, N.S.: Cyber security-based Multikey management system in cloud environment. In: 2022 International Conference on Engineering and Emerging Technologies (ICEET), Kuala Lumpur, Malaysia, pp. 1–6 (2022). https://doi.org/10.1109/ICEET56468.2022.100071044. https://ieeexplore.ieee.org/abstract/document/10007104
Chapter 51
A Hyperparameter Tuned Ensemble Learning Classification of Transactions over Ethereum Blockchain Rohit Saxena, Deepak Arora, Vishal Nagar, Satyasundara Mahapatra, and Malay Tripathi
Abstract The transactions of Ethereum are recorded on a decentralized ledger that is open to everyone. Behind a false identity, i.e., a pseudonym called an address, the true identity of the Ethereum Blockchain user is hidden. Ethereum is commonly utilized in criminal activities including gambling and ransomware threats. The activities and addresses of the numerous malevolent cybercriminal users must therefore be categorized. The Blockchain’s open data allow for a thorough investigation. This paper classifies Ethereum Blockchain addresses and users’ activities through the application of ensemble machine learning (ML) models. Randomized search on the hyperparameters is performed to improve the classifications, enhancing the models’ accuracy. In this research, the classification models are evaluated based on cross-validation (CV) accuracy, Precision, F1-score, and Recall. Random Forest has emerged as the best classification model with a CV accuracy of 59.14%, while Adaptive Boosting as the worst classification model with a CV accuracy of 54.7%. The employment of randomized search has improved the CV accuracy of the classification models. The outcomes demonstrate that it is practically possible to recognize malevolent users’ addresses and their activities. Additionally, the CV accuracy of the classification models can be optimized by the use of hyperparameter techniques.
R. Saxena (B) · D. Arora
Amity University Uttar Pradesh, Lucknow Campus, Lucknow, India
V. Nagar · S. Mahapatra · M. Tripathi
Pranveer Singh Institute of Technology, Kanpur, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_52
51.1 Introduction
Over the past few years, there have been great advancements in Blockchain technology [1]. It was originally proposed by Bitcoin's creator, Nakamoto [2], in order to advance the Bitcoin cryptocurrency [3]. Following the launch of smart contracts in Ethereum, the concept has been extended to a number of different financial applications, and a greater variety of industries are starting to use it. The Blockchain network allows anyone to join, making it possible to distinguish between the players of the different ecosystems [4]. For instance, some players (known as miners) look after the Blockchain network, while others manage auctions, offer gaming or other services, and so on. The protection of Internet banking systems is greatly enhanced by anti-money laundering regulations, but implementing these regulations is expensive. Blockchain-based cryptocurrencies allow network-level investigation and forensic analysis because their ledger is publicly accessible [5]. However, because of the network's dynamic nature, massive data volume, and high degree of anonymity, evaluating and investigating transactional activities on the Blockchain takes a long time. The high degree of anonymity allows shady individuals to use Blockchain-based cryptocurrencies and easily hide from view. Although money laundering can in principle be prevented, current anti-money laundering strategies are ineffective [6]. As a result, an efficient method for identifying suspect people or activities is needed for online transactions. The rapid rise in popularity of cryptocurrencies like Bitcoin and Ethereum has opened a new avenue for research into emerging technological and economic problems [7]. Besides being unreliable, Bitcoin and Ethereum have also been used and abused in a number of instances, which has caused significant financial losses [7–9]. For instance, cybercriminals impair network security by engaging in illegal operations like ransomware attacks, illegal drug trade, and the trading of illicit products [6, 10, 11], or by using cryptocurrency with Blockchain technology to gain privacy. Blockchain-based cryptocurrencies face increased security threats as a result of unauthorized activities like fraud, heists, phishing, and hacking.
In order to prevent such catastrophic failures, it is essential to identify any malevolent users' addresses and activities as soon as possible. Several methods have been developed for spotting malevolent users or transactions by examining the transactional logs of entities on Blockchain networks; all of them, however, rely on a labor-intensive and inefficient manual or partially automated analysis [12, 13]. To increase the security of the Blockchain network, the emphasis is on developing automated analytical software that can identify the publicly visible addresses of Ethereum Blockchain users involved in malicious activities. Moreover, manual investigation of suspicious addresses is often impractical because of the growing amount of data being written to the Blockchain [14]. There are numerous studies [6, 15, 16] addressing the detection of malicious activity in the Bitcoin network, but they do not directly apply to Ethereum because of important differences between the two [17]. In the Ethereum Blockchain, the exact amount is transferred between accounts, without change. A single Bitcoin transaction, in contrast, can have a large number of inputs and outputs, combining various Unspent Transaction Outputs (UTXOs) to pay or send money to numerous beneficiaries, whereas an Ethereum transaction permits only one sender and one recipient. Bitcoin and Ethereum also differ significantly in block processing time, the way miners are rewarded, the amount of money available, and transaction prices [7–9, 17]. Utilizing the potential of robust machine learning algorithms for automating
the discovery process is an attractive way to approach the challenge of identifying malevolent entities on a cryptocurrency network. This research attempts to classify Ethereum Blockchain addresses as malevolent or non-malevolent, and subsequently the users' activities as malevolent or non-malevolent. To examine the transactional data of various entities and understand the behavior of malevolent and non-malevolent users, four ensemble ML classification models are employed: Adaptive Boosting (AB), Bagging (BG), eXtreme Gradient Boosting (XGB), and Random Forest (RF). Hyperparameter tuning is also applied to improve the cross-validation accuracy.
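The combination of ensemble classifiers and randomized hyperparameter search described here can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins generated with `make_classification`, the search spaces are hypothetical, and XGB is left out so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the address features (hypothetical data)
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=0)

# Hypothetical search spaces, chosen only for illustration
candidates = {
    "RF": (RandomForestClassifier(random_state=0),
           {"n_estimators": [20, 50, 100], "max_depth": [3, 5, None]}),
    "AB": (AdaBoostClassifier(random_state=0),
           {"n_estimators": [20, 50, 100], "learning_rate": [0.1, 0.5, 1.0]}),
    "BG": (BaggingClassifier(random_state=0),
           {"n_estimators": [10, 20, 50], "max_samples": [0.5, 0.75, 1.0]}),
}

results = {}
for name, (model, space) in candidates.items():
    # Sample 4 random configurations and score each by 3-fold CV accuracy
    search = RandomizedSearchCV(model, space, n_iter=4, cv=3,
                                scoring="accuracy", random_state=0)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

for name, (score, params) in results.items():
    print(f"{name}: CV accuracy = {score:.3f}, best params = {params}")
```

Randomized search evaluates only a fixed number of sampled configurations per model, which keeps the cost bounded when the search space is large.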
51.2 Related Work
Machine learning methods have been investigated in a number of studies for their potential to identify malevolent users in cryptocurrency networks [15, 16, 18–20], as well as for data accuracy prediction [21–24]. Previous studies have mostly concentrated on the Bitcoin Blockchain owing to its prominence and widespread adoption. Monamo et al. [16] employ clustering, an unsupervised ML technique, to initially label transactions recorded on the Bitcoin Blockchain network. In deciding whether a transaction was legal or illegal, the researchers made many assumptions about what constitutes suspicious behavior. After clustering all the transactions, the authors designated as anomalous the 1% of transactions with the largest distance from the center of their respective clusters. Analyzing the earlier work under this constraint requires a number of assumptions, some of which may not be supported by sound reasoning. In [15], anomaly detection methods for Bitcoin transactions are put forward with the intention of identifying suspicious individuals or transactions. Pham and Lee [15] rest on the major premise that suspicious behavior is a good proxy for spotting dishonest individuals or transactions. The method is evaluated on whether it can spot transactions that belong to a known set of fraudulent transactions. Several studies [10, 25] have examined particular criminal acts. The research in [10] attempts to identify Ponzi schemes, which are fraudulent investments, using data mining techniques. Although this is promising, most criminal activities are novel, and their behavioral patterns do not correspond to any known pattern. This makes it difficult to train a learning system that can predict an illegal act in unseen data.
Therefore, it is unlikely that the method for identifying Ponzi schemes will be helpful when very sophisticated attacks are conducted. Various other ML techniques have been presented to de-anonymize [19, 26–28] or to enhance the anonymity of [28–30] the Blockchain network, as well as to detect anomalies in the Bitcoin Blockchain. For instance, Harlev et al. [19] use supervised ML methods to lessen the degree of anonymity of the Bitcoin Blockchain. Utilizing a preprocessed dataset, Harlev et al. [19] manually classified and grouped addresses into categories according to their behavioral characteristics. The datasets contributed
R. Saxena et al.
to [19, 26] have already been manually or statistically labeled, and the viability of the suggested methods is reliant on the availability and accessibility of such preprocessed data. Saxena et al. [20] presented a solution for Bitcoin anonymity by applying a backpropagation neural network to Bitcoin Blockchain transactions. Saxena et al. [31] classified Ethereum Blockchain addresses using linear, nonlinear, and ensemble ML models. Saxena et al. [32] utilized the capabilities of ensemble learning to classify Bitcoin addresses as involved in lawful or unlawful acts.
51.3 Data Preparation Gathering the dataset from multiple repositories and preparing it for analysis is the key step in this research (Fig. 51.1). Analysis of behavioral patterns is challenging due to the enormity of the Ethereum Blockchain, which contains many entities with diverse transactional patterns. Each user is assigned a public address on the Ethereum network. These public addresses are used to conduct business and move funds [33]. Additionally, certain metadata about the addresses, including the labels attached to each address, is picked manually from the Ethereum Block Explorer by data crawling. The Ethereum ecosystem has linked these labels to addresses because of the addresses' erratic behavior. The labeled dataset, comprising malevolent and non-malevolent addresses, is acquired by crawling the data that is accessible in the standard Ethereum Block Explorer. Non-malevolent user activities include travel, trading, ticketing, and music, whereas malevolent user behaviors are labeled as phishing, scams, adults, and hack. A sample of the unlabeled dataset is extracted from Blockchair (a universal Blockchain explorer and search engine) in order to produce the feature-rich dataset. The two datasets are merged on the transaction hash, which is the feature common to both. The merged dataset is then normalized with a Min–max scaler, which rescales each feature to a fixed range (typically 0 to 1). Min–max normalization is one of the most often used data normalizing methods.
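The merge and normalization steps can be illustrated with a minimal, dependency-free sketch. The record layout, the key name `transaction_hash`, and the sample values are assumptions made for illustration only; the chapter does not list the actual feature names. Min–max normalization maps each value x to (x - min) / (max - min).

```python
def merge_on_hash(labeled, features, key="transaction_hash"):
    """Inner-join two lists of dict records on a shared key (hypothetical schema)."""
    by_hash = {row[key]: row for row in features}
    # Keep only labeled rows whose hash also appears in the feature dataset.
    return [{**by_hash[row[key]], **row} for row in labeled if row[key] in by_hash]


def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative records only; not taken from the chapter's dataset.
labeled = [
    {"transaction_hash": "0xa1", "label": "phishing"},
    {"transaction_hash": "0xb2", "label": "music"},
]
features = [
    {"transaction_hash": "0xa1", "value": 10.0},
    {"transaction_hash": "0xb2", "value": 30.0},
    {"transaction_hash": "0xc3", "value": 20.0},  # unlabeled, dropped by the join
]

merged = merge_on_hash(labeled, features)
scaled_values = min_max_scale([row["value"] for row in merged])
```

In practice, a library scaler fitted on the training split only would avoid leaking validation statistics into training; the sketch scales a single column for clarity.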
51.4 Methodology The overall classification process for users’ activities and addresses is shown in Figs. 51.2 and 51.3. For classification, ensemble learning models, namely Adaptive Boosting (AB), Bagging (BG), Random Forest (RF), and eXtreme Gradient Boosting (XGB), are employed in this research. The models were also trained with tuned hyperparameters to obtain the best classification accuracy. Figure 51.2 shows the regular classification process. It involves selecting the classifiers, splitting the dataset into training and evaluation sets, training and validating the classification models on the dataset, and comparing and examining the results of the classification.
51 A Hyperparameter Tuned Ensemble Learning Classification …
Fig. 51.1 Preparation of data
Fig. 51.2 Regular classification process
Fig. 51.3 Hyperparameter-tuned classification process
In Fig. 51.3, the hyperparameter-tuned classification is depicted. It involves selecting the classifiers, splitting the dataset into training and validation sets, obtaining the hyperparameters using the randomized search technique, and finally comparing and examining the results of the classification models. The Scikit-Learn packages are employed to train and validate the classifiers on Google’s Colaboratory. For model training and validation of the users’ activities, the malevolent/non-malevolent activity label is used as the target variable (y) and the remaining parameters are used as the input features (x).
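The idea behind randomized search can be sketched without any library: sample a fixed number of hyperparameter combinations at random, score each one, and keep the best. The search space and the scoring function below are hypothetical stand-ins; in the chapter's setting the score would be the mean cross-validation accuracy of a trained classifier, and scikit-learn offers this functionality through `RandomizedSearchCV`.

```python
import random


def randomized_search(param_space, score_fn, n_iter=10, seed=0):
    """Evaluate n_iter randomly sampled parameter combinations and return the best."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter, independently and uniformly.
        params = {name: rng.choice(choices) for name, choices in param_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space, for illustration only.
space = {
    "n_estimators": [50, 100, 130, 200],
    "max_depth": [2, 4, 8],
    "min_samples_leaf": [1, 2, 4],
}


def toy_score(params):
    """Stand-in for mean CV accuracy; peaks at n_estimators=130, max_depth=4."""
    return -abs(params["n_estimators"] - 130) - abs(params["max_depth"] - 4)


best_params, best_score = randomized_search(space, toy_score, n_iter=20, seed=42)
```

Unlike a grid search, the number of evaluations is set by `n_iter` rather than by the size of the full grid, which is what keeps the cost bounded as more hyperparameters are added.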
51.5 Results and Discussion Training and validation of the ensemble learning model over the dataset for classifying the Ethereum user address and activities into non-malevolent and malevolent has been carried out in this research. Table 51.1 illustrates the proportion of Ethereum addresses used for the user activity.
Table 51.1 Share of Ethereum addresses
Category        Users’ activities  Address count  Users’ address (%)
Malevolent      Scam               403,546        9.06
Malevolent      Adult              632,564        14.20
Malevolent      Phishing           685,849        15.40
Malevolent      Hack               750,489        16.85
Non-malevolent  Music              525,087        11.79
Non-malevolent  Charity            438,163        9.84
Non-malevolent  Trading            456,213        10.24
Non-malevolent  Donate             562,032        12.62
Total                              4,453,943
Figure 51.4 displays the percentage share of Ethereum Blockchain users’ addresses in the available dataset that are used for malevolent and non-malevolent activities, broken down by users’ activity. The fraction of addresses associated with malevolent activities is about 55.5%, compared with about 44.5% for non-malevolent activities. The metrics F1-score, Recall, Precision, and Cross-Validation (CV) accuracy are used to assess and compare the learning algorithms. CV accuracy is the ratio of correct predictions to all input samples. Precision is the ratio of correctly predicted positive outcomes (true positives) to all outcomes that the classification model predicts as positive. It ranges from 0 to 1, with 1 being the highest level of Precision. Recall is the ratio of true positives to all actual positive samples; it also ranges from 0 to 1, with 1 representing the highest level of Recall. The F1-score is the harmonic mean of Recall and Precision and likewise takes values in the range of 0 to 1. The classification report for the models is displayed in Table 51.2. The classification report of the various models when applied to user activities on the Ethereum Blockchain is graphically represented in Fig. 51.5. When trained on the
Fig. 51.4 Share of Ethereum addresses (%) (bar chart of the percentage shares in Table 51.1; Malevolent: Scam, Adult, Phishing, Hack; Non-Malevolent: Music, Charity, Trading, Donate)
Table 51.2 Classification report
Classification models  CV accuracy (%)  Precision  Recall  F1-score
AB                     54.7             0.49       0.6     0.63
BG                     55.54            0.53       0.53    0.54
RF                     59.14            0.51       0.55    0.53
XGB                    56.08            0.47       0.61    0.42
dataset, Random Forest (RF) yields the best CV accuracy (59.14%), while the remaining models produce lower CV accuracy. After this initial classification, the hyperparameters are tuned using the random search strategy. Rather than exhaustively evaluating every possible combination, as a grid search does, random search evaluates a random sample of the combinations. A random search can outperform a grid search [34]. Table 51.3 shows the hyperparameters obtained for the ensemble learning models. These hyperparameters are used to train and validate the classification models. The use of randomized search has improved the CV accuracy of the classification models.
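The evaluation metrics reported in Table 51.2 follow directly from their definitions. The sketch below computes accuracy, Precision, Recall, and F1-score for a binary problem from lists of true and predicted labels; the label encoding (1 for the positive class) and the sample labels are assumptions for illustration and are not the chapter's data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative labels: 1 = malevolent, 0 = non-malevolent.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
```

Because the F1-score is a harmonic mean, it always lies between the Precision and Recall values it is computed from.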
Fig. 51.5 Comparison of ensemble classification model (bar chart; x-axis: ensemble classification models AB, BG, RF, XGB; y-axis: CV accuracy (%))
Table 51.3 Hyperparameters for ensemble learning algorithms
Ensemble learning                 Hyperparameter
Adaptive Boosting (AB)            'algorithm': 'SAMME', 'n_estimators': 23, 'learning_rate': 1.05
Bagging (BG)                      'max_samples': 12, 'max_features': 6, 'n_estimators': 140
Random Forest (RF)                'max_features': 'sqrt', 'min_samples_leaf': 2, 'bootstrap': True, 'max_depth': 4, 'n_estimators': 130, 'min_samples_split': 12
eXtreme Gradient Boosting (XGB)   'colsample_bytree': 0.6, 'gamma': 0.5, 'min_child_weight': 1, 'learning_rate': 1, 'max_depth': 4
Table 51.4 Regular versus randomized search
Classification models  Regular (%)  Randomized search (%)
AB                     56.08        57.28
BG                     55.54        57.43
RF                     54.7         59.35
XGB                    59.14        63.82
Fig. 51.6 Comparison of regular versus hyperparameter-tuned CV accuracy (bar chart; x-axis: AB, BG, RF, XGB; y-axis: CV accuracy (%); series: Regular, Randomized Search)
Table 51.4 compares the regular CV accuracy with the CV accuracy obtained after applying the randomized search strategy. Figure 51.6 depicts the same comparison and shows that classification with tuned hyperparameters outperforms the regular classification.
51.6 Conclusion and Future Work This research performs a binary classification to determine the proportion of Ethereum Blockchain addresses participating in malevolent and non-malevolent activities, respectively. The capability of ensemble learning models to discriminate between Ethereum Blockchain users involved in malevolent and non-malevolent activities has been examined. These models are assessed using CV accuracy, Recall, Precision, and F1-score. To improve the models’ CV accuracy, randomized search, a hyperparameter tuning approach, is applied. Randomized search increased the CV accuracy of every model, by between 1.20 and 4.68 percentage points (Table 51.4). This research shows that it is feasible to classify Ethereum Blockchain addresses into malevolent and non-malevolent activities, indicating that Ethereum users who are engaged in malevolent activities can be identified. With the help of this binary classification, researchers can possibly de-anonymize Ethereum users.
Although the results indicate that using tuned hyperparameters has improved cross-validation accuracy, the resulting CV accuracy is on the lower end. Results can be enhanced using heuristic feature selection methods, hybrid algorithms, or both.
References
1. Zheng, Z., Xie, S., Dai, H.N., Chen, X., Wang, H.: Blockchain challenges and opportunities: a survey. Int. J. Web Grid Serv. 14(4), 352–375 (2018)
2. Nakamoto, S.: Re: Bitcoin P2P e-cash paper. In: The Cryptography Mailing List, pp. 1–2 (2008)
3. Mukhopadhyay, U., Skjellum, A., Hambolu, O., Oakley, J., Yu, L., Brooks, R.: A brief survey of cryptocurrency systems. In: 2016 14th Annual Conference on Privacy, Security and Trust (PST), pp. 745–752 (2016)
4. Bonifazi, G., Corradini, E., Ursino, D., Virgili, L.: A social network analysis-based approach to investigate user behaviour during a cryptocurrency speculative bubble. J. Inf. Sci. 01655515211047428 (2021)
5. Cheng, Z., Hou, X., Li, R., Zhou, Y., Luo, X., Li, J., Ren, K.: Towards a first step to understand the cryptocurrency stealing attack on Ethereum. In: 22nd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2019), pp. 47–60 (2019)
6. Weber, M., Domeniconi, G., Chen, J., Weidele, D.K.I., Bellei, C., Robinson, T., Leiserson, C.E.: Anti-money laundering in bitcoin: experimenting with graph convolutional networks for financial forensics (2019). arXiv preprint arXiv:1908.02591
7. Conti, M., Kumar, E.S., Lal, C., Ruj, S.: A survey on security and privacy issues of bitcoin. IEEE Commun. Surv. Tutor. 20(4), 3416–3452 (2018)
8. Atzei, N., Bartoletti, M., Cimoli, T.: A survey of attacks on Ethereum smart contracts (SoK). In: International Conference on Principles of Security and Trust, pp. 164–186. Springer, Berlin, Heidelberg (2017)
9. Meng, W., Tischhauser, E.W., Wang, Q., Wang, Y., Han, J.: When intrusion detection meets blockchain technology: a review. IEEE Access 6, 10179–10188 (2018)
10. Bartoletti, M., Pes, B., Serusi, S.: Data mining for detecting bitcoin ponzi schemes. In: 2018 Crypto Valley Conference on Blockchain Technology (CVCBT), pp. 75–84 (2018)
11. Saxena, R., Arora, D., Nagar, V., Mahapatra, S.: Bitcoin: a digital cryptocurrency. In: Blockchain Technology: Applications and Challenges, pp. 13–28. Springer, Cham (2021)
12. Vasek, M., Moore, T.: Analyzing the Bitcoin Ponzi scheme ecosystem. In: International Conference on Financial Cryptography and Data Security, pp. 101–112. Springer, Berlin, Heidelberg (2018)
13. Brenig, C., Müller, G.: Economic analysis of cryptocurrency backed money laundering (2015)
14. Jovicic, S., Tan, Q.: Retracted: machine learning for money laundering detection in the block chain financial transaction system. J. Fundam. Appl. Sci. 10(4S), 376–381 (2018)
15. Pham, T., Lee, S.: Anomaly detection in the bitcoin system - a network perspective (2016). arXiv preprint arXiv:1611.03942
16. Monamo, P.M., Marivate, V., Twala, B.: A multifaceted approach to Bitcoin fraud detection: global and local outliers. In: 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 188–194 (2016)
17. Chen, T., Li, Z., Zhu, Y., Chen, J., Luo, X., Lui, J.C.S., Lin, X., Zhang, X.: Understanding Ethereum via graph analysis. ACM Trans. Internet Technol. 20(2), 1–32 (2020)
18. Rahouti, M., Xiong, K., Ghani, N.: Bitcoin concepts, threats, and machine-learning security solutions. IEEE Access 6, 67189–67205 (2018)
19. Harlev, M.A., Sun Yin, H., Langenheldt, K.C., Mukkamala, R., Vatrapu, R.: Breaking bad: de-anonymising entity types on the bitcoin blockchain using supervised machine learning. In: Proceedings of the 51st Hawaii International Conference on System Sciences (2018)
20. Saxena, R., Arora, D., Nagar, V.: Integration of back-propagation neural network to classify of cybercriminal entities in blockchain. In: Proceedings of Trends in Electronics and Health Informatics, pp. 523–532. Springer, Singapore (2022)
21. Karthik, S., Bhadoria, R., Lee, J., Sivaraman, A.K., Samanta, S., Balasundaram, A., Chaurasia, B., Ashokkumar, S.: Prognostic Kalman filter based Bayesian learning model for data accuracy prediction. Comput. Mater. Contin. 72, 243–259 (2022). https://doi.org/10.32604/cmc.2022.023864
22. Singh, R., Agarwal, B.B.: Automatic image classification and abnormality identification using machine learning. In: Proceedings of Trends in Electronics and Health Informatics: TEHI 2021, pp. 13–20. Springer Nature Singapore, Singapore (2022)
23. Singh, R., Agarwal, B.B.: An automated brain tumor classification in MR images using an enhanced convolutional neural network. Int. J. Inf. Technol. 17, 1 (2022)
24. Singh, R., Agarwal, B.B.: A hybrid approach for detection of brain tumor with levy flight cuckoo search. Webology 19(1), 5388–5401 (2022)
25. Chen, W., Zheng, Z., Ngai, E.C.H., Zheng, P., Zhou, Y.: Exploiting blockchain data to detect smart ponzi schemes on Ethereum. IEEE Access 7, 37575–37586 (2019)
26. Ermilov, D., Panov, M., Yanovich, Y.: Automatic bitcoin address clustering. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 461–466 (2017)
27. Meiklejohn, S., Pomarole, M., Jordan, G., Levchenko, K., McCoy, D., Voelker, G.M., Savage, S.: A fistful of bitcoins: characterizing payments among men with no names. In: Proceedings of the 2013 Conference on Internet Measurement Conference, pp. 127–140 (2013)
28. Möser, M., Böhme, R.: Anonymous alone? Measuring bitcoin’s second-generation anonymization techniques. In: 2017 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), pp. 32–41 (2017)
29. Ziegeldorf, J.H., Matzutt, R., Henze, M., Grossmann, F., Wehrle, K.: Secure and anonymous decentralized bitcoin mixing. Futur. Gener. Comput. Syst. 80, 448–466 (2018)
30. Möser, M., Böhme, R.: The price of anonymity: empirical evidence from a market for bitcoin anonymization. J. Cybersec. 3(2), 127–135 (2017)
31. Saxena, R., Arora, D., Nagar, V.: Classifying transactional addresses using supervised learning approaches over Ethereum blockchain. Proc. Comput. Sci. 1(218), 2018–2025 (2023)
32. Saxena, R., Arora, D., Nagar, V.: Efficient blockchain addresses classification through cascading ensemble learning approach. Int. J. Electron. Secur. Digit. Forens. 15(2), 195–210 (2023)
33. Buterin, V.: A next-generation smart contract and decentralized application platform. White Paper 3.37, 2-1 (2014)
34. Liashchynskyi, P., Liashchynskyi, P.: Grid search, random search, genetic algorithm: a big comparison for NAS (2019). arXiv preprint arXiv:1912.06059
Chapter 52
Data Integrity Protection Using Multi-level Reconstructive Error Data and Auditing for Cloud Storage Kaushik Sekaran, B. Seetharamulu, J. Kalaivani, Vijayalaxmi C. Handaragall, and B. Venkatesh
Abstract Cloud storage services allow users/clients to store large quantities of data and to access that data anytime from anywhere. Clients can edit or view their data at any time, but charges may apply for the data they use. In this paper, we introduce a system that enables data replication, integrity checking, and data recovery for cloud data. This approach can increase data integrity in cloud-based storage. A hybrid approach using replication along with a coding technique is used to tackle reconstruction. Erasure coding of data is coupled with extra parity bits to reduce the reconstruction cost. This reduces reconstruction time while maintaining a reasonable storage overhead. Overall, our approach is cost-effective and increases data integrity protection in cloud storage.
K. Sekaran (B) · B. Seetharamulu
Department of Computer Science and Engineering, Faculty of Science and Technology (Icfai Tech), ICFAI Foundation for Higher Education, Hyderabad, India
J. Kalaivani
Computing Technologies, SRM Institute of Science and Technology, Kattankulathur, Chennai, India
V. C. Handaragall
School of Computing and Information Technology, REVA University, Bengaluru, India
B. Venkatesh
Department of Computer Science and Engineering (AIML&IoT), VNR Vignana Jyothi Institute of Engineering & Technology, Hyderabad, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_53
52.1 Introduction Cloud computing is growing at a rapid pace, driven in particular by the demand for secure cloud storage. When the Pay-As-You-Go (PAYG) cloud computing model is offered to the public, it is termed a public cloud. Internal data centers of institutions, businesses, or organizations that are not made available to the general public are termed private clouds. Cloud computing gives users on-demand access to seemingly infinite resources, which is useful for scalability and for handling load surges. Cloud computing also does not require users to pay lump-sum upfront charges. This allows even small-to-medium enterprises to utilize the cloud and scale usage as and when needed. In recent times, cloud storage has gained popularity due to its scalability as well as its low setup costs. Because the data in cloud storage is outsourced to a third party, security concerns arise. It must always be possible to check the integrity of data outsourced to the cloud and to recover the data if it is lost at the public cloud storage provider due to corruption or a malicious attacker. If data outsourced to cloud storage is found to be corrupted during an integrity check, the next step is to recover the original data. Recovery of a failed server's data can be achieved through proper data encoding techniques. If the entire data is on a single server, it results in a single point of failure [1]. Hence, it is logical to distribute the data across multiple servers. Microsoft’s cloud storage service, Azure, uses replication to maintain data integrity and recovery in case of failure. The data is replicated three times in RAID arrays [2]. If the data can be coded such that reconstruction is possible, the triple replication scheme can be avoided. Removing a replication-based scheme brings tremendous cost savings, as the total data stored is in the range of exabytes.
Less hardware is required for storing the same amount of data with the same fault tolerance level [3]. When erasure coding is used instead of replication, however, performance is impacted negatively. In case of a disk, node, or rack failure, reconstruction is initiated. If the end user requests the same data, the data cannot be served instantly. Hence, the aim is to minimize reconstruction time. The main goals are:
• Minimize the number of fragments that must be read during reconstruction. This reduces network overhead and I/O operations. It also reduces the overall read time for reconstruction.
• Reduce storage overhead compared to replication.
52.2 Literature Survey Static data integrity checking was first considered by Juels and Kaliski [4] and Ateniese et al. [5]. These techniques were designed for single-server systems, which limits their use in a multi-node environment. If the single server is under the control of an attacker, integrity checks will reveal corrupted data, but reconstruction is impossible in this case. For a multi-server setting, some of the techniques proposed are replication [6], erasure coding [7], and regenerating coding [8]. Here, data is encoded and distributed across multiple servers. In case of a failure, data can be reconstructed from a subset of the servers. The authors of [9–11] proposed models for data optimization and data deduplication aimed at energy improvements in the cloud. Chen et al. [8] propose regenerating codes based on the POR scheme by Shacham and Waters [12]. This work assumes that the server has encoding capabilities, so it is not applicable in the thin-cloud setting used by cloud storage providers. HAIL [7] is based on erasure coding. It is developed for per-file storage and is a service-oriented version of RAID. The cloud storage architectures described by Kaushik et al. [13, 14] are useful for analyzing erasure codes for cloud data. Kaushik et al. [15, 16] also describe the core security module for data reconstruction.
52.3 Proposed System The proposed system employs multi-level local reconstruction coding (MLRC). A parity-based code is used for data reconstruction while providing higher recovery speed. In case of data loss or corruption, whether accidental or caused by a malicious attacker, the following procedure is followed:
Algorithm
1. Verify the data fragments in the cloud server.
2. If data is corrupted, check the global parities.
3. If a global parity is corrupted, use the multi-level parity to recover the global parity.
4. Follow the local reconstruction technique for data fragment loss.
MLRC coding provides the data reconstruction, and a third-party auditor (TPA) is employed for integrity checking. Figure 52.1 represents the cascade layers of storage.
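The recovery procedure above can be read as a small decision routine. The following sketch only plans the ordered sequence of steps from two boolean check results; the function name and the action labels are hypothetical, and the actual verification and reconstruction logic is outside its scope.

```python
def plan_recovery(data_ok, global_parity_ok):
    """Return the ordered recovery actions for the outcome of the two checks.

    data_ok: result of verifying the data fragments (step 1).
    global_parity_ok: result of checking the global parities (step 2).
    """
    actions = ["verify_data_fragments"]
    if data_ok:
        return actions  # nothing to repair
    actions.append("check_global_parities")
    if not global_parity_ok:
        # Step 3: rebuild the damaged global parity from the multi-level parity.
        actions.append("recover_global_parity_from_multilevel_parity")
    # Step 4: rebuild the lost data fragment within its local group.
    actions.append("local_reconstruction_of_data_fragments")
    return actions


healthy = plan_recovery(data_ok=True, global_parity_ok=True)
data_loss_only = plan_recovery(data_ok=False, global_parity_ok=True)
data_and_parity_loss = plan_recovery(data_ok=False, global_parity_ok=False)
```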
Fig. 52.1 Cascade layers of storage
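52.3.1 Coding Equations Consider coding coefficients θ and φ for groups x and y, respectively. Let

$$
\begin{aligned}
D_{x,0} &= \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2, \\
D_{x,1} &= \theta_0^2 x_0 + \theta_1^2 x_1 + \theta_2^2 x_2, \\
D_{x,2} &= x_0 + x_1 + x_2,
\end{aligned}
$$

and

$$
\begin{aligned}
D_{y,0} &= \varphi_0 y_0 + \varphi_1 y_1 + \varphi_2 y_2, \\
D_{y,1} &= \varphi_0^2 y_0 + \varphi_1^2 y_1 + \varphi_2^2 y_2, \\
D_{y,2} &= y_0 + y_1 + y_2.
\end{aligned}
$$

The global parities $C_0$, $C_1$ and the local parities $C_x$, $C_y$ (also written $p_x$, $p_y$) are

$$
C_0 = D_{x,0} + D_{y,0}, \quad C_1 = D_{x,1} + D_{y,1}, \quad C_x = D_{x,2}, \quad C_y = D_{y,2}.
$$

Now, we have to determine the values of θ and φ. For each failure pattern, the equations obtained can be represented by a coefficient matrix. When no parity fails and the data fragments $x_i$, $x_j$, $y_s$, $y_t$ are lost:

$$
G = \begin{pmatrix}
1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
\theta_i & \theta_j & \varphi_s & \varphi_t \\
\theta_i^2 & \theta_j^2 & \varphi_s^2 & \varphi_t^2
\end{pmatrix},
\qquad
\operatorname{Det}(G) = (\theta_j - \theta_i)(\varphi_t - \varphi_s)(\theta_i + \theta_j - \varphi_s - \varphi_t).
$$

Only one of $p_x$ and $p_y$ fails: assuming $p_y$ fails,

$$
G' = \begin{pmatrix}
1 & 1 & 0 \\
\theta_i & \theta_j & \varphi_s \\
\theta_i^2 & \theta_j^2 & \varphi_s^2
\end{pmatrix},
\qquad
\operatorname{Det}(G') = \varphi_s (\theta_j - \theta_i)(\varphi_s - \theta_j - \theta_i).
$$

Both $p_x$ and $p_y$ fail:

$$
G'' = \begin{pmatrix}
\theta_i & \varphi_s \\
\theta_i^2 & \varphi_s^2
\end{pmatrix},
\qquad
\operatorname{Det}(G'') = \theta_i \varphi_s (\varphi_s - \theta_i).
$$

To ensure all cases are recoverable, the matrices $G$, $G'$, and $G''$ must be nonsingular. Hence, with the coefficients within each group pairwise distinct,

$$
\theta_i, \theta_j, \varphi_s, \varphi_t \neq 0, \qquad
\theta_i, \theta_j \neq \varphi_s, \varphi_t, \qquad
\theta_i + \theta_j \neq \varphi_s + \varphi_t.
$$

To fulfill these conditions, we can select θ and φ such that they belong to the finite field $GF(2^4)$. In this Galois field, each element has four bits. The lower two bits for θ are set to zero and the higher two bits for φ are set to zero. Figure 52.2 represents the steps in data fragmentation.
Fig. 52.2 Steps in data fragmentation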
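The coefficient conditions can be checked by brute force. The sketch below evaluates the three determinant expressions in GF(2^4) for θ values whose lower two bits are zero and φ values whose higher two bits are zero. The reduction polynomial x^4 + x + 1 and the concrete coefficient lists are assumptions for illustration; the chapter does not state which ones were used. In a field of characteristic 2, addition and subtraction are both XOR.

```python
from itertools import combinations

THETA = [0b0100, 0b1000, 0b1100]  # lower two bits zero (assumed values)
PHI = [0b0001, 0b0010, 0b0011]    # higher two bits zero (assumed values)


def gf16_mul(a, b, poly=0b10011):
    """Multiply two elements of GF(2^4), reducing modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= poly
    return result


def det_g(ti, tj, ps, pt):
    """(tj - ti)(pt - ps)(ti + tj - ps - pt) with XOR as +/-."""
    return gf16_mul(gf16_mul(tj ^ ti, pt ^ ps), ti ^ tj ^ ps ^ pt)


def det_g1(ti, tj, ps):
    """ps (tj - ti)(ps - tj - ti): one local parity has failed."""
    return gf16_mul(gf16_mul(ps, tj ^ ti), ps ^ tj ^ ti)


def det_g2(ti, ps):
    """ti ps (ps - ti): both local parities have failed."""
    return gf16_mul(gf16_mul(ti, ps), ps ^ ti)


def all_cases_recoverable(thetas, phis):
    """True if every determinant is nonzero for every choice of lost fragments."""
    for ti, tj in combinations(thetas, 2):
        if any(det_g(ti, tj, ps, pt) == 0 for ps, pt in combinations(phis, 2)):
            return False
        if any(det_g1(ti, tj, ps) == 0 for ps in phis):
            return False
    return all(det_g2(ti, ps) != 0 for ti in thetas for ps in phis)


recoverable = all_cases_recoverable(THETA, PHI)
```

The bit layout makes the conditions hold by construction: any XOR of θ values stays in the upper two bits and any XOR of φ values stays in the lower two bits, so the two can never be equal unless both are zero.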
52.3.2 Parity Distribution We propose an additional parity Pxy, which is computed from the parities. The basic idea is to enable parity reconstruction in the event of parity loss through the hierarchy of parity structure given in Fig. 52.3. In the traditional LRC case, Pxy is absent. Hence, if P0 or P1 is lost, the reconstruction cost is 6. In our system, P0 or P1 can be reconstructed using Pxy with a reconstruction cost of 2, which brings the cost down to 33% of the LRC cost (a reduction of about 67%). The additional parity contributes to storage overhead.
Fig. 52.3 Hierarchy of parity structure
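A reconstruction cost of 2 is what one would get if the additional parity were the sum of the two global parities, so that a lost global parity is rebuilt from the surviving one and Pxy. The sketch below illustrates this under that assumption, which is not confirmed by the text; the helper name and the byte values are hypothetical. Addition in GF(2^4) is bytewise XOR.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Illustrative parity fragments (arbitrary bytes).
p0 = bytes([0x12, 0x34, 0x56])
p1 = bytes([0xAB, 0xCD, 0xEF])

# Assumption: the extra parity is the XOR of the two global parities.
p_xy = xor_bytes(p0, p1)

# If P0 is lost, two reads (Pxy and P1) are enough to rebuild it.
recovered_p0 = xor_bytes(p_xy, p1)
```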
52.4 Module Description The architecture of data storage in cloud servers is represented in Fig. 52.4, and the hybrid approach was implemented using the following functional modules:
1. Data owner.
2. Main cloud server.
3. Replica cloud server.
4. Encryption and parity.
5. Erasure coding.
6. Third-party auditor.
52.4.1 Data Owner The data owner module uploads the data to the cloud server. The main cloud server is where the data owner stores the data. For retrieval, the data owner sends a request for the data they want to retrieve, and the request is first sent to the cloud server.
52.4.2 Replication Cloud Server A full replica of the data in the main cloud server is maintained in the replication cloud server. This enables us to instantly service the data owner request for data even if the main cloud server fails, without reconstruction cost for one point of failure.
Fig. 52.4 Architecture of data storage in cloud servers
52.4.3 Encryption The data to be uploaded to the cloud server is encrypted, split into multiple parts, and stored in separate main cloud servers. Encryption is employed so that the data remains protected even if it is stolen from the cloud storage provider. Parity is calculated and added to each of the separate parts.
52.4.4 Key Server The keys are used for encryption/decryption and can be tied to user accounts.
52.4.5 Erasure Coding We employ erasure coding on the data before storage. Erasure coding requires the use of Galois field arithmetic. We can use either multiplication or XOR operations. Multiplication methods based on integer arithmetic would slow down the processing, and employing multiplication would require precomputing addition and multiplication tables. Hence, it was avoided. The XOR operation was employed since modern CPUs perform XOR efficiently.
52.4.6 Third-Party Auditor The auditor is used for the challenge response handling required for integrity checking. A third party requests the cloud storage to verify the file. The auditor verifies each part separately. In case of failure, erasure coding is invoked for the part which failed the integrity check. The decode and encode functions are given in Fig. 52.5.
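A much simplified view of per-part verification is sketched below. It assumes the auditor holds a SHA-256 digest for every stored part and flags the parts whose current digest no longer matches; this is an illustrative stand-in, not the challenge-response protocol of the system, which the chapter does not spell out. The function names are hypothetical.

```python
import hashlib


def digest(part):
    """SHA-256 digest of one stored part, as a hex string."""
    return hashlib.sha256(part).hexdigest()


def audit(parts, expected_digests):
    """Return the indices of parts that fail the integrity check."""
    return [
        index
        for index, (part, expected) in enumerate(zip(parts, expected_digests))
        if digest(part) != expected
    ]


# Digests recorded at upload time (illustrative data).
stored_parts = [b"part-0", b"part-1", b"part-2"]
recorded = [digest(part) for part in stored_parts]

# Later, part 1 is corrupted at the storage provider.
current_parts = [b"part-0", b"part-X", b"part-2"]
failed = audit(current_parts, recorded)
```

The indices returned by `audit` identify the parts for which erasure-coded reconstruction would then be invoked.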
Fig. 52.5 Decode and encode functions
52.5 Implementation To implement the system, we used Google Cloud Platform with an Intel Xeon(R) dual-core 2.5 GHz processor, 13 GB of RAM, a Red Hat VirtIO Ethernet adapter, and a 4 Gbps network link on a static IP, running Windows Server 2012. This configuration was used to develop the front end and the Key Server. The implementation employed a real-world cloud storage provider, Dropbox. The main cloud server and replica cloud server were created as Dropbox storage locations. Using the Java programming language, we developed the TPA module, which includes the Key Server connected to a MySQL database. The Key Server and the TPA module ran on the Google Cloud machine.
52.6 Analysis In a conventional code without local parities, all six data fragments are required for each parity computation. When any of the data fragments becomes unavailable, all six fragments must be used for reconstruction. Reconstruction cost is defined as the number of fragments required to reconstruct an unavailable data fragment [17, 18]. Hence, the reconstruction cost is 6. In our system, we divide the parities into local and global. A global parity is computed from all data fragments. A local parity uses only a subset of fragments. We use four parities: the first two (p0 and p1) are global parities, and the other two are local parities; px is computed from three data fragments (x0, x1, x2), and py is computed from three data fragments (y0, y1, y2). Assume we lose x0. To reconstruct x0, we require px and the fragments x1 and x2. Here, the reconstruction cost of any one data fragment is 3. We have effectively halved the reconstruction cost by adding a parity.
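The local reconstruction described above can be sketched with XOR parity, since px = x0 + x1 + x2 and addition in the Galois field is bytewise XOR. The fragment contents below are arbitrary illustrative bytes, and `xor_fragments` is a hypothetical helper name.

```python
from functools import reduce


def xor_fragments(fragments):
    """Bytewise XOR of any number of equal-length fragments."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*fragments))


# Local group x with three data fragments (illustrative bytes).
x = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
p_x = xor_fragments(x)  # local parity px = x0 + x1 + x2

# x0 becomes unavailable: read px, x1 and x2 only.
fragments_read = [p_x, x[1], x[2]]
recovered_x0 = xor_fragments(fragments_read)
reconstruction_cost = len(fragments_read)
```

Only fragments from the affected group are read, so the y group and the global parities stay untouched during this repair.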
52.6.1 Average Repair Rate Average repair rate of single fragment failures = $6.9270 \times 10^{-4}$.
52.6.2 Cost and Performance Trade-Offs The calculations above are for a (6,2,2) Reconstruction Code. The coding parameters can be altered to achieve the storage overhead of three-way replication, but the reconstruction cost will vary accordingly. In real-time implementations, lower reconstruction costs are preferred.
52.6.3 Code Parameters As we vary the (k,l,r) parameters of the Reconstruction Code, the cost and performance vary accordingly. Consider a (6,3) Reed–Solomon code: it has a storage overhead of 1.5× and a reconstruction cost of 6. If we keep the overhead constant, we can use a (12,4,2) Reconstruction Code, which also has a 1.5× overhead but a reconstruction cost of 3, a 50% reduction. If we instead keep the reconstruction cost at 6, we can replace Reed–Solomon (6,3) with a (12,2,2) Reconstruction Code, which has a 1.3× storage overhead compared to the 1.5× overhead of the Reed–Solomon code. Such savings become significant when the system stores data on the scale of exabytes. The Reconstruction Code system was designed for reconstructing data fragments. In case of parity fragment loss, modern codes such as Weaver [19] and HoVer [20] are more efficient [3, 18]. Here, the Reconstruction Code trades parity reconstruction performance for data reconstruction performance. The coding techniques and their MTTF (in years) are represented in Fig. 52.6, and a detailed view is given in Table 52.1.
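The overhead and cost figures quoted above can be reproduced with a few lines of arithmetic. The sketch assumes the usual reading of the parameters, which the chapter does not define explicitly: k data fragments, l local parities, and r global parities for a (k,l,r) Reconstruction Code, and k data plus m parity fragments for a (k,m) Reed–Solomon code. The function names are hypothetical.

```python
def lrc_overhead(k, l, r):
    """Storage overhead of a (k, l, r) code: stored fragments per data fragment."""
    return (k + l + r) / k


def lrc_data_repair_cost(k, l):
    """Fragments read to rebuild one data fragment: the size of its local group."""
    return k // l


def rs_overhead(k, m):
    """Storage overhead of a (k, m) Reed-Solomon code."""
    return (k + m) / k


def rs_repair_cost(k):
    """A Reed-Solomon repair reads k surviving fragments."""
    return k


summary = {
    "RS(6,3)": (rs_overhead(6, 3), rs_repair_cost(6)),
    "LRC(6,2,2)": (round(lrc_overhead(6, 2, 2), 2), lrc_data_repair_cost(6, 2)),
    "LRC(12,4,2)": (lrc_overhead(12, 4, 2), lrc_data_repair_cost(12, 4)),
    "LRC(12,2,2)": (round(lrc_overhead(12, 2, 2), 2), lrc_data_repair_cost(12, 2)),
}
```

Under these assumptions the (12,2,2) overhead is 16/12, about 1.33, which the text rounds to 1.3×.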
Fig. 52.6 Graphical representation of some coding techniques versus MTTF years (bar chart; x-axis: 3-code Replication, (6,3)-code Reed–Solomon, (6,2,2) LRC, (6,2,2,1) MLRC; y-axis: MTTF in years, 0 to 3E+12)
Table 52.1 Coding technique and MTTF years
Coding technique           MTTF years
3-code replication         3.5 × 10^9
(6,3)-code Reed–Solomon    6.1 × 10^11
(6,2,2) LRC                2.6 × 10^12
(6,2,2,1) MLRC             2.3 × 10^9
52.7 Conclusion Erasure coding is a fundamental necessity in cloud storage to reduce storage cost. This article has examined erasure codes with the following parameters. A (6,2,2) Reconstruction Code requires a 1.67× storage overhead, while a (12,2,2) code achieves a 1.3× storage overhead, saving significant I/O and reconstruction bandwidth compared to a (12,4) Reed–Solomon code. Reed–Solomon codes are MDS (maximum distance separable) codes, which require the minimum storage overhead for a given fault tolerance. The Reconstruction Code uses additional parity, sacrificing storage overhead for more efficient data reconstruction, as do other state-of-the-art codes such as Weaver and HoVer. A (12,2,2) code can tolerate three errors like a three-replication scheme. The Reconstruction Code scheme can be varied to optimize for performance or overhead. Hence, data integrity is protected by using multi-level reconstructive error data and auditing in cloud storage.
References
1. Armbrust, M., et al.: A view of cloud computing. Commun. ACM 53(4), 50–58 (2010)
2. Weatherspoon, H., Kubiatowicz, J.D.: Erasure coding vs. replication: a quantitative comparison. In: Proceedings of IPTPS (2002)
3. Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., Yekhanin, S.: Erasure coding in Windows Azure Storage. In: USENIX Annual Technical Conference (2012)
4. Juels, A., Kaliski, B.S.: PORs: proofs of retrievability for large files. In: Proceedings of 14th ACM Conference on Computer and Communication Security (2007)
5. Ateniese, G., Burns, R., Curtmola, R., Herring, J., Khan, O., Kissner, L., Peterson, Z., Song, D.: Remote data checking using provable data possession. ACM Trans. Inf. Syst. Secur. 14, Article 12 (2011)
6. Curtmola, R., Khan, O., Burns, R., Ateniese, G.: MR-PDP: multiple-replica provable data possession. In: Proceedings of IEEE 28th International Conference on Distributed Computing Systems (ICDCS '08) (2008)
7. Bowers, K.D., Juels, A., Oprea, A.: HAIL: a high-availability and integrity layer for cloud storage. In: Proceedings of 16th ACM Conference on Computer and Communication Security (CCS '09) (2009)
8. Chen, B., Curtmola, R., Ateniese, G., Burns, R.: Remote data checking for network coding-based distributed storage systems. In: Proceedings of ACM Workshop on Cloud Computing Security (CCSW '10) (2010)
9. Rajakumar, R., Sekaran, K., Hsu, C.-H., Kadry, S.: Accelerated grey wolf optimization for global optimization problems (2021). https://doi.org/10.1016/j.techfore.2021.120824
10. Kaushik, S., Singh, S., Pathan, R.K.: Design of novel cloud architecture for energy aware cost computation in cloud computing environment (2017). https://doi.org/10.1109/IPACT.2017.8245199
11. Sekaran, K., Venkata Krishna, P., Swapna, Y., Lavanya Kumari, P., Divya, M.P.: Bio-inspired fuzzy model for energy efficient cloud computing through firefly search behaviour methods (2020). https://doi.org/10.1007/978-3-030-41862-5_106
12. Shacham, H., Waters, B.: Compact proofs of retrievability. J. Cryptol. 26, 442–483 (2013)
13. Sekaran, K., Krishna, P.V.: Big cloud: a hybrid cloud model for secure data storage through cloud space (2016). https://doi.org/10.1504/IJAIP.2016.075731
14. Sekaran, K., Krishna, P.V.: Cross region load balancing of tasks using region-based rerouting of loads in cloud computing environment (2017). https://doi.org/10.1504/IJAIP.2017.088151
15. Sekaran, K., Khan, M.S., Patan, R., Gandomi, A.H., Krishna, P.V., Kallam, S.: Improving the response time of M-learning and cloud computing environments using a dominant firefly approach (2019). https://doi.org/10.1109/ACCESS.2019.2896253
16. Sekaran, K., Rajakumar, R., Dinesh, K., Rajkumar, Y., Latchoumi, T.P., Kadry, S., Lim, S.: An energy-efficient cluster head selection in wireless sensor network using grey wolf optimization algorithm (2020). https://doi.org/10.12928/TELKOMNIKA.v18i6.15199
17. Bhagwan, R., Tati, K., Cheng, Y.-C., Voelker, G.M.: Total recall: system support for automated availability management. In: Symposium on Networked Systems Design and Implementation (NSDI) (2004)
18. Rodrigues, R., Liskov, B.: High availability in DHTs: erasure coding vs. replication. In: Peer-to-Peer Systems IV: 4th International Workshop, IPTPS 2005, Ithaca, NY, USA, February 24–25, 2005. Revised Selected Papers 4. Springer Berlin Heidelberg (2005)
19. Hafner, J.L.: WEAVER codes: highly fault tolerant erasure codes for storage systems. In: FAST '05 (2005)
20. Hafner, J.L.: HoVer erasure codes for disk arrays. In: International Conference on Dependable Systems and Networks (DSN'06). IEEE (2006)
Chapter 53
Emotion-Based Song Recommendation System A. R. Sathya and Alluri Raghav Varma
Abstract Many music applications currently compete with one another to grow their customer base. Users choose an app based on criteria such as low latency and a friendly user interface. Existing song recommendation systems suggest songs based on the user's previous music preferences, for example by examining past song choices and the amount of time spent listening to music. However, classifying this data and preparing separate playlists from the user's history is time-consuming. Recommending songs based on the user's emotions, as determined from facial expressions, is a further advancement. Reading a person's emotions and recommending songs accordingly is therefore a new direction to explore. In this paper, we propose a machine learning approach that detects human emotions from an input image and creates a music playlist based on the detected emotions. A Convolutional Neural Network (CNN) is used for image classification. The main goals of the proposed work are to analyze the user's image, categorize the emotion, and offer songs that match that emotion in order to increase customer satisfaction.
A. R. Sathya (✉) · A. R. Varma
Department of Data Science and Artificial Intelligence, Faculty of Science and Technology, ICFAI Foundation for Higher Education, Hyderabad, Telangana, India
e-mail: [email protected]
A. R. Varma
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_54

53.1 Introduction
Music significantly influences our daily lives, and it has long been believed that music can change a person's disposition. By capturing an image of the user and identifying the emotion displayed, songs that fit the user's mood can be suggested, which can gradually calm the user's mind. A user may have a large collection of songs but frequently struggles to create playlists manually. It is also difficult to remember every song: once music is uploaded, it may go unheard for a long time, using up storage space on the device until the user actively removes it. This project therefore aims to identify the user's emotion and then play music chosen for that emotion.
The automatic analysis and understanding of music by computers is a new prospect in the field of music information retrieval. Because music content is so diverse and rich, researchers from many disciplines related to musicology are involved, including computer science, digital signal processing, mathematics, and statistics. The proposed approach suggests music to the user by identifying and capturing the user's emotions in real time. In this paper, we present a method that categorizes music into distinct moods, such as joyful, sad, and angry. Existing techniques rely on collaborative methods that use user data from prior sessions to select music; however, these techniques require a lot of manual work. The proposed emotion-based song recommender system combines a front end running in the Chrome web browser with a machine learning algorithm that detects emotions on the user's face. Songs are recommended to the user based on the detected emotion [1]. The system uses the camera of the local machine to take a picture of the user in real time. In addition, a queue mode lets the user build a personal playlist. The Python Eel library is used to select a random song from the playlist. The captured image is then processed and compared with the dataset stored on the local device. OpenCV, Eel, NumPy, and other libraries have been used to implement these functions. The system is proposed primarily because of the important role music plays today.
53.2 Literature Review
Londhe [2] examines variations in the curvature of faces and the intensity of the corresponding pixels. The author employed an Artificial Neural Network (ANN) to classify emotions and also recommends a number of playlist strategies. Tambe [3] proposed an automatic music player that learns each user's tastes, emotions, and activities and provides songs in response. In this approach, the user's facial expressions are captured in order to analyze emotions and predict the preferred musical style. Jha [4] suggested using image processing to create an emotion-aware music player. That work showed how the algorithms and strategies reported by different researchers can be used to link a music playlist to the user's emotions. By playing the song most appropriate to the user's expression, it reduces the effort of making and managing playlists and improves the listening experience. Habibzad [5] proposed a technique with three phases, namely preprocessing, feature extraction, and classification, to identify the user's facial expression. An emotion-based music recommendation system using a Convolutional Neural Network (CNN) is proposed by Abdul et al. [6]. Recommendations based on mood and user emotions using machine learning are discussed in [7–9].
The paper is organized as follows. Section 53.3 gives an overview of the proposed system architecture, including the preprocessing and filtering used to extract different facial features. Section 53.4 discusses the methodologies adopted in the proposed system in detail. The implementation details are presented in Sect. 53.5, followed by the conclusion and future work in Sect. 53.6.
53.3 Proposed System Architecture
The proposed system recognizes a person's facial movements and extracts facial landmarks from them. These landmarks are then classified to determine the user's emotion. Once the emotion is determined, the user is shown songs that match it. The user can then decide which tune to listen to, which may help reduce stress and saves the time otherwise spent searching for songs. The three main modules proposed in this work are the Audio extraction module, the Emotion extraction module, and the Emotion Audio extraction module. The proposed system has some limitations. It cannot accurately capture all emotions because the image dataset used contains only a limited number of photos. To get accurate results from the classifier, the input image needs to be captured in a well-lit environment, and the image quality must be higher than 320p [10] (Fig. 53.1).
53.4 Methodologies
53.4.1 Face Capturing
The primary goal of this stage is to capture photos. We use a camera here, although other physiological devices could also be used. Images are captured with the OpenCV computer vision library, which is mostly used for real-time computer vision and integrates easily with other libraries that support NumPy. When execution begins, the camera stream is accessed and roughly ten photos are taken for later processing and emotion recognition. To perform face detection on these photos in the initial phase of the project, we use a classifier that must be trained on many positive images containing people's faces and many negative images that contain no faces. The model makes use of the resulting classified photos.
Fig. 53.1 Flow diagram of proposed emotion-based recommendation system
53.4.2 Face Detection
Analyzing the face is one of the best ways to assess human emotions. We adopt the Fisher approach in particular because it maximizes the separation between classes during training. The image recognition system uses Principal Component Analysis (PCA) to reduce the dimensionality of the face space before applying Fisher's linear discriminant (FLD), also known as Linear Discriminant Analysis (LDA), to extract image features. Expressions are then categorized by matching faces using the simple Euclidean distance, which aids the recognition process. The Fisher method in OpenCV does not rely on representative images of each subject; the emphasis is on class-specific transformation matrices, and the value the model returns for an input is what allows us to determine the user's state of mind. Each emotion in the dataset is represented by tens of recorded photos, and comparing the input against them yields the emotion on which the system's music suggestions are based. Unlike other software now on the market, the system does not depend on other personal information. The proposed system architecture is presented in Fig. 53.2.
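The matching idea in this section can be illustrated with a minimal NumPy sketch: project flattened images onto a few principal components, then label a query by its nearest stored example in Euclidean distance. This is a simplified stand-in, not the authors' implementation: the LDA step is omitted, the "images" are synthetic 64-value vectors, and all function names are hypothetical.

```python
import numpy as np


def fit_pca(samples, n_components):
    """Return (mean, components) for row-wise samples."""
    mean = samples.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(samples, mean, components):
    """Map samples into the reduced face space."""
    return (samples - mean) @ components.T


def nearest_label(query, gallery, labels):
    """Label of the gallery vector closest to the query (Euclidean distance)."""
    distances = np.linalg.norm(gallery - query, axis=1)
    return labels[int(np.argmin(distances))]


rng = np.random.default_rng(0)
# Synthetic stand-ins for flattened face crops of two emotions.
happy = rng.normal(0.8, 0.05, size=(10, 64))
sad = rng.normal(0.2, 0.05, size=(10, 64))
samples = np.vstack([happy, sad])
labels = ["happy"] * 10 + ["sad"] * 10

mean, components = fit_pca(samples, n_components=5)
gallery = project(samples, mean, components)

new_face = rng.normal(0.8, 0.05, size=(1, 64))
predicted = nearest_label(project(new_face, mean, components)[0], gallery, labels)
print(predicted)
```

A full Fisher-style pipeline would insert a class-aware LDA projection between `project` and `nearest_label`; the nearest-match step stays the same.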
Fig. 53.2 High-level architecture of the proposed system
53.4.3 Emotion Classification
When a face is successfully detected, a bounding box is overlaid on the image so that the face region can be extracted for further analysis. In the next step, a function processes the extracted face images and uses an index-based approach to compute the intensity value of the pixel at each location. To predict the emotion class, the input data is compared with the stored data. The classifier assigns one of four emotions: sadness, neutrality, anger, or happiness. Processing speed also appears to deteriorate at this stage (Fig. 53.3).
53.4.4 Music Recommendation
Real-time photographs captured with the web camera serve as the input images. Since it is difficult to describe all emotions, we concentrate on the four basic emotions in this work. Using fewer classes speeds up processing and produces more reliable output. The values computed for the input are compared with the threshold values used by the code and are then passed on to the web service. Music is played depending on the detected emotion. Each song is tagged with its own set of emotions; the emotions are numbered, sorted, and assigned to songs, so that when an emotion is passed in, the matching music is selected. Several models could be used to create the recommendations, chosen according to their accuracy.
Fig. 53.3 Emotion classification process
The values obtained are compared with the values that serve as thresholds. In addition to the emotion-based mode, there are two alternatives: a queue mode and a random mode. In queue mode the user builds a playlist, as in other common music software. Random mode selects songs without regard to order or other factors, which can itself help lift the listener's spirits. When music is played based on the user's emotion, four emoticons corresponding to the possible emotions are displayed at the same time. Each emotion is assigned a number, which applies to both the music and the emoticons (Fig. 53.4).
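The three playback modes described in this section can be sketched in plain Python. The emotion numbering, the file names, and the function names below are all hypothetical, chosen only to illustrate numbered emotion tags, emotion-based selection, a queue, and random selection.

```python
import random
from collections import deque

# Each emotion is given a number (this particular numbering is an assumption).
EMOTION_IDS = {"happy": 0, "sad": 1, "angry": 2, "neutral": 3}

# Hypothetical library of (file name, emotion number) pairs.
LIBRARY = [
    ("upbeat_01.mp3", 0),
    ("upbeat_02.mp3", 0),
    ("ballad_01.mp3", 1),
    ("rock_01.mp3", 2),
    ("ambient_01.mp3", 3),
]


def songs_for(emotion):
    """All songs tagged with the number of the given emotion."""
    emotion_id = EMOTION_IDS[emotion]
    return [name for name, tag in LIBRARY if tag == emotion_id]


def pick_by_emotion(emotion, rng=random):
    """Emotion mode: choose among the songs matching the detected emotion."""
    return rng.choice(songs_for(emotion))


def pick_random(rng=random):
    """Random mode: choose any song, ignoring emotion and order."""
    return rng.choice(LIBRARY)[0]


def make_queue(names):
    """Queue mode: songs are played in the order the user added them."""
    return deque(names)


queue = make_queue(["ballad_01.mp3", "rock_01.mp3"])
first_in_queue = queue.popleft()
print(pick_by_emotion("happy"), pick_random(), first_in_queue)
```

Keeping the emotion number as the only link between classifier output and song tags means the classifier can be swapped without touching the library.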
Fig. 53.4 Music recommendation process flow
Fig. 53.5 Emotion-based recommendation system
53.5 Implementation
Software requirements: Python 3.6, OpenCV 3.4.2, Android Studio, PyCharm IDE.
Libraries: (1) NumPy, (2) pandas, (3) cv2, (4) Streamlit, (5) TensorFlow, and (6) Keras.
The following emotions were identified in this project: (1) Neutral, (2) Fear, (3) Happy, and (4) Angry (Figs. 53.5 and 53.6).
Fig. 53.6 Emotion classification

53.6 Conclusion and Future Enhancement
In this study, we presented a model that recommends music based on the emotion indicated by facial expressions. The project designed and developed an emotion-based music recommendation system built on facial recognition, with the goal of improving the interaction between the user and the music system. Because listening to music can alter one's mood and, for some, serve as a stress reliever, there is plenty of room for developing emotion-based music selection systems. The current system provides face-based emotion recognition and plays music in response to the recognized expression. A music player with facial recognition technology can be valuable to many listeners today.
The system can be extended in the future. Automatic music playback is driven by facial expression recognition, and facial expressions are detected through the programming interface of the RPi camera integrated into the local machine. A future variant could also cover emotions that the system does not currently recognize, such as disgust and fear, to further support automatic playback. The music player can currently be used only locally, whereas devices have become portable and efficient to carry. It is also possible to use physiological sensors such as galvanic skin response (GSR) and photoplethysmography (PPG) to take a person's emotion into account instead of relying on laborious manual effort. With enough such information, the user's mood could be predicted more precisely. The system would benefit from these expanded capabilities, but it would also need to be updated more frequently as its functions become more advanced.
References
1. Hussain, S.A., Abdallah Al Balushi, A.S.: A real time face emotion classification and recognition using deep learning model. J. Phys. Conf. Ser. 1432, 012087 (2020)
2. Londhe, R.R., Pawar, V.P.: Analysis of facial expression and recognition based on statistical approach. Int. J. Soft Comput. Eng. 2 (2012)
3. Tambe, P., Bagadia, Y., Khalil, T., Noor, U.S.: Advanced music player with integrated face recognition mechanism. Int. J. Adv. Res. Comput. Sci. Softw. Eng. (2015)
4. Vivek, J.D., Gokilavani, A., Kavitha, S., Lakshmanan, S., Karthik, S.: A novel emotion recognition based mind and soul-relaxing system. In: 2017 International Conference on Innovations in Information, Embedded and Communication Systems, pp. 1–5. IEEE (2017)
5. Ninavin, A.H., Kamalmirnia, M.: A new algorithm to classify face emotions through eye and lip feature by using particle swarm optimization. In: 2012 4th International Conference on Computer Modeling and Simulation
6. Abdul, A., Chen, J., Liao, H.-Y., Chang, S.-H.: An emotion-aware personalized music recommendation system using a convolutional neural networks approach. Appl. Sci. 8(7), 1103 (2018)
7. Gilda, S., Zafar, H., Soni, C., et al.: Smart music player integrating facial emotion recognition and music mood recommendation. In: 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pp. 154–158 (2017)
8. Chankuptarat, K., Sriwatanaworachai, R., Chotipant, S.: Emotion-based music player. In: 5th International Conference on Engineering, Applied Sciences and Technology (ICEAST), pp. 1–4 (2019)
9. Gorasiya, T., Gore, A., Ingale, D., Trivedi, M.: Music recommendation based on facial expression using deep learning. In: 2022 7th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, pp. 1159–1165 (2022). https://doi.org/10.1109/ICCES54183.2022.9835929
10. Chang, S.-H., Abdul, A., Chen, J., Liao, H.-Y.: A personalized music recommendation system using convolutional neural networks approach. In: 2018 IEEE International Conference on Applied System Invention (ICASI), pp. 47–49 (2018)
Chapter 54
A Hybrid Model for Forecasting Stock Prices Using Bayesian and LSTM Rohini Pinapatruni, Faizan Mohammed, Syed Anas Mohiuddin, and Dheeraj Patel
Abstract Forecasting stock prices and intraday direction is a central problem in quantitative finance. This paper explores the efficacy of a Bayesian Long Short-Term Memory neural network model (more precisely, LSTM + BNN) for price forecasting. Its performance was tested against ML models such as Random Forest, XGBoost, and vanilla LSTM. Public data on the Indian stocks of five companies, Reliance, Dr Reddy, Dmart, TCS, and Hindunilvr, was collected from Yahoo Finance. Performance of the proposed model is measured using root mean square error and mean absolute error. The model is deployed through a web app that displays the predictions for the stocks along with statistical metrics such as the level of confidence and the uncertainty of the predictions. The objective of the proposed research is to create a hybrid B-LSTM model that achieves a lower error rate than existing models and to compare its results with theirs.
R. Pinapatruni (✉) · F. Mohammed · S. A. Mohiuddin · D. Patel
Department of Data Science and Artificial Intelligence, Faculty of Science and Technology, ICFAI Foundation for Higher Education, Hyderabad, India
e-mail: [email protected]
F. Mohammed
e-mail: [email protected]
S. A. Mohiuddin
e-mail: [email protected]
D. Patel
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3_55

54.1 Introduction
Over the past decades, there has been significant progress in the fundamental aspects of information technology, which has transformed the course of business. Financial markets, as one of the most fascinating sectors, have a significant impact on a nation's economy. They are extremely dynamic, intricate, and ever-changing systems in which people from all over the world actively trade in currencies, stocks, bonds, commodities, derivatives/options, and other financial products via online trading platforms supported by stock brokers. The state of the stock market is one of the most important indicators of the health of an economy. The financial well-being of most people in many nations is directly or indirectly affected by stock market fluctuations. Modelling stock prices and direction is therefore a very important problem in quantitative finance.
At any moment, countless factors affect the price of a stock. Market participants analyze these factors and all publicly available information about a company in order to value it and determine a fair price for its stock given the information available at that moment. Every investor does their own research, has their own biases, and may reach a different conclusion about what the price of a stock should be. Many analysts and academics have developed methods and tools to predict changes in stock prices and help investors make wise decisions. Researchers can also forecast the market using cutting-edge models and non-traditional approaches, such as sentiment analysis of current textual data from the media (that is, NLP), in order to gain an edge over other market participants. Prediction accuracy has been greatly improved by advanced machine learning techniques such as text analytics and ensemble algorithms. Stock market research and prediction remain among the most challenging academic areas because of the dynamic, inconsistent, and chaotic nature of the data.
54.2 Related Work
Today's electronic stock exchanges are run by computers and dominated by computers executing various investment and trading strategies, so stock price data is abundant and easily and publicly available. Technical analysis is performed by traders and investors to predict the trend, the trading range, and possibly the stock price in the short term. One important assumption of technical analysis is that all information available at any moment is already reflected in the stock price. Nobody can say with complete certainty what the future growth and profitability of a company will be, so there is considerable uncertainty about the price of its stock. Uncertainty, or stochasticity, is inherent to the stock market. This motivates the proposed fusion of Long Short-Term Memory (LSTM) models [1], among the best models for sequential data, with Bayesian neural networks, which take the epistemic uncertainty of the data into account [2]. Bayesian neural networks have been used for stock price prediction [3]. Deep learning (DL) is a prominent modeling technique for stock prices, in particular LSTM-based models, which capture the sequential nature of time-series stock data. Four kinds of LSTM-based approaches with different input sequence lengths are demonstrated and compared in [4] for predicting stock prices.
Various machine learning approaches have been used for stock price prediction [5], and many classical and modern machine learning algorithms can be used for forecasting stock prices. The ones we compare against the BLSTM model are XGBoost [6], Random Forest [7], and LSTM. Random Forest is an ensemble machine learning model based on decision trees. It was shown, along with a vanilla ANN, to be an effective approach for stock price prediction in [8]. In [9], Random Forest and LSTM are used to predict intraday returns of stocks in the NIFTY50 index of the NSE, and the two models are compared by the CAGR of the returns they generate over the validation period. The authors showed that the deep learning (LSTM) approach outperforms the classical machine learning (Random Forest) approach. Orsel and Yamada [10] compared a variety of LSTM-based architectures and also compared LSTM models against the Kalman filter, establishing that the LSTM approach outperforms it. Wei [11] also concluded that LSTM is effective for time series forecasting. XGBoost is a powerful and versatile ML algorithm that uses boosting to improve prediction quality. In [12], XGBoost was found to work very well for stock price prediction, and its performance was compared with that of LSTM.
54.2.1 Long Short-Term Memory Neural Network (LSTM)
The long short-term memory (LSTM) network is a type of recurrent neural network (RNN). Internally, an LSTM cell is made up of three gates: an input gate, a forget gate, and an output gate. The gates act as valves that allow only useful and essential information to pass through; the forget gate retains relevant information and discards the rest. LSTM is employed instead of a plain RNN to mitigate the exploding and vanishing gradient problems. The Python library TensorFlow is used to create and train the model. The historical stock data table includes various types of information. Figure 54.1 shows the architecture of the LSTM model, which has two LSTM layers and three Dense layers with the swish activation function.
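To make the role of the three gates concrete, the following NumPy sketch runs one standard LSTM cell over a short sequence. It shows the textbook cell equations only, not the trained TensorFlow model of Fig. 54.1; the weight layout (gates stacked as input, forget, candidate, output) and the sizes are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked in the order input gate, forget gate, candidate, output gate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information enters
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state is kept
    g = np.tanh(z[2 * H:3 * H])    # candidate cell state
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell is exposed
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
D, H = 3, 4                        # illustrative feature and hidden sizes
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
sequence = rng.normal(size=(5, D))  # five time steps of synthetic features
for x in sequence:
    h, c = lstm_step(x, h, c, W, U, b)
print(h)
```

Because the cell state is updated additively (`f * c_prev + i * g`), gradients can flow across many time steps, which is the usual explanation for why LSTMs suffer less from vanishing gradients than plain RNNs.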
54.2.2 Random Forest
The random forest algorithm is composed of a large number of decision trees [13]. First, the method creates random subsets of the original training set, each of which is used to build a decision tree. A new sample is then classified by all of the decision trees. Finally, a voting procedure determines the final class of the sample.
Fig. 54.1 LSTM model summary
Random forest classifiers construct a set of n decision trees, with each tree fitted on a subset of samples drawn with replacement from the training set. At each decision split, a random subset of the features is chosen for each tree, and the split is resolved only over this subset. The feature and decision threshold chosen at each split are those that minimize the Gini index.
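The split criterion described above can be sketched for a single feature as follows. The code computes the Gini impurity of a label set and searches for the threshold that minimizes the size-weighted impurity of the two child nodes. The feature values, the labels, and the function names are illustrative, and the random feature subsampling and bootstrap sampling of a full random forest are left out.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (score, threshold) minimising the weighted Gini of both children."""
    best = (float("inf"), None)
    # The largest value is skipped so the right child is never empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[0]:
            best = (score, threshold)
    return best


# Toy feature (for example, a previous-day return) with direction labels.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["down", "down", "down", "up", "up", "up"]
score, threshold = best_threshold(values, labels)
print(score, threshold)
```

In a forest, this search would be repeated at every node over a random subset of features, and each tree would see a different bootstrap sample.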
54.2.3 XGBoost
XGBoost [6] is a boosting ensemble method that uses decision trees as base (weak) learners. The XGBoost package implements the gradient boosting decision tree algorithm and is designed for speed and performance. Boosting is an ensemble technique in which new weak learners are added sequentially to the ensemble, each correcting the errors of the models already in it. Models are added until no further improvement is possible, that is, until the loss stops decreasing as the number of weak learners increases.
The objective of the proposed work is to develop a Bayesian LSTM (LSTM + BNN) neural network and test its efficacy in forecasting the stock price and direction for the following trading day. The results of the existing models are compared with those of B-LSTM.
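The sequential error-correcting idea behind boosting, as described in this section, can be sketched with depth-1 regression trees (stumps) fitted to residuals under squared loss. This is plain gradient boosting written with NumPy for illustration; it is not the XGBoost library, which adds regularization and second-order information. The data, the learning rate, and the function names are assumptions.

```python
import numpy as np


def fit_stump(x, residual):
    """Best single-threshold regressor (a depth-1 tree) under squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left_value = residual[x <= threshold].mean()
        right_value = residual[x > threshold].mean()
        pred = np.where(x <= threshold, left_value, right_value)
        err = np.sum((residual - pred) ** 2)
        if best is None or err < best[0]:
            best = (err, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda q: np.where(q <= threshold, left_value, right_value)


def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = y.mean()
    pred = np.full(y.shape, base, dtype=float)
    stumps, losses = [], []
    for _ in range(n_rounds):
        residual = y - pred              # negative gradient of squared loss
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * stump(x)
        stumps.append(stump)
        losses.append(float(np.mean((y - pred) ** 2)))
    return base, stumps, losses


x = np.linspace(0.0, 6.0, 40)
y = np.sin(x)                            # synthetic target, not stock data
base, stumps, losses = boost(x, y)
print(losses[0], losses[-1])
```

Tracking `losses` per round mirrors the stopping rule in the text: once the loss stops decreasing, adding further weak learners brings no benefit.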
The contributions of the proposed approach are:
• Creating a Bayesian LSTM (LSTM + BNN) neural network model and exploring its efficacy in predicting the next day's stock price. Example: if today's stock price is 100, the model tries to predict the next day's price.
• Comparing the performance of the Bayesian LSTM model against other ML models (such as Random Forest, XGBoost, and vanilla LSTM) using root mean square error and mean absolute error.
• Deploying the model through a website that displays the predictions for selected stocks along with statistical metrics that account for the level of confidence and the uncertainty of the predictions.
The manuscript is organized as follows. A brief introduction and the existing models are discussed in Sects. 54.1 and 54.2. A detailed discussion of the B-LSTM model is given in Sect. 54.3. Section 54.4 presents the implementation details, tables, and graphs. Conclusions and future scope are discussed in Sect. 54.5.
54.3 Proposed Bayesian-Long Short Term Memory (B-LSTM)
A Bayesian neural network (BNN) combines a neural network with Bayesian inference. In a BNN, weights and outputs are treated as random variables, and the marginal distributions that best fit the data are inferred. A standard neural network and a Bayesian neural network are depicted in Fig. 54.2.

Fig. 54.2 Standard versus Bayesian neural network

Creating a BNN takes a probabilistic approach to deep learning, which allows uncertainty to be accounted for so that models can assign lower confidence to incorrect predictions [14]. Sources of uncertainty can be found in the data, due to measurement error or noise in the labels, or in the model, due to insufficient data for the model to learn effectively. Bayesian neural networks help prevent overfitting and are therefore used in domains with little data. BNNs allow the uncertainty in predictions to be estimated, which is a valuable feature in fields such as finance. The clear advantage of BLSTM over the other classical and modern machine learning and deep learning models mentioned is that uncertainty is built into the architecture of the network itself by letting the weights be probability distributions. The model's predictions will always carry errors caused by the aleatoric uncertainty (the inherent, irreducible stochasticity) of the process whose dynamics the network is trying to learn.
To create the proposed model, the data set is prepared from the stock data of five companies over the most recent five years. A benchmark model is developed by setting up a basic linear regression with the scikit-learn library to standardize the parameters. A basic LSTM model is created with Keras. An improved model combining the Bayesian and LSTM models is proposed to obtain a lower error rate. Figure 54.3 depicts the entire process flow, from collecting the data to creating a model and visualizing the predicted stock data.
54.3.1 Creating a BLSTM
Figure 54.4 gives the summary of the BLSTM. The first layer after the input layer is an LSTM layer; LSTMs are known to model data with sequential characteristics well. After the first two stacked LSTM layers come Bayesian dense layers with a multivariate Gaussian distribution as the prior for the weights and biases. Here, the LSTM layers act as feature extractors for the sequential input data, akin to how convolutional layers are used for feature extraction from image data. The final output is a univariate normal distribution whose parameters, mean and variance, are inferred from the input data by the BLSTM. Adam [15] is used as the optimizer.
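To make the idea of a Bayesian dense layer more concrete, the following pure-Python sketch shows a layer whose weights are sampled afresh on every forward pass. It is only an illustration under stated assumptions, not the authors' Keras implementation: the class `BayesianDense`, the helper `to_normal_params`, the independent Gaussian weights, and the softplus mapping to a standard deviation are all hypothetical choices, and training (fitting the posterior against the prior) is omitted.

```python
import math
import random


class BayesianDense:
    """Toy dense layer whose weights and biases are independent Gaussians.

    Hypothetical illustration: the chapter's model uses a multivariate
    Gaussian prior and learns the weight distributions during training,
    which is not shown here.
    """

    def __init__(self, n_in, n_out, init_std=0.1, seed=None):
        self.rng = random.Random(seed)
        # One mean and one standard deviation per weight and per bias.
        self.w_mean = [[0.0] * n_in for _ in range(n_out)]
        self.w_std = [[init_std] * n_in for _ in range(n_out)]
        self.b_mean = [0.0] * n_out
        self.b_std = [init_std] * n_out

    def forward(self, x):
        """Sample a fresh set of weights, then apply the affine map."""
        out = []
        for j in range(len(self.b_mean)):
            total = self.rng.gauss(self.b_mean[j], self.b_std[j])
            for i, x_i in enumerate(x):
                weight = self.rng.gauss(self.w_mean[j][i], self.w_std[j][i])
                total += weight * x_i
            out.append(total)
        return out


def to_normal_params(raw):
    """Map two raw outputs to (mean, std); softplus keeps the std positive."""
    mean, raw_scale = raw
    std = math.log1p(math.exp(raw_scale))
    return mean, std


layer = BayesianDense(n_in=3, n_out=2, seed=0)
features = [0.5, -1.0, 2.0]  # stands in for the LSTM feature vector
first = to_normal_params(layer.forward(features))
second = to_normal_params(layer.forward(features))
```

Because the weights are resampled on each call, `first` and `second` differ even though the input is identical, which is the behaviour that motivates repeated inference.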
54.3.2 Inference for Bayesian LSTM
The output of the BLSTM is stochastic: for the same input, the output differs every time inference is performed. To obtain a stable output, inference is therefore performed multiple times and the averages of both output parameters, mean and standard deviation, are used. The number of inference runs per input was set, arbitrarily, to 100.
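The averaging procedure can be sketched as follows. This is a minimal illustration, assuming a hypothetical `stochastic_model` function that stands in for one BLSTM forward pass; its numeric constants are made up, and only the repeat-and-average logic with 100 runs reflects the text.

```python
import random
import statistics


def stochastic_model(x, rng):
    """Hypothetical stand-in for one BLSTM forward pass.

    Returns a (mean, std) pair that changes from call to call,
    mimicking the sampling of weights in a Bayesian network.
    """
    mean = 0.01 * x + rng.gauss(0.0, 0.002)
    std = abs(0.015 + rng.gauss(0.0, 0.001))
    return mean, std


def predict_averaged(x, n_passes=100, seed=None):
    """Run inference n_passes times and average both output parameters."""
    rng = random.Random(seed)
    samples = [stochastic_model(x, rng) for _ in range(n_passes)]
    avg_mean = statistics.fmean(m for m, _ in samples)
    avg_std = statistics.fmean(s for _, s in samples)
    return avg_mean, avg_std


avg_mean, avg_std = predict_averaged(x=1.0, n_passes=100, seed=42)
```

Averaging over more runs reduces the run-to-run variation of the reported mean and standard deviation, at the cost of proportionally more inference time.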
54 A Hybrid Model for Forecasting Stock Prices Using Bayesian and LSTM
Fig. 54.3 Process flow of the proposed B-LSTM
54.4 Experimental Results
Four models were built: LSTM (Long Short-Term Memory neural network), XGBoost (Extreme Gradient Boosting), Random Forest, and the proposed Bayesian LSTM. The data used for training and testing the models was obtained through the Python library yfinance. Five years of daily stock data, from 21-03-2017 to 02-04-2022, are used for five companies: Reliance, Dr. Reddy, Dmart, TCS, and Hindunilvr. The implementation ran on an Intel i5 processor with 8 GB of RAM and a 3 GB Nvidia GTX 1050 graphics card. The processor has 4 cores running at 2.2 GHz. The training phase takes 15–20 min, and the testing phase takes a few minutes to make
Fig. 54.4 Summary of BLSTM
predictions and calculate various error metrics. The features considered for predicting stocks are shown in Table 54.1. The features or attributes used by the model to predict stock prices are Open, High, Low, Close, Volume, Return made last day, and Target. Figure 54.5 shows the features considered in the stock data. The performance of the models is evaluated using mean absolute error and root mean square error.
Table 54.1 Features considered for stock prediction of a day
| Feature name | Description |
|---|---|
| Open | Price at which the stock is first traded each day when the market opens |
| High | Highest price at which the stock is traded |
| Low | Lowest price at which the stock is traded |
| Adj close | Last price at which the stock is traded |
| Volume | Total number of stocks that were traded |
| returns_lag1 | Percentage change between the day before yesterday's close price and yesterday's close price |
| Target | Percentage change from today's close price to tomorrow's |
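The two derived features in Table 54.1 can be computed from a series of closing prices as in the sketch below. It is an illustration only: the function names and the sample prices are made up, and the way days without a full history are skipped is an assumption, since the chapter does not describe its preprocessing code.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def build_rows(closes):
    """Return (day_index, returns_lag1, target) for each usable day.

    For day t:
      returns_lag1 = change from close[t-2] to close[t-1]
      target       = change from close[t]   to close[t+1]
    Days without two earlier closes or a following close are skipped.
    """
    rows = []
    for t in range(2, len(closes) - 1):
        returns_lag1 = pct_change(closes[t - 2], closes[t - 1])
        target = pct_change(closes[t], closes[t + 1])
        rows.append((t, returns_lag1, target))
    return rows


closes = [100.0, 110.0, 99.0, 108.9, 98.01]  # made-up closing prices
rows = build_rows(closes)
```

With these sample prices, only days 2 and 3 have both features defined, so `rows` contains two entries.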
Fig. 54.5 Features of stock data
54.4.1 Mean Absolute Error
Regression error measures are statistical metrics used to assess risk. Model assessment is crucial: it must be done in order to decrease risks and improve model performance. Mean absolute error is used with regression models. Equation 54.1 defines the mean absolute error $M_{mae}$, where $\hat{z}_p$ is the estimated value, $z_p$ is the actual value, and $q$ is the number of data points.
\[ M_{mae} = \frac{1}{q} \sum_{p=1}^{q} \left| \hat{z}_p - z_p \right| \tag{54.1} \]
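54.4.2 Root Mean Square Error
Root mean square error, the standard deviation of the residuals, is given in Eq. 54.2 and is used in regression analysis to verify results. Here $\hat{z}_p$ is the estimated value, $z_p$ is the actual value, and $q$ is the number of data points, as in Eq. 54.1.
\[ R_{rmse} = \sqrt{\frac{1}{q} \sum_{p=1}^{q} \left( \hat{z}_p - z_p \right)^2} \tag{54.2} \]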
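Both error measures are straightforward to compute. The sketch below follows Eqs. 54.1 and 54.2 using only the Python standard library; the function names and the sample values are illustrative and do not come from the chapter's experiments.

```python
import math


def mean_absolute_error(actual, estimated):
    """Eq. 54.1: average absolute difference over q data points."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    q = len(actual)
    return sum(abs(e - a) for a, e in zip(actual, estimated)) / q


def root_mean_square_error(actual, estimated):
    """Eq. 54.2: square root of the average squared difference."""
    if not actual or len(actual) != len(estimated):
        raise ValueError("inputs must be non-empty and of equal length")
    q = len(actual)
    return math.sqrt(sum((e - a) ** 2 for a, e in zip(actual, estimated)) / q)


actual = [1.0, 2.0, 3.0, 4.0]      # made-up actual values
estimated = [1.5, 2.0, 2.0, 6.0]   # made-up estimated values
mae = mean_absolute_error(actual, estimated)
rmse = root_mean_square_error(actual, estimated)
```

For any data set the root mean square error is at least as large as the mean absolute error, because squaring gives large residuals more weight.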
54.4.3 Performance Evaluation and Data Visualization
Stock data of five companies is considered for assessing the performance. Table 54.2 lists the errors obtained with the existing models and the proposed BLSTM. From Table 54.2, it is observed that the proposed BLSTM yields the lowest Mmae and Rrmse for all five companies.
Table 54.2 Errors generated by the existing models and the proposed model
| Company | Random forest Mmae | Random forest Rrmse | XGBoost Mmae | XGBoost Rrmse | LSTM Mmae | LSTM Rrmse | BLSTM Mmae | BLSTM Rrmse |
|---|---|---|---|---|---|---|---|---|
| Reliance | 0.6431 | 0.8023 | 0.5787 | 0.7728 | 0.0328 | 0.0433 | 0.0153 | 0.0197 |
| Dr. Reddy | 0.8898 | 0.7915 | 0.5966 | 0.8427 | 0.0249 | 0.0355 | 0.0149 | 0.0193 |
| Dmart | 0.8783 | 0.9372 | 0.6110 | 0.8751 | 0.0345 | 0.0470 | 0.0205 | 0.0256 |
| TCS | 0.6685 | 0.8166 | 0.5886 | 0.8052 | 0.0149 | 0.0203 | 0.0125 | 0.0160 |
| Hindunilvr | 0.8249 | 0.9584 | 0.6347 | 0.8659 | 0.0214 | 0.0284 | 0.0130 | 0.0167 |
Data visualization is a critical stage in data analysis. It is a method of communicating the conclusions drawn from the study and the data set using interactive charts and plots. Streamlit is used to build applications for machine learning projects from simple Python scripts [16]. It also supports hot-reloading, which allows the app to update in real time when its content changes. Graphs of the true price, predicted price and 95% confidence intervals for Reliance, Dr Reddy, Dmart, TCS and Hindunilvr are shown in Figs. 54.6, 54.7, 54.8, 54.9 and 54.10. A web application is designed to deploy the BLSTM and display the predictions for selected stocks, along with statistical metrics that account for the level of confidence and uncertainty in the predictions. Figure 54.11 shows the web page for the stock market prediction on the Reliance data set. Stock predictions with respect to the features open and close, high, and volume are shown in Figs. 54.12, 54.13 and 54.14 respectively. The opening and closing prices of the stock over a specific time period are shown in Fig. 54.12. The high prices of the stock from 2017 to 2022 are depicted in Fig. 54.13, where the increase in stock prices can be observed.
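The chapter does not state how the 95% intervals in the plots are obtained. One common way, assuming the averaged BLSTM output is treated as a normal distribution with a given mean and standard deviation, is to take the central 95% of that distribution. The sketch below shows this under that assumption; the function name and the sample numbers are hypothetical.

```python
from statistics import NormalDist


def prediction_interval(mean, std, level=0.95):
    """Central interval of a normal predictive distribution.

    Assumes the model output is N(mean, std**2); this is an
    illustrative choice, not a method confirmed by the chapter.
    """
    if std < 0 or not 0 < level < 1:
        raise ValueError("std must be non-negative and level in (0, 1)")
    # Two-sided quantile: about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean - z * std, mean + z * std


lower, upper = prediction_interval(mean=0.012, std=0.015)
```

Plotting `lower` and `upper` as a shaded band around the predicted value gives a picture comparable in spirit to the interval plots described above.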
Fig. 54.6 Visualization of true price, predicted price and 95% confidence interval of Reliance
Fig. 54.7 Visualization of true price, predicted price and 95% confidence interval of Dr Reddy
Fig. 54.8 Visualization of true price, predicted price and 95% confidence interval of Dmart
The total number of stocks traded is represented by the feature volume. Figure 54.14 shows the number of stocks traded during the five-year period considered.
54.5 Conclusions and Future Scope
In forecasting stock prices, the proposed hybrid Bayesian LSTM model performs best, with RMSE and MAE far lower than those of the other machine learning and deep learning models considered: XGBoost, Random Forest, and vanilla LSTM. The unmistakable
Fig. 54.9 Visualization of true price, predicted price and 95% confidence interval of TCS
Fig. 54.10 Visualization of true price, predicted price and 95% confidence interval of Hindunilvr
advantage of BLSTM over the other classical and state-of-the-art models is its treatment of uncertainty. Model predictions will always have associated errors due to the aleatoric uncertainty of the process itself. The developed neural network tries to learn the dynamics by letting the kernel and weights be probability distributions. The proposed work can be improved by using probability distributions other than the multivariate normal distribution as the prior for the weights and biases, by using different neural network architectures, by lengthening the input sequence to cover more days, and by considering more features.
Fig. 54.11 Webpage showing the predictions of Reliance stock data
Fig. 54.12 Graph for open and close
Fig. 54.13 Graph for high
Fig. 54.14 Graph for volume
References
1. Hochreiter, S., Schmidhuber, J.: Long short-term memory (1997)
2. Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight Uncertainty in Neural Networks (2015)
3. Chandra, R., He, Y.: Bayesian neural networks for stock price forecasting before and during COVID-19 pandemic (2021)
4. Mehtab, S., Sen, J., Dutta, A.: Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Models (2020). arXiv
5. Singh, G.: Machine Learning Models in Stock Market Prediction (2022)
6. Chen, T., Guestrin, C.: XGBoost: A Scalable Tree Boosting System (2016). arXiv
7. Breiman, L.: Random forests. Mach. Learn. (2001)
8. Vijh, M., Chandola, D., Tikkiwal, V.A., Kumar, A.: Stock closing price prediction using machine learning techniques (2020)
9. Ghosh, P., Neufeld, A., Sahoo, J.K.: Forecasting directional movements of stock prices for intraday trading using LSTM and random forests (2021)
10. Orsel, O.E., Yamada, S.S.: Comparative study of machine learning models for stock price prediction (2022)
11. Wei, D.: Prediction of stock price based on LSTM neural network (2019)
12. Gumelar, A.B., Setyorini, D. P. Adi, S. Nilowardno, L. A. Widodo, A. T. Wibowo, M. T. Sulistyono, E. Christine: Boosting the Accuracy of Stock Market Prediction using XGBoost and Long Short-Term Memory (2020)
13. Quinlan, J.R.: Induction of decision trees. Mach. Learn. (1986)
14. Jospin, L.V., Laga, H., Boussaid, F., Buntine, W., Bennamoun, M.: Hands-on Bayesian Neural Networks - A Tutorial for Deep Learning Users (2022). arXiv
15. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization (2014). arXiv
16. Waghmare, H.: Review on Frameworks Used for Deployment of Machine Learning Model. Ijraset Journal For Research in Applied Science and Engineering Technology (2022)
Author Index
A Adedeji, Taiwo, 31 Adepu Kirankumar, 315 Aditya Deepak Joshi, 231 Aditya, K. V., 291 Ahamad, Shahanawaj, 529 Ahilan, A., 465, 495, 503, 519 Ahmed, Seif, 1 Aika Pavan Kumar, 243 Akhila Susarla, 111 Akhmetshin, Elvir, 369 Alani, Sameer, 399 Al-Azzawi, Waleed Khalid, 399 AlFallooji, Mohanad A., 441 Alhamd, M. W., 433 Alkhafaji, Mohammed Ayad, 453 Alluri Raghav Varma, 607 Alsalhy, Mohammed Jameel, 415 Althubiti, Sara A., 351 Anil Shirgire, 511 Anirudh Ramesh, 219 Ankur Gupta, 529 Anna Devi, E., 465 Anupama, C. S. S., 351, 369 Apinaya Prethi, K. N., 453 Arantla Jaagruthi, 339 Arunkumar, C., 339 Arup Abhinna Acharya, 77 Avala Raji Reddy, 315 Avanija, J., 315 Azar, Ahmad Taher, 89
B Balachandar, S., 567 Bhavsar, Arpan, 189 Bhuvanesh, A., 503
C Chandra Sekhar, M., 553 Charan Babu, T. K., 305 Ch. Bala Subrmanyam, 243 Chellam, S., 473, 519 Chen, Shuwen, 131, 141 Chennapragada V. S. S. Mani Saketh, 111 Chinnaiyan, R., 567 Chinthapalli Karthik, 305 Chirag Arora, 201
D Deena Nath Gupta, 529 Deepak Arora, 585 Deepak S. Sakkari, 539 Dharmesh Dhabliya, 529 Dheeraj Patel, 617 Divya, R., 567 Duddugunta Bharath Reddy, 207 Dukka Ravi Ram Karthik, 111
E Eugine Prince, M., 465
F Faizan Mohammed, 617
G Galiveeti Poornima, 539
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 V. Bhateja et al. (eds.), Intelligent Data Engineering and Analytics, Smart Innovation, Systems and Technologies 371, https://doi.org/10.1007/978-981-99-6706-3
Ganga, M., 473 Garvit Luhadia, 231 Gegov, Alex, 31 Giridhar Sai, K., 265 Good, Alice, 31 Gopal K. Shyam, 379 Gopichand, G., 219 Gopi Krishna Yanala, 243 Gurram Sunitha, 315 Gutha Jaya Krishna, 69
H Hamdy, Abeer, 1 Hariz, Hussein Muhi, 399 Hussain, Zahraa N. Abdul, 415
J Jafar Ali Ibrahim, S., 579 Jasmine Gnana Malar, A., 465, 473, 511, 519 Jaya Lakshmi, T., 19, 57, 111 Jegatheesh, A., 503 Jenice Prabhu, A., 495 Jeya Kumar, V., 579 Jilani, AbdulKhadar, 481 John, Densy, 481 Josephin Shermila, P., 465 Jothin, R., 503, 519
K Kakarla Pranay, 111 Kalaivani, J., 595 Kalyan Chakravarthy, N. S., 579 Karthikeyan, P., 539 Karumuru Sai Vinaya, 339 Kaushik Sekaran, 595 Kollati Vijaya Kumar, 351, 369 Kotha Sita Kumari, 207 Kousar Nikhath, A., 255 Krishnamoorthi, S., 379 Kumar, S. N., 495, 511
L Lakshmi Alekhya Jandhyam, 177 Laxmi Lydia, E., 351, 369 Lay-Ekuakille, Aimé, 89 Leelavathy, N., 265, 279, 291 Lim, Lily, 165 Lin, Jerry Chun-Wei, 89 Liu, Bangli, 31
M Mahmood, Sarmad Nozad, 399 Malay Tripathi, 585 Mallu Varshitha, 339 Mani Deepika, M., 579 Manjunath, K. V., 553 Manujakshi, B. C., 539 Maskey, A., 567 Mohan Krishna, S., 305 Mohideen, Alhaf Malik Kaja, 579 Murali Krishna Enduri, 57 Murih, Ban Kadhim, 433 Mutar, Mohammed Hasan, 399
N Nageswari, D., 503, 511 Namita Mishra, 97 Nandini, Y. V., 19, 57, 111 Narayana Satyala, 177 Naser, Zamen Latef, 433 Neel Gandhi, 153 Nithya, S., 453 Ni, Yiyang, 131
O Obaid, Ahmed J., 453 Omisade, Omobolanle, 31
P Paladugu Ujjwala, 207 Pallavi, R., 539 Parimala, V., 473 PhaniKumar, D., 305 Prasanth, Anupama, 481 Pratik P. Shastrakar, 121 Prior, Amie-Louise, 31
Q Qader, Aryan Abdlwhab, 399 Qu, Hui, 131, 141
R Ragupathy Rengaswamy, 177 Rajesh, M., 495 Rasol, Mustafa Asaad, 399 Ravi Shankar Prasad, 97 Reddy Madhavi, K., 315 Reeja, S. R., 189 Rema, M., 327 Rohini Pinapatruni, 617 Rohit Anand, 529 Rohit Kumar Bondugula, 45 Rohit Saxena, 585
S Sabarish, B. A., 339 Sabarmathi, G., 567 Sajithra Varun, S., 465 Sandhya, N., 255 Sanskruti A. Zade, 121 Satheesh Kumar Palanisamy, 453 Sathiamoorthy, J., 519 Sathya, A. R., 607 Sathya, M., 579 Sattibabu, D., 291 Satya Ranjan Dash, 77, 97 Satyasundara Mahapatra, 585 Satya Uday Sanku, 19 Sayeeda Khanum Pathan, 255 Seetharamulu, B., 595 Sengar, Sandeep Singh, 31 Shakti Mishra, 153 Shantipriya Parida, 97 Shao, Jiaqi, 131, 141 Shukur, Ali A., 441 Siba Kumar Udgata, 45 Siddharth Verma, 89 Singhdeo, Tamanna Jena, 189 Sitharthan, R., 495, 511 Snehal V. Laddha, 121 Sourabh Singh, 89 Sparshi Gupta, 89 Sravani, N., 279 Sridara Pandian, G., 219 Sridevi Sakhamuri, 243 Srujan Raju, K., 315 Subashini, R., 327 Subhasree Methirumangalath, 327 Sudha, Y., 539 Sujatha, B., 265, 279, 291, 305 Sujatha Therese, P., 465 Sultanova, Sevara, 369 Sunanda Das, 567 Surendiran, R., 503, 519 Surendran, Priyanka, 481 Suresh Satapathy, 189 Swarnamugi, M., 567 Syed Anas Mohiuddin, 617
T Tamilkodi, R., 265, 279, 291 Thanuja Pavani Satti, 19 Thomas, Bindhya, 481 Toptan, Carrie, 31 Trupti Patil, 529 Tryfona, Catherine, 31
U Umakanta Dash, 77 Usha, M., 495
V Vallisree, S., 495, 519 Varagani Durga Shyam Prasad, 243 Varun Rajan, 327 Vasant Tholappa, 219 Vayigandla Neelesh Gupta, 339 Veerasamy, B., 511 Velaga Sai Sreeja, 207 Venkatesh, B., 255, 595 Vijayalaxmi C. Handaragall, 595 Vijayarajan, V., 231 Vikrant Bhateja, 89 Vinoth Kumar, V., 231 Vishal Nagar, 585
W Wang, Jiaji, 131, 141 Wang, Vincent Xian, 165 Wang, Ziyi, 131, 141 Winston, Joy, 481
Z Zhou, Shang-Ming, 31